Apache CarbonData 2.0 Practical Development Series, Part 2: Using CarbonData with Hive
【Prepare CarbonData】
See: Apache CarbonData 2.0 Practical Development Series, Part 1: Using CarbonData with Spark SQL
【Prepare Hadoop】
1. Download Hadoop 2.7.7
Download URL: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.7/
2. Extract the archive
tar -xvzf hadoop-2.7.7.tar.gz
cd hadoop-2.7.7
export HADOOP_HOME=$(pwd)
3. Start DFS
Configure and start DFS by following:
https://hadoop.apache.org/docs/r2.7.7/hadoop-project-dist/hadoop-common/SingleCluster.html
A summary of the steps:
a) Edit etc/hadoop/hadoop-env.sh and set the following parameter:
export JAVA_HOME=<Java installation directory>
b) Edit etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
c) Edit etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
d) Set up passwordless SSH:
ssh localhost
If a password is required, run:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
e) Format the filesystem:
bin/hdfs namenode -format
f) Start DFS:
sbin/start-dfs.sh
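If everything went well, DFS is now running. A quick sanity check (a minimal sketch; it assumes jps is on the PATH and that you are still in the Hadoop directory):
jps                  # should list NameNode, DataNode and SecondaryNameNode
bin/hdfs dfs -ls /   # should succeed; the root directory may still be empty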
【Prepare Hive】
1. Download Hive 3.1.2
Download URL: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.2/
2. Extract the archive
tar -xvzf apache-hive-3.1.2-bin.tar.gz
cd apache-hive-3.1.2-bin
3. Deploy the CarbonData jar
cp <CarbonData jar> lib/
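For example, if CarbonData was built as described in Part 1, the copy might look like the following. The jar name and path below are only an illustration; substitute the actual assembly jar produced by your build:
# hypothetical jar name and path; use the one from your own CarbonData build
cp ~/carbondata/assembly/target/scala-2.11/apache-carbondata-2.0.0-bin-spark2.4.5-hadoop2.7.2.jar lib/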
4. Start Hive
Configure and start Hive by following:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive
A summary of the steps:
export HADOOP_HOME=<Hadoop directory>
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
export HIVE_HOME=$(pwd)
$HIVE_HOME/bin/schematool -dbType derby -initSchema
$HIVE_HOME/bin/beeline -u jdbc:hive2://
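Once beeline connects, a simple statement confirms the session is working (a minimal check; the exact output depends on your environment):
SHOW DATABASES;
-- should list at least the default database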
【Use CarbonData】
1. Create the lineitem text table
CREATE TABLE IF NOT EXISTS lineitem_text(
  l_orderkey INT,
  l_partkey INT,
  l_suppkey STRING,
  l_linenumber INT,
  l_quantity DOUBLE,
  l_extendedprice DOUBLE,
  l_discount DOUBLE,
  l_tax DOUBLE,
  l_returnflag STRING,
  l_linestatus STRING,
  l_shipdate DATE,
  l_commitdate DATE,
  l_receiptdate DATE,
  l_shipinstruct STRING,
  l_shipmode STRING,
  l_comment STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
2. Load the lineitem.tbl file generated by TPC-H dbgen (or use the attached lineitem.txt file, which contains 1000 sample rows)
LOAD DATA LOCAL INPATH '<path to lineitem.tbl or lineitem.txt>' INTO TABLE lineitem_text;
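To verify the load, count the rows (the expected value below assumes the attached 1000-row sample file):
SELECT COUNT(*) FROM lineitem_text;
-- with the attached lineitem.txt sample this should return 1000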
3. Create the lineitem CarbonData table
CREATE TABLE IF NOT EXISTS lineitem(
  l_orderkey INT,
  l_partkey INT,
  l_suppkey STRING,
  l_linenumber INT,
  l_quantity DOUBLE,
  l_extendedprice DOUBLE,
  l_discount DOUBLE,
  l_tax DOUBLE,
  l_returnflag STRING,
  l_linestatus STRING,
  l_shipdate DATE,
  l_commitdate DATE,
  l_receiptdate DATE,
  l_shipinstruct STRING,
  l_shipmode STRING,
  l_comment STRING)
STORED BY 'org.apache.carbondata.hive.CarbonStorageHandler';
4. Insert data into the CarbonData table
INSERT INTO lineitem SELECT * FROM lineitem_text;
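After the insert completes, the CarbonData table should hold the same number of rows as the text table (a simple consistency check):
SELECT COUNT(*) FROM lineitem;
-- should match SELECT COUNT(*) FROM lineitem_text;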
5. Query
Run TPC-H query 1:
SELECT
  l_returnflag,
  l_linestatus,
  sum(l_quantity) AS sum_qty,
  sum(l_extendedprice) AS sum_base_price,
  sum(l_extendedprice*(1-l_discount)) AS sum_disc_price,
  sum(l_extendedprice*(1-l_discount)*(1+l_tax)) AS sum_charge,
  avg(l_quantity) AS avg_qty,
  avg(l_extendedprice) AS avg_price,
  avg(l_discount) AS avg_disc,
  count(*) AS count_order
FROM lineitem
WHERE l_shipdate <= date('1993-09-02')
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;