Apache CarbonData 2.0 Practical Development Series, Part 2: Integration with Hive
【Abstract】Using CarbonData in Hive
【Prepare CarbonData】
See: Apache CarbonData 2.0 Practical Development Series, Part 1: Integration with Spark SQL
【Prepare Hadoop】
1. Download Hadoop 2.7.7
Download: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.7/
2. Extract
tar -xvzf hadoop-2.7.7.tar.gz
cd hadoop-2.7.7
export HADOOP_HOME=$(pwd)
3. Start DFS
Configure and start DFS by following:
https://hadoop.apache.org/docs/r2.7.7/hadoop-project-dist/hadoop-common/SingleCluster.html
The key steps are summarized below:
a) Edit etc/hadoop/hadoop-env.sh and set the following parameter:
export JAVA_HOME=<path to JDK>
b) Edit etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
c) Edit etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
d) Passwordless SSH
ssh localhost
If this prompts for a password, run the following commands:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
e) Format the filesystem
bin/hdfs namenode -format
f) Start DFS
sbin/start-dfs.sh
【Prepare Hive】
1. Download Hive 3.1.2
Download: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.2/
2. Extract
tar -xvzf apache-hive-3.1.2-bin.tar.gz
cd apache-hive-3.1.2-bin
3. Deploy the CarbonData jar
cp <path to carbondata jar> lib/
4. Start Hive
Configure and start Hive by following:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive
The key steps are summarized below:
export HADOOP_HOME=<path to hadoop>
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
export HIVE_HOME=$(pwd)
$HIVE_HOME/bin/schematool -dbType derby -initSchema
$HIVE_HOME/bin/beeline -u jdbc:hive2://
【Use CarbonData】
1. Create the lineitem text table
CREATE TABLE IF NOT EXISTS lineitem_text(
l_orderkey INT,
l_partkey INT,
l_suppkey STRING,
l_linenumber INT,
l_quantity DOUBLE,
l_extendedprice DOUBLE,
l_discount DOUBLE,
l_tax DOUBLE,
l_returnflag STRING,
l_linestatus STRING,
l_shipdate DATE,
l_commitdate DATE,
l_receiptdate DATE,
l_shipinstruct STRING,
l_shipmode STRING,
l_comment STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
2. Load the lineitem.tbl file generated by TPC-H dbgen (or use the lineitem.txt file in the attachment, which contains 1,000 sample rows):
LOAD DATA LOCAL INPATH '<path to lineitem.tbl or lineitem.txt>' INTO TABLE lineitem_text;
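For reference, each line of lineitem.tbl is a '|'-delimited record whose fields follow the column order of the CREATE TABLE statement above (dbgen typically also emits a trailing '|'). A minimal Python sketch of the format; the sample row values below are made up for illustration:

```python
# Parse one '|'-delimited lineitem record into a dict keyed by the
# column names from the CREATE TABLE statement above.
COLUMNS = [
    "l_orderkey", "l_partkey", "l_suppkey", "l_linenumber",
    "l_quantity", "l_extendedprice", "l_discount", "l_tax",
    "l_returnflag", "l_linestatus", "l_shipdate", "l_commitdate",
    "l_receiptdate", "l_shipinstruct", "l_shipmode", "l_comment",
]

def parse_lineitem(line):
    # Drop the newline and the trailing '|' that dbgen emits,
    # then split into the 16 fields.
    fields = line.rstrip("\n").rstrip("|").split("|")
    return dict(zip(COLUMNS, fields))

# A made-up sample row, for illustration only.
sample = ("1|1552|93|1|17|24710.35|0.04|0.02|N|O|1996-03-13|1996-02-12|"
          "1996-03-22|DELIVER IN PERSON|TRUCK|egular courts above the|")
row = parse_lineitem(sample)
print(row["l_returnflag"], row["l_quantity"])  # → N 17
```

This matches the `FIELDS TERMINATED BY '|'` clause of the text table, which is why the file can be loaded as-is.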
3. Create the lineitem CarbonData table
CREATE TABLE IF NOT EXISTS lineitem(
l_orderkey INT,
l_partkey INT,
l_suppkey STRING,
l_linenumber INT,
l_quantity DOUBLE,
l_extendedprice DOUBLE,
l_discount DOUBLE,
l_tax DOUBLE,
l_returnflag STRING,
l_linestatus STRING,
l_shipdate DATE,
l_commitdate DATE,
l_receiptdate DATE,
l_shipinstruct STRING,
l_shipmode STRING,
l_comment STRING)
STORED BY 'org.apache.carbondata.hive.CarbonStorageHandler';
4. Load data into the CarbonData table
INSERT INTO lineitem SELECT * FROM lineitem_text;
5. Query
Run TPC-H query 1:
SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) AS sum_qty,
sum(l_extendedprice) AS sum_base_price,
sum(l_extendedprice*(1-l_discount)) AS sum_disc_price,
sum(l_extendedprice*(1-l_discount)*(1+l_tax)) AS sum_charge,
avg(l_quantity) AS avg_qty,
avg(l_extendedprice) AS avg_price,
avg(l_discount) AS avg_disc,
count(*) AS count_order
FROM lineitem
WHERE l_shipdate <= date('1993-09-02')
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
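To make the query's semantics concrete, here is a plain-Python sketch of the same filter/group/aggregate pipeline. The rows below are made up for illustration; this mirrors what Q1 computes, not how Hive or CarbonData actually executes it:

```python
from collections import defaultdict

# Tiny in-memory stand-in for lineitem: each tuple is
# (l_returnflag, l_linestatus, l_quantity, l_extendedprice,
#  l_discount, l_tax, l_shipdate). Values are made up.
rows = [
    ("N", "O", 17.0, 24710.35, 0.04, 0.02, "1993-03-13"),
    ("N", "O", 36.0, 45983.16, 0.09, 0.06, "1993-04-12"),
    ("R", "F",  8.0, 13309.60, 0.10, 0.02, "1993-01-29"),
]

def tpch_q1(rows, cutoff="1993-09-02"):
    groups = defaultdict(lambda: {"sum_qty": 0.0, "sum_base_price": 0.0,
                                  "sum_disc_price": 0.0, "sum_charge": 0.0,
                                  "sum_disc": 0.0, "count_order": 0})
    for flag, status, qty, price, disc, tax, shipdate in rows:
        if shipdate > cutoff:      # WHERE l_shipdate <= '1993-09-02'
            continue               # (ISO dates compare correctly as strings)
        g = groups[(flag, status)]                    # GROUP BY flag, status
        g["sum_qty"] += qty
        g["sum_base_price"] += price
        g["sum_disc_price"] += price * (1 - disc)
        g["sum_charge"] += price * (1 - disc) * (1 + tax)
        g["sum_disc"] += disc
        g["count_order"] += 1
    result = []
    for key in sorted(groups):     # ORDER BY l_returnflag, l_linestatus
        g = groups[key]
        n = g["count_order"]
        result.append((key[0], key[1], g["sum_qty"], g["sum_base_price"],
                       g["sum_disc_price"], g["sum_charge"],
                       g["sum_qty"] / n,          # avg_qty
                       g["sum_base_price"] / n,   # avg_price
                       g["sum_disc"] / n,         # avg_disc
                       n))                        # count_order
    return result

for line in tpch_q1(rows):
    print(line)
```

The averages are derived from the sums and the group count, which is also how a distributed engine computes them: partial sums and counts are aggregated first, and the division happens once per group at the end.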
【Disclaimer】This content comes from a Huawei Cloud Developer Community blogger and does not represent the views or positions of Huawei Cloud or the Huawei Cloud Developer Community. Reposts must credit the source (Huawei Cloud Community) and include the article link, author, and other basic information; otherwise the author and this community reserve the right to pursue liability. If you find suspected plagiarism in this community, please report it by email with supporting evidence; once verified, the community will immediately remove the infringing content. Report email:
cloudbbs@huaweicloud.com