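Configuring and Running Hive on Tez on Windows 10
Background
Earlier posts covered how to set up the Hadoop and Hive runtime environments. This post therefore assumes that Hive already runs locally and can submit Hive on MR jobs.
Tez is one of the more commonly used execution engines for Hive. It supports a DAG job model and can merge several dependent jobs into a single job, which greatly improves the performance of DAG workloads. Some of Hive's optimizations are also built on Tez.
This post shows how to get Hive to run Tez jobs. Most steps are run in Windows PowerShell, but these components support Windows so poorly that some parts have to use WSL. Using Hadoop and Hive under WSL works in much the same way and will be covered later if needed.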
Versions
OS: Windows 10 Pro 1903
Java: 1.8.0_261
Tez: apache-tez-0.9.2-bin.tar.gz
(available from the official home page or from the Apache Backup Site)
Tomcat: 8.5.60, downloaded from the official site
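Prerequisites
- Choosing a Tez version
  - In testing, version 3.1.0 works with tez-0.9.2
  - Other versions need to be tested yourself; adjust as needed if they turn out not to fit
- Start HDFS so that the Tez libs can be uploaded
- WSL environment
  - Needed to run the Tez UI and to provide the tar command
  - Without WSL, skip configuration steps 6 and 7 (the optional timeline/jobhistory servers and the Tez UI). The consequence is that the Tez log web pages are unavailable, which makes it less convenient to inspect job execution details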
Configuration steps
Goal: configure tez.lib.uris
- Extract apache-tez-0.9.2-bin.tar.gz
- Directory layout
apache-tez-0.9.2-bin
|- share
|- |- tez.tar.gz
|- conf
|- |- tez-default-template.xml
|- |- tez-runtime-default-template.xml
|- lib
|- tez-ui-0.9.2.war
|- ...
- Locate tez.tar.gz under the apache-tez-0.9.2-bin/share directory
- Upload it to HDFS (run from the share directory)
# Run in PowerShell
hadoop fs -mkdir -p /apps/tez
hadoop fs -put tez.tar.gz /apps/tez
hadoop fs -ls /apps/tez
# With Hadoop installed under WSL, the commands are similar
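The upload commands above can be wrapped in a small function for repeated use from a WSL/bash shell. This is only a minimal sketch, not part of Hadoop or Tez: the name upload_tez_tarball and its optional destination argument are hypothetical, and it assumes the hadoop command is on PATH.

```shell
# Hypothetical wrapper around the three `hadoop fs` commands shown above.
# $1 = local path to tez.tar.gz, $2 = HDFS target directory (default /apps/tez)
upload_tez_tarball() {
  tarball="$1"
  dest="${2:-/apps/tez}"
  # Fail early if the tarball is not where we expect it.
  if [ ! -f "$tarball" ]; then
    echo "tarball not found: $tarball" >&2
    return 1
  fi
  # -put -f overwrites a tarball left over from an earlier upload.
  hadoop fs -mkdir -p "$dest" &&
    hadoop fs -put -f "$tarball" "$dest" &&
    hadoop fs -ls "$dest"
}
```

A call such as `upload_tez_tarball apache-tez-0.9.2-bin/share/tez.tar.gz` would then create the directory, upload the file, and list the result in one step.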
- Create a new tez-site.xml
Put it in the apache-tez-0.9.2-bin/conf directory for now
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>false</value>
</property>
<property>
<name>tez.lib.uris</name>
<value>hdfs:///apps/tez/tez.tar.gz</value>
</property>
<!-- Optional: tez ui related -->
<property>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
<name>tez.tez-ui.history-url.base</name>
<value>http://localhost:9999/</value>
</property>
</configuration>
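To double-check that tez.lib.uris points at the path uploaded earlier, a property can be read back out of the file from a WSL/bash shell. The helper below is a rough sketch: xml_prop is a hypothetical name, and it is a line-based awk filter rather than a real XML parser, so it assumes each <name> and <value> element sits on a single line, as in the listing above.

```shell
# Hypothetical helper: print the <value> of one property in a Hadoop-style
# site file. $1 = path to the XML file, $2 = property name.
xml_prop() {
  awk -v name="$2" '
    /<name>/ {
      # Strip everything around the name and remember whether it matches.
      line = $0
      sub(/.*<name>/, "", line)
      sub(/<\/name>.*/, "", line)
      found = (line == name)
    }
    found && /<value>/ {
      # First <value> after the matching <name>: print it and stop.
      line = $0
      sub(/.*<value>/, "", line)
      sub(/<\/value>.*/, "", line)
      print line
      exit
    }
  ' "$1"
}
```

For example, `xml_prop apache-tez-0.9.2-bin/conf/tez-site.xml tez.lib.uris` should print the HDFS URI configured above; an unknown property name prints nothing.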
- Edit yarn-site.xml in the $HADOOP_HOME/etc/hadoop directory
<!-- Optional: tez ui related -->
<!-- timeline server, allow tez ui visit -->
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>
<!-- log aggregation, collect for timeline server -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- jobhistory server url, for log proxy -->
<property>
<name>yarn.log.server.url</name>
<value>http://localhost:19888/jobhistory/logs</value>
</property>
<!-- disable the virtual memory limit check -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
- Optional configuration: requires WSL (run from $HADOOP_HOME)
  - Optional: start the timeline server (http://localhost:8188/)
./sbin/yarn-daemon.sh start timelineserver
  - Optional: start the jobhistory server (http://localhost:19888/)
./sbin/mr-jobhistory-daemon.sh start historyserver
- Optional: start the Tez UI (requires WSL)
  - Take tez-ui-0.9.2.war from the Tez directory
  - Extract apache-tomcat-8.5.60
  - Configure conf/server.xml:
<Connector port="9999" protocol="HTTP/1.1"
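  - Add the war to webapps
    - Empty webapps (back it up first)
cp tez-ui-0.9.2.war ROOT.war
  - Start Tomcat
./bin/startup.sh # first start, so that the war gets unpacked
./bin/shutdown.sh # stop, so that the configuration can be edited
  - Edit the configuration under webapps/ROOT/config
The default ports are already 8088 for the RM and 8188 for the timeline server
If your actual ports differ, adjust them accordingly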
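- Configure Hive on Tez
  - Copy the configuration file into the Hive conf directory
cp apache-tez-0.9.2-bin/conf/tez-site.xml $HIVE_HOME/conf/
This avoids having to edit HADOOP_CLASSPATH to add tez-site.xml, because Hive loads its conf directory by default at startup
  - Add this property (tez-site.xml)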
<property>
<name>hive.tez.container.size</name>
<value>4</value>
</property>
  - Add this property (mapred-site.xml)
<property>
<name>mapreduce.job.counters.max</name>
<value>12000</value>
</property>
Availability test
Windows 10 without WSL, no Tez UI
- Work in the D:\install\apache-tez-0.9.2-bin directory
- Environment variables
$env:TEZ_CONF_DIR="D:\install\apache-tez-0.9.2-bin\conf"
$env:TEZ_JARS="D:\install\apache-tez-0.9.2-bin"
$env:HADOOP_CLASSPATH="$env:TEZ_CONF_DIR;$env:TEZ_JARS\*;$env:TEZ_JARS\lib\*"
- Test data
hadoop fs -put LICENSE /tmp
- Run the example
hadoop jar tez-examples-0.9.2.jar orderedwordcount /tmp/LICENSE /tmp/result
- Check the result in HDFS
hadoop fs -cat /tmp/result/part-v002-o000-r-00000
Optional: with the Tez UI already started
The run history is visible at http://localhost:8088/cluster
- Set the environment variables when starting Hive
export TEZ_HOME=/mnt/d/install/apache-tez-0.9.2-bin
# Append every Tez jar (top level and lib/) to HADOOP_CLASSPATH.
# Globs replace the original `ls | grep` so that odd file names are handled safely.
for jar in "$TEZ_HOME"/*.jar "$TEZ_HOME"/lib/*.jar; do
  export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$jar
done
hive
- Test statements
set hive.execution.engine=tez;
-- SQL1
select a.wr_returned_date_sk, a.cnt, b.cnt
from (select wr_returned_date_sk, count(1) as cnt
from web_returns
where wr_returned_date_sk between 2452977 and 2452979
group by wr_returned_date_sk) a
join (
select wr_returned_date_sk, count(1) as cnt
from web_returns
where wr_returned_date_sk between 2452977 and 2452979
group by wr_returned_date_sk
) b on a.wr_returned_date_sk = b.wr_returned_date_sk;
-- SQL2
select *
from (select wr_returned_date_sk
from web_returns
where wr_returned_date_sk between 2452977 and 2452979
) a join (
select wr_returned_date_sk
from web_returns
where wr_returned_date_sk between 2452977 and 2452979
) b on a.wr_returned_date_sk = b.wr_returned_date_sk;
Starting and stopping later
- Optional: start the Tez UI (requires WSL)
cd $HADOOP_HOME
./sbin/start-yarn.sh
./sbin/yarn-daemon.sh start timelineserver
./sbin/mr-jobhistory-daemon.sh start historyserver
cd /mnt/d/install/apache-tomcat-tez-ui
./bin/startup.sh
- Set the environment variables
export TEZ_HOME=/mnt/d/install/apache-tez-0.9.2-bin
# Append every Tez jar (top level and lib/) to HADOOP_CLASSPATH
for jar in "$TEZ_HOME"/*.jar "$TEZ_HOME"/lib/*.jar; do
  export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$jar
done
- Start Hive
hive
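Since the same classpath loop is needed every time Hive is started under WSL, it can be factored into a function kept in a shell profile. The following is a sketch under assumptions: tez_classpath is a hypothetical name (not something shipped with Tez or Hadoop), and it only collects files ending in .jar from the given directory and its lib subdirectory.

```shell
# Hypothetical helper: print a ':'-separated classpath containing every jar
# found directly in DIR and in DIR/lib. $1 = Tez installation directory.
tez_classpath() {
  dir="$1"
  result=""
  for jar in "$dir"/*.jar "$dir"/lib/*.jar; do
    # Skip the literal pattern left behind when a glob matches nothing.
    [ -e "$jar" ] || continue
    # Add a ':' separator only when result is already non-empty.
    result="${result:+$result:}$jar"
  done
  printf '%s\n' "$result"
}

# Possible usage inside WSL, with the path used in this post:
#   export TEZ_HOME=/mnt/d/install/apache-tez-0.9.2-bin
#   export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$(tez_classpath "$TEZ_HOME")"
#   hive
```

Building the string in a function avoids the leading ':' that the plain loop produces when HADOOP_CLASSPATH starts out empty.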
Other problems
- java.lang.IllegalArgumentException: Illegal Capacity: -10444
  - https://issues.apache.org/jira/browse/HIVE-19918
  - Related setting: hive.tez.container.size (set in the Hive on Tez configuration step above)
References
- http://tez.apache.org/install.html
- https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez
- http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServer.html
- https://issues.apache.org/jira/browse/YARN-9517
- https://issues.apache.org/jira/browse/YARN-4037
- https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
- https://stackoverflow.com/questions/43441437/container-is-running-beyond-virtual-memory-limits
- https://tez.apache.org/tez-ui.html
- https://support.datameer.com/hc/en-us/articles/115005289466-How-to-Enable-Tez-History-UI-for-Hadoop-
- https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez#HiveonTez-InstallationandConfiguration
- https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-installation/content/ref-a80115dc-6300-4372-9c8e-6f7c0f902b92.1.html