Running Hive on Windows 10 (non-WSL mode)
Background
Related post: Running Hadoop on Windows 10 (non-WSL mode)
Hive depends on HDFS at runtime, so complete the Hadoop setup from that post first.
Versions
OS: Windows 10 Pro 1903
Java: 1.8.0_231
Hive: Hive-3.1.0.tar.gz (from the official Apache archive https://archive.apache.org/dist/hive/hive-3.1.0/ )
Derby: db-derby-10.14.2.0-bin.tar.gz (download from the official page https://db.apache.org/derby/derby_downloads.html , or look for it on a mirror such as https://downloads.apache.org/db/derby/ )
Prerequisites
JDK environment
Hadoop environment: see the previous post
Derby database:
A lightweight database used here as the Hive Metastore; if you have time to set up MySQL, feel free to substitute it.
Runs straight out of the unpacked archive; note that the directory you launch the script from becomes the data directory.
Starting Derby, for reference:
# PowerShell
$env:DERBY_HOME="D:\install\db-derby-10.14.2.0-bin"
# Create the directory D:\install\db-derby-10.14.2.0-data first
cd D:\install\db-derby-10.14.2.0-data
# Expanding the environment variable inline may fail; tab-complete the absolute path instead
& "$env:DERBY_HOME\bin\startNetworkServer.bat" -h 0.0.0.0
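Before moving on, it can help to confirm the Derby network server is actually listening. A minimal Python sketch (host and port match Derby's defaults used above; this is just a TCP reachability probe, not a JDBC handshake):

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Derby's network server listens on 1527 by default
print("Derby reachable:", is_port_open("127.0.0.1", 1527))
```

If this prints False, check that startNetworkServer.bat is still running and was started with -h 0.0.0.0.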
WSL / Cygwin / Git Bash
Pick one of the three, to get a Unix-like environment.
Unlike Hadoop, Hive has many places where Unix/Windows compatibility was never properly handled.
Hive executables for Windows
The download below is demonstrated inside WSL.
Install svn:
sudo apt-get install subversion
Download:
svn co https://svn.apache.org/repos/asf/hive/trunk/bin hive-bin
Option 1: download the files manually one by one
Option 2: download via svn (as shown above)
Configuration steps
Unpack the Hive archive
Optional: run WinRAR with administrator privileges to extract.
Extracting without administrator rights means Linux symlinks cannot be converted to Windows shortcuts; WinRAR falls back to copying the linked content instead.
Open the tar.gz, choose the destination, and extract:
D:\install\
is the directory I usually install software into; when following along, substitute your own extraction/installation directory in the environment-variable settings.
Environment variables
# PowerShell
# Hadoop and JDK environment variables, in case they are not set yet
$env:JAVA_HOME="D:\install\Java\jdk1.8.0_231"
$env:HADOOP_HOME="D:\install\Hadoop-3.1.1\hadoop"
$env:PATH="$env:PATH;$env:JAVA_HOME\bin;$env:HADOOP_HOME\bin"
# Hive environment variables
$env:HIVE_HOME="D:\install\Hive-3.1.0\hive-3.1.0"
$env:DERBY_HOME="D:\install\db-derby-10.14.2.0-bin"
$env:HIVE_LIB="$env:HIVE_HOME\lib"
$env:HIVE_BIN="$env:HIVE_HOME\bin"
$env:HADOOP_USER_CLASSPATH_FIRST="true"
Copy the Derby libs into Hive's lib directory
cp D:\install\db-derby-10.14.2.0-bin\lib\* D:\install\Hive-3.1.0\hive-3.1.0\lib
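The ClientDriver configured in hive-site.xml lives in derbyclient.jar, so it is worth verifying the copy actually landed. A small sketch, assuming the jar names shipped in the Derby bin distribution:

```python
from pathlib import Path

# Jars from the Derby distribution that the Derby network client setup needs;
# derbyclient.jar provides org.apache.derby.jdbc.ClientDriver
REQUIRED_JARS = ["derbyclient.jar", "derbytools.jar"]

def missing_jars(lib_dir: str) -> list:
    """Return the required jars not present in lib_dir."""
    present = {p.name for p in Path(lib_dir).glob("*.jar")}
    return [j for j in REQUIRED_JARS if j not in present]

# Example (path from this walkthrough; adjust to your install):
# print(missing_jars(r"D:\install\Hive-3.1.0\hive-3.1.0\lib"))
```

An empty list means the driver jars are in place.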
Configure hive-site.xml
In the directory
D:\install\Hive-3.1.0\hive-3.1.0\conf
create a new file with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>True</value>
  </property>
  <property>
    <name>hadoop.bin.path</name>
    <value>/D:/install/Hadoop-3.1.1/hadoop/bin/hadoop.cmd</value>
  </property>
  <property>
    <name>hive.jar.path</name>
    <value>/D:/install/Hive-3.1.0/hive-3.1.0/lib/hive-exec-3.1.0.jar</value>
  </property>
</configuration>
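Malformed XML here tends to fail in confusing ways later, at schematool or hive startup time. A quick sanity check is to parse the file and list its name/value pairs; a minimal Python sketch (the inline string below is a stand-in for reading your actual hive-site.xml):

```python
import xml.etree.ElementTree as ET

def load_hive_site(xml_text: str) -> dict:
    """Parse a Hadoop-style configuration file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }

# Stand-in for open(r"D:\install\Hive-3.1.0\hive-3.1.0\conf\hive-site.xml").read()
sample = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
  </property>
</configuration>"""

conf = load_hive_site(sample)
print(conf["javax.jdo.option.ConnectionURL"])
```

If ET.fromstring raises a ParseError, fix the XML before going any further.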
Hadoop should already be up and running properly; see the previous post.
Replace the Windows executables in the bin directory
Overwrite the bin directory with the hive-bin checkout, then initialize the metastore
Since the script is bash, we have to fall back on Cygwin or WSL (make sure you are in a bash environment, not zsh).
Initialization may run into problems; see the "Other issues" section below.
# bash, not zsh
# Hadoop and JDK environment variables
# The JDK path inside WSL differs from the Windows one; substitute your own
export JAVA_HOME=/mnt/d/install/jdk/jdk1.8.0_231
export HADOOP_HOME='/mnt/d/install/Hadoop-3.1.1/hadoop'
export PATH=$PATH:$HADOOP_HOME/bin
# Hive environment variables
export HIVE_HOME='/mnt/d/install/Hive-3.1.0/hive-3.1.0'
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*.jar
# Run the initialization script from the Hive directory
cd $HIVE_HOME
./bin/schematool -dbType derby -initSchema
Start Hive
hiveserver2 is not started here, because I could not get it to start (wry smile).
So we only use local mode; the main goal is to make the upcoming feature development and testing convenient.
# PowerShell
cd $env:HIVE_HOME
.\bin\hive.cmd
Smoke test
Prepare the data
Make sure to replace Windows-style line endings; Hive cannot handle them correctly, e.g. the integer at the end of each line fails to convert.
2,zhangsan,98
7,lisi,87
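Stripping the CRLF endings can be done with any tool (sed is shown in "Other issues" below); as a sketch, a small Python helper that rewrites CRLF to LF (the file name is illustrative):

```python
def strip_crlf(data: bytes) -> bytes:
    """Convert Windows CRLF line endings to Unix LF."""
    return data.replace(b"\r\n", b"\n")

# Example: normalize the sample file before uploading it to HDFS
# with open("samples.csv", "rb") as f:
#     clean = strip_crlf(f.read())
print(strip_crlf(b"2,zhangsan,98\r\n7,lisi,87\r\n"))
```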
Create the table
create table test (
  id INTEGER,
  name STRING,
  grade INTEGER
)
row format delimited
fields terminated by ','
lines terminated by '\n';
Load the data from HDFS into the table
load data inpath '/tmp/samples.csv' into table test;
Query tests
You may hit problems during testing; see the "Other issues" section below.
select id, count(1) as cnt from test group by id;
insert into table test select (id + 10) as id, name, grade from test;
create table test2 stored as ORC as select (id + 10) as id, name, grade from test;
Subsequent starts and stops
Set the environment variables
# PowerShell
# Hadoop and JDK environment variables
$env:JAVA_HOME="D:\install\Java\jdk1.8.0_231"
$env:HADOOP_HOME="D:\install\Hadoop-3.1.1\hadoop"
$env:PATH="$env:PATH;$env:JAVA_HOME\bin;$env:HADOOP_HOME\bin"
# Hive environment variables
$env:HIVE_HOME="D:\install\Hive-3.1.0\hive-3.1.0"
$env:DERBY_HOME="D:\install\db-derby-10.14.2.0-bin"
$env:HIVE_LIB="$env:HIVE_HOME\lib"
$env:HIVE_BIN="$env:HIVE_HOME\bin"
$env:HADOOP_USER_CLASSPATH_FIRST="true"
Start Hadoop
Start Derby
Start Hive
# PowerShell
cd $env:HIVE_HOME
.\bin\hive.cmd
Log location:
C:/Users/${Windows username}/AppData/Local/Temp/${Windows username}/beeline.log
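That location is just Log4j expanding ${sys:java.io.tmpdir}/${sys:user.name} from the hive-log4j.properties shown later; on Windows, java.io.tmpdir resolves to %TEMP% (AppData\Local\Temp). A Python sketch mirroring the same expansion, handy for locating the log from a script:

```python
import getpass
import tempfile
from pathlib import Path

def beeline_log_path() -> Path:
    """Mirror Log4j's ${sys:java.io.tmpdir}/${sys:user.name}/beeline.log."""
    return Path(tempfile.gettempdir()) / getpass.getuser() / "beeline.log"

print(beeline_log_path())
```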
Other issues
\r: command not found
Cause: Windows line endings are incompatible with WSL/Cygwin.
Replace them, for example:
sed -i 's/\r$//' hadoop
NumberFormatException: For input string: "0x100"
Cause: terminal misconfiguration.
Fix:
export TERM=xterm-color
This can also be added to .bashrc.
Error: FUNCTION 'NUCLEUS_ASCII' already exists.
Derby was initialized more than once; go to the directory Derby was started from and delete metastore_db.
Create symbolic links error
Hadoop cannot create symbolic links and therefore cannot write to HDFS.
Grant the privilege to the current user.
This is also covered in the Hadoop setup post.
java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
The fix is to set these in the system environment variables:
JAVA_HOME="D:\install\Java\jdk1.8.0_231"
HADOOP_HOME="D:\install\Hadoop-3.1.1\hadoop"
PATH="%PATH%;%JAVA_HOME%\bin;%HADOOP_HOME%\bin"
HADOOP_HOME still not found, even with the environment variables set
The env whitelist configured in
etc/hadoop/yarn-site.xml
was missing HADOOP_HOME (dug my own pit). Rest assured, the previous post already includes this fix:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <!--<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>-->
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME,HADOOP_HOME,PATH,LANG,TZ</value>
  </property>
</configuration>
Cannot find hive-log4j.properties
In the directory
D:\install\Hive-3.1.0\hive-3.1.0\conf
create a new file hive-log4j.properties
with the following content:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

status = INFO
name = HiveLog4j2
packages = org.apache.hadoop.hive.ql.log

# list of properties
property.hive.log.level = INFO
property.hive.root.logger = DRFA
property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}
property.hive.log.file = beeline.log
property.hive.perflogger.log.level = INFO

# list of all appenders
appenders = console, DRFA

# console appender
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n

# daily rolling file appender
appender.DRFA.type = RollingRandomAccessFile
appender.DRFA.name = DRFA
appender.DRFA.fileName = ${sys:hive.log.dir}/${sys:hive.log.file}
# Use %pid in the filePattern to append <process-id>@<host-name> to the filename if you want separate log files for different CLI session
appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}
appender.DRFA.layout.type = PatternLayout
appender.DRFA.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
appender.DRFA.policies.type = Policies
appender.DRFA.policies.time.type = TimeBasedTriggeringPolicy
appender.DRFA.policies.time.interval = 1
appender.DRFA.policies.time.modulate = true
appender.DRFA.strategy.type = DefaultRolloverStrategy
appender.DRFA.strategy.max = 30

# list of all loggers
loggers = NIOServerCnxn, ClientCnxnSocketNIO, DataNucleus, Datastore, JPOX, PerfLogger, AmazonAws, ApacheHttp

logger.NIOServerCnxn.name = org.apache.zookeeper.server.NIOServerCnxn
logger.NIOServerCnxn.level = WARN
logger.ClientCnxnSocketNIO.name = org.apache.zookeeper.ClientCnxnSocketNIO
logger.ClientCnxnSocketNIO.level = WARN
logger.DataNucleus.name = DataNucleus
logger.DataNucleus.level = ERROR
logger.Datastore.name = Datastore
logger.Datastore.level = ERROR
logger.JPOX.name = JPOX
logger.JPOX.level = ERROR
logger.AmazonAws.name = com.amazonaws
logger.AmazonAws.level = INFO
logger.ApacheHttp.name = org.apache.http
logger.ApacheHttp.level = INFO
logger.PerfLogger.name = org.apache.hadoop.hive.ql.log.PerfLogger
logger.PerfLogger.level = ${sys:hive.perflogger.log.level}

# root logger
rootLogger.level = ${sys:hive.log.level}
rootLogger.appenderRefs = root
rootLogger.appenderRef.root.ref = ${sys:hive.root.logger}