Running Hive on Windows 10 (non-WSL mode)
Background

Related post: Running Hadoop on Windows 10 (non-WSL mode)
Hive depends on HDFS, so the Hadoop side of the setup must be in place first.

Versions

OS: Windows 10 Pro 1903
Java: 1.8.0_231
Hive: Hive-3.1.0.tar.gz (from the Apache archive: https://archive.apache.org/dist/hive/hive-3.1.0/ )
derby: db-derby-10.14.2.0-bin.tar.gz (download from the official page https://db.apache.org/derby/derby_downloads.html , or look on a mirror such as https://downloads.apache.org/db/derby/ )
Prerequisites

JDK
Hadoop: see the previous post
derby:
A lightweight database, used here as the Hive metastore; swap in MySQL if you have the time to set one up.
It runs straight out of the unpacked archive; note that whatever directory you run its scripts from becomes the data directory.
To start derby:
# PowerShell
$env:DERBY_HOME="D:\install\db-derby-10.14.2.0-bin"
# Create the data directory: D:\install\db-derby-10.14.2.0-data
cd D:\install\db-derby-10.14.2.0-data
# Referencing the env var directly as a command may fail; use the call
# operator (&), or tab-complete the absolute path
& "$env:DERBY_HOME\bin\startNetworkServer.bat" -h 0.0.0.0
WSL / Cygwin / Git Bash
Pick any one of them, just to have a Unix-like environment.
Unlike Hadoop, Hive has too many places where Unix/Windows compatibility was never finished.
Windows executables for hive
There are two options: download the files one by one by hand, or fetch them via svn. The svn route, demonstrated here inside WSL:
Install svn:
sudo apt-get install subversion
Download:
svn co https://svn.apache.org/repos/asf/hive/trunk/bin hive-bin
Setup
Unpack the Hive archive
Optional: run winrar with administrator rights to unpack;
unpacking without administrator rights means the Linux symlinks cannot be converted into Windows shortcuts, and the linked content is copied instead.
Open the tar.gz, choose a destination, and extract.
D:\install\
is the install directory I habitually use; substitute your own extraction/install directory throughout.
Environment variables
# PowerShell
# Hadoop and JDK variables, in case they are not set yet
$env:JAVA_HOME="D:\install\Java\jdk1.8.0_231"
$env:HADOOP_HOME="D:\install\Hadoop-3.1.1\hadoop"
$env:PATH="$env:PATH;$env:JAVA_HOME\bin;$env:HADOOP_HOME\bin"
# Hive variables
$env:HIVE_HOME="D:\install\Hive-3.1.0\hive-3.1.0"
$env:DERBY_HOME="D:\install\db-derby-10.14.2.0-bin"
$env:HIVE_LIB="$env:HIVE_HOME\lib"
$env:HIVE_BIN="$env:HIVE_HOME\bin"
$env:HADOOP_USER_CLASSPATH_FIRST="true"
Copy the derby libs into hive's lib directory
cp D:\install\db-derby-10.14.2.0-bin\lib\* D:\install\Hive-3.1.0\hive-3.1.0\lib
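What this copy accomplishes can be sketched with throwaway /tmp directories standing in for the real D:\install paths (the demo directory names are illustrative only; derbyclient.jar and derbytools.jar are real jars shipped in derby's lib):

```shell
# Throwaway dirs standing in for the real derby and hive install paths
mkdir -p /tmp/derby-demo/lib /tmp/hive-demo/lib
touch /tmp/derby-demo/lib/derbyclient.jar /tmp/derby-demo/lib/derbytools.jar
# Same shape as the cp above: every derby jar lands in hive's lib
cp /tmp/derby-demo/lib/*.jar /tmp/hive-demo/lib/
ls /tmp/hive-demo/lib
```

Without derbyclient.jar on hive's classpath, the derby JDBC client driver configured in the next step cannot load.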
Configure hive-site.xml
Create a new file in the
D:\install\Hive-3.1.0\hive-3.1.0\conf
directory with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.bin.path</name>
    <value>/D:/install/Hadoop-3.1.1/hadoop/bin/hadoop.cmd</value>
  </property>
  <property>
    <name>hive.jar.path</name>
    <value>/D:/install/Hive-3.1.0/hive-3.1.0/lib/hive-exec-3.1.0.jar</value>
  </property>
</configuration>
Hadoop should already be up and running; see the previous document.
Replace the Windows executables under bin
Copy the downloaded hive-bin files over the bin directory.
Initialize the metastore
The schema scripts are bash, so there is no way around Cygwin or WSL (make sure you are in bash, not zsh).
Initialization can run into problems; see the Other issues section below.
# bash, note: not zsh
# Hadoop and JDK variables
# the JDK under WSL is a different install from the Windows one; adjust the path
export JAVA_HOME=/mnt/d/install/jdk/jdk1.8.0_231
export HADOOP_HOME='/mnt/d/install/Hadoop-3.1.1/hadoop'
export PATH=$PATH:$HADOOP_HOME/bin
# Hive variables
export HIVE_HOME='/mnt/d/install/Hive-3.1.0/hive-3.1.0'
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*.jar
# run the initialization script from the Hive directory
cd $HIVE_HOME
./bin/schematool -dbType derby -initSchema
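The /mnt/d/... paths above are WSL's mounts of the Windows drives. The mapping from a Windows path to its WSL form can be sketched like this (win2wsl is a hypothetical helper written for illustration, not part of Hive or WSL):

```shell
# Hypothetical helper: map 'D:\foo\bar' to WSL's '/mnt/d/foo/bar'
win2wsl() {
  local p="$1"
  local drive rest
  drive=$(printf '%s' "${p:0:1}" | tr '[:upper:]' '[:lower:]')  # drive letter, lowercased
  rest=$(printf '%s' "${p:2}" | tr '\\' '/')                    # backslashes to slashes
  printf '/mnt/%s%s\n' "$drive" "$rest"
}
win2wsl 'D:\install\Hive-3.1.0\hive-3.1.0'   # -> /mnt/d/install/Hive-3.1.0/hive-3.1.0
```

Recent WSL builds also ship a built-in wslpath utility that does the same conversion.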
Start hive
hiveserver2 is not started here, because I could not get it to start (wry smile).
So this runs in local mode only; the main goal is convenience for the functional development and testing that follows.
# PowerShell
cd $env:HIVE_HOME
.\bin\hive.cmd
Smoke test
Prepare the data
Make sure to replace Windows-style line endings; hive cannot handle them correctly, e.g. the integer at the end of a line will fail to convert.
2,zhangsan,98
7,lisi,87
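The line-ending cleanup can be sketched as follows (samples.csv here is a temporary file created just for the demo):

```shell
# Write the sample rows with Windows \r\n endings, then strip the \r
printf '2,zhangsan,98\r\n7,lisi,87\r\n' > samples.csv
sed -i 's/\r$//' samples.csv   # remove the trailing \r from every line
```

After cleaning, upload the file with hdfs dfs -put samples.csv /tmp/samples.csv, which is where the load statement below expects it.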
Create the table
create table test (
id INTEGER,
name STRING,
grade INTEGER
) row format delimited
fields terminated by ','
lines terminated by '\n';
Load it from HDFS into the table
load data inpath "/tmp/samples.csv" into table test;
Query tests
Problems that come up during testing are covered in the Other issues section below.
select id, count(1) as cnt from test group by id;
insert into table test
select (id + 10) as id, name, grade from test;
create table test2 stored as ORC as
select (id + 10) as id, name, grade from test;
Later starts and stops
Set the environment variables
# PowerShell
# Hadoop and JDK variables
$env:JAVA_HOME="D:\install\Java\jdk1.8.0_231"
$env:HADOOP_HOME="D:\install\Hadoop-3.1.1\hadoop"
$env:PATH="$env:PATH;$env:JAVA_HOME\bin;$env:HADOOP_HOME\bin"
# Hive variables
$env:HIVE_HOME="D:\install\Hive-3.1.0\hive-3.1.0"
$env:DERBY_HOME="D:\install\db-derby-10.14.2.0-bin"
$env:HIVE_LIB="$env:HIVE_HOME\lib"
$env:HIVE_BIN="$env:HIVE_HOME\bin"
$env:HADOOP_USER_CLASSPATH_FIRST="true"
Start hadoop
Start derby
Start Hive
# PowerShell
cd $env:HIVE_HOME
.\bin\hive.cmd
Log location:
C:/Users/${Windows username}/AppData/Local/Temp/${Windows username}/beeline.log
Other issues
\r: command not found
Cause: Windows line endings are not compatible with WSL/Cygwin.
Replace them, for example:
sed -i 's/\r$//' hadoop
NumberFormatException: For input string: "0x100"
Cause: terminal misconfiguration.
Fix:
export TERM=xterm-color
(this can go into .bashrc)
Error: FUNCTION 'NUCLEUS_ASCII' already exists.
The derby schema was created twice; delete metastore_db in the directory derby was started from.
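The fix can be sketched as follows; /tmp/derby-data-demo is a throwaway directory standing in for the real data directory (D:\install\db-derby-10.14.2.0-data in this post's layout), since metastore_db is created wherever you launched startNetworkServer.bat:

```shell
# Throwaway dir standing in for the derby data directory
mkdir -p /tmp/derby-data-demo/metastore_db
cd /tmp/derby-data-demo
rm -rf metastore_db   # schematool -initSchema will recreate it cleanly
```

Stop the derby network server before deleting, then rerun schematool -dbType derby -initSchema.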
Create symbolic links error: symlink creation failed
hadoop cannot create symbolic links and therefore cannot write to hdfs.
Grant the symlink privilege to the current user;
the Hadoop setup post covers this as well.
java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
The only thing that works is setting these in the system environment variables:
JAVA_HOME="D:\install\Java\jdk1.8.0_231"
HADOOP_HOME="D:\install\Hadoop-3.1.1\hadoop"
PATH="%PATH%;%JAVA_HOME%\bin;%HADOOP_HOME%\bin"
HADOOP_HOME still not found even with the environment variables set
The whitelist configured in
etc/hadoop/yarn-site.xml
was missing HADOOP_HOME (a self-inflicted wound). Don't worry: the previous post already includes this fix.
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <!--<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>-->
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME,HADOOP_HOME,PATH,LANG,TZ</value>
  </property>
</configuration>
hive-log4j.properties not found
Create a new file named hive-log4j.properties in the
D:\install\Hive-3.1.0\hive-3.1.0\conf
directory and add the following content:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
status = INFO
name = HiveLog4j2
packages = org.apache.hadoop.hive.ql.log
# list of properties
property.hive.log.level = INFO
property.hive.root.logger = DRFA
property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}
property.hive.log.file = beeline.log
property.hive.perflogger.log.level = INFO
# list of all appenders
appenders = console, DRFA
# console appender
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
# daily rolling file appender
appender.DRFA.type = RollingRandomAccessFile
appender.DRFA.name = DRFA
appender.DRFA.fileName = ${sys:hive.log.dir}/${sys:hive.log.file}
# Use %pid in the filePattern to append <process-id>@<host-name> to the filename if you want separate log files for different CLI session
appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}
appender.DRFA.layout.type = PatternLayout
appender.DRFA.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
appender.DRFA.policies.type = Policies
appender.DRFA.policies.time.type = TimeBasedTriggeringPolicy
appender.DRFA.policies.time.interval = 1
appender.DRFA.policies.time.modulate = true
appender.DRFA.strategy.type = DefaultRolloverStrategy
appender.DRFA.strategy.max = 30
# list of all loggers
loggers = NIOServerCnxn, ClientCnxnSocketNIO, DataNucleus, Datastore, JPOX, PerfLogger, AmazonAws, ApacheHttp
logger.NIOServerCnxn.name = org.apache.zookeeper.server.NIOServerCnxn
logger.NIOServerCnxn.level = WARN
logger.ClientCnxnSocketNIO.name = org.apache.zookeeper.ClientCnxnSocketNIO
logger.ClientCnxnSocketNIO.level = WARN
logger.DataNucleus.name = DataNucleus
logger.DataNucleus.level = ERROR
logger.Datastore.name = Datastore
logger.Datastore.level = ERROR
logger.JPOX.name = JPOX
logger.JPOX.level = ERROR
logger.AmazonAws.name=com.amazonaws
logger.AmazonAws.level = INFO
logger.ApacheHttp.name=org.apache.http
logger.ApacheHttp.level = INFO
logger.PerfLogger.name = org.apache.hadoop.hive.ql.log.PerfLogger
logger.PerfLogger.level = ${sys:hive.perflogger.log.level}
# root logger
rootLogger.level = ${sys:hive.log.level}
rootLogger.appenderRefs = root
rootLogger.appenderRef.root.ref = ${sys:hive.root.logger}
[Notice] This content comes from a blogger of the Huawei Cloud developer community and does not represent the views or positions of Huawei Cloud or the community. Reposts must credit the source (Huawei Cloud community), the article link, and the author; otherwise the author and the community reserve the right to pursue liability. Suspected plagiarism can be reported, with evidence, to:
cloudbbs@huaweicloud.com