[Tutorial] Connecting the Hortonworks Big Data Platform to OBSFileSystem: An Operation Guide

Posted by lanxinliuli on 2019/06/04 10:45:39

1      Background

Hortonworks was founded in July 2011 by Yahoo and Benchmark Capital. Born out of Yahoo, Hortonworks employs many Hadoop architects and source-code contributors who previously worked at Yahoo and who together have contributed more than 80% of the source code of the Apache Hadoop project.

As a pioneer of the Apache Hadoop 2.0 community, Hortonworks has built its own Hadoop ecosystem: HDFS for data storage, the resource-management framework YARN, the MapReduce and Tez computation models, PIG and HIVE & HCATALOG serving the data platform, HBASE, data import/export for HDFS via FLUME and SQOOP, the cluster monitor AMBARI, the data-lifecycle manager FALCON, and the job scheduler OOZIE.

To let the HDP big data platform store and read/write data on Huawei Cloud Object Storage Service (OBS), Huawei Cloud OBS provides the OBSFileSystem big data connector.

This guide helps Huawei Cloud users quickly integrate the OBSFileSystem component on the HDP platform and make better use of Huawei Cloud OBS.

 

2      Deployment View

2.1      Installed Versions

Hardware: 1 master node + 3 core nodes (each 8 vCPUs / 32 GB memory; OS: CentOS 7.5)

Software: Ambari 2.7.1.0, HDP 3.0.1.0

2.2      Deployment Diagram

[Figure: deployment diagram of the 1-master / 3-core cluster]


3     Connecting the Hortonworks Platform to OBS

3.1      Updating OBSFileSystem

3.1.1        Uploading the OBS jar packages

1. Download OBSFileSystem from https://bbs.huaweicloud.com/forum/thread-12142-1-1.html and extract it. The Package directory inside contains the jar packages required by OBS, listed below:

[Figure: jar packages included in the Package directory]

 

2. Place the jar packages required by OBS in the /mnt/obsjar directory; a shell sketch of this step follows.
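
For reference, steps 1 and 2 combined might look like the following shell sketch. The archive name OBSFileSystem.zip is a placeholder; substitute the file actually downloaded from the forum thread.

Commands:

mkdir -p /mnt/obsjar
# extract the downloaded archive (placeholder name) and stage its jars
unzip OBSFileSystem.zip -d /mnt/OBSFileSystem
cp /mnt/OBSFileSystem/Package/*.jar /mnt/obsjar/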

3.1.2        Adding the hadoop-huaweicloud jar package

1. Copy the hadoop-huaweicloud jar package into the following directories (an equivalent loop form is sketched after these commands):

Commands:

cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /usr/hdp/share/hst/activity-explorer/lib/.

cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/.

cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /usr/hdp/3.0.1.0-187/spark2/jars/.

cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /usr/hdp/3.0.1.0-187/tez/lib/.

cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/.

ln -s /usr/hdp/3.0.1.0-187/hadoop-mapreduce/hadoop-huaweicloud-2.8.3.13.jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/hadoop-huaweicloud.jar
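
The same copies can be written as a single loop, which is easier to adapt if your HDP version string differs from 3.0.1.0-187 (a sketch assuming the same target list as above):

Commands:

JAR=/mnt/obsjar/hadoop-huaweicloud-2.8.3.13.jar
# the braces in the Ambari view paths are literal, so quote them
for d in /usr/hdp/share/hst/activity-explorer/lib \
         /usr/hdp/3.0.1.0-187/hadoop-mapreduce \
         /usr/hdp/3.0.1.0-187/spark2/jars \
         /usr/hdp/3.0.1.0-187/tez/lib \
         '/var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib' \
         '/var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib' \
         '/var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib'; do
  cp "$JAR" "$d/"
done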


3.1.3        Adding the esdk-obs-java jar package

1. Copy the esdk-obs-java jar package into the following directories:

Commands:

cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /usr/hdp/share/hst/activity-explorer/lib/.

cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/.

cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /usr/hdp/3.0.1.0-187/spark2/jars/.

cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /usr/hdp/3.0.1.0-187/tez/lib/.

cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/esdk-obs-java-3.1.3.jar /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/.


3.1.4        Replacing the okio jar package

1. Find all okio* jar packages, note their paths, and back them up:

Commands:

find / -name "okio*"

cp /usr/hdp/3.0.1.0-187/hadoop/client/okio-1.6.0.jar /mnt/oldjar/.

cp /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/okio-1.4.0.jar /mnt/oldjar/.


2. Delete all old okio* jar packages; afterwards, run the find command again to confirm they have been removed completely:

Commands:

rm -rf /usr/hdp/share/hst/activity-explorer/interpreter/jdbc/okio-1.6.0.jar

rm -rf /usr/hdp/3.0.1.0-187/livy2/jars/okio-1.6.0.jar

rm -rf /usr/hdp/3.0.1.0-187/hadoop/client/okio.jar

rm -rf /usr/hdp/3.0.1.0-187/hadoop/client/okio-1.6.0.jar

rm -rf /usr/hdp/3.0.1.0-187/hadoop-hdfs/lib/okio-1.6.0.jar

rm -rf /usr/hdp/3.0.1.0-187/spark2/jars/okio-1.6.0.jar

rm -rf /usr/hdp/3.0.1.0-187/hbase/lib/okio-1.6.0.jar

rm -rf /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/okio-1.4.0.jar

rm -rf /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/okio-1.4.0.jar

rm -rf /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/okio-1.4.0.jar


3. Copy the new okio jar package into the directories found in step 1 of section 3.1.4 and into /usr/hdp/3.0.1.0-187/hadoop-mapreduce:

Commands:

cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/share/hst/activity-explorer/interpreter/jdbc/.

cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/3.0.1.0-187/livy2/jars/.

cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/3.0.1.0-187/hadoop/client/.

cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/3.0.1.0-187/hadoop-hdfs/lib/.

cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/3.0.1.0-187/spark2/jars/.

cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/3.0.1.0-187/hbase/lib/.

cp /mnt/obsjar/okio-1.14.0.jar /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/okio-1.14.0.jar /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/okio-1.14.0.jar /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/okio-1.14.0.jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/.


3.1.5        Replacing the okhttp jar package

1. Find all okhttp* jar packages, note their paths, and back them up:

Commands:

find / -name "okhttp*"

cp /usr/hdp/3.0.1.0-187/hadoop/client/okhttp-2.7.5.jar /mnt/oldjar/.

cp /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/okhttp-2.4.0.jar /mnt/oldjar/.



2. Delete all old okhttp* jar packages; afterwards, run the find command again to confirm they have been removed completely:

Commands:

rm -rf /usr/hdp/share/hst/activity-explorer/interpreter/jdbc/okhttp-2.7.5.jar

rm -rf /usr/hdp/3.0.1.0-187/livy2/jars/okhttp-2.7.5.jar

rm -rf /usr/hdp/3.0.1.0-187/hadoop/client/okhttp-2.7.5.jar

rm -rf /usr/hdp/3.0.1.0-187/hadoop/client/okhttp.jar

rm -rf /usr/hdp/3.0.1.0-187/hadoop-hdfs/lib/okhttp-2.7.5.jar

rm -rf /usr/hdp/3.0.1.0-187/spark2/jars/okhttp-2.7.5.jar

rm -rf /usr/hdp/3.0.1.0-187/hbase/lib/okhttp-2.7.5.jar

rm -rf /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/okhttp-2.4.0.jar

rm -rf /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/okhttp-2.4.0.jar

rm -rf /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/okhttp-2.4.0.jar


3. Copy the new okhttp jar package into the directories found in step 1 of section 3.1.5 and into /usr/hdp/3.0.1.0-187/hadoop-mapreduce:

Commands:

cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/share/hst/activity-explorer/interpreter/jdbc/.

cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/3.0.1.0-187/livy2/jars/.

cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/3.0.1.0-187/hadoop/client/.

cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/3.0.1.0-187/hadoop-hdfs/lib/.

cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/3.0.1.0-187/spark2/jars/.

cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/3.0.1.0-187/hbase/lib/.

cp /mnt/obsjar/okhttp-3.10.0.jar /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/okhttp-3.10.0.jar /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/okhttp-3.10.0.jar /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/okhttp-3.10.0.jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/.


3.1.6        Replacing the java-xmlbuilder jar package

1. Find all java-xmlbuilder* jar packages, note their paths, and back them up:

Commands:

find / -name "java-xmlbuilder*"

cp /usr/lib/ambari-server/java-xmlbuilder-0.4.jar /mnt/oldjar/.


2. Delete all old java-xmlbuilder* jar packages; afterwards, run the find command again to confirm they have been removed completely:

Commands:

rm -rf /usr/lib/ambari-server/java-xmlbuilder-0.4.jar


3. Copy the new java-xmlbuilder jar package into the directory found in step 1 of section 3.1.6, as well as into the directories found in step 1 of section 3.1.4 and /usr/hdp/3.0.1.0-187/hadoop-mapreduce:

Commands:

cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/lib/ambari-server/.

cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/share/hst/activity-explorer/interpreter/jdbc/.

cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/3.0.1.0-187/livy2/jars/.

cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/3.0.1.0-187/hadoop/client/.

cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/3.0.1.0-187/hadoop-hdfs/lib/.

cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/3.0.1.0-187/spark2/jars/.

cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/3.0.1.0-187/hbase/lib/.

cp /mnt/obsjar/java-xmlbuilder-1.1.jar /var/lib/ambari-server/resources/views/work/CAPACITY-SCHEDULER{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/java-xmlbuilder-1.1.jar /var/lib/ambari-server/resources/views/work/FILES{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/java-xmlbuilder-1.1.jar /var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/WEB-INF/lib/.

cp /mnt/obsjar/java-xmlbuilder-1.1.jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/.


3.2      Updating the Configuration Files

3.2.1        Adding configuration items to the HDFS cluster

1. In the HDFS cluster's CONFIGS > ADVANCED settings, add the following items to Custom core-site.xml: fs.obs.access.key, fs.obs.secret.key, fs.obs.endpoint, and fs.obs.impl. The first three are the user's AK, SK, and endpoint respectively; fill them in according to your actual environment. Set fs.obs.impl to org.apache.hadoop.fs.obs.OBSFileSystem. The resulting core-site.xml entries are sketched below.
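
The four items end up in core-site.xml roughly as follows. YOUR_AK, YOUR_SK, and the endpoint value are placeholders for your own credentials and region:

<property>
  <name>fs.obs.access.key</name>
  <value>YOUR_AK</value>
</property>
<property>
  <name>fs.obs.secret.key</name>
  <value>YOUR_SK</value>
</property>
<property>
  <name>fs.obs.endpoint</name>
  <value>obs.cn-north-1.myhuaweicloud.com</value>
</property>
<property>
  <name>fs.obs.impl</name>
  <value>org.apache.hadoop.fs.obs.OBSFileSystem</value>
</property>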

[Screenshot: Custom core-site.xml items in Ambari]

2. Restart the HDFS cluster.


3.2.2        Adding configuration items to the MapReduce2 cluster

1. In the MapReduce2 cluster's CONFIGS > ADVANCED settings, modify the mapreduce.application.classpath item in mapred-site.xml by appending the path /usr/hdp/3.0.1.0-187/hadoop-mapreduce/*, as sketched below.
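
After the edit the property value looks roughly like this; <existing entries> stands for whatever classpath entries your installation already has:

mapreduce.application.classpath = <existing entries>:/usr/hdp/3.0.1.0-187/hadoop-mapreduce/*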

[Screenshot: mapreduce.application.classpath in mapred-site.xml]

 

2. Restart the MapReduce2 cluster.


3.3      Verifying with an OBS Bucket

1. Use the hadoop command against an OBS bucket to verify the integration:

Command:

hadoop fs -ls obs://obs-test-tmp0001/

[Screenshot: hadoop fs -ls output for the OBS bucket]
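
Beyond listing, a quick write/read round trip also confirms that the connector can put and get objects; /etc/hosts is just a convenient local file to use as test input:

Commands:

hadoop fs -put /etc/hosts obs://obs-test-tmp0001/hosts.txt
hadoop fs -cat obs://obs-test-tmp0001/hosts.txt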

2. Use the MapReduce wordcount example to verify the OBS integration:

Command:

yarn jar /usr/hdp/3.0.1.0-187/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount obs://bms-bucket-test01/f0.txt obs://bms-bucket-test01/result10

[Screenshot: wordcount job run against the OBS bucket]


3. Use Spark to verify the OBS integration:

// read a pipe-delimited text file from OBS into a DataFrame
val df0 = spark.read.option("header", "false").option("delimiter", "|").csv("obs://obs-bucket12345/2019/tmplog.txt")

// max() lives in org.apache.spark.sql.functions and is not imported by default in spark-shell
import org.apache.spark.sql.functions.max
df0.select(max("_c2")).show()

[Screenshot: Spark query result]

