Huawei Cloud MRS 3.1.0 Cluster Spark & Hudi Client Integration Guide

Posted by yugogo on 2022/05/09 10:45:01

1 Replace the parquet-related jars inside hudi-archive.zip

parquet-column-1.12.0-hw-ei-1.0.jar

parquet-common-1.12.0-hw-ei-1.0.jar

parquet-encoding-1.12.0-hw-ei-1.0.jar

parquet-format-structures-1.12.0-hw-ei-1.0.jar

parquet-hadoop-1.12.0-hw-ei-1.0.jar

parquet-jackson-1.12.0-hw-ei-1.0.jar

1.1 Fetch hudi-archive.zip from HDFS to the working directory /tmp

hadoop fs -get hdfs://hacluster/user/spark2x/jars/8.1.0/hudi-archive.zip /tmp

1.2 Replace the parquet jars

cd /tmp

unzip hudi-archive.zip -d hudi-archive

cd hudi-archive

cp /opt/Bigdata/client/Spark2x/spark/jars/parquet-* ./

zip -r hudi-archive.zip *

1.3 Upload the rebuilt hudi-archive.zip to a new HDFS path (so the original archive is not overwritten), e.g. hdfs://hacluster/user/spark2x/jars_new/8.1.0/

hadoop fs -mkdir -p hdfs://hacluster/user/spark2x/jars_new/8.1.0/

hadoop fs -put hudi-archive.zip hdfs://hacluster/user/spark2x/jars_new/8.1.0/

2 Update spark-defaults.conf to reference hudi-archive.zip at the new path

2.1 Go to the client configuration directory Spark2x/spark/conf

2.2 Edit spark-defaults.conf. Four settings change in total. For each, copy the original setting to a new line, comment out the original, and apply the modification (in each pair below, the commented line is the original and the uncommented line is the replacement).

Change 1:

#spark.executor.extraClassPath =

spark.executor.extraClassPath = $PWD/hudi/*

Change 2:

#spark.yarn.dist.innerarchives = hdfs://hacluster/user/spark2x/jars/8.1.0/spark-archive-2x-x86.zip#x86,hdfs://hacluster/user/spark2x/jars/8.1.0/spark-archive-2x-arm.zip#arm

spark.yarn.dist.innerarchives = hdfs://hacluster/user/spark2x/jars/8.1.0/spark-archive-2x-x86.zip#x86,hdfs://hacluster/user/spark2x/jars/8.1.0/spark-archive-2x-arm.zip#arm,hdfs://hacluster/user/spark2x/jars_new/8.1.0/hudi-archive.zip#hudi

Change 3:

#spark.yarn.cluster.driver.extraClassPath = /opt/Bigdata/common/runtime/security

spark.yarn.cluster.driver.extraClassPath = $PWD/hudi/*:/opt/Bigdata/common/runtime/security

Change 4:

#spark.driver.extraClassPath = /opt/Bigdata/client/Spark2x/spark/conf/:/opt/Bigdata/client/Spark2x/spark/jars/*:/opt/Bigdata/client/Spark2x/spark/x86/*

spark.driver.extraClassPath = /opt/Bigdata/client/Hudi/hudi/lib/*:/opt/Bigdata/client/Spark2x/spark/conf/:/opt/Bigdata/client/Spark2x/spark/jars/*:/opt/Bigdata/client/Spark2x/spark/x86/*
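After editing, the result can be sanity-checked with grep. The sketch below runs against a throwaway copy of the four final lines rather than the real client file (on a real client, point `conf` at Spark2x/spark/conf/spark-defaults.conf instead):

```shell
#!/bin/sh
# Sanity-check the four edits; a throwaway copy stands in for the
# real file at Spark2x/spark/conf/spark-defaults.conf.
set -e
conf=$(mktemp)
cat > "$conf" <<'EOF'
spark.executor.extraClassPath = $PWD/hudi/*
spark.yarn.dist.innerarchives = hdfs://hacluster/user/spark2x/jars/8.1.0/spark-archive-2x-x86.zip#x86,hdfs://hacluster/user/spark2x/jars/8.1.0/spark-archive-2x-arm.zip#arm,hdfs://hacluster/user/spark2x/jars_new/8.1.0/hudi-archive.zip#hudi
spark.yarn.cluster.driver.extraClassPath = $PWD/hudi/*:/opt/Bigdata/common/runtime/security
spark.driver.extraClassPath = /opt/Bigdata/client/Hudi/hudi/lib/*:/opt/Bigdata/client/Spark2x/spark/conf/:/opt/Bigdata/client/Spark2x/spark/jars/*:/opt/Bigdata/client/Spark2x/spark/x86/*
EOF

# The relocated archive must appear with its #hudi alias, and all four
# settings must be active (i.e. not commented out).
grep -Fq 'jars_new/8.1.0/hudi-archive.zip#hudi' "$conf" && echo "archive referenced"
test "$(grep -c '^spark\.' "$conf")" -eq 4 && echo "4 active settings"
```

The `#hudi` suffix on the archive URI is what makes YARN localize the zip under the alias `hudi`, which is why the classpath entries in changes 1 and 3 read `$PWD/hudi/*`.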


3 Verify by querying a Hudi table

spark-sql --master yarn

select * from delta_demo5_ro limit 10;
