MRS 1.9.2: Installing the Flume Client and Ingesting from Kafka to HDFS
[Abstract] Installing the Flume client on MRS 1.9.2 and wiring Kafka to HDFS.
Environment

- Huawei Cloud MRS 1.9.2
- Flume 1.6.0
- Kafka 2.11-1.1.0
- KrbServer 1.15.2
- Flume client node: mrs-wsc-prod-node-str-coregFNv0003 (10.8.2.48)
- Flume MonitorServer: 10.8.2.115, 10.8.2.40
Prerequisites

- A Kerberos machine-machine account with the required permissions has been created. This walkthrough uses the `wsc` account, whose principal is `wsc@55D70652_C5FA_400F_BABB_292A75A0B88D.COM`.
- The `wsc` user credential files have been downloaded to the following directory:

```shell
[root@mrs-wsc-prod-node-str-coregFNv0003 wsc_keytab]# pwd
/opt/wsc/wsc_keytab
[root@mrs-wsc-prod-node-str-coregFNv0003 wsc_keytab]# ll
total 8
-rw-r-----. 1 root root 1095 Feb 13 15:16 krb5.conf
-rw-r-----. 1 root root  182 Feb 13 15:16 user.keytab
```
Kafka

Create a topic named `event`.
Installing the Flume Client
- Log in to node mrs-wsc-prod-node-str-coregFNv0003 as the `root` user.
- Download the Flume client. In MRS Manager, choose Service Management -> Flume -> Download Client -> Remote Host, and download the client to the /tmp directory:

```shell
[root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# cd /tmp
[root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# ll | grep MRS_Flume_Client.tar
-rw-------. 1 root root 569569280 Mar  2 11:33 MRS_Flume_Client.tar
```
- Extract MRS_Flume_Client.tar:

```shell
[root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# tar -xvf MRS_Flume_Client.tar
MRS_Flume_ClientConfig.tar.sha256
MRS_Flume_ClientConfig.tar
[root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# ll | grep MRS_Flume_Client
-rw-------. 1 root root 569559040 Mar  2 11:33 MRS_Flume_ClientConfig.tar
-rw-------. 1 root root        92 Mar  2 11:33 MRS_Flume_ClientConfig.tar.sha256
-rw-------. 1 root root 569569280 Mar  2 11:33 MRS_Flume_Client.tar
```
- Verify the package checksum:

```shell
[root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# sha256sum -c MRS_Flume_ClientConfig.tar.sha256
MRS_Flume_ClientConfig.tar: OK
```
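The verification step above can be reproduced against any file, which is handy for checking how `sha256sum -c` behaves before relying on it. A minimal sketch using a throwaway file (not the actual client package):

```shell
# Create a throwaway file and a checksum manifest for it
cd "$(mktemp -d)"
echo "demo payload" > demo.tar
sha256sum demo.tar > demo.tar.sha256

# `sha256sum -c` recomputes the digest and compares it to the manifest;
# it prints "demo.tar: OK" and exits 0 on success
sha256sum -c demo.tar.sha256

# A modified file fails the check with a non-zero exit status
echo "tampered" >> demo.tar
if ! sha256sum -c demo.tar.sha256 >/dev/null 2>&1; then
  echo "checksum mismatch detected"
fi
```

The non-zero exit status on mismatch makes the check easy to script into automated deployments.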
- Extract MRS_Flume_ClientConfig.tar:

```shell
[root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# tar -xvf MRS_Flume_ClientConfig.tar
```
- Install the client runtime environment into a new directory:

```shell
sh /tmp/MRS_Flume_ClientConfig/install.sh /opt/client/Flumeenv
```

Check the installer output; the following line indicates the client runtime environment was installed successfully:

```
Components client installation is complete.
```
- Configure the environment variables:

```shell
source /opt/client/Flumeenv/bigdata_env
```
- Extract the Flume client package:

```shell
[root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# cd /tmp/MRS_Flume_ClientConfig/Flume/
[root@mrs-wsc-prod-node-str-coregFNv0003 Flume]# ll
total 362336
-rw-------. 1 root root 371029086 Mar  2 11:33 FusionInsight-Flume-1.6.0.tar.gz
[root@mrs-wsc-prod-node-str-coregFNv0003 Flume]# tar -xvf FusionInsight-Flume-1.6.0.tar.gz
[root@mrs-wsc-prod-node-str-coregFNv0003 Flume]# ll
total 362372
drwx------. 14 knox 20007      4096 Mar  8  2020 adapter
drwx------.  3 knox 20007      4096 Mar  8  2020 aix
drwx------.  3 knox 20007      4096 Mar  8  2020 batch_install
drwx------. 11 knox 20007      4096 Mar  8  2020 flume
-rw-------.  1 root root  371029086 Mar  2 11:33 FusionInsight-Flume-1.6.0.tar.gz
-rwx------.  1 knox 20007     17079 Mar  8  2020 install.sh
```
- Install the Flume client:

```shell
[root@mrs-wsc-prod-node-str-coregFNv0003 Flume]# sh /tmp/MRS_Flume_ClientConfig/Flume/install.sh -d /opt/client/FlumeClient -f 10.8.2.115,10.8.2.40 -e 10.8.2.48 -n event
CST 2023-03-02 14:12:20 [flume-client install]: install flume client successfully.
[root@mrs-wsc-prod-node-str-coregFNv0003 Flume]# ll /opt/client/FlumeClient/fusioninsight-flume-1.6.0/
total 168
drwxr-x---. 3 root root  4096 Mar  2 14:12 bin
-rwxr-x---. 1 root root 69856 Mar  2 14:12 CHANGELOG
drwxr-x---. 4 root root  4096 Mar  2 14:12 conf
-rwxr-x---. 1 root root  6172 Mar  2 14:12 DEVNOTES
drwxr-x---. 2 root root  4096 Mar  2 14:12 inst
drwxr-x---. 2 root root 16384 Mar  2 14:12 lib
drwxr-x---. 2 root root  4096 Mar  2 14:12 libexec
-rwxr-x---. 1 root root 25903 Mar  2 14:12 LICENSE
-rwxr-x---. 1 root root   249 Mar  2 14:12 NOTICE
drwxr-x---. 3 root root  4096 Mar  2 14:12 plugins.d
drwxr-x---. 2 root root  4096 Mar  2 14:12 plugins.s
-rwxr-x---. 1 root root  1779 Mar  2 14:12 README
-rwxr-x---. 1 root root  1585 Mar  2 14:12 RELEASE-NOTES
drwxr-x---. 2 root root  4096 Mar  2 14:12 thirdparty
drwxr-x---. 2 root root  4096 Mar  2 14:12 tools
```
The parameters are described below:
- `-d`: Flume client installation path.
- `-f`: optional; the business IP addresses of the two MonitorServer roles, comma-separated. If unset, the Flume client does not send alarms to MonitorServer, and the client does not appear in the MRS Manager UI.
- `-c`: optional; the configuration file "properties.properties" that the Flume client loads by default after installation. If omitted, "fusioninsight-flume-1.6.0/conf/properties.properties" under the client installation directory is used. The shipped file is a blank template; once you edit it for your workload, the Flume client reloads it automatically.
- `-l`: optional; the log directory, defaulting to "/var/log/Bigdata".
- `-e`: optional; the business IP address of the Flume instance, used mainly to receive monitoring metrics reported by the client.
- `-n`: optional; a custom name for the Flume client.
- The IBM JDK does not support "-Xloggc": edit "flume/conf/flume-env.sh" and change "-Xloggc" to "-Xverbosegclog". With a 32-bit JDK, "-Xmx" must not exceed 3.25 GB.
- In "flume/conf/flume-env.sh", "-Xmx" defaults to 4 GB. If the client machine has little memory, it can be lowered to 512 MB or even 1 GB.
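On a low-memory client host, the heap note above amounts to an edit like the following in flume/conf/flume-env.sh. This is an illustrative fragment, not the shipped file: the variable name and surrounding flags are assumptions, so keep whatever your file already contains and change only the -Xms/-Xmx values:

```shell
# flume/conf/flume-env.sh (hypothetical fragment)
# Lower the default 4 GB heap to 512 MB for a small host
export JAVA_OPTS="-Xms256M -Xmx512M -XX:+UseConcMarkSweepGC"
```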
Configuring the Flume Client
- Copy the configuration files:

```shell
cp /opt/client/HDFS/hadoop/etc/hadoop/core-site.xml /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf
cp /opt/client/HDFS/hadoop/etc/hadoop/hdfs-site.xml /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf
cp /opt/Bigdata/MRS_1.9.2/1_7_Flume/etc/jaas.conf /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf
```
- Obtain the Kerberos user principal:

```shell
[root@mrs-wsc-prod-node-str-coregFNv0003 FlumeClient]# kinit -kt /opt/wsc/wsc_keytab/user.keytab wsc
[root@mrs-wsc-prod-node-str-coregFNv0003 FlumeClient]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: wsc@55D70652_C5FA_400F_BABB_292A75A0B88D.COM

Valid starting       Expires              Service principal
03/02/2023 14:54:20  03/03/2023 14:54:20  krbtgt/55D70652_C5FA_400F_BABB_292A75A0B88D.COM@55D70652_C5FA_400F_BABB_292A75A0B88D.COM
```
- Configure the authentication credentials by editing /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/jaas.conf:

```
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/opt/wsc/wsc_keytab/user.keytab"
  principal="wsc@55D70652_C5FA_400F_BABB_292A75A0B88D.COM"
  storeKey=true
  useTicketCache=false;
};

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  storeKey=true
  principal="wsc@55D70652_C5FA_400F_BABB_292A75A0B88D.COM"
  useTicketCache=false
  keyTab="/opt/wsc/wsc_keytab/user.keytab"
  debug=true
  useKeyTab=true;
};
```
- Modify flume-env.sh at /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/flume-env.sh. After `-XX:+UseCMSCompactAtFullCollection`, add the following:

```
-Djava.security.krb5.conf=/opt/wsc/wsc_keytab/krb5.conf
-Djava.security.auth.login.config=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/jaas.conf
-Dzookeeper.request.timeout=120000
```
- Restart the Flume client:

```shell
[root@mrs-wsc-prod-node-str-coregFNv0003 fusioninsight-flume-1.6.0]# cd /opt/client/FlumeClient/fusioninsight-flume-1.6.0/bin/
[root@mrs-wsc-prod-node-str-coregFNv0003 bin]# ./flume-manage.sh restart
Stop Flume PID=30402 successful.
Start flume successfully,pid=24955.
```
Collecting Logs from Kafka into HDFS
- Edit the Flume client configuration file /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/properties.properties:

```properties
## Components
client.sources = kafka_source
client.channels = file_channel
client.sinks = hdfs_sink

## Wiring
client.sources.kafka_source.channels = file_channel
client.sinks.hdfs_sink.channel = file_channel

## source1
client.sources.kafka_source.type = org.apache.flume.source.kafka.KafkaSource
# Maximum number of messages written to the channel per batch
client.sources.kafka_source.batchSize = 200
# Maximum wait before a batch is written to the channel; whichever of this
# and batchSize is reached first triggers the write (milliseconds)
client.sources.kafka_source.batchDurationMillis = 10000
client.sources.kafka_source.kafka.bootstrap.servers = 10.8.2.224:21007,10.8.2.48:21007,10.8.2.50:21007
client.sources.kafka_source.kafka.topics = event
client.sources.kafka_source.kafka.consumer.group.id = flume_group
client.sources.kafka_source.kafka.security.protocol = SASL_PLAINTEXT
# If a partition has a committed offset, resume from it; otherwise consume from the beginning
client.sources.kafka_source.kafka.consumer.auto.offset.reset = earliest
client.sources.kafka_source.kafka.kerberos.domain.name = hadoop.55d70652_c5fa_400f_babb_292a75a0b88d.com

## channel1
client.channels.file_channel.type = file
client.channels.file_channel.checkpointDir = /tmp/flume/check
client.channels.file_channel.dataDirs = /tmp/flume/data
client.channels.file_channel.maxFileSize = 1024
client.channels.file_channel.capacity = 500
client.channels.file_channel.transactionCapacity = 250
client.channels.file_channel.keep-alive = 6

## hdfs_sink
client.sinks.hdfs_sink.type = hdfs
client.sinks.hdfs_sink.hdfs.path = hdfs://hacluster/wsc/flume
client.sinks.hdfs_sink.hdfs.fileType = DataStream
client.sinks.hdfs_sink.hdfs.filePrefix = 20%y%m%d/event.data
client.sinks.hdfs_sink.hdfs.round = false
client.sinks.hdfs_sink.hdfs.idleTimeout = 60
client.sinks.hdfs_sink.hdfs.useLocalTimeStamp = false
client.sinks.hdfs_sink.hdfs.minBlockReplicas = 1
client.sinks.hdfs_sink.hdfs.kerberosPrincipal =
client.sinks.hdfs_sink.hdfs.kerberosKeytab = /opt/wsc/wsc_keytab/user.keytab
client.sinks.hdfs_sink.hdfs.rollInterval = 0
client.sinks.hdfs_sink.hdfs.rollSize = 262144000
client.sinks.hdfs_sink.hdfs.rollCount = 0
```
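With rollInterval and rollCount both set to 0, time-based and event-count-based file rolling are disabled, so HDFS files roll only when they reach rollSize bytes. The value 262144000 is exactly 250 MiB, which can be confirmed with shell arithmetic:

```shell
# rollSize is specified in bytes; 250 MiB expressed in bytes:
echo $((250 * 1024 * 1024))   # prints 262144000
```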
The value of `client.sources.kafka_source.kafka.kerberos.domain.name` can be found in the hosts file: use the entry whose hostname starts with `hadoop`:

```shell
[root@mrs-wsc-prod-node-str-coregFNv0003 conf]# cat /etc/hosts
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
1.1.1.1 hadoop.55d70652_c5fa_400f_babb_292a75a0b88d.com
1.1.1.1 hadoop.hadoop.com
1.1.1.1 hacluster
1.1.1.1 haclusterX
1.1.1.1 haclusterX1
1.1.1.1 haclusterX2
1.1.1.1 haclusterX3
1.1.1.1 haclusterX4
1.1.1.1 ClusterX
1.1.1.1 manager
10.8.2.115 mrs-wsc-prod-node-master1LmWV.mrs-tdki.com
10.8.2.40 mrs-wsc-prod-node-master2uFTq.mrs-tdki.com
10.8.2.48 mrs-wsc-prod-node-str-coregFNv0003.mrs-tdki.com
10.8.2.50 mrs-wsc-prod-node-str-coregFNv0001.mrs-tdki.com
10.8.2.224 mrs-wsc-prod-node-str-coregFNv0002.mrs-tdki.com
10.8.2.45 mrs-wsc-prod-node-ana-coreZsWm0001.mrs-tdki.com
10.8.2.130 mrs-wsc-prod-node-ana-coreZsWm0003.mrs-tdki.com
10.8.2.36 mrs-wsc-prod-node-ana-coreZsWm0002.mrs-tdki.com
```
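Rather than scanning the hosts file by eye, the candidate domain entries can be pulled out with awk. A minimal sketch over a sample hosts file (the temp file and its contents are illustrative; on a real node, run the awk line against /etc/hosts):

```shell
# Write a miniature /etc/hosts stand-in to a temp file
hosts_file="$(mktemp)"
cat > "$hosts_file" <<'EOF'
127.0.0.1 localhost
1.1.1.1 hadoop.55d70652_c5fa_400f_babb_292a75a0b88d.com
1.1.1.1 hadoop.hadoop.com
1.1.1.1 manager
EOF

# Hostnames beginning with "hadoop." are the candidate
# kerberos.domain.name values
awk '$2 ~ /^hadoop\./ {print $2}' "$hosts_file"
```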
Viewing Flume Client Information

Filter the running Flume client process for a known string to recover all of the parameters it was started with, such as the log directory:

```shell
# Filter by the keytab path here
[root@mrs-wsc-prod-node-str-coregFNv0003 ~]# ps -ef | grep wsc_keytab
root 24883 18048 0 11:10 pts/1 00:00:00 grep --color=auto wsc_keytab
root 25700 1 0 Mar02 ? 00:17:04 /opt/client/Flumeenv/JDK/jdk/bin/java -XX:OnOutOfMemoryError=bash /opt/client/FlumeClient/fusioninsight-flume-1.6.0/bin/out_memory_error.sh /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf %p -Xms2G -Xmx4G -Dcom.amazonaws.sdk.disableCertChecking=true -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -Djava.security.krb5.conf=/opt/wsc/wsc_keytab/krb5.conf -Djava.security.auth.login.config=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/jaas.conf -Dzookeeper.request.timeout=120000 -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/Bigdata/flume-client-3/flume/flume-root-20230302173055-%p-gc.log -Dflume.instance.id=2966749457 -Dflume.agent.name=event -Dflume.role=client -Dlog4j.configuration.watch=true -Dlog4j.configuration=log4j.properties -Dflume_log_dir=/var/log/Bigdata/flume-client-3/flume/ -Dbeetle.application.home.path=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/service -Dflume.called.from.service -Dflume.conf.dir=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf -Dflume.metric.conf.dir=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf -Dflume.script.home=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/bin -cp /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf:/opt/client/FlumeClient/fusioninsight-flume-1.6.0/lib/*:/opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/service/ -Djava.library.path=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/plugins.d/native/native org.apache.flume.node.Application --conf-file /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/properties.properties --name client
```
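The individual -D system properties (log directory, agent name, and so on) are easier to read when the argument list is split one per line. A small sketch over a sample argument string (the string below is a shortened stand-in for the real ps output):

```shell
# Shortened stand-in for the java command line shown above
args='-Xms2G -Xmx4G -Dflume.agent.name=event -Dflume_log_dir=/var/log/Bigdata/flume-client-3/flume/'

# Word-split on spaces (hence the intentionally unquoted $args),
# print one argument per line, and keep only the -D properties
printf '%s\n' $args | grep '^-D'
```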
References

- [1] Huawei Cloud. Installing the Flume Client on MRS Versions Earlier than 3.x [DB/OL]. (2022-10-10) [2023-03-02]. https://support.huaweicloud.com/usermanual-mrs/mrs_01_1594.html
- [2] 一夜. Reading Kafka Data with Flume and Writing It to HDFS [DB/OL]. (2020-12-22) [2023-03-02]. https://bbs.huaweicloud.com/forum/thread-96691-1-1.html
- [3] 写Scala的老刘. Kafka auto.offset.reset earliest/latest Explained [EB/OL]. (2018-12-27) [2023-03-06]. https://blog.csdn.net/qq_40625030/article/details/85280013
Revision History

- 2023-03-02T18:02:00+08:00 - Created
- 2023-03-06T11:17:00+08:00 - Added the Kafka section
[Copyright] This article is original content from a Huawei Cloud community user. When reposting, cite the source (Huawei Cloud community), the article link, and the author. To report suspected plagiarism, send evidence to cloudbbs@huaweicloud.com.