MRS 1.9.2: Install the Flume Client and Pipe Kafka to HDFS

yd_254158608, published 2023/03/06 11:20:30
[Abstract] MRS 1.9.2: installing the Flume client and piping Kafka to HDFS

Environment

  • Huawei Cloud MRS 1.9.2
  • Flume 1.6.0
  • Kafka 2.11-1.1.0
  • KrbServer 1.15.2
  • Client node: mrs-wsc-prod-node-str-coregFNv0003
    • 10.8.2.48
  • Flume MonitorServer
    • 10.8.2.115
    • 10.8.2.40

Prerequisites

  1. A Kerberos machine-machine account with the required permissions has been created. This walkthrough uses the wsc account, whose principal is wsc@55D70652_C5FA_400F_BABB_292A75A0B88D.COM.
  2. The wsc user's credential files have been downloaded to the following directory:
    [root@mrs-wsc-prod-node-str-coregFNv0003 wsc_keytab]# pwd
    /opt/wsc/wsc_keytab
    [root@mrs-wsc-prod-node-str-coregFNv0003 wsc_keytab]# ll
    total 8
    -rw-r-----. 1 root root 1095 Feb 13 15:16 krb5.conf
    -rw-r-----. 1 root root  182 Feb 13 15:16 user.keytab
    
  3. A Kafka topic named event has been created.

Install the Flume Client

  1. Log in to node mrs-wsc-prod-node-str-coregFNv0003 as root.

  2. Download the Flume client to the /tmp directory: in MRS Manager, choose Services -> Flume -> Download Client -> Remote host.

    [root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# cd /tmp
    [root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# ll | grep MRS_Flume_Client.tar
    -rw-------. 1 root root  569569280 Mar  2 11:33 MRS_Flume_Client.tar
    
  3. Extract MRS_Flume_Client.tar

    [root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# tar -xvf MRS_Flume_Client.tar
    MRS_Flume_ClientConfig.tar.sha256
    MRS_Flume_ClientConfig.tar
    [root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# ll | grep MRS_Flume_Client
    -rw-------. 1 root root  569559040 Mar  2 11:33 MRS_Flume_ClientConfig.tar
    -rw-------. 1 root root         92 Mar  2 11:33 MRS_Flume_ClientConfig.tar.sha256
    -rw-------. 1 root root  569569280 Mar  2 11:33 MRS_Flume_Client.tar
    
  4. Verify the package checksum

    [root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# sha256sum -c MRS_Flume_ClientConfig.tar.sha256
    MRS_Flume_ClientConfig.tar: OK
    
  5. Extract MRS_Flume_ClientConfig.tar

    [root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# tar -xvf MRS_Flume_ClientConfig.tar
    
  6. Install the client runtime environment to a new directory

    sh /tmp/MRS_Flume_ClientConfig/install.sh /opt/client/Flumeenv
    

    Check the installation output; the following line indicates that the client runtime environment was installed successfully:

    Components client installation is complete.
    
  7. Load the environment variables

    source /opt/client/Flumeenv/bigdata_env
    
  8. Extract the Flume client package

     [root@mrs-wsc-prod-node-str-coregFNv0003 tmp]# cd /tmp/MRS_Flume_ClientConfig/Flume/
     [root@mrs-wsc-prod-node-str-coregFNv0003 Flume]# ll
     total 362336
     -rw-------. 1 root root 371029086 Mar  2 11:33 FusionInsight-Flume-1.6.0.tar.gz
     [root@mrs-wsc-prod-node-str-coregFNv0003 Flume]# tar -xvf FusionInsight-Flume-1.6.0.tar.gz
     [root@mrs-wsc-prod-node-str-coregFNv0003 Flume]# ll
     total 362372
     drwx------. 14 knox 20007      4096 Mar  8  2020 adapter
     drwx------.  3 knox 20007      4096 Mar  8  2020 aix
     drwx------.  3 knox 20007      4096 Mar  8  2020 batch_install
     drwx------. 11 knox 20007      4096 Mar  8  2020 flume
     -rw-------.  1 root root  371029086 Mar  2 11:33 FusionInsight-Flume-1.6.0.tar.gz
     -rwx------.  1 knox 20007     17079 Mar  8  2020 install.sh
    
  9. Install the Flume client

    [root@mrs-wsc-prod-node-str-coregFNv0003 Flume]# sh /tmp/MRS_Flume_ClientConfig/Flume/install.sh -d /opt/client/FlumeClient -f 10.8.2.115,10.8.2.40 -e 10.8.2.48 -n event
    CST 2023-03-02 14:12:20 [flume-client install]: install flume client successfully.
    [root@mrs-wsc-prod-node-str-coregFNv0003 Flume]# ll /opt/client/FlumeClient/fusioninsight-flume-1.6.0/
    total 168
    drwxr-x---. 3 root root  4096 Mar  2 14:12 bin
    -rwxr-x---. 1 root root 69856 Mar  2 14:12 CHANGELOG
    drwxr-x---. 4 root root  4096 Mar  2 14:12 conf
    -rwxr-x---. 1 root root  6172 Mar  2 14:12 DEVNOTES
    drwxr-x---. 2 root root  4096 Mar  2 14:12 inst
    drwxr-x---. 2 root root 16384 Mar  2 14:12 lib
    drwxr-x---. 2 root root  4096 Mar  2 14:12 libexec
    -rwxr-x---. 1 root root 25903 Mar  2 14:12 LICENSE
    -rwxr-x---. 1 root root   249 Mar  2 14:12 NOTICE
    drwxr-x---. 3 root root  4096 Mar  2 14:12 plugins.d
    drwxr-x---. 2 root root  4096 Mar  2 14:12 plugins.s
    -rwxr-x---. 1 root root  1779 Mar  2 14:12 README
    -rwxr-x---. 1 root root  1585 Mar  2 14:12 RELEASE-NOTES
    drwxr-x---. 2 root root  4096 Mar  2 14:12 thirdparty
    drwxr-x---. 2 root root  4096 Mar  2 14:12 tools
    

    The parameters are described as follows:

    • "-d": the Flume client installation path.
    • "-f": optional; the service IP addresses of the two MonitorServer roles, separated by a comma. If omitted, the Flume client does not send alarm information to MonitorServer, and the client does not appear in the MRS Manager UI.
    • "-c": optional; the configuration file "properties.properties" that the Flume client loads by default after installation. If omitted, "fusioninsight-flume-1.6.0/conf/properties.properties" under the client installation directory is used. It ships as a blank template; edit it for your workload and the Flume client will load it automatically.
    • "-l": optional; the log directory, default "/var/log/Bigdata".
    • "-e": optional; the service IP address of the Flume instance, used mainly to receive monitoring metrics reported by the client.
    • "-n": optional; a custom name for the Flume client.
    • The IBM JDK does not support "-Xloggc": edit "flume/conf/flume-env.sh" and change "-Xloggc" to "-Xverbosegclog". With a 32-bit JDK, "-Xmx" must not exceed 3.25 GB.
    • In "flume/conf/flume-env.sh", "-Xmx" defaults to 4 GB. If the client machine has little memory, lower it to 1 GB or even 512 MB.
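The heap note above can be applied with a one-line sed. A minimal sketch, demonstrated on a scratch file so it is safe to run anywhere; in practice point CONF at flume/conf/flume-env.sh under your client installation (the JAVA_OPTS line below is an assumed shape, not copied from the real file):

```shell
# Demo on a scratch file; point CONF at the real flume-env.sh in practice.
CONF=$(mktemp)
# Assumed shape of the JVM options line in flume-env.sh.
echo 'JAVA_OPTS="-Xms2G -Xmx4G -XX:+UseConcMarkSweepGC"' > "$CONF"
# Lower the heap ceiling from 4 GB to 1 GB for a low-memory client host.
sed -i 's/-Xmx4G/-Xmx1G/' "$CONF"
grep -c -- '-Xmx1G' "$CONF"
```

Run `./flume-manage.sh restart` afterwards so the new heap setting takes effect.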

Configure the Flume Client

  1. Copy the configuration files
    cp /opt/client/HDFS/hadoop/etc/hadoop/core-site.xml /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf
    cp /opt/client/HDFS/hadoop/etc/hadoop/hdfs-site.xml /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf
    cp /opt/Bigdata/MRS_1.9.2/1_7_Flume/etc/jaas.conf /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf
    
  2. Verify the Kerberos credentials and obtain the user's realm
    [root@mrs-wsc-prod-node-str-coregFNv0003 FlumeClient]# kinit -kt /opt/wsc/wsc_keytab/user.keytab wsc
    [root@mrs-wsc-prod-node-str-coregFNv0003 FlumeClient]# klist
    Ticket cache: FILE:/tmp/krb5cc_0
    Default principal: wsc@55D70652_C5FA_400F_BABB_292A75A0B88D.COM
    
    Valid starting       Expires              Service principal
    03/02/2023 14:54:20  03/03/2023 14:54:20  krbtgt/55D70652_C5FA_400F_BABB_292A75A0B88D.COM@55D70652_C5FA_400F_BABB_292A75A0B88D.COM
    [root@mrs-wsc-prod-node-str-coregFNv0003 FlumeClient]#
    
  3. Configure the authentication credentials
    Edit /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/jaas.conf:
     KafkaClient {
     com.sun.security.auth.module.Krb5LoginModule required
     useKeyTab=true
     keyTab="/opt/wsc/wsc_keytab/user.keytab"
     principal="wsc@55D70652_C5FA_400F_BABB_292A75A0B88D.COM"
     storeKey=true
     useTicketCache=false;
     };
    
     Client {
     com.sun.security.auth.module.Krb5LoginModule required
     storeKey=true
     principal="wsc@55D70652_C5FA_400F_BABB_292A75A0B88D.COM"
     useTicketCache=false
     keyTab="/opt/wsc/wsc_keytab/user.keytab"
     debug=true
     useKeyTab=true;
     };
    
  4. Edit the configuration file flume-env.sh
    The file is /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/flume-env.sh. After -XX:+UseCMSCompactAtFullCollection, append the following:
    -Djava.security.krb5.conf=/opt/wsc/wsc_keytab/krb5.conf -Djava.security.auth.login.config=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/jaas.conf -Dzookeeper.request.timeout=120000
    
  5. Restart the Flume client
    [root@mrs-wsc-prod-node-str-coregFNv0003 fusioninsight-flume-1.6.0]# cd /opt/client/FlumeClient/fusioninsight-flume-1.6.0/bin/
    [root@mrs-wsc-prod-node-str-coregFNv0003 bin]# ./flume-manage.sh restart
    Stop Flume PID=30402 successful.
    Start flume successfully,pid=24955.
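A common failure after step 3 is a keyTab path in jaas.conf that the Flume process cannot read. The sketch below pulls every keyTab="..." path out of the file and checks it is readable; it is demonstrated on an inline sample with placeholder paths — point JAAS at the client's conf/jaas.conf for real use:

```shell
# Demo on a scratch jaas.conf; point JAAS at the client's conf/jaas.conf in practice.
JAAS=$(mktemp)
KEYTAB=$(mktemp)   # stand-in for /opt/wsc/wsc_keytab/user.keytab
printf 'KafkaClient {\n  keyTab="%s"\n};\n' "$KEYTAB" > "$JAAS"
# Extract every quoted keyTab path and test readability.
grep -o 'keyTab="[^"]*"' "$JAAS" | cut -d'"' -f2 | while read -r kt; do
  if [ -r "$kt" ]; then echo "OK $kt"; else echo "MISSING $kt"; fi
done
```

Any MISSING line means the restart above will start a client that cannot authenticate to Kafka or HDFS.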
    

Collect Logs from Kafka into HDFS

  1. Edit the Flume client configuration file /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/properties.properties
     ## Components
     client.sources = kafka_source
     client.channels = file_channel
     client.sinks = hdfs_sink
    
     ## Wiring
     client.sources.kafka_source.channels = file_channel
     client.sinks.hdfs_sink.channel= file_channel
    
     ## source1
     client.sources.kafka_source.type = org.apache.flume.source.kafka.KafkaSource
     # Maximum number of messages written to the channel per batch
     client.sources.kafka_source.batchSize = 200
     # Maximum time to wait before flushing a batch to the channel, in milliseconds.
     # A batch is written as soon as either this timeout or batchSize is reached.
     client.sources.kafka_source.batchDurationMillis = 10000
     client.sources.kafka_source.kafka.bootstrap.servers = 10.8.2.224:21007,10.8.2.48:21007,10.8.2.50:21007
     client.sources.kafka_source.kafka.topics = event
     client.sources.kafka_source.kafka.consumer.group.id= flume_group
     client.sources.kafka_source.kafka.security.protocol = SASL_PLAINTEXT
     # If a partition has a committed offset, consume from it; otherwise consume from the beginning
     client.sources.kafka_source.kafka.consumer.auto.offset.reset = earliest
     client.sources.kafka_source.kafka.kerberos.domain.name = hadoop.55d70652_c5fa_400f_babb_292a75a0b88d.com
    
     ## channel1
     client.channels.file_channel.type = file
     client.channels.file_channel.checkpointDir = /tmp/flume/check
     client.channels.file_channel.dataDirs = /tmp/flume/data
     client.channels.file_channel.maxFileSize = 1024
     client.channels.file_channel.capacity = 500
     client.channels.file_channel.transactionCapacity = 250
     client.channels.file_channel.keep-alive = 6
    
     ## hdfs_sink
     client.sinks.hdfs_sink.type = hdfs
     client.sinks.hdfs_sink.hdfs.path = hdfs://hacluster/wsc/flume
     client.sinks.hdfs_sink.hdfs.fileType= DataStream
     client.sinks.hdfs_sink.hdfs.filePrefix = 20%y%m%d/event.data
     client.sinks.hdfs_sink.hdfs.round = false
     client.sinks.hdfs_sink.hdfs.idleTimeout = 60
     client.sinks.hdfs_sink.hdfs.useLocalTimeStamp = false
     client.sinks.hdfs_sink.hdfs.minBlockReplicas = 1
     client.sinks.hdfs_sink.hdfs.kerberosPrincipal =
     client.sinks.hdfs_sink.hdfs.kerberosKeytab = /opt/wsc/wsc_keytab/user.keytab
     client.sinks.hdfs_sink.hdfs.rollInterval = 0
     client.sinks.hdfs_sink.hdfs.rollSize = 262144000
     client.sinks.hdfs_sink.hdfs.rollCount = 0
    
    The value of client.sources.kafka_source.kafka.kerberos.domain.name can be taken from /etc/hosts: it is the entry that starts with hadoop.
    [root@mrs-wsc-prod-node-str-coregFNv0003 conf]# cat /etc/hosts
    ::1     localhost       localhost.localdomain   localhost6      localhost6.localdomain6
    127.0.0.1       localhost       localhost.localdomain   localhost4      localhost4.localdomain4
    1.1.1.1 hadoop.55d70652_c5fa_400f_babb_292a75a0b88d.com
    1.1.1.1    hadoop.hadoop.com
    1.1.1.1    hacluster
    1.1.1.1    haclusterX
    1.1.1.1    haclusterX1
    1.1.1.1    haclusterX2
    1.1.1.1    haclusterX3
    1.1.1.1    haclusterX4
    1.1.1.1    ClusterX
    1.1.1.1    manager
    10.8.2.115 mrs-wsc-prod-node-master1LmWV.mrs-tdki.com
    10.8.2.40 mrs-wsc-prod-node-master2uFTq.mrs-tdki.com
    10.8.2.48 mrs-wsc-prod-node-str-coregFNv0003.mrs-tdki.com
    10.8.2.50 mrs-wsc-prod-node-str-coregFNv0001.mrs-tdki.com
    10.8.2.224 mrs-wsc-prod-node-str-coregFNv0002.mrs-tdki.com
    10.8.2.45 mrs-wsc-prod-node-ana-coreZsWm0001.mrs-tdki.com
    10.8.2.130 mrs-wsc-prod-node-ana-coreZsWm0003.mrs-tdki.com
    10.8.2.36 mrs-wsc-prod-node-ana-coreZsWm0002.mrs-tdki.com
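Mis-wired component names in properties.properties (for example, a channel referenced by a source or sink but never declared in client.channels) only fail at runtime. A hypothetical quick check, demonstrated against an inline sample in the same shape as the file above — point PROPS at the real conf/properties.properties to use it:

```shell
# Demo on an inline sample; point PROPS at the client's properties.properties in practice.
PROPS=$(mktemp)
cat > "$PROPS" <<'EOF'
client.channels = file_channel
client.sources.kafka_source.channels = file_channel
client.sinks.hdfs_sink.channel= file_channel
EOF
# Channels declared at the top of the file.
declared=$(sed -n 's/^client\.channels[[:space:]]*=[[:space:]]*//p' "$PROPS")
# Every channel a source or sink points at must appear in client.channels.
sed -n -e 's/^client\.sources\..*\.channels[[:space:]]*=[[:space:]]*//p' \
       -e 's/^client\.sinks\..*\.channel[[:space:]]*=[[:space:]]*//p' "$PROPS" \
  | tr ' ' '\n' | sort -u | while read -r ch; do
      case " $declared " in
        *" $ch "*) echo "OK $ch" ;;
        *)         echo "UNDECLARED $ch" ;;
      esac
    done
```

An UNDECLARED line points at exactly the kind of typo fixed in the sink keys above.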
    

Inspect the Flume Client Process

Filter the process list for the Flume client to recover every parameter it was started with, such as the log directory.

# Filter by the keytab path here
[root@mrs-wsc-prod-node-str-coregFNv0003 ~]# ps -ef | grep wsc_keytab
root     24883 18048  0 11:10 pts/1    00:00:00 grep --color=auto wsc_keytab
root     25700     1  0 Mar02 ?        00:17:04 /opt/client/Flumeenv/JDK/jdk/bin/java -XX:OnOutOfMemoryError=bash /opt/client/FlumeClient/fusioninsight-flume-1.6.0/bin/out_memory_error.sh /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf %p -Xms2G -Xmx4G -Dcom.amazonaws.sdk.disableCertChecking=true -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -Djava.security.krb5.conf=/opt/wsc/wsc_keytab/krb5.conf -Djava.security.auth.login.config=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/jaas.conf -Dzookeeper.request.timeout=120000 -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/Bigdata/flume-client-3/flume/flume-root-20230302173055-%p-gc.log -Dflume.instance.id=2966749457 -Dflume.agent.name=event -Dflume.role=client -Dlog4j.configuration.watch=true -Dlog4j.configuration=log4j.properties -Dflume_log_dir=/var/log/Bigdata/flume-client-3/flume/ -Dbeetle.application.home.path=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/service -Dflume.called.from.service -Dflume.conf.dir=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf -Dflume.metric.conf.dir=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf -Dflume.script.home=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/bin -cp /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf:/opt/client/FlumeClient/fusioninsight-flume-1.6.0/lib/*:/opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/service/ -Djava.library.path=/opt/client/FlumeClient/fusioninsight-flume-1.6.0/plugins.d/native/native org.apache.flume.node.Application --conf-file /opt/client/FlumeClient/fusioninsight-flume-1.6.0/conf/properties.properties --name client
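Individual parameters can also be pulled out of that command line programmatically. A sketch against a captured sample (CMDLINE below is abbreviated from the output above); in practice feed it from `ps -ef | grep wsc_keytab`:

```shell
# Abbreviated sample of the client command line captured above.
CMDLINE='java -Dflume.agent.name=event -Dflume_log_dir=/var/log/Bigdata/flume-client-3/flume/ -Dflume.role=client'
# Extract the value of -Dflume_log_dir from the command line.
echo "$CMDLINE" | grep -o 'flume_log_dir=[^ ]*' | cut -d= -f2
```

The same grep/cut pattern works for any other -D property, such as flume.agent.name.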

Revision History

  1. 2023-03-02T18:02:00+08:00 - Created
  2. 2023-03-06T11:17:00+08:00 - Added the Kafka section