MRS: Using Flume to ship log files to HDFS in real time
Keywords: MRS, Flume, HDFS, Kerberos, log files
Abstract: This article describes how to use the Flume client in an MRS cluster environment to collect logs from a log host and save them to a specified HDFS directory.
Prerequisites
1. Create a cluster by following https://support.huaweicloud.com/usermanual-mrs/mrs_01_0027.html. Choose an MRS 2.1.0 hybrid cluster with Kerberos authentication enabled; the components must include at least Hadoop and Flume.
2. Following the "Prerequisites" subsection of https://support.huaweicloud.com/usermanual-mrs/mrs_01_0091.html, create an ECS log host node outside the cluster. Note that this host must be in the same VPC and security group as the cluster.
3. Collecting logs with Flume requires installing the Flume client on the log host; see https://support.huaweicloud.com/usermanual-mrs/mrs_01_0392.html
3.1 The linked document does not include the download steps for the Flume client; refer to the figure below.
3.2 For step 14 of the linked procedure, the command the author ran was:
sh /opt/MRS_Flume_ClientConfig/Flume/install.sh -d /opt/FlumeClient -f 172.16.0.135 -e 172.16.0.100
Development (see https://support.huaweicloud.com/usermanual-mrs/mrs_01_0397.html)
1. On the MRS Manager page, create a user with the appropriate permissions for your needs (the author created the user "flumeuser") and download that user's authentication credentials; extracting them yields the krb5.conf and user.keytab files. Upload both files to the Flume client's configuration directory on the log host, "/opt/FlumeClient/fusioninsight-flume-1.6.0/conf".
2. From a master node inside the cluster, copy the hdfs-site.xml and core-site.xml files under $HADOOP_HOME/etc/hadoop to the Flume client's configuration directory on the log host, "/opt/FlumeClient/fusioninsight-flume-1.6.0/conf".
3. In MRS Manager, find the IP of the node where the Flume role runs, log in to that node, and copy the jaas.conf file under "/opt/Bigdata/MRS_x.x.x/1_x_Flume/etc/" to the Flume client's configuration directory on the log host, "/opt/FlumeClient/fusioninsight-flume-1.6.0/conf".
4. Log in to the log host and complete the configuration under "/opt/FlumeClient/fusioninsight-flume-1.6.0/conf":
4.1 Edit jaas.conf as shown in the figure below: principal is the username created in MRS Manager, and keyTab is the path to the keytab authentication file.
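The figure is not reproduced here; as a rough sketch, the edited jaas.conf typically looks like the following. The login context name and the other options come from the jaas.conf copied over in step 3 — only principal and keyTab need to change, and whether the principal requires a realm suffix (e.g. flumeuser@<REALM>) depends on your cluster:

```
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/opt/FlumeClient/fusioninsight-flume-1.6.0/conf/user.keytab"
  principal="flumeuser"
  useTicketCache=false
  storeKey=true
  debug=false;
};
```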
4.2 Make sure the configuration files are readable by the Flume process. Adjust the keytab file's permissions, e.g. "chmod 777 user.keytab" (a tighter mode such as 600 also works, as long as the user running the Flume client can read the file).
4.3 Edit flume-env.sh and append the following after "-XX:+UseCMSCompactAtFullCollection": "-Djava.security.krb5.conf=/opt/FlumeClient/fusioninsight-flume-1.6.0/conf/krb5.conf -Djava.security.auth.login.config=/opt/FlumeClient/fusioninsight-flume-1.6.0/conf/jaas.conf -Dzookeeper.request.timeout=120000"
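After the edit, the JVM options line in flume-env.sh looks roughly like this (an illustrative excerpt only; the variable name and the surrounding flags depend on your client version, and "..." stands for the options already present in the file):

```
JAVA_OPTS="... -XX:+UseCMSCompactAtFullCollection \
  -Djava.security.krb5.conf=/opt/FlumeClient/fusioninsight-flume-1.6.0/conf/krb5.conf \
  -Djava.security.auth.login.config=/opt/FlumeClient/fusioninsight-flume-1.6.0/conf/jaas.conf \
  -Dzookeeper.request.timeout=120000"
```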
4.4 Overwrite the contents of the properties.properties file with the following:
client.sources = r1
client.sinks = k1
client.channels = c1
client.sources.r1.type = spooldir
client.sources.r1.spoolDir = /var/log/test
client.sources.r1.trackerDir = /opt/FlumeClient/fusioninsight-flume-1.6.0/conf/trackerDir
client.sources.r1.ignorePattern = ^$
client.sources.r1.fileSuffix = .COMPLETED
client.sources.r1.maxBlobLength = 16384
client.sources.r1.batchSize = 51200
client.sources.r1.inputCharset = UTF-8
client.sources.r1.deserializer = LINE
client.sources.r1.selector.type = replicating
client.sources.r1.fileHeaderKey = file
client.sources.r1.fileHeader = false
client.sources.r1.basenameHeader = true
client.sources.r1.basenameHeaderKey = basename
client.sources.r1.deletePolicy = never
client.sources.r1.channels = c1
client.channels.c1.type = file
client.channels.c1.checkpointDir = /opt/FlumeClient/fusioninsight-flume-1.6.0/conf/checkpointDir/
client.channels.c1.dataDirs = /opt/FlumeClient/fusioninsight-flume-1.6.0/conf/dataDirs/
client.channels.c1.maxFileSize = 2146435071
client.channels.c1.minimumRequiredSpace = 524288000
client.channels.c1.capacity = 1000000
client.channels.c1.transactionCapacity = 10000
client.channels.c1.channelfullcount = 10
client.sinks.k1.type = hdfs
client.sinks.k1.channel = c1
client.sinks.k1.hdfs.path = /flume/file/%y-%m-%d/%H%M/
client.sinks.k1.hdfs.filePrefix = event
client.sinks.k1.hdfs.rollSize = 102400000
client.sinks.k1.hdfs.fileType= DataStream
client.sinks.k1.hdfs.rollInterval = 600
client.sinks.k1.hdfs.rollCount = 0
client.sinks.k1.hdfs.inUseSuffix = .tmp
client.sinks.k1.hdfs.fileSuffix = .log
client.sinks.k1.hdfs.idleTimeout = 0
client.sinks.k1.hdfs.useLocalTimeStamp = true
client.sinks.k1.hdfs.round = true
client.sinks.k1.hdfs.roundValue = 10
client.sinks.k1.hdfs.roundUnit = minute
client.sinks.k1.hdfs.kerberosPrincipal = flumeuser
client.sinks.k1.hdfs.kerberosKeytab= /opt/FlumeClient/fusioninsight-flume-1.6.0/conf/user.keytab
4.5 Under the same conf directory, create the folders referenced by the configuration above: trackerDir, checkpointDir, and dataDirs.
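The three directory names come straight from the source and channel settings in properties.properties; a sketch of the commands, assuming the default install path used throughout this article:

```shell
# Create the working directories referenced by properties.properties:
# trackerDir for the spooldir source, checkpointDir and dataDirs for the file channel.
FLUME_CONF=/opt/FlumeClient/fusioninsight-flume-1.6.0/conf
mkdir -p "$FLUME_CONF/trackerDir" "$FLUME_CONF/checkpointDir" "$FLUME_CONF/dataDirs"
```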
5. On the log host, run the following commands to restart the Flume client:
cd /opt/FlumeClient/fusioninsight-flume-1.6.0/bin
./flume-manage.sh restart
Test results
1. Flume monitors the "/var/log/test" directory on the log host outside the cluster. Write text files into this directory, then check the /flume/file directory in the cluster's HDFS to confirm they are synchronized in real time.
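A quick way to exercise the pipeline (a sketch; the file name app.log is arbitrary, and the hdfs command must be run from a node that has an HDFS client and a valid Kerberos ticket):

```shell
# On the log host: drop a test line into the spooled directory.
mkdir -p /var/log/test
echo "hello flume $(date +%s)" >> /var/log/test/app.log
# Flume renames the file to app.log.COMPLETED once it has been consumed.

# On a cluster node, list the files Flume wrote (commented out here since it
# requires an HDFS client):
#   hdfs dfs -ls /flume/file
```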