Configuring log4j Logging on a Self-Built Spark Cluster
[Abstract] After a self-built Spark cluster is installed and the Yarn log configuration is completed, the stderr and stdout of Tasks on Yarn display abnormally, and log4j output such as the OBS-related logs is not printed.
1. Background:
After the self-built Spark cluster was installed and the Yarn log configuration was completed, the stderr and stdout of Tasks on Yarn displayed abnormally, and the OBS-related log4j output was not printed.
2. Root Cause Analysis:
(1) Misconfiguration of the driver- and executor-side log4j***.properties files;
(2) A mix of log4j and log4j2 jars on the classpath, causing a conflict over which logging framework is used.
3. Solution
3.1 Configuring the log4j***.properties files
(1) Create the log4j-executor.properties file in the Spark client directory
vim /opt/hadoopclient/Spark/spark/conf/log4j-executor.properties
log4j.rootCategory = INFO,sparklog
spark.executor.log.level = INFO
log4j.appender.sparklog = org.apache.log4j.RollingFileAppender
log4j.appender.sparklog.File = ${spark.yarn.app.container.log.dir}/stdout
log4j.appender.sparklog.Append = true
log4j.appender.sparklog.MaxFileSize = 50MB
log4j.appender.sparklog.MaxBackupIndex = 10
log4j.appender.sparklog.layout = org.apache.log4j.PatternLayout
log4j.appender.sparklog.layout.ConversionPattern = %d{yyyy-MM-dd HH:mm:ss,SSS} | %-5p | [%t] | %m | %l%n
log4j.logger.org.sparkproject.jetty = WARN
log4j.logger.org.sparkproject.jetty.util.component.AbstractLifeCycle = ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper = INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter = INFO
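Because the sparklog appender writes to ${spark.yarn.app.container.log.dir}/stdout, executor output lands in each container's stdout file and can be read back through Yarn log aggregation. A quick check, assuming log aggregation is enabled and using a placeholder application ID:
# Fetch all aggregated container logs for a finished application
yarn logs -applicationId <application_id> | less
# On recent Hadoop releases, narrow the output to the stdout file only
yarn logs -applicationId <application_id> -log_files stdout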
(2) Create the log4j.properties file in the Spark client directory
vim /opt/hadoopclient/Spark/spark/conf/log4j.properties
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
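This file drives the driver-side console appender, so in client mode the driver log goes to the client terminal's stderr. A minimal smoke test, assuming the client environment script follows the layout used in this article:
# Load the client environment (this path is an assumption based on the client install above)
source /opt/hadoopclient/bigdata_env
# Start spark-shell; with log4j.logger.org.apache.spark.repl.Main=WARN,
# the shell is no longer flooded with INFO messages on startup
spark-shell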
(3) Configure spark-defaults.conf in the Spark client directory.
vim /opt/hadoopclient/Spark/spark/conf/spark-defaults.conf
Modify the following configurations:
spark.executor.extraJavaOptions = -Dlog4j.configuration=./log4j-executor.properties -XX:CompressedClassSpaceSize=512m -verbose:gc -XX:+PrintGCDetails
spark.driver.extraJavaOptions = -Dlog4j.configuration=file:/opt/hadoopclient/Spark/spark/conf/log4j.properties -Djetty.version=x.y.z -XX:CompressedClassSpaceSize=512m
spark.yarn.am.extraJavaOptions = -Dlog4j.configuration=./log4j-executor.properties
spark.yarn.cluster.driver.extraJavaOptions = -Dlog4j.configuration=./log4j-executor.properties
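Note that the executor and AM options point to ./log4j-executor.properties, a path relative to the container's working directory, so the file must be shipped to every Yarn container. If the client does not already distribute it (for example via spark.yarn.dist.files), passing it with --files works; a sketch with a hypothetical application jar and class:
# Ship log4j-executor.properties into each container's working directory,
# where -Dlog4j.configuration=./log4j-executor.properties can resolve it.
# com.example.YourApp and your-app.jar below are placeholders for your own job.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /opt/hadoopclient/Spark/spark/conf/log4j-executor.properties \
  --class com.example.YourApp \
  your-app.jar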
3.2 Cleaning up the log4j-related jars
The Spark client's jars directory contains the following log4j-related jars:
log4j-1.2.17-atlassian-13.jar
log4j-api-2.10.0.jar
log4j-core-2.10.0.jar
Among these, log4j-api-2.10.0.jar and log4j-core-2.10.0.jar belong to log4j2. When they are on the classpath, log4j2 is picked up first, but because no log4j2 configuration is provided, the log4j output is never printed.
Deleting log4j-api-2.10.0.jar and log4j-core-2.10.0.jar from the jars directory fixes this.
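Rather than deleting the two jars outright, moving them to a backup directory is safer and easy to roll back; a sketch, where the backup path is an arbitrary choice:
cd /opt/hadoopclient/Spark/spark/jars
# Confirm which log4j jars are actually present
ls | grep -i log4j
# Move the log4j2 jars out of the classpath instead of deleting them
mkdir -p /opt/hadoopclient/Spark/spark/jars-log4j2-backup
mv log4j-api-2.10.0.jar log4j-core-2.10.0.jar /opt/hadoopclient/Spark/spark/jars-log4j2-backup/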