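Abnormal Background UDF Processes in a GaussDB(DWS) Cluster
[Abstract] This post uses a simple example to show how to troubleshoot and fix abnormal UDF processes.
Background: a UDF call reports an error, and checking the cluster from the backend shows that the UDF processes are abnormal. The data, path configuration, host names, and other details used in the example all come from a test environment.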
First, check the state of the cluster's UDF processes with the following command:
cm_ctl query -CvF
The relevant part of the output is as follows:
[ Fenced UDF State ]
node state
--------------------
1 ASG003 Down
2 host17967 Down
3 host17995 Down
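When the cluster has many nodes, it can help to filter this section down to the nodes that are not Normal. The following is a minimal sketch, assuming the output keeps the layout shown above (a `[ Fenced UDF State ]` header followed by rows of id, host name, and state). The function name `list_abnormal_udf_nodes` is hypothetical and not part of GaussDB(DWS).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nodes whose Fenced UDF state is not "Normal".
# Reads the text of `cm_ctl query -CvF` on standard input and assumes the
# section layout shown in this post.
list_abnormal_udf_nodes() {
    awk '
        # Header of the section we care about.
        /^[ \t]*\[ *Fenced UDF State *\]/ { in_udf = 1; next }
        # Any other bracketed section header ends it.
        /^[ \t]*\[/ { in_udf = 0 }
        # Data rows: <id> <hostname> <state>; skip the column header and the dashes.
        in_udf && $1 ~ /^[0-9]+$/ && $NF != "Normal" { print $2, $NF }
    '
}

# Usage on a cluster node:
#   cm_ctl query -CvF | list_abnormal_udf_nodes
```

For the output above, this would list all three hosts, each followed by Down.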
The UDF processes are in an abnormal state (Down). The cm_agent log, located in the $GAUSSLOG/cm/cm_agent directory, contains the following entry:
StartAndStop LOG: process (secbox) is not running, path is xxxx, have_found is 0
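To locate this entry without opening each log file by hand, a fixed-string search over the log directory is usually enough. The sketch below assumes the cm_agent log files are plain text files under the directory mentioned above. The function name `find_secbox_errors` and the choice to show only the last five matches are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the most recent "secbox is not running" entries.
# $1 is the cm_agent log directory; it defaults to $GAUSSLOG/cm/cm_agent.
find_secbox_errors() {
    local log_dir="${1:-$GAUSSLOG/cm/cm_agent}"
    # -r: search the directory recursively, -h: omit file names,
    # -F: treat the pattern as a fixed string so the parentheses stay literal.
    grep -rhF "process (secbox) is not running" "$log_dir" | tail -n 5
}

# Usage on a cluster node:
#   find_secbox_errors
```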
The "process (secbox) is not running" message points to a problem in the secbox.conf configuration. Go to the $GAUSSHOME/secbox directory and inspect secbox.conf:
# read/write src_path [dst_path]
[mount_path] read /dev
[mount_path] read /sys
[mount_path] read /bin
[mount_path] read /sbin
[mount_path] read /usr/bin
[mount_path] read /lib
[mount_path] read /lib64
[mount_path] read /usr/lib
[mount_path] read /usr/lib64
[mount_path] read /usr/local
[mount_path] read /usr/share
[mount_path] read /etc
[mount_path] read /var
[mount_path] read /var/log
[mount_path] read /var/lkp
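The /var/lkp path listed in this configuration does not exist on the host, which prevents the UDF process from being started. Comment out or delete that entry; a few seconds later, the UDF process on this node returns to the Normal state: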
[ Fenced UDF State ]
node state
--------------------
1 ASG003 Down
2 host17967 Normal
3 host17995 Down
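The check that exposed the missing /var/lkp path can be automated. The following is a minimal sketch, assuming every active entry follows the `[mount_path] read|write src_path [dst_path]` form shown in the file's own comment line. The function name `find_missing_mount_paths` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every [mount_path] source path in a secbox.conf
# that does not exist on this host.
# $1 is the path to secbox.conf.
find_missing_mount_paths() {
    local conf="$1" tag mode src rest
    # The "|| [ -n "$tag" ]" part keeps a final line that has no trailing newline.
    while read -r tag mode src rest || [ -n "$tag" ]; do
        # Only active entries are checked; commented lines start with "#"
        # and therefore never match the tag.
        [ "$tag" = "[mount_path]" ] || continue
        [ -n "$src" ] || continue
        [ -e "$src" ] || printf '%s\n' "$src"
    done < "$conf"
}

# Usage on a cluster node:
#   find_missing_mount_paths "$GAUSSHOME/secbox/secbox.conf"
```

An empty result means every listed source path exists, in which case the cause of the failure lies elsewhere.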
This shows that the change has taken effect. Edit secbox.conf on the other nodes in the same way, and the problem is resolved:
[ Fenced UDF State ]
node state
--------------------
1 ASG003 Normal
2 host17967 Normal
3 host17995 Normal
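Since the same edit has to be repeated on every node, it may be convenient to script it. The sketch below comments out the entry for one given source path and keeps a backup copy first. The function name `comment_out_mount_path` and the `.bak` suffix are assumptions for illustration. Review the result before relying on it in a production cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out the [mount_path] entry for one source path.
# $1 is the path to secbox.conf, $2 is the source path to disable.
comment_out_mount_path() {
    local conf="$1" path="$2" tmp
    # Keep a backup of the original file next to it.
    cp -p "$conf" "$conf.bak" || return 1
    tmp="$(mktemp)" || return 1
    # Prefix the matching entry with "# "; copy all other lines unchanged.
    # Writing back with cat keeps the original file's owner and permissions.
    awk -v p="$path" '
        $1 == "[mount_path]" && $3 == p { print "# " $0; next }
        { print }
    ' "$conf.bak" > "$tmp" && cat "$tmp" > "$conf"
    local status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage on each affected node:
#   comment_out_mount_path "$GAUSSHOME/secbox/secbox.conf" /var/lkp
```

Commenting the line out rather than deleting it makes it easy to restore the entry if the path is created later.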