GaussDB: DN Down Because of Incorrect Data Directory Permissions
- Symptom
Reproduction case: after a cluster upgrade completes, one DN is in the down state.
cm_ctl query -Cv

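- Applicable versions
All GaussDB versions.
- Business impact
Applies to all CNs and DNs: abnormal data directory permissions prevent the database from starting normally.
- Cause
The upgrade requires a DN restart, and the DN cannot start because of a permission problem on its data directory.
- Solution
Step 1 On the node hosting the abnormal DN, check daemon processes such as om_agent, cm_agent, and cm_server. None of them is abnormal.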
ps ux
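
Scanning the `ps ux` output by eye is easy to get wrong on a busy node. The sketch below is one possible way to automate the check; `check_daemons` is a hypothetical helper name, and it assumes each daemon's name appears as a whole word in its command line. Note that cm_server may legitimately be absent on nodes that do not host a CM server.

```shell
# Hypothetical helper: read `ps ux` output on stdin and report which of the
# daemons named in Step 1 do not appear in it.
check_daemons() {
    ps_output=$(cat)
    missing=0
    for name in om_agent cm_agent cm_server; do
        # -w matches the name as a whole word; -q suppresses grep's own output.
        if ! printf '%s\n' "$ps_output" | grep -qw -- "$name"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the node hosting the abnormal DN:
#   ps ux | check_daemons && echo "all daemons present"
```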

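Step 2 Check the cm_agent process log. It shows that cm_agent repeatedly tried to start the DN but the DN never came up, which rules out a fault in cm_agent (CMA) itself.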
cd $GAUSSLOG/cm/cm_agent
vim cm_agent-*current.log
StartAndStop ASYN LOG: BuildStartCommand 0
StartAndStop ASYN LOG: DN START system(command:/usr/local/core/app/bin/gaussdb -D /var/lib/engine/data1/data/dn_6002 -M pending >> "/var/lib/engine/data1/log/Ruby/cm/cm_agent/system_call-current.log" 2>&1 &), try 4
StartAndStop ASYN LOG: the dn(id:6002) instance restarts counts: 66 in 10 min, 66 in hour.
DnStatus6002 ASYN ERROR: [get_connection: 1446]: fail to read pid file (/var/lib/engine/data1/data/dn_6002/postmaster.pid).
DnStatus6002 ASYN ERROR: failed to connect to datanode:/var/lib/engine/data1/data/dn_6002
DnStatus6002 ASYN ERROR: DatanodeStatusCheck failed, ret=-1
DnStatus6002 ASYN LOG: set 6002 on offline.
CheckProcess ASYN LOG: process (gaussdb) is not running, path is [[/var/lib/engine/data1/data/dn_6002]: [/var/lib/engine/data1/data/dn_6002]], haveFound is 0
CoreDumpCheck ASYN LOG: gaussdb state file "/var/lib/engine/data1/data/dn_6002/gaussdb.state" is not exist, could not get the build infomation: No such file or directory
StartAndStop ASYN ERROR: error.dn is 6002 ret=-1
TpNetReportDnGroupInfoMain ASYN LOG: tp net dn report dn group [0] to cms.
DnStatus6002 ASYN ERROR: [get_connection: 1446]: fail to read pid file (/var/lib/engine/data1/data/dn_6002/postmaster.pid).
DnStatus6002 ASYN ERROR: failed to connect to datanode:/var/lib/engine/data1/data/dn_6002
DnStatus6002 ASYN ERROR: DatanodeStatusCheck failed, ret=-1
DnStatus6002 ASYN LOG: set 6002 on offline.
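
The excerpt above repeats the same few messages many times. As a rough sketch, the log can be condensed with standard text tools instead of being read in an editor; `summarize_cma_log` is a hypothetical helper name, and the patterns assume the message wording shown in the excerpt.

```shell
# Hypothetical helper: summarize the cm_agent log file passed as $1.
summarize_cma_log() {
    log_file=$1
    # Most recent restart counter reported for a DN instance.
    grep 'instance restarts counts' "$log_file" | tail -n 1
    # Distinct ERROR messages with occurrence counts, most frequent first.
    grep ' ERROR: ' "$log_file" | sed 's/^.* ERROR: //' | sort | uniq -c | sort -rn
}

# Usage (pass a single log file):
#   summarize_cma_log "$GAUSSLOG/cm/cm_agent/<current cm_agent log file>"
```

A high restart counter combined with "fail to read pid file" errors suggests that cm_agent is doing its job and the DN process itself exits right after launch, which is why the next step looks at the DN's own startup output.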

Step 3 Check the system_call log. It reports a data directory permission error: the directory has group or world access.
cd $GAUSSLOG/cm/cm_agent
vim system_call-current.log
LOG: assign g_instance_enable_dcf=0
LOG: configuration file "/var/lib/engine/data1/data/dn_6002/gaussdb.conf" contains errors; unaffected changes were applied
CAUSE: Error from config file occurs and unaffected changes were applied.
ACTION: Use correct config file.
FATAL: data directory "/var/lib/engine/data1/data/dn_6002" has group or world access
DETAIL: Permissions should be u=rwx (0700).
CAUSE: The data directory has group or world access.
ACTION: Add the user to the group or world of the data directory.
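
When the system_call log is long, the FATAL entry and its explanatory lines can be pulled out directly. This is a minimal sketch; `show_fatal` is a hypothetical helper name, and the three lines of trailing context are an assumption based on the DETAIL/CAUSE/ACTION layout in the excerpt above.

```shell
# Hypothetical helper: print every FATAL line in the log file passed as $1,
# together with the three lines that follow it (DETAIL, CAUSE, ACTION above).
show_fatal() {
    grep -A 3 'FATAL: ' "$1"
}

# Usage:
#   show_fatal "$GAUSSLOG/cm/cm_agent/system_call-current.log"
```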

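Step 4 Check the permissions of the DN data directory. The mode is 701, which is abnormal; the log in Step 3 states that it should be 0700.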
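
The article does not show the command used for this check. One way to do it, sketched under the assumption that GNU coreutils `stat` is available (the `-c` option), is shown below; `check_dn_dir_mode` is a hypothetical helper name, and the path in the usage line is the one that appears in the logs above.

```shell
# Hypothetical helper: print the octal mode, owner, and group of the directory
# passed as $1, and succeed only if the mode is exactly 700.
# Assumes GNU coreutils `stat` (the -c option).
check_dn_dir_mode() {
    dir=$1
    stat -c '%a %U:%G %n' "$dir" || return 2
    mode=$(stat -c '%a' "$dir")
    [ "$mode" = "700" ]
}

# Usage:
#   check_dn_dir_mode /var/lib/engine/data1/data/dn_6002 || echo "unexpected mode"
```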

Step 5 Manually restore the DN data directory to mode 700.
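
A minimal sketch of the fix follows, assuming it is run as the OS user that owns the data directory; `fix_dn_dir_mode` is a hypothetical helper name. It changes only the top-level directory, because the FATAL message in Step 3 is about the data directory itself, not the files inside it.

```shell
# Hypothetical helper: set the directory passed as $1 to mode 700 and show the result.
fix_dn_dir_mode() {
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    # No -R: only the data directory itself is changed, not its contents.
    chmod 700 "$dir" || return 1
    ls -ld "$dir"
}

# Usage:
#   fix_dn_dir_mode /var/lib/engine/data1/data/dn_6002
```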

Step 6 Confirm that the cluster status has recovered.
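
Recovery can be confirmed by rerunning the `cm_ctl query -Cv` command from the symptom section. The exact output layout is not shown in this article, so the sketch below only assumes that a failed instance is reported with the word "Down" somewhere on its line; `no_down_instances` is a hypothetical helper name.

```shell
# Hypothetical helper: read `cm_ctl query -Cv` output on stdin, print any line
# that mentions "down" (case-insensitive), and succeed only if there is none.
no_down_instances() {
    if grep -i 'down'; then
        return 1
    fi
    return 0
}

# Usage:
#   cm_ctl query -Cv | no_down_instances && echo "no instance reported as down"
```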

----End