DWS 8.1.3 Deployment: Problem Analysis and Notes
1. Environment
This deployment runs on UnionTech OS (UOS) Server 1050e, as a three-node cluster.
[root@dws01 ~]# cat /etc/os-version
[Version]
SystemName=UnionTech OS Server
SystemName[zh_CN]=统信服务器操作系统
ProductType=Server
ProductType[zh_CN]=服务器
EditionName=e
EditionName[zh_CN]=e
MajorVersion=20
MinorVersion=1050
OsBuild=12038.103
[root@dws01 ~]# uname -a
Linux dws01 4.19.90-2201.4.0.0135.up1.uel20.x86_64 #1 SMP Mon Feb 21 18:36:21 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
2. DWS version
The DWS installation packages are as follows:
FusionInsight_MPPDB_8.1.3.6_Euler.tar.gz
FusionInsight_MPPDB_8.1.3.6_Euler.tar.gz.crl
FusionInsight_MPPDB_8.1.3.6_Euler.tar.gz.cms
FusionInsight_BASE_8.1.2.5_Euler.tar.gz.crl
FusionInsight_BASE_8.1.2.5_Euler.tar.gz.cms
FusionInsight_BASE_8.1.2.5_Euler.tar.gz
FusionInsight_UpdateService_8.1.2.5.tar.gz.crl
FusionInsight_UpdateService_8.1.2.5.tar.gz.cms
FusionInsight_UpdateService_8.1.2.5.tar.gz
FusionInsight_SetupTool_8.1.2.5.tar.gz.crl
FusionInsight_SetupTool_8.1.2.5.tar.gz.cms
FusionInsight_SetupTool_8.1.2.5.tar.gz
FusionInsight_Manager_8.1.2.5_Euler.tar.gz.crl
FusionInsight_Manager_8.1.2.5_Euler.tar.gz.cms
FusionInsight_Manager_8.1.2.5_Euler.tar.gz
DWS kernel version:
[root@dws01 ~]# cat /tmp/FusionInsight_MPPDB/software/components/package/package/script/gspylib/common/VersionInfo.py |grep COMMON_VERSION
COMMON_VERSION = "(GaussDB 8.1.3 build a02a3422) compiled at 2024-02-27 17:33:25 commit 3629 last mr 5138 release"
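3. Problem description
After installing the active and standby Manager nodes, I installed the cluster from the web UI. Step 8, cluster initialization, failed with the following errors: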
[2024-08-23 15:18:20]Begin to register default tenants.
[2024-08-23 15:18:20]End to register default tenants.
[2024-08-23 15:18:20]Begin to register default accounts.
[2024-08-23 15:18:21]End to register default accounts.
[2024-08-23 15:18:21]Initialize cluster for CLUSTER[name:huaao_dws].
[2024-08-23 15:18:21]Initialize service for Service[name: LdapServer, displayName: LdapServer, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize service for Service[name: meta, displayName: meta, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize service for Service[name: KrbServer, displayName: KrbServer, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize service for Service[name: MPPDB, displayName: MPPDB, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize service for Service[name: LdapClient, displayName: LdapClient, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize service for Service[name: KrbClient, displayName: KrbClient, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize role for ROLE[name: SlapdServer, serviceDisplayName: LdapServer, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize role for ROLE[name: meta, serviceDisplayName: meta, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize role for ROLE[name: KerberosServer, serviceDisplayName: KrbServer, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize role for ROLE[name: KerberosAdmin, serviceDisplayName: KrbServer, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize role for ROLE[name: MPPDBServer, serviceDisplayName: MPPDB, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize role for ROLE[name: SlapdClient, serviceDisplayName: LdapClient, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize role for ROLE[name: KerberosClient, serviceDisplayName: KrbClient, cluster:huaao_dws].
[2024-08-23 15:18:21]Initialize roleInstance for LdapServer#SlapdServer#172.16.1.127@dws01.
[2024-08-23 15:18:21]Initialize roleInstance for LdapServer#SlapdServer#172.16.1.128@dws02.
[2024-08-23 15:18:21]Initialize roleInstance for meta#meta#172.16.1.129@dws03.
[2024-08-23 15:18:21]Initialize roleInstance for meta#meta#172.16.1.128@dws02.
[2024-08-23 15:18:21]Initialize roleInstance for meta#meta#172.16.1.127@dws01.
[2024-08-23 15:18:21]Initialize roleInstance for KrbServer#KerberosServer#172.16.1.127@dws01.
[2024-08-23 15:18:21]Initialize roleInstance for KrbServer#KerberosServer#172.16.1.128@dws02.
[2024-08-23 15:18:21]Initialize roleInstance for KrbServer#KerberosAdmin#172.16.1.127@dws01.
[2024-08-23 15:18:21]Initialize roleInstance for KrbServer#KerberosAdmin#172.16.1.128@dws02.
[2024-08-23 15:18:21]Initialize roleInstance for MPPDB#MPPDBServer#172.16.1.127@dws01.
[2024-08-23 15:18:21]Initialize roleInstance for MPPDB#MPPDBServer#172.16.1.128@dws02.
[2024-08-23 15:18:21]Initialize roleInstance for MPPDB#MPPDBServer#172.16.1.129@dws03.
[2024-08-23 15:18:21]Initialize roleInstance for LdapClient#SlapdClient#172.16.1.129@dws03.
[2024-08-23 15:18:21]Initialize roleInstance for LdapClient#SlapdClient#172.16.1.128@dws02.
[2024-08-23 15:18:21]Initialize roleInstance for LdapClient#SlapdClient#172.16.1.127@dws01.
[2024-08-23 15:18:21]Initialize roleInstance for KrbClient#KerberosClient#172.16.1.129@dws03.
[2024-08-23 15:18:21]Initialize roleInstance for KrbClient#KerberosClient#172.16.1.128@dws02.
[2024-08-23 15:18:21]Initialize roleInstance for KrbClient#KerberosClient#172.16.1.127@dws01.
[2024-08-23 15:18:21]RoleInstance cleanup success for meta#meta#172.16.1.129@dws03.
[2024-08-23 15:18:21]RoleInstance cleanup success for meta#meta#172.16.1.128@dws02.
[2024-08-23 15:18:21]RoleInstance cleanup success for meta#meta#172.16.1.127@dws01.
[2024-08-23 15:18:21]RoleInstance initialization success for meta#meta#172.16.1.129@dws03.
[2024-08-23 15:18:21]RoleInstance initialization success for meta#meta#172.16.1.128@dws02.
[2024-08-23 15:18:21]RoleInstance initialization success for meta#meta#172.16.1.127@dws01.
[2024-08-23 15:18:21]Role initialization success for Service[name: meta, displayName: meta, cluster:huaao_dws].
[2024-08-23 15:18:21]Service initialization success for CLUSTER[name:huaao_dws].
[2024-08-23 15:18:29]RoleInstance cleanup success for KrbClient#KerberosClient#172.16.1.129@dws03.
[2024-08-23 15:18:29]RoleInstance cleanup success for LdapClient#SlapdClient#172.16.1.129@dws03.
[2024-08-23 15:18:29]RoleInstance cleanup success for KrbClient#KerberosClient#172.16.1.128@dws02.
[2024-08-23 15:18:29]RoleInstance cleanup success for LdapServer#SlapdServer#172.16.1.128@dws02.
[2024-08-23 15:18:29]RoleInstance cleanup success for KrbServer#KerberosServer#172.16.1.128@dws02.
[2024-08-23 15:18:29]RoleInstance cleanup success for LdapClient#SlapdClient#172.16.1.128@dws02.
[2024-08-23 15:18:29]RoleInstance cleanup success for KrbServer#KerberosAdmin#172.16.1.128@dws02.
[2024-08-23 15:18:29]RoleInstance cleanup success for KrbServer#KerberosServer#172.16.1.127@dws01.
[2024-08-23 15:18:29]RoleInstance cleanup success for LdapServer#SlapdServer#172.16.1.127@dws01.
[2024-08-23 15:18:29]RoleInstance cleanup success for LdapClient#SlapdClient#172.16.1.127@dws01.
[2024-08-23 15:18:29]RoleInstance cleanup success for KrbServer#KerberosAdmin#172.16.1.127@dws01.
[2024-08-23 15:18:29]RoleInstance cleanup success for KrbClient#KerberosClient#172.16.1.127@dws01.
[2024-08-23 15:18:35]RoleInstance initialization success for KrbClient#KerberosClient#172.16.1.129@dws03.
[2024-08-23 15:18:35]RoleInstance initialization success for KrbServer#KerberosServer#172.16.1.128@dws02.
[2024-08-23 15:18:35]RoleInstance initialization success for LdapServer#SlapdServer#172.16.1.128@dws02.
[2024-08-23 15:18:35]RoleInstance initialization success for KrbClient#KerberosClient#172.16.1.128@dws02.
[2024-08-23 15:18:35]RoleInstance initialization success for KrbServer#KerberosAdmin#172.16.1.128@dws02.
[2024-08-23 15:18:35]RoleInstance initialization success for KrbServer#KerberosAdmin#172.16.1.127@dws01.
[2024-08-23 15:18:35]RoleInstance initialization success for KrbClient#KerberosClient#172.16.1.127@dws01.
[2024-08-23 15:18:35]RoleInstance initialization success for LdapServer#SlapdServer#172.16.1.127@dws01.
[2024-08-23 15:18:35]Role initialization success for Service[name: KrbServer, displayName: KrbServer, cluster:huaao_dws].
[2024-08-23 15:18:35]RoleInstance initialization success for KrbServer#KerberosServer#172.16.1.127@dws01.
[2024-08-23 15:18:35]Role initialization success for Service[name: KrbClient, displayName: KrbClient, cluster:huaao_dws].
[2024-08-23 15:18:35]Role initialization success for Service[name: LdapServer, displayName: LdapServer, cluster:huaao_dws].
[2024-08-23 15:18:35]Service initialization success for CLUSTER[name:huaao_dws].
[2024-08-23 15:18:35]Service initialization success for CLUSTER[name:huaao_dws].
[2024-08-23 15:18:35]Role initialization success for Service[name: KrbServer, displayName: KrbServer, cluster:huaao_dws].
[2024-08-23 15:18:35]Service initialization success for CLUSTER[name:huaao_dws].
[2024-08-23 15:18:50]RoleInstance initialization success for LdapClient#SlapdClient#172.16.1.129@dws03.
[2024-08-23 15:18:50]RoleInstance initialization success for LdapClient#SlapdClient#172.16.1.128@dws02.
[2024-08-23 15:18:50]RoleInstance initialization success for LdapClient#SlapdClient#172.16.1.127@dws01.
[2024-08-23 15:18:50]Role initialization success for Service[name: LdapClient, displayName: LdapClient, cluster:huaao_dws].
[2024-08-23 15:18:50]Service initialization success for CLUSTER[name:huaao_dws].
[2024-08-23 15:20:02]RoleInstance cleanup success for MPPDB#MPPDBServer#172.16.1.129@dws03.
[2024-08-23 15:20:02]RoleInstance cleanup success for MPPDB#MPPDBServer#172.16.1.128@dws02.
[2024-08-23 15:20:02]RoleInstance cleanup success for MPPDB#MPPDBServer#172.16.1.127@dws01.
[2024-08-23 15:20:08]RoleInstance initialization failure [{ScriptExecutionResult=Value(scriptExecutionResult:ScriptExecutionResult [exitCode=1, output=, errMsg=Warning: Permanently added '172.16.1.127' (ECDSA) to the list of known hosts.
UnionTech OS Server 20 1050e
scp: /opt/huawei/Bigdata/mppdb/gs_install_success: No such file or directory
])}] for MPPDB#MPPDBServer#172.16.1.129@dws03.
[2024-08-23 15:20:08]RoleInstance initialization failure [{ScriptExecutionResult=Value(scriptExecutionResult:ScriptExecutionResult [exitCode=1, output=, errMsg=Warning: Permanently added '172.16.1.127' (ECDSA) to the list of known hosts.
UnionTech OS Server 20 1050e
scp: /opt/huawei/Bigdata/mppdb/gs_install_success: No such file or directory
])}] for MPPDB#MPPDBServer#172.16.1.128@dws02.
[2024-08-23 15:20:08]RoleInstance initialization failure [{ScriptExecutionResult=Value(scriptExecutionResult:ScriptExecutionResult [exitCode=1, output=, errMsg=Warning: Permanently added '172.16.1.127' (ECDSA) to the list of known hosts.
UnionTech OS Server 20 1050e
scp: /opt/huawei/Bigdata/mppdb/gs_install_success: No such file or directory
])}] for MPPDB#MPPDBServer#172.16.1.127@dws01.
[2024-08-23 15:20:08]Role initialization failure for Service[name: MPPDB, displayName: MPPDB, cluster:huaao_dws].
[2024-08-23 15:20:08]Service initialization failure for CLUSTER[name:huaao_dws].
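The log above is not enough to solve the problem, so the next step is to check the logs on the active Manager node. Note: do not rush to uninstall and reinstall at this point.
4. Troubleshooting path
Log in to the active Manager node and check the MPP installation log: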
[root@dws01 scriptlog]# cat /var/log/Bigdata/mpp/scriptlog/postinstall.log
2024-08-23 16:03:08 INFO (initParameter:205) Create log file /var/log/Bigdata/mpp/scriptlog/postinstall.log
2024-08-23 16:03:08 INFO (initParameter:[mpp-postinstall.sh:98]) Init parameter successful.
2024-08-23 16:03:08 INFO (main:[mpp-postinstall.sh:1394]) parameters: MPPDBServer INSTALL
2024-08-23 16:03:08 install retry scenses , mpp cms active ip: 172.16.1.127 (main:[mpp-postinstall.sh:])
kernel.sysrq = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
kernel.dmesg_restrict = 1
net.ipv6.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
net.core.busy_read = 100
vm.dirty_background_ratio = 30
kernel.panic_on_oops = 1
kernel.panic = 5
kernel.hung_task_timeout_secs = 3600
kernel.hung_task_panic = 1
vm.oom_dump_tasks = 1
kernel.softlockup_panic = 1
vm.swappiness = 0
fs.file-max = 640000
vm.dirty_ratio = 40
vm.max_map_count = 1048576
kernel.msgmnb = 7000000
kernel.core_pattern = /var/log/coredump/core-%t-%u-%e-%p
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_forward = 0
vm.panic_on_oom = 0
net.ipv4.neigh.default.gc_thresh1 = 1200
net.ipv4.neigh.default.gc_thresh2 = 2400
net.ipv4.neigh.default.gc_thresh3 = 4800
net.ipv4.tcp_max_tw_buckets = 10000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_retries2 = 12
kernel.sem = 250 6400000 1000 25600
net.core.wmem_max = 21299200
net.core.rmem_max = 21299200
net.core.wmem_default = 21299200
net.core.rmem_default = 21299200
net.ipv4.tcp_rmem = 8192 250000 16777216
net.ipv4.tcp_wmem = 8192 250000 16777216
net.core.somaxconn = 65535
vm.min_free_kbytes = 1619009
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_syncookies = 1
vm.overcommit_memory = 0
net.ipv4.tcp_retries1 = 5
net.ipv4.tcp_syn_retries = 5
net.ipv4.tcp_synack_retries = 5
kernel.shmmax = 18446744073709551615
kernel.shmall = 1152921504606846720
the os is not suse or redhat or centos or euleros.
2024-08-23 16:03:08 ERROR (main:[mpp-postinstall.sh:1420]) set OS parameters failed.
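The output above says that the operating system is not supported.
However, uos-1050 is one of the options offered in the configuration document, which suggests a compatibility problem in the code.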
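The log is dominated by the sysctl dump, so the one line that matters is easy to miss. A minimal sketch for pulling out only the error lines, each with one line of context before it; show_errors is a hypothetical helper name, and the ' ERROR ' pattern is an assumption based on the log format shown above:

```shell
# Hypothetical helper: print every ERROR line of a log file together with
# the line that precedes it, prefixed with line numbers.
show_errors() {
    grep -n -B 1 ' ERROR ' "${1}"
}

# Example (path taken from the log above):
# show_errors /var/log/Bigdata/mpp/scriptlog/postinstall.log
```

In this case the context line is the useful one: the message printed just before the ERROR entry names the actual cause.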
Next, look at what line 1420 of mpp-postinstall.sh does:
1417 sudo "${SUDO_PATH_NODEAGENT}" setOSParameters_MPPDB 2>&1 | LogPipe
1418 retVal=${PIPESTATUS[0]}
1419 if [ ${retVal} -ne 0 ]; then
1420 LOG "${LINENO}" "ERROR" "set OS parameters failed."
1421 return 1
1422 fi
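The error logged at line 1420 is caused by the result of the command on line 1417, so the next step is a global search for the function setOSParameters_MPPDB: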
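The script reads PIPESTATUS[0] rather than $? because the command is piped into LogPipe, and $? would only report the status of the last command in the pipeline. A minimal bash sketch of the same pattern, in which failing_step and log_pipe are hypothetical stand-ins for the vendor's sudo call and LogPipe:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the failing sudo call on line 1417.
failing_step() {
    echo "the os is not suse or redhat or centos or euleros."
    return 1
}

# Hypothetical stand-in for LogPipe: copies stdin to the log, always succeeds.
log_pipe() {
    while IFS= read -r line; do
        echo "[log] ${line}"
    done
}

run_step() {
    local ret
    failing_step 2>&1 | log_pipe
    # $? here would be log_pipe's status (0); PIPESTATUS[0] is failing_step's.
    ret=${PIPESTATUS[0]}
    if [ "${ret}" -ne 0 ]; then
        echo "step failed with exit code ${ret}"
        return 1
    fi
}
```

This is why the real cause only shows up as the plain text line before the ERROR entry: the failing command's output goes through the logger, and only its exit code reaches the check.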
[root@dws01 opt]# grep -rn "setOSParameters_MPPDB" /opt/huawei/
/opt/huawei/Bigdata/components/FusionInsight_MPPDB_8.1.3.6/MPPDB/upgrades.xml:624: sudo "${SUDO_PATH_NODEAGENT}" setOSParameters_MPPDB 2>&1 | LogPipe
/opt/huawei/Bigdata/FusionInsight_MPPDB_8.1.3.6/install/FusionInsight-MPPDB-8.1.3/adapter/upgrade/controller/upgrades.xml:624: sudo "${SUDO_PATH_NODEAGENT}" setOSParameters_MPPDB 2>&1 | LogPipe
/opt/huawei/Bigdata/FusionInsight_MPPDB_8.1.3.6/install/FusionInsight-MPPDB-8.1.3/setup/mpp-postinstall.sh:1417: sudo "${SUDO_PATH_NODEAGENT}" setOSParameters_MPPDB 2>&1 | LogPipe
/opt/huawei/Bigdata/adapters/FusionInsight_MPPDB_8.1.3.6/MPPDB/upgrade/controller/upgrades.xml:624: sudo "${SUDO_PATH_NODEAGENT}" setOSParameters_MPPDB 2>&1 | LogPipe
/opt/huawei/Bigdata_func/sudo/runtime/sudoExecute.sh:30:declare -r mpp_existFunctionArray=(setFilesAccessibleForMpp execPython_preinstallMPP execPython_unPreinstallMPP setOSParameters_MPPDB setCgroupForMPP setCgroupMPP delete_cronForMPP add_cronForMPP installFcByOm updateFcByOm restartMonitor uninstallSingleFc clearFDParameters_MPPDB execPython_bindVirtualIPmppdbandElk execPython_deleteVirtualIPmppdbandElk)
/opt/huawei/Bigdata_func/sudo/runtime/mpp_sudoExecute.sh:11:declare -r g_fuctionArray=(setFilesAccessibleForMpp execPython_preinstallMPP execPython_unPreinstallMPP setOSParameters_MPPDB setCgroupForMPP setCgroupMPP delete_cronForMPP add_cronForMPP installFcByOm updateFcByOm restartMonitor uninstallSingleFc clearFDParameters_MPPDB execPython_bindVirtualIPmppdbandElk execPython_deleteVirtualIPmppdbandElk)
/opt/huawei/Bigdata_func/sudo/runtime/mpp_sudoExecute.sh:613:# FUNCTION : setOSParameters_MPPDB
/opt/huawei/Bigdata_func/sudo/runtime/mpp_sudoExecute.sh:619:setOSParameters_MPPDB()
As the output shows, the function is defined at /opt/huawei/Bigdata_func/sudo/runtime/mpp_sudoExecute.sh:619 (setOSParameters_MPPDB()). The part of it that handles UOS is:
663 # UOS support
664 if [ "uos" = "$(cat "/etc/os-release" | grep "^ID=" | awk -F "\"" '{print $2}')" ]; then
665 retVal_centos=0
666 fi
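This UOS-specific code checks the operating system type, and most likely it is failing to read the correct value here.
Running the pipeline from the if condition by hand, step by step, gives: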
[root@dws01 opt]# cat "/etc/os-release" | grep "^ID="
ID=uos
[root@dws01 opt]# cat "/etc/os-release" | grep "^ID=" | awk -F "\"" '{print $2}'
[root@dws01 opt]# cat "/etc/os-release" | grep "^ID=" | awk -F "=" '{print $2}'
uos
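Clearly, the ID value in os-release is not wrapped in double quotes, so splitting on double quotes returns nothing, the script never sees uos, and the later error follows.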
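The behaviour can be reproduced without touching the real file. In this sketch, vendor_id is a hypothetical wrapper around the expression from line 664, fed with sample text instead of /etc/os-release:

```shell
# Hypothetical wrapper around the expression used on line 664:
# read os-release-style text from stdin and split the ID line on double quotes.
vendor_id() {
    grep "^ID=" | awk -F "\"" '{print $2}'
}

unquoted="$(printf 'ID=uos\n' | vendor_id)"
quoted="$(printf 'ID="uos"\n' | vendor_id)"

echo "unquoted form -> '${unquoted}'"
echo "quoted form   -> '${quoted}'"
```

With the unquoted form the line contains no double quote, so the second field does not exist and the result is empty; with the quoted form the result is uos.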
The cause is now clear. Next, how to fix it.
5. Solution
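At first I confidently edited the corresponding function in /opt/huawei/Bigdata_func/sudo/runtime/mpp_sudoExecute.sh so that it could read the value uos. But when I reinstalled, it did not work: every installation unpacks the package again, and the code shipped in the package still contains the incompatible check.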
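For reference, a more tolerant form of that check might look like the sketch below. It is not the edit actually made, which is not shown here; os_release_id is a hypothetical helper that accepts the ID value with or without quotes:

```shell
# Hypothetical helper: extract the ID value from os-release-style text on
# stdin, accepting ID=uos, ID="uos" and ID='uos' alike.
os_release_id() {
    sed -n 's/^ID=//p' | head -n 1 | tr -d "\"'"
}

# A check built on it could then read:
# if [ "uos" = "$(os_release_id < /etc/os-release)" ]; then
#     retVal_centos=0
# fi
```

As described above, any such change to the installed script is lost on the next installation, so it only helps if it is made in the package itself.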
So I took a different approach and modified the os-release file instead.
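Before: ID=uos (unquoted, as in the output above)
After: ID="uos" (value wrapped in double quotes, so that the expression on line 664 returns uos)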
Then I uninstalled the cluster once more and reinstalled it, and everything went smoothly.
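The os-release change can also be scripted. The following is a sketch under assumptions, not the exact command used here: quote_os_release_id is a hypothetical helper, it relies on GNU sed (-i and --follow-symlinks), and since MPPDB initialization failed on all three nodes, it would presumably need to be run as root on each of them.

```shell
# Hypothetical helper: wrap an unquoted ID value in double quotes.
# Keeps a one-time .bak copy and leaves an already quoted value untouched.
quote_os_release_id() {
    file="${1:-/etc/os-release}"
    if [ ! -e "${file}.bak" ]; then
        cp -p "${file}" "${file}.bak" || return 1
    fi
    # --follow-symlinks: /etc/os-release is often a symlink; edit its target
    # instead of replacing the link with a regular file.
    sed -i --follow-symlinks 's/^ID=\([^"].*\)$/ID="\1"/' "${file}"
}

# Example:
# quote_os_release_id /etc/os-release
# grep '^ID=' /etc/os-release
```

Because the pattern only matches a value that does not start with a double quote, running the helper twice does no harm. The os-release format permits both quoted and unquoted values, so other tools that parse the file properly should not be affected.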
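6. Postscript
That is the record of the problem encountered during this installation, shared here for fellow developers!
Special thanks to DWS technical expert Ge Hong for the remote guidance!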