GaussDB T 分布式集群部署(1)

举报
社会主义的一块砖 发表于 2019/12/25 18:56:35 2019/12/25
【摘要】 GaussDB 强大于ORACLE的一个核心在于其可以部署分布式集群,且有实际部署使用案例,理论上水平横向支持扩展1024个DN节点,实际案例中已有500+DN的大规模分布式集群。本节简要记录4节点单机房分布式高可用架构部署过程。环境说明: 4台虚拟机: 操作系统 redhat 7.5 mini安装 Gauss1 2C ...

   GaussDB 强大于ORACLE的一个核心在于其可以部署分布式集群,且有实际部署使用案例,理论上水平横向支持扩展1024个DN节点,实际案例中已有500+DN的大规模分布式集群。本节简要记录4节点单机房分布式高可用架构部署过程。


环境说明:

    4台虚拟机:

            操作系统 redhat 7.5  mini安装 

                Gauss1   2C 8G 100G        192.168.10.11/16 10.10.10.11/16

                Gauss2   1C 6G 100G        192.168.10.12/16 10.10.10.12/16

                Gauss3   1C 6G 100G        192.168.10.13/16 10.10.10.13/16

                Gauss4   1C 6G 100G        192.168.10.14/16 10.10.10.14/16

注:磁盘规划要点:每个实例至少保证20G,比如下边部署方案中除了4节点是3实例外,其他3个节点都是4个实例;本例网络平面分业务平面(192.168.0.0)和维护平面(10.10.0.0),生产环境还需加入备份平面,以提升网络独立性和安全性;使用默认模版安装对内存要求较高,可通过修改建库模版文件中内存参数降低内存消耗。修改后每个节点2G内存即可。



部署方案:

modb_20191224_103753.png

注:

    CN: 协调节点,负责分解任务,调度任务分片在DN上并行执行;

    CM: (Cluster Manager) 集群管理模块,管理和监控各个实例运行情况,确保系统稳定运行;

    ETCD: 是一个高可用的分布式键值(KEY-VALUE)数据库,收集实例状态信息,CM从ETCD获取实例状态信息;

    DN: (DataNode)数据节点,负责存储业务数据,执行数据查询结果并向CN返回结果;

    AZ1:1号机房(单机房);



1.系统初始化(所有主机均执行):


    1> 关闭防火墙


[root@Gauss1 ~]# systemctl status firewalld.service
[root@Gauss1 ~]# systemctl stop firewalld.service
[root@Gauss1 ~]# systemctl disable firewalld.service
[root@Gauss1 ~]# firewall-cmd --state
not running
[root@Gauss1 ~]#



    2> 优化内核参数

    /etc/sysctl.conf追加如下内容:


net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_retries2 = 80
vm.overcommit_memory = 0
net.ipv4.tcp_rmem = 8192 250000 16777216
net.core.wmem_max = 21299200
net.core.rmem_max = 21299200
net.core.wmem_default = 21299200
net.core.rmem_default = 21299200
kernel.sem = 250 6400000 1000 25600
vm.min_free_kbytes = 314572             --预留系统内存5%
net.core.somaxconn = 65535
net.ipv4.tcp_syncookies = 1
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_wmem = 8192 250000 16777216
net.ipv4.tcp_max_tw_buckets = 10000


增加完成后执行sysctl -p刷新


    3> 安装依赖包

    配置本地YUM源:


echo "/dev/sr0 mnt/iso iso9660 defaults,loop 0 0" >> etc/fstab
mkdir -p mnt/iso
rm -rf etc/yum.repos.d/*
cat >> etc/yum.repos.d/ios.repo <<EOF
[ios]
name=ios
baseurl=file:///mnt/iso
enabled=1
gpgcheck=0
EOF
mount -a
yum repolist


安装依赖包


[root@Gauss1 yum.repos.d]# unset uninstall_rpm;for i in zlib readline gcc python python-devel perl-ExtUtils-Embed readline-devel zlib-devel lsof;do rpm -q $i &>/dev/null || uninstall_rpm="$uninstall_rpm $i";done ;[[ -z "$uninstall_rpm" ]] && echo -e "\nuninstall_rpm:\n\tOK.OK.OK" ||  echo -e "\nuninstall_rpm:\n\t$uninstall_rpm"
uninstall_rpm:
        gcc python-devel perl-ExtUtils-Embed readline-devel zlib-devel lsof
[root@Gauss1 yum.repos.d]# yum -y install gcc python-devel perl-ExtUtils-Embed readline-devel zlib-devel lsof
[root@Gauss1 yum.repos.d]# unset uninstall_rpm;for i in zlib readline gcc python python-devel perl-ExtUtils-Embed readline-devel zlib-devel lsof;do rpm -q $i &>/dev/null || uninstall_rpm="$uninstall_rpm $i";done ;[[ -z "$uninstall_rpm" ]] && echo -e "\nuninstall_rpm:\n\tOK.OK.OK" ||  echo -e "\nuninstall_rpm:\n\t$uninstall_rpm"
uninstall_rpm:
       OK.OK.OK
[root@Gauss1 yum.repos.d]#



2.上传安装包(后续操作仅一个节点)


[root@Gauss1 ~]# mkdir -p /opt/software/gaussdb
[root@Gauss1 ~]# cd /opt/software/gaussdb
[root@Gauss1 gaussdb]# rz
[root@Gauss1 gaussdb]# tar -zxvf GaussDB_100_1.0.0-CLUSTER-REDHAT7.5-64bit.tar.g
[root@Gauss1 gaussdb]# ll
total 134984
-rw-r--r--.  1 root root 86248211 Dec 14 20:18 GaussDB_100_1.0.0-CLUSTER-REDHAT7.5-64bit.tar.gz
-rw-r--r--.  1 root root 44364163 Jul 29 19:14 GaussDB_100_1.0.0-CM-REDHAT-64bit.tar.gz
-rw-r--r--.  1 root root  7478115 Jul 29 19:16 GaussDB_100_1.0.0-DATABASE-REDHAT-64bit.tar.gz
-rw-r--r--.  1 root root   118593 Jul 29 19:12 GaussDB_100_1.0.0-ROACH-REDHAT-64bit.tar.gz
drwx------. 16 root root     4096 Jul 29 19:18 lib
drwx------.  6 root root     4096 Jul 29 19:11 script
drwxr-xr-x.  5 root root       49 Jul 29 19:11 shardingscript
[root@Gauss1 gaussdb]#



3.编辑集群配置文件

创建配置文件/opt/software/gaussdb/clusterconfig.xml,内容如下:


<?xml version="1.0" encoding="UTF-8"?><ROOT>
<CLUSTER>
 <PARAM name="clusterName" value="kevin"/>
 <PARAM name="nodeNames" value="Gauss1,Gauss2,Gauss3,Gauss4"/>
 <PARAM name="gaussdbAppPath" value="/opt/gaussdb/app"/>
 <PARAM name="gaussdbLogPath" value="/opt/gaussdb/log"/>
 <PARAM name="archiveLogPath" value="/opt/gaussdb/arch_log"/>
 <PARAM name="redoLogPath" value="/opt/gaussdb/redo_log"/>
 <PARAM name="tmpMppdbPath" value="/opt/gaussdb/temp"/>
 <PARAM name="gaussdbToolPath" value="/opt/gaussdb/gaussTools/wisequery"/>
 <PARAM name="dbdiagPath" value="/opt/gaussdb/dbdiag"/>
 <PARAM name="datanodeType" value="DN_ZENITH_HA"/>
 <PARAM name="clusterType" value="mutil-AZ"/>
 <PARAM name="coordinatorType" value="CN_ZENITH_ZSHARDING"/>
</CLUSTER>
<DEVICELIST>
 <DEVICE sn="1000001">
  <PARAM name="name" value="Gauss1"/>
  <PARAM name="azName" value="AZ1"/>
  <PARAM name="azPriority" value="1"/>
  <PARAM name="backIp1" value="192.168.10.11"/>
  <PARAM name="sshIp1" value="192.168.10.11"/>
  <PARAM name="innerManageIP" value="10.10.10.11"/>
  <PARAM name="cmsNum" value="1"/>
  <PARAM name="cmServerPortBase" value="21000"/>
  <PARAM name="cmServerListenIp1" value="192.168.10.11,192.168.10.12,192.168.10.13,192.168.10.14"/>
  <PARAM name="cmServerHaIp1" value="192.168.10.11,192.168.10.12,192.168.10.13,192.168.10.14"/>
  <PARAM name="cmServerlevel" value="1"/>
  <PARAM name="cmServerRelation" value="Gauss1,Gauss2,Gauss3,Gauss4"/>
  <PARAM name="dataNum" value="1"/>
  <PARAM name="dataPortBase" value="40000"/>
  <PARAM name="dataNode1" value="/opt/gaussdb/data/dn1,Gauss2,/opt/gaussdb/data/dn1"/>
  <PARAM name="cooNum" value="1"/>
  <PARAM name="cooPortBase" value="8000"/>
  <PARAM name="cooListenIp1" value="192.168.10.11"/>
  <PARAM name="cooDir1" value="/opt/gaussdb/data/cn"/>
  <PARAM name="etcdNum" value="1"/>
  <PARAM name="etcdListenPort" value="2379"/>
  <PARAM name="etcdHaPort" value="2380"/>
  <PARAM name="etcdListenIp1" value="192.168.10.11"/>
  <PARAM name="etcdHaIp1" value="192.168.10.11"/>
  <PARAM name="etcdDir1" value="/opt/huawei/gaussdb/data/etcd/data_etcd1"/>
 </DEVICE>
 <DEVICE sn="1000002">
  <PARAM name="name" value="Gauss2"/>
  <PARAM name="azName" value="AZ1"/>
  <PARAM name="azPriority" value="1"/>
  <PARAM name="backIp1" value="192.168.10.12"/>
  <PARAM name="sshIp1" value="192.168.10.12"/>
  <PARAM name="innerManageIP" value="10.10.10.12"/>
  <PARAM name="dataNum" value="1"/>
  <PARAM name="dataPortBase" value="40000"/>
  <PARAM name="dataNode1" value="/opt/gaussdb/data/dn2,Gauss3,/opt/gaussdb/data/dn2"/>
  <PARAM name="cooNum" value="1"/>
  <PARAM name="cooPortBase" value="8001"/>
  <PARAM name="cooListenIp1" value="192.168.10.12"/>
  <PARAM name="cooDir1" value="/opt/gaussdb/data/cn"/>
  <PARAM name="etcdNum" value="1"/>
  <PARAM name="etcdListenPort" value="2379"/>
  <PARAM name="etcdHaPort" value="2380"/>
  <PARAM name="etcdListenIp1" value="192.168.10.12"/>
  <PARAM name="etcdHaIp1" value="192.168.10.12"/>
  <PARAM name="etcdDir1" value="/opt/huawei/gaussdb/data/etcd/data_etcd1"/>
 </DEVICE>
 <DEVICE sn="1000003">
  <PARAM name="name" value="Gauss3"/>
  <PARAM name="azName" value="AZ1"/>
  <PARAM name="azPriority" value="1"/>
  <PARAM name="backIp1" value="192.168.10.13"/>
  <PARAM name="sshIp1" value="192.168.10.13"/>
  <PARAM name="innerManageIP" value="10.10.10.13"/>
  <PARAM name="dataNum" value="1"/>
  <PARAM name="dataPortBase" value="40000"/>
  <PARAM name="dataNode1" value="/opt/gaussdb/data/dn3,Gauss4,/opt/gaussdb/data/dn3"/>
  <PARAM name="cooNum" value="1"/>
  <PARAM name="cooPortBase" value="8000"/>
  <PARAM name="cooListenIp1" value="192.168.10.13"/>
  <PARAM name="cooDir1" value="/opt/gaussdb/data/cn"/>
  <PARAM name="etcdNum" value="1"/>
  <PARAM name="etcdListenPort" value="2379"/>
  <PARAM name="etcdHaPort" value="2380"/>
  <PARAM name="etcdListenIp1" value="192.168.10.13"/>
  <PARAM name="etcdHaIp1" value="192.168.10.13"/>
  <PARAM name="etcdDir1" value="/opt/huawei/gaussdb/data/etcd/data_etcd1"/>
 </DEVICE>
 <DEVICE sn="1000004">
  <PARAM name="name" value="Gauss4"/>
  <PARAM name="azName" value="AZ1"/>
  <PARAM name="azPriority" value="1"/>
  <PARAM name="backIp1" value="192.168.10.14"/>
  <PARAM name="sshIp1" value="192.168.10.14"/>
  <PARAM name="innerManageIP" value="10.10.10.14"/>
  <PARAM name="dataNum" value="1"/>
  <PARAM name="dataPortBase" value="40000"/>
  <PARAM name="dataNode1" value="/opt/gaussdb/data/dn4,Gauss1,/opt/gaussdb/data/dn4"/>
  <PARAM name="cooNum" value="1"/>
  <PARAM name="cooPortBase" value="8000"/>
  <PARAM name="cooListenIp1" value="192.168.10.14"/>
  <PARAM name="cooDir1" value="/opt/gaussdb/data/cn"/>
 </DEVICE>
</DEVICELIST>
</ROOT>



4.集群初始化

执行环境初始化脚本,执行过程会自动配置安装环境,创建用户,环境变量,目录等。


[root@Gauss1 script]# ./gs_preinstall -U omm -G dbgrp -X opt/software/gaussdb/clusterconfig.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)? yes
Please enter password for root.
Password:
Creating SSH trust for the root permission user.
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Successfully distributed SSH trust file to all node.
Verifying SSH trust on all hosts.
Successfully verified SSH trust on all hosts.
Successfully created SSH trust.
Successfully created SSH trust for the root permission user.
[GAUSS-52406] : The package type "" is inconsistent with the Cpu type "X86".
[root@Gauss1 script]#


报错处理:


root@Gauss1 ~]# cd /opt/gaussdb/log/omm/om
[root@Gauss1 om]# ll
total 20
-rw-------. 1 root root 1953 Dec 14 20:44 gs_local-2019-12-14_204431.log
-rw-------. 1 root root 9723 Dec 14 20:45 gs_preinstall-2019-12-14_204425.log
-rw-------. 1 root root   13 Dec 14 20:44 topDirPath.dat
[root@Gauss1 om]# tail gs_preinstall-2019-12-14_204425.log
2019-12-14 20:45:08.844963][gs_sshexkey][LOG]:Successfully appended authorized_key on all remote node.
[2019-12-14 20:45:08.845034][gs_sshexkey][LOG]:Checking common authentication file content.
[2019-12-14 20:45:08.845412][gs_sshexkey][LOG]:Successfully checked common authentication content.
[2019-12-14 20:45:08.845484][gs_sshexkey][LOG]:Distributing SSH trust file to all node.
[2019-12-14 20:45:09.251582][gs_sshexkey][LOG]:Successfully distributed SSH trust file to all node.
[2019-12-14 20:45:09.251687][gs_sshexkey][LOG]:Verifying SSH trust on all hosts.
[2019-12-14 20:45:09.658347][gs_sshexkey][LOG]:Successfully verified SSH trust on all hosts.
[2019-12-14 20:45:09.658447][gs_sshexkey][LOG]:Successfully created SSH trust.
[2019-12-14 20:45:09.681885][gs_preinstall][LOG]:Successfully created SSH trust for the root permission user.
[2019-12-14 20:45:10.539785][gs_preinstall][ERROR]:[GAUSS-52406] : The package type "" is inconsistent with the Cpu type "X86".
Traceback (most recent call last)
  File "./gs_preinstall", line 507, in <module>
  File "/opt/software/gaussdb/script/impl/preinstall/PreinstallImpl.py", line 1861, in run
[root@Gauss1 om]# vi opt/software/gaussdb/script/impl/preinstall/PreinstallImpl.py
  1760         self.createTrustForRoot()
  1761         #cpu  Detection
  1762         #self.getAllCpu()               --注释掉
  1763         #ram  Detection
  1764         self.getAllRam()
  1765         # LVM


再次执行


[root@Gauss1 script]# ./gs_preinstall -U omm -G dbgrp -X opt/software/gaussdb/clusterconfig.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)? yes
Please enter password for root.
Password:
Creating SSH trust for the root permission user.
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Successfully distributed SSH trust file to all node.
Verifying SSH trust on all hosts.
Successfully verified SSH trust on all hosts.
Successfully created SSH trust.
Successfully created SSH trust for the root permission user.
All host RAM is consistent
Pass over configuring LVM
Distributing package.
Successfully distributed package.
Are you sure you want to create the user[omm] and create trust for it (yes/no)? yes
Please enter password for cluster user.
Password:
Please enter password for cluster user again.
Password:
Creating [omm] user on all nodes.
Successfully created [omm] user on all nodes.
Installing the tools in the cluster.
Successfully installed the tools in the cluster.
Checking hostname mapping.
Successfully checked hostname mapping.
Creating SSH trust for [omm] user.
Please enter password for current user[omm].
Password:
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Successfully distributed SSH trust file to all node.
Verifying SSH trust on all hosts.
Successfully verified SSH trust on all hosts.
Successfully created SSH trust.
Successfully created SSH trust for [omm] user.
Checking OS version.
Successfully checked OS version.
Creating cluster's path.
Successfully created cluster's path.
Setting SCTP service.
Successfully set SCTP service.
Set and check OS parameter.
Successfully set NTP service.
Setting OS parameters.
Successfully set OS parameters.
Warning: Installation environment contains some warning messages.
Please get more details by "/opt/software/gaussdb/script/gs_checkos -i A -h Gauss1,Gauss2,Gauss3,Gauss4 -X opt/software/gaussdb/clusterconfig.xml".
Set and check OS parameter completed.
Preparing CRON service.
Successfully prepared CRON service.
Preparing SSH service.
Successfully prepared SSH service.
Setting user environmental variables.
Successfully set user environmental variables.
Configuring alarms on the cluster nodes.
Successfully configured alarms on the cluster nodes.
Setting the dynamic link library.
Successfully set the dynamic link library.
Fixing server package owner.
Successfully fixed server package owner.
Create logrotate service.
Successfully create logrotate service.
Setting finish flag.
Successfully set finish flag.
check time consistency(maximum execution time 10 minutes).
Time consistent is running(20/20)...
Preinstallation succeeded.
[root@Gauss1 script]#


执行告警的脚本


[root@Gauss1 script]# opt/software/gaussdb/script/gs_checkos -i A -h Gauss1,Gauss2,Gauss3,Gauss4 -X opt/software/gaussdb/clusterconfig.xml
Checking items:
   A1. [ OS version status ]                                   : Normal
   A2. [ Kernel version status ]                               : Normal
   A3. [ Unicode status ]                                      : Normal
   A4. [ Time zone status ]                                    : Normal
   A5. [ Swap memory status ]                                  : Normal
   A6. [ System control parameters status ]                    : Normal
   A7. [ File system configuration status ]                    : Normal
   A8. [ Disk configuration status ]                           : Normal
   A9. [ Pre-read block size status ]                          : Normal
   A10.[ IO scheduler status ]                                 : Normal
   A12.[ Time consistency status ]                             : Warning
   A13.[ Firewall service status ]                             : Normal
   A14.[ THP service status ]                                  : Normal
   A15.[ Storcli tool status ]                                 : Normal
Total numbers:14. Abnormal numbers:0. Warning numbers:1.
[root@Gauss1 script]#


分布式环境对时间要求很高,测试环境配置Gauss1为时钟服务器,其他节点同步

Gauss1:


[root@Gauss1 ~]# yum -y install ntp ntpdate
[root@Gauss1 ~]# echo server 127.127.1.0 iburst >> /etc/ntp.conf
[root@Gauss1 ~]# systemctl restart ntpd
[root@Gauss1 ~]# systemctl enable ntpd
[root@Gauss1 ~]# ntpq -p
    remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.           5 l   12   64    1    0.000    0.000   0.000
[root@Gauss1 ~]#


Gauss2,3,4做如下配置:


[root@Gauss2 ~]# yum -y install ntp ntpdate
[root@Gauss2 ~]# echo server 192.168.10.11 >> /etc/ntp.conf
[root@Gauss2 ~]# echo restrict 192.168.10.11 nomodify notrap noquery >> /etc/ntp.conf
[root@Gauss2 ~]# ntpdate -u 192.168.10.11
[root@Gauss2 ~]# hwclock  -w
[root@Gauss2 ~]# systemctl restart ntpd
[root@Gauss2 ~]# systemctl enable ntpd
[root@Gauss2 ~]# ntpq -p
    remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
Gauss1          LOCAL(0)         6 u   10   64    1    0.416  197.856   0.000
[root@Gauss2 ~]#



重新检测:


[root@Gauss1 ~]# /opt/software/gaussdb/script/gs_checkos -i A -h Gauss1,Gauss2,Gauss3,Gauss4 -X /opt/software/gaussdb/clusterconfig.xml
Checking items:
   A1. [ OS version status ]                                   : Normal
   A2. [ Kernel version status ]                               : Normal
   A3. [ Unicode status ]                                      : Normal
   A4. [ Time zone status ]                                    : Normal
   A5. [ Swap memory status ]                                  : Normal
   A6. [ System control parameters status ]                    : Normal
   A7. [ File system configuration status ]                    : Normal
   A8. [ Disk configuration status ]                           : Normal
   A9. [ Pre-read block size status ]                          : Normal
   A10.[ IO scheduler status ]                                 : Normal
   A12.[ Time consistency status ]                             : Normal
   A13.[ Firewall service status ]                             : Normal
   A14.[ THP service status ]                                  : Normal
   A15.[ Storcli tool status ]                                 : Normal
Total numbers:14. Abnormal numbers:0. Warning numbers:0.
[root@Gauss1 ~]#



本文转自“墨天轮”社区GaussDB频道

<未完待续>

下一篇:  GaussDB T 分布式集群部署(2)https://bbs.huaweicloud.com/blogs/140821

【版权声明】本文为华为云社区用户转载文章,如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。