repmgr-witness
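1, The witness server is an ordinary PostgreSQL instance that is not part of streaming replication; it helps form a quorum of servers.
2, It is typically used in a two-node streaming replication setup to prevent split brain: if a failover situation occurs, it provides proof that the primary itself is unavailable, rather than a network interruption between different physical locations.
3, Place the witness in the same location as the primary. If a network failure occurs and the standby can see neither the witness nor the primary, no failover takes place.
4, If the standby can see the witness but not the primary, the standby is promoted.
5, A witness server is only useful if repmgrd is in use.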
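The rule in points 3 and 4 can be written down as a small decision function. This is only an illustrative sketch of the rule as described here, not repmgrd's actual code; the function name `witness_decision` and its 0/1 arguments are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: mirrors the rule described above, not repmgrd internals.
# Arguments (hypothetical): $1 = 1 if the standby can reach the primary, else 0
#                           $2 = 1 if the standby can reach the witness, else 0
witness_decision() {
    primary_visible=$1
    witness_visible=$2
    if [ "$primary_visible" -eq 1 ]; then
        echo "no-failover"        # primary is reachable, nothing to do
    elif [ "$witness_visible" -eq 1 ]; then
        echo "promote"            # witness reachable, primary not: the primary itself is down
    else
        echo "do-not-promote"     # neither reachable: probably a network split
    fi
}

witness_decision 0 1
witness_decision 0 0
```

The two calls at the end cover the cases from points 4 and 3 respectively.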
01, Initialize the witness node.
# Initialize a new database cluster (data directory)
initdb -D /data/db01
# Network and parameter file settings
#pg_hba.conf
host all all 192.168.5.0/24 trust
#postgresql.conf
listen_addresses = '*'
shared_preload_libraries='repmgr'
# Create the repmgr user and database
createuser -s repmgr
createdb repmgr -O repmgr
psql -Urepmgr -drepmgr
alter user repmgr set search_path to repmgr,public;
show search_path;
# Check that the witness node accepts connections
psql -Urepmgr -drepmgr -h 192.168.5.203
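If the instance has only just been started, the connection check may need a few tries. A minimal retry wrapper could look like this; `retry` is a hypothetical helper, and the commented example assumes `pg_isready` from the PostgreSQL client tools is on the PATH.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    max=$1; delay=$2; shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempt(s)" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example (needs a running instance, so it is left commented out):
# retry 5 2 pg_isready -q -h 192.168.5.203 -U repmgr -d repmgr
```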
02, Register the witness node with the repmgr cluster
# repmgr.conf on the witness node (db04)
node_id=4
node_name='db04'
conninfo='host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/db01'
log_file='/tmp/repmgrd.log'
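When several nodes need the same file with different values, the configuration can be generated instead of typed. The sketch below writes exactly the five settings shown above; `write_witness_conf` is a hypothetical helper, and the output path is whatever you pass in.

```shell
#!/bin/sh
# write_witness_conf NODE_ID NODE_NAME HOST DATA_DIR OUTFILE
# Writes a minimal witness repmgr.conf with the same keys as the file above.
write_witness_conf() {
    cat > "$5" <<EOF
node_id=$1
node_name='$2'
conninfo='host=$3 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='$4'
log_file='/tmp/repmgrd.log'
EOF
}

# Example: write_witness_conf 4 db04 192.168.5.203 /data/db01 ./repmgr.conf
```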
-h/--host host name of the primary node
Note: -h points at the primary node. After a failover, the witness node automatically connects to the new primary.
[postgres@db04 db01]$ repmgr witness register -h 192.168.5.200
INFO: connecting to witness node "db04" (ID: 4)
INFO: connecting to primary node
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
INFO: witness registration complete
NOTICE: witness node "db04" (ID: 4) successfully registered
# Start repmgrd
[postgres@db04 db01]$ repmgrd --pid-file /tmp/repmgrd.pid
[2021-10-29 01:50:39] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"
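A quick way to confirm that the daemon is still alive is to test the PID recorded in the pid file. This is a sketch; `repmgrd_running` is a hypothetical helper. Note that `kill -0` also fails when the process belongs to another user, so run it as the same OS user that started repmgrd.

```shell
#!/bin/sh
# repmgrd_running PIDFILE
# Succeeds if PIDFILE is readable and names a process we can signal.
repmgrd_running() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null     # signal 0 only checks, it sends nothing
}

# Example: repmgrd_running /tmp/repmgrd.pid && echo "repmgrd is running"
```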
# Check status ("repmgr daemon status" is the older name of "repmgr service status")
[postgres@db04 db01]$ repmgr daemon status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1 | db01 | primary | * running | | running | 8470 | no | n/a
2 | db02 | standby | running | db01 | running | 1716 | no | 3 second(s) ago
3 | db03 | standby | running | db01 | running | 17700 | no | 3 second(s) ago
4 | db04 | witness | * running | db01 | running | 26629 | no | 0 second(s) ago
[postgres@db04 db01]$ repmgr cluster show --compact
ID | Name | Role | Status | Upstream | Location | Prio. | TLI
----+------+---------+-----------+----------+----------+-------+-----
1 | db01 | primary | * running | | default | 100 | 7
2 | db02 | standby | running | db01 | default | 100 | 7
3 | db03 | standby | running | db01 | default | 100 | 7
4 | db04 | witness | * running | db01 | default | 0 | n/a
[postgres@db04 db01]$ repmgr cluster matrix
INFO: connecting to database
Name | ID | 1 | 2 | 3 | 4
------+----+---+---+---+---
db01 | 1 | * | * | * | *
db02 | 2 | * | * | * | *
db03 | 3 | * | * | * | *
db04 | 4 | * | * | * | *
[postgres@db04 db01]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | db01 | primary | * running | | default | 100 | 7 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2 | db02 | standby | running | db01 | default | 100 | 7 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3 | db03 | standby | running | db01 | default | 100 | 7 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
4 | db04 | witness | * running | db01 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2
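For scripts that need to know which node is currently the primary, the table above can be parsed with awk. This is a sketch that assumes the column order shown in this output (ID, Name, Role, Status, ...); `current_primary` is a hypothetical helper.

```shell
#!/bin/sh
# current_primary: reads `repmgr cluster show` output on stdin and prints the
# name of each node whose role is "primary" and whose status is "* running".
current_primary() {
    awk -F'|' '
        NF >= 4 {
            name = $2; role = $3; status = $4
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", role)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (role == "primary" && status == "* running") print name
        }'
}

# Example: repmgr cluster show --compact | current_primary
```

The header row and the dashed separator are skipped naturally: the header's role column reads "Role", and the separator contains no `|` characters.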
03, Change node priorities
[postgres@db01 ~]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | db01 | primary | * running | | default | 100 | 7 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2 | db02 | standby | running | db01 | default | 100 | 7 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3 | db03 | standby | running | db01 | default | 100 | 7 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
4 | db04 | witness | * running | db01 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2
# Set priority to 200 in /etc/repmgr.conf on db01
node_id=1
node_name='db01'
conninfo='host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/db01'
pg_bindir='/usr/local/postgresql/bin'
priority=200
failover=automatic
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/tmp/repmgrd.log'
monitoring_history=true
monitor_interval_secs=5
[postgres@db01 ~]$ repmgr primary register --force
INFO: connecting to primary database...
INFO: "repmgr" extension is already installed
NOTICE: primary node record (ID: 1) updated
[postgres@db01 ~]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | db01 | primary | * running | | default | 200 | 7 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2 | db02 | standby | running | db01 | default | 100 | 7 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3 | db03 | standby | running | db01 | default | 100 | 7 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
4 | db04 | witness | * running | db01 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2
# Set db02's priority to 300
# Edit the configuration file
-bash-4.2$ cat /etc/repmgr.conf
node_id=2
node_name='db02'
conninfo='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/db01'
pg_bindir='/usr/local/postgresql/bin'
priority=300
failover=automatic
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/tmp/repmgrd.log'
monitoring_history=true
monitor_interval_secs=5
-bash-4.2$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | db01 | primary | * running | | default | 200 | 7 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2 | db02 | standby | running | db01 | default | 100 | 7 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3 | db03 | standby | running | db01 | default | 100 | 7 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
4 | db04 | witness | * running | db01 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2
#apply the changed setting
-bash-4.2$ repmgr standby register --force
INFO: connecting to local node "db02" (ID: 2)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "db02" (ID: 2) successfully registered
# Verify the change
-bash-4.2$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | db01 | primary | * running | | default | 200 | 7 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2 | db02 | standby | running | db01 | default | 300 | 7 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3 | db03 | standby | running | db01 | default | 100 | 7 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
4 | db04 | witness | * running | db01 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2
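Both priority changes above follow the same two steps: edit `priority` in /etc/repmgr.conf, then re-register the node with `--force`. The first step can be scripted as in this sketch; `set_priority` is a hypothetical helper, and the re-registration still has to be run afterwards on the node itself.

```shell
#!/bin/sh
# set_priority CONF_FILE VALUE
# Replaces an existing priority= line, or appends one if none exists.
set_priority() {
    conf=$1; value=$2
    if grep -q '^priority=' "$conf"; then
        sp_tmp="$conf.tmp.$$"
        # Write to a scratch file, then overwrite in place to keep owner and mode.
        sed "s/^priority=.*/priority=$value/" "$conf" > "$sp_tmp" &&
            cat "$sp_tmp" > "$conf"
        sp_status=$?
        rm -f "$sp_tmp"
        return "$sp_status"
    else
        echo "priority=$value" >> "$conf"
    fi
}

# Example, on db02:
#   set_priority /etc/repmgr.conf 300 && repmgr standby register --force
```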
Failover:
Status before shutting down db01:
[postgres@db01 ~]$ repmgr daemon status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1 | db01 | primary | * running | | running | 8470 | no | n/a
2 | db02 | standby | running | db01 | running | 1716 | no | 2 second(s) ago
3 | db03 | standby | running | db01 | running | 17700 | no | 2 second(s) ago
4 | db04 | witness | * running | db01 | running | 26629 | no | 1 second(s) ago
[postgres@db04 db01]$ repmgr daemon status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1 | db01 | primary | * running | | running | 8470 | no | n/a
2 | db02 | standby | running | db01 | running | 1716 | no | 3 second(s) ago
3 | db03 | standby | running | db01 | running | 17700 | no | 3 second(s) ago
4 | db04 | witness | * running | db01 | running | 26629 | no | 0 second(s) ago
Shut down db01:
db04: the witness automatically follows the new primary
[2021-10-29 01:59:46] [WARNING] unable to reconnect to node 1 after 6 attempts
[2021-10-29 01:59:48] [NOTICE] witness node "db04" (ID: 4) now following new primary node "db02" (ID: 2)
[2021-10-29 01:59:48] [INFO] resuming witness monitoring mode
[2021-10-29 01:59:48] [DETAIL] following new primary "db02" (ID: 2)
[2021-10-29 01:59:48] [INFO] witness monitoring connection to primary node "db02" (ID: 2)
Cluster status after the automatic failover:
# db02 is now the primary (promoted from standby)
-bash-4.2$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | db01 | primary | - failed | ? | default | 200 | | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2 | db02 | primary | * running | | default | 300 | 8 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3 | db03 | standby | running | db02 | default | 100 | 8 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
4 | db04 | witness | * running | db02 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "db01" (ID: 1)
HINT: execute with --verbose option to see connection error messages
-bash-4.2$ repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1 | db01 | primary | - failed | ? | n/a | n/a | n/a | n/a
2 | db02 | primary | * running | | running | 1716 | no | n/a
3 | db03 | standby | running | db02 | running | 17700 | no | 0 second(s) ago
4 | db04 | witness | * running | db02 | running | 26629 | no | 1 second(s) ago
WARNING: following issues were detected
- unable to connect to node "db01" (ID: 1)
HINT: execute with --verbose option to see connection error messages
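For monitoring, the same table can be scanned for failed nodes. This sketch assumes the status column is the fourth `|`-separated field, as in the output above; `failed_nodes` is a hypothetical helper that exits non-zero when it finds any.

```shell
#!/bin/sh
# failed_nodes: reads `repmgr cluster show` or `repmgr service status` output
# on stdin, prints the name of every node whose status contains "failed",
# and returns 1 if at least one such node was found (0 otherwise).
failed_nodes() {
    awk -F'|' '
        NF >= 4 && $4 ~ /failed/ {
            name = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            print name
            found = 1
        }
        END { exit (found ? 1 : 0) }'
}

# Example: repmgr cluster show | failed_nodes || echo "cluster needs attention"
```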
## db02 failover log:
[2021-10-29 01:59:44] [INFO] checking state of node "db01" (ID: 1), 6 of 6 attempts
[2021-10-29 01:59:46] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.200 fallback_application_name=repmgr"
[2021-10-29 01:59:46] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-29 01:59:46] [WARNING] unable to reconnect to node "db01" (ID: 1) after 6 attempts
[2021-10-29 01:59:46] [INFO] 2 active sibling nodes registered
[2021-10-29 01:59:46] [INFO] 4 total nodes registered
[2021-10-29 01:59:46] [INFO] primary node "db01" (ID: 1) and this node have the same location ("default")
[2021-10-29 01:59:46] [INFO] local node's last receive lsn: 0/1A050250
[2021-10-29 01:59:46] [INFO] checking state of sibling node "db03" (ID: 3)
[2021-10-29 01:59:46] [INFO] node "db03" (ID: 3) reports its upstream is node 1, last seen 65 second(s) ago
[2021-10-29 01:59:46] [INFO] standby node "db03" (ID: 3) last saw primary node 65 second(s) ago
[2021-10-29 01:59:46] [INFO] last receive LSN for sibling node "db03" (ID: 3) is: 0/1A050250
[2021-10-29 01:59:46] [INFO] node "db03" (ID: 3) has same LSN as current candidate "db02" (ID: 2)
[2021-10-29 01:59:46] [INFO] node "db03" (ID: 3) has lower priority (100) than current candidate "db02" (ID: 2) (300)
# db02 and db03 have the same LSN, so priority decides: db02 (300) beats db03 (100).
[2021-10-29 01:59:46] [INFO] checking state of sibling node "db04" (ID: 4)
[2021-10-29 01:59:46] [INFO] node "db04" (ID: 4) reports its upstream is node 1, last seen 61 second(s) ago
[2021-10-29 01:59:46] [INFO] witness node "db04" (ID: 4) last saw primary node 61 second(s) ago
[2021-10-29 01:59:46] [INFO] visible nodes: 3; total nodes: 3; no nodes have seen the primary within the last 10 seconds
[2021-10-29 01:59:46] [NOTICE] promotion candidate is "db02" (ID: 2)
[2021-10-29 01:59:46] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2021-10-29 01:59:46] [INFO] promote_command is:
"/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2021-10-29 01:59:46] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"
[2021-10-29 01:59:47] [WARNING] 2 sibling nodes found, but option "--siblings-follow" not specified
[2021-10-29 01:59:47] [DETAIL] these nodes will remain attached to the current primary:
db03 (node ID: 3)
db04 (node ID: 4, witness server)
[2021-10-29 01:59:47] [NOTICE] promoting standby to primary
[2021-10-29 01:59:47] [DETAIL] promoting server "db02" (ID: 2) using pg_promote()
[2021-10-29 01:59:47] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2021-10-29 01:59:48] [NOTICE] STANDBY PROMOTE successful
[2021-10-29 01:59:48] [DETAIL] server "db02" (ID: 2) was successfully promoted to primary
[2021-10-29 01:59:48] [INFO] checking state of node 2, 1 of 6 attempts
[2021-10-29 01:59:48] [NOTICE] node 2 has recovered, reconnecting
[2021-10-29 01:59:48] [INFO] connection to node 2 succeeded
[2021-10-29 01:59:48] [INFO] original connection is still available
[2021-10-29 01:59:48] [INFO] 2 followers to notify
[2021-10-29 01:59:48] [NOTICE] notifying node "db03" (ID: 3) to follow node 2
INFO: node 3 received notification to follow node 2
[2021-10-29 01:59:48] [NOTICE] notifying node "db04" (ID: 4) to follow node 2
INFO: node 4 received notification to follow node 2
[2021-10-29 01:59:48] [INFO] switching to primary monitoring mode
[2021-10-29 01:59:48] [NOTICE] monitoring cluster primary "db02" (ID: 2)
[2021-10-29 01:59:53] [NOTICE] new witness "db04" (ID: 4) has connected
[2021-10-29 01:59:53] [NOTICE] new standby "db03" (ID: 3) has connected
[2021-10-29 02:04:48] [INFO] monitoring primary node "db02" (ID: 2) in normal state
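The candidate selection visible in this log (compare last receive LSN first, then priority) can be reproduced with a short sketch. It is not repmgrd's code. Two details are assumptions based on repmgr's documented behaviour rather than on this log: nodes with priority 0 are never promoted, and a remaining tie goes to the lower node ID. Input lines have the hypothetical format `name node_id priority lsn`.

```shell
#!/bin/sh
# lsn_to_int LSN: converts an LSN such as 0/1A050250 to a decimal integer.
# (Assumes the high part is small enough for 64-bit signed shell arithmetic.)
lsn_to_int() {
    hi=${1%%/*}
    lo=${1##*/}
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# pick_candidate: reads "name node_id priority lsn" lines on stdin and prints
# the name of the preferred promotion candidate.
pick_candidate() {
    best_name=""; best_id=0; best_prio=0; best_lsn=-1
    while read -r name id prio lsn; do
        [ -n "$name" ] || continue
        [ "$prio" -gt 0 ] || continue      # assumption: priority 0 is never promoted
        n=$(lsn_to_int "$lsn")
        if [ "$n" -gt "$best_lsn" ] ||
           { [ "$n" -eq "$best_lsn" ] && [ "$prio" -gt "$best_prio" ]; } ||
           { [ "$n" -eq "$best_lsn" ] && [ "$prio" -eq "$best_prio" ] &&
             [ "$id" -lt "$best_id" ]; }; then
            best_name=$name; best_id=$id; best_prio=$prio; best_lsn=$n
        fi
    done
    echo "$best_name"
}

# Values taken from the log above; the witness is listed with priority 0.
printf '%s\n' 'db02 2 300 0/1A050250' 'db03 3 100 0/1A050250' 'db04 4 0 0/0' |
    pick_candidate
```

With the values from the log, the sketch selects db02, matching the "promotion candidate" line.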
db03 log:
[2021-10-29 01:59:46] [WARNING] unable to reconnect to node "db01" (ID: 1) after 6 attempts
[2021-10-29 01:59:46] [INFO] 2 active sibling nodes registered
[2021-10-29 01:59:46] [INFO] 4 total nodes registered
[2021-10-29 01:59:46] [INFO] primary node "db01" (ID: 1) and this node have the same location ("default")
[2021-10-29 01:59:46] [INFO] local node's last receive lsn: 0/1A050250
[2021-10-29 01:59:46] [INFO] checking state of sibling node "db02" (ID: 2)
[2021-10-29 01:59:46] [INFO] node "db02" (ID: 2) reports its upstream is node 1, last seen 65 second(s) ago
[2021-10-29 01:59:46] [INFO] standby node "db02" (ID: 2) last saw primary node 65 second(s) ago
[2021-10-29 01:59:46] [INFO] last receive LSN for sibling node "db02" (ID: 2) is: 0/1A050250
[2021-10-29 01:59:46] [INFO] node "db02" (ID: 2) has same LSN as current candidate "db03" (ID: 3)
[2021-10-29 01:59:46] [INFO] node "db02" (ID: 2) has higher priority (300) than current candidate "db03" (ID: 3) (100)
[2021-10-29 01:59:46] [INFO] checking state of sibling node "db04" (ID: 4)
[2021-10-29 01:59:46] [INFO] node "db04" (ID: 4) reports its upstream is node 1, last seen 61 second(s) ago
[2021-10-29 01:59:46] [INFO] witness node "db04" (ID: 4) last saw primary node 61 second(s) ago
[2021-10-29 01:59:46] [INFO] visible nodes: 3; total nodes: 3; no nodes have seen the primary within the last 10 seconds
[2021-10-29 01:59:46] [NOTICE] promotion candidate is "db02" (ID: 2)
[2021-10-29 01:59:46] [INFO] follower node awaiting notification from a candidate node
[2021-10-29 01:59:49] [NOTICE] attempting to follow new primary "db02" (node ID: 2)
[2021-10-29 01:59:49] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"
[2021-10-29 01:59:49] [INFO] local node 3 can attach to follow target node 2
[2021-10-29 01:59:49] [DETAIL] local node's recovery point: 0/1A050250; follow target node's fork point: 0/1A050250
[2021-10-29 01:59:49] [NOTICE] setting node 3's upstream to node 2
[2021-10-29 01:59:49] [WARNING] node "db03" not found in "pg_stat_replication"
[2021-10-29 01:59:50] [NOTICE] STANDBY FOLLOW successful
[2021-10-29 01:59:50] [DETAIL] standby attached to upstream node "db02" (ID: 2)
INFO: set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2021-10-29 01:59:50] [NOTICE] node "db03" (ID: 3) now following new upstream node "db02" (ID: 2)
[2021-10-29 01:59:50] [INFO] resuming standby monitoring mode
[2021-10-29 01:59:50] [DETAIL] following new primary "db02" (ID: 2)
[2021-10-29 02:04:50] [INFO] node "db03" (ID: 3) monitoring upstream node "db02" (ID: 2) in normal state
[2021-10-29 02:04:50] [DETAIL] last monitoring statistics update was 5 seconds ago
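Both logs report the primary as unreachable "after 6 attempts", and the sibling nodes last saw it 61 to 65 seconds earlier. That is consistent with repmgrd retrying `reconnect_attempts` times at `reconnect_interval`-second intervals. The configuration files shown earlier set neither parameter, so the values 6 and 10 used below (believed to be repmgr's defaults) are an assumption, and the result is only a rough estimate that ignores connect_timeout and the monitoring interval.

```shell
#!/bin/sh
# detection_window ATTEMPTS INTERVAL_SECONDS
# Rough time repmgrd spends retrying before it treats the upstream as gone.
detection_window() {
    echo $(( $1 * $2 ))
}

# 6 and 10 are assumed values (see the note above); adjust to your repmgr.conf.
echo "approx. $(detection_window 6 10) seconds before failover starts"
```

Lowering either parameter shortens the outage but makes a failover on a brief network hiccup more likely.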
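[Disclaimer] This content comes from a blogger in the Huawei Cloud developer community and does not represent the views or position of Huawei Cloud or the Huawei Cloud developer community. When reposting, you must cite the source (Huawei Cloud community), the article link, the author, and other basic information; otherwise the author and the community reserve the right to pursue liability. If you find suspected plagiarism in this community, please report it by email with supporting evidence; once verified, the community will immediately remove the infringing content. Report email:
cloudbbs@huaweicloud.com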