repmgr-witness

Posted by snowofsummer on 2021/12/28 14:09:18


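1. The witness server is an ordinary PostgreSQL instance. It is not part of the streaming replication setup; it only helps form a quorum of servers.
2. It is used in two-node streaming replication environments to prevent split-brain. When a failover situation arises, it provides evidence that the primary itself is unavailable, rather than the two nodes merely being separated by a network interruption between different physical locations.
3. The witness is placed in the same location as the primary. If a network failure occurs and the standby can see neither the witness nor the primary, no failover takes place.
4. If the standby can see the witness but not the primary, the standby is promoted.
5. A witness server is only useful if repmgrd is in use.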
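
Points 3 and 4 above amount to a small decision table. The following is a minimal sketch of that table for a single standby, not repmgrd's actual implementation; the names `Action` and `standby_action` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Action(Enum):
    STAY = "primary reachable, keep replicating"
    PROMOTE = "primary down, promote this standby"
    HOLD = "possibly isolated, do not promote"


def standby_action(sees_primary: bool, sees_witness: bool) -> Action:
    """Simplified decision for one standby in a two-node cluster plus witness."""
    if sees_primary:
        return Action.STAY
    if sees_witness:
        # The witness sits next to the primary, so reaching it suggests the
        # network path is fine and the primary itself is down.
        return Action.PROMOTE
    # Neither is visible: the standby may be the side that is cut off.
    return Action.HOLD


for primary, witness in [(True, True), (False, True), (False, False)]:
    print(primary, witness, standby_action(primary, witness).name)
```

The third case is the one the witness exists for: without it, a standby that loses its network link could not tell an outage of the primary from its own isolation.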

01. Initialize the witness node

# Initialize a new database cluster

initdb -D /data/db01

# Network and parameter file settings

#pg_hba.conf

host all all 192.168.5.0/24 trust

#postgresql.conf

listen_addresses = '*'

shared_preload_libraries='repmgr'

# Create the repmgr user and database

createuser -s repmgr

createdb repmgr -O repmgr

psql -Urepmgr -drepmgr

alter user repmgr set search_path to repmgr,public;

show search_path;

# Check that the witness node accepts connections

psql -Urepmgr -drepmgr -h 192.168.5.203

02. Register the witness node with the repmgr cluster

repmgr.conf

node_id=4

node_name='db04'

conninfo='host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/db01'

log_file='/tmp/repmgrd.log'

-h/--host host name of the primary node

Note: -h must point to the primary node. After a failover, the witness node automatically connects to the new primary.

[postgres@db04 db01]$ repmgr witness register -h 192.168.5.200

INFO: connecting to witness node "db04" (ID: 4)

INFO: connecting to primary node

NOTICE: attempting to install extension "repmgr"

NOTICE: "repmgr" extension successfully installed

INFO: witness registration complete

NOTICE: witness node "db04" (ID: 4) successfully registered

# Start repmgrd

[postgres@db04 db01]$ repmgrd --pid-file /tmp/repmgrd.pid

[2021-10-29 01:50:39] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"

# Check status

[postgres@db04 db01]$ repmgr daemon status

ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen

----+------+---------+-----------+----------+---------+-------+---------+--------------------

1 | db01 | primary | * running | | running | 8470 | no | n/a

2 | db02 | standby | running | db01 | running | 1716 | no | 3 second(s) ago

3 | db03 | standby | running | db01 | running | 17700 | no | 3 second(s) ago

4 | db04 | witness | * running | db01 | running | 26629 | no | 0 second(s) ago

[postgres@db04 db01]$ repmgr cluster show --compact

ID | Name | Role | Status | Upstream | Location | Prio. | TLI

----+------+---------+-----------+----------+----------+-------+-----

1 | db01 | primary | * running | | default | 100 | 7

2 | db02 | standby | running | db01 | default | 100 | 7

3 | db03 | standby | running | db01 | default | 100 | 7

4 | db04 | witness | * running | db01 | default | 0 | n/a

[postgres@db04 db01]$ repmgr cluster matrix

INFO: connecting to database

Name | ID | 1 | 2 | 3 | 4

------+----+---+---+---+---

db01 | 1 | * | * | * | *

db02 | 2 | * | * | * | *

db03 | 3 | * | * | * | *

db04 | 4 | * | * | * | *

[postgres@db04 db01]$ repmgr cluster show

ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------

1 | db01 | primary | * running | | default | 100 | 7 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2

2 | db02 | standby | running | db01 | default | 100 | 7 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2

3 | db03 | standby | running | db01 | default | 100 | 7 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2

4 | db04 | witness | * running | db01 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2
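
For scripted checks, the `repmgr cluster show` rows above can be turned into records. This is a rough sketch that assumes the nine pipe-separated columns seen in this article's output (ID, name, role, status, upstream, location, priority, timeline, connection string); the function name `parse_cluster_show` is hypothetical, and other repmgr versions or options may print different columns.

```python
from typing import Dict, List

SAMPLE = """
1 | db01 | primary | * running | | default | 100 | 7 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
4 | db04 | witness | * running | db01 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2
"""


def parse_cluster_show(output: str) -> List[Dict]:
    """Parse data rows; header, separator and message lines are skipped."""
    nodes = []
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != 9 or not cells[0].isdigit():
            continue
        nodes.append({
            "id": int(cells[0]),
            "name": cells[1],
            "role": cells[2],
            # Drop the leading "*" or "-" marker in front of the state.
            "status": cells[3].lstrip("*- "),
            "upstream": cells[4] or None,
            "location": cells[5],
            "priority": int(cells[6]),
            # The witness reports "n/a"; a failed node reports nothing.
            "timeline": int(cells[7]) if cells[7].isdigit() else None,
            "conninfo": cells[8],
        })
    return nodes


for node in parse_cluster_show(SAMPLE):
    print(node["id"], node["name"], node["role"], node["status"], node["priority"])
```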

03. Change node priority

[postgres@db01 ~]$ repmgr cluster show

ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------

1 | db01 | primary | * running | | default | 100 | 7 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2

2 | db02 | standby | running | db01 | default | 100 | 7 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2

3 | db03 | standby | running | db01 | default | 100 | 7 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2

4 | db04 | witness | * running | db01 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2

# Set priority to 200 in db01's repmgr.conf

node_id=1

node_name='db01'

conninfo='host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/db01'

pg_bindir='/usr/local/postgresql/bin'

priority=200

failover=automatic

promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'

follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'

log_file='/tmp/repmgrd.log'

monitoring_history=true

monitor_interval_secs=5

[postgres@db01 ~]$ repmgr primary register --force

INFO: connecting to primary database...

INFO: "repmgr" extension is already installed

NOTICE: primary node record (ID: 1) updated

[postgres@db01 ~]$ repmgr cluster show

ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------

1 | db01 | primary | * running | | default | 200 | 7 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2

2 | db02 | standby | running | db01 | default | 100 | 7 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2

3 | db03 | standby | running | db01 | default | 100 | 7 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2

4 | db04 | witness | * running | db01 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2

# Change db02's priority to 300

# Edit the configuration file

-bash-4.2$ cat /etc/repmgr.conf

node_id=2

node_name='db02'

conninfo='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/db01'

pg_bindir='/usr/local/postgresql/bin'

priority=300

failover=automatic

promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'

follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'

log_file='/tmp/repmgrd.log'

monitoring_history=true

monitor_interval_secs=5

-bash-4.2$ repmgr cluster show

ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------

1 | db01 | primary | * running | | default | 200 | 7 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2

2 | db02 | standby | running | db01 | default | 100 | 7 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2

3 | db03 | standby | running | db01 | default | 100 | 7 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2

4 | db04 | witness | * running | db01 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2

# Apply the changed setting by re-registering the node

-bash-4.2$ repmgr standby register --force

INFO: connecting to local node "db02" (ID: 2)

INFO: connecting to primary database

INFO: standby registration complete

NOTICE: standby node "db02" (ID: 2) successfully registered

# Verify the change

-bash-4.2$ repmgr cluster show

ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------

1 | db01 | primary | * running | | default | 200 | 7 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2

2 | db02 | standby | running | db01 | default | 300 | 7 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2

3 | db03 | standby | running | db01 | default | 100 | 7 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2

4 | db04 | witness | * running | db01 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2
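
The steps in this section show that editing `priority` in repmgr.conf changes nothing in `repmgr cluster show` until the node is re-registered with `--force`. A small sketch of a check for that drift follows; the helper names `conf_value` and `priority_needs_reregister` are hypothetical, the parser only handles the simple `key=value` lines used in this article, and the fallback of 100 is repmgr's default priority as seen in the output above.

```python
DB02_CONF = """
node_id=2
node_name='db02'
conninfo='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'
priority=300
failover=automatic
"""


def conf_value(conf_text: str, key: str):
    """Return the value of a simple key=value setting, or None if absent."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip().strip("'")
    return None


def priority_needs_reregister(conf_text: str, registered_priority: int) -> bool:
    """True when the file and the registered node record disagree."""
    configured = conf_value(conf_text, "priority")
    configured_priority = int(configured) if configured is not None else 100
    return configured_priority != registered_priority


# Before "repmgr standby register --force" the cluster still showed 100.
print(priority_needs_reregister(DB02_CONF, 100))
# After re-registering, the cluster showed 300.
print(priority_needs_reregister(DB02_CONF, 300))
```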

Failover:

Status before shutting down db01:

[postgres@db01 ~]$ repmgr daemon status

ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen

----+------+---------+-----------+----------+---------+-------+---------+--------------------

1 | db01 | primary | * running | | running | 8470 | no | n/a

2 | db02 | standby | running | db01 | running | 1716 | no | 2 second(s) ago

3 | db03 | standby | running | db01 | running | 17700 | no | 2 second(s) ago

4 | db04 | witness | * running | db01 | running | 26629 | no | 1 second(s) ago

[postgres@db04 db01]$ repmgr daemon status

ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen

----+------+---------+-----------+----------+---------+-------+---------+--------------------

1 | db01 | primary | * running | | running | 8470 | no | n/a

2 | db02 | standby | running | db01 | running | 1716 | no | 3 second(s) ago

3 | db03 | standby | running | db01 | running | 17700 | no | 3 second(s) ago

4 | db04 | witness | * running | db01 | running | 26629 | no | 0 second(s) ago

Shut down db01:

db04: the witness automatically follows the new primary

[2021-10-29 01:59:46] [WARNING] unable to reconnect to node 1 after 6 attempts

[2021-10-29 01:59:48] [NOTICE] witness node "db04" (ID: 4) now following new primary node "db02" (ID: 2)

[2021-10-29 01:59:48] [INFO] resuming witness monitoring mode

[2021-10-29 01:59:48] [DETAIL] following new primary "db02" (ID: 2)

[2021-10-29 01:59:48] [INFO] witness monitoring connection to primary node "db02" (ID: 2)
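
To measure how long the witness took to switch over, the timestamps in the repmgrd log above can be compared. This is a sketch that assumes the `[timestamp] [LEVEL] message` layout seen in this article; `parse_log_line` is a hypothetical helper, not part of repmgr.

```python
import re
from datetime import datetime

LOG_LINE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[(\w+)\] (.*)$")

WITNESS_LOG = [
    '[2021-10-29 01:59:46] [WARNING] unable to reconnect to node 1 after 6 attempts',
    '[2021-10-29 01:59:48] [NOTICE] witness node "db04" (ID: 4) now following new primary node "db02" (ID: 2)',
    '[2021-10-29 01:59:48] [INFO] resuming witness monitoring mode',
]


def parse_log_line(line: str):
    """Split one repmgrd log line into (timestamp, level, message)."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return stamp, match.group(2), match.group(3)


entries = [entry for entry in map(parse_log_line, WITNESS_LOG) if entry]
lost = next(ts for ts, _, msg in entries if "unable to reconnect" in msg)
following = next(ts for ts, _, msg in entries if "now following new primary" in msg)
print((following - lost).total_seconds(), "seconds from giving up on db01 to following db02")
```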

Cluster status after the automatic failover:

# db02 has become the primary (a standby was promoted)

-bash-4.2$ repmgr cluster show

ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------

1 | db01 | primary | - failed | ? | default | 200 | | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2

2 | db02 | primary | * running | | default | 300 | 8 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2

3 | db03 | standby | running | db02 | default | 100 | 8 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2

4 | db04 | witness | * running | db02 | default | 0 | n/a | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected

- unable to connect to node "db01" (ID: 1)

HINT: execute with --verbose option to see connection error messages

-bash-4.2$ repmgr service status

ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen

----+------+---------+-----------+----------+---------+-------+---------+--------------------

1 | db01 | primary | - failed | ? | n/a | n/a | n/a | n/a

2 | db02 | primary | * running | | running | 1716 | no | n/a

3 | db03 | standby | running | db02 | running | 17700 | no | 0 second(s) ago

4 | db04 | witness | * running | db02 | running | 26629 | no | 1 second(s) ago

WARNING: following issues were detected

- unable to connect to node "db01" (ID: 1)

HINT: execute with --verbose option to see connection error messages

## db02 failover log:

[2021-10-29 01:59:44] [INFO] checking state of node "db01" (ID: 1), 6 of 6 attempts

[2021-10-29 01:59:46] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.200 fallback_application_name=repmgr"

[2021-10-29 01:59:46] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"

[2021-10-29 01:59:46] [WARNING] unable to reconnect to node "db01" (ID: 1) after 6 attempts

[2021-10-29 01:59:46] [INFO] 2 active sibling nodes registered

[2021-10-29 01:59:46] [INFO] 4 total nodes registered

[2021-10-29 01:59:46] [INFO] primary node "db01" (ID: 1) and this node have the same location ("default")

[2021-10-29 01:59:46] [INFO] local node's last receive lsn: 0/1A050250

[2021-10-29 01:59:46] [INFO] checking state of sibling node "db03" (ID: 3)

[2021-10-29 01:59:46] [INFO] node "db03" (ID: 3) reports its upstream is node 1, last seen 65 second(s) ago

[2021-10-29 01:59:46] [INFO] standby node "db03" (ID: 3) last saw primary node 65 second(s) ago

[2021-10-29 01:59:46] [INFO] last receive LSN for sibling node "db03" (ID: 3) is: 0/1A050250

[2021-10-29 01:59:46] [INFO] node "db03" (ID: 3) has same LSN as current candidate "db02" (ID: 2)

[2021-10-29 01:59:46] [INFO] node "db03" (ID: 3) has lower priority (100) than current candidate "db02" (ID: 2) (300)

# db02 and db03 report the same LSN, so priority decides: db02 becomes the promotion candidate.

[2021-10-29 01:59:46] [INFO] checking state of sibling node "db04" (ID: 4)

[2021-10-29 01:59:46] [INFO] node "db04" (ID: 4) reports its upstream is node 1, last seen 61 second(s) ago

[2021-10-29 01:59:46] [INFO] witness node "db04" (ID: 4) last saw primary node 61 second(s) ago

[2021-10-29 01:59:46] [INFO] visible nodes: 3; total nodes: 3; no nodes have seen the primary within the last 10 seconds

[2021-10-29 01:59:46] [NOTICE] promotion candidate is "db02" (ID: 2)

[2021-10-29 01:59:46] [NOTICE] this node is the winner, will now promote itself and inform other nodes

[2021-10-29 01:59:46] [INFO] promote_command is:

"/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"

[2021-10-29 01:59:46] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"

[2021-10-29 01:59:47] [WARNING] 2 sibling nodes found, but option "--siblings-follow" not specified

[2021-10-29 01:59:47] [DETAIL] these nodes will remain attached to the current primary:

db03 (node ID: 3)

db04 (node ID: 4, witness server)

[2021-10-29 01:59:47] [NOTICE] promoting standby to primary

[2021-10-29 01:59:47] [DETAIL] promoting server "db02" (ID: 2) using pg_promote()

[2021-10-29 01:59:47] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

[2021-10-29 01:59:48] [NOTICE] STANDBY PROMOTE successful

[2021-10-29 01:59:48] [DETAIL] server "db02" (ID: 2) was successfully promoted to primary

[2021-10-29 01:59:48] [INFO] checking state of node 2, 1 of 6 attempts

[2021-10-29 01:59:48] [NOTICE] node 2 has recovered, reconnecting

[2021-10-29 01:59:48] [INFO] connection to node 2 succeeded

[2021-10-29 01:59:48] [INFO] original connection is still available

[2021-10-29 01:59:48] [INFO] 2 followers to notify

[2021-10-29 01:59:48] [NOTICE] notifying node "db03" (ID: 3) to follow node 2

INFO: node 3 received notification to follow node 2

[2021-10-29 01:59:48] [NOTICE] notifying node "db04" (ID: 4) to follow node 2

INFO: node 4 received notification to follow node 2

[2021-10-29 01:59:48] [INFO] switching to primary monitoring mode

[2021-10-29 01:59:48] [NOTICE] monitoring cluster primary "db02" (ID: 2)

[2021-10-29 01:59:53] [NOTICE] new witness "db04" (ID: 4) has connected

[2021-10-29 01:59:53] [NOTICE] new standby "db03" (ID: 3) has connected

[2021-10-29 02:04:48] [INFO] monitoring primary node "db02" (ID: 2) in normal state
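
The db02 log above compares LSNs first and falls back to priority when they are equal. The sketch below mimics that ordering under stated assumptions: only standbys with a priority above 0 are eligible, the highest last-received LSN wins, then the highest priority, then the lowest node ID. The `Candidate` class and `pick_promotion_candidate` are hypothetical names, the node-ID tie-break is not visible in this log, and the `0/0` LSN for the witness is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/1A050250' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class Candidate:
    node_id: int
    name: str
    role: str
    last_receive_lsn: str
    priority: int


def pick_promotion_candidate(nodes: List[Candidate]) -> Optional[Candidate]:
    """Highest LSN first, then highest priority, then lowest node ID."""
    eligible = [n for n in nodes if n.role == "standby" and n.priority > 0]
    if not eligible:
        return None
    return min(
        eligible,
        key=lambda n: (-lsn_to_int(n.last_receive_lsn), -n.priority, n.node_id),
    )


cluster = [
    Candidate(2, "db02", "standby", "0/1A050250", 300),
    Candidate(3, "db03", "standby", "0/1A050250", 100),
    Candidate(4, "db04", "witness", "0/0", 0),  # a witness is never promoted
]
print(pick_promotion_candidate(cluster).name)
```

With the values from the log, both standbys sit at `0/1A050250`, so the priority of 300 makes db02 the candidate, matching the `promotion candidate is "db02"` line.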

db03 log:

[2021-10-29 01:59:46] [WARNING] unable to reconnect to node "db01" (ID: 1) after 6 attempts

[2021-10-29 01:59:46] [INFO] 2 active sibling nodes registered

[2021-10-29 01:59:46] [INFO] 4 total nodes registered

[2021-10-29 01:59:46] [INFO] primary node "db01" (ID: 1) and this node have the same location ("default")

[2021-10-29 01:59:46] [INFO] local node's last receive lsn: 0/1A050250

[2021-10-29 01:59:46] [INFO] checking state of sibling node "db02" (ID: 2)

[2021-10-29 01:59:46] [INFO] node "db02" (ID: 2) reports its upstream is node 1, last seen 65 second(s) ago

[2021-10-29 01:59:46] [INFO] standby node "db02" (ID: 2) last saw primary node 65 second(s) ago

[2021-10-29 01:59:46] [INFO] last receive LSN for sibling node "db02" (ID: 2) is: 0/1A050250

[2021-10-29 01:59:46] [INFO] node "db02" (ID: 2) has same LSN as current candidate "db03" (ID: 3)

[2021-10-29 01:59:46] [INFO] node "db02" (ID: 2) has higher priority (300) than current candidate "db03" (ID: 3) (100)

[2021-10-29 01:59:46] [INFO] checking state of sibling node "db04" (ID: 4)

[2021-10-29 01:59:46] [INFO] node "db04" (ID: 4) reports its upstream is node 1, last seen 61 second(s) ago

[2021-10-29 01:59:46] [INFO] witness node "db04" (ID: 4) last saw primary node 61 second(s) ago

[2021-10-29 01:59:46] [INFO] visible nodes: 3; total nodes: 3; no nodes have seen the primary within the last 10 seconds

[2021-10-29 01:59:46] [NOTICE] promotion candidate is "db02" (ID: 2)

[2021-10-29 01:59:46] [INFO] follower node awaiting notification from a candidate node

[2021-10-29 01:59:49] [NOTICE] attempting to follow new primary "db02" (node ID: 2)

[2021-10-29 01:59:49] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"

[2021-10-29 01:59:49] [INFO] local node 3 can attach to follow target node 2

[2021-10-29 01:59:49] [DETAIL] local node's recovery point: 0/1A050250; follow target node's fork point: 0/1A050250

[2021-10-29 01:59:49] [NOTICE] setting node 3's upstream to node 2

[2021-10-29 01:59:49] [WARNING] node "db03" not found in "pg_stat_replication"

[2021-10-29 01:59:50] [NOTICE] STANDBY FOLLOW successful

[2021-10-29 01:59:50] [DETAIL] standby attached to upstream node "db02" (ID: 2)

INFO: set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid

[2021-10-29 01:59:50] [NOTICE] node "db03" (ID: 3) now following new upstream node "db02" (ID: 2)

[2021-10-29 01:59:50] [INFO] resuming standby monitoring mode

[2021-10-29 01:59:50] [DETAIL] following new primary "db02" (ID: 2)

[2021-10-29 02:04:50] [INFO] node "db03" (ID: 3) monitoring upstream node "db02" (ID: 2) in normal state

[2021-10-29 02:04:50] [DETAIL] last monitoring statistics update was 5 seconds ago
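
Both logs contain the line `visible nodes: 3; total nodes: 3` just before the promotion candidate is announced. The sketch below extracts those two counts and applies a strict-majority test. Treating "more than half visible" as the condition for proceeding is an assumption made here for illustration, and how repmgr counts nodes (for example, whether the failed primary is included) is not shown in this article; `parse_visibility` and `has_majority` are hypothetical names.

```python
import re

VISIBILITY = re.compile(r"visible nodes: (\d+); total nodes: (\d+)")


def parse_visibility(line: str):
    """Return (visible, total) from a repmgrd log line, or None."""
    match = VISIBILITY.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_majority(visible_nodes: int, total_nodes: int) -> bool:
    """Strict majority: more than half of the nodes must be visible."""
    return visible_nodes > total_nodes / 2


line = (
    "[2021-10-29 01:59:46] [INFO] visible nodes: 3; total nodes: 3; "
    "no nodes have seen the primary within the last 10 seconds"
)
visible, total = parse_visibility(line)
print(visible, total, has_majority(visible, total))
```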
