repmgr-witness

Posted by snowofsummer on 2021/12/28 14:09:18

 

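1. The witness server is an ordinary PostgreSQL instance that helps form a quorum of servers; it is not part of the streaming replication cluster itself.
2. It is used in two-node streaming replication setups to prevent split brain. When a failover situation arises, it provides evidence that the primary itself is unavailable, rather than a network interruption between different physical locations causing a split brain.
3. Place the witness in the same location as the primary. If a network failure occurs and the standby can see neither the witness nor the primary, no failover takes place.
4. If the standby can see the witness but not the primary, the standby is promoted.
5. A witness server is only useful if repmgrd is in use.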
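Points 3 and 4 boil down to a small decision table. The sketch below encodes it as a shell function. It is a simplified model that assumes the witness sits in the same location as the primary, not repmgrd's actual implementation, and the function name `failover_decision` is hypothetical.

```shell
# Simplified model of how a standby reacts when it loses the primary,
# assuming the witness sits in the same location as the primary.
# failover_decision is a hypothetical helper, not part of repmgr.
failover_decision() {
    primary_visible=$1   # "yes" if the standby can still reach the primary
    witness_visible=$2   # "yes" if the standby can still reach the witness

    if [ "$primary_visible" = "yes" ]; then
        echo "none"      # primary is reachable: nothing to do
    elif [ "$witness_visible" = "yes" ]; then
        echo "promote"   # witness reachable, primary not: the primary itself is down
    else
        echo "wait"      # neither reachable: likely a network split, do not promote
    fi
}

# Print the decision for every combination of visibility.
for p in yes no; do
    for w in yes no; do
        echo "primary=$p witness=$w -> $(failover_decision "$p" "$w")"
    done
done
```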

01. Initialize the witness node
# Initialize a new database cluster
initdb -D /data/db01

# Network access and parameter settings
#pg_hba.conf
host    all             all             192.168.5.0/24          trust
#postgresql.conf
listen_addresses = '*'
shared_preload_libraries='repmgr'

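# Start PostgreSQL so the new settings take effect, then create the repmgr user and database
pg_ctl -D /data/db01 start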
createuser -s repmgr
createdb repmgr -O repmgr

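psql -Urepmgr -drepmgr
-- in psql: set the default search_path for the repmgr user (applies to new sessions)
alter user repmgr set search_path to repmgr,public;
-- reconnect, then verify
\c
show search_path;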

# Check that the witness node accepts connections over the network
psql -Urepmgr -drepmgr -h 192.168.5.203
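The configuration part of step 01 can also be scripted. A minimal sketch, assuming the data directory and subnet used above; the helper name `configure_witness_pgdata` is hypothetical. It only appends settings to the two files and does not start or restart PostgreSQL, which is still required for `shared_preload_libraries` to take effect. Because PostgreSQL uses the last occurrence of a parameter in postgresql.conf, appended values override earlier ones.

```shell
# Append the witness settings from step 01 to an initialized data directory.
# configure_witness_pgdata is a hypothetical helper; adjust paths and subnet to your environment.
configure_witness_pgdata() {
    pgdata=$1    # e.g. /data/db01
    subnet=$2    # e.g. 192.168.5.0/24

    # Allow the cluster subnet to connect (trust, as in the example above).
    printf 'host    all             all             %s          trust\n' "$subnet" >> "$pgdata/pg_hba.conf"

    # Listen on all interfaces and preload the repmgr library.
    printf "listen_addresses = '*'\n" >> "$pgdata/postgresql.conf"
    printf "shared_preload_libraries = 'repmgr'\n" >> "$pgdata/postgresql.conf"
}

# Demo against a scratch directory instead of a real data directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/pg_hba.conf" "$demo_dir/postgresql.conf"
configure_witness_pgdata "$demo_dir" 192.168.5.0/24
cat "$demo_dir/postgresql.conf"
```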



02. Register the witness node with the repmgr cluster
#repmgr.conf on the witness node (db04)
node_id=4
node_name='db04'
conninfo='host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/db01'
log_file='/tmp/repmgrd.log'
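When several witness or test environments are needed, the repmgr.conf above can be generated from a few parameters. A sketch; `write_witness_conf` is a hypothetical helper that writes exactly the keys shown above to whatever path you pass in.

```shell
# Generate a witness repmgr.conf with the same keys as the example above.
# write_witness_conf is a hypothetical helper, not part of repmgr.
write_witness_conf() {
    out=$1; node_id=$2; node_name=$3; host=$4; pgdata=$5
    cat > "$out" <<EOF
node_id=$node_id
node_name='$node_name'
conninfo='host=$host user=repmgr dbname=repmgr connect_timeout=2'
data_directory='$pgdata'
log_file='/tmp/repmgrd.log'
EOF
}

# Demo: write the db04 settings to a scratch file.
conf=$(mktemp)
write_witness_conf "$conf" 4 db04 192.168.5.203 /data/db01
cat "$conf"
```

The generated file can then be passed explicitly, for example `repmgr witness register -f "$conf" -h 192.168.5.200`, in the same way the `promote_command` settings later in this article use `-f /etc/repmgr.conf`.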


-h/--host                host name of the primary node
Note: -h must point to the primary node. After a failover, the witness node automatically connects to the new primary.
[postgres@db04 db01]$ repmgr witness register -h 192.168.5.200
INFO: connecting to witness node "db04" (ID: 4)
INFO: connecting to primary node
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
INFO: witness registration complete
NOTICE: witness node "db04" (ID: 4) successfully registered
#Start repmgrd
[postgres@db04 db01]$ repmgrd --pid-file /tmp/repmgrd.pid
[2021-10-29 01:50:39] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"
#Check status
[postgres@db04 db01]$  repmgr daemon status
ID | Name | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1  | db01 | primary | * running |          | running | 8470  | no      | n/a
2  | db02 | standby |   running | db01     | running | 1716  | no      | 3 second(s) ago
3  | db03 | standby |   running | db01     | running | 17700 | no      | 3 second(s) ago
4  | db04 | witness | * running | db01     | running | 26629 | no      | 0 second(s) ago
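Automatic failover only works when repmgrd is running and not paused on every node, as in the table above. A sketch that flags any other row, assuming the column order shown above (repmgrd in column 6, Paused? in column 8); `check_repmgrd` is a hypothetical helper. In practice you would pipe `repmgr daemon status` into it.

```shell
# Report nodes whose repmgrd is not running or is paused.
# Reads "repmgr daemon status" output on stdin; check_repmgrd is a hypothetical helper.
check_repmgrd() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        index($0, "|") == 0 || /^-/ || /repmgrd/ { next }   # skip header, separator, other text
        {
            name = trim($2); daemon = trim($6); paused = trim($8)
            if (daemon != "running" || paused != "no") {
                print name ": repmgrd=" daemon ", paused=" paused
                bad = 1
            }
        }
        END { exit bad }
    '
}

# Demo with the table shown above.
check_repmgrd <<'EOF' && echo "repmgrd is active on all nodes"
ID | Name | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1  | db01 | primary | * running |          | running | 8470  | no      | n/a
2  | db02 | standby |   running | db01     | running | 1716  | no      | 3 second(s) ago
3  | db03 | standby |   running | db01     | running | 17700 | no      | 3 second(s) ago
4  | db04 | witness | * running | db01     | running | 26629 | no      | 0 second(s) ago
EOF
```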

[postgres@db04 db01]$ repmgr cluster show --compact
ID | Name | Role    | Status    | Upstream | Location | Prio. | TLI
----+------+---------+-----------+----------+----------+-------+-----
1  | db01 | primary | * running |          | default  | 100   | 7
2  | db02 | standby |   running | db01     | default  | 100   | 7
3  | db03 | standby |   running | db01     | default  | 100   | 7
4  | db04 | witness | * running | db01     | default  | 0     | n/a

[postgres@db04 db01]$ repmgr cluster matrix
INFO: connecting to database
Name | ID | 1 | 2 | 3 | 4
------+----+---+---+---+---
db01 | 1  | * | * | * | *
db02 | 2  | * | * | * | *
db03 | 3  | * | * | * | *
db04 | 4  | * | * | * | *
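In `repmgr cluster matrix`, each row shows whether that node could connect to each node in the columns, and `*` marks a successful connection. A quick scripted check is to flag any cell that is not `*`. A sketch, assuming the table layout shown above; the function name `matrix_all_ok` is hypothetical.

```shell
# Succeed only if every cell of a "repmgr cluster matrix" table is "*".
# Reads the table on stdin; matrix_all_ok is a hypothetical helper.
matrix_all_ok() {
    awk -F'|' '
        index($0, "|") == 0 || /^-/ || /Name/ { next }   # skip INFO lines, separator, header
        {
            for (i = 3; i <= NF; i++) {                   # columns 3.. hold per-node results
                cell = $i
                gsub(/ /, "", cell)
                if (cell != "*") { bad = 1; print "problem in row: " $1 }
            }
        }
        END { exit bad }
    '
}

# Demo with the matrix shown above.
matrix_all_ok <<'EOF' && echo "all nodes can reach each other"
Name | ID | 1 | 2 | 3 | 4
------+----+---+---+---+---
db01 | 1  | * | * | * | *
db02 | 2  | * | * | * | *
db03 | 3  | * | * | * | *
db04 | 4  | * | * | * | *
EOF
```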

[postgres@db04 db01]$ repmgr cluster show
ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                       
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1  | db01 | primary | * running |          | default  | 100      | 7        | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2  | db02 | standby |   running | db01     | default  | 100      | 7        | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3  | db03 | standby |   running | db01     | default  | 100      | 7        | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
4  | db04 | witness | * running | db01     | default  | 0        | n/a      | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2



03. Change node priority
[postgres@db01 ~]$ repmgr cluster show
ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1  | db01 | primary | * running |          | default  | 100      | 7        | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2  | db02 | standby |   running | db01     | default  | 100      | 7        | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3  | db03 | standby |   running | db01     | default  | 100      | 7        | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
4  | db04 | witness | * running | db01     | default  | 0        | n/a      | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2

#Edit /etc/repmgr.conf on db01: change priority to 200
node_id=1
node_name='db01'
conninfo='host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/db01'
pg_bindir='/usr/local/postgresql/bin'
priority=200
failover=automatic
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/tmp/repmgrd.log'
monitoring_history=true
monitor_interval_secs=5
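The priority edit itself can be scripted. A sketch, assuming a single `priority=` line exists in the file as in the example above; `set_priority` is a hypothetical helper. The change still has to be written to the repmgr metadata afterwards with `repmgr primary register --force` (or `repmgr standby register --force` on a standby), as the output that follows shows.

```shell
# Replace the priority= line in a repmgr.conf file.
# set_priority is a hypothetical helper; it rewrites the file through a temp copy.
set_priority() {
    conf=$1; value=$2
    case $value in
        ''|*[!0-9]*) echo "priority must be a non-negative integer" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    sed "s/^priority=.*/priority=$value/" "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Demo on a scratch copy of a node's settings.
conf=$(mktemp)
printf "node_id=1\nnode_name='db01'\npriority=100\nfailover=automatic\n" > "$conf"
set_priority "$conf" 200
cat "$conf"
```

Writing back with `cat` instead of `mv` keeps the original file's owner and permissions. A priority of 0 means repmgr never promotes the node, which is why the witness shows 0.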

[postgres@db01 ~]$ repmgr primary register --force
INFO: connecting to primary database...
INFO: "repmgr" extension is already installed
NOTICE: primary node record (ID: 1) updated
[postgres@db01 ~]$ repmgr cluster show
ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1  | db01 | primary | * running |          | default  | 200      | 7        | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2  | db02 | standby |   running | db01     | default  | 100      | 7        | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3  | db03 | standby |   running | db01     | default  | 100      | 7        | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
4  | db04 | witness | * running | db01     | default  | 0        | n/a      | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2

#On db02: change priority to 300
#Edit the configuration file
-bash-4.2$ cat /etc/repmgr.conf
node_id=2
node_name='db02'
conninfo='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/db01'
pg_bindir='/usr/local/postgresql/bin'
priority=300
failover=automatic
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/tmp/repmgrd.log'
monitoring_history=true
monitor_interval_secs=5


-bash-4.2$ repmgr cluster show
ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1  | db01 | primary | * running |          | default  | 200      | 7        | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2  | db02 | standby |   running | db01     | default  | 100      | 7        | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3  | db03 | standby |   running | db01     | default  | 100      | 7        | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
4  | db04 | witness | * running | db01     | default  | 0        | n/a      | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2

#Re-register the standby to apply the changed setting
-bash-4.2$ repmgr standby register --force
INFO: connecting to local node "db02" (ID: 2)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "db02" (ID: 2) successfully registered

#Verify the change
-bash-4.2$ repmgr cluster show
ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1  | db01 | primary | * running |          | default  | 200      | 7        | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2  | db02 | standby |   running | db01     | default  | 300      | 7        | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3  | db03 | standby |   running | db01     | default  | 100      | 7        | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
4  | db04 | witness | * running | db01     | default  | 0        | n/a      | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2

Failover:
Status before shutting down db01:
[postgres@db01 ~]$ repmgr daemon status
ID | Name | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1  | db01 | primary | * running |          | running | 8470  | no      | n/a
2  | db02 | standby |   running | db01     | running | 1716  | no      | 2 second(s) ago
3  | db03 | standby |   running | db01     | running | 17700 | no      | 2 second(s) ago
4  | db04 | witness | * running | db01     | running | 26629 | no      | 1 second(s) ago

[postgres@db04 db01]$  repmgr daemon status
ID | Name | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1  | db01 | primary | * running |          | running | 8470  | no      | n/a
2  | db02 | standby |   running | db01     | running | 1716  | no      | 3 second(s) ago
3  | db03 | standby |   running | db01     | running | 17700 | no      | 3 second(s) ago
4  | db04 | witness | * running | db01     | running | 26629 | no      | 0 second(s) ago

Shut down db01:
db04 log: the witness automatically follows the new primary
[2021-10-29 01:59:46] [WARNING] unable to reconnect to node 1 after 6 attempts
[2021-10-29 01:59:48] [NOTICE] witness node "db04" (ID: 4) now following new primary node "db02" (ID: 2)
[2021-10-29 01:59:48] [INFO] resuming witness monitoring mode
[2021-10-29 01:59:48] [DETAIL] following new primary "db02" (ID: 2)
[2021-10-29 01:59:48] [INFO] witness monitoring connection to primary node "db02" (ID: 2)

Cluster status after the automatic failover:
#db02 has been promoted from standby to primary

-bash-4.2$ repmgr cluster show
ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1  | db01 | primary | - failed  | ?        | default  | 200      |          | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2  | db02 | primary | * running |          | default  | 300      | 8        | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3  | db03 | standby |   running | db02     | default  | 100      | 8        | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
4  | db04 | witness | * running | db02     | default  | 0        | n/a      | host=192.168.5.203 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
  - unable to connect to node "db01" (ID: 1)
HINT: execute with --verbose option to see connection error messages

-bash-4.2$ repmgr service status
ID | Name | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1  | db01 | primary | - failed  | ?        | n/a     | n/a   | n/a     | n/a
2  | db02 | primary | * running |          | running | 1716  | no      | n/a
3  | db03 | standby |   running | db02     | running | 17700 | no      | 0 second(s) ago
4  | db04 | witness | * running | db02     | running | 26629 | no      | 1 second(s) ago
WARNING: following issues were detected
  - unable to  connect to node "db01" (ID: 1)
HINT: execute with --verbose option to see connection error messages
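After a failover it is worth confirming that every surviving node points at the new primary, which is what the tables above show for db03 and db04. A sketch that checks this from `repmgr cluster show` output, assuming the column layout shown above (Role in column 3, Status in column 4, Upstream in column 5); `check_upstreams` is a hypothetical helper, and nodes whose status is not "running" are ignored.

```shell
# Verify that every running standby/witness follows the running primary.
# Reads "repmgr cluster show" output on stdin; check_upstreams is a hypothetical helper.
check_upstreams() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        index($0, "|") == 0 || /^-/ || /Role/ { next }   # skip header, separator, warnings
        {
            name = trim($2); role = trim($3); status = trim($4); up = trim($5)
            if (status !~ /running/) next                 # ignore failed nodes such as db01
            if (role == "primary") primary = name
            else { n++; node[n] = name; upstream[n] = up }
        }
        END {
            if (primary == "") { print "no running primary found"; exit 1 }
            for (i = 1; i <= n; i++)
                if (upstream[i] != primary) {
                    print node[i] " follows " upstream[i] ", expected " primary
                    bad = 1
                }
            exit bad
        }
    '
}

# Demo with the first columns of the post-failover table above.
check_upstreams <<'EOF' && echo "all nodes follow the current primary"
ID | Name | Role    | Status    | Upstream
----+------+---------+-----------+----------
1  | db01 | primary | - failed  | ?
2  | db02 | primary | * running |
3  | db03 | standby |   running | db02
4  | db04 | witness | * running | db02
EOF
```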


##db02 failover log:
[2021-10-29 01:59:44] [INFO] checking state of node "db01" (ID: 1), 6 of 6 attempts
[2021-10-29 01:59:46] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.200 fallback_application_name=repmgr"
[2021-10-29 01:59:46] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-29 01:59:46] [WARNING] unable to reconnect to node "db01" (ID: 1) after 6 attempts
[2021-10-29 01:59:46] [INFO] 2 active sibling nodes registered
[2021-10-29 01:59:46] [INFO] 4 total nodes registered
[2021-10-29 01:59:46] [INFO] primary node  "db01" (ID: 1) and this node have the same location ("default")
[2021-10-29 01:59:46] [INFO] local node's last receive lsn: 0/1A050250
[2021-10-29 01:59:46] [INFO] checking state of sibling node "db03" (ID: 3)
[2021-10-29 01:59:46] [INFO] node "db03" (ID: 3) reports its upstream is node 1, last seen 65 second(s) ago
[2021-10-29 01:59:46] [INFO] standby node "db03" (ID: 3) last saw primary node 65 second(s) ago
[2021-10-29 01:59:46] [INFO] last receive LSN for sibling node "db03" (ID: 3) is: 0/1A050250
[2021-10-29 01:59:46] [INFO] node "db03" (ID: 3) has same LSN as current candidate "db02" (ID: 2)
[2021-10-29 01:59:46] [INFO] node "db03" (ID: 3) has lower priority (100) than current candidate "db02" (ID: 2) (300)
#Both standbys have the same LSN, so priority decides: db02 (300) is chosen over db03 (100).
[2021-10-29 01:59:46] [INFO] checking state of sibling node "db04" (ID: 4)
[2021-10-29 01:59:46] [INFO] node "db04" (ID: 4) reports its upstream is node 1, last seen 61 second(s) ago
[2021-10-29 01:59:46] [INFO] witness node "db04" (ID: 4) last saw primary node 61 second(s) ago
[2021-10-29 01:59:46] [INFO] visible nodes: 3; total nodes: 3; no nodes have seen the primary within the last 10 seconds
[2021-10-29 01:59:46] [NOTICE] promotion candidate is "db02" (ID: 2)
[2021-10-29 01:59:46] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2021-10-29 01:59:46] [INFO] promote_command is:
  "/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2021-10-29 01:59:46] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"

[2021-10-29 01:59:47] [WARNING] 2 sibling nodes found, but option "--siblings-follow" not specified
[2021-10-29 01:59:47] [DETAIL] these nodes will remain attached to the current primary:
  db03 (node ID: 3)
  db04 (node ID: 4, witness server)
[2021-10-29 01:59:47] [NOTICE] promoting standby to primary
[2021-10-29 01:59:47] [DETAIL] promoting server "db02" (ID: 2) using pg_promote()
[2021-10-29 01:59:47] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2021-10-29 01:59:48] [NOTICE] STANDBY PROMOTE successful
[2021-10-29 01:59:48] [DETAIL] server "db02" (ID: 2) was successfully promoted to primary
[2021-10-29 01:59:48] [INFO] checking state of node 2, 1 of 6 attempts
[2021-10-29 01:59:48] [NOTICE] node 2 has recovered, reconnecting
[2021-10-29 01:59:48] [INFO] connection to node 2 succeeded
[2021-10-29 01:59:48] [INFO] original connection is still available
[2021-10-29 01:59:48] [INFO] 2 followers to notify
[2021-10-29 01:59:48] [NOTICE] notifying node "db03" (ID: 3) to follow node 2
INFO:  node 3 received notification to follow node 2
[2021-10-29 01:59:48] [NOTICE] notifying node "db04" (ID: 4) to follow node 2
INFO:  node 4 received notification to follow node 2
[2021-10-29 01:59:48] [INFO] switching to primary monitoring mode
[2021-10-29 01:59:48] [NOTICE] monitoring cluster primary "db02" (ID: 2)
[2021-10-29 01:59:53] [NOTICE] new witness "db04" (ID: 4) has connected
[2021-10-29 01:59:53] [NOTICE] new standby "db03" (ID: 3) has connected
[2021-10-29 02:04:48] [INFO] monitoring primary node "db02" (ID: 2) in normal state
db03 log:
[2021-10-29 01:59:46] [WARNING] unable to reconnect to node "db01" (ID: 1) after 6 attempts
[2021-10-29 01:59:46] [INFO] 2 active sibling nodes registered
[2021-10-29 01:59:46] [INFO] 4 total nodes registered
[2021-10-29 01:59:46] [INFO] primary node  "db01" (ID: 1) and this node have the same location ("default")
[2021-10-29 01:59:46] [INFO] local node's last receive lsn: 0/1A050250
[2021-10-29 01:59:46] [INFO] checking state of sibling node "db02" (ID: 2)
[2021-10-29 01:59:46] [INFO] node "db02" (ID: 2) reports its upstream is node 1, last seen 65 second(s) ago
[2021-10-29 01:59:46] [INFO] standby node "db02" (ID: 2) last saw primary node 65 second(s) ago
[2021-10-29 01:59:46] [INFO] last receive LSN for sibling node "db02" (ID: 2) is: 0/1A050250
[2021-10-29 01:59:46] [INFO] node "db02" (ID: 2) has same LSN as current candidate "db03" (ID: 3)
[2021-10-29 01:59:46] [INFO] node "db02" (ID: 2) has higher priority (300) than current candidate "db03" (ID: 3) (100)
[2021-10-29 01:59:46] [INFO] checking state of sibling node "db04" (ID: 4)
[2021-10-29 01:59:46] [INFO] node "db04" (ID: 4) reports its upstream is node 1, last seen 61 second(s) ago
[2021-10-29 01:59:46] [INFO] witness node "db04" (ID: 4) last saw primary node 61 second(s) ago
[2021-10-29 01:59:46] [INFO] visible nodes: 3; total nodes: 3; no nodes have seen the primary within the last 10 seconds
[2021-10-29 01:59:46] [NOTICE] promotion candidate is "db02" (ID: 2)
[2021-10-29 01:59:46] [INFO] follower node awaiting notification from a candidate node
[2021-10-29 01:59:49] [NOTICE] attempting to follow new primary "db02" (node ID: 2)
[2021-10-29 01:59:49] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"

[2021-10-29 01:59:49] [INFO] local node 3 can attach to follow target node 2
[2021-10-29 01:59:49] [DETAIL] local node's recovery point: 0/1A050250; follow target node's fork point: 0/1A050250
[2021-10-29 01:59:49] [NOTICE] setting node 3's upstream to node 2
[2021-10-29 01:59:49] [WARNING] node "db03" not found in "pg_stat_replication"
[2021-10-29 01:59:50] [NOTICE] STANDBY FOLLOW successful
[2021-10-29 01:59:50] [DETAIL] standby attached to upstream node "db02" (ID: 2)
INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2021-10-29 01:59:50] [NOTICE] node "db03" (ID: 3) now following new upstream node "db02" (ID: 2)
[2021-10-29 01:59:50] [INFO] resuming standby monitoring mode
[2021-10-29 01:59:50] [DETAIL] following new primary "db02" (ID: 2)
[2021-10-29 02:04:50] [INFO] node "db03" (ID: 3) monitoring upstream node "db02" (ID: 2) in normal state
[2021-10-29 02:04:50] [DETAIL] last monitoring statistics update was 5 seconds ago
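Both logs contain the line `visible nodes: 3; total nodes: 3; no nodes have seen the primary within the last 10 seconds`: every sibling, including the witness db04 (61 seconds ago), last saw db01 well outside that window, so the failover proceeds. The sketch below models that check with the 10-second threshold taken from the log. Treating a recent sighting as a reason not to promote is how we read that log line, and the majority rule (at least half of the nodes visible) is an assumption about repmgrd's behavior; repmgr's actual rules, including related configuration options, may differ. `primary_check` is a hypothetical helper.

```shell
# Decide whether a failover may proceed, based on sibling reports.
# Usage: primary_check VISIBLE TOTAL THRESHOLD SECONDS_AGO...
# primary_check is a hypothetical helper; the majority rule is an assumption.
primary_check() {
    visible=$1; total=$2; threshold=$3
    shift 3
    if [ $((visible * 2)) -lt "$total" ]; then
        echo "no-majority"               # this node cannot see enough of the cluster
        return
    fi
    for seen_ago in "$@"; do
        if [ "$seen_ago" -lt "$threshold" ]; then
            echo "primary-still-seen"    # a sibling saw the primary recently
            return
        fi
    done
    echo "proceed"
}

# Values from the logs above: 3 of 3 nodes visible, the other standby saw the
# primary 65 s ago and witness db04 61 s ago; prints proceed.
primary_check 3 3 10 65 61
```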

 

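[Copyright notice] This article is original content by a Huawei Cloud Community user. Reprints must credit the source (Huawei Cloud Community), the article link, and the author; otherwise the author and the community reserve the right to pursue liability. If you find suspected plagiarism in this community, please report it by email with supporting evidence to cloudbbs@huaweicloud.com; once verified, the community will remove the infringing content immediately.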