repmgr-repmgrd-自动故障切换
【摘要】 1,可以利用repmgrd进程实现自动的failover.2,repmgr.conf文件中将location参数设置为一致,不设置的话默认也是一致的。(location='dc1'/ (default is default)).3,同时启动repmgrd必须在postgres.conf配置文件中设置shared_preload_libraries=‘repmgr’#### repmgr.c...
1,可以利用repmgrd进程实现自动的failover.
2,repmgr.conf文件中将location参数设置为一致,不设置的话默认也是一致的。(location='dc1'/ (default is default)).
3,同时启动repmgrd必须在postgres.conf配置文件中设置shared_preload_libraries=‘repmgr’
#### repmgr.conf配置:
failover=automatic
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/tmp/repmgrd.log'
monitoring_history=true
monitor_interval_secs=5
#monitoring_history=true (启用监控参数)
#monitor_interval_secs=5(定义监视数据间隔写入时间参数)
#reconnect_attempts=10(故障转移之前,尝试重新连接主库次数(默认为6)参数)
#reconnect_interval=5(每间隔5s尝试重新连接一次参数)
####数据库参数配置
postgresql.conf
shared_preload_libraries = 'repmgr'
####重启数据库
repmgr node service --action=restart
#####启动repmgrd
repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid
#停止
kill `cat /tmp/repmgrd.pid`
####repmgrd启动停止命令:
repmgr daemon start
repmgr daemon stop
#/etc/repmgr.conf
repmgrd_service_start_command='/usr/local/postgresql/bin/repmgrd repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid'
repmgrd_service_stop_command=' kill `cat /tmp/repmgrd.pid`'
#未设置启动报错;
-bash-4.2$ repmgr daemon stop
ERROR: "repmgrd_service_stop_command" is not set
HINT: set "repmgrd_service_stop_command" in "repmgr.conf"
-bash-4.2$ repmgr daemon start
ERROR: "repmgrd_service_start_command" is not set
HINT: set "repmgrd_service_start_command" in "repmgr.conf"
db01
|
192.168.5.200
|
|
db02
|
192.168.5.201
|
|
db03
|
192.168.5.202
|
|
########
repmgr.conf配置文件信息:
[postgres@db01 ~]$ more /etc/repmgr.conf
node_id=1
node_name='db01'
conninfo='host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/db01'
pg_bindir='/usr/local/postgresql/bin'
failover=automatic
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/tmp/repmgrd.log'
monitoring_history=true
monitor_interval_secs=5
-bash-4.2$ more /etc/repmgr.conf
node_id=2
node_name='db02'
conninfo='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/db01'
pg_bindir='/usr/local/postgresql/bin'
failover=automatic
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/tmp/repmgrd.log'
monitoring_history=true
monitor_interval_secs=5
-bash-4.2$ more /etc/repmgr.conf
node_id=3
node_name='db03'
conninfo='host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/db01'
pg_bindir='/usr/local/postgresql/bin'
failover=automatic
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/tmp/repmgrd.log'
monitoring_history=true
monitor_interval_secs=5
2节点故障模拟自动切换:
#状态检查:
#repmgrd -f /home/pg10/conf/db02.conf --pid-file /tmp/repmgrd.pid
-bash-4.2$ repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1 | db01 | standby | running | db02 | running | 2884 | no | 2 second(s) ago
2 | db02 | primary | * running | | running | 16432 | no | n/a
#停止 primary db02数据库(模拟故障):
db01自动切换为primary:
[2021-10-28 20:48:57] [INFO] checking state of node "db02" (ID: 2), 5 of 6 attempts
[2021-10-28 20:48:57] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.201 fallback_application_name=repmgr"
[2021-10-28 20:48:57] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-28 20:48:57] [INFO] sleeping up to 10 seconds until next reconnection attempt
[2021-10-28 20:49:07] [INFO] checking state of node "db02" (ID: 2), 6 of 6 attempts
[2021-10-28 20:49:07] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.201 fallback_application_name=repmgr"
[2021-10-28 20:49:07] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-28 20:49:07] [WARNING] unable to reconnect to node "db02" (ID: 2) after 6 attempts
[2021-10-28 20:49:07] [INFO] 0 active sibling nodes registered
[2021-10-28 20:49:07] [INFO] 2 total nodes registered
[2021-10-28 20:49:07] [INFO] primary node "db02" (ID: 2) and this node have the same location ("default")
[2021-10-28 20:49:07] [INFO] no other sibling nodes - we win by default
[2021-10-28 20:49:07] [NOTICE] this node is the only available candidate and will now promote itself
[2021-10-28 20:49:07] [INFO] promote_command is:
"/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2021-10-28 20:49:07] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"
[2021-10-28 20:49:07] [NOTICE] promoting standby to primary
[2021-10-28 20:49:07] [DETAIL] promoting server "db01" (ID: 1) using pg_promote()
[2021-10-28 20:49:07] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2021-10-28 20:49:08] [NOTICE] STANDBY PROMOTE successful
[2021-10-28 20:49:08] [DETAIL] server "db01" (ID: 1) was successfully promoted to primary
[2021-10-28 20:49:08] [INFO] checking state of node 1, 1 of 6 attempts
[2021-10-28 20:49:08] [NOTICE] node 1 has recovered, reconnecting
[2021-10-28 20:49:08] [INFO] connection to node 1 succeeded
[2021-10-28 20:49:08] [INFO] original connection is still available
[2021-10-28 20:49:08] [INFO] 0 followers to notify
[2021-10-28 20:49:08] [INFO] switching to primary monitoring mode
[2021-10-28 20:49:08] [NOTICE] monitoring cluster primary "db01" (ID: 1)
-bash-4.2$ repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+------+---------+----------------------+----------+---------+-------+---------+--------------------
1 | db01 | standby | ! running as primary | | running | 15737 | no | n/a
2 | db02 | primary | ? unreachable | ? | n/a | n/a | n/a | n/a
WARNING: following issues were detected
- node "db01" (ID: 1) is registered as standby but running as primary
- unable to connect to node "db02" (ID: 2)
- node "db02" (ID: 2) is registered as an active primary but is unreachable
HINT: execute with --verbose option to see connection error messages
-bash-4.2$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+----------------------+----------+----------+----------+----------+----------------------------------------------------------------
1 | db01 | standby | ! running as primary | | default | 100 | 5 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2 | db02 | primary | ? unreachable | ? | default | 100 | | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- node "db01" (ID: 1) is registered as standby but running as primary
- unable to connect to node "db02" (ID: 2)
- node "db02" (ID: 2) is registered as an active primary but is unreachable
HINT: execute with --verbose option to see connection error messages
#db02重新加入:
repmgr node rejoin -d 'host=192.168.5.200 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run
repmgr node rejoin -d 'host=192.168.5.200 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose
####
-bash-4.2$ repmgr node rejoin -d 'host=192.168.5.200 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose
INFO: looking for configuration file in /etc
INFO: configuration file found at: "/etc/repmgr.conf"
NOTICE: rejoin target is node "db01" (ID: 1)
INFO: prerequisites for using pg_rewind are met
INFO: 2 files copied to "/tmp/repmgr-config-archive-db02"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/usr/local/postgresql/bin/pg_rewind -D '/data/db01' --source-server='host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2'"
NOTICE: 2 files copied to /data/db01
INFO: directory "/tmp/repmgr-config-archive-db02" deleted
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/usr/local/postgresql/bin/pg_ctl -w -D '/data/db01' start"
INFO: node "db02" (ID: 2) is pingable
INFO: node "db02" (ID: 2) has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 1
#检查集群状态:
-bash-4.2$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | db01 | primary | * running | | default | 100 | 5 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2 | db02 | standby | running | db01 | default | 100 | 4 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3节点故障模拟自动切换:
#集群状态
-bash-4.2$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | db01 | primary | * running | | default | 100 | 5 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2 | db02 | standby | running | db01 | default | 100 | 5 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3 | db03 | standby | running | db01 | default | 100 | 5 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
-bash-4.2$ repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1 | db01 | primary | * running | | running | 15737 | no | n/a
2 | db02 | standby | running | db01 | running | 1716 | no | 4 second(s) ago
3 | db03 | standby | running | db01 | running | 17700 | no | 4 second(s) ago
-bash-4.2$
#db01 关机(模拟故障)
[root@db01 ~]# init 0
#db02日志:
[2021-10-28 21:37:41] [INFO] checking state of node "db01" (ID: 1), 6 of 6 attempts
[2021-10-28 21:37:43] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.200 fallback_application_name=repmgr"
[2021-10-28 21:37:43] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-28 21:37:43] [WARNING] unable to reconnect to node "db01" (ID: 1) after 6 attempts
[2021-10-28 21:37:43] [INFO] 1 active sibling nodes registered
[2021-10-28 21:37:43] [INFO] 3 total nodes registered
[2021-10-28 21:37:43] [INFO] primary node "db01" (ID: 1) and this node have the same location ("default")
[2021-10-28 21:37:43] [INFO] local node's last receive lsn: 0/13019878
[2021-10-28 21:37:43] [INFO] checking state of sibling node "db03" (ID: 3)
[2021-10-28 21:37:43] [INFO] node "db03" (ID: 3) reports its upstream is node 1, last seen 69 second(s) ago
[2021-10-28 21:37:43] [INFO] standby node "db03" (ID: 3) last saw primary node 69 second(s) ago
[2021-10-28 21:37:43] [INFO] last receive LSN for sibling node "db03" (ID: 3) is: 0/13019878
[2021-10-28 21:37:43] [INFO] node "db03" (ID: 3) has same LSN as current candidate "db02" (ID: 2)
[2021-10-28 21:37:43] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 10 seconds
[2021-10-28 21:37:43] [NOTICE] promotion candidate is "db02" (ID: 2)
[2021-10-28 21:37:43] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2021-10-28 21:37:43] [INFO] promote_command is:
"/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2021-10-28 21:37:43] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"
[2021-10-28 21:37:44] [WARNING] 1 sibling nodes found, but option "--siblings-follow" not specified
[2021-10-28 21:37:44] [DETAIL] these nodes will remain attached to the current primary:
db03 (node ID: 3)
[2021-10-28 21:37:44] [NOTICE] promoting standby to primary
[2021-10-28 21:37:44] [DETAIL] promoting server "db02" (ID: 2) using pg_promote()
[2021-10-28 21:37:44] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2021-10-28 21:37:45] [NOTICE] STANDBY PROMOTE successful
[2021-10-28 21:37:45] [DETAIL] server "db02" (ID: 2) was successfully promoted to primary
[2021-10-28 21:37:45] [INFO] checking state of node 2, 1 of 6 attempts
[2021-10-28 21:37:45] [NOTICE] node 2 has recovered, reconnecting
[2021-10-28 21:37:45] [INFO] connection to node 2 succeeded
[2021-10-28 21:37:45] [INFO] original connection is still available
[2021-10-28 21:37:45] [INFO] 1 followers to notify
[2021-10-28 21:37:45] [NOTICE] notifying node "db03" (ID: 3) to follow node 2
INFO: node 3 received notification to follow node 2
[2021-10-28 21:37:45] [INFO] switching to primary monitoring mode
[2021-10-28 21:37:45] [NOTICE] monitoring cluster primary "db02" (ID: 2)
[2021-10-28 21:37:50] [NOTICE] new standby "db03" (ID: 3) has connected
[2021-10-28 21:42:46] [INFO] monitoring primary node "db02" (ID: 2) in normal state
#db03日志:
[2021-10-28 21:37:41] [INFO] checking state of node "db01" (ID: 1), 6 of 6 attempts
[2021-10-28 21:37:43] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.200 fallback_application_name=repmgr"
[2021-10-28 21:37:43] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-28 21:37:43] [WARNING] unable to reconnect to node "db01" (ID: 1) after 6 attempts
[2021-10-28 21:37:43] [INFO] 1 active sibling nodes registered
[2021-10-28 21:37:43] [INFO] 3 total nodes registered
[2021-10-28 21:37:43] [INFO] primary node "db01" (ID: 1) and this node have the same location ("default")
[2021-10-28 21:37:43] [INFO] local node's last receive lsn: 0/13019878
[2021-10-28 21:37:43] [INFO] checking state of sibling node "db02" (ID: 2)
[2021-10-28 21:37:43] [INFO] node "db02" (ID: 2) reports its upstream is node 1, last seen 68 second(s) ago
[2021-10-28 21:37:43] [INFO] standby node "db02" (ID: 2) last saw primary node 68 second(s) ago
[2021-10-28 21:37:43] [INFO] last receive LSN for sibling node "db02" (ID: 2) is: 0/13019878
[2021-10-28 21:37:43] [INFO] node "db02" (ID: 2) has same LSN as current candidate "db03" (ID: 3)
[2021-10-28 21:37:43] [INFO] node "db02" (ID: 2) has same priority but lower node_id than current candidate "db03" (ID: 3)
[2021-10-28 21:37:43] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 10 seconds
[2021-10-28 21:37:43] [NOTICE] promotion candidate is "db02" (ID: 2)
[2021-10-28 21:37:43] [INFO] follower node awaiting notification from a candidate node
2021-10-28 21:37:44.692 EDT [17737] FATAL: could not connect to the primary server: could not connect to server: No route to host
Is the server running on host "192.168.5.200" and accepting
TCP/IP connections on port 5432?
[2021-10-28 21:37:46] [NOTICE] attempting to follow new primary "db02" (node ID: 2)
[2021-10-28 21:37:46] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"
[2021-10-28 21:37:46] [INFO] local node 3 can attach to follow target node 2
[2021-10-28 21:37:46] [DETAIL] local node's recovery point: 0/13019878; follow target node's fork point: 0/13019878
[2021-10-28 21:37:46] [NOTICE] setting node 3's upstream to node 2
2021-10-28 21:37:46.718 EDT [17680] LOG: received SIGHUP, reloading configuration files
2021-10-28 21:37:46.719 EDT [17680] LOG: parameter "primary_conninfo" changed to "user=repmgr connect_timeout=2 host=192.168.5.201 application_name=db03"
2021-10-28 21:37:46.720 EDT [17681] LOG: WAL receiver process shutdown requested
2021-10-28 21:37:46.721 EDT [17741] FATAL: terminating walreceiver process due to administrator command
[2021-10-28 21:37:46] [WARNING] node "db03" attached in state "startup"
2021-10-28 21:37:46.726 EDT [17749] LOG: fetching timeline history file for timeline 6 from primary server
2021-10-28 21:37:46.727 EDT [17749] LOG: started streaming WAL from primary at 0/13000000 on timeline 5
2021-10-28 21:37:46.727 EDT [17749] LOG: replication terminated by primary server
2021-10-28 21:37:46.727 EDT [17749] DETAIL: End of WAL reached on timeline 5 at 0/13019878.
2021-10-28 21:37:46.728 EDT [17681] LOG: new target timeline is 6
2021-10-28 21:37:46.728 EDT [17749] LOG: restarted WAL streaming at 0/13000000 on timeline 6
[2021-10-28 21:37:47] [NOTICE] STANDBY FOLLOW successful
[2021-10-28 21:37:47] [DETAIL] standby attached to upstream node "db02" (ID: 2)
INFO: set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2021-10-28 21:37:47] [NOTICE] node "db03" (ID: 3) now following new upstream node "db02" (ID: 2)
[2021-10-28 21:37:47] [INFO] resuming standby monitoring mode
[2021-10-28 21:37:47] [DETAIL] following new primary "db02" (ID: 2)
[2021-10-28 21:42:48] [INFO] node "db03" (ID: 3) monitoring upstream node "db02" (ID: 2) in normal state
[2021-10-28 21:42:48] [DETAIL] last monitoring statistics update was 5 seconds ago
#自动切换之后状态:
-bash-4.2$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | db01 | primary | - failed | ? | default | 100 | | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2 | db02 | primary | * running | | default | 100 | 6 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3 | db03 | standby | running | db02 | default | 100 | 6 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "db01" (ID: 1)
HINT: execute with --verbose option to see connection error messages
db01恢复重新加入:
repmgr node rejoin -d 'host=192.168.5.201 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run
repmgr node rejoin -d 'host=192.168.5.201 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose
#加入日志:
[postgres@db01 ~]$ repmgr node rejoin -d 'host=192.168.5.201 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run
INFO: looking for configuration file in /etc
INFO: configuration file found at: "/etc/repmgr.conf"
NOTICE: rejoin target is node "db02" (ID: 2)
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 7024014994509133506
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 6 forked off current database system timeline 5 before current recovery point 0/14000028
INFO: prerequisites for using pg_rewind are met
INFO: temporary archive directory "/tmp/repmgr-config-archive-db01" created
INFO: file "postgresql.conf" would be copied to "/tmp/repmgr-config-archive-db01/postgresql.conf"
INFO: file "postgresql.auto.conf" would be copied to "/tmp/repmgr-config-archive-db01/postgresql.auto.conf"
INFO: 2 files would have been copied to "/tmp/repmgr-config-archive-db01"
INFO: temporary archive directory "/tmp/repmgr-config-archive-db01" deleted
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
/usr/local/postgresql/bin/pg_rewind -D '/data/db01' --source-server='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'
INFO: prerequisites for executing NODE REJOIN are met
[postgres@db01 ~]$ repmgr node rejoin -d 'host=192.168.5.201 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose
INFO: looking for configuration file in /etc
INFO: configuration file found at: "/etc/repmgr.conf"
NOTICE: rejoin target is node "db02" (ID: 2)
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 6 forked off current database system timeline 5 before current recovery point 0/14000028
INFO: prerequisites for using pg_rewind are met
INFO: 2 files copied to "/tmp/repmgr-config-archive-db01"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/usr/local/postgresql/bin/pg_rewind -D '/data/db01' --source-server='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'"
NOTICE: 2 files copied to /data/db01
INFO: directory "/tmp/repmgr-config-archive-db01" deleted
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/usr/local/postgresql/bin/pg_ctl -w -D '/data/db01' start"
INFO: node "db01" (ID: 1) is pingable
INFO: node "db01" (ID: 1) has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
#重新rejoin加入之后,数据库自动启动:
[postgres@db01 ~]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | db01 | standby | running | db02 | default | 100 | 5 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2 | db02 | primary | * running | | default | 100 | 6 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3 | db03 | standby | running | db02 | default | 100 | 6 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
[postgres@db01 ~]$ repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+------+---------+-----------+----------+-------------+-------+---------+--------------------
1 | db01 | standby | running | db02 | not running | n/a | n/a | n/a
2 | db02 | primary | * running | | running | 1716 | no | n/a
3 | db03 | standby | running | db02 | running | 17700 | no | 1 second(s) ago
db01启动repmgrd
[postgres@db01 ~]$ repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid
[2021-10-28 22:02:54] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"
[postgres@db01 ~]$ repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1 | db01 | standby | running | db02 | running | 1619 | no | 2 second(s) ago
2 | db02 | primary | * running | | running | 1716 | no | n/a
3 | db03 | standby | running | db02 | running | 17700 | no | 3 second(s) ago
参考:
【版权声明】本文为华为云社区用户原创内容,转载时必须标注文章的来源(华为云社区)、文章链接、文章作者等基本信息, 否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱:
cloudbbs@huaweicloud.com
- 点赞
- 收藏
- 关注作者
评论(0)