repmgr-repmgrd: automatic failover

Posted by snowofsummer on 2021/12/30 08:40:05
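1. The repmgrd daemon can be used to perform automatic failover.
2. Set the location parameter in repmgr.conf to the same value on every node. If it is not set, all nodes share the same default anyway (for example location='dc1'; the default value is 'default').
3. To run repmgrd, shared_preload_libraries='repmgr' must also be set in postgresql.conf.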

#### repmgr.conf configuration:



failover=automatic
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/tmp/repmgrd.log'
monitoring_history=true
monitor_interval_secs=5

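#monitoring_history=true (enables writing of monitoring data)
#monitor_interval_secs=5 (interval, in seconds, at which monitoring data is written)
#reconnect_attempts=10 (number of attempts to reconnect to the primary before failing over; the default is 6)
#reconnect_interval=5 (wait 5 seconds between reconnection attempts; the default is 10)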
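
As a rough illustration of how these two parameters interact, the time repmgrd spends retrying the primary before it starts a failover is on the order of reconnect_attempts multiplied by reconnect_interval. The sketch below only computes that upper-bound estimate; it ignores connect_timeout and the fact that no sleep follows the last attempt, and the function name `failover_window` is made up for this example.

```shell
#!/bin/sh
# Estimate, in seconds, how long repmgrd keeps retrying an unreachable
# primary before failing over.
# Assumption: window ~= reconnect_attempts * reconnect_interval.
failover_window() {
    local attempts="${1:-6}"    # reconnect_attempts (default 6)
    local interval="${2:-10}"   # reconnect_interval in seconds (default 10)
    echo $(( attempts * interval ))
}

failover_window        # with the defaults: prints 60
failover_window 10 5   # with the commented example values: prints 50
```

With the defaults this matches the logs further down, where repmgrd makes 6 attempts and sleeps up to 10 seconds between them.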

#### Database parameter configuration
postgresql.conf 
shared_preload_libraries = 'repmgr'
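
Before restarting, it can be worth confirming that the setting is really active in the file. The following is a minimal sketch that only inspects one file with grep; the function name `has_repmgr_preload` is hypothetical, and the check does not account for values overridden in postgresql.auto.conf or in included files.

```shell
#!/bin/sh
# Return 0 if the given postgresql.conf contains an active (uncommented)
# shared_preload_libraries line that mentions repmgr, non-zero otherwise.
has_repmgr_preload() {
    local conf_file="$1"
    grep -Eq "^[[:space:]]*shared_preload_libraries[[:space:]]*=.*repmgr" "$conf_file"
}

# Demo against a throwaway file so the sketch runs without PostgreSQL installed.
demo_conf="$(mktemp)"
printf '%s\n' "shared_preload_libraries = 'repmgr'" > "$demo_conf"
if has_repmgr_preload "$demo_conf"; then
    echo "repmgr is preloaded"
else
    echo "repmgr is NOT preloaded"
fi
rm -f "$demo_conf"
```

On a running server, `psql -c "SHOW shared_preload_libraries"` gives the value PostgreSQL actually loaded, which is the more reliable check after the restart.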


#### Restart the database
repmgr node service --action=restart

#### Start repmgrd
repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid
# Stop repmgrd
kill `cat /tmp/repmgrd.pid`
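
Because the daemon is started and stopped through its pid file here, the same file can be used to check whether repmgrd is still alive. This is a small sketch using `kill -0`, which sends no signal and only tests whether the process exists; the function name `repmgrd_running` is an assumption for illustration, and the check reports "not running" if the process belongs to another user.

```shell
#!/bin/sh
# Return 0 if the process recorded in the pid file is alive and can be
# signalled by the current user, non-zero otherwise.
repmgrd_running() {
    local pid_file="${1:-/tmp/repmgrd.pid}" pid
    [ -r "$pid_file" ] || return 1
    pid="$(cat "$pid_file")"
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

if repmgrd_running /tmp/repmgrd.pid; then
    echo "repmgrd is running"
else
    echo "repmgrd is not running"
fi
```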
#### repmgrd start/stop commands:
repmgr daemon start
repmgr daemon stop
# In /etc/repmgr.conf:
repmgrd_service_start_command='/usr/local/postgresql/bin/repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid'
repmgrd_service_stop_command='kill `cat /tmp/repmgrd.pid`'

# Errors reported when these parameters are not set:
-bash-4.2$ repmgr daemon stop
ERROR: "repmgrd_service_stop_command" is not set
HINT: set "repmgrd_service_stop_command" in "repmgr.conf"
-bash-4.2$ repmgr daemon start
ERROR: "repmgrd_service_start_command" is not set
HINT: set "repmgrd_service_start_command" in "repmgr.conf"
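
To catch this before running `repmgr daemon start` or `repmgr daemon stop`, the configuration file can be checked for both parameters. The sketch below is one possible way to do that with grep; `missing_service_commands` is a hypothetical helper name, and it only looks for uncommented assignments without validating the commands themselves.

```shell
#!/bin/sh
# Print the name of each repmgrd service command parameter that has no
# uncommented assignment in the given repmgr.conf.
missing_service_commands() {
    local conf_file="$1" param
    for param in repmgrd_service_start_command repmgrd_service_stop_command; do
        if ! grep -Eq "^[[:space:]]*${param}[[:space:]]*=" "$conf_file"; then
            echo "$param"
        fi
    done
}

# Demo against a throwaway file that sets only the start command.
demo_conf="$(mktemp)"
printf '%s\n' "repmgrd_service_start_command='repmgrd -f /etc/repmgr.conf'" > "$demo_conf"
missing_service_commands "$demo_conf"   # prints: repmgrd_service_stop_command
rm -f "$demo_conf"
```

An empty result means both parameters are present, for example `missing_service_commands /etc/repmgr.conf`.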



db01
192.168.5.200

db02
192.168.5.201

db03
192.168.5.202


########
repmgr.conf contents on each node:

[postgres@db01 ~]$ more /etc/repmgr.conf
node_id=1
node_name='db01'
conninfo='host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/db01'
pg_bindir='/usr/local/postgresql/bin'
failover=automatic
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/tmp/repmgrd.log'
monitoring_history=true
monitor_interval_secs=5

-bash-4.2$ more /etc/repmgr.conf
node_id=2
node_name='db02'
conninfo='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/db01'
pg_bindir='/usr/local/postgresql/bin'
failover=automatic
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/tmp/repmgrd.log'
monitoring_history=true
monitor_interval_secs=5


-bash-4.2$ more /etc/repmgr.conf
node_id=3
node_name='db03'
conninfo='host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/db01'
pg_bindir='/usr/local/postgresql/bin'
failover=automatic
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/tmp/repmgrd.log'
monitoring_history=true
monitor_interval_secs=5








Simulating automatic failover with 2 nodes:
# Status check:
#repmgrd -f /home/pg10/conf/db02.conf --pid-file /tmp/repmgrd.pid
-bash-4.2$ repmgr  service  status

ID | Name | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1  | db01 | standby |   running | db02     | running | 2884  | no      | 2 second(s) ago
2  | db02 | primary | * running |          | running | 16432 | no      | n/a

# Stop the database on the primary db02 (simulated failure):

db01 is automatically promoted to primary:
[2021-10-28 20:48:57] [INFO] checking state of node "db02" (ID: 2), 5 of 6 attempts
[2021-10-28 20:48:57] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.201 fallback_application_name=repmgr"
[2021-10-28 20:48:57] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-28 20:48:57] [INFO] sleeping up to 10 seconds until next reconnection attempt
[2021-10-28 20:49:07] [INFO] checking state of node "db02" (ID: 2), 6 of 6 attempts
[2021-10-28 20:49:07] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.201 fallback_application_name=repmgr"
[2021-10-28 20:49:07] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-28 20:49:07] [WARNING] unable to reconnect to node "db02" (ID: 2) after 6 attempts
[2021-10-28 20:49:07] [INFO] 0 active sibling nodes registered
[2021-10-28 20:49:07] [INFO] 2 total nodes registered
[2021-10-28 20:49:07] [INFO] primary node  "db02" (ID: 2) and this node have the same location ("default")
[2021-10-28 20:49:07] [INFO] no other sibling nodes - we win by default
[2021-10-28 20:49:07] [NOTICE] this node is the only available candidate and will now promote itself
[2021-10-28 20:49:07] [INFO] promote_command is:
  "/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2021-10-28 20:49:07] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"

[2021-10-28 20:49:07] [NOTICE] promoting standby to primary
[2021-10-28 20:49:07] [DETAIL] promoting server "db01" (ID: 1) using pg_promote()
[2021-10-28 20:49:07] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2021-10-28 20:49:08] [NOTICE] STANDBY PROMOTE successful
[2021-10-28 20:49:08] [DETAIL] server "db01" (ID: 1) was successfully promoted to primary
[2021-10-28 20:49:08] [INFO] checking state of node 1, 1 of 6 attempts
[2021-10-28 20:49:08] [NOTICE] node 1 has recovered, reconnecting
[2021-10-28 20:49:08] [INFO] connection to node 1 succeeded
[2021-10-28 20:49:08] [INFO] original connection is still available
[2021-10-28 20:49:08] [INFO] 0 followers to notify
[2021-10-28 20:49:08] [INFO] switching to primary monitoring mode
[2021-10-28 20:49:08] [NOTICE] monitoring cluster primary "db01" (ID: 1)

-bash-4.2$ repmgr  service  status
ID | Name | Role    | Status               | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+------+---------+----------------------+----------+---------+-------+---------+--------------------
1  | db01 | standby | ! running as primary |          | running | 15737 | no      | n/a
2  | db02 | primary | ? unreachable        | ?        | n/a     | n/a   | n/a     | n/a


WARNING: following issues were detected
  - node "db01" (ID: 1) is registered as standby but running as primary
  - unable to  connect to node "db02" (ID: 2)
  - node "db02" (ID: 2) is registered as an active primary but is unreachable
HINT: execute with --verbose option to see connection error messages

-bash-4.2$ repmgr  cluster show
ID | Name | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string                                     
----+------+---------+----------------------+----------+----------+----------+----------+----------------------------------------------------------------
1  | db01 | standby | ! running as primary |          | default  | 100      | 5        | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2  | db02 | primary | ? unreachable        | ?        | default  | 100      |          | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2


WARNING: following issues were detected
  - node "db01" (ID: 1) is registered as standby but running as primary
  - unable to connect to node "db02" (ID: 2)
  - node "db02" (ID: 2) is registered as an active primary but is unreachable
HINT: execute with --verbose option to see connection error messages

# Rejoin db02 (run on db02, pointing at the new primary db01):
repmgr node rejoin -d 'host=192.168.5.200 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run
repmgr node rejoin -d 'host=192.168.5.200 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose
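
The two commands above can be chained so that the real rejoin only runs when the dry run succeeds. The following is a sketch under that assumption, not part of repmgr itself: `rejoin_node` is a hypothetical wrapper, and the `REPMGR` variable is an illustrative override that lets the flow be exercised with a stub instead of the real binary.

```shell
#!/bin/sh
# Command used to invoke repmgr; overridable for testing (hypothetical knob).
REPMGR="${REPMGR:-repmgr}"

# Rejoin the local, stopped node to the cluster. The real rejoin is only
# attempted when the --dry-run pass exits successfully.
rejoin_node() {
    local primary_conninfo="$1"
    # Reuse the function's positional parameters as the shared argument list.
    set -- node rejoin -d "$primary_conninfo" --force-rewind \
        --config-files=postgresql.conf,postgresql.auto.conf --verbose
    "$REPMGR" "$@" --dry-run || {
        echo "dry run failed, not rejoining" >&2
        return 1
    }
    "$REPMGR" "$@"
}

# Example, run on the failed node with PostgreSQL stopped:
#   rejoin_node 'host=192.168.5.200 dbname=repmgr user=repmgr'
```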
####
-bash-4.2$ repmgr node rejoin -d 'host=192.168.5.200 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose
INFO: looking for configuration file in /etc
INFO: configuration file found at: "/etc/repmgr.conf"
NOTICE: rejoin target is node "db01" (ID: 1)
INFO: prerequisites for using pg_rewind are met
INFO: 2 files copied to "/tmp/repmgr-config-archive-db02"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/usr/local/postgresql/bin/pg_rewind -D '/data/db01' --source-server='host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2'"
NOTICE: 2 files copied to /data/db01
INFO: directory "/tmp/repmgr-config-archive-db02" deleted
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/usr/local/postgresql/bin/pg_ctl  -w -D '/data/db01' start"
INFO: node "db02" (ID: 2) is pingable
INFO: node "db02" (ID: 2) has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 1

# Check the cluster status:
-bash-4.2$ repmgr cluster show
ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1  | db01 | primary | * running |          | default  | 100      | 5        | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2  | db02 | standby |   running | db01     | default  | 100      | 4        | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
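
For scripting, the table printed by `repmgr cluster show` can be filtered to find the node that is both registered as primary and running. The sketch below parses the human-readable layout shown in this post with awk, so it is only as stable as that layout; `running_primary` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `repmgr cluster show` table output on stdin and print the name of
# each node whose Role is "primary" and whose Status is "* running".
running_primary() {
    awk -F'|' '$3 ~ /primary/ && $4 ~ /[*] running/ { gsub(/ /, "", $2); print $2 }'
}

# Demo with the two data rows from the output above.
running_primary <<'EOF'
1  | db01 | primary | * running |          | default  | 100      | 5        | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2  | db02 | standby |   running | db01     | default  | 100      | 4        | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
EOF
# prints: db01
```

In real use this would be `repmgr cluster show | running_primary`. It prints nothing for the intermediate state shown earlier, where db01 was listed as "standby" with status "! running as primary" and db02 as an unreachable primary.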




Simulating automatic failover with 3 nodes:

# Cluster status
-bash-4.2$ repmgr cluster show
ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1  | db01 | primary | * running |          | default  | 100      | 5        | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2  | db02 | standby |   running | db01     | default  | 100      | 5        | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3  | db03 | standby |   running | db01     | default  | 100      | 5        | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2

-bash-4.2$ repmgr service status
ID | Name | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1  | db01 | primary | * running |          | running | 15737 | no      | n/a
2  | db02 | standby |   running | db01     | running | 1716  | no      | 4 second(s) ago
3  | db03 | standby |   running | db01     | running | 17700 | no      | 4 second(s) ago
-bash-4.2$


# Shut down db01 (simulated failure)
[root@db01 ~]# init 0

# db02 log:
[2021-10-28 21:37:41] [INFO] checking state of node "db01" (ID: 1), 6 of 6 attempts
[2021-10-28 21:37:43] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.200 fallback_application_name=repmgr"
[2021-10-28 21:37:43] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-28 21:37:43] [WARNING] unable to reconnect to node "db01" (ID: 1) after 6 attempts
[2021-10-28 21:37:43] [INFO] 1 active sibling nodes registered
[2021-10-28 21:37:43] [INFO] 3 total nodes registered
[2021-10-28 21:37:43] [INFO] primary node  "db01" (ID: 1) and this node have the same location ("default")
[2021-10-28 21:37:43] [INFO] local node's last receive lsn: 0/13019878
[2021-10-28 21:37:43] [INFO] checking state of sibling node "db03" (ID: 3)
[2021-10-28 21:37:43] [INFO] node "db03" (ID: 3) reports its upstream is node 1, last seen 69 second(s) ago
[2021-10-28 21:37:43] [INFO] standby node "db03" (ID: 3) last saw primary node 69 second(s) ago
[2021-10-28 21:37:43] [INFO] last receive LSN for sibling node "db03" (ID: 3) is: 0/13019878
[2021-10-28 21:37:43] [INFO] node "db03" (ID: 3) has same LSN as current candidate "db02" (ID: 2)
[2021-10-28 21:37:43] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 10 seconds
[2021-10-28 21:37:43] [NOTICE] promotion candidate is "db02" (ID: 2)
[2021-10-28 21:37:43] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2021-10-28 21:37:43] [INFO] promote_command is:
  "/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"

[2021-10-28 21:37:43] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"
[2021-10-28 21:37:44] [WARNING] 1 sibling nodes found, but option "--siblings-follow" not specified
[2021-10-28 21:37:44] [DETAIL] these nodes will remain attached to the current primary:
  db03 (node ID: 3)
[2021-10-28 21:37:44] [NOTICE] promoting standby to primary
[2021-10-28 21:37:44] [DETAIL] promoting server "db02" (ID: 2) using pg_promote()
[2021-10-28 21:37:44] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2021-10-28 21:37:45] [NOTICE] STANDBY PROMOTE successful
[2021-10-28 21:37:45] [DETAIL] server "db02" (ID: 2) was successfully promoted to primary
[2021-10-28 21:37:45] [INFO] checking state of node 2, 1 of 6 attempts
[2021-10-28 21:37:45] [NOTICE] node 2 has recovered, reconnecting
[2021-10-28 21:37:45] [INFO] connection to node 2 succeeded
[2021-10-28 21:37:45] [INFO] original connection is still available
[2021-10-28 21:37:45] [INFO] 1 followers to notify
[2021-10-28 21:37:45] [NOTICE] notifying node "db03" (ID: 3) to follow node 2
INFO:  node 3 received notification to follow node 2
[2021-10-28 21:37:45] [INFO] switching to primary monitoring mode
[2021-10-28 21:37:45] [NOTICE] monitoring cluster primary "db02" (ID: 2)
[2021-10-28 21:37:50] [NOTICE] new standby "db03" (ID: 3) has connected
[2021-10-28 21:42:46] [INFO] monitoring primary node "db02" (ID: 2) in normal state


# db03 log:
[2021-10-28 21:37:41] [INFO] checking state of node "db01" (ID: 1), 6 of 6 attempts
[2021-10-28 21:37:43] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.200 fallback_application_name=repmgr"
[2021-10-28 21:37:43] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-28 21:37:43] [WARNING] unable to reconnect to node "db01" (ID: 1) after 6 attempts
[2021-10-28 21:37:43] [INFO] 1 active sibling nodes registered
[2021-10-28 21:37:43] [INFO] 3 total nodes registered
[2021-10-28 21:37:43] [INFO] primary node  "db01" (ID: 1) and this node have the same location ("default")
[2021-10-28 21:37:43] [INFO] local node's last receive lsn: 0/13019878
[2021-10-28 21:37:43] [INFO] checking state of sibling node "db02" (ID: 2)
[2021-10-28 21:37:43] [INFO] node "db02" (ID: 2) reports its upstream is node 1, last seen 68 second(s) ago
[2021-10-28 21:37:43] [INFO] standby node "db02" (ID: 2) last saw primary node 68 second(s) ago
[2021-10-28 21:37:43] [INFO] last receive LSN for sibling node "db02" (ID: 2) is: 0/13019878
[2021-10-28 21:37:43] [INFO] node "db02" (ID: 2) has same LSN as current candidate "db03" (ID: 3)
[2021-10-28 21:37:43] [INFO] node "db02" (ID: 2) has same priority but lower node_id than current candidate "db03" (ID: 3)
[2021-10-28 21:37:43] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 10 seconds
[2021-10-28 21:37:43] [NOTICE] promotion candidate is "db02" (ID: 2)
[2021-10-28 21:37:43] [INFO] follower node awaiting notification from a candidate node
2021-10-28 21:37:44.692 EDT [17737] FATAL:  could not connect to the primary server: could not connect to server: No route to host
                Is the server running on host "192.168.5.200" and accepting
                TCP/IP connections on port 5432?
[2021-10-28 21:37:46] [NOTICE] attempting to follow new primary "db02" (node ID: 2)
[2021-10-28 21:37:46] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"

[2021-10-28 21:37:46] [INFO] local node 3 can attach to follow target node 2
[2021-10-28 21:37:46] [DETAIL] local node's recovery point: 0/13019878; follow target node's fork point: 0/13019878
[2021-10-28 21:37:46] [NOTICE] setting node 3's upstream to node 2
2021-10-28 21:37:46.718 EDT [17680] LOG:  received SIGHUP, reloading configuration files
2021-10-28 21:37:46.719 EDT [17680] LOG:  parameter "primary_conninfo" changed to "user=repmgr connect_timeout=2 host=192.168.5.201 application_name=db03"
2021-10-28 21:37:46.720 EDT [17681] LOG:  WAL receiver process shutdown requested
2021-10-28 21:37:46.721 EDT [17741] FATAL:  terminating walreceiver process due to administrator command
[2021-10-28 21:37:46] [WARNING] node "db03" attached in state "startup"
2021-10-28 21:37:46.726 EDT [17749] LOG:  fetching timeline history file for timeline 6 from primary server
2021-10-28 21:37:46.727 EDT [17749] LOG:  started streaming WAL from primary at 0/13000000 on timeline 5
2021-10-28 21:37:46.727 EDT [17749] LOG:  replication terminated by primary server
2021-10-28 21:37:46.727 EDT [17749] DETAIL:  End of WAL reached on timeline 5 at 0/13019878.
2021-10-28 21:37:46.728 EDT [17681] LOG:  new target timeline is 6
2021-10-28 21:37:46.728 EDT [17749] LOG:  restarted WAL streaming at 0/13000000 on timeline 6
[2021-10-28 21:37:47] [NOTICE] STANDBY FOLLOW successful
[2021-10-28 21:37:47] [DETAIL] standby attached to upstream node "db02" (ID: 2)
INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2021-10-28 21:37:47] [NOTICE] node "db03" (ID: 3) now following new upstream node "db02" (ID: 2)
[2021-10-28 21:37:47] [INFO] resuming standby monitoring mode
[2021-10-28 21:37:47] [DETAIL] following new primary "db02" (ID: 2)
[2021-10-28 21:42:48] [INFO] node "db03" (ID: 3) monitoring upstream node "db02" (ID: 2) in normal state
[2021-10-28 21:42:48] [DETAIL] last monitoring statistics update was 5 seconds ago


# Status after the automatic failover:
-bash-4.2$ repmgr cluster show
ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1  | db01 | primary | - failed  | ?        | default  | 100      |          | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2  | db02 | primary | * running |          | default  | 100      | 6        | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3  | db03 | standby |   running | db02     | default  | 100      | 6        | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "db01" (ID: 1)
HINT: execute with --verbose option to see connection error messages



Rejoining db01 after it has been recovered:

repmgr node rejoin -d 'host=192.168.5.201 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run
repmgr node rejoin -d 'host=192.168.5.201 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose
# Rejoin log:
[postgres@db01 ~]$ repmgr node rejoin -d 'host=192.168.5.201 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run
INFO: looking for configuration file in /etc
INFO: configuration file found at: "/etc/repmgr.conf"
NOTICE: rejoin target is node "db02" (ID: 2)
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 7024014994509133506
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 6 forked off current database system timeline 5 before current recovery point 0/14000028
INFO: prerequisites for using pg_rewind are met
INFO: temporary archive directory "/tmp/repmgr-config-archive-db01" created
INFO: file "postgresql.conf" would be copied to "/tmp/repmgr-config-archive-db01/postgresql.conf"
INFO: file "postgresql.auto.conf" would be copied to "/tmp/repmgr-config-archive-db01/postgresql.auto.conf"
INFO: 2 files would have been copied to "/tmp/repmgr-config-archive-db01"
INFO: temporary archive directory "/tmp/repmgr-config-archive-db01" deleted
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
  /usr/local/postgresql/bin/pg_rewind -D '/data/db01' --source-server='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'
INFO: prerequisites for executing NODE REJOIN are met
[postgres@db01 ~]$ repmgr node rejoin -d 'host=192.168.5.201 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose
INFO: looking for configuration file in /etc
INFO: configuration file found at: "/etc/repmgr.conf"
NOTICE: rejoin target is node "db02" (ID: 2)
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 6 forked off current database system timeline 5 before current recovery point 0/14000028
INFO: prerequisites for using pg_rewind are met
INFO: 2 files copied to "/tmp/repmgr-config-archive-db01"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/usr/local/postgresql/bin/pg_rewind -D '/data/db01' --source-server='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'"
NOTICE: 2 files copied to /data/db01
INFO: directory "/tmp/repmgr-config-archive-db01" deleted
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/usr/local/postgresql/bin/pg_ctl  -w -D '/data/db01' start"
INFO: node "db01" (ID: 1) is pingable
INFO: node "db01" (ID: 1) has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
# After the rejoin, the database has been started automatically:
[postgres@db01 ~]$ repmgr cluster show
ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1  | db01 | standby |   running | db02     | default  | 100      | 5        | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2
2  | db02 | primary | * running |          | default  | 100      | 6        | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2
3  | db03 | standby |   running | db02     | default  | 100      | 6        | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2
[postgres@db01 ~]$ repmgr service status
ID | Name | Role    | Status    | Upstream | repmgrd     | PID   | Paused? | Upstream last seen
----+------+---------+-----------+----------+-------------+-------+---------+--------------------
1  | db01 | standby |   running | db02     | not running | n/a   | n/a     | n/a
2  | db02 | primary | * running |          | running     | 1716  | no      | n/a
3  | db03 | standby |   running | db02     | running     | 17700 | no      | 1 second(s) ago
# Start repmgrd on db01:
[postgres@db01 ~]$ repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid
[2021-10-28 22:02:54] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"
[postgres@db01 ~]$ repmgr service status
ID | Name | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+-------+---------+--------------------
1  | db01 | standby |   running | db02     | running | 1619  | no      | 2 second(s) ago
2  | db02 | primary | * running |          | running | 1716  | no      | n/a
3  | db03 | standby |   running | db02     | running | 17700 | no      | 3 second(s) ago



