repmgr-repmgrd: Automatic Failover

snowofsummer, posted 2021/12/30 08:40:05

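1. The repmgrd daemon can be used to perform automatic failover.

2. Set the location parameter in repmgr.conf to the same value on every node. If it is not set, every node falls back to the default, so the values still match (for example location='dc1'; the default value is 'default').

3. To run repmgrd, shared_preload_libraries='repmgr' must also be set in postgresql.conf.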

#### repmgr.conf settings:

failover=automatic

promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'

follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'

log_file='/tmp/repmgrd.log'

monitoring_history=true

monitor_interval_secs=5

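#monitoring_history=true (enables recording of monitoring history)

#monitor_interval_secs=5 (interval, in seconds, at which monitoring data is written)

#reconnect_attempts=10 (number of attempts to reconnect to the primary before failover; the default is 6)

#reconnect_interval=5 (retry the connection every 5 seconds)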
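
As a rough illustration of how the two reconnect parameters interact: repmgrd retries the unreachable primary reconnect_attempts times, waiting reconnect_interval seconds between tries, so the retry window before a failover decision is on the order of their product. The helper name `retry_window_secs` is hypothetical, and the estimate is an assumption that ignores connect_timeout and monitoring delays.

```shell
# Rough estimate (an assumption, not repmgr's exact timing): seconds spent
# retrying the primary before repmgrd gives up and considers failover.
retry_window_secs() {
    attempts=$1   # reconnect_attempts
    interval=$2   # reconnect_interval, in seconds
    echo $(( attempts * interval ))
}

retry_window_secs 10 5   # the commented example values above; prints 50
```

The failover logs later in this post show "6 of 6 attempts" and "sleeping up to 10 seconds", which corresponds to `retry_window_secs 6 10`, roughly one minute.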

#### Database parameter settings

postgresql.conf

shared_preload_libraries = 'repmgr'
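
Before restarting, it can help to confirm that the setting is really present and not commented out. The sketch below only inspects a single file; `has_repmgr_preload` is a hypothetical helper, and the path in the usage line is an assumption based on the data_directory used in this post. Running `SHOW shared_preload_libraries;` in psql after the restart remains the authoritative check.

```shell
# Succeeds if an uncommented shared_preload_libraries line in the given
# postgresql.conf mentions repmgr. The function name is hypothetical.
has_repmgr_preload() {
    conf=$1
    grep -E "^[[:space:]]*shared_preload_libraries[[:space:]]*=.*repmgr" "$conf" >/dev/null
}

# Usage (path is an assumption):
# has_repmgr_preload /data/db01/postgresql.conf && echo "repmgr is preloaded"
```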

#### Restart the database

repmgr node service --action=restart

##### Start repmgrd

repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid

# Stop

kill `cat /tmp/repmgrd.pid`

#### repmgrd start/stop commands:

repmgr daemon start

repmgr daemon stop

#/etc/repmgr.conf

repmgrd_service_start_command='/usr/local/postgresql/bin/repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid'

repmgrd_service_stop_command='kill `cat /tmp/repmgrd.pid`'
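
The stop command above trusts whatever is in the pid file. A slightly more defensive variant, sketched here under the assumption that the pid file holds a single numeric PID, refuses to act when the file is missing, malformed, or stale. `stop_by_pidfile` is a hypothetical helper and not part of repmgr.

```shell
# Hypothetical helper: stop the process named in a pid file, refusing to act
# when the file is missing, malformed, or stale.
stop_by_pidfile() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "pid file not found: $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) echo "invalid pid in $pidfile" >&2; return 1 ;;
    esac
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is not running (stale pid file)" >&2
        return 2
    fi
    kill "$pid"
}

# Usage: stop_by_pidfile /tmp/repmgrd.pid
```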

# Errors when these parameters are not set:

-bash-4.2$ repmgr daemon stop

ERROR: "repmgrd_service_stop_command" is not set

HINT: set "repmgrd_service_stop_command" in "repmgr.conf"

-bash-4.2$ repmgr daemon start

ERROR: "repmgrd_service_start_command" is not set

HINT: set "repmgrd_service_start_command" in "repmgr.conf"
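
Both errors come from keys that are absent in repmgr.conf. A small pre-flight check along the following lines could catch that before calling `repmgr daemon start`. `missing_keys` is a hypothetical helper; it only looks for uncommented `key=` lines and does not validate the values.

```shell
# Hypothetical helper: print each named key that has no uncommented
# "key=value" line in the given repmgr.conf; return non-zero if any is missing.
missing_keys() {
    conf=$1
    shift
    missing=0
    for key in "$@"; do
        if ! grep -E "^[[:space:]]*${key}[[:space:]]*=" "$conf" >/dev/null; then
            echo "$key"
            missing=1
        fi
    done
    return $missing
}

# Usage:
# missing_keys /etc/repmgr.conf failover promote_command follow_command \
#     repmgrd_service_start_command repmgrd_service_stop_command
```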

db01	192.168.5.200
db02	192.168.5.201
db03	192.168.5.202

########

repmgr.conf contents on each node:

[postgres@db01 ~]$ more /etc/repmgr.conf

node_id=1

node_name='db01'

conninfo='host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/db01'

pg_bindir='/usr/local/postgresql/bin'

failover=automatic

promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'

follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'

log_file='/tmp/repmgrd.log'

monitoring_history=true

monitor_interval_secs=5

-bash-4.2$ more /etc/repmgr.conf

node_id=2

node_name='db02'

conninfo='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/db01'

pg_bindir='/usr/local/postgresql/bin'

failover=automatic

promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'

follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'

log_file='/tmp/repmgrd.log'

monitoring_history=true

monitor_interval_secs=5

-bash-4.2$ more /etc/repmgr.conf

node_id=3

node_name='db03'

conninfo='host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/db01'

pg_bindir='/usr/local/postgresql/bin'

failover=automatic

promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'

follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'

log_file='/tmp/repmgrd.log'

monitoring_history=true

monitor_interval_secs=5
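
The three files differ only in node_id, node_name, and the host in conninfo. One way to avoid copy-and-paste slips between nodes is to generate them from a single template. The sketch below is an illustration only: `gen_repmgr_conf` is a hypothetical helper, and the shared settings are taken from the db02 and db03 files above.

```shell
# Hypothetical generator: print a repmgr.conf for one node. Only node_id,
# node_name and the conninfo host vary; the remaining values are the ones
# used in this post.
gen_repmgr_conf() {
    id=$1; name=$2; host=$3
    bin=/usr/local/postgresql/bin
    printf '%s\n' \
        "node_id=$id" \
        "node_name='$name'" \
        "conninfo='host=$host user=repmgr dbname=repmgr connect_timeout=2'" \
        "data_directory='/data/db01'" \
        "pg_bindir='$bin'" \
        "failover=automatic" \
        "promote_command='$bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'" \
        "follow_command='$bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'" \
        "log_file='/tmp/repmgrd.log'" \
        "monitoring_history=true" \
        "monitor_interval_secs=5"
}

# Usage: gen_repmgr_conf 2 db02 192.168.5.201 > /etc/repmgr.conf
```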

Two-node failure simulation with automatic failover:

# Status check:

#repmgrd -f /home/pg10/conf/db02.conf --pid-file /tmp/repmgrd.pid

-bash-4.2$ repmgr service status

 ID | Name | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen

----+------+---------+-----------+----------+---------+-------+---------+--------------------

1 | db01 | standby | running | db02 | running | 2884 | no | 2 second(s) ago

2 | db02 | primary | * running | | running | 16432 | no | n/a

# Stop the primary database on db02 (simulated failure):

db01 is automatically promoted to primary:

[2021-10-28 20:48:57] [INFO] checking state of node "db02" (ID: 2), 5 of 6 attempts

[2021-10-28 20:48:57] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.201 fallback_application_name=repmgr"

[2021-10-28 20:48:57] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"

[2021-10-28 20:48:57] [INFO] sleeping up to 10 seconds until next reconnection attempt

[2021-10-28 20:49:07] [INFO] checking state of node "db02" (ID: 2), 6 of 6 attempts

[2021-10-28 20:49:07] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.201 fallback_application_name=repmgr"

[2021-10-28 20:49:07] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"

[2021-10-28 20:49:07] [WARNING] unable to reconnect to node "db02" (ID: 2) after 6 attempts

[2021-10-28 20:49:07] [INFO] 0 active sibling nodes registered

[2021-10-28 20:49:07] [INFO] 2 total nodes registered

[2021-10-28 20:49:07] [INFO] primary node "db02" (ID: 2) and this node have the same location ("default")

[2021-10-28 20:49:07] [INFO] no other sibling nodes - we win by default

[2021-10-28 20:49:07] [NOTICE] this node is the only available candidate and will now promote itself

[2021-10-28 20:49:07] [INFO] promote_command is:

"/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"

[2021-10-28 20:49:07] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"

[2021-10-28 20:49:07] [NOTICE] promoting standby to primary

[2021-10-28 20:49:07] [DETAIL] promoting server "db01" (ID: 1) using pg_promote()

[2021-10-28 20:49:07] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

[2021-10-28 20:49:08] [NOTICE] STANDBY PROMOTE successful

[2021-10-28 20:49:08] [DETAIL] server "db01" (ID: 1) was successfully promoted to primary

[2021-10-28 20:49:08] [INFO] checking state of node 1, 1 of 6 attempts

[2021-10-28 20:49:08] [NOTICE] node 1 has recovered, reconnecting

[2021-10-28 20:49:08] [INFO] connection to node 1 succeeded

[2021-10-28 20:49:08] [INFO] original connection is still available

[2021-10-28 20:49:08] [INFO] 0 followers to notify

[2021-10-28 20:49:08] [INFO] switching to primary monitoring mode

[2021-10-28 20:49:08] [NOTICE] monitoring cluster primary "db01" (ID: 1)

-bash-4.2$ repmgr service status

 ID | Name | Role    | Status               | Upstream | repmgrd | PID   | Paused? | Upstream last seen

----+------+---------+----------------------+----------+---------+-------+---------+--------------------

1 | db01 | standby | ! running as primary | | running | 15737 | no | n/a

2 | db02 | primary | ? unreachable | ? | n/a | n/a | n/a | n/a

WARNING: following issues were detected

- node "db01" (ID: 1) is registered as standby but running as primary

- unable to connect to node "db02" (ID: 2)

- node "db02" (ID: 2) is registered as an active primary but is unreachable

HINT: execute with --verbose option to see connection error messages

-bash-4.2$ repmgr cluster show

 ID | Name | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+----------------------+----------+----------+----------+----------+----------------------------------------------------------------

1 | db01 | standby | ! running as primary | | default | 100 | 5 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2

2 | db02 | primary | ? unreachable | ? | default | 100 | | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected

- node "db01" (ID: 1) is registered as standby but running as primary

- unable to connect to node "db02" (ID: 2)

- node "db02" (ID: 2) is registered as an active primary but is unreachable

HINT: execute with --verbose option to see connection error messages
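
When this output is checked from a script, the Status column is the useful part. The filter below is a sketch that assumes the marker convention seen in this post's outputs: a status starting with `!`, `?`, or `-` indicates a problem, while `* running` and `running` are healthy. `flag_unhealthy` is a hypothetical name.

```shell
# Hypothetical filter: read `repmgr cluster show` rows on stdin and print
# "name: status" for each node whose Status column starts with !, ? or -.
flag_unhealthy() {
    awk -F'|' '
        NF >= 4 {
            name = $2; status = $4
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status ~ /^[!?-]/) print name ": " status
        }'
}

# Usage: repmgr cluster show | flag_unhealthy
```

Lines without `|` separators, such as the WARNING and HINT text, are ignored because they contain fewer than four fields.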

# Rejoin db02:

repmgr node rejoin -d 'host=192.168.5.200 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run

repmgr node rejoin -d 'host=192.168.5.200 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose

####

-bash-4.2$ repmgr node rejoin -d 'host=192.168.5.200 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose

INFO: looking for configuration file in /etc

INFO: configuration file found at: "/etc/repmgr.conf"

NOTICE: rejoin target is node "db01" (ID: 1)

INFO: prerequisites for using pg_rewind are met

INFO: 2 files copied to "/tmp/repmgr-config-archive-db02"

NOTICE: executing pg_rewind

DETAIL: pg_rewind command is "/usr/local/postgresql/bin/pg_rewind -D '/data/db01' --source-server='host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2'"

NOTICE: 2 files copied to /data/db01

INFO: directory "/tmp/repmgr-config-archive-db02" deleted

NOTICE: setting node 2's upstream to node 1

WARNING: unable to ping "host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2"

DETAIL: PQping() returned "PQPING_NO_RESPONSE"

NOTICE: starting server using "/usr/local/postgresql/bin/pg_ctl -w -D '/data/db01' start"

INFO: node "db02" (ID: 2) is pingable

INFO: node "db02" (ID: 2) has attached to its upstream node

NOTICE: NODE REJOIN successful

DETAIL: node 2 is now attached to node 1
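
The dry run and the real rejoin above can be chained so that the real run only happens when the dry run succeeds. This is a minimal sketch using the same options as the commands above; `safe_rejoin` is a hypothetical wrapper, and it assumes `repmgr` is on the PATH and that PostgreSQL on the failed node is already stopped.

```shell
# Hypothetical wrapper: run `repmgr node rejoin` with --dry-run first and
# only perform the real rejoin if the dry run exits successfully.
# $1 is the conninfo string of the current primary.
safe_rejoin() {
    target=$1
    set -- node rejoin -d "$target" --force-rewind \
        --config-files=postgresql.conf,postgresql.auto.conf --verbose
    repmgr "$@" --dry-run || { echo "dry run failed; not rejoining" >&2; return 1; }
    repmgr "$@"
}

# Usage, on the failed node:
# safe_rejoin 'host=192.168.5.200 dbname=repmgr user=repmgr'
```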

# Check cluster status:

-bash-4.2$ repmgr cluster show

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------

1 | db01 | primary | * running | | default | 100 | 5 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2

2 | db02 | standby | running | db01 | default | 100 | 4 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2

Three-node failure simulation with automatic failover:

# Cluster status

-bash-4.2$ repmgr cluster show

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------

1 | db01 | primary | * running | | default | 100 | 5 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2

2 | db02 | standby | running | db01 | default | 100 | 5 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2

3 | db03 | standby | running | db01 | default | 100 | 5 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2

-bash-4.2$ repmgr service status

 ID | Name | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen

----+------+---------+-----------+----------+---------+-------+---------+--------------------

1 | db01 | primary | * running | | running | 15737 | no | n/a

2 | db02 | standby | running | db01 | running | 1716 | no | 4 second(s) ago

3 | db03 | standby | running | db01 | running | 17700 | no | 4 second(s) ago

-bash-4.2$

# Shut down db01 (simulated failure)

[root@db01 ~]# init 0

# db02 log:

[2021-10-28 21:37:41] [INFO] checking state of node "db01" (ID: 1), 6 of 6 attempts

[2021-10-28 21:37:43] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.200 fallback_application_name=repmgr"

[2021-10-28 21:37:43] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"

[2021-10-28 21:37:43] [WARNING] unable to reconnect to node "db01" (ID: 1) after 6 attempts

[2021-10-28 21:37:43] [INFO] 1 active sibling nodes registered

[2021-10-28 21:37:43] [INFO] 3 total nodes registered

[2021-10-28 21:37:43] [INFO] primary node "db01" (ID: 1) and this node have the same location ("default")

[2021-10-28 21:37:43] [INFO] local node's last receive lsn: 0/13019878

[2021-10-28 21:37:43] [INFO] checking state of sibling node "db03" (ID: 3)

[2021-10-28 21:37:43] [INFO] node "db03" (ID: 3) reports its upstream is node 1, last seen 69 second(s) ago

[2021-10-28 21:37:43] [INFO] standby node "db03" (ID: 3) last saw primary node 69 second(s) ago

[2021-10-28 21:37:43] [INFO] last receive LSN for sibling node "db03" (ID: 3) is: 0/13019878

[2021-10-28 21:37:43] [INFO] node "db03" (ID: 3) has same LSN as current candidate "db02" (ID: 2)

[2021-10-28 21:37:43] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 10 seconds

[2021-10-28 21:37:43] [NOTICE] promotion candidate is "db02" (ID: 2)

[2021-10-28 21:37:43] [NOTICE] this node is the winner, will now promote itself and inform other nodes

[2021-10-28 21:37:43] [INFO] promote_command is:

"/usr/local/postgresql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"

[2021-10-28 21:37:43] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"

[2021-10-28 21:37:44] [WARNING] 1 sibling nodes found, but option "--siblings-follow" not specified

[2021-10-28 21:37:44] [DETAIL] these nodes will remain attached to the current primary:

db03 (node ID: 3)

[2021-10-28 21:37:44] [NOTICE] promoting standby to primary

[2021-10-28 21:37:44] [DETAIL] promoting server "db02" (ID: 2) using pg_promote()

[2021-10-28 21:37:44] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

[2021-10-28 21:37:45] [NOTICE] STANDBY PROMOTE successful

[2021-10-28 21:37:45] [DETAIL] server "db02" (ID: 2) was successfully promoted to primary

[2021-10-28 21:37:45] [INFO] checking state of node 2, 1 of 6 attempts

[2021-10-28 21:37:45] [NOTICE] node 2 has recovered, reconnecting

[2021-10-28 21:37:45] [INFO] connection to node 2 succeeded

[2021-10-28 21:37:45] [INFO] original connection is still available

[2021-10-28 21:37:45] [INFO] 1 followers to notify

[2021-10-28 21:37:45] [NOTICE] notifying node "db03" (ID: 3) to follow node 2

INFO: node 3 received notification to follow node 2

[2021-10-28 21:37:45] [INFO] switching to primary monitoring mode

[2021-10-28 21:37:45] [NOTICE] monitoring cluster primary "db02" (ID: 2)

[2021-10-28 21:37:50] [NOTICE] new standby "db03" (ID: 3) has connected

[2021-10-28 21:42:46] [INFO] monitoring primary node "db02" (ID: 2) in normal state

# db03 log:

[2021-10-28 21:37:41] [INFO] checking state of node "db01" (ID: 1), 6 of 6 attempts

[2021-10-28 21:37:43] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.5.200 fallback_application_name=repmgr"

[2021-10-28 21:37:43] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"

[2021-10-28 21:37:43] [WARNING] unable to reconnect to node "db01" (ID: 1) after 6 attempts

[2021-10-28 21:37:43] [INFO] 1 active sibling nodes registered

[2021-10-28 21:37:43] [INFO] 3 total nodes registered

[2021-10-28 21:37:43] [INFO] primary node "db01" (ID: 1) and this node have the same location ("default")

[2021-10-28 21:37:43] [INFO] local node's last receive lsn: 0/13019878

[2021-10-28 21:37:43] [INFO] checking state of sibling node "db02" (ID: 2)

[2021-10-28 21:37:43] [INFO] node "db02" (ID: 2) reports its upstream is node 1, last seen 68 second(s) ago

[2021-10-28 21:37:43] [INFO] standby node "db02" (ID: 2) last saw primary node 68 second(s) ago

[2021-10-28 21:37:43] [INFO] last receive LSN for sibling node "db02" (ID: 2) is: 0/13019878

[2021-10-28 21:37:43] [INFO] node "db02" (ID: 2) has same LSN as current candidate "db03" (ID: 3)

[2021-10-28 21:37:43] [INFO] node "db02" (ID: 2) has same priority but lower node_id than current candidate "db03" (ID: 3)

[2021-10-28 21:37:43] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 10 seconds

[2021-10-28 21:37:43] [NOTICE] promotion candidate is "db02" (ID: 2)

[2021-10-28 21:37:43] [INFO] follower node awaiting notification from a candidate node

2021-10-28 21:37:44.692 EDT [17737] FATAL: could not connect to the primary server: could not connect to server: No route to host

Is the server running on host "192.168.5.200" and accepting

TCP/IP connections on port 5432?

[2021-10-28 21:37:46] [NOTICE] attempting to follow new primary "db02" (node ID: 2)

[2021-10-28 21:37:46] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"

[2021-10-28 21:37:46] [INFO] local node 3 can attach to follow target node 2

[2021-10-28 21:37:46] [DETAIL] local node's recovery point: 0/13019878; follow target node's fork point: 0/13019878

[2021-10-28 21:37:46] [NOTICE] setting node 3's upstream to node 2

2021-10-28 21:37:46.718 EDT [17680] LOG: received SIGHUP, reloading configuration files

2021-10-28 21:37:46.719 EDT [17680] LOG: parameter "primary_conninfo" changed to "user=repmgr connect_timeout=2 host=192.168.5.201 application_name=db03"

2021-10-28 21:37:46.720 EDT [17681] LOG: WAL receiver process shutdown requested

2021-10-28 21:37:46.721 EDT [17741] FATAL: terminating walreceiver process due to administrator command

[2021-10-28 21:37:46] [WARNING] node "db03" attached in state "startup"

2021-10-28 21:37:46.726 EDT [17749] LOG: fetching timeline history file for timeline 6 from primary server

2021-10-28 21:37:46.727 EDT [17749] LOG: started streaming WAL from primary at 0/13000000 on timeline 5

2021-10-28 21:37:46.727 EDT [17749] LOG: replication terminated by primary server

2021-10-28 21:37:46.727 EDT [17749] DETAIL: End of WAL reached on timeline 5 at 0/13019878.

2021-10-28 21:37:46.728 EDT [17681] LOG: new target timeline is 6

2021-10-28 21:37:46.728 EDT [17749] LOG: restarted WAL streaming at 0/13000000 on timeline 6

[2021-10-28 21:37:47] [NOTICE] STANDBY FOLLOW successful

[2021-10-28 21:37:47] [DETAIL] standby attached to upstream node "db02" (ID: 2)

INFO: set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid

[2021-10-28 21:37:47] [NOTICE] node "db03" (ID: 3) now following new upstream node "db02" (ID: 2)

[2021-10-28 21:37:47] [INFO] resuming standby monitoring mode

[2021-10-28 21:37:47] [DETAIL] following new primary "db02" (ID: 2)

[2021-10-28 21:42:48] [INFO] node "db03" (ID: 3) monitoring upstream node "db02" (ID: 2) in normal state

[2021-10-28 21:42:48] [DETAIL] last monitoring statistics update was 5 seconds ago
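
The db02 and db03 logs above suggest how the standbys agree on a candidate: they compare last receive LSN, then priority, and finally node_id, where the lower ID wins (db02 over db03). The sketch below mimics that ordering. It is inferred from these logs and is not repmgr's actual implementation; `lsn_to_int` and `pick_candidate` are hypothetical names, and "higher priority wins" is an assumption. Input is one `node_id lsn priority` line per node.

```shell
# Convert an LSN such as 0/13019878 (two hex halves) to a single integer.
lsn_to_int() {
    hi=${1%/*}
    lo=${1#*/}
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Read "node_id lsn priority" lines on stdin and print the winning node_id:
# highest LSN first, then highest priority (assumed), then lowest node_id.
pick_candidate() {
    while read -r id lsn prio; do
        [ -n "$id" ] || continue
        echo "$(lsn_to_int "$lsn") $prio $id"
    done | sort -k1,1nr -k2,2nr -k3,3n | head -n 1 | awk '{print $3}'
}

# Values from the logs above: both standbys at 0/13019878 with priority 100.
printf '2 0/13019878 100\n3 0/13019878 100\n' | pick_candidate   # prints 2
```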

# Status after automatic failover:

-bash-4.2$ repmgr cluster show

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------

1 | db01 | primary | - failed | ? | default | 100 | | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2

2 | db02 | primary | * running | | default | 100 | 6 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2

3 | db03 | standby | running | db02 | default | 100 | 6 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected

- unable to connect to node "db01" (ID: 1)

HINT: execute with --verbose option to see connection error messages

Rejoin db01 after it recovers:

repmgr node rejoin -d 'host=192.168.5.201 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run

repmgr node rejoin -d 'host=192.168.5.201 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose

# Rejoin log:

[postgres@db01 ~]$ repmgr node rejoin -d 'host=192.168.5.201 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run

INFO: looking for configuration file in /etc

INFO: configuration file found at: "/etc/repmgr.conf"

NOTICE: rejoin target is node "db02" (ID: 2)

INFO: replication connection to the rejoin target node was successful

INFO: local and rejoin target system identifiers match

DETAIL: system identifier is 7024014994509133506

NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2

DETAIL: rejoin target server's timeline 6 forked off current database system timeline 5 before current recovery point 0/14000028

INFO: prerequisites for using pg_rewind are met

INFO: temporary archive directory "/tmp/repmgr-config-archive-db01" created

INFO: file "postgresql.conf" would be copied to "/tmp/repmgr-config-archive-db01/postgresql.conf"

INFO: file "postgresql.auto.conf" would be copied to "/tmp/repmgr-config-archive-db01/postgresql.auto.conf"

INFO: 2 files would have been copied to "/tmp/repmgr-config-archive-db01"

INFO: temporary archive directory "/tmp/repmgr-config-archive-db01" deleted

INFO: pg_rewind would now be executed

DETAIL: pg_rewind command is:

/usr/local/postgresql/bin/pg_rewind -D '/data/db01' --source-server='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'

INFO: prerequisites for executing NODE REJOIN are met

[postgres@db01 ~]$ repmgr node rejoin -d 'host=192.168.5.201 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose

INFO: looking for configuration file in /etc

INFO: configuration file found at: "/etc/repmgr.conf"

NOTICE: rejoin target is node "db02" (ID: 2)

NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2

DETAIL: rejoin target server's timeline 6 forked off current database system timeline 5 before current recovery point 0/14000028

INFO: prerequisites for using pg_rewind are met

INFO: 2 files copied to "/tmp/repmgr-config-archive-db01"

NOTICE: executing pg_rewind

DETAIL: pg_rewind command is "/usr/local/postgresql/bin/pg_rewind -D '/data/db01' --source-server='host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2'"

NOTICE: 2 files copied to /data/db01

INFO: directory "/tmp/repmgr-config-archive-db01" deleted

NOTICE: setting node 1's upstream to node 2

WARNING: unable to ping "host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2"

DETAIL: PQping() returned "PQPING_NO_RESPONSE"

NOTICE: starting server using "/usr/local/postgresql/bin/pg_ctl -w -D '/data/db01' start"

INFO: node "db01" (ID: 1) is pingable

INFO: node "db01" (ID: 1) has attached to its upstream node

NOTICE: NODE REJOIN successful

DETAIL: node 1 is now attached to node 2

# After the rejoin, the database is started automatically:

[postgres@db01 ~]$ repmgr cluster show

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------

1 | db01 | standby | running | db02 | default | 100 | 5 | host=192.168.5.200 user=repmgr dbname=repmgr connect_timeout=2

2 | db02 | primary | * running | | default | 100 | 6 | host=192.168.5.201 user=repmgr dbname=repmgr connect_timeout=2

3 | db03 | standby | running | db02 | default | 100 | 6 | host=192.168.5.202 user=repmgr dbname=repmgr connect_timeout=2

[postgres@db01 ~]$ repmgr service status

 ID | Name | Role    | Status    | Upstream | repmgrd     | PID   | Paused? | Upstream last seen

----+------+---------+-----------+----------+-------------+-------+---------+--------------------

1 | db01 | standby | running | db02 | not running | n/a | n/a | n/a

2 | db02 | primary | * running | | running | 1716 | no | n/a

3 | db03 | standby | running | db02 | running | 17700 | no | 1 second(s) ago

Start repmgrd on db01:

[postgres@db01 ~]$ repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid

[2021-10-28 22:02:54] [NOTICE] redirecting logging output to "/tmp/repmgrd.log"

[postgres@db01 ~]$ repmgr service status

 ID | Name | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen

----+------+---------+-----------+----------+---------+-------+---------+--------------------

1 | db01 | standby | running | db02 | running | 1619 | no | 2 second(s) ago

2 | db02 | primary | * running | | running | 1716 | no | n/a

3 | db03 | standby | running | db02 | running | 17700 | no | 3 second(s) ago
