基于Pacemaker的PostgreSQL一主多从读负载均衡集群搭建

简介

PostgreSQL的HA方案有很多种，本文演示基于Pacemaker的PostgreSQL一主多从读负载均衡集群搭建。搭建过程并不是使用原始的Pacemaker pgsql RA脚本，而使用以下我修改和包装的脚本集pha4pgsql。

https://github.com/ChenHuajun/pha4pgsql

目标集群特性

秒级自动failover
failover零数据丢失(防脑裂)
支持在线主从切换
支持读写分离
支持读负载均衡
支持动态增加和删除只读节点

环境

OS:CentOS 7.3
节点1:node1(192.168.0.231)
节点2:node2(192.168.0.232)
节点2:node3(192.168.0.233)
writer_vip:192.168.0.236
reader_vip:192.168.0.237

依赖软件

pacemaker
corosync
pcs
ipvsadm

安装与配置

环境准备

所有节点设置时钟同步

cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
ntpdate time.windows.com && hwclock -w

所有节点设置独立的主机名(node1，node2，node3)
```
hostnamectl set-hostname node1 
```

设置对所有节点的域名解析

$ vi /etc/hosts
...
192.168.0.231 node1
192.168.0.232 node2
192.168.0.233 node3

在所有节点上禁用SELINUX

$ setenforce 0
$ vi /etc/selinux/config
...
SELINUX=disabled

在所有节点上禁用防火墙
```
systemctl disable firewalld.service
systemctl stop firewalld.service 
```
如果开启防火墙需要开放postgres，pcsd和corosync的端口。参考CentOS 7防火墙设置示例
- postgres:5432/tcp
- pcsd:2224/tcp
- corosync:5405/udp

安装和配置Pacemaker+Corosync集群软件

安装Pacemaker和Corosync及相关软件包

在所有节点执行：

yum install -y pacemaker corosync pcs ipvsadm

启用pcsd服务

在所有节点执行：

systemctl start pcsd.service
systemctl enable pcsd.service

设置hacluster用户密码

在所有节点执行：

echo hacluster | passwd hacluster --stdin

集群认证

在任何一个节点上执行:

pcs cluster auth -u hacluster -p hacluster node1 node2 node3

同步配置

在任何一个节点上执行:

pcs cluster setup --last_man_standing=1 --name pgcluster node1 node2 node3

启动集群

在任何一个节点上执行:

pcs cluster start --all

安装和配置PostgreSQL

安装PostgreSQL

安装9.2以上的PostgreSQL，本文通过PostgreSQL官方yum源安装CentOS 7.3对应的PostgreSQL 9.6

https://yum.postgresql.org/

在所有节点执行：

yum install -y https://yum.postgresql.org/9.6/redhat/rhel-7.3-x86_64/pgdg-centos96-9.6-3.noarch.rpm

yum install -y postgresql96 postgresql96-contrib postgresql96-libs postgresql96-server postgresql96-devel

ln -sf /usr/pgsql-9.6 /usr/pgsql
echo 'export PATH=/usr/pgsql/bin:$PATH' >>~postgres/.bash_profile

创建Master数据库

在node1节点执行：

创建数据目录

mkdir -p /pgsql/data
chown -R postgres:postgres /pgsql/
chmod 0700 /pgsql/data

初始化db
```
su - postgres
initdb -D /pgsql/data/ 
```

修改postgresql.conf

listen_addresses = '*'
wal_level = hot_standby
wal_log_hints = on
synchronous_commit = on
max_wal_senders=5
wal_keep_segments = 32
hot_standby = on
wal_sender_timeout = 5000
wal_receiver_status_interval = 2
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
restart_after_crash = off
hot_standby_feedback = on

注：设置"wal_log_hints = on"可以使用pg_rewind修复旧Master。

修改pg_hba.conf

local   all                 all                              trust
host    all                 all     192.168.0.0/24           md5
host    replication         all     192.168.0.0/24           md5

启动postgres
```
pg_ctl -D /pgsql/data/ start 
```
创建复制用户
```
createuser --login --replication replication -P -s 
```
注:加上“-s”选项可支持pg_rewind。

创建Slave数据库

在node2和node3节点执行：

创建数据目录

mkdir -p /pgsql/data
chown -R postgres:postgres /pgsql/
chmod 0700 /pgsql/data

创建基础备份

su - postgres
pg_basebackup -h node1 -U replication -D /pgsql/data/ -X stream -P

停止PostgreSQL服务

在node1上执行: pg_ctl -D /pgsql/data/ stop

安装和配置pha4pgsql

在任意一个节点上执行:

下载pha4pgsql

cd /opt
git clone git://github.com/Chenhuajun/pha4pgsql.git

拷贝config.ini

cd /opt/pha4pgsql
cp template/config_muti_with_lvs.ini.sample  config.ini

注:如果不需要配置基于LVS的负载均衡，可使用模板config_muti.ini.sample

修改config.ini

pcs_template=muti_with_lvs.pcs.template
OCF_ROOT=/usr/lib/ocf
RESOURCE_LIST="msPostgresql vip-master vip-slave"
pha4pgsql_dir=/opt/pha4pgsql
writer_vip=192.168.0.236
reader_vip=192.168.0.237
node1=node1
node2=node2
node3=node3
othernodes=""
vip_nic=ens37
vip_cidr_netmask=24
pgsql_pgctl=/usr/pgsql/bin/pg_ctl
pgsql_psql=/usr/pgsql/bin/psql
pgsql_pgdata=/pgsql/data
pgsql_pgport=5432
pgsql_restore_command=""
pgsql_rep_mode=sync
pgsql_repuser=replication
pgsql_reppassord=replication

安装pha4pgsql
```
sh install.sh
./setup.sh 
```
执行install.sh使用了scp拷贝文件，中途会多次要求输入其它节点的root账号。 install.sh执行会生成Pacemaker的配置脚本/opt/pha4pgsql/config.pcs，可以根据情况对其中的参数进行调优后再执行setup.sh。

设置环境变量

export PATH=/opt/pha4pgsql/bin:$PATH
echo 'export PATH=/opt/pha4pgsql/bin:$PATH' >>/root/.bash_profile

启动集群
```
cls_start 
```

确认集群状态

cls_status

cls_status的输出如下：

[root@node1 pha4pgsql]# cls_status
Stack: corosync
Current DC: node1 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Wed Jan 11 00:53:58 2017      Last change: Wed Jan 11 00:45:54 2017 by root via crm_attribute on node1

3 nodes and 9 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:

 vip-master (ocf::heartbeat:IPaddr2):   Started node1
 vip-slave  (ocf::heartbeat:IPaddr2):   Started node2
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 node3 ]
 lvsdr  (ocf::heartbeat:lvsdr): Started node2
 Clone Set: lvsdr-realsvr-clone [lvsdr-realsvr]
     Started: [ node2 node3 ]
     Stopped: [ node1 ]

Node Attributes:
* Node node1:
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 00000000050001B0
    + pgsql-status                      : PRI
* Node node2:
    + master-pgsql                      : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync
* Node node3:
    + master-pgsql                      : -INFINITY
    + pgsql-data-status                 : STREAMING|ASYNC
    + pgsql-status                      : HS:async

Migration Summary:
* Node node2:
* Node node3:
* Node node1:

pgsql_REPL_INFO:node1|1|00000000050001B0

检查集群的健康状态。完全健康的集群需要满足以下条件：

msPostgresql在每个节点上都已启动
在其中一个节点上msPostgresql处于Master状态，其它的为Salve状态
Salve节点的data-status值是以下中的一个
- STREAMING|SYNC
  同步复制Slave
- STREAMING|POTENTIAL
  候选同步复制Slave
- STREAMING|ASYNC
  异步复制Slave

pgsql_REPL_INFO的3段内容分别指当前master，上次提升前的时间线和xlog位置。

pgsql_REPL_INFO:node1|1|00000000050001B0

LVS配置在node2上

[root@node2 ~]#  ipvsadm -L
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  node2:postgres rr
  -> node2:postgres               Route   1      0          0
  -> node3:postgres               Route   1      0          0

故障测试

Master故障

停止Master上的网卡模拟故障

[root@node1 pha4pgsql]# ifconfig ens37 down

检查集群状态

Pacemaker已经将Master和写VIP切换到node2上

[root@node2 ~]# cls_status
resource msPostgresql is NOT running
Stack: corosync
Current DC: node2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Wed Jan 11 01:25:08 2017      Last change: Wed Jan 11 01:21:26 2017 by root via crm_attribute on node2

3 nodes and 9 resources configured

Online: [ node2 node3 ]
OFFLINE: [ node1 ]

Full list of resources:

 vip-master (ocf::heartbeat:IPaddr2):   Started node2
 vip-slave  (ocf::heartbeat:IPaddr2):   Started node3
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node2 ]
     Slaves: [ node3 ]
     Stopped: [ node1 ]
 lvsdr  (ocf::heartbeat:lvsdr): Started node3
 Clone Set: lvsdr-realsvr-clone [lvsdr-realsvr]
     Started: [ node3 ]
     Stopped: [ node1 node2 ]

Node Attributes:
* Node node2:
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 00000000050008E0
    + pgsql-status                      : PRI
* Node node3:
    + master-pgsql                      : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync

Migration Summary:
* Node node2:
* Node node3:

pgsql_REPL_INFO:node2|2|00000000050008E0

LVS和读VIP被移到了node3上

[root@node3 ~]# ipvsadm -L
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  node3:postgres rr
  -> node3:postgres               Route   1      0          0

修复旧Master的网卡

在旧Master node1上，postgres进程还在（注1）。但是由于配置的是同步复制，数据无法写入不会导致脑裂。

[root@node1 pha4pgsql]# ps -ef|grep postgres
root      20295   2269  0 01:35 pts/0    00:00:00 grep --color=auto postgres
postgres  20556      1  0 00:45 ?        00:00:01 /usr/pgsql-9.6/bin/postgres -D /pgsql/data -c config_file=/pgsql/data/postgresql.conf
postgres  20566  20556  0 00:45 ?        00:00:00 postgres: logger process
postgres  20574  20556  0 00:45 ?        00:00:00 postgres: checkpointer process
postgres  20575  20556  0 00:45 ?        00:00:00 postgres: writer process
postgres  20576  20556  0 00:45 ?        00:00:00 postgres: stats collector process
postgres  22390  20556  0 00:45 ?        00:00:00 postgres: wal writer process
postgres  22391  20556  0 00:45 ?        00:00:00 postgres: autovacuum launcher process

启动网卡后，postgres进程被停止

[root@node1 pha4pgsql]# ifconfig ens37 up
[root@node1 pha4pgsql]# ps -ef|grep postgres
root      21360   2269  0 01:36 pts/0    00:00:00 grep --color=auto postgres
[root@node1 pha4pgsql]# cls_status
resource msPostgresql is NOT running
Stack: corosync
Current DC: node2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Wed Jan 11 01:36:20 2017      Last change: Wed Jan 11 01:36:00 2017 by hacluster via crmd on node2

3 nodes and 9 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:

 vip-master (ocf::heartbeat:IPaddr2):   Started node2
 vip-slave  (ocf::heartbeat:IPaddr2):   Started node3
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node2 ]
     Slaves: [ node3 ]
     Stopped: [ node1 ]
 lvsdr  (ocf::heartbeat:lvsdr): Started node3
 Clone Set: lvsdr-realsvr-clone [lvsdr-realsvr]
     Started: [ node3 ]
     Stopped: [ node1 node2 ]

Node Attributes:
* Node node1:
    + master-pgsql                      : -INFINITY
    + pgsql-data-status                 : DISCONNECT
    + pgsql-status                      : STOP
* Node node2:
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 00000000050008E0
    + pgsql-status                      : PRI
* Node node3:
    + master-pgsql                      : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync

Migration Summary:
* Node node2:
* Node node3:
* Node node1:
   pgsql: migration-threshold=3 fail-count=1000000 last-failure='Wed Jan 11 01:36:08 2017'

Failed Actions:
* pgsql_start_0 on node1 'unknown error' (1): call=278, status=complete, exitreason='The master's timeline forked off current database system timeline 2 before latest checkpoint location 0000000005000B80, REPL_INF',
    last-rc-change='Wed Jan 11 01:36:07 2017', queued=0ms, exec=745ms


pgsql_REPL_INFO:node2|2|00000000050008E0

注1：这是通过ifconfig ens37 down停止网卡模拟故障的特殊现象（或者说是corosync的bug），Pacemkaer的日志中不停的输出以下警告。在实际的物理机宕机或网卡故障时，故障节点会由于失去quorum，postgres进程会被Pacemaker主动停止。

[43260] node3 corosyncwarning [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.

修复旧Master(node1)并作为Slave加入集群

通过pg_rewind修复旧Master

[root@node1 pha4pgsql]# cls_repair_by_pg_rewind
resource msPostgresql is NOT running
resource msPostgresql is NOT running
resource msPostgresql is NOT running
connected to server
servers diverged at WAL position 0/50008E0 on timeline 2
rewinding from last common checkpoint at 0/5000838 on timeline 2
reading source file list
reading target file list
reading WAL in target
need to copy 99 MB (total source directory size is 117 MB)
102359/102359 kB (100%) copied
creating backup label and updating control file
syncing target data directory
Done!
pg_rewind complete!
resource msPostgresql is NOT running
resource msPostgresql is NOT running
Waiting for 1 replies from the CRMd. OK
wait for recovery complete
.....
slave recovery of node1 successed

检查集群状态

[root@node1 pha4pgsql]# cls_status
Stack: corosync
Current DC: node2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Wed Jan 11 01:39:30 2017      Last change: Wed Jan 11 01:37:35 2017 by root via crm_attribute on node2

3 nodes and 9 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:

 vip-master (ocf::heartbeat:IPaddr2):   Started node2
 vip-slave  (ocf::heartbeat:IPaddr2):   Started node3
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node2 ]
     Slaves: [ node1 node3 ]
 lvsdr  (ocf::heartbeat:lvsdr): Started node3
 Clone Set: lvsdr-realsvr-clone [lvsdr-realsvr]
     Started: [ node1 node3 ]
     Stopped: [ node2 ]

Node Attributes:
* Node node1:
    + master-pgsql                      : -INFINITY
    + pgsql-data-status                 : STREAMING|ASYNC
    + pgsql-status                      : HS:async
* Node node2:
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 00000000050008E0
    + pgsql-status                      : PRI
* Node node3:
    + master-pgsql                      : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync
    + pgsql-xlog-loc                    : 000000000501F118

Migration Summary:
* Node node2:
* Node node3:
* Node node1:

pgsql_REPL_INFO:node2|2|00000000050008E0

Slave故障

LVS配置在node3上，2个real server

[root@node3 ~]# ipvsadm -L
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  node3:postgres rr
  -> node1:postgres               Route   1      0          0
  -> node3:postgres               Route   1      0          0

在其中一个Slave(node1)上停止网卡

[root@node1 pha4pgsql]# ifconfig ens37 down

Pacemaker已自动修改LVS的real server配置

[root@node3 ~]# ipvsadm -L
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  node3:postgres rr
  -> node3:postgres               Route   1      0          0

添加Slave扩容读负载均衡

目前配置的是1主2从集群，2个Slave通过读VIP+LVS做读负载均衡，如果读负载很高可以添加额外的Slave扩展读性能。把更多的Slave直接添加到Pacemaker集群中可以达到这个目的，但过多的节点数会增加Pacemaker+Corosync集群的复杂性和通信负担（Corosync的通信是一个环路，节点数越多，时延越大）。所以不把额外的Slave加入Pacemaker集群，仅仅加到LVS的real server中，并让lvsdr监视Slave的健康状况，动态更新LVS的real server列表。方法如下：

创建额外的Slave数据库

准备第4台机器node4(192.168.0.234),并在该机器上执行以下命令创建新的Slave

禁用SELINUX

$ setenforce 0
$ vi /etc/selinux/config
...
SELINUX=disabled

禁用防火墙

systemctl disable firewalld.service
systemctl stop firewalld.service

安装PostgreSQL

yum install -y https://yum.postgresql.org/9.6/redhat/rhel-7.3-x86_64/pgdg-centos96-9.6-3.noarch.rpm

yum install -y postgresql96 postgresql96-contrib postgresql96-libs postgresql96-server postgresql96-devel

ln -sf /usr/pgsql-9.6 /usr/pgsql
echo 'export PATH=/usr/pgsql/bin:$PATH' >>~postgres/.bash_profile

创建数据目录

mkdir -p /pgsql/data
chown -R postgres:postgres /pgsql/
chmod 0700 /pgsql/data

创建Salve备份

从当前的Master节点(即写VIP 192.168.0.236)拉取备份创建Slave

su - postgres
pg_basebackup -h 192.168.0.236 -U replication -D /pgsql/data/ -X stream -P

编辑postgresql.conf

将postgresql.conf中的下面一行删掉

￥vi /pgsql/data/postgresql.conf
...
#include '/var/lib/pgsql/tmp/rep_mode.conf' # added by pgsql RA

编辑recovery.conf

$vi /pgsql/data/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.0.236 port=5432 application_name=192.168.0.234 user=replication password=replication keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = ''
recovery_target_timeline = 'latest'

上面的application_name设置为本节点的IP地址192.168.0.234

启动Slave

pg_ctl -D /pgsql/data/ start

在Master上检查postgres wal sender进程，新建的Slave(192.168.0.234)已经和Master建立了流复制。

[root@node1 pha4pgsql]# ps -ef|grep '[w]al sender'
postgres  32387 111175  0 12:15 ?        00:00:00 postgres: wal sender process replication 192.168.0.234(47894) streaming 0/7000220
postgres 116675 111175  0 12:01 ?        00:00:00 postgres: wal sender process replication 192.168.0.233(33652) streaming 0/7000220
postgres 117079 111175  0 12:01 ?        00:00:00 postgres: wal sender process replication 192.168.0.232(40088) streaming 0/7000220

配置LVS real server

设置系统参数

echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce

在lo网卡上添加读VIP
```
ip a add 192.168.0.237/32 dev lo:0 
```

将新建的Slave加入到LVS中

现在LVS的配置中还没有把新的Slave作为real server加入

[root@node3 ~]# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  node3:postgres rr
  -> node2:postgres               Route   1      0          0
  -> node3:postgres               Route   1      0          0

在Pacemaker集群的任意一个节点(node1,node2或node3)上，修改lvsdr RA的配置，加入新的real server。

[root@node2 ~]# pcs resource update lvsdr realserver_get_real_servers_script="/opt/pha4pgsql/tools/get_active_slaves /usr/pgsql/bin/psql \"host=192.168.0.236 port=5432 dbname=postgres user=replication password=replication connect_timeout=5\""

设置realserver_get_real_servers_script参数后，lvsdr会通过脚本获取LVS的real server列表，这里的get_active_slaves会通过写VIP连接到Master节点获取所有以连接到Master的Slave的application_name作为real server。设置后新的Slave 192.168.0.234已经被加入到real server 列表中了。

[root@node2 ~]# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  node2:postgres rr
  -> node2:postgres               Route   1      0          0
  -> node3:postgres               Route   1      0          0
  -> 192.168.0.234:postgres       Route   1      0          0

测试读负载均衡

在当前的Master节点(node1)上通过读VIP访问postgres,可以看到psql会轮询连接到3个不同的Slave上。

[root@node1 pha4pgsql]# psql "host=192.168.0.237 port=5432 dbname=postgres user=replication password=replication" -tAc "select pg_postmaster_start_time()"
2017-01-14 12:01:48.068455+08
[root@node1 pha4pgsql]# psql "host=192.168.0.237 port=5432 dbname=postgres user=replication password=replication" -tAc "select pg_postmaster_start_time()"
2017-01-14 12:01:12.222412+08
[root@node1 pha4pgsql]# psql "host=192.168.0.237 port=5432 dbname=postgres user=replication password=replication" -tAc "select pg_postmaster_start_time()"
2017-01-14 12:15:19.614782+08
[root@node1 pha4pgsql]# psql "host=192.168.0.237 port=5432 dbname=postgres user=replication password=replication" -tAc "select pg_postmaster_start_time()"
2017-01-14 12:01:48.068455+08
[root@node1 pha4pgsql]# psql "host=192.168.0.237 port=5432 dbname=postgres user=replication password=replication" -tAc "select pg_postmaster_start_time()"
2017-01-14 12:01:12.222412+08
[root@node1 pha4pgsql]# psql "host=192.168.0.237 port=5432 dbname=postgres user=replication password=replication" -tAc "select pg_postmaster_start_time()"
2017-01-14 12:15:19.614782+08

下面测试Salve节点发生故障的场景。先连接到其中一台Slave

[root@node1 pha4pgsql]# psql "host=192.168.0.237 port=5432 dbname=postgres user=replication password=replication"
psql (9.6.1)
Type "help" for help.

当前连接在node4上

[root@node4 ~]# ps -ef|grep postgres
postgres  11911      1  0 12:15 pts/0    00:00:00 /usr/pgsql-9.6/bin/postgres -D /pgsql/data
postgres  11912  11911  0 12:15 ?        00:00:00 postgres: logger process
postgres  11913  11911  0 12:15 ?        00:00:00 postgres: startup process   recovering 000000090000000000000007
postgres  11917  11911  0 12:15 ?        00:00:00 postgres: checkpointer process
postgres  11918  11911  0 12:15 ?        00:00:00 postgres: writer process
postgres  11920  11911  0 12:15 ?        00:00:00 postgres: stats collector process
postgres  11921  11911  0 12:15 ?        00:00:04 postgres: wal receiver process   streaming 0/7000CA0
postgres  12004  11911  0 13:19 ?        00:00:00 postgres: replication postgres 192.168.0.231(42116) idle
root      12006   2230  0 13:19 pts/0    00:00:00 grep --color=auto postgres

强制杀死node4上的postgres进程

[root@node4 ~]# killall postgres

lvsdr探测到node4挂了后会自动将其从real server列表中摘除

[root@node2 ~]# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  node2:postgres rr
  -> node2:postgres               Route   1      0          0
  -> node3:postgres               Route   1      0          0

psql执行下一条SQL时就会自动连接到其它Slave上。

postgres=# select pg_postmaster_start_time();
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select pg_postmaster_start_time();
   pg_postmaster_start_time
-------------------------------
 2017-01-14 12:01:48.068455+08
(1 row)

指定静态的real server列表

有时候不希望将所有连接到Master的Slave都加入到LVS的real server中，比如某个Slave可能实际上是pg_receivexlog。这时可以在lvsdr上指定静态的real server列表作为白名单。

方法1:

通过default_weight和weight_of_realservers指定各个real server的权重，将不想参与到负载均衡的Slave的权重设置为0。并且还是通过在Master上查询Slave一览的方式监视Slave健康状态。

下面在Pacemaker集群的任意一个节点(node1,node2或node3)上，修改lvsdr RA的配置，设置有效的real server列表为node,node2和node3。

pcs resource update lvsdr default_weight="0"
pcs resource update lvsdr weight_of_realservers="node1,1 node2,1 node3,1"
pcs resource update lvsdr realserver_get_real_servers_script="/opt/pha4pgsql/tools/get_active_slaves /usr/pgsql/bin/psql \"host=192.168.0.236 port=5432 dbname=postgres user=replication password=replication connect_timeout=5\""

在lvsdr所在节点上检查LVS的状态，此时node4(192.168.0.234)的权重为0，LVS不会往node4上转发请求。

[root@node2 ~]# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn

TCP  node2:postgres rr
  -> node2:postgres               Route   1      0          0
  -> node3:postgres               Route   1      0          0
  -> 192.168.0.234:postgres       Route   0      0          0

方法2:

通过default_weight和weight_of_realservers指定real server一览，并通过调用check_active_slave脚本，依次连接到real server中的每个节点上检查其是否可以连接并且是Slave。

pcs resource update lvsdr default_weight="1"
pcs resource update lvsdr weight_of_realservers="node1 node2 node3 192.168.0.234"
pcs resource update lvsdr realserver_dependent_resource=""
pcs resource update lvsdr realserver_get_real_servers_script=""
pcs resource update lvsdr realserver_check_active_real_server_script="/opt/pha4pgsql/tools/check_active_slave /usr/pgsql/bin/psql \"port=5432 dbname=postgres user=replication password=replication connect_timeout=5\" -h"

推荐采用方法1，因为每次健康检查只需要1次连接。

参考

Pacemaker High Availability for PostgreSQL
PostgreSQL流复制高可用的原理与实践
PgSQL Replicated Cluster
Pacemaker+Corosync搭建PostgreSQL集群

subx