搭建基于etcd+patroni+pgbouncer+haproxy+keepalived的postgresql集群方案
宿主机操作系统:ubuntu20.04

使用kvm搭建虚拟环境(如没有安装kvm,请先自行安装kvm)

1、安装kvm服务

①、查看虚拟支持

如果CPU 支持硬件虚拟化则输出结果大于0,安装kvm-ok命令检查是否启用虚拟化支持,否则需要重启进入BIOS查看是否启用虚拟化支持,如果BIOS也已经启用虚拟化支持但是输出依旧为0则表示当前CPU不支持虚拟化(不能使用虚拟机)

liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ grep -Eoc '(vmx|svm)' /proc/cpuinfo
8
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo apt-get install cpu-check -y
[sudo] password for liyi: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package cpu-check
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ kvm
kvm        kvm-ok     kvm-spice  
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ kvm-ok 
INFO: /dev/kvm exists
KVM acceleration can be used
②、安装kvm

更新系统环境

liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo apt-get update -y
...更新输出

安装依赖包

liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo apt-get install -y qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virt-manager

将当前用户加入kvm所属组

liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo usermod -aG libvirt $USER
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ sudo usermod -aG kvm $USER
...
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ cat /etc/group | grep kvm
kvm:x:108:liyi
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ cat /etc/group | grep libv
libvirt:x:136:liyi
libvirt-qemu:x:64055:libvirt-qemu
libvirt-dnsmasq:x:137:

可以查看到当前用户已经被加入到了kvm组

③、启动kvm服务
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ systemctl start libvirtd.service
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$ systemctl status libvirtd.service 
● libvirtd.service - Virtualization daemon
     Loaded: loaded (/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2024-03-09 09:45:11 CST; 57min ago
TriggeredBy: ● libvirtd-ro.socket
             ● libvirtd-admin.socket
             ● libvirtd.socket
       Docs: man:libvirtd(8)
             https://libvirt.org
   Main PID: 982 (libvirtd)
      Tasks: 26 (limit: 32768)
     Memory: 44.3M
     CGroup: /system.slice/libvirtd.service
             ├─ 982 /usr/sbin/libvirtd
             ├─1322 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br2.conf --leasefile-ro -->
             ├─1323 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br2.conf --leasefile-ro -->
             ├─1358 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br0.conf --leasefile-ro -->
             ├─1359 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br0.conf --leasefile-ro -->
             ├─1392 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br1.conf --leasefile-ro -->
             ├─1393 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/br1.conf --leasefile-ro -->
             ├─1424 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-r>
             └─1425 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-r>

3月 09 10:26:27 liyi-RESCUER-R720-lenovo dnsmasq-dhcp[1424]: DHCPREQUEST(virbr0) 192.168.122.119 52:5>
3月 09 10:26:27 liyi-RESCUER-R720-lenovo dnsmasq-dhcp[1424]: DHCPACK(virbr0) 192.168.122.119 52:54:00>
3月 09 10:34:08 liyi-RESCUER-R720-lenovo dnsmasq-dhcp[1424]: DHCPREQUEST(virbr0) 192.168.122.85 52:54>
3月 09 10:34:08 liyi-RESCUER-R720-lenovo dnsmasq-dhcp[1424]: DHCPACK(virbr0) 192.168.122.85 52:54:00:>
liyi@liyi-RESCUER-R720-lenovo:~/Desktop$

3、使用kvm配置脚本创建虚拟机(需要下载相关的centos操作系统ISO镜像文件…此处省略部分操作)

ubuntu20.04环境搭建:etcd+patroni+pgbouncer+haproxy+keepalived的postgresql集群方案-LMLPHP

虚拟机环境和IP地址

4、搭建postgresql环境

①、3台pg虚拟机安装postgresql,名称分别为:pg01、pg02、pg03

具体操作和安装postgresql版本请参考postgresql官网: https://www.postgresql.org/download/linux/redhat/
这里我选择postgre版本14:

[root@pg01 ~]# yum makecache -y
[root@pg01 ~]# yum update -y
[root@pg01 ~]# sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
[root@pg01 ~]# sudo yum install -y postgresql14-server
[root@pg01 ~]# sudo /usr/pgsql-14/bin/postgresql-14-setup initdb
[root@pg01 ~]# sudo systemctl start postgresql-14
[root@pg01 ~]# sudo systemctl enable postgresql-14

查看postgresql服务状态

[root@pg01 ~]# systemctl status postgresql-14
● postgresql-14.service - PostgreSQL 14 database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql-14.service; enabled; vendor preset: disabled)
   Active: active (running) since 五 2024-03-08 20:50:47 EST; 1h 8min ago
     Docs: https://www.postgresql.org/docs/14/static/
  Process: 1022 ExecStartPre=/usr/pgsql-14/bin/postgresql-14-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
 Main PID: 1049 (postmaster)
   CGroup: /system.slice/postgresql-14.service
           ├─1049 /usr/pgsql-14/bin/postmaster -D /var/lib/pgsql/14/data/
           ├─1416 postgres: logger 
           ├─1536 postgres: checkpointer 
           ├─1537 postgres: background writer 
           ├─1538 postgres: walwriter 
           ├─1539 postgres: autovacuum launcher 
           ├─1540 postgres: stats collector 
           └─1541 postgres: logical replication launcher 

3月 08 20:50:38 pg01 systemd[1]: Starting PostgreSQL 14 database server...
3月 08 20:50:45 pg01 postmaster[1049]: 2024-03-08 20:50:45.329 EST [1049] LOG:  redirecting l...ess
3月 08 20:50:45 pg01 postmaster[1049]: 2024-03-08 20:50:45.329 EST [1049] HINT:  Future log o...g".
3月 08 20:50:47 pg01 systemd[1]: Started PostgreSQL 14 database server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@pg01 ~]#

pg02、pg03同理进行postgresql安装…

②、开启postgresql 访问端口

telnet检查3台postgresql5432端口是否可以访问
firewalld开放5432端口

[root@pg01 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: active (running) since 五 2024-03-08 22:55:32 EST; 1s ago
     Docs: man:firewalld(1)
 Main PID: 1953 (firewalld)
   CGroup: /system.slice/firewalld.service
           └─1953 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid

3月 08 22:55:32 pg01 systemd[1]: Starting firewalld - dynamic firewall daemon...
3月 08 22:55:32 pg01 systemd[1]: Started firewalld - dynamic firewall daemon.
3月 08 22:55:32 pg01 firewalld[1953]: WARNING: AllowZoneDrifting is enabled. This is conside...now.
Hint: Some lines were ellipsized, use -l to show in full.
[root@pg01 ~]# firewall-cmd --zone=public --add-port=5432/tcp --permanent
success
[root@pg01 ~]# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources: 
  services: dhcpv6-client ssh
  ports: 5432/tcp
  protocols: 
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules:

配置postgresql pg_hba允许pg02、pg03服务器连接,配置postgresql.conf监听所有端口[*]

[root@pg01 ~]# vim /var/lib/pgsql/14/data/pg_hba.conf
# IPv4 local connections:
# server connected
host    all             all             192.168.122.77/32       md5
host    all             all             192.168.122.85/32       md5
host    all             all             192.168.122.102/32      md5
host    all             all             127.0.0.1/32            scram-sha-256
#host    all             all             0.0.0.0/0               md5

[root@pg01 ~]# vim /var/lib/pgsql/14/data/postgresql.conf
#listen_addresses = 'localhost'         # what IP address(es) to listen on;
listen_addresses = '*'

重启postgresql服务

[root@pg01 ~]# systemctl restart postgresql-14

telnet测试5432端口

[root@pg01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.122.77 pg01
192.168.122.85 pg02
192.168.122.102 pg03
[root@pg01 ~]# telnet pg01
Trying 192.168.122.77...
telnet: connect to address 192.168.122.77: Connection refused
[root@pg01 ~]# telnet pg01 5432
Trying 192.168.122.77...
Connected to pg01.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@pg01 ~]# telnet pg02 5432
Trying 192.168.122.85...
Connected to pg02.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@pg01 ~]# telnet pg03 5432
Trying 192.168.122.102...
Connected to pg03.
Escape character is '^]'.
^CConnection closed by foreign host.

postgresql端口访问正常。

③、配置主库

配置主服务器postgresql.conf,开启archive设置archive path

[root@pg01 ~]# vim /var/lib/pgsql/14/data/postgresql.conf
206 wal_level = replica                     # minimal, replica, or logical
211 synchronous_commit = off                # synchronization level;
246 archive_mode = on              # enables archiving; off, on, or alw    ays
248 archive_command = 'cp %p /var/lib/pgsql/14/backups/%f'          # command to use to archive a logfile segment
299 max_wal_senders = 10            # max number of walsender processes

创建用于复制的用户

[root@pg01 ~]# su - postgres
Last login: 五 38 22:45:30 EST 2024 on pts/0
-bash-4.2$ psql
postgres=# create role repli_user login encrypted password 'repli_password' replication;
postgres=# select usename,usesysid,passwd from pg_user;
  usename   | usesysid |  passwd  
------------+----------+----------
 postgres   |       10 | ********
 repli_user |    16393 | ********
(2 rows)
postgres=# \q

更改pg01主库的pg_hba文件,将之前配置的trust 的允许所有(all)更改为允许新创建的复制用户,以准许repli_user复制

[root@pg01 ~]# vim /var/lib/pgsql/14/data/pg_hba.conf
# server connected
host    replication             repli_user             192.168.122.77/32       md5
host    replication             repli_user             192.168.122.85/32       md5
host    replication             repli_user             192.168.122.102/32      md5
host    all             all             127.0.0.1/32            scram-sha-256
host    all             all             0.0.0.0/0      md5

[root@pg01 ~]# systemctl restart postgresql-14

配置主库postgres用户目录下.pgpass用户名密码文件并设置文件模式为400

-bash-4.2$ pwd
/var/lib/pgsql
-bash-4.2$ cat .pgpass 
192.168.122.77:5432:replication:repli_user:repli_password
④、配置从库

配置从库postgres用户目录下.pgpass用户名密码文件并设置文件模式为400

-bash-4.2$ cat .pgpass 
192.168.122.77:5432:replication:repli_user:repli_password

使用backup将主库archive到从库backups目录,以检查从库连接主库

-bash-4.2$ ls -lh 14/backups/
total 0
-bash-4.2$ pg_basebackup -F p -D 14/backups -h 192.168.122.77 -p 5432 -U repli_user -F p -v -R
-bash-4.2$ ls -lh 14/backups/
total 200K
-rw-------. 1 postgres postgres  225 Mar  9 03:42 backup_label
-rw-------. 1 postgres postgres 135K Mar  9 03:42 backup_manifest
drwx------. 5 postgres postgres   41 Mar  9 03:42 base
......

配置backups/postgresql.conf 添加primary_conninfo路径

primary_conninfo = 'host=192.168.122.77 port=5432 user=repli_user password=repli_password'

清空从库的data目录,把backups下所有同步数据复制到从库的data目录

[root@pg02 ~]# systemctl stop postgresql-14
[root@pg02 ~]# su - postgres
-bash-4.2$ rm -rf /var/lib/pgsql/14/data/*
-bash-4.2$ cp -r 14/backups/* /var/lib/pgsql/14/data/
-bash-4.2$ exit
logout
[root@pg02 ~]# 
[root@pg02 ~]# systemctl start postgresql-14

从库服务重启正常,再切换到主库的数据库查询窗口执行replication 日志发送状态查询

postgres=# select * from pg_stat_replication;
*
(1 row)

从库查询日志接收状态

postgres=# select * from pg_stat_wal_receiver ;
*
(1 row)

pg03从服务器配置依旧如上
从pg01主服务器查看pg_stat_replication表

postgres=# select pid,usesysid,usename,application_name,client_addr,client_hostname,client_port,backend_start from pg_stat_replication;
 pid  | usesysid |  usename   | application_name |  client_addr   | client_hostname | client_port |         backend_start         
------+----------+------------+------------------+----------------+-----------------+-------------+-------------------------------
 4279 |    16393 | repli_user | walreceiver      | 192.168.122.85 |                 |       59074 | 2024-03-09 05:20:55.374547-05
(1 row)

postgres=# 
postgres=# select pid,usesysid,usename,application_name,client_addr,client_hostname,client_port,backend_start from pg_stat_replication;
 pid  | usesysid |  usename   | application_name |   client_addr   | client_hostname | client_port |         backend_start         
------+----------+------------+------------------+-----------------+-----------------+-------------+-------------------------------
 4353 |    16393 | repli_user | walreceiver      | 192.168.122.85  |                 |       59076 | 2024-03-09 05:43:10.798228-05
 4359 |    16393 | repli_user | walreceiver      | 192.168.122.102 |                 |       39832 | 2024-03-09 05:45:26.745926-05
(2 rows)
⑤、主服务器(pg01)创建test table查看从服务器同步状况

ubuntu20.04环境搭建:etcd+patroni+pgbouncer+haproxy+keepalived的postgresql集群方案-LMLPHP

5、搭建etcd环境

①、3台etcd虚拟机安装etcd服务:etcd1、etcd2、etcd3
[root@etcd1 ~]# yum makecache -y
Loaded plugins: fastestmirror
Determining fastest mirrors
 * base: mirrors.bfsu.edu.cn
 * extras: mirrors.bfsu.edu.cn
 * updates: mirrors.qlu.edu.cn
base                                                                         | 3.6 kB  00:00:00     
extras                                                                       | 2.9 kB  00:00:00     
updates                                                                      | 2.9 kB  00:00:00     
(1/10): base/7/x86_64/group_gz                                               | 153 kB  00:00:00     
(2/10): extras/7/x86_64/filelists_db                                         | 303 kB  00:00:00     
(3/10): extras/7/x86_64/other_db                                             | 150 kB  00:00:00     
(4/10): base/7/x86_64/other_db                                               | 2.6 MB  00:00:01     
(5/10): extras/7/x86_64/primary_db                                           | 250 kB  00:00:01     
(6/10): updates/7/x86_64/other_db                                            | 1.5 MB  00:00:03     
(7/10): updates/7/x86_64/filelists_db                                        |  14 MB  00:00:05     
(8/10): base/7/x86_64/filelists_db                                           | 7.2 MB  00:00:08     
(9/10): base/7/x86_64/primary_db                                             | 6.1 MB  00:00:09     
(10/10): updates/7/x86_64/primary_db                                         |  25 MB  00:00:12     
Metadata Cache Created

安装etcd:3.3.11版本

[root@etcd1 ~]# yum install etcd -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.bfsu.edu.cn
 * extras: mirrors.bfsu.edu.cn
 * updates: mirrors.qlu.edu.cn
Resolving Dependencies
--> Running transaction check
---> Package etcd.x86_64 0:3.3.11-2.el7.centos will be installed
--> Finished Dependency Resolution
......
[root@etcd1 ~]# etcd --version
etcd Version: 3.3.11
Git SHA: 2cf9e51
Go Version: go1.10.3
Go OS/Arch: linux/amd64
②、配置每个节点etcd.conf

ubuntu20.04环境搭建:etcd+patroni+pgbouncer+haproxy+keepalived的postgresql集群方案-LMLPHP
后面出现了一些问题,因为配置内的TOKEN设置是纯字母加下划线,重启服务会告知TOKEN过于简单,后面改成了1hretcd-cluster 四分之三原则

③、使用etcdctl member list查看集群状态

ubuntu20.04环境搭建:etcd+patroni+pgbouncer+haproxy+keepalived的postgresql集群方案-LMLPHP

④、集群健康状态检查
[root@etcd1 ~]# sed -i 's/ETCD_INITIAL_CLUSTER_STATE="new"/ETCD_INITIAL_CLUSTER_STATE="existing"/g' /etc/etcd/etcd.conf | grep ETCD_INITIAL_CLUSTER_STATE
[root@etcd1 ~]# cat /etc/etcd/etcd.conf | grep ETCD_INITIAL_CLUSTER_STATE
ETCD_INITIAL_CLUSTER_STATE="existing"

firewall-cmd开放etcd集群每个成员主机的2379端口,否则会导致etcd集群降级

[root@etcd1 ~]# etcdctl cluster-health
member c6cde425fdd57d1 is healthy: got healthy result from http://192.168.122.225:2379
failed to check the health of member 92c68a626a26e458 on http://192.168.122.119:2379: Get http://192.168.122.119:2379/health: dial tcp 192.168.122.119:2379: i/o timeout
member 92c68a626a26e458 is unreachable: [http://192.168.122.119:2379] are all unreachable
failed to check the health of member cc4e6b745b5521d9 on http://192.168.122.138:2379: Get http://192.168.122.138:2379/health: dial tcp 192.168.122.138:2379: i/o timeout
member cc4e6b745b5521d9 is unreachable: [http://192.168.122.138:2379] are all unreachable
cluster is degraded
[root@etcd1 ~]# firewall-cmd --add-port=2379/tcp
success
[root@etcd1 ~]# 
[root@etcd1 ~]# etcdctl cluster-health
member c6cde425fdd57d1 is healthy: got healthy result from http://192.168.122.225:2379
member 92c68a626a26e458 is healthy: got healthy result from http://192.168.122.119:2379
member cc4e6b745b5521d9 is healthy: got healthy result from http://192.168.122.138:2379
cluster is healthy

6、搭建patroni

①、在pg01、pg02、pg03三台机器上安装python3环境和patroni
yum install python3 python3-pip -y
pip3 install --upgrade pip
pip install psycopg2-binary
pip install patroni
patroni --help  #正常输出
[root@pg01 ~]# pip list
Package            Version
------------------ -----------
click              8.0.4
dnspython          2.2.1
importlib-metadata 4.8.3
patroni            3.2.2
pip                21.3.1
prettytable        2.5.0
psutil             5.9.8
psycopg2-binary    2.9.5
python-dateutil    2.9.0.post0
python-etcd        0.4.5
PyYAML             6.0.1
setuptools         59.6.0
six                1.16.0
typing_extensions  4.1.1
urllib3            1.26.18
wcwidth            0.2.13
ydiff              1.2
zipp               3.6.0

创建patroni参数

[root@pg01 ~]# mkdir -p /app/patroni/etc/
[root@pg01 ~]# mkdir -p /app/patroni/log/
[root@pg01 ~]# chown postgres.postgres -R /app/patroni/
[root@pg01 ~]# cat > /app/patroni/etc/patroni_config.yml <<"EOF"
scope: pg_cluster
namespace: /service
name: pg01

log:
  level: INFO
  traceback_level: ERROR
  dir: /app/patroni/log
  file_num: 10
  file_size: 104857600

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.122.77:8008

etcd:
  host: 192.168.122.225:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    max_timelines_history: 0
    master_start_timeout: 300
    master_stop_timeout: 0
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "cp %p /var/lib/pgsql/14/backups/%f"

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.122.77:5432
  data_dir: /var/lib/pgsql/14/data
  bin_dir: /usr/pgsql-14/bin
  pgpass: /var/lib/pgsql/.pgpass
  pg_ctl_timeout: 60
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: true
  authentication:
    replication:
      username: repli_user
      password: repli_password
    superuser:
      username: postgres
      password: postgres

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF
[root@pg02 ~]# mkdir -p /app/patroni/etc/
[root@pg02 ~]# mkdir -p /app/patroni/log/
[root@pg02 ~]# chown postgres.postgres -R /app/patroni/
[root@pg02 ~]# cat > /app/patroni/etc/patroni_config.yml <<"EOF"
scope: pg_cluster
namespace: /service
name: pg02

log:
  level: INFO
  traceback_level: ERROR
  dir: /app/patroni/log
  file_num: 10
  file_size: 104857600

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.122.85:8008

etcd:
  host: 192.168.122.138:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    max_timelines_history: 0
    master_start_timeout: 300
    master_stop_timeout: 0
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "cp %p /var/lib/pgsql/14/backups/%f"

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.122.85:5432
  data_dir: /var/lib/pgsql/14/data
  bin_dir: /usr/pgsql-14/bin
  pgpass: /var/lib/pgsql/.pgpass
  pg_ctl_timeout: 60
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: true
  authentication:
    replication:
      username: repli_user
      password: repli_password
    superuser:
      username: postgres
      password: postgres

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF
[root@pg03 ~]# mkdir -p /app/patroni/etc/
[root@pg03 ~]# mkdir -p /app/patroni/log/
[root@pg03 ~]# chown postgres.postgres -R /app/patroni/
[root@pg03 ~]# cat > /app/patroni/etc/patroni_config.yml <<"EOF"
scope: pg_cluster
namespace: /service
name: pg03

log:
  level: INFO
  traceback_level: ERROR
  dir: /app/patroni/log
  file_num: 10
  file_size: 104857600

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.122.102:8008

etcd:
  host: 192.168.122.119:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    max_timelines_history: 0
    master_start_timeout: 300
    master_stop_timeout: 0
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "cp %p /var/lib/pgsql/14/backups/%f"

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.122.102:5432
  data_dir: /var/lib/pgsql/14/data
  bin_dir: /usr/pgsql-14/bin
  pgpass: /var/lib/pgsql/.pgpass
  pg_ctl_timeout: 60
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: true
  authentication:
    replication:
      username: repli_user
      password: repli_password
    superuser:
      username: postgres
      password: postgres

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF

在运行之前需要给patroni用到的super用户(postgres)开放本地登录,配置pg_hba

-bash-4.2$ vim 14/data/pg_hba.conf
# server connected
host    replication             repli_user             192.168.122.77/32       md5
host    replication             repli_user             192.168.122.85/32       md5
host    replication             repli_user             192.168.122.102/32      md5
host    all             all             127.0.0.1/32            scram-sha-256
host    all             postgres             127.0.0.1/32            md5

测试patroni

-bash-4.2$ patroni /app/patroni/etc/patroni_config.yml

查看设置的patroni log信息

-bash-4.2$ tail -f /app/patroni/log/patroni.log 
2024-03-09 19:23:06,721 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:16,797 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:26,790 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:36,901 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:46,835 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:23:57,068 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:06,688 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:16,926 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:26,906 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:36,882 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:46,828 INFO: no action. I am (pg01), the leader with the lock
2024-03-09 19:24:56,748 INFO: no action. I am (pg01), the leader with the lock

另外两台log显示

2024-03-09 19:33:12,764 INFO: Lock owner: pg01; I am pg02
2024-03-09 19:33:12,849 INFO: failed to start postgres

2024-03-09 19:33:32,809 INFO: Lock owner: pg01; I am pg03
2024-03-09 19:33:32,888 INFO: failed to start postgres

新建patroni.service服务

cat > /usr/lib/systemd/system/patroni.service <<"EOF"
[Unit]
Description=patroni
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
User=postgres
Group=postgres
Environment="PGHOME=/usr/pgsql-14"
Environment="PGDATA=/var/lib/pgsql/14/data"
Environment="PGPORT=5432"
Environment="LD_LIBRARY_PATH=/usr/pgsql-14/lib"
Environment="PATH=/usr/pgsql-14/bin:/usr/local/bin:/usr/bin"
ExecStart=/bin/bash -c "/usr/local/bin/patroni /app/patroni/etc/patroni_config.yml >> /app/patroni/log/patroni.log 2>&1 &"
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/usr/bin/kill -9 patroni
KillMode=process
TimeoutSec=30
Restart=no

[Install]
WantedBy=multi-user.target
EOF

检查patroni集群状态

-bash-4.2$ patronictl -c /app/patroni/etc/patroni_config.yml list
+ Cluster: pg_cluster (7344240387828606601) ---+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| pg01   | 192.168.122.77  | Leader  | running |  7 |           |
| pg02   | 192.168.122.85  | Replica | running |  4 |        48 |
| pg03   | 192.168.122.102 | Replica | running |  7 |        16 |
+--------+-----------------+---------+---------+----+-----------+
-bash-4.2$

如果replica主机state状态为start failed就需要查看是否是该节点同步的postgresql .partial有如果有问题,需要把backup目录删除重新执行backup命令同步状态

pg_basebackup -F p -D 14/backups -h 192.168.122.77 -p 5432 -U repli_user -F p -v -R

7、主节点安装pgbouncer

yum install pgbouncer -y
vim /etc/pgbouncer/pgbouncer.ini
[databases]
my_test_db = host=localhost port=5432 dbname=my_test_db user=postgres  #注意db配置
[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 6432
auth_type = md5    #需要和pg_hba.conf中的 trust、MD5保持一致
auth_file = /etc/pgbouncer/userlist.txt
admin_users = postgres
stats_users = postgres
vim /etc/pgbouncer/userlist.txt
"postgres" "SCRAM**"
systemctl restart pgbouncer

登录测试

-bash-4.2$ psql -h 127.0.0.1 -p 6432 -d my_test_db
Password for user postgres: 
psql (14.11)
Type "help" for help.

my_test_db=# \d pg_user;
                        View "pg_catalog.pg_user"
    Column    |           Type           | Collation | Nullable | Default 
--------------+--------------------------+-----------+----------+---------
 usename      | name                     |           |          | 
 usesysid     | oid                      |           |          | 
 usecreatedb  | boolean                  |           |          | 
 usesuper     | boolean                  |           |          | 
 userepl      | boolean                  |           |          | 
 usebypassrls | boolean                  |           |          | 
 passwd       | text                     |           |          | 
 valuntil     | timestamp with time zone |           |          | 
 useconfig    | text[]                   | C         |          | 

my_test_db=#

8、搭建haproxy、keepalived

yum makecache
yum update -y
yum install -y vim haproxy
setenforce 0 #会影响haproxy 服务的端口绑定

增加/etc/hosts解析

192.168.122.77 pg01
192.168.122.85 pg02
192.168.122.102 pg03
编辑haproxy配置文件

主库postgresql 的pgbouncer端口6432
另外两个从库是5432

vim /etc/haproxy/haproxy.cfg

global
    log         127.0.0.1 local2       # 
    chroot      /var/lib/haproxy       # 
    pidfile     /var/run/haproxy.pid   # 
    maxconn     3000                   # 
    user        haproxy                # 
    group       haproxy                # 
    daemon                             #
    nbproc 1                           # 
    stats socket /var/lib/haproxy/stats

defaults
    mode tcp                           # mode {tcp|http|health}
    log 127.0.0.1 local2 err           # use syslog
    option     tcplog                  # 
    option     dontlognull             # 
    option     redispatch              # 
    option     abortonclose            # 
    retries    3                       # 
    maxconn    3000                    # 
    timeout queue           1m         # 
    timeout connect         10s        # 
    timeout client          1m         # 
    timeout server          1m         # 
    timeout check           5s         # 

listen status
    bind *:1080                        # 
    mode http
    log global
    stats enable                       # 
    stats refresh 30s                  # 
    maxconn 10                         # 
    stats uri /                        # http//ip:1080/
    stats realm Private lands          # 
    stats auth admin:Admin2023         # user password
    stats hide-version

listen master
    bind *:5000
    mode tcp
    option tcplog
    balance roundrobin                 # 轮询
    option httpchk OPTIONS /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg01 192.168.122.77:6432  maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
    server pg02 192.168.122.85:5432  maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
    server pg03 192.168.122.102:5432  maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
setenforce 0
systemctl start haproxy
firewall-cmd --add-port=1080/tcp  #开放绑定端口的访问 http

可以通过http://192.168.122.53:1080/ 和 http://192.168.122.54:1080/ 进行haproxy访问
ubuntu20.04环境搭建:etcd+patroni+pgbouncer+haproxy+keepalived的postgresql集群方案-LMLPHP

安装keepalived

给53 和 54 统一安装keepalived

yum install keepalived -y

配置53为主节点master 增加浮动IP地址

[root@ha_keep1 ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   router_id pgservice_router
}

vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 2
    weight 5
    fall 3
    rise 5
    timeout 2
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 53
    priority 200
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.122.3
    }
    track_script {
        chk_haproxy
    }
}
[root@ha_keep1 ~]# systemctl restart keepalived
[root@ha_keep1 ~]# systemctl enable keepalived
[root@ha_keep1 ~]# systemctl status keepalived
[root@ha_keep1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:00:a1:49 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.53/24 brd 192.168.122.255 scope global noprefixroute dynamic eth0
       valid_lft 3128sec preferred_lft 3128sec
    inet 192.168.122.3/32 scope global eth0  #浮动地址已经出现
       valid_lft forever preferred_lft forever
    inet6 fe80::ee78:b71e:23c0:7fc4/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

配置54为备节点backup (因为是备节点,目前浮动地址还挂在主节点53上)

[root@ha_keep2 ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   router_id pgservice_router
}

vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 2
    weight 5
    fall 3
    rise 5
    timeout 2
}
vrrp_instance VI_1 {
    state BACKUP  #更改为备节点
    interface eth0
    virtual_router_id 53
    priority 100  #降低优先级
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.122.3
    }
    track_script {
        chk_haproxy
    }
}
[root@ha_keep2 ~]# systemctl restart keepalived
[root@ha_keep2 ~]# systemctl enable keepalived
[root@ha_keep2 ~]# systemctl status keepalived
[root@ha_keep2 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:a2:85:b8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.54/24 brd 192.168.122.255 scope global noprefixroute dynamic eth0
       valid_lft 3572sec preferred_lft 3572sec
    inet6 fe80::df8b:cdfb:69d3:8f24/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

使用浮动地址:http://192.168.122.3:1080/来查看keepalived链接haproxy的状态
ubuntu20.04环境搭建:etcd+patroni+pgbouncer+haproxy+keepalived的postgresql集群方案-LMLPHP
ubuntu20.04环境搭建:etcd+patroni+pgbouncer+haproxy+keepalived的postgresql集群方案-LMLPHP
END!结束

03-11 07:06