Pacemaker is part of the Red Hat High Availability Add-on. The easiest way to try it out on RHEL is to install it from the Scientific Linux or CentOS repositories.

Environment preparation

Two nodes

Note: changing the hostname on CentOS

Temporary: hostname <new-name>  -- takes effect immediately, lost on reboot

Permanent: hostnamectl set-hostname <new-name>  -- takes effect immediately and persists across reboots

node1 - 192.168.29.246
node2 - 192.168.29.247
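The node names used throughout this guide must resolve to the addresses above. A minimal sketch, assuming no DNS is available, is to add static entries to /etc/hosts on both nodes:

```shell
# Run on both nodes; maps the cluster node names to the
# addresses listed above (adjust if your network differs)
cat >> /etc/hosts <<'EOF'
192.168.29.246 node1
192.168.29.247 node2
EOF
```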

System information

CentOS Linux release 7.8 (Core)

Installation

On all nodes, use yum to install Pacemaker and the other packages we will need:

yum install pacemaker pcs resource-agents

Creating the cluster

On all nodes, start the pcs daemon and enable it at boot:

systemctl start pcsd.service
systemctl enable pcsd.service

Set up the authentication that pcs requires:

# Set the hacluster password on all nodes
echo 123456 | passwd --stdin hacluster
# Run the authentication from any one node
pcs cluster auth node1 node2 -u hacluster -p 123456 --force

Create the cluster:

pcs cluster setup --force --name pacemaker1 node1 node2

The output looks like this:

[root@node1 ~]# pcs cluster setup --force --name pacemaker1 node1 node2
Destroying cluster on nodes: node1, node2...
node1: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (pacemaker)...
node1: Successfully destroyed cluster
node2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node1', 'node2'
node1: successful distribution of the file 'pacemaker_remote authkey'
node2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node1: Succeeded
node2: Succeeded

Synchronizing pcsd certificates on nodes node1, node2...
node1: Success
node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node2: Success

Starting the cluster

Run on any one node:

pcs cluster start --all

Startup output:

[root@node1 ~]# pcs cluster start --all
node1: Starting Cluster (corosync)...
node2: Starting Cluster (corosync)...
node1: Starting Cluster (pacemaker)...
node2: Starting Cluster (pacemaker)...

Cluster settings

Disable fencing (STONITH). This is acceptable only for a test setup like this one; a production cluster needs working fencing:

pcs property set stonith-enabled=false

Because there are only two nodes, quorum is meaningless, so we tell the cluster to ignore quorum loss:

pcs property set no-quorum-policy=ignore
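As a quick sanity check, `pcs property list` prints only the cluster properties that differ from their defaults, so both settings above should now appear:

```shell
# Show non-default cluster properties; expect to see
# stonith-enabled and no-quorum-policy among them
pcs property list
```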

Make the cluster move a service to another node after a single failure:

pcs resource defaults migration-threshold=1

Adding a resource

pcs resource create my_first_svc ocf:heartbeat:Dummy op monitor interval=60s

my_first_svc: the name of the service (resource)

ocf:heartbeat:Dummy: the agent script to use (Dummy is an agent that serves as a template and is useful for guides like this one)

op monitor interval=60s tells Pacemaker to check the health of this service every minute by calling the agent's monitor action
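If you later want a different check frequency, the monitor operation can be modified in place with `pcs resource update`; a sketch (30s here is an arbitrary example value):

```shell
# Change the monitor interval of the existing resource (example value)
pcs resource update my_first_svc op monitor interval=30s
```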

Check the cluster status

[root@node1 ~]# pcs status
Cluster name: pacemaker1
Stack: corosync
Current DC: node1 (version 1.1.-.el7-f14e36fd43) - partition with quorum
Last updated: Sat Jun ::
Last change: Sat Jun :: by root via cibadmin on node1

2 nodes configured
1 resource configured

Online: [ node1 node2 ]

Full list of resources:

 my_first_svc   (ocf::heartbeat:Dummy): Started node1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@node1 ~]# crm_mon -1
Stack: corosync
Current DC: node1 (version 1.1.-.el7-f14e36fd43) - partition with quorum
Last updated: Sat Jun ::
Last change: Sat Jun :: by root via cibadmin on node1

2 nodes configured
1 resource configured

Online: [ node1 node2 ]

Active resources:

 my_first_svc   (ocf::heartbeat:Dummy): Started node1

Failure testing

Manually stop the service to simulate a failure:

crm_resource --resource my_first_svc --force-stop

Check the status again after a minute; the service has moved to node2:

[root@node1 ~]# crm_mon -1
Stack: corosync
Current DC: node1 (version 1.1.-.el7-f14e36fd43) - partition with quorum
Last updated: Sat Jun ::
Last change: Sat Jun :: by root via cibadmin on node1

2 nodes configured
1 resource configured

Online: [ node1 node2 ]

Active resources:

 my_first_svc   (ocf::heartbeat:Dummy): Started node2

Failed Resource Actions:
* my_first_svc_monitor_60000 on node1 'not running' (7): call=, status=complete, exitreason='No process state file found',
    last-rc-change='Sat Jun  6 15:29:26 2020', queued=0ms, exec=0ms
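Once you have verified the failover, the recorded failure can be cleared so it no longer appears in the status output; a sketch using `pcs resource cleanup`, which resets the resource's fail count and failed-action history:

```shell
# Clear the failure history for the resource on all nodes
pcs resource cleanup my_first_svc
```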
05-22 13:57