Pacemaker is part of the Red Hat High Availability Add-on. The easiest way to try it on RHEL is to install it from the Scientific Linux or CentOS repositories.
Environment preparation
Two nodes
Note: changing the hostname on CentOS
Temporary change: hostname <new-hostname>  -- takes effect immediately, lost on reboot
Permanent change: hostnamectl set-hostname <new-hostname>  -- takes effect immediately and persists across reboots
node1 - 192.168.29.246
node2 - 192.168.29.247
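For example, with the addresses above, the hostnames can be set and made resolvable as follows (a minimal sketch; the /etc/hosts entries assume the names are not already resolvable via DNS):
# on the first machine
hostnamectl set-hostname node1
# on the second machine
hostnamectl set-hostname node2
# on both machines, map the names to the addresses
cat >> /etc/hosts <<EOF
192.168.29.246 node1
192.168.29.247 node2
EOF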
System information
CentOS Linux release 7.8.2003 (Core)
Installation
On all nodes, use yum to install Pacemaker and a few other packages we will need
yum install pacemaker pcs resource-agents
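To confirm the packages are in place, a quick rpm query works (exact versions will vary with your repositories):
rpm -q pacemaker pcs resource-agents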
Creating the cluster
On all nodes, start the pcs daemon and set it to run at boot
systemctl start pcsd.service
systemctl enable pcsd.service
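You can verify the daemon is running and enabled with systemctl:
systemctl status pcsd.service
systemctl is-enabled pcsd.service  # should print "enabled"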
Set up the authentication pcs requires
# run on all nodes
echo 123456 | passwd --stdin hacluster
# run on one node only (e.g. node1)
pcs cluster auth node1 node2 -u hacluster -p 123456 --force
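On success, pcs reports each node as authorized; the output should look roughly like this (wording can differ between pcs versions):
node1: Authorized
node2: Authorized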
Create the cluster
pcs cluster setup --force --name pacemaker1 node1 node2
The output of the process looks like this:
[root@node1 ~]# pcs cluster setup --force --name pacemaker1 node1 node2
Destroying cluster on nodes: node1, node2...
node1: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (pacemaker)...
node1: Successfully destroyed cluster
node2: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'node1', 'node2'
node1: successful distribution of the file 'pacemaker_remote authkey'
node2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node1: Succeeded
node2: Succeeded
Synchronizing pcsd certificates on nodes node1, node2...
node1: Success
node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node2: Success
Start the cluster
Run on either node
pcs cluster start --all
Startup output
[root@node1 ~]# pcs cluster start --all
node1: Starting Cluster (corosync)...
node2: Starting Cluster (corosync)...
node1: Starting Cluster (pacemaker)...
node2: Starting Cluster (pacemaker)...
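At this point corosync should have formed a membership between the two nodes. One way to sanity-check it is corosync-cfgtool, which prints the ring status as seen from the local node:
corosync-cfgtool -s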
Cluster settings
Disable fencing (acceptable for a test environment like this one; production clusters should always have fencing configured)
pcs property set stonith-enabled=false
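Disabling STONITH removes the error the cluster would otherwise raise about missing fencing devices. You can confirm the configuration is now valid with crm_verify, which prints nothing when there are no errors:
crm_verify -L -V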
Because there are only two nodes, losing either one also means losing quorum, so enforcing quorum is pointless here; we tell the cluster to ignore it
pcs property set no-quorum-policy=ignore
Finally, force the cluster to move a service to the other node after a single failure
pcs resource defaults migration-threshold=1
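To check that the default took effect, list the resource defaults (pcs 0.9 syntax, as shipped with CentOS 7):
pcs resource defaults
# expected to include: migration-threshold: 1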
Adding a resource
pcs resource create my_first_svc ocf:heartbeat:Dummy op monitor interval=60s
my_first_svc: the name of the service
ocf:heartbeat:Dummy: the resource agent to use (Dummy is an agent that does nothing; it is meant as a template and is handy for guides like this one)
op monitor interval=60s: tells Pacemaker to check the health of this service every 60 seconds by calling the agent's monitor action
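To inspect how the resource ended up in the configuration, pcs can display it (again pcs 0.9 syntax; newer releases use pcs resource config instead):
pcs resource show my_first_svc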
Check the cluster status
[root@node1 ~]# pcs status
Cluster name: pacemaker1
Stack: corosync
Current DC: node1 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sat Jun  6 2020
Last change: Sat Jun  6 2020 by root via cibadmin on node1

2 nodes configured
1 resource configured

Online: [ node1 node2 ]

Full list of resources:

 my_first_svc  (ocf::heartbeat:Dummy):  Started node1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
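Note the active/disabled status of corosync and pacemaker: the cluster is running now, but it will not start again automatically after a reboot. If that is what you want, enable it on all nodes (some admins prefer to start a rebooted node by hand after checking it over):
pcs cluster enable --all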
crm_mon shows the same information in a more compact form:
[root@node1 ~]# crm_mon -1
Stack: corosync
Current DC: node1 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sat Jun  6 2020
Last change: Sat Jun  6 2020 by root via cibadmin on node1

2 nodes configured
1 resource configured

Online: [ node1 node2 ]

Active resources:

 my_first_svc  (ocf::heartbeat:Dummy):  Started node1
Verifying failover
Manually stop the service to simulate a failure
crm_resource --resource my_first_svc --force-stop
Check the status again about a minute later (the monitor interval): the service has moved to node2
[root@node1 ~]# crm_mon -1
Stack: corosync
Current DC: node1 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sat Jun  6 2020
Last change: Sat Jun  6 2020 by root via cibadmin on node1

2 nodes configured
1 resource configured

Online: [ node1 node2 ]

Active resources:

 my_first_svc  (ocf::heartbeat:Dummy):  Started node2

Failed Resource Actions:
* my_first_svc_monitor_60000 on node1 'not running' (7): call=, status=complete, exitreason='No process state file found',
    last-rc-change='Sat Jun  6 15:29:26 2020', queued=0ms, exec=0ms
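Because migration-threshold is 1, the single failure recorded on node1 keeps the service away from that node. Once the underlying problem is fixed, clear the failure history so node1 becomes eligible to host the resource again; a typical follow-up looks like this:
# inspect the recorded failures for the resource
pcs resource failcount show my_first_svc
# clear the failure history and re-probe the resource
pcs resource cleanup my_first_svc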