Pacemaker is part of the Red Hat High Availability Add-on. The easiest way to try it on RHEL is to install it from the Scientific Linux or CentOS repositories.
Environment preparation
Two nodes
Note: changing the hostname on CentOS
Temporary change: hostname <new-hostname>  -- takes effect immediately, lost on reboot
Permanent change: hostnamectl set-hostname <new-hostname>  -- takes effect immediately and persists across reboots
node1 - 192.168.29.246
node2 - 192.168.29.247
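For example, with the addresses above, the hostnames can be set and made resolvable as follows (a minimal sketch; the /etc/hosts entries assume the names are not already resolvable via DNS):
# on the first machine
hostnamectl set-hostname node1
# on the second machine
hostnamectl set-hostname node2
# on both machines, map the names to the addresses
cat >> /etc/hosts <<EOF
192.168.29.246 node1
192.168.29.247 node2
EOF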
System information
CentOS Linux release 7.8.2003 (Core)
Installation
On all nodes, use yum to install Pacemaker and a few other packages we will need
yum install pacemaker pcs resource-agents
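To confirm the packages are in place, a quick rpm query works (exact versions will vary with your repositories):
rpm -q pacemaker pcs resource-agents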
Creating the cluster
On all nodes, start the pcs daemon and set it to run at boot
systemctl start pcsd.service
systemctl enable pcsd.service
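You can verify the daemon is running and enabled with systemctl:
systemctl status pcsd.service
systemctl is-enabled pcsd.service  # should print "enabled"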
Set up the authentication pcs requires
# run on all nodes
echo 123456 | passwd --stdin hacluster
# run on one node only (e.g. node1)
pcs cluster auth node1 node2 -u hacluster -p 123456 --force
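On success, pcs reports each node as authorized; the output should look roughly like this (wording can differ between pcs versions):
node1: Authorized
node2: Authorized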
Create the cluster
pcs cluster setup --force --name pacemaker1 node1 node2
The output of the process looks like this:
[root@node1 ~]# pcs cluster setup --force --name pacemaker1 node1 node2
Destroying cluster on nodes: node1, node2...
node1: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (pacemaker)...
node1: Successfully destroyed cluster
node2: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'node1', 'node2'
node1: successful distribution of the file 'pacemaker_remote authkey'
node2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node1: Succeeded
node2: Succeeded
Synchronizing pcsd certificates on nodes node1, node2...
node1: Success
node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node2: Success
Start the cluster
Run on either node
pcs cluster start --all
Startup output
[root@node1 ~]# pcs cluster start --all
node1: Starting Cluster (corosync)...
node2: Starting Cluster (corosync)...
node1: Starting Cluster (pacemaker)...
node2: Starting Cluster (pacemaker)...
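At this point corosync should have formed a membership between the two nodes. One way to sanity-check it is corosync-cfgtool, which prints the ring status as seen from the local node:
corosync-cfgtool -s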
Cluster settings
Disable fencing (acceptable for a test environment like this one; production clusters should always have fencing configured)
pcs property set stonith-enabled=false
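Disabling STONITH removes the error the cluster would otherwise raise about missing fencing devices. You can confirm the configuration is now valid with crm_verify, which prints nothing when there are no errors:
crm_verify -L -V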
Because there are only two nodes, losing either one also means losing quorum, so enforcing quorum is pointless here; we tell the cluster to ignore it
pcs property set no-quorum-policy=ignore
Finally, force the cluster to move a service to the other node after a single failure
pcs resource defaults migration-threshold=1
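To check that the default took effect, list the resource defaults (pcs 0.9 syntax, as shipped with CentOS 7):
pcs resource defaults
# expected to include: migration-threshold: 1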
Adding a resource
pcs resource create my_first_svc ocf:heartbeat:Dummy op monitor interval=60s
my_first_svc: the name of the service
ocf:heartbeat:Dummy: the resource agent to use (Dummy is an agent that does nothing; it is meant as a template and is handy for guides like this one)
op monitor interval=60s: tells Pacemaker to check the health of this service every 60 seconds by calling the agent's monitor action
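To inspect how the resource ended up in the configuration, pcs can display it (again pcs 0.9 syntax; newer releases use pcs resource config instead):
pcs resource show my_first_svc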
Check the cluster status
[root@node1 ~]# pcs status
Cluster name: pacemaker1
Stack: corosync
Current DC: node1 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sat Jun  6 2020
Last change: Sat Jun  6 2020 by root via cibadmin on node1

2 nodes configured
1 resource configured

Online: [ node1 node2 ]

Full list of resources:

 my_first_svc  (ocf::heartbeat:Dummy):  Started node1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
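Note the active/disabled status of corosync and pacemaker: the cluster is running now, but it will not start again automatically after a reboot. If that is what you want, enable it on all nodes (some admins prefer to start a rebooted node by hand after checking it over):
pcs cluster enable --all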
crm_mon shows the same information in a more compact form:
[root@node1 ~]# crm_mon -1
Stack: corosync
Current DC: node1 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sat Jun  6 2020
Last change: Sat Jun  6 2020 by root via cibadmin on node1

2 nodes configured
1 resource configured

Online: [ node1 node2 ]

Active resources:

 my_first_svc  (ocf::heartbeat:Dummy):  Started node1
Verifying failover
Manually stop the service to simulate a failure
crm_resource --resource my_first_svc --force-stop
Check the status again about a minute later (the monitor interval): the service has moved to node2
[root@node1 ~]# crm_mon -1
Stack: corosync
Current DC: node1 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sat Jun  6 2020
Last change: Sat Jun  6 2020 by root via cibadmin on node1

2 nodes configured
1 resource configured

Online: [ node1 node2 ]

Active resources:

 my_first_svc  (ocf::heartbeat:Dummy):  Started node2

Failed Resource Actions:
* my_first_svc_monitor_60000 on node1 'not running' (7): call=, status=complete, exitreason='No process state file found',
    last-rc-change='Sat Jun  6 15:29:26 2020', queued=0ms, exec=0ms
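Because migration-threshold is 1, the single failure recorded on node1 keeps the service away from that node. Once the underlying problem is fixed, clear the failure history so node1 becomes eligible to host the resource again; a typical follow-up looks like this:
# inspect the recorded failures for the resource
pcs resource failcount show my_first_svc
# clear the failure history and re-probe the resource
pcs resource cleanup my_first_svc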