1. MHA简介

MHA(Master High Availability)目前在MySQL高可用方面是一个相对成熟的解决方案,该工具仅适用于MySQL Replication 环境,目的在于维持master主库的高可用性。MHA 是自动的master 故障转移和slave提升的软件包,基于标准的MySQL复制(异步/半同步)。

MHA由两部分组成:MHA Manager(管理节点)和 MHA Node(数据节点)。

  • MHA Manager可以单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。
  • MHA Node运行在每台MySQL服务器上。

MHA Manager会定时探测集群中的master节点,当master出现故障时,它可以自动将最新数据的slave提升为新的master,然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。

在MHA自动故障切换过程中,MHA试图从宕机的主服务器上保存二进制日志,最大程度的保证数据的不丢失,但这并不总是可行的。例如,如果主服务器硬件故障或无法通过ssh访问,MHA没法保存二进制日志,只进行故障转移而丢失了最新的数据。使用半同步复制,可以大大降低数据丢失的风险。MHA可以与半同步复制结合起来。如果只有一个slave已经收到了最新的二进制日志,MHA可以将最新的二进制日志应用于其他所有的slave服务器上,因此可以保证所有节点的数据一致性。

1.1 MHA工作原理总结为如下

(1)从宕机崩溃的master保存二进制日志事件(binlog events);

(2)识别含有最新更新的slave;

(3)apply差异的中继日志(relay log)到其他的slave;

(4)apply从master保存的二进制日志事件(binlog events);

(5)提升一个slave为新的master;

(6)使其他的slave连接新的master进行复制。

1.2 MHA工具包介绍

Manager管理工具

`masterha_check_ssh`              检查MHA的SSH配置状况
`masterha_check_repl` 检查MySQL复制状况
`masterha_manger` 启动MHA
`masterha_check_status` 检测当前MHA运行状态
`masterha_master_monitor` 检测master是否宕机
`masterha_master_switch` 控制故障转移(自动或者手动)
`masterha_conf_host` 添加或删除配置的server信息

Node数据节点工具

`save_binary_logs`                保存和复制master的二进制日志
`apply_diff_relay_logs` 识别差异的中继日志事件并将其差异的事件应用于其他的slave
`filter_mysqlbinlog` 去除不必要的ROLLBACK事件(MHA已不再使用这个工具)
`purge_relay_logs` 清除中继日志(不会阻塞SQL线程)

2. 部署MHA

2.1 环境介绍

CentOS 7.5mysqldb1(100)Maternode10033065.7.23192.168.56.111
CentOS 7.5mysqldb2(200)Slavenode
manager
20033065.7.23 
CentOS 7.5mysqldb3(210)Slavenode21033065.7.23 

2.2 一主两从复制搭建

搭建过程参考MySQL 使用GTID进行复制

创建mha管理账号

#在(mysqldb1)master上执行
GRANT ALL PRIVILEGES ON *.* TO 'mha_rep'@'192.168.56.%' IDENTIFIED BY '';

从库上执行

#设置为只读模式
set global read_only=;
#read_only=1只读模式,可以限定普通用户进行数据修改的操作,但不会限定具有super权限的用户的数据修改操作 #禁用relay log自动清除
set global relay_log_purge=;

2.3 配置互信

(1) 在三台机器上生成各自的key文件

[root@mysqldb1 :: /root]
# ssh-keygen -t rsa
[root@mysqldb2 :: /root]
# ssh-keygen -t rsa
[root@mysqldb3 :: /root]
# ssh-keygen -t rsa

(2)用ssh-copy-id把公钥复制到每台主机上

[root@mysqldb1 :: /root]
# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
# ssh-copy-id -i .ssh/id_rsa.pub [email protected] [root@mysqldb2 :: /root]
# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
# ssh-copy-id -i .ssh/id_rsa.pub [email protected] [root@mysqldb3 :: /root]
# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
# ssh-copy-id -i .ssh/id_rsa.pub [email protected] 【注意】:配置完成后,采用 ssh [hostname] date 进行验证

2.4 下载MHA

网址:
https://github.com/yoshinorim/mha4mysql-manager
https://github.com/wubx/mha4mysql-manager 建议使用
文件:mha4mysql-manager-master.zip

网址:
https://github.com/yoshinorim/mha4mysql-node
https://github.com/wubx/mha4mysql-node #建议使用
文件:mha4mysql-node-master.zip

2.5 安装MHA

每个节点都安装manager和node,但只有mysqldb2上启动manager。

1)解压安装包

unzip mha4mysql-manager-master.zip
unzip mha4mysql-node-master.zip

2)安装依赖包

--yum search perl |grep install
yum install cpan
yum install perl-Module-Install.noarch
yum install perl-DBI
yum install perl-DBD-MySQL
yum install perl-Time-HiRes.x86_64
yum install perl-Config-Tiny.noarch
yum install perl-Log-Dispatch.noarch
yum install perl-Parallel-ForkManager.noarch

3)编译安装(node)

# cd /opt/mha4mysql-node-master/

# perl Makefile.PL
include /opt/mha4mysql-node-master/inc/Module/Install.pm
include inc/Module/Install/Metadata.pm
include inc/Module/Install/Base.pm
include inc/Module/Install/Makefile.pm
include inc/Module/Install/Scripts.pm
include inc/Module/Install/AutoInstall.pm
include inc/Module/Install/Include.pm
include inc/Module/AutoInstall.pm
*** Module::AutoInstall version 1.06
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.627)
- DBD::mysql ...loaded. (4.023)
*** Module::AutoInstall configuration finished.
include inc/Module/Install/WriteAll.pm
include inc/Module/Install/Win32.pm
include inc/Module/Install/Can.pm
include inc/Module/Install/Fetch.pm
Checking if your kit is complete...
Warning: the following files are missing in your kit:
META.yml
Please inform the author.
Writing Makefile for mha4mysql::node
Writing MYMETA.yml and MYMETA.json
Writing META.yml # make && make install
cp lib/MHA/BinlogManager.pm blib/lib/MHA/BinlogManager.pm
cp lib/MHA/BinlogPosFindManager.pm blib/lib/MHA/BinlogPosFindManager.pm
cp lib/MHA/BinlogPosFinderXid.pm blib/lib/MHA/BinlogPosFinderXid.pm
cp lib/MHA/BinlogHeaderParser.pm blib/lib/MHA/BinlogHeaderParser.pm
cp lib/MHA/BinlogPosFinder.pm blib/lib/MHA/BinlogPosFinder.pm
cp lib/MHA/BinlogPosFinderElp.pm blib/lib/MHA/BinlogPosFinderElp.pm
cp lib/MHA/NodeUtil.pm blib/lib/MHA/NodeUtil.pm
cp lib/MHA/SlaveUtil.pm blib/lib/MHA/SlaveUtil.pm
cp lib/MHA/NodeConst.pm blib/lib/MHA/NodeConst.pm
cp bin/filter_mysqlbinlog blib/script/filter_mysqlbinlog
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/filter_mysqlbinlog
cp bin/apply_diff_relay_logs blib/script/apply_diff_relay_logs
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/apply_diff_relay_logs
cp bin/purge_relay_logs blib/script/purge_relay_logs
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/purge_relay_logs
cp bin/save_binary_logs blib/script/save_binary_logs
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/save_binary_logs
Manifying blib/man1/filter_mysqlbinlog.
Manifying blib/man1/apply_diff_relay_logs.
Manifying blib/man1/purge_relay_logs.
Manifying blib/man1/save_binary_logs.
Installing /usr/local/share/perl5/MHA/BinlogManager.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFindManager.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFinderXid.pm
Installing /usr/local/share/perl5/MHA/BinlogHeaderParser.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFinder.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFinderElp.pm
Installing /usr/local/share/perl5/MHA/NodeUtil.pm
Installing /usr/local/share/perl5/MHA/SlaveUtil.pm
Installing /usr/local/share/perl5/MHA/NodeConst.pm
Installing /usr/local/share/man/man1/filter_mysqlbinlog.
Installing /usr/local/share/man/man1/apply_diff_relay_logs.
Installing /usr/local/share/man/man1/purge_relay_logs.
Installing /usr/local/share/man/man1/save_binary_logs.
Installing /usr/local/bin/filter_mysqlbinlog
Installing /usr/local/bin/apply_diff_relay_logs
Installing /usr/local/bin/purge_relay_logs
Installing /usr/local/bin/save_binary_logs
Appending installation info to /usr/lib64/perl5/perllocal.pod

4)编译安装(manager)

# cd mha4mysql-manager-master/

# perl Makefile.PL
include /opt/mha4mysql-manager-master/inc/Module/Install.pm
include inc/Module/Install/Metadata.pm
include inc/Module/Install/Base.pm
include inc/Module/Install/Makefile.pm
include inc/Module/Install/Scripts.pm
include inc/Module/Install/AutoInstall.pm
include inc/Module/Install/Include.pm
include inc/Module/AutoInstall.pm
*** Module::AutoInstall version 1.06
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.627)
- DBD::mysql ...loaded. (4.023)
- Time::HiRes ...loaded. (1.9725)
- Config::Tiny ...loaded. (2.14)
- Log::Dispatch ...loaded. (2.41)
- Parallel::ForkManager ...loaded. (1.18)
- MHA::NodeConst ...loaded. (0.57)
*** Module::AutoInstall configuration finished.
include inc/Module/Install/WriteAll.pm
include inc/Module/Install/Win32.pm
include inc/Module/Install/Can.pm
include inc/Module/Install/Fetch.pm
Checking if your kit is complete...
Warning: the following files are missing in your kit:
META.yml
Please inform the author.
Writing Makefile for mha4mysql::manager
Writing MYMETA.yml and MYMETA.json
Writing META.yml # make && make install
cp lib/MHA/ManagerUtil.pm blib/lib/MHA/ManagerUtil.pm
cp lib/MHA/Config.pm blib/lib/MHA/Config.pm
cp lib/MHA/HealthCheck.pm blib/lib/MHA/HealthCheck.pm
cp lib/MHA/ManagerConst.pm blib/lib/MHA/ManagerConst.pm
cp lib/MHA/ServerManager.pm blib/lib/MHA/ServerManager.pm
cp lib/MHA/FileStatus.pm blib/lib/MHA/FileStatus.pm
cp lib/MHA/ManagerAdmin.pm blib/lib/MHA/ManagerAdmin.pm
cp lib/MHA/ManagerAdminWrapper.pm blib/lib/MHA/ManagerAdminWrapper.pm
cp lib/MHA/MasterFailover.pm blib/lib/MHA/MasterFailover.pm
cp lib/MHA/MasterMonitor.pm blib/lib/MHA/MasterMonitor.pm
cp lib/MHA/MasterRotate.pm blib/lib/MHA/MasterRotate.pm
cp lib/MHA/SSHCheck.pm blib/lib/MHA/SSHCheck.pm
cp lib/MHA/Server.pm blib/lib/MHA/Server.pm
cp lib/MHA/DBHelper.pm blib/lib/MHA/DBHelper.pm
cp bin/masterha_stop blib/script/masterha_stop
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_stop
cp bin/masterha_conf_host blib/script/masterha_conf_host
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_conf_host
cp bin/masterha_check_repl blib/script/masterha_check_repl
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_repl
cp bin/masterha_check_status blib/script/masterha_check_status
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_status
cp bin/masterha_master_monitor blib/script/masterha_master_monitor
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_monitor
cp bin/masterha_check_ssh blib/script/masterha_check_ssh
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_ssh
cp bin/masterha_master_switch blib/script/masterha_master_switch
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_switch
cp bin/masterha_secondary_check blib/script/masterha_secondary_check
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_secondary_check
cp bin/masterha_manager blib/script/masterha_manager
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_manager
Manifying blib/man1/masterha_stop.
Manifying blib/man1/masterha_conf_host.
Manifying blib/man1/masterha_check_repl.
Manifying blib/man1/masterha_check_status.
Manifying blib/man1/masterha_master_monitor.
Manifying blib/man1/masterha_check_ssh.
Manifying blib/man1/masterha_master_switch.
Manifying blib/man1/masterha_secondary_check.
Manifying blib/man1/masterha_manager.
Installing /usr/local/share/perl5/MHA/ManagerUtil.pm
Installing /usr/local/share/perl5/MHA/Config.pm
Installing /usr/local/share/perl5/MHA/HealthCheck.pm
Installing /usr/local/share/perl5/MHA/ManagerConst.pm
Installing /usr/local/share/perl5/MHA/ServerManager.pm
Installing /usr/local/share/perl5/MHA/FileStatus.pm
Installing /usr/local/share/perl5/MHA/ManagerAdmin.pm
Installing /usr/local/share/perl5/MHA/ManagerAdminWrapper.pm
Installing /usr/local/share/perl5/MHA/MasterFailover.pm
Installing /usr/local/share/perl5/MHA/MasterMonitor.pm
Installing /usr/local/share/perl5/MHA/MasterRotate.pm
Installing /usr/local/share/perl5/MHA/SSHCheck.pm
Installing /usr/local/share/perl5/MHA/Server.pm
Installing /usr/local/share/perl5/MHA/DBHelper.pm
Installing /usr/local/share/man/man1/masterha_stop.
Installing /usr/local/share/man/man1/masterha_conf_host.
Installing /usr/local/share/man/man1/masterha_check_repl.
Installing /usr/local/share/man/man1/masterha_check_status.
Installing /usr/local/share/man/man1/masterha_master_monitor.
Installing /usr/local/share/man/man1/masterha_check_ssh.
Installing /usr/local/share/man/man1/masterha_master_switch.
Installing /usr/local/share/man/man1/masterha_secondary_check.
Installing /usr/local/share/man/man1/masterha_manager.
Installing /usr/local/bin/masterha_stop
Installing /usr/local/bin/masterha_conf_host
Installing /usr/local/bin/masterha_check_repl
Installing /usr/local/bin/masterha_check_status
Installing /usr/local/bin/masterha_master_monitor
Installing /usr/local/bin/masterha_check_ssh
Installing /usr/local/bin/masterha_master_switch
Installing /usr/local/bin/masterha_secondary_check
Installing /usr/local/bin/masterha_manager
Appending installation info to /usr/lib64/perl5/perllocal.pod

2.6 配置MHA

2.6.1创建MHA工作目录

mkdir -p /etc/masterha
mkdir -p /var/log/masterha/app1

2.6.2 创建配置文件

cp /opt/mha4mysql-manager-master/samples/conf/app1.cnf /etc/masterha/
cd /etc/masterha/ #修改app1.cnf配置文件
vi app1.cnf [server default]
#设置manager的工作目录
manager_workdir=/var/log/masterha/app1
#设置manager的日志
manager_log=/var/log/masterha/app1/manager.log
#通过多种网络路径检测ssh是否能够连接到master脚本,若其中一个路径不通,则会通过另一个路径ssh连接到master
secondary_check_script=/usr/local/bin/masterha_secondary_check -s mysqldb2 -s mysqldb1
#故障转移脚本
master_ip_failover_script=/usr/local/bin/master_ip_failover
#master_ip_failover_script=/usr/bin/master_ip_failover
#master_ip_online_change_script=/usr/bin/master_ip_online_change
#shutdown_script=/usr/bin/power_manager
#report_script=/usr/bin/send_report #设置监控的用户
user=mha_rep
#设置监控用户的密码
password= #设置ssh的登录用户名
ssh_user=root #设置复制环境中的复制用户名
repl_user=repl
#设置复制用户的密码
repl_password=wanbin #设置监控主库,发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行railover
ping_interval= [server1]
hostname=192.168.56.100
#候选master
candidate_master=
master_binlog_dir=/data/mysql/mysql3306/logs [server2]
hostname=192.168.56.200
#候选master
candidate_master=
master_binlog_dir=/data/mysql/mysql3306/logs [server3]
hostname=192.168.56.210
#该服务器不会成为new master
no_master=
master_binlog_dir=/data/mysql/mysql3306/logs

2.6.3 编辑故障转移脚本

# cd /opt/mha4mysql-manager-master/samples/scripts/

# ls
master_ip_failover master_ip_online_change power_manager send_report # cp master_ip_failover /usr/local/bin/ # 编辑master_ip_failover脚本
#!/usr/bin/env perl # Copyright (C) DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# Franklin Street, Fifth Floor, Boston, MA - USA ## Note: This is a sample script and is not complete. Modify the script based on your environment. use strict;
use warnings FATAL => 'all'; use Getopt::Long; my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
); my $vip = '192.168.56.111/24';
my $key = '';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip up";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $ssh_Bcast_arp = "/usr/bin/arping -c 3 -A 192.168.56.111"; #ARP回复模式。要是不加则服务器会自动等到vip缓存失效,期间VIP会有一定时间的不可用。 GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
); exit &main(); sub main { print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n"; if ( $command eq "stop" || $command eq "stopssh" ) { my $exit_code = ;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = ;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) { my $exit_code = ;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
&start_arp();
$exit_code = ;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit ;
} else {
&usage();
exit ;
}
} sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub start_arp() {
`ssh $ssh_user\@$new_master_host \" $ssh_Bcast_arp \"`;
}
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
} sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

2.6.4 配置VIP

在用虚拟IP的时候,需要在开启MHA程序之前要把虚拟IP先设置到主上去,否则MHA是不会自己的去设置VIP,第一次设置VIP之后,后续脚本的故障转移等功能会自动的对VIP进行切换。

# ifconfig eth0: 192.168.56.111 up

# /sbin/arping -c  -A 192.168.56.111 -I eth0
ARPING 192.168.56.111 from 192.168.56.111 eth0 Sent probes ( broadcast(s))
Received response(s)

2.6.5 检查配置文件

利用mha工具检测ssh

# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Mon Oct :: - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct :: - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Oct :: - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Oct :: - [info] Starting SSH connection tests..
Mon Oct :: - [debug]
Mon Oct :: - [debug] Connecting via SSH from [email protected](192.168.56.100:) to [email protected](192.168.56.200:)..
Mon Oct :: - [debug] ok.
Mon Oct :: - [debug] Connecting via SSH from [email protected](192.168.56.100:) to [email protected](192.168.56.210:)..
Mon Oct :: - [debug] ok.
Mon Oct :: - [debug]
Mon Oct :: - [debug] Connecting via SSH from [email protected](192.168.56.200:) to [email protected](192.168.56.100:)..
Mon Oct :: - [debug] ok.
Mon Oct :: - [debug] Connecting via SSH from [email protected](192.168.56.200:) to [email protected](192.168.56.210:)..
Mon Oct :: - [debug] ok.
Mon Oct :: - [debug]
Mon Oct :: - [debug] Connecting via SSH from [email protected](192.168.56.210:) to [email protected](192.168.56.100:)..
Mon Oct :: - [debug] ok.
Mon Oct :: - [debug] Connecting via SSH from [email protected](192.168.56.210:) to [email protected](192.168.56.200:)..
Mon Oct :: - [debug] ok.
Mon Oct :: - [info] All SSH connection tests passed successfully.

2.6.6 利用mha工具check检查repl环境

# /usr/local/bin/masterha_check_repl --conf=/etc/masterha/app1.cnf
Mon Oct :: - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct :: - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Oct :: - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Oct :: - [info] MHA::MasterMonitor version 0.57.
Mon Oct :: - [info] GTID failover mode =
Mon Oct :: - [info] Dead Servers:
Mon Oct :: - [info] Alive Servers:
Mon Oct :: - [info] 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:)
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:)
Mon Oct :: - [info] Alive Slaves:
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Primary candidate for the new Master (candidate_master is set)
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Not candidate for the new Master (no_master is set)
Mon Oct :: - [info] Current Alive Master: 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Checking slave configurations..
Mon Oct :: - [info] read_only= is not set on slave 192.168.56.200(192.168.56.200:).
Mon Oct :: - [info] read_only= is not set on slave 192.168.56.210(192.168.56.210:).
Mon Oct :: - [info] Checking replication filtering settings..
Mon Oct :: - [info] binlog_do_db= , binlog_ignore_db=
Mon Oct :: - [info] Replication filtering check ok.
Mon Oct :: - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Oct :: - [info] Checking SSH publickey authentication settings on the current master..
Mon Oct :: - [info] HealthCheck: SSH to 192.168.56.100 is reachable.
Mon Oct :: - [info]
192.168.56.100(192.168.56.100:) (current master)
+--192.168.56.200(192.168.56.200:)
+--192.168.56.210(192.168.56.210:) Mon Oct :: - [info] Checking replication health on 192.168.56.200..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] Checking replication health on 192.168.56.210..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] Checking master_ip_failover_script status:
Mon Oct :: - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.56.100 --orig_master_ip=192.168.56.100 --orig_master_port= IN SCRIPT TEST====/sbin/ifconfig eth0: down==/sbin/ifconfig eth0: 192.168.56.111/ up=== Checking the Status of the script.. OK
Mon Oct :: - [info] OK.
Mon Oct :: - [warning] shutdown_script is not defined.
Mon Oct :: - [info] Got exit code (Not master dead). MySQL Replication Health is OK.

2.6.7 开启MHA Manager,并查看状态

# masterha_manager --conf=/etc/masterha/app1.cnf > /var/log/masterha/app1/manager.log &

# tail -100f /var/log/masterha/app1/manager.log
Mon Oct :: - [info] MHA::MasterMonitor version 0.57.
Mon Oct :: - [info] GTID failover mode =
Mon Oct :: - [info] Dead Servers:
Mon Oct :: - [info] Alive Servers:
Mon Oct :: - [info] 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:)
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:)
Mon Oct :: - [info] Alive Slaves:
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Primary candidate for the new Master (candidate_master is set)
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Not candidate for the new Master (no_master is set)
Mon Oct :: - [info] Current Alive Master: 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Checking slave configurations..
Mon Oct :: - [info] Checking replication filtering settings..
Mon Oct :: - [info] binlog_do_db= , binlog_ignore_db=
Mon Oct :: - [info] Replication filtering check ok.
Mon Oct :: - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Oct :: - [info] Checking SSH publickey authentication settings on the current master..
Mon Oct :: - [info] HealthCheck: SSH to 192.168.56.100 is reachable.
Mon Oct :: - [info]
192.168.56.100(192.168.56.100:) (current master)
+--192.168.56.200(192.168.56.200:)
+--192.168.56.210(192.168.56.210:) Mon Oct :: - [info] Checking master_ip_failover_script status:
Mon Oct :: - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.56.100 --orig_master_ip=192.168.56.100 --orig_master_port= IN SCRIPT TEST====/sbin/ifconfig eth0: down==/sbin/ifconfig eth0: 192.168.56.111/ up=== Checking the Status of the script.. OK
Mon Oct :: - [info] OK.
Mon Oct :: - [warning] shutdown_script is not defined.
Mon Oct :: - [info] Set master ping interval seconds.
Mon Oct :: - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s mysqldb2 -s mysqldb1
Mon Oct :: - [info] Starting ping health check on 192.168.56.100(192.168.56.100:)..
Mon Oct :: - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond.. #查看状态
# /usr/local/bin/masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:) is running(:PING_OK), master:192.168.56.100

3. 测试MHA

3.1 模拟master(mysqldb1)故障

(1)查看VIP漂移和MySQL自动切换情况

#kill mysqldb1上的mysqld进程
# ps -ef|grep mysqld
avahi : ? :: avahi-daemon: running [mysqldb1.local]
mysql : pts/ :: mysqld --defaults-file=/etc/my3306.cnf
root : pts/ :: ssh mysqldb1
root : pts/ :: grep --color=auto mysqld # kill -

(2)在mysqldb2上观察manager日志:

# tail -f  /var/log/masterha/app1/manager.log

IN SCRIPT TEST====/sbin/ifconfig eth0: down==/sbin/ifconfig eth0: 192.168.56.111/ up===

Checking the Status of the script.. OK
Mon Oct :: - [info] OK.
Mon Oct :: - [warning] shutdown_script is not defined.
Mon Oct :: - [info] Set master ping interval seconds.
Mon Oct :: - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s mysqldb2 -s mysqldb1
Mon Oct :: - [info] Starting ping health check on 192.168.56.100(192.168.56.100:)..
Mon Oct :: - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Mon Oct :: - [warning] Got error on MySQL select ping: (MySQL server has gone away)
Mon Oct :: - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s mysqldb2 -s mysqldb1 --user=root --master_host=192.168.56.100 --master_ip=192.168.56.100 --master_port= --master_user=mha_rep --master_password= --ping_type=SELECT
Mon Oct :: - [info] Executing SSH check script: exit
Mon Oct :: - [info] HealthCheck: SSH to 192.168.56.100 is reachable.
Monitoring server mysqldb2 is reachable, Master is not reachable from mysqldb2. OK.
Monitoring server mysqldb1 is reachable, Master is not reachable from mysqldb1. OK.
Mon Oct :: - [info] Master is not reachable from all other monitoring servers. Failover should start.
Mon Oct :: - [warning] Got error on MySQL connect: (Can't connect to MySQL server on '192.168.56.100' (111))
Mon Oct :: - [warning] Connection failed time(s)..
Mon Oct :: - [warning] Got error on MySQL connect: (Can't connect to MySQL server on '192.168.56.100' (111))
Mon Oct :: - [warning] Connection failed time(s)..
Mon Oct :: - [warning] Got error on MySQL connect: (Can't connect to MySQL server on '192.168.56.100' (111))
Mon Oct :: - [warning] Connection failed time(s)..
Mon Oct :: - [warning] Master is not reachable from health checker!
Mon Oct :: - [warning] Master 192.168.56.100(192.168.56.100:) is not reachable!
Mon Oct :: - [warning] SSH is reachable.
Mon Oct :: - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Mon Oct :: - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct :: - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Oct :: - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Oct :: - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct :: - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Oct :: - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Oct :: - [info] GTID failover mode =
Mon Oct :: - [info] Dead Servers:
Mon Oct :: - [info] 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Alive Servers:
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:)
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:)
Mon Oct :: - [info] Alive Slaves:
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Primary candidate for the new Master (candidate_master is set)
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Not candidate for the new Master (no_master is set)
Mon Oct :: - [info] Checking slave configurations..
Mon Oct :: - [info] Checking replication filtering settings..
Mon Oct :: - [info] Replication filtering check ok.
Mon Oct :: - [info] Master is down!
Mon Oct :: - [info] Terminating monitoring script.
Mon Oct :: - [info] Got exit code (Master dead).
Mon Oct :: - [info] MHA::MasterFailover version 0.57.
Mon Oct :: - [info] Starting master failover.
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase : Configuration Check Phase..
Mon Oct :: - [info]
Mon Oct :: - [info] GTID failover mode =
Mon Oct :: - [info] Dead Servers:
Mon Oct :: - [info] 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Checking master reachability via MySQL(double check)...
Mon Oct :: - [info] ok.
Mon Oct :: - [info] Alive Servers:
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:)
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:)
Mon Oct :: - [info] Alive Slaves:
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Primary candidate for the new Master (candidate_master is set)
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Not candidate for the new Master (no_master is set)
Mon Oct :: - [info] Starting GTID based failover.
Mon Oct :: - [info]
Mon Oct :: - [info] ** Phase : Configuration Check Phase completed.
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase : Dead Master Shutdown Phase..
Mon Oct :: - [info]
Mon Oct :: - [info] Forcing shutdown so that applications never connect to the current master..
Mon Oct :: - [info] Executing master IP deactivation script:
Mon Oct :: - [info] /usr/local/bin/master_ip_failover --orig_master_host=192.168.56.100 --orig_master_ip=192.168.56.100 --orig_master_port= --command=stopssh --ssh_user=root IN SCRIPT TEST====/sbin/ifconfig eth0: down==/sbin/ifconfig eth0: 192.168.56.111/ up=== Disabling the VIP on old master: 192.168.56.100
Mon Oct :: - [info] done.
Mon Oct :: - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Oct :: - [info] * Phase : Dead Master Shutdown Phase completed.
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase : Master Recovery Phase..
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Oct :: - [info]
Mon Oct :: - [info] The latest binary log file/position on all slaves is my3306_binlog.:
Mon Oct :: - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Primary candidate for the new Master (candidate_master is set)
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Not candidate for the new Master (no_master is set)
Mon Oct :: - [info] The oldest binary log file/position on all slaves is my3306_binlog.:
Mon Oct :: - [info] Oldest slaves:
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Primary candidate for the new Master (candidate_master is set)
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Not candidate for the new Master (no_master is set)
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase 3.3: Determining New Master Phase..
Mon Oct :: - [info]
Mon Oct :: - [info] Searching new master from slaves..
Mon Oct :: - [info] Candidate masters from the configuration file:
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Primary candidate for the new Master (candidate_master is set)
Mon Oct :: - [info] Non-candidate masters:
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Not candidate for the new Master (no_master is set)
Mon Oct :: - [info] Searching from candidate_master slaves which have received the latest relay log events..
Mon Oct :: - [info] New master is 192.168.56.200(192.168.56.200:)
Mon Oct :: - [info] Starting master failover..
Mon Oct :: - [info]
From:
192.168.56.100(192.168.56.100:) (current master)
+--192.168.56.200(192.168.56.200:)
+--192.168.56.210(192.168.56.210:) To:
192.168.56.200(192.168.56.200:) (new master)
+--192.168.56.210(192.168.56.210:)
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase 3.3: New Master Recovery Phase..
Mon Oct :: - [info]
Mon Oct :: - [info] Waiting all logs to be applied..
Mon Oct :: - [info] done.
Mon Oct :: - [info] Getting new master's binlog name and position..
Mon Oct :: - [info] my3306_binlog.:
Mon Oct :: - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.56.200', MASTER_PORT=, MASTER_AUTO_POSITION=, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Oct :: - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: my3306_binlog., , 7390a401-b705-11e8-9ed9-080027b0b461:-
Mon Oct :: - [info] Executing master IP activate script:
Mon Oct :: - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.56.100 --orig_master_ip=192.168.56.100 --orig_master_port= --new_master_host=192.168.56.200 --new_master_ip=192.168.56.200 --new_master_port= --new_master_user='mha_rep' --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password IN SCRIPT TEST====/sbin/ifconfig eth0: down==/sbin/ifconfig eth0: 192.168.56.111/ up=== Enabling the VIP - 192.168.56.111/ on the new master - 192.168.56.200
bash: /usr/bin/arping: No such file or directory
Mon Oct :: - [info] OK.
Mon Oct :: - [info] Setting read_only= on 192.168.56.200(192.168.56.200:)..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] ** Finished master recovery successfully.
Mon Oct :: - [info] * Phase : Master Recovery Phase completed.
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase : Slaves Recovery Phase..
Mon Oct :: - [info]
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase 4.1: Starting Slaves in parallel..
Mon Oct :: - [info]
Mon Oct :: - [info] -- Slave recovery on host 192.168.56.210(192.168.56.210:) started, pid: . Check tmp log /var/log/masterha/app1/192.168..210_3306_20181029103800.log if it takes time..
Mon Oct :: - [info]
Mon Oct :: - [info] Log messages from 192.168.56.210 ...
Mon Oct :: - [info]
Mon Oct :: - [info] Resetting slave 192.168.56.210(192.168.56.210:) and starting replication from the new master 192.168.56.200(192.168.56.200:)..
Mon Oct :: - [info] Executed CHANGE MASTER.
Mon Oct :: - [info] Slave started.
Mon Oct :: - [info] gtid_wait(7390a401-b705-11e8-9ed9-080027b0b461:-) completed on 192.168.56.210(192.168.56.210:). Executed events.
Mon Oct :: - [info] End of log messages from 192.168.56.210.
Mon Oct :: - [info] -- Slave on host 192.168.56.210(192.168.56.210:) started.
Mon Oct :: - [info] All new slave servers recovered successfully.
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase : New master cleanup phase..
Mon Oct :: - [info]
Mon Oct :: - [info] Resetting slave info on the new master..
Mon Oct :: - [info] 192.168.56.200: Resetting slave info succeeded.
Mon Oct :: - [info] Master failover to 192.168.56.200(192.168.56.200:) completed successfully.
Mon Oct :: - [info] ----- Failover Report ----- app1: MySQL Master failover 192.168.56.100(192.168.56.100:) to 192.168.56.200(192.168.56.200:) succeeded Master 192.168.56.100(192.168.56.100:) is down! Check MHA Manager logs at mysqldb2:/var/log/masterha/app1/manager.log for details. Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.56.100(192.168.56.100:)
Selected 192.168.56.200(192.168.56.200:) as a new master.
192.168.56.200(192.168.56.200:): OK: Applying all logs succeeded.
192.168.56.200(192.168.56.200:): OK: Activated master IP address.
192.168.56.210(192.168.56.210:): OK: Slave started, replicating from 192.168.56.200(192.168.56.200:)
192.168.56.200(192.168.56.200:): Resetting slave info succeeded.
Master failover to 192.168.56.200(192.168.56.200:) completed successfully.

(3)查看vip是否漂移到mysqldb2上

# ifconfig
eth0: flags=<UP,BROADCAST,RUNNING,MULTICAST> mtu
inet 192.168.56.200 netmask 255.255.255.0 broadcast 192.168.56.255
... eth0:: flags=<UP,BROADCAST,RUNNING,MULTICAST> mtu
inet 192.168.56.111 netmask 255.255.255.0 broadcast 192.168.56.255
ether :::5b:8a:9a txqueuelen (Ethernet)
...

3.2 mysqldb1重新加入集群

一旦发生MHA切换,管理进程(Manager)将会退出,无法进行再次测试,需将故障数据库解决掉之后,重新change加入到MHA环境中来,并且要保证app1.failover.complete不存在或则加上--ignore_last_failover参数忽略,才能再次开启管理进程。

# mysqld --defaults-file=/etc/my3306.cnf &

mysql> CHANGE MASTER TO master_host='192.168.56.200',
master_port=,
master_user='repl',
master_password='wanbin',
master_auto_position=; mysql> start slave;

3.3 masterha_master_switch工具进行主从切换

(1)查看mha状态

在手动切换的同时需要保证没有启用MHA自动切换功能。

# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(:NOT_RUNNING).

(2) 手工切换

# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.56.100 --orig_master_is_new_slave

#参数解释
--master_state:代表当前主库的状态为alive
--new_master_host:代表切换后新主库为192.168.56.
--orig_master_is_new_slave:将原来的主库变更为slave节点。 Mon Oct :: - [info] MHA::MasterRotate version 0.57.
Mon Oct :: - [info] Starting online master switch..
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase : Configuration Check Phase..
Mon Oct :: - [info]
Mon Oct :: - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct :: - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Oct :: - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Oct :: - [info] GTID failover mode =
Mon Oct :: - [info] Current Alive Master: 192.168.56.200(192.168.56.200:)
Mon Oct :: - [info] Alive Slaves:
Mon Oct :: - [info] 192.168.56.100(192.168.56.100:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.200(192.168.56.200:)
Mon Oct :: - [info] Primary candidate for the new Master (candidate_master is set)
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.200(192.168.56.200:)
Mon Oct :: - [info] Not candidate for the new Master (no_master is set) It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.56.200(192.168.56.200:)? (YES/no): YES
Mon Oct :: - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] Checking MHA is not monitoring or doing failover..
Mon Oct :: - [info] Checking replication health on 192.168.56.100..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] Checking replication health on 192.168.56.210..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] 192.168.56.100 can be new master.
Mon Oct :: - [info]
From:
192.168.56.200(192.168.56.200:) (current master)
+--192.168.56.100(192.168.56.100:)
+--192.168.56.210(192.168.56.210:) To:
192.168.56.100(192.168.56.100:) (new master)
+--192.168.56.210(192.168.56.210:)
+--192.168.56.200(192.168.56.200:) Starting master switch from 192.168.56.200(192.168.56.200:) to 192.168.56.100(192.168.56.100:)? (yes/NO): yes
Mon Oct :: - [info] Checking whether 192.168.56.100(192.168.56.100:) is ok for the new master..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:): Resetting slave pointing to the dummy host.
Mon Oct :: - [info] ** Phase : Configuration Check Phase completed.
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase : Rejecting updates Phase..
Mon Oct :: - [info]
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Mon Oct :: - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Mon Oct :: - [info] Executing FLUSH TABLES WITH READ LOCK..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] Orig master binlog:pos is my3306_binlog.:.
Mon Oct :: - [info] Waiting to execute all relay logs on 192.168.56.100(192.168.56.100:)..
Mon Oct :: - [info] master_pos_wait(my3306_binlog.:) completed on 192.168.56.100(192.168.56.100:). Executed events.
Mon Oct :: - [info] done.
Mon Oct :: - [info] Getting new master's binlog name and position..
Mon Oct :: - [info] my3306_binlog.:
Mon Oct :: - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.56.100', MASTER_PORT=, MASTER_AUTO_POSITION=, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Oct :: - [info]
Mon Oct :: - [info] * Switching slaves in parallel..
Mon Oct :: - [info]
Mon Oct :: - [info] -- Slave switch on host 192.168.56.210(192.168.56.210:) started, pid:
Mon Oct :: - [info]
Mon Oct :: - [info] Log messages from 192.168.56.210 ...
Mon Oct :: - [info]
Mon Oct :: - [info] Waiting to execute all relay logs on 192.168.56.210(192.168.56.210:)..
Mon Oct :: - [info] master_pos_wait(my3306_binlog.:) completed on 192.168.56.210(192.168.56.210:). Executed events.
Mon Oct :: - [info] done.
Mon Oct :: - [info] Resetting slave 192.168.56.210(192.168.56.210:) and starting replication from the new master 192.168.56.100(192.168.56.100:)..
Mon Oct :: - [info] Executed CHANGE MASTER.
Mon Oct :: - [info] Slave started.
Mon Oct :: - [info] End of log messages from 192.168.56.210 ...
Mon Oct :: - [info]
Mon Oct :: - [info] -- Slave switch on host 192.168.56.210(192.168.56.210:) succeeded.
Mon Oct :: - [info] Unlocking all tables on the orig master:
Mon Oct :: - [info] Executing UNLOCK TABLES..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] Starting orig master as a new slave..
Mon Oct :: - [info] Resetting slave 192.168.56.200(192.168.56.200:) and starting replication from the new master 192.168.56.100(192.168.56.100:)..
Mon Oct :: - [info] Executed CHANGE MASTER.
Mon Oct :: - [info] Slave started.
Mon Oct :: - [info] All new slave servers switched successfully.
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase : New master cleanup phase..
Mon Oct :: - [info]
Mon Oct :: - [info] 192.168.56.100: Resetting slave info succeeded.
Mon Oct :: - [info] Switching master to 192.168.56.100(192.168.56.100:) completed successfully.

手工切换后,vip没有自动漂移到mysqldb1上,因为还没设置master_ip_online_change脚本。

(3)编辑master_ip_online_change脚本

# cp /opt/mha4mysql-manager-master/samples/scripts/master_ip_online_change /usr/local/bin/master_ip_online_change

# vi  /usr/local/bin/master_ip_online_change

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all'; use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper; my $_tstart;
my $_running_interval = 0.1;
my (
$command,
$orig_master_is_new_slave, $orig_master_host, $orig_master_ip, $orig_master_port, $orig_master_user, $orig_master_password, $orig_master_ssh_user,
$new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password, $new_master_ssh_user,
); my $vip = '192.168.56.111/24';
my $key = '';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip up";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $ssh_Bcast_arp = "/usr/bin/arping -c 3 -A 192.168.56.111"; GetOptions(
'command=s' => \$command,
'orig_master_is_new_slave' => \$orig_master_is_new_slave,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_user=s' => \$orig_master_user,
'orig_master_password=s' => \$orig_master_password,
'orig_master_ssh_user=s' => \$orig_master_ssh_user,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
'new_master_ssh_user=s' => \$new_master_ssh_user,
); exit &main(); sub current_time_us {
my ( $sec, $microsec ) = gettimeofday();
my $curdate = localtime($sec);
return $curdate . " " . sprintf( "%06d", $microsec );
} sub sleep_until {
my $elapsed = tv_interval($_tstart);
if ( $_running_interval > $elapsed ) {
sleep( $_running_interval - $elapsed );
}
} sub get_threads_util {
my $dbh = shift;
my $my_connection_id = shift;
my $running_time_threshold = shift;
my $type = shift;
$running_time_threshold = unless ($running_time_threshold);
$type = unless ($type);
my @threads; my $sth = $dbh->prepare("SHOW PROCESSLIST");
$sth->execute(); while ( my $ref = $sth->fetchrow_hashref() ) {
my $id = $ref->{Id};
my $user = $ref->{User};
my $host = $ref->{Host};
my $command = $ref->{Command};
my $state = $ref->{State};
my $query_time = $ref->{Time};
my $info = $ref->{Info};
$info =~ s/^\s*(.*?)\s*$/$/ if defined($info);
next if ( $my_connection_id == $id );
next if ( defined($query_time) && $query_time < $running_time_threshold );
next if ( defined($command) && $command eq "Binlog Dump" );
next if ( defined($user) && $user eq "system user" );
next
if ( defined($command)
&& $command eq "Sleep"
&& defined($query_time)
&& $query_time >= ); if ( $type >= ) {
next if ( defined($command) && $command eq "Sleep" );
next if ( defined($command) && $command eq "Connect" );
} if ( $type >= ) {
next if ( defined($info) && $info =~ m/^select/i );
next if ( defined($info) && $info =~ m/^show/i );
} push @threads, $ref;
}
return @threads;
} sub main {
if ( $command eq "stop" ) {
## Gracefully killing connections on the current master
# . Set read_only= on the new master
# . DROP USER so that no app user can establish new connections
# . Set read_only= on the current master
# . Kill current queries
# * Any database access failure will result in script die.
my $exit_code = ;
eval {
## Setting read_only= on the new master (to avoid accident)
my $new_master_handler = new MHA::DBHelper(); # args: hostname, port, user, password, raise_error(die_on_error)_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, );
print current_time_us() . " Set read_only on the new master.. ";
$new_master_handler->enable_read_only();
if ( $new_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
$new_master_handler->disconnect(); # Connecting to the orig master, die if any database error happens
my $orig_master_handler = new MHA::DBHelper();
$orig_master_handler->connect( $orig_master_ip, $orig_master_port,
$orig_master_user, $orig_master_password, ); ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
#$orig_master_handler->disable_log_bin_local();
#print current_time_us() . " Drpping app user on the orig master..\n";
#FIXME_xxx_drop_app_user($orig_master_handler); ## Waiting for N * milliseconds so that current connections can exit
my $time_until_read_only = ;
$_tstart = [gettimeofday];
my @threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_read_only > && $#threads >= ) {
if ( $time_until_read_only % == ) {
printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + , $time_until_read_only * ;
if ( $#threads < ) {
print Data::Dumper->new( [$_] )->Indent()->Terse()->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_read_only--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
} ## Setting read_only= on the current master so that nobody(except SUPER) can write
print current_time_us() . " Set read_only=1 on the orig master.. ";
$orig_master_handler->enable_read_only();
if ( $orig_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
} ## Waiting for M * milliseconds so that current update queries can complete
my $time_until_kill_threads = ;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_kill_threads > && $#threads >= ) {
if ( $time_until_kill_threads % == ) {
printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + , $time_until_kill_threads * ;
if ( $#threads < ) {
print Data::Dumper->new( [$_] )->Indent()->Terse()->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_kill_threads--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
} print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip(); ## Terminating all threads
print current_time_us() . " Killing all application threads..\n";
$orig_master_handler->kill_threads(@threads) if ( $#threads >= );
print current_time_us() . " done.\n";
#$orig_master_handler->enable_log_bin_local();
$orig_master_handler->disconnect(); ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
$exit_code = ;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
## Activating master ip on the new master
# . Create app user with write privileges
# . Moving backup script if needed
# . Register new master's ip to the catalog database # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is or , MHA does not abort
my $exit_code = ;
eval {
my $new_master_handler = new MHA::DBHelper(); # args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, ); ## Set read_only= on the new master
#$new_master_handler->disable_log_bin_local();
print current_time_us() . " Set read_only=0 on the new master.\n";
$new_master_handler->disable_read_only(); ## Creating an app user on the new master
#print current_time_us() . " Creating app user on the new master..\n";
#FIXME_xxx_create_app_user($new_master_handler);
#$new_master_handler->enable_log_bin_local();
$new_master_handler->disconnect(); ## Update master ip on the catalog database, etc
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = ;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) { # do nothing
exit ;
}
else {
&usage();
exit ;
}
} # A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
} sub start_arp() {
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_Bcast_arp \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
} sub usage {
print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --orig_master_user=user --orig_master_password=password --orig_master_ssh_user=sshuser --new_master_host=host --new_master_ip=ip --new_master_port=port --new_master_user=user --new_master_password=password --new_master_ssh_user=sshuser \n";
die;
}

(4)再次切换,观察vip是否迁移到mysqldb2上

# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.56.200 --orig_master_is_new_slave
Mon Oct :: - [info] MHA::MasterRotate version 0.57.
Mon Oct :: - [info] Starting online master switch..
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase : Configuration Check Phase..
Mon Oct :: - [info]
Mon Oct :: - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct :: - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Oct :: - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Oct :: - [info] GTID failover mode =
Mon Oct :: - [info] Current Alive Master: 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Alive Slaves:
Mon Oct :: - [info] 192.168.56.200(192.168.56.200:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Primary candidate for the new Master (candidate_master is set)
Mon Oct :: - [info] 192.168.56.210(192.168.56.210:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Mon Oct :: - [info] GTID ON
Mon Oct :: - [info] Replicating from 192.168.56.100(192.168.56.100:)
Mon Oct :: - [info] Not candidate for the new Master (no_master is set) It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.56.100(192.168.56.100:)? (YES/no): yes
Mon Oct :: - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] Checking MHA is not monitoring or doing failover..
Mon Oct :: - [info] Checking replication health on 192.168.56.200..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] Checking replication health on 192.168.56.210..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] 192.168.56.200 can be new master.
Mon Oct :: - [info]
From:
192.168.56.100(192.168.56.100:) (current master)
+--192.168.56.200(192.168.56.200:)
+--192.168.56.210(192.168.56.210:) To:
192.168.56.200(192.168.56.200:) (new master)
+--192.168.56.210(192.168.56.210:)
+--192.168.56.100(192.168.56.100:) Starting master switch from 192.168.56.100(192.168.56.100:) to 192.168.56.200(192.168.56.200:)? (yes/NO): yes
Mon Oct :: - [info] Checking whether 192.168.56.200(192.168.56.200:) is ok for the new master..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] 192.168.56.100(192.168.56.100:): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Mon Oct :: - [info] 192.168.56.100(192.168.56.100:): Resetting slave pointing to the dummy host.
Mon Oct :: - [info] ** Phase : Configuration Check Phase completed.
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase : Rejecting updates Phase..
Mon Oct :: - [info]
Mon Oct :: - [info] Executing master ip online change script to disable write on the current master:
Mon Oct :: - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.56.100 --orig_master_ip=192.168.56.100 --orig_master_port= --orig_master_user='mha_rep' --new_master_host=192.168.56.200 --new_master_ip=192.168.56.200 --new_master_port= --new_master_user='mha_rep' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Mon Oct :: Set read_only on the new master.. ok.
Mon Oct :: Waiting all running threads are disconnected.. (max milliseconds)
{'Time' => '','db' => undef,'Id' => '','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb3:47840'}
{'Time' => '','db' => undef,'Id' => '','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb2:54496'}
Mon Oct :: Waiting all running threads are disconnected.. (max milliseconds)
{'Time' => '','db' => undef,'Id' => '','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb3:47840'}
{'Time' => '','db' => undef,'Id' => '','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb2:54496'}
Mon Oct :: Waiting all running threads are disconnected.. (max milliseconds)
{'Time' => '','db' => undef,'Id' => '','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb3:47840'}
{'Time' => '','db' => undef,'Id' => '','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb2:54496'}
Mon Oct :: Set read_only= on the orig master.. ok.
Mon Oct :: Waiting all running queries are disconnected.. (max milliseconds)
{'Time' => '','db' => undef,'Id' => '','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb3:47840'}
{'Time' => '','db' => undef,'Id' => '','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb2:54496'}
Disabling the VIP on old master: 192.168.56.100
Mon Oct :: Killing all application threads..
Mon Oct :: done.
Mon Oct :: - [info] ok.
Mon Oct :: - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Mon Oct :: - [info] Executing FLUSH TABLES WITH READ LOCK..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] Orig master binlog:pos is my3306_binlog.:.
Mon Oct :: - [info] Waiting to execute all relay logs on 192.168.56.200(192.168.56.200:)..
Mon Oct :: - [info] master_pos_wait(my3306_binlog.:) completed on 192.168.56.200(192.168.56.200:). Executed events.
Mon Oct :: - [info] done.
Mon Oct :: - [info] Getting new master's binlog name and position..
Mon Oct :: - [info] my3306_binlog.:
Mon Oct :: - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.56.200', MASTER_PORT=, MASTER_AUTO_POSITION=, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Oct :: - [info] Executing master ip online change script to allow write on the new master:
Mon Oct :: - [info] /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.56.100 --orig_master_ip=192.168.56.100 --orig_master_port= --orig_master_user='mha_rep' --new_master_host=192.168.56.200 --new_master_ip=192.168.56.200 --new_master_port= --new_master_user='mha_rep' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Mon Oct :: Set read_only= on the new master.
Enabling the VIP - 192.168.56.111/ on the new master - 192.168.56.200
Mon Oct :: - [info] ok.
Mon Oct :: - [info]
Mon Oct :: - [info] * Switching slaves in parallel..
Mon Oct :: - [info]
Mon Oct :: - [info] -- Slave switch on host 192.168.56.210(192.168.56.210:) started, pid:
Mon Oct :: - [info]
Mon Oct :: - [info] Log messages from 192.168.56.210 ...
Mon Oct :: - [info]
Mon Oct :: - [info] Waiting to execute all relay logs on 192.168.56.210(192.168.56.210:)..
Mon Oct :: - [info] master_pos_wait(my3306_binlog.:) completed on 192.168.56.210(192.168.56.210:). Executed events.
Mon Oct :: - [info] done.
Mon Oct :: - [info] Resetting slave 192.168.56.210(192.168.56.210:) and starting replication from the new master 192.168.56.200(192.168.56.200:)..
Mon Oct :: - [info] Executed CHANGE MASTER.
Mon Oct :: - [info] Slave started.
Mon Oct :: - [info] End of log messages from 192.168.56.210 ...
Mon Oct :: - [info]
Mon Oct :: - [info] -- Slave switch on host 192.168.56.210(192.168.56.210:) succeeded.
Mon Oct :: - [info] Unlocking all tables on the orig master:
Mon Oct :: - [info] Executing UNLOCK TABLES..
Mon Oct :: - [info] ok.
Mon Oct :: - [info] Starting orig master as a new slave..
Mon Oct :: - [info] Resetting slave 192.168.56.100(192.168.56.100:) and starting replication from the new master 192.168.56.200(192.168.56.200:)..
Mon Oct :: - [info] Executed CHANGE MASTER.
Mon Oct :: - [info] Slave started.
Mon Oct :: - [info] All new slave servers switched successfully.
Mon Oct :: - [info]
Mon Oct :: - [info] * Phase : New master cleanup phase..
Mon Oct :: - [info]
Mon Oct :: - [info] 192.168.56.200: Resetting slave info succeeded.
Mon Oct :: - [info] Switching master to 192.168.56.200(192.168.56.200:) completed successfully. # ifconfig
eth0: flags=<UP,BROADCAST,RUNNING,MULTICAST> mtu
inet 192.168.56.200 netmask 255.255.255.0 broadcast 192.168.56.255
... eth0:: flags=<UP,BROADCAST,RUNNING,MULTICAST> mtu
inet 192.168.56.111 netmask 255.255.255.0 broadcast 192.168.56.255
ether :::5b:8a:9a txqueuelen (Ethernet)
..
05-25 09:16