在之前的博客中, 介绍了 MySQL 的主从模型以及深层次的 MySQL 的读写分离插件 --ProxySQL, 让我们可以很大程度上提升数据库服务器的性能和优化用户的体验, 但是, 我们对于数据库的可靠性似乎缺了一点, 因为一旦 master 数据库服务器宕机, 我们的数据库基本上就是瘫痪了, 所以, 我们需要一个解决方案针对于数据库服务器的可靠性.
MHA(MySQL|Master High Availability) 是一种基于主从模型的相当成熟的一种解决方案, 我们对于 master 做一个高可用, 使得哪怕在 master 数据库服务器宕机时, 我们的 slave 可以及时顶上, 直接变成 master 主机, 保证服务可靠的运行;
由于 mha4mysql 不存在于镜像仓库和 epel 中, 所以需要下载 rpm 包, 我下载的是
- mha4mysql-manager-0.56-0.el6.noarch.rpm
- mha4mysql-node-0.56-0.el6.noarch.rpm
建议 mha 服务器和 master 服务器分开; 否则 master 物理宕机的话, mha 也会宕机;
Master 和 slave 上安装 mha4mysql-node 节点;
mha 服务器两个 rpm 包都需要安装;
- MHA: CentOS 7.5B 172.16.75.2
- Master: CentOS 7.5D 172.16.75.4
- Slave: CentOS 7.5C 172.16.75.3
三台主机都需要进行 SSH 免密通信;
因为如果 master 宕机, slave 需要顶上, 所以 slave 和 master 都需要开启二进制日志和中继日志;
Master 配置文件:
- [mysqld]
- datadir=/var/lib/MySQL
- socket=/var/lib/MySQL/MySQL.sock
- # Disabling symbolic-links is recommended to prevent assorted security risks
- symbolic-links=0
- # Settings user and group are ignored when systemd is used.
- # If you need to run mysqld under a different user or group,
- # customize your systemd unit file for mariadb according to the
- # instructions in http://fedoraproject.org/wiki/Systemd
- innodb_file_per_table=ON
- skip_name_resolve=ON
- server_id=401
- log_bin=/var/lib/MySQL/binlog
- sync_binlog=1
- innodb_flush_log_at_trx_commit=1
- relay_log_purge=0
- relay_log=relay_log
- [mysqld_safe]
- log-error=/var/log/mariadb/mariadb.log
- pid-file=/var/run/mariadb/mariadb.pid
- #
- # include all files from the config directory
- #
- !includedir /etc/my.cnf.d
Slave 配置文件:
- [mysqld]
- datadir=/var/lib/MySQL
- socket=/var/lib/MySQL/MySQL.sock
- # Disabling symbolic-links is recommended to prevent assorted security risks
- symbolic-links=0
- # Settings user and group are ignored when systemd is used.
- # If you need to run mysqld under a different user or group,
- # customize your systemd unit file for mariadb according to the
- # instructions in http://fedoraproject.org/wiki/Systemd
- innodb_file_per_table=ON
- skip_name_resolve=ON
- server_id=301
- read_only=ON
- relay_log=slavelog
- relay_log_purge=0
- log_bin=binlog
- [mysqld_safe]
- log-error=/var/log/mariadb/mariadb.log
- pid-file=/var/run/mariadb/mariadb.pid
- #
- # include all files from the config directory
- #
- !includedir /etc/my.cnf.d
然后其他配置不变, 搭建主从模型;
1. 下载完 mha 的 rpm 包后, 分别在对应的节点安装对应的安装包, 通过查看 mha4mysql-manager 配置文件, 全是二进制执行脚本. 所以 mha 的配置文件需要自己写;
- [root@slave1 ~]# rpm -ql mha4mysql-manager
- /usr/bin/masterha_check_repl
- /usr/bin/masterha_check_ssh
- /usr/bin/masterha_check_status
- /usr/bin/masterha_conf_host
- /usr/bin/masterha_manager
- /usr/bin/masterha_master_monitor
- /usr/bin/masterha_master_switch
- /usr/bin/masterha_secondary_check
- /usr/bin/masterha_stop
- /usr/share/man/man1/masterha_check_repl.1.gz
- /usr/share/man/man1/masterha_check_ssh.1.gz
- /usr/share/man/man1/masterha_check_status.1.gz
- /usr/share/man/man1/masterha_conf_host.1.gz
- /usr/share/man/man1/masterha_manager.1.gz
- /usr/share/man/man1/masterha_master_monitor.1.gz
- /usr/share/man/man1/masterha_master_switch.1.gz
- /usr/share/man/man1/masterha_secondary_check.1.gz
- /usr/share/man/man1/masterha_stop.1.gz
- /usr/share/perl5/vendor_perl/MHA/Config.pm
- /usr/share/perl5/vendor_perl/MHA/DBHelper.pm
- /usr/share/perl5/vendor_perl/MHA/FileStatus.pm
- /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm
- /usr/share/perl5/vendor_perl/MHA/ManagerAdmin.pm
- /usr/share/perl5/vendor_perl/MHA/ManagerAdminWrapper.pm
- /usr/share/perl5/vendor_perl/MHA/ManagerConst.pm
- /usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm
- /usr/share/perl5/vendor_perl/MHA/MasterFailover.pm
- /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm
- /usr/share/perl5/vendor_perl/MHA/MasterRotate.pm
- /usr/share/perl5/vendor_perl/MHA/SSHCheck.pm
- /usr/share/perl5/vendor_perl/MHA/Server.pm
- /usr/share/perl5/vendor_perl/MHA/ServerManager.pm
2. 创建 mha 相对应的配置文件;
- [root@slave1 ~]# mkdir /etc/mha
- [root@slave1 ~]# VIM /etc/mha/app1.cnf
- [root@slave1 ~]#
- [root@slave1 ~]# cat /etc/mha/app1.cnf
- [server default]
- user=mha #登录用户
- password=mhapass
- manager_workdir=/data/masterha/app1
- manager_log=/data/masterha/app1/manager.log
- remote_workdir=/data/masterha/app1
- ssh_user=root
- repl_user=repuser #master 做 replication slave 授权的用户
- repl_password=123456
- ping_interval=1
- [server1]
- hostname=172.16.75.4
- candidate_master=1
- [server2]
- hostname=172.16.75.3
- candidate_master=1
- [root@slave1 ~]#
- [root@slave1 ~]# mkdir -pv /data/masterha/app1
mkdir: 已创建目录 "/data"
mkdir: 已创建目录 "/data/masterha"
mkdir: 已创建目录 "/data/masterha/app1"
3. 根据配置文件内容, master 还需要对 mha 做一个授权用户.
- Master MySQL:
- MariaDB [(none)]> grant all on *.* to 'mha'@'%' identified by 'mhapass';
- Query OK, 0 rows affected (0.08 sec)
4. 在 mha 服务器上进行 SSH 测试和 repl 测试 (最后显示 OK 即可);
- [root@slave1 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
- Wed Nov 7 20:46:17 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
- Wed Nov 7 20:46:17 2018 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
- Wed Nov 7 20:46:17 2018 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
- Wed Nov 7 20:46:17 2018 - [info] Starting SSH connection tests..
- Wed Nov 7 20:46:18 2018 - [debug]
- Wed Nov 7 20:46:17 2018 - [debug] Connecting via SSH from root@172.16.75.4(172.16.75.4:22) to root@172.16.75.3(172.16.75.3:22)..
- Wed Nov 7 20:46:17 2018 - [debug] ok.
- Wed Nov 7 20:46:18 2018 - [debug]
- Wed Nov 7 20:46:17 2018 - [debug] Connecting via SSH from root@172.16.75.3(172.16.75.3:22) to root@172.16.75.4(172.16.75.4:22)..
- Wed Nov 7 20:46:18 2018 - [debug] ok.
- Wed Nov 7 20:46:18 2018 - [info] All SSH connection tests passed successfully.
- [root@slave1 ~]# masterha_check_repl --conf=/etc/mha/App.cnf
- ...
- 172.16.75.4(172.16.75.4:3306) (current master)
- +--172.16.75.3(172.16.75.3:3306)
- Thu Nov 8 09:37:35 2018 - [info] Checking replication health on 172.16.75.3..
- Thu Nov 8 09:37:35 2018 - [info] ok.
- Thu Nov 8 09:37:35 2018 - [warning] master_ip_failover_script is not defined.
- Thu Nov 8 09:37:35 2018 - [warning] shutdown_script is not defined.
- Thu Nov 8 09:37:35 2018 - [info] Got exit code 0 (Not master dead).
- MySQL Replication Health is OK.
5. 启动 mha4mysql 进程;
- [root@slave1 ~]# nohup masterha_manager --conf=/etc/mha/App.cnf > /data/masterha/app1/manager.log 2>&1 &
- [1] 85154
6. 检测 mha 的状态;
- [root@slave1 ~]# masterha_check_status --conf=/etc/mha/App.cnf
- App (pid:85154) is running(0:PING_OK), master:172.16.75.4
7. 测试:
把 master 的 MySQL 进程关掉; 然后查看 manager 上的 mha 日志:
- Master:
- [root@slave2 ~]# systemctl stop mariadb
- Manager:
- [root@slave1 ~]# masterha_check_status --conf=/etc/mha/App.cnf
- App master is down and failover is running(50:FAILOVER_RUNNING). master:172.16.75.4
- [root@slave1 ~]# cat /data/masterha/app1/manager.log
- ...
- ...
- ----- Failover Report -----
- App: MySQL Master failover 172.16.75.4(172.16.75.4:3306) to 172.16.75.3(172.16.75.3:3306) succeeded
- Master 172.16.75.4(172.16.75.4:3306) is down!
- Check MHA Manager logs at slave1.ljy.com:/data/masterha/app1/manager.log for details.
- Started automated(non-interactive) failover.
- The latest slave 172.16.75.3(172.16.75.3:3306) has all relay logs for recovery.
- Selected 172.16.75.3(172.16.75.3:3306) as a new master.
- 172.16.75.3(172.16.75.3:3306): OK: Applying all logs succeeded.
- Generating relay diff files from the latest slave succeeded.
- 172.16.75.3(172.16.75.3:3306): Resetting slave info succeeded.
- Master failover to 172.16.75.3(172.16.75.3:3306) completed successfully.
最后, 当出现 Master failover to 172.16.75.3(172.16.75.3:3306) completed successfully.
就意味着我们的 master 已经转到 slave 上了, 虽然之前的 Master 数据库服务器 down 掉, 但是我们通过 mha 及时的切换到 slave 主机上, 保证数据库服务器的可靠性.
遇到的大坑: Checking if super_read_only is defined and turned on..DBD::MySQL::st execute failed: Unknown system variable 'super_read_only'at /usr/share/perl5/vendor_perl/MHA/SlaveUtil.pm line 245.
一开始用的 0.58 版本的 mha, 结果在进行 repl 检测的时候, 出现了 "super read only" 的错误, 这是因为我们的 MySQL 版本和 mha 版本有不兼容的情况, 换用 0.56 的版本就可以, 因为 0.58 版本的 mha 兼容的是 mariadb-10 以后版本有 "super_read_only" 选项, 我们正常 CentOS-7 系列主机上安装的 MariaDB-5.56 没有 "super_read_only" 选项, 所以检测 repl 权限时, 会说我们没有设置此选项;
总之, mha 在生产环境中有很高的必要性, 我们需要 Proxysql 做读写分离提升服务器型男的同时, 又需要 mha 对 master 做高可用来保证服务器的可靠性, 保障我们的数据库不间断的运行;
来源: http://blog.51cto.com/liujingyu/2314335