A brief introduction to MHA
Lab topology
Key points and prerequisites
Hostname | IP address | Role |
---|---|---|
node1 | 192.168.2.201 | Master node, node component installed |
node2 | 192.168.2.202 | Slave node, node component installed |
node3 | 192.168.2.203 | Slave node, node component installed |
node4 | 192.168.2.204 | manager component installed |
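- This article uses CentOS 7.1 with MariaDB-5.5.50 as the database.
- For the detailed semi-synchronous replication setup, see my previous article. For reasons of space, this one focuses on installing, configuring, and using the MHA components.
- Because the database is MariaDB-5.5.50, I chose to build mha4mysql 0.56, the version hosted on Google Code.
- Note: SELinux and iptables are disabled throughout this article.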
Building and installing from source with Perl
Download locations for the latest MHA:
mha4mysql-manager
mha4mysql-node
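- An aside:
- The code was originally hosted on Google Code, and so were the project's introduction pages.
- Since GitHub came along, many projects have moved there, and many of the RPM packages on Google Code no longer work.
- RPMs of unknown origin should not be installed in a production environment, so I build from source with Perl instead.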
(1) Install the build prerequisites on every node
- yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles ncftp perl-Params-Validate perl-CPAN perl-Test-Mock-LWP.noarch perl-LWP-Authen-Negotiate.noarch perl-devel perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker
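(2) Install the manager component on node4
- a. Run perl Makefile.PL to check the build environment; it plays the same role as ./configure.
- node1 to node3, the three database nodes running semi-synchronous replication, get the node component instead, but it is installed with the same two steps. The manager itself also depends on the node component (note MHA::NodeConst in the output below), so install mha4mysql-node on node4 as well, before the manager.
- These steps rarely fail, and the node component needs no extra configuration, so I won't repeat the demonstration.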
- [root@node4 mha4mysql-manager-0.56]# perl Makefile.PL
- *** Module::AutoInstall version 1.03
- *** Checking for Perl dependencies...
- [Core Features]
- - DBI ...loaded. (1.627)
- - DBD::mysql ...loaded. (4.023)
- - Time::HiRes ...loaded. (1.9725)
- - Config::Tiny ...loaded. (2.14)
- - Log::Dispatch ...loaded. (2.41)
- - Parallel::ForkManager ...loaded. (1.05)
- - MHA::NodeConst ...loaded. (0.56)
- *** Module::AutoInstall configuration finished.
- Writing Makefile for mha4mysql::manager
- b. Install with make && make install
- [root@node4 mha4mysql-manager-0.56]# make&&make install
- Skip blib/lib/MHA/ManagerUtil.pm (unchanged)
- Skip blib/lib/MHA/Config.pm (unchanged)
- Skip blib/lib/MHA/HealthCheck.pm (unchanged)
- Skip blib/lib/MHA/ManagerConst.pm (unchanged)
- Skip blib/lib/MHA/ServerManager.pm (unchanged)
- Skip blib/lib/MHA/ManagerAdmin.pm (unchanged)
- Skip blib/lib/MHA/FileStatus.pm (unchanged)
- Skip blib/lib/MHA/ManagerAdminWrapper.pm (unchanged)
- Skip blib/lib/MHA/MasterFailover.pm (unchanged)
- Skip blib/lib/MHA/MasterRotate.pm (unchanged)
- Skip blib/lib/MHA/MasterMonitor.pm (unchanged)
- Skip blib/lib/MHA/SSHCheck.pm (unchanged)
- Skip blib/lib/MHA/Server.pm (unchanged)
- Skip blib/lib/MHA/DBHelper.pm (unchanged)
- cp bin/masterha_stop blib/script/masterha_stop
- /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_stop
- cp bin/masterha_conf_host blib/script/masterha_conf_host
- /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_conf_host
- cp bin/masterha_check_repl blib/script/masterha_check_repl
- /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_repl
- cp bin/masterha_check_status blib/script/masterha_check_status
- /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_status
- cp bin/masterha_master_monitor blib/script/masterha_master_monitor
- /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_monitor
- cp bin/masterha_check_ssh blib/script/masterha_check_ssh
- /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_ssh
- cp bin/masterha_master_switch blib/script/masterha_master_switch
- /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_switch
- cp bin/masterha_secondary_check blib/script/masterha_secondary_check
- /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_secondary_check
- cp bin/masterha_manager blib/script/masterha_manager
- /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_manager
- Manifying blib/man1/masterha_stop.1
- Manifying blib/man1/masterha_conf_host.1
- Manifying blib/man1/masterha_check_repl.1
- Manifying blib/man1/masterha_check_status.1
- Manifying blib/man1/masterha_master_monitor.1
- Manifying blib/man1/masterha_check_ssh.1
- Manifying blib/man1/masterha_master_switch.1
- Manifying blib/man1/masterha_secondary_check.1
- Manifying blib/man1/masterha_manager.1
- Appending installation info to /usr/lib64/perl5/perllocal.pod
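When the same build is repeated on several nodes, a small Python helper can confirm that both steps worked: that `perl Makefile.PL` reported every dependency as loaded, and that the nine `masterha_*` scripts produced by `make install` are on PATH. This is a minimal sketch, assuming the dependency lines keep the `- Module ...loaded. (version)` shape shown in step a and treating any other status (for example `...missing.`) as absent; the function names are hypothetical.

```python
import re
import shutil
from typing import List

# Dependency lines look like "- DBI ...loaded. (1.627)" in the Makefile.PL output.
DEP_LINE = re.compile(r"^\s*-\s+(?P<module>[\w:]+)\s+\.\.\.(?P<status>.*)$")

# The nine scripts that "make install" produced in the output above.
MANAGER_SCRIPTS = [
    "masterha_stop", "masterha_conf_host", "masterha_check_repl",
    "masterha_check_status", "masterha_master_monitor", "masterha_check_ssh",
    "masterha_master_switch", "masterha_secondary_check", "masterha_manager",
]


def find_missing_modules(makefile_pl_output: str) -> List[str]:
    """Return Perl modules that Makefile.PL did not report as 'loaded'."""
    missing = []
    for line in makefile_pl_output.splitlines():
        match = DEP_LINE.match(line)
        if match and not match.group("status").startswith("loaded"):
            missing.append(match.group("module"))
    return missing


def missing_scripts(path=None) -> List[str]:
    """Return manager scripts not found as executables (path=None means $PATH)."""
    return [name for name in MANAGER_SCRIPTS if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    print("scripts missing from PATH:", missing_scripts() or "none")
```

On node1 to node3 only the node component is installed, so the script check is meant for node4.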
Database node configuration
MariaDB configuration file of the semi-synchronous replication master node (node1)
- [mysqld]
- datadir=/var/lib/mysql
- socket=/var/lib/mysql/mysql.sock
- # Disabling symbolic-links is recommended to prevent assorted security risks
- symbolic-links=0
- # Settings user and group are ignored when systemd is used.
- # If you need to run mysqld under a different user or group,
- # customize your systemd unit file for mariadb according to the
- # instructions in http://Fedoraproject.org/wiki/Systemd
- innodb_file_per_table = 1
- skip_name_resolve = 1
- log_bin = Master-log
- log_bin_index = 1
- server_id = 1
- relay_log=relay-log
- relay_log_purge=0
- #skip-grant-tables
- #skip-networking
- [mysqld_safe]
- log-error=/var/log/mariadb/mariadb.log
- pid-file=/var/run/mariadb/mariadb.pid
- #
- # include all files from the config directory
- #
- !includedir /etc/my.cnf.d
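- Points to note:
- Both the master and the slave nodes must have binary logging (log_bin) and the relay log (relay_log) enabled.
- Relay log purging is also turned off here with relay_log_purge=0. MHA relies on the slaves' relay logs to bring lagging slaves up to date during a failover, so they must not be purged automatically; old relay logs should instead be cleaned up periodically with the purge_relay_logs tool from the MHA node package.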
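The settings called out above are easy to miss on one of the three nodes, so they can be checked mechanically before MariaDB is restarted. This is a minimal Python sketch with a hypothetical function name, assuming the file is INI-style text that the standard `configparser` module can read; a bare `!includedir` line is accepted as a valueless option, but the files it pulls in are not followed.

```python
import configparser
from typing import List


def check_mycnf(text: str) -> List[str]:
    """Return problems found in the [mysqld] section of a my.cnf text."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # tolerates bare lines such as "!includedir ..."
        strict=False,         # a repeated option keeps its last value, like mysqld
        interpolation=None,   # my.cnf values are taken literally
    )
    # MySQL/MariaDB treat "-" and "_" in option names as equivalent.
    parser.optionxform = lambda name: name.strip().lower().replace("-", "_")
    parser.read_string(text)

    if not parser.has_section("mysqld"):
        return ["no [mysqld] section"]
    mysqld = parser["mysqld"]
    problems = []
    for option in ("log_bin", "relay_log", "server_id"):
        if not mysqld.get(option):
            problems.append(f"{option} is not set")
    # MHA needs relay logs kept, so automatic purging must be off.
    if str(mysqld.get("relay_log_purge")).lower() not in ("0", "off", "false"):
        problems.append("relay_log_purge should be 0")
    return problems
```

For example, `check_mycnf(open("/etc/my.cnf").read())` returns an empty list for node1's file above.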
MariaDB configuration file of the semi-synchronous replication slave nodes (node2 and node3)
- [mysqld]
- datadir=/var/lib/mysql/
- socket=/var/lib/mysql/mysql.sock
- # Disabling symbolic-links is recommended to prevent assorted security risks
- symbolic-links=0
- # Settings user and group are ignored when systemd is used.
- # If you need to run mysqld under a different user or group,
- # customize your systemd unit file for mariadb according to the
- # instructions in http://fedoraproject.org/wiki/Systemd
- skip_name_resolve=true
- innodb_file_per_table=true
- server_id = 2
- log_bin=bin_log
- relay_log=relay-log
- read_only = 1
- relay_log_purge=0
- [mysqld_safe]
- log-error=/var/log/mariadb/mariadb.log
- pid-file=/var/run/mariadb/mariadb.pid
- #
- # include all files from the config directory
- #
- !includedir /etc/my.cnf.d
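- Compared with the master, the slaves have an extra read_only=1. Also make sure server_id is unique on every node: the file above is node2's (server_id = 2), so node3 needs a different value, such as 3.
- If a slave is promoted to master, MHA automatically turns read_only off on it. It does so at runtime (the equivalent of SET GLOBAL read_only=0) and does not edit my.cnf, so read_only=1 comes back if the new master is restarted with this file unchanged.
- MHA also repoints the remaining slaves to the new master, which you can check with show slave status\G.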
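The two runtime changes described above boil down to a handful of SQL statements. The hypothetical Python helper below only builds those statements, to make the effect of a promotion concrete; it is not MHA's implementation, which also applies missing binlog and relay-log events before repointing the slaves and works out the coordinates itself. The example values (new master 192.168.2.202, repuser, bin_log.000002 at position 245) come from node3's SHOW SLAVE STATUS output after the failover test.

```python
from typing import Dict, List


def promotion_sql(new_master_host: str, repl_user: str, repl_password: str,
                  log_file: str, log_pos: int, port: int = 3306) -> Dict[str, List[str]]:
    """Build illustrative statements for the new master and the other slaves.

    Hypothetical helper: values are interpolated without escaping, so it is
    for illustration only, and log_file/log_pos must be the new master's
    binlog position at promotion time.
    """
    new_master = ["SET GLOBAL read_only = 0"]  # make the promoted node writable
    other_slaves = [
        "STOP SLAVE",
        (
            f"CHANGE MASTER TO MASTER_HOST='{new_master_host}', "
            f"MASTER_PORT={port}, MASTER_USER='{repl_user}', "
            f"MASTER_PASSWORD='{repl_password}', "
            f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={int(log_pos)}"
        ),
        "START SLAVE",
    ]
    return {"new_master": new_master, "other_slaves": other_slaves}


if __name__ == "__main__":
    plan = promotion_sql("192.168.2.202", "repuser", "repuser", "bin_log.000002", 245)
    for role, statements in plan.items():
        for stmt in statements:
            print(f"{role}: {stmt};")
```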
Manager node configuration
(1) Copy the default file to use as a template, then empty the default file
- cp /etc/masterha/masterha_default.cnf /etc/masterha/app1.cnf
- > /etc/masterha/masterha_default.cnf
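(2) Configure /etc/masterha/app1.cnf; this is the file you pass to the manager process when starting it.
- One manager node can monitor several MHA clusters by running one manager process per cluster, which is why the files are named app1, app2, and so on.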
- [server default]
- #manager_workdir=/var/log/masterha/app1
- #manager_log=/var/log/masterha/app1/manager.log
- user=root
- password=123456789
- manager_workdir=/data/masterha/app1
- manager_log=/data/masterha/app1/manager.log
- remote_workdir=/data/masterha/app1
- ssh_user=root
- repl_user=repuser
- repl_password=repuser
- ping_interval=1
- [server1]
- hostname=node1
- candidate_master=1
- [server2]
- hostname=node2
- candidate_master=1
- [server3]
- hostname=node3
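- user and password are the database administrator's account and password.
- repl_user and repl_password are a user and password with replication privileges.
- ssh_user=root is the SSH account; since SSH uses key-based authentication, no password is needed.
- hostname=node1 works because the name node1 resolves to that host (for example through /etc/hosts); an IP address can be used here as well.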
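MHA applies the `[server default]` values to every `[serverN]` section unless a section overrides them. The Python sketch below mimics that with `configparser`'s default-section mechanism to list each server, whether it is a candidate master, and which SSH user applies. It is an illustration with a hypothetical function name, not how MHA itself reads the file (MHA is written in Perl).

```python
import configparser
from typing import Any, Dict


def summarize_mha_conf(text: str) -> Dict[str, Any]:
    """Summarize an MHA application config such as /etc/masterha/app1.cnf."""
    parser = configparser.ConfigParser(
        default_section="server default",  # shared settings inherited by every [serverN]
        interpolation=None,
    )
    parser.read_string(text)
    servers = []
    for name in parser.sections():  # sections() excludes the default section
        section = parser[name]
        if "hostname" not in section:
            raise ValueError(f"[{name}] has no hostname")
        servers.append({
            "section": name,
            "hostname": section["hostname"],
            "candidate_master": section.get("candidate_master", "0") == "1",
            "ssh_user": section.get("ssh_user"),  # falls back to [server default]
        })
    return {
        "manager_workdir": parser.defaults().get("manager_workdir"),
        "servers": servers,
    }
```

With the app1.cnf above, this reports node1 and node2 as candidate masters, node3 as an ordinary slave, and root as the SSH user for all three.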
(3) Create the working directory set as manager_workdir in the configuration file
- mkdir -p /data/masterha/app1/
Testing the environment with MHA's tools
(1) Test whether SSH connections work
- [root@node4 mha4mysql-manager-0.56]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
- Thu Nov 10 22:59:03 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
- Thu Nov 10 22:59:03 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
- Thu Nov 10 22:59:03 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
- Thu Nov 10 22:59:03 2016 - [info] Starting SSH connection tests..
- Thu Nov 10 22:59:04 2016 - [debug]
- Thu Nov 10 22:59:03 2016 - [debug] Connecting via SSH from root@node1(192.168.2.201:22) to root@node2(192.168.2.202:22)..
- Thu Nov 10 22:59:03 2016 - [debug] ok.
- Thu Nov 10 22:59:03 2016 - [debug] Connecting via SSH from root@node1(192.168.2.201:22) to root@node3(192.168.2.203:22)..
- Thu Nov 10 22:59:03 2016 - [debug] ok.
- Thu Nov 10 22:59:04 2016 - [debug]
- Thu Nov 10 22:59:03 2016 - [debug] Connecting via SSH from root@node2(192.168.2.202:22) to root@node1(192.168.2.201:22)..
- Thu Nov 10 22:59:04 2016 - [debug] ok.
- Thu Nov 10 22:59:04 2016 - [debug] Connecting via SSH from root@node2(192.168.2.202:22) to root@node3(192.168.2.203:22)..
- Thu Nov 10 22:59:04 2016 - [debug] ok.
- Thu Nov 10 22:59:05 2016 - [debug]
- Thu Nov 10 22:59:04 2016 - [debug] Connecting via SSH from root@node3(192.168.2.203:22) to root@node1(192.168.2.201:22)..
- Thu Nov 10 22:59:04 2016 - [debug] ok.
- Thu Nov 10 22:59:04 2016 - [debug] Connecting via SSH from root@node3(192.168.2.203:22) to root@node2(192.168.2.202:22)..
- Thu Nov 10 22:59:05 2016 - [debug] ok.
- Thu Nov 10 22:59:05 2016 - [info] All SSH connection tests passed successfully.
- Despite all this output, the last line alone tells you whether SSH is working.
- Note that the command points at the app1.cnf configured above.
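When the check runs from a script or a cron job, code can read that last line for you. This is a minimal Python sketch with a hypothetical function name, assuming the message formats shown above; it also collects the tested (source, target) pairs, so with three data nodes you can confirm that all 3 × 2 = 6 directions were tried.

```python
import re
from typing import List, Tuple

# e.g. "Connecting via SSH from root@node1(192.168.2.201:22) to root@node2(192.168.2.202:22).."
CONNECT = re.compile(r"Connecting via SSH from (\S+?)\(.*?\) to (\S+?)\(")
SUCCESS = "All SSH connection tests passed successfully."


def parse_ssh_check(output: str) -> Tuple[bool, List[Tuple[str, str]]]:
    """Return (passed, list of (source, target) pairs) from masterha_check_ssh output."""
    pairs = CONNECT.findall(output)
    passed = output.rstrip().endswith(SUCCESS)  # the verdict is the final line
    return passed, pairs
```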
(2) Test whether replication works
- [root@node4 mha4mysql-manager-0.56]# masterha_check_repl --conf=/etc/masterha/app1.cnf
- Thu Nov 10 23:07:35 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
- Thu Nov 10 23:07:35 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
- Thu Nov 10 23:07:35 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
- Thu Nov 10 23:07:35 2016 - [info] MHA::MasterMonitor version 0.56.
- Thu Nov 10 23:07:35 2016 - [info] GTID failover mode = 0
- Thu Nov 10 23:07:35 2016 - [info] Dead Servers:
- Thu Nov 10 23:07:35 2016 - [info] Alive Servers:
- Thu Nov 10 23:07:35 2016 - [info] node1(192.168.2.201:3306)
- Thu Nov 10 23:07:35 2016 - [info] node2(192.168.2.202:3306)
- Thu Nov 10 23:07:35 2016 - [info] node3(192.168.2.203:3306)
- Thu Nov 10 23:07:35 2016 - [info] Alive Slaves:
- Thu Nov 10 23:07:35 2016 - [info] node2(192.168.2.202:3306) Version=5.5.50-MariaDB (oldest major version between slaves) log-bin:enabled
- Thu Nov 10 23:07:35 2016 - [info] Replicating from 192.168.2.201(192.168.2.201:3306)
- Thu Nov 10 23:07:35 2016 - [info] Primary candidate for the new Master (candidate_master is set)
- Thu Nov 10 23:07:35 2016 - [info] node3(192.168.2.203:3306) Version=5.5.50-MariaDB (oldest major version between slaves) log-bin:enabled
- Thu Nov 10 23:07:35 2016 - [info] Replicating from 192.168.2.201(192.168.2.201:3306)
- Thu Nov 10 23:07:35 2016 - [info] Current Alive Master: node1(192.168.2.201:3306)
- Thu Nov 10 23:07:35 2016 - [info] Checking slave configurations..
- Thu Nov 10 23:07:35 2016 - [warning] relay_log_purge=0 is not set on slave node3(192.168.2.203:3306).
- Thu Nov 10 23:07:35 2016 - [info] Checking replication filtering settings..
- Thu Nov 10 23:07:35 2016 - [info] binlog_do_db= , binlog_ignore_db=
- Thu Nov 10 23:07:35 2016 - [info] Replication filtering check ok.
- Thu Nov 10 23:07:35 2016 - [info] GTID (with auto-pos) is not supported
- Thu Nov 10 23:07:35 2016 - [info] Starting SSH connection tests..
- Thu Nov 10 23:07:37 2016 - [info] All SSH connection tests passed successfully.
- Thu Nov 10 23:07:37 2016 - [info] Checking MHA Node version..
- Thu Nov 10 23:07:37 2016 - [info] Version check ok.
- Thu Nov 10 23:07:37 2016 - [info] Checking SSH publickey authentication settings on the current master..
- Thu Nov 10 23:07:37 2016 - [info] HealthCheck: SSH to node1 is reachable.
- Thu Nov 10 23:07:37 2016 - [info] Master MHA Node version is 0.56.
- Thu Nov 10 23:07:37 2016 - [info] Checking recovery script configurations on node1(192.168.2.201:3306)..
- Thu Nov 10 23:07:37 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=Master-log.000006
- Thu Nov 10 23:07:37 2016 - [info] Connecting to root@192.168.2.201(node1:22)..
- Creating /data/masterha/app1 if not exists.. ok.
- Checking output directory is accessible or not..
- ok.
- Binlog found at /var/lib/mysql, up to Master-log.000006
- Thu Nov 10 23:07:38 2016 - [info] Binlog setting check done.
- Thu Nov 10 23:07:38 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
- Thu Nov 10 23:07:38 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=node2 --slave_ip=192.168.2.202 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.50-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
- Thu Nov 10 23:07:38 2016 - [info] Connecting to root@192.168.2.202(node2:22)..
- Checking slave recovery environment settings..
- Opening /var/lib/mysql/relay-log.info ... ok.
- Relay log found at /var/lib/mysql, up to relay-log.000004
- Temporary relay log file is /var/lib/mysql/relay-log.000004
- Testing mysql connection and privileges.. done.
- Testing mysqlbinlog output.. done.
- Cleaning up test file(s).. done.
- Thu Nov 10 23:07:38 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=node3 --slave_ip=192.168.2.203 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.50-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
- Thu Nov 10 23:07:38 2016 - [info] Connecting to root@192.168.2.203(node3:22)..
- Checking slave recovery environment settings..
- Opening /var/lib/mysql/relay-log.info ... ok.
- Relay log found at /var/lib/mysql, up to relay-log.000002
- Temporary relay log file is /var/lib/mysql/relay-log.000002
- Testing mysql connection and privileges.. done.
- Testing mysqlbinlog output.. done.
- Cleaning up test file(s).. done.
- Thu Nov 10 23:07:38 2016 - [info] Slaves settings check done.
- Thu Nov 10 23:07:38 2016 - [info]
- node1(192.168.2.201:3306) (current master)
- +--node2(192.168.2.202:3306)
- +--node3(192.168.2.203:3306)
- Thu Nov 10 23:07:38 2016 - [info] Checking replication health on node2..
- Thu Nov 10 23:07:38 2016 - [info] ok.
- Thu Nov 10 23:07:38 2016 - [info] Checking replication health on node3..
- Thu Nov 10 23:07:38 2016 - [info] ok.
- Thu Nov 10 23:07:38 2016 - [warning] master_ip_failover_script is not defined.
- Thu Nov 10 23:07:38 2016 - [warning] shutdown_script is not defined.
- Thu Nov 10 23:07:38 2016 - [info] Got exit code 0 (Not master dead).
- MySQL Replication Health is OK.
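The final line gives the verdict, but the [warning] lines are worth reading too, such as the relay_log_purge warning for node3 and the undefined master_ip_failover_script above. A minimal Python sketch that returns both; the function name is hypothetical and the message formats are assumed from the output above.

```python
from typing import List, Tuple


def parse_repl_check(output: str) -> Tuple[bool, List[str]]:
    """Return (healthy, warning messages) from masterha_check_repl output."""
    marker = "[warning] "
    warnings = []
    for line in output.splitlines():
        if marker in line:
            # keep only the message text after the "[warning] " tag
            warnings.append(line.split(marker, 1)[1].strip())
    healthy = "MySQL Replication Health is OK." in output
    return healthy, warnings
```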
(3) Now for the most exciting moment: start the service!
- [root@node4 mha4mysql-manager-0.56]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1 &
- [1] 8463
(4) Check that masterha is running normally, and which node is the master.
- [root@node4 mha4mysql-manager-0.56]# masterha_check_status --conf=/etc/masterha/app1.cnf
- app1 (pid:8463) is running(0:PING_OK), master:node1
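The status line packs the process state, the ping result, and the current master into one string. This hypothetical Python parser assumes the two formats seen in this article: the running form above and the `app1 is stopped(2:NOT_RUNNING).` form that appears after the failover test.

```python
import re
from typing import Any, Dict, Optional

STATUS = re.compile(
    r"^(?P<app>\S+)\s*(?:\(pid:\s*(?P<pid>\d+)\))?\s*is\s+(?P<state>\w+)"
    r"\((?P<code>\d+)\s*:\s*(?P<label>\w+)\)"
    r"(?:,\s*master:\s*(?P<master>\S+))?"
)


def parse_status(line: str) -> Optional[Dict[str, Any]]:
    """Parse one masterha_check_status line; return None if it does not match."""
    match = STATUS.search(line.strip())
    if not match:
        return None
    info = match.groupdict()  # pid and master are None when absent
    info["code"] = int(info["code"])
    if info["pid"] is not None:
        info["pid"] = int(info["pid"])
    return info
```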
Simulating a failure for MHA
(1) Stop MariaDB on the master node, node1
- systemctl stop mariadb.service
(2) Check the status from the manager node
- [root@node4 mha4mysql-manager-0.56]# masterha_check_status --conf=/etc/masterha/app1.cnf
- app1 is stopped(2:NOT_RUNNING).
- [1]+ Done nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1
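- As you can see, the MHA program masterha_manager exited after carrying out the failover.
- Also note that a file named app1.failover.complete is created in the working directory /data/masterha/app1/.
- Delete this file before starting MHA again. Otherwise, if the master fails again soon afterwards (within 8 hours by default), MHA treats the previous failover as too recent and refuses to fail over, unless the file is removed or masterha_manager is started with --ignore_last_failover.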
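Removing the marker is a one-line `rm`, but if the manager is restarted from a script, a guarded version reports whether there was anything to remove. A minimal Python sketch; the `<manager_workdir>/<app>.failover.complete` layout is taken from the example above and the function name is hypothetical.

```python
from pathlib import Path


def clear_failover_marker(workdir: str, app: str = "app1") -> bool:
    """Delete <workdir>/<app>.failover.complete; return True if it existed."""
    marker = Path(workdir) / f"{app}.failover.complete"
    if marker.exists():
        marker.unlink()
        return True
    return False


if __name__ == "__main__":
    removed = clear_failover_marker("/data/masterha/app1")
    print("marker removed" if removed else "no marker present")
```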
(3) On node3, check the slave status: node3 now replicates from the new master (node2, 192.168.2.202).
- MariaDB [(none)]> show slave status\G
- *************************** 1. row ***************************
- Slave_IO_State: Waiting for master to send event
- Master_Host: 192.168.2.202
- Master_User: repuser
- Master_Port: 3306
- Connect_Retry: 60
- Master_Log_File: bin_log.000002
- Read_Master_Log_Pos: 245
- Relay_Log_File: relay-log.000002
- Relay_Log_Pos: 527
- Relay_Master_Log_File: bin_log.000002
- Slave_IO_Running: Yes
- Slave_SQL_Running: Yes
- Replicate_Do_DB:
- Replicate_Ignore_DB:
- Replicate_Do_Table:
- Replicate_Ignore_Table:
- Replicate_Wild_Do_Table:
- Replicate_Wild_Ignore_Table:
- Last_Errno: 0
- Last_Error:
- Skip_Counter: 0
- Exec_Master_Log_Pos: 245
- Relay_Log_Space: 815
- Until_Condition: None
- Until_Log_File:
- Until_Log_Pos: 0
- Master_SSL_Allowed: No
- Master_SSL_CA_File:
- Master_SSL_CA_Path:
- Master_SSL_Cert:
- Master_SSL_Cipher:
- Master_SSL_Key:
- Seconds_Behind_Master: 0
- Master_SSL_Verify_Server_Cert: No
- Last_IO_Errno: 0
- Last_IO_Error:
- Last_SQL_Errno: 0
- Last_SQL_Error:
- Replicate_Ignore_Server_Ids:
- Master_Server_Id: 2
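To check this from a script rather than by eye, the `\G` output can be turned into a dictionary. A minimal Python sketch with hypothetical function names, assuming the `Field: value` layout shown above (for example the text printed by `mysql -e 'show slave status\G'`).

```python
from typing import Dict


def parse_slave_status(output: str) -> Dict[str, str]:
    r"""Turn SHOW SLAVE STATUS\G output into a {field: value} dict."""
    status = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("*"):
            continue  # skip blank lines and the "*** 1. row ***" banner
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            status[key.strip()] = value.strip()
    return status


def replication_ok(status: Dict[str, str], expected_master: str) -> bool:
    """True if both replication threads run and the slave follows expected_master."""
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
        and status.get("Master_Host") == expected_master
    )
```

For node3 after the failover, `replication_ok(status, "192.168.2.202")` should be true.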
(4) The read-only setting node2 had as a slave has also been turned off automatically.
- MariaDB [(none)]> show variables like '%read_only%';
- +---------------+-------+
- | Variable_name | Value |
- +---------------+-------+
- | read_only | OFF |
- +---------------+-------+
- 1 row in set (0.00 sec)
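(5) Rebuilding after the failure
As we know, when the original master failed, masterha_manager chose a new master based on the state of the binary logs and relay logs, switched it from read-only to read-write, and then exited.
So what comes next?
- a. Delete the failover.complete file in the working directory,
- e.g. /data/masterha/app1/app1.failover.complete.
- b. On the original master, node1:
- empty its database, take a full backup of node2, and restore it onto node1;
- then configure node1 as a slave pointing to the new master, node2.
- c. Re-run the masterha_check tools (masterha_check_ssh, masterha_check_repl) to confirm the environment is healthy, then restart the MHA main program, masterha_manager.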
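The three steps can be written down as an ordered list of commands, each paired with the host it runs on. This is a hypothetical Python sketch, not the author's procedure: using mysqldump with --master-data=2 for the full backup is an assumption (the article does not name a backup tool), emptying node1's old data first is left out, credentials are omitted, and the binlog coordinates must be read from the comment that --master-data=2 writes at the top of the dump. The replication account repuser/repuser comes from app1.cnf above.

```python
from typing import List, Tuple


def rebuild_plan(old_master: str = "node1", new_master: str = "node2",
                 manager: str = "node4", conf: str = "/etc/masterha/app1.cnf",
                 workdir: str = "/data/masterha/app1", app: str = "app1",
                 log_file: str = "<file>", log_pos: str = "<pos>") -> List[Tuple[str, str]]:
    """Return (host, command) pairs for recovery steps a to c (hypothetical)."""
    # repl_user/repl_password taken from app1.cnf; no quoting or escaping is done.
    change_master = (
        f"CHANGE MASTER TO MASTER_HOST='{new_master}', MASTER_USER='repuser', "
        f"MASTER_PASSWORD='repuser', MASTER_LOG_FILE='{log_file}', "
        f"MASTER_LOG_POS={log_pos}; START SLAVE;"
    )
    return [
        # a. remove the marker left by the previous failover
        (manager, f"rm -f {workdir}/{app}.failover.complete"),
        # b. full backup of the new master, restored on the old one
        (old_master, f"mysqldump -h {new_master} --all-databases "
                     "--single-transaction --master-data=2 > full.sql"),
        (old_master, "mysql < full.sql"),
        (old_master, f'mysql -e "{change_master}"'),
        # c. re-check the environment and restart the manager
        (manager, f"masterha_check_ssh --conf={conf}"),
        (manager, f"masterha_check_repl --conf={conf}"),
        (manager, f"nohup masterha_manager --conf={conf} > {workdir}/manager.log 2>&1 &"),
    ]


if __name__ == "__main__":
    for host, command in rebuild_plan():
        print(f"[{host}] {command}")
```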
Source: http://www.linuxidc.com/Linux/2017-10/147555.htm