实为吾之愚见, 望诸君酌之! 闻过则喜, 与君共勉
以下方法适用于实时查看只读延迟(主从延迟), 即需要在延迟发生的时候查看才能确认问题, 历史延迟不适用, 以下环境已经开启并行复制.
第一章: 如何判断无延迟
第一步: 主实例信息获取
主实例 (主节点) 高权限账户执行:
1,show processlist
2,show master status\G
3,show variables like '';
第二步: 只读实例信息获取
只读实例 (slave 从库) 高权限账户执行:
1ï¼show processlist
2ï¼show slave status\G
- MySQL>show slave status\G
- *************************** 1. row ***************************
- Slave_IO_State: Waiting for master to send event
- Master_Host: 10.151.83.26
- Master_User: replicator
- Master_Port: 3106
- Connect_Retry: 60
- Master_Log_File: MySQL-bin.000379
- Read_Master_Log_Pos: 53064
- Relay_Log_File: slave-relay.000375
- Relay_Log_Pos: 53237
- Relay_Master_Log_File: MySQL-bin.000379
- Slave_IO_Running: Yes
- Slave_SQL_Running: Yes
- Replicate_Do_DB:
- Replicate_Ignore_DB:
- Replicate_Do_Table:
- Replicate_Ignore_Table:
- Replicate_Wild_Do_Table:
- Replicate_Wild_Ignore_Table:
- Last_Errno: 0
- Last_Error:
- Skip_Counter: 0
- Exec_Master_Log_Pos: 53064
- Relay_Log_Space: 53527
- Until_Condition: None
- Until_Log_File:
- Until_Log_Pos: 0
- Master_SSL_Allowed: No
- Master_SSL_CA_File:
- Master_SSL_CA_Path:
- Master_SSL_Cert:
- Master_SSL_Cipher:
- Master_SSL_Key:
- Seconds_Behind_Master: 0
- Master_SSL_Verify_Server_Cert: No
- Last_IO_Errno: 0
- Last_IO_Error:
- Last_SQL_Errno: 0
- Last_SQL_Error:
- Replicate_Ignore_Server_Ids:
- Master_Server_Id: 999098802
- Master_UUID: a23bf85f-f20b-11e8-8268-7cd30abda240
- Master_Info_File: MySQL.slave_master_info
- SQL_Delay: 0
- SQL_Remaining_Delay:
- Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
- Master_Retry_Count: 86400
- Master_Bind:
- Last_IO_Error_Timestamp:
- Last_SQL_Error_Timestamp:
- Last_SQL_Error_Gtid:
- Master_SSL_Crl:
- Master_SSL_Crlpath:
- Retrieved_Gtid_Set: a23bf85f-f20b-11e8-8268-7cd30abda240:391216-838127
- Executed_Gtid_Set: 937ca3af-f20b-11e8-9447-7cd30ab8a5d8:1-178841,
- a23bf85f-f20b-11e8-8268-7cd30abda240:1-838127
- Auto_Position: 1
- Replicate_Rewrite_DB:
- Channel_Name:
- Master_TLS_Version:
- 3,show variables like '';
第三步: 对比确认
1ï¼确认主实例的进程状态是否为 Master has sent all binlog to slave; waiting for more updates(图片红框标识处), 如果是该状态, 说明 master 已经把所有的 binlog 发送给了 slave, 主实例正常
2ï¼确认只读实例 (slave 从库) 的进程状态是否为: Waiting for master to send event 以及
Slave has read all relay log; waiting for more updates, 如果是改状态, 说明 slave 以及应用了所有 master 发送的 binlog, 正在等待 master 继续发送
3ï¼对比只读实例执行 show slave status\G 得到的 Executed_Gtid_Set 与主实例执行 show master status\G 得到的 Executed_Gtid_Set 是否一致或相近, 如果一致, 则可以说明主实例执行的事务, 只读实例也已经执行.
通过如上 3 步的对比, 可以确认主实例与只读实例 (slave 从库) 无延迟
第二章 延迟原因以及解决
通过第一章的结论, 可以得出只读实例无延迟, 当出现只读延迟的时候, 应该如何确认原因以及解决, 请参考下面的具体例子:
2.1: 大批量更新
2.1.1 复现方法:
主实例执行大批量写入等更新操作: insert into MOCK_DATA3 select * from aaaaa.MOCK_DATA, 其中 MOCK_DATA 表数量 1000w 行, 导致延迟出现
2.1.2 延迟出现:
show processlist 结果:
- system user Connect 41786 Waiting for master to send event
- system user Connect Waiting for Slave Workers to free pending events
- system user Connect 236 Executing event
- system user Connect 571 Waiting for an event from Coordinator
- system user Connect 41786 Waiting for an event from Coordinator
- system user Connect 41786 Waiting for an event from Coordinator
- system user Connect 41786 Waiting for an event from Coordinator
- system user Connect 41786 Waiting for an event from Coordinator
- system user Connect 41786 Waiting for an event from Coordinator
- system user Connect 41786 Waiting for an event from Coordinator
show slave status\G 结果:
- Master_SSL_Key :
- Seconds_Behind_Master : 0
- Master_SSL_Verify_Server_Cert : No
- Last_IO_Errno : 0
- Last_IO_Error :
- Last_SQL_Errno : 0
- Last_SQL_Error :
- Replicate_Ignore_Server_Ids :
- Master_Server_Id : 999098802
- Master_UUID : a23bf85f-f20b-11e8-8268-7cd30abda240
- Master_Info_File : MySQL.slave_master_info
- SQL_Delay : 0
- SQL_Remaining_Delay : undefined
- Slave_SQL_Running_State : Waiting for Slave Workers to free pending events
- Master_Retry_Count : 86400
- Master_Bind :
- Last_IO_Error_Timestamp :
- Last_SQL_Error_Timestamp :
- Last_SQL_Error_Gtid :
- Master_SSL_Crl :
- Master_SSL_Crlpath :
- Retrieved_Gtid_Set : a23bf85f-f20b-11e8-8268-7cd30abda240:1-861466
- Executed_Gtid_Set : 937ca3af-f20b-11e8-9447-7cd30ab8a5d8:1-178841,a23bf85f-f20b-11e8-8268-7cd30abda240:1-861455
show engine innodb status\G 结果:
- ---TRANSACTION 2819387, ACTIVE 65 sec inserting
- MySQL tables in use 1, locked 1
- 1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 5919707
- MySQL thread id 23, OS thread handle 140536255158016, query id 102150 Executing event
2.1.3 解决方法
建议等待只读实例应用完主库的更新, 会自动恢复.
2.2: 无主键更新
2.2.1 复现方法:
主实例执行较多数据的无主键更新操作: MySQL>update MOCK_DATA set id=90 where id<10;, 导致延迟出现
2.2.2 延迟出现:
show processlist 结果:
show slave status\G 结果:
2.2.3 解决方法
建议增加有效索引提高更新效率.
2.3: 阻塞(ddl)
2.3.1 复现方法
只读实例开启显式事务, 执行如下查询, 不提交事务:
主实例执行 truncate table mock_data, 只读延迟产生
2.3.2 延迟产生
show processlist 结果:
show slave status\G 结果:
2.3.3 解决方法
该类问题一般是其他事务导致 ddl 操作阻塞, 建议 kill 掉阻塞的事务, 参考: https://help.aliyun.com/knowledge_detail/41723.html?spm=5176.11065259.1996646101.searchclickresult.3c2e73bf9s0jmI
2.4:Waiting for Slave Worker to release partition
参考: https://yq.aliyun.com/articles/188482?spm=a2c4e.11155435.0.0.120f63b5YRAxtQ
来源: https://yq.aliyun.com/articles/692087