A few things to understand ASAP about Redis replication.
- 1) Redis replication is asynchronous, but you can configure a master to
- stop accepting writes if it appears to be not connected with at least
- a given number of slaves.
- 2) Redis slaves are able to perform a partial resynchronization with the
- master if the replication link is lost for a relatively small amount of
- time. You may want to configure the replication backlog size (see the next
- sections of this file) with a sensible value depending on your needs.
- 3) Replication is automatic and does not need user intervention. After a
- network partition slaves automatically try to reconnect to masters
- and resynchronize with them.
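How replication is implemented
1. Set the master's address and port
In short, this means running the SLAVEOF command. SLAVEOF is asynchronous: once the masterhost and masterport properties have been set, the replica returns OK to the client that sent SLAVEOF, indicating that the replication request has been accepted. The actual replication work starts only after OK has been returned.
2. Create the socket connection
After SLAVEOF has run, the replica creates a socket connection to the master using the IP and port given in the command. If the connection succeeds, the replica attaches to this socket a file event handler dedicated to replication. This handler carries out the rest of the replication work, such as receiving the RDB file and receiving the write commands propagated by the master.
3. Send the PING command
Once the replica has become a client of the master, it first sends the master a PING command, which serves two purposes:
1. Check that the socket can be read from and written to normally.
2. Check that the master can process command requests normally.
If the replica reads a "PONG" reply, the network connection between master and replica is healthy and the master can process the commands the replica sends.
4. Authenticate
After receiving the "PONG" reply from the master, the replica moves on to authentication. If the masterauth option is set on the replica, it authenticates; otherwise this step is skipped.
When authentication is required, the replica sends the master an AUTH command whose argument is the value of the replica's masterauth option.
5. Send the port information
After authentication, the replica runs REPLCONF listening-port <port-number> to tell the master which port the replica is listening on.
The master records it in the slave_listening_port property of the corresponding client state. It can be seen with info Replication.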
- 127.0.0.1:6379> info Replication
- # Replication
- role:master
- connected_slaves:1
- slave0:ip=127.0.0.1,port=6380,state=online,offset=3696,lag=0
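The commands sent in steps 3 to 5 can be sketched at the wire level. The following is a minimal illustration, assuming the commands are sent as RESP arrays of bulk strings (the usual Redis client request format). The helper names `encode_command` and `handshake_commands` are hypothetical, and the sketch only builds the bytes: it does not open a socket or model the master's replies.

```python
from typing import List, Optional


def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def handshake_commands(listening_port: int,
                       masterauth: Optional[str] = None) -> List[bytes]:
    """Return the commands a replica sends before PSYNC, in order.

    AUTH is included only when masterauth is set, mirroring step 4.
    """
    commands = [encode_command("PING")]
    if masterauth is not None:
        commands.append(encode_command("AUTH", masterauth))
    commands.append(
        encode_command("REPLCONF", "listening-port", str(listening_port))
    )
    return commands


if __name__ == "__main__":
    for raw in handshake_commands(6380, masterauth="secret"):
        print(raw)
```

Without masterauth the list has only two entries, PING and REPLCONF listening-port, which matches the "otherwise this step is skipped" case.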
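6. Synchronize
The replica sends the PSYNC command to the master to perform synchronization, bringing its own database up to the master's current state.
7. Command propagation
Once synchronization is complete, master and replica enter the command propagation phase. From then on, the master keeps sending the write commands it executes to the replica, and the replica keeps receiving and executing them, which keeps the two consistent.
8. Heartbeat
During the command propagation phase, the replica by default sends the following command to the master once per second:
REPLCONF ACK <replication_offset>
Here, replication_offset is the replica's current replication offset.
Sending REPLCONF ACK serves three purposes:
1> Check the state of the network connection between master and replica.
2> Help implement the min-slaves options (min-slaves-to-write and min-slaves-max-lag).
3> Detect lost commands.
The REPLCONF ACK command and the replication backlog were both added in Redis 2.8. Before that, neither master nor replica would notice if a command was lost during propagation.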
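To make the second and third purposes of REPLCONF ACK concrete, here is a minimal sketch of the bookkeeping a master could do with the acknowledgements. It is a simplified model, not Redis source code: the `ReplicaAck` type and both function names are hypothetical, and a replica counts as "good" here purely on the basis of how recently it acknowledged.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReplicaAck:
    offset: int           # offset reported in the last REPLCONF ACK
    last_ack_time: float  # time (seconds) the master received that ACK


def missing_bytes(master_offset: int, ack: ReplicaAck) -> int:
    """Bytes the replica has not acknowledged yet (0 if fully caught up)."""
    return max(0, master_offset - ack.offset)


def writes_allowed(acks: Iterable[ReplicaAck], now: float,
                   min_slaves_to_write: int, min_slaves_max_lag: int) -> bool:
    """Model of the min-slaves check: count replicas whose last ACK is
    at most min_slaves_max_lag seconds old."""
    if min_slaves_to_write == 0:
        return True  # feature disabled in this model
    good = sum(1 for a in acks if now - a.last_ack_time <= min_slaves_max_lag)
    return good >= min_slaves_to_write


if __name__ == "__main__":
    acks = [ReplicaAck(offset=100, last_ack_time=9.5),
            ReplicaAck(offset=80, last_ack_time=0.0)]
    print(missing_bytes(100, acks[1]))
    print(writes_allowed(acks, now=10.0,
                         min_slaves_to_write=2, min_slaves_max_lag=5))
```

A non-zero result from `missing_bytes` is what lets the master notice that propagated data has not arrived and needs to be resent from the backlog.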
Replication-related parameters
- slaveof <masterip> <masterport>
- masterauth <master-password>
- slave-serve-stale-data yes
- slave-read-only yes
- repl-diskless-sync no
- repl-diskless-sync-delay 5
- repl-ping-slave-period 10
- repl-timeout 60
- repl-disable-tcp-nodelay no
- repl-backlog-size 1mb
- repl-backlog-ttl 3600
- slave-priority 100
- min-slaves-to-write 3
- min-slaves-max-lag 10
- slave-announce-ip 5.5.5.5
- slave-announce-port 1234
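Where:
slaveof <masterip> <masterport>: Enables replication. This one directive is all that is needed.
masterauth <master-password>: If the master has a password set through requirepass, this parameter must be set on the replica.
slave-serve-stale-data: Whether the replica may serve clients while the link to the master is down or while replication is still being established. The default is yes, meaning it keeps serving clients, but they may read stale data.
slave-read-only: Puts the replica in read-only mode. Note that read-only mode applies only to client write operations; it has no effect on administrative commands.
repl-diskless-sync, repl-diskless-sync-delay: Whether to use diskless replication. To reduce disk overhead on the master, Redis supports diskless replication, in which the generated RDB file is not saved to disk but sent directly to the replicas over the network. It suits setups where the master's disk is slow but network bandwidth is plentiful. Note that diskless replication was still experimental at the time of writing.
repl-ping-slave-period: The master sends a PING command to its replicas at this fixed interval, in seconds.
repl-timeout: The replication timeout.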
- # The following option sets the replication timeout for:
- #
- # 1) Bulk transfer I/O during SYNC, from the point of view of slave.
- # 2) Master timeout from the point of view of slaves (data, pings).
- # 3) Slave timeout from the point of view of masters (REPLCONF ACK pings).
- #
- # It is important to make sure that this value is greater than the value
- # specified for repl-ping-slave-period otherwise a timeout will be detected
- # every time there is low traffic between the master and the slave.
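The constraint stated in the comment above (repl-timeout must be greater than repl-ping-slave-period) is easy to check mechanically. The sketch below assumes a redis.conf-style text with one `name value` directive per line. The function names are hypothetical, and the fallback values 60 and 10 are simply the ones listed in the parameter block earlier in this article.

```python
from typing import Dict


def parse_directives(conf_text: str) -> Dict[str, str]:
    """Parse 'name value' lines, skipping blanks and '#' comments."""
    directives = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return directives


def timeout_is_safe(conf_text: str, default_timeout: int = 60,
                    default_ping_period: int = 10) -> bool:
    """True if repl-timeout is strictly greater than repl-ping-slave-period."""
    d = parse_directives(conf_text)
    timeout = int(d.get("repl-timeout", default_timeout))
    period = int(d.get("repl-ping-slave-period", default_ping_period))
    return timeout > period


if __name__ == "__main__":
    print(timeout_is_safe("repl-ping-slave-period 10\nrepl-timeout 60"))
    print(timeout_is_safe("repl-timeout 5"))
```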
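repl-disable-tcp-nodelay: When set to yes, the master waits a while before sending TCP packets. The exact delay depends on the Linux kernel and is typically 40 ms. This suits setups where the network between master and replica is complex or bandwidth is tight. The default is no.
repl-backlog-size: The size of the replication backlog, a fixed-length queue kept on the master. It is used by the partial resynchronization introduced in Redis 2.8.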
- # Set the replication backlog size. The backlog is a buffer that accumulates
- # slave data when slaves are disconnected for some time, so that when a slave
- # wants to reconnect again, often a full resync is not needed, but a partial
- # resync is enough, just passing the portion of data the slave missed while
- # disconnected.
- #
- # The bigger the replication backlog, the longer the time the slave can be
- # disconnected and later be able to perform a partial resynchronization.
- #
- # The backlog is only allocated once there is at least a slave connected.
The backlog is allocated only once a replica has connected.
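The "fixed-length queue" behaviour of the backlog can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the Redis implementation (which uses a circular buffer): the class name and methods are hypothetical, while the three properties are named after the INFO fields master_repl_offset, repl_backlog_histlen and repl_backlog_first_byte_offset.

```python
from typing import Optional


class ReplicationBacklog:
    """Toy fixed-size backlog that keeps only the most recent bytes."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.buf = bytearray()
        self.master_offset = 0  # total bytes ever written

    def feed(self, data: bytes) -> None:
        """Append propagated bytes, discarding the oldest beyond `size`."""
        self.buf.extend(data)
        self.master_offset += len(data)
        overflow = len(self.buf) - self.size
        if overflow > 0:
            del self.buf[:overflow]

    @property
    def histlen(self) -> int:
        return len(self.buf)

    @property
    def first_byte_offset(self) -> int:
        return self.master_offset - self.histlen + 1

    def read_from(self, offset: int) -> Optional[bytes]:
        """Bytes from `offset` onward, or None if they were already dropped
        (which is the case where a full resync would be needed)."""
        if offset < self.first_byte_offset or offset > self.master_offset + 1:
            return None
        return bytes(self.buf[offset - self.first_byte_offset:])


if __name__ == "__main__":
    backlog = ReplicationBacklog(size=10)
    backlog.feed(b"abcdefgh")
    print(backlog.read_from(6))
    backlog.feed(b"ijklmnop")
    print(backlog.first_byte_offset, backlog.read_from(6))
```

The larger `size` is, the longer a disconnected replica can stay away before the offset it asks for falls off the front of the buffer.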
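repl-backlog-ttl: If all of a master's replicas have disconnected and none reconnects within the specified time, the master frees the backlog. repl-backlog-ttl sets that time. The default is 3600 s; a value of 0 means the backlog is never freed.
slave-priority: Sets the replica's priority, which Redis Sentinel uses during failover. The lower the value, the higher the replica's priority for promotion to master. Note that a value of 0 means the replica never takes part in master election.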
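The priority rule can be sketched as a small ordering function. This covers only the slave-priority rule described here; Sentinel also weighs other criteria (such as how much data each replica has replicated), which this sketch deliberately leaves out. The function name and the tie-break by name are illustrative assumptions.

```python
from typing import Dict, List


def promotion_order(priorities: Dict[str, int]) -> List[str]:
    """Order replicas from most to least preferred for promotion.

    Replicas with priority 0 are excluded; lower values come first.
    Ties are broken by name only to make the result deterministic.
    """
    eligible = {name: p for name, p in priorities.items() if p != 0}
    return sorted(eligible, key=lambda name: (eligible[name], name))


if __name__ == "__main__":
    print(promotion_order({"replica-a": 100, "replica-b": 10, "replica-c": 0}))
```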
slave-announce-ip, slave-announce-port: Commonly used with port forwarding or NAT, to expose the replica's real IP and port to the master.
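The synchronization process
1. The replica sends the PSYNC command to the master.
2. On receiving PSYNC, the master runs BGSAVE to generate an RDB file in the background, and uses a buffer to record every write command executed from that point on.
3. When BGSAVE finishes, the master sends the resulting RDB file to the replica. The replica receives and loads it, bringing its database to the state the master's database was in when BGSAVE was run.
4. The master sends all the write commands recorded in the buffer to the replica. The replica executes them, bringing its database to the master's current state.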
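The four steps above can be mimicked with plain dictionaries to show why the buffer is needed. This is a toy model under heavy simplification: a dict copy stands in for the RDB file, writes are bare key/value pairs, and the function name `full_sync` is hypothetical.

```python
from typing import Dict, List, Tuple


def full_sync(master_db: Dict[str, str],
              writes_during_bgsave: List[Tuple[str, str]]) -> Dict[str, str]:
    """Simulate a full resynchronization and return the replica's database."""
    # Step 2: take a snapshot (stand-in for the RDB file) and start buffering.
    snapshot = dict(master_db)
    buffered: List[Tuple[str, str]] = []
    for key, value in writes_during_bgsave:
        master_db[key] = value         # the master keeps serving writes
        buffered.append((key, value))  # and records them for the replica

    # Step 3: the replica loads the snapshot (state as of the snapshot).
    replica_db = dict(snapshot)

    # Step 4: the replica replays the buffered writes to catch up.
    for key, value in buffered:
        replica_db[key] = value
    return replica_db


if __name__ == "__main__":
    master = {"a": "1"}
    replica = full_sync(master, [("b", "2"), ("a", "3")])
    print(replica == master)
```

Dropping step 4 would leave the replica at the snapshot state, missing every write the master accepted while the snapshot was being produced and transferred.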
Note that the buffer mentioned in step 2 is limited in size. The limit is set by client-output-buffer-limit slave 256mb 64mb 60, whose syntax and meaning are as follows:
- # client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds>
- #
- # A client is immediately disconnected once the hard limit is reached, or if
- # the soft limit is reached and remains reached for the specified number of
- # seconds (continuously).
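This means that if the buffer exceeds 256 MB, or stays above 64 MB for 60 consecutive seconds, the master immediately disconnects the replica. After the disconnection, once 60 s (repl-timeout) have passed, the replica notices that it is no longer receiving data from the master and restarts replication.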
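The hard/soft limit rule can be expressed as a small function over a series of buffer-size samples. This is a sketch of the rule as described in the comment above, not Redis source code: the function name and the sampled-timeline input are assumptions, and the exact boundary handling in Redis may differ.

```python
from typing import Iterable, Optional, Tuple

MB = 1024 * 1024


def disconnect_time(samples: Iterable[Tuple[float, int]], hard_limit: int,
                    soft_limit: int, soft_seconds: float) -> Optional[float]:
    """Return the timestamp at which the client would be disconnected.

    `samples` is a time-ordered iterable of (timestamp_seconds, buffer_bytes).
    Returns None if neither limit is ever violated.
    """
    soft_since: Optional[float] = None
    for ts, size in samples:
        if size >= hard_limit:
            return ts                      # hard limit: disconnect at once
        if size >= soft_limit:
            if soft_since is None:
                soft_since = ts            # soft limit reached just now
            elif ts - soft_since >= soft_seconds:
                return ts                  # stayed above it long enough
        else:
            soft_since = None              # dropped below: reset the clock
    return None


if __name__ == "__main__":
    timeline = [(0, 10 * MB), (10, 70 * MB), (40, 80 * MB), (75, 90 * MB)]
    print(disconnect_time(timeline, 256 * MB, 64 * MB, 60))
```

Note the reset branch: a buffer that dips below the soft limit even briefly starts a fresh 60 s window, which is what "continuously" in the comment refers to.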
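Before Redis 2.8, if replication between master and replica was interrupted by a network problem, reconnecting still ran the SYNC command and performed a full resynchronization, which is inefficient. Starting with Redis 2.8, the PSYNC command replaces SYNC for the synchronization step of replication.
PSYNC has two modes: full resynchronization and partial resynchronization.
Log of a full resynchronization: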
- master:
- 19544:M 05 Oct 20:44:04.713 * Slave 127.0.0.1:6380 asks for synchronization
- 19544:M 05 Oct 20:44:04.713 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for 'dc419fe03ddc9ba30cf2a2cf1894872513f1ef96', my replication IDs are 'f8a035fdbb7cfe435652b3445c2141f98a65e437' and '0000000000000000000000000000000000000000')
- 19544:M 05 Oct 20:44:04.713 * Starting BGSAVE for SYNC with target: disk
- 19544:M 05 Oct 20:44:04.713 * Background saving started by pid 20585
- 20585:C 05 Oct 20:44:04.723 * DB saved on disk
- 20585:C 05 Oct 20:44:04.723 * RDB: 0 MB of memory used by copy-on-write
- 19544:M 05 Oct 20:44:04.813 * Background saving terminated with success
- 19544:M 05 Oct 20:44:04.814 * Synchronization with slave 127.0.0.1:6380 succeeded
- slave:
- 19746:S 05 Oct 20:44:04.288 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
- 19746:S 05 Oct 20:44:04.288 * SLAVE OF 127.0.0.1:6379 enabled (user request from 'id=3 addr=127.0.0.1:37128 fd=8 name= age=929 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')
- 19746:S 05 Oct 20:44:04.712 * Connecting to MASTER 127.0.0.1:6379
- 19746:S 05 Oct 20:44:04.712 * MASTER <-> SLAVE sync started
- 19746:S 05 Oct 20:44:04.712 * Non blocking connect for SYNC fired the event.
- 19746:S 05 Oct 20:44:04.713 * Master replied to PING, replication can continue...
- 19746:S 05 Oct 20:44:04.713 * Trying a partial resynchronization (request dc419fe03ddc9ba30cf2a2cf1894872513f1ef96:1191).
- 19746:S 05 Oct 20:44:04.713 * Full resync from master: f8a035fdbb7cfe435652b3445c2141f98a65e437:1190
- 19746:S 05 Oct 20:44:04.713 * Discarding previously cached master state.
- 19746:S 05 Oct 20:44:04.814 * MASTER <-> SLAVE sync: receiving 224566 bytes from master
- 19746:S 05 Oct 20:44:04.814 * MASTER <-> SLAVE sync: Flushing old data
- 19746:S 05 Oct 20:44:04.815 * MASTER <-> SLAVE sync: Loading DB in memory
- 19746:S 05 Oct 20:44:04.817 * MASTER <-> SLAVE sync: Finished with success
Log of a partial resynchronization:
- master:
- 19544:M 05 Oct 20:42:06.423 # Connection with slave 127.0.0.1:6380 lost.
- 19544:M 05 Oct 20:42:06.753 * Slave 127.0.0.1:6380 asks for synchronization
- 19544:M 05 Oct 20:42:06.753 * Partial resynchronization request from 127.0.0.1:6380 accepted. Sending 0 bytes of backlog starting from offset 1037.
- slave:
- 19746:S 05 Oct 20:42:06.423 # Connection with master lost.
- 19746:S 05 Oct 20:42:06.423 * Caching the disconnected master state.
- 19746:S 05 Oct 20:42:06.752 * Connecting to MASTER 127.0.0.1:6379
- 19746:S 05 Oct 20:42:06.752 * MASTER <-> SLAVE sync started
- 19746:S 05 Oct 20:42:06.752 * Non blocking connect for SYNC fired the event.
- 19746:S 05 Oct 20:42:06.753 * Master replied to PING, replication can continue...
- 19746:S 05 Oct 20:42:06.753 * Trying a partial resynchronization (request f8a035fdbb7cfe435652b3445c2141f98a65e437:1037).
- 19746:S 05 Oct 20:42:06.753 * Successful partial resynchronization with master.
- 19746:S 05 Oct 20:42:06.753 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
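In Redis 4.0, master_replid and offset are stored in the RDB file. When a replica is shut down gracefully and restarted, Redis can reload master_replid and offset from the RDB file, which makes partial resynchronization possible.
Partial resynchronization relies on the following three parts:
1. The replication offsets of the master and the replica.
2. The master's replication backlog.
3. The node's run ID (since Redis 4.0, the replication ID master_replid plays this role).
When a replica is promoted to master, the other replicas must resynchronize with the new master. Before Redis 4.0 this was a full resynchronization, because master_replid had changed. Since Redis 4.0, the new master remembers the old master's master_replid and offset, so it can accept partial resynchronization requests from the other replicas even when the master_replid in the request differs from its own. Under the hood, when slaveof no one is executed, master_replid and master_repl_offset+1 are copied into master_replid2 and second_repl_offset.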
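The decision a master makes on receiving a PSYNC request, including the role of master_replid2 and second_repl_offset after a promotion, can be sketched as follows. This is a simplified model of the behaviour described in this article, not the Redis source: `MasterState`, `accepts_partial_resync` and `promote` are hypothetical names, and the fields are named after the INFO variables.

```python
from dataclasses import dataclass

EMPTY_ID = "0" * 40


@dataclass
class MasterState:
    replid: str
    replid2: str = EMPTY_ID
    second_repl_offset: int = -1
    master_repl_offset: int = 0
    backlog_first_byte_offset: int = 1
    backlog_histlen: int = 0


def accepts_partial_resync(m: MasterState, req_replid: str,
                           req_offset: int) -> bool:
    """True if a PSYNC <req_replid> <req_offset> could be served partially."""
    id_ok = req_replid == m.replid or (
        req_replid == m.replid2 and req_offset <= m.second_repl_offset
    )
    if not id_ok:
        return False  # unknown history: full resynchronization
    first = m.backlog_first_byte_offset
    # The requested offset must still be covered by the backlog.
    return first <= req_offset <= first + m.backlog_histlen


def promote(m: MasterState, new_replid: str) -> None:
    """Model `slaveof no one`: remember the old ID, then switch to a new one."""
    m.replid2 = m.replid
    m.second_repl_offset = m.master_repl_offset + 1
    m.replid = new_replid


if __name__ == "__main__":
    old_id = "f8a035fdbb7cfe435652b3445c2141f98a65e437"
    state = MasterState(replid=old_id, master_repl_offset=1036,
                        backlog_histlen=1036)
    print(accepts_partial_resync(state, old_id, 1037))
    promote(state, "a" * 40)
    print(accepts_partial_resync(state, old_id, 1037))
```

The values in the usage example are taken from the partial resynchronization log above (request `f8a035...:1037`). After `promote`, a request that still carries the old ID is accepted as long as its offset does not go past second_repl_offset.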
Replication-related variables
- # Replication
- role:master
- connected_slaves:2
- slave0:ip=127.0.0.1,port=6380,state=online,offset=5698,lag=0
- slave1:ip=127.0.0.1,port=6381,state=online,offset=5698,lag=0
- master_replid:e071f49c8d9d6719d88c56fa632435fba83e145d
- master_replid2:0000000000000000000000000000000000000000
- master_repl_offset:5698
- second_repl_offset:-1
- repl_backlog_active:1
- repl_backlog_size:1048576
- repl_backlog_first_byte_offset:1
- repl_backlog_histlen:5698
- # Replication
- role:slave
- master_host:127.0.0.1
- master_port:6379
- master_link_status:up
- master_last_io_seconds_ago:1
- master_sync_in_progress:0
- slave_repl_offset:126
- slave_priority:100
- slave_read_only:1
- connected_slaves:0
- master_replid:15715bc0bd37a71cae3d08b9566f001ccbc739de
- master_replid2:0000000000000000000000000000000000000000
- master_repl_offset:126
- second_repl_offset:-1
- repl_backlog_active:1
- repl_backlog_size:1048576
- repl_backlog_first_byte_offset:1
- repl_backlog_histlen:126
Where:
role: Value is "master" if the instance is replica of no one, or "slave" if the instance is a replica of some master instance. Note that a replica can be master of another replica (chained replication).
master_replid: The replication ID of the Redis server. Each Redis node is assigned a 40-character hexadecimal string as an identifier when it starts. On a replica, this is the master's ID.
master_replid2: The secondary replication ID, used for PSYNC after a failover. When slaveof no one is executed, master_replid and master_repl_offset+1 are copied into master_replid2 and second_repl_offset.
master_repl_offset: The server's current replication offset, i.e. the master's replication offset.
second_repl_offset: The offset up to which replication IDs are accepted.
repl_backlog_active: Flag indicating whether the replication backlog is active.
repl_backlog_size: Total size in bytes of the replication backlog buffer, as set by repl-backlog-size.
repl_backlog_first_byte_offset: The master offset of the replication backlog buffer, i.e. the earliest master offset still held in the backlog.
repl_backlog_histlen: Size in bytes of the data in the replication backlog buffer.
If the instance is a replica, these additional fields are provided:
master_host: Host or IP address of the master.
master_port: Master listening TCP port.
master_link_status: Status of the link (up/down) between master and replica.
master_last_io_seconds_ago: Number of seconds since the last interaction with master. The master sends a PING to its replicas every 10 s to check that they are alive and connected; this variable shows how long ago the last heartbeat exchange between master and replica took place.
master_sync_in_progress: Indicate the master is syncing to the replica. This presumably refers to a full or partial resynchronization.
slave_repl_offset: The replication offset of the replica instance.
slave_priority: The priority of the instance as a candidate for failover.
slave_read_only: Flag indicating if the replica is read-only.
If a SYNC operation is on-going, these additional fields are provided:
master_sync_left_bytes: Number of bytes left before syncing is complete.
master_sync_last_io_seconds_ago: Number of seconds since last transfer I/O during a SYNC operation.
If the link between master and replica is down, an additional field is provided:
master_link_down_since_seconds: Number of seconds since the link is down.
The following field is always provided:
connected_slaves: Number of connected replicas.
If the server is configured with the min-slaves-to-write (or starting with Redis 5 with the min-replicas-to-write) directive, an additional field is provided:
min_slaves_good_slaves: Number of replicas currently considered good.
For each replica, the following line is added:
slaveXXX: id, IP address, port, state, offset, lag. This is the state of the replica.
slave0:ip=127.0.0.1,port=6381,state=online,offset=1288,lag=1
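The fields above are plain `key:value` text, so they are straightforward to parse. The sketch below is one possible parser, assuming the raw output of info Replication as input. The function name and the choice to collect the slaveN lines under a separate "slaves" key are illustrative, not part of any Redis client API.

```python
import re
from typing import Any, Dict


def parse_info_replication(text: str) -> Dict[str, Any]:
    """Parse `info Replication` output into a dict.

    Per-replica lines (slave0, slave1, ...) are split into their
    comma-separated fields and stored under info["slaves"].
    """
    info: Dict[str, Any] = {"slaves": {}}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if re.fullmatch(r"slave\d+", key):
            fields: Dict[str, Any] = dict(
                item.split("=", 1) for item in value.split(",")
            )
            for name in ("port", "offset", "lag"):
                if name in fields:
                    fields[name] = int(fields[name])
            info["slaves"][key] = fields
        else:
            info[key] = value
    return info


if __name__ == "__main__":
    sample = (
        "# Replication\n"
        "role:master\n"
        "connected_slaves:1\n"
        "slave0:ip=127.0.0.1,port=6381,state=online,offset=1288,lag=1\n"
    )
    print(parse_info_replication(sample))
```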
How to monitor replication lag
- # Replication
- role:master
- connected_slaves:2
- slave0:ip=127.0.0.1,port=6381,state=online,offset=560,lag=0
- slave1:ip=127.0.0.1,port=6380,state=online,offset=560,lag=0
- master_replid:15715bc0bd37a71cae3d08b9566f001ccbc739de
- master_replid2:0000000000000000000000000000000000000000
- master_repl_offset:560
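Here, master_repl_offset is the master's replication offset, and the offset in each slaveX line is the corresponding replica's replication offset. The difference between the two is the replication lag, in bytes.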
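The subtraction can be automated over the text of info Replication as run on the master. A minimal sketch, assuming the slaveX line format shown above; the function name and the `ip:port` keys of the result are illustrative choices.

```python
import re
from typing import Dict

SLAVE_LINE = re.compile(
    r"^slave\d+:ip=(?P<ip>[^,]+),port=(?P<port>\d+),.*?offset=(?P<offset>\d+)"
)


def replication_lag_bytes(info_text: str) -> Dict[str, int]:
    """Map each replica's ip:port to master_repl_offset minus its offset."""
    master_offset = None
    replica_offsets: Dict[str, int] = {}
    for line in info_text.splitlines():
        line = line.strip()
        if line.startswith("master_repl_offset:"):
            master_offset = int(line.split(":", 1)[1])
            continue
        match = SLAVE_LINE.match(line)
        if match:
            address = f"{match['ip']}:{match['port']}"
            replica_offsets[address] = int(match["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found in INFO output")
    return {addr: master_offset - off for addr, off in replica_offsets.items()}


if __name__ == "__main__":
    sample = (
        "role:master\n"
        "slave0:ip=127.0.0.1,port=6381,state=online,offset=560,lag=0\n"
        "slave1:ip=127.0.0.1,port=6380,state=online,offset=500,lag=0\n"
        "master_repl_offset:560\n"
    )
    print(replication_lag_bytes(sample))
```

In the usage example the second replica's offset is deliberately changed to 500 (it is 560 in the output above) so that a non-zero lag shows up.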
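How to estimate the backlog buffer size
t * (master_repl_offset2 - master_repl_offset1) / (t2 - t1)
t is how long the disconnections may last in seconds. master_repl_offset1 and master_repl_offset2 are the values of master_repl_offset sampled at times t1 and t2 (in seconds), so the fraction is the master's write rate in bytes per second.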
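The formula translates directly into code. A minimal sketch, with a hypothetical function name; the two offsets would come from sampling master_repl_offset at two points in time, ideally during peak write traffic.

```python
import math


def estimate_backlog_size(offset1: int, t1: float, offset2: int, t2: float,
                          max_disconnect_seconds: float) -> int:
    """Estimate the backlog size in bytes needed to survive a disconnection.

    Implements t * (offset2 - offset1) / (t2 - t1), rounded up.
    """
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    write_rate = (offset2 - offset1) / (t2 - t1)  # bytes per second
    return math.ceil(max_disconnect_seconds * write_rate)


if __name__ == "__main__":
    # 60,000 bytes written in 60 s; tolerate disconnections of up to 300 s.
    print(estimate_backlog_size(1000, 0, 61000, 60, 300))
```

With these illustrative numbers the estimate is 300000 bytes, which fits within the 1mb (1048576 bytes) repl-backlog-size shown in the parameter list; a busier master or longer outages would call for a larger value.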
Source: https://www.cnblogs.com/ivictor/p/9749491.html