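Problem:
The client cannot push data to the server.
Investigation:
Both ping to the IP and telnet to the port succeed, so basic connectivity checks do not explain the failure.
Inspecting a packet capture in wireshark shows something odd: the TCP window advertised by the server is not fixed, but its overall trend is to shrink until it reaches 0. The packets captured on the server are as follows: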
- 15:41:29.680256 IP 110.89.84.123.1950 > 110.89.84.126.52021: Flags [.], ack 107925, win 38, options [nop,nop,TS val 1604471956 ecr 1606303303], length 0
- 0x0000: 0022 462c a12f d067 e50f e893 0800 4500 ."F,./.g......E.
- 0x0010: 0034 a79b 4000 4006 d417 0b0c 547b 0b0c .4..@.@.....T{..
- 0x0020: 547e 079e cb35 0c6f 535c 531b 640c 8010 T~...5.oS\S.d...
- 0x0030: 0026 8383 0000 0101 080a 5fa2 4c94 5fbe .&........_.L._.
- 0x0040: 3e47 >G
- 15:41:29.719474 IP 110.89.84.123.1950 > 110.89.84.126.52021: Flags [.], ack 112269, win 5, options [nop,nop,TS val 1604471996 ecr 1606303303], length 0
- 0x0000: 0022 462c a12f d067 e50f e893 0800 4500 ."F,./.g......E.
- 0x0010: 0034 a79c 4000 4006 d416 0b0c 547b 0b0c .4..@.@.....T{..
- 0x0020: 547e 079e cb35 0c6f 535c 531b 7504 8010 T~...5.oS\S.u...
- 0x0030: 0005 7284 0000 0101 080a 5fa2 4cbc 5fbe ..r......._.L._.
- 0x0040: 3e47 >G
- 15:41:29.934875 IP 110.89.84.126.52021 > 110.89.84.123.1950: Flags [P.], seq 112269:112909, ack 88, win 115, options [nop,nop,TS val 1606303559 ecr 1604471996], length 640
- 0x0000: d067 e50f e893 0022 462c a12f 0800 4500 .g....."F,./..E.
- 0x0010: 02b4 5a89 4000 4006 1eaa 0b0c 547e 0b0c ..Z.@.@.....T~..
- 0x0020: 547b cb35 079e 531b 7504 0c6f 535c 8018 T{.5..S.u..oS\..
- 0x0030: 0073 c1b7 0000 0101 080a 5fbe 3f47 5fa2 .s........_.?G_.
- 15:41:29.975487 IP 110.89.84.123.1950 > 110.89.84.126.52021: Flags [.], ack 116613, win 10, options [nop,nop,TS val 1604472252 ecr 1606303559], length 0
- 0x0000: 0022 462c a12f d067 e50f e893 0800 4500 ."F,./.g......E.
- 0x0010: 0034 a79e 4000 4006 d414 0b0c 547b 0b0c .4..@.@.....T{..
- 0x0020: 547e 079e cb35 0c6f 535c 531b 85fc 8010 T~...5.oS\S.....
- 0x0030: 000a 5f87 0000 0101 080a 5fa2 4dbc 5fbe .._......._.M._.
- 0x0040: 3f47 ?G
- 15:41:30.191875 IP 110.89.84.126.52021 > 110.89.84.123.1950: Flags [P.], seq 116613:117893, ack 88, win 115, options [nop,nop,TS val 1606303816 ecr 1604472252], length 1280
- 0x0000: d067 e50f e893 0022 462c a12f 0800 4500 .g....."F,./..E.
- 0x0010: 0534 5a8d 4000 4006 1c26 0b0c 547e 0b0c .4Z.@.@..&..T~..
- 0x0020: 547b cb35 079e 531b 85fc 0c6f 535c 8018 T{.5..S....oS\..
- 0x0030: 0073 c437 0000 0101 080a 5fbe 4048 5fa2 .s.7......_.@H_.
- 0x0040: 4dbc 2037 3435 6634 3361 3238 3334 6534 M..745f43a2834e4
- 0x0050: 6465 3462 3561 3862 6630 3031 3333 6564 de4b5a8bf00133ed
- 0x0060: 6462 3401 0d01 0400 0000 5308 0b10 0000 db4.......S..... a3
- 15:41:30.192523 IP 110.89.84.123.1950 > 110.89.84.126.52021: Flags [.], ack 117893, win 0, options [nop,nop,TS val 1604472469 ecr 1606303816], length 0
- 0x0000: 0022 462c a12f d067 e50f e893 0800 4500 ."F,./.g......E.
- 0x0010: 0034 a79f 4000 4006 d413 0b0c 547b 0b0c .4..@.@.....T{..
- 0x0020: 547e 079e cb35 0c6f 535c 531b 8afc 8010 T~...5.oS\S.....
- 0x0030: 0000 58b7 0000 0101 080a 5fa2 4e95 5fbe ..X......._.N._.
- 0x0040: 4048 @H
- 15:41:30.406872 IP 110.89.84.126.52021 > 110.89.84.123.1950: Flags [.], ack 88, win 115, options [nop,nop,TS val 1606304031 ecr 1604472469], length 0
- 0x0000: d067 e50f e893 0022 462c a12f 0800 4500 .g....."F,./..E.
- 0x0010: 0034 5a8e 4000 4006 2125 0b0c 547e 0b0c .4Z.@.@.!%..T~..
- 0x0020: 547b cb35 079e 531b 8afb 0c6f 535c 8010 T{.5..S....oS\..
- 0x0030: 0073 bf37 0000 0101 080a 5fbe 411f 5fa2 .s.7......_.A._.
- 0x0040: 4e95 N.
- 15:41:30.407143 IP 110.89.84.123.1950 > 110.89.84.126.52021: Flags [.], ack 117893, win 0, options [nop,nop,TS val 1604472683 ecr 1606303816], length 0
- 0x0000: 0022 462c a12f d067 e50f e893 0800 4500 ."F,./.g......E.
- 0x0010: 0034 a7a0 4000 4006 d412 0b0c 547b 0b0c .4..@.@.....T{..
- 0x0020: 547e 079e cb35 0c6f 535c 531b 8afc 8010 T~...5.oS\S.....
- 0x0030: 0000 57e1 0000 0101 080a 5fa2 4f6b 5fbe ..W......._.Ok_.
- 0x0040: 4048 @H
From this point on, the server keeps advertising a receive window of 0, so the client cannot send any more data to the server.
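The window trend can also be read off the capture mechanically. The following is a minimal sketch, assuming the capture has been exported as tcpdump-style text lines; the function name `advertised_windows` and the regular expression are illustrative and not part of any tool. It returns the raw `win` values as printed. The real byte counts depend on the window scale option negotiated during the handshake, which is not part of this capture.

```python
import re

# Matches a tcpdump-style summary line, tolerating optional whitespace around ">", e.g.
# "15:41:29.680256 IP a.b.c.d.1950 > e.f.g.h.52021: Flags [.], ack 107925, win 38, ..."
HEADER = re.compile(
    r"IP\s+(?P<src>\S+?)\s*>\s*(?P<dst>\S+?):\s+Flags\s+\[(?P<flags>[^\]]*)\]"
    r".*?\bwin\s+(?P<win>\d+)"
)

def advertised_windows(lines, sender):
    """Return the raw 'win' values advertised by `sender` ("ip.port"), in capture order."""
    wins = []
    for line in lines:
        m = HEADER.search(line)
        if m and m.group("src") == sender:
            wins.append(int(m.group("win")))
    return wins
```

Called with the summary lines above and the sender `"110.89.84.123.1950"`, this yields the server's sequence of advertised windows; hex dump lines are ignored because they do not match the pattern.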
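Checking the server's kernel TCP send and receive queues with netstat -ant shows that bytes are piling up in the server's receive queue (Recv-Q is 115005 on the connection from the client) and are not being passed on to the application.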
- [root@xdja tomcat]# netstat -ant
- Active Internet connections (servers and established)
- Proto Recv-Q Send-Q Local Address Foreign Address State
- tcp 0 0 110.89.84.123:14468 110.89.84.33:1950 ESTABLISHED
- tcp 0 0 :::1950 :::* LISTEN
- tcp 115005 0 ::ffff:110.89.84.123:1950 ::ffff:110.89.84.126:52021 ESTABLISHED
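The same check can be scripted. Below is a minimal sketch, assuming the six-column `netstat -ant` layout shown above (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the helper name `backlogged_connections` and the `min_recv_q` threshold are hypothetical.

```python
def backlogged_connections(netstat_output, min_recv_q=1):
    """Return (recv_q, local, remote) for ESTABLISHED TCP rows with Recv-Q >= min_recv_q."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner/header lines and anything that is not a six-column tcp row.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        proto, recv_q, send_q, local, remote, state = fields
        if state == "ESTABLISHED" and recv_q.isdigit() and int(recv_q) >= min_recv_q:
            rows.append((int(recv_q), local, remote))
    return rows
```

A persistently large Recv-Q on an established connection means the kernel has accepted the bytes but the application has not yet read them.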
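Conclusion:
From this we can conclude that the client keeps sending data, but the server processes it more slowly overall than the client sends it, so data backs up on the server.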
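To see why a slow reader drives the window to 0, here is a toy model, not a description of the real kernel: it assumes a fixed receive buffer, a sender that sends a fixed amount per tick (limited by the free space), and an application that reads a smaller fixed amount per tick. All names and numbers are illustrative.

```python
def simulate_window(buffer_size, sent_per_tick, read_per_tick, ticks):
    """Toy model: free receive-buffer space advertised right after each arrival."""
    backlog = 0
    advertised = []
    for _ in range(ticks):
        # The sender can only deliver what fits into the currently free space.
        accepted = min(sent_per_tick, buffer_size - backlog)
        backlog += accepted
        # The ACK for this data advertises whatever space is left.
        advertised.append(buffer_size - backlog)
        # The application then drains part of the backlog before the next tick.
        backlog -= min(read_per_tick, backlog)
    return advertised

print(simulate_window(1000, 300, 100, 6))  # [700, 500, 300, 100, 0, 0]
```

Whenever the read rate stays below the send rate, the free space eventually reaches 0 and stays there, which matches the trend in the capture.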
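Solution:
Change the backend to process messages asynchronously: when a TCP message arrives, first buffer it in an application-level queue, then have a separate thread consume it.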
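The approach can be sketched as follows. This is a minimal illustration under assumptions, not the original backend's code: the names `start_consumer` and `receive_loop`, the queue size, and the `None` stop sentinel are all hypothetical.

```python
import queue
import threading

def start_consumer(handle, maxsize=10000):
    """Start a daemon worker that passes queued messages to `handle`; return (queue, thread)."""
    messages = queue.Queue(maxsize=maxsize)

    def worker():
        while True:
            msg = messages.get()
            try:
                if msg is None:       # sentinel: stop the worker
                    return
                handle(msg)           # slow business logic runs off the receive path
            finally:
                messages.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return messages, thread

def receive_loop(conn, messages, bufsize=4096):
    """Drain a connected socket as fast as possible and enqueue the raw chunks."""
    while True:
        data = conn.recv(bufsize)
        if not data:                  # peer closed the connection
            break
        messages.put(data)            # blocks only if the application queue is full
    messages.put(None)                # tell the worker to stop
```

The receive loop only copies bytes out of the kernel buffer, so the advertised window recovers quickly. The queue is bounded on purpose: if the consumer stays slower than the sender for long enough, `put` blocks and back-pressure returns, so the consumer still has to keep up on average.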
Source: https://www.qcloud.com/developer/article/1416157