Learn TCP/IP with Me

723937936@qq.com · #46

Congestion Avoidance Algorithm

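TCP detects network congestion on the basis of one assumption: when a packet is lost, the network is considered congested.
Congestion avoidance: the measures TCP takes when it detects packet loss.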

The two signals of congestion:
1. timeout
2. receipt of duplicate ACKs

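The two congestion avoidance measures:
Measure 1: immediately reduce the transmission rate, then increase it again quickly; once it reaches a threshold (ssthresh), switch to Measure 2
Measure 2: increase the transmission rate slowly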

Which measure is applied depends on how severe the congestion is: Measure 1 when congestion is severe, Measure 2 when it is not.

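The severity of the congestion is judged from the signal that revealed it:
1. timeout is treated as severe congestion
2. receipt of duplicate ACKs is treated as mild congestion (duplicate ACKs are triggered by the receiver getting later segments, so data is still getting through and the network is not that congested)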

Measure 1

When severe congestion occurs, TCP applies Measure 1.

How is the transmission rate reduced immediately?
By resetting cwnd to 1 segment.

How is the transmission rate increased quickly?
By the slow start algorithm, which makes cwnd grow exponentially.

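How large is the threshold (ssthresh)?
When congestion occurs (severe or not), ssthresh is set to half of the current window size. The current window size is min(cwnd, advertised_wnd), where cwnd is the old value before it is reset to 1.
My feeling is that in practice, if the peer's advertised_wnd is small, congestion probably would not occur at all; congestion is more likely caused by a large cwnd.
So here ssthresh can be regarded as half of cwnd.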

After slow start has run for a while and cwnd > ssthresh, TCP switches to Measure 2.
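As a rough sketch of Measure 1, the reaction to a timeout could be written in Python as follows. The function name `on_timeout` and the byte-based units are assumptions made for illustration. The floor of two segments on ssthresh is not stated above; it is taken from the usual formulation of the algorithm (RFC 5681) and is marked as such in the code.

```python
def on_timeout(cwnd, advertised_wnd, mss):
    """Return (new_cwnd, new_ssthresh) in bytes after a retransmission timeout."""
    # Current window: the smaller of the old cwnd and the peer's advertised window.
    window = min(cwnd, advertised_wnd)
    # Half of the current window; the two-segment floor is an assumption (RFC 5681).
    ssthresh = max(window // 2, 2 * mss)
    # Severe congestion: restart from one segment and let slow start take over.
    return mss, ssthresh


print(on_timeout(cwnd=8 * 512, advertised_wnd=65535, mss=512))
```

With an old cwnd of 8 segments of 512 bytes, this yields cwnd = 512 and ssthresh = 2048, so slow start runs until cwnd exceeds 4 segments.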

Measure 2

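Recall that in slow start, cwnd grows by one segment for every ACK received.
Suppose cwnd is currently 8 segments. TCP quickly injects 8 segments into the network, so within one RTT it may receive up to 8 ACKs, and cwnd grows to 16 after that RTT.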

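The goal of Measure 2 is to grow cwnd slowly: only 1 segment per RTT (in the example above, 1/8 of what slow start would add).
Because cwnd grows by 1 segment per RTT, its growth is linear.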
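The difference between the two growth rates can be seen in a small simulation. This is a simplified sketch, not a model of any real stack: it assumes every segment sent in an RTT is ACKed exactly once, counts cwnd in segments, and uses the hypothetical name `simulate_cwnd`. In congestion avoidance each ACK adds 1/cwnd, which sums to roughly one segment per RTT.

```python
def simulate_cwnd(ssthresh, rtts, cwnd=1.0):
    """Return cwnd (in segments) at the start of each RTT."""
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        # Assumption: each of the int(cwnd) segments sent this RTT is ACKed once.
        for _ in range(int(cwnd)):
            if cwnd <= ssthresh:
                cwnd += 1.0         # slow start: one segment per ACK
            else:
                cwnd += 1.0 / cwnd  # congestion avoidance: about one segment per RTT
    return history


print([round(c, 2) for c in simulate_cwnd(ssthresh=8, rtts=7)])
```

The first values double every RTT (1, 2, 4, 8); once cwnd passes ssthresh, each RTT adds a little under one segment.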

723937936@qq.com · #47

Fast Retransmit and Fast Recovery Algorithms

The different modes of TCP flow control:
1. slow start mode
2. fast recovery mode
3. congestion avoidance mode
4. maximum throughput mode

This division into modes is my own understanding and is offered for reference only.

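The congestion avoidance measures described in the previous post included the slow start algorithm, which muddled the concepts. To be clear: the congestion avoidance algorithm refers to TCP operating in congestion avoidance mode.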

Fast Retransmit Algorithm

Whenever TCP receives an out-of-order segment, it immediately sends a duplicate ACK, which tells the peer the sequence number it expects next.

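Fast Retransmit Algorithm: if TCP receives 3 duplicate ACKs in a row, it retransmits the missing segment immediately, without waiting for the retransmission timer to expire.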

I understand the fast retransmit algorithm as one operation within fast recovery mode. On that reading, TCP enters fast recovery mode when it receives 3 consecutive duplicate ACKs.

fast recovery mode

The concrete steps of fast recovery mode are:

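1. When TCP receives the 3rd duplicate ACK, set ssthresh to half of the current effective window size and set cwnd to ssthresh+3*MSS (see segment 62 in figures 21.7 and 21.11 of the book)
2. Retransmit the missing segment
3. For each further duplicate ACK received, set cwnd=cwnd+MSS (see segments 64, 65, 66, 68 and 70 in figures 21.7 and 21.11 of the book)
4. When the ACK for the segment retransmitted in step 2 arrives, set cwnd=ssthresh. Since cwnd now equals ssthresh (it is not greater) and the ACK acknowledges new data (it is not a duplicate ACK), cwnd is then updated to cwnd+MSS. TCP leaves fast recovery mode and enters congestion avoidance mode (see segment 72 in figures 21.7 and 21.11 of the book)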

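Notes:
The current effective window size in step 1 is min(cwnd, advertised_wnd).
Step 3 keeps increasing cwnd because the peer is still receiving segments (each duplicate ACK means another segment has left the network), which shows that the network is not that congested.
Step 4 resets cwnd to ssthresh (the value set in step 1) before leaving fast recovery mode because the congestion is not severe, so there is no need to lower cwnd too far.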
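The four steps above can be sketched as a tiny state machine. This is an illustration under assumptions, not the BSD code: the class name `FastRecoverySender` and its fields are hypothetical, all sizes are in bytes, the retransmission is only counted rather than sent, and the numbers in the usage lines are made up rather than taken from the book's figures.

```python
from dataclasses import dataclass


@dataclass
class FastRecoverySender:
    mss: int
    cwnd: int
    ssthresh: int
    advertised_wnd: int
    dup_acks: int = 0
    in_fast_recovery: bool = False
    retransmits: int = 0

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:
            # Step 1: halve the effective window, inflate cwnd by 3 segments.
            window = min(self.cwnd, self.advertised_wnd)
            self.ssthresh = window // 2
            self.cwnd = self.ssthresh + 3 * self.mss
            # Step 2: retransmit the missing segment (only counted here).
            self.retransmits += 1
            self.in_fast_recovery = True
        elif self.dup_acks > 3:
            # Step 3: each further duplicate ACK inflates cwnd by one segment.
            self.cwnd += self.mss

    def on_new_ack(self):
        if self.in_fast_recovery:
            # Step 4: deflate cwnd back to ssthresh and leave fast recovery.
            self.cwnd = self.ssthresh
            self.in_fast_recovery = False
        self.dup_acks = 0
        if self.cwnd <= self.ssthresh:
            self.cwnd += self.mss                          # slow-start increment
        else:
            self.cwnd += self.mss * self.mss // self.cwnd  # congestion avoidance


s = FastRecoverySender(mss=1000, cwnd=8000, ssthresh=65535, advertised_wnd=65535)
for _ in range(5):
    s.on_duplicate_ack()
print(s.ssthresh, s.cwnd)  # after 3 + 2 duplicate ACKs
s.on_new_ack()
print(s.cwnd, s.in_fast_recovery)
```

In this made-up run, the third duplicate ACK gives ssthresh = 4000 and cwnd = 7000, two more duplicate ACKs raise cwnd to 9000, and the ACK of new data drops cwnd to 4000 and then adds one MSS, leaving 5000.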

congestion avoidance mode

When cwnd>ssthresh and an ACK for new data arrives, cwnd is updated with the following formula:

Code:

cwnd <- cwnd + segsize*segsize/cwnd + segsize/8

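The formula above is the one used by 4.3BSD and 4.4BSD. The book's author points out that it does not conform to the standard, but uses it anyway so that the computed values match what the implementation produces.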
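To get a feel for the extra segsize/8 term, the formula can be evaluated next to the same update without that term. This is a sketch that assumes truncating integer arithmetic in bytes; the function names and the sample values (cwnd = 1280, segsize = 256) are chosen only for illustration.

```python
def bsd_cwnd_update(cwnd, segsize):
    """cwnd after one ACK of new data, using the formula quoted above."""
    return cwnd + segsize * segsize // cwnd + segsize // 8


def additive_only_update(cwnd, segsize):
    """The same update without the extra segsize/8 term, for comparison."""
    return cwnd + segsize * segsize // cwnd


print(bsd_cwnd_update(1280, 256))       # 1280 + 51 + 32
print(additive_only_update(1280, 256))  # 1280 + 51
```

With these inputs the quoted formula gives 1363 while the version without segsize/8 gives 1331, so the extra term makes cwnd grow noticeably faster than one segment per RTT.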

723937936@qq.com · #48

ICMP Errors

Observing the ICMP host unreachable error

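First, configure the linux host as a router. As mentioned earlier, when linux acts as a router and cannot find a route while forwarding an IP datagram, by default it does not send back an ICMP host unreachable error.
It sends the ICMP host unreachable error only after a reject route has been added to the routing table.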

Code:

linux $ sudo bash -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'    # enable IP forwarding (act as a router)
linux $ sudo route add -net 192.168.2.0 netmask 255.255.255.0 metric 1024 reject    # add a route that explicitly rejects traffic to network 192.168.2.0
linux $ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0         UG    100    0        0 enp0s3
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 enp0s3
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 enp0s3
192.168.2.0     -               255.255.255.0   !     1024   -        0 -

Set macos's default route to point at linux, then initiate a connection:

Code:

macos $ sock 192.168.2.3 8888
connect() error: Network is unreachable

The connection attempt above eventually gave up, and the error reported is Network is unreachable rather than Operation timed out.

Below is the packet capture taken on the linux host:

Code:

20:51:25.505567 IP linux > macos: ICMP host 192.168.2.3 unreachable, length 72
20:51:26.508039 IP linux > macos: ICMP host 192.168.2.3 unreachable, length 72
20:51:27.509243 IP linux > macos: ICMP host 192.168.2.3 unreachable, length 72
20:51:28.510379 IP linux > macos: ICMP host 192.168.2.3 unreachable, length 72
20:51:32.513780 IP linux > macos: ICMP host 192.168.2.3 unreachable, length 72

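The tcpdump output above shows that the router sent back 5 ICMP host unreachable errors.
Although the ICMP error reported is host unreachable, not network unreachable, the errno that macos returns to the application is ENETUNREACH.
Applications should treat ENETUNREACH and EHOSTUNREACH as the same error.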
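One way an application might follow that advice is to test for both errno values in a single place. The sketch below uses Python's standard `errno` and `socket` modules; the helper names `is_unreachable` and `try_connect` and the returned strings are hypothetical.

```python
import errno
import socket

# Treat both errno values as the same "destination unreachable" condition.
UNREACHABLE_ERRNOS = {errno.ENETUNREACH, errno.EHOSTUNREACH}


def is_unreachable(exc):
    """True if an OSError reports either kind of unreachable destination."""
    return exc.errno in UNREACHABLE_ERRNOS


def try_connect(host, port, timeout=10.0):
    """Return 'connected' or 'unreachable'; re-raise any other error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        if is_unreachable(exc):
            return "unreachable"
        raise
```

In the experiment above, a call such as `try_connect("192.168.2.3", 8888)` would be expected to take the "unreachable" branch regardless of which of the two errno values the kernel reports.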

Repacketization

Initiate a connection from macos:

Code:

macos $ sock linux 8888
hello there
line number 2                # before typing this line, unplug the linux host's network cable or use iptables to drop segments destined for port 8888
and 3

Code:

 9.103102 ( 9.102827) IP macos.52015 > linux.8888: flags [PA], seq 3271102335:3271102347, ack 812886231, win 2058, options [ts 1352883248 3324909990], length 12
 9.103129 ( 0.000027) IP linux.8888 > macos.52015: flags [A], seq 812886231:812886231, ack 3271102347, win 509, options [ts 3324919093 1352883248], length 0

// connection cut here

21.337522 (12.234393) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352895468 3324919093], length 14
21.488395 ( 0.150873) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352895618 3324919093], length 14
21.789470 ( 0.301075) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352895918 3324919093], length 14
22.190727 ( 0.401257) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352896319 3324919093], length 14
22.791443 ( 0.600716) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352896919 3324919093], length 14
23.793449 ( 1.002006) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352897919 3324919093], length 14
25.596108 ( 1.802659) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352899719 3324919093], length 14
28.999289 ( 3.403181) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352903119 3324919093], length 14
35.623246 ( 6.623957) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352909719 3324919093], length 14
42.227430 ( 6.604184) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352916319 3324919093], length 14

// connection restored here

48.837282 ( 6.609852) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102367, ack 812886231, win 2058, options [ts 1352922920 3324919093], length 20
48.837307 ( 0.000025) IP linux.8888 > macos.52015: flags [A], seq 812886231:812886231, ack 3271102367, win 509, options [ts 3324958827 1352922920], length 0

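The first two lines correspond to "hello there\n", 12 bytes in total.
The 10 lines in the middle correspond to "line number 2\n", 14 bytes in total.
The last two lines show that TCP repacketized "line number 2\n" and "and 3\n" into a single 20-byte segment and sent them together.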

Also, the macos RTO does not appear to follow a pure exponential backoff: the last three retransmission intervals are all about 6.6s.
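The backoff pattern is easier to see when the gaps between retransmissions are computed from the tcpdump timestamps. A small Python sketch follows; the list copies the send times of the 14-byte segment from the trace above plus the final 20-byte segment, and the variable names are arbitrary.

```python
# Send times (seconds) of the first transmission, its retransmissions,
# and the final repacketized segment, copied from the trace above.
send_times = [
    21.337522, 21.488395, 21.789470, 22.190727, 22.791443, 23.793449,
    25.596108, 28.999289, 35.623246, 42.227430, 48.837282,
]

# Gap between each transmission and the next one.
intervals = [round(b - a, 2) for a, b in zip(send_times, send_times[1:])]
# Ratio of each gap to the previous gap; 2.0 would be a plain doubling.
ratios = [round(b / a, 2) for a, b in zip(intervals, intervals[1:])]

print(intervals)
print(ratios)
```

The gaps grow from 0.15s to 3.4s without doubling at every step, and then stay near 6.6s for the last three transmissions.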

723937936@qq.com · #49

Chapter 22: TCP Persist Timer

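If the receiver advertises a window size of 0, the sender can no longer send data. To guard against the loss of the window update segment that the receiver sends later, the sender queries the receiver for its window size. TCP starts a timer called the Persist Timer to send these queries periodically. "Persist" here means persevering: the sender never gives up querying until the receiver advertises a nonzero window.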

Observing window probes

Code:

 0.000000 ( 0.000000) IP linux.47154 > macos.5555: flags [S], seq 133017518:133017518, win 64240, options [mss 1460,ts 3354223538 0,ws 7], length 0
 0.000489 ( 0.000489) IP macos.5555 > linux.47154: flags [SA], seq 83378274:83378274, ack 133017519, win 33304, options [mss 1460,ws 3,ts 1408617785 3354223538], length 0
 0.000544 ( 0.000055) IP linux.47154 > macos.5555: flags [A], seq 133017519:133017519, ack 83378275, win 502, options [ts 3354223538 1408617785], length 0
 
 0.000661 ( 0.000117) IP linux.47154 > macos.5555: flags [PA], seq 133017519:133018543, ack 83378275, win 502, options [ts 3354223538 1408617785], length 1024
 0.000706 ( 0.000045) IP linux.47154 > macos.5555: flags [A], seq 133018543:133019991, ack 83378275, win 502, options [ts 3354223538 1408617785], length 1448
 0.000765 ( 0.000059) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133018543, win 4035, options [ts 1408617785 3354223538], length 0
 0.000783 ( 0.000018) IP linux.47154 > macos.5555: flags [PA], seq 133019991:133020591, ack 83378275, win 502, options [ts 3354223538 1408617785], length 600
 0.000796 ( 0.000013) IP linux.47154 > macos.5555: flags [A], seq 133020591:133022039, ack 83378275, win 502, options [ts 3354223538 1408617785], length 1448
 0.000806 ( 0.000010) IP linux.47154 > macos.5555: flags [PA], seq 133022039:133024935, ack 83378275, win 502, options [ts 3354223538 1408617785], length 2896
 0.000859 ( 0.000053) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133020591, win 5536, options [ts 1408617785 3354223538], length 0
 0.000867 ( 0.000008) IP linux.47154 > macos.5555: flags [PA], seq 133024935:133032175, ack 83378275, win 502, options [ts 3354223538 1408617785], length 7240
 0.000875 ( 0.000008) IP linux.47154 > macos.5555: flags [PA], seq 133032175:133032879, ack 83378275, win 502, options [ts 3354223539 1408617785], length 704
 0.000905 ( 0.000030) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133023487, win 5174, options [ts 1408617785 3354223538], length 0
 0.000905 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133024935, win 4993, options [ts 1408617785 3354223538], length 0
 0.000957 ( 0.000052) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133027831, win 4631, options [ts 1408617785 3354223538], length 0
 0.000957 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133030727, win 4269, options [ts 1408617785 3354223538], length 0
 0.000957 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133032175, win 4088, options [ts 1408617785 3354223538], length 0
 0.000957 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133032879, win 4000, options [ts 1408617785 3354223539], length 0
 0.000977 ( 0.000020) IP linux.47154 > macos.5555: flags [PA], seq 133032879:133033903, ack 83378275, win 502, options [ts 3354223539 1408617785], length 1024
 0.000987 ( 0.000010) IP linux.47154 > macos.5555: flags [A], seq 133033903:133035351, ack 83378275, win 502, options [ts 3354223539 1408617785], length 144
