Learn TCP/IP with Me

723937936@qq.com
Posts: 51
Registered: 2023-02-26 9:59
System: ubuntu

Re: Learn TCP/IP with Me

#46

Post by 723937936@qq.com » 2023-04-11 21:45

Congestion Avoidance Algorithm

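TCP's congestion detection rests on one assumption: when a packet is lost, the network is assumed to be congested.
Congestion avoidance: the measures TCP takes when it detects packet loss.

Two signals indicate congestion:
1. timeout
2. receipt of duplicate ACKs

Two congestion avoidance measures:
Measure 1: cut the transmission rate immediately, then ramp it up quickly; once it reaches a threshold (ssthresh), switch to measure 2
Measure 2: increase the transmission rate slowly

Which measure applies depends on how severe the congestion is: measure 1 for severe congestion, measure 2 for mild congestion.

Severity is judged by which signal was seen:
1. timeout is treated as severe congestion
2. receipt of duplicate ACKs is treated as mild congestion (duplicate ACKs are triggered by later segments arriving at the receiver, so data is still getting through and the network cannot be that congested)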

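Measure 1

When congestion is severe, TCP applies measure 1.

How is the rate cut immediately?
By resetting cwnd to 1 segment.

How is the rate ramped up quickly?
Through the slow start algorithm, which makes cwnd grow exponentially.

How large is the threshold (ssthresh)?
Whenever congestion occurs (severe or not), ssthresh is set to half of the current window size, where the current window size is min(cwnd, advertised_wnd) and cwnd is the old value from before the reset to 1.
My feeling is that in practice, if the peer's advertised_wnd is small, congestion is unlikely; congestion is more likely caused by a large cwnd.
So ssthresh can be thought of as half of cwnd.

Once slow start has run for a while and cwnd > ssthresh, TCP switches to measure 2.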

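Measure 2

Recall that in slow start, cwnd grows by one segment for every ACK received.
Suppose cwnd is currently 8 segments. TCP quickly injects 8 segments into the network, so after one RTT it may receive up to 8 ACKs, and cwnd has grown to 16.

Measure 2 aims to grow cwnd slowly: only 1 segment per RTT (in this example, 1/8 of what slow start would add).
cwnd grows by 1 segment per RTT, so its growth is linear.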
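
To make the difference between the two growth rates concrete, here is a rough Python sketch. It assumes cwnd is counted in segments, every segment is ACKed individually, nothing is lost, and there are no delayed ACKs; the function name simulate_cwnd is made up for illustration and is not taken from the book or from any TCP implementation.

```python
def simulate_cwnd(ssthresh, rtts, cwnd=1.0):
    """Return cwnd (in segments) at the start of each RTT.

    Simplified model: one ACK arrives per segment in flight.
    """
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        for _ in range(int(cwnd)):       # one ACK per segment sent this RTT
            if cwnd <= ssthresh:
                cwnd += 1.0              # slow start: +1 segment per ACK
            else:
                cwnd += 1.0 / cwnd       # congestion avoidance: about +1 per RTT
    return history


if __name__ == "__main__":
    for rtt, value in enumerate(simulate_cwnd(ssthresh=8, rtts=8)):
        print(rtt, round(value, 2))
```

With ssthresh=8 the window doubles every RTT up to 8 segments and after that grows by less than one segment per RTT.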

Re: Learn TCP/IP with Me

#47

Post by 723937936@qq.com » 2023-04-15 19:34

Fast Retransmit and Fast Recovery Algorithms

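The different modes TCP uses to control its sending rate:
1. slow start mode
2. fast recovery mode
3. congestion avoidance mode
4. maximum throughput mode

This breakdown into modes is my own understanding and is for reference only.

The previous post described slow start as part of the congestion avoidance measures, which muddled the concepts. To be clear: the congestion avoidance algorithm refers to TCP operating in congestion avoidance mode.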

Fast Retransmit Algorithm

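Whenever TCP receives an out-of-order segment, it immediately sends a duplicate ACK, which tells the peer the sequence number it expects next.

Fast Retransmit Algorithm: if TCP receives 3 duplicate ACKs in a row, it retransmits the missing segment immediately, without waiting for the retransmission timer to expire.

I think of the fast retransmit algorithm as one operation within fast recovery mode. Seen that way, TCP enters fast recovery mode when it receives 3 duplicate ACKs in a row.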

fast recovery mode

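Fast recovery mode works as follows:

1. When TCP receives the 3rd duplicate ACK, it sets ssthresh to half of the current effective window size and sets cwnd to ssthresh+3*MSS (see segment 62 in Figures 21.7 and 21.11 of the book)
2. It retransmits the missing segment
3. For each further duplicate ACK received, it sets cwnd=cwnd+MSS (see segments 64, 65, 66, 68 and 70 in Figures 21.7 and 21.11 of the book)
4. When the ACK for the segment retransmitted in step 2 arrives, it sets cwnd=ssthresh. Because cwnd=ssthresh at that point and the ACK covers new data (it is not a duplicate ACK), cwnd is then updated to cwnd+MSS. TCP then leaves fast recovery mode and enters congestion avoidance mode (see segment 72 in Figures 21.7 and 21.11 of the book)

Notes:
The current effective window size in step 1 is min(cwnd, advertised_wnd)
Step 3 keeps increasing cwnd because the peer is still receiving segments, which means the network is not that congested
Step 4 resets cwnd to ssthresh (the value set in step 1) before leaving fast recovery mode because the congestion is not severe, so there is no need to drop cwnd very low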
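
The four steps can be sketched in Python as a small state machine. This is only an illustration of the steps listed in this post, not real TCP code: the class name FastRecoverySender, its fields, and the initial ssthresh of 65535 are assumptions, and all sizes are in bytes.

```python
class FastRecoverySender:
    """Hypothetical sender state for illustrating fast retransmit/recovery."""

    def __init__(self, cwnd, adv_wnd, mss):
        self.cwnd = cwnd
        self.adv_wnd = adv_wnd
        self.mss = mss
        self.ssthresh = 65535      # assumed initial value
        self.dupacks = 0
        self.in_recovery = False

    def on_dup_ack(self):
        """Process one duplicate ACK; return True if a retransmit is due."""
        self.dupacks += 1
        if self.dupacks == 3:
            # Steps 1 and 2: halve the effective window, inflate cwnd by
            # the three segments that have left the network, retransmit.
            self.ssthresh = min(self.cwnd, self.adv_wnd) // 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.in_recovery = True
            return True
        if self.in_recovery:
            self.cwnd += self.mss  # step 3: one more segment has left
        return False

    def on_new_ack(self):
        """Process an ACK that acknowledges new data."""
        self.dupacks = 0
        if self.in_recovery:
            self.cwnd = self.ssthresh  # step 4: deflate the window
            self.in_recovery = False
            self.cwnd += self.mss      # the new-data ACK still opens cwnd


if __name__ == "__main__":
    s = FastRecoverySender(cwnd=10000, adv_wnd=20000, mss=1000)
    print([s.on_dup_ack() for _ in range(5)], s.cwnd, s.ssthresh)
    s.on_new_ack()
    print(s.cwnd, s.in_recovery)
```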


congestion avoidance mode

When cwnd > ssthresh, each ACK for new data updates cwnd as follows:

Code:

cwnd <- cwnd + segsize*segsize/cwnd + segsize/8
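The formula above is the one used by 4.3BSD and 4.4BSD. The book's author points out that it does not conform to the standard (the extra segsize/8 term is the nonstandard part), but uses it anyway so that the computed values match what the implementation produces.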
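
As a quick check of the arithmetic, the BSD update can be written as a small Python function. This is a sketch under the assumption that integer division is used (as C would do with integer operands); the function names are made up here, and standard_update just drops the segsize/8 term for comparison.

```python
def bsd_update(cwnd, segsize):
    """cwnd update used by 4.3BSD/4.4BSD in congestion avoidance (bytes)."""
    return cwnd + segsize * segsize // cwnd + segsize // 8


def standard_update(cwnd, segsize):
    """Same update without the extra segsize/8 term, for comparison."""
    return cwnd + segsize * segsize // cwnd


if __name__ == "__main__":
    # One ACK with segsize=512 and cwnd=1024: 1024 + 256 + 64
    print(bsd_update(1024, 512), standard_update(1024, 512))
```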

Re: Learn TCP/IP with Me

#48

Post by 723937936@qq.com » 2023-04-17 21:22

ICMP Errors


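Observing the ICMP host unreachable error

First configure the linux host as a router. As mentioned earlier, when linux acts as a router and finds no route for an IP datagram it is forwarding, it does not send back an ICMP host unreachable error by default.
It only sends one after a reject route is added to the routing table.

Code:

linux $ sudo bash -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'                 # enable IP forwarding (act as a router)
linux $ sudo route add -net 192.168.2.0 netmask 255.255.255.0 metric 1024 reject       # add an explicit reject route for network 192.168.2.0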
linux $ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0         UG    100    0        0 enp0s3
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 enp0s3
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 enp0s3
192.168.2.0     -               255.255.255.0   !     1024   -        0 -
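Next, point the macos default route at the linux host, then open a connection:

Code:

macos $ sock 192.168.2.3 8888
connect() error: Network is unreachable
The connection attempt above failed after timing out, but the error reported is Network is unreachable rather than Operation timed out.

Below is the packet capture on the linux host:

Code: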

20:51:25.505567 IP linux > macos: ICMP host 192.168.2.3 unreachable, length 72
20:51:26.508039 IP linux > macos: ICMP host 192.168.2.3 unreachable, length 72
20:51:27.509243 IP linux > macos: ICMP host 192.168.2.3 unreachable, length 72
20:51:28.510379 IP linux > macos: ICMP host 192.168.2.3 unreachable, length 72
20:51:32.513780 IP linux > macos: ICMP host 192.168.2.3 unreachable, length 72
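The tcpdump output above shows the router sending back 5 ICMP host unreachable error messages.
Although the ICMP error reported is host unreachable, not network unreachable, the errno macos returns to the application is ENETUNREACH.
Applications should treat ENETUNREACH and EHOSTUNREACH as the same error.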
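
In application code this could look like the following Python sketch. The helper name is_unreachable is hypothetical; errno.ENETUNREACH and errno.EHOSTUNREACH are the standard constants from Python's errno module.

```python
import errno

# Both errno values are handled identically, as suggested above.
UNREACHABLE_ERRNOS = {errno.ENETUNREACH, errno.EHOSTUNREACH}


def is_unreachable(exc):
    """Return True if exc is an OSError meaning network or host unreachable."""
    return isinstance(exc, OSError) and exc.errno in UNREACHABLE_ERRNOS


if __name__ == "__main__":
    print(is_unreachable(OSError(errno.ENETUNREACH, "Network is unreachable")))
    print(is_unreachable(OSError(errno.ETIMEDOUT, "Operation timed out")))
```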

Repacketization

Open a connection from macos:

Code:

macos $ sock linux 8888
hello there
line number 2                # before typing this line, unplug the linux host's network cable or use iptables to drop segments with destination port 8888
and 3

Code:

 9.103102 ( 9.102827) IP macos.52015 > linux.8888: flags [PA], seq 3271102335:3271102347, ack 812886231, win 2058, options [ts 1352883248 3324909990], length 12
 9.103129 ( 0.000027) IP linux.8888 > macos.52015: flags [A], seq 812886231:812886231, ack 3271102347, win 509, options [ts 3324919093 1352883248], length 0

// link disconnected here

21.337522 (12.234393) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352895468 3324919093], length 14
21.488395 ( 0.150873) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352895618 3324919093], length 14
21.789470 ( 0.301075) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352895918 3324919093], length 14
22.190727 ( 0.401257) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352896319 3324919093], length 14
22.791443 ( 0.600716) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352896919 3324919093], length 14
23.793449 ( 1.002006) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352897919 3324919093], length 14
25.596108 ( 1.802659) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352899719 3324919093], length 14
28.999289 ( 3.403181) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352903119 3324919093], length 14
35.623246 ( 6.623957) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352909719 3324919093], length 14
42.227430 ( 6.604184) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352916319 3324919093], length 14

// link restored here

48.837282 ( 6.609852) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102367, ack 812886231, win 2058, options [ts 1352922920 3324919093], length 20
48.837307 ( 0.000025) IP linux.8888 > macos.52015: flags [A], seq 812886231:812886231, ack 3271102367, win 509, options [ts 3324958827 1352922920], length 0
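The first two lines correspond to "hello there\n", 12 bytes in total.
The 10 lines in the middle correspond to "line number 2\n", 14 bytes in total.
The last two lines show that TCP repacketized "line number 2\n" and "and 3\n" and sent them together as a single 20-byte segment.

Also, the macos RTO does not appear to follow plain exponential backoff: the last three retransmissions are each about 6.6s apart.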

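Re: Learn TCP/IP with Me

#49

Post by 723937936@qq.com » 2023-04-23 21:14

Chapter 22: TCP Persist Timer

If the receiver advertises a window of 0, the sender can no longer send data. To guard against the receiver's window update segment being lost, the sender queries the receiver for its window size. TCP starts a timer called the Persist Timer to send these queries periodically. "Persist" here means persevering: the sender never gives up querying until the receiver advertises a nonzero window.

Observing window probes

Code: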

 0.000000 ( 0.000000) IP linux.47154 > macos.5555: flags [S], seq 133017518:133017518, win 64240, options [mss 1460,ts 3354223538 0,ws 7], length 0
 0.000489 ( 0.000489) IP macos.5555 > linux.47154: flags [SA], seq 83378274:83378274, ack 133017519, win 33304, options [mss 1460,ws 3,ts 1408617785 3354223538], length 0
 0.000544 ( 0.000055) IP linux.47154 > macos.5555: flags [A], seq 133017519:133017519, ack 83378275, win 502, options [ts 3354223538 1408617785], length 0
 
 0.000661 ( 0.000117) IP linux.47154 > macos.5555: flags [PA], seq 133017519:133018543, ack 83378275, win 502, options [ts 3354223538 1408617785], length 1024
 0.000706 ( 0.000045) IP linux.47154 > macos.5555: flags [A], seq 133018543:133019991, ack 83378275, win 502, options [ts 3354223538 1408617785], length 1448
 0.000765 ( 0.000059) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133018543, win 4035, options [ts 1408617785 3354223538], length 0
 0.000783 ( 0.000018) IP linux.47154 > macos.5555: flags [PA], seq 133019991:133020591, ack 83378275, win 502, options [ts 3354223538 1408617785], length 600
 0.000796 ( 0.000013) IP linux.47154 > macos.5555: flags [A], seq 133020591:133022039, ack 83378275, win 502, options [ts 3354223538 1408617785], length 1448
 0.000806 ( 0.000010) IP linux.47154 > macos.5555: flags [PA], seq 133022039:133024935, ack 83378275, win 502, options [ts 3354223538 1408617785], length 2896
 0.000859 ( 0.000053) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133020591, win 5536, options [ts 1408617785 3354223538], length 0
 0.000867 ( 0.000008) IP linux.47154 > macos.5555: flags [PA], seq 133024935:133032175, ack 83378275, win 502, options [ts 3354223538 1408617785], length 7240
 0.000875 ( 0.000008) IP linux.47154 > macos.5555: flags [PA], seq 133032175:133032879, ack 83378275, win 502, options [ts 3354223539 1408617785], length 704
 0.000905 ( 0.000030) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133023487, win 5174, options [ts 1408617785 3354223538], length 0
 0.000905 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133024935, win 4993, options [ts 1408617785 3354223538], length 0
 0.000957 ( 0.000052) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133027831, win 4631, options [ts 1408617785 3354223538], length 0
 0.000957 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133030727, win 4269, options [ts 1408617785 3354223538], length 0
 0.000957 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133032175, win 4088, options [ts 1408617785 3354223538], length 0
 0.000957 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133032879, win 4000, options [ts 1408617785 3354223539], length 0
 0.000977 ( 0.000020) IP linux.47154 > macos.5555: flags [PA], seq 133032879:133033903, ack 83378275, win 502, options [ts 3354223539 1408617785], length 1024
 0.000987 ( 0.000010) IP linux.47154 > macos.5555: flags [A], seq 133033903:133035351, ack 83378275, win 502, options [ts 3354223539 1408617785], length 1448
 0.001069 ( 0.000082) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133033903, win 3872, options [ts 1408617785 3354223539], length 0
 0.001083 ( 0.000014) IP linux.47154 > macos.5555: flags [PA], seq 133035351:133035951, ack 83378275, win 502, options [ts 3354223539 1408617785], length 600
 0.001098 ( 0.000015) IP linux.47154 > macos.5555: flags [A], seq 133035951:133037399, ack 83378275, win 502, options [ts 3354223539 1408617785], length 1448
 0.001105 ( 0.000007) IP linux.47154 > macos.5555: flags [PA], seq 133037399:133040295, ack 83378275, win 502, options [ts 3354223539 1408617785], length 2896
 0.001114 ( 0.000009) IP linux.47154 > macos.5555: flags [PA], seq 133040295:133047535, ack 83378275, win 502, options [ts 3354223539 1408617785], length 7240
 0.001129 ( 0.000015) IP linux.47154 > macos.5555: flags [PA], seq 133047535:133062015, ack 83378275, win 502, options [ts 3354223539 1408617785], length 14480
 0.002829 ( 0.001700) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133035951, win 3616, options [ts 1408617785 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133038847, win 3254, options [ts 1408617786 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133040295, win 3073, options [ts 1408617786 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133043191, win 2711, options [ts 1408617786 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133046087, win 2349, options [ts 1408617786 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133047535, win 2168, options [ts 1408617786 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133050431, win 1806, options [ts 1408617787 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133053327, win 1444, options [ts 1408617787 3354223539], length 0
 0.002850 ( 0.000021) IP linux.47154 > macos.5555: flags [PA], seq 133062015:133064879, ack 83378275, win 502, options [ts 3354223540 1408617785], length 2864
 0.002906 ( 0.000056) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133056223, win 1082, options [ts 1408617787 3354223539], length 0
 0.002906 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133059119, win 720, options [ts 1408617787 3354223539], length 0
 0.002906 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133062015, win 358, options [ts 1408617787 3354223539], length 0
 0.003667 ( 0.000761) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408617787 3354223540], length 0
 // window probes follow
 0.211649 ( 0.207982) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354223749 1408617787], length 0
 0.212492 ( 0.000843) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408617993 3354223540], length 0
 0.653414 ( 0.440922) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354224191 1408617993], length 0
 0.653753 ( 0.000339) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408618432 3354223540], length 0
 1.484679 ( 0.830926) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354225022 1408618432], length 0
 1.484958 ( 0.000279) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408619260 3354223540], length 0
 3.139554 ( 1.654596) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354226677 1408619260], length 0
 3.140043 ( 0.000489) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408620912 3354223540], length 0
 6.446848 ( 3.306805) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354229984 1408620912], length 0
 6.447465 ( 0.000617) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408624210 3354223540], length 0
13.091919 ( 6.644454) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354236630 1408624210], length 0
13.092627 ( 0.000708) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408630835 3354223540], length 0
26.405423 (13.312796) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354249943 1408630835], length 0
26.405857 ( 0.000434) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408644125 3354223540], length 0
53.539589 (27.133732) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354277077 1408644125], length 0
53.540193 ( 0.000604) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408671224 3354223540], length 0
106.787819 (53.247626) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354330325 1408671224], length 0
106.788233 ( 0.000414) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408724333 3354223540], length 0
213.283748 (106.495515) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354436821 1408724333], length 0
213.283997 ( 0.000249) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408830564 3354223540], length 0
334.116210 (120.832213) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354557654 1408830564], length 0
334.116433 ( 0.000223) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408951116 3354223540], length 0
454.947503 (120.831070) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354678485 1408951116], length 0
454.947731 ( 0.000228) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1409071707 3354223540], length 0
575.783983 (120.836252) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354799322 1409071707], length 0
575.784589 ( 0.000606) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1409192339 3354223540], length 0
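The output above shows that the Persist Timer interval follows exponential backoff: roughly 0.2, 0.4, 0.8, 1.6 ... seconds, doubling each time until it is capped at about 120 seconds (120, 120, 120 ...)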
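
The observed schedule can be approximated with a few lines of Python. This is a sketch that assumes a 0.2 s starting interval, doubling, and a 120 s cap, all read off the trace above rather than taken from the kernel source; the function name persist_intervals is made up.

```python
def persist_intervals(first=0.2, cap=120.0, count=13):
    """Return `count` probe intervals: doubling from `first`, capped at `cap`."""
    intervals = []
    current = first
    for _ in range(count):
        intervals.append(min(current, cap))
        current *= 2
    return intervals


if __name__ == "__main__":
    print(persist_intervals())
```

The 13 values line up with the 13 probes in the capture, which are each slightly longer than the ideal doubling sequence.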

Re: Learn TCP/IP with Me

#50

Post by 723937936@qq.com » 2023-04-25 21:26

Silly Window Syndrome

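Silly Window Syndrome: the situation where the sender keeps sending very small segments. It has two causes:
1. The receiver advertises a very small window each time
2. The sending application writes very little data each time

To avoid Silly Window Syndrome, both the sender and the receiver take countermeasures.

Measures taken by the receiver:
1. The receiver's TCP module never advertises a small window (other than 0), where small means less than min(MSS, receive_buffer_size / 2); the MSS here is presumably the one the receiver announced to the sender
2. If the window the receiver last advertised was X and the sender then sends a segment of size Y, then as long as X-Y>0 the receiver must advertise that value no matter how small X-Y is, because an advertised window must not shrink (see segment 13 in Figure 22.3)

Whether the sender may output a segment is decided by the following conditions:
1. If the amount of data in the send buffer is at least MSS, it may output
2. If the amount of data in the send buffer is at least half of the largest window the receiver has ever advertised, it may output
3. If the Nagle algorithm is enabled, it may output as long as there is no unacknowledged segment
4. If the Nagle algorithm is disabled, it may output

Points 3 and 4 also seem to depend on the window the receiver advertises: if the peer's advertised window is smaller than min(MSS, receive_buffer_size / 2), the data is not sent right away but only when the Persist timer expires (see segment 14 in Figure 22.3)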
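
The four sender conditions can be written as one Python predicate. This is a sketch of the list above only: it ignores the dependence on the currently advertised window mentioned in the last paragraph, and the function and parameter names are made up for illustration.

```python
def can_send(buffered, mss, max_peer_window, nagle_enabled, unacked):
    """Decide whether the sender may output a segment now.

    buffered:        bytes waiting in the send buffer
    max_peer_window: largest window the peer has ever advertised
    unacked:         bytes sent but not yet acknowledged
    """
    if buffered <= 0:
        return False                      # nothing to send
    if buffered >= mss:
        return True                       # condition 1: a full-sized segment
    if buffered >= max_peer_window // 2:
        return True                       # condition 2: half the peer's max window
    if not nagle_enabled:
        return True                       # condition 4: Nagle disabled
    return unacked == 0                   # condition 3: Nagle, nothing in flight


if __name__ == "__main__":
    print(can_send(10, 1460, 65535, nagle_enabled=True, unacked=100))
    print(can_send(10, 1460, 65535, nagle_enabled=True, unacked=0))
```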

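At the end the book says no timer is set for the FIN_WAIT_2 state. That is because the Sun system the author used does not set one; as covered earlier, linux does set a timer.

Segments 16, 17 and 20 in Figure 22.3 of the book show that even when the free space in the receive buffer exceeds MSS, the receiver does not send a window update on its own; it only advertises the available window when the sender sends a window probe.
The receiver sends a window update on its own only when the free space in the receive buffer exceeds receive_buffer_size/2.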

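Re: Learn TCP/IP with Me

#51

Post by 723937936@qq.com » 2023-04-27 20:26

Chapter 23: TCP Keepalive Timer

My conclusion from this chapter: do not rely on the TCP Keepalive Timer; implement heartbeats at the application layer instead.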

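Re: Learn TCP/IP with Me

#52

Post by 723937936@qq.com » 2023-05-03 9:06

Chapter 24: TCP Futures and Performance

Path MTU Discovery

The Path MTU is the smallest MTU along the path. Path MTU discovery works like this: IP datagrams are sent with the DF flag set in the IP header. If a router along the way finds that the outgoing MTU is smaller than the datagram, it drops the datagram and sends an ICMP can't fragment error back to the sender. The ICMP error carries the MTU of the router's outgoing interface. When the sender receives the ICMP error, it uses that MTU to retransmit a suitably sized IP datagram, and this repeats until a datagram reaches the destination.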
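
The loop described above can be simulated in a few lines of Python. This is a toy model, not real networking code: the path is just a list of link MTUs, every router is assumed to report its next-hop MTU in the ICMP error, and the function name discover_path_mtu is made up.

```python
def discover_path_mtu(first_hop_mtu, link_mtus):
    """Simulate path MTU discovery with the DF flag set.

    link_mtus lists the MTU of each link along the path, in order.
    Returns (path_mtu, attempts), where attempts counts datagrams sent.
    """
    size = first_hop_mtu
    attempts = 0
    while True:
        attempts += 1
        for mtu in link_mtus:
            if size > mtu:
                # Router drops the datagram and reports its outgoing MTU
                # in an ICMP can't fragment error; the sender retries.
                size = mtu
                break
        else:
            return size, attempts         # datagram reached the destination


if __name__ == "__main__":
    print(discover_path_mtu(1500, [1500, 1400, 1500, 576]))
```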

Reply #18 earlier covered the linux socket options that control the MTU discovery mechanism.

Long Fat Pipes

capacity(bits) = bandwidth(bits/sec) * round-trip time(sec)

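A connection's capacity is also called the bandwidth-delay product; reply #43 earlier gave an example to help understand the concept.

A connection is also called a pipe. Pipe here is a general concept and does not refer to the pipe system call.

long fat networks (LFNs): networks with a large bandwidth-delay product. How large is large? Let's say more than 65535 bytes, because the window size field in the tcp header is only 16 bits and tops out at 65535. By that measure even 100 Mbit/s Ethernet counts as an LFN once the RTT exceeds about 5.2 ms (65535 * 8 / 100,000,000 s).

long fat pipe: a connection established over an LFN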
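
A small Python sketch of the capacity formula, useful for checking when a link crosses the 65535-byte mark. The function names are made up here, and the 65535-byte threshold is the working definition adopted in this post.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes."""
    return bandwidth_bps * rtt_s / 8


def lfn_rtt_threshold(bandwidth_bps, window=65535):
    """RTT (seconds) above which the BDP exceeds `window` bytes."""
    return window * 8 / bandwidth_bps


if __name__ == "__main__":
    print(bdp_bytes(100_000_000, 0.010))      # 100 Mbit/s, 10 ms RTT
    print(lfn_rtt_threshold(100_000_000))     # about 0.0052 s
    print(lfn_rtt_threshold(1_000_000_000))   # about 0.00052 s
```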

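Running tcp over an LFN cannot make full use of the LFN's large capacity. The problems are:

1. On an LFN, the 16-bit window size field in the tcp header is no longer enough; the window scale option addresses this
2. On an LFN, if several packets are lost while one window of data is in flight, the whole pipeline drains and throughput drops sharply; SACK addresses this
3. On an LFN, the window is very large, so sampling the RTT only once per window gives a very inaccurate RTT and may cause unnecessary retransmissions; the timestamp option addresses this
4. On a gigabit LFN, the 32-bit sequence number field in the tcp header is no longer enough; the timestamp option addresses this (the 4-byte timestamp effectively extends the sequence number)

The relationship between latency and bandwidth on gigabit networks

delay: latency, the time it takes one bit to travel from one end to the other; it is bounded by the speed of light, so for a given path it is essentially fixed and cannot be optimized away
bandwidth: the number of bits that can be injected into the network per unit of time; for example, a gigabit network has a bandwidth of 1,000,000,000 bits/sec

The question is: does more bandwidth always mean less time to transfer a given amount of data? Yes, but once bandwidth exceeds a gigabit, latency dominates and extra bandwidth has little effect on transfer time. For example, if you pay twice as much to raise the bandwidth to 2,000,000,000 bits/sec, the transfer time might drop by only 10%, which is poor value.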


Window Scale Option

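Window Scale Option format:
[Image: Screen Shot 2023-05-01 at 4.56.45 PM.png (option layout: kind=3, len=3, shift count, 1 byte each)]
With the Window Scale Option in use, the window size is: window_size << shift_count
shift_count ranges from 0 to 14

So the maximum window is 65535 << 14 = 1073725440 bytes, slightly less than 1 GiB (65536 << 14 is exactly 1 GiB)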
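
The shift arithmetic is easy to check in Python. The helper name max_window is made up for this sketch.

```python
def max_window(shift_count):
    """Largest window (bytes) that can be advertised with a given shift count."""
    if not 0 <= shift_count <= 14:
        raise ValueError("shift_count must be in the range 0-14")
    return 65535 << shift_count


if __name__ == "__main__":
    for shift in (0, 1, 2, 14):
        print(shift, max_window(shift))
```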

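The Window Scale Option may only appear in SYN segments.

The shift count is computed automatically by the TCP module from the size of the receive buffer. An application sets the receive buffer size through the SO_RCVBUF socket option and so indirectly determines the shift count. Because the Window Scale Option is carried in the SYN segment, SO_RCVBUF must be set before the SYN is exchanged: before calling connect on a client, and on the listening socket before calling listen on a server (by the time accept returns, the handshake is already complete). Changing the receive buffer size after the connection is established can no longer be communicated to the peer.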
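
On the client side the ordering looks like this in Python. It is a minimal sketch using the standard socket module; the helper name make_client_socket is made up, and the address in the usage comment is just the macos host from the trace in this post.

```python
import socket


def make_client_socket(rcvbuf_bytes):
    """Create a TCP socket with SO_RCVBUF set before any connect() call."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must happen before connect(), so the kernel can pick the shift count
    # that goes into the SYN segment.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return sock


if __name__ == "__main__":
    s = make_client_socket(128000)
    # The value read back is platform dependent (linux reports double).
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    # s.connect(("192.168.0.2", 8888))  # connect only after the option is set
    s.close()
```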

Observing the Window Scale Option

Code:

linux $ sock -v -R128000 macos 8888
SO_RCVBUF = 256000
connected on 192.168.0.6.58246 to 192.168.0.2.8888
TCP_MAXSEG = 1448
linux doubles the buffer size set through the SO_RCVBUF option (see man 7 socket)

Code:

 0.000000 ( 0.000000) IP linux.58246 > macos.8888: flags [S], seq 2182126623:2182126623, win 65535, options [mss 1460,ts 1476741476 0,ws 1], length 0
 0.000352 ( 0.000352) IP macos.8888 > linux.58246: flags [SA], seq 1916769072:1916769072, ack 2182126624, win 65535, options [mss 1460,ws 6,ts 1506657680 1476741476], length 0
 0.000374 ( 0.000022) IP linux.58246 > macos.8888: flags [A], seq 2182126624:2182126624, ack 1916769073, win 32768, options [ts 1476741476 1506657680], length 0
 0.000480 ( 0.000106) IP macos.8888 > linux.58246: flags [A], seq 1916769073:1916769073, ack 2182126624, win 2058, options [ts 1506657680 1476741476], length 0
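When opening the connection, linux advertises a window size of 65535 and a window scale of 1, so the largest window it can advertise is 65535 << 1 = 131070 (the window field in the SYN itself is never scaled). That is below the receive buffer size set by the application (SO_RCVBUF = 256000); with a window scale of 2, 65535 << 2 = 262140 would exceed it.

In the book's example the advertised window is larger than the receive buffer the application set, which is a little odd; perhaps the BSD kernel rounds the receive buffer size in some way.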

Timestamp Option


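Timestamp option format:
[Image: Screen Shot 2023-05-01 at 5.57.45 PM.png (option layout: kind=8, len=10, timestamp value: 4 bytes, timestamp echo reply: 4 bytes)]
The timestamp option is present in every segment. The sender fills in the timestamp value field, and the receiver copies that value into the timestamp echo reply field when it sends the acknowledgment. The sender can therefore compute an RTT from every acknowledgment it receives, which raises both the sampling rate and the accuracy of the RTT calculation.

The timestamp value increases monotonically, going up by 1 at a fixed interval. RFC 1323 recommends an interval between 1 ms and 1 second; the 4.4BSD implementation increments it every 500 ms.

TCP does not send an ack segment for every data segment it receives; one ack segment may acknowledge several data segments. Which data segment does the timestamp echo reply field of that ack segment correspond to?

TCP keeps a tsrecent variable and a lastack variable for each connection:
tsrecent is the value TCP puts in the timestamp echo reply field when it sends an ack segment
lastack is the sequence number of the first byte of the next data segment TCP expects to receive
When the next data segment TCP receives is the one it expects, tsrecent is updated to the timestamp value in that data segment; otherwise tsrecent is not updated

So the answer to the question above is: the timestamp echo reply field of an ack segment carries the timestamp value of the first data segment being acknowledged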
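
The tsrecent/lastack rule can be sketched in Python as follows. This is a simplified model of the rule as described in this post: it ignores sequence number wraparound, and the class name TimestampEcho and its methods are made up for illustration.

```python
class TimestampEcho:
    """Track which timestamp value to echo in the next ACK."""

    def __init__(self, first_expected_seq):
        self.lastack = first_expected_seq  # next sequence number expected
        self.tsrecent = 0                  # value to place in the echo reply

    def on_segment(self, seq, length, tsval):
        """Record tsval only if the segment contains the byte `lastack`."""
        if seq <= self.lastack < seq + length:
            self.tsrecent = tsval

    def send_ack(self, ack_no):
        """Send an ACK for everything up to ack_no; return the echoed value."""
        self.lastack = ack_no
        return self.tsrecent


if __name__ == "__main__":
    rx = TimestampEcho(first_expected_seq=1)
    rx.on_segment(seq=1, length=100, tsval=10)    # expected segment
    rx.on_segment(seq=101, length=100, tsval=11)  # arrives before any ACK
    print(rx.send_ack(201))                       # one ACK covers both
```

One ACK covering both segments echoes the timestamp of the first one, because the second segment does not contain the byte numbered lastack.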


PAWS:Protection Against Wrapped Sequence Numbers

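The Sequence number field in the tcp header is 32 bits, i.e. a 4G range, so it wraps after every 4G bytes of data sent.

Earlier we learned that the maximum lifetime of an ip datagram is bounded by the MSL (Maximum Segment Lifetime). If a segment that was presumed lost reappears within that time and its Sequence number falls inside the window currently being transferred, the receiver discards it because its timestamp value is the smaller (older) one. Without the timestamp option, the receiver would treat it as an out-of-order segment and store it in the receive buffer.

The MSL is typically 120s, while on a gigabit network sending 4G bytes of data takes about 34s, so the situation above can really happen.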
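
The 34s figure follows directly from the size of the sequence space. A small Python check, with a made-up function name and assuming the link is kept fully busy:

```python
def seq_wrap_seconds(bandwidth_bps):
    """Seconds needed to send 2**32 bytes at the given bit rate."""
    return 2**32 * 8 / bandwidth_bps


if __name__ == "__main__":
    print(seq_wrap_seconds(1_000_000_000))  # gigabit: about 34.4 s
    print(seq_wrap_seconds(100_000_000))    # 100 Mbit/s: about 343.6 s
```

At 100 Mbit/s the wrap time is well above a 120s MSL, which is why the problem only shows up at gigabit speeds.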


TCP Performance

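Computing the maximum TCP data throughput (taking 100 Mbit/s Ethernet as the example and ignoring ACKs):

A full-sized data segment carries at most 1460 bytes of data. Adding the overhead of the TCP header, IP header, Ethernet framing and so on gives 1538 bytes in total (see Figure 24.9)

The actual data throughput is then: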

throughput = (1460 / 1538) * (100,000,000 / 8) = 11,866,059 bytes/sec

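The figure above is theoretical, a bit over 11M per second. Real transfers cannot reach it; some practical limits are:

1. There may be slower links somewhere along the path
2. The machine's memory bandwidth is also a limit (for example the rate at which data is copied from user space to kernel space); on modern machines memory bandwidth is unlikely to be the bottleneck
3. The window size advertised by the peer, together with the RTT

For point 2, the following command gives a simple test (copying 10GB of data from kernel space to user space):

Code: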

linux $ time dd if=/dev/zero of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 0.52927 s, 19.8 GB/s

real	0m0.530s
user	0m0.013s
sys	0m0.517s
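The test above shows a memory copy bandwidth of 19.8GB/s

For point 3, since:

bandwidth-delay product = bandwidth * RTT

and the bandwidth-delay product corresponds to the receiver's receive buffer, i.e. the window size the receiver advertises,
the window limits the bandwidth to: bandwidth = advertised_win_size / RTT

Suppose:
advertised_win_size takes its maximum value, i.e. 65535 << 14 = 1,073,725,440
RTT is 20ms

Then:
bandwidth = 1,073,725,440 / 0.02s = 53,686,272,000 bytes/sec, roughly 53.7 GB/s (50 GiB/s)

So the window size of the TCP protocol is also unlikely to be what limits TCP throughput, provided window scaling is in use.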
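
For comparison, the same formula can be evaluated with and without window scaling. A Python sketch with a made-up function name, using the 20ms RTT from the example:

```python
def window_limited_throughput(window_bytes, rtt_s):
    """Upper bound on throughput (bytes/sec) imposed by the window size."""
    return window_bytes / rtt_s


if __name__ == "__main__":
    rtt = 0.02
    print(window_limited_throughput(65535 << 14, rtt))  # with window scale 14
    print(window_limited_throughput(65535, rtt))        # without window scale
```

Without the window scale option the bound at a 20ms RTT is only about 3.3 MB/s, well below what a 100 Mbit/s link can carry.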

Measuring bandwidth

The iperf3 tool can be used to measure network bandwidth

Installing iperf3 on ubuntu

Code:

linux $ sudo apt install iperf3

Run iperf3 -s on ubuntu18.04:

Code:

linux $ iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.0.5, port 49506
[  5] local 192.168.0.6 port 5201 connected to 192.168.0.5 port 49520
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   700 KBytes  5.73 Mbits/sec
[  5]   1.00-2.00   sec   897 KBytes  7.34 Mbits/sec
[  5]   2.00-3.00   sec   803 KBytes  6.57 Mbits/sec
[  5]   3.00-4.00   sec   723 KBytes  5.92 Mbits/sec
[  5]   4.00-5.00   sec   817 KBytes  6.70 Mbits/sec
[  5]   5.00-6.00   sec   963 KBytes  7.88 Mbits/sec
[  5]   6.00-7.00   sec   981 KBytes  8.04 Mbits/sec
[  5]   7.00-8.00   sec   833 KBytes  6.82 Mbits/sec
[  5]   8.00-9.00   sec   734 KBytes  6.01 Mbits/sec
[  5]   9.00-10.00  sec   826 KBytes  6.77 Mbits/sec
[  5]  10.00-10.09  sec   107 KBytes  9.95 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.09  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.09  sec  8.19 MBytes  6.81 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Run iperf3 -c 192.168.0.6 on windows wsl2:

Code:

$ iperf3 -c 192.168.0.6
Connecting to host 192.168.0.6, port 5201
[  5] local 172.29.65.74 port 40662 connected to 192.168.0.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.05 MBytes  8.80 Mbits/sec    0   67.9 KBytes
[  5]   1.00-2.00   sec   851 KBytes  6.97 Mbits/sec    2   83.4 KBytes
[  5]   2.00-3.00   sec   912 KBytes  7.47 Mbits/sec    0   91.9 KBytes
[  5]   3.00-4.00   sec   912 KBytes  7.47 Mbits/sec    2   96.2 KBytes
[  5]   4.00-5.00   sec   730 KBytes  5.98 Mbits/sec    1    102 KBytes
[  5]   5.00-6.00   sec   912 KBytes  7.47 Mbits/sec    1    107 KBytes
[  5]   6.00-7.00   sec  1.13 MBytes  9.47 Mbits/sec    0    113 KBytes
[  5]   7.00-8.00   sec   547 KBytes  4.48 Mbits/sec    2    117 KBytes
[  5]   8.00-9.00   sec   851 KBytes  6.97 Mbits/sec    0    123 KBytes
[  5]   9.00-10.00  sec   973 KBytes  7.97 Mbits/sec    1    126 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  8.71 MBytes  7.31 Mbits/sec    9             sender
[  5]   0.00-10.00  sec  8.19 MBytes  6.87 Mbits/sec                  receiver

iperf Done.