
Posts: 51
Joined: 2023-02-26 9:59
System: ubuntu

Re: Learn TCP/IP with Me


Post by 723937936@qq.com » 2023-04-11 21:45

Congestion Avoidance Algorithm


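Two events indicate packet loss:

1. timeout
2. receipt of duplicate ACKs

They differ in severity:

1. A timeout is treated as severe congestion.
2. Receipt of duplicate ACKs is treated as mild congestion (each duplicate ACK is triggered by a later segment arriving at the receiver, so data is still getting through and the network cannot be that congested).

When congestion occurs (severe or not), ssthresh is set to half of the current window size, where the current window size is min(cwnd, advertised_wnd). If the congestion was signaled by a timeout, cwnd is also reset to 1 segment; the cwnd used in the formula is the old value from before that reset.

Once slow start has run long enough that cwnd > ssthresh, TCP switches to measure 2 (congestion avoidance).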
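The reaction to congestion described above can be sketched in a few lines of Python. This is only an illustration, not any kernel's code: the function names are made up for this post, all sizes are in bytes, and the floor of two segments for ssthresh is an extra detail taken from the usual textbook statement of the rule rather than from the text above.

```python
def on_congestion(cwnd, advertised_wnd, mss, timeout):
    """Return (new_cwnd, new_ssthresh) after a congestion signal.

    timeout=True models a retransmission timeout (severe congestion),
    timeout=False models duplicate ACKs (mild congestion).
    """
    current_window = min(cwnd, advertised_wnd)
    # Half of the current window, but never below two segments (assumed floor).
    ssthresh = max(current_window // 2, 2 * mss)
    if timeout:
        # Severe congestion: go back to one segment and slow start again.
        cwnd = mss
    return cwnd, ssthresh


def in_slow_start(cwnd, ssthresh):
    """Slow start applies while cwnd has not yet passed ssthresh."""
    return cwnd <= ssthresh
```

For example, with cwnd = 8000, an advertised window of 16000 and an MSS of 1000, a timeout gives cwnd = 1000 and ssthresh = 4000, so the connection slow-starts until cwnd passes 4000.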




Post by 723937936@qq.com » 2023-04-15 19:34

Fast Retransmit and Fast Recovery Algorithms

1. slow start mode
2. fast recovery mode
3. congestion avoidance mode
4. maximum throughput mode



Fast Retransmit Algorithm

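Whenever TCP receives an out-of-order segment, it immediately sends a duplicate ACK, which tells the peer the sequence number it expects next.

Fast Retransmit Algorithm: if TCP receives 3 duplicate ACKs in a row, it retransmits the missing segment immediately, without waiting for the retransmission timer to expire.

My understanding is that the fast retransmit algorithm is one operation within fast recovery mode. Read that way, TCP enters fast recovery mode when it receives 3 duplicate ACKs in a row.

fast recovery mode

Fast recovery mode works as follows:

1. When the 3rd duplicate ACK arrives, set ssthresh to half of the current effective window size and set cwnd to ssthresh + 3*MSS (see segment 62 in Figures 21.7 and 21.11 of the book).
2. Retransmit the missing segment.
3. For each further duplicate ACK, set cwnd = cwnd + MSS (see segments 64, 65, 66, 68 and 70 in Figures 21.7 and 21.11 of the book).
4. When the ACK for the segment retransmitted in step 2 arrives, set cwnd = ssthresh. Since cwnd now equals ssthresh (so cwnd <= ssthresh holds) and the ACK acknowledges new data (it is not a duplicate ACK), cwnd is then updated to cwnd + MSS. TCP then leaves fast recovery mode and enters congestion avoidance mode (see segment 72 in Figures 21.7 and 21.11 of the book).

The current effective window size in step 1 is min(cwnd, advertised_wnd).
In step 4, cwnd is reset to ssthresh (the value set in step 1) before leaving fast recovery mode because the congestion is not severe, so there is no need to drop cwnd any lower.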
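To check my reading of the four steps, here is a small Python model of a sender's window variables. It is a sketch under my own assumptions: the class and attribute names are invented, sizes are in bytes, retransmission is only counted rather than performed, and the growth rule outside fast recovery is the slow start / congestion avoidance rule from this thread.

```python
class FastRecoverySender:
    """Toy model of cwnd/ssthresh during fast retransmit and fast recovery."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, mss, cwnd, ssthresh, advertised_wnd):
        self.mss = mss
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.advertised_wnd = advertised_wnd
        self.dup_acks = 0
        self.in_fast_recovery = False
        self.retransmits = 0  # how many times the missing segment was resent

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == self.DUP_ACK_THRESHOLD:
            # Step 1: halve the effective window, inflate cwnd by 3 segments.
            window = min(self.cwnd, self.advertised_wnd)
            self.ssthresh = window // 2
            self.cwnd = self.ssthresh + 3 * self.mss
            # Step 2: fast retransmit of the missing segment.
            self.retransmits += 1
            self.in_fast_recovery = True
        elif self.dup_acks > self.DUP_ACK_THRESHOLD:
            # Step 3: every further duplicate ACK inflates cwnd by one segment.
            self.cwnd += self.mss

    def on_new_ack(self):
        if self.in_fast_recovery:
            # Step 4: deflate cwnd back to ssthresh and leave fast recovery.
            self.cwnd = self.ssthresh
            self.in_fast_recovery = False
        if self.cwnd <= self.ssthresh:
            self.cwnd += self.mss  # slow start increment
        else:
            # congestion avoidance increment as given in the book
            self.cwnd += self.mss * self.mss // self.cwnd + self.mss // 8
        self.dup_acks = 0
```

With mss = 1000 and cwnd = 8000, three duplicate ACKs give ssthresh = 4000 and cwnd = 7000, two more raise cwnd to 9000, and the ACK for the retransmitted segment leaves cwnd at 5000.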

congestion avoidance mode

When cwnd > ssthresh, each ACK for new data updates cwnd as follows:

Code:

cwnd <- cwnd + segsize*segsize/cwnd + segsize/8
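A quick way to get a feel for this formula is to apply it once per ACK for one window's worth of ACKs. The Python sketch below does that with integer arithmetic; the function names are mine, and treating "one window" as cwnd // segsize ACKs is a simplification that ignores delayed ACKs.

```python
def congestion_avoidance_step(cwnd, segsize):
    """Apply the per-ACK congestion avoidance update once (integer bytes)."""
    return cwnd + segsize * segsize // cwnd + segsize // 8


def grow_one_window(cwnd, segsize):
    """Apply the update once for every full segment in the starting window."""
    acks = cwnd // segsize
    for _ in range(acks):
        cwnd = congestion_avoidance_step(cwnd, segsize)
    return cwnd
```

For segsize = 512 and cwnd = 1024, a single ACK gives 1024 + 256 + 64 = 1344. Over a whole window the segsize*segsize/cwnd terms add up to about one segment, and the segsize/8 terms add more on top, so with this formula the growth per round trip is somewhat more than one segment.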

Post by 723937936@qq.com » 2023-04-17 21:22

ICMP Errors

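Observing the ICMP host unreachable error

First configure the linux host as a router. As mentioned in an earlier post, when linux acts as a router and cannot find a route for an IP datagram it is forwarding, by default it does not send back an ICMP host unreachable error.
It sends the ICMP host unreachable error only after an explicit reject route has been added to the routing table.

Code:

linux $ sudo bash -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'                 # act as a router
linux $ sudo route add -net netmask metric 1024 reject       # add an explicit reject route for network 192.168.2.0
linux $ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
                                                UG    100    0        0 enp0s3
                                                U     1000   0        0 enp0s3
                                                U     100    0        0 enp0s3
                -                               !     1024   -        0 -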

Code:

macos $ sock 8888
connect() error: Network is unreachable
The connection attempt above failed, and the error reported was Network is unreachable rather than Operation timed out.


Code:

20:51:25.505567 IP linux > macos: ICMP host unreachable, length 72
20:51:26.508039 IP linux > macos: ICMP host unreachable, length 72
20:51:27.509243 IP linux > macos: ICMP host unreachable, length 72
20:51:28.510379 IP linux > macos: ICMP host unreachable, length 72
20:51:32.513780 IP linux > macos: ICMP host unreachable, length 72
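The tcpdump output above shows that the router sent back 5 ICMP host unreachable errors.
Although the ICMP error reported is host unreachable rather than network unreachable, the errno that macos returns to the application is ENETUNREACH.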
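From the application's side, the only thing visible is the errno that connect() fails with. A small Python helper like the one below makes it easy to see which one the OS chose. The helper names are mine, and which errno you get for a given ICMP error depends on the operating system, so treat this as a probe rather than a statement of what any system returns.

```python
import errno
import os
import socket


def describe_os_error(exc):
    """Turn an OSError into 'ENAME: message' so the errno is easy to read."""
    if exc.errno is None:
        return "no errno (for example a local socket timeout)"
    name = errno.errorcode.get(exc.errno, "UNKNOWN")
    return f"{name}: {os.strerror(exc.errno)}"


def try_connect(host, port, timeout=10.0):
    """Attempt a TCP connection and report the outcome as a string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return describe_os_error(exc)
```

Calling try_connect with a destination behind the reject route should return a string starting with either ENETUNREACH or EHOSTUNREACH, depending on how the local stack maps the ICMP error.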



Code:

macos $ sock linux 8888
hello there
line number 2                # before typing this line, unplug the linux host's network cable or use iptables to drop segments with destination port 8888
and 3

Code:

 9.103102 ( 9.102827) IP macos.52015 > linux.8888: flags [PA], seq 3271102335:3271102347, ack 812886231, win 2058, options [ts 1352883248 3324909990], length 12
 9.103129 ( 0.000027) IP linux.8888 > macos.52015: flags [A], seq 812886231:812886231, ack 3271102347, win 509, options [ts 3324919093 1352883248], length 0

// connection broken here

21.337522 (12.234393) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352895468 3324919093], length 14
21.488395 ( 0.150873) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352895618 3324919093], length 14
21.789470 ( 0.301075) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352895918 3324919093], length 14
22.190727 ( 0.401257) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352896319 3324919093], length 14
22.791443 ( 0.600716) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352896919 3324919093], length 14
23.793449 ( 1.002006) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352897919 3324919093], length 14
25.596108 ( 1.802659) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352899719 3324919093], length 14
28.999289 ( 3.403181) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352903119 3324919093], length 14
35.623246 ( 6.623957) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352909719 3324919093], length 14
42.227430 ( 6.604184) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102361, ack 812886231, win 2058, options [ts 1352916319 3324919093], length 14

// connection restored here

48.837282 ( 6.609852) IP macos.52015 > linux.8888: flags [PA], seq 3271102347:3271102367, ack 812886231, win 2058, options [ts 1352922920 3324919093], length 20
48.837307 ( 0.000025) IP linux.8888 > macos.52015: flags [A], seq 812886231:812886231, ack 3271102367, win 509, options [ts 3324958827 1352922920], length 0
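The first two lines carry "hello there\n", 12 bytes in total.
The 10 lines in the middle all carry "line number 2\n" (14 bytes): the original transmission followed by 9 retransmissions.
The last two lines show that TCP repacketized "line number 2\n" and "and 3\n" into a single 20-byte segment.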


Post by 723937936@qq.com » 2023-04-23 21:14

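Chapter 22: TCP Persist Timer

If the receiver advertises a window of 0, the sender cannot send any more data. To guard against the receiver's window update segment being lost, the sender queries the receiver for its window size. TCP does this with a timer called the Persist Timer, which periodically triggers a query (a window probe). "Persist" here means persevering: the sender never gives up probing until the receiver advertises a nonzero window.

Observing window probes

Code: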

 0.000000 ( 0.000000) IP linux.47154 > macos.5555: flags [S], seq 133017518:133017518, win 64240, options [mss 1460,ts 3354223538 0,ws 7], length 0
 0.000489 ( 0.000489) IP macos.5555 > linux.47154: flags [SA], seq 83378274:83378274, ack 133017519, win 33304, options [mss 1460,ws 3,ts 1408617785 3354223538], length 0
 0.000544 ( 0.000055) IP linux.47154 > macos.5555: flags [A], seq 133017519:133017519, ack 83378275, win 502, options [ts 3354223538 1408617785], length 0
 0.000661 ( 0.000117) IP linux.47154 > macos.5555: flags [PA], seq 133017519:133018543, ack 83378275, win 502, options [ts 3354223538 1408617785], length 1024
 0.000706 ( 0.000045) IP linux.47154 > macos.5555: flags [A], seq 133018543:133019991, ack 83378275, win 502, options [ts 3354223538 1408617785], length 1448
 0.000765 ( 0.000059) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133018543, win 4035, options [ts 1408617785 3354223538], length 0
 0.000783 ( 0.000018) IP linux.47154 > macos.5555: flags [PA], seq 133019991:133020591, ack 83378275, win 502, options [ts 3354223538 1408617785], length 600
 0.000796 ( 0.000013) IP linux.47154 > macos.5555: flags [A], seq 133020591:133022039, ack 83378275, win 502, options [ts 3354223538 1408617785], length 1448
 0.000806 ( 0.000010) IP linux.47154 > macos.5555: flags [PA], seq 133022039:133024935, ack 83378275, win 502, options [ts 3354223538 1408617785], length 2896
 0.000859 ( 0.000053) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133020591, win 5536, options [ts 1408617785 3354223538], length 0
 0.000867 ( 0.000008) IP linux.47154 > macos.5555: flags [PA], seq 133024935:133032175, ack 83378275, win 502, options [ts 3354223538 1408617785], length 7240
 0.000875 ( 0.000008) IP linux.47154 > macos.5555: flags [PA], seq 133032175:133032879, ack 83378275, win 502, options [ts 3354223539 1408617785], length 704
 0.000905 ( 0.000030) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133023487, win 5174, options [ts 1408617785 3354223538], length 0
 0.000905 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133024935, win 4993, options [ts 1408617785 3354223538], length 0
 0.000957 ( 0.000052) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133027831, win 4631, options [ts 1408617785 3354223538], length 0
 0.000957 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133030727, win 4269, options [ts 1408617785 3354223538], length 0
 0.000957 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133032175, win 4088, options [ts 1408617785 3354223538], length 0
 0.000957 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133032879, win 4000, options [ts 1408617785 3354223539], length 0
 0.000977 ( 0.000020) IP linux.47154 > macos.5555: flags [PA], seq 133032879:133033903, ack 83378275, win 502, options [ts 3354223539 1408617785], length 1024
 0.000987 ( 0.000010) IP linux.47154 > macos.5555: flags [A], seq 133033903:133035351, ack 83378275, win 502, options [ts 3354223539 1408617785], length 1448
 0.001069 ( 0.000082) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133033903, win 3872, options [ts 1408617785 3354223539], length 0
 0.001083 ( 0.000014) IP linux.47154 > macos.5555: flags [PA], seq 133035351:133035951, ack 83378275, win 502, options [ts 3354223539 1408617785], length 600
 0.001098 ( 0.000015) IP linux.47154 > macos.5555: flags [A], seq 133035951:133037399, ack 83378275, win 502, options [ts 3354223539 1408617785], length 1448
 0.001105 ( 0.000007) IP linux.47154 > macos.5555: flags [PA], seq 133037399:133040295, ack 83378275, win 502, options [ts 3354223539 1408617785], length 2896
 0.001114 ( 0.000009) IP linux.47154 > macos.5555: flags [PA], seq 133040295:133047535, ack 83378275, win 502, options [ts 3354223539 1408617785], length 7240
 0.001129 ( 0.000015) IP linux.47154 > macos.5555: flags [PA], seq 133047535:133062015, ack 83378275, win 502, options [ts 3354223539 1408617785], length 14480
 0.002829 ( 0.001700) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133035951, win 3616, options [ts 1408617785 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133038847, win 3254, options [ts 1408617786 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133040295, win 3073, options [ts 1408617786 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133043191, win 2711, options [ts 1408617786 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133046087, win 2349, options [ts 1408617786 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133047535, win 2168, options [ts 1408617786 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133050431, win 1806, options [ts 1408617787 3354223539], length 0
 0.002829 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133053327, win 1444, options [ts 1408617787 3354223539], length 0
 0.002850 ( 0.000021) IP linux.47154 > macos.5555: flags [PA], seq 133062015:133064879, ack 83378275, win 502, options [ts 3354223540 1408617785], length 2864
 0.002906 ( 0.000056) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133056223, win 1082, options [ts 1408617787 3354223539], length 0
 0.002906 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133059119, win 720, options [ts 1408617787 3354223539], length 0
 0.002906 ( 0.000000) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133062015, win 358, options [ts 1408617787 3354223539], length 0
 0.003667 ( 0.000761) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408617787 3354223540], length 0
 // window probes follow
 0.211649 ( 0.207982) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354223749 1408617787], length 0
 0.212492 ( 0.000843) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408617993 3354223540], length 0
 0.653414 ( 0.440922) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354224191 1408617993], length 0
 0.653753 ( 0.000339) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408618432 3354223540], length 0
 1.484679 ( 0.830926) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354225022 1408618432], length 0
 1.484958 ( 0.000279) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408619260 3354223540], length 0
 3.139554 ( 1.654596) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354226677 1408619260], length 0
 3.140043 ( 0.000489) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408620912 3354223540], length 0
 6.446848 ( 3.306805) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354229984 1408620912], length 0
 6.447465 ( 0.000617) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408624210 3354223540], length 0
13.091919 ( 6.644454) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354236630 1408624210], length 0
13.092627 ( 0.000708) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408630835 3354223540], length 0
26.405423 (13.312796) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354249943 1408630835], length 0
26.405857 ( 0.000434) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408644125 3354223540], length 0
53.539589 (27.133732) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354277077 1408644125], length 0
53.540193 ( 0.000604) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408671224 3354223540], length 0
106.787819 (53.247626) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354330325 1408671224], length 0
106.788233 ( 0.000414) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408724333 3354223540], length 0
213.283748 (106.495515) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354436821 1408724333], length 0
213.283997 ( 0.000249) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408830564 3354223540], length 0
334.116210 (120.832213) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354557654 1408830564], length 0
334.116433 ( 0.000223) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1408951116 3354223540], length 0
454.947503 (120.831070) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354678485 1408951116], length 0
454.947731 ( 0.000228) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1409071707 3354223540], length 0
575.783983 (120.836252) IP linux.47154 > macos.5555: flags [A], seq 133064878:133064878, ack 83378275, win 502, options [ts 3354799322 1409071707], length 0
575.784589 ( 0.000606) IP macos.5555 > linux.47154: flags [A], seq 83378275:83378275, ack 133064879, win 0, options [ts 1409192339 3354223540], length 0
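The output above shows that the Persist Timer interval follows exponential backoff: roughly 0.2, 0.4, 0.8, 1.6 ... seconds, doubling each time until it is capped at about 120 seconds (120, 120, 120 ...).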
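The pattern in the trace can be reproduced with a few lines of Python. This is a model fitted to the observation above, not the kernel's algorithm: the initial interval of 0.2 s and the cap of 120 s are simply read off the trace, and the function name is made up.

```python
def persist_intervals(count, initial=0.2, cap=120.0):
    """Return the first `count` probe intervals: doubling, limited to `cap`."""
    intervals = []
    interval = initial
    for _ in range(count):
        intervals.append(min(interval, cap))
        interval *= 2
    return intervals
```

persist_intervals(12) yields ten doubling intervals from 0.2 up to 102.4 seconds followed by 120 and 120, which lines up with the measured gaps (0.208, 0.441, 0.831, ... 106.5, 120.8, 120.8) apart from small timer offsets.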

Post by 723937936@qq.com » 2023-04-25 21:26

Silly Window Syndrome

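Silly Window Syndrome: the sender transmits a very small segment every time. There are two causes:
1. The receiver advertises a very small window each time.
2. The sending application writes a very small amount of data each time.

To avoid Silly Window Syndrome, both the sender and the receiver take countermeasures.

On the receiver:

1. The receiving TCP never advertises a small window (other than 0). "Small" means less than min(MSS, receive_buffer_size / 2); the MSS here is presumably the one the receiver announced to the sender.
2. If the receiver last advertised a window of X and the sender then sends a segment of size Y with X-Y > 0, the receiver must advertise X-Y no matter how small it is, because a window that has already been advertised must not shrink (see segment 13 in Figure 22.3).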
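The two receiver rules fit into one small decision function. The Python sketch below is my own formulation with invented parameter names: free_space is the free room in the receive buffer, and remaining_advertised is what is left of the previously advertised window (X - Y in rule 2).

```python
def window_to_advertise(free_space, remaining_advertised, mss, rcvbuf_size):
    """Pick the window to put in the next ACK while avoiding silly windows.

    The offer is only raised when it can grow by at least
    min(mss, rcvbuf_size // 2); otherwise the receiver keeps offering
    whatever is left of the window it advertised before (possibly 0).
    """
    threshold = min(mss, rcvbuf_size // 2)
    if free_space - remaining_advertised >= threshold:
        return free_space
    return remaining_advertised
```

With mss = 1024 and a 4096-byte buffer, a receiver that previously advertised 0 keeps advertising 0 while only 500 bytes are free, and a receiver with 509 bytes left from its earlier offer keeps advertising 509 even though that is a small window.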

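On the sender, data may be transmitted when any of the following holds:

1. The send buffer holds at least MSS bytes.
2. The send buffer holds at least half of the largest window the receiver has ever advertised.
3. The Nagle algorithm is enabled and there are no unacknowledged segments.
4. The Nagle algorithm is disabled.

Conditions 3 and 4 also seem to depend on the window the receiver advertises: if the peer's advertised window is smaller than min(MSS, receive_buffer_size / 2), the data is not sent immediately but only when the Persist timer expires (see segment 14 in Figure 22.3).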
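The four sender conditions can be written as a single predicate. The Python sketch below follows the list literally, with parameter names of my own choosing; it deliberately does not model the extra dependence on the peer's advertised window noted in the last paragraph, since I am not sure of the exact rule there.

```python
def sender_may_transmit(buffered, mss, max_peer_window, unacked, nagle_enabled):
    """Decide whether the sender may transmit now (sizes in bytes).

    buffered        -- bytes waiting in the send buffer
    max_peer_window -- largest window the peer has ever advertised
    unacked         -- bytes sent but not yet acknowledged
    """
    if buffered == 0:
        return False  # nothing to send
    if buffered >= mss:
        return True  # condition 1: a full-sized segment is available
    if buffered * 2 >= max_peer_window:
        return True  # condition 2: at least half of the peer's largest window
    if nagle_enabled:
        return unacked == 0  # condition 3: nothing outstanding
    return True  # condition 4: Nagle disabled
```

For instance, 100 buffered bytes with Nagle enabled and 50 bytes still unacknowledged are held back, while the same 100 bytes go out at once if Nagle is disabled or nothing is outstanding.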


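Segments 16, 17 and 20 in Figure 22.3 of the book show that even when the free space in the receive buffer exceeds the MSS, the receiver does not send a window update on its own; it advertises the available window only in response to a window probe from the sender.
The receiver sends a window update on its own only when the free space in the receive buffer exceeds receive_buffer_size/2.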

Post by 723937936@qq.com » 2023-04-27 20:26

Chapter 23: TCP Keepalive Timer

My conclusion from this chapter: do not use the TCP Keepalive Timer; use heartbeats at the application layer instead.

Post by 723937936@qq.com » 2023-05-03 9:06

Chapter 24: TCP Futures and Performance

Path MTU Discovery

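The path MTU is the smallest MTU along the path. Path MTU discovery works as follows: the sender sets the DF flag in the IP header of each IP datagram it sends. If a router along the way finds that the MTU of the outgoing link is smaller than the datagram, it drops the datagram and sends an ICMP can't fragment error back to the sender, carrying the MTU of the router's outgoing interface. On receiving the ICMP error, the sender retransmits with a datagram sized to fit that MTU, and repeats this until a datagram reaches the destination.

Reply 18 earlier in this thread covered the socket options that control MTU discovery on linux.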
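The discovery loop is easy to simulate. In the Python sketch below the path is just a list of link MTUs, the first entry being the sender's own link; every name is invented for illustration, and the simulation assumes every router reports its next-hop MTU in the ICMP error, which older routers may not do.

```python
def discover_path_mtu(link_mtus):
    """Simulate path MTU discovery over a list of link MTUs.

    Returns (path_mtu, icmp_errors): the datagram size that finally gets
    through, and how many ICMP can't fragment errors were needed.
    """
    size = link_mtus[0]  # start with the MTU of the sender's own link
    icmp_errors = 0
    while True:
        for mtu in link_mtus:
            if size > mtu:
                # DF is set, so the router drops the datagram and reports
                # the MTU of its outgoing link; the sender shrinks to fit.
                icmp_errors += 1
                size = mtu
                break
        else:
            # The datagram crossed every link without being dropped.
            return size, icmp_errors
```

For a path with link MTUs [1500, 1500, 1400, 576, 1500] the sender is told 1400 first, then 576, and ends up with a path MTU of 576 after two ICMP errors.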

Long Fat Pipes

capacity(bits) = bandwidth(bits/sec) * round-trip time(sec)

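The capacity of a connection is also called the bandwidth-delay product; reply 43 earlier in this thread gave an example to illustrate the concept.


long fat networks: LFNs for short, networks with a large bandwidth-delay product. How large is large? Let's say larger than 65535 bytes, because the window size field in the TCP header is only 16 bits wide and so tops out at 65535. By that measure, even a 100 Mbit/s Ethernet path counts as an LFN once its RTT exceeds about 5.2 ms (65535 bytes / 12,500,000 bytes/sec).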
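To see where the 65535-byte threshold falls for a given link, the capacity formula can be evaluated directly. The Python sketch below converts the result to bytes; the function names and the example RTT values are my own.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit window field


def bdp_bytes(bandwidth_bits_per_sec, rtt_sec):
    """Bandwidth-delay product in bytes."""
    return bandwidth_bits_per_sec * rtt_sec / 8


def is_lfn(bandwidth_bits_per_sec, rtt_sec):
    """True if the pipe holds more data than an unscaled window can cover."""
    return bdp_bytes(bandwidth_bits_per_sec, rtt_sec) > MAX_UNSCALED_WINDOW
```

A 100 Mbit/s path with a 10 ms RTT holds 125,000 bytes and counts as an LFN by this definition, whereas the same link with a 1 ms RTT holds only 12,500 bytes and does not.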

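long fat pipe: a connection established over an LFN.

LFNs expose several problems in TCP:

1. On an LFN the 16-bit window size field in the TCP header is too small; the window scale option addresses this.
2. On an LFN, losing several packets within one window of data drains the whole pipeline and sharply reduces throughput; SACK addresses this.
3. On an LFN the window is very large, so sampling the RTT only once per window gives a poor RTT estimate and can cause unnecessary retransmissions; the timestamp option addresses this.
4. On a gigabit LFN the 32-bit sequence number field in the TCP header is too small; the timestamp option addresses this as well (the 4-byte timestamp effectively extends the sequence number).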




Window Scale Option

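Window Scale Option format:
[Figure: Window Scale Option format (kind = 3, len = 3, 1-byte shift count)]
With the Window Scale Option, the window size is: window_size << shift_count

The shift count may be at most 14, so the largest window is 65535 << 14 = 1073725440 bytes, slightly less than 1 GiB (65536 << 14 is exactly 1 GiB).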
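The scaling itself is a single shift. Here is a small Python helper (the name is mine) that also enforces the limits of the 16-bit window field and the maximum shift count of 14.

```python
MAX_SHIFT_COUNT = 14


def effective_window(window_field, shift_count):
    """Return the real window in bytes for a 16-bit field and a shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift_count <= MAX_SHIFT_COUNT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift_count
```

effective_window(65535, 14) gives 1073725440, the maximum quoted above, and effective_window(65535, 1) gives 131070.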

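The Window Scale Option may appear only in SYN segments.

The shift count is computed automatically by the TCP module from the size of the receive buffer. An application sets the receive buffer size with the SO_RCVBUF socket option and thereby indirectly determines the shift count. SO_RCVBUF must be set before calling connect (on the client) or listen (on the server, where accepted sockets inherit the setting from the listening socket), because the Window Scale Option travels in the SYN segment; a change to the receive buffer size after the connection is established can no longer be communicated to the peer.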

Observing the Window Scale Option

Code:

linux $ sock -v -R128000 macos 8888
SO_RCVBUF = 256000
connected on to
linux doubles the buffer size set through the SO_RCVBUF option (see man 7 socket).
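The same experiment can be done without the sock program. The Python sketch below sets SO_RCVBUF on a fresh TCP socket before any connect call and reads the value back; the function name is mine, and the value read back is system dependent (linux typically reports double the request, limited by its configured maximum).

```python
import socket


def make_client_socket(rcvbuf):
    """Create a TCP socket with SO_RCVBUF set before connect() is called."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    return s


sock = make_client_socket(128000)
# Read back what the kernel actually configured for this socket.
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("SO_RCVBUF =", effective)
sock.close()
```

Only after this point would the client call connect(), so that the SYN carries a shift count derived from the enlarged buffer.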

Code:

 0.000000 ( 0.000000) IP linux.58246 > macos.8888: flags [S], seq 2182126623:2182126623, win 65535, options [mss 1460,ts 1476741476 0,ws 1], length 0
 0.000352 ( 0.000352) IP macos.8888 > linux.58246: flags [SA], seq 1916769072:1916769072, ack 2182126624, win 65535, options [mss 1460,ws 6,ts 1506657680 1476741476], length 0
 0.000374 ( 0.000022) IP linux.58246 > macos.8888: flags [A], seq 2182126624:2182126624, ack 1916769073, win 32768, options [ts 1476741476 1506657680], length 0
 0.000480 ( 0.000106) IP macos.8888 > linux.58246: flags [A], seq 1916769073:1916769073, ack 2182126624, win 2058, options [ts 1506657680 1476741476], length 0
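When establishing the connection, linux advertised a window size of 65535 and a window scale of 1, i.e. an advertised window of 65535 << 1 = 131070. That is smaller than the receive buffer size seen by the application (SO_RCVBUF = 256000). With a window scale of 2, 65535 << 2 = 262140 would exceed that receive buffer size.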

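In the book's example, 4.4BSD advertises a window larger than the receive buffer size set by the application, which is a little odd; perhaps the 4.4BSD kernel rounds the receive buffer size in some way.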

Timestamp Option

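[Figure: Timestamp Option format (kind = 8, len = 10, 4-byte timestamp value, 4-byte timestamp echo reply)]
Once both ends have negotiated it in their SYN segments, the timestamp option is present in every segment. The sender fills in the timestamp value field, and the receiver copies that value into the timestamp echo reply field when it sends the acknowledgment. The sender can therefore compute an RTT sample from every ACK it receives, which raises both the sampling rate and the accuracy of the RTT estimate.

The timestamp value is monotonically increasing and is incremented by 1 at a fixed interval. RFC 1323 recommends an interval between 1 ms and 1 second; the 4.4BSD implementation increments it every 500 ms.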

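TCP does not send an ACK segment for every data segment it receives; one ACK segment may acknowledge several data segments. Which data segment does the timestamp echo reply field of that ACK correspond to?

The tsrecent variable holds the value TCP puts into the timestamp echo reply field when it sends an ACK segment.
The lastack variable holds the acknowledgment number of the last ACK that TCP sent, i.e. the sequence number of the first byte of the next data segment it expected at that point.
When an arriving data segment contains the byte numbered lastack, TCP sets tsrecent to that segment's timestamp value; otherwise tsrecent is left unchanged.

So the answer to the question above is: the timestamp echo reply field of an ACK segment carries the timestamp value of the first data segment that the ACK acknowledges.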
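The tsrecent / lastack bookkeeping is short enough to model directly. The Python class below is a sketch with invented names; it tracks only in-order data (out-of-order segments are neither buffered nor acknowledged) and ignores sequence number wraparound.

```python
class TimestampEchoReceiver:
    """Toy receiver that decides which timestamp to echo in each ACK."""

    def __init__(self, initial_seq):
        self.rcv_nxt = initial_seq  # next in-order byte expected
        self.lastack = initial_seq  # ack number of the last ACK sent
        self.tsrecent = 0           # value to echo in the next ACK

    def on_data(self, seq, length, tsval):
        # Remember the timestamp only if this segment contains byte `lastack`.
        if seq <= self.lastack < seq + length:
            self.tsrecent = tsval
        # Advance the in-order point if the segment is the expected one.
        if seq == self.rcv_nxt:
            self.rcv_nxt += length

    def send_ack(self):
        """Return (ack_number, timestamp_echo_reply) for the ACK being sent."""
        self.lastack = self.rcv_nxt
        return self.rcv_nxt, self.tsrecent
```

If two data segments with timestamps 10 and 11 arrive before one delayed ACK is sent, that ACK echoes 10, the timestamp of the first segment it acknowledges.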

PAWS:Protection Against Wrapped Sequence Numbers

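The Sequence number field in the TCP header is 32 bits wide, i.e. it covers 4 GiB, so it wraps around after every 4 GiB of data sent.

As covered earlier, an IP datagram can survive in the network for at most the MSL (Maximum Segment Lifetime). Suppose a lost segment reappears within that time and its sequence number falls inside the window currently being transferred. With the timestamp option, the receiver discards it because its timestamp value is older than that of the segments it has recently accepted. Without the timestamp option, the receiver would treat it as an out-of-order segment and store it in the receive buffer.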
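The core of the PAWS check is a comparison of two 32-bit timestamps that tolerates wraparound of the timestamp itself. The Python sketch below shows one common way to write that comparison; the function names are mine, and a real implementation has additional rules (for example for idle connections) that are left out here.

```python
MASK32 = 0xFFFFFFFF
HALF32 = 0x7FFFFFFF


def ts_before(a, b):
    """True if 32-bit timestamp a is older than b, allowing for wraparound."""
    return ((a - b) & MASK32) > HALF32


def paws_accept(seg_tsval, tsrecent):
    """Accept a segment unless its timestamp is older than tsrecent."""
    return not ts_before(seg_tsval, tsrecent)
```

A stray old segment with timestamp 100 is rejected when tsrecent is already 200, while a segment with timestamp 5 is still accepted when tsrecent is 0xFFFFFFF0, because the timestamp clock has merely wrapped.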


TCP Performance


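A full-sized data segment carries at most 1460 bytes of data. Adding the TCP header, IP header, Ethernet framing and other overhead gives 1538 bytes on the wire (see Figure 24.9).

For 100 Mbit/s Ethernet, the theoretical maximum throughput is therefore:

throughput = (1460 / 1538) * (100,000,000 / 8) = 11,866,059 bytes/sec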
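The 1538-byte figure and the resulting throughput can be checked with a short Python calculation. The per-field breakdown below is the usual Ethernet accounting (preamble, header, CRC, inter-packet gap) as I understand it; the names are mine, and ACK traffic in the reverse direction is ignored.

```python
# Bytes that one full-sized segment occupies on an Ethernet link.
FRAME_OVERHEAD = {
    "preamble": 8,
    "ethernet_header": 14,
    "ip_header": 20,
    "tcp_header": 20,
    "crc": 4,
    "inter_packet_gap": 12,
}
PAYLOAD = 1460


def bytes_on_wire(payload=PAYLOAD):
    """Total bytes consumed on the link per data segment."""
    return payload + sum(FRAME_OVERHEAD.values())


def max_tcp_throughput(link_bits_per_sec, payload=PAYLOAD):
    """Upper bound on user data throughput in bytes/sec (rounded down)."""
    return payload * (link_bits_per_sec // 8) // bytes_on_wire(payload)
```

bytes_on_wire() returns 1538 and max_tcp_throughput(100_000_000) returns 11866059, matching the figure above.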


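In practice, other factors limit throughput:

1. There may be a slower link somewhere along the path.
2. The memory bandwidth of the hosts (for example, the rate at which data can be copied from user space to kernel space). On modern machines memory bandwidth is unlikely to be the limit.
3. The window size advertised by the peer, together with the RTT.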


Code:

linux $ time dd if=/dev/zero of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 0.52927 s, 19.8 GB/s

real	0m0.530s
user	0m0.013s
sys	0m0.517s


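bandwidth-delay product = bandwidth * RTT

To keep the pipe full, the receiver's receive buffer, and hence the window it advertises, must equal the bandwidth-delay product.
So bandwidth = advertised_win_size / RTT

Take the maximum advertised_win_size, i.e. 65535 << 14 = 1,073,725,440, and assume an RTT of 0.02 s:

bandwidth = 1,073,725,440 / 0.02s = 53,686,272,000 bytes/sec, roughly 50 GiB/sec

So the TCP window size is unlikely to be the limit on TCP throughput either.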
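The same arithmetic in Python makes it easy to try other windows and RTTs. The function name is mine, and the 20 ms RTT is just the example value used in the calculation above.

```python
def window_limited_throughput(window_bytes, rtt_sec):
    """Upper bound on throughput (bytes/sec) when one window is sent per RTT."""
    return window_bytes / rtt_sec


# Largest scaled window versus the largest unscaled window, both at 20 ms RTT.
scaled = window_limited_throughput(65535 << 14, 0.02)
unscaled = window_limited_throughput(65535, 0.02)
print(f"scaled:   {scaled:,.0f} bytes/sec")
print(f"unscaled: {unscaled:,.0f} bytes/sec")
```

The scaled case reproduces the figure of about 53.7 billion bytes/sec. Without window scaling the same RTT allows only about 3.3 million bytes/sec, which is why the window scale option matters on such paths.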




Code:

linux $ sudo apt install iperf3

Run iperf3 -s on ubuntu18.04

Code:

linux $ iperf3 -s
Server listening on 5201
Accepted connection from, port 49506
[  5] local port 5201 connected to port 49520
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   700 KBytes  5.73 Mbits/sec
[  5]   1.00-2.00   sec   897 KBytes  7.34 Mbits/sec
[  5]   2.00-3.00   sec   803 KBytes  6.57 Mbits/sec
[  5]   3.00-4.00   sec   723 KBytes  5.92 Mbits/sec
[  5]   4.00-5.00   sec   817 KBytes  6.70 Mbits/sec
[  5]   5.00-6.00   sec   963 KBytes  7.88 Mbits/sec
[  5]   6.00-7.00   sec   981 KBytes  8.04 Mbits/sec
[  5]   7.00-8.00   sec   833 KBytes  6.82 Mbits/sec
[  5]   8.00-9.00   sec   734 KBytes  6.01 Mbits/sec
[  5]   9.00-10.00  sec   826 KBytes  6.77 Mbits/sec
[  5]  10.00-10.09  sec   107 KBytes  9.95 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.09  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.09  sec  8.19 MBytes  6.81 Mbits/sec                  receiver
Server listening on 5201
Run iperf3 -c on windows wsl2

Code:

$ $ iperf3 -c
Connecting to host, port 5201
[  5] local port 40662 connected to port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.05 MBytes  8.80 Mbits/sec    0   67.9 KBytes
[  5]   1.00-2.00   sec   851 KBytes  6.97 Mbits/sec    2   83.4 KBytes
[  5]   2.00-3.00   sec   912 KBytes  7.47 Mbits/sec    0   91.9 KBytes
[  5]   3.00-4.00   sec   912 KBytes  7.47 Mbits/sec    2   96.2 KBytes
[  5]   4.00-5.00   sec   730 KBytes  5.98 Mbits/sec    1    102 KBytes
[  5]   5.00-6.00   sec   912 KBytes  7.47 Mbits/sec    1    107 KBytes
[  5]   6.00-7.00   sec  1.13 MBytes  9.47 Mbits/sec    0    113 KBytes
[  5]   7.00-8.00   sec   547 KBytes  4.48 Mbits/sec    2    117 KBytes
[  5]   8.00-9.00   sec   851 KBytes  6.97 Mbits/sec    0    123 KBytes
[  5]   9.00-10.00  sec   973 KBytes  7.97 Mbits/sec    1    126 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  8.71 MBytes  7.31 Mbits/sec    9             sender
[  5]   0.00-10.00  sec  8.19 MBytes  6.87 Mbits/sec                  receiver

iperf Done.