From: Christoph Paasch
Date: Thu, 28 Oct 2021 09:04:35 -0700
Subject: Re: [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
To: Bob McMahon
Cc: Eric Dumazet, Stuart Cheshire, Cake List, Valdis Klētnieks, Make-Wifi-fast, "David P. Reed", starlink@lists.bufferbloat.net, codel, cerowrt-devel, bloat, Steve Crocker, Vint Cerf

> On Oct 26, 2021, at 8:45 PM, Bob McMahon wrote:
>
> This is linux.
> The code flow is: burst writes until the burst size is reached, take a timestamp, call select(), take a second timestamp and insert the time delta into the histogram, then await clock_nanosleep() to schedule the next burst. (Actually, the deltas, the inserts into the histogram and user I/O are done in another thread, i.e. iperf 2's reporter thread.)
> I still must be missing something. Does anything else need to be set to reduce the skb size? Everything seems to be indicating 4K writes even when gso_max_size is 2000 (I assume these are units of bytes?) There are ten writes, ten reads and ten RTTs for the bursts. I don't see partial writes at the app level.

One thing to keep in mind is that once the congestion window has grown beyond 40KB (your burst size), none of the writes will block at all. TCP_NOTSENT_LOWAT is really just about the "notsent" part. Once the congestion window is big enough to send 40KB in a burst, all of it will be sent out immediately.

> [root@localhost iperf2-code]# ip link set dev eth1 gso_max_size 2000
> [root@localhost iperf2-code]# ip -d link sh dev eth1
> 9: eth1: mtu 1500 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000
> link/ether 00:90:4c:40:04:59 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 2000 gso_max_segs 65535
> [root@localhost iperf2-code]# uname -r
> 5.0.9-301.fc30.x86_64
>
> It looks like RTT is being driven by WiFi TXOPs, as doubling the write size increases the aggregation by two but has no significant effect on the RTTs.
>
> 4K writes: tot_mpdus 328 tot_ampdus 209 mpduperampdu 2
>
> 8k writes: tot_mpdus 317 tot_ampdus 107 mpduperampdu 3
>
> [root@localhost iperf2-code]# src/iperf -c 192.168.1.1%eth1 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001 with pid 5145 via eth1 (1 flows)
> Write buffer size: 4096 Byte
> Bursting: 40.0 KByte every 1.00 seconds
> TCP window size: 85.0 KByte (default)
> Event based writes (pending queue watermark at 4 bytes)
> Enabled select histograms bin-width=0.100 ms, bins=10000
> ------------------------------------------------------------
> [ 1] local 192.168.1.4%eth1 port 45680 connected with 192.168.1.1 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.30 ms) on 2021-10-26 20:25:29 (PDT)
> [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
> [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 14K/10091 us 4
> [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,36:1,40:1,44:1,46:1,48:1,49:1,50:2,52:1 (5.00/95.00/99.7%=1/52/52,Outliers=0,obl/obu=0/0) (5.121 ms/1635305129.152339)

Am I reading this correctly that your writes take 5 milliseconds in the worst case?
That looks correct then, because you seem to have an RTT of around 5 ms.
It is surprising, though, that your congestion window is not increasing.
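
For reference, the histogram line above can be decoded mechanically. The sketch below is only an illustration: it assumes the layout described further down in the quoted thread (bin number times the 100 us bin width, followed by the bin count), and it treats the product as the upper edge of the bin, which is a guess based on the worst-case values printed next to each histogram. The name `parse_s8_pdf` is hypothetical and not part of iperf 2.

```python
import re

def parse_s8_pdf(line, default_bin_width_us=100):
    """Decode one select() histogram line into (upper_edge_ms, count) pairs.

    Assumes the 'bin(w=100us):cnt(N)=b1:c1,b2:c2,...' layout seen in this thread.
    """
    width = re.search(r"bin\(w=(\d+)us\)", line)
    bin_width_us = int(width.group(1)) if width else default_bin_width_us
    body = re.search(r"cnt\((\d+)\)=([\d:,]+)", line)
    if body is None:
        raise ValueError("no histogram found in line")
    expected = int(body.group(1))
    pairs = []
    for item in body.group(2).split(","):
        bin_no, count = item.split(":")
        pairs.append((int(bin_no) * bin_width_us / 1000.0, int(count)))
    # Sanity check: the per-bin counts should add up to the cnt(N) total.
    if sum(count for _, count in pairs) != expected:
        raise ValueError("bin counts do not add up to cnt()")
    return pairs

# First-second histogram of the WiFi run quoted above.
line = ("S8-PDF: bin(w=100us):cnt(10)=1:1,36:1,40:1,44:1,46:1,48:1,49:1,50:2,52:1 "
        "(5.00/95.00/99.7%=1/52/52,Outliers=0,obl/obu=0/0) (5.121 ms/1635305129.152339)")
bins = parse_s8_pdf(line)
worst_ms = max(edge for edge, _ in bins)
print(f"{sum(c for _, c in bins)} samples, slowest bin ends at {worst_ms} ms")
```

Under these assumptions the slowest of the ten select() waits falls in the bin ending at 5.2 ms, which matches the 5.121 ms worst case printed at the end of the line.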

Christoph

> [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4990 us 8
> [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,45:1,49:5,50:1 (5.00/95.00/99.7%=1/50/50,Outliers=0,obl/obu=0/0) (4.991 ms/1635305130.153330)
> [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4904 us 8
> [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,29:1,49:4,50:1,59:1,75:1 (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.455 ms/1635305131.147353)
> [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4964 us 8
> [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:4,50:2,59:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.460 ms/1635305132.146338)
> [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4970 us 8
> [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.404 ms/1635305133.146335)
> [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4986 us 8
> [ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:1,50:4,59:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.395 ms/1635305134.146343)
> [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5059 us 8
> [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:2,60:1,85:1 (5.00/95.00/99.7%=1/85/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343)
> [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5407 us 8
> [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:4,50:1,59:1,75:1 (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.428 ms/1635305136.147343)
> [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5188 us 8
> [ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:3,50:3,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.388 ms/1635305137.146284)
> [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5306 us 8
> [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:2,50:2,51:1,60:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.422 ms/1635305138.146316)
> [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 14K/5939 us 7
> [ 1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:19,29:1,36:1,39:3,40:3,44:1,45:1,46:1,48:2,49:33,50:18,51:1,52:1,59:5,60:2,64:2,65:3,75:2,85:1 (5.00/95.00/99.7%=1/65/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343)
>
> [root@localhost iperf2-code]# src/iperf -s -i 1 -e -B 192.168.1.1%eth1
> ------------------------------------------------------------
> Server listening on TCP port 5001 with pid 6287
> Binding to local address 192.168.1.1 and iface eth1
> Read buffer size: 128 KByte (Dist bin width=16.0 KByte)
> TCP window size: 128 KByte (default)
> ------------------------------------------------------------
> [ 1] local 192.168.1.1%eth1 port 5001 connected with 192.168.1.4 port 45680 (MSS=1448) (burst-period=1.0000s) (trip-times) (sock=4) (peer 2.1.4-master) on 2021-10-26 20:25:29 (PDT)
> [ ID] Burst (start-end) Transfer Bandwidth XferTime (DC%) Reads=Dist NetPwr
> [ 1] 0.0001-0.0500 sec 40.1 KBytes 6.59 Mbits/sec 49.848 ms (5%) 12=12:0:0:0:0:0:0:0 0
> [ 1] 1.0002-1.0461 sec 40.0 KBytes 7.14 Mbits/sec 45.913 ms (4.6%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 2.0002-2.0491 sec 40.0 KBytes 6.70 Mbits/sec 48.876 ms (4.9%) 11=11:0:0:0:0:0:0:0 0
> [ 1] 3.0002-3.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.886 ms (5%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 4.0002-4.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.887 ms (5%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 5.0002-5.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.881 ms (5%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 6.0002-6.0511 sec 40.0 KBytes 6.44 Mbits/sec 50.895 ms (5.1%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 7.0002-7.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.889 ms (5%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 8.0002-8.0481 sec 40.0 KBytes 6.84 Mbits/sec 47.901 ms (4.8%) 11=11:0:0:0:0:0:0:0 0
> [ 1] 9.0002-9.0491 sec 40.0 KBytes 6.70 Mbits/sec 48.872 ms (4.9%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 0.0000-10.0031 sec 400 KBytes 328 Kbits/sec 104=104:0:0:0:0:0:0:0
>
> Bob
>
> On Tue, Oct 26, 2021 at 6:12 PM Eric Dumazet wrote:
>
>
> On 10/26/21 4:38 PM, Christoph Paasch wrote:
> > Hi Bob,
> >
> >> On Oct 26, 2021, at 4:23 PM, Bob McMahon wrote:
> >> I'm confused. I don't see any blocking nor partial writes per the write() at the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst is 40K, the write size is 4K and the watermark is 4 bytes. There are ten writes per burst.
> >
> > You are on Linux here, right?
> >
> > AFAICS, Linux will still accept whatever fits in an skb. And that is likely more than 4K (with GSO on by default).
>
> This (max payload per skb) can be tuned at the driver level, at least for experimental purposes or dedicated devices.
>
> ip link set dev eth0 gso_max_size 8000
>
> To fetch current values:
>
> ip -d link sh dev eth0
>
>
> >
> > However, do you go back to select() after each write() or do you loop over the write() calls?
> >
> >
> > Christoph
> >
> >> The S8 histograms are the times waiting on the select(). The first value is the bin number (multiplied by 100usec bin width) and second the bin count. The worst case time is at the end and is timestamped per unix epoch.
> >>
> >> The second run is over a controlled WiFi link where a 99.7% point of 4-8ms for a WiFi TX op arbitration win is in the ballpark. The first is 1G wired and is in the 600 usec range. (No media arbitration there.)
> >>
> >> [root@localhost iperf2-code]# src/iperf -c 10.19.87.9 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> >> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second
> >> ------------------------------------------------------------
> >> Client connecting to 10.19.87.9, TCP port 5001 with pid 2124 (1 flows)
> >> Write buffer size: 4096 Byte
> >> Bursting: 40.0 KByte every 1.00 seconds
> >> TCP window size: 85.0 KByte (default)
> >> Event based writes (pending queue watermark at 4 bytes)
> >> Enabled select histograms bin-width=0.100 ms, bins=10000
> >> ------------------------------------------------------------
> >> [ 1] local 10.19.87.10%eth0 port 33166 connected with 10.19.87.9 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=0.54 ms) on 2021-10-26 16:07:33 (PDT)
> >> [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
> >> [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 14K/5368 us 8
> >> [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,2:5,3:2,4:1,11:1 (5.00/95.00/99.7%=1/11/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360)
> >> [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/569 us 72
> >> [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:1,3:4,4:1,7:1,8:1 (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.736 ms/1635289654.928088)
> >> [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/312 us 131
> >> [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:2,5:2,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.548 ms/1635289655.927776)
> >> [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/302 us 136
> >> [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:2,3:5,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.584 ms/1635289656.927814)
> >> [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/316 us 130
> >> [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:2,4:2,5:2,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.572 ms/1635289657.927810)
> >> [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/253 us 162
> >> [ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:4,5:1 (5.00/95.00/99.7%=1/5/5,Outliers=0,obl/obu=0/0) (0.417 ms/1635289658.927630)
> >> [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/290 us 141
> >> [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:3,4:3,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.573 ms/1635289659.927771)
> >> [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/359 us 114
> >> [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,3:4,4:3,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.570 ms/1635289660.927753)
> >> [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/349 us 117
> >> [ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:5,4:1,7:1 (5.00/95.00/99.7%=1/7/7,Outliers=0,obl/obu=0/0) (0.608 ms/1635289661.927843)
> >> [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/347 us 118
> >> [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:1,3:5,8:1 (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.725 ms/1635289662.927861)
> >> [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 14K/1519 us 27
> >> [ 1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:25,2:13,3:36,4:11,5:5,6:5,7:2,8:2,11:1 (5.00/95.00/99.7%=1/7/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360)
> >>
> >> [root@localhost iperf2-code]# src/iperf -c 192.168.1.1 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> >> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second
> >> ------------------------------------------------------------
> >> Client connecting to 192.168.1.1, TCP port 5001 with pid 2131 (1 flows)
> >> Write buffer size: 4096 Byte
> >> Bursting: 40.0 KByte every 1.00 seconds
> >> TCP window size: 85.0 KByte (default)
> >> Event based writes (pending queue watermark at 4 bytes)
> >> Enabled select histograms bin-width=0.100 ms, bins=10000
> >> ------------------------------------------------------------
> >> [ 1] local 192.168.1.4%eth1 port 45518 connected with 192.168.1.1 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.48 ms) on 2021-10-26 16:07:56 (PDT)
> >> [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
> >> [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 14K/10339 us 4
> >> [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,40:1,47:1,49:2,50:3,51:1,60:1 (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.990 ms/1635289676.802143)
> >> [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4853 us 8
> >> [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,38:1,39:1,44:1,45:1,49:1,51:1,52:1,60:1 (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.937 ms/1635289677.802274)
> >> [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4991 us 8
> >> [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:2,50:2,51:1,60:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.307 ms/1635289678.794326)
> >> [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4610 us 9
> >> [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:3,56:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.362 ms/1635289679.794335)
> >> [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5028 us 8
> >> [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.367 ms/1635289680.794399)
> >> [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5113 us 8
> >> [ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:2,58:1,60:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.442 ms/1635289681.794392)
> >> [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5054 us 8
> >> [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,51:1,60:2,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.374 ms/1635289682.794335)
> >> [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5138 us 8
> >> [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:2,40:1,49:2,50:1,60:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.396 ms/1635289683.794338)
> >> [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5329 us 8
> >> [ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,38:1,45:2,49:1,50:3,63:1 (5.00/95.00/99.7%=1/63/63,Outliers=0,obl/obu=0/0) (6.292 ms/1635289684.794262)
> >> [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5329 us 8
> >> [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:3,84:1 (5.00/95.00/99.7%=1/84/84,Outliers=0,obl/obu=0/0) (8.306 ms/1635289685.796315)
> >> [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 14K/6331 us 6
> >> [ 1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:19,38:2,39:5,40:2,44:1,45:3,47:1,48:1,49:26,50:17,51:4,52:1,56:1,58:1,59:1,60:7,63:1,64:5,65:1,84:1 (5.00/95.00/99.7%=1/64/84,Outliers=0,obl/obu=0/0) (8.306 ms/1635289685.796315)
> >>
> >> Bob
> >>
> >> On Tue, Oct 26, 2021 at 11:45 AM Christoph Paasch wrote:
> >>
> >> Hello,
> >>
> >> > On Oct 25, 2021, at 9:24 PM, Eric Dumazet wrote:
> >> >
> >> >
> >> >
> >> > On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote:
> >> >> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast wrote:
> >> >>
> >> >>> Hi All,
> >> >>>
> >> >>> Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization.
> >> >>> I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight, rather than when they have reached the destination network stack. If that's the case, then the iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.
> >> >>>
> >> >>> Does this seem reasonable?
> >> >>
> >> >> I’m not 100% sure what you’re asking, but I will try to help.
> >> >>
> >> >> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your endpoint as writable (e.g., via kqueue or epoll) until less than that threshold of data remains unsent. It won’t stop you writing more bytes if you want to, up to the socket send buffer size, but it won’t *ask* you for more data until the TCP_NOTSENT_LOWAT threshold is reached.
> >> >
> >> >
> >> > When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure that sendmsg() would actually
> >> > stop feeding more bytes into the TCP transmit queue if the current amount of unsent bytes
> >> > was above the threshold.
> >> >
> >> > So it looks like the Apple implementation is different, based on your description?
> >>
> >> Yes, TCP_NOTSENT_LOWAT only impacts the wakeup on iOS/macOS/...
> >>
> >> An app can still fill the send-buffer if it does a sendmsg() with a large buffer or does repeated calls to sendmsg().
> >>
> >> For Apple, the goal of TCP_NOTSENT_LOWAT was to allow an app to quickly change the data it "scheduled" to send, and thus allow the app to write the smallest "logical unit" it has. If that unit is 512KB large, the app is allowed to send that.
> >> For example, in the case of video streaming one may want to skip ahead in the video. In that case the app still needs to transmit the remaining parts of the previous frame anyway, before it can send the new video frame.
> >> That's the reason why the Apple implementation allows one to write more than just the lowat threshold.
> >>
> >>
> >> That being said, I do think that Linux's way allows for an easier API because the app does not need to be careful about how much data it sends after an epoll/kqueue wakeup. So, the latency benefits will be easier to get.
> >>
> >>
> >> Christoph
> >>
> >>
> >>
> >> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36
> >> >
> >> > netperf does not use epoll(), but rather a loop over sendmsg().
> >> >
> >> > One of the points of TCP_NOTSENT_LOWAT for Google was to be able to considerably increase
> >> > the max number of bytes in transmit queues (3rd column of /proc/sys/net/ipv4/tcp_wmem)
> >> > by 10x, allowing autotuning to increase BDP for big RTT flows, without
> >> > increasing memory needs for flows with small RTT.
> >> >
> >> > In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.)
> >> >>
> >> >> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though.
> >> >> The bytes could depart the machine in response to acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time.
> >> >>
> >> >> How to use TCP_NOTSENT_LOWAT is explained in this video:
> >> >>
> >> >>
> >> >> Later in the same video is a two-minute demo (time offset 42:00 to time offset 44:00) showing a “before and after” comparison illustrating the dramatic difference this makes for screen sharing responsiveness.
> >> >>
> >> >>
> >> >> Stuart Cheshire
> >> >> _______________________________________________
> >> >> Bloat mailing list
> >> >> Bloat@lists.bufferbloat.net
> >> >> https://lists.bufferbloat.net/listinfo/bloat
> >> >>
> >>
> >>
> >> This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.
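
The measurement flow discussed in this thread (write, take a timestamp, call select(), take a second timestamp) can be sketched in a few lines of Python over loopback. This is not iperf 2's code; `drain`, `timed_burst` and `run_once` are hypothetical names chosen for the illustration. The fallback value 25 is the Linux definition of TCP_NOTSENT_LOWAT in `<linux/tcp.h>` and is only used on Linux when the Python build does not export the constant. On loopback the waits will be far shorter than the WiFi numbers quoted above.

```python
import select
import socket
import sys
import threading
import time

# Prefer the constant exported by Python; fall back to the Linux value only on Linux.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT",
                            25 if sys.platform.startswith("linux") else None)

def drain(listener):
    """Accept one connection and discard everything it sends."""
    conn, _ = listener.accept()
    with conn:
        while conn.recv(65536):
            pass

def timed_burst(sock, write_size=4096, burst_size=40960):
    """Write one burst; return the select() wait after each write, in seconds."""
    payload = b"\0" * write_size
    waits = []
    for _ in range(burst_size // write_size):
        sock.sendall(payload)
        start = time.monotonic()
        # With the low-water mark set, the socket is reported writable only
        # once the amount of unsent data has dropped below the threshold.
        select.select([], [sock], [], 5.0)
        waits.append(time.monotonic() - start)
    return waits

def run_once(lowat=4):
    """Run a single 40K burst of 4K writes against a local sink."""
    listener = socket.create_server(("127.0.0.1", 0))
    reader = threading.Thread(target=drain, args=(listener,), daemon=True)
    reader.start()
    with socket.create_connection(listener.getsockname()) as sock:
        if TCP_NOTSENT_LOWAT is not None:
            try:
                sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, lowat)
            except OSError:
                pass  # option not supported here; select() uses the default rule
        waits = timed_burst(sock)
    reader.join(timeout=5)
    listener.close()
    return waits

if __name__ == "__main__":
    for i, wait in enumerate(run_once()):
        print(f"write {i}: waited {wait * 1e3:.3f} ms")
```

Following the proposal at the start of the thread, a sender-side latency estimate would add a sampled RTT to each of these waits; whether that sum is a meaningful message latency is exactly the question under discussion.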