From dave.taht at gmail.com Mon Oct 4 12:54:35 2021
From: dave.taht at gmail.com (Dave Taht)
Date: Mon, 4 Oct 2021 09:54:35 -0700
Subject: [Cerowrt-devel] resuming the right to repair fight in particular

On Sun, Jul 25, 2021 at 7:48 AM Dave Taht wrote:
>
> Early on in the FLOSS podcast (
> https://twit.tv/shows/floss-weekly/episodes/638?autostart=false ) I
> harped on what is basically my biggest issue with the world of IoT -
> home routers only being a tiny subset - being able to fix the stuff
> you bought, and KNOWING that the stuff you bought isn't going to
> betray you. The cell phone universe is about as well handled in this
> department as seems feasible

I take it back.

https://www.vice.com/en/article/z3xpm8/company-that-routes-billions-of-text-messages-quietly-says-it-was-hacked

>, but the rest... ugh!
>
> I know our lists are mostly technically oriented but does anyone know
> of a site, a forum, a Slack channel, a LinkedIn group, a Facebook
> group, some legal advisory group... somewhere??, where I, at least,
> could vent in a productive direction? I'm very happy to
> finally be in BITAG but that's just about lag.
>
> I often look back on our 2015 fcc fight with remorse, as we didn't
> have enough capital to capitalize on it, and I just went back to
> finishing up our research. We knocked 'em down FLAT with that one
> broadside but nobody read the filing itself, just the press release,
> and the vogons got up again, like a tarbaby, and resumed bad
> governance of the future as usual.
>
> For the record, if you haven't read:
>
> http://fqcodel.bufferbloat.net/~d/fcc_saner_software_practices.pdf
>
> Our proposal buried on page 12:
>
> 1. Any vendor of SDR, wireless, or Wi-Fi radio must make public the
> full and maintained source code for the device driver and radio
> firmware in order to maintain FCC compliance. The source code should
> be in a buildable, change controlled source code repository on the
> Internet, available for review and improvement by all.
>
> 2. The vendor must assure that secure update of firmware be working at
> shipment, and that update streams be under ultimate control of the
> owner of the equipment. Problems with compliance can then be fixed
> going forward by the person legally responsible for the router being
> in compliance.
>
> 3. The vendor must supply a continuous stream of source and binary
> updates that must respond to regulatory transgressions and Common
> Vulnerability and Exposure reports (CVEs) within 45 days of
> disclosure, for the warranted lifetime of the product, the business
> lifetime of the vendor, or until five years after the last customer
> shipment, whichever is longer.
>
> 4. Failure to comply with these regulations should result in FCC
> decertification of the existing product and, in severe cases, bar new
> products from that vendor from being considered for certification.
>
> 5. Additionally, we ask the FCC to review and rescind any rules for
> anything that conflict with open source best practices, produce
> unmaintainable hardware, or cause vendors to believe they must only
> ship undocumented “binary blobs” of compiled code or use lockdown
> mechanisms that forbid user patching.
> This is an ongoing problem for the Internet community committed to
> best practice change control and error correction on safety-critical
> systems
>
>
> --
> Fixing Starlink's Latencies: https://www.youtube.com/watch?v=c9gLo6Xrwgw
>
> Dave Täht CEO, TekLibre, LLC

--
Fixing Starlink's Latencies: https://www.youtube.com/watch?v=c9gLo6Xrwgw

Dave Täht CEO, TekLibre, LLC

From klukonin at gmail.com Mon Oct 4 14:57:18 2021
From: klukonin at gmail.com (Кирилл Луконин)
Date: Mon, 4 Oct 2021 21:57:18 +0300
Subject: [Cerowrt-devel] [Make-wifi-fast] resuming the right to repair fight in particular

There is another big issue.
Even if we work with a softMAC driver, it can ignore or override wireless-regdb.
For example, iwlwifi could advertise MCS 9-11 but then ignore them, according to a LAR decision.
More than that, LAR can spread its logic and disable some channels globally for all systems.
That's why, for example, if somebody uses both QCA and Intel Wi-Fi modules, the latter will force the use of its settings through the cfg80211 API.

So there is a huge issue in kernel restrictions and rules, I think, where the wireless API is mostly developed by... Intel.
More transparency and fairness are required there, I think.
No kernel module should be able to become a dictator.

As for the certification process, well, I saw the dark side.
Certification firmware is very, very far from regular firmware.
That's how it works today.
The FCC can certify a product, but then its firmware could be modified without any control from the FCC.
The FCC tries to do the best it can with AFC (I think).
But today there are no mechanisms to
1) Force vendors to use regular firmware for certification
2) Certify firmware updates.
We'll have to invent something for that.

Best regards,
Kirill Lukonin
From dave.taht at gmail.com Wed Oct 20 17:51:17 2021
From: dave.taht at gmail.com (Dave Taht)
Date: Wed, 20 Oct 2021 14:51:17 -0700
Subject: [Cerowrt-devel] the smart router...

https://consult.red/smart-router-the-gateway-to-next-gen-services/

prplwrt gets a bit of play here.

From bob.mcmahon at broadcom.com Thu Oct 21 20:51:42 2021
From: bob.mcmahon at broadcom.com (Bob McMahon)
Date: Thu, 21 Oct 2021 17:51:42 -0700
Subject: TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
In-Reply-To: <1632680642.869711321@apps.rackspace.com>
References: <1625188609.32718319@apps.rackspace.com> <989de0c1-e06c-cda9-ebe6-1f33df8a4c24@candelatech.com> <1625773080.94974089@apps.rackspace.com> <1625859083.09751240@apps.rackspace.com> <257851.1632110422@turing-police> <1632680642.869711321@apps.rackspace.com>

Hi All,

Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight vs have reached the destination network stack. If that's the case, then iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.

Does this seem reasonable?

Below are some sample outputs of a 10G wired sending to a 1G wired. These systems do have e2e clock sync so the server side message latency is correct.
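As a rough check of that idea, the sketch below adds the client-side drain time to the sampled RTT and compares the sum with the server-side latency, using the first one-second interval of the BBR and CUBIC runs reported in this message. The function name `estimate_ms` and the choice of the first interval are illustrative assumptions, not part of iperf 2; note also that the drain figure is an interval average while the RTT is a single sample, so the comparison is only approximate.

```python
# First 1-second interval of each run, copied from the iperf 2 output in this
# message: client-side average drain time and sampled RTT, and the server-side
# average burst latency for the same interval.
samples = {
    "bbr": {"drain_ms": 8.947, "rtt_us": 1906, "server_ms": 10.629},
    "cubic": {"drain_ms": 8.855, "rtt_us": 13168, "server_ms": 20.327},
}


def estimate_ms(drain_ms, rtt_us):
    """Sender-side latency estimate: drain time plus sampled RTT, in ms."""
    return drain_ms + rtt_us / 1000.0


if __name__ == "__main__":
    for name, s in samples.items():
        est = estimate_ms(s["drain_ms"], s["rtt_us"])
        diff = est - s["server_ms"]
        print(f"{name}: estimate {est:.3f} ms, "
              f"server {s['server_ms']:.3f} ms, diff {diff:+.3f} ms")
```

For these two intervals the estimate comes out at 10.853 ms against 10.629 ms measured (BBR) and 22.023 ms against 20.327 ms measured (CUBIC), so the sum is close but somewhat high in both cases.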
The RTT + Drain does approximately equal the server side e2e msg latency.

First with BBR

[root at ryzen3950 iperf2-code]# iperf -c 192.168.1.156 -i 1 -e --tcp-drain --realtime -Z bbr --trip-times -l 1M
------------------------------------------------------------
Client connecting to 192.168.1.156, TCP port 5001 with pid 206299 (1 flows)
Write buffer size: 1048576 Byte (drain-enabled)
TCP congestion control set to bbr
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.133%enp4s0 port 60684 connected with 192.168.1.156 port 5001 (MSS=1448) (trip-times) (sock=3) (ct=0.26 ms) on 2021-10-21 17:44:10 (PDT)
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr Drain avg/min/max/stdev (cnt)
[ 1] 0.00-1.00 sec 112 MBytes 940 Mbits/sec 113/0 0 263K/1906 us 61616 8.947/8.322/13.465/0.478 ms (112)
[ 1] 1.00-2.00 sec 112 MBytes 940 Mbits/sec 112/0 0 260K/1987 us 59104 8.911/8.229/9.569/0.229 ms (112)
[ 1] 2.00-3.00 sec 113 MBytes 948 Mbits/sec 113/0 0 254K/2087 us 56775 8.910/8.311/9.564/0.221 ms (113)
[ 1] 3.00-4.00 sec 112 MBytes 940 Mbits/sec 112/0 0 263K/1710 us 68679 8.911/8.297/9.618/0.217 ms (112)
[ 1] 4.00-5.00 sec 112 MBytes 940 Mbits/sec 112/0 0 254K/2024 us 58024 8.907/8.470/9.641/0.197 ms (112)
[ 1] 5.00-6.00 sec 112 MBytes 940 Mbits/sec 112/0 0 263K/2124 us 55292 8.911/8.291/9.325/0.198 ms (112)
[ 1] 6.00-7.00 sec 113 MBytes 948 Mbits/sec 113/0 0 265K/2012 us 58891 8.913/8.226/9.569/0.229 ms (113)
[ 1] 7.00-8.00 sec 112 MBytes 940 Mbits/sec 112/0 0 265K/1989 us 59045 8.908/8.313/9.366/0.194 ms (112)
[ 1] 8.00-9.00 sec 112 MBytes 940 Mbits/sec 112/0 0 263K/1999 us 58750 8.908/8.212/9.402/0.211 ms (112)
[ 1] 9.00-10.00 sec 112 MBytes 940 Mbits/sec 112/0 0 5K/242 us 485291 8.947/8.319/12.754/0.414 ms (112)
[ 1] 0.00-10.06 sec 1.10 GBytes 937 Mbits/sec 1125/0 0 5K/242 us 483764 8.950/8.212/45.293/1.120 ms (1123)

[root at localhost rjmcmahon]# iperf -s -e -B 192.168.1.156%enp4s0f0 -i 1 --realtime
------------------------------------------------------------
Server listening on TCP port 5001 with pid 53099
Binding to local address 192.168.1.156 and iface enp4s0f0
Read buffer size: 128 KByte (Dist bin width=16.0 KByte)
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.156%enp4s0f0 port 5001 connected with 192.168.1.133 port 60684 (MSS=1448) (trip-times) (sock=4) (peer 2.1.4-master) on 2021-10-21 20:44:10 (EDT)
[ ID] Interval Transfer Bandwidth Burst Latency avg/min/max/stdev (cnt/size) inP NetPwr Reads=Dist
[ 1] 0.00-1.00 sec 112 MBytes 936 Mbits/sec 10.629/9.890/14.998/1.507 ms (111/1053964) 1.20 MByte 11007 4347=412:3927:7:0:1:0:0:0
[ 1] 1.00-2.00 sec 112 MBytes 942 Mbits/sec 10.449/9.736/10.740/0.237 ms (112/1050799) 1.18 MByte 11263 4403=465:3938:0:0:0:0:0:0
[ 1] 2.00-3.00 sec 112 MBytes 942 Mbits/sec 10.426/9.873/10.698/0.246 ms (113/1041489) 1.16 MByte 11288 4382=420:3962:0:0:0:0:0:0
[ 1] 3.00-4.00 sec 112 MBytes 941 Mbits/sec 10.485/9.724/10.716/0.208 ms (112/1050541) 1.18 MByte 11221 4393=446:3946:1:0:0:0:0:0
[ 1] 4.00-5.00 sec 112 MBytes 942 Mbits/sec 10.487/9.902/10.736/0.216 ms (112/1050786) 1.18 MByte 11222 4392=448:3944:0:0:0:0:0:0
[ 1] 5.00-6.00 sec 112 MBytes 942 Mbits/sec 10.484/9.758/10.748/0.236 ms (112/1050799) 1.18 MByte 11226 4397=456:3940:0:1:0:0:0:0
[ 1] 6.00-7.00 sec 112 MBytes 941 Mbits/sec 10.475/9.756/10.753/0.248 ms (112/1050515) 1.18 MByte 11232 4403=473:3930:0:0:0:0:0:0
[ 1] 7.00-8.00 sec 112 MBytes 942 Mbits/sec 10.435/9.759/10.757/0.288 ms (113/1041502) 1.16 MByte 11278 4414=480:3934:0:0:0:0:0:0
[ 1] 8.00-9.00 sec 112 MBytes 942 Mbits/sec 10.485/9.762/10.759/0.277 ms (112/1050799) 1.18 MByte 11225 4409=470:3939:0:0:0:0:0:0
[ 1] 9.00-10.00 sec 112 MBytes 942 Mbits/sec 10.550/10.000/10.759/0.191 ms (112/1050786) 1.19 MByte 11155 4399=455:3944:0:0:0:0:0:0
[ 1] 0.00-10.05 sec 1.10 GBytes 937 Mbits/sec 10.524/9.724/45.519/1.173 ms (1123/1048576) 1.18 MByte 11132 44149=4725:39414:8:1:1:0:0:0

Now with CUBIC

[root at ryzen3950 iperf2-code]# iperf -c 192.168.1.156 -i 1 -e --tcp-drain --realtime -Z cubic --trip-times -l 1M
------------------------------------------------------------
Client connecting to 192.168.1.156, TCP port 5001 with pid 206487 (1 flows)
Write buffer size: 1048576 Byte (drain-enabled)
TCP congestion control set to cubic
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.133%enp4s0 port 60686 connected with 192.168.1.156 port 5001 (MSS=1448) (trip-times) (sock=3) (ct=0.49 ms) on 2021-10-21 17:47:02 (PDT)
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr Drain avg/min/max/stdev (cnt)
[ 1] 0.00-1.00 sec 113 MBytes 948 Mbits/sec 114/0 66 1527K/13168 us 8998 8.855/4.757/15.949/0.995 ms (113)
[ 1] 1.00-2.00 sec 113 MBytes 948 Mbits/sec 113/0 0 1668K/14380 us 8240 8.899/8.450/9.425/0.270 ms (113)
[ 1] 2.00-3.00 sec 112 MBytes 940 Mbits/sec 112/0 0 1781K/15335 us 7658 8.904/8.446/9.314/0.258 ms (112)
[ 1] 3.00-4.00 sec 112 MBytes 940 Mbits/sec 112/0 0 1867K/16127 us 7282 8.900/8.570/9.313/0.252 ms (112)
[ 1] 4.00-5.00 sec 113 MBytes 948 Mbits/sec 113/0 0 1931K/16537 us 7165 8.908/8.330/9.431/0.290 ms (113)
[ 1] 5.00-6.00 sec 111 MBytes 931 Mbits/sec 111/0 1 1439K/12342 us 9431 8.945/4.303/18.970/1.091 ms (111)
[ 1] 6.00-7.00 sec 113 MBytes 948 Mbits/sec 113/0 0 1515K/12845 us 9225 8.904/8.451/9.432/0.298 ms (113)
[ 1] 7.00-8.00 sec 112 MBytes 940 Mbits/sec 112/0 0 1569K/13353 us 8795 8.907/8.569/9.314/0.283 ms (112)
[ 1] 8.00-9.00 sec 112 MBytes 940 Mbits/sec 112/0 0 1606K/13718 us 8561 8.909/8.571/9.312/0.275 ms (112)
[ 1] 9.00-10.00 sec 113 MBytes 948 Mbits/sec 113/0 0 1630K/13930 us 8506 8.906/8.569/9.316/0.298 ms (113)
[ 1] 0.00-10.04 sec 1.10 GBytes 940 Mbits/sec 1127/0 67 1630K/13930 us 8431 8.904/4.303/18.970/0.526 ms (1125)

[root at localhost rjmcmahon]# iperf -s -e -B 192.168.1.156%enp4s0f0 -i 1 --realtime
------------------------------------------------------------
Server listening on TCP port 5001 with pid 53121
Binding to local address 192.168.1.156 and iface enp4s0f0
Read buffer size: 128 KByte (Dist bin width=16.0 KByte)
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.156%enp4s0f0 port 5001 connected with 192.168.1.133 port 60686 (MSS=1448) (trip-times) (sock=4) (peer 2.1.4-master) on 2021-10-21 20:47:02 (EDT)
[ ID] Interval Transfer Bandwidth Burst Latency avg/min/max/stdev (cnt/size) inP NetPwr Reads=Dist
[ 1] 0.00-1.00 sec 111 MBytes 935 Mbits/sec 20.327/10.445/39.920/4.341 ms (111/1053090) 2.33 MByte 5751 4344=521:3791:7:2:1:9:0:11
[ 1] 1.00-2.00 sec 112 MBytes 942 Mbits/sec 22.492/21.768/23.254/0.397 ms (112/1050799) 2.53 MByte 5233 4487=594:3893:0:0:0:0:0:0
[ 1] 2.00-3.00 sec 112 MBytes 941 Mbits/sec 23.624/22.987/24.248/0.327 ms (112/1050502) 2.66 MByte 4980 4462=548:3912:1:1:0:0:0:0
[ 1] 3.00-4.00 sec 112 MBytes 941 Mbits/sec 24.475/23.741/24.971/0.287 ms (113/1041476) 2.73 MByte 4808 4483=575:3908:0:0:0:0:0:0
[ 1] 4.00-5.00 sec 112 MBytes 942 Mbits/sec 25.146/24.597/25.459/0.254 ms (112/1050799) 2.83 MByte 4680 4523=642:3880:0:1:0:0:0:0
[ 1] 5.00-6.00 sec 112 MBytes 942 Mbits/sec 21.592/15.549/36.567/2.358 ms (112/1050786) 2.42 MByte 5450 4373=489:3868:0:1:0:0:1:12
[ 1] 6.00-7.00 sec 112 MBytes 941 Mbits/sec 21.447/20.800/22.024/0.275 ms (112/1050528) 2.41 MByte 5486 4464=559:3904:0:1:0:0:0:0
[ 1] 7.00-8.00 sec 112 MBytes 942 Mbits/sec 22.021/21.536/22.519/0.216 ms (113/1041502) 2.46 MByte 5344 4475=557:3918:0:0:0:0:0:0
[ 1] 8.00-9.00 sec 112 MBytes 942 Mbits/sec 22.445/22.023/22.774/0.209 ms (112/1050799) 2.53 MByte 5243 4407=474:3932:0:1:0:0:0:0
[ 1] 9.00-10.00 sec 112 MBytes 941 Mbits/sec 22.680/22.269/23.024/0.184 ms (112/1050541) 2.55 MByte 5188 4511=635:3875:1:0:0:0:0:0
[ 1] 0.00-10.03 sec 1.10 GBytes 941 Mbits/sec 22.629/10.445/39.920/2.083 ms (1125/1048576) 2.54 MByte 5197 44659=5598:39007:9:7:1:9:1:23

Thanks,
Bob

--
This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.

From dave.taht at gmail.com Mon Oct 25 11:55:56 2021
From: dave.taht at gmail.com (Dave Taht)
Date: Mon, 25 Oct 2021 08:55:56 -0700
Subject: [Cerowrt-devel] why I loved our community

I liked the analysis of the bug reporters here:

https://twitter.com/kernellogger/status/1452635250329362440

From eric.dumazet at gmail.com Tue Oct 26 00:24:00 2021
From: eric.dumazet at gmail.com (Eric Dumazet)
Date: Mon, 25 Oct 2021 21:24:00 -0700
Subject: [Cerowrt-devel] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
References: <1625188609.32718319@apps.rackspace.com> <989de0c1-e06c-cda9-ebe6-1f33df8a4c24@candelatech.com> <1625773080.94974089@apps.rackspace.com> <1625859083.09751240@apps.rackspace.com> <257851.1632110422@turing-police> <1632680642.869711321@apps.rackspace.com>
Message-ID: <0e29e225-9f55-4392-640a-2d27c4c26116@gmail.com>

On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote:
> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast
wrote:
>
>> Hi All,
>>
>> Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight vs have reached the destination network stack. If that's the case, then iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.
>>
>> Does this seem reasonable?
>
> I’m not 100% sure what you’re asking, but I will try to help.
>
> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your endpoint as writable (e.g., via kqueue or epoll) until less than that threshold of data remains unsent. It won’t stop you writing more bytes if you want to, up to the socket send buffer size, but it won’t *ask* you for more data until the TCP_NOTSENT_LOWAT threshold is reached.

When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure that sendmsg() would actually stop feeding more bytes into the TCP transmit queue if the current amount of unsent bytes was above the threshold.

So it looks like the Apple implementation is different, based on your description?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36

netperf does not use epoll(), but rather a loop over sendmsg().

One of the points of TCP_NOTSENT_LOWAT for Google was to be able to considerably increase the max number of bytes in transmit queues (3rd column of /proc/sys/net/ipv4/tcp_wmem) by 10x, allowing autotuning to increase BDP for big RTT flows without increasing memory needs for flows with small RTT.

> In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput.
> The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.)
>
> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though. The bytes could depart the machine in response to acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time.
>
> How to use TCP_NOTSENT_LOWAT is explained in this video:
>
> Later in the same video is a two-minute demo (time offset 42:00 to time offset 44:00) showing a “before and after” demo illustrating the dramatic difference this makes for screen sharing responsiveness.
>
> Stuart Cheshire

From bob.mcmahon at broadcom.com Tue Oct 26 01:32:33 2021
From: bob.mcmahon at broadcom.com (Bob McMahon)
Date: Mon, 25 Oct 2021 22:32:33 -0700
Subject: [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
References: <1625188609.32718319@apps.rackspace.com> <989de0c1-e06c-cda9-ebe6-1f33df8a4c24@candelatech.com> <1625773080.94974089@apps.rackspace.com> <1625859083.09751240@apps.rackspace.com> <257851.1632110422@turing-police> <1632680642.869711321@apps.rackspace.com>

Thanks Stuart this is helpful.
I'm measuring the time just before the first write() (of potentially a burst of writes to achieve a burst size), triggered by a socket fd's select event with TCP_NOTSENT_LOWAT set to a small value, then sampling the RTT and CWND and providing histograms for all three, all on that event. I'm not sure of the correctness of RTT and CWND at this sample point. This is a controlled test over 802.11ax and OFDMA where the TCP acks from the WiFi clients are being scheduled by the AP using 802.11ax trigger frames, so the AP is affecting the end/end BDP by scheduling the transmits and the acks. The AP can grow the BDP or shrink it based on these scheduling decisions. From there we're trying to maximize network power (throughput/delay) for elephant flows and just latency for mouse flows. (We also plan some RF frequency stuff to per OFDMA) Anyway, the AP based scheduling along with aggregation and OFDMA makes WiFi scheduling optimums non-obvious - at least to me - and I'm trying to provide insights into how an AP is affecting end/end performance.

The more direct approach for e2e TCP latency and network power has been to measure first write() to final read() and compute the e2e delay. This requires clock sync on the ends. (We're using ptp4l with GPS OCXO atomic references for that but this is typically only available in some labs.)

Bob

From cheshire at apple.com Mon Oct 25 23:11:07 2021
From: cheshire at apple.com (Stuart Cheshire)
Date: Mon, 25 Oct 2021 20:11:07 -0700
Subject: [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
References: <1625188609.32718319@apps.rackspace.com> <989de0c1-e06c-cda9-ebe6-1f33df8a4c24@candelatech.com> <1625773080.94974089@apps.rackspace.com> <1625859083.09751240@apps.rackspace.com> <257851.1632110422@turing-police> <1632680642.869711321@apps.rackspace.com>

On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast wrote:

> Hi All,
>
> Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this.
> It seems that this event goes off when the bytes are in flight vs have reached the destination network stack. If that's the case, then iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.
>
> Does this seem reasonable?

I’m not 100% sure what you’re asking, but I will try to help.

When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your endpoint as writable (e.g., via kqueue or epoll) until less than that threshold of data remains unsent. It won’t stop you writing more bytes if you want to, up to the socket send buffer size, but it won’t *ask* you for more data until the TCP_NOTSENT_LOWAT threshold is reached. In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.)

If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though. The bytes could depart the machine in response to acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time.
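The measurement loop described above can be sketched as follows. This is a minimal illustration in Python, assuming a platform whose `socket` module exposes `TCP_NOTSENT_LOWAT`; where it does not, or where the kernel rejects the option, the sketch skips it and the timings only reflect ordinary send-buffer writability. The names `drain_times`, `demo` and `_sink`, the loopback receiver, and the 16 KB threshold are hypothetical choices for the example, not taken from iperf 2 or from the video.

```python
import select
import socket
import threading
import time


def drain_times(sock, chunk, count, lowat=16 * 1024):
    """Write `count` chunks; time each from write start to the next writable event."""
    opt = getattr(socket, "TCP_NOTSENT_LOWAT", None)
    if opt is not None:
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, lowat)
        except OSError:
            pass  # option rejected: fall back to plain send-buffer writability
    times = []
    for _ in range(count):
        select.select([], [sock], [])   # wait until TCP asks for more data
        start = time.monotonic()
        sock.sendall(chunk)             # hand the whole chunk to the kernel
        select.select([], [sock], [])   # unsent backlog is below the threshold again
        times.append(time.monotonic() - start)
    return times


def _sink(listener):
    """Accept one connection and discard everything it sends."""
    conn, _ = listener.accept()
    with conn:
        while conn.recv(65536):
            pass


def demo(count=5, size=256 * 1024):
    """Run the loop against a throwaway receiver on the loopback interface."""
    listener = socket.create_server(("127.0.0.1", 0))
    sink = threading.Thread(target=_sink, args=(listener,))
    sink.start()
    with socket.create_connection(listener.getsockname()) as sock:
        result = drain_times(sock, b"x" * size, count)
    sink.join()
    listener.close()
    return result


if __name__ == "__main__":
    for i, seconds in enumerate(demo()):
        print(f"chunk {i}: {seconds * 1e3:.3f} ms")
```

On loopback the times are tiny and say little; the loop is only meaningful against a real path, and even then it reports how long the chunk took to leave the sender, not why.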
How to use TCP_NOTSENT_LOWAT is explained in this video: Later in the same video is a two-minute demo (time offset 42:00 to time offset 44:00) showing a “before and after” demo illustrating the dramatic difference this makes for screen sharing responsiveness. Stuart Cheshire From bjorn at domos.no Tue Oct 26 06:04:30 2021 From: bjorn at domos.no (=?UTF-8?Q?Bj=C3=B8rn_Ivar_Teigen?=) Date: Tue, 26 Oct 2021 11:04:30 +0100 Subject: [Cerowrt-devel] [Starlink] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency In-Reply-To: References: <1625188609.32718319@apps.rackspace.com> <989de0c1-e06c-cda9-ebe6-1f33df8a4c24@candelatech.com> <1625773080.94974089@apps.rackspace.com> <1625859083.09751240@apps.rackspace.com> <257851.1632110422@turing-police> <1632680642.869711321@apps.rackspace.com> Message-ID: Hi Bob, My name is Bjørn Ivar Teigen and I'm working on modeling and measuring WiFi MAC-layer protocol performance for my PhD. Is it necessary to measure the latency using the TCP stream itself? I had a similar problem in the past, and solved it by doing the latency measurements using TWAMP running alongside the TCP traffic. The requirement for this to work is that the TWAMP packets are placed in the same queue(s) as the TCP traffic, and that the impact of measurement traffic is small enough so as not to interfere too much with your TCP results. Just my two cents, hope it's helpful. Bjørn On Tue, 26 Oct 2021 at 06:32, Bob McMahon wrote: > Thanks Stuart this is helpful. I'm measuring the time just before the > first write() (of potentially a burst of writes to achieve a burst size) > per a socket fd's select event occurring when TCP_NOT_SENT_LOWAT being set > to a small value, then sampling the RTT and CWND and providing histograms > for all three, all on that event. I'm not sure the correctness of RTT and > CWND at this sample point. 
> This is a controlled test over 802.11ax and OFDMA where the TCP acks per the WiFi clients are being scheduled by the AP using 802.11ax trigger frames so the AP is affecting the end/end BDP per scheduling the transmits and the acks. The AP can grow the BDP or shrink it based on these scheduling decisions. From there we're trying to maximize network power (throughput/delay) for elephant flows and just latency for mouse flows. (We also plan some RF frequency stuff to per OFDMA) Anyway, the AP based scheduling along with aggregation and OFDMA makes WiFi scheduling optimums non-obvious - at least to me - and I'm trying to provide insights into how an AP is affecting end/end performance.
>
> The more direct approach for e2e TCP latency and network power has been to measure first write() to final read() and compute the e2e delay. This requires clock sync on the ends. (We're using ptp4l with GPS OCXO atomic references for that but this is typically only available in some labs.)
>
> Bob
>
> On Mon, Oct 25, 2021 at 8:11 PM Stuart Cheshire wrote:
>
>> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast <make-wifi-fast at lists.bufferbloat.net> wrote:
>>
>> > Hi All,
>> >
>> > Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight vs have reached the destination network stack. If that's the case, then iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.
>> >
>> > Does this seem reasonable?
>>
>> I’m not 100% sure what you’re asking, but I will try to help.
>> Stuart Cheshire
>
> This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.
> _______________________________________________
> Starlink mailing list
> Starlink at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink

--
Bjørn Ivar Teigen
Head of Research
+47 47335952 | bjorn at domos.no | www.domos.no
WiFi Slicing by Domos

From dave.taht at gmail.com Tue Oct 26 12:24:15 2021
From: dave.taht at gmail.com (Dave Taht)
Date: Tue, 26 Oct 2021 09:24:15 -0700
Subject: [Cerowrt-devel] Fwd: [PATCH net-next] ifb: Depend on netfilter alternatively to tc
In-Reply-To:
References:
Message-ID:

any benefits to getting away from mirred?

---------- Forwarded message ---------
From: Lukas Wunner
Date: Tue, Oct 26, 2021 at 12:11 AM
Subject: [PATCH net-next] ifb: Depend on netfilter alternatively to tc
To: David S. Miller, Jakub Kicinski
Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann, Willem de Bruijn

IFB originally depended on NET_CLS_ACT for traffic redirection. But since v4.5, that may be achieved with NFT_FWD_NETDEV as well.
Fixes: 39e6dea28adc ("netfilter: nf_tables: add forward expression to the netdev family")
Signed-off-by: Lukas Wunner
Cc: # v4.5+: bcfabee1afd9: netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
Cc: # v4.5+
---
 drivers/net/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f37b1c56f7c4..dd335ae1122b 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -150,7 +150,7 @@ config NET_FC
 config IFB
 	tristate "Intermediate Functional Block support"
-	depends on NET_CLS_ACT
+	depends on NET_ACT_MIRRED || NFT_FWD_NETDEV
 	select NET_REDIRECT
 	help
 	  This is an intermediate driver that allows sharing of
--
2.31.1

--
Fixing Starlink's Latencies: https://www.youtube.com/watch?v=c9gLo6Xrwgw

Dave Täht CEO, TekLibre, LLC

From bob.mcmahon at broadcom.com Tue Oct 26 13:23:28 2021
From: bob.mcmahon at broadcom.com (Bob McMahon)
Date: Tue, 26 Oct 2021 10:23:28 -0700
Subject: [Starlink] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
In-Reply-To:
References: <1625188609.32718319@apps.rackspace.com> <989de0c1-e06c-cda9-ebe6-1f33df8a4c24@candelatech.com> <1625773080.94974089@apps.rackspace.com> <1625859083.09751240@apps.rackspace.com> <257851.1632110422@turing-police> <1632680642.869711321@apps.rackspace.com>
Message-ID:

Hi Bjørn,

I find, when possible, it's preferred to take telemetry data of actual traffic (or reads and writes) vs a proxy. We had a case where TCP BE was outperforming TCP w/VI because BE had the most engineering resources assigned to it and engineers did a better job with BE. Using a proxy protocol wouldn't have exercised the same logic paths (in this case it was in the L2 driver) as TCP did. Hence, measuring actual TCP traffic (or socket reads and socket writes) was needed to flush out the problem.

Note: I also find that network engineers tend to focus on the stack but it's the e2e at the application level that impacts user experience.
Send side bloat can drive the OWD while the TCP stack's RTT may look fine. For WiFi test & measurements, we've decided most testing should be using TCP_NOTSENT_LOWAT because it helps mitigate send side bloat which WiFi engineering doesn't focus on per lack of ability to impact.

Also, I think OWD is under tested and two way based testing can give incomplete and inaccurate information, particularly with respect to things like an e2e transport's control loop. The most obvious example is assuming 1/2 RTT is the same as OWD to/fro. For WiFi this assumption is almost always false. It is also false for many residential internet connections where OWD asymmetry is designed in.

Bob

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From bob.mcmahon at broadcom.com Tue Oct 26 19:23:35 2021 From: bob.mcmahon at broadcom.com (Bob McMahon) Date: Tue, 26 Oct 2021 16:23:35 -0700 Subject: [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency In-Reply-To: <4BFB5A37-9574-49BE-B083-FBC1F2B0381E@apple.com> References: <1625188609.32718319@apps.rackspace.com> <989de0c1-e06c-cda9-ebe6-1f33df8a4c24@candelatech.com> <1625773080.94974089@apps.rackspace.com> <1625859083.09751240@apps.rackspace.com> <257851.1632110422@turing-police> <1632680642.869711321@apps.rackspace.com> <0e29e225-9f55-4392-640a-2d27c4c26116@gmail.com> <4BFB5A37-9574-49BE-B083-FBC1F2B0381E@apple.com> Message-ID: I'm confused. I don't see any blocking nor partial writes per the write() at the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst is 40K, the write size is 4K and the watermark is 4 bytes. There are ten writes per burst. The S8 histograms are the times waiting on the select(). The first value is the bin number (multiplied by 100usec bin width) and second the bin count. The worst case time is at the end and is timestamped per unix epoch. The second run is over a controlled WiFi link where a 99.7% point of 4-8ms for a WiFi TX op arbitration win is in the ballpark. The first is 1G wired and is in the 600 usec range. (No media arbitration there.) 
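For anyone post-processing the S8-PDF strings in the output that follows, the bin:count pairs can be decoded along these lines. This is a minimal Python sketch, not part of iperf 2, and the function names are made up for illustration. It assumes that bin n covers waits up to n times the bin width; that is inferred from the sample output, where the 1.089 ms worst case lands in bin 11 at a 100 usec bin width.

```python
def parse_histogram_bins(spec, bin_width_us=100):
    """Turn 'bin:count,bin:count,...' into (upper_edge_us, count) tuples.

    Assumption: bin n holds samples up to n * bin_width_us microseconds.
    """
    bins = []
    for item in spec.split(","):
        bin_number, count = item.split(":")
        bins.append((int(bin_number) * bin_width_us, int(count)))
    return bins


def total_samples(bins):
    """Sum the counts, which should match the cnt(N) field of the report."""
    return sum(count for _, count in bins)


# First one-second interval of the wired run.
first_interval = parse_histogram_bins("1:1,2:5,3:2,4:1,11:1")
```

Here total_samples(first_interval) comes to 10, matching the cnt(10) in that report line and the ten writes per burst.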
[root at localhost iperf2-code]# src/iperf -c 10.19.87.9 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second ------------------------------------------------------------ Client connecting to 10.19.87.9, TCP port 5001 with pid 2124 (1 flows) Write buffer size: 4096 Byte Bursting: 40.0 KByte every 1.00 seconds TCP window size: 85.0 KByte (default) Event based writes (pending queue watermark at 4 bytes) Enabled select histograms bin-width=0.100 ms, bins=10000 ------------------------------------------------------------ [ 1] local 10.19.87.10%eth0 port 33166 connected with 10.19.87.9 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=0.54 ms) on 2021-10-26 16:07:33 (PDT) [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 14K/5368 us 8 [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,2:5,3:2,4:1,11:1 (5.00/95.00/99.7%=1/11/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360) [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/569 us 72 [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:1,3:4,4:1,7:1,8:1 (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.736 ms/1635289654.928088) [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/312 us 131 [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:2,5:2,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.548 ms/1635289655.927776) [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/302 us 136 [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:2,3:5,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.584 ms/1635289656.927814) [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/316 us 130 [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:2,4:2,5:2,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.572 ms/1635289657.927810) [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/253 us 162 [ 1] 
5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:4,5:1 (5.00/95.00/99.7%=1/5/5,Outliers=0,obl/obu=0/0) (0.417 ms/1635289658.927630) [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/290 us 141 [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:3,4:3,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.573 ms/1635289659.927771) [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/359 us 114 [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,3:4,4:3,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.570 ms/1635289660.927753) [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/349 us 117 [ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:5,4:1,7:1 (5.00/95.00/99.7%=1/7/7,Outliers=0,obl/obu=0/0) (0.608 ms/1635289661.927843) [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/347 us 118 [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:1,3:5,8:1 (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.725 ms/1635289662.927861) [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 14K/1519 us 27 [ 1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:25,2:13,3:36,4:11,5:5,6:5,7:2,8:2,11:1 (5.00/95.00/99.7%=1/7/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360) [root at localhost iperf2-code]# src/iperf -c 192.168.1.1 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 with pid 2131 (1 flows) Write buffer size: 4096 Byte Bursting: 40.0 KByte every 1.00 seconds TCP window size: 85.0 KByte (default) Event based writes (pending queue watermark at 4 bytes) Enabled select histograms bin-width=0.100 ms, bins=10000 ------------------------------------------------------------ [ 1] local 192.168.1.4%eth1 port 45518 connected with 192.168.1.1 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.48 ms) on 2021-10-26 
16:07:56 (PDT) [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 14K/10339 us 4 [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,40:1,47:1,49:2,50:3,51:1,60:1 (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.990 ms/1635289676.802143) [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4853 us 8 [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,38:1,39:1,44:1,45:1,49:1,51:1,52:1,60:1 (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.937 ms/1635289677.802274) [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4991 us 8 [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:2,50:2,51:1,60:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.307 ms/1635289678.794326) [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4610 us 9 [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:3,56:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.362 ms/1635289679.794335) [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5028 us 8 [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.367 ms/1635289680.794399) [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5113 us 8 [ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:2,58:1,60:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.442 ms/1635289681.794392) [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5054 us 8 [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,51:1,60:2,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.374 ms/1635289682.794335) [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5138 us 8 [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:2,40:1,49:2,50:1,60:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.396 ms/1635289683.794338) [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5329 us 8 [ 1] 8.00-9.00 sec S8-PDF: 
bin(w=100us):cnt(10)=1:2,38:1,45:2,49:1,50:3,63:1 (5.00/95.00/99.7%=1/63/63,Outliers=0,obl/obu=0/0) (6.292 ms/1635289684.794262) [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5329 us 8 [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:3,84:1 (5.00/95.00/99.7%=1/84/84,Outliers=0,obl/obu=0/0) (8.306 ms/1635289685.796315) [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 14K/6331 us 6 [ 1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:19,38:2,39:5,40:2,44:1,45:3,47:1,48:1,49:26,50:17,51:4,52:1,56:1,58:1,59:1,60:7,63:1,64:5,65:1,84:1 (5.00/95.00/99.7%=1/64/84,Outliers=0,obl/obu=0/0) (8.306 ms/1635289685.796315) Bob On Tue, Oct 26, 2021 at 11:45 AM Christoph Paasch wrote: > Hello, > > > On Oct 25, 2021, at 9:24 PM, Eric Dumazet > wrote: > > > > > > > > On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote: > >> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast < > make-wifi-fast at lists.bufferbloat.net> wrote: > >> > >>> Hi All, > >>> > >>> Sorry for the spam. I'm trying to support a meaningful TCP message > latency w/iperf 2 from the sender side w/o requiring e2e clock > synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to > help with this. It seems that this event goes off when the bytes are in > flight vs have reached the destination network stack. If that's the case, > then iperf 2 client (sender) may be able to produce the message latency by > adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled > RTT. > >>> > >>> Does this seem reasonable? > >> > >> I’m not 100% sure what you’re asking, but I will try to help. > >> > >> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report > your endpoint as writable (e.g., via kqueue or epoll) until less than that > threshold of data remains unsent. It won’t stop you writing more bytes if > you want to, up to the socket send buffer size, but it won’t *ask* you for > more data until the TCP_NOTSENT_LOWAT threshold is reached. 
> >
> > When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure that sendmsg() would actually
> > stop feeding more bytes in TCP transmit queue if the current amount of unsent bytes
> > was above the threshold.
> >
> > So it looks like the Apple implementation is different, based on your description ?
>
> Yes, TCP_NOTSENT_LOWAT only impacts the wakeup on iOS/macOS/...
>
> An app can still fill the send-buffer if it does a sendmsg() with a large buffer or does repeated calls to sendmsg().
>
> For Apple, the goal of TCP_NOTSENT_LOWAT was to allow an app to quickly change the data it "scheduled" to send. And thus allow the app to write the smallest "logical unit" it has. If that unit is 512KB large, the app is allowed to send that.
> For example, in case of video-streaming one may want to skip ahead in the video. In that case the app still needs to transmit the remaining parts of the previous frame anyways, before it can send the new video frame.
> That's the reason why the Apple implementation allows one to write more than just the lowat threshold.
>
> That being said, I do think that Linux's way allows for an easier API because the app does not need to be careful about how much data it sends after an epoll/kqueue wakeup. So, the latency-benefits will be easier to get.
>
> Christoph
>
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36
> >
> > netperf does not use epoll(), but rather a loop over sendmsg().
> >
> > One of the points of TCP_NOTSENT_LOWAT for Google was to be able to considerably increase
> > max number of bytes in transmit queues (3rd column of /proc/sys/net/ipv4/tcp_wmem)
> > by 10x, allowing for autotune to increase BDP for big RTT flows, this without
> > increasing memory needs for flows with small RTT.
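To check the "3rd column of /proc/sys/net/ipv4/tcp_wmem" mentioned above on a given Linux box, something like the following Python sketch would do. The helper names are hypothetical; the file holds three byte counts (minimum, default, maximum), and the numbers in the usage note are illustrative rather than measured on any system in this thread.

```python
def parse_tcp_wmem(text):
    """Split the three tcp_wmem fields into (min, default, max) byte counts."""
    minimum, default, maximum = (int(field) for field in text.split())
    return minimum, default, maximum


def read_tcp_wmem(path="/proc/sys/net/ipv4/tcp_wmem"):
    """Read the live values; this path only exists on Linux."""
    with open(path) as handle:
        return parse_tcp_wmem(handle.read())
```

For example, parse_tcp_wmem("4096 16384 4194304") yields a maximum of 4194304 bytes as its third element, which is the per-socket send buffer ceiling that autotuning may grow toward.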
> >> Stuart Cheshire
> >> _______________________________________________
> >> Bloat mailing list
> >> Bloat at lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/bloat

From eric.dumazet at gmail.com Tue Oct 26 21:12:04 2021
From: eric.dumazet at gmail.com (Eric Dumazet)
Date: Tue, 26 Oct 2021 18:12:04 -0700
Subject: [Cerowrt-devel] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
In-Reply-To: <6D6492CF-BD6D-45BF-BD40-FA49166F6DA4@apple.com>
References: <6D6492CF-BD6D-45BF-BD40-FA49166F6DA4@apple.com>
Message-ID: <34fac143-f1be-9886-4931-65139acaca2e@gmail.com>

On 10/26/21 4:38 PM, Christoph Paasch wrote:
> Hi Bob,
>
>> On Oct 26, 2021, at 4:23 PM, Bob McMahon wrote:
>> I'm confused. I don't see any blocking nor partial writes per the write() at the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst is 40K, the write size is 4K and the watermark is 4 bytes. There are ten writes per burst.
>
> You are on Linux here, right?
>
> AFAICS, Linux will still accept whatever fits in an skb. And that is likely more than 4K (with GSO on by default).

This (max payload per skb) can be tuned at the driver level, at least for experimental purposes or dedicated devices.

ip link set dev eth0 gso_max_size 8000

To fetch current values :

ip -d link sh dev eth0

>
> However, do you go back to select() after each write() or do you loop over the write() calls?
>
> Christoph
>
>> The S8 histograms are the times waiting on the select(). The first value is the bin number (multiplied by 100usec bin width) and second the bin count. The worst case time is at the end and is timestamped per unix epoch.
>>
>> The second run is over a controlled WiFi link where a 99.7% point of 4-8ms for a WiFi TX op arbitration win is in the ballpark. The first is 1G wired and is in the 600 usec range. (No media arbitration there.)
>>
>> [root at localhost iperf2-code]# src/iperf -c 10.19.87.9 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
>> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second
>> ------------------------------------------------------------
>> Client connecting to 10.19.87.9, TCP port 5001 with pid 2124 (1 flows)
>> Write buffer size: 4096 Byte
>> Bursting: 40.0 KByte every 1.00 seconds
>> TCP window size: 85.0 KByte (default)
>> Event based writes (pending queue watermark at 4 bytes)
>> Enabled select histograms bin-width=0.100 ms, bins=10000
>> ------------------------------------------------------------
>> [  1] local 10.19.87.10%eth0 port 33166 connected with 10.19.87.9 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=0.54 ms) on 2021-10-26 16:07:33 (PDT)
>> [ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT        NetPwr
>> [  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0          0       14K/5368 us  8
>> [  1] 0.00-1.00 sec S8-PDF:
bin(w=100us):cnt(10)=1:1,2:5,3:2,4:1,11:1 (5.00/95.00/99.7%=1/11/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360) >> [  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/569 us  72 >> [  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:1,3:4,4:1,7:1,8:1 (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.736 ms/1635289654.928088) >> [  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/312 us  131 >> [  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:2,5:2,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.548 ms/1635289655.927776) >> [  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/302 us  136 >> [  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:2,3:5,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.584 ms/1635289656.927814) >> [  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/316 us  130 >> [  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:2,4:2,5:2,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.572 ms/1635289657.927810) >> [  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/253 us  162 >> [  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:4,5:1 (5.00/95.00/99.7%=1/5/5,Outliers=0,obl/obu=0/0) (0.417 ms/1635289658.927630) >> [  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/290 us  141 >> [  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:3,4:3,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.573 ms/1635289659.927771) >> [  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/359 us  114 >> [  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,3:4,4:3,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.570 ms/1635289660.927753) >> [  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/349 us  117 >> [  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:5,4:1,7:1 (5.00/95.00/99.7%=1/7/7,Outliers=0,obl/obu=0/0) 
(0.608 ms/1635289661.927843) >> [  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/347 us  118 >> [  1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:1,3:5,8:1 (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.725 ms/1635289662.927861) >> [  1] 0.00-10.01 sec   400 KBytes   327 Kbits/sec  102/0          0       14K/1519 us  27 >> [  1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:25,2:13,3:36,4:11,5:5,6:5,7:2,8:2,11:1 (5.00/95.00/99.7%=1/7/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360) >> >> [root at localhost iperf2-code]# src/iperf -c 192.168.1.1 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms >> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second >> ------------------------------------------------------------ >> Client connecting to 192.168.1.1, TCP port 5001 with pid 2131 (1 flows) >> Write buffer size: 4096 Byte >> Bursting: 40.0 KByte every 1.00 seconds >> TCP window size: 85.0 KByte (default) >> Event based writes (pending queue watermark at 4 bytes) >> Enabled select histograms bin-width=0.100 ms, bins=10000 >> ------------------------------------------------------------ >> [  1] local 192.168.1.4%eth1 port 45518 connected with 192.168.1.1 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.48 ms) on 2021-10-26 16:07:56 (PDT) >> [ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT        NetPwr >> [  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0          0       14K/10339 us  4 >> [  1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,40:1,47:1,49:2,50:3,51:1,60:1 (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.990 ms/1635289676.802143) >> [  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/4853 us  8 >> [  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,38:1,39:1,44:1,45:1,49:1,51:1,52:1,60:1 (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.937 ms/1635289677.802274) >> [  
1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/4991 us  8 >> [  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:2,50:2,51:1,60:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.307 ms/1635289678.794326) >> [  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/4610 us  9 >> [  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:3,56:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.362 ms/1635289679.794335) >> [  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5028 us  8 >> [  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.367 ms/1635289680.794399) >> [  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5113 us  8 >> [  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:2,58:1,60:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.442 ms/1635289681.794392) >> [  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5054 us  8 >> [  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,51:1,60:2,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.374 ms/1635289682.794335) >> [  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5138 us  8 >> [  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:2,40:1,49:2,50:1,60:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.396 ms/1635289683.794338) >> [  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5329 us  8 >> [  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,38:1,45:2,49:1,50:3,63:1 (5.00/95.00/99.7%=1/63/63,Outliers=0,obl/obu=0/0) (6.292 ms/1635289684.794262) >> [  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5329 us  8 >> [  1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:3,84:1 (5.00/95.00/99.7%=1/84/84,Outliers=0,obl/obu=0/0) (8.306 ms/1635289685.796315) >> [  1] 
0.00-10.01 sec   400 KBytes   327 Kbits/sec  102/0          0       14K/6331 us  6 >> [  1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:19,38:2,39:5,40:2,44:1,45:3,47:1,48:1,49:26,50:17,51:4,52:1,56:1,58:1,59:1,60:7,63:1,64:5,65:1,84:1 (5.00/95.00/99.7%=1/64/84,Outliers=0,obl/obu=0/0) (8.306 ms/1635289685.796315) >> >> Bob >> >> On Tue, Oct 26, 2021 at 11:45 AM Christoph Paasch wrote: >> >> Hello, >> >> > On Oct 25, 2021, at 9:24 PM, Eric Dumazet wrote: >> > >> > >> > >> > On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote: >> >> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast wrote: >> >> >> >>> Hi All, >> >>> >> >>> Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight vs have reached the destination network stack. If that's the case, then iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT. >> >>> >> >>> Does this seem reasonable? >> >> >> >> I’m not 100% sure what you’re asking, but I will try to help. >> >> >> >> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your endpoint as writable (e.g., via kqueue or epoll) until less than that threshold of data remains unsent. It won’t stop you writing more bytes if you want to, up to the socket send buffer size, but it won’t *ask* you for more data until the TCP_NOTSENT_LOWAT threshold is reached. >> > >> > >> > When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure that sendmsg() would actually >> > stop feeding more bytes in TCP transmit queue if the current amount of unsent bytes >> > was above the threshold. >> > >> > So it looks like Apple implementation is different, based on your description ? 
>> >> Yes, TCP_NOTSENT_LOWAT only impacts the wakeup on iOS/macOS/... >> >> An app can still fill the send-buffer if it does a sendmsg() with a large buffer or does repeated calls to sendmsg(). >> >> For Apple, the goal of TCP_NOTSENT_LOWAT was to allow an app to quickly change the data it "scheduled" to send. And thus allow the app to write the smallest "logical unit" it has. If that unit is 512KB large, the app is allowed to send that. >> For example, in case of video-streaming one may want to skip ahead in the video. In that case the app still needs to transmit the remaining parts of the previous frame anyways, before it can send the new video frame. >> That's the reason why the Apple implementation allows one to write more than just the lowat threshold. >> >> >> That being said, I do think that Linux's way allows for an easier API because the app does not need to be careful at how much data it sends after an epoll/kqueue wakeup. So, the latency-benefits will be easier to get. >> >> >> Christoph >> >> >> >> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36 >> > >> > netperf does not use epoll(), but rather a loop over sendmsg(). >> > >> > One of the points of TCP_NOTSENT_LOWAT for Google was to be able to considerably increase >> > max number of bytes in transmit queues (3rd column of /proc/sys/net/ipv4/tcp_wmem) >> > by 10x, allowing for autotune to increase BDP for big RTT flows, this without >> > increasing memory needs for flows with small RTT. >> > >> > In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput. 
The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.) >> >> >> >> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though. The bytes could depart the machine in response for acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time. >> >> >> >> How to use TCP_NOTSENT_LOWAT is explained in this video: >> >> >> >> > >> >> >> >> Later in the same video is a two-minute demo (time offset 42:00 to time offset 44:00) showing a “before and after” demo illustrating the dramatic difference this makes for screen sharing responsiveness. >> >> >> >> > >> >> >> >> Stuart Cheshire >> >> _______________________________________________ >> >> Bloat mailing list >> >> Bloat at lists.bufferbloat.net >> >> https://lists.bufferbloat.net/listinfo/bloat >> >> >> > _______________________________________________ >> > Bloat mailing list >> > Bloat at lists.bufferbloat.net >> > https://lists.bufferbloat.net/listinfo/bloat >> >> >> This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. 
If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.

From bob.mcmahon at broadcom.com Tue Oct 26 23:45:52 2021
From: bob.mcmahon at broadcom.com (Bob McMahon)
Date: Tue, 26 Oct 2021 20:45:52 -0700
Subject: [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
In-Reply-To: <34fac143-f1be-9886-4931-65139acaca2e@gmail.com>
References: <6D6492CF-BD6D-45BF-BD40-FA49166F6DA4@apple.com> <34fac143-f1be-9886-4931-65139acaca2e@gmail.com>
Message-ID:

This is Linux. The code flow is: burst writes until the burst size, take a timestamp, call select(), take a second timestamp and insert the time delta into the histogram, then await clock_nanosleep() to schedule the next burst. (Actually, the deltas, the inserts into the histogram and user I/O are done in another thread, i.e. iperf 2's reporter thread.)

I must still be missing something. Does anything else need to be set to reduce the skb size? Everything seems to indicate 4K writes even when gso_max_size is 2000 (I assume the units are bytes?). There are ten writes, ten reads and ten RTTs for the bursts. I don't see partial writes at the app level.
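For readers who want to reproduce the measurement outside iperf 2, here is a minimal Python sketch of the flow described above. It is not iperf 2's code. The function names (`set_notsent_lowat`, `timed_burst`) are hypothetical, the select() wait is timed after every write (an assumption based on the ten samples per burst in the posted output), and the 1-based bin numbering is inferred from that output (for example, 1.089 ms lands in bin 11).

```python
import select
import socket
import time
from collections import Counter

# Linux numeric value (25) as a fallback if this Python build lacks the constant.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)
BIN_WIDTH_US = 100  # 100 usec bins, matching bin(w=100us) in the output


def set_notsent_lowat(sock, nbytes=4):
    """Set the unsent-bytes watermark on a connected TCP socket."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, nbytes)


def timed_burst(sock, write_size, burst_size, histogram):
    """Send one burst; return the select() wait after each write, in usec."""
    payload = b"\0" * write_size
    waits_us = []
    for _ in range(burst_size // write_size):
        sock.sendall(payload)
        t0 = time.monotonic()
        select.select([], [sock], [])  # block until the socket is writable again
        wait_us = (time.monotonic() - t0) * 1e6
        histogram[int(wait_us // BIN_WIDTH_US) + 1] += 1  # 1-based bin number
        waits_us.append(wait_us)
    return waits_us
```

With a connected TCP socket `sock`, one would call `set_notsent_lowat(sock, 4)` once, then `timed_burst(sock, 4096, 40 * 1024, Counter())` once per burst period. Note that `sendall()` hides partial writes; replacing it with `sock.send()` and checking the returned byte count would expose them at the app level.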
[root at localhost iperf2-code]# ip link set dev eth1 gso_max_size 2000 [root at localhost iperf2-code]# ip -d link sh dev eth1 9: eth1: mtu 1500 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000 link/ether 00:90:4c:40:04:59 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 2000 gso_max_segs 65535 [root at localhost iperf2-code]# uname -r 5.0.9-301.fc30.x86_64 It looks like RTT is being driven by WiFi TXOPs as doubling the write size increases the aggregation by two but has no significant effect on the RTTs. 4K writes: tot_mpdus 328 tot_ampdus 209 mpduperampdu 2 8k writes: tot_mpdus 317 tot_ampdus 107 mpduperampdu 3 [root at localhost iperf2-code]# src/iperf -c 192.168.1.1%eth1 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 with pid 5145 via eth1 (1 flows) Write buffer size: 4096 Byte Bursting: 40.0 KByte every 1.00 seconds TCP window size: 85.0 KByte (default) Event based writes (pending queue watermark at 4 bytes) Enabled select histograms bin-width=0.100 ms, bins=10000 ------------------------------------------------------------ [ 1] local 192.168.1.4%eth1 port 45680 connected with 192.168.1.1 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.30 ms) on 2021-10-26 20:25:29 (PDT) [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 14K/10091 us 4 [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,36:1,40:1,44:1,46:1,48:1,49:1,50:2,52:1 (5.00/95.00/99.7%=1/52/52,Outliers=0,obl/obu=0/0) (5.121 ms/1635305129.152339) [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4990 us 8 [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,45:1,49:5,50:1 
(5.00/95.00/99.7%=1/50/50,Outliers=0,obl/obu=0/0) (4.991 ms/1635305130.153330) [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4904 us 8 [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,29:1,49:4,50:1,59:1,75:1 (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.455 ms/1635305131.147353) [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4964 us 8 [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:4,50:2,59:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.460 ms/1635305132.146338) [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4970 us 8 [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.404 ms/1635305133.146335) [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4986 us 8 [ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:1,50:4,59:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.395 ms/1635305134.146343) [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5059 us 8 [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:2,60:1,85:1 (5.00/95.00/99.7%=1/85/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343) [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5407 us 8 [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:4,50:1,59:1,75:1 (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.428 ms/1635305136.147343) [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5188 us 8 [ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:3,50:3,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.388 ms/1635305137.146284) [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5306 us 8 [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:2,50:2,51:1,60:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.422 ms/1635305138.146316) [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 14K/5939 us 7 [ 1] 0.00-10.01 sec S8(f)-PDF: 
bin(w=100us):cnt(100)=1:19,29:1,36:1,39:3,40:3,44:1,45:1,46:1,48:2,49:33,50:18,51:1,52:1,59:5,60:2,64:2,65:3,75:2,85:1 (5.00/95.00/99.7%=1/65/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343) [root at localhost iperf2-code]# src/iperf -s -i 1 -e -B 192.168.1.1%eth1 ------------------------------------------------------------ Server listening on TCP port 5001 with pid 6287 Binding to local address 192.168.1.1 and iface eth1 Read buffer size: 128 KByte (Dist bin width=16.0 KByte) TCP window size: 128 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.1%eth1 port 5001 connected with 192.168.1.4 port 45680 (MSS=1448) (burst-period=1.0000s) (trip-times) (sock=4) (peer 2.1.4-master) on 2021-10-26 20:25:29 (PDT) [ ID] Burst (start-end) Transfer Bandwidth XferTime (DC%) Reads=Dist NetPwr [ 1] 0.0001-0.0500 sec 40.1 KBytes 6.59 Mbits/sec 49.848 ms (5%) 12=12:0:0:0:0:0:0:0 0 [ 1] 1.0002-1.0461 sec 40.0 KBytes 7.14 Mbits/sec 45.913 ms (4.6%) 10=10:0:0:0:0:0:0:0 0 [ 1] 2.0002-2.0491 sec 40.0 KBytes 6.70 Mbits/sec 48.876 ms (4.9%) 11=11:0:0:0:0:0:0:0 0 [ 1] 3.0002-3.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.886 ms (5%) 10=10:0:0:0:0:0:0:0 0 [ 1] 4.0002-4.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.887 ms (5%) 10=10:0:0:0:0:0:0:0 0 [ 1] 5.0002-5.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.881 ms (5%) 10=10:0:0:0:0:0:0:0 0 [ 1] 6.0002-6.0511 sec 40.0 KBytes 6.44 Mbits/sec 50.895 ms (5.1%) 10=10:0:0:0:0:0:0:0 0 [ 1] 7.0002-7.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.889 ms (5%) 10=10:0:0:0:0:0:0:0 0 [ 1] 8.0002-8.0481 sec 40.0 KBytes 6.84 Mbits/sec 47.901 ms (4.8%) 11=11:0:0:0:0:0:0:0 0 [ 1] 9.0002-9.0491 sec 40.0 KBytes 6.70 Mbits/sec 48.872 ms (4.9%) 10=10:0:0:0:0:0:0:0 0 [ 1] 0.0000-10.0031 sec 400 KBytes 328 Kbits/sec 104=104:0:0:0:0:0:0:0 Bob On Tue, Oct 26, 2021 at 6:12 PM Eric Dumazet wrote: > > > On 10/26/21 4:38 PM, Christoph Paasch wrote: > > Hi Bob, > > > >> On Oct 26, 2021, at 4:23 PM, Bob McMahon > wrote: > >> I'm confused. 
I don't see any blocking nor partial writes per the > write() at the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst > is 40K, the write size is 4K and the watermark is 4 bytes. There are ten > writes per burst. > > > > You are on Linux here, right? > > > > AFAICS, Linux will still accept whatever fits in an skb. And that is > likely more than 4K (with GSO on by default). > > This (max payload per skb) can be tuned at the driver level, at least for > experimental purposes or dedicated devices. > > ip link set dev eth0 gso_max_size 8000 > > To fetch current values : > > ip -d link sh dev eth0 > > > > > > However, do you go back to select() after each write() or do you loop > over the write() calls? > > > > > > Christoph > > > >> The S8 histograms are the times waiting on the select(). The first > value is the bin number (multiplied by 100usec bin width) and second the > bin count. The worst case time is at the end and is timestamped per unix > epoch. > >> > >> The second run is over a controlled WiFi link where a 99.7% point of > 4-8ms for a WiFi TX op arbitration win is in the ballpark. The first is 1G > wired and is in the 600 usec range. (No media arbitration there.) 
> >> > >> Bob > >> > >> On Tue, Oct 26, 2021 at 11:45 AM Christoph Paasch wrote: > >> > >> Hello, > >> > >> > On Oct 25, 2021, at 9:24 PM, Eric Dumazet wrote: > >> > > >> > > >> > > >> > On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote: > >> >> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast wrote: > >> >> > >> >>> Hi All, > >> >>> > >> >>> Sorry for the spam. I'm trying to support a meaningful TCP > message latency w/iperf 2 from the sender side w/o requiring e2e clock > synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to > help with this. It seems that this event goes off when the bytes are in > flight vs have reached the destination network stack. 
If that's the case, > then iperf 2 client (sender) may be able to produce the message latency by > adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled > RTT. > >> >>> > >> >>> Does this seem reasonable? > >> >> > >> >> I’m not 100% sure what you’re asking, but I will try to help. > >> >> > >> >> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t > report your endpoint as writable (e.g., via kqueue or epoll) until less > than that threshold of data remains unsent. It won’t stop you writing more > bytes if you want to, up to the socket send buffer size, but it won’t *ask* > you for more data until the TCP_NOTSENT_LOWAT threshold is reached. > >> > > >> > > >> > When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made > sure that sendmsg() would actually > >> > stop feeding more bytes in TCP transmit queue if the current > amount of unsent bytes > >> > was above the threshold. > >> > > >> > So it looks like Apple implementation is different, based on your > description ? > >> > >> Yes, TCP_NOTSENT_LOWAT only impacts the wakeup on iOS/macOS/... > >> > >> An app can still fill the send-buffer if it does a sendmsg() with a > large buffer or does repeated calls to sendmsg(). > >> > >> For Apple, the goal of TCP_NOTSENT_LOWAT was to allow an app to > quickly change the data it "scheduled" to send. And thus allow the app to > write the smallest "logical unit" it has. If that unit is 512KB large, the > app is allowed to send that. > >> For example, in case of video-streaming one may want to skip ahead > in the video. In that case the app still needs to transmit the remaining > parts of the previous frame anyways, before it can send the new video frame. > >> That's the reason why the Apple implementation allows one to write > more than just the lowat threshold. 
> >> > >> > >> That being said, I do think that Linux's way allows for an easier > API because the app does not need to be careful at how much data it sends > after an epoll/kqueue wakeup. So, the latency-benefits will be easier to > get. > >> > >> > >> Christoph > >> > >> > >> > >> > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36 > >> > > >> > netperf does not use epoll(), but rather a loop over sendmsg(). > >> > > >> > One of the points of TCP_NOTSENT_LOWAT for Google was to be able > to considerably increase > >> > max number of bytes in transmit queues (3rd column of > /proc/sys/net/ipv4/tcp_wmem) > >> > by 10x, allowing for autotune to increase BDP for big RTT flows, > this without > >> > increasing memory needs for flows with small RTT. > >> > > >> > In other words, the TCP implementation attempts to keep BDP bytes > in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of > bytes in flight is necessary to fill the network pipe and get good > throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is > provided to give the source software some advance notice that the TCP > implementation will soon be looking for more bytes to send, so that the > buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF > option conflates both “bytes in flight” and “bytes buffered and ready to > go” into the same number.) > >> >> > >> >> If you wait for the TCP_NOTSENT_LOWAT notification, write a > chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT > notification, that will tell you roughly how long it took n bytes to depart > the machine. You won’t know why, though. The bytes could depart the machine > in response for acks indicating that the same number of bytes have been > accepted at the receiver. 
But the bytes can also depart the machine because > CWND is growing. Of course, both of those things are usually happening at > the same time. > >> >> > >> >> How to use TCP_NOTSENT_LOWAT is explained in this video: > >> >> > >> >> > > >> >> > >> >> Later in the same video is a two-minute demo (time offset 42:00 > to time offset 44:00) showing a “before and after” demo illustrating the > dramatic difference this makes for screen sharing responsiveness. > >> >> > >> >> > > >> >> > >> >> Stuart Cheshire

From eric.dumazet at gmail.com Wed Oct 27 01:40:26 2021
From: eric.dumazet at gmail.com (Eric Dumazet)
Date: Tue, 26 Oct 2021 22:40:26 -0700
Subject: [Cerowrt-devel] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
In-Reply-To:
References: <6D6492CF-BD6D-45BF-BD40-FA49166F6DA4@apple.com> <34fac143-f1be-9886-4931-65139acaca2e@gmail.com>
Message-ID: <99a1e8b4-4a98-af3f-386d-dd8a6fda0b41@gmail.com>

On 10/26/21 8:45 PM, Bob McMahon wrote: > This is linux. The code flow is burst writes until the burst size, take a timestamp, call select(), take second timestamp and insert time delta into histogram, await clock_nanosleep() to schedule the next burst. (actually, the deltas, inserts into the histogram and user i/o are done in another thread, i.e. iperf 2's reporter thread.) > > I still must be missing something. Does anything else need to be set to reduce the skb size? Everything seems to be indicating 4K writes even when gso_max_size is 2000 (I assume these are units of bytes?) There are ten writes, ten reads and ten RTTs for the bursts. I don't see partial writes at the app level.
>
> [root at localhost iperf2-code]# ip link set dev eth1 gso_max_size 2000

You could check with tcpdump on eth1, that outgoing packets are no longer 'TSO/GSO', but single MSS ones.

(Note: this device gso_max_size is only taken into account for flows established after the change)

>
> [root at localhost iperf2-code]# ip -d link sh dev eth1
> 9: eth1: mtu 1500 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000
>     link/ether 00:90:4c:40:04:59 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 2000 gso_max_segs 65535
> [root at localhost iperf2-code]# uname -r
> 5.0.9-301.fc30.x86_64
>
> It looks like RTT is being driven by WiFi TXOPs as doubling the write size increases the aggregation by two but has no significant effect on the RTTs.
>
> 4K writes: tot_mpdus 328 tot_ampdus 209 mpduperampdu 2
>
> 8k writes: tot_mpdus 317 tot_ampdus 107 mpduperampdu 3
>
> [root at localhost iperf2-code]# src/iperf -c 192.168.1.1%eth1 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001 with pid 5145 via eth1 (1 flows)
> Write buffer size: 4096 Byte
> Bursting: 40.0 KByte every 1.00 seconds
> TCP window size: 85.0 KByte (default)
> Event based writes (pending queue watermark at 4 bytes)
> Enabled select histograms bin-width=0.100 ms, bins=10000
> ------------------------------------------------------------
> [  1] local 192.168.1.4%eth1 port 45680 connected with 192.168.1.1 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.30 ms) on 2021-10-26 20:25:29 (PDT)
> [ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT        NetPwr
> [  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0          0       14K/10091 us  4
> [  1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,36:1,40:1,44:1,46:1,48:1,49:1,50:2,52:1 (5.00/95.00/99.7%=1/52/52,Outliers=0,obl/obu=0/0) (5.121 ms/1635305129.152339)
> [  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/4990 us  8
> [  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,45:1,49:5,50:1 (5.00/95.00/99.7%=1/50/50,Outliers=0,obl/obu=0/0) (4.991 ms/1635305130.153330)
> [  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/4904 us  8
> [  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,29:1,49:4,50:1,59:1,75:1 (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.455 ms/1635305131.147353)
> [  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/4964 us  8
> [  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:4,50:2,59:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.460 ms/1635305132.146338)
> [  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/4970 us  8
> [  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.404 ms/1635305133.146335)
> [  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/4986 us  8
> [  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:1,50:4,59:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.395 ms/1635305134.146343)
> [  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5059 us  8
> [  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:2,60:1,85:1 (5.00/95.00/99.7%=1/85/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343)
> [  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5407 us  8
> [  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:4,50:1,59:1,75:1 (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.428 ms/1635305136.147343)
> [  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5188 us  8
> [  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:3,50:3,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.388 ms/1635305137.146284)
> [  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5306 us  8
> [  1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:2,50:2,51:1,60:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.422 ms/1635305138.146316)
> [  1] 0.00-10.01 sec   400 KBytes   327 Kbits/sec  102/0          0       14K/5939 us  7
> [  1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:19,29:1,36:1,39:3,40:3,44:1,45:1,46:1,48:2,49:33,50:18,51:1,52:1,59:5,60:2,64:2,65:3,75:2,85:1 (5.00/95.00/99.7%=1/65/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343)
>
> [root at localhost iperf2-code]# src/iperf -s -i 1 -e -B 192.168.1.1%eth1
> ------------------------------------------------------------
> Server listening on TCP port 5001 with pid 6287
> Binding to local address 192.168.1.1 and iface eth1
> Read buffer size:  128 KByte (Dist bin width=16.0 KByte)
> TCP window size:  128 KByte (default)
> ------------------------------------------------------------
> [  1] local 192.168.1.1%eth1 port 5001 connected with 192.168.1.4 port 45680 (MSS=1448) (burst-period=1.0000s) (trip-times) (sock=4) (peer 2.1.4-master) on 2021-10-26 20:25:29 (PDT)
> [ ID] Burst (start-end)  Transfer     Bandwidth       XferTime  (DC%)     Reads=Dist          NetPwr
> [  1] 0.0001-0.0500 sec  40.1 KBytes  6.59 Mbits/sec  49.848 ms (5%)    12=12:0:0:0:0:0:0:0  0
> [  1] 1.0002-1.0461 sec  40.0 KBytes  7.14 Mbits/sec  45.913 ms (4.6%)    10=10:0:0:0:0:0:0:0  0
> [  1] 2.0002-2.0491 sec  40.0 KBytes  6.70 Mbits/sec  48.876 ms (4.9%)    11=11:0:0:0:0:0:0:0  0
> [  1] 3.0002-3.0501 sec  40.0 KBytes  6.57 Mbits/sec  49.886 ms (5%)    10=10:0:0:0:0:0:0:0  0
> [  1] 4.0002-4.0501 sec  40.0 KBytes  6.57 Mbits/sec  49.887 ms (5%)    10=10:0:0:0:0:0:0:0  0
> [  1] 5.0002-5.0501 sec  40.0 KBytes  6.57 Mbits/sec  49.881 ms (5%)    10=10:0:0:0:0:0:0:0  0
> [  1] 6.0002-6.0511 sec  40.0 KBytes  6.44 Mbits/sec  50.895 ms (5.1%)    10=10:0:0:0:0:0:0:0  0
> [  1] 7.0002-7.0501 sec  40.0 KBytes  6.57 Mbits/sec  49.889 ms (5%)    10=10:0:0:0:0:0:0:0  0
> [  1] 8.0002-8.0481 sec  40.0 KBytes  6.84 Mbits/sec  47.901 ms (4.8%)    11=11:0:0:0:0:0:0:0  0
> [  1] 9.0002-9.0491 sec  40.0 KBytes  6.70 Mbits/sec  48.872 ms (4.9%)    10=10:0:0:0:0:0:0:0  0
> [  1] 0.0000-10.0031 sec   400 KBytes   328 Kbits/sec               104=104:0:0:0:0:0:0:0
>
> Bob
>
> On Tue, Oct 26, 2021 at 6:12 PM Eric Dumazet wrote:
>
> On 10/26/21 4:38 PM, Christoph Paasch wrote:
> > Hi Bob,
> >
> >> On Oct 26, 2021, at 4:23 PM, Bob McMahon wrote:
> >> I'm confused. I don't see any blocking nor partial writes per the write() at the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst is 40K, the write size is 4K and the watermark is 4 bytes. There are ten writes per burst.
> >
> > You are on Linux here, right?
> >
> > AFAICS, Linux will still accept whatever fits in an skb. And that is likely more than 4K (with GSO on by default).
>
> This (max payload per skb) can be tuned at the driver level, at least for experimental purposes or dedicated devices.
>
> ip link set dev eth0 gso_max_size 8000
>
> To fetch current values :
>
> ip -d link sh dev eth0
>
> >
> > However, do you go back to select() after each write() or do you loop over the write() calls?
> >
> > Christoph
> >
> >> The S8 histograms are the times waiting on the select(). The first value is the bin number (multiplied by 100usec bin width) and second the bin count. The worst case time is at the end and is timestamped per unix epoch.
> >>
> >> The second run is over a controlled WiFi link where a 99.7% point of 4-8ms for a WiFi TX op arbitration win is in the ballpark. The first is 1G wired and is in the 600 usec range. (No media arbitration there.)
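
The bin encoding described above can be decoded with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, and the percentile rule (the first bin whose cumulative count exceeds the requested share of the total) is a guess that matches the 5/95/99.7% figures spot-checked in the reports in this thread. It is not a statement of how iperf 2 computes them.

```python
from fractions import Fraction


def parse_bins(spec):
    """Parse 'bin:count,bin:count,...' into a sorted list of (bin, count)."""
    pairs = []
    for item in spec.split(","):
        bin_no, count = item.split(":")
        pairs.append((int(bin_no), int(count)))
    return sorted(pairs)


def percentile_bin(pairs, percent):
    """Return the first bin whose cumulative count exceeds `percent` of the total.

    Assumed rule, chosen for illustration. Fractions avoid float rounding
    at exact boundaries such as 95% of 100 samples.
    """
    total = sum(count for _, count in pairs)
    threshold = Fraction(str(percent)) / 100 * total
    running = 0
    for bin_no, count in pairs:
        running += count
        if running > threshold:
            return bin_no
    return pairs[-1][0]


def bin_upper_edge_ms(bin_no, width_us=100):
    """Upper edge of a bin in milliseconds (bin number times bin width)."""
    return bin_no * width_us / 1000.0


first_wifi_interval = parse_bins("1:1,36:1,40:1,44:1,46:1,48:1,49:1,50:2,52:1")
summary = [percentile_bin(first_wifi_interval, p) for p in (5, 95, 99.7)]
```

For that first WiFi interval, summary is [1, 52, 52], and bin 52 covers waits of up to 5.2 ms, which is in line with the reported worst case of 5.121 ms.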

From cpaasch at apple.com Tue Oct 26 14:45:24 2021
From: cpaasch at apple.com (Christoph Paasch)
Date: Tue, 26 Oct 2021 11:45:24 -0700
Subject: [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
In-Reply-To: <0e29e225-9f55-4392-640a-2d27c4c26116@gmail.com>
References: <1625188609.32718319@apps.rackspace.com> <989de0c1-e06c-cda9-ebe6-1f33df8a4c24@candelatech.com> <1625773080.94974089@apps.rackspace.com> <1625859083.09751240@apps.rackspace.com> <257851.1632110422@turing-police> <1632680642.869711321@apps.rackspace.com> <0e29e225-9f55-4392-640a-2d27c4c26116@gmail.com>
Message-ID: <4BFB5A37-9574-49BE-B083-FBC1F2B0381E@apple.com>

Hello,

> On Oct 25, 2021, at 9:24 PM, Eric Dumazet wrote:
>
> On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote:
>> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast wrote:
>>
>>> Hi All,
>>>
>>> Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight vs have reached the destination network stack. If that's the case, then iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.
>>>
>>> Does this seem reasonable?
>>
>> I’m not 100% sure what you’re asking, but I will try to help.
>>
>> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your endpoint as writable (e.g., via kqueue or epoll) until less than that threshold of data remains unsent. It won’t stop you writing more bytes if you want to, up to the socket send buffer size, but it won’t *ask* you for more data until the TCP_NOTSENT_LOWAT threshold is reached.
>
> When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure that sendmsg() would actually stop feeding more bytes in TCP transmit queue if the current amount of unsent bytes was above the threshold.
>
> So it looks like Apple implementation is different, based on your description ?

Yes, TCP_NOTSENT_LOWAT only impacts the wakeup on iOS/macOS/...

An app can still fill the send-buffer if it does a sendmsg() with a large buffer or does repeated calls to sendmsg().

For Apple, the goal of TCP_NOTSENT_LOWAT was to allow an app to quickly change the data it "scheduled" to send, and thus to allow the app to write the smallest "logical unit" it has. If that unit is 512KB large, the app is allowed to send that.
For example, in the case of video streaming one may want to skip ahead in the video. In that case the app still needs to transmit the remaining parts of the previous frame anyway, before it can send the new video frame.
That's the reason why the Apple implementation allows one to write more than just the lowat threshold.

That being said, I do think that Linux's way allows for an easier API because the app does not need to be careful about how much data it sends after an epoll/kqueue wakeup. So, the latency benefits will be easier to get.
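
A minimal Python sketch of the sender-side pattern discussed here: set TCP_NOTSENT_LOWAT, wait until the socket is reported writable, then hand over one logical unit. The helper names set_notsent_lowat and write_one_unit are hypothetical, and the fallback option number 25 is the Linux value from linux/tcp.h, assumed only for Python builds on Linux that lack socket.TCP_NOTSENT_LOWAT. Whether a write larger than the threshold is accepted in one go after the wakeup depends on the platform, as described above.

```python
import select
import socket
import sys


def set_notsent_lowat(sock, nbytes):
    """Try to set TCP_NOTSENT_LOWAT; return True if the kernel accepted it."""
    opt = getattr(socket, "TCP_NOTSENT_LOWAT", None)
    if opt is None and sys.platform.startswith("linux"):
        opt = 25  # assumed Linux option number, used only as a fallback
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, nbytes)
    except OSError:
        return False  # option not supported for this socket or platform
    return True


def write_one_unit(sock, unit, timeout=None):
    """Wait until the socket is reported writable, then send one unit."""
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        return False  # timed out before the wakeup
    sock.sendall(unit)
    return True
```

Keeping each unit small relative to the threshold is what lets the app change its mind about the next unit at the last moment, on either platform.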

Christoph

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36
>
> netperf does not use epoll(), but rather a loop over sendmsg().
>
> One of the points of TCP_NOTSENT_LOWAT for Google was to be able to considerably increase the max number of bytes in transmit queues (3rd column of /proc/sys/net/ipv4/tcp_wmem) by 10x, allowing for autotune to increase BDP for big RTT flows, this without increasing memory needs for flows with small RTT.
>
>> In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.)
>>
>> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though. The bytes could depart the machine in response to acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time.
>>
>> How to use TCP_NOTSENT_LOWAT is explained in this video:
>>
>> Later in the same video is a two-minute demo (time offset 42:00 to time offset 44:00) showing a “before and after” demo illustrating the dramatic difference this makes for screen sharing responsiveness.
>>
>> Stuart Cheshire
>> _______________________________________________
>> Bloat mailing list
>> Bloat at lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
>>
> _______________________________________________
> Bloat mailing list
> Bloat at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

From cpaasch at apple.com Tue Oct 26 19:38:15 2021
From: cpaasch at apple.com (Christoph Paasch)
Date: Tue, 26 Oct 2021 16:38:15 -0700
Subject: [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
In-Reply-To:
References:
Message-ID: <6D6492CF-BD6D-45BF-BD40-FA49166F6DA4@apple.com>

Hi Bob,

> On Oct 26, 2021, at 4:23 PM, Bob McMahon wrote:
> I'm confused. I don't see any blocking nor partial writes per the write() at the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst is 40K, the write size is 4K and the watermark is 4 bytes. There are ten writes per burst.

You are on Linux here, right?

AFAICS, Linux will still accept whatever fits in an skb. And that is likely more than 4K (with GSO on by default).

However, do you go back to select() after each write() or do you loop over the write() calls?

Christoph

> The S8 histograms are the times waiting on the select(). The first value is the bin number (multiplied by 100usec bin width) and second the bin count. The worst case time is at the end and is timestamped per unix epoch.
>
> The second run is over a controlled WiFi link where a 99.7% point of 4-8ms for a WiFi TX op arbitration win is in the ballpark. The first is 1G wired and is in the 600 usec range. (No media arbitration there.)
>
> [root at localhost iperf2-code]# src/iperf -c 10.19.87.9 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second
> ------------------------------------------------------------
> Client connecting to 10.19.87.9, TCP port 5001 with pid 2124 (1 flows)
> Write buffer size: 4096 Byte
> Bursting: 40.0 KByte every 1.00 seconds
> TCP window size: 85.0 KByte (default)
> Event based writes (pending queue watermark at 4 bytes)
> Enabled select histograms bin-width=0.100 ms, bins=10000
> ------------------------------------------------------------
> [  1] local 10.19.87.10%eth0 port 33166 connected with 10.19.87.9 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=0.54 ms) on 2021-10-26 16:07:33 (PDT)
> [ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT        NetPwr
> [  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0          0       14K/5368 us  8
> [  1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,2:5,3:2,4:1,11:1 (5.00/95.00/99.7%=1/11/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360)
> [  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/569 us  72
> [  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:1,3:4,4:1,7:1,8:1 (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.736 ms/1635289654.928088)
> [  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/312 us  131
> [  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:2,5:2,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.548 ms/1635289655.927776)
> [  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/302 us  136
> [  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:2,3:5,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.584 ms/1635289656.927814)
> [  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/316 us  130
> [  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:2,4:2,5:2,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.572 ms/1635289657.927810)
> [  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/253 us  162
> [  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:4,5:1 (5.00/95.00/99.7%=1/5/5,Outliers=0,obl/obu=0/0) (0.417 ms/1635289658.927630)
> [  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/290 us  141
> [  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:3,4:3,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.573 ms/1635289659.927771)
> [  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/359 us  114
> [  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,3:4,4:3,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.570 ms/1635289660.927753)
> [  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/349 us  117
> [  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:5,4:1,7:1 (5.00/95.00/99.7%=1/7/7,Outliers=0,obl/obu=0/0) (0.608 ms/1635289661.927843)
> [  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/347 us  118
> [  1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:1,3:5,8:1 (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.725 ms/1635289662.927861)
> [  1] 0.00-10.01 sec   400 KBytes   327 Kbits/sec  102/0          0       14K/1519 us  27
> [  1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:25,2:13,3:36,4:11,5:5,6:5,7:2,8:2,11:1 (5.00/95.00/99.7%=1/7/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360)
>
> [root at localhost iperf2-code]# src/iperf -c 192.168.1.1 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001 with pid 2131 (1 flows)
> Write buffer size: 4096 Byte
> Bursting: 40.0 KByte every 1.00 seconds
> TCP window size: 85.0 KByte (default)
> Event based writes (pending queue watermark at 4 bytes)
> Enabled select histograms bin-width=0.100 ms, bins=10000
> ------------------------------------------------------------
> [  1] local 192.168.1.4%eth1 port 45518 connected with 192.168.1.1 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.48 ms) on 2021-10-26 16:07:56 (PDT)
> [ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT        NetPwr
> [  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0          0       14K/10339 us  4
> [  1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,40:1,47:1,49:2,50:3,51:1,60:1 (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.990 ms/1635289676.802143)
> [  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/4853 us  8
> [  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,38:1,39:1,44:1,45:1,49:1,51:1,52:1,60:1 (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.937 ms/1635289677.802274)
> [  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/4991 us  8
> [  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:2,50:2,51:1,60:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.307 ms/1635289678.794326)
> [  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/4610 us  9
> [  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:3,56:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.362 ms/1635289679.794335)
> [  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5028 us  8
> [  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.367 ms/1635289680.794399)
> [  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5113 us  8
> [  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:2,58:1,60:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.442 ms/1635289681.794392)
> [  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5054 us  8
> [  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,51:1,60:2,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.374 ms/1635289682.794335)
> [  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5138 us  8
> [  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:2,40:1,49:2,50:1,60:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.396 ms/1635289683.794338)
> [  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5329 us  8
> [  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,38:1,45:2,49:1,50:3,63:1 (5.00/95.00/99.7%=1/63/63,Outliers=0,obl/obu=0/0) (6.292 ms/1635289684.794262)
> [  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       14K/5329 us  8
> [  1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:3,84:1 (5.00/95.00/99.7%=1/84/84,Outliers=0,obl/obu=0/0) (8.306 ms/1635289685.796315)
> [  1] 0.00-10.01 sec   400 KBytes   327 Kbits/sec  102/0          0       14K/6331 us  6
> [  1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:19,38:2,39:5,40:2,44:1,45:3,47:1,48:1,49:26,50:17,51:4,52:1,56:1,58:1,59:1,60:7,63:1,64:5,65:1,84:1 (5.00/95.00/99.7%=1/64/84,Outliers=0,obl/obu=0/0) (8.306 ms/1635289685.796315)
>
> Bob
>
> On Tue, Oct 26, 2021 at 11:45 AM Christoph Paasch wrote:
> Hello,
>
> > On Oct 25, 2021, at 9:24 PM, Eric Dumazet wrote:
> >
> > On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote:
> >> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast wrote:
> >>
> >>> Hi All,
> >>>
> >>> Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight vs have reached the destination network stack. If that's the case, then iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.
> >>>
> >>> Does this seem reasonable?
> >>
> >> I’m not 100% sure what you’re asking, but I will try to help.
> >>
> >> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your endpoint as writable (e.g., via kqueue or epoll) until less than that threshold of data remains unsent.
It won’t stop you writing more bytes if you want to, up to the socket send buffer size, but it won’t *ask* you for more data until the TCP_NOTSENT_LOWAT threshold is reached. > > > > > > When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure that sendmsg() would actually > > stop feeding more bytes in TCP transmit queue if the current amount of unsent bytes > > was above the threshold. > > > > So it looks like Apple implementation is different, based on your description ? > > Yes, TCP_NOTSENT_LOWAT only impacts the wakeup on iOS/macOS/... > > An app can still fill the send-buffer if it does a sendmsg() with a large buffer or does repeated calls to sendmsg(). > > For Apple, the goal of TCP_NOTSENT_LOWAT was to allow an app to quickly change the data it "scheduled" to send. And thus allow the app to write the smallest "logical unit" it has. If that unit is 512KB large, the app is allowed to send that. > For example, in case of video-streaming one may want to skip ahead in the video. In that case the app still needs to transmit the remaining parts of the previous frame anyways, before it can send the new video frame. > That's the reason why the Apple implementation allows one to write more than just the lowat threshold. > > > That being said, I do think that Linux's way allows for an easier API because the app does not need to be careful about how much data it sends after an epoll/kqueue wakeup. So, the latency-benefits will be easier to get. > > > Christoph > > > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36 > > > > netperf does not use epoll(), but rather a loop over sendmsg().
> > > > One of the point of TCP_NOTSENT_LOWAT for Google was to be able to considerably increase > > max number of bytes in transmit queues (3rd column of /proc/sys/net/ipv4/tcp_wmem) > > by 10x, allowing for autotune to increase BDP for big RTT flows, this without > > increasing memory needs for flows with small RTT. > > > > In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.) > >> > >> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though. The bytes could depart the machine in response for acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time. > >> > >> How to use TCP_NOTSENT_LOWAT is explained in this video: > >> > >> > > >> > >> Later in the same video is a two-minute demo (time offset 42:00 to time offset 44:00) showing a “before and after” demo illustrating the dramatic difference this makes for screen sharing responsiveness. 
> >> > >> > > >> > >> Stuart Cheshire > >> _______________________________________________ > >> Bloat mailing list > >> Bloat at lists.bufferbloat.net > >> https://lists.bufferbloat.net/listinfo/bloat > >> > > _______________________________________________ > > Bloat mailing list > > Bloat at lists.bufferbloat.net > > https://lists.bufferbloat.net/listinfo/bloat > > > This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it. From moeller0 at gmx.de Wed Oct 27 10:29:11 2021 From: moeller0 at gmx.de (Sebastian Moeller) Date: Wed, 27 Oct 2021 16:29:11 +0200 Subject: [Cerowrt-devel] [Make-wifi-fast] [Starlink] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency In-Reply-To: References: <1625188609.32718319@apps.rackspace.com> <989de0c1-e06c-cda9-ebe6-1f33df8a4c24@candelatech.com> <1625773080.94974089@apps.rackspace.com> <1625859083.09751240@apps.rackspace.com> <257851.1632110422@turing-police> <1632680642.869711321@apps.rackspace.com> Message-ID: <77662A26-2CDC-4201-8148-6C0702702447@gmx.de> Hi Bob, OWD != RTT/2 seems generically to be the rule on the internet not the exception, even with perfectly symmetric access links.
Routing between AS often is asymmetric in itself (hot potato routing, where each AS hands over packets destined to others as early as possible, means that forward and backward path are often noticeably different; or rather they are different but that is hard to notice unless one can get path measurements like traceroutes from both directions). That last point is what makes me believe that internet speedtests should always also include traceroutes from server to client and from client to server so one at least has a rough idea where the packets are going, but I digress... Regards Sebastian > On Oct 26, 2021, at 19:23, Bob McMahon via Make-wifi-fast wrote: > > Hi Bjørn, > > I find, when possible, it's preferred to take telemetry data of actual traffic (or reads and writes) vs a proxy. We had a case where TCP BE was outperforming TCP w/VI because BE had the most engineering resources assigned to it and engineers did a better job with BE. Using a proxy protocol wouldn't have exercised the same logic paths (in this case it was in the L2 driver) as TCP did. Hence, measuring actual TCP traffic (or socket reads and socket writes) was needed to flush out the problem. Note: I also find that network engineers tend to focus on the stack but it's the e2e at the application level that impacts user experience. Send side bloat can drive the OWD while the TCP stack's RTT may look fine. For WiFi test & measurements, we've decided most testing should be using TCP_NOTSENT_LOWAT because it helps mitigate send side bloat which WiFi engineering doesn't focus on per lack of ability to impact. > > Also, I think OWD is under tested and two way based testing can give incomplete and inaccurate information, particularly with respect to things like an e2e transport's control loop. A most obvious example is assuming 1/2 RTT is the same as OWD to/fro. For WiFi this assumption is almost always false. It is also false for many residential internet connections where OWD asymmetry is designed in.
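The point about RTT/2 versus one-way delay can be made concrete with a little arithmetic. The following is only an illustrative sketch: the function name and the 9 ms / 3 ms delays are hypothetical and are not measurements from this thread.

```python
def half_rtt_error_ms(owd_forward_ms, owd_reverse_ms):
    """Return (RTT, RTT/2, forward OWD minus RTT/2), all in milliseconds."""
    rtt = owd_forward_ms + owd_reverse_ms
    half = rtt / 2
    return rtt, half, owd_forward_ms - half


# Hypothetical asymmetric path: 9 ms toward the receiver, 3 ms back.
rtt, half, err = half_rtt_error_ms(9.0, 3.0)
print(f"RTT={rtt} ms, RTT/2={half} ms, forward OWD underestimated by {err} ms")
```

The error is zero only when both directions have the same delay, which is the assumption the messages above say rarely holds.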
> > Bob > > > On Tue, Oct 26, 2021 at 3:04 AM Bjørn Ivar Teigen wrote: > Hi Bob, > > My name is Bjørn Ivar Teigen and I'm working on modeling and measuring WiFi MAC-layer protocol performance for my PhD. > > Is it necessary to measure the latency using the TCP stream itself? I had a similar problem in the past, and solved it by doing the latency measurements using TWAMP running alongside the TCP traffic. The requirement for this to work is that the TWAMP packets are placed in the same queue(s) as the TCP traffic, and that the impact of measurement traffic is small enough so as not to interfere too much with your TCP results. > Just my two cents, hope it's helpful. > > Bjørn > > On Tue, 26 Oct 2021 at 06:32, Bob McMahon wrote: > Thanks Stuart, this is helpful. I'm measuring the time just before the first write() (of potentially a burst of writes to achieve a burst size) per a socket fd's select event occurring when TCP_NOTSENT_LOWAT is set to a small value, then sampling the RTT and CWND and providing histograms for all three, all on that event. I'm not sure of the correctness of RTT and CWND at this sample point. This is a controlled test over 802.11ax and OFDMA where the TCP acks per the WiFi clients are being scheduled by the AP using 802.11ax trigger frames so the AP is affecting the end/end BDP per scheduling the transmits and the acks. The AP can grow the BDP or shrink it based on these scheduling decisions. From there we're trying to maximize network power (throughput/delay) for elephant flows and just latency for mouse flows. (We also plan some RF frequency stuff to per OFDMA) Anyway, the AP based scheduling along with aggregation and OFDMA makes WiFi scheduling optimums non-obvious - at least to me - and I'm trying to provide insights into how an AP is affecting end/end performance. > > The more direct approach for e2e TCP latency and network power has been to measure first write() to final read() and compute the e2e delay.
This requires clock sync on the ends. (We're using ptp4l with GPS OCXO atomic references for that but this is typically only available in some labs.) > > Bob > > > On Mon, Oct 25, 2021 at 8:11 PM Stuart Cheshire wrote: > On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast wrote: > > > Hi All, > > > > Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight vs have reached the destination network stack. If that's the case, then iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT. > > > > Does this seem reasonable? > > I’m not 100% sure what you’re asking, but I will try to help. > > When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your endpoint as writable (e.g., via kqueue or epoll) until less than that threshold of data remains unsent. It won’t stop you writing more bytes if you want to, up to the socket send buffer size, but it won’t *ask* you for more data until the TCP_NOTSENT_LOWAT threshold is reached. In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.) 
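For readers who want to try the option described above, here is a minimal sketch in Python of setting TCP_NOTSENT_LOWAT and timing a write-then-select cycle of the kind discussed in this thread. It is not iperf 2's code: the function names are hypothetical, and the fallback option number 25 is an assumption that applies to Linux only.

```python
import select
import socket
import time

# Assumption: 25 is the Linux option number, used only if this Python
# build does not expose socket.TCP_NOTSENT_LOWAT.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)


def set_notsent_lowat(sock, threshold_bytes):
    """Set the unsent-bytes watermark and return the value the kernel reports."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, threshold_bytes)
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT)


def timed_burst(sock, burst_bytes, write_bytes, timeout=5.0):
    """Write one burst in fixed-size chunks, then time the select() wait."""
    remaining = burst_bytes
    while remaining > 0:
        n = min(write_bytes, remaining)
        sock.sendall(b"\0" * n)  # blocking write of one chunk
        remaining -= n
    start = time.monotonic()
    # select() returns once the socket is reported writable again,
    # or after `timeout` seconds with an empty writable list.
    _, writable, _ = select.select([], [sock], [], timeout)
    return time.monotonic() - start, bool(writable)
```

On a connected TCP socket, `set_notsent_lowat(sock, 4)` followed by `timed_burst(sock, 40 * 1024, 4096)` roughly corresponds to the 4-byte watermark, 4K writes and 40K bursts used in the runs quoted in this thread. As the messages here note, Linux also stops sendmsg() from queuing past the threshold, while on iOS/macOS the option only affects the wakeup, so the measured wait may mean different things on the two platforms.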
> > If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though. The bytes could depart the machine in response for acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time. > > How to use TCP_NOTSENT_LOWAT is explained in this video: > > > > Later in the same video is a two-minute demo (time offset 42:00 to time offset 44:00) showing a “before and after” demo illustrating the dramatic difference this makes for screen sharing responsiveness. > > > > Stuart Cheshire > >
_______________________________________________ > Starlink mailing list > Starlink at lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/starlink > > > -- > Bjørn Ivar Teigen > Head of Research > +47 47335952 | bjorn at domos.no | www.domos.no > WiFi Slicing by Domos > > _______________________________________________ > Make-wifi-fast mailing list > Make-wifi-fast at lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/make-wifi-fast From cpaasch at apple.com Thu Oct 28 12:04:35 2021 From: cpaasch at apple.com (Christoph Paasch) Date: Thu, 28 Oct 2021 09:04:35 -0700 Subject: [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency In-Reply-To: References: <6D6492CF-BD6D-45BF-BD40-FA49166F6DA4@apple.com> <34fac143-f1be-9886-4931-65139acaca2e@gmail.com> Message-ID: > On Oct 26, 2021, at 8:45 PM, Bob McMahon wrote: > > This is linux. The code flow is burst writes until the burst size, take a timestamp, call select(), take second timestamp and insert time delta into histogram, await clock_nanosleep() to schedule the next burst.
(actually, the deltas, inserts into the histogram and user i/o are done in another thread, i.e. iperf 2's reporter thread.) > I still must be missing something. Does anything else need to be set to reduce the skb size? Everything seems to be indicating 4K writes even when gso_max_size is 2000 (I assume these are units of bytes?) There are ten writes, ten reads and ten RTTs for the bursts. I don't see partial writes at the app level. One thing to keep in mind is that once the congestion-window increased to > 40KB (your burst-size), all of the writes will not be blocking at all. TCP_NOTSENT_LOWAT is really just about the "notsent" part. Once the congestion-window is big enough to send 40KB in a burst, it will just all be immediately sent out. > [root at localhost iperf2-code]# ip link set dev eth1 gso_max_size 2000 > [root at localhost iperf2-code]# ip -d link sh dev eth1 > 9: eth1: mtu 1500 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000 > link/ether 00:90:4c:40:04:59 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 2000 gso_max_segs 65535 > [root at localhost iperf2-code]# uname -r > 5.0.9-301.fc30.x86_64 > > It looks like RTT is being driven by WiFi TXOPs as doubling the write size increases the aggregation by two but has no significant effect on the RTTs. 
> > 4K writes: tot_mpdus 328 tot_ampdus 209 mpduperampdu 2 > > 8k writes: tot_mpdus 317 tot_ampdus 107 mpduperampdu 3 > > [root at localhost iperf2-code]# src/iperf -c 192.168.1.1%eth1 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms > WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second > ------------------------------------------------------------ > Client connecting to 192.168.1.1, TCP port 5001 with pid 5145 via eth1 (1 flows) > Write buffer size: 4096 Byte > Bursting: 40.0 KByte every 1.00 seconds > TCP window size: 85.0 KByte (default) > Event based writes (pending queue watermark at 4 bytes) > Enabled select histograms bin-width=0.100 ms, bins=10000 > ------------------------------------------------------------ > [ 1] local 192.168.1.4%eth1 port 45680 connected with 192.168.1.1 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.30 ms) on 2021-10-26 20:25:29 (PDT) > [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr > [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 14K/10091 us 4 > [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,36:1,40:1,44:1,46:1,48:1,49:1,50:2,52:1 (5.00/95.00/99.7%=1/52/52,Outliers=0,obl/obu=0/0) (5.121 ms/1635305129.152339) Am I reading this correctly, that your writes take worst-case 5 milli-seconds ? This looks correct then, because you seem to have an RTT of around 5ms. It's surprising though that your congestion-window is not increasing. 
Christoph > [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4990 us 8 > [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,45:1,49:5,50:1 (5.00/95.00/99.7%=1/50/50,Outliers=0,obl/obu=0/0) (4.991 ms/1635305130.153330) > [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4904 us 8 > [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,29:1,49:4,50:1,59:1,75:1 (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.455 ms/1635305131.147353) > [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4964 us 8 > [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:4,50:2,59:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.460 ms/1635305132.146338) > [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4970 us 8 > [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.404 ms/1635305133.146335) > [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4986 us 8 > [ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:1,50:4,59:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.395 ms/1635305134.146343) > [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5059 us 8 > [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:2,60:1,85:1 (5.00/95.00/99.7%=1/85/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343) > [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5407 us 8 > [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:4,50:1,59:1,75:1 (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.428 ms/1635305136.147343) > [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5188 us 8 > [ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:3,50:3,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.388 ms/1635305137.146284) > [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5306 us 8 > [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:2,50:2,51:1,60:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) 
(6.422 ms/1635305138.146316) > [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 14K/5939 us 7 > [ 1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:19,29:1,36:1,39:3,40:3,44:1,45:1,46:1,48:2,49:33,50:18,51:1,52:1,59:5,60:2,64:2,65:3,75:2,85:1 (5.00/95.00/99.7%=1/65/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343) > > [root at localhost iperf2-code]# src/iperf -s -i 1 -e -B 192.168.1.1%eth1 > ------------------------------------------------------------ > Server listening on TCP port 5001 with pid 6287 > Binding to local address 192.168.1.1 and iface eth1 > Read buffer size: 128 KByte (Dist bin width=16.0 KByte) > TCP window size: 128 KByte (default) > ------------------------------------------------------------ > [ 1] local 192.168.1.1%eth1 port 5001 connected with 192.168.1.4 port 45680 (MSS=1448) (burst-period=1.0000s) (trip-times) (sock=4) (peer 2.1.4-master) on 2021-10-26 20:25:29 (PDT) > [ ID] Burst (start-end) Transfer Bandwidth XferTime (DC%) Reads=Dist NetPwr > [ 1] 0.0001-0.0500 sec 40.1 KBytes 6.59 Mbits/sec 49.848 ms (5%) 12=12:0:0:0:0:0:0:0 0 > [ 1] 1.0002-1.0461 sec 40.0 KBytes 7.14 Mbits/sec 45.913 ms (4.6%) 10=10:0:0:0:0:0:0:0 0 > [ 1] 2.0002-2.0491 sec 40.0 KBytes 6.70 Mbits/sec 48.876 ms (4.9%) 11=11:0:0:0:0:0:0:0 0 > [ 1] 3.0002-3.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.886 ms (5%) 10=10:0:0:0:0:0:0:0 0 > [ 1] 4.0002-4.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.887 ms (5%) 10=10:0:0:0:0:0:0:0 0 > [ 1] 5.0002-5.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.881 ms (5%) 10=10:0:0:0:0:0:0:0 0 > [ 1] 6.0002-6.0511 sec 40.0 KBytes 6.44 Mbits/sec 50.895 ms (5.1%) 10=10:0:0:0:0:0:0:0 0 > [ 1] 7.0002-7.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.889 ms (5%) 10=10:0:0:0:0:0:0:0 0 > [ 1] 8.0002-8.0481 sec 40.0 KBytes 6.84 Mbits/sec 47.901 ms (4.8%) 11=11:0:0:0:0:0:0:0 0 > [ 1] 9.0002-9.0491 sec 40.0 KBytes 6.70 Mbits/sec 48.872 ms (4.9%) 10=10:0:0:0:0:0:0:0 0 > [ 1] 0.0000-10.0031 sec 400 KBytes 328 Kbits/sec 104=104:0:0:0:0:0:0:0 > > Bob > > On Tue, Oct 26, 
2021 at 6:12 PM Eric Dumazet wrote: > > > On 10/26/21 4:38 PM, Christoph Paasch wrote: > > Hi Bob, > > > >> On Oct 26, 2021, at 4:23 PM, Bob McMahon > wrote: > >> I'm confused. I don't see any blocking nor partial writes per the write() at the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst is 40K, the write size is 4K and the watermark is 4 bytes. There are ten writes per burst. > > > > You are on Linux here, right? > > > > AFAICS, Linux will still accept whatever fits in an skb. And that is likely more than 4K (with GSO on by default). > > This (max payload per skb) can be tuned at the driver level, at least for experimental purposes or dedicated devices. > > ip link set dev eth0 gso_max_size 8000 > > To fetch current values : > > ip -d link sh dev eth0 > > > > > > However, do you go back to select() after each write() or do you loop over the write() calls? > > > > > > Christoph > > > >> The S8 histograms are the times waiting on the select(). The first value is the bin number (multiplied by 100usec bin width) and second the bin count. The worst case time is at the end and is timestamped per unix epoch. > >> > >> The second run is over a controlled WiFi link where a 99.7% point of 4-8ms for a WiFi TX op arbitration win is in the ballpark. The first is 1G wired and is in the 600 usec range. (No media arbitration there.) 
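The S8 histogram format described above (bin number, then count, with the worst-case time and its epoch timestamp at the end) is easy to unpack mechanically. The sketch below is an assumption-laden helper, not part of iperf 2: the function name, the regular expressions and the returned dictionary layout are hypothetical, and it converts each bin to microseconds by multiplying the bin number by the bin width, as the description says.

```python
import re

# Matches e.g. "bin(w=100us):cnt(10)=1:1,2:5,3:2,4:1,11:1"
_PDF = re.compile(r"bin\(w=(\d+)us\):cnt\((\d+)\)=([\d:,]+)")
# Matches the trailing "(1.089 ms/1635289653.928360)"
_WORST = re.compile(r"\(([\d.]+) ms/([\d.]+)\)")


def parse_select_pdf(line):
    """Parse one S8-PDF report line into bin width, counts and worst case."""
    pdf = _PDF.search(line)
    worst = _WORST.search(line)
    if pdf is None or worst is None:
        raise ValueError("not an S8-PDF line")
    width_us = int(pdf.group(1))
    bins_us = {}
    for pair in pdf.group(3).split(","):
        bin_number, count = pair.split(":")
        # bin number multiplied by the bin width, in microseconds
        bins_us[int(bin_number) * width_us] = int(count)
    return {
        "width_us": width_us,
        "count": int(pdf.group(2)),
        "bins_us": bins_us,
        "worst_ms": float(worst.group(1)),
        "worst_epoch": float(worst.group(2)),
    }
```

Applied to the first wired interval quoted in this thread, the parsed counts sum to the reported `cnt(10)`, and the single sample in bin 11 lines up with the 1.089 ms worst case.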
> >> > >> [root at localhost iperf2-code]# src/iperf -c 10.19.87.9 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms > >> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second > >> ------------------------------------------------------------ > >> Client connecting to 10.19.87.9, TCP port 5001 with pid 2124 (1 flows) > >> Write buffer size: 4096 Byte > >> Bursting: 40.0 KByte every 1.00 seconds > >> TCP window size: 85.0 KByte (default) > >> Event based writes (pending queue watermark at 4 bytes) > >> Enabled select histograms bin-width=0.100 ms, bins=10000 > >> ------------------------------------------------------------ > >> [ 1] local 10.19.87.10%eth0 port 33166 connected with 10.19.87.9 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=0.54 ms) on 2021-10-26 16:07:33 (PDT) > >> [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr > >> [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 14K/5368 us 8 > >> [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,2:5,3:2,4:1,11:1 (5.00/95.00/99.7%=1/11/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360) > >> [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/569 us 72 > >> [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:1,3:4,4:1,7:1,8:1 (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.736 ms/1635289654.928088) > >> [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/312 us 131 > >> [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:2,5:2,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.548 ms/1635289655.927776) > >> [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/302 us 136 > >> [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:2,3:5,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.584 ms/1635289656.927814) > >> [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/316 us 130 > >> [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:2,4:2,5:2,6:1 
(5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.572 ms/1635289657.927810) > >> [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/253 us 162 > >> [ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:4,5:1 (5.00/95.00/99.7%=1/5/5,Outliers=0,obl/obu=0/0) (0.417 ms/1635289658.927630) > >> [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/290 us 141 > >> [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:3,4:3,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.573 ms/1635289659.927771) > >> [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/359 us 114 > >> [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,3:4,4:3,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.570 ms/1635289660.927753) > >> [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/349 us 117 > >> [ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:5,4:1,7:1 (5.00/95.00/99.7%=1/7/7,Outliers=0,obl/obu=0/0) (0.608 ms/1635289661.927843) > >> [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/347 us 118 > >> [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:1,3:5,8:1 (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.725 ms/1635289662.927861) > >> [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 14K/1519 us 27 > >> [ 1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:25,2:13,3:36,4:11,5:5,6:5,7:2,8:2,11:1 (5.00/95.00/99.7%=1/7/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360) > >> > >> [root at localhost iperf2-code]# src/iperf -c 192.168.1.1 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms > >> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second > >> ------------------------------------------------------------ > >> Client connecting to 192.168.1.1, TCP port 5001 with pid 2131 (1 flows) > >> Write buffer size: 4096 Byte > >> Bursting: 40.0 KByte every 1.00 seconds > >> TCP window size: 85.0 KByte (default) > >> Event based writes (pending queue watermark at 4 bytes) > >> Enabled 
select histograms bin-width=0.100 ms, bins=10000 > >> ------------------------------------------------------------ > >> [ 1] local 192.168.1.4%eth1 port 45518 connected with 192.168.1.1 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.48 ms) on 2021-10-26 16:07:56 (PDT) > >> [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr > >> [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 14K/10339 us 4 > >> [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,40:1,47:1,49:2,50:3,51:1,60:1 (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.990 ms/1635289676.802143) > >> [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4853 us 8 > >> [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,38:1,39:1,44:1,45:1,49:1,51:1,52:1,60:1 (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.937 ms/1635289677.802274) > >> [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4991 us 8 > >> [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:2,50:2,51:1,60:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.307 ms/1635289678.794326) > >> [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4610 us 9 > >> [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:3,56:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.362 ms/1635289679.794335) > >> [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5028 us 8 > >> [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.367 ms/1635289680.794399) > >> [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5113 us 8 > >> [ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:2,58:1,60:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.442 ms/1635289681.794392) > >> [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5054 us 8 > >> [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,51:1,60:2,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.374 ms/1635289682.794335) > >> [ 1] 
7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5138 us 8 > >> [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:2,40:1,49:2,50:1,60:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.396 ms/1635289683.794338) > >> [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5329 us 8 > >> [ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,38:1,45:2,49:1,50:3,63:1 (5.00/95.00/99.7%=1/63/63,Outliers=0,obl/obu=0/0) (6.292 ms/1635289684.794262) > >> [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5329 us 8 > >> [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:3,84:1 (5.00/95.00/99.7%=1/84/84,Outliers=0,obl/obu=0/0) (8.306 ms/1635289685.796315) > >> [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 14K/6331 us 6 > >> [ 1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:19,38:2,39:5,40:2,44:1,45:3,47:1,48:1,49:26,50:17,51:4,52:1,56:1,58:1,59:1,60:7,63:1,64:5,65:1,84:1 (5.00/95.00/99.7%=1/64/84,Outliers=0,obl/obu=0/0) (8.306 ms/1635289685.796315) > >> > >> Bob > >> > >> On Tue, Oct 26, 2021 at 11:45 AM Christoph Paasch > wrote: > >> > >> Hello, > >> > >> > On Oct 25, 2021, at 9:24 PM, Eric Dumazet > wrote: > >> > > >> > > >> > > >> > On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote: > >> >> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast > wrote: > >> >> > >> >>> Hi All, > >> >>> > >> >>> Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight vs have reached the destination network stack. If that's the case, then iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT. > >> >>> > >> >>> Does this seem reasonable? > >> >> > >> >> I’m not 100% sure what you’re asking, but I will try to help. 
> >> >> > >> >> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your endpoint as writable (e.g., via kqueue or epoll) until less than that threshold of data remains unsent. It won’t stop you writing more bytes if you want to, up to the socket send buffer size, but it won’t *ask* you for more data until the TCP_NOTSENT_LOWAT threshold is reached. > >> > > >> > > >> > When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure that sendmsg() would actually > >> > stop feeding more bytes in TCP transmit queue if the current amount of unsent bytes > >> > was above the threshold. > >> > > >> > So it looks like the Apple implementation is different, based on your description? > >> > >> Yes, TCP_NOTSENT_LOWAT only impacts the wakeup on iOS/macOS/... > >> > >> An app can still fill the send-buffer if it does a sendmsg() with a large buffer or does repeated calls to sendmsg(). > >> > >> For Apple, the goal of TCP_NOTSENT_LOWAT was to allow an app to quickly change the data it "scheduled" to send. And thus allow the app to write the smallest "logical unit" it has. If that unit is 512KB large, the app is allowed to send that. > >> For example, in case of video-streaming one may want to skip ahead in the video. In that case the app still needs to transmit the remaining parts of the previous frame anyways, before it can send the new video frame. > >> That's the reason why the Apple implementation allows one to write more than just the lowat threshold. > >> > >> > >> That being said, I do think that Linux's way allows for an easier API because the app does not need to be careful about how much data it sends after an epoll/kqueue wakeup. So, the latency-benefits will be easier to get. > >> > >> > >> Christoph > >> > >> > >> > >> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36 > >> > > >> > netperf does not use epoll(), but rather a loop over sendmsg().
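The measurement loop discussed in this thread (write a chunk, then time how long select() takes to report the socket writable again with TCP_NOTSENT_LOWAT set) can be sketched roughly as below. This is only an illustration under stated assumptions, not iperf 2's code: it assumes a Linux host, falls back to the Linux option value 25 if the Python build does not export the constant, and uses a hypothetical loopback reader (drain), hypothetical helper names (timed_burst, run_demo), and the 4-byte watermark and 4K x 10 burst shape taken from the runs in this thread.

```python
import select
import socket
import threading
import time

# Assumption: Linux host. Linux defines TCP_NOTSENT_LOWAT as 25; use that
# value if this Python build does not export the constant.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)


def drain(listener):
    """Hypothetical stand-in for the server: accept once, read until EOF."""
    conn, _ = listener.accept()
    with conn:
        while conn.recv(65536):
            pass


def timed_burst(sock, write_size=4096, writes=10, timeout=5.0):
    """Write `writes` chunks; after each one, time how long select() takes
    to report the socket writable again. Returns the waits in seconds."""
    payload = b"x" * write_size
    waits = []
    for _ in range(writes):
        sock.sendall(payload)
        start = time.monotonic()
        # With the watermark set, Linux reports the socket writable only
        # once the not-yet-sent byte count has dropped below it.
        select.select([], [sock], [], timeout)
        waits.append(time.monotonic() - start)
    return waits


def run_demo():
    listener = socket.create_server(("127.0.0.1", 0))
    reader = threading.Thread(target=drain, args=(listener,), daemon=True)
    reader.start()
    with socket.create_connection(listener.getsockname()) as sock:
        # 4-byte watermark, as in the "--tcp-write-prefetch 4" runs.
        sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, 4)
        waits = timed_burst(sock)
    reader.join(timeout=5)
    listener.close()
    return waits


if __name__ == "__main__":
    for i, wait in enumerate(run_demo()):
        print(f"write {i}: select wait {wait * 1e6:.0f} us")
```

Over loopback the waits will be tiny; the interesting cases are the ones in this thread, where either the drain to the network or a blocked write dominates.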
> >> > > >> > One of the points of TCP_NOTSENT_LOWAT for Google was to be able to considerably increase > >> > max number of bytes in transmit queues (3rd column of /proc/sys/net/ipv4/tcp_wmem) > >> > by 10x, allowing for autotune to increase BDP for big RTT flows, this without > >> > increasing memory needs for flows with small RTT. > >> > > >> > In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.) > >> >> > >> >> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though. The bytes could depart the machine in response to acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time. > >> >> > >> >> How to use TCP_NOTSENT_LOWAT is explained in this video: > >> >> > >> >> <https://developer.apple.com/videos/play/wwdc2015/719/?time=2199> > >> >> > >> >> Later in the same video is a two-minute demo (time offset 42:00 to time offset 44:00) showing a “before and after” demo illustrating the dramatic difference this makes for screen sharing responsiveness.
> >> >> > >> >> <https://developer.apple.com/videos/play/wwdc2015/719/?time=2520> > >> >> > >> >> Stuart Cheshire > >> >> _______________________________________________ > >> >> Bloat mailing list > >> >> Bloat at lists.bufferbloat.net > >> >> https://lists.bufferbloat.net/listinfo/bloat > >> >> > >> > _______________________________________________ > >> > Bloat mailing list > >> > Bloat at lists.bufferbloat.net > >> > https://lists.bufferbloat.net/listinfo/bloat > >> > >> > >> This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.
From dave.taht at gmail.com Thu Oct 28 20:21:59 2021
From: dave.taht at gmail.com (Dave Taht)
Date: Thu, 28 Oct 2021 17:21:59 -0700
Subject: [Cerowrt-devel] it's been a really comforting couple weeks, legally
Message-ID:

openwrt got a dmca exemption!

https://lwn.net/Articles/874290/

there were some really dark days there in the past few years. I'm hoping we can get momentum for opening up some binary blobs.

--
Fixing Starlink's Latencies: https://www.youtube.com/watch?v=c9gLo6Xrwgw

Dave Täht CEO, TekLibre, LLC

From bob.mcmahon at broadcom.com Fri Oct 29 17:16:51 2021
From: bob.mcmahon at broadcom.com (Bob McMahon)
Date: Fri, 29 Oct 2021 14:16:51 -0700
Subject: [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
In-Reply-To:
References: <6D6492CF-BD6D-45BF-BD40-FA49166F6DA4@apple.com> <34fac143-f1be-9886-4931-65139acaca2e@gmail.com>
Message-ID:

Thanks for pointing out the congestion window. Not sure why it doesn't increase. I think that takes a stack expert ;) The run below with rx window clamp does seem to align with linux blocking the writes.

Yes, in the previous run the worst case was 5.121 ms, which does align with the RTT.

As a side note: I wonder if WiFi AP folks can somehow better "schedule aggregates" based on GSO "predictions." One of the challenges for WiFi is to align aggregates with what TCP is feeding it. I'm not sure if an intermediary last hop AP could keep the queue size based upon the e2e source "big tcp" so-to-speak. This is all out of my areas of expertise, but it might be nice if the two non-linear control loops, being the AP & 802.11ax first/last link hop scheduling and e2e TCP's feedback loop, could somehow be plugged together in a way to help with both e2e low latency and throughput.

Here's a run with receive side window clamping set to 1024 bytes which I think should force CWND not to grow.
In this case it does look like linux is blocking the writes as the TCP_NOTSENT_LOWAT select waits are sub 100 microseconds so the write must have blocked.

[root at localhost iperf2-code]# src/iperf -c 192.168.1.1%eth1 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001 with pid 24601 via eth1 (1 flows)
Write buffer size: 4096 Byte
Bursting: 40.0 KByte every 1.00 seconds
TCP window size: 85.0 KByte (default)
Event based writes (pending queue watermark at 4 bytes)
Enabled select histograms bin-width=0.100 ms, bins=10000
------------------------------------------------------------
[ 1] local 192.168.1.4%eth1 port 46042 connected with 192.168.1.1 port 5001 (MSS=576) (prefetch=4) (trip-times) (sock=3) (ct=5.01 ms) on 2021-10-29 13:57:22 (PDT)
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
[ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 10/0 0 5K/10109 us 4
[ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,40:1,50:7,51:1 (5.00/95.00/99.7%=1/51/51,Outliers=0,obl/obu=0/0) (5.015 ms/1635541042.537251)
[ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 5K/4941 us 8
[ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10 (5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.015 ms/1635541043.465805)
[ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 5K/5036 us 8
[ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10 (5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.013 ms/1635541044.602288)
[ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 5K/4956 us 8
[ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10 (5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.015 ms/1635541045.465820)
[ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 5K/5121 us 8
[ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10 (5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.014 ms/1635541046.664221)
[ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 5K/5029 us 8
[ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10 (5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.091 ms/1635541047.466021)
[ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 5K/4930 us 8
[ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:9,2:1 (5.00/95.00/99.7%=1/2/2,Outliers=0,obl/obu=0/0) (0.121 ms/1635541048.466058)
[ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 5K/5096 us 8
[ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10 (5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.015 ms/1635541049.465821)
[ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 5K/5086 us 8
[ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10 (5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.015 ms/1635541050.466051)
[ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 5K/5112 us 8
[ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10 (5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.015 ms/1635541051.465915)
[ 1] 0.00-10.02 sec 400 KBytes 327 Kbits/sec 100/0 0 5K/6518 us 6
[ 1] 0.00-10.02 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:90,2:1,40:1,50:7,51:1 (5.00/95.00/99.7%=1/50/51,Outliers=9,obl/obu=0/0) (5.015 ms/1635541042.537251)

[root at localhost iperf2-code]# src/iperf -s -i 1 -e -B 192.168.1.1%ap0 --tcp-rx-window-clamp 1024
------------------------------------------------------------
Server listening on TCP port 5001 with pid 22772
Binding to local address 192.168.1.1 and iface ap0
Read buffer size: 128 KByte (Dist bin width=16.0 KByte)
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.1%ap0 port 5001 connected with 192.168.1.4 port 46042 (MSS=1448) (clamp=1024) (burst-period=1.00s) (trip-times) (sock=4) (peer 2.1.4-master) on 2021-10-29 13:57:22 (PDT)
[ ID] Burst (start-end) Transfer Bandwidth XferTime (DC%) Reads=Dist NetPwr
[ 1] 0.00-0.20 sec 40.1 KBytes 1.65 Mbits/sec 199.727 ms (20%) 42=42:0:0:0:0:0:0:0 0
[ 1] 1.00-1.20 sec 40.0 KBytes 1.65 Mbits/sec 198.674 ms (20%) 40=40:0:0:0:0:0:0:0 0
[ 1] 2.00-2.20 sec 40.0 KBytes 1.64 Mbits/sec 199.729 ms (20%) 40=40:0:0:0:0:0:0:0 0
[ 1] 3.00-3.19 sec 40.0 KBytes 1.69 Mbits/sec 193.638 ms (19%) 40=40:0:0:0:0:0:0:0 0
[ 1] 4.00-4.20 sec 40.0 KBytes 1.62 Mbits/sec 201.660 ms (20%) 40=40:0:0:0:0:0:0:0 0
[ 1] 5.00-5.20 sec 40.0 KBytes 1.65 Mbits/sec 198.460 ms (20%) 40=40:0:0:0:0:0:0:0 0
[ 1] 6.00-6.19 sec 40.0 KBytes 1.69 Mbits/sec 194.418 ms (19%) 40=40:0:0:0:0:0:0:0 0
[ 1] 7.00-7.20 sec 40.0 KBytes 1.66 Mbits/sec 197.658 ms (20%) 40=40:0:0:0:0:0:0:0 0
[ 1] 8.00-8.20 sec 40.0 KBytes 1.67 Mbits/sec 196.431 ms (20%) 40=40:0:0:0:0:0:0:0 0
[ 1] 9.00-9.20 sec 40.0 KBytes 1.63 Mbits/sec 200.665 ms (20%) 40=40:0:0:0:0:0:0:0 0
[ 1] 0.00-10.00 sec 400 KBytes 328 Kbits/sec 402=402:0:0:0:0:0:0:0

Bob

On Thu, Oct 28, 2021 at 9:04 AM Christoph Paasch wrote:

> > On Oct 26, 2021, at 8:45 PM, Bob McMahon wrote:
> >
> > This is linux. The code flow is burst writes until the burst size, take a timestamp, call select(), take second timestamp and insert time delta into histogram, await clock_nanosleep() to schedule the next burst. (actually, the deltas, inserts into the histogram and user i/o are done in another thread, i.e. iperf 2's reporter thread.)
> > I still must be missing something. Does anything else need to be set to reduce the skb size? Everything seems to be indicating 4K writes even when gso_max_size is 2000 (I assume these are units of bytes?) There are ten writes, ten reads and ten RTTs for the bursts. I don't see partial writes at the app level.
>
> One thing to keep in mind is that once the congestion-window increased to >40KB (your burst-size), all of the writes will not be blocking at all. TCP_NOTSENT_LOWAT is really just about the "notsent" part. Once the congestion-window is big enough to send 40KB in a burst, it will just all be immediately sent out.
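As a back-of-envelope check on the clamped run above, the server-side numbers look consistent with a window-limited flow. The sketch below assumes the sender can have at most one clamp-sized window (1024 bytes) outstanding per round trip, and takes the RTT as roughly 5 ms from the client's Cwnd/RTT column. Both function names are made up for illustration, and the model ignores slow start, delayed acks and WiFi aggregation.

```python
def window_limited_rate_bps(window_bytes, rtt_s):
    """Throughput ceiling when only one window may be in flight per RTT."""
    return window_bytes * 8 / rtt_s


def burst_transfer_time_s(burst_bytes, window_bytes, rtt_s):
    """Time to move a burst when each round trip carries one window."""
    round_trips = -(-burst_bytes // window_bytes)  # ceiling division
    return round_trips * rtt_s


CLAMP_BYTES = 1024        # --tcp-rx-window-clamp 1024 on the server
RTT_S = 0.005             # assumption: ~5 ms, read off the Cwnd/RTT column
BURST_BYTES = 40 * 1024   # --burst-size=40K

rate = window_limited_rate_bps(CLAMP_BYTES, RTT_S)
xfer = burst_transfer_time_s(BURST_BYTES, CLAMP_BYTES, RTT_S)

# About 1.64 Mbit/s and about 200 ms (40 round trips), in line with the
# 1.62-1.69 Mbits/sec and 193-202 ms XferTime the server reports per burst.
print(f"{rate / 1e6:.2f} Mbit/s, {xfer * 1e3:.0f} ms")
```

The 40 round trips also line up with the 40 reads per burst in the server's Reads=Dist column, which suggests each read picked up roughly one clamp-sized window.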
> > > [root at localhost iperf2-code]# ip link set dev eth1 gso_max_size 2000 > > [root at localhost iperf2-code]# ip -d link sh dev eth1 > > 9: eth1: mtu 1500 qdisc fq_codel state > UNKNOWN mode DEFAULT group default qlen 1000 > > link/ether 00:90:4c:40:04:59 brd ff:ff:ff:ff:ff:ff promiscuity 0 > minmtu 68 maxmtu 1500 addrgenmode eui64 numtxqueues 1 numrxqueues 1 > gso_max_size 2000 gso_max_segs 65535 > > [root at localhost iperf2-code]# uname -r > > 5.0.9-301.fc30.x86_64 > > > > It looks like RTT is being driven by WiFi TXOPs as doubling the write > size increases the aggregation by two but has no significant effect on the > RTTs. > > > > 4K writes: tot_mpdus 328 tot_ampdus 209 mpduperampdu 2 > > > > 8k writes: tot_mpdus 317 tot_ampdus 107 mpduperampdu 3 > > > > [root at localhost iperf2-code]# src/iperf -c 192.168.1.1%eth1 > --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K > --histograms > > WARN: option of --burst-size without --burst-period defaults > --burst-period to 1 second > > ------------------------------------------------------------ > > Client connecting to 192.168.1.1, TCP port 5001 with pid 5145 via eth1 > (1 flows) > > Write buffer size: 4096 Byte > > Bursting: 40.0 KByte every 1.00 seconds > > TCP window size: 85.0 KByte (default) > > Event based writes (pending queue watermark at 4 bytes) > > Enabled select histograms bin-width=0.100 ms, bins=10000 > > ------------------------------------------------------------ > > [ 1] local 192.168.1.4%eth1 port 45680 connected with 192.168.1.1 port > 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.30 ms) on > 2021-10-26 20:25:29 (PDT) > > [ ID] Interval Transfer Bandwidth Write/Err Rtry > Cwnd/RTT NetPwr > > [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 > 14K/10091 us 4 > > [ 1] 0.00-1.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:1,36:1,40:1,44:1,46:1,48:1,49:1,50:2,52:1 > (5.00/95.00/99.7%=1/52/52,Outliers=0,obl/obu=0/0) (5.121 > ms/1635305129.152339) > > Am I reading this 
correctly, that your writes take worst-case 5 > milli-seconds ? > > This looks correct then, because you seem to have an RTT of around 5ms. > > > It's surprising though that your congestion-window is not increasing. > > > Christoph > > > > [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/4990 us 8 > > [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,45:1,49:5,50:1 > (5.00/95.00/99.7%=1/50/50,Outliers=0,obl/obu=0/0) (4.991 > ms/1635305130.153330) > > [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/4904 us 8 > > [ 1] 2.00-3.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,29:1,49:4,50:1,59:1,75:1 > (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.455 > ms/1635305131.147353) > > [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/4964 us 8 > > [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:4,50:2,59:1,65:1 > (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.460 > ms/1635305132.146338) > > [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/4970 us 8 > > [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,65:1 > (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.404 > ms/1635305133.146335) > > [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/4986 us 8 > > [ 1] 5.00-6.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,48:1,49:1,50:4,59:1,64:1 > (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.395 > ms/1635305134.146343) > > [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/5059 us 8 > > [ 1] 6.00-7.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,39:1,49:3,50:2,60:1,85:1 > (5.00/95.00/99.7%=1/85/85,Outliers=0,obl/obu=0/0) (8.417 > ms/1635305135.148343) > > [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/5407 us 8 > > [ 1] 7.00-8.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,40:1,49:4,50:1,59:1,75:1 > (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.428 > ms/1635305136.147343) > > [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/5188 us 8 > > [ 1] 8.00-9.00 sec S8-PDF: 
bin(w=100us):cnt(10)=1:2,40:1,49:3,50:3,64:1 > (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.388 > ms/1635305137.146284) > > [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/5306 us 8 > > [ 1] 9.00-10.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,39:1,49:2,50:2,51:1,60:1,65:1 > (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.422 > ms/1635305138.146316) > > [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 > 14K/5939 us 7 > > [ 1] 0.00-10.01 sec S8(f)-PDF: > bin(w=100us):cnt(100)=1:19,29:1,36:1,39:3,40:3,44:1,45:1,46:1,48:2,49:33,50:18,51:1,52:1,59:5,60:2,64:2,65:3,75:2,85:1 > (5.00/95.00/99.7%=1/65/85,Outliers=0,obl/obu=0/0) (8.417 > ms/1635305135.148343) > > > > [root at localhost iperf2-code]# src/iperf -s -i 1 -e -B 192.168.1.1%eth1 > > ------------------------------------------------------------ > > Server listening on TCP port 5001 with pid 6287 > > Binding to local address 192.168.1.1 and iface eth1 > > Read buffer size: 128 KByte (Dist bin width=16.0 KByte) > > TCP window size: 128 KByte (default) > > ------------------------------------------------------------ > > [ 1] local 192.168.1.1%eth1 port 5001 connected with 192.168.1.4 port > 45680 (MSS=1448) (burst-period=1.0000s) (trip-times) (sock=4) (peer > 2.1.4-master) on 2021-10-26 20:25:29 (PDT) > > [ ID] Burst (start-end) Transfer Bandwidth XferTime (DC%) > Reads=Dist NetPwr > > [ 1] 0.0001-0.0500 sec 40.1 KBytes 6.59 Mbits/sec 49.848 ms (5%) > 12=12:0:0:0:0:0:0:0 0 > > [ 1] 1.0002-1.0461 sec 40.0 KBytes 7.14 Mbits/sec 45.913 ms (4.6%) > 10=10:0:0:0:0:0:0:0 0 > > [ 1] 2.0002-2.0491 sec 40.0 KBytes 6.70 Mbits/sec 48.876 ms (4.9%) > 11=11:0:0:0:0:0:0:0 0 > > [ 1] 3.0002-3.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.886 ms (5%) > 10=10:0:0:0:0:0:0:0 0 > > [ 1] 4.0002-4.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.887 ms (5%) > 10=10:0:0:0:0:0:0:0 0 > > [ 1] 5.0002-5.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.881 ms (5%) > 10=10:0:0:0:0:0:0:0 0 > > [ 1] 6.0002-6.0511 sec 40.0 KBytes 6.44 Mbits/sec 50.895 
ms (5.1%) > 10=10:0:0:0:0:0:0:0 0 > > [ 1] 7.0002-7.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.889 ms (5%) > 10=10:0:0:0:0:0:0:0 0 > > [ 1] 8.0002-8.0481 sec 40.0 KBytes 6.84 Mbits/sec 47.901 ms (4.8%) > 11=11:0:0:0:0:0:0:0 0 > > [ 1] 9.0002-9.0491 sec 40.0 KBytes 6.70 Mbits/sec 48.872 ms (4.9%) > 10=10:0:0:0:0:0:0:0 0 > > [ 1] 0.0000-10.0031 sec 400 KBytes 328 Kbits/sec > 104=104:0:0:0:0:0:0:0 > > > > Bob > > > > On Tue, Oct 26, 2021 at 6:12 PM Eric Dumazet > wrote: > > > > > > On 10/26/21 4:38 PM, Christoph Paasch wrote: > > > Hi Bob, > > > > > >> On Oct 26, 2021, at 4:23 PM, Bob McMahon > wrote: > > >> I'm confused. I don't see any blocking nor partial writes per the > write() at the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst > is 40K, the write size is 4K and the watermark is 4 bytes. There are ten > writes per burst. > > > > > > You are on Linux here, right? > > > > > > AFAICS, Linux will still accept whatever fits in an skb. And that is > likely more than 4K (with GSO on by default). > > > > This (max payload per skb) can be tuned at the driver level, at least > for experimental purposes or dedicated devices. > > > > ip link set dev eth0 gso_max_size 8000 > > > > To fetch current values : > > > > ip -d link sh dev eth0 > > > > > > > > > > However, do you go back to select() after each write() or do you loop > over the write() calls? > > > > > > > > > Christoph > > > > > >> The S8 histograms are the times waiting on the select(). The first > value is the bin number (multiplied by 100usec bin width) and second the > bin count. The worst case time is at the end and is timestamped per unix > epoch. > > >> > > >> The second run is over a controlled WiFi link where a 99.7% point of > 4-8ms for a WiFi TX op arbitration win is in the ballpark. The first is 1G > wired and is in the 600 usec range. (No media arbitration there.) 
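To make the S8 histogram description concrete, here is a small sketch that parses one `bin(w=...):cnt(...)=bin:count,...` field and reads percentile bins off the cumulative counts. The function names are hypothetical, and the percentile rule (first bin whose cumulative count reaches the percentage) is an assumption rather than iperf 2's documented method, although it does reproduce the 5/95/99.7% figures printed in the reports in this thread.

```python
import re


def parse_s8_pdf(field):
    """Parse e.g. 'bin(w=100us):cnt(10)=1:1,40:1,50:7,51:1' into
    (bin_width_us, total_count, [(bin_number, count), ...])."""
    m = re.fullmatch(r"bin\(w=(\d+)us\):cnt\((\d+)\)=(.+)", field)
    if m is None:
        raise ValueError(f"unrecognised histogram field: {field!r}")
    width_us, total = int(m.group(1)), int(m.group(2))
    bins = [tuple(int(x) for x in pair.split(":"))
            for pair in m.group(3).split(",")]
    return width_us, total, bins


def percentile_bin(bins, total, pct):
    """Smallest bin number whose cumulative count reaches pct percent."""
    running = 0
    for number, count in sorted(bins):
        running += count
        if running * 100 >= pct * total:
            return number
    raise ValueError("bin counts do not add up to total")


field = "bin(w=100us):cnt(100)=1:90,2:1,40:1,50:7,51:1"
width_us, total, bins = parse_s8_pdf(field)
p5, p95, p997 = (percentile_bin(bins, total, p) for p in (5, 95, 99.7))

# Bin n appears to cover waits up to n * width (a 5.015 ms worst case
# lands in bin 51), so multiply by the width to get an upper edge in ms.
print([b * width_us / 1000 for b in (p5, p95, p997)])
```

For the field shown, the percentile bins come out as 1/50/51, matching the `5.00/95.00/99.7%=1/50/51` printed alongside it in the clamped run's final summary line.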
> > >> > > >> [root at localhost iperf2-code]# src/iperf -c 10.19.87.9 --trip-times > -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms > > >> WARN: option of --burst-size without --burst-period defaults > --burst-period to 1 second > > >> ------------------------------------------------------------ > > >> Client connecting to 10.19.87.9, TCP port 5001 with pid 2124 (1 flows) > > >> Write buffer size: 4096 Byte > > >> Bursting: 40.0 KByte every 1.00 seconds > > >> TCP window size: 85.0 KByte (default) > > >> Event based writes (pending queue watermark at 4 bytes) > > >> Enabled select histograms bin-width=0.100 ms, bins=10000 > > >> ------------------------------------------------------------ > > >> [ 1] local 10.19.87.10%eth0 port 33166 connected with 10.19.87.9 > port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=0.54 ms) on > 2021-10-26 16:07:33 (PDT) > > >> [ ID] Interval Transfer Bandwidth Write/Err Rtry > Cwnd/RTT NetPwr > > >> [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 > 14K/5368 us 8 > > >> [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,2:5,3:2,4:1,11:1 > (5.00/95.00/99.7%=1/11/11,Outliers=0,obl/obu=0/0) (1.089 > ms/1635289653.928360) > > >> [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/569 us 72 > > >> [ 1] 1.00-2.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,2:1,3:4,4:1,7:1,8:1 > (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.736 ms/1635289654.928088) > > >> [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/312 us 131 > > >> [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:2,5:2,6:1 > (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.548 ms/1635289655.927776) > > >> [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/302 us 136 > > >> [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:2,3:5,6:1 > (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.584 ms/1635289656.927814) > > >> [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/316 us 130 > > >> [ 1] 4.00-5.00 sec 
S8-PDF: bin(w=100us):cnt(10)=1:3,3:2,4:2,5:2,6:1 > (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.572 ms/1635289657.927810) > > >> [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/253 us 162 > > >> [ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:4,5:1 > (5.00/95.00/99.7%=1/5/5,Outliers=0,obl/obu=0/0) (0.417 ms/1635289658.927630) > > >> [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/290 us 141 > > >> [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:3,4:3,6:1 > (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.573 ms/1635289659.927771) > > >> [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/359 us 114 > > >> [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,3:4,4:3,6:1 > (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.570 ms/1635289660.927753) > > >> [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/349 us 117 > > >> [ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:5,4:1,7:1 > (5.00/95.00/99.7%=1/7/7,Outliers=0,obl/obu=0/0) (0.608 ms/1635289661.927843) > > >> [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/347 us 118 > > >> [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:1,3:5,8:1 > (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.725 ms/1635289662.927861) > > >> [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 > 14K/1519 us 27 > > >> [ 1] 0.00-10.01 sec S8(f)-PDF: > bin(w=100us):cnt(100)=1:25,2:13,3:36,4:11,5:5,6:5,7:2,8:2,11:1 > (5.00/95.00/99.7%=1/7/11,Outliers=0,obl/obu=0/0) (1.089 > ms/1635289653.928360) > > >> > > >> [root at localhost iperf2-code]# src/iperf -c 192.168.1.1 --trip-times > -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms > > >> WARN: option of --burst-size without --burst-period defaults > --burst-period to 1 second > > >> ------------------------------------------------------------ > > >> Client connecting to 192.168.1.1, TCP port 5001 with pid 2131 (1 > flows) > > >> Write buffer size: 4096 Byte > > >> Bursting: 40.0 KByte every 1.00 
seconds > > >> TCP window size: 85.0 KByte (default) > > >> Event based writes (pending queue watermark at 4 bytes) > > >> Enabled select histograms bin-width=0.100 ms, bins=10000 > > >> ------------------------------------------------------------ > > >> [ 1] local 192.168.1.4%eth1 port 45518 connected with 192.168.1.1 > port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.48 ms) on > 2021-10-26 16:07:56 (PDT) > > >> [ ID] Interval Transfer Bandwidth Write/Err Rtry > Cwnd/RTT NetPwr > > >> [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 > 14K/10339 us 4 > > >> [ 1] 0.00-1.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:1,40:1,47:1,49:2,50:3,51:1,60:1 > (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.990 > ms/1635289676.802143) > > >> [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/4853 us 8 > > >> [ 1] 1.00-2.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,38:1,39:1,44:1,45:1,49:1,51:1,52:1,60:1 > (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.937 > ms/1635289677.802274) > > >> [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/4991 us 8 > > >> [ 1] 2.00-3.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,48:1,49:2,50:2,51:1,60:1,64:1 > (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.307 > ms/1635289678.794326) > > >> [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/4610 us 9 > > >> [ 1] 3.00-4.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,49:3,50:3,56:1,64:1 > (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.362 > ms/1635289679.794335) > > >> [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/5028 us 8 > > >> [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,64:1 > (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.367 > ms/1635289680.794399) > > >> [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/5113 us 8 > > >> [ 1] 5.00-6.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,49:3,50:2,58:1,60:1,65:1 > (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.442 > ms/1635289681.794392) > > >> [ 1] 6.00-7.00 sec 
40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/5054 us 8 > > >> [ 1] 6.00-7.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,39:1,49:3,51:1,60:2,64:1 > (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.374 > ms/1635289682.794335) > > >> [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/5138 us 8 > > >> [ 1] 7.00-8.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,39:2,40:1,49:2,50:1,60:1,64:1 > (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.396 > ms/1635289683.794338) > > >> [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/5329 us 8 > > >> [ 1] 8.00-9.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,38:1,45:2,49:1,50:3,63:1 > (5.00/95.00/99.7%=1/63/63,Outliers=0,obl/obu=0/0) (6.292 > ms/1635289684.794262) > > >> [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 > 14K/5329 us 8 > > >> [ 1] 9.00-10.00 sec S8-PDF: > bin(w=100us):cnt(10)=1:2,39:1,49:3,50:3,84:1 > (5.00/95.00/99.7%=1/84/84,Outliers=0,obl/obu=0/0) (8.306 > ms/1635289685.796315) > > >> [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 > 14K/6331 us 6 > > >> [ 1] 0.00-10.01 sec S8(f)-PDF: > bin(w=100us):cnt(100)=1:19,38:2,39:5,40:2,44:1,45:3,47:1,48:1,49:26,50:17,51:4,52:1,56:1,58:1,59:1,60:7,63:1,64:5,65:1,84:1 > (5.00/95.00/99.7%=1/64/84,Outliers=0,obl/obu=0/0) (8.306 > ms/1635289685.796315) > > >> > > >> Bob > > >> > > >> On Tue, Oct 26, 2021 at 11:45 AM Christoph Paasch > wrote: > > >> > > >> Hello, > > >> > > >> > On Oct 25, 2021, at 9:24 PM, Eric Dumazet < > eric.dumazet at gmail.com > wrote: > > >> > > > >> > > > >> > > > >> > On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote: > > >> >> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast < > make-wifi-fast at lists.bufferbloat.net make-wifi-fast at lists.bufferbloat.net>> wrote: > > >> >> > > >> >>> Hi All, > > >> >>> > > >> >>> Sorry for the spam. I'm trying to support a meaningful TCP > message latency w/iperf 2 from the sender side w/o requiring e2e clock > synchronization. 
I thought I'd try to use the TCP_NOTSENT_LOWAT event to > help with this. It seems that this event goes off when the bytes are in > flight vs have reached the destination network stack. If that's the case, > then iperf 2 client (sender) may be able to produce the message latency by > adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled > RTT. > > >> >>> > > >> >>> Does this seem reasonable? > > >> >> > > >> >> I’m not 100% sure what you’re asking, but I will try to help. > > >> >> > > >> >> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t > report your endpoint as writable (e.g., via kqueue or epoll) until less > than that threshold of data remains unsent. It won’t stop you writing more > bytes if you want to, up to the socket send buffer size, but it won’t *ask* > you for more data until the TCP_NOTSENT_LOWAT threshold is reached. > > >> > > > >> > > > >> > When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made > sure that sendmsg() would actually > > >> > stop feeding more bytes in TCP transmit queue if the current > amount of unsent bytes > > >> > was above the threshold. > > >> > > > >> > So it looks like the Apple implementation is different, based on > your description? > > >> > > >> Yes, TCP_NOTSENT_LOWAT only impacts the wakeup on iOS/macOS/... > > >> > > >> An app can still fill the send-buffer if it does a sendmsg() with > a large buffer or does repeated calls to sendmsg(). > > >> > > >> For Apple, the goal of TCP_NOTSENT_LOWAT was to allow an app to > quickly change the data it "scheduled" to send. And thus allow the app to > write the smallest "logical unit" it has. If that unit is 512KB large, the > app is allowed to send that. > > >> For example, in case of video-streaming one may want to skip > ahead in the video. In that case the app still needs to transmit the > remaining parts of the previous frame anyways, before it can send the new > video frame.
> > >> That's the reason why the Apple implementation allows one to write more than just the lowat threshold.
> > >>
> > >> That being said, I do think that Linux's way allows for an easier API because the app does not need to be careful about how much data it sends after an epoll/kqueue wakeup. So, the latency benefits will be easier to get.
> > >>
> > >> Christoph
> > >>
> > >> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36
> > >> >
> > >> > netperf does not use epoll(), but rather a loop over sendmsg().
> > >> >
> > >> > One of the points of TCP_NOTSENT_LOWAT for Google was to be able to considerably increase the max number of bytes in transmit queues (3rd column of /proc/sys/net/ipv4/tcp_wmem) by 10x, allowing autotuning to increase BDP for big RTT flows, without increasing memory needs for flows with small RTT.
> > >> >
> > >> > In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.)
> > >> >>
> > >> >> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though.
The bytes could depart the machine in response to acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time.
> > >> >>
> > >> >> How to use TCP_NOTSENT_LOWAT is explained in this video:
> > >> >>
> > >> >> <https://developer.apple.com/videos/play/wwdc2015/719/?time=2199>
> > >> >>
> > >> >> Later in the same video is a two-minute demo (time offset 42:00 to time offset 44:00) showing a “before and after” demo illustrating the dramatic difference this makes for screen sharing responsiveness.
> > >> >>
> > >> >> <https://developer.apple.com/videos/play/wwdc2015/719/?time=2520>
> > >> >>
> > >> >> Stuart Cheshire
> > >> >> _______________________________________________
> > >> >> Bloat mailing list
> > >> >> Bloat at lists.bufferbloat.net
> > >> >> https://lists.bufferbloat.net/listinfo/bloat
> > >>
> > >> This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else.
If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.
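
The sender-side estimate discussed in this thread (drain time from write start to the TCP_NOTSENT_LOWAT wakeup, plus a sampled RTT) might be sketched roughly as below. This is only an illustrative Python sketch, not iperf 2's code: the helper names, the 16 KB threshold, and the fallback option number 25 (the Linux value) are assumptions, and the RTT sample is assumed to be obtained elsewhere. It relies on the behaviour described above, where the socket is reported writable only once the unsent bytes fall below the threshold.

```python
import select
import socket
import time

# Python exports the option by name where the platform headers define it;
# the fallback 25 is the Linux option number and is an assumption elsewhere.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)


def estimate_message_latency(drain_seconds, rtt_seconds):
    """Sender-side estimate proposed in the thread: drain time plus one RTT."""
    return drain_seconds + rtt_seconds


def timed_write(sock, payload, lowat_bytes=16 * 1024, timeout=5.0):
    """Return seconds from write start until the socket is writable again.

    Hypothetical helper: with TCP_NOTSENT_LOWAT set, the socket is reported
    writable only once fewer than lowat_bytes remain unsent, so the wait
    approximates how long the payload took to leave the machine.
    """
    sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, lowat_bytes)
    start = time.monotonic()
    sock.sendall(payload)  # blocks until the kernel has accepted every byte
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        raise TimeoutError("socket did not drain below the threshold in time")
    return time.monotonic() - start
```

On a connected blocking TCP socket this could be used as `estimate_message_latency(timed_write(sock, msg), sampled_rtt)`. As noted in the thread, the drain time alone does not say why the bytes left the machine (acks arriving or CWND growing), so the result is at best a rough estimate.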
From dave.taht at gmail.com Sat Oct 30 21:01:50 2021
From: dave.taht at gmail.com (Dave Taht)
Date: Sat, 30 Oct 2021 18:01:50 -0700
Subject: [Cerowrt-devel] Occasionally I feel ornery
Message-ID:

And my scars have mostly faded from the heyday of cerowrt.

That said, I really don't know what the world needs anymore on this front. Most of my take on things is that ISPs need to deploy what we've had running for 8 years, adopting prplwrt, or something.

That said, I'd like some feedback over here:

https://forum.openwrt.org/t/cerowrt-ii-would-anyone-care/110554

If anyone has any ideas.

--
I tried to build a better future, a few times:
https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org

Dave Täht CEO, TekLibre, LLC