From: Christoph Paasch
To: Eric Dumazet
Cc: Stuart Cheshire, Bob McMahon, Cake List, Valdis Klētnieks, Make-Wifi-fast, "David P. Reed", starlink@lists.bufferbloat.net, codel, cerowrt-devel, bloat, Steve Crocker, Vint Cerf
Subject: Re: [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
Date: Tue, 26 Oct 2021 11:45:24 -0700
Message-id: <4BFB5A37-9574-49BE-B083-FBC1F2B0381E@apple.com>
In-reply-to: <0e29e225-9f55-4392-640a-2d27c4c26116@gmail.com>
References: <1625188609.32718319@apps.rackspace.com>
 <989de0c1-e06c-cda9-ebe6-1f33df8a4c24@candelatech.com>
 <1625773080.94974089@apps.rackspace.com>
 <1625859083.09751240@apps.rackspace.com>
 <257851.1632110422@turing-police>
 <1632680642.869711321@apps.rackspace.com>
 <0e29e225-9f55-4392-640a-2d27c4c26116@gmail.com>

Hello,

> On Oct 25, 2021, at 9:24 PM, Eric Dumazet wrote:
>
>
>
> On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote:
>> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast wrote:
>>
>>> Hi All,
>>>
>>> Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight vs have reached the destination network stack. If that's the case, then iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.
>>>
>>> Does this seem reasonable?
>>
>> I’m not 100% sure what you’re asking, but I will try to help.
>>
>> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your endpoint as writable (e.g., via kqueue or epoll) until less than that threshold of data remains unsent.
>> It won’t stop you writing more bytes if you want to, up to the socket send buffer size, but it won’t *ask* you for more data until the TCP_NOTSENT_LOWAT threshold is reached.
>
>
> When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure that sendmsg() would actually
> stop feeding more bytes in TCP transmit queue if the current amount of unsent bytes
> was above the threshold.
>
> So it looks like Apple implementation is different, based on your description ?

Yes, TCP_NOTSENT_LOWAT only impacts the wakeup on iOS/macOS/...

An app can still fill the send buffer if it does a sendmsg() with a large buffer or makes repeated calls to sendmsg().

For Apple, the goal of TCP_NOTSENT_LOWAT was to allow an app to quickly change the data it has "scheduled" to send, and thus to let the app write the smallest "logical unit" it has. If that unit is 512KB large, the app is allowed to send that.

For example, in the case of video streaming, one may want to skip ahead in the video. In that case the app still needs to transmit the remaining parts of the previous frame anyway, before it can send the new video frame.
That's the reason why the Apple implementation allows one to write more than just the lowat threshold.

That being said, I do think that Linux's way allows for an easier API, because the app does not need to be careful about how much data it sends after an epoll/kqueue wakeup. So, the latency benefits will be easier to get.


Christoph

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36
>
> netperf does not use epoll(), but rather a loop over sendmsg().
>
>
> One of the point of TCP_NOTSENT_LOWAT for Google was to be able to considerably increase
> max number of bytes in transmit queues (3rd column of /proc/sys/net/ipv4/tcp_wmem)
> by 10x, allowing for autotune to increase BDP for big RTT flows, this without
> increasing memory needs for flows with small RTT.
>
> In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.)
>>
>> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though. The bytes could depart the machine in response for acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time.
>>
>> How to use TCP_NOTSENT_LOWAT is explained in this video:
>>
>>
>>
>> Later in the same video is a two-minute demo (time offset 42:00 to time offset 44:00) showing a “before and after” demo illustrating the dramatic difference this makes for screen sharing responsiveness.
>>
>>
>>
>> Stuart Cheshire
>> _______________________________________________
>> Bloat mailing list
>> Bloat@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
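
To make the arithmetic in this thread concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not code from iperf 2, Linux, or the Apple stack. All function names are hypothetical. It shows Stuart's "BDP in flight + TCP_NOTSENT_LOWAT buffered" budget, Bob's proposed sender-side estimate (drain time plus sampled RTT, which is his suggestion and not a confirmed result), and a small helper that sets the option where the Python build exposes it.

```python
import socket


def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes (bandwidth given in bits per second)."""
    return round(bandwidth_bps / 8 * rtt_s)


def target_send_queue_bytes(bandwidth_bps, rtt_s, notsent_lowat):
    """Roughly one BDP in flight plus notsent_lowat bytes buffered and ready to go."""
    return bdp_bytes(bandwidth_bps, rtt_s) + notsent_lowat


def estimated_message_latency_s(drain_time_s, sampled_rtt_s):
    """Bob's proposal: drain time (write start to LOWAT wakeup) plus sampled RTT.

    Assumption: this sum is only as good as the drain-time measurement, which
    reflects both incoming acks and CWND growth.
    """
    return drain_time_s + sampled_rtt_s


def set_notsent_lowat(sock, nbytes):
    """Set TCP_NOTSENT_LOWAT on a TCP socket, if this platform exposes it."""
    opt = getattr(socket, "TCP_NOTSENT_LOWAT", None)
    if opt is None:
        raise OSError("TCP_NOTSENT_LOWAT is not available in this Python build")
    sock.setsockopt(socket.IPPROTO_TCP, opt, nbytes)


if __name__ == "__main__":
    # Example numbers are made up: a 100 Mbit/s path with a 50 ms RTT.
    print(bdp_bytes(100_000_000, 0.05))
    print(target_send_queue_bytes(100_000_000, 0.05, 16384))
    print(estimated_message_latency_s(0.012, 0.030))
```

After calling set_notsent_lowat(), a sender would wait for the socket to become writable (epoll or kqueue) before writing its next logical unit. As described above, Linux additionally stops sendmsg() from queueing more once the unsent bytes exceed the threshold, while on iOS/macOS only the wakeup is affected, so there the app itself has to limit how much it writes per wakeup.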