[Cerowrt-devel] [Make-wifi-fast] [Starlink] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency
Sebastian Moeller
moeller0 at gmx.de
Wed Oct 27 10:29:11 EDT 2021
Hi Bob,
OWD != RTT/2 seems to be the rule on the internet rather than the exception, even with perfectly symmetric access links. Routing between ASes is often asymmetric in itself (with hot potato routing, each AS hands over packets destined for other ASes as early as possible, so the forward and backward paths are often noticeably different; or rather they are different, but that is hard to notice unless one can get path measurements like traceroutes from both directions). That last point is what makes me believe that internet speedtests should always also include traceroutes from server to client and from client to server, so one at least has a rough idea where the packets are going, but I digress...
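
To make the asymmetry concrete, here is a minimal sketch of how per-direction OWD could be computed, assuming both ends have synchronized clocks and record send and receive timestamps. The function name and the sample timestamps are made up for illustration; they are not measurements.

```python
def one_way_delays(client_tx, server_rx, server_tx, client_rx):
    """Return (forward OWD, reverse OWD, RTT) in seconds.

    All four timestamps are assumed to come from clocks that are
    synchronized across both hosts; without that, only the RTT is
    meaningful and the per-direction values are not.
    """
    forward = server_rx - client_tx
    reverse = client_rx - server_tx
    rtt = forward + reverse  # server turnaround time is excluded
    return forward, reverse, rtt


# Hypothetical timestamps in seconds: 8 ms to the server, 22 ms back.
fwd, rev, rtt = one_way_delays(0.000, 0.008, 0.010, 0.032)
print(f"forward={fwd * 1e3:.1f} ms  reverse={rev * 1e3:.1f} ms  "
      f"RTT/2={rtt / 2 * 1e3:.1f} ms")
```

With these made-up numbers RTT/2 would suggest 15 ms in each direction, while the two directions actually differ by 14 ms.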
Regards
Sebastian
> On Oct 26, 2021, at 19:23, Bob McMahon via Make-wifi-fast <make-wifi-fast at lists.bufferbloat.net> wrote:
>
> Hi Bjørn,
>
> I find, when possible, it's preferred to take telemetry data of actual traffic (or reads and writes) vs a proxy. We had a case where TCP BE was outperforming TCP w/VI because BE had the most engineering resources assigned to it and engineers did a better job with BE. Using a proxy protocol wouldn't have exercised the same logic paths (in this case it was in the L2 driver) as TCP did. Hence, measuring actual TCP traffic (or socket reads and socket writes) was needed to flush out the problem. Note: I also find that network engineers tend to focus on the stack, but it's the e2e at the application level that impacts user experience. Send side bloat can drive the OWD while the TCP stack's RTT may look fine. For WiFi test & measurements, we've decided most testing should use TCP_NOTSENT_LOWAT because it helps mitigate send side bloat, which WiFi engineering doesn't focus on for lack of ability to impact it.
>
> Also, I think OWD is under-tested, and two-way testing can give incomplete and inaccurate information, particularly with respect to things like an e2e transport's control loop. The most obvious example is assuming 1/2 RTT is the same as OWD to/fro. For WiFi this assumption is almost always false. It is also false for many residential internet connections where OWD asymmetry is designed in.
>
> Bob
>
>
> On Tue, Oct 26, 2021 at 3:04 AM Bjørn Ivar Teigen <bjorn at domos.no> wrote:
> Hi Bob,
>
> My name is Bjørn Ivar Teigen and I'm working on modeling and measuring WiFi MAC-layer protocol performance for my PhD.
>
> Is it necessary to measure the latency using the TCP stream itself? I had a similar problem in the past, and solved it by doing the latency measurements using TWAMP running alongside the TCP traffic. The requirement for this to work is that the TWAMP packets are placed in the same queue(s) as the TCP traffic, and that the impact of measurement traffic is small enough so as not to interfere too much with your TCP results.
> Just my two cents, hope it's helpful.
>
> Bjørn
>
> On Tue, 26 Oct 2021 at 06:32, Bob McMahon <bob.mcmahon at broadcom.com> wrote:
> Thanks Stuart, this is helpful. I'm measuring the time just before the first write() (of potentially a burst of writes to achieve a burst size) per a socket fd's select event occurring when TCP_NOTSENT_LOWAT is set to a small value, then sampling the RTT and CWND and providing histograms for all three, all on that event. I'm not sure of the correctness of RTT and CWND at this sample point. This is a controlled test over 802.11ax and OFDMA where the TCP acks from the WiFi clients are being scheduled by the AP using 802.11ax trigger frames, so the AP is affecting the end/end BDP by scheduling the transmits and the acks. The AP can grow the BDP or shrink it based on these scheduling decisions. From there we're trying to maximize network power (throughput/delay) for elephant flows and just latency for mouse flows. (We also plan some RF frequency stuff to per OFDMA) Anyway, the AP based scheduling along with aggregation and OFDMA makes WiFi scheduling optimums non-obvious - at least to me - and I'm trying to provide insights into how an AP is affecting end/end performance.
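
For reference, a hedged sketch of what sampling RTT and CWND on the writable event might look like, assuming Linux. The byte offsets below are assumptions taken from the Linux `struct tcp_info` layout (`tcpi_rtt` in microseconds, `tcpi_snd_cwnd` in segments) and will not match other systems. The function names are illustrative and are not iperf 2 code.

```python
import socket
import struct

# Assumed offsets into Linux's struct tcp_info: eight one-byte fields,
# followed by 32-bit fields, of which tcpi_rtt is the 16th and
# tcpi_snd_cwnd the 19th.
TCPI_RTT_OFFSET = 68
TCPI_SND_CWND_OFFSET = 80
TCP_INFO_LEN = 104  # enough bytes to cover both fields


def parse_rtt_cwnd(info):
    """Extract (rtt_us, snd_cwnd) from a raw tcp_info buffer."""
    (rtt_us,) = struct.unpack_from("=I", info, TCPI_RTT_OFFSET)
    (snd_cwnd,) = struct.unpack_from("=I", info, TCPI_SND_CWND_OFFSET)
    return rtt_us, snd_cwnd


def sample_rtt_cwnd(sock):
    """Read TCP_INFO from a connected TCP socket and parse it."""
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, TCP_INFO_LEN)
    return parse_rtt_cwnd(info)
```

Calling `sample_rtt_cwnd()` right after the select event fires would give one (RTT, CWND) pair per write burst, which could then be binned into histograms.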
>
> The more direct approach for e2e TCP latency and network power has been to measure first write() to final read() and compute the e2e delay. This requires clock sync on the ends. (We're using ptp4l with GPS OCXO atomic references for that but this is typically only available in some labs.)
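
The arithmetic behind this direct approach can be sketched as follows, assuming the sender's first write() timestamp and the receiver's final read() timestamp come from synchronized clocks. Both function names are hypothetical, and network power is taken as throughput/delay as stated in the message.

```python
def e2e_delay(first_write_ts, final_read_ts):
    """End-to-end message delay in seconds (needs synchronized clocks)."""
    return final_read_ts - first_write_ts


def network_power(throughput_bps, delay_s):
    """Network power as throughput divided by delay."""
    if delay_s <= 0:
        raise ValueError("delay must be positive")
    return throughput_bps / delay_s


# Hypothetical sample: 25 ms from first write() to final read().
delay = e2e_delay(100.000, 100.025)
power = network_power(1e9, delay)
```

A negative or zero delay here would usually point at a clock sync problem rather than at the network.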
>
> Bob
>
>
> On Mon, Oct 25, 2021 at 8:11 PM Stuart Cheshire <cheshire at apple.com> wrote:
> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast <make-wifi-fast at lists.bufferbloat.net> wrote:
>
> > Hi All,
> >
> > Sorry for the spam. I'm trying to support a meaningful TCP message latency w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight vs have reached the destination network stack. If that's the case, then iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.
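
The proposal above amounts to the following arithmetic, shown here only as a sketch of the idea being asked about. The function name is hypothetical, and whether adding the full sampled RTT is the right term is exactly the open question in this thread.

```python
def estimate_message_latency(write_start_s, lowat_event_s, sampled_rtt_s):
    """Sender-side estimate: drain time plus the sampled RTT.

    write_start_s and lowat_event_s are taken from the same local
    clock, so no end-to-end clock synchronization is required.
    """
    drain_time_s = lowat_event_s - write_start_s
    return drain_time_s + sampled_rtt_s


# Hypothetical sample: 4 ms drain time and a 12 ms sampled RTT.
estimate = estimate_message_latency(10.000, 10.004, 0.012)
```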
> >
> > Does this seem reasonable?
>
> I’m not 100% sure what you’re asking, but I will try to help.
>
> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your endpoint as writable (e.g., via kqueue or epoll) until less than that threshold of data remains unsent. It won’t stop you writing more bytes if you want to, up to the socket send buffer size, but it won’t *ask* you for more data until the TCP_NOTSENT_LOWAT threshold is reached. In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.)
>
> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though. The bytes could depart the machine in response to acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time.
>
> How to use TCP_NOTSENT_LOWAT is explained in this video:
>
> <https://developer.apple.com/videos/play/wwdc2015/719/?time=2199>
>
> Later in the same video is a two-minute “before and after” demo (time offset 42:00 to time offset 44:00) illustrating the dramatic difference this makes for screen sharing responsiveness.
>
> <https://developer.apple.com/videos/play/wwdc2015/719/?time=2520>
>
> Stuart Cheshire
>
> This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.
> _______________________________________________
> Starlink mailing list
> Starlink at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink
>
>
> --
> Bjørn Ivar Teigen
> Head of Research
> +47 47335952 | bjorn at domos.no | www.domos.no
> WiFi Slicing by Domos
>