[Bloat] Apple WWDC Talks on Latency/Bufferbloat

Sebastian Moeller moeller0 at gmx.de
Tue Jul 6 15:08:59 EDT 2021


Hello Christoph,

thanks for your detailed response!

> On Jul 6, 2021, at 20:54, Christoph Paasch <cpaasch at apple.com> wrote:
> 
> Hello Sebastian,
> 
> On 06/29/21 - 09:58, Sebastian Moeller wrote:
>> Hi Christoph,
>> 
>> one question below:
>> 
>>> On Jun 18, 2021, at 01:43, Christoph Paasch via Bloat
>>> <bloat at lists.bufferbloat.net> wrote:
>>> 
>>> Hello,
>>> 
>>> On 06/17/21 - 11:16, Matt Mathis via Bloat wrote:
>>>> Is there a paper or spec for RPM?
>>> 
>>> We are trying to publish an IETF draft on the methodology before the
>>> upcoming IETF in July.
>>> 
>>> But in the meantime, please see inline:
>>> 
>>>> There are at least two different ways to define RPM, both of which
>>>> might be relevant.
>>>> 
>>>> At the TCP layer: it can be directly computed from a packet capture.
>>>> The trick is to time reverse a trace and compute the critical path
>>>> backwards through the trace: what event triggered each segment or ACK,
>>>> and count round trips.  This would be super robust but does not include
>>>> the queueing required in the kernel socket buffers.  I need to think
>>>> some more about computing TCP RPM from tcp_info or other kernel
>>>> instrumentation - it might be possible.
>>> 
>>> We explicitly opted against measuring purely TCP-level round-trip times,
>>> because there are countless transparent TCP proxies out there that would
>>> skew these numbers. Our goal with RPM/Responsiveness is to measure how
>>> an end-user would experience the network, which means DNS resolution,
>>> TCP handshake time, TLS handshake, and HTTP/2 request/response. In the
>>> end, that's what actually matters to the users.
>>> 
>>>> A different RPM can be done in the application, above TCP, for example
>>>> by ping-ponging messages.  This would include the delays traversing the
>>>> kernel socket buffers which have to be at least as large as a full
>>>> network RTT.
>>>> 
>>>> This is perhaps an important point: due to the retransmit and
>>>> reassembly queues (which are required to implement robust data
>>>> delivery) TCP must be able to hold at least a full RTT of data in its
>>>> own buffers, which means that under some conditions the RTT as seen by
>>>> the application has to be at least twice the network's RTT, including
>>>> any bloat in the network.
>>> 
>>> Currently, we measure RPM on separate connections (not the load-bearing
>>> ones). We are also measuring on the load-bearing connections themselves
>>> through H2 Ping frames. But for the reasons you described we haven't yet
>>> factored it into the RPM-number.
>>> 
>>> One way may be to inspect with TCP_INFO whether or not the connections
>>> had retransmissions and then throw away the number. On the other hand,
>>> if the network becomes extremely lossy under working conditions, it does
>>> impact the user-experience and so it could make sense to take this into
>>> account.
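
The two options described above (discard samples from connections that saw retransmissions, or keep them) can be sketched as follows. This is only an illustration, not the tool's code. Assumptions: Linux only, where struct tcp_info places the 32-bit tcpi_total_retrans field at byte offset 100; the function names and the (rtt_ms, retransmits) sample format are made up for this example.

```python
import socket
import struct

# Assumption: Linux struct tcp_info layout (8 one-byte fields followed by
# 32-bit counters); tcpi_total_retrans is the 24th counter, at offset 100.
TOTAL_RETRANS_OFFSET = 100
TCP_INFO_MIN_LEN = TOTAL_RETRANS_OFFSET + 4


def parse_total_retrans(info):
    """Extract tcpi_total_retrans from a raw tcp_info buffer, or None."""
    if len(info) < TCP_INFO_MIN_LEN:
        return None
    return struct.unpack_from("=I", info, TOTAL_RETRANS_OFFSET)[0]


def total_retrans(sock):
    """Total retransmitted segments on a connected TCP socket, or None."""
    optname = getattr(socket, "TCP_INFO", None)  # not defined everywhere
    if optname is None:
        return None
    raw = sock.getsockopt(socket.IPPROTO_TCP, optname, TCP_INFO_MIN_LEN)
    return parse_total_retrans(raw)


def usable_samples(samples, discard_lossy=True):
    """Filter (rtt_ms, retransmits) pairs.

    discard_lossy=True throws away samples from connections that saw
    retransmissions; False keeps them, so that loss under load counts
    against the result.
    """
    return [rtt for rtt, retrans in samples
            if not discard_lossy or not retrans]
```

Which of the two policies is the better one is exactly the open question raised in the paragraph above; the flag only makes the choice explicit.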
>>> 
>>> 
>>> In the end, we realized how hard it is to accurately measure bufferbloat
>>> within a reasonable time-frame (our goal is to finish the test within
>>> ~15 seconds).
>> 
>> 	[SM] I understand that 10-15 seconds is the amount of time users
>> 	have been trained to expect an on-line speedtest to take, but
>> 	experiments with flent/RRUL showed that there are processes
>> 	affecting latency on slower timescales that are more visible if one
>> 	can also run a test for 60 - 300 seconds (e.g. cyclic WiFi channel
>> 	probing). Does your tool optionally allow specifying a longer
>> 	run-time?
> 
> Currently the tool does not have a "deep-dive"-mode. There are a few things
> (besides running longer) that a "deep-dive"-mode could provide. For example,
> traceroute-style probes during the test to identify the location of the
> bufferbloat.


[SM] Oh, shiny ;) To be useful/interpretable, such a traceroute-style path traversal should be performed from both sides of a link (I am sure you know, but my go-to slide deck is https://archive.nanog.org/sites/default/files/10_Roisman_Traceroute.pdf). But it would be sweet if there was a reliable way to get bi-directional traceroutes over the path one actually uses.



> Use H3 for testing and/or run TCP on a different port to
> identify traffic-classifiers/transparent TCP-proxies that treat things
> differently. Study the impact of TCP bulk transfer on UDP latency. And so
> on...
> Such a deep-dive mode would be possible in the command-line tool but very
> unlikely in the UI-mode.

[SM] Fair enough, thanks.


> 
> Our primary goal in this first iteration is to provide a tool that gives a
> quick insight into how bad/good the bufferbloat is on the network in such a
> way that a non-expert user can run it and understand the result.

[SM] Worthy goal.


> We also want it to use standard protocols so that any basic web server can
> be configured to serve as an endpoint for it, and because those are the
> protocols that the users are actually using in the end.

[SM] +1; Yes, tests with the production protocols, ideally to the "production" servers, seem like a great way forward.
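
For anyone who wants to try this against their own web server, here is a minimal sketch (not Apple's implementation) that times the phases listed earlier in the thread: DNS resolution, TCP handshake, TLS handshake and one HTTP request/response. Assumptions: it uses only the Python standard library and therefore speaks HTTP/1.1 rather than HTTP/2; the names time_phases and total_ms are made up for this example.

```python
import socket
import ssl
import time


def time_phases(host, port=443, path="/", timeout=10.0):
    """Return per-phase durations in seconds for one HTTPS request."""
    t0 = time.perf_counter()
    # DNS resolution (may be answered from a local cache).
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t1 = time.perf_counter()

    conn = socket.socket(family, socktype, proto)
    conn.settimeout(timeout)
    try:
        conn.connect(sockaddr)  # TCP three-way handshake
        t2 = time.perf_counter()

        context = ssl.create_default_context()
        # wrap_socket() runs the TLS handshake on the connected socket.
        conn = context.wrap_socket(conn, server_hostname=host)
        t3 = time.perf_counter()

        request = (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                   "Connection: close\r\n\r\n").encode("ascii")
        conn.sendall(request)
        conn.recv(1)  # block until the first response byte arrives
        t4 = time.perf_counter()
    finally:
        conn.close()
    return {"dns": t1 - t0, "tcp": t2 - t1, "tls": t3 - t2, "http": t4 - t3}


def total_ms(phases):
    """Sum of all phase durations, in milliseconds."""
    return 1000.0 * sum(phases.values())
```

Calling time_phases() with the host name of a server you control, once on an idle link and once while a bulk transfer is running, gives a rough idea of how much each phase suffers under load.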


Regards
	Sebastian

> 
> 
> Cheers,
> Christoph
> 
> 
>>      Thinking of it, to keep everybody on their toes, how
>> 	about occasionally running a test with longer run-time (maybe after
>> 	asking for the user's consent) and storing the test duration as part
>> 	of the results?
>> 
>> 
>> Best Regards Sebastian
>> 
>> 
>>> 
>>> We hope that with the IETF-draft we can get the right people together to
>>> iterate over it and squash out a very accurate measurement that
>>> represents what users would experience.
>>> 
>>> 
>>> Cheers, Christoph
>>> 
>>> 
>>>> 
>>>> Thanks,
>>>> --MM--
>>>> The best way to predict the future is to create it.  - Alan Kay
>>>> 
>>>> We must not tolerate intolerance; however our response must be
>>>> carefully measured: too strong would be hypocritical and risks
>>>> spiraling out of control; too weak risks being mistaken for tacit
>>>> approval.
>>>> 
>>>> 
>>>> On Sat, Jun 12, 2021 at 9:11 AM Rich Brown <richb.hanover at gmail.com>
>>>> wrote:
>>>> 
>>>>>> On Jun 12, 2021, at 12:00 PM, bloat-request at lists.bufferbloat.net
>>>>>> wrote:
>>>>>> 
>>>>>> Some relevant talks / publicity at WWDC -- the first mentioning
>>>>>> CoDel, queueing, etc. Featuring Stuart Cheshire. iOS 15 adds a
>>>>>> developer test for loaded latency, reported in "RPM" or round-trips
>>>>>> per minute.
>>>>>> 
>>>>>> I ran it on my machine:
>>>>>> 
>>>>>> nowens at mac1015 ~ % /usr/bin/networkQuality
>>>>>> ==== SUMMARY ====
>>>>>> Upload capacity: 90.867 Mbps
>>>>>> Download capacity: 93.616 Mbps
>>>>>> Upload flows: 16
>>>>>> Download flows: 20
>>>>>> Responsiveness: Medium (840 RPM)
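
Since RPM is round-trips per minute, a figure like the 840 RPM above can be converted into an average time per round trip. A small sketch of that arithmetic follows; the function names are made up, and how the tool aggregates its individual measurements into one RPM number is not described in this thread.

```python
MS_PER_MINUTE = 60_000.0


def rpm_to_ms(rpm):
    """Average duration of one round trip, in milliseconds."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return MS_PER_MINUTE / rpm


def ms_to_rpm(round_trip_ms):
    """Round trips per minute for a given round-trip time in milliseconds."""
    if round_trip_ms <= 0:
        raise ValueError("round_trip_ms must be positive")
    return MS_PER_MINUTE / round_trip_ms


print(f"{rpm_to_ms(840):.1f} ms")  # 71.4 ms
```

So 840 RPM corresponds to roughly 71 ms per round trip under load, and a loaded round-trip time of 100 ms would correspond to 600 RPM.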
>>>>> 
>>>>> Does anyone know how to get the command-line version for current (not
>>>>> upcoming) macOS? Thanks.
>>>>> 
>>>>> Rich
>>>>> _______________________________________________
>>>>> Bloat mailing list
>>>>> Bloat at lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/bloat
>>>>> 
>>> 


