[Cake] [Bloat] [Make-wifi-fast] dslreports is no longer free

Dave Taht dave.taht at gmail.com
Mon May 4 20:10:35 EDT 2020


On Mon, May 4, 2020 at 5:03 PM Bob McMahon via Bloat <
bloat at lists.bufferbloat.net> wrote:

>
>
>
> ---------- Forwarded message ----------
> From: Bob McMahon <bob.mcmahon at broadcom.com>
> To: Sergey Fedorov <sfedorov at netflix.com>
> Cc: "David P. Reed" <dpreed at deepplum.com>, Michael Richardson <
> mcr at sandelman.ca>, Make-Wifi-fast <make-wifi-fast at lists.bufferbloat.net>,
> bloat <bloat at lists.bufferbloat.net>, Cake List <cake at lists.bufferbloat.net>,
> Jannie Hanekom <jannie at hanekom.net>
> Bcc:
> Date: Mon, 4 May 2020 17:03:02 -0700
> Subject: Re: [Make-wifi-fast] [Cake] [Bloat] dslreports is no longer free
> Sorry for being a bit off topic but we find average latency not all that
> useful.  A full CDF is.  The next best is a box plot with outliers which
> can be presented parametrically as a few numbers. Most customers want
> visibility into the PDF tail.
>

yea!

Try never to discard the outliers anywhere in the core tests. I always
point to the discovery of the cosmic microwave background as a case where,
once you stop assuming the noise is just noise (caused by bird droppings
in your receiver), you find structure where you thought none existed.

https://theconversation.com/the-cmb-how-an-accidental-discovery-became-the-key-to-understanding-the-universe-45126

A lot of times, just plotting the patterns of the outliers can be
interesting. It's often a lot of bird droppings to sort through!
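
As a rough illustration of reporting the full distribution instead of an
average, here is a minimal Python sketch that builds an empirical CDF and a
box-plot summary that keeps the outliers as data. The function names, the
1.5 x IQR fence rule, and the sample latencies are assumptions chosen for
the example, not anything taken from iperf or fast.com.

```python
import statistics

def empirical_cdf(samples):
    """Return (value, cumulative fraction) pairs for the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]

def box_summary(samples):
    """Box-plot numbers: quartiles, whisker ends, and the outliers themselves."""
    ordered = sorted(samples)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Conventional Tukey fences; points beyond them are reported, not dropped.
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [v for v in ordered if lo_fence <= v <= hi_fence]
    outliers = [v for v in ordered if v < lo_fence or v > hi_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inliers[0], "whisker_high": inliers[-1],
        "outliers": outliers,
    }

# Made-up latency samples in milliseconds, for illustration only.
latencies_ms = [4.1, 4.3, 4.2, 4.8, 5.0, 4.4, 4.6, 31.0, 4.5, 4.7, 120.0, 4.9]

if __name__ == "__main__":
    print("mean:", statistics.mean(latencies_ms))
    print("box :", box_summary(latencies_ms))
    for value, fraction in empirical_cdf(latencies_ms):
        print(f"{value:8.1f} ms  {fraction:.2f}")
```

With these samples the mean is pulled far above the median by the two slow
trips, which is exactly the information an average hides and the tail shows.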


> Also, we're moving to socket write() to read() latencies for our end/end
> measurements (using the iperf 2.0.14 --trip-times option, which assumes
> synchronized clocks). We also now measure TCP connects (3WHS) as well.
>

One thing that may or may not help is the TCP_NOTSENT_LOWAT socket option.
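
A minimal sketch of setting that option from Python, assuming the option
meant here is the TCP-level TCP_NOTSENT_LOWAT (the constant Python exposes
on platforms that support it). The helper name and the 16 KB threshold are
hypothetical. The option caps how much not-yet-sent data may sit in the
socket's write queue before the socket stops being reported as writable, so
the write() timestamp should sit closer to when the bytes actually leave.

```python
import socket

def limit_unsent_bytes(sock, threshold_bytes=16384):
    """Set TCP_NOTSENT_LOWAT on a TCP socket and return the value read back."""
    opt = getattr(socket, "TCP_NOTSENT_LOWAT", None)
    if opt is None:
        # The constant is only defined where the platform supports it.
        raise OSError("TCP_NOTSENT_LOWAT is not exposed on this platform")
    sock.setsockopt(socket.IPPROTO_TCP, opt, threshold_bytes)
    return sock.getsockopt(socket.IPPROTO_TCP, opt)

if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("unsent low-water mark:", limit_unsent_bytes(s))
```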

I note that with SSL/TLS so common, it helps to measure connects over that
rather than straight TCP, so it's closer to a 5WHS.

Yes, generating the crypto exchange costs time, but with that as a baseline
with the extra round trips...
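
To make the connect-time comparison concrete, here is a small Python sketch
that times the TCP connect and the TLS handshake separately. The function
names are made up for the example, and it is only one way to take the
measurement, not how iperf does it. Note that create_connection() also
resolves the host name, so pass an IP address if DNS time should be kept out
of the connect figure.

```python
import socket
import ssl
import time

def tcp_connect_time(host, port, timeout=5.0):
    """Return (seconds spent in connect, connected socket)."""
    start = time.perf_counter()
    sock = socket.create_connection((host, port), timeout=timeout)
    return time.perf_counter() - start, sock

def tls_handshake_time(sock, server_hostname):
    """Return (seconds spent in the TLS handshake, TLS-wrapped socket)."""
    context = ssl.create_default_context()
    # Defer the handshake so it can be timed on its own.
    tls = context.wrap_socket(
        sock, server_hostname=server_hostname, do_handshake_on_connect=False
    )
    start = time.perf_counter()
    tls.do_handshake()
    return time.perf_counter() - start, tls

if __name__ == "__main__":
    host = "lists.bufferbloat.net"  # any HTTPS server would do
    connect_s, raw = tcp_connect_time(host, 443)
    handshake_s, tls = tls_handshake_time(raw, host)
    print(f"TCP connect  : {connect_s * 1000:.1f} ms")
    print(f"TLS handshake: {handshake_s * 1000:.1f} ms")
    tls.close()
```

The handshake figure includes the crypto cost as well as the extra round
trips, so it is a baseline for what a user sees rather than a pure RTT.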


> Finally, since we have trip times and the application write rates we can
> compute the amount of "end/end bytes in queue" per Little's law.
>
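
For reference, the Little's law estimate described above is just average
arrival rate times average time in the system. A minimal Python sketch,
with a hypothetical function name and made-up numbers:

```python
def bytes_in_queue(write_rate_bits_per_s, trip_time_s):
    """Little's law, L = lambda * W, with lambda in bytes/s and W in seconds."""
    write_rate_bytes_per_s = write_rate_bits_per_s / 8
    return write_rate_bytes_per_s * trip_time_s

if __name__ == "__main__":
    # Illustrative only: 100 Mbit/s application write rate, 20 ms trip time.
    print(bytes_in_queue(100e6, 0.020), "bytes in flight or queued end to end")
```

The result is an average over the measurement interval, so it counts bytes
on the wire as well as bytes sitting in buffers along the path.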

I will reserve comment on Little's law for a bit.

> For fault isolation, in-band network telemetry (or something similar) can
> be useful. https://p4.org/assets/INT-current-spec.pdf
>
> Bob
>
> On Mon, May 4, 2020 at 10:05 AM Sergey Fedorov via Make-wifi-fast <
> make-wifi-fast at lists.bufferbloat.net> wrote:
>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Sergey Fedorov <sfedorov at netflix.com>
>> To: "David P. Reed" <dpreed at deepplum.com>
>> Cc: Sebastian Moeller <moeller0 at gmx.de>, "Dave Täht" <dave.taht at gmail.com>,
>> Michael Richardson <mcr at sandelman.ca>, Make-Wifi-fast <
>> make-wifi-fast at lists.bufferbloat.net>, Jannie Hanekom <jannie at hanekom.net>,
>> Cake List <cake at lists.bufferbloat.net>, bloat <
>> bloat at lists.bufferbloat.net>
>> Bcc:
>> Date: Mon, 4 May 2020 10:04:19 -0700
>> Subject: Re: [Cake] [Make-wifi-fast] [Bloat] dslreports is no longer free
>>
>>> Sergey - I wasn't assuming anything about fast.com. The document you
>>> shared wasn't clear about the methodology's details here. Others sadly,
>>> have actually used ICMP pings in the way I described. I was making a
>>> generic comment of concern.
>>>
>>> That said, it sounds like what you are doing is really helpful (esp.
>>> given that your measure is aimed at end user experiential qualities).
>>
>> David - my apologies, I incorrectly interpreted your statement as being
>> said in context of fast.com measurements. The blog post linked indeed
>> doesn't provide the latency measurement details - was written before we
>> added the extra metrics. We'll see if we can publish an update.
>>
>>> 1) a clear definition of lag under load that is from end-to-end in
>>> latency, and involves, ideally, independent traffic from multiple sources
>>> through the bottleneck.
>>
>>  Curious if by multiple sources you mean multiple clients (devices) or
>> multiple connections sending data?
>>
>>
>> SERGEY FEDOROV
>>
>> Director of Engineering
>>
>> sfedorov at netflix.com
>>
>> 121 Albright Way | Los Gatos, CA 95032
>>
>>
>>
>>
>> On Sun, May 3, 2020 at 8:07 AM David P. Reed <dpreed at deepplum.com> wrote:
>>
>>> Thanks Sebastian. I do agree that in many cases, reflecting the ICMP off
>>> the entry device that has the external IP address for the NAT gets most of
>>> the RTT measure, and if there's no queueing built up in the NAT device,
>>> that's a reasonable measure. But...
>>>
>>>
>>>
>>> However, if the router has "taken up the queueing delay" by rate
>>> limiting its uplink traffic to slightly less than the capacity (as with
>>> Cake and other TC shaping that isn't as good as cake), then there is a
>>> queue in the TC layer itself. This is what concerns me as a distortion in
>>> the measurement that can fool one into thinking the TC shaper is doing a
>>> good job, when in fact, lag under load may be quite high from inside the
>>> routed domain (the home).
>>>
>>>
>>>
>>> As you point out this unmeasured queueing delay can also be a problem
>>> with WiFi inside the home. But it isn't limited to that.
>>>
>>>
>>>
>>> A badly set up shaping/congestion management subsystem inside the NAT
>>> can look "very good" in its echo of ICMP packets, but be terrible in
>>> response time to trivial HTTP requests from inside, or equally terrible in
>>> twitch games and video conferencing.
>>>
>>>
>>>
>>> So, for example, for tuning settings with "Cake" it is useless.
>>>
>>>
>>>
>>> To be fair, usually the Access Provider has no control of what is done
>>> after the cable is terminated at the home, so as a way to decide if the
>>> provider is badly engineering its side, a ping from a server is a
>>> reasonable quality measure of the provider.
>>>
>>>
>>>
>>> But not a good measure of the user experience, and if the provider
>>> provides the NAT box, even if it has a good shaper in it, like Cake or
>>> fq_codel, it will just confuse the user and create the opportunity for a
>>> "finger pointing" argument where neither side understands what is going on.
>>>
>>>
>>>
>>> This is why we need
>>>
>>>
>>>
>>> 1) a clear definition of lag under load that is from end-to-end in
>>> latency, and involves, ideally, independent traffic from multiple sources
>>> through the bottleneck.
>>>
>>>
>>>
>>> 2) ideally, a better way to localize where the queues are building up
>>> and present that to users and access providers.  The flent graphs are not
>>> interpretable by most non-experts. What we need is a simple visualization
>>> of a sketch-map of the path (like traceroute might provide) with queueing
>>> delay measures  shown at key points that the user can understand.
>>>
>>> On Saturday, May 2, 2020 4:19pm, "Sebastian Moeller" <moeller0 at gmx.de>
>>> said:
>>>
>>> > Hi David,
>>> >
>>> > in principle I agree, a NATed IPv4 ICMP probe will be at best reflected
>>> > at the NAT router (CPE) (some commercial home gateways do not respond to
>>> > ICMP echo requests in the name of security theatre). So it is pretty hard
>>> > to measure the full end to end path in that configuration. I believe that
>>> > IPv6 should make that easier/simpler in that NAT hopefully will be out of
>>> > the path (but let's see what ingenuity ISPs will come up with).
>>> > Then again, traditionally the relevant bottlenecks often are a) the
>>> > internet access link itself and there the CPE is in a reasonable position
>>> > as a reflector on the other side of the bottleneck as seen from an
>>> > internet server, b) the home network between CPE and end-host, often with
>>> > variable rate wifi, here I agree reflecting echos at the CPE hides part
>>> > of the issue.
>>> >
>>> >
>>> >
>>> > > On May 2, 2020, at 19:38, David P. Reed <dpreed at deepplum.com> wrote:
>>> > >
>>> > > I am still a bit worried about properly defining "latency under load"
>>> > > for a NAT routed situation. If the test is based on ICMP Ping packets
>>> > > *from the server*, it will NOT be measuring the full path latency, and
>>> > > if the potential congestion is in the uplink path from the access
>>> > > provider's residential box to the access provider's router/switch, it
>>> > > will NOT measure congestion caused by bufferbloat reliably on either
>>> > > side, since the bufferbloat will be outside the ICMP Ping path.
>>> >
>>> > Puzzled, as I believe it is going to be the residential box that will
>>> > respond here, or will it be the AFTRs for CG-NAT that reflect the ICMP
>>> > echo requests?
>>> >
>>> > >
>>> > > I realize that a browser based speed test has to be basically run from
>>> > > the "server" end, because browsers are not that good at time
>>> > > measurement on a packet basis. However, there are ways to solve this
>>> > > and avoid the ICMP Ping issue, with a cooperative server.
>>> > >
>>> > > I once built a test that fixed this issue reasonably well. It carefully
>>> > > created a TCP based RTT measurement channel (over HTTP) that made the
>>> > > echo have to traverse the whole end-to-end path, which is the best and
>>> > > only way to accurately define lag under load from the user's
>>> > > perspective. The client end of an unloaded TCP connection can depend on
>>> > > TCP (properly prepared by getting it past slowstart) to generate a
>>> > > single packet response.
>>> > >
>>> > > This "TCP ping" is thus compatible with getting the end-to-end
>>> > > measurement on the server end of a true RTT.
>>> > >
>>> > > It's like tcp-traceroute tool, in that it tricks anyone in the middle
>>> > > boxes into thinking this is a real, serious packet, not an optional low
>>> > > priority packet.
>>> > >
>>> > > The same issue comes up with non-browser-based techniques for measuring
>>> > > true lag-under-load.
>>> > >
>>> > > Now as we move HTTP to QUIC, this actually gets easier to do.
>>> > >
>>> > > One other opportunity I haven't explored, but which is pregnant with
>>> > > potential is the use of WebRTC, which runs over UDP internally. Since
>>> > > JavaScript has direct access to create WebRTC connections (multiple
>>> > > ones), this makes detailed testing in the browser quite reasonable.
>>> > >
>>> > > And the time measurements can resolve well below 100 microseconds, if
>>> > > the JS is based on modern JIT compilation (Chrome, Firefox, Edge all
>>> > > compile to machine code speed if the code is restricted and in a loop).
>>> > > Then again, there is WebAssembly if you want to write C code that runs
>>> > > in the browser fast. WebAssembly is a low level language that compiles
>>> > > to machine code in the browser execution, and still has access to all
>>> > > the browser networking facilities.
>>> >
>>> > Mmmh, according to https://github.com/w3c/hr-time/issues/56 due to
>>> > spectre side-channel vulnerabilities many browsers seemed to have lowered
>>> > the timer resolution, but even the ~1ms resolution should be fine for
>>> > typical RTTs.
>>> >
>>> > Best Regards
>>> > Sebastian
>>> >
>>> > P.S.: I assume that I simply do not see/understand the full scope of the
>>> > issue at hand yet.
>>> >
>>> >
>>> > >
>>> > > On Saturday, May 2, 2020 12:52pm, "Dave Taht" <dave.taht at gmail.com> said:
>>> > >
>>> > > > On Sat, May 2, 2020 at 9:37 AM Benjamin Cronce <bcronce at gmail.com> wrote:
>>> > > > >
>>> > > > > > Fast.com reports my unloaded latency as 4ms, my loaded latency as ~7ms
>>> > > >
>>> > > > I guess one of my questions is that with a switch to BBR netflix is
>>> > > > going to do pretty well. If fast.com is using bbr, well... that
>>> > > > excludes much of the current side of the internet.
>>> > > >
>>> > > > > For download, I show 6ms unloaded and 6-7 loaded. But for upload the
>>> > > > > loaded shows as 7-8 and I see it blip upwards of 12ms. But I am no
>>> > > > > longer using any traffic shaping. Any anti-bufferbloat is from my
>>> > > > > ISP. A graph of the bloat would be nice.
>>> > > >
>>> > > > The tests do need to last a fairly long time.
>>> > > >
>>> > > > > On Sat, May 2, 2020 at 9:51 AM Jannie Hanekom <jannie at hanekom.net> wrote:
>>> > > > >>
>>> > > > >> Michael Richardson <mcr at sandelman.ca>:
>>> > > > >> > Does it find/use my nearest Netflix cache?
>>> > > > >>
>>> > > > >> Thankfully, it appears so. The DSLReports bloat test was
>>> > > > >> interesting, but the jitter on the ~240ms base latency from South
>>> > > > >> Africa (and other parts of the world) was significant enough that
>>> > > > >> the figures returned were often unreliable and largely unusable -
>>> > > > >> at least in my experience.
>>> > > > >>
>>> > > > >> Fast.com reports my unloaded latency as 4ms, my loaded latency as
>>> > > > >> ~7ms and mentions servers located in local cities. I finally have
>>> > > > >> a test I can share with local non-technical people!
>>> > > > >>
>>> > > > >> (Agreed, upload test would be nice, but this is a huge step forward
>>> > > > >> from what I had access to before.)
>>> > > > >>
>>> > > > >> Jannie Hanekom
>>> > > > >>
>>> > > > >> _______________________________________________
>>> > > > >> Cake mailing list
>>> > > > >> Cake at lists.bufferbloat.net
>>> > > > >> https://lists.bufferbloat.net/listinfo/cake
>>> > > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > Make Music, Not War
>>> > > >
>>> > > > Dave Täht
>>> > > > CTO, TekLibre, LLC
>>> > > > http://www.teklibre.com
>>> > > > Tel: 1-831-435-0729
>>> > > >
>>> >
>>> >
>>>
>>>
>>>
>>
>>
>>
>
>
>
>
>


-- 
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729