[Starlink] Starlink hidden buffers

Wed May 24 11:31:45 EDT 2023

On Wed, May 24, 2023 at 8:49 AM Michael Richardson <mcr at sandelman.ca> wrote:
>
>
> Dave Taht via Starlink <starlink at lists.bufferbloat.net> wrote:
>     > These are the biggest reliability reasons why I think FQ is *necessary*
>     > across the edges of the internet.
>
> It saved your bacon, but yeah, like all other resilient protocols (DNS,
> Happy Eyeballs) tends to hide when one option is failing :-)
>
>     > pure AQM, in the case above, since that flood was uncontrollable, would
>     > have resulted in a 99.99% or so drop rate for all other traffic. While
>     > that would have been easier to diagnose I suppose, the near term
>     > outcome would have been quite damaging.
>
> What this says is that fq_codel doesn't have enough management reporting
> interfaces.   Going back 25 years, this has always been a problem with home
> routers: ntop3 is great, but it's not easy to use, and it's not that
> accessible, and it often can't see things that move around.
>
>     > I always try to make a clear distinction between FQ and AQM techniques.
>     > Both are useful and needed, for different reasons (but in the general
>     > case, I think the DRR++ derived FQ in fq_codel is the cats pajamas, and
>     > far more important than any form of AQM)
>
> Could fq_codel emit flow statistics as a side-effect of its classifications?

It does. It always has. "tc -s class show dev <interface>" gives details
of each queue.
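
As a rough illustration of consuming that output, here is a minimal
Python sketch that runs the command and groups the text into one block
per class. It assumes only that each class entry begins with a line
starting with "class ". The helper names (split_class_blocks,
read_class_stats) and the interface name in the usage comment are made
up for the example, and the per-queue fields inside each block are left
unparsed, since they differ between qdiscs and iproute2 versions.

```python
import subprocess


def split_class_blocks(text):
    """Group `tc -s class show` text into one list of lines per class.

    Assumption: every class entry starts with a line beginning "class ".
    Blank lines are skipped; other lines attach to the preceding class.
    """
    blocks = []
    for line in text.splitlines():
        if line.startswith("class "):
            blocks.append([line.strip()])
        elif blocks and line.strip():
            blocks[-1].append(line.strip())
    return blocks


def read_class_stats(dev):
    """Run tc for one interface and return the per-class blocks."""
    result = subprocess.run(
        ["tc", "-s", "class", "show", "dev", dev],
        capture_output=True,
        text=True,
        check=True,
    )
    return split_class_blocks(result.stdout)


# Hypothetical usage on a box that has tc and an interface named eth0:
#   for block in read_class_stats("eth0"):
#       print(block[0])
```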

It is a 5-tuple hash by default. This can of course be overridden via
filters to use another classification method. There are a few on-router
tools that process this output and provide a nice dashboard.
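
To make the default classification concrete, here is a small Python
sketch of mapping a 5-tuple onto a queue index. It is not the kernel's
code: blake2b stands in for the kernel's flow hash, the function name
and the salt parameter are hypothetical, and 1024 is the fq_codel
default for the number of flows.

```python
import hashlib


def flow_queue(src_ip, dst_ip, proto, src_port, dst_port,
               n_queues=1024, salt=b""):
    """Map a 5-tuple to a queue index in [0, n_queues).

    Illustrative only: blake2b is a stand-in hash, and `salt` plays the
    role of a per-qdisc random perturbation (an assumption of this sketch).
    """
    key = "|".join(
        [src_ip, dst_ip, str(proto), str(src_port), str(dst_port)]
    ).encode()
    digest = hashlib.blake2b(salt + key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % n_queues


# Packets of the same flow always land in the same queue; distinct flows
# usually land in different queues, but hash collisions are possible.
q = flow_queue("192.0.2.1", "198.51.100.7", 6, 51234, 443)
```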

This could be better at identifying problematic flows, but that
would require more in-kernel (eBPF?) processing than we have yet
attempted on a home router. AI is also on our minds.

Most of my focus for the past year has been on getting cake to scale
as an ISP middlebox, in LibreQoS.

For example, in LibreQoS we are presently very successful in sampling
cake queue data at my preferred sample rate (10ms), in production, for
up to about 1k subscribers. However, in production, with 10k subs
(11 Gbit/s), sampling at 1s rates is where we are now (that is 40
million queues sampled once per second). I am sure we can improve the
sample rates at high subscriber counts further (compress reporting,
etc.), but until now, most ISPs only had 5 minute averages to look at.

There are some really cool things you can do at high sample rates.
Here is a live/realtime movie of what Netflix actually looks like:
https://www.youtube.com/watch?v=C-2oSBr2200
Another thing is that real traffic, displayed as we do it now,
is kind of mesmerizing, and looks very different from what we generate
via flent on the testbed.

Anyway, on the LibreQoS front we now have over 30 ISPs and 98 folk
participating in the chat room. Please feel free to hang out with us:
https://app.element.io/#/room/#libreqos:matrix.org - ask questions,
propose tests and plots....

I return y'all now to Starlink (which could really use this stuff!)

> --
> ]               Never tell me the odds!                 | ipv6 mesh networks [
> ]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
> ]     mcr at sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [
>

-- 
Podcast: https://www.linkedin.com/feed/update/urn:li:activity:7058793910227111937/
Dave Täht CSO, LibreQos