Starlink has bufferbloat. Bad.
* Re: [Starlink] FQ_Codel
       [not found] <mailman.63.1654706837.1281.starlink@lists.bufferbloat.net>
@ 2022-06-08 18:47 ` David P. Reed
  2022-06-08 19:12   ` warren ponder
  2022-06-09  8:50   ` Sebastian Moeller
  0 siblings, 2 replies; 12+ messages in thread
From: David P. Reed @ 2022-06-08 18:47 UTC (permalink / raw)
  To: starlink



I'm just going to remind folks that fixing bufferbloat in Starlink won't be possible with FQ-Codel in the CPE equipment. If that were possible, it could be fixed entirely in a box sitting between the dishy and the user's "home network".
 
There is evidence that the bulk of the "bloat" can live not just in the dishy, but also in the central "access point" where satellites in a coverage region direct all the traffic to and from the public Internet. That link becomes bloated when its inbound and outbound sides are underprovisioned for the peak rates of all the served dishy terminals.
That public-Internet-to-Starlink access point (is there a more distinct, precise name?) can develop a very long delay queue, for the same reason that bufferbloat always gets designed in: memory is cheap and plentiful, so instead of dropping packets to minimize latency, the link just stores packets until multiple seconds' worth of traffic build up on one or both ends of that link.
 
This problem can be solved only by dropping packets (with the drop rate mitigated by ECN marking) to match the desired round-trip latency across the entire Internet. Typically, the queue should max out and start dropping packets at about 2 * desired cross-Internet latency * bit-rate of the link.
The desired cross-Internet latency can be chosen these days from the speed of light in fiber between one side of the North American continent and the other - around 15 msec is appropriate (which should be the worst-case end-to-end latency observed using Starlink, and is close to the 20 msec number bandied about by Musk - though he never really understood what end-to-end latency means).
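As an illustrative calculation (the link rate here is an assumed example, not Starlink's actual figure): for a 20 Mbit/s link and a 15 msec latency target, 2 * 0.015 s * 20 Mbit/s = 600 kbit, or about 75 kbytes - roughly 50 full-size 1500-byte packets. By contrast, the multi-second buffers described above would hold several megabytes at that rate.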
 
 
Now it may be that the dishy itself also has such bloat built in, which would make FQ-Codel in the dishy also important.
 
The problem called bufferbloat occurs whenever ANY router on ANY end-to-end shared path allows such queueing delay to accumulate before shortening the queue.
 
It really frustrates me that memory keeps being added to router outbound buffers anywhere. And it may be that the reason is that almost nobody who designs packet forwarding systems understands Queueing Theory at all! It certainly doesn't help that "packet drops" (even one or two per second) are considered a failure of the equipment.
 
FQ-codel is great, but the reason it works is that it makes the choice of which packet to drop far better (by being fair and a little bit elastic). However, FQ-Codel in one box alone doesn't fix system-level bufferbloat.
 

 



* Re: [Starlink] FQ_Codel
  2022-06-08 18:47 ` [Starlink] FQ_Codel David P. Reed
@ 2022-06-08 19:12   ` warren ponder
  2022-06-08 20:49     ` David P. Reed
  2022-06-09  0:12     ` Stuart Cheshire
  2022-06-09  8:50   ` Sebastian Moeller
  1 sibling, 2 replies; 12+ messages in thread
From: warren ponder @ 2022-06-08 19:12 UTC (permalink / raw)
  To: David P. Reed; +Cc: starlink


So this is really helpful. Is it fair to say then that end users with SQM
and fq_codel on a Starlink connection should essentially not turn on
SQM and just leave it off?



On Wed, Jun 8, 2022, 11:47 AM David P. Reed <dpreed@deepplum.com> wrote:

> [full quote of David P. Reed's message trimmed; see above]



* Re: [Starlink] FQ_Codel
  2022-06-08 19:12   ` warren ponder
@ 2022-06-08 20:49     ` David P. Reed
  2022-06-08 21:30       ` Dave Taht
  2022-06-09  8:58       ` Sebastian Moeller
  2022-06-09  0:12     ` Stuart Cheshire
  1 sibling, 2 replies; 12+ messages in thread
From: David P. Reed @ 2022-06-08 20:49 UTC (permalink / raw)
  To: warren ponder; +Cc: starlink



No, I don't think so. However, a hidden (not often discussed) aspect of how FQ-codel in a home router works is that you have to artificially restrict the total bitrate, in both directions, that your home router uses to talk to the access provider link.
 
Typically, it is recommended to use 95% of the upload/download speeds of that link as the limit. This forces packets to be dropped when the constraint is exceeded, which in turn forces congestion control signals (dropped packets) to be observed by both ends. (In a cable DOCSIS system, this lets the edge manage the throughput of the CMTS for the local endpoint, because the CMTS won't drop packets when it should - DOCSIS 3.1 CMTSes are often configured in a way that causes bufferbloat in the CMTS, and DOCSIS 2 always had bufferbloat in the CMTS.)
 
Starlink doesn't sell you a stable "max rate" - the rate varies depending on traffic, and can't be easily measured.
So to configure the dishy or an edge router connected to it correctly, you need to enforce a limit low enough that FQ-codel actually becomes the bottleneck and gets to drop packets.
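A minimal sketch of that artificial limit on a Linux edge router (the interface name and both rates are assumed placeholders - and for Starlink the right rate is a moving target, which is exactly the problem):
 
  # egress: shape to ~95% of the measured upload rate (assumed 15 Mbit/s here)
  tc qdisc replace dev eth0 root cake bandwidth 14250kbit
  # ingress: redirect inbound traffic through an ifb device and shape it too
  ip link add name ifb0 type ifb
  ip link set ifb0 up
  tc qdisc add dev eth0 handle ffff: ingress
  tc filter add dev eth0 parent ffff: protocol all matchall action mirred egress redirect dev ifb0
  # ~95% of an assumed 100 Mbit/s download rate
  tc qdisc replace dev ifb0 root cake bandwidth 95mbit ingress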
On Wednesday, June 8, 2022 3:12pm, "warren ponder" <wponder11@gmail.com> said:



So this is really helpful. Is it fair to say then that end users with SQM and fq_codel on a Starlink connection should essentially not turn on SQM and just leave it off?


On Wed, Jun 8, 2022, 11:47 AM David P. Reed <dpreed@deepplum.com> wrote:
[full quote of David P. Reed's message trimmed; see above]



* Re: [Starlink] FQ_Codel
  2022-06-08 20:49     ` David P. Reed
@ 2022-06-08 21:30       ` Dave Taht
  2022-06-09  8:58       ` Sebastian Moeller
  1 sibling, 0 replies; 12+ messages in thread
From: Dave Taht @ 2022-06-08 21:30 UTC (permalink / raw)
  To: David P. Reed; +Cc: warren ponder, starlink

On Wed, Jun 8, 2022 at 1:49 PM David P. Reed <dpreed@deepplum.com> wrote:
>
> No, I don't think so. However, a hidden (not often discussed) way that using FQ-codel in a home router works is that you have to artificially restrict the total bitrate in both directions used by your home router to talk to the access provider link.
>
>
>
> Typically, it is recommended to use 95% of the upload/download speeds of that link as the limit. This forces packets to be dropped when the constraint is exceeded. Now this forces congestion control signals (dropped packets) to be observed by both ends. (In a cable DOCSIS system, this allows the edge to manage the throughput of the CMTS for the local endpoint, because the CMTS won't drop packets when it should - because configuring DOCSIS 3.1 CMTS's is often done in a way that causes bufferbloat in CMTS. DOCSIS 2 always had bufferbloat in the CMTS).
>
>
>
> Starlink doesn't sell you a stable "max rate" - instead that rate varies depending on traffic, and can't be easily measured.

I appreciate david simplifying the problem here, but the details are:

On egress... at line rate... ethernet backpressure is provided via the
linux BQL facility ( https://lwn.net/Articles/469652/ ), which
basically buffers up one completion interrupt's worth of bytes
(packets) and punts the complicated FQ and codel drop/mark
decisions to a slightly higher layer. This is typically 3k bytes at
100Mbit, and 40k to 128k (with TSO) at a gbit. There's roughly 1/2 a ms
of needed buffering at the lowest layer on intel chips today. Some arm
chips can do interrupts quite a bit faster than intel; 100us is
feasible on some.
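(Back of the envelope, with an assumed 250us completion-interrupt
interval: 1 Gbit/s * 250us = ~31 kbytes of buffering, squarely in the
range above once TSO batching is added. The limit BQL settles on for a
given device - interface and queue names below are placeholders - can
be read from sysfs:

  cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
)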

Codel, being "time in queue" based, also works to drop packets
intelligently when ethernet pause frames give you a "variable rate"
link. I'm not big on pause frames, but codel (and pie) can work with
them, where RED cannot.

We long ago succeeded at making a plethora of the very variable rate
wifi devices work (in the driver) by adopting a motto of "one TXOP in
the hardware, one ready to go" for wifi, leading to a max unmanaged
buffering of about 10ms before a smarter qdisc can kick in. The
biggest problem wifi had (and one that I hope starlink doesn't!) was
that wifi packet aggregation is needed to get even close to the rated
bandwidth, and that with a fifo, rather than a per-station queue,
performance degrades hugely when flows for more than one station are
active.

If only I could get a few million more people to lose 8 minutes of
their life to this simple and elegant demo of why wifi goes to hell:
https://www.youtube.com/watch?v=Rb-UnHDw02o&t=1550s

Or read up on how well it's solved as of 2016, and use our test suite
for multiple stations at the same time:
https://www.cs.kau.se/tohojo/airtime-fairness/

In both cases with smarter hardware all that could be eliminated, but
I digress. 10ms worst case latency for a sparse flow is much better
than the 7 seconds I first observed on the first wifi router starlink
shipped. I apologize for conflating these things - the principal
wireless gear I'm familiar with is wifi, and a few other TDM style
wireless macs. The principal problem the starlink network has is on
the satellite uplink, followed by the satellite downlink, and the wifi
router problems only show up if you use the router for local things.
The wifi solution we came up with seems generalizable to most forms of
wireless, including LTE/5G and starlink.

> So to configure the dishy or an edge router connected to it correctly, you need to enforce such a limit such that it actually causes FQ-codel to see dropped packets.

So to reiterate, for egress from the client up to the sat:

1) with an interrupt's worth of backpressure from the radio, we just
slam cake, fq_codel, or fq_pie on it, and we have a minimum inter-flow
latency of however long that takes (under a ms) for sparse flows, and
a buffering target of 5ms with a margin of 100ms for fat flows. (My
guess is that the radio is actually scheduled on intervals longer than
that, btw)

or

 2) with actual (and accurate) bandwidth stats from the radio as to
how much can fit into the next transmit opportunity, we can set a cake
rate slightly below that.
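(A sketch of what option 2 could look like - the radio stats
interface is hypothetical, since the dishy doesn't expose one today:

  # RATE_KBIT: hypothetical per-interval capacity estimate from the radio
  RATE_KBIT=24000
  # shape to ~95% of the estimate so cake stays the bottleneck
  tc qdisc change dev eth0 root cake bandwidth $((RATE_KBIT * 95 / 100))kbit
)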

The 24Mbit up we've seen would have, oh, a 2% cpu impact on these arm
chips running cake with all options enabled. With the shaper, call it
7%.

Ingress is harder. Ideally, as I've said, they'd be managing the
queue depth and fq at the head end, not in the sky - doing complicated
things like per-customer fairness there - and doing just enough
queuing in the sky to cover 2 tx opportunities. The approach we've
long taken of also shaping ingress at the customer router is still
WIP for variable rate links...

>
> On Wednesday, June 8, 2022 3:12pm, "warren ponder" <wponder11@gmail.com> said:
>
> So this is really helpful. Is it fair to say then that end users with SQM and fq_codel on a Starlink connection should essentially not turn on SQM and just leave it off?

Right now, aside from trying to line up more testers of the autorate
code, it's not worth doing, yes.

> On Wed, Jun 8, 2022, 11:47 AM David P. Reed <dpreed@deepplum.com> wrote:
>> [full quote of David P. Reed's original message trimmed; see above]



-- 
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht CEO, TekLibre, LLC


* Re: [Starlink] FQ_Codel
  2022-06-08 19:12   ` warren ponder
  2022-06-08 20:49     ` David P. Reed
@ 2022-06-09  0:12     ` Stuart Cheshire
  2022-06-09  0:21       ` David Lang
  1 sibling, 1 reply; 12+ messages in thread
From: Stuart Cheshire @ 2022-06-09  0:12 UTC (permalink / raw)
  To: warren ponder; +Cc: David P. Reed, starlink

On 8 Jun 2022, at 12:12, warren ponder <wponder11@gmail.com> wrote:

> So this is really helpful. Is it fair to say then that end users with SQM and fq_codel on a Starlink connection should essentially not turn on SQM and just leave it off?

My advice is that people should have SQM (e.g., fq_codel) enabled anywhere it is available. For devices that aren’t the bottleneck hop on a path it won’t make any difference, but it won’t hurt. And if the network topology is such that it does become the bottleneck hop, even briefly, SQM will avoid having a big queue build up there.
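(On a Linux-based router where fq_codel isn't already the default, enabling it on an interface is a one-liner - the interface name here is a placeholder:

  tc qdisc replace dev eth0 root fq_codel
)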

One example is Wi-Fi. If you have 50Mb/s Internet service and 802.11ac Wi-Fi in the house, your Wi-Fi is unlikely to be the bottleneck. But if you walk out to the garden and the Wi-Fi rate drops to 40Mb/s, then suddenly bufferbloat in the AP can bite you, leading to bi-modal network usability, that abruptly falls off a cliff the moment your Wi-Fi rate drops below your Internet service rate. I think this is a large part of the reason behind the enthusiasm these days for “mesh” Wi-Fi systems -- you need to blanket your home with sufficient density of Wi-Fi access points to ensure that they never become the bottleneck hop and expose their incompetent queue management. If you get 11Mb/s in the garden that should be plenty to stream music, but throw in some egregious bufferbloat and a perfectly good 11Mb/s rate becomes unusably bad. Ironically, if you pay more for faster Internet service then the problem gets worse, not better, because the effective usable range of your bufferbloated Wi-Fi access points shrinks as the rate coming into the house goes up.

Stuart Cheshire



* Re: [Starlink] FQ_Codel
  2022-06-09  0:12     ` Stuart Cheshire
@ 2022-06-09  0:21       ` David Lang
  2022-06-09  1:11         ` Dave Taht
  0 siblings, 1 reply; 12+ messages in thread
From: David Lang @ 2022-06-09  0:21 UTC (permalink / raw)
  To: Stuart Cheshire; +Cc: warren ponder, starlink, David P. Reed


multiple access points, good. Mesh can make the problem worse.

The combination of hidden transmitters (a station in the middle can hear stations 
on both ends, but they can't hear each other and so step on each other) and the 
extra airtime needed to relay the messages across more hops can make the 
congestion worse (it is possible that higher data rates could make the 
transmissions shorter, but since the inter-aggregate gaps and per-aggregate 
headers are fixed at a low data rate, I doubt that it works that way in 
practice)

but get a few additional APs hooked together via wires, and you have a clear win 
that scales very well. It's what we do at the Scale conf with 100+ APs to 
support 3k+ geeks.

David Lang

On Wed, 8 Jun 2022, Stuart Cheshire wrote:

> [full quote of Stuart Cheshire's message trimmed; see above]


* Re: [Starlink] FQ_Codel
  2022-06-09  0:21       ` David Lang
@ 2022-06-09  1:11         ` Dave Taht
  2022-06-09  2:01           ` David Lang
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Taht @ 2022-06-09  1:11 UTC (permalink / raw)
  To: David Lang; +Cc: Stuart Cheshire, starlink, David P. Reed

On Wed, Jun 8, 2022 at 5:21 PM David Lang <david@lang.hm> wrote:
>
> multiple access points, good. Mesh can make the problem worse.
>
> [hidden-transmitter discussion trimmed; see above]
>
> but get a few additional APs hooked together via wires, and you have a clear win
> that scales very well. It's what we do at the Scale conf with 100+ APs to
> support 3k+ geeks.

Is there a physical scale conference this year? (It's in LA and a lot
of space/film folk go there)

For those that don't know, david lang has been putting together the
fq_codeled APs there for what? 8 years now? Conference feedback on the
wifi has generally been uniformly positive.

What APs do you use now?

> [remainder of quoted messages and list footers trimmed]



-- 
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht CEO, TekLibre, LLC


* Re: [Starlink] FQ_Codel
  2022-06-09  1:11         ` Dave Taht
@ 2022-06-09  2:01           ` David Lang
  0 siblings, 0 replies; 12+ messages in thread
From: David Lang @ 2022-06-09  2:01 UTC (permalink / raw)
  To: Dave Taht; +Cc: David Lang, Stuart Cheshire, starlink, David P. Reed


On Wed, 8 Jun 2022, Dave Taht wrote:

> [quoted context trimmed; see David Lang's message above]
>
> Is there a physical scale conference this year? (It's in LA and a lot
> of space/film folk go there)

yes, it got pushed from the beginning of the year and will now be the last 
weekend of July

> For those that don't know, david lang has been putting together the
> fq_codeled APs there for what? 8 years now? Conference feedback on the
> wifi has generally been uniformly positive.

I've been doing the wifi since 2009 and am now co-chair for the network team.

> What APs do you use now?

We are back in the LAX hilton this year; our network is ~50 juniper 4200 
switches, 100+ wndr3700/3800s, several miles of cable, and a pair of fairly 
beefy servers to run the VMs that run and monitor the network. We've been 
talking for the last 4-5 years about replacing the APs with something newer, 
but a combination of turnover of most of our tech staff, covid, and the 
unreliability of the drivers on our first pick has kept us pushing it back 
year by year (when you are going to buy 100+ APs it adds up to quite a price 
tag, especially for an all-volunteer event).

Over the last couple of years we've set up a couple of the 3800s hooked to a pi 
and a couple of relays so that every build gets auto-flashed and tested, and we 
just finished setting up a system that lets us put a hub/switch in place and 
flash the APs to the current version when the DHCP server detects them come up 
on the network.

This year we will be doing an updated version of what I documented here 
https://www.usenix.org/conference/lisa12/technical-sessions/presentation/lang_david_wireless
but with an even higher density of APs on low power (the walls in this hotel are 
pretty good shielding, and more people have 5GHz gear than did 10 years ago).

If anyone is in the area, stop by and chat. We are always looking for volunteers 
to help with setup and teardown as well ;-) (setup starts monday, and the guy 
who was going to be running monitoring just had to back out, so if someone 
wants to jump in the deep end...)


David Lang



* Re: [Starlink] FQ_Codel
  2022-06-08 18:47 ` [Starlink] FQ_Codel David P. Reed
  2022-06-08 19:12   ` warren ponder
@ 2022-06-09  8:50   ` Sebastian Moeller
  1 sibling, 0 replies; 12+ messages in thread
From: Sebastian Moeller @ 2022-06-09  8:50 UTC (permalink / raw)
  To: David P. Reed; +Cc: starlink

Hi David,


> On Jun 8, 2022, at 20:47, David P. Reed <dpreed@deepplum.com> wrote:
> 
> I'm just going to remind folks that fixing bufferbloat in Starlink won't be possible with FQ-Codel in the CPE equipment. If that were possible, it could be fixed entirely in a box sitting between the dishy and the user's "home network".

	While we can not fix it, we can remedy it to some degree.


>  
> Evidence exists that the bulk of the "bloat" can exist, not just in the dishy, but also in the central "access point" where satellites in a coverage region direct all the traffic from and to the public Internet. This connection from the region becomes bloated if the inbound link and outbound link become "underprovisioned" for peak rates of all the served dishy terminals.
> That public-Internet to StarLink access point (is there a more distinct, precise name) can develop a very long delay queue.  For the same reason that bufferbloat always gets designed in - memory is cheap and plentiful,

	... but CPU cycles are still precious, and it is this "over-sized but under-managed" combination that is so atrocious for latency under load/working latency. (Granted, decent management typically means that the queues never really grow to fill large memory buffers, but that also means that reserving large amounts of memory for buffering does not hurt anymore.)

> so instead of dropping packets to minimize latency, the link just stores packets until multiple seconds worth of traffic build up on one or both ends of that link.

	As long as they service the remote stations in a somewhat predictable/fair round-robin fashion under load, we should be able to remedy that though.

> [quoted queue-sizing discussion trimmed; see above]
> FQ-codel is great, but why it works is that it makes the choice of what packet to drop far better (by being fair and a little bit elastic). However, the lack of FQ-Codel doesn't fix system-level bufferbloat.

	I would have guessed that the FQ scheduler alone already helps a lot, as it restricts the pain from over-committing to the hash bucket housing the offending flow. Sure, selecting the most effective packet(s) to drop also helps, but FQ alone will already help non-capacity-seeking flows (those that stay below their capacity share) a lot when they compete with capacity-seeking traffic on the same link.

Regards
	Sebastian





* Re: [Starlink] FQ_Codel
  2022-06-08 20:49     ` David P. Reed
  2022-06-08 21:30       ` Dave Taht
@ 2022-06-09  8:58       ` Sebastian Moeller
  1 sibling, 0 replies; 12+ messages in thread
From: Sebastian Moeller @ 2022-06-09  8:58 UTC (permalink / raw)
  To: David P. Reed; +Cc: warren ponder, starlink

Hi David,


> On Jun 8, 2022, at 22:49, David P. Reed <dpreed@deepplum.com> wrote:
> 
> No, I don't think so. However, a hidden (not often discussed) way that using FQ-codel in a home router works is that you have to artificially restrict the total bitrate in both directions used by your home router to talk to the access provider link.

	I am not sure I fully agree with the "hidden (not often discussed)" qualification; when I explain SQM principles I start by explaining queues* and bottleneck queues, and the need to create an artificial bottleneck to get the queueing under our control, where we can employ decent scheduling and AQM to minimize the downsides of excessive queueing. E.g. in https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm-details we write:

"Basic Settings - the details…

SQM is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet. The algorithm cannot automatically adapt to network conditions on DSL, cable modems or GPON without any settings. Since the majority of ISP provided configurations for buffering are broken today, you need take control of the bottleneck link away from the ISP and move it into the router so it can be fixed. You do this by entering link speeds that are a few percent below the actual speeds."



	*) As little as I understand them, I do not claim to be an expert on queueing theory.
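For concreteness, the wiki advice above corresponds to an /etc/config/sqm stanza roughly like the following (a sketch with placeholder interface name and rates; rates are in kbit/s, set a few percent below the measured line speeds):

	# /etc/config/sqm -- values below are illustrative placeholders
	config queue 'wan'
	        option enabled '1'
	        option interface 'wan'
	        option download '95000'
	        option upload '14250'
	        option qdisc 'cake'
	        option script 'piece_of_cake.qos'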





* Re: [Starlink] FQ_Codel
  2022-06-06 16:20 Warren Ponder
@ 2022-06-08 16:47 ` Dave Taht
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Taht @ 2022-06-08 16:47 UTC (permalink / raw)
  To: Warren Ponder; +Cc: starlink


On Wed, Jun 8, 2022 at 8:29 AM Warren Ponder <wponderlpp@gmail.com> wrote:
>
> I have been reading up on everything trying to get up to speed on fq_codel. I have Starlink and a router that implements fq_codel. I know implementations can vary. However, has anyone found any general strategies for CPE side settings that can make any improvement?

My "strategy" has been to somehow convince 'em to burn a weekend with
me on implementing sch_cake on the dishy. With BQL-like backpressure
from the radio, making the dishy reliably do low latency
videoconferencing or gaming is straightforward that way. Even SFQ
would gain them this. Users would stop complaining so much when the
bandwidth was low.

Presently though, the dishy's userspace code seems to treat linux more
like a bootloader - it reads from the radio and outputs to the
ethernet port. They haven't done a GPL drop of that, just the router's
- running a 10 year old (version 1), or 6 year old (version 2) hacked
up, ancient, decrepit, vendor-supported-only version of openwrt
"lede". In the olde days,
a company entering a market like this would coddle developers, give
them hardware and support, not ignore them.

Worse, on the wifi front, they chose a really scarce mediatek chip for
that router... probably "locking it up"... and nobody in the openwrt
effort (that I'm aware of) has been working on adding the mainline
support for it that any other company would need. We're making huge
strides on mediatek in general, getting all their other chipsets to
behave like this:

https://blog.cerowrt.org/post/fq-codel-unifi6/

with, I hope, us finally fixing the tx power transmit bug that plagued
many of the other mediatek wifi implementations:
https://github.com/openwrt/mt76/issues/633

We've done enough reverse engineering on their devices for me to
conclude that, with statistics from the radio - and perhaps some
signalling from the headend - cake could compensate for bufferbloat
in both directions (backpressure would be better still, there and on
their headends). There are also some json stats that might be helpful
in getting a downstream router to compensate, but lacking a starlink
node to hack on, I haven't got anywhere.

Mike Puchol has been pulling the stats for his lovely graphing
utilities... but I don't know if they are adequate, without feeding
'em into cake.

They could be doing so much better than the attached 300+ms of induced
latency on the rrul test. They could be nearly totally flat
latencywise across the board... if they'd give up on 100/20 and being
RDOF compliant, and focus on just providing good low latency service
at lower rates, they could increase their subscriber density and
nobody but the speedtest.net devotees would notice.

It's been a frustrating year, not being able to get that weekend of
mutual hacking out of starlink. 400k subscribers that could have taken
advantage of all the innovations we've made in queuing delay in the
last decade if they'd just sink that weekend into applying what they
already almost have in their codebase. And we - in the bufferbloat
effort, would have an exemplary implementation to point to.

Anyway, elsewhere, we've been trying to get starlink users to test
these means of active sensing and configuring cake on your own router.
If you could give either of these scripts a shot?

https://forum.openwrt.org/t/cake-w-adaptive-bandwidth/108848

 Still, I retain hope that someone over there will end up owning this
problem, and fixing it.




-- 
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht CEO, TekLibre, LLC

[-- Attachment #2: rrul_be_-_2022-05-29_14:17:48.svg (RRUL test result plot, image/svg+xml) --]


* [Starlink] FQ_Codel
@ 2022-06-06 16:20 Warren Ponder
  2022-06-08 16:47 ` Dave Taht
  0 siblings, 1 reply; 12+ messages in thread
From: Warren Ponder @ 2022-06-06 16:20 UTC (permalink / raw)
  To: starlink


I have been reading up on everything trying to get up to speed on fq_codel.
I have Starlink and a router that implements fq_codel. I know
implementations can vary. However, has anyone found any general strategies
for CPE side settings that can make any improvement?


