This discussion is fascinating and made me think of a couple of points I really wish more people would grok:

1. What matters for the amount of queuing is the ratio of load to capacity, or demand/supply, if you like. At any point in time, this ratio determines how quickly a queue fills or empties: above 1 the queue grows, below 1 it drains. In that sense it sets the derivative of the queue depth. From this point of view, a drop in capacity is equivalent to a spike in load.
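To make that concrete, here is a minimal sketch of a fluid queue, assuming load and capacity are given as rates per one-second time step. The function name and all the numbers are made up for illustration. It reaches a load/capacity ratio of 2 in two ways, once by doubling the load and once by halving the capacity, and compares the resulting queuing delay.

```python
def queuing_delay_over_time(load, capacity, dt=1.0):
    """Fluid approximation: dq/dt = load - capacity, with q clamped at zero.

    Returns the queuing delay (backlog divided by current capacity)
    at the end of each time step.
    """
    backlog = 0.0
    delays = []
    for arrivals, service in zip(load, capacity):
        backlog = max(0.0, backlog + (arrivals - service) * dt)
        delays.append(backlog / service)
    return delays

# Illustrative rates in packets per second (assumed, not measured).
spike_in_load = queuing_delay_over_time([10, 20, 20], [10, 10, 10])
drop_in_capacity = queuing_delay_over_time([10, 10, 10], [10, 5, 5])

print(spike_in_load)     # delay in seconds after each step
print(drop_in_capacity)  # same delay, reached by halving capacity instead
```

In both cases the delay grows by (ratio - 1) seconds per second for as long as the ratio stays at 2, even though the backlog measured in packets differs.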

This means the rate adaptation of WiFi and LTE, and link changes in the Starlink network, have far greater potential to cause latency spikes than TCP does, even when many users connect at the same time. WiFi rates can go from 1000 to 1 from one packet to the next, and whenever that happens there simply isn't time for TCP or any other end-to-end congestion controller to react. In the presence of capacity-seeking traffic there will, inevitably, be a latency spike (or packet loss) when link capacity drops.
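A back-of-the-envelope sketch of the size of that spike, assuming the rates are in Mbit/s and that the sender keeps sending at the old rate for one reaction time of 50 ms before it notices. Both the units and the reaction time are my assumptions for illustration.

```python
def backlog_after_capacity_drop(rate_before, rate_after, reaction_time_s):
    """Backlog and queuing delay built up before the sender reacts.

    Assumes the sender keeps sending at rate_before for reaction_time_s
    after the link has dropped to rate_after (rates in Mbit/s) and that
    the buffer is large enough to hold everything.
    """
    backlog_mbit = (rate_before - rate_after) * reaction_time_s
    delay_s = backlog_mbit / rate_after  # time to drain at the new rate
    return backlog_mbit, delay_s

backlog, delay = backlog_after_capacity_drop(1000, 1, 0.05)
print(f"backlog: {backlog:.2f} Mbit, queuing delay: {delay:.2f} s")
```

With a finite buffer, most of that backlog turns into packet loss rather than delay, which is the other outcome mentioned above.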

I'm presenting a paper on this at ICC next week, and the preprint is here: https://arxiv.org/abs/2111.00488

2. IF you can describe how the ratio of demand to supply (or load/capacity) changes over time (i.e., how much and how quickly it can change), then we can use queuing theory (and/or simulations) to work out the utilization vs. queuing delay trade-off, including transient behaviour. Handling transients is what FQ excels at.
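As a rough illustration of the steady-state part of that trade-off, here is a sketch using the textbook M/M/1 queue (Poisson arrivals, exponential service times). That model is an assumption chosen for simplicity, not a claim about Starlink traffic, and the service rate of 1000 packets/s is made up.

```python
def mm1_mean_queuing_delay(utilization, service_rate):
    """Mean time spent waiting in an M/M/1 queue, in seconds.

    utilization is load/capacity (rho); service_rate is in packets/s.
    Only valid in steady state, i.e. for 0 <= rho < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("steady state requires 0 <= utilization < 1")
    return utilization / (service_rate * (1 - utilization))

for rho in (0.5, 0.9, 0.99):
    delay_ms = mm1_mean_queuing_delay(rho, service_rate=1000) * 1000
    print(f"utilization {rho:.2f}: mean queuing delay {delay_ms:.1f} ms")
```

The delay grows without bound as utilization approaches 1. This formula says nothing about transients; for those you need a time-varying model or a simulation.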

Because of the need for frequent link changes in the Starlink network, there will be a need for more buffering than in your typical (relatively) static network. Not only because the load changes quickly, but because the capacity does as well. This causes rapid changes in the load-to-capacity ratio, which will cause queues and/or packet loss unless it's planned really well. I'm not going to say that is impossible, but it's certainly hard.

Some queuing and deliberate under-utilization is needed to achieve reliable QoE in a system like that.

Just my two cents!

Cheers,

Bjørn Ivar Teigen

On Sat, 13 May 2023 at 12:10, Ulrich Speidel via Starlink <starlink@lists.bufferbloat.net> wrote:

Here's a bit of a question to you all. See what you make of it.

I've been thinking a bit about the latencies we see in the Starlink
network. This is why this list exists (right, Dave?). So what do we know?

1) We know that RTTs can be in the 100s of ms even in what appear to be
bent-pipe scenarios where the physical one-way path should be well under
3000 km, with physical RTT under 20 ms.
2) We know from plenty of traceroutes that these RTTs accrue in the
Starlink network, not between the Starlink handover point (POP) and the
Internet.
3) We know that they aren't an artifact of the Starlink WiFi router (our
traceroutes were done through their Ethernet adaptor, which bypasses the
router), so they must be delays on the satellites or the teleports.
4) We know that processing delay isn't a huge factor because we also see
RTTs well under 30 ms.
5) That leaves queuing delays.
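
The propagation figure in 1) can be checked with a few lines, assuming signals travel at the vacuum speed of light along the whole user-satellite-teleport path and ignoring any terrestrial leg. The function name is just for illustration.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458

def propagation_rtt_ms(one_way_path_km):
    """Round-trip propagation delay in ms for a given one-way path length."""
    return 2 * one_way_path_km / SPEED_OF_LIGHT_KM_PER_S * 1000

for path_km in (1500, 2000, 3000):
    print(f"{path_km} km one way: {propagation_rtt_ms(path_km):.1f} ms RTT")
```

Whatever an observed RTT exceeds this by has to come from somewhere other than propagation.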

This issue has been known for a while now. Starlink have been innovating
their heart out around pretty much everything here - and yet, this
bufferbloat issue hasn't changed, despite Dave proposing what appears to
be an easy fix compared to a lot of other things they have done. So what
are we possibly missing here?

Going back to first principles: The purpose of a buffer on a network
device is to act as a shock absorber against sudden traffic bursts. If I
want to size that buffer correctly, I need to know at the very least
(paraphrasing queueing theory here) something about my packet arrival
process.

If I look at conventional routers, then that arrival process involves
traffic generated by a user population that changes relatively slowly:
WiFi users come and go. One at a time. Computers in a company get turned
on and off and rebooted, but there are no instantaneous jumps in load -
you don't suddenly have a hundred users in the middle of watching
Netflix turning up that weren't there a second ago. Most of what we know
about Internet traffic behaviour is based on this sort of network, and
this is what we've designed our queuing systems around, right?

Observation: Starlink potentially breaks that paradigm. Why? Imagine a
satellite X handling N users that are located closely together in a
fibre-less rural town watching a range of movies. Assume that N is
relatively large. Say these users are currently handled through ground
station teleport A some distance away to the west (bent pipe with
switching or basic routing on the satellite). X is in view of both A and
the N users, but with X being a LEO satellite, that bliss doesn't last.
Say X is moving to the (south- or north-)east and out of A's range.
Before connection is lost, the N users migrate simultaneously to a new
satellite Y that has moved into view of both A and themselves. Y is
doing so from the west and is also catering to whatever users it can see
there, and let's suppose has been using A for a while already.

The point is that the user load on X and Y from users other than our N
friends could be quite different. E.g., one of them could be over the
ocean with few users, the other over countryside with a lot of
customers. The TCP stacks of our N friends are (hopefully) somewhat
adapted to the congestion situation on X with their cwnds open to
reasonable sizes, but they are now thrown onto a completely different
congestion scenario on Y. Similarly, say that Y had fewer than N users
before the handover. For existing users on Y, there is now a huge surge
of competing traffic that wasn't there a second ago - surging far faster
than we would expect this to happen in a conventional network because
there is no slow start involved.
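
A crude one-RTT snapshot of such a surge, as a sketch only: assume each of the N migrating flows arrives on Y with a full congestion window in flight, and that Y has some spare capacity left over for them. Every number below is hypothetical, and so is the function name.

```python
def handover_backlog_bytes(n_flows, cwnd_bytes, spare_capacity_bps, rtt_s):
    """Bytes left queued on Y after one RTT of the migrated flows' traffic.

    Assumes every flow sends one full cwnd per RTT and that Y can only
    give the newcomers its spare capacity during that RTT.
    """
    arriving = n_flows * cwnd_bytes               # offered in one RTT
    carried = spare_capacity_bps / 8 * rtt_s      # forwarded in one RTT
    return max(0.0, arriving - carried)

# Hypothetical: 100 flows, 100 kB cwnd each, 100 Mbit/s spare, 40 ms RTT.
backlog = handover_backlog_bytes(100, 100_000, 100e6, 0.04)
drain_time_s = backlog / (100e6 / 8)
print(f"backlog: {backlog / 1e6:.1f} MB, drain time: {drain_time_s:.2f} s")
```

With these made-up numbers, the backlog takes most of a second to drain even if nobody sends anything further, unless the buffer is smaller and drops the excess.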

This seems to explain the huge jumps you see on Starlink in TCP goodput
over time.

But could this be throwing a few spanners into the works in terms of
queuing? Does it invalidate what we know about queues and queue
management? Would surges like these justify larger buffers?

--
****************************************************************
Dr. Ulrich Speidel

School of Computer Science

Room 303S.594 (City Campus)

The University of Auckland
u.speidel@auckland.ac.nz
http://www.cs.auckland.ac.nz/~ulrich/
****************************************************************

Bjørn Ivar Teigen, Ph.D.

Head of Research

+47 47335952 | bjorn@domos.ai | www.domos.ai