* [Starlink] Starlink hidden buffers @ 2023-05-13 10:10 Ulrich Speidel 2023-05-13 11:20 ` Sebastian Moeller ` (3 more replies) 0 siblings, 4 replies; 34+ messages in thread From: Ulrich Speidel @ 2023-05-13 10:10 UTC (permalink / raw) To: starlink Here's a bit of a question to you all. See what you make of it. I've been thinking a bit about the latencies we see in the Starlink network. This is why this list exists (right, Dave?). So what do we know? 1) We know that RTTs can be in the 100s of ms even in what appear to be bent-pipe scenarios where the physical one-way path should be well under 3000 km, with physical RTT under 20 ms. 2) We know from plenty of traceroutes that these RTTs accrue in the Starlink network, not between the Starlink handover point (POP) and the Internet. 3) We know that they aren't an artifact of the Starlink WiFi router (our traceroutes were done through their Ethernet adaptor, which bypasses the router), so they must be delays on the satellites or the teleports. 4) We know that processing delay isn't a huge factor because we also see RTTs well under 30 ms. 5) That leaves queuing delays. This issue has been known for a while now. Starlink have been innovating their heart out around pretty much everything here - and yet, this bufferbloat issue hasn't changed, despite Dave proposing what appears to be an easy fix compared to a lot of other things they have done. So what are we possibly missing here? Going back to first principles: The purpose of a buffer on a network device is to act as a shock absorber against sudden traffic bursts. If I want to size that buffer correctly, I need to know at the very least (paraphrasing queueing theory here) something about my packet arrival process. If I look at conventional routers, then that arrival process involves traffic generated by a user population that changes relatively slowly: WiFi users come and go. One at a time. Computers in a company get turned on and off and rebooted, but there are no instantaneous jumps in load - you don't suddenly have a hundred users in the middle of watching Netflix turning up that weren't there a second ago. Most of what we know about Internet traffic behaviour is based on this sort of network, and this is what we've designed our queuing systems around, right? Observation: Starlink potentially breaks that paradigm. Why? Imagine a satellite X handling N users that are located closely together in a fibre-less rural town watching a range of movies. Assume that N is relatively large. Say these users are currently handled through ground station teleport A some distance away to the west (bent pipe with switching or basic routing on the satellite). X is in view of both A and the N users, but with X being a LEO satellite, that bliss doesn't last. Say X is moving to the (south- or north-)east and out of A's range. Before connection is lost, the N users migrate simultaneously to a new satellite Y that has moved into view of both A and themselves. Y is doing so from the west and is also catering to whatever users it can see there, and let's suppose it has been using A for a while already. The point is that the user load on X and Y from users other than our N friends could be quite different. E.g., one of them could be over the ocean with few users, the other over countryside with a lot of customers. 
The TCP stacks of our N friends are (hopefully) somewhat adapted to the congestion situation on X with their cwnds open to reasonable sizes, but they are now thrown onto a completely different congestion scenario on Y. Similarly, say that Y had less than N users before the handover. For existing users on Y, there is now a huge surge of competing traffic that wasn't there a second ago - surging far faster than we would expect this to happen in a conventional network because there is no slow start involved. This seems to explain the huge jumps you see on Starlink in TCP goodput over time. But could this be throwing a few spanners into the works in terms of queuing? Does it invalidate what we know about queues and queue management? Would surges like these justify larger buffers? -- **************************************************************** Dr. Ulrich Speidel School of Computer Science Room 303S.594 (City Campus) The University of Auckland u.speidel@auckland.ac.nz http://www.cs.auckland.ac.nz/~ulrich/ **************************************************************** ^ permalink raw reply [flat|nested] 34+ messages in thread
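To put rough numbers on the handover surge described in the message above, here is a minimal back-of-envelope sketch in Python. The flow count, per-flow rates, RTT, and the simplification that each flow's whole in-flight window lands on Y within one RTT are illustrative assumptions, not measured Starlink figures:

    # Back-of-envelope sketch of the "no slow start" surge described above:
    # N flows whose cwnds were sized for satellite X are suddenly handed to
    # satellite Y, which has less spare capacity. All numbers are assumptions.

    def bdp_bytes(rate_bps, rtt_s):
        """Bandwidth-delay product in bytes."""
        return rate_bps * rtt_s / 8

    N_FLOWS        = 50       # flows handed over together (assumption)
    PER_FLOW_BURST = 100e6    # rate each flow's cwnd had grown towards on X (assumption)
    NEW_SHARE_BPS  = 50e6     # spare capacity those flows now share on Y (assumption)
    RTT_S          = 0.040    # ~40 ms path RTT (assumption)
    MSS            = 1500     # bytes

    cwnd_old = bdp_bytes(PER_FLOW_BURST, RTT_S)   # per-flow window carried over from X
    arrival  = N_FLOWS * cwnd_old                 # hits Y without any slow start
    drained  = bdp_bytes(NEW_SHARE_BPS, RTT_S)    # what Y can forward in one RTT
    backlog  = arrival - drained
    queue_ms = 8 * backlog / NEW_SHARE_BPS * 1e3

    slow_start = N_FLOWS * 10 * MSS               # the same flows arriving at IW=10

    print(f"first-RTT arrival with carried-over cwnds: {arrival/1e6:6.2f} MB")
    print(f"first-RTT arrival if flows slow-started  : {slow_start/1e6:6.2f} MB")
    print(f"instant backlog at Y's bottleneck        : {backlog/1e6:6.2f} MB "
          f"(~{queue_ms:.0f} ms at Y's rate)")

With these made-up numbers the handed-over windows amount to tens of megabytes landing on Y in a single RTT, versus well under a megabyte if the same flows had to slow-start - which is exactly the shock-absorber question the message poses.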
* Re: [Starlink] Starlink hidden buffers 2023-05-13 10:10 [Starlink] Starlink hidden buffers Ulrich Speidel @ 2023-05-13 11:20 ` Sebastian Moeller 2023-05-13 12:16 ` Ulrich Speidel 2023-05-13 22:57 ` David Lang ` (2 subsequent siblings) 3 siblings, 1 reply; 34+ messages in thread From: Sebastian Moeller @ 2023-05-13 11:20 UTC (permalink / raw) To: Ulrich Speidel, Ulrich Speidel via Starlink, starlink [-- Attachment #1: Type: text/plain, Size: 4775 bytes --] Hi Ulrich, This situation is not completely different from say a train full of LTE/5G users moving through a set of cells with already established 'static' users, no? On 13 May 2023 12:10:17 CEST, Ulrich Speidel via Starlink <starlink@lists.bufferbloat.net> wrote: >Here's a bit of a question to you all. See what you make of it. > >I've been thinking a bit about the latencies we see in the Starlink network. This is why this list exist (right, Dave?). So what do we know? > >1) We know that RTTs can be in the 100's of ms even in what appear to be bent-pipe scenarios where the physical one-way path should be well under 3000 km, with physical RTT under 20 ms. >2) We know from plenty of traceroutes that these RTTs accrue in the Starlink network, not between the Starlink handover point (POP) to the Internet. >3) We know that they aren't an artifact of the Starlink WiFi router (our traceroutes were done through their Ethernet adaptor, which bypasses the router), so they must be delays on the satellites or the teleports. >4) We know that processing delay isn't a huge factor because we also see RTTs well under 30 ms. >5) That leaves queuing delays. > >This issue has been known for a while now. Starlink have been innovating their heart out around pretty much everything here - and yet, this bufferbloat issue hasn't changed, despite Dave proposing what appears to be an easy fix compared to a lot of other things they have done. So what are we possibly missing here? > >Going back to first principles: The purpose of a buffer on a network device is to act as a shock absorber against sudden traffic bursts. If I want to size that buffer correctly, I need to know at the very least (paraphrasing queueing theory here) something about my packet arrival process. > >If I look at conventional routers, then that arrival process involves traffic generated by a user population that changes relatively slowly: WiFi users come and go. One at a time. Computers in a company get turned on and off and rebooted, but there are no instantaneous jumps in load - you don't suddenly have a hundred users in the middle of watching Netflix turning up that weren't there a second ago. Most of what we know about Internet traffic behaviour is based on this sort of network, and this is what we've designed our queuing systems around, right? > >Observation: Starlink potentially breaks that paradigm. Why? Imagine a satellite X handling N users that are located closely together in a fibre-less rural town watching a range of movies. Assume that N is relatively large. Say these users are currently handled through ground station teleport A some distance away to the west (bent pipe with switching or basic routing on the satellite). X is in view of both A and the N users, but with X being a LEO satellite, that bliss doesn't last. Say X is moving to the (south- or north-)east and out of A's range. Before connection is lost, the N users migrate simultaneously to a new satellite Y that has moved into view of both A and themselves. 
Y is doing so from the west and is also catering to whatever users it can see there, and let's suppose has been using A for a while already. > >The point is that the user load on X and Y from users other than our N friends could be quite different. E.g., one of them could be over the ocean with few users, the other over countryside with a lot of customers. The TCP stacks of our N friends are (hopefully) somewhat adapted to the congestion situation on X with their cwnds open to reasonable sizes, but they are now thrown onto a completely different congestion scenario on Y. Similarly, say that Y had less than N users before the handover. For existing users on Y, there is now a huge surge of competing traffic that wasn't there a second ago - surging far faster than we would expect this to happen in a conventional network because there is no slow start involved. > >This seems to explain the huge jumps you see on Starlink in TCP goodput over time. > >But could this be throwing a few spanners into the works in terms of queuing? Does it invalidate what we know about queues and queue management? Would surges like these justify larger buffers? > >-- >**************************************************************** >Dr. Ulrich Speidel > >School of Computer Science > >Room 303S.594 (City Campus) > >The University of Auckland >u.speidel@auckland.ac.nz >http://www.cs.auckland.ac.nz/~ulrich/ >**************************************************************** > > > >_______________________________________________ >Starlink mailing list >Starlink@lists.bufferbloat.net >https://lists.bufferbloat.net/listinfo/starlink -- Sent from my Android device with K-9 Mail. Please excuse my brevity. [-- Attachment #2: Type: text/html, Size: 4684 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Starlink] Starlink hidden buffers 2023-05-13 11:20 ` Sebastian Moeller @ 2023-05-13 12:16 ` Ulrich Speidel 2023-05-13 23:00 ` David Lang 0 siblings, 1 reply; 34+ messages in thread From: Ulrich Speidel @ 2023-05-13 12:16 UTC (permalink / raw) To: Sebastian Moeller, Ulrich Speidel via Starlink [-- Attachment #1: Type: text/plain, Size: 6105 bytes --] Hi Sebastian, Yes and no. But yes, not completely, and I'm not sure whether anyone has ever looked at this, in fact. People who build mobile networks tend to regard their job as complete the moment they can get an IP packet from a mobile device to the Internet and vice versa, and mobile users tend to be a bit more tolerant if things slow down for a moment or three. There are a few differences, though. One is that cells are (or at least can be) fibre connected, and that is something you would do along a high-speed train line. So there is less of a bottleneck than having to use RF for downlinking. I'd also imaging total user numbers to be lower and the bandwidth demand per user to be less (hands up who takes their 50" TV onto trains to watch Netflix in HD?). The other is that most places have 3+ networks serving the train line, which brings down user numbers, or you have in-train cells, which communicate with off-train POPs that have no extra users. But yes, good question IMHO! Cheers, Ulrich On 13/05/2023 11:20 pm, Sebastian Moeller wrote: > Hi Ulrich, > > This situation is not completely different from say a train full of > LTE/5G users moving through a set of cells with already established > 'static' users, no? > > > On 13 May 2023 12:10:17 CEST, Ulrich Speidel via Starlink > <starlink@lists.bufferbloat.net> wrote: > > Here's a bit of a question to you all. See what you make of it. > I've been thinking a bit about the latencies we see in the > Starlink network. This is why this list exist (right, Dave?). So > what do we know? 1) We know that RTTs can be in the 100's of ms > even in what appear to be bent-pipe scenarios where the physical > one-way path should be well under 3000 km, with physical RTT under > 20 ms. 2) We know from plenty of traceroutes that these RTTs > accrue in the Starlink network, not between the Starlink handover > point (POP) to the Internet. 3) We know that they aren't an > artifact of the Starlink WiFi router (our traceroutes were done > through their Ethernet adaptor, which bypasses the router), so > they must be delays on the satellites or the teleports. 4) We know > that processing delay isn't a huge factor because we also see RTTs > well under 30 ms. 5) That leaves queuing delays. This issue has > been known for a while now. Starlink have been innovating their > heart out around pretty much everything here - and yet, this > bufferbloat issue hasn't changed, despite Dave proposing what > appears to be an easy fix compared to a lot of other things they > have done. So what are we possibly missing here? Going back to > first principles: The purpose of a buffer on a network device is > to act as a shock absorber against sudden traffic bursts. If I > want to size that buffer correctly, I need to know at the very > least (paraphrasing queueing theory here) something about my > packet arrival process. If I look at conventional routers, then > that arrival process involves traffic generated by a user > population that changes relatively slowly: WiFi users come and go. > One at a time. 
Computers in a company get turned on and off and > rebooted, but there are no instantaneous jumps in load - you don't > suddenly have a hundred users in the middle of watching Netflix > turning up that weren't there a second ago. Most of what we know > about Internet traffic behaviour is based on this sort of network, > and this is what we've designed our queuing systems around, right? > Observation: Starlink potentially breaks that paradigm. Why? > Imagine a satellite X handling N users that are located closely > together in a fibre-less rural town watching a range of movies. > Assume that N is relatively large. Say these users are currently > handled through ground station teleport A some distance away to > the west (bent pipe with switching or basic routing on the > satellite). X is in view of both A and the N users, but with X > being a LEO satellite, that bliss doesn't last. Say X is moving to > the (south- or north-)east and out of A's range. Before connection > is lost, the N users migrate simultaneously to a new satellite Y > that has moved into view of both A and themselves. Y is doing so > from the west and is also catering to whatever users it can see > there, and let's suppose has been using A for a while already. The > point is that the user load on X and Y from users other than our N > friends could be quite different. E.g., one of them could be over > the ocean with few users, the other over countryside with a lot of > customers. The TCP stacks of our N friends are (hopefully) > somewhat adapted to the congestion situation on X with their cwnds > open to reasonable sizes, but they are now thrown onto a > completely different congestion scenario on Y. Similarly, say that > Y had less than N users before the handover. For existing users on > Y, there is now a huge surge of competing traffic that wasn't > there a second ago - surging far faster than we would expect this > to happen in a conventional network because there is no slow start > involved. This seems to explain the huge jumps you see on Starlink > in TCP goodput over time. But could this be throwing a few > spanners into the works in terms of queuing? Does it invalidate > what we know about queues and queue management? Would surges like > these justify larger buffers? > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. -- **************************************************************** Dr. Ulrich Speidel School of Computer Science Room 303S.594 (City Campus) The University of Auckland u.speidel@auckland.ac.nz http://www.cs.auckland.ac.nz/~ulrich/ **************************************************************** [-- Attachment #2: Type: text/html, Size: 6960 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Starlink] Starlink hidden buffers 2023-05-13 12:16 ` Ulrich Speidel @ 2023-05-13 23:00 ` David Lang 0 siblings, 0 replies; 34+ messages in thread From: David Lang @ 2023-05-13 23:00 UTC (permalink / raw) To: Ulrich Speidel; +Cc: Sebastian Moeller, Ulrich Speidel via Starlink [-- Attachment #1: Type: text/plain, Size: 5538 bytes --] On Sun, 14 May 2023, Ulrich Speidel via Starlink wrote: > I'd also imaging total user numbers to be lower and the > bandwidth demand per user to be less (hands up who takes their 50" TV onto > trains to watch Netflix in HD?). most phones are >HD resolution, and the higher end are >4k resolution. the network bandwidth doesn't care if the resulting screen is 5" or 50", the resolution is all that matters. > The other is that most places have 3+ networks serving the train line, which > brings down user numbers, or you have in-train cells, which communicate with > off-train POPs that have no extra users. but the density of people in a train car is MUCH higher than in an office, even if split a couple of ways. David Lang > But yes, good question IMHO! > > Cheers, > > Ulrich > > On 13/05/2023 11:20 pm, Sebastian Moeller wrote: >> Hi Ulrich, >> >> This situation is not completely different from say a train full of LTE/5G >> users moving through a set of cells with already established 'static' >> users, no? >> >> >> On 13 May 2023 12:10:17 CEST, Ulrich Speidel via Starlink >> <starlink@lists.bufferbloat.net> wrote: >> >> Here's a bit of a question to you all. See what you make of it. >> I've been thinking a bit about the latencies we see in the >> Starlink network. This is why this list exist (right, Dave?). So >> what do we know? 1) We know that RTTs can be in the 100's of ms >> even in what appear to be bent-pipe scenarios where the physical >> one-way path should be well under 3000 km, with physical RTT under >> 20 ms. 2) We know from plenty of traceroutes that these RTTs >> accrue in the Starlink network, not between the Starlink handover >> point (POP) to the Internet. 3) We know that they aren't an >> artifact of the Starlink WiFi router (our traceroutes were done >> through their Ethernet adaptor, which bypasses the router), so >> they must be delays on the satellites or the teleports. 4) We know >> that processing delay isn't a huge factor because we also see RTTs >> well under 30 ms. 5) That leaves queuing delays. This issue has >> been known for a while now. Starlink have been innovating their >> heart out around pretty much everything here - and yet, this >> bufferbloat issue hasn't changed, despite Dave proposing what >> appears to be an easy fix compared to a lot of other things they >> have done. So what are we possibly missing here? Going back to >> first principles: The purpose of a buffer on a network device is >> to act as a shock absorber against sudden traffic bursts. If I >> want to size that buffer correctly, I need to know at the very >> least (paraphrasing queueing theory here) something about my >> packet arrival process. If I look at conventional routers, then >> that arrival process involves traffic generated by a user >> population that changes relatively slowly: WiFi users come and go. >> One at a time. Computers in a company get turned on and off and >> rebooted, but there are no instantaneous jumps in load - you don't >> suddenly have a hundred users in the middle of watching Netflix >> turning up that weren't there a second ago. 
Most of what we know >> about Internet traffic behaviour is based on this sort of network, >> and this is what we've designed our queuing systems around, right? >> Observation: Starlink potentially breaks that paradigm. Why? >> Imagine a satellite X handling N users that are located closely >> together in a fibre-less rural town watching a range of movies. >> Assume that N is relatively large. Say these users are currently >> handled through ground station teleport A some distance away to >> the west (bent pipe with switching or basic routing on the >> satellite). X is in view of both A and the N users, but with X >> being a LEO satellite, that bliss doesn't last. Say X is moving to >> the (south- or north-)east and out of A's range. Before connection >> is lost, the N users migrate simultaneously to a new satellite Y >> that has moved into view of both A and themselves. Y is doing so >> from the west and is also catering to whatever users it can see >> there, and let's suppose has been using A for a while already. The >> point is that the user load on X and Y from users other than our N >> friends could be quite different. E.g., one of them could be over >> the ocean with few users, the other over countryside with a lot of >> customers. The TCP stacks of our N friends are (hopefully) >> somewhat adapted to the congestion situation on X with their cwnds >> open to reasonable sizes, but they are now thrown onto a >> completely different congestion scenario on Y. Similarly, say that >> Y had less than N users before the handover. For existing users on >> Y, there is now a huge surge of competing traffic that wasn't >> there a second ago - surging far faster than we would expect this >> to happen in a conventional network because there is no slow start >> involved. This seems to explain the huge jumps you see on Starlink >> in TCP goodput over time. But could this be throwing a few >> spanners into the works in terms of queuing? Does it invalidate >> what we know about queues and queue management? Would surges like >> these justify larger buffers? >> >> -- >> Sent from my Android device with K-9 Mail. Please excuse my brevity. > > [-- Attachment #2: Type: text/plain, Size: 149 bytes --] _______________________________________________ Starlink mailing list Starlink@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/starlink ^ permalink raw reply [flat|nested] 34+ messages in thread
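To attach some rough arithmetic to the point above that resolution drives demand regardless of screen size, here is a tiny sketch; the per-stream bitrates are typical streaming figures, while the passenger count and the fraction streaming at once are pure assumptions:

    # Rough aggregate-demand arithmetic for the train-cell comparison above.
    # Bitrates are typical per-stream streaming figures; passenger numbers and
    # concurrency are assumptions for illustration only.

    BITRATE_MBPS = {"SD": 3, "HD 1080p": 5, "UHD 4K": 15}   # approx. per stream

    passengers_per_car = 80      # assumption
    cars               = 8       # assumption
    watching_fraction  = 0.25    # assumption: 1 in 4 streaming at once

    streams = passengers_per_car * cars * watching_fraction
    for quality, mbps in BITRATE_MBPS.items():
        print(f"{quality:9s}: {streams * mbps:6.0f} Mbit/s aggregate "
              f"for {streams:.0f} concurrent streams")

Whether those streams play on 5-inch phones or 50-inch TVs, the cell (or satellite beam) sees the same offered load.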
* Re: [Starlink] Starlink hidden buffers 2023-05-13 10:10 [Starlink] Starlink hidden buffers Ulrich Speidel 2023-05-13 11:20 ` Sebastian Moeller @ 2023-05-13 22:57 ` David Lang 2023-05-14 6:06 ` Ulrich Speidel 2023-05-14 9:57 ` Oleg Kutkov 2023-05-24 15:26 ` Bjørn Ivar Teigen 3 siblings, 1 reply; 34+ messages in thread From: David Lang @ 2023-05-13 22:57 UTC (permalink / raw) To: Ulrich Speidel; +Cc: starlink On Sat, 13 May 2023, Ulrich Speidel via Starlink wrote: > Here's a bit of a question to you all. See what you make of it. > > I've been thinking a bit about the latencies we see in the Starlink > network. This is why this list exist (right, Dave?). So what do we know? > > 1) We know that RTTs can be in the 100's of ms even in what appear to be > bent-pipe scenarios where the physical one-way path should be well under > 3000 km, with physical RTT under 20 ms. > 2) We know from plenty of traceroutes that these RTTs accrue in the > Starlink network, not between the Starlink handover point (POP) to the > Internet. > 3) We know that they aren't an artifact of the Starlink WiFi router (our > traceroutes were done through their Ethernet adaptor, which bypasses the > router), so they must be delays on the satellites or the teleports. the ethernet adapter bypasses the wifi, but not the router, you have to cut the cable and replace the plug to bypass the router > 4) We know that processing delay isn't a huge factor because we also see > RTTs well under 30 ms. > 5) That leaves queuing delays. > > This issue has been known for a while now. Starlink have been innovating > their heart out around pretty much everything here - and yet, this > bufferbloat issue hasn't changed, despite Dave proposing what appears to > be an easy fix compared to a lot of other things they have done. So what > are we possibly missing here? > > Going back to first principles: The purpose of a buffer on a network > device is to act as a shock absorber against sudden traffic bursts. If I > want to size that buffer correctly, I need to know at the very least > (paraphrasing queueing theory here) something about my packet arrival > process. The question is over what timeframe. If you have a huge buffer, you can buffer 10s of seconds of traffic and eventually send it. That will make benchmarks look good, but not the user experience. The rapid drop in RAM prices (beyond merely a free fall) and the benchmark scores that heavily penalized any dropped packets encouraged buffers to get larger than is sane. it's still a good question to define what is sane, the longer the buffer, the mor of a chance of finding time to catch up, but having packets in the buffer that have timed out (i.e. DNS queries tend to time out after 3 seconds, TCP will give up and send replacement packets, making the initial packets meaningless) is counterproductive. What is the acceptable delay to your users? Here at the bufferbloat project, we tend to say that buffers past a few 10s of ms worth of traffic are probably bad and are aiming to single-digit ms in many cases. > If I look at conventional routers, then that arrival process involves > traffic generated by a user population that changes relatively slowly: > WiFi users come and go. One at a time. Computers in a company get turned > on and off and rebooted, but there are no instantaneous jumps in load - > you don't suddenly have a hundred users in the middle of watching > Netflix turning up that weren't there a second ago. 
Most of what we know > about Internet traffic behaviour is based on this sort of network, and > this is what we've designed our queuing systems around, right? not true, for businesses, every hour as meetings start and let out, and as people arrive in the morning, arrive back from lunch, you have very sharp changes in the traffic. at home you have fewer changes in users, but you also may have less bandwidth (although many tech enthusiasts have more bandwidth than many companies, two of my last 3 jobs have had <400Mb at their main office with hundreds of employees while many people would consider that 'slow' for home use). As such, a parent arriving home with a couple of kids will make a drastic change to the network usage in a very short time. but the active queueing systems that we are designing (cake, fq_codel) handle these conditions very well because they don't try to guess what the usage is going to be, they just look at the packets that they have to process and figure out how to dispatch them out in the best way. because we have observed that latency tends to be more noticeable for short connections (DNS, checking if cached web pages are up to date, etc), our algorithms give a slight priority to new-low-traffic connections over long-running-high-traffic connections rather than just splitting the bandwidth evenly across all connections, and can even go further to split bandwidth between endpoints, not just connections (with endpoints being a configurable definition). without active queue management, the default is FIFO, which allows the high-user-impact, short connection packets to sit in a queue behind the low-user-impact, bulk data transfers. For benchmarks, a-packet-is-a-packet and they all count, so until you have enough buffering that you start having expired packets in flight, it doesn't matter, but for the user experience, there can be a huge difference. David Lang ^ permalink raw reply [flat|nested] 34+ messages in thread
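For readers who haven't looked inside fq_codel or cake, here is a toy sketch in Python of the "sparse flows first" scheduling described above: packets are hashed into per-flow queues, newly active flows are served ahead of long-running bulk flows, and each flow earns a byte quantum per round. It is a deliberately stripped-down illustration, not the Linux implementation, and it omits the CoDel delay-based dropping that the real qdiscs run per queue:

    # Toy "sparse flows first" scheduler in the spirit of fq_codel/cake.
    # Not the real implementation; no AQM, no hashing collisions handled.

    from collections import deque

    QUANTUM = 1514  # bytes of credit a flow earns per scheduling round

    class Flow:
        def __init__(self):
            self.pkts = deque()      # queued (flow_id, size) packets
            self.deficit = 0         # DRR byte credit

    class ToyFQ:
        def __init__(self):
            self.flows = {}                    # flow_id -> Flow
            self.new_flows = deque()           # served first ("sparse" flows)
            self.old_flows = deque()           # long-running bulk flows

        def enqueue(self, flow_id, size):
            f = self.flows.get(flow_id)
            if f is None or (not f.pkts and f not in self.new_flows
                             and f not in self.old_flows):
                f = self.flows.setdefault(flow_id, f or Flow())
                f.deficit = QUANTUM
                self.new_flows.append(f)       # a newly active flow starts as "new"
            f.pkts.append((flow_id, size))

        def dequeue(self):
            while self.new_flows or self.old_flows:
                lst = self.new_flows if self.new_flows else self.old_flows
                f = lst[0]
                if not f.pkts:                 # flow went idle; drop it from the list
                    lst.popleft()
                    continue
                if f.deficit <= 0:             # quantum used up: demote to old flows
                    f.deficit += QUANTUM
                    lst.popleft()
                    self.old_flows.append(f)
                    continue
                pkt = f.pkts.popleft()
                f.deficit -= pkt[1]
                return pkt
            return None

    # Tiny demo: a bulk flow with a 20-packet standing queue, then a DNS-sized packet.
    fq = ToyFQ()
    for _ in range(20):
        fq.enqueue("bulk", 1500)
    fq.enqueue("dns", 80)
    print([fq.dequeue()[0] for _ in range(5)])

In the little demo at the end, the DNS-sized packet goes out after two bulk packets rather than after twenty, which is the "short connections don't sit behind bulk transfers" behaviour described in the message above; a plain FIFO would have sent all twenty bulk packets first.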
* Re: [Starlink] Starlink hidden buffers 2023-05-13 22:57 ` David Lang @ 2023-05-14 6:06 ` Ulrich Speidel 2023-05-14 6:55 ` David Lang 0 siblings, 1 reply; 34+ messages in thread From: Ulrich Speidel @ 2023-05-14 6:06 UTC (permalink / raw) To: David Lang; +Cc: starlink On 14/05/2023 10:57 am, David Lang wrote: > On Sat, 13 May 2023, Ulrich Speidel via Starlink wrote: > >> Here's a bit of a question to you all. See what you make of it. >> >> I've been thinking a bit about the latencies we see in the Starlink >> network. This is why this list exist (right, Dave?). So what do we know? >> >> 1) We know that RTTs can be in the 100's of ms even in what appear to >> be bent-pipe scenarios where the physical one-way path should be well >> under 3000 km, with physical RTT under 20 ms. >> 2) We know from plenty of traceroutes that these RTTs accrue in the >> Starlink network, not between the Starlink handover point (POP) to >> the Internet. >> 3) We know that they aren't an artifact of the Starlink WiFi router >> (our traceroutes were done through their Ethernet adaptor, which >> bypasses the router), so they must be delays on the satellites or the >> teleports. > > the ethernet adapter bypasses the wifi, but not the router, you have > to cut the cable and replace the plug to bypass the router Good point - but you still don't get the WiFi buffering here. Or at least we don't seem to, looking at the difference between running with and without the adapter. > >> 4) We know that processing delay isn't a huge factor because we also >> see RTTs well under 30 ms. >> 5) That leaves queuing delays. >> >> This issue has been known for a while now. Starlink have been >> innovating their heart out around pretty much everything here - and >> yet, this bufferbloat issue hasn't changed, despite Dave proposing >> what appears to be an easy fix compared to a lot of other things they >> have done. So what are we possibly missing here? >> >> Going back to first principles: The purpose of a buffer on a network >> device is to act as a shock absorber against sudden traffic bursts. >> If I want to size that buffer correctly, I need to know at the very >> least (paraphrasing queueing theory here) something about my packet >> arrival process. > > The question is over what timeframe. If you have a huge buffer, you > can buffer 10s of seconds of traffic and eventually send it. That will > make benchmarks look good, but not the user experience. The rapid drop > in RAM prices (beyond merely a free fall) and the benchmark scores > that heavily penalized any dropped packets encouraged buffers to get > larger than is sane. > > it's still a good question to define what is sane, the longer the > buffer, the mor of a chance of finding time to catch up, but having > packets in the buffer that have timed out (i.e. DNS queries tend to > time out after 3 seconds, TCP will give up and send replacement > packets, making the initial packets meaningless) is counterproductive. > What is the acceptable delay to your users? > > Here at the bufferbloat project, we tend to say that buffers past a > few 10s of ms worth of traffic are probably bad and are aiming to > single-digit ms in many cases. Taken as read. > >> If I look at conventional routers, then that arrival process involves >> traffic generated by a user population that changes relatively >> slowly: WiFi users come and go. One at a time. 
Computers in a company >> get turned on and off and rebooted, but there are no instantaneous >> jumps in load - you don't suddenly have a hundred users in the middle >> of watching Netflix turning up that weren't there a second ago. Most >> of what we know about Internet traffic behaviour is based on this >> sort of network, and this is what we've designed our queuing systems >> around, right? > > not true, for businesses, every hour as meetings start and let out, > and as people arrive in the morning, arrive back from lunch, you have > very sharp changes in the traffic. And herein lies the crunch: All of these things that you list happen over much longer timeframes than a switch to a different satellite. Also, folk coming back from lunch would start with something like cwnd=10. Users whose TCP connections get switched over to a different satellite by some underlying tunneling protocol could have a much larger cwnd. > > at home you have fewer changes in users, but you also may have less > bandwidth (although many tech enthusiasts have more bandwidth than > many companies, two of my last 3 jobs have had <400Mb at their main > office with hundreds of employees while many people would consider > that 'slow' for home use). As such, a parent arriving home with a > couple of kids will make a drastic change to the network usage in a > very short time. I think you've missed my point - I'm talking about changes in the network mid-flight, not people coming home and getting started over a period of a few minutes. The change you see in a handover is sudden, probably with a sub-second ramp-up. And it's something that doesn't just happen when people come home or return from lunch - it happens every few minutes. > > > but the active queueing systems that we are designing (cake, fq_codel) > handle these conditions very well because they don't try to guess what > the usage is going to be, they just look at the packets that they have > to process and figure out how to dispatch them out in the best way. Understood - I've followed your work. > > because we have observed that latency tends to be more noticeable for > short connections (DNS, checking if cached web pages are up to date, > etc), our algorithms give a slight priority to new-low-traffic > connections over long-running-high-traffic connections rather than > just splitting the bandwidth evenly across all connections, and can > even go further to split bandwidth between endpoints, not just > connections (with endpoints being a configurable definition) > > without active queue management, the default is FIFO, which allows the > high-user-impact, short connection packets to sit in a queue behind > the low-user-impact, bulk data transfers. For benchmarks, > a-packet-is-a-packet and they all count, so until you have enough > buffering that you start having expired packets in flight, it doesn't > matter, but for the user experience, there can be a huge difference. All understood - you're preaching to the converted. It's just that I think Starlink may be a different ballpark. Put another way: If a protocol (TCP) that is designed to reasonably expect that its current cwnd is OK to use for now is put into a situation where there are relatively frequent, huge and lasting step changes in available BDP within sub-second periods, are your underlying assumptions still valid? I suspect they're handing over whole cells, not individual users, at a time. > > David Lang > -- **************************************************************** Dr. 
Ulrich Speidel School of Computer Science Room 303S.594 (City Campus) The University of Auckland u.speidel@auckland.ac.nz http://www.cs.auckland.ac.nz/~ulrich/ **************************************************************** ^ permalink raw reply [flat|nested] 34+ messages in thread
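To illustrate the step-change concern in the exchange above with one concrete (made-up) set of numbers: a single flow whose cwnd was sized for its pre-handover share keeps that cwnd when the available BDP steps down, and every excess packet can only sit in a queue. The rates and RTT below are assumptions chosen purely for illustration:

    # Numeric sketch of a stale cwnd meeting a step change in available BDP.
    # Rates and RTT are assumptions chosen only to illustrate the effect.

    MSS = 1500  # bytes

    def bdp_packets(rate_bps, rtt_s, mss=MSS):
        return rate_bps * rtt_s / 8 / mss

    rtt = 0.040                          # 40 ms base RTT (assumption)
    old_rate, new_rate = 100e6, 25e6     # per-flow share before / after handover

    cwnd    = bdp_packets(old_rate, rtt)     # window tuned to the old path
    new_bdp = bdp_packets(new_rate, rtt)
    excess  = cwnd - new_bdp                 # packets that can only queue
    standing_delay_ms = excess * MSS * 8 / new_rate * 1e3

    print(f"cwnd carried over      : {cwnd:6.0f} packets")
    print(f"new path BDP           : {new_bdp:6.0f} packets")
    print(f"instant standing queue : {excess:6.0f} packets "
          f"(~{standing_delay_ms:.0f} ms of extra delay)")

    # The step up is just as disruptive in the other direction: Reno-style
    # congestion avoidance adds roughly one packet per RTT, so closing a
    # 250-packet gap takes ~250 RTTs (~10 s at 40 ms) of underused link.

That asymmetry - an instant standing queue on a step down, many seconds of ramp-up on a step up - is one way to read the large goodput swings mentioned earlier in the thread.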
* Re: [Starlink] Starlink hidden buffers 2023-05-14 6:06 ` Ulrich Speidel @ 2023-05-14 6:55 ` David Lang 2023-05-14 8:43 ` Ulrich Speidel 0 siblings, 1 reply; 34+ messages in thread From: David Lang @ 2023-05-14 6:55 UTC (permalink / raw) To: Ulrich Speidel; +Cc: David Lang, starlink On Sun, 14 May 2023, Ulrich Speidel wrote: > On 14/05/2023 10:57 am, David Lang wrote: >> On Sat, 13 May 2023, Ulrich Speidel via Starlink wrote: >> >>> Here's a bit of a question to you all. See what you make of it. >>> >>> I've been thinking a bit about the latencies we see in the Starlink >>> network. This is why this list exist (right, Dave?). So what do we know? >>> >>> 1) We know that RTTs can be in the 100's of ms even in what appear to be >>> bent-pipe scenarios where the physical one-way path should be well under >>> 3000 km, with physical RTT under 20 ms. >>> 2) We know from plenty of traceroutes that these RTTs accrue in the >>> Starlink network, not between the Starlink handover point (POP) to the >>> Internet. >>> 3) We know that they aren't an artifact of the Starlink WiFi router (our >>> traceroutes were done through their Ethernet adaptor, which bypasses the >>> router), so they must be delays on the satellites or the teleports. >> >> the ethernet adapter bypasses the wifi, but not the router, you have to cut >> the cable and replace the plug to bypass the router > > Good point - but you still don't get the WiFi buffering here. Or at least we > don't seem to, looking at the difference between running with and without the > adapter. wifi is an added layer, with it's own problems, eliminating those problems when testing the satellite link is the first step, but it would also be a good idea to take the next step and bypass the router. I just discovered that someone is manufacturing an adapter so you no longer have to cut the cable https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P > Put another way: If you have a protocol (TCP) that is designed to reasonably > expect that its current cwnd is OK to use for now is put into a situation > where there are relatively frequent, huge and lasting step changes in > available BDP within subsecond periods, are your underlying assumptions still > valid? I think that with interference from other APs, WIFI suffers at least as much unpredictable changes to the available bandwidth. > I suspect they're handing over whole cells, not individual users, at a time. I would guess the same (remember, in spite of them having launched >4000 satellites, this is still the early days, with the network changing as more are launching) We've seen that it seems that there is only one satellite serving any cell at one time. But remember that the system does know how much usage there is in the cell before they do the handoff. It's unknown if they do anything with that, or if they are just relaying based on geography. We also don't know what the bandwidth to the ground stations is compared to the dishy. And remember that for every cell that a satellite takes over, it's also giving away one cell at the same time. I'm not saying that the problem is trivial, but just that it's not unique David Lang ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Starlink] Starlink hidden buffers 2023-05-14 6:55 ` David Lang @ 2023-05-14 8:43 ` Ulrich Speidel 2023-05-14 9:00 ` David Lang 2023-05-14 9:06 ` Sebastian Moeller 0 siblings, 2 replies; 34+ messages in thread From: Ulrich Speidel @ 2023-05-14 8:43 UTC (permalink / raw) To: David Lang; +Cc: starlink [-- Attachment #1: Type: text/plain, Size: 3304 bytes --] On 14/05/2023 6:55 pm, David Lang wrote: > > I just discovered that someone is manufacturing an adapter so you no > longer have > to cut the cable > > https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P > <https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P> > I'll see whether I can get hold of one of these. Cutting a cable on a university IT asset as an academic is not allowed here, except if it doesn't meet electrical safety standards. Alternatively, has anyone tried the standard Starlink Ethernet adapter with a PoE injector instead of the WiFi box? The adapter above seems to be like the Starlink one (which also inserts into the cable between Dishy and router). > > Put another way: If you have a protocol (TCP) that is designed to > reasonably > > expect that its current cwnd is OK to use for now is put into a > situation > > where there are relatively frequent, huge and lasting step changes in > > available BDP within subsecond periods, are your underlying > assumptions still > > valid? > > I think that with interference from other APs, WIFI suffers at least > as much > unpredictable changes to the available bandwidth. Really? I'm thinking stuff like the sudden addition of packets from potentially dozens of TCP flows with large cwnd's? > > > I suspect they're handing over whole cells, not individual users, at > a time. > > I would guess the same (remember, in spite of them having launched >4000 > satellites, this is still the early days, with the network changing as > more are > launching) > > We've seen that it seems that there is only one satellite serving any > cell at > one time. But the reverse is almost certainly not true: Each satellite must serve multiple cells. > But remember that the system does know how much usage there is in the > cell before they do the handoff. It's unknown if they do anything with > that, or > if they are just relaying based on geography. We also don't know what the > bandwidth to the ground stations is compared to the dishy. Well, we do know for NZ, sort of, based on the licences Starlink has here. > > And remember that for every cell that a satellite takes over, it's > also giving > away one cell at the same time. Yes, except that some cells may have no users in them and some of them have a lot (think of a satellite flying into range of California from the Pacific, dropping over-the-water cells and acquiring land-based ones). > > I'm not saying that the problem is trivial, but just that it's not unique What makes me suspicious here that it's not the usual bufferbloat problem is this: With conventional bufferbloat and FIFOs, you'd expect standing queues, right? With Starlink, we see the queues emptying relatively occasionally with RTTs in the low 20 ms, and in some cases under 20 ms even. With large ping packets (1500 bytes). > > David Lang -- **************************************************************** Dr. 
Ulrich Speidel School of Computer Science Room 303S.594 (City Campus) The University of Auckland u.speidel@auckland.ac.nz http://www.cs.auckland.ac.nz/~ulrich/ **************************************************************** [-- Attachment #2: Type: text/html, Size: 4962 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
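A quick way to see why those sub-20 ms samples with 1500-byte pings are a useful signal: the fixed part of a bent-pipe RTT (propagation plus serialization) is small, so almost everything above it in a 100+ ms sample has to be waiting time. A minimal sketch, with the shell altitude as an approximate public figure and the slant ranges and link rates as assumptions:

    # Rough floor for a 1500-byte ping's RTT on a bent-pipe path. Altitude is
    # an approximate public figure; slant ranges and link rates are assumptions.

    C = 299_792_458  # m/s

    def propagation_ms(distance_m):
        return distance_m / C * 1e3

    def serialization_ms(size_bytes, rate_bps):
        return size_bytes * 8 / rate_bps * 1e3

    for label, slant_m in (("satellite near zenith", 550e3),
                           ("satellite low on the horizon", 1100e3)):
        # up+down on both the user and gateway legs, assuming similar slant ranges
        prop = 4 * propagation_ms(slant_m)
        ser  = 2 * serialization_ms(1500, 100e6)   # 1500 B each way at ~100 Mbit/s (assumption)
        print(f"{label:28s}: ~{prop:4.1f} ms propagation + {ser:.2f} ms serialization")

So a 19 ms sample really does mean the queues were close to empty at that instant, while a 100+ ms sample on the same path is dominated by queueing, plus whatever terrestrial backhaul and processing add.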
* Re: [Starlink] Starlink hidden buffers 2023-05-14 8:43 ` Ulrich Speidel @ 2023-05-14 9:00 ` David Lang 2023-05-15 2:41 ` Ulrich Speidel 2023-05-14 9:06 ` Sebastian Moeller 1 sibling, 1 reply; 34+ messages in thread From: David Lang @ 2023-05-14 9:00 UTC (permalink / raw) To: Ulrich Speidel; +Cc: David Lang, starlink On Sun, 14 May 2023, Ulrich Speidel wrote: >> I just discovered that someone is manufacturing an adapter so you no longer >> have >> to cut the cable >> >> https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P >> <https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P> >> > I'll see whether I can get hold of one of these. Cutting a cable on a > university IT asset as an academic is not allowed here, except if it doesn't > meet electrical safety standards. > > Alternatively, has anyone tried the standard Starlink Ethernet adapter with a > PoE injector instead of the WiFi box? The adapter above seems to be like the > Starlink one (which also inserts into the cable between Dishy and router). that connects you a 2nd ethernet port on the router, not on the dishy I just ordered one of those adapters, it will take a few weeks to arrive. >> > Put another way: If you have a protocol (TCP) that is designed to >> > reasonably >> > expect that its current cwnd is OK to use for now is put into a situation >> > where there are relatively frequent, huge and lasting step changes in >> > available BDP within subsecond periods, are your underlying assumptions >> > still >> > valid? >> >> I think that with interference from other APs, WIFI suffers at least as much >> unpredictable changes to the available bandwidth. > Really? I'm thinking stuff like the sudden addition of packets from > potentially dozens of TCP flows with large cwnd's? vs losing 90% of your available bandwidth to interference?? I think it's going to be a similar problem >> >> > I suspect they're handing over whole cells, not individual users, at a >> time. >> >> I would guess the same (remember, in spite of them having launched >4000 >> satellites, this is still the early days, with the network changing as more >> launching) >> >> We've seen that it seems that there is only one satellite serving any cell >> one time. > But the reverse is almost certainly not true: Each satellite must serve > multiple cells. true, but while the satellite over a given area will change, the usage in that area isn't changing that much >> But remember that the system does know how much usage there is in the >> cell before they do the handoff. It's unknown if they do anything with >> that, or >> if they are just relaying based on geography. We also don't know what the >> bandwidth to the ground stations is compared to the dishy. > Well, we do know for NZ, sort of, based on the licences Starlink has here. what is the ground station bandwith? >> And remember that for every cell that a satellite takes over, it's also >> giving away one cell at the same time. > Yes, except that some cells may have no users in them and some of them have a > lot (think of a satellite flying into range of California from the Pacific, > dropping over-the-water cells and acquiring land-based ones). >> I'm not saying that the problem is trivial, but just that it's not unique > What makes me suspicious here that it's not the usual bufferbloat problem is > this: With conventional bufferbloat and FIFOs, you'd expect standing queues, > right? 
With Starlink, we see the queues emptying relatively occasionally with > RTTs in the low 20 ms, and in some cases under 20 ms even. With large ping > packets (1500 bytes). it's not directly a bufferbloat problem, bufferbloat is a side effect (at most). we know that the available Starlink bandwidth is chopped into timeslots (sorry, don't remember how many), and I could see the possibility of there being the same number of timeslots down to the ground station as up from the dishies, and if the bottleneck is at the uplink from the ground station, then things would queue there. As latency changes, figuring out if it's extra distance that must be traveled, or buffering is hard. does the latency stay roughly the same until the next satellite change? or does it taper off? If it stays the same, I would suspect that you are actually hitting a different ground station and there is a VPN backhaul to your egress point to the regular Internet (which doesn't support mobile IP addresses) for that cycle. If it tapers off, then I could buy bufferbloat that gets resolved as TCP backs off. my main point in replying several messages ago was to point out other scenarios where the load changes rapidly and/or the available bandwidth changes rapidly. And you are correct that it is generally not handled well by common equipment. I think that active queue management on the sending side of the bottleneck will handle it fairly well. It doesn't have to do calculations based on what the bandwidth is, it just needs to know what it has pending to go out. David Lang ^ permalink raw reply [flat|nested] 34+ messages in thread
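On the "extra distance vs. buffering" question above: geometry puts a fairly tight bound on how much of an RTT change the satellite's movement alone can explain. A sketch, using an approximate 550 km shell altitude and a roughly 25 degree minimum elevation as rough public figures, and assuming similar elevations at the user and gateway ends for simplicity:

    # Geometry sketch: how much can propagation delay alone change as a
    # satellite moves through a pass? Altitude and minimum elevation are
    # approximate public figures; the rest is plain spherical geometry.

    import math

    RE = 6371e3          # Earth radius, m
    H  = 550e3           # approx. altitude of the main Starlink shell, m
    C  = 299_792_458     # m/s

    def slant_range(elev_deg, h=H, re=RE):
        """Distance from a ground terminal to a satellite seen at elev_deg."""
        e = math.radians(elev_deg)
        return math.sqrt((re + h)**2 - (re * math.cos(e))**2) - re * math.sin(e)

    for elev in (90, 60, 40, 25):
        d = slant_range(elev)
        # bent pipe: up+down on the user side and on the gateway side,
        # assuming the same elevation at both ends for simplicity
        rtt_ms = 4 * d / C * 1e3
        print(f"elevation {elev:2d} deg: slant {d/1e3:6.0f} km, "
              f"space-segment RTT ~{rtt_ms:4.1f} ms")

On these assumptions the pure-geometry swing across a pass is on the order of 5-8 ms of RTT, so larger, step-like changes are more plausibly routing changes or queueing than distance.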
* Re: [Starlink] Starlink hidden buffers 2023-05-14 9:00 ` David Lang @ 2023-05-15 2:41 ` Ulrich Speidel 2023-05-15 3:33 ` David Lang 0 siblings, 1 reply; 34+ messages in thread From: Ulrich Speidel @ 2023-05-15 2:41 UTC (permalink / raw) To: David Lang; +Cc: starlink [-- Attachment #1: Type: text/plain, Size: 9836 bytes --] On 14/05/2023 9:00 pm, David Lang wrote: > On Sun, 14 May 2023, Ulrich Speidel wrote: > > >> I just discovered that someone is manufacturing an adapter so you > no longer > >> have > >> to cut the cable > >> > >> > https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P > <https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P> > > >> > <https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P > <https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P>> > >> > > I'll see whether I can get hold of one of these. Cutting a cable on a > > university IT asset as an academic is not allowed here, except if it > doesn't > > meet electrical safety standards. > > > > Alternatively, has anyone tried the standard Starlink Ethernet > adapter with a > > PoE injector instead of the WiFi box? The adapter above seems to be > like the > > Starlink one (which also inserts into the cable between Dishy and > router). > > that connects you a 2nd ethernet port on the router, not on the dishy > > I just ordered one of those adapters, it will take a few weeks to arrive. How do we know that the Amazon version doesn't do the same? > > >> > Put another way: If you have a protocol (TCP) that is designed to > >> > reasonably > >> > expect that its current cwnd is OK to use for now is put into a > situation > >> > where there are relatively frequent, huge and lasting step changes in > >> > available BDP within subsecond periods, are your underlying > assumptions > >> > still > >> > valid? > >> > >> I think that with interference from other APs, WIFI suffers at > least as much > >> unpredictable changes to the available bandwidth. > > > Really? I'm thinking stuff like the sudden addition of packets from > > potentially dozens of TCP flows with large cwnd's? > > vs losing 90% of your available bandwidth to interference?? I think > it's going > to be a similar problem Hm. Not convinced, but I take your point... > > >> > >> > I suspect they're handing over whole cells, not individual users, > at a > >> time. > >> > >> I would guess the same (remember, in spite of them having launched > >4000 > >> satellites, this is still the early days, with the network changing > as more > >> launching) > >> > >> We've seen that it seems that there is only one satellite serving > any cell > >> one time. > > > But the reverse is almost certainly not true: Each satellite must serve > > multiple cells. > > true, but while the satellite over a given area will change, the usage > in that > area isn't changing that much Exactly. But your underlying queue sits on the satellite, not in the area. > > >> But remember that the system does know how much usage there is in the > >> cell before they do the handoff. It's unknown if they do anything with > >> that, or > >> if they are just relaying based on geography. We also don't know > what the > >> bandwidth to the ground stations is compared to the dishy. > > > Well, we do know for NZ, sort of, based on the licences Starlink has > here. > > what is the ground station bandwith? 
https://rrf.rsm.govt.nz/ui/search/licence - seach for "Starlink" ...all NZ licences in all their glory. Looking at Starlink SES (satellite earth station) TX (which is the interesting direction I guess): - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 29750.000000 TX (BW = 500 MHz) - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 28850.000000 TX (BW = 500 MHz) - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 28350.000000 TX (BW = 500 MHz) - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 28250.000000 TX (BW = 500 MHz) - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 27750.000000 TX (BW = 500 MHz) So 2.5 GHz up, licensed from 6 ground stations. Now I'm not convinced that they would use all of those from all locations simultaneously because of the risk of off-beam interference. They'll all be transmitting south, ballpark. If there was full re-use at all ground stations, we'd be looking at 15 GHz. If they are able to re-use on all antennas at each ground station, then we're looking at 9 golf balls each in Puwera, Te Hana, Clevedon, Hinds and Cromwell, and an unknown number at Awarua. Assuming 9 there, we'd be looking at 135 GHz all up max. Awarua and Cromwell are 175 km apart, Hinds another 220 km from Cromwell, then it's a hop of about 830 km to Clevedon, and from there another 100 km to Te Hana, which is another 53 km from Puwera, so keeping them all out of each other's hair all the time might be a bit difficult. Lots of other interesting info in the licenses, such as EIRP, in case you're wanting to do link budgets. > > >> And remember that for every cell that a satellite takes over, it's > also > >> giving away one cell at the same time. > > > Yes, except that some cells may have no users in them and some of > them have a > > lot (think of a satellite flying into range of California from the > Pacific, > > dropping over-the-water cells and acquiring land-based ones). > > >> I'm not saying that the problem is trivial, but just that it's not > unique > > > What makes me suspicious here that it's not the usual bufferbloat > problem is > > this: With conventional bufferbloat and FIFOs, you'd expect standing > queues, > > right? With Starlink, we see the queues emptying relatively > occasionally with > > RTTs in the low 20 ms, and in some cases under 20 ms even. With > large ping > > packets (1500 bytes). > > it's not directly a bufferbloat problem, bufferbloat is a side effect > (At most) > > we know that the avaialble starlink bandwidth is chopped into > timeslots (sorry, > don't remember how many), and I could see the possibility of there > being the > same number of timeslots down to the ground station as up from the > dishies, and > if the bottleneck is at the uplink from the ground station, then > things would > queue there. > > As latency changes, figuring out if it's extra distance that must be > traveled, > or buffering is hard. does the latency stay roughly the same until the > next > satellite change? or does it taper off? Good question. You would expect step changes in physical latency between satellites, but also gradual change related to satellite movement. Plus of course any rubble thrown into any queue by something suddenly turning up on that path. Don't forget that it's not just cells now, we're also talking up- and downlink for the laser ISLs, at least in some places. 
> > If it stays the same, I would suspect that you are actually hitting a > different > ground station and there is a VPN backhaul to your egress point to the > regular > Internet (which doesn't support mobile IP addresses) for that cycle. > If it > tapers off, then I could buy bufferbloat that gets resolved as TCP > backs off. Yes, quite. Sorting out which part of your latency is what is the million-dollar question here... We saw significant RTT changes here during the recent cyclone over periods of several hours, and these came in steps (see the attached gabrielle_rtt_pl.png), with the initial change being a downward one. Averages are over 60 pings (the time scale isn't 100% true as we used "one ping, one second" timing) here. We're still not sure whether to attribute this to load change or ground station changes. There were a lot of power outages, especially in Auckland's lifestyle block belt, which teems with Starlink users, but all three North Island ground stations were also in areas affected by power outages (although the power companies concerned don't provide the level of detail to establish whether they were affected). It's also not clear what, if any, backup power arrangements they have. At ~25 ms, the step changes in RTT are too large to be the result of a switch in ground stations, though, the path differences just aren't that large. You'd also expect a ground station outage to result in longer RTTs, not shorter ones, if you need to re-route via another ground station. One explanation might be users getting cut off if they relied on one particular ground station for bent pipe ops - but that would not explain this order of magnitude effect as I'd expect that number to be small. So maybe power outages at the user end after all. But that would then tell us that these are load-dependent queuing delays. Moreover, since those load changes wouldn't have involved the router at our site, we can conclude that these are queue sojourn times in the Starlink network. > > my main point in replying several messages ago was to point out other > scenarios > where the load changes rapidly and/or the available bandwidth changes > rapidly. > And you are correct that it is generally not handled well by common > equipment. > > I think that active queue management on the sending side of the > bottleneck will > handle it fairly well. It doesn't have to do calculations based on > what the > bandwidth is, it just needs to know what it has pending to go out. Understood - but your customer for AQM is the sending TCP client, and there are two questions here: (a) Does your AQM handle rapid load changes and (b) how do your TCP clients actually respond to your AQM's handling? > > David Lang -- **************************************************************** Dr. Ulrich Speidel School of Computer Science Room 303S.594 (City Campus) The University of Auckland u.speidel@auckland.ac.nz http://www.cs.auckland.ac.nz/~ulrich/ **************************************************************** [-- Attachment #2.1: Type: text/html, Size: 13218 bytes --] [-- Attachment #2.2: gabrielle_rtt_pl.png --] [-- Type: image/png, Size: 23782 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
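For anyone wanting to reproduce the kind of analysis behind the step plot referenced above, here is a minimal sketch that averages ping RTTs over 60-sample windows and flags jumps between consecutive windows. The input format (one RTT value in milliseconds per line), the default file name and the 5 ms step threshold are arbitrary choices for illustration:

    # Minimal windowed-RTT analysis: 60-sample averages and step detection.
    # Assumes a plain text log with one RTT in ms per line (e.g. extracted
    # from ping output); filename and threshold are arbitrary choices.

    import sys
    from statistics import mean

    WINDOW  = 60      # samples per average, matching the 60-ping averages above
    STEP_MS = 5.0     # flag jumps between windows larger than this

    def window_means(rtts, size=WINDOW):
        return [mean(rtts[i:i + size]) for i in range(0, len(rtts) - size + 1, size)]

    def main(path):
        with open(path) as fh:
            rtts = [float(line) for line in fh if line.strip()]
        means = window_means(rtts)
        for i in range(1, len(means)):
            delta = means[i] - means[i - 1]
            marker = "  <-- step" if abs(delta) >= STEP_MS else ""
            print(f"window {i:4d}: {means[i]:6.1f} ms ({delta:+5.1f} ms){marker}")

    if __name__ == "__main__":
        main(sys.argv[1] if len(sys.argv) > 1 else "rtt_log.txt")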
* Re: [Starlink] Starlink hidden buffers 2023-05-15 2:41 ` Ulrich Speidel @ 2023-05-15 3:33 ` David Lang 2023-05-15 6:36 ` Sebastian Moeller 2023-05-24 12:55 ` Ulrich Speidel 0 siblings, 2 replies; 34+ messages in thread From: David Lang @ 2023-05-15 3:33 UTC (permalink / raw) To: Ulrich Speidel; +Cc: David Lang, starlink On Mon, 15 May 2023, Ulrich Speidel wrote: > On 14/05/2023 9:00 pm, David Lang wrote: >> On Sun, 14 May 2023, Ulrich Speidel wrote: >> >> >> I just discovered that someone is manufacturing an adapter so you no >> >> longer have to cut the cable >> >> >> >> >> https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P >> >> >> > I'll see whether I can get hold of one of these. Cutting a cable on a >> > university IT asset as an academic is not allowed here, except if it >> > doesn't meet electrical safety standards. >> > >> > Alternatively, has anyone tried the standard Starlink Ethernet adapter with >> > a PoE injector instead of the WiFi box? The adapter above seems to be like >> > the Starlink one (which also inserts into the cable between Dishy and >> > router). >> >> that connects you a 2nd ethernet port on the router, not on the dishy >> >> I just ordered one of those adapters, it will take a few weeks to arrive. > How do we know that the Amazon version doesn't do the same? because it doesn't involve the router at all. It allows you to replace the router with anything you want. People have documented how to cut the cable and crimp on a RJ45 connector, use a standard PoE injector, and connect to any router you want. I was preparing to do that (and probably still will for one cable to use a different locations to avoid having a 75 ft cable from the dish mounted on the roof of my van to the router a couple feet away), This appears to allow me to do the same functional thing, but without cutting the cable. >> >> > I suspect they're handing over whole cells, not individual users, at a >> >> > time. >> >> >> >> I would guess the same (remember, in spite of them having launched >4000 >> >> satellites, this is still the early days, with the network changing as >> >> more launching) >> >> >> >> We've seen that it seems that there is only one satellite serving any cell >> >> one time. >> >> > But the reverse is almost certainly not true: Each satellite must serve >> > multiple cells. >> >> true, but while the satellite over a given area will change, the usage in >> that area isn't changing that much > Exactly. But your underlying queue sits on the satellite, not in the area. only if the satellite is where you have more input than output. That may be the case for users uploading, but for users downloading, I would expect that the bandwidth bottleneck would be from the Internet connected ground station to the satellite, with the satellite serving many cells but only having one uplink. >> >> But remember that the system does know how much usage there is in the cell >> >> before they do the handoff. It's unknown if they do anything with that, or >> >> if they are just relaying based on geography. We also don't know what the >> >> bandwidth to the ground stations is compared to the dishy. >> >> > Well, we do know for NZ, sort of, based on the licences Starlink has here. >> >> what is the ground station bandwith? > > https://rrf.rsm.govt.nz/ui/search/licence - seach for "Starlink" > > ...all NZ licences in all their glory. 
Looking at Starlink SES (satellite > earth station) TX (which is the interesting direction I guess): > > - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 29750.000000 TX (BW = > 500 MHz) > - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 28850.000000 TX (BW = > 500 MHz) > - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 28350.000000 TX (BW = > 500 MHz) > - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 28250.000000 TX (BW = > 500 MHz) > - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 27750.000000 TX (BW = > 500 MHz) > > So 2.5 GHz up, licensed from 6 ground stations. Now I'm not convinced that > they would use all of those from all locations simultaneously because of the > risk of off-beam interference. They'll all be transmitting south, ballpark. > If there was full re-use at all ground stations, we'd be looking at 15 GHz. > If they are able to re-use on all antennas at each ground station, then we're > looking at 9 golf balls each in Puwera, Te Hana, Clevedon, Hinds and > Cromwell, and an unknown number at Awarua. Assuming 9 there, we'd be looking > at 135 GHz all up max. > > Awarua and Cromwell are 175 km apart, Hinds another 220 km from Cromwell, > then it's a hop of about 830 km to Clevedon, and from there another 100 km to > Te Hana, which is another 53 km from Puwera, so keeping them all out of each > other's hair all the time might be a bit difficult. > > Lots of other interesting info in the licenses, such as EIRP, in case you're > wanting to do link budgets. I was asking more in terms of Gb/s rather than MHz of bandwidth. Dedicated ground stations with bigger antennas, better filters, more processing and overall a much higher budget can get much better data rates out of a given amount of bandwidth than the user end stations will. it's also possible (especially with bigger antennas) for one ground station location to talk to multiple different satellites at once (the aiming of the antennas can isolate the signals from each other) >> As latency changes, figuring out if it's extra distance that must be >> traveled, or buffering is hard. does the latency stay roughly the same until >> the next satellite change? or does it taper off? > Good question. You would expect step changes in physical latency between > satellites, but also gradual change related to satellite movement. Plus of > course any rubble thrown into any queue by something suddenly turning up on > that path. Don't forget that it's not just cells now, we're also talking up- > and downlink for the laser ISLs, at least in some places. how far do the satellites move in 15 min and what effect would that have on latency (I would assume that most of the time, the satellites are switched to as they are getting nearer the two stations, so most of the time, I would expect a slight reduction in latency for ~7 min and then a slight increase for ~7 min, but I would not expect that this would be a large variation >> If it stays the same, I would suspect that you are actually hitting a >> different ground station and there is a VPN backhaul to your egress point to >> the regular Internet (which doesn't support mobile IP addresses) for that >> cycle. If it tapers off, then I could buy bufferbloat that gets resolved as >> TCP backs off. > > Yes, quite sorting out which part of your latency is what is the million > dollar question here... 
> > We saw significant RTT changes here during the recent cyclone over periods of > several hours, and these came in steps (see below), with the initial change > being a downward one. Averages are over 60 pings (the time scale isn't 100% > true as we used "one ping, one second" timing) here. > > > We're still not sure whether to attribute this to load change or ground > station changes. There were a lot of power outages, especially in Auckland's > lifestyle block belt, which teems with Starlink users, but all three North > Island ground stations were also in areas affected by power outages (although > the power companies concerned don't provide the level of detail to establish > whether they were affected). It's also not clear what, if any, backup power > arrangements they have). At ~25 ms, the step changes in RTT are too large be > the result of a switch in ground stations, though, the path differences just > aren't that large. You'd also expect a ground station outage to result in > longer RTTs, not shorter ones, if you need to re-route via another ground > station. One explanation might be users getting cut off if they relied on one > particular ground station for bent pipe ops - but that would not explain this > order of magnitude effect as I'd expect that number to be small. So maybe > power outages at the user end after all. But that would then tell us that > these are load-dependent queuing delays. Moreover, since those load changes > wouldn't have involved the router at our site, we can conclude that these are > queue sojourn times in the Starlink network. I have two starlink dishes in the southern california area, I'm going to put one on the low-priority mobile plan shortly. These are primarily used for backup communication, so I would be happy to add something to them to do latency monitoring. In looking at what geo-location reports my location as, I see it wander up and down the west coast, from the Los Angeles area all the way up to Canada. >> I think that active queue management on the sending side of the bottleneck >> will handle it fairly well. It doesn't have to do calculations based on what >> the bandwidth is, it just needs to know what it has pending to go out. > Understood - but your customer for AQM is the sending TCP client, and there > are two questions here: (a) Does your AQM handle rapid load changes and (b) > how do your TCP clients actually respond to your AQM's handling? AQM allocates the available bandwidth between different connections (usually different users) When it does this indirectly for inbound traffic by delaying acks, the results depend on the senders handling of these indirect signals that were never intended for this purpose. But when it does this directly on the sending side, it doesn't matter what the senders want, their data WILL be managed to the priority/bandwidth that the AQM sets, and eventually their feedback is dropped packets, which everyone who is legitimate responds to. But even if they don't respond (say a ping flood or DoS attack), the AQM will limit the damage to that connection, allowing the other connections trying to use that link to continue to function. David Lang ^ permalink raw reply [flat|nested] 34+ messages in thread
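A small logger along the following lines would make the geolocation wandering David describes visible over time. This is only a sketch: the public api.ipify.org and ipinfo.io endpoints (with their JSON fields and rate limits) are assumptions, and a geoIP result is, as noted later in the thread, only a heuristic for where you appear to egress:

    #!/usr/bin/env python3
    # Periodically log the apparent egress IP and its geoIP location.
    import json
    import time
    import urllib.request

    INTERVAL = 300   # seconds between samples (arbitrary choice)

    def fetch_json(url):
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.loads(resp.read().decode())

    while True:
        try:
            ip = fetch_json("https://api.ipify.org?format=json")["ip"]
            info = fetch_json(f"https://ipinfo.io/{ip}/json")
            print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} ip={ip} "
                  f"city={info.get('city')} region={info.get('region')} "
                  f"org={info.get('org')}")
        except Exception as exc:
            print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} lookup failed: {exc}")
        time.sleep(INTERVAL)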
* Re: [Starlink] Starlink hidden buffers 2023-05-15 3:33 ` David Lang @ 2023-05-15 6:36 ` Sebastian Moeller 2023-05-15 11:07 ` David Lang 2023-05-24 12:55 ` Ulrich Speidel 1 sibling, 1 reply; 34+ messages in thread From: Sebastian Moeller @ 2023-05-15 6:36 UTC (permalink / raw) To: David Lang; +Cc: Ulrich Speidel, starlink Hi David, please see [SM] below. > On May 15, 2023, at 05:33, David Lang via Starlink <starlink@lists.bufferbloat.net> wrote: > > On Mon, 15 May 2023, Ulrich Speidel wrote: > >> On 14/05/2023 9:00 pm, David Lang wrote: >>> On Sun, 14 May 2023, Ulrich Speidel wrote: >>> >> I just discovered that someone is manufacturing an adapter so you no >> longer have to cut the cable >>> >> >>> >> https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P > >>> >> >>> > I'll see whether I can get hold of one of these. Cutting a cable on a > university IT asset as an academic is not allowed here, except if it > doesn't meet electrical safety standards. >>> > >>> > Alternatively, has anyone tried the standard Starlink Ethernet adapter with > a PoE injector instead of the WiFi box? The adapter above seems to be like > the Starlink one (which also inserts into the cable between Dishy and > router). >>> that connects you a 2nd ethernet port on the router, not on the dishy >>> I just ordered one of those adapters, it will take a few weeks to arrive. >> How do we know that the Amazon version doesn't do the same? > > because it doesn't involve the router at all. It allows you to replace the router with anything you want. > > People have documented how to cut the cable and crimp on a RJ45 connector, use a standard PoE injector, and connect to any router you want. I was preparing to do that (and probably still will for one cable to use a different locations to avoid having a 75 ft cable from the dish mounted on the roof of my van to the router a couple feet away), This appears to allow me to do the same functional thing, but without cutting the cable. > >>> >> > I suspect they're handing over whole cells, not individual users, at a >> > time. >>> >> >>> >> I would guess the same (remember, in spite of them having launched >4000 >> satellites, this is still the early days, with the network changing as >> more launching) >>> >> >>> >> We've seen that it seems that there is only one satellite serving any cell >> one time. >>> > But the reverse is almost certainly not true: Each satellite must serve > multiple cells. >>> true, but while the satellite over a given area will change, the usage in that area isn't changing that much > >> Exactly. But your underlying queue sits on the satellite, not in the area. > > only if the satellite is where you have more input than output. That may be the case for users uploading, but for users downloading, I would expect that the bandwidth bottleneck would be from the Internet connected ground station to the satellite, with the satellite serving many cells but only having one uplink. > >>> >> But remember that the system does know how much usage there is in the cell >> before they do the handoff. It's unknown if they do anything with that, or >> if they are just relaying based on geography. We also don't know what the >> bandwidth to the ground stations is compared to the dishy. >>> > Well, we do know for NZ, sort of, based on the licences Starlink has here. >>> what is the ground station bandwith? >> >> https://rrf.rsm.govt.nz/ui/search/licence - seach for "Starlink" >> >> ...all NZ licences in all their glory. 
Looking at Starlink SES (satellite earth station) TX (which is the interesting direction I guess): >> >> - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 29750.000000 TX (BW = 500 MHz) >> - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 28850.000000 TX (BW = 500 MHz) >> - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 28350.000000 TX (BW = 500 MHz) >> - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 28250.000000 TX (BW = 500 MHz) >> - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 27750.000000 TX (BW = 500 MHz) >> >> So 2.5 GHz up, licensed from 6 ground stations. Now I'm not convinced that they would use all of those from all locations simultaneously because of the risk of off-beam interference. They'll all be transmitting south, ballpark. If there was full re-use at all ground stations, we'd be looking at 15 GHz. If they are able to re-use on all antennas at each ground station, then we're looking at 9 golf balls each in Puwera, Te Hana, Clevedon, Hinds and Cromwell, and an unknown number at Awarua. Assuming 9 there, we'd be looking at 135 GHz all up max. >> >> Awarua and Cromwell are 175 km apart, Hinds another 220 km from Cromwell, then it's a hop of about 830 km to Clevedon, and from there another 100 km to Te Hana, which is another 53 km from Puwera, so keeping them all out of each other's hair all the time might be a bit difficult. >> >> Lots of other interesting info in the licenses, such as EIRP, in case you're wanting to do link budgets. > > I was asking more in terms of Gb/s rather than MHz of bandwidth. Dedicated ground stations with bigger antennas, better filters, more processing and overall a much higher budget can get much better data rates out of a given amount of bandwidth than the user end stations will. > > it's also possible (especially with bigger antennas) for one ground station location to talk to multiple different satellites at once (the aiming of the antennas can isolate the signals from each other) > >>> As latency changes, figuring out if it's extra distance that must be traveled, or buffering is hard. does the latency stay roughly the same until the next satellite change? or does it taper off? > >> Good question. You would expect step changes in physical latency between satellites, but also gradual change related to satellite movement. Plus of course any rubble thrown into any queue by something suddenly turning up on that path. Don't forget that it's not just cells now, we're also talking up- and downlink for the laser ISLs, at least in some places. > > how far do the satellites move in 15 min and what effect would that have on latency (I would assume that most of the time, the satellites are switched to as they are getting nearer the two stations, so most of the time, I would expect a slight reduction in latency for ~7 min and then a slight increase for ~7 min, but I would not expect that this would be a large variation > >>> If it stays the same, I would suspect that you are actually hitting a different ground station and there is a VPN backhaul to your egress point to the regular Internet (which doesn't support mobile IP addresses) for that cycle. If it tapers off, then I could buy bufferbloat that gets resolved as TCP backs off. >> >> Yes, quite sorting out which part of your latency is what is the million dollar question here... >> >> We saw significant RTT changes here during the recent cyclone over periods of several hours, and these came in steps (see below), with the initial change being a downward one. 
Averages are over 60 pings (the time scale isn't 100% true as we used "one ping, one second" timing) here. >> >> >> We're still not sure whether to attribute this to load change or ground station changes. There were a lot of power outages, especially in Auckland's lifestyle block belt, which teems with Starlink users, but all three North Island ground stations were also in areas affected by power outages (although the power companies concerned don't provide the level of detail to establish whether they were affected). It's also not clear what, if any, backup power arrangements they have). At ~25 ms, the step changes in RTT are too large be the result of a switch in ground stations, though, the path differences just aren't that large. You'd also expect a ground station outage to result in longer RTTs, not shorter ones, if you need to re-route via another ground station. One explanation might be users getting cut off if they relied on one particular ground station for bent pipe ops - but that would not explain this order of magnitude effect as I'd expect that number to be small. So maybe power outages at the user end after all. But that would then tell us that these are load-dependent queuing delays. Moreover, since those load changes wouldn't have involved the router at our site, we can conclude that these are queue sojourn times in the Starlink network. > > I have two starlink dishes in the southern california area, I'm going to put one on the low-priority mobile plan shortly. These are primarily used for backup communication, so I would be happy to add something to them to do latency monitoring. [SM] I would consider using irtt for that (like running it for 15 minutes with, say, 5 ms spacing a few times a day, sometimes together with a saturating load, sometimes without), this is a case where OWDs are especially interesting and irtt will also report the direction in which packets were lost. Maybe Dave (once back from his time-off) has an idea about which irtt reflector to use? > In looking at what geo-location reports my location as, I see it wander up and down the west coast, from the Los Angeles area all the way up to Canada. [SM] Demonstrating once more that geoIP is just a heuristic ;) > >>> I think that active queue management on the sending side of the bottleneck will handle it fairly well. It doesn't have to do calculations based on what the bandwidth is, it just needs to know what it has pending to go out. > >> Understood - but your customer for AQM is the sending TCP client, and there are two questions here: (a) Does your AQM handle rapid load changes and (b) how do your TCP clients actually respond to your AQM's handling? > > AQM allocates the available bandwidth between different connections (usually different users) [SM] Not sure AQM is actually defined that stringently, I was under the impression anything other than FIFO with tail drop would already count as AQM, no? > When it does this indirectly for inbound traffic by delaying acks, the results depend on the senders handling of these indirect signals that were never intended for this purpose. [SM] Hmm, ACKs were intended to be a feedback mechanism for the sender to use to assess the in-flight data, so I am not sure whether delaying ACKs is something that was never envisaged by TCP's creators?
> > But when it does this directly on the sending side, it doesn't matter what the senders want, their data WILL be managed to the priority/bandwidth that the AQM sets, and eventually their feedback is dropped packets, which everyone who is legitimate responds to. [SM] Some more quickly than others though, looking at you BBR ;) > But even if they don't respond (say a ping flood or DoS attack), the AQM will limit the damage to that connection, allowing the other connections trying to use that link to continue to function. [SM] Would that not require an AQM with effectively a multi-queue scheduler? I think it seems clear that starlink uses some form of AQM (potentially ARED), but on the scheduler/queue side there seem to be competing claims ranging from single-queue FIFO (with ARED) to FQ-scheduler. Not having a starlink-link, I cannot test any of this, so all I can report is competing reports from starlink users... Regards Sebastian > > David Lang > _______________________________________________ > Starlink mailing list > Starlink@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/starlink ^ permalink raw reply [flat|nested] 34+ messages in thread
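A sketch of the irtt run suggested above (15 minutes at 5 ms spacing, with per-packet results written to a JSON file so one-way delays and per-direction loss can be pulled out afterwards) could be as simple as the following; the client flags (-i, -d, -o) are believed to match current irtt usage but should be checked against `irtt client -h`, and the reflector address is a placeholder that needs a cooperating irtt server:

    #!/usr/bin/env python3
    # Drive one irtt measurement run as suggested above.
    import subprocess
    import time

    REFLECTOR = "irtt.example.net:2112"   # placeholder, not a real public reflector
    outfile = time.strftime("irtt_%Y%m%d_%H%M%S.json")

    # 15 minutes, 5 ms send interval, JSON results to outfile (flag names assumed).
    subprocess.run(["irtt", "client", "-i", "5ms", "-d", "15m", "-o", outfile,
                    REFLECTOR], check=True)
    print("results written to", outfile)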
* Re: [Starlink] Starlink hidden buffers 2023-05-15 6:36 ` Sebastian Moeller @ 2023-05-15 11:07 ` David Lang 0 siblings, 0 replies; 34+ messages in thread From: David Lang @ 2023-05-15 11:07 UTC (permalink / raw) To: Sebastian Moeller; +Cc: David Lang, Ulrich Speidel, starlink On Mon, 15 May 2023, Sebastian Moeller wrote: >> I have two starlink dishes in the southern california area, I'm going to put one on the low-priority mobile plan shortly. These are primarily used for backup communication, so I would be happy to add something to them to do latency monitoring. > > > [SM] I would consider using irtt for that (like running in for 15 minutes with say 5ms spacing a few times a day, sometimes together with a saturating load sometimes without), this is a case where OWDs are especially interesting and irtt will also report the direction in which packets were lost. Maybe Dave (once back from his time-off) has an idea about which irtt reflector to use? > > >> In looking at what geo-location reports my location as, I see it wander up and down the west coast, from the Los Angeles area all the way up to Canada. > > [SM] Demonstrating once more that geoIP is just a heuristic ;) and/or demonstrating that starlink is connecting me to different ground stations at different times. >>>> I think that active queue management on the sending side of the bottleneck will handle it fairly well. It doesn't have to do calculations based on what the bandwidth is, it just needs to know what it has pending to go out. >> >>> Understood - but your customer for AQM is the sending TCP client, and there are two questions here: (a) Does your AQM handle rapid load changes and (b) how do your TCP clients actually respond to your AQM's handling? >> >> AQM allocates the available bandwidth between different connections (usually different users) > > [SM] Not sure AQM is actually defined that stringently, I was under the impression anything other that FIFO with tail drop would already count as AQM, no? Technically true, but I think that doing anything other than FIFO with tail drop is allocating the bandwidth. I think it makes for a nice shorthand explanation without getting into mechanisms. >> When it does this indirectly for inbound traffic by delaying acks, the results depend on the senders handling of these indirect signals that were never intended for this purpose. > > [SM] Hmm, ACKs where intended to be a feed-back mechanism for the sender to use to asses the in-flight data, so I am not sure whether delaying ACKs is something that was never envisaged by TCP's creators? It was not; it seems to work in practice, but imperfectly. >> But when it does this directly on the sending side, it doesn't matter what the senders want, their data WILL be managed to the priority/bandwidth that the AQM sets, and eventually their feedback is dropped packets, which everyone who is legitimate responds to. > > [SM] Some more quickly than others though, looking at you BBR ;) > > >> But even if they don't respond (say a ping flood or DoS attack), the AQM will limit the damage to that connection, allowing the other connections trying to use that link to continue to function. > > [SM] Would that not require an AQM with effectively a multi-queue scheduler? I think it seems clear that starlink uses some form of AQM (potentially ARED), but on the scheduler/queue side there see to be competing claims ranging from single-queue FIFO (with ARED) to FQ-scheduler.
> Not having a starlink-link I can not test any of this so all I can report is competing reports from starlink users... no, it just requires an AQM that drops packets from a flow. It doesn't matter if it does this with multiple queues, or by just dropping packets from a busy connection when the queue is close to full while allowing other connections to get added to the queue. And I didn't mean to imply that all AQMs achieve the goal of isolating the problem traffic, just that they all attempt to do so. David Lang ^ permalink raw reply [flat|nested] 34+ messages in thread
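As a toy illustration of that last mechanism (and explicitly not a claim about what Starlink runs), here is a sketch of a single shared queue that, once it is close to full, drops arrivals from the flow currently occupying the most space while still admitting packets from other flows:

    from collections import Counter, deque

    class FlowAwareQueue:
        """Single FIFO that sheds load from the heaviest flow when nearly full."""
        def __init__(self, limit=20, high_water=0.9):
            self.q = deque()
            self.limit = limit
            self.high_water = high_water
            self.per_flow = Counter()        # packets currently queued per flow

        def enqueue(self, flow):
            nearly_full = len(self.q) >= self.high_water * self.limit
            hog = self.per_flow.most_common(1)[0][0] if self.per_flow else None
            if len(self.q) >= self.limit or (nearly_full and flow == hog):
                return False                 # drop: full, or hog flow while congested
            self.q.append(flow)
            self.per_flow[flow] += 1
            return True

        def dequeue(self):
            if not self.q:
                return None
            flow = self.q.popleft()
            self.per_flow[flow] -= 1
            return flow

    # A flood from flow "A" competing with light traffic from flow "B":
    q = FlowAwareQueue()
    accepted = Counter()
    for _ in range(200):                     # 200 ticks of a 2-packet-per-tick bottleneck
        for flow, count in (("A", 5), ("B", 1)):
            for _ in range(count):
                if q.enqueue(flow):
                    accepted[flow] += 1
        q.dequeue(); q.dequeue()
    print(accepted)                          # B's packets keep getting in; A absorbs the drops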
* Re: [Starlink] Starlink hidden buffers 2023-05-15 3:33 ` David Lang 2023-05-15 6:36 ` Sebastian Moeller @ 2023-05-24 12:55 ` Ulrich Speidel 2023-05-24 13:44 ` Dave Taht ` (2 more replies) 1 sibling, 3 replies; 34+ messages in thread From: Ulrich Speidel @ 2023-05-24 12:55 UTC (permalink / raw) To: David Lang; +Cc: starlink [-- Attachment #1.1: Type: text/plain, Size: 17458 bytes --] On 15/05/2023 3:33 pm, David Lang wrote: > On Mon, 15 May 2023, Ulrich Speidel wrote: > > > On 14/05/2023 9:00 pm, David Lang wrote: > >> On Sun, 14 May 2023, Ulrich Speidel wrote: > >> > >> >> I just discovered that someone is manufacturing an adapter so > you no > >> >> longer have to cut the cable > >> >> > >> >> > >> > https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P > <https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P> > > >> >> > >> > I'll see whether I can get hold of one of these. Cutting a cable > on a > >> > university IT asset as an academic is not allowed here, except if it > >> > doesn't meet electrical safety standards. OK, we have one on order, along with PoE injector and power supply. Don't hold your breath, though, I'll be out of the country when it arrives and it'll be late July before I get to play with it. > >> > > >> > Alternatively, has anyone tried the standard Starlink Ethernet > adapter with > >> > a PoE injector instead of the WiFi box? The adapter above seems > to be like > >> > the Starlink one (which also inserts into the cable between Dishy > and > >> > router). > >> > >> that connects you a 2nd ethernet port on the router, not on the dishy > >> > >> I just ordered one of those adapters, it will take a few weeks to > arrive. > > How do we know that the Amazon version doesn't do the same? > > because it doesn't involve the router at all. It allows you to replace > the > router with anything you want. > > People have documented how to cut the cable and crimp on a RJ45 > connector, use a > standard PoE injector, and connect to any router you want. I was > preparing to do > that (and probably still will for one cable to use a different > locations to > avoid having a 75 ft cable from the dish mounted on the roof of my van > to the > router a couple feet away), This appears to allow me to do the same > functional > thing, but without cutting the cable. Let's see whether they actually work any different ;-) They're sure in the same position in the cable. > > >> >> > I suspect they're handing over whole cells, not individual > users, at a > >> >> > time. > >> >> > >> >> I would guess the same (remember, in spite of them having > launched >4000 > >> >> satellites, this is still the early days, with the network > changing as > >> >> more launching) > >> >> > >> >> We've seen that it seems that there is only one satellite > serving any cell > >> >> one time. > >> > >> > But the reverse is almost certainly not true: Each satellite must > serve > >> > multiple cells. > >> > >> true, but while the satellite over a given area will change, the > usage in > >> that area isn't changing that much > > > Exactly. But your underlying queue sits on the satellite, not in the > area. > > only if the satellite is where you have more input than output. That > may be the > case for users uploading, but for users downloading, I would expect > that the > bandwidth bottleneck would be from the Internet connected ground > station to the > satellite, with the satellite serving many cells but only having one > uplink. 
Leaving lasers aside for the moment. I'd expect there to be one queue for each satellite uplink at the gateway ground station, and that the occupancy of that queue depends on how much demand the users on that satellite currently produce. So as a remote terminal switches satellites, even if the ground station remains the same, it sees different queuing delays for its inbound traffic at the ground station. For the uplink from the user terminal, we can't have multiple users accessing the same uplink channel (however you define "channel" - frequency, time slot, spreading code, beam, polarisation, any combination thereof, ...) simultaneously as they are not able to coordinate and you wouldn't want random access for your main data link channel because of the hidden node collisions this would produce (a random access channel paired with an access grant channel is a different story). So you'd get slot assignments from the satellite obviously, and the queue for one of these sits at the user terminal. But what isn't clear to me is whether the satellites are truly only handled by a single ground station at a time, or perhaps by multiple ground stations. If it's the latter, then you might end up with a situation where you have more traffic arriving at the satellite than it can dispatch to its ground station(s), and then you'd need a queue in the uplink direction also. Similarly, if the combined uplinks from the ground station are able to deliver more data than the satellite can downlink to its users through its current slot assignments, we need a queuing system on the satellite in that direction, too. Add lasers in, and it seems like having some sort of buffer on the satellites is a must. > > >> >> But remember that the system does know how much usage there is > in the cell > >> >> before they do the handoff. It's unknown if they do anything > with that, or > >> >> if they are just relaying based on geography. We also don't know > what the > >> >> bandwidth to the ground stations is compared to the dishy. > >> > >> > Well, we do know for NZ, sort of, based on the licences Starlink > has here. > >> > >> what is the ground station bandwith? > > > > https://rrf.rsm.govt.nz/ui/search/licence > <https://rrf.rsm.govt.nz/ui/search/licence> > - seach for "Starlink" > > > > ...all NZ licences in all their glory. Looking at Starlink SES > (satellite > > earth station) TX (which is the interesting direction I guess): > > > > - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 29750.000000 > TX (BW = > > 500 MHz) > > - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 28850.000000 > TX (BW = > > 500 MHz) > > - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 28350.000000 > TX (BW = > > 500 MHz) > > - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 28250.000000 > TX (BW = > > 500 MHz) > > - Awarua, Puwera, Hinds, Clevedon, Cromwell, Te Hana: 27750.000000 > TX (BW = > > 500 MHz) > > > > So 2.5 GHz up, licensed from 6 ground stations. Now I'm not > convinced that > > they would use all of those from all locations simultaneously > because of the > > risk of off-beam interference. They'll all be transmitting south, > ballpark. > > If there was full re-use at all ground stations, we'd be looking at > 15 GHz. > > If they are able to re-use on all antennas at each ground station, > then we're > > looking at 9 golf balls each in Puwera, Te Hana, Clevedon, Hinds and > > Cromwell, and an unknown number at Awarua. Assuming 9 there, we'd be > looking > > at 135 GHz all up max. 
> > > > Awarua and Cromwell are 175 km apart, Hinds another 220 km from > Cromwell, > > then it's a hop of about 830 km to Clevedon, and from there another > 100 km to > > Te Hana, which is another 53 km from Puwera, so keeping them all out > of each > > other's hair all the time might be a bit difficult. > > > > Lots of other interesting info in the licenses, such as EIRP, in > case you're > > wanting to do link budgets. > > I was asking more in terms of Gb/s rather than MHz of bandwidth. > Dedicated > ground stations with bigger antennas, better filters, more processing and > overall a much higher budget can get much better data rates out of a > given > amount of bandwidth than the user end stations will. > > it's also possible (especially with bigger antennas) for one ground > station > location to talk to multiple different satellites at once (the aiming > of the > antennas can isolate the signals from each other) Well, the Gb/s is what a link budget would give you if you knew the modulation scheme(s) and any FEC used. The ground station antennas are normal parabolic dishes in radomes for all we can tell, and are all the same size, so you can kind of estimate aperture and hence gain reasonably well. Path loss depends a little on distance and path quality (weather / rain fade), and we don't really know in how far their modems use adaptive rates to cope with this (my Ookla tests during Cyclone Gabrielle most certainly don't rule this out - rates went down both ways during the thick of it). I guess we know relatively little about the on-board phased array on the satellites (apart from very loose bounds), which restricts our ability to say much about gain there (and potential for spatial separation / re-use of beam frequencies). We also don't know how Starlink manages its frequency allocation across its ground stations. It's certainly noticeable here that they seem to have sets of three grouped together in a relatively compact geographical area (you could visit all NZ North Island ground stations in a day by car from Auckland, Auckland traffic notwithstanding, and at a stretch could do the same down south from Hinds to Awarua if you manage to ignore the scenery, but getting from the southernmost North Island ground station to the northernmost South Island one is basically a two day drive plus ferry trip). > > >> As latency changes, figuring out if it's extra distance that must be > >> traveled, or buffering is hard. does the latency stay roughly the > same until > >> the next satellite change? or does it taper off? > > > Good question. You would expect step changes in physical latency > between > > satellites, but also gradual change related to satellite movement. > Plus of > > course any rubble thrown into any queue by something suddenly > turning up on > > that path. Don't forget that it's not just cells now, we're also > talking up- > > and downlink for the laser ISLs, at least in some places. > > how far do the satellites move in 15 min and what effect would that > have on > latency (I would assume that most of the time, the satellites are > switched to as > they are getting nearer the two stations, so most of the time, I would > expect a > slight reduction in latency for ~7 min and then a slight increase for > ~7 min, > but I would not expect that this would be a large variation Dishy tracks most satellites for significantly less than 15 minutes, and for a relatively small part of their orbit. 
Let me explain: This is an obstruction map obtained with starlink-grpc-tools (https://github.com/sparky8512/starlink-grpc-tools). The way to read this is in polar coordinates: The centre of the image is the dishy boresight (direction of surface normal), distance from the centre is elevation measured as an angle from the surface normal, and direction from the centre is essentially the azimuth - top is north, left is west, bottom is south, and right is east. The white tracks are the satellites dishy uses, and a graph like this gets built up over time, one track at a time. Notice how short the tracks are - they don't follow the satellite for long - typically under a minute. The red bits are satellites getting obscured by the edge of our roof. I've also attached a time lapse movie of how one of these graphs builds up - if I correctly remember (the script is on another machine), one frame in the video corresponds to 5 seconds. Conclusion: latency change from tracking one satellite is smaller than the latency difference as you jump between satellites. You could be looking at several 100 km of path difference here. In an instant. Even that, at 300,000 km/s of propagation speed, is only in the order of maybe 1 ms or so - peanuts compared to the RTTs in the dozens of ms that we're seeing. But if you get thrown from one queue onto another as you get handed over - what does that do to the remote TCP stack that's serving you? > > >> If it stays the same, I would suspect that you are actually hitting a > >> different ground station and there is a VPN backhaul to your egress > point to > >> the regular Internet (which doesn't support mobile IP addresses) > for that > >> cycle. If it tapers off, then I could buy bufferbloat that gets > resolved as > >> TCP backs off. > > > > Yes, quite sorting out which part of your latency is what is the > million > > dollar question here... > > > > We saw significant RTT changes here during the recent cyclone over > periods of > > several hours, and these came in steps (see below), with the initial > change > > being a downward one. Averages are over 60 pings (the time scale > isn't 100% > > true as we used "one ping, one second" timing) here. > > > > > > We're still not sure whether to attribute this to load change or ground > > station changes. There were a lot of power outages, especially in > Auckland's > > lifestyle block belt, which teems with Starlink users, but all three > North > > Island ground stations were also in areas affected by power outages > (although > > the power companies concerned don't provide the level of detail to > establish > > whether they were affected). It's also not clear what, if any, > backup power > > arrangements they have). At ~25 ms, the step changes in RTT are too > large be > > the result of a switch in ground stations, though, the path > differences just > > aren't that large. You'd also expect a ground station outage to > result in > > longer RTTs, not shorter ones, if you need to re-route via another > ground > > station. One explanation might be users getting cut off if they > relied on one > > particular ground station for bent pipe ops - but that would not > explain this > > order of magnitude effect as I'd expect that number to be small. So > maybe > > power outages at the user end after all. But that would then tell us > that > > these are load-dependent queuing delays. 
Moreover, since those load > changes > > wouldn't have involved the router at our site, we can conclude that > these are > > queue sojourn times in the Starlink network. > > I have two starlink dishes in the southern california area, I'm going > to put > one on the low-priority mobile plan shortly. These are primarily used > for backup > communication, so I would be happy to add something to them to do latency > monitoring. In looking at what geo-location reports my location as, I > see it > wander up and down the west coast, from the Los Angeles area all the > way up to > Canada. Would be worthwhile to also do traceroutes to various places to see where you emerge from the satellite side of things. > > >> I think that active queue management on the sending side of the > bottleneck > >> will handle it fairly well. It doesn't have to do calculations > based on what > >> the bandwidth is, it just needs to know what it has pending to go out. > > > Understood - but your customer for AQM is the sending TCP client, > and there > > are two questions here: (a) Does your AQM handle rapid load changes > and (b) > > how do your TCP clients actually respond to your AQM's handling? > > AQM allocates the available bandwidth between different connections > (usually > different users) But it does this under the assumption that the vector for changes in bandwidth availability is the incoming traffic, which AQM gives (indirect) feedback to, right? > > When it does this indirectly for inbound traffic by delaying acks, the > results > depend on the senders handling of these indirect signals that were never > intended for this purpose. > > But when it does this directly on the sending side, it doesn't matter > what the > senders want, their data WILL be managed to the priority/bandwidth > that the AQM > sets, and eventually their feedback is dropped packets, which everyone > who is > legitimate responds to. Understood. You build a control loop, where the latency is the delay in the control signal. Classically, you have a physical bottleneck that the AQM manages, where the physical bandwidth doesn't change. The available bandwidth changes (mostly) as a result of TCP connections (or similarly behaved UDP applications) joining in slow start, or disappearing. Basically, your queues grow and shrink one packet at a time. Your control signal allows you (if they're well behaved) to throttle / accelerate senders. What you don't get are quantum jumps in queue occupancy, jump changes in underlying physical bandwidth, or a whole set of new senders that are completely oblivious to any of your previous control signals. But you get all that with satellite handovers like these. So what if the response you elicit in this way is to a queue scenario that no longer applies? > But even if they don't respond (say a ping flood or DoS > attack), the AQM will limit the damage to that connection, allowing > the other > connections trying to use that link to continue to function. All understood. > > David Lang -- **************************************************************** Dr.
Ulrich Speidel School of Computer Science Room 303S.594 (City Campus) The University of Auckland u.speidel@auckland.ac.nz http://www.cs.auckland.ac.nz/~ulrich/ **************************************************************** [-- Attachment #1.2.1: Type: text/html, Size: 22640 bytes --] [-- Attachment #1.2.2: obstructions_pos2_762.png --] [-- Type: image/png, Size: 1289 bytes --] [-- Attachment #2: obs_map20-2-23.mp4 --] [-- Type: video/mp4, Size: 169217 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
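To put numbers on the "several 100 km in an instant" estimate above: a short sketch, assuming a 550 km shell and a spherical Earth of 6371 km radius (both round figures, not taken from the thread), gives the slant range at different elevation angles and the corresponding one-way propagation delay, and confirms that even a worst-case jump between a near-overhead and a low-elevation satellite is worth only a millisecond or two, one way:

    import math

    R_E = 6371.0      # Earth radius in km (assumed spherical)
    H   = 550.0       # approximate Starlink shell altitude in km
    C   = 299792.458  # speed of light in km/s

    def slant_range_km(elevation_deg):
        """Terminal-to-satellite distance for a given elevation angle."""
        e = math.radians(elevation_deg)
        # Law of cosines in the Earth-centre / terminal / satellite triangle.
        return math.sqrt((R_E + H) ** 2 - (R_E * math.cos(e)) ** 2) - R_E * math.sin(e)

    for elev in (90, 60, 40, 25):
        d = slant_range_km(elev)
        print(f"elevation {elev:2d} deg: slant range {d:7.1f} km, "
              f"one-way delay {1000 * d / C:.2f} ms")

    diff = slant_range_km(25) - slant_range_km(90)
    print(f"worst-case path difference {diff:.0f} km -> about {1000 * diff / C:.2f} ms one way")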
* Re: [Starlink] Starlink hidden buffers 2023-05-24 12:55 ` Ulrich Speidel @ 2023-05-24 13:44 ` Dave Taht 2023-05-24 14:05 ` David Lang 2023-05-24 14:49 ` Michael Richardson 2023-05-24 13:59 ` David Lang 2023-05-24 15:18 ` Mark Handley 2 siblings, 2 replies; 34+ messages in thread From: Dave Taht @ 2023-05-24 13:44 UTC (permalink / raw) To: Ulrich Speidel; +Cc: David Lang, starlink [-- Attachment #1: Type: text/plain, Size: 1327 bytes --] This thread got pretty long. I just had a comment tweak me a bit: Fair queueing provides an automatic and reasonably robust means of defense against simple single threaded DOS attacks, and badly behaving software. My favorite example of this was in the early days of cerowrt, we had a dhcpv6 bug that after a counter flipped over in 51 days, it flooded the upstream with dhcpv6 requests. We did not notice this *at all* in day to day use, until looking at cpu and bandwidth usage and scratching our heads for a while (and rebooting... and waiting 51 days... and waiting for the user population and ISPs to report more instances of this bug) These are the biggest reliability reasons why I think FQ is *necessary* across the edges of the internet. pure AQM, in the case above, since that flood was uncontrollable, would have resulted in a 99.99% or so drop rate for all other traffic. While that would have been easier to diagnose I suppose, the near term outcome would have been quite damaging. Even the proposed policer modes in L4S would not have handled this bug. I always try to make a clear distinction between FQ and AQM techniques. Both are useful and needed, for different reasons (but in the general case, I think the DRR++ derived FQ in fq_codel is the cats pajamas, and far more important than any form of AQM) [-- Attachment #2: Type: text/html, Size: 1498 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
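Since the FQ-versus-AQM distinction keeps coming up, here is a minimal deficit-round-robin sketch, loosely in the spirit of the DRR++ scheduler inside fq_codel but much simplified and in no way its actual code, showing why a flooding flow like the dhcpv6 bug above cannot starve other flows of their share of the link:

    from collections import deque

    class DRRScheduler:
        """Very simplified deficit round robin over per-flow queues."""
        def __init__(self, quantum=1500):
            self.quantum = quantum
            self.queues = {}                 # flow id -> deque of packet sizes
            self.deficit = {}

        def enqueue(self, flow, size):
            self.queues.setdefault(flow, deque()).append(size)
            self.deficit.setdefault(flow, 0)

        def serve_round(self):
            """One DRR round: each backlogged flow may send up to its deficit."""
            sent = {}
            for flow, q in self.queues.items():
                if not q:
                    continue
                self.deficit[flow] += self.quantum
                while q and q[0] <= self.deficit[flow]:
                    size = q.popleft()
                    self.deficit[flow] -= size
                    sent[flow] = sent.get(flow, 0) + size
                if not q:
                    self.deficit[flow] = 0   # an emptied flow does not bank credit
            return sent

    sched = DRRScheduler()
    for _ in range(1000):                    # "flood": a huge backlog of small packets
        sched.enqueue("flood", 300)
    for _ in range(10):                      # "web": a short, well-behaved transfer
        sched.enqueue("web", 1500)

    totals = {}
    for _ in range(10):                      # ten rounds of link service
        for flow, nbytes in sched.serve_round().items():
            totals[flow] = totals.get(flow, 0) + nbytes
    print(totals)   # both flows get the same bytes per round despite the flood's backlog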
* Re: [Starlink] Starlink hidden buffers 2023-05-24 13:44 ` Dave Taht @ 2023-05-24 14:05 ` David Lang 2023-05-24 14:49 ` Michael Richardson 1 sibling, 0 replies; 34+ messages in thread From: David Lang @ 2023-05-24 14:05 UTC (permalink / raw) To: Dave Taht; +Cc: Ulrich Speidel, David Lang, starlink fair point. I am playing a bit loose with the AQM terminology and definition here David Lang On Wed, 24 May 2023, Dave Taht wrote: > This thread got pretty long. I just had a comment tweak me a bit: > > Fair queueing provides an automatic and reasonably robust means of defense > against simple single threaded DOS attacks, and badly behaving software. My > favorite example of this was in the early days of cerowrt, we had a dhcpv6 > bug that after a counter flipped over in 51 days, it flooded the upstream > with dhcpv6 requests. We did not notice this *at all* in day to day use, > until looking at cpu and bandwidth usage and scratching our heads for a > while (and rebooting... and waiting 51 days... and waiting for the user > population and ISPs to report more instances of this bug) > > These are the biggest reliability reasons why I think FQ is *necessary* > across the edges of the internet. > > pure AQM, in the case above, since that flood was uncontrollable, would > have resulted in a 99.99% or so drop rate for all other traffic. While that > would have been easier to diagnose I suppose, the near term outcome would > have been quite damaging. > > Even the proposed policer modes in L4S would not have handled this bug. > > I always try to make a clear distinction between FQ and AQM techniques. > Both are useful and needed, for different reasons (but in the general case, > I think the DRR++ derived FQ in fq_codel is the cats pajamas, and far more > important than any form of AQM) > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Starlink] Starlink hidden buffers 2023-05-24 13:44 ` Dave Taht 2023-05-24 14:05 ` David Lang @ 2023-05-24 14:49 ` Michael Richardson 2023-05-24 15:09 ` Dave Collier-Brown 2023-05-24 15:31 ` Dave Taht 1 sibling, 2 replies; 34+ messages in thread From: Michael Richardson @ 2023-05-24 14:49 UTC (permalink / raw) To: Dave Taht; +Cc: Ulrich Speidel, starlink [-- Attachment #1: Type: text/plain, Size: 1469 bytes --] Dave Taht via Starlink <starlink@lists.bufferbloat.net> wrote: > These are the biggest reliability reasons why I think FQ is *necessary* > across the edges of the internet. It saved your bacon, but yeah, like all other resilient protocols (DNS, Happy Eyeballs) tends to hide when one option is failing :-) > pure AQM, in the case above, since that flood was uncontrollable, would > have resulted in a 99.99% or so drop rate for all other traffic. While > that would have been easier to diagnose I suppose, the near term > outcome would have been quite damaging. What this says is that fq_codel doesn't have enough management reporting interfaces. Going back 25 years, this has always been a problem with home routers: ntop3 is great, but it's not easy to use, and it's not that accessible, and it often can't see things that move around. > I always try to make a clear distinction between FQ and AQM techniques. > Both are useful and needed, for different reasons (but in the general > case, I think the DRR++ derived FQ in fq_codel is the cats pajamas, and > far more important than any form of AQM) Could fq_codel emit flow statistics as a side-effect of it's classifications? -- ] Never tell me the odds! | ipv6 mesh networks [ ] Michael Richardson, Sandelman Software Works | IoT architect [ ] mcr@sandelman.ca http://www.sandelman.ca/ | ruby on rails [ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 511 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Starlink] Starlink hidden buffers 2023-05-24 14:49 ` Michael Richardson @ 2023-05-24 15:09 ` Dave Collier-Brown 0 siblings, 0 replies; 34+ messages in thread From: Dave Collier-Brown @ 2023-05-24 15:09 UTC (permalink / raw) To: starlink On 5/24/23 10:49, Michael Richardson via Starlink wrote: > It saved your bacon, but yeah, like all other resilient protocols (DNS, > Happy Eyeballs) tends to hide when one option is failing :-) > > > pure AQM, in the case above, since that flood was uncontrollable, would > > have resulted in a 99.99% or so drop rate for all other traffic. While > > that would have been easier to diagnose I suppose, the near term > > outcome would have been quite damaging. > > What this says is that fq_codel doesn't have enough management reporting > interfaces. Going back 25 years, this has always been a problem with home > routers: ntop3 is great, but it's not easy to use, and it's not that > accessible, and it often can't see things that move around. Over and above management reporting, the deleterious effect should be both visible and capable of being reasoned about by the community. A classic example from Unix is that memory exhaustion and thrashing cause a sudden increase in IO to the swap device, followed by a queue building up and the disk slowing down. The latter is something one usually has alerts on, to tell the sysadmin if a disk is misbehaving. Getting a "disk is too busy" from the swap device tends to cause the sysadmin to ask themselves what could cause that, and realize that something is swapping. That sends them off looking for the root cause. Yet another case of "belt and suspenders" (;-)) --dave -- David Collier-Brown, | Always do right. This will gratify System Programmer and Author | some people and astonish the rest dave.collier-brown@indexexchange.com | -- Mark Twain ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Starlink] Starlink hidden buffers 2023-05-24 14:49 ` Michael Richardson 2023-05-24 15:09 ` Dave Collier-Brown @ 2023-05-24 15:31 ` Dave Taht 2023-05-24 18:30 ` Michael Richardson 1 sibling, 1 reply; 34+ messages in thread From: Dave Taht @ 2023-05-24 15:31 UTC (permalink / raw) To: Michael Richardson; +Cc: Ulrich Speidel, starlink On Wed, May 24, 2023 at 8:49 AM Michael Richardson <mcr@sandelman.ca> wrote: > > > Dave Taht via Starlink <starlink@lists.bufferbloat.net> wrote: > > These are the biggest reliability reasons why I think FQ is *necessary* > > across the edges of the internet. > > It saved your bacon, but yeah, like all other resilient protocols (DNS, > Happy Eyeballs) tends to hide when one option is failing :-) > > > pure AQM, in the case above, since that flood was uncontrollable, would > > have resulted in a 99.99% or so drop rate for all other traffic. While > > that would have been easier to diagnose I suppose, the near term > > outcome would have been quite damaging. > > What this says is that fq_codel doesn't have enough management reporting > interfaces. Going back 25 years, this has always been a problem with home > routers: ntop3 is great, but it's not easy to use, and it's not that > accessible, and it often can't see things that move around. > > > I always try to make a clear distinction between FQ and AQM techniques. > > Both are useful and needed, for different reasons (but in the general > > case, I think the DRR++ derived FQ in fq_codel is the cats pajamas, and > > far more important than any form of AQM) > > Could fq_codel emit flow statistics as a side-effect of it's classifications? It does. It always has. "tc -s class show" gives details of each queue. it is a 5 tuple hash by default. This can of course be overridden via filters to use another classification method.There are a few on-router tools that do process this and provide a nice dashboard. This could be better, in trying to identify problematic flows, but would require more in kernel (ebpf?) processing than we have yet attempted on a home router. AI is also on our minds. Most of my focus for the past year has been in getting cake to scale as an ISP middlebox, in Libreqos. For example in LibreQos we are presently very successful in sampling cake queue data at my preferred sample rate (10ms), in production, for up to about 1k subscribers. However, in production, with 10k subs (11gbit), sampling at 1s rates is where we are now. (that is 40 million queues sampled once per second). I am sure we can improve the sample rates at high levels of subs further... compress reporting, etc, but until now, most ISPs only had 5 minute averages to look at. There are some really cool things you can do at high sample rates. Here is a live/realtime movie of what netflix actually looks like: https://www.youtube.com/watch?v=C-2oSBr2200 (also) Another thing is that real traffic, displayed as we do it now, is kind of mesmerizing, and looks very different from what we generate via flent, on the testbed. Anyway, on the libreqos front now we have over 30 ISPs and 98 folk participating in the chat room, please feel free to hang out with us: https://app.element.io/#/room/#libreqos:matrix.org - ask questions, propose tests and plots.... I return yáll now to starlink (which could really use this stuff!) > -- > ] Never tell me the odds! 
| ipv6 mesh networks [ > ] Michael Richardson, Sandelman Software Works | IoT architect [ > ] mcr@sandelman.ca http://www.sandelman.ca/ | ruby on rails [ > -- Podcast: https://www.linkedin.com/feed/update/urn:li:activity:7058793910227111937/ Dave Täht CSO, LibreQos ^ permalink raw reply [flat|nested] 34+ messages in thread
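For anyone who wants to watch this on their own router, a hedged sketch of sampling the counters that `tc -s qdisc` exposes for an fq_codel (or cake) instance might look like the following; the interface name is a placeholder and the regular expressions only cover the fields I would expect in typical Linux tc output (dropped, backlog, new_flows_len, old_flows_len), so treat the parsing as an assumption to adapt to your own output rather than a definitive tool:

    #!/usr/bin/env python3
    # Sample `tc -s qdisc show dev <iface>` once a second and log a few counters.
    import re
    import subprocess
    import time

    IFACE = "eth0"   # placeholder interface name

    def grab(pattern, text):
        m = re.search(pattern, text)
        return int(m.group(1)) if m else None

    def sample(iface):
        out = subprocess.run(["tc", "-s", "qdisc", "show", "dev", iface],
                             capture_output=True, text=True).stdout
        return {
            "dropped": grab(r"dropped (\d+)", out),
            "backlog_bytes": grab(r"backlog (\d+)b", out),
            "new_flows_len": grab(r"new_flows_len (\d+)", out),
            "old_flows_len": grab(r"old_flows_len (\d+)", out),
        }

    while True:
        print(time.strftime("%H:%M:%S"), sample(IFACE))
        time.sleep(1)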
* Re: [Starlink] Starlink hidden buffers 2023-05-24 15:31 ` Dave Taht @ 2023-05-24 18:30 ` Michael Richardson 2023-05-24 18:45 ` Sebastian Moeller 0 siblings, 1 reply; 34+ messages in thread From: Michael Richardson @ 2023-05-24 18:30 UTC (permalink / raw) To: Dave Taht; +Cc: Ulrich Speidel, starlink [-- Attachment #1: Type: text/plain, Size: 591 bytes --] Dave Taht <dave.taht@gmail.com> wrote: >> Could fq_codel emit flow statistics as a side-effect of it's >> classifications? > It does. It always has. "tc -s class show" gives details of each queue. Good. Just a question of hooking up luci to that. > There are some really cool things you can do at high sample rates. > Here is a live/realtime movie of what netflix actually looks like: > https://www.youtube.com/watch?v=C-2oSBr2200 (also) Another thing is A movie about people watching movies :-) I wonder about the privacy implications of doing this at an ISP. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 658 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Starlink] Starlink hidden buffers 2023-05-24 18:30 ` Michael Richardson @ 2023-05-24 18:45 ` Sebastian Moeller 0 siblings, 0 replies; 34+ messages in thread From: Sebastian Moeller @ 2023-05-24 18:45 UTC (permalink / raw) To: Michael Richardson; +Cc: Dave Täht, starlink Hi Michael, > On May 24, 2023, at 20:30, Michael Richardson via Starlink <starlink@lists.bufferbloat.net> wrote: > > > Dave Taht <dave.taht@gmail.com> wrote: >>> Could fq_codel emit flow statistics as a side-effect of it's >>> classifications? > >> It does. It always has. "tc -s class show" gives details of each queue. > > Good. > Just a question of hooking up luci to that. Keep in mind that by default fq_codel uses 1024 hash bins, so worst case your GUI needs to display quite a lot of units; on the other hand, many flows are really short, and the "tc -s class show" output has little "hysteresis": once a hash bin runs empty, fq_codel will not report data for that bin. (Which is fine, as empty buckets tend not to be very informative, but when using this output to feed a display tool your sampling/query rate might be too coarse and miss ephemeral buckets). > >> There are some really cool things you can do at high sample rates. >> Here is a live/realtime movie of what netflix actually looks like: >> https://www.youtube.com/watch?v=C-2oSBr2200 (also) Another thing is > > A movie about people watching movies :-) And a "silent movie" at that, very meta and very old school ;) > I wonder about the privacy implications of doing this at an ISP. I think keeping the full 5-tuple input to the hash function might be problematic, but "reducing" that to the 10-bit bucket ID should be lossy enough, no? Regards Sebastian P.S.: for cake, luci statistics has grown a module that plots the outputs of `tc -s qdisc` for cake instances over time. Sure that is not the per-flow or per hash-bin resolution, but for quickly seeing whether there is on-going high level marking/dropping it should suffice... > > _______________________________________________ > Starlink mailing list > Starlink@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/starlink ^ permalink raw reply [flat|nested] 34+ messages in thread
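To illustrate how lossy the 10-bit bucket ID is compared with the full 5-tuple, the sketch below maps a flow's 5-tuple onto one of fq_codel's 1024 default bins; the kernel derives the bin from its own salted flow hash rather than the SHA-256 used here, so this is purely illustrative of the reduction, not of the actual hash:

    import hashlib

    NUM_BINS = 1024   # fq_codel's default "flows" parameter

    def bucket_id(src, sport, dst, dport, proto, salt=b"per-boot-salt"):
        """Map a 5-tuple to one of NUM_BINS bins (illustrative, not the kernel hash)."""
        key = f"{src}:{sport}-{dst}:{dport}-{proto}".encode() + salt
        digest = hashlib.sha256(key).digest()
        return int.from_bytes(digest[:4], "big") % NUM_BINS

    flows = [
        ("192.0.2.10", 40001, "198.51.100.5", 443, "tcp"),
        ("192.0.2.10", 40002, "198.51.100.5", 443, "tcp"),
        ("192.0.2.11", 53124, "203.0.113.9", 123, "udp"),
    ]
    for f in flows:
        print(f, "-> bin", bucket_id(*f))
    # Many distinct 5-tuples share each bin, and the bin index on its own
    # cannot be inverted back to addresses or ports.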
* Re: [Starlink] Starlink hidden buffers 2023-05-24 12:55 ` Ulrich Speidel 2023-05-24 13:44 ` Dave Taht @ 2023-05-24 13:59 ` David Lang 2023-05-24 22:39 ` Ulrich Speidel 2023-05-24 15:18 ` Mark Handley 2 siblings, 1 reply; 34+ messages in thread From: David Lang @ 2023-05-24 13:59 UTC (permalink / raw) To: Ulrich Speidel; +Cc: David Lang, starlink [-- Attachment #1: Type: text/plain, Size: 12220 bytes --] On Thu, 25 May 2023, Ulrich Speidel wrote: > On 15/05/2023 3:33 pm, David Lang wrote: >> On Mon, 15 May 2023, Ulrich Speidel wrote: >> >> > On 14/05/2023 9:00 pm, David Lang wrote: >> >> On Sun, 14 May 2023, Ulrich Speidel wrote: >> >> >> >> >> I just discovered that someone is manufacturing an adapter so you no >> >> >> longer have to cut the cable >> >> >> >> >> >> >> >> >> https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P >> <https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P> >> >> >> >> >> >> > I'll see whether I can get hold of one of these. Cutting a cable on a >> >> > university IT asset as an academic is not allowed here, except if it >> >> > doesn't meet electrical safety standards. > OK, we have one on order, along with PoE injector and power supply. Don't > hold your breath, though, I'll be out of the country when it arrives and > it'll be late July before I get to play with it. I've got a couple on order, but they won't arrive for 1-3 more weeks :-( >> >> >> But remember that the system does know how much usage there is in the >> >> >> cell before they do the handoff. It's unknown if they do anything with >> >> >> that, or if they are just relaying based on geography. We also don't >> >> >> know what the bandwidth to the ground stations is compared to the >> >> >> dishy. >> >> >> >> > Well, we do know for NZ, sort of, based on the licences Starlink has >> >> > here. >> >> >> >> what is the ground station bandwith? >> I was asking more in terms of Gb/s rather than MHz of bandwidth. Dedicated >> ground stations with bigger antennas, better filters, more processing and >> overall a much higher budget can get much better data rates out of a given >> amount of bandwidth than the user end stations will. >> >> it's also possible (especially with bigger antennas) for one ground station >> location to talk to multiple different satellites at once (the aiming of the >> antennas can isolate the signals from each other) > > Well, the Gb/s is what a link budget would give you if you knew the > modulation scheme(s) and any FEC used. The ground station antennas are normal > parabolic dishes in radomes for all we can tell, and are all the same size, > so you can kind of estimate aperture and hence gain reasonably well. Path > loss depends a little on distance and path quality (weather / rain fade), and > we don't really know in how far their modems use adaptive rates to cope with > this (my Ookla tests during Cyclone Gabrielle most certainly don't rule this > out - rates went down both ways during the thick of it). I guess we know > relatively little about the on-board phased array on the satellites (apart > from very loose bounds), which restricts our ability to say much about gain > there (and potential for spatial separation / re-use of beam frequencies). We > also don't know how Starlink manages its frequency allocation across its > ground stations. 
I'll also note that in the last launch of the v2 mini satellites, they mentioned that those now supported E band backhaul to handle 4x the bandwidth of the earlier satellites. > It's certainly noticeable here that they seem to have sets of three grouped > together in a relatively compact geographical area (you could visit all NZ > North Island ground stations in a day by car from Auckland, Auckland traffic > notwithstanding, and at a stretch could do the same down south from Hinds to > Awarua if you manage to ignore the scenery, but getting from the southernmost > North Island ground station to the northernmost South Island one is basically > a two-day drive plus ferry trip). I lived in Wanganui for a few years, including one RV trip down the South Island. I know what you mean about needing to ignore the scenery :-) >> >> >> As latency changes, figuring out whether it's extra distance that must be >> >> traveled or buffering is hard. Does the latency stay roughly the same >> >> until the next satellite change, or does it taper off? >> >> > Good question. You would expect step changes in physical latency between >> > satellites, but also gradual change related to satellite movement. Plus of >> > course any rubble thrown into any queue by something suddenly turning up on >> > that path. Don't forget that it's not just cells now, we're also talking >> > up- and downlink for the laser ISLs, at least in some places. >> >> how far do the satellites move in 15 min and what effect would that have on >> latency (I would assume that most of the time, the satellites are switched to >> as they are getting nearer the two stations, so most of the time, I would >> expect a slight reduction in latency for ~7 min and then a slight increase >> for ~7 min, but I would not expect that this would be a large variation). > > Dishy tracks most satellites for significantly less than 15 minutes, and for > a relatively small part of their orbit. Let me explain: Ok, I thought I had heard they switched every 15 min, so it's every 5 min instead? > Conclusion: latency change from tracking one satellite is smaller than the > latency difference as you jump between satellites. You could be looking at > several 100 km of path difference here. In an instant. Even that, at 300,000 > km/s of propagation speed, is only on the order of maybe 1 ms or so - peanuts > compared to the RTTs in the dozens of ms that we're seeing. But if you get > thrown from one queue onto another as you get handed over - what does that do > to the remote TCP stack that's serving you? yes, the point I was trying to make was that the latency change from satellite movement is not very significant >> >> If it stays the same, I would suspect that you are actually hitting a >> >> different ground station and there is a VPN backhaul to your egress point >> >> to the regular Internet (which doesn't support mobile IP addresses) for >> >> that cycle. If it tapers off, then I could buy bufferbloat that gets >> >> resolved as TCP backs off. >> > >> > Yes, quite: sorting out which part of your latency is what is the million-dollar >> > question here... >> > >> > We saw significant RTT changes here during the recent cyclone over periods >> > of several hours, and these came in steps (see below), with the initial >> > change being a downward one. Averages are over 60 pings (the time scale >> > isn't 100% true as we used "one ping, one second" timing) here.
>> > >> > >> > We're still not sure whether to attribute this to load change or ground >> > station changes. There were a lot of power outages, especially in >> > Auckland's lifestyle block belt, which teems with Starlink users, but all >> > three North Island ground stations were also in areas affected by power >> > outages (although the power companies concerned don't provide the level of >> > detail to establish whether they were affected). It's also not clear what, >> > if any, backup power arrangements they have. At ~25 ms, the step changes >> > in RTT are too large to be the result of a switch in ground stations, though, >> > the path differences just aren't that large. You'd also expect a ground >> > station outage to result in longer RTTs, not shorter ones, if you need to >> > re-route via another ground station. One explanation might be users getting >> > cut off if they relied on one particular ground station for bent pipe ops - >> > but that would not explain this order of magnitude effect as I'd expect >> > that number to be small. So maybe power outages at the user end after all. >> > But that would then tell us that these are load-dependent queuing delays. >> > Moreover, since those load changes wouldn't have involved the router at our >> > site, we can conclude that these are queue sojourn times in the Starlink >> > network. remember that SpaceX controls the ground stations as well, so if they are doing any mobile IP trickery to redirect traffic from one ground station to another, they can anticipate the shift or move the queue for the user or other trickery like this (probably aren't yet, they seem to be in the early days here, focusing on keeping things working and improving on the space side more than anything else) >> I have two Starlink dishes in the Southern California area, I'm going to put >> one on the low-priority mobile plan shortly. These are primarily used for >> backup communication, so I would be happy to add something to them to do >> latency monitoring. In looking at what geo-location reports my location as, I >> see it wander up and down the west coast, from the Los Angeles area all the >> way up to Canada. > Would be worthwhile to also do traceroutes to various places to see where you > emerge from the satellite side of things. >> >> >> I think that active queue management on the sending side of the bottleneck >> >> will handle it fairly well. It doesn't have to do calculations based on >> >> what the bandwidth is, it just needs to know what it has pending to go >> >> out. >> >> > Understood - but your customer for AQM is the sending TCP client, and there >> > are two questions here: (a) Does your AQM handle rapid load changes, and (b) >> > how do your TCP clients actually respond to your AQM's handling? >> >> AQM allocates the available bandwidth between different connections (usually >> different users) > But it does this under the assumption that the vector for changes in bandwidth > availability is the incoming traffic, which AQM gives (indirect) feedback to, > right? no, this is what I'm getting at below >> When it does this indirectly for inbound traffic by delaying acks, the >> results depend on the sender's handling of these indirect signals that were >> never intended for this purpose.
This is what you are thinking of, where it's providing indirect feedback to an unknowable inbound queue on a remote system. >> But when it does this directly on the sending side, it doesn't matter what >> the senders want, their data WILL be managed to the priority/bandwidth that >> the AQM sets, and eventually their feedback is dropped packets, which >> everyone who is legitimate responds to. when the AQM is on the sending side of the bottleneck, it now has direct control over the queue, and potentially has information about the available bandwidth as it changes. But even if it doesn't know what the available bandwidth is, it still can dispatch the data in its queues 'fairly' (whatever that means to the particular AQM algorithm), changes in the data rate just change how fast the queue drains. > Understood. You build a control loop, where the latency is the delay in the > control signal. > > Classically, you have a physical bottleneck that the AQM manages, where the > physical bandwidth doesn't change. > > The available bandwidth changes (mostly) as a result of TCP connections (or > similarly behaved UDP applications) joining in slow start, or disappearing. > > Basically, your queues grow and shrink one packet at a time. > > Your control signal allows you (if they're well behaved) to throttle / > accelerate senders. > > What you don't get are quantum jumps in queue occupancy, jump changes in > underlying physical bandwidth, or a whole set of new senders that are > completely oblivious to any of your previous control signals. But you get all > that with satellite handovers like these. for a single TCP session, it has slow start, but if you suddenly start dozens or hundreds of TCP sessions (bittorrent, other file transfer protocols, or just a website with hundreds of sub-elements), I think it's a bigger step than you are thinking. And again, I think the same issue exists on cell sites as users move from one cell to another. > So what if the response you elicit in this way is to a queue scenario that no > longer applies? you run the risk of under-utilizing the link for a short time (which may mean that you decide to run the queues a little bigger than with fixed links, so that when a chunk of data disappears from your queue, you still will keep utilization up, sacrificing some latency to improve overall throughput) David Lang [-- Attachment #2: Type: video/mp4, Size: 169217 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
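David's argument that a sending-side scheduler needs no knowledge of the available capacity, only of what sits in its own queues, can be illustrated with a toy deficit-round-robin dispatcher. This is a sketch of the general technique, not of what fq_codel, cake or Starlink actually implement; the flows, packet sizes and quantum below are made up.

    from collections import deque

    QUANTUM = 1514  # byte credit granted per visit, roughly one Ethernet MTU

    class FairDispatcher:
        """Toy DRR: shares whatever drain rate the link offers across flows
        without ever being told what that rate is."""
        def __init__(self):
            self.pkts = {}          # flow -> deque of packet sizes (bytes)
            self.deficit = {}       # flow -> byte credit carried over
            self.active = deque()   # round-robin order of backlogged flows

        def enqueue(self, flow, size):
            if flow not in self.pkts:
                self.pkts[flow] = deque()
                self.deficit[flow] = 0
            if not self.pkts[flow]:
                self.active.append(flow)
            self.pkts[flow].append(size)

        def dequeue(self):
            # Called whenever the link can accept another packet. A slower link
            # simply calls this less often: queues drain more slowly, but the
            # relative shares between flows stay the same.
            while self.active:
                flow = self.active[0]
                self.deficit[flow] += QUANTUM
                q = self.pkts[flow]
                if q[0] <= self.deficit[flow]:
                    size = q.popleft()
                    self.deficit[flow] -= size
                    if q:
                        self.active.rotate(-1)   # give the next flow a turn
                    else:
                        self.active.popleft()    # flow went idle
                        self.deficit[flow] = 0
                    return flow, size
                self.active.rotate(-1)           # not enough credit yet, move on
            return None

    # Tiny demo: a bulk backlog and a single sparse packet
    d = FairDispatcher()
    for _ in range(5):
        d.enqueue("bulk", 1500)
    d.enqueue("sparse", 100)
    order = []
    while (nxt := d.dequeue()) is not None:
        order.append(nxt[0])
    print(order)   # the sparse packet goes out second instead of waiting behind the backlog

Nothing in the sketch refers to the link rate; a satellite handover that halves or doubles the drain rate only changes how often dequeue() gets called.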
* Re: [Starlink] Starlink hidden buffers 2023-05-24 13:59 ` David Lang @ 2023-05-24 22:39 ` Ulrich Speidel 2023-05-25 0:06 ` David Lang 2023-07-27 20:37 ` Ulrich Speidel 0 siblings, 2 replies; 34+ messages in thread From: Ulrich Speidel @ 2023-05-24 22:39 UTC (permalink / raw) To: David Lang; +Cc: starlink [-- Attachment #1: Type: text/plain, Size: 10440 bytes --] On 25/05/2023 1:59 am, David Lang wrote: > >> >> >> https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P > >> >> > I'll see whether I can get hold of one of these. Cutting a cable on a > >> >> > university IT asset as an academic is not allowed here, except if it > >> >> > doesn't meet electrical safety standards. > > OK, we have one on order, along with a PoE injector and power supply. Don't > > hold your breath, though, I'll be out of the country when it arrives and > > it'll be late July before I get to play with it. > I've got a couple on order, but they won't arrive for 1-3 more weeks :-( I envy you! > I'll also note that in the last launch of the v2 mini satellites, they mentioned > that those now supported E band backhaul to handle 4x the bandwidth of the > earlier satellites Still not enough to connect the missing 2.5 or so billion, but a step in the right direction for sure. > > It's certainly noticeable here that they seem to have sets of three grouped > > together in a relatively compact geographical area (you could visit all NZ > > North Island ground stations in a day by car from Auckland, Auckland traffic > > notwithstanding, and at a stretch could do the same down south from Hinds to > > Awarua if you manage to ignore the scenery, but getting from the southernmost > > North Island ground station to the northernmost South Island one is basically > > a two-day drive plus ferry trip). > I lived in Wanganui for a few years, including one RV trip down the South > Island. I know what you mean about needing to ignore the scenery :-) Interesting - that must have been before the local iwi pointed out once again that the town had misspelled its name since 1854, and for once were heard - so it's now officially "Whanganui", for Crown agencies, anyway. > Ok, I thought I had heard they switched every 15 min, so it's every 5 min > instead? Dishy collects this information as a cumulative dataset, which the tools query via grpc. The frames in the movie correspond to snapshots of the dataset taken at 5-second intervals. This indicates switches roughly every ten to seventy seconds, with most dwell times being around 15-30 seconds.
> > yes, the point I thought that I was trying to make was that the > latency change > from satellite movement was not very significant So it's got to come from somewhere else. > > >> >> If it stays the same, I would suspect that you are actually > hitting a > >> >> different ground station and there is a VPN backhaul to your > egress point > >> >> to the regular Internet (which doesn't support mobile IP > addresses) for > >> >> that cycle. If it tapers off, then I could buy bufferbloat that > gets > >> >> resolved as TCP backs off. > >> > > >> > Yes, quite sorting out which part of your latency is what is the > million > >> > dollar question here... > >> > > >> > We saw significant RTT changes here during the recent cyclone > over periods > >> > of several hours, and these came in steps (see below), with the > initial > >> > change being a downward one. Averages are over 60 pings (the time > scale > >> > isn't 100% true as we used "one ping, one second" timing) here. > >> > > >> > > >> > We're still not sure whether to attribute this to load change or > ground > >> > station changes. There were a lot of power outages, especially in > >> > Auckland's lifestyle block belt, which teems with Starlink users, > but all > >> > three North Island ground stations were also in areas affected by > power > >> > outages (although the power companies concerned don't provide the > level of > >> > detail to establish whether they were affected). It's also not > clear what, > >> > if any, backup power arrangements they have). At ~25 ms, the step > changes > >> > in RTT are too large be the result of a switch in ground > stations, though, > >> > the path differences just aren't that large. You'd also expect a > ground > >> > station outage to result in longer RTTs, not shorter ones, if you > need to > >> > re-route via another ground station. One explanation might be > users getting > >> > cut off if they relied on one particular ground station for bent > pipe ops - > >> > but that would not explain this order of magnitude effect as I'd > expect > >> > that number to be small. So maybe power outages at the user end > after all. > >> > But that would then tell us that these are load-dependent queuing > delays. > >> > Moreover, since those load changes wouldn't have involved the > router at our > >> > site, we can conclude that these are queue sojourn times in the > Starlink > >> > network. > > remember that SpaceX controlls the ground stations as well, so if they > are doing > any mobile IP trickery to redirect traffic from one ground station to > another, > they can anticipate the shift or move the queue for the user or other > trickery > like this (probably aren't yet, they seem to be in the early days > here, focusing > on keeping things working and improving on the space side more than > anything > else) I strongly suspect that they are experimenting with this here and with that there. > > > >> AQM allocates the available bandwidth between different connections > (usually > >> different users) > > But it does this under the assumption that the vector for changes in > bandwidth > > availability is the incoming traffic, which AQM gives (indirect) > feedback to, > > right? > > no, this is what I'm getting at below > > >> When it does this indirectly for inbound traffic by delaying acks, the > >> results depend on the senders handling of these indirect signals > that were > >> never intended for this purpose. 
> > This is what you are thinking of, where it's providing indirect > feedback to an > unknowable inbound queue on a remote system > > >> But when it does this directly on the sending side, it doesn't > matter what > >> the senders want, their data WILL be managed to the > priority/bandwidth that > >> the AQM sets, and eventually their feedback is dropped packets, which > >> everyone who is legitimate responds to. > > when the AQM in on the sending side of the bottleneck, it now has > direct control > over the queue, and potentially has information over the available > bandwidth as > it changes. But even if it doesn't know what the available bandwidth > is, it > still can dispatch the data in it's queues 'fairly' (whatever that > means to the > particulat AQM algorithm), changes in the data rate just change how > fast the > queue drains. Yes - but if you delay ACKs, the only entity this has any effect on is the original (remote) TCP sender, which is who you are trying to persuade to take it easy so you're not going to be forced to (tail or otherwise) drop packets. Dropping helps clear your queue (the one in front of the bottleneck). > > > Understood. You build a control loop, where the latency is the delay > in the > > control signal. > > > > Classically, you have a physical bottleneck that the AQM manages, > where the > > physical bandwidth doesn't change. > > > > The available bandwidth changes, (mostly) as a result of TCP > connections (or > > similarly behaved UDP applications) joining in slow start, or > disappearing. > > > > Basically, your queues grow and shrink one packet at a time. > > > > Your control signal allows you (if they're well behaved) throttle / > > accelerate senders. > > > > What you don't get are quantum jumps in queue occupancy, jump > changes in > > underlying physical bandwidth, or a whole set of new senders that are > > completely oblivious to any of your previous control signals. But > you get all > > that with satellite handovers like these. > > for a single TCP session,it has slow-start, but if you suddently start > dozens or > hundreds of TCP sessions, (bittorrent, other file transfer protocols, > or just a > website with hundreds of sub-elements), I think it's a bigger step > than you are > thinking. Doesn't each TCP session maintain and manage its own cwnd? > > And again, I think the same issue exists on cell sites as users move > from one > cell to another. Yes. But that happens gradually in comparison to Starlink, and the only TCP stack that potentially gets affected badly as a user moves from one cell site to the next is that of the user. But what you have here is the equivalent of the cell tower moving out of range of a whole group of users in one go. Different ballpark? > > > So what if the response you elicit in this way is to a queue > scenario that no > > longer applies? > > you run the risk of under-utilizing the link for a short time (which > may mean > that you decide to run the queues a little bigger than with fixed > links, so that > when a chunk of data disappears from your queue, you still will keep > utilization > up, sacraficing some latency to improve overall throughput) So we're back to the "more buffer" scenario here, too. > > David Lang -- **************************************************************** Dr. 
Ulrich Speidel School of Computer Science Room 303S.594 (City Campus) The University of Auckland u.speidel@auckland.ac.nz http://www.cs.auckland.ac.nz/~ulrich/ **************************************************************** [-- Attachment #2: Type: text/html, Size: 14410 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
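The propagation-side numbers in the exchange above are quick to check. A back-of-the-envelope sketch in Python; the 550 km shell altitude, the elevation angles and the roughly 15-second handover interval are assumptions taken from the discussion, not measurements.

    import math

    C = 299_792.458     # km/s, speed of light in vacuum
    ALT = 550.0         # km, nominal altitude of the main Starlink shell (assumption)
    R_E = 6371.0        # km, mean Earth radius

    def slant_range(elevation_deg, alt=ALT):
        """Distance from a ground terminal to a satellite seen at this elevation."""
        el = math.radians(elevation_deg)
        return (math.sqrt((R_E + alt) ** 2 - (R_E * math.cos(el)) ** 2)
                - R_E * math.sin(el))

    high = slant_range(80)   # satellite nearly overhead
    low = slant_range(35)    # satellite closer to the horizon
    print(f"80 deg elevation: {high:5.0f} km slant range, {1e3 * high / C:4.2f} ms one way")
    print(f"35 deg elevation: {low:5.0f} km slant range, {1e3 * low / C:4.2f} ms one way")
    print(f"step at handover: {1e3 * (low - high) / C:4.2f} ms per leg")

    # How far does a satellite move between handovers?
    v = math.sqrt(398_600.4418 / (R_E + ALT))   # km/s, circular-orbit speed from Earth's GM
    print(f"orbital speed {v:4.2f} km/s, i.e. about {15 * v:3.0f} km per 15 s dwell")

The step comes out at roughly a millisecond per leg, and the satellite covers on the order of a hundred kilometres per dwell interval, consistent with the conclusion above that the tens-of-milliseconds RTT swings have to be queueing rather than geometry.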
* Re: [Starlink] Starlink hidden buffers 2023-05-24 22:39 ` Ulrich Speidel @ 2023-05-25 0:06 ` David Lang 2023-07-27 20:37 ` Ulrich Speidel 1 sibling, 0 replies; 34+ messages in thread From: David Lang @ 2023-05-25 0:06 UTC (permalink / raw) To: Ulrich Speidel; +Cc: David Lang, starlink [-- Attachment #1: Type: text/plain, Size: 4760 bytes --] On Thu, 25 May 2023, Ulrich Speidel wrote: >> I lived in Wanganui for a few years, including one RV trip down the South >> Island. I know what you mean about needing to ignore the scenery :-) > Interesting - that must have been before the local īwi pointed out once again > that the town had misspelled its name since 1854, and for once were heard - > so it's now officially "Whanganui", for crown agencies, anyway. My spelling has always been a bit haphazard, but this was back in the '70s >> remember that SpaceX controlls the ground stations as well, so if they are >> doing any mobile IP trickery to redirect traffic from one ground station to >> another, they can anticipate the shift or move the queue for the user or >> other trickery like this (probably aren't yet, they seem to be in the early >> days here, focusing on keeping things working and improving on the space side >> more than anything else) > I strongly suspect that they are experimenting with this here and with that > there. >> >> >> >> AQM allocates the available bandwidth between different connections >> >> (usually different users) >> > But it does this under the assumption that the vector for changes in >> > bandwidth availability is the incoming traffic, which AQM gives (indirect) >> > feedback to, right? >> >> no, this is what I'm getting at below >> >> >> When it does this indirectly for inbound traffic by delaying acks, the >> >> results depend on the senders handling of these indirect signals that were >> >> never intended for this purpose. >> >> This is what you are thinking of, where it's providing indirect feedback to >> an unknowable inbound queue on a remote system >> >> >> But when it does this directly on the sending side, it doesn't matter what >> >> the senders want, their data WILL be managed to the priority/bandwidth >> >> that the AQM sets, and eventually their feedback is dropped packets, which >> >> everyone who is legitimate responds to. >> >> when the AQM in on the sending side of the bottleneck, it now has direct >> control over the queue, and potentially has information over the available >> bandwidth as it changes. But even if it doesn't know what the available >> bandwidth is, it still can dispatch the data in it's queues 'fairly' >> (whatever that means to the particulat AQM algorithm), changes in the data >> rate just change how fast the queue drains. > > Yes - but if you delay ACKs, the only entity this has any effect on is the > original (remote) TCP sender, which is who you are trying to persuade to take > it easy so you're not going to be forced to (tail or otherwise) drop packets. the delaying ack thing is only done when you are trying to manage the inbound traffic. When you are on the upstream side of the bottleneck, you don't mess with acks, you just dispatch/mark/discard the packets. > Dropping helps clear your queue (the one in front of the bottleneck). or ECN tagging, and as Dave pointed out, the 'fair' part of FQ_Codel and Cake provide fairness between users so that even in the face of overloads from one source, other traffic will still get through. 
>> for a single TCP session,it has slow-start, but if you suddently start dozens >> or hundreds of TCP sessions, (bittorrent, other file transfer protocols, or >> just a website with hundreds of sub-elements), I think it's a bigger step >> than you are thinking. > Doesn't each TCP session maintain and manage its own cwnd? Yes, but when you have hundreds of TCP sessions running in parallel, each increasing their cwnd, the effect can be significant (especially on lower bandwidth links) >> And again, I think the same issue exists on cell sites as users move from one >> cell to another. > Yes. But that happens gradually in comparison to Starlink, and the only TCP > stack that potentially gets affected badly as a user moves from one cell site > to the next is that of the user. But what you have here is the equivalent of > the cell tower moving out of range of a whole group of users in one go. > Different ballpark? as others noted, busses/trains move a bunch of people from one cell zone to the next at the same time >> > So what if the response you elicit in this way is to a queue scenario that >> > no longer applies? >> >> you run the risk of under-utilizing the link for a short time (which may mean >> that you decide to run the queues a little bigger than with fixed links, so >> that when a chunk of data disappears from your queue, you still will keep >> utilization up, sacraficing some latency to improve overall throughput) > So we're back to the "more buffer" scenario here, too. more buffer compared to static links, or accept the link being less utilized David Lang ^ permalink raw reply [flat|nested] 34+ messages in thread
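The "hundreds of sessions" point is easy to put numbers on. A toy calculation, ignoring slow start, losses and pacing; the flow counts, RTT and MSS are illustrative assumptions, not measurements.

    MSS_BITS = 1448 * 8   # payload bits per segment (illustrative)
    RTT = 0.05            # seconds; an illustrative 50 ms path

    def aggregate_ramp_bps_per_s(n_flows, mss_bits=MSS_BITS, rtt=RTT):
        """In congestion avoidance each flow grows its cwnd by ~1 MSS per RTT,
        so its rate grows by mss/rtt bits/s every rtt seconds, i.e. mss/rtt**2
        bits per second, per second. N flows ramp N times as fast."""
        return n_flows * mss_bits / rtt ** 2

    for n in (1, 10, 100, 500):
        print(f"{n:4d} flows ramp at ~{aggregate_ramp_bps_per_s(n) / 1e6:7.1f} Mbit/s per second")

Even with every flow in the gentle additive-increase phase, a few hundred of them push the offered load up by hundreds of megabits per second each second; with slow start in the mix the step is far steeper, which is the "bigger step than you are thinking" above.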
* Re: [Starlink] Starlink hidden buffers 2023-05-24 22:39 ` Ulrich Speidel 2023-05-25 0:06 ` David Lang @ 2023-07-27 20:37 ` Ulrich Speidel 1 sibling, 0 replies; 34+ messages in thread From: Ulrich Speidel @ 2023-07-27 20:37 UTC (permalink / raw) To: David Lang; +Cc: starlink [-- Attachment #1: Type: text/plain, Size: 11705 bytes --] So we got a Yaosheng adapter here, but I didn't get to play with it until last week. We hooked up a SuperMicro with a DHCP-ing Ethernet interface to it. First impressions: * The DHCP server and IPv4 gateway is 100.64.0.1, which sits on the infrastructure side of the Starlink network. * The IPv4 address is assigned from 100.64.0.0/10. * The DNS servers assigned by 100.64.0.1 are 1.1.1.1 and 8.8.8.8 - but woe betide you, their reachability wasn't all that great when we tried, so a lot of name lookups failed. More to come when I have a moment. -- **************************************************************** Dr. Ulrich Speidel School of Computer Science Room 303S.594 (City Campus) The University of Auckland u.speidel@auckland.ac.nz http://www.cs.auckland.ac.nz/~ulrich/ **************************************************************** [-- Attachment #2: Type: text/html, Size: 16427 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
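The addressing observations in the message above are easy to sanity-check from the client side with the standard library. A minimal sketch; the sample client address is made up, 100.64.0.0/10 is the RFC 6598 shared (CGNAT) space, and the TCP/53 probe of the advertised resolvers is only a crude reachability hint, not a DNS health check.

    import ipaddress
    import socket

    CGNAT = ipaddress.ip_network("100.64.0.0/10")   # RFC 6598 shared address space

    def classify(addr: str) -> str:
        ip = ipaddress.ip_address(addr)
        if ip in CGNAT:
            return "CGNAT / shared address space (RFC 6598)"
        if ip.is_private:
            return "RFC 1918 private"
        if ip.is_global:
            return "globally routable"
        return "other special-purpose"

    # Gateway and resolvers as reported above; 100.127.3.99 is a made-up client address
    for a in ("100.64.0.1", "100.127.3.99", "1.1.1.1", "8.8.8.8"):
        print(f"{a:15s} -> {classify(a)}")

    for resolver in ("1.1.1.1", "8.8.8.8"):
        try:
            with socket.create_connection((resolver, 53), timeout=2):
                print(f"{resolver}: TCP/53 reachable")
        except OSError as exc:
            print(f"{resolver}: TCP/53 connect failed ({exc})")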
* Re: [Starlink] Starlink hidden buffers 2023-05-24 12:55 ` Ulrich Speidel 2023-05-24 13:44 ` Dave Taht 2023-05-24 13:59 ` David Lang @ 2023-05-24 15:18 ` Mark Handley 2023-05-24 21:50 ` Ulrich Speidel 2 siblings, 1 reply; 34+ messages in thread From: Mark Handley @ 2023-05-24 15:18 UTC (permalink / raw) To: Ulrich Speidel; +Cc: starlink [-- Attachment #1: Type: text/plain, Size: 3499 bytes --] On Wed, 24 May 2023, at 1:55 PM, Ulrich Speidel via Starlink wrote: > > Dishy tracks most satellites for significantly less than 15 minutes, and for a relatively small part of their orbit. Let me explain: > > > > > This is an obstruction map obtained with starlink-grpc-tools (https://github.com/sparky8512/starlink-grpc-tools). The way to read this is in polar coordinates: The centre of the image is the dishy boresight (direction of surface normal), distance from the centre is elevation measured as an angle from the surface normal, and direction from the centre is essentially the azimuth - top is north, left is west, bottom is south, and right is east. The white tracks are the satellites dishy uses, and a graph like this gets built up over time, one track at a time. Notice how short the tracks are - they don't follow the satellite for long - typically under a minute. The red bits are satellites getting obscured by the edge of our roof. > > I've also attached a time lapse movie of how one of these graphs builds up - if I correctly remember (the script is on another machine), one frame in the video corresponds to 5 seconds. > > Conclusion: latency change from tracking one satellite is smaller than the latency difference as you jump between satellites. You could be looking at several 100 km of path difference here. In an instant. Even that, at 300,000 km/s of propagation speed, is only on the order of maybe 1 ms or so - peanuts compared to the RTTs in the dozens of ms that we're seeing. But if you get thrown from one queue onto another as you get handed over - what does that do to the remote TCP stack that's serving you? > Interesting video. From eyeballing it, it seems that when it changes satellite, it's most often changing between satellites that are a similar distance from boresight. When it does this, the difference in propagation delay from dishy to satellite will be minimal. It's possible it's even switching when the latency matches - I can't really tell from the video. Of course you can't tell from just one end of the connection whether Starlink is switching satellites just when overall ground-to-ground path latency of the current path drops below the path latency of the next path. For that we'd need to see what happened at the groundstation too. But if you were trying to optimize things to minimize reordering, you might try something like this. As you point out, you've still got variable uplink queue sizes to handle as you switch, but there's no fundamental reason why path switches *always* need to result in latency discontinuities. If you did decide to switch when the underlying path latency matches, and thinking more about those uplink queues: when you switch a path from a smaller uplink queue (at a groundstation) to a larger one, there's no reordering, so TCP should be happy(ish).
When switching from a larger uplink queue to a smaller one, you can cause reordering, but it's easy enough to hide by adding an earliest release time to any new packets (based on the last time a packet from that flow was (or will be) last sent on the old path), and not release the packets from the new queue to send to the satellite before that time. I've no idea if anyone cares enough to implement such a scheme though. Not saying any of this is what Starlink does - just idle speculation as to how you might minimize reordering if it was enough of a problem. And of course I'm ignoring any queues in satellites... Cheers, Mark [-- Attachment #2.1: Type: text/html, Size: 4365 bytes --] [-- Attachment #2.2: obstructions_pos2_762.png --] [-- Type: image/png, Size: 1289 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
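Mark's earliest-release-time idea fits in a few lines, and writing it down also makes the trade-off explicit: holding packets back to preserve ordering is itself buffering. The sketch below follows the scheme as described, purely as speculation; it is not something Starlink is known to implement, and the packet and clock representations are invented.

    import heapq

    class SwitchSmoother:
        """After a flow moves to a path with a smaller uplink queue, hold its
        packets until the time the old path would have sent them, so the new
        path cannot overtake the old one and reorder the flow."""
        def __init__(self):
            self.last_old_departure = {}   # flow -> latest (scheduled) send time on old path
            self.held = []                 # min-heap of (release_time, seq, flow, pkt)
            self._seq = 0

        def note_old_path_send(self, flow, departure_time):
            prev = self.last_old_departure.get(flow, 0.0)
            self.last_old_departure[flow] = max(prev, departure_time)

        def submit_on_new_path(self, flow, pkt, now):
            release = max(now, self.last_old_departure.get(flow, 0.0))
            heapq.heappush(self.held, (release, self._seq, flow, pkt))
            self._seq += 1

        def releasable(self, now):
            out = []
            while self.held and self.held[0][0] <= now:
                _, _, flow, pkt = heapq.heappop(self.held)
                out.append((flow, pkt))
            return out

    # Toy timeline: the old path still has a packet scheduled to go at t = 0.030 s
    s = SwitchSmoother()
    s.note_old_path_send("flowA", 0.030)
    s.submit_on_new_path("flowA", "pkt-101", now=0.012)
    print(s.releasable(now=0.020))   # [] - held back, it would overtake the old path
    print(s.releasable(now=0.031))   # [('flowA', 'pkt-101')] - safe to send now

The max() in submit_on_new_path is the extra queueing delay Ulrich points to in the reply that follows.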
* Re: [Starlink] Starlink hidden buffers 2023-05-24 15:18 ` Mark Handley @ 2023-05-24 21:50 ` Ulrich Speidel 2023-05-25 0:17 ` David Lang 0 siblings, 1 reply; 34+ messages in thread From: Ulrich Speidel @ 2023-05-24 21:50 UTC (permalink / raw) To: Mark Handley; +Cc: starlink [-- Attachment #1: Type: text/plain, Size: 6099 bytes --] On 25/05/2023 3:18 am, Mark Handley wrote: > > > On Wed, 24 May 2023, at 1:55 PM, Ulrich Speidel via Starlink wrote: >> >> Dishy tracks most satellites for significantly less than 15 minutes, >> and for a relatively small part of their orbit. Let me explain: >> >> >> >> This is an obstruction map obtained with starlink-grpc-tools >> (https://github.com/sparky8512/starlink-grpc-tools). >> The way to read this is in polar coordinates: The centre of the image >> is the dishy boresight (direction of surface normal), distance from >> the centre is elevation measured as an angle from the surface normal, >> and direction from the centre is essentially the azimuth - top is >> north, left is west, bottom is south, and right is east. The white >> tracks are the satellites dishy uses, and a graph like this gets >> built up over time, one track at a time. Notice how short the tracks >> are - they don't follow the satellite for long - typically under a >> minute. The red bits are satellites getting obscured by the edge of >> our roof. >> >> I've also attached a time lapse movie of how one of these graphs >> builds up - if I correctly remember (the script is on another >> machine), one frame in the video corresponds to 5 seconds. >> >> Conclusion: latency change from tracking one satellite is smaller >> than the latency difference as you jump between satellites. You could >> be looking at several 100 km of path difference here. In an instant. >> Even that, at 300,000 km/s of propagation speed, is only on the order >> of maybe 1 ms or so - peanuts compared to the RTTs in the dozens of >> ms that we're seeing. But if you get thrown from one queue onto >> another as you get handed over - what does that do to the remote TCP >> stack that's serving you? >> > > Interesting video. From eyeballing it, it seems that when it changes > satellite, it's most often changing between satellites that are a > similar distance from boresight. When it does this, the difference in > propagation delay from dishy to satellite will be minimal. It's > possible it's even switching when the latency matches - I can't really > tell from the video. Qualified "maybe" here ... most of Starlink still runs on bent pipe topology, and we don't know how or why a particular satellite is chosen, or for that matter where that choice is made. The video was produced in Auckland, in relatively close proximity (23.15 km) to Starlink's Clevedon ground station. So there would have been quite a few satellites to choose from that were in sight of both ends. Also, on our deck (where the measurement was taken), there are obstructions in pretty much all directions on the lower horizon. That's not necessarily the situation you'd get on the ridgeline of a farmhouse roof 300 km away from a gateway. So that "similar distance from boresight" might be a location artefact. > > Of course you can't tell from just one end of the connection whether > Starlink is switching satellites just when overall ground-to-ground > path latency of the current path drops below the path latency of the > next path. For that we'd need to see what happened at the > groundstation too.
But if you were trying to optimize things to > minimize reordering, you might try something like this. As you point > out, you've still got variable uplink queue sizes to handle as you > switch, but there's no fundamental reason why path switches *always* > need to result in latency discontinuities. Yes, although with slot assignments (which they can't really avoid I guess), satellite capacity would be the primary criterion I suppose. The effect of reordering is mostly that it drives up the amount of buffer memory needed for reassembly at the receiving end, which is not much of an issue nowadays with sufficient receiver socket memory. In this sort of scenario, delays from reordering to the application reading from the socket are no worse than delays from not switching until a bit later. > > > If you did decide to switch when the underlying path latency matches, > and thinking more about those uplink queues: when you switch a path > from a smaller uplink queue (at a groundstation) to a larger one, > there's no reordering, so TCP should be happy(ish). When switching > from a larger uplink queue to a smaller one, you can cause reordering, > but it's easy enough to hide by adding an earliest release time to any > new packets (based on the last time a packet from that flow was (or > will be) last sent on the old path), and not release the packets from > the new queue to send to the satellite before that time. I've no idea > if anyone cares enough to implement such a scheme though. Case in point: This discussion started because we were wondering why Starlink had so much buffer in the system. That adding of earliest release time means that you are buffering, so it'd be exactly the thing that started this mailing list! > > Not saying any of this is what Starlink does - just idle speculation > as to how you might minimize reordering if it was enough of a > problem. And of course I'm ignoring any queues in satellites... We know that we're seeing RTTs into the hundreds of ms in scenarios where we have physical path latencies of at most a couple of dozen ms. So, yes, speculation, but ... Also, I don't get the impression that path latency minimisation is top priority for Starlink. My impression is that as long as RTT is what you might see on a terrestrial connection to the other side of the globe, it's good enough for Starlink. Cheers, Ulrich -- **************************************************************** Dr. Ulrich Speidel School of Computer Science Room 303S.594 (City Campus) The University of Auckland u.speidel@auckland.ac.nz http://www.cs.auckland.ac.nz/~ulrich/ **************************************************************** [-- Attachment #2.1: Type: text/html, Size: 8381 bytes --] [-- Attachment #2.2: obstructions_pos2_762.png --] [-- Type: image/png, Size: 1289 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
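The receiver-side memory cost Ulrich refers to is easy to bound: it is roughly the line rate times the reordering depth. A rough sketch; the rates and reorder depths are illustrative assumptions.

    def reorder_buffer_bytes(rate_mbit_s, reorder_depth_ms):
        """Worst-case bytes a receiver must hold to restore in-order delivery
        when packets can arrive up to reorder_depth_ms out of order."""
        return rate_mbit_s * 1e6 / 8 * reorder_depth_ms / 1e3

    for rate, depth in ((100, 5), (100, 30), (300, 30)):
        kib = reorder_buffer_bytes(rate, depth) / 1024
        print(f"{rate:3d} Mbit/s with {depth:2d} ms of reordering -> ~{kib:5.0f} KiB")

At a few hundred kilobytes to a megabyte or so, this is well within default socket buffer sizes on a modern host, which is the "not much of an issue nowadays" point above.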
* Re: [Starlink] Starlink hidden buffers 2023-05-24 21:50 ` Ulrich Speidel @ 2023-05-25 0:17 ` David Lang 0 siblings, 0 replies; 34+ messages in thread From: David Lang @ 2023-05-25 0:17 UTC (permalink / raw) To: Ulrich Speidel; +Cc: Mark Handley, starlink [-- Attachment #1: Type: text/plain, Size: 1155 bytes --] On Thu, 25 May 2023, Ulrich Speidel via Starlink wrote: > Also, I don't get the impression that path latency minimisation is top > priority for Starlink. My impression is that as long as RTT is what you might > see on a terrestrial connection to the other side of the globe, it's good > enough for Starlink. I agree, I don't think they have latency as one of their top priorities. I think they are focusing on expanding the network and just increasing bandwidth. (I think having multiple satellites servicing a single cell is a much higher priority than minimizing latency.) I also think that in-space routing is a higher priority than minimizing latency. But I think latency is something they care about, just not their top priority. As long as it's "good enough", they are working on other things (and "good enough" is not fiber-connected good, but more like slow-DSL good). It's important to keep the market in mind: they aren't aiming the service at people who can get high-speed DSL/Cable/Fiber, they are aiming it at the people who can't, who get slow DSL, dialup, cell service (LTE, not 5G), wireless ISPs, or do without. David Lang [-- Attachment #2: Type: text/plain, Size: 149 bytes --] _______________________________________________ Starlink mailing list Starlink@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/starlink ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Starlink] Starlink hidden buffers 2023-05-14 8:43 ` Ulrich Speidel 2023-05-14 9:00 ` David Lang @ 2023-05-14 9:06 ` Sebastian Moeller 2023-05-14 9:13 ` David Lang 1 sibling, 1 reply; 34+ messages in thread From: Sebastian Moeller @ 2023-05-14 9:06 UTC (permalink / raw) To: Ulrich Speidel; +Cc: David Lang, starlink Hi Ulrich, silly question: does Starlink operate using fixed geographical cells, and are CPE/dishies assigned to a single cell? In that case handover would not have to be so bad: the satellite leaving a cell is going to shed all its load and take over the previous satellite's load in the cell it just starts serving. Assuming equal "air conditions", the modulation scheme should be similar. So wouldn't the biggest problem be the actual switch-over time required for dishies to move from one satellite to the next (and would this be in line with the reported latency spikes every 15 seconds)? > On May 14, 2023, at 10:43, Ulrich Speidel via Starlink <starlink@lists.bufferbloat.net> wrote: > > On 14/05/2023 6:55 pm, David Lang wrote: >> >> I just discovered that someone is manufacturing an adapter so you no longer have >> to cut the cable >> >> https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P >> > I'll see whether I can get hold of one of these. Cutting a cable on a university IT asset as an academic is not allowed here, except if it doesn't meet electrical safety standards. [SM] There must be a way to get this accomplished within regulations if the test requiring this is somehow made part of the experiment, no? (Maybe requires partnering with other faculties like electrical engineering to get the necessary clout with the administration?) > Alternatively, has anyone tried the standard Starlink Ethernet adapter with a PoE injector instead of the WiFi box? The adapter above seems to be like the Starlink one (which also inserts into the cable between Dishy and router). > >> > Put another way: If you have a protocol (TCP) that is designed to reasonably >> > expect that its current cwnd is OK to use for now, and it is put into a situation >> > where there are relatively frequent, huge and lasting step changes in >> > available BDP within subsecond periods, are your underlying assumptions still >> > valid? >> >> I think that with interference from other APs, WiFi suffers at least as many >> unpredictable changes to the available bandwidth. > Really? I'm thinking stuff like the sudden addition of packets from potentially dozens of TCP flows with large cwnds? [SM] But would these really be added to an already existing load? >> >> > I suspect they're handing over whole cells, not individual users, at a time. >> >> I would guess the same (remember, in spite of them having launched >4000 >> satellites, this is still the early days, with the network changing as more are >> launching) >> >> We've seen that it seems that there is only one satellite serving any cell at >> one time. > But the reverse is almost certainly not true: Each satellite must serve multiple cells. [SM] Which is not necessarily a problem if the half-pipe to the base station has enough capacity for all the per-cell traffic aggregated? >> But remember that the system does know how much usage there is in the >> cell before they do the handoff. It's unknown if they do anything with that, or >> if they are just relaying based on geography. We also don't know what the >> bandwidth to the ground stations is compared to the dishy.
> Well, we do know for NZ, sort of, based on the licences Starlink has here. >> >> And remember that for every cell that a satellite takes over, it's also giving >> away one cell at the same time. > Yes, except that some cells may have no users in them and some of them have a lot (think of a satellite flying into range of California from the Pacific, dropping over-the-water cells and acquiring land-based ones). [SM] But the coming satellite should have pretty much the same over-air capacity as the leaving satellite, no? Sure there can be some modulation changes, but I would guess the changes would typically be relatively small? >> >> I'm not saying that the problem is trivial, but just that it's not unique > What makes me suspicious here that it's not the usual bufferbloat problem is this: With conventional bufferbloat and FIFOs, you'd expect standing queues, right? [SM] I thought it had been confirmed that starlink uses some form of AQM so not a dumb FIFO? > With Starlink, we see the queues emptying relatively occasionally with RTTs in the low 20 ms, and in some cases under 20 ms even. With large ping packets (1500 bytes). >> >> David Lang > -- > **************************************************************** > Dr. Ulrich Speidel > > School of Computer Science > > Room 303S.594 (City Campus) > > The University of Auckland > > u.speidel@auckland.ac.nz > > > http://www.cs.auckland.ac.nz/~ulrich/ > > **************************************************************** > > > > > _______________________________________________ > Starlink mailing list > Starlink@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/starlink ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Starlink] Starlink hidden buffers 2023-05-14 9:06 ` Sebastian Moeller @ 2023-05-14 9:13 ` David Lang 0 siblings, 0 replies; 34+ messages in thread From: David Lang @ 2023-05-14 9:13 UTC (permalink / raw) To: Sebastian Moeller; +Cc: Ulrich Speidel, David Lang, starlink On Sun, 14 May 2023, Sebastian Moeller wrote: > silly question, does starlink operate using fixed geographical cells and are > CPE/dishies assigned to a single cell? In which case handover would not have > to be so bad, the satellite leaving a cell is going to shed all its load and > is going to take over the previous satellite's load in the cell it just starts > serving. Assuming equal "air-conditions" the modulation scheme should be > similar. So wouldn't the biggest problem be the actual switch-over time > required for dishies to move from one satellite to the next (and would this be > in line with the reported latency spikes every 15 seconds)? yes, there was a paper not that long ago from someone who was using the starlink signal for time/location purposes that detailed the protocol and the coverage (at least at that time) in the last week or so the FCC approved increased power/utilization percentage, I don't know if units in the field have been modified to use it yet. > >> On May 14, 2023, at 10:43, Ulrich Speidel via Starlink <starlink@lists.bufferbloat.net> wrote: >> >> On 14/05/2023 6:55 pm, David Lang wrote: >>> >>> I just discovered that someone is manufacturing an adapter so you no longer have >>> to cut the cable >>> >>> https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P >>> >> I'll see whether I can get hold of one of these. Cutting a cable on a university IT asset as an academic is not allowed here, except if it doesn't meet electrical safety standards. > > [SM] There must be a way to get this accomplished with in regulations if > the test requiring this is somehow made part of the experiment, no? (Maybe > requires partnering with other faculties like electrical engineering to get > the necessary clout with the administration?) the other optin would be to order a second cord and cut that. You then aren't modifying the IT asset. David Lang ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Starlink] Starlink hidden buffers 2023-05-13 10:10 [Starlink] Starlink hidden buffers Ulrich Speidel 2023-05-13 11:20 ` Sebastian Moeller 2023-05-13 22:57 ` David Lang @ 2023-05-14 9:57 ` Oleg Kutkov 2023-05-14 9:59 ` Oleg Kutkov 2023-05-24 15:26 ` Bjørn Ivar Teigen 3 siblings, 1 reply; 34+ messages in thread From: Oleg Kutkov @ 2023-05-14 9:57 UTC (permalink / raw) To: starlink On 5/13/23 13:10, Ulrich Speidel via Starlink wrote: > 3) We know that they aren't an artifact of the Starlink WiFi router > (our traceroutes were done through their Ethernet adaptor, which > bypasses the router), so they must be delays on the satellites or the > teleports. Here is what bypass mode does: https://github.com/SpaceExplorationTechnologies/starlink-wifi-gen2/blob/main/openwrt/package/base-files/files/sbin/setup_iptables.sh#L19 https://github.com/SpaceExplorationTechnologies/starlink-wifi-gen2/blob/main/openwrt/package/base-files/files/sbin/wifi#L213 https://github.com/SpaceExplorationTechnologies/starlink-wifi-gen2/blob/main/openwrt/package/base-files/files/etc/init.d/wifi_control#L26 So packages still going through this chain: Dishy -> Mediatek PHY -> Linux kernel -> br-lan -> Linux kernel -> External LAN PHY (Marvell or MTK). It's better to use a PoE injector and run tests without the Starlink router. -- Best regards, Oleg Kutkov ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Starlink] Starlink hidden buffers
  2023-05-14 9:57 ` Oleg Kutkov
@ 2023-05-14 9:59   ` Oleg Kutkov
  0 siblings, 0 replies; 34+ messages in thread
From: Oleg Kutkov @ 2023-05-14 9:59 UTC (permalink / raw)
  To: starlink

Sorry, "packets", of course.

On 5/14/23 12:57, Oleg Kutkov via Starlink wrote:
> So packages are still going through this chain:

-- 
Best regards,
Oleg Kutkov

^ permalink raw reply	[flat|nested] 34+ messages in thread
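Oleg's point suggests a simple experiment: capture RTT samples twice, once through the router's Ethernet adaptor in bypass mode (packets still crossing br-lan on the router SoC) and once with a PoE injector and the router out of the path, then compare the distributions. The Python sketch below is a rough comparison script; the file names and the input format (one RTT in milliseconds per line, e.g. parsed out of ping output) are assumptions for illustration, not anything specified in the thread.

# Sketch: compare RTT samples taken via the Starlink router (bypass mode)
# against samples taken with a PoE injector and no router in the path.
import statistics
import sys

def load_ms(path):
    """Load one RTT value (in ms) per line, sorted ascending."""
    with open(path) as f:
        return sorted(float(line) for line in f if line.strip())

def summarize(label, rtts):
    p99 = rtts[int(0.99 * (len(rtts) - 1))]
    print(f"{label:>12}: median {statistics.median(rtts):6.1f} ms, "
          f"p99 {p99:6.1f} ms, n={len(rtts)}")

if __name__ == "__main__":
    # Usage: python compare_paths.py rtts_via_router.txt rtts_poe_direct.txt
    summarize("via router", load_ms(sys.argv[1]))
    summarize("PoE direct", load_ms(sys.argv[2]))

If the two distributions are close, the extra hop through the router's bridge is not where the hundreds of milliseconds come from, which is consistent with the delays sitting further up in the Starlink network.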
* Re: [Starlink] Starlink hidden buffers
  2023-05-13 10:10 [Starlink] Starlink hidden buffers Ulrich Speidel
                   ` (2 preceding siblings ...)
  2023-05-14 9:57  ` Oleg Kutkov
@ 2023-05-24 15:26 ` Bjørn Ivar Teigen
  2023-05-24 21:53   ` Ulrich Speidel
  3 siblings, 1 reply; 34+ messages in thread
From: Bjørn Ivar Teigen @ 2023-05-24 15:26 UTC (permalink / raw)
  To: Ulrich Speidel; +Cc: starlink

[-- Attachment #1: Type: text/plain, Size: 6811 bytes --]

This discussion is fascinating and made me think of a couple of points I
really wish more people would grok:

1. What matters for the amount of queuing is the ratio of load over
capacity, or demand/supply, if you like. This ratio, at any point in time,
determines how quickly a queue fills or empties. It is the derivative of
the queue depth, if you like. Drops in capacity are equivalent to spikes in
load from this point of view. This means the rate adaptation of WiFi and
LTE, and link changes in the Starlink network, have far greater potential
for causing latency spikes than TCP, even when many users connect at the
same time. WiFi rates can go from 1000 to 1 from one packet to the next,
and whenever that happens there simply isn't time for TCP or any other
end-to-end congestion controller to react. In the presence of
capacity-seeking traffic there will, inevitably, be a latency spike (or
packet loss) when link capacity drops. I'm presenting a paper on this at
ICC next week, and the preprint is here: https://arxiv.org/abs/2111.00488

2. If you can describe how the ratio of demand to supply (or load/capacity)
changes over time (i.e., how much and how quickly it can change), then we
can use queuing theory (and/or simulations) to work out the utilization vs.
queuing delay trade-off, including transient behaviour. Handling transients
is what FQ excels at.

Because of the need for frequent link changes in the Starlink network,
there will be a need for more buffering than your typical (relatively)
static network. Not only because the load changes quickly, but because the
capacity does as well. This causes rapid changes in the
load-to-capacity-ratio, which will cause queues and/or packet loss unless
it's planned *really* well. I'm not going to say that is impossible, but
it's certainly hard. Some queuing and deliberate under-utilization is
needed to achieve reliable QoE in a system like that.

Just my two cents!

Cheers,
Bjørn Ivar Teigen

On Sat, 13 May 2023 at 12:10, Ulrich Speidel via Starlink <
starlink@lists.bufferbloat.net> wrote:

> [full quote of the original post of 2023-05-13 10:10 snipped; see the top
> of this thread]

-- 
Bjørn Ivar Teigen, Ph.D.
Head of Research
+47 47335952 | bjorn@domos.ai | www.domos.ai

[-- Attachment #2: Type: text/html, Size: 9534 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread
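Bjørn's point 1 can be illustrated with a toy fluid-queue model: the queue depth integrates (load − capacity), and queuing delay is queue depth divided by the drain rate. The Python sketch below is not from his paper; all rates, times, and the dip/jump events are invented for illustration, not Starlink measurements. It shows how a brief capacity dip (a link change) or a handover-style load jump produces a delay spike faster than any end-to-end controller can respond to.

# Toy fluid-queue model of point 1 above: dQ/dt = load - capacity.
# A 2-second capacity dip at t=10s and a 2-second load jump at t=20s
# (N users arriving with open cwnds) both cause queuing-delay spikes.
# All numbers are illustrative only.

def simulate(duration_s=30.0, dt=0.01):
    queue_bits = 0.0
    samples = []
    for i in range(int(duration_s / dt)):
        t = i * dt
        # Capacity dips during an assumed 2-second link change at t=10s.
        capacity_bps = 50e6 if 10.0 <= t < 12.0 else 100e6
        # Load doubles for 2s at t=20s, as if handed-over flows skip slow start.
        load_bps = 160e6 if 20.0 <= t < 22.0 else 80e6
        # Fluid approximation: the queue integrates the rate mismatch and
        # cannot go negative; queuing delay is queue depth / drain rate.
        queue_bits = max(0.0, queue_bits + (load_bps - capacity_bps) * dt)
        samples.append((t, 1e3 * queue_bits / capacity_bps))
    return samples

if __name__ == "__main__":
    for i, (t, delay_ms) in enumerate(simulate()):
        if i % 100 == 0:  # print one line per simulated second
            print(f"t={t:5.1f}s  queuing delay ~ {delay_ms:7.1f} ms")

Even these brief events push the delay into the second range, and afterwards the queue only drains as fast as the load-to-capacity ratio allows, which is the transient behaviour (and the case for FQ/AQM plus some deliberate headroom) that Bjørn describes.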
* Re: [Starlink] Starlink hidden buffers
  2023-05-24 15:26 ` Bjørn Ivar Teigen
@ 2023-05-24 21:53   ` Ulrich Speidel
  0 siblings, 0 replies; 34+ messages in thread
From: Ulrich Speidel @ 2023-05-24 21:53 UTC (permalink / raw)
  To: Bjørn Ivar Teigen; +Cc: starlink

[-- Attachment #1: Type: text/plain, Size: 960 bytes --]

On 25/05/2023 3:26 am, Bjørn Ivar Teigen wrote:
> Because of the need for frequent link changes in the Starlink network,
> there will be a need for more buffering than your typical (relatively)
> static network. Not only because the load changes quickly, but because
> the capacity does as well. This causes rapid changes in the
> load-to-capacity-ratio, which will cause queues and/or packet loss
> unless it's planned /really/ well. I'm not going to say that is
> impossible, but it's certainly hard.
>
All nicely put. That's just the point I was trying to make, albeit as a
question, not necessarily a conclusion.

-- 
****************************************************************
Dr. Ulrich Speidel
School of Computer Science
Room 303S.594 (City Campus)
The University of Auckland
u.speidel@auckland.ac.nz
http://www.cs.auckland.ac.nz/~ulrich/
****************************************************************

[-- Attachment #2: Type: text/html, Size: 1758 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread
end of thread, other threads:[~2023-07-27 20:37 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-13 10:10 [Starlink] Starlink hidden buffers Ulrich Speidel
2023-05-13 11:20 ` Sebastian Moeller
2023-05-13 12:16 ` Ulrich Speidel
2023-05-13 23:00 ` David Lang
2023-05-13 22:57 ` David Lang
2023-05-14  6:06 ` Ulrich Speidel
2023-05-14  6:55 ` David Lang
2023-05-14  8:43 ` Ulrich Speidel
2023-05-14  9:00 ` David Lang
2023-05-15  2:41 ` Ulrich Speidel
2023-05-15  3:33 ` David Lang
2023-05-15  6:36 ` Sebastian Moeller
2023-05-15 11:07 ` David Lang
2023-05-24 12:55 ` Ulrich Speidel
2023-05-24 13:44 ` Dave Taht
2023-05-24 14:05 ` David Lang
2023-05-24 14:49 ` Michael Richardson
2023-05-24 15:09 ` Dave Collier-Brown
2023-05-24 15:31 ` Dave Taht
2023-05-24 18:30 ` Michael Richardson
2023-05-24 18:45 ` Sebastian Moeller
2023-05-24 13:59 ` David Lang
2023-05-24 22:39 ` Ulrich Speidel
2023-05-25  0:06 ` David Lang
2023-07-27 20:37 ` Ulrich Speidel
2023-05-24 15:18 ` Mark Handley
2023-05-24 21:50 ` Ulrich Speidel
2023-05-25  0:17 ` David Lang
2023-05-14  9:06 ` Sebastian Moeller
2023-05-14  9:13 ` David Lang
2023-05-14  9:57 ` Oleg Kutkov
2023-05-14  9:59 ` Oleg Kutkov
2023-05-24 15:26 ` Bjørn Ivar Teigen
2023-05-24 21:53 ` Ulrich Speidel