From: Ulrich Speidel <u.speidel@auckland.ac.nz>
To: David Lang <david@lang.hm>
Cc: "starlink@lists.bufferbloat.net" <starlink@lists.bufferbloat.net>
Subject: Re: [Starlink] Starlink hidden buffers
Date: Sun, 14 May 2023 18:06:42 +1200
Message-ID: <48b00469-0dbb-54c4-bedb-3aecbf714a1a@auckland.ac.nz>
In-Reply-To: <0no84q43-s4n6-45n8-50or-12o3rq104n99@ynat.uz>


On 14/05/2023 10:57 am, David Lang wrote:
> On Sat, 13 May 2023, Ulrich Speidel via Starlink wrote:
>
>> Here's a bit of a question to you all. See what you make of it.
>>
>> I've been thinking a bit about the latencies we see in the Starlink 
>> network. This is why this list exists (right, Dave?). So what do we know?
>>
>> 1) We know that RTTs can be in the 100's of ms even in what appear to 
>> be bent-pipe scenarios where the physical one-way path should be well 
>> under 3000 km, with physical RTT under 20 ms.
>> 2) We know from plenty of traceroutes that these RTTs accrue in the 
>> Starlink network, not between the Starlink handover point (POP) and 
>> the Internet.
>> 3) We know that they aren't an artifact of the Starlink WiFi router 
>> (our traceroutes were done through their Ethernet adaptor, which 
>> bypasses the router), so they must be delays on the satellites or the 
>> teleports.
>
> The Ethernet adapter bypasses the WiFi, but not the router; you have 
> to cut the cable and replace the plug to bypass the router.
Good point - but you still don't get the WiFi buffering here. Or at 
least we don't seem to, looking at the difference between running with 
and without the adapter.
>
>> 4) We know that processing delay isn't a huge factor because we also 
>> see RTTs well under 30 ms.
>> 5) That leaves queuing delays.
>>
>> This issue has been known for a while now. Starlink have been 
>> innovating their heart out around pretty much everything here - and 
>> yet, this bufferbloat issue hasn't changed, despite Dave proposing 
>> what appears to be an easy fix compared to a lot of other things they 
>> have done. So what are we possibly missing here?
>>
>> Going back to first principles: The purpose of a buffer on a network 
>> device is to act as a shock absorber against sudden traffic bursts. 
>> If I want to size that buffer correctly, I need to know at the very 
>> least (paraphrasing queueing theory here) something about my packet 
>> arrival process.
>
> The question is over what timeframe. If you have a huge buffer, you 
> can buffer 10s of seconds of traffic and eventually send it. That will 
> make benchmarks look good, but not the user experience. The rapid drop 
> in RAM prices (beyond merely a free fall) and the benchmark scores 
> that heavily penalized any dropped packets encouraged buffers to get 
> larger than is sane.
>
> It's still a good question to define what is sane: the longer the 
> buffer, the more of a chance of finding time to catch up, but having 
> packets in the buffer that have timed out (e.g. DNS queries tend to 
> time out after 3 seconds, and TCP will give up and send replacement 
> packets, making the initial packets meaningless) is counterproductive. 
> What is the acceptable delay to your users?
>
> Here at the bufferbloat project, we tend to say that buffers past a 
> few 10s of ms worth of traffic are probably bad, and we are aiming 
> for single-digit ms in many cases.
Taken as read.
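
For concreteness, here's the kind of back-of-the-envelope I have in 
mind - the 100 Mbit/s rate and 1500-byte packets below are purely 
illustrative assumptions, not measured Starlink figures:

    # Rough buffer size for a given latency target (illustrative numbers)
    link_rate_bps = 100e6        # assumed bottleneck rate: 100 Mbit/s
    target_delay_s = 0.010       # 10 ms worth of standing queue
    packet_size_bytes = 1500

    buffer_bytes = link_rate_bps / 8 * target_delay_s
    buffer_packets = buffer_bytes / packet_size_bytes
    print(f"{buffer_bytes / 1e3:.0f} kB, ~{buffer_packets:.0f} full-size packets")
    # -> 125 kB, ~83 packets. A buffer that holds hundreds of ms of
    #    traffic is an order of magnitude or two beyond that.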
>
>> If I look at conventional routers, then that arrival process involves 
>> traffic generated by a user population that changes relatively 
>> slowly: WiFi users come and go. One at a time. Computers in a company 
>> get turned on and off and rebooted, but there are no instantaneous 
>> jumps in load - you don't suddenly have a hundred users in the middle 
>> of watching Netflix turning up that weren't there a second ago. Most 
>> of what we know about Internet traffic behaviour is based on this 
>> sort of network, and this is what we've designed our queuing systems 
>> around, right?
>
> Not true - for businesses, every hour as meetings start and let out, 
> and as people arrive in the morning or get back from lunch, you have 
> very sharp changes in the traffic.
And herein lies the crunch: All of these things that you list happen 
over much longer timeframes than a switch to a different satellite. 
Also, folk coming back from lunch would start with something like 
cwnd=10. Users whose TCP connections get switched over to a different 
satellite by some underlying tunneling protocol could have much larger 
cwnd.
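
To put rough numbers on that cwnd difference (assuming, purely for 
illustration, a 100 Mbit/s share and a 40 ms RTT - neither is a 
measured Starlink figure):

    # Initial window of a fresh flow vs. cwnd of a flow in steady state
    mss = 1448                      # typical TCP MSS in bytes
    initial_window = 10 * mss       # IW10, ~14.5 kB

    link_rate_bps = 100e6           # assumed per-user share
    rtt_s = 0.040                   # assumed round-trip time
    bdp_bytes = link_rate_bps / 8 * rtt_s   # ~500 kB

    # A long-running flow will have grown its cwnd to roughly the BDP
    # (or beyond, if the buffers let it) - some 35x the initial window.
    print(f"IW {initial_window / 1e3:.1f} kB vs. BDP {bdp_bytes / 1e3:.0f} kB, "
          f"ratio ~{bdp_bytes / initial_window:.0f}x")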
>
> At home you have fewer changes in users, but you may also have less 
> bandwidth (although many tech enthusiasts have more bandwidth than 
> many companies, two of my last 3 jobs have had <400Mb at their main 
> office with hundreds of employees while many people would consider 
> that 'slow' for home use). As such a parent arriving home with a 
> couple of kids will make a drastic change to the network usage in a 
> very short time.
I think you've missed my point - I'm talking about changes in the 
network mid-flight, not people coming home and getting started over a 
period of a few minutes. The change you see in a handover is sudden, 
probably with a sub-second ramp-up. And it's something that doesn't 
just happen when people come home or return from lunch - it happens 
every few minutes.
>
>
> But the active queueing systems that we are designing (cake, fq_codel) 
> handle these conditions very well because they don't try to guess what 
> the usage is going to be, they just look at the packets that they have 
> to process and figure out how to dispatch them out in the best way.
Understood - I've followed your work.
>
> Because we have observed that latency tends to be more noticeable for 
> short connections (DNS, checking if cached web pages are up to date, 
> etc.), our algorithms give a slight priority to new, low-traffic 
> connections over long-running, high-traffic connections rather than 
> just splitting the bandwidth evenly across all connections, and can 
> even go further to split bandwidth between endpoints, not just 
> connections (with endpoints being a configurable definition).
>
> Without active queue management, the default is FIFO, which allows the 
> high-user-impact, short-connection packets to sit in a queue behind 
> the low-user-impact bulk data transfers. For benchmarks, 
> a-packet-is-a-packet and they all count, so until you have enough 
> buffering that you start having expired packets in flight, it doesn't 
> matter, but for the user experience, there can be a huge difference.

All understood - you're preaching to the converted. It's just that I 
think Starlink may be a different ballpark.
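
(Aside, for anyone on the list who hasn't looked inside fq_codel or 
cake: the flow-queueing behaviour David describes boils down to 
something like the toy sketch below. This is only the intuition - the 
real qdiscs work with byte quantums and also drop or mark packets 
based on how long they have sat in the queue.)

    from collections import deque

    # Toy flow-queueing scheduler, just to illustrate the "slight priority
    # to new, low-traffic connections" idea. Not fq_codel/cake themselves.
    queues = {}               # flow id -> deque of packets
    new_flows = deque()       # flows that just became active
    old_flows = deque()       # flows with an ongoing backlog

    def enqueue(flow, packet):
        q = queues.setdefault(flow, deque())
        if not q:                    # flow was idle: treat it as "new"
            new_flows.append(flow)
        q.append(packet)

    def dequeue():
        for flows in (new_flows, old_flows):   # new flows get served first
            while flows:
                flow = flows.popleft()
                q = queues[flow]
                if q:
                    packet = q.popleft()
                    if q:                      # still backlogged: back of the
                        old_flows.append(flow) # old-flows round robin
                    return packet
        return None                            # nothing queued anywhere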

Put another way: if a protocol (TCP) that is designed to reasonably 
expect that its current cwnd is OK to use for now is put into a 
situation where there are relatively frequent, huge and lasting step 
changes in available BDP within sub-second periods, are your underlying 
assumptions still valid?
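
A quick worked example of what such a step change does to a single 
flow (all numbers are assumptions chosen for illustration, not 
measurements):

    # One flow with a cwnd grown to fill the old path gets handed over
    # to a path with half the capacity. Illustrative numbers only.
    old_rate_bps, new_rate_bps = 200e6, 100e6   # assumed before/after rates
    rtt_s = 0.040                               # assumed base RTT

    cwnd_bytes = old_rate_bps / 8 * rtt_s       # ~1 MB, sized to the old BDP
    new_bdp_bytes = new_rate_bps / 8 * rtt_s    # ~500 kB

    excess = cwnd_bytes - new_bdp_bytes         # lands in a buffer somewhere
    added_delay_s = excess / (new_rate_bps / 8) # time to drain that backlog
    print(f"{excess / 1e3:.0f} kB standing queue, +{added_delay_s * 1000:.0f} ms")
    # -> 500 kB and +40 ms for one flow; a whole cell's worth of flows
    #    arriving at once multiplies this.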

I suspect they're handing over whole cells at a time, not individual 
users.

>
> David Lang
>
-- 
****************************************************************
Dr. Ulrich Speidel
School of Computer Science
Room 303S.594 (City Campus)
The University of Auckland
u.speidel@auckland.ac.nz
http://www.cs.auckland.ac.nz/~ulrich/
****************************************************************




