[Starlink] [NNagain] When Flows Collide?

Sat Mar 9 14:08:48 EST 2024

Hi David,

Thanks for the explanation.  I had heard most of those technology 
buzzwords, but your message puts them into context together. 
Prioritization and queue management certainly help, but I still don't 
understand how the system behaves when it hits capacity somewhere deep 
inside the Internet -- as in the "thought experiments" I described, 
where two flows collide at a bottleneck.

The scheme you describe also seems vulnerable to users' innovative 
tactics to get better service.  E.g., an "ISO download" using some 
scheme like Torrents would spread the traffic across a bunch of 
connections, which may not all go through the same bottleneck and so 
may not be judged as low priority.  It also seems prone to things like 
DDoS attacks, e.g., flooding a path with DNS queries from many bot 
sources that are judged as high priority.

The behavior "packets get marked/dropped to signal the sender to slow 
down" seems essentially the same design as the "Source Quench" behavior 
defined in the early 1980s.  At the time, I was responsible for a TCP 
implementation, and had questions about what my TCP should do when it 
received such a "slow down" message.  It was especially unclear in 
certain situations - e.g., if my TCP sent a datagram to open a new 
connection and got a "slow down" response, what exactly should it do?

There were no good answers back then.  One TCP implementor decided that 
the best reaction on receiving a "slow down" message was to immediately 
retransmit the datagram that had just been confirmed to be discarded.  
"Slow down" actually meant "Speed up, I threw away your last datagram."

So, I'm still curious about how the Internet behaves with the current 
mechanisms when the system hits its maximum capacity - the two simple 
scenarios I mentioned, in which only two data flows converge at a 
bottleneck.   What's supposed to happen in theory?   Are 
implementations actually doing what they're supposed to do?  What does 
happen in a real-world test?
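
To make the first scenario concrete, here is a minimal sketch, assuming 
the simplest possible bottleneck: a single drop-tail FIFO queue with no 
marking, no per-flow tracking, and no feedback to the senders.  The 
function name, the buffer size, and the link rate are all illustrative 
assumptions, not measurements or anything specified in this thread.  It 
only shows where the excess datagrams go in such a model; it says 
nothing about what fq_codel, cake, or any real ISP equipment does.

```python
from collections import deque


def simulate_bottleneck(flow_intervals_ms, link_interval_ms,
                        buffer_packets, duration_ms):
    """Feed constant-rate flows into one drop-tail FIFO bottleneck.

    All parameters are hypothetical knobs for this sketch:
      flow_intervals_ms: send interval per flow (20 ms = 50 datagrams/s)
      link_interval_ms:  time the outgoing link needs per datagram
      buffer_packets:    queue capacity, counted in datagrams
      duration_ms:       simulated time, in 1 ms steps
    """
    flows = len(flow_intervals_ms)
    sent = [0] * flows
    delivered = [0] * flows
    dropped = [0] * flows
    queue = deque()          # entries are (flow index, arrival time in ms)
    max_delay_ms = 0

    for t in range(duration_ms):
        # The outgoing link forwards one queued datagram per link interval.
        if t % link_interval_ms == 0 and queue:
            flow, arrived = queue.popleft()
            delivered[flow] += 1
            max_delay_ms = max(max_delay_ms, t - arrived)

        # Each source emits a datagram on its own fixed schedule.
        for flow, interval in enumerate(flow_intervals_ms):
            if t % interval == 0:
                sent[flow] += 1
                if len(queue) < buffer_packets:
                    queue.append((flow, t))
                else:
                    dropped[flow] += 1   # drop-tail: no room, discard

    return {
        "sent": sent,
        "delivered": delivered,
        "dropped": dropped,
        "queued": len(queue),
        "max_delay_ms": max_delay_ms,
    }


if __name__ == "__main__":
    # Two 50 Hz flows (100 datagrams/s offered) into a link that can
    # carry one datagram every 15 ms (about 67 datagrams/s).
    print(simulate_bottleneck([20, 20], 15, 10, 10_000))
```

In this model the buffer only bounds the added delay (at most 
buffer_packets * link_interval_ms); the excess still has to be 
discarded.  Because the two sources tick in lockstep, the flow that 
happens to arrive second never loses fewer datagrams than the first, so 
which flow loses is an accident of arrival order rather than a decision.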

Jack Haverty

On 3/8/24 18:57, David Lang wrote:
> this is what bufferbloat has been fighting. The default was that 'data 
> is important, don't throw it away, hang on to it and send it later'
>
> In practice, this has proven to be suboptimal as the buffers grew 
> large enough that the data being buffered was retransmitted anyway 
> (among other problems)
>
> And because the data was buffered, new data arriving was delayed 
> behind the buffered data.
>
> This is measurable as 'latency under load' for light connections. So 
> while latency isn't everything, it turns out to be a good proxy to 
> detect when the standard queuing mechanisms are failing to give you 
> good performance.
>
> It turns out that not all data is equally important. Active Queue 
> Management is the art of deciding priorities, both in deciding what 
> data to throw away, but also in allowing some later arriving data to 
> be transmitted ahead of data in another connection that arrived before 
> it.
>
> With fq_codel and cake, this involves tracking the different 
> connections and their behavior. connections that send relatively 
> little data (DNS lookups, video chat) have priority over connections 
> that send a lot of data (ISO downloads), not based on classifying the 
> data, but by watching the behavior.
>
> connections with a lot of data can buffer a bit, but aren't allowed to 
> use all the available buffer space, after they have used 'their 
> share', packets get marked/dropped to signal the sender to slow down.
>
> While it is possible for an implementation to 'cheat' by detecting the 
> latency probes and prioritizing them, measuring the latency on real 
> data works as well and they can't cheat on that without actually 
> addressing the problem
>
> That's why you see such a significant focus on latency in this group, 
> it's not latency for the sake of latency, it's latency as a sign that 
> new and sparse flows can get a reasonable share of bandwidth even in 
> the face of heavy/hostile users on the same links.
>
> David Lang
>
>
> On Fri, 8 Mar 2024, Jack Haverty via Nnagain wrote:
>
>> Date: Fri, 8 Mar 2024 15:44:05 -0800
>> From: Jack Haverty via Nnagain <nnagain at lists.bufferbloat.net>
>> To: nnagain at lists.bufferbloat.net, Starlink at lists.bufferbloat.net
>> Cc: Jack Haverty <jack at 3kitty.org>
>> Subject: [NNagain] When Flows Collide?
>>
>> It's great to see that latency is getting attention as well as action 
>> to control it.  But it's only part of the bigger picture of Internet 
>> performance.
>>
>> While performance across a particular network is interesting, most 
>> uses of the Internet involve data flowing through several separate 
>> networks.  That's pretty much the definition of "Internet".  The 
>> endpoints might be some kind of LAN in a home or corporate IT 
>> facility or public venue.   In between there might be fiber, radio, 
>> satellite, or other (even whimsically avian!?) networks carrying a 
>> user's data.   This kind of system configuration has existed since the 
>> genesis of The Internet and seems likely to continue. Technology has 
>> advanced a lot, with bigger and bigger "pipes" invented to carry more 
>> data, but fundamental issues remain.
>>
>> System configurations we used in the early research days were real 
>> experiments to be measured and tested, or often just "thought 
>> experiments" to imagine how the system would behave, what algorithms 
>> would be appropriate, and what protocols had to exist to coordinate 
>> the activities of all the components.
>>
>> One such configuration was very simple.  Imagine there are three very 
>> fast computers, each attached to a very fast LAN.   The computers and 
>> LAN can send and receive data as fast as you can imagine, so that 
>> they are not a limiting factor.   The LANs are attached to some "ISP" 
>> which isn't as fast (in bandwidth or latency) as a LAN.  ISPs are 
>> interconnected at various points, forming a somewhat rich mesh of 
>> topology with several, or many, possible routes from any source to 
>> any destination.
>>
>> Now imagine a user configuration in which two of the computers send a 
>> constant stream of data to the third computer at a predefined rate.  
>> Perhaps it is a UDP datagram every N milliseconds, each datagram 
>> containing a frame of video.  If N=20 it corresponds to a 50Hz frame 
>> rate, which is common for video.
>>
>> Somewhere along the way to that common destination, those two data 
>> streams collide, and there is a bottleneck.   All the data coming in 
>> cannot fit in the pipe going out.  Something has to give.
>>
>> Thought experiment -- What should happen?  Does the bottleneck 
>> discard datagrams it can't handle?  How does it decide which ones to 
>> discard?   Does the bottleneck buffer the excess datagrams, hoping 
>> that the situation is just temporary?   Does the bottleneck somehow 
>> signal back to the sources to reduce their data rate?  Does the 
>> bottleneck discard datagrams that it knows won't reach the 
>> destination in time to be useful?  Does the bottleneck trigger some 
>> kind of network reconfiguration, perhaps to route "low priority" data 
>> along some alternate path to free up capacity for the video streams 
>> that require low latency?
>>
>> Real experiment -- set up such a configuration and observe what 
>> happens, especially from the end-users' perspectives.  What kind of 
>> video does the end-user see?
>>
>> Second thought experiment -- Using the same configuration, send data 
>> using TCP instead of UDP.  This adds more mechanisms, but now in the 
>> end-users' computers.  How should the ISPs and TCPs involved behave?  
>> How should they cooperate?  What should happen?  What mechanisms 
>> (algorithms, protocols, etc.) are needed to make the system behave 
>> that way?
>>
>> Second real Experiment -- How do the specific TCP implementations 
>> actually behave?  What kind of video quality do the end users 
>> experience?  What kind of data flows actually travel through the 
>> network components?
>>
>> Of course we all observe such real experiments every day, whenever we 
>> see or participate in various kinds of videoconferences.  Perhaps 
>> someone has instrumented and gathered performance data...?
>>
>> These questions were discussed and debated at great length more than 
>> 40 years ago as TCP V4 was designed.  We couldn't figure out the 
>> appropriate algorithms and protocols, and didn't have computer 
>> equipment or communications capabilities to implement anything more 
>> than the simplest mechanisms anyway.   So the topic became an item on 
>> the "future study" list.
>>
>> But we did put various "placeholder" mechanisms in place in TCP/IP 
>> V4, as a reminder that a "real" solution was needed for some future 
>> next generation release.  Time-to-live (TTL) would likely need to be 
>> based on actual time instead of hops - which were silly but the best 
>> we could do with available equipment at the time.  Source Quench (SQ) 
>> needed to be replaced by a more effective mechanism, and include 
>> details of how all the components should act when sending or 
>> receiving an SQ.   Routing needed to be expanded to add the ability 
>> to send different data flows over different routes, so that bulk and 
>> interactive data could more readily coexist.   Lots of such issues to 
>> be resolved.
>>
>> In the meanwhile, the general consensus was that everything would 
>> work OK as long as the traffic flows only rarely created "bottleneck" 
>> situations, and such events would be short and transitory.   There 
>> wasn't a lot of data flow yet; the Internet was still an Experiment.  
>> We figured we'd be OK for a while as the research continued and found 
>> solutions.
>>
>> Meanwhile, the Web happened.  Videoconferencing, vlogs, and other 
>> generators of high traffic exploded.  Clouds have formed, with users 
>> now interacting with very remote computers instead of the ones on 
>> their desks or down the hall.
>>
>> As Dorothy would say, "We're not in Kansas anymore".
>>
>> Jack Haverty
>>
>> On 3/8/24 12:31, Dave Taht via Nnagain wrote:
>>> I am deeply appreciative of everyone's efforts here over the past 3
>>> years, and within starlink burning the midnight oil on their 20ms
>>> goal, (especially nathan!!!!) to make all the progress made on their
>>> systems in these past few months. I was so happy to burn about 12
>>> minutes, publicly, taking apart Oleg's results here, last week:
>>>
>>> https://www.youtube.com/watch?v=N0Tmvv5jJKs&t=1760s
>>>
>>> But couldn't then and still can't talk better to the whys and the
>>> problems remaining. (It's not a kernel problem, actually)
>>>
>>> As for starlink/space support of us, bufferbloat.net, and/or lowering
>>> latency across the internet in general, I don't know. I keep hoping a
>>> used tesla motor for my boat will arrive in the mail one day, that's
>>> all. :)
>>>
>>> It is my larger hope that with this news, all the others doing FWA,
>>> and for that matter, cable, and fiber, will also get on the stick,
>>> finally. Maybe someone in the press will explain bufferbloat. Who
>>> knows what the coming days hold!?
>>>
>>> 13 herbs and spices....
>>>
>>> On Fri, Mar 8, 2024 at 3:10 PM the keyboard of geoff goodfellow via
>>> Starlink<starlink at lists.bufferbloat.net>  wrote:
>>>> it would be a super good and appreciative gesture if they would 
>>>> disclose what/if any of the stuff they are making use of and then 
>>>> also to make a donation :)
>>>>
>>>> On Fri, Mar 8, 2024 at 12:50 PM J Pan<Pan at uvic.ca> wrote:
>>>>> they benefited a lot from this mailing list and the research and even
>>>>> user community at large
>>>>> -- 
>>>>> J Pan, UVic CSc, ECS566, 250-472-5796 (NO VM),Pan at UVic.CA, 
>>>>> Web.UVic.CA/~pan
>>>>>
>>>>>
>>>>> On Fri, Mar 8, 2024 at 11:40 AM the keyboard of geoff goodfellow via
>>>>> Starlink<starlink at lists.bufferbloat.net>  wrote:
>>>>>> Super excited to be able to share some of what we have been 
>>>>>> working on over the last few months!
>>>>>> EXCERPT:
>>>>>>
>>>>>> Starlink engineering teams have been focused on improving the 
>>>>>> performance of our network with the goal of delivering a service 
>>>>>> with stable 20 millisecond (ms) median latency and minimal packet 
>>>>>> loss.
>>>>>>
>>>>>> Over the past month, we have meaningfully reduced median and 
>>>>>> worst-case latency for users around the world. In the United 
>>>>>> States alone, we reduced median latency by more than 30%, from 
>>>>>> 48.5ms to 33ms during hours of peak usage. Worst-case peak hour 
>>>>>> latency (p99) has dropped by over 60%, from over 150ms to less 
>>>>>> than 65ms. Outside of the United States, we have also reduced 
>>>>>> median latency by up to 25% and worst-case latencies by up to 35%...
>>>>>>
>>>>>> [...]
>>>>>> https://api.starlink.com/public-files/StarlinkLatency.pdf
>>>>>> via
>>>>>> https://twitter.com/Starlink/status/1766179308887028005
>>>>>> &
>>>>>> https://twitter.com/VirtuallyNathan/status/1766179789927522460
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> Geoff.Goodfellow at iconia.com
>>>>>> living as The Truth is True
>>>>>>
>>>>>> _______________________________________________
>>>>>> Starlink mailing list
>>>>>> Starlink at lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/starlink
>>>
>>>
>>
>>
>
> _______________________________________________
> Nnagain mailing list
> Nnagain at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/nnagain
