[Starlink] Starlink hidden buffers

Ulrich Speidel u.speidel at auckland.ac.nz
Thu Jul 27 16:37:40 EDT 2023


So we got a Yaosheng adapter here but I didn't get to play with it until 
last week. We hooked up a SuperMicro with a DHCP-ing Ethernet interface 
to it.

First impressions:

  * The DHCP server and IPv4 gateway is 100.64.0.1, which sits on the
    infrastructure side of the Starlink network.
  * The IPv4 address is assigned from 100.64.0.0/10, i.e. the RFC 6598
    shared (CGNAT) address space.
  * The DNS servers handed out by 100.64.0.1 are 1.1.1.1 and 8.8.8.8 -
    but their reachability wasn't all that great when we tried, so a
    lot of name lookups failed (quick sanity-check sketch below).
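
A quick sanity-check sketch of the sort of thing we poked at (my own
Python, not Starlink tooling; the example address is made up, and the
lookup simply uses whatever resolvers the DHCP lease handed out):

    import ipaddress, socket

    # 100.64.0.0/10 is the RFC 6598 shared address space (CGNAT).
    lease_addr = ipaddress.ip_address("100.127.13.37")  # substitute your lease
    print(lease_addr in ipaddress.ip_network("100.64.0.0/10"))  # True -> CGNAT

    # Does a name lookup via the lease-assigned resolvers work at all?
    try:
        print(socket.gethostbyname("www.example.com"))
    except socket.gaierror as exc:
        print("lookup failed:", exc)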

More to come when I have a moment.

On 25/05/2023 10:39 am, Ulrich Speidel wrote:
>
>
> On 25/05/2023 1:59 am, David Lang wrote:
>>
>> >> >> >> https://www.amazon.com/YAOSHENG-Rectangular-Adapter-Connect-Injector/dp/B0BYJTHX4P
>> >> >>
>> >> >> > I'll see whether I can get hold of one of these. Cutting a 
>> cable on a
>> >> >> > university IT asset as an academic is not allowed here, 
>> except if it
>> >> >> > doesn't meet electrical safety standards.
>> > OK, we have one on order, along with PoE injector and power supply. 
>> Don't
>> > hold your breath, though, I'll be out of the country when it 
>> arrives and
>> > it'll be late July before I get to play with it.
>>
>> I've got a couple on order, but they won't arrive for 1-3 more weeks :-(
> I envy you!
>> I'll also note that in the last launch of the v2 mini satellites, 
>> they mentioned
>> that those now supported E band backhaul to handle 4x the bandwidth 
>> of the
>> earlier satellites
> Still not enough to connect the missing 2.5 or so billion, but a step 
> in the right direction for sure.
>>
>> > It's certainly noticeable here that they seem to have sets of three 
>> grouped
>> > together in a relatively compact geographical area (you could visit 
>> all NZ
>> > North Island ground stations in a day by car from Auckland, 
>> Auckland traffic
>> > notwithstanding, and at a stretch could do the same down south from 
>> Hinds to
>> > Awarua if you manage to ignore the scenery, but getting from the 
>> southernmost
>> > North Island ground station to the northernmost South Island one is 
>> basically
>> > a two day drive plus ferry trip).
>>
>> I lived in Wanganui for a few years, including one RV trip down the 
>> South
>> Island. I know what you mean about needing to ignore the scenery :-)
> Interesting - that must have been before the local iwi pointed out 
> once again that the town had misspelled its name since 1854, and for 
> once were heard - so it's now officially "Whanganui", for Crown 
> agencies, anyway.
>> Ok, I thought I had heard they switched every 15 min, so it's every 5 
>> min
>> instead?
> Dishy collects this information as a cumulative dataset, which the 
> tools query via gRPC. The frames in the movie correspond to snapshots 
> of the dataset taken at 5-second intervals. This indicates switches 
> roughly every ten to seventy seconds, with most dwell times being 
> around 15-30 seconds.
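
For what it's worth, a minimal sketch of how one could turn snapshots
like those into dwell-time estimates - the RTT values below are
invented, and the "handover" test is just a crude step detector, not
anything Dishy reports directly:

    SNAP_INTERVAL_S = 5
    rtt_ms = [28, 29, 27, 45, 44, 46, 45, 30, 31, 29, 28, 52, 51]

    dwells, run = [], 1
    for prev, cur in zip(rtt_ms, rtt_ms[1:]):
        if abs(cur - prev) > 10:     # crude step threshold, in ms
            dwells.append(run * SNAP_INTERVAL_S)
            run = 1
        else:
            run += 1
    dwells.append(run * SNAP_INTERVAL_S)
    print(dwells)                    # [15, 20, 20, 10] seconds between jumps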
>>
>> > Conclusion: latency change from tracking one satellite is smaller 
>> than the
>> > latency difference as you jump between satellites. You could be 
>> looking at
>> > several 100 km of path difference here. In an instant. Even that, 
>> at 300,000
>> > km/s of propagation speed, is only in the order of maybe 1 ms or so 
>> - peanuts
>> > compared to the RTTs in the dozens of ms that we're seeing. But if 
>> you get
>> > thrown from one queue onto another as you get handed over - what 
>> does that do
>> > to the remote TCP stack that's serving you?
>>
>> yes, the point I was trying to make was that the 
>> latency change
>> from satellite movement was not very significant
> So it's got to come from somewhere else.
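
Putting rough numbers on the "peanuts" above - the path lengths here
are made up, this is just the back-of-the-envelope arithmetic:

    C_KM_PER_MS = 300_000 / 1000            # ~300 km per millisecond
    old_path_km, new_path_km = 1150, 1450   # hypothetical bent-pipe paths
    delta_ms = (new_path_km - old_path_km) / C_KM_PER_MS
    print(f"{delta_ms:.1f} ms")             # 1.0 ms

So even a 300 km jump in path length only adds about a millisecond to
the one-way delay.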
>>
>> >> >> If it stays the same, I would suspect that you are actually 
>> hitting a
>> >> >> different ground station and there is a VPN backhaul to your 
>> egress point
>> >> >> to the regular Internet (which doesn't support mobile IP 
>> addresses) for
>> >> >> that cycle. If it tapers off, then I could buy bufferbloat that 
>> gets
>> >> >> resolved as TCP backs off.
>> >> >
>> >> > Yes, quite - sorting out which part of your latency is what is the 
>> million
>> >> > dollar question here...
>> >> >
>> >> > We saw significant RTT changes here during the recent cyclone 
>> over periods
>> >> > of several hours, and these came in steps (see below), with the 
>> initial
>> >> > change being a downward one. Averages are over 60 pings (the 
>> time scale
>> >> > isn't 100% true as we used "one ping, one second" timing) here.
>> >> >
>> >> >
>> >> > We're still not sure whether to attribute this to load change or 
>> ground
>> >> > station changes. There were a lot of power outages, especially in
>> >> > Auckland's lifestyle block belt, which teems with Starlink 
>> users, but all
>> >> > three North Island ground stations were also in areas affected 
>> by power
>> >> > outages (although the power companies concerned don't provide 
>> the level of
>> >> > detail to establish whether they were affected; it's also not 
>> clear what,
>> >> > if any, backup power arrangements they have). At ~25 ms, the 
>> step changes
>> >> > in RTT are too large to be the result of a switch in ground 
>> stations, though,
>> >> > the path differences just aren't that large. You'd also expect a 
>> ground
>> >> > station outage to result in longer RTTs, not shorter ones, if 
>> you need to
>> >> > re-route via another ground station. One explanation might be 
>> users getting
>> >> > cut off if they relied on one particular ground station for bent 
>> pipe ops -
>> >> > but that would not explain this order of magnitude effect as I'd 
>> expect
>> >> > that number to be small. So maybe power outages at the user end 
>> after all.
>> >> > But that would then tell us that these are load-dependent 
>> queuing delays.
>> >> > Moreover, since those load changes wouldn't have involved the 
>> router at our
>> >> > site, we can conclude that these are queue sojourn times in the 
>> Starlink
>> >> > network.
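
The averaging mentioned above is nothing fancier than 60-sample block
means over the ping RTTs, roughly like this - the log file name and
parsing are assumptions about our particular setup:

    import re

    # Log produced with something like: ping -i 1 <host> | tee rtt.log
    rtts = []
    with open("rtt.log") as f:
        for line in f:
            m = re.search(r"time=([\d.]+) ms", line)
            if m:
                rtts.append(float(m.group(1)))

    block = 60
    averages = [sum(rtts[i:i + block]) / block
                for i in range(0, len(rtts) - block + 1, block)]
    print(averages)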
>>
>> remember that SpaceX controls the ground stations as well, so if 
>> they are doing
>> any mobile IP trickery to redirect traffic from one ground station to 
>> another,
>> they can anticipate the shift or move the queue for the user or other 
>> trickery
>> like this (probably aren't yet, they seem to be in the early days 
>> here, focusing
>> on keeping things working and improving on the space side more than 
>> anything
>> else)
> I strongly suspect that they are experimenting with this here and with 
> that there.
>>
>>
>> >> AQM allocates the available bandwidth between different 
>> connections (usually
>> >> different users)
>> > But it does this under the assumption that the vector for changes 
>> in bandwidth
>> > availability is the incoming traffic, which AQM gives (indirect) 
>> feedback to,
>> > right?
>>
>> no, this is what I'm getting at below
>>
>> >> When it does this indirectly for inbound traffic by delaying acks, 
>> the
>> >> results depend on the senders handling of these indirect signals 
>> that were
>> >> never intended for this purpose.
>>
>> This is what you are thinking of, where it's providing indirect 
>> feedback to an
>> unknowable inbound queue on a remote system
>>
>> >> But when it does this directly on the sending side, it doesn't 
>> matter what
>> >> the senders want, their data WILL be managed to the 
>> priority/bandwidth that
>> >> the AQM sets, and eventually their feedback is dropped packets, which
>> >> everyone who is legitimate responds to.
>>
>> when the AQM is on the sending side of the bottleneck, it now has 
>> direct control
>> over the queue, and potentially has information over the available 
>> bandwidth as
>> it changes. But even if it doesn't know what the available bandwidth 
>> is, it
>> still can dispatch the data in its queues 'fairly' (whatever that 
>> means to the
>> particular AQM algorithm), changes in the data rate just change how 
>> fast the
>> queue drains.
>
> Yes - but if you delay ACKs, the only entity this has any effect on is 
> the original (remote) TCP sender, which is who you are trying to 
> persuade to take it easy so you're not going to be forced to (tail or 
> otherwise) drop packets.
>
> Dropping helps clear your queue (the one in front of the bottleneck).
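
A toy illustration of David's "dispatch the data in its queues fairly,
whatever the rate does" point - flow names and packets are invented,
and this is a bare-bones round robin, not fq_codel/CAKE or anything
Starlink actually runs:

    from collections import deque

    flows = {"bulk": deque(f"b{i}" for i in range(5)),
             "voip": deque(f"v{i}" for i in range(5))}

    def next_packet():
        """Serve flows round-robin; a link-rate change only alters how
        often this gets called, not how the capacity is split."""
        for name in list(flows):
            q = flows[name]
            if q:
                pkt = q.popleft()
                flows.pop(name)     # rotate this flow to the back
                flows[name] = q
                return pkt
        return None                 # all queues empty

    while (pkt := next_packet()) is not None:
        print(pkt)                  # alternates b0, v0, b1, v1, ...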
>
>>
>> > Understood. You build a control loop, where the latency is the 
>> delay in the
>> > control signal.
>> >
>> > Classically, you have a physical bottleneck that the AQM manages, 
>> where the
>> > physical bandwidth doesn't change.
>> >
>> > The available bandwidth changes, (mostly) as a result of TCP 
>> connections (or
>> > similarly behaved UDP applications) joining in slow start, or 
>> disappearing.
>> >
>> > Basically, your queues grow and shrink one packet at a time.
>> >
>> > Your control signal allows you (if they're well behaved) to throttle /
>> > accelerate senders.
>> >
>> > What you don't get are quantum jumps in queue occupancy, jump 
>> changes in
>> > underlying physical bandwidth, or a whole set of new senders that are
>> > completely oblivious to any of your previous control signals. But 
>> you get all
>> > that with satellite handovers like these.
>>
>> for a single TCP session, it has slow-start, but if you suddenly 
>> start dozens or
>> hundreds of TCP sessions (bittorrent, other file transfer protocols, 
>> or just a
>> website with hundreds of sub-elements), I think it's a bigger step 
>> than you are
>> thinking.
> Doesn't each TCP session maintain and manage its own cwnd?
>>
>> And again, I think the same issue exists on cell sites as users move 
>> from one
>> cell to another.
> Yes. But that happens gradually in comparison to Starlink, and the 
> only TCP stack that potentially gets affected badly as a user moves 
> from one cell site to the next is that of the user. But what you have 
> here is the equivalent of the cell tower moving out of range of a 
> whole group of users in one go. Different ballpark?
>>
>> > So what if the response you elicit in this way is to a queue 
>> scenario that no
>> > longer applies?
>>
>> you run the risk of under-utilizing the link for a short time (which 
>> may mean
>> that you decide to run the queues a little bigger than with fixed 
>> links, so that
>> when a chunk of data disappears from your queue, you still will keep 
>> utilization
>> up, sacrificing some latency to improve overall throughput)
> So we're back to the "more buffer" scenario here, too.
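
And rough numbers for what that extra buffer costs - backlog and rates
are invented, just to show the order of magnitude of the sojourn times
involved:

    backlog_bytes = 1_000_000                # ~1 MB standing queue
    for rate_mbps in (200, 100, 20):
        sojourn_ms = backlog_bytes * 8 / (rate_mbps * 1e6) * 1e3
        print(f"{rate_mbps:>3} Mbit/s -> {sojourn_ms:5.0f} ms")
    # 200 Mbit/s ->    40 ms
    # 100 Mbit/s ->    80 ms
    #  20 Mbit/s ->   400 ms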
>>
>> David Lang 
-- 
****************************************************************
Dr. Ulrich Speidel

School of Computer Science

Room 303S.594 (City Campus)

The University of Auckland
u.speidel at auckland.ac.nz  
http://www.cs.auckland.ac.nz/~ulrich/
****************************************************************

