[Starlink] SatNetLab: A call to arms for the next global Internet testbed

George Burdell gb at teklibre.net
Mon Jul 19 11:50:17 EDT 2021


On Sat, Jul 10, 2021 at 01:27:28PM -0700, David Lang wrote:
> any buffer sizing based on the number of packets is wrong. Base your buffer
> size on transmit time and you have a chance of being reasonable.

This is very true. Packets have a dynamic range of 64 bytes to 64 kB
(with GRO), and sizing queues in terms of packets leads to bad behavior,
particularly on mixed up- and downstream traffic.

Also... people doing AQM and TCP designs almost always test one-way
traffic only, and this leads to less than desirable behavior on
real-world traffic. Strike that. Terrible behavior! A pure
single-queue AQM struggles mightily to find a good hit rate when there
is a ton of acks, DNS, gaming, VoIP, etc., mixed in with the
capacity-seeking flows.

Nearly every AQM paper you read fails to test real, bidirectional
traffic. It's a huge blind spot, which is why the bufferbloat effort
*starts* testing any new idea we have with the rrul test and its
relatives.

Byte-limited FIFOs (bfifos) are better, but harder to implement in hardware.
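For instance, on Linux a byte-limited FIFO can be installed with `tc`, with
the limit derived from transmit time rather than a packet count. The
interface name and the 20 Mbit/s link rate below are assumptions for
illustration:

```shell
# Size the queue for ~250 ms of transmit time at 20 Mbit/s:
# 20e6 bits/s * 0.25 s / 8 = 625000 bytes
tc qdisc replace dev eth0 root bfifo limit 625000
```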

A fun trick: if you are trying to optimize your traffic for real-time
communications rather than speedtests, you can clamp the TCP MSS to
smaller than 600 bytes *at the router*, and your network gets better.

(We really should get around to publishing something on that. When you are
plagued by a giant upstream FIFO, filling it with smaller packets really
helps, and it's something a smart user can easily do regardless of the
ISP's preferences.)

> 
> In cases like wifi where packets aren't sent individually, but are sent in
> blobs of packets going to the same destination, 

yes...

> you want to buffer at least
> a blob's worth of packets to each destination so that when your transmit slot
> comes up, you can maximize it.

Nooooooo! This is one of those harder tradeoffs that is pretty
counterintuitive. You want per-station queuing, yes. However, the right
amount of service time to grant each station comes absolutely not from
maximizing the transmit slot, but from maximizing the number of stations
you can serve in a reasonable time. A simple (and inaccurate) example:

100 stations at a 4 ms txop each, stuffed full of *udp* data, is 400 ms/round
(plus, usually, insane numbers of retries).

This breaks a lot of things, and doesn't respect the closely coupled
nature of TCP (please re-read the CoDel paper!). Cutting the txop to
1 ms in this case cuts inter-station service time to 100 ms/round... at
the cost of "bandwidth" that can't be stuffed into the slow header +
wifi data rate equation.
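The arithmetic above, as a trivially checkable sketch (deliberately
simplistic, as noted: it ignores retries, headers, and per-rate airtime):

```python
# Simple (and deliberately inaccurate) model of inter-station service
# time on shared media: each station gets one txop per scheduling round.
def round_time_ms(stations, txop_ms):
    """Worst-case wait before a station gets its next transmit slot."""
    return stations * txop_ms

print(round_time_ms(100, 4.0))  # 400.0 ms between visits to a station
print(round_time_ms(100, 1.0))  # 100.0 ms after cutting the txop to 1 ms
```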

But what you really want to do is give the sparsest stations quicker
access to the media so they can ramp up to parity (and usually
complete their short flows much faster, and then get off).

I run with BE 2.4 ms txops and announce the same in the beacon. I'd be
willing to bet your SCALE conference network would work much better if
you did that too. (It would be better still if we could scale txop size
to the load, but fq_codel on wifi already does the sparse-station
optimization, which translates into many shorter txops than you would
see from other wifi schedulers, and the bulk of the problem I see is
the *stations*.)
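For the curious, the sparse-station optimization boils down to serving
newly active (sparse) flows ahead of long-running bulk ones. A toy
deficit-round-robin sketch of that scheduling idea follows; it is not the
actual fq_codel code, and the class name and quantum are illustrative:

```python
from collections import deque

# Toy DRR scheduler illustrating fq_codel's sparse-flow boost: a station
# that just became active is served from new_flows before the
# long-running bulk stations waiting in old_flows.
class SparseFQ:
    def __init__(self, quantum=300):
        self.quantum = quantum          # bytes of credit per round
        self.new_flows = deque()        # recently activated (sparse) flows
        self.old_flows = deque()        # long-running bulk flows
        self.queues = {}                # flow id -> deque of (pkt, size)
        self.deficit = {}               # flow id -> byte credit

    def enqueue(self, flow, pkt, size):
        q = self.queues.setdefault(flow, deque())
        if not q and flow not in self.new_flows and flow not in self.old_flows:
            self.new_flows.append(flow)  # sparse flows jump the line
            self.deficit[flow] = self.quantum
        q.append((pkt, size))

    def dequeue(self):
        while self.new_flows or self.old_flows:
            src = self.new_flows if self.new_flows else self.old_flows
            flow = src[0]
            if self.deficit[flow] <= 0:
                # out of credit: refill and rotate to the back of old_flows
                src.popleft()
                self.deficit[flow] += self.quantum
                self.old_flows.append(flow)
                continue
            q = self.queues[flow]
            if not q:
                src.popleft()            # drained; re-enters as sparse later
                continue
            pkt, size = q.popleft()
            self.deficit[flow] -= size
            return flow, pkt
        return None
```

With a bulk station A holding three 250-byte packets and a sparse station
B arriving with one 100-byte packet, B's packet goes out as soon as A
exhausts its quantum, ahead of A's remaining backlog.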

Lastly, you need to defer constructing the blob as long as possible, so
you can shoot at, mark, or reschedule (FQ) the packets in there at the
last moment before they are committed to the hardware.

Ideally you would not construct any blob at all until a few microseconds
before the transmit opportunity. 
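As a sketch of that deferral (all names are illustrative): packets stay in
an ordinary queue where they can still be dropped, marked, or rescheduled,
and the aggregate is only assembled when the transmit opportunity fires:

```python
from collections import deque

# Late aggregation sketch: the blob is only assembled at the transmit
# opportunity, so packets remain reschedulable until the last moment.
def build_blob_at_txop(queue, max_blob_bytes):
    """Drain up to max_blob_bytes of (pkt, size) entries at the last moment."""
    blob, used = [], 0
    while queue and used + queue[0][1] <= max_blob_bytes:
        pkt, size = queue.popleft()
        blob.append(pkt)
        used += size
    return blob
```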

Shifting this back to Starlink - they have a marvelous opportunity to do
just this in the dishy: as the link is half duplex, they could defer
grabbing the packets from a sch_cake buffer until precisely before that
txop to the sat arrives. (My guess would be no more than 400 us, based
on what I understand of the ARM chip they are using.)

This would be much better than what we could do in the ath9k
where we were forced to always have "one in the hardware, one
ready to go" due to limitations in that chip. We're making
some progress on the openwifi fpga here, btw...


> Wifi has the added issue that the blob headers are at a much lower data rate
> than the data itself, so you can cram a LOT of data into a blob without
> making a significant difference in the airtime used, so you really do want
> to be able to send full blobs (not at the cost of delaying transmission if
> you don't have a full blob, a mistake some people make, but you do want to
> buffer enough to fill the blobs)
> 
> and given that dropped packets result in timeouts and retransmissions that
> affect the rest of the network, it's not obviously wrong for a lossy hop
> like wifi to retry a failed transmission, it just needs to not retry too
> many times.
> 
> David Lang
> 
> 
>  On Sat, 10 Jul 2021, Rodney W. Grimes wrote:
> 
> >Date: Sat, 10 Jul 2021 04:49:50 -0700 (PDT)
> >From: Rodney W. Grimes <starlink at gndrsh.dnsmgr.net>
> >To: Dave Taht <dave.taht at gmail.com>
> >Cc: starlink at lists.bufferbloat.net, Ankit Singla <asingla at ethz.ch>,
> >    Sam Kumar <samkumar at cs.berkeley.edu>
> >Subject: Re: [Starlink] SatNetLab: A call to arms for the next global Internet
> >     testbed
> >
> >>While it is good to have a call to arms, like this:
> >...  much information removed, as I only want to reply to one very
> >    narrow but, IMHO, very real problem in our networks today ...
> >
> >>Here's another piece of pre-history - alohanet - the TTL field was the
> >>"time to live" field. The intent was that the packet would indicate
> >>how much time it would be valid before it was discarded. It didn't
> >>work out, and was replaced by hopcount, which of course switched
> >>networks ignore and is only semi-useful for detecting loops and the
> >>like.
> >
> >TTL works perfectly fine where the original assumptions hold: that a
> >device along a network path only hangs on to a packet for a
> >reasonably short duration, and that there is not some "retry"
> >mechanism in place that causes this time to explode.  BSD,
> >and as far as I can recall almost ALL original IP stacks, had
> >a Q depth limit of 50 packets on egress interfaces.  Everything
> >pretty much worked well and the net was happy.  Then these base
> >assumptions got blasted in the name of "measurable bandwidth" and
> >the notion that packets are so precious we must not lose them,
> >at almost any cost.  Linux crammed the per-interface Q up to 1000,
> >and wifi decided that it was reasonable to retry at the link layer so
> >many times that I have seen packets that are >60 seconds old.
> >
> >Proposed FIX:  Any device that transmits packets that does not
> >already have an inherent FIXED transmission time MUST consider
> >the current TTL of that packet and give up if > 10 ms * TTL elapses
> >while it is trying to transmit.  AND change the default Q
> >size in LINUX to 50 for fifo; the codel, etc. AQM stuff is fine
> >at 1000, as it has delay targets that prevent the issue that
> >initially bumping this to 1000 caused.
> >
> >... end of Rods Rant ...
> >
> >--
> >Rod Grimes                                                 rgrimes at freebsd.org
> >_______________________________________________
> >Starlink mailing list
> >Starlink at lists.bufferbloat.net
> >https://lists.bufferbloat.net/listinfo/starlink
