[Starlink] SatNetLab: A call to arms for the next global Internet testbed

Mon Jul 12 21:23:25 EDT 2021

> From: David Lang <david at lang.hm>
> 
> Wifi has the added issue that the blob headers are at a much lower data rate
> than the data itself, so you can cram a LOT of data into a blob without making a
> significant difference in the airtime used, so you really do want to be able to
> send full blobs (not at the cost of delaying transmission if you don't have a
> full blob, a mistake some people make, but you do want to buffer enough to fill
> the blobs)
This happens naturally if the senders in the LAN take turns and transmit whatever they have accumulated while waiting their turn. Capping the total airtime in a cycle limits short-message latency, which is why small packets are helpful.
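
To put rough numbers on the airtime argument, here is a small Python sketch. It assumes a simplified model that is not from this thread: each transmission pays a fixed header time (sent at the low rate) plus payload bits at the data rate, and each station's turn is capped at a fixed airtime. The function names and all the numbers are hypothetical and chosen only for illustration.

```python
def payload_time_s(payload_bytes: int, data_rate_bps: float) -> float:
    """Time on air for the payload alone, at the (fast) data rate."""
    return 8 * payload_bytes / data_rate_bps


def airtime_efficiency(payload_bytes: int, data_rate_bps: float,
                       header_time_s: float) -> float:
    """Fraction of a transmission's airtime that carries payload.

    header_time_s is the fixed cost of the slow header (an assumed constant).
    """
    t = payload_time_s(payload_bytes, data_rate_bps)
    return t / (header_time_s + t)


def worst_case_wait_s(n_stations: int, per_turn_cap_s: float) -> float:
    """Upper bound on how long a newly queued short message waits for its
    station's turn, if every other station uses its full airtime cap."""
    return (n_stations - 1) * per_turn_cap_s


if __name__ == "__main__":
    # Hypothetical link: 100 Mbit/s data rate, 40 microsecond header.
    for size in (150, 1500, 15000):
        eff = airtime_efficiency(size, 100e6, 40e-6)
        print(f"{size:6d} bytes: {eff:.1%} of airtime is payload")
    # Hypothetical LAN: 5 stations, 4 ms cap per turn.
    print(f"worst-case wait: {worst_case_wait_s(5, 0.004) * 1e3:.1f} ms")
```

In this model larger transmissions use airtime more efficiently, while the cap on each turn is what bounds how long a short message can be stuck behind everyone else.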

> 
> and given that dropped packets result in timeouts and retransmissions that
> affect the rest of the network, it's not obviously wrong for a lossy hop like
> wifi to retry a failed transmission, it just needs to not retry too many times.
> 
Absolutely right, though not perfect. Local retransmit on a link (or WLAN domain) helps if the link has a high bit-error rate. From an information-theoretic point of view, though, it is better, if you can, to use FEC or erasure coding, or simply to lower the attempted signalling rate. If you have an estimator of the bit error rate (BER) on the link (which gives you a packet error rate), there's a reasonable bound on the number of link-level retransmits of an individual packet that doesn't kill end-to-end latency. I forget how the formula is derived. It's also important, as BER increases, to use shorter packet frames.
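
As a rough illustration of how BER, frame length and a retry limit relate, here is a Python sketch. It is not the formula referred to above. It assumes independent bit errors and independent attempts, which real links often violate, and the function names and the target residual loss are hypothetical.

```python
def packet_error_rate(ber: float, frame_bytes: int) -> float:
    """Probability that a frame has at least one bit error,
    assuming bit errors are independent with probability `ber`."""
    return 1.0 - (1.0 - ber) ** (8 * frame_bytes)


def attempts_for_residual_loss(per: float, target_loss: float) -> int:
    """Smallest number of attempts n such that per**n <= target_loss,
    i.e. the chance that all n attempts fail is at most target_loss."""
    if not 0.0 <= per < 1.0:
        raise ValueError("per must be in [0, 1)")
    if not 0.0 < target_loss < 1.0:
        raise ValueError("target_loss must be in (0, 1)")
    n, residual = 1, per
    while residual > target_loss:
        residual *= per
        n += 1
    return n


def expected_attempts(per: float, max_attempts: int) -> float:
    """Mean number of transmissions per frame when the link gives up
    after max_attempts tries: sum of per**k for k in 0..max_attempts-1."""
    return sum(per ** k for k in range(max_attempts))


if __name__ == "__main__":
    ber = 1e-5  # hypothetical estimate from the link
    for frame_bytes in (200, 1500):
        per = packet_error_rate(ber, frame_bytes)
        n = attempts_for_residual_loss(per, 1e-4)
        print(f"{frame_bytes} bytes: PER={per:.4f}, attempts={n}, "
              f"mean tries={expected_attempts(per, n):.3f}")
```

The worst-case latency added at the link is then about the attempt limit times the airtime of one attempt, and shorter frames lower the packet error rate for the same BER, so they need fewer attempts.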

End-to-end retransmit is not the optimal way to correct link errors. The end-to-end checksum and retransmit in TCP have confused people over the years into thinking link reliability can be omitted! That was never the reason TCP does end-to-end error checking. As Dave Taht can recount, based on discussions with Steve Crocker and me (ARPANET and TCP/IP), the point of end-to-end checks is to make sure that *overall* the system doesn't introduce errors, including errors from buffer memory, software that doesn't quite work, etc. TCP retransmission is mostly about recovering from packet drops and from things like duplicated packets resulting from routing changes.

So fix link errors at link level (but remember that retransmit with checksum isn't really optimal there: there are better ways if BER is high, or if the errors might come from software or hardware bugs, which tend to be non-random).

> David Lang
> 
> 
>   On Sat, 10 Jul 2021, Rodney W. Grimes wrote:
> 
>> Date: Sat, 10 Jul 2021 04:49:50 -0700 (PDT)
>> From: Rodney W. Grimes <starlink at gndrsh.dnsmgr.net>
>> To: Dave Taht <dave.taht at gmail.com>
>> Cc: starlink at lists.bufferbloat.net, Ankit Singla <asingla at ethz.ch>,
>>     Sam Kumar <samkumar at cs.berkeley.edu>
>> Subject: Re: [Starlink] SatNetLab: A call to arms for the next global Internet
>>      testbed
>>
>>> While it is good to have a call to arms, like this:
>> ...  much information removed as I only want to reply to 1 very
>>     narrow, but IMHO, very real problem in our networks today ...
>>
>>> Here's another piece of pre-history - alohanet - the TTL field was the
>>> "time to live" field. The intent was that the packet would indicate
>>> how much time it would be valid before it was discarded. It didn't
>>> work out, and was replaced by hopcount, which of course switched
>>> networks ignore and is only semi-useful for detecting loops and the
>>> like.
>>
>> TTL works perfectly fine where the original assumptions that a
>> device along a network path only hangs on to a packet for a
>> reasonable short duration, and that there is not some "retry"
>> mechanism in place that is causing this time to explode.  BSD,
>> and as far as I can recall, almost ALL original IP stacks had
>> a Q depth limit of 50 packets on egress interfaces.  Everything
>> pretty much worked well and the net was happy.  Then these base
>> assumptions got blasted in the name of "measurable bandwidth" and
>> the concept of packets are so precious we must not lose them,
>> at almost any cost.  Linux crammed the per interface Q up to 1000,
>> wifi decided that it was reasonable to retry at the link layer so
>> many times that I have seen packets that are >60 seconds old.
>>
>> Proposed FIX:  Any device that transmits packets that does not
>> already have an inherent FIXED transmission time MUST consider
>> the current TTL of that packet and give up if > 10mS * TTL elapses
>> while it is trying to transmit.  AND change the default if Q
>> size in LINUX to 50 for fifo, the codel, etc AQM stuff is fine
>> at 1000 as it has delay targets that prevent the issue that
>> initially bumping this to 1000 caused.
>>
>> ... end of Rods Rant ...
>>
>> --
>> Rod Grimes                                                 rgrimes at freebsd.org
>> _______________________________________________
>> Starlink mailing list
>> Starlink at lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/starlink