[Starlink] SatNetLab: A call to arms for the next global Internet testbed

Dave Taht dave.taht at gmail.com
Fri Jul 9 15:19:11 EDT 2021


While it is good to have a call to arms like this:

https://people.inf.ethz.ch/asingla/papers/satnetlab.pdf

One of my favorite "Kelly Johnson rules" is "The designers of the
airplane must be on the shop floor while the first ones are built". I
strongly feel that new transports, packet scheduling, AQM, and
congestion control mechanisms should be co-designed with the l1, l2,
and l3 people all locked in the same room, with representatives also
from open source, hardware and software vendors, and academia - with
enough skill in the room to actually implement and test the design as
it evolves.

Going to the call for big fat centralized testbeds in this paper...

                "the center cannot hold, it all falls apart" - Yeats (0)

Yes, I do think long term funding and some centralization are needed
for longitudinal studies. But by the time a large testbed like
PlanetLab is built, it is almost always obsolete, with difficult
hurdles for j.random.user to clear, and offering services over some
centralized facility just for tests doesn't scale - especially when
the services it offers aren't entirely open source. PlanetLab closed;
CeroWrt - a widely distributed effort - survived, and nearly
everything from that bargain-basement project made it out into the
Linux kernel and into open source. fq_codel, in particular, is now in
billions of devices. Along the way we spawned three IETF working
groups (aqm, homenet, and babel), and the IPv6 stuff in OpenWrt is
still (IMHO) the best thought out of nearly anything along the edge
"out there". IPv6 still suffers from not having had a Kelly Johnson
along to ride herd on everyone.

It is far better to opt for the most decentralized environment you
can, and to engage with every kind of engineer along the way. Much
like how the original IMPs were spread to universities across the
entire United States, a future "satnet" should be spread across the
world, and should be closely tied to the actual users and innovators
using it - much like how, in the 80s, IPv4 outran the ISO stack in
usefulness to real people. ISO was "the future" for so long that the
future outran it, and it was only after the "Kobe revolt" ousted the
folk in charge of that top-down design from IANA management (where
Vint had that famous incident, baring the "IP on everything" t-shirt)
that real forward progress on commercializing the Internet proceeded
rapidly.

Anyway, in the satnetlab paper:

The author points to some good work at l3 but completely misses the
real-world changes that happened over the past decade at all layers
of the stack. In passing I note that BBR and delay-based e2e
congestion controls work much better with five-tuple DRR, SFQ, QFQ,
or SQF at the bottleneck links in the network.
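
For anyone who has not internalized what five-tuple flow queuing buys
you, here is a toy sketch of deficit round robin keyed on the
five-tuple - my own illustration, not anything shipping; the real
qdiscs do this per packet, in the kernel, with far more care about
hashing, drop policy, and sparse-flow priority:

from collections import OrderedDict, deque

QUANTUM = 1514  # bytes of credit per flow per round (~one full MTU frame)

class DRRScheduler:
    """Toy deficit round robin over flows keyed on the classic five-tuple
    (src, dst, sport, dport, proto). One packet returned per dequeue()."""

    def __init__(self):
        self.flows = OrderedDict()   # five-tuple -> deque of (size, payload)
        self.deficit = {}            # five-tuple -> accumulated byte credit

    def enqueue(self, five_tuple, size, payload=None):
        self.flows.setdefault(five_tuple, deque()).append((size, payload))
        self.deficit.setdefault(five_tuple, 0)

    def dequeue(self):
        while self.flows:
            ft, q = next(iter(self.flows.items()))
            size, payload = q[0]
            if self.deficit[ft] < size:
                # not enough credit for this flow's head packet:
                # top up its deficit and rotate it to the back
                self.deficit[ft] += QUANTUM
                self.flows.move_to_end(ft)
                continue
            q.popleft()
            self.deficit[ft] -= size
            if not q:                # flow drained: forget its state
                del self.flows[ft]
                del self.deficit[ft]
            return (ft, size, payload)
        return None

if __name__ == "__main__":
    drr = DRRScheduler()
    for _ in range(3):
        drr.enqueue(("10.0.0.1", "10.0.0.2", 5000, 80, "tcp"), 1500)   # bulk flow
    drr.enqueue(("10.0.0.3", "10.0.0.2", 6000, 443, "udp"), 100)       # sparse flow
    while (pkt := drr.dequeue()) is not None:
        print(pkt[0][0], pkt[1])

The point, for the congestion control discussion, is that a sparse or
delay-probing flow never has to sit behind an elephant's backlog at
the bottleneck, so the delay signal that BBR and friends rely on
stays clean.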

Ankit is right in that BGP ran out of napkins long ago and is so
ossified as to be pretty useless for interconnecting new portions of
the Internet. Centralized IGPs like OSPF are probably not the right
thing either; my bet as to a routing protocol worth leveraging (at
least some of) has been the distance-vector protocol Babel, which has
(so far as I know) the only working implementation of source-specific
routing and a more or less working RTT metric.

The other big thing that makes me a bit crazy is that network designs
are NOT laws of nature! They are protocol agreements and engineering
choices, and at every change you need to recheck your assumptions at
all the other layers of the stack... [1]

Here's another piece of pre-history, from the ALOHAnet era: the TTL
field really was a "time to live" field. The intent was that the
packet would carry how much time it remained valid before being
discarded. That didn't work out, and it was replaced by a hop count,
which switched networks of course ignore, and which is only
semi-useful for detecting loops and the like.

Thought: a new satcom network l2 could actually record the universal
origination time of a packet from a world-wide GPS clock (a 64-bit
header field), and measuring transit times then becomes easy. We
didn't have GPS back in the 60s and 70s...
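
A back-of-the-napkin sketch of what I mean - the field layout, epoch,
and names here are entirely invented for illustration, and
time.time_ns() stands in for a GPS/PTP-disciplined clock:

import struct
import time

HEADER_FMT = "!QH"                 # 64-bit origination time (ns) + 16-bit length
HEADER_LEN = struct.calcsize(HEADER_FMT)

def stamp(payload: bytes) -> bytes:
    """Sender: prepend the universal origination time of the packet."""
    return struct.pack(HEADER_FMT, time.time_ns(), len(payload)) + payload

def transit_ns(frame: bytes) -> int:
    """Receiver: one-way transit time is just 'now minus origination',
    provided both ends are disciplined to the same (GPS) timebase."""
    t_orig, _length = struct.unpack(HEADER_FMT, frame[:HEADER_LEN])
    return time.time_ns() - t_orig

frame = stamp(b"hello, bent pipe")
print("transit:", transit_ns(frame), "ns")

And with that, any hop could also drop a packet once it is provably
stale - which is what "time to live" was trying to be in the first
place.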

...

To try and pick on everyone equally (including myself!): I wasn't
monitoring the fq_codel deployment in the cloud closely, and it turns
out at least one increasingly common virtualized driver (virtio-net)
doesn't have BQL in it, leading to "sloshy" TCP behavior, with > 16MB
of data living out on the TX and RX rings. [1]
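
If you want to check whether your own driver is feeding BQL, the
per-queue knobs are visible in sysfs on any kernel built with
CONFIG_BQL (see Documentation/ABI/testing/sysfs-class-net-queues).
Here's a rough sketch - the "stuck at zero under load" heuristic is
mine, so run it while the link is actually busy:

import glob
import sys

def bql_report(iface="eth0"):
    """Print BQL state for each tx queue of a NIC. A driver that never
    calls the BQL hooks leaves limit and inflight sitting at zero even
    while you saturate the link - rough heuristic, not proof."""
    pattern = f"/sys/class/net/{iface}/queues/tx-*/byte_queue_limits"
    queues = sorted(glob.glob(pattern))
    if not queues:
        print(f"{iface}: no byte_queue_limits entries (CONFIG_BQL off, or old kernel)")
        return
    for q in queues:
        with open(f"{q}/limit") as f:
            limit = f.read().strip()
        with open(f"{q}/inflight") as f:
            inflight = f.read().strip()
        print(f"{q}: limit={limit} inflight={inflight}")

if __name__ == "__main__":
    bql_report(sys.argv[1] if len(sys.argv) > 1 else "eth0")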

l3 folk talk about "mice and elephants" far too much when talking
about network traffic. Years ago we added a new class to that
taxonomy: "ants", which scurry around almost invisibly, keeping the
network ecosystem healthy, and which mostly need to happen around or
below what we think of as l2.

It's easy to show what happens to a network without "ants" in it -
block ARP for a minute and an Ethernet network will fail. Similarly,
IPv6 ND. Block address assignment via DHCP... or DNS... or take a
hard look at "management frames" in wifi, or, if you want to make
your head really hurt, take a deep dive into 3GPP and see if you can
come out the other side with your brain intact.
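
If you have never actually watched the ants at work, a few lines of
raw-socket Python (Linux only, needs root, untagged Ethernet assumed)
will show you how much ARP chatter even a "quiet" segment carries:

import socket
import struct
import sys

ETH_P_ARP = 0x0806  # only ARP frames get delivered to this socket

def watch_arp(iface="eth0"):
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ARP))
    s.bind((iface, 0))
    while True:
        frame, _ = s.recvfrom(2048)
        # ARP payload starts after the 14-byte Ethernet header
        (op,) = struct.unpack("!H", frame[20:22])   # 1 = request, 2 = reply
        sender_ip = socket.inet_ntoa(frame[28:32])
        target_ip = socket.inet_ntoa(frame[38:42])
        kind = "who-has" if op == 1 else "is-at"
        print(f"{kind} {target_ip} (from {sender_ip})")

if __name__ == "__main__":
    watch_arp(sys.argv[1] if len(sys.argv) > 1 else "eth0")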

To me the "ants" are the most important part of the network ecosystem,
hardly studied, totally necessary.

A misunderstanding about the nature of buffers led to the
continuously unfolding, seemingly endless disaster that is Ethernet
over powerline, with its interactions between hardware flow control,
variable rates, and buffering. [2,3]

I don't even want to talk about GPON today.

And then: there are all sorts of useful secrets buried in the
history of the Internet, ALOHA, and the ARPANET that seem to have
been lost on a lot of people... secrets we have, sort of, been
counting on always being there...

(going back to my point that a lot of useful experiments can get done
on cheap hardware, in a decentralized fashion)

Example:

Recently I was told that at least one breed of "Thread" wireless
chip did not have exponential backoff in the (ROM!) firmware, which
meant (to me) that at any significant density your office's lightbulb
array would suffer congestion collapse on the next firmware update
(there's a sketch of the backoff that should be in there at the end
of this story)....

... and to test that out required either deep knowledge and
expensive gear to sniff the radio, sitting side by side with an EE
type to decode the signals (an exercise I recommend to any CS major),
and decompiling the firmware (which I recommend to EE types)... or
buying 16 of these 8 dollar chips, designing an experiment with a
dense mesh, and beating the heck out of it with real traffic, which
is what I planned to do when I got around to it. I figured the
results would be hilarious for a while... but then I would probably
end up worrying about the financial future of whatever company
actually tried to ship these chips, qty millions, into the field, or
about the millions of customers sometimes unable to flick their
lightbulbs on or off for no apparent reason... and thus I haven't got
around to powering them up, filing the bug reports, and climbing
through the 9 layers of VPs committed to long-term buying decisions.
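
For the record, the backoff allegedly missing from that ROM is not
rocket science: unslotted 802.15.4-style CSMA/CA with binary
exponential backoff fits in a dozen lines. The constants below are
the 802.15.4 defaults as I remember them, and channel_clear()/send()
are stand-ins for the real radio hooks - check the spec before
trusting any of it:

import random
import time

MAC_MIN_BE = 3             # starting backoff exponent
MAC_MAX_BE = 5             # cap on the exponent
MAC_MAX_CSMA_BACKOFFS = 4  # give up after this many busy CCAs
UNIT_BACKOFF_S = 320e-6    # 20 symbols at 2.4 GHz O-QPSK

def csma_ca_send(frame, channel_clear, send) -> bool:
    """Try to transmit, doubling the backoff window on every busy
    channel assessment. Returns False if the channel never went idle."""
    be = MAC_MIN_BE
    for _attempt in range(MAC_MAX_CSMA_BACKOFFS + 1):
        # wait a random number of unit backoff periods in [0, 2^BE - 1]
        time.sleep(random.randint(0, (1 << be) - 1) * UNIT_BACKOFF_S)
        if channel_clear():           # clear channel assessment (CCA)
            send(frame)
            return True
        be = min(be + 1, MAC_MAX_BE)  # busy: widen the window and retry
    return False                      # congestion: report failure upward

if __name__ == "__main__":
    ok = csma_ca_send(b"toggle lightbulb",
                      channel_clear=lambda: random.random() > 0.7,  # ~70% busy
                      send=lambda f: print("sent", f))
    print("delivered" if ok else "gave up: channel congested")

Drop the line that widens BE and every node retries on the same
short, fixed window - which, at any real density, is exactly the
congestion collapse that Aloha taught us about fifty years ago.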

Have a good weekend everyone. I'm tapped out. Have a poem.

0) Turning and turning in the widening gyre
The falcon cannot hear the falconer;
Things fall apart; the center cannot hold;
Mere anarchy is loosed upon the world.  -
https://www.sparknotes.com/lit/things/quotes/

1) https://conferences.sigcomm.org/sigcomm/2014/doc/slides/137.pdf

2) Interactions between TCP and Ethernet flow control over Netgear
XAVB2001 HomePlug AV links
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.278.6149&rep=rep1&type=pdf

3) Buffer size estimation of TP-LINK TL-PA211KIT HomePlug AV adapters
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.300.7521&rep=rep1&type=pdf

-- 
Latest Podcast:
https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/

Dave Täht CTO, TekLibre, LLC


