[Starlink] on using and debugging lossless congestion control (RFC3168 ECN)

Dave Taht dave.taht at gmail.com
Mon Jun 21 04:41:52 EDT 2021


It came up recently that a lot of folks don't like the idea of dropping
packets as a congestion control mechanism. Like loss or not... the
fundamental paper on this subject is by Van Jacobson and Mike Karels:

"Congestion Avoidance and Control"

http://web.stanford.edu/class/cs244/papers/CongestionControl.pdf

which is a really great read, IMHO. A classic. I like telling the
backstory a lot. "The Internet was down!"

Certainly as RTTs grow, the cost of recovering a lost packet also grows,
and ECN was designed to help out greatly at planet-girdling RTTs for
interactive traffic.

Explicit congestion notification (ECN, RFC3168) has a long history
going back to the mid-80s, and I'd like to invite kk in for a talk
about it. Sally Floyd (sadly now deceased) also wrote many cogent
papers on it as the concept evolved, and they are worth reading.

The RFC3168 mechanism is this: you mark packets as ECN-capable, ECT(0),
in the IP header, and an ECN-capable AQM along the path can mark packets
as "CE", rather than drop them, when the flow needs to be signaled to
slow down (at the time, by half).

ECN was essentially co-designed with the RED algorithm back in the early
90s but took a decade longer to be standardized, and worse, when first
tried out over the broader internet, twiddling these bits crashed some
major routers, so it was turned off and sometimes bleached out. More
active testing resumed in 2011...
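
To make the bit twiddling concrete, here is a minimal sketch of the mark-or-drop decision in Python. The codepoint values are the ones RFC3168 assigns to the two low-order bits of the IPv4 TOS / IPv6 Traffic Class byte; the function names are made up for illustration and do not come from any real qdisc.

```python
# ECN field: the two low-order bits of the IPv4 TOS / IPv6 Traffic Class byte.
NOT_ECT = 0b00  # sender did not declare ECN capability
ECT_1 = 0b01    # ECN-capable transport, codepoint ECT(1)
ECT_0 = 0b10    # ECN-capable transport, codepoint ECT(0)
CE = 0b11       # congestion experienced
ECN_MASK = 0b11


def ecn_codepoint(tos: int) -> int:
    """Extract the ECN codepoint from a TOS / Traffic Class byte."""
    return tos & ECN_MASK


def signal_congestion(tos: int):
    """Hypothetical AQM helper: signal congestion on one packet.

    Returns the rewritten TOS byte with CE set when the packet is
    ECN-capable, or None when the packet has to be dropped instead.
    """
    if ecn_codepoint(tos) == NOT_ECT:
        return None  # no ECN support declared: fall back to a drop
    # Keep the upper six (DSCP) bits, overwrite the ECN bits with CE.
    return (tos & ~ECN_MASK & 0xFF) | CE
```

A packet sent with ECT(0) and no DSCP (TOS 0x02) comes back as 0x03, while a Not-ECT packet (TOS 0x00) comes back as None, i.e. dropped. A real router also has to update the IPv4 header checksum after rewriting the byte, which this sketch leaves out.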

Anyway... RFC3168 defines that a mark should be treated the same as a
drop by the endpoint (but it's more complicated than that).

Both sch_cake and fq_codel enable RFC3168-style handling of ecn *by default*.
fq_codel is the nearly universal qdisc in most major linux distribution kernels.
fq_codel for wifi also enables it by default, but in a somewhat limited
fashion in older linux releases.[1]

All the implementations of these algorithms keep statistics. TCP_INFO
in linux also returns whether the CE mark has been seen.
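
As a rough illustration of poking at TCP_INFO from userspace, here is a Python sketch that reads the option flags of a connected socket. It assumes Linux and the struct tcp_info layout from include/uapi/linux/tcp.h, where tcpi_options is the sixth one-byte field; the helper names and the 104-byte buffer size are my own choices. Note that the flag decoded here, TCPI_OPT_ECN_SEEN, reports that a packet carrying an ECT codepoint was received. As far as I know, newer kernels also export a CE-specific counter (tcpi_delivered_ce) further into the struct, which this sketch does not parse.

```python
import socket

# Bit flags in tcp_info.tcpi_options (Linux, include/uapi/linux/tcp.h).
TCPI_OPT_ECN = 8         # ECN was negotiated on this connection
TCPI_OPT_ECN_SEEN = 16   # at least one packet with ECT was received
TCPI_OPTIONS_OFFSET = 5  # assumed byte offset of tcpi_options in the struct


def ecn_flags(tcp_info: bytes) -> dict:
    """Decode the ECN-related option bits from a raw struct tcp_info."""
    options = tcp_info[TCPI_OPTIONS_OFFSET]
    return {
        "negotiated": bool(options & TCPI_OPT_ECN),
        "ect_seen": bool(options & TCPI_OPT_ECN_SEEN),
    }


def query_ecn(sock: socket.socket) -> dict:
    """Linux only: fetch TCP_INFO for a connected TCP socket and decode it."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    return ecn_flags(raw)
```

Calling query_ecn() on a connected socket after some traffic has flowed should show "negotiated" as True when both ends agreed on ECN during the handshake.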

Wireshark can be used to tear apart captures and tcptrace -G/xplot.org
can also see marked packets.

Most endpoints will serve ECN-enabled requests. Enabling ecn support
on an endpoint in both directions
is a single sysctl. [0]
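
On linux, my understanding is that the sysctl in question is net.ipv4.tcp_ecn, exposed at /proc/sys/net/ipv4/tcp_ecn. A small Python sketch to check what a host is currently set to follows; the helper names and the wording of the descriptions are mine, and the value meanings are as I recall them from the kernel's ip-sysctl documentation.

```python
from pathlib import Path

# Assumed location of the setting on a linux host.
TCP_ECN_PATH = Path("/proc/sys/net/ipv4/tcp_ecn")

# Meaning of each value, paraphrased from the kernel documentation.
MODES = {
    0: "ECN disabled",
    1: "ECN requested on outgoing connections and accepted on incoming ones",
    2: "ECN accepted on incoming connections but not requested on outgoing ones",
}


def describe_tcp_ecn(value: str) -> str:
    """Translate the raw contents of tcp_ecn into a description."""
    return MODES.get(int(value.strip()), "unknown setting")


def current_mode(path: Path = TCP_ECN_PATH) -> str:
    """Read the live setting (linux only) and describe it."""
    return describe_tcp_ecn(path.read_text())
```

Most distributions ship with 2, so the host answers ECN requests but never makes them. Running `sysctl -w net.ipv4.tcp_ecn=1` as root turns it on in both directions.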

Apple products request ECN only sometimes nowadays; however, there is a
sysctl to disable the heuristic for this so you can have it on all the
time.

Anyway...

I often use ECN marks merely to determine if an AQM is working
properly. Loss should then be coming from something else. For an
extreme example of this usage, see:
http://blog.cerowrt.org/post/crypto_fq_bug/

Having RFC3168 ecn enabled on your path and on your host machine is
the only way to get an A+ score on the dslreports bufferbloat test.

It.just.works with every major operating system on the planet if enabled.

Lastly.

The future-of-ECN debate over changes to RFC3168, L4S and SCE in the
ietf tsvwg has been the least fun debate of my entire life, but I don't
want to talk about that today. I have a small fix to the RFC3168
compliant behavior of cubic that I am liking a lot in simulation and
long to test at scale.

I do encourage people to try out ECN as it exists today and see if
they can measure any difference in the behaviors induced. Years ago we
put it into "mosh", for example, so for interactive traffic we do not
lose those packets (and we put in a vastly more robust response than
RFC3168: cutting the mosh frame rate from a max of 60fps to 1 or 2fps
on a CE).

[0] https://www.bufferbloat.net/projects/cerowrt/wiki/Enable_ECN/
[1] Other linux mainline AQMs with ECN support include RED, codel, pie
& fq-pie, but there it is off by default.

-- 
Latest Podcast:
https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/

Dave Täht CTO, TekLibre, LLC


