[Bloat] [ih] Installed base momentum (was Re: Design choices in SMTP)

Jonathan Morton chromatix99 at gmail.com
Mon Feb 13 21:58:48 EST 2023


> ---------- Forwarded message ---------
> From: Jack Haverty via Internet-history <internet-history at elists.isoc.org>
> 
> Even today, as an end user, I can't tell if "congestion control" is
> implemented and working well, or if congestion is just mostly being
> avoided by deployment of lots of fiber and lots of buffer memory in all
> the switching locations where congestion might be expected. That of
> course results in the phenomenon of "buffer bloat".   That's another
> question for the Historians.  Has "Congestion Control" in the Internet
> been solved?  Or avoided?

It's a good question, and one that shows understanding of the underlying problem.

TCP has implemented a workable congestion control system since the introduction of Reno, and has continued to take congestion control seriously with the newer flavours of Reno (e.g. NewReno, SACK) and CUBIC.  Each of these schemes reacts to congestion *signals* from the network: they probe gradually for capacity, back off rapidly when that capacity is evidently exceeded, and then repeat the cycle.
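
For a concrete (if heavily simplified) picture of that probe-and-back-off cycle, here is a rough Reno-style AIMD sketch in Python.  It only illustrates the additive-increase/multiplicative-decrease idea - the packet and ACK handling, slow start, fast recovery and so on are all omitted, and the names are mine rather than from any real stack:

    # Reno-style AIMD, heavily simplified and illustrative only.
    # Real stacks track cwnd in bytes and handle many more cases.
    def on_ack(cwnd, mss=1):
        # Additive increase: grow by roughly one segment per round trip
        # while no congestion signal is seen.
        return cwnd + mss * mss / cwnd

    def on_congestion_signal(cwnd, mss=1):
        # Multiplicative decrease: halve the window when loss (or an ECN
        # mark) indicates the path capacity has been exceeded.
        return max(cwnd / 2, 2 * mss)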

Confusingly, this process is called the "congestion avoidance" phase of TCP, to distinguish it from the "slow start" phase, which is, equally confusingly, a rapid initial probe for path capacity.  CUBIC's main refinement is that it spends more time near the capacity limit thus found than Reno does, and therefore scales better to modern high-capacity paths at Internet scale.
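
CUBIC's published window growth function makes that "linger near the last known limit" behaviour visible.  A minimal sketch of the function as given in RFC 8312 (constants per that RFC; everything around it is omitted):

    # CUBIC window growth, W(t) = C*(t - K)^3 + W_max  (RFC 8312).
    # W_max is the window size at the last congestion event; K is the
    # time the function takes to climb back up to W_max.
    C = 0.4        # scaling constant from RFC 8312
    BETA = 0.7     # CUBIC's multiplicative decrease factor

    def cubic_window(t, w_max):
        k = ((w_max * (1 - BETA)) / C) ** (1.0 / 3.0)
        return C * (t - k) ** 3 + w_max

The cubic curve is nearly flat around t = K, which is exactly the "spend more time near the previous limit" property, and then grows quickly again if no further congestion is seen.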

In the simplest and most widespread case, a buffer overflows and a packet is lost; that loss is interpreted as a congestion signal, as well as triggering the "reliable stream" function of retransmission.  Congestion signals can also be encoded explicitly by the network onto IP packets, in the form of ECN, without requiring packet losses and the consequent retransmissions.
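
To make the explicit version concrete, here is a toy sketch (my own naming, not taken from any particular implementation) of the decision a queue makes when it wants to signal congestion: packets from ECN-capable flows get marked CE and are still delivered, while everything else is dropped as the signal:

    # Toy congestion-signalling decision at a queue (illustrative only).
    # The ECN field is the two low-order bits of the IP TOS / Traffic
    # Class byte: 00 = Not-ECT, 01 or 10 = ECT, 11 = CE.
    def signal_congestion(packet):
        ecn = packet.tos & 0b11
        if ecn in (0b01, 0b10):       # flow negotiated ECN
            packet.tos |= 0b11        # mark Congestion Experienced
            return packet             # deliver the marked packet
        return None                   # no ECN: drop it as the signal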

My take is that *if* networks focus only on increasing link and buffer capacity, then they are "avoiding" congestion - a strategy that only works so long as capacity consistently exceeds load.  However, it has repeatedly been shown in many contexts (not just networking) that increased capacity *stimulates* increased load; the phenomenon is called "induced demand".  In particular, many TCP-based Internet applications are "capacity seeking" by nature, and will *immediately* expand to fill whatever path capacity is made available to them.  If this causes the path latency to exceed about 2 seconds, DNS timeouts can be expected and the user experience will suffer dramatically.

Fortunately, many networks and, more importantly, equipment providers are now learning the value of implementing AQM (to apply congestion signals explicitly, before the buffers are full), or failing that, of sizing the buffers appropriately so that path latency doesn't increase unreasonably before congestion signals are naturally produced.  This allows TCP's sophisticated congestion control algorithms to work as intended.
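
As one illustration of "signal before the buffer is full", here is a very rough CoDel-flavoured check in Python.  The target and interval values are the CoDel defaults; the rest is simplified to the bare idea and is not the real algorithm:

    # Rough CoDel-style idea (not the full algorithm): signal congestion
    # when queueing delay stays above a small target for a whole
    # observation interval, rather than waiting for the buffer to fill.
    TARGET = 0.005      # 5 ms standing-queue target
    INTERVAL = 0.100    # 100 ms observation window

    def should_signal(sojourn_times):
        # sojourn_times: queueing delays seen by packets dequeued during
        # the last INTERVAL seconds.
        return bool(sojourn_times) and min(sojourn_times) > TARGET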

 - Jonathan Morton


