[Cerowrt-devel] [Bloat] Bechtolschiem

Fri Jul 2 16:28:52 EDT 2021

> On 2 Jul, 2021, at 7:59 pm, Stephen Hemminger <stephen at networkplumber.org> wrote:
> 
> In real world tests, TCP Cubic will consume any buffer it sees at a
> congested link. Maybe that is what they mean by capture effect.

First, I'll note that what they call "small buffer" corresponds to about a tenth of a millisecond at the port's link rate.  This would be ludicrously small at Internet scale, but is actually reasonable for datacentre conditions where RTTs are often in the microseconds.
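
To put rough numbers on that, here is a minimal sketch of the arithmetic. The link rates and the 1500-byte packet size are my own assumptions for illustration; only the tenth-of-a-millisecond figure comes from the discussion above.

```python
# Rough buffer-depth arithmetic.  The link rates and the 1500-byte packet
# size are assumptions for illustration; the 100 us figure is the
# "tenth of a millisecond" mentioned in the text.

def buffer_bytes(link_rate_bps: int, buffer_time_us: int) -> int:
    """Bytes the port can transmit in buffer_time_us at link_rate_bps."""
    # bits = rate * time; divide by 8 for bytes and by 1e6 for microseconds.
    return link_rate_bps * buffer_time_us // (8 * 1_000_000)


def buffer_packets(link_rate_bps: int, buffer_time_us: int,
                   packet_bytes: int = 1500) -> int:
    """Whole packets of packet_bytes that fit in that much buffer."""
    return buffer_bytes(link_rate_bps, buffer_time_us) // packet_bytes


if __name__ == "__main__":
    for gbps in (10, 25, 100):  # hypothetical port speeds
        rate = gbps * 1_000_000_000
        print(gbps, "Gbit/s:", buffer_bytes(rate, 100), "bytes,",
              buffer_packets(rate, 100), "packets")
```

At an assumed 10 Gbit/s port this works out to 125000 bytes, or about 83 full-size packets.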

Assuming the effect as described is real, it ultimately stems from a burst of traffic from a particular flow arriving at a queue that is *already* full.  Such bursts are expected from ack-clocked flows coming out of application-limited mode (i.e. on completion of a disk read), in slow-start, or recovering from earlier losses.  It is also possible for a heavily coalesced ack to abruptly open the receive and congestion windows and trigger a send burst.  These bursts occur much less often in paced flows, because the object of pacing is to avoid bursts.

The queue is full because tail drop upon queue overflow is the only congestion signal provided by the switch, and ack-clocked capacity-seeking transports naturally keep the queue as full as they can - especially under high statistical multiplexing conditions where a single multiplicative decrease event does not greatly reduce the total traffic demand. CUBIC arguably spends more time with the queue very close to full than Reno does, due to the plateau designed into it, but at these very short RTTs I would not be surprised if CUBIC is equivalent to Reno in practice.
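
The statistical multiplexing point can be illustrated with a back-of-envelope calculation. This sketch assumes all flows hold equal shares of the link and that exactly one of them backs off; the decrease factors 0.5 (Reno) and 0.7 (CUBIC) are the commonly cited defaults, not figures from the scenario under discussion.

```python
# Fraction by which total offered load falls when ONE flow backs off.
# Assumes n_flows equal-share flows; beta is the multiplicative decrease
# factor (new cwnd = beta * old cwnd).

def aggregate_reduction(n_flows: int, beta: float) -> float:
    """Relative drop in total demand after a single flow's decrease."""
    if n_flows < 1:
        raise ValueError("need at least one flow")
    return (1.0 - beta) / n_flows


if __name__ == "__main__":
    for n in (1, 10, 100):  # hypothetical flow counts
        print(n, "flows: Reno", aggregate_reduction(n, 0.5),
              "CUBIC", aggregate_reduction(n, 0.7))
```

With 100 flows, a single Reno-style halving removes only about half a percent of the total demand, which is why the queue refills almost immediately.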

The solution is to keep some normally-unused space in the queue for bursts of traffic to use occasionally.  This is most naturally done using ECN applied by some AQM algorithm, or the AQM can pre-emptively and selectively drop packets in Not-ECT flows.  And because the AQM is more likely to mark or drop packets from flows that occupy more link time or queue capacity, it has a natural equalising effect between flows.
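
As a rough illustration of the idea, here is a deliberately simplistic threshold queue. It is not any particular AQM algorithm; the class names, the single fixed threshold, and the policy of dropping every Not-ECT packet above it are all assumptions made to keep the sketch short.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    flow: str
    ect: bool          # True if the flow is ECN-capable (ECT)
    ce: bool = False   # set by the queue when it signals congestion


class ThresholdQueue:
    """Toy queue: signal congestion above `threshold`, tail-drop at `capacity`.

    The space between threshold and capacity is the normally-unused
    headroom that bursts can occupy.
    """

    def __init__(self, capacity: int, threshold: int):
        self.capacity = capacity
        self.threshold = threshold
        self.packets = deque()

    def enqueue(self, pkt: Packet) -> str:
        depth = len(self.packets)
        if depth >= self.capacity:
            return "overflow-drop"      # no room at all: tail drop
        if depth >= self.threshold:
            if not pkt.ect:
                return "aqm-drop"       # Not-ECT: drop as the signal
            pkt.ce = True               # ECT: mark instead of dropping
            self.packets.append(pkt)
            return "marked"
        self.packets.append(pkt)
        return "queued"

    def dequeue(self):
        return self.packets.popleft() if self.packets else None


if __name__ == "__main__":
    q = ThresholdQueue(capacity=4, threshold=2)
    for i in range(5):
        print(i, q.enqueue(Packet(flow="a", ect=True)))
```

A flow that puts more packets into the queue has more of them arrive while the depth is above the threshold, so it collects more marks or drops; that is the equalising effect in miniature.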

Applying ECN requires some Layer 3 awareness in the switch, which might not be practical.  A simple alternative is to drop packets instead.  Single packet losses are easily recovered from by retransmission after approximately one RTT.  There are also emerging techniques for applying congestion signals at Layer 2, which can be converted into ECN signals at some convenient point downstream.

However it is achieved, the point is that keeping the *standing* queue down to some fraction of the total queue depth reserves space for accommodating those bursts which are expected occasionally in normal traffic.  Because those bursts are not lost, the flows experiencing them are not disadvantaged and the so-called "capture effect" will not occur.
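
The headroom argument reduces to simple arithmetic, sketched below. It assumes the burst arrives all at once, with no draining of the queue while it arrives, and the queue sizes are made-up numbers.

```python
# Packets lost from a burst arriving at a queue with a given standing depth.
# Assumes the whole burst arrives before any packet is dequeued.

def burst_losses(capacity: int, standing: int, burst: int) -> int:
    """Packets from the burst that do not fit in the remaining space."""
    free = max(0, capacity - standing)
    return max(0, burst - free)


if __name__ == "__main__":
    capacity, burst = 100, 20          # hypothetical sizes, in packets
    for standing in (100, 90, 50):     # full, nearly full, held at half
        print("standing", standing, "lost", burst_losses(capacity, standing, burst))
```

With the queue already full the entire burst is lost; with the standing queue held to half the depth, none of it is.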

 - Jonathan Morton