[Rpm] Alternate definitions of "working condition" - unnecessary?

Wed Oct 6 19:18:47 EDT 2021

> On 7 Oct, 2021, at 12:22 am, Dave Taht via Rpm <rpm at lists.bufferbloat.net> wrote:
> 
> There are additional cases where, perhaps, the fq component works, and the aqm doesn't.

Such as Apple's version of FQ-Codel?  The source code is public, so we might as well talk about it.

I know of two deviations in the AQM portion of it.  First, they do the marking and/or dropping at the tail of the queue, not the head.  Second, the marking/dropping frequency is fixed, instead of increasing during a continuous period of congestion as real Codel does.
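
To make the second deviation concrete, here is a minimal Python sketch comparing the two signalling schedules over one continuous congestion period.  The Codel side follows the control law in RFC 8289 (each successive gap is interval/sqrt(count)), with that RFC's default 100 ms interval.  The fixed side is only my reading of "fixed frequency"; the function names are made up for illustration and nothing here is taken from Apple's source.

```python
import math

INTERVAL_MS = 100.0  # Codel's default interval (RFC 8289)


def codel_signal_times(duration_ms, interval_ms=INTERVAL_MS):
    """Times (ms) of marks/drops in one unbroken congestion period, using the
    Codel control law: the gap after the n-th signal is interval / sqrt(n)."""
    times = [0.0]  # first signal on entering the dropping state
    count = 1
    next_signal = interval_ms / math.sqrt(count)
    while next_signal < duration_ms:
        times.append(next_signal)
        count += 1
        next_signal += interval_ms / math.sqrt(count)
    return times


def fixed_signal_times(duration_ms, interval_ms=INTERVAL_MS):
    """Hypothetical fixed-frequency variant: one signal per interval, forever."""
    n = math.ceil(duration_ms / interval_ms)
    return [i * interval_ms for i in range(n)]


if __name__ == "__main__":
    for name, fn in (("codel", codel_signal_times), ("fixed", fixed_signal_times)):
        times = fn(1000.0)
        gaps = [b - a for a, b in zip(times, times[1:])]
        print(name, "signals in 1 s:", len(times),
              "last gap (ms):", round(gaps[-1], 1))
```

The shrinking gap on the Codel side is what lets it keep escalating against a flow that does not back off; the fixed schedule never escalates.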

I predict the consequences of these mistakes will differ according to the type of traffic applied:

With TCP traffic over an Internet-scale path, the consequences are not serious.  Because the signal is applied at the tail, it has to wait behind everything already queued, so the response at the end of slow-start will be slower, with a higher peak of intra-flow induced delay.  There is also a small but measurable risk of tail-loss causing a more serious application-level delay.  These alone *should* be enough to prompt a fix, if Apple are actually serious about improving application responsiveness.  The fixed marking frequency, however, is probably invisible for this traffic.

With TCP traffic over a short-RTT path, the effects are more pronounced.  The delay excursion at the end of slow-start will be larger in comparison to the baseline RTT, and when the latter is short enough, the fixed congestion signalling frequency means there will be some standing queue that real Codel would get rid of.  This standing queue will influence the TCP stack's RTT estimator and thus RTO value, increasing the delay consequent to tail loss.
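
As a rough illustration of the RTO point, the sketch below runs the standard RFC 6298 estimator (alpha = 1/8, beta = 1/4, K = 4) over two sets of RTT samples: a bare short path, and the same path with a standing queue on top.  The 5 ms base RTT and 20 ms of standing queue are invented numbers, and I have left out clock granularity and defaulted the minimum RTO to zero, which real stacks do not do, so only the direction of the effect is meaningful, not the absolute values.

```python
def rfc6298_rto(rtt_samples_ms, min_rto_ms=0.0):
    """RTO (ms) after feeding the samples through the RFC 6298 estimator.
    min_rto_ms defaults to 0 purely for illustration; real stacks apply a floor."""
    if not rtt_samples_ms:
        raise ValueError("need at least one RTT sample")
    alpha, beta, k = 1 / 8, 1 / 4, 4
    srtt = rttvar = None
    for r in rtt_samples_ms:
        if srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2
            srtt, rttvar = r, r / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
    return max(min_rto_ms, srtt + k * rttvar)


if __name__ == "__main__":
    base_rtt_ms = 5.0         # hypothetical short-RTT path
    standing_queue_ms = 20.0  # hypothetical queue the AQM fails to drain
    print("RTO, no standing queue:  ", rfc6298_rto([base_rtt_ms] * 50))
    print("RTO, with standing queue:",
          rfc6298_rto([base_rtt_ms + standing_queue_ms] * 50))
```

With steady samples the variance term decays away and the RTO settles close to the smoothed RTT, so whatever the standing queue adds to the RTT is added to the wait before a tail loss is repaired.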

Similar effects to the above can be expected with other reliable stream transports (SCTP, QUIC), though the details may differ.

The consequences with non-congestion-controlled traffic could be much more serious.  Real Codel will increase its drop frequency continuously when faced with overload, eventually gaining control of the queue depth as long as the load remains finite and reasonably constant.  Because Apple's AQM doesn't increase its drop frequency, the queue depth for such a flow will increase continuously until either a delay-sensitive rate selection mechanism is triggered at the sender, or the queue overflows and triggers burst losses.
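
A toy simulation shows the difference in outcome.  This is a sketch under heavy assumptions, not a model of either implementation: the sender is a constant-rate source that ignores loss, the backlog is counted in packets rather than sojourn time, "above target" is a 5-packet threshold I picked arbitrarily, and the drop count is simply kept when the backlog dips below that threshold (real Codel has more careful re-entry rules).  All names and rates are made up.

```python
import math


def simulate_overload(adaptive, arrival_pps=1200.0, service_pps=1000.0,
                      interval=0.100, target_pkts=5.0, duration=20.0, dt=0.001):
    """Return (final_backlog, peak_backlog) in packets for an unresponsive flow.

    adaptive=True  : gap between drops shrinks as interval / sqrt(count)
    adaptive=False : gap between drops stays at `interval` (fixed frequency)
    """
    backlog = 0.0
    peak = 0.0
    count = 1
    next_drop = None  # None means we are not in the dropping state
    for step in range(int(round(duration / dt))):
        now = step * dt
        # Fluid approximation: the excess of arrivals over service accumulates.
        backlog = max(0.0, backlog + (arrival_pps - service_pps) * dt)
        if backlog <= target_pkts:
            next_drop = None                      # leave the dropping state
        elif next_drop is None:
            next_drop = now + interval / math.sqrt(count)  # (re-)enter it
        # Perform every drop that has fallen due by this tick.
        while next_drop is not None and now >= next_drop and backlog >= 1.0:
            backlog -= 1.0
            if adaptive:
                count += 1
            next_drop += interval / math.sqrt(count)
        peak = max(peak, backlog)
    return backlog, peak


if __name__ == "__main__":
    for label, adaptive in (("fixed frequency", False), ("codel-style", True)):
        final, peak = simulate_overload(adaptive)
        print(f"{label:16s} final backlog {final:8.1f} pkts, peak {peak:8.1f} pkts")
```

With these numbers the excess load is 200 packets/s while a fixed 100 ms schedule removes only 10 packets/s, so the backlog climbs for as long as the load lasts.  The escalating schedule overshoots at first, then catches up and holds the backlog near the threshold.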

So in the context of this discussion, is it worth generating a type of load that specifically exercises this failure mode?  If so, what does it look like?

 - Jonathan Morton