[Codel] [Bloat] The challenge

Wed May 9 19:34:52 PDT 2012

On 9 May, 2012, at 4:04 am, Dave Taht wrote:

> Both the acm queue article and jim's blog entry this morning were way
> above mensa's standards.

Does that mean we're allowed to feel extra smart if we understood it anyway?  :-)

> Nobody has attempted to explain the elegant simplicity of the
> algorithm itself in the inverse sqrt however!  I have a good grip on
> it, and am trying, but can barely explain it to myself. Anyone else
> care to dig through the codel code and try to put it into english?

Sounds like fun!

I have a suspicion myself of what the quantitative reasoning behind the inverse sqrt might be, but I also suspect that such details are much less important than some of the *qualitative* properties of the algorithm.  Most likely the people that most need to be convinced will respond better to that than to hard maths anyway.

I think however that everyone has missed the single most important property: the fact that drops occur on *dequeue* rather than enqueue, and the result that this keeps the "resonant frequency" of each flow near the physical RTT by minimising the delay between a packet drop and the TCP's detection of and response to it.

It is counter-intuitive to keep a queue filled with packets that you might drop later - as an engineer you tend to think it is more efficient to drop a packet *before* it starts to consume space in your precious queue.  This is a good deal of what needs to be solved in wireless drivers - there is a wrong assumption in there that dropping at the tail is all that is needed, and that leads to wrong design decisions.

Tail-drop and RED drop on enqueue (at tail), not dequeue (at head), so the missing packet cannot be detected by the TCP until it's "bubble" has passed through the queue.  This sets the resonant frequency lower than the true RTT and delays the TCP's response, so the queue remains overfilled for longer (triggering extra packet drops which require extra recovery) and recovery (through retransmission) also takes longer and is less reliable.

Another point that might be missed by people naively reading the paper - as opposed to regular readers of AQM literature - is that "drop" can actually mean "mark with ECN".  Looking at the code, ECN is indeed used when appropriate.  This is good because it signals the TCP without forcing a retransmission - and without "wasting" the space in the queue occupied by the packet.  This is a good reinforcement of the principle that tail-dropping is no longer the right choice.

Important property #3 is that the queue responds immediately when it's length decreases below the threshold, by stopping the marking/dropping process entirely.  This helps to maintain the high-frequency signal (of "too fast" versus "slow enough") to the TCPs.  The response to exceeding the threshold is slower, but only to the extent of avoiding over-reaction to a natural burst of traffic.

Now about the inverse sqrt: qualitatively, it is just a convenient method of gradually varying the mark/drop rate in terms of a time interval rather than a packet count interval.  The longer the queue remains overfull - controlled by the number of mark/drop events that have occurred - the higher the marking/dropping rate gets.  If the queue then empties (and stays empty-ish for a few RTTs), the rate is reset.  Meanwhile, if the queue regularly oscillates between "full" and "empty" states, the marking/dropping rate is remembered so that aggressive TCPs receive the correct magnitude signal.

Now as for the *quantitative* reason behind it, it is because as the interval between drops gets shorter, the number of increments per RTT goes up because there is one increment per drop.  If the interval is shortened linearly per increment, that means it will in practice shorten exponentially faster, such that the drop rate runs away faster than the TCP can reasonably be expected to respond.  This would result in excessive drop rates and oscillatory behaviour typical of an over-sensitive control system.  By basing the drop rate on the square root of the incremented variable, the runaway behaviour is curtailed since the drop interval now shortens linearly per RTT.

Maybe a diagram or animation would help to clarify that last paragraph.  I'm fairly sure I can draw one.

So overall we have an AQM that provides a low-latency signal of appropriate magnitude to TCPs when the link is genuinely congested, and gets out of the way when it isn't.  Combined with a fair queueing discipline (eg. SFQ or QFQ), I think this will turn out to be an excellent default setting for all sorts of equipment.  What would it take to get this into a DSL modem (at either end of the link)?

 - Jonathan Morton