[Cake] Proposing COBALT

Jonathan Morton chromatix99 at gmail.com
Fri May 20 13:31:35 EDT 2016

> On 20 May, 2016, at 19:03, David Lang <david at lang.hm> wrote:
> On Fri, 20 May 2016, Jonathan Morton wrote:
>>>> If the relative load from the flow decreases, BLUE’s action will begin to leave the subqueue empty when serviced, causing BLUE’s drop probability to fall off gradually, potentially until it reaches zero.  At this point the subqueue is naturally reset and will react normally to subsequent traffic using it.
>>> But if we reach a queue length of codel’s target (for some small amount of time), would that not be the best point in time to hand back to codel? Otherwise we push the queue to zero only to have codel come in and let it grow back to target (well approximately).
>> No, because at that moment we can only assume that it is the heavy pressure of BLUE that is keeping the queue under control.  Again, this is an aspect of Codel’s behaviour which should not be duplicated in its alternate, because it depends on assumptions which have been demonstrated not to hold.
>> BLUE doesn’t even start backing off until it sees the queue empty, so for the simple and common case of an isochronous flow (or a steady flood limited by some upstream capacity), BLUE will rapidly increase its drop rate until the queue stops being continuously full.  In all likelihood the queue will now slowly and steadily drain until it is empty.  But the load is still there, so if BLUE stopped dropping entirely at that point, the queue would almost instantly be full again and it would have to ramp up from scratch.
>> Instead, BLUE backs off slightly and waits to see if the queue *remains* empty during its timeout.  If so, it backs off some more.  As long as the queue is still serviced while BLUE’s drop probability is nonzero, it will back down all the way to zero *if* the traffic has really gone away or become responsive enough for Codel to deal with.
>> Hence BLUE will hand control back to Codel only when it is sure its extra effort is not required.
> How about testing having BLUE back off not when the queue is empty, but when the queue is down to the codel target.
> If its efforts are still needed, it will ramp back up, but waiting for the queue to go all the way to zero repeatedly, with codel still running, seems like a pessimistic way to go that will over-penalize a flow that eventually corrects itself.
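The BLUE behaviour debated above can be sketched as a small controller. This is a minimal illustration, not Cake's or COBALT's code: the class name BlueState and the values of increment, decrement and freeze_time are hypothetical. The down_trigger field captures the point of disagreement. A value of 0 models backing off only when the queue is found empty after service. A larger value, such as the queue length that corresponds to Codel's target, models the proposed alternative.

```python
import random
from dataclasses import dataclass


@dataclass
class BlueState:
    """Toy BLUE controller; all parameter values are hypothetical."""
    p_drop: float = 0.0           # current drop probability
    increment: float = 1 / 256    # step up on queue overflow (assumed)
    decrement: float = 1 / 4096   # step down when queue is at/below trigger (assumed)
    freeze_time: float = 0.1      # min seconds between adjustments (assumed)
    down_trigger: int = 0         # queue length (packets) at/below which BLUE backs off
    last_update: float = float("-inf")

    def on_overflow(self, now: float) -> None:
        # Queue hit its limit: raise the drop probability.
        # Adjustments in either direction are rate-limited by freeze_time.
        if now - self.last_update >= self.freeze_time:
            self.p_drop = min(1.0, self.p_drop + self.increment)
            self.last_update = now

    def on_dequeue(self, now: float, qlen_after: int) -> None:
        # Queue was serviced and left at/below the trigger: back off slightly.
        if (qlen_after <= self.down_trigger and self.p_drop > 0.0
                and now - self.last_update >= self.freeze_time):
            self.p_drop = max(0.0, self.p_drop - self.decrement)
            self.last_update = now

    def should_drop(self, rng: random.Random) -> bool:
        # Drop an arriving packet with probability p_drop.
        return rng.random() < self.p_drop
```

With down_trigger=0, a queue that hovers a few packets deep never lets the drop probability decay. With a higher trigger, BLUE starts backing off while a standing queue still exists.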

I can think of three cases where BLUE might trigger on a flow which is, in fact, responsive.

1) The true RTT is *much* longer than the estimated RTT.  This could reasonably happen if the estimated RTT is “regional” grade but the flow is going over a satellite link or two.  Since Cake auto-tunes the queue limit with the estimated RTT in mind, a TCP flow in slow start with a very long RTT can overflow that queue, even though Codel is frantically signalling to it over ECN.  The damage here is limited to a few widely spaced dropped packets, which are easily retransmitted without RTOs, and a somewhat excessive (temporary) reduction in the congestion window.
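A rough back-of-envelope check of case 1 follows. It rests on two assumptions that are not Cake's actual rules. First, a hypothetical queue limit of one bandwidth-delay product at the *estimated* RTT. Second, the common approximation that slow start can leave about one BDP at the *true* RTT queued before any congestion signal takes effect. The 30 ms "regional"-class estimate and the 600 ms satellite RTT are illustrative values only.

```python
def queue_limit_bytes(rate_bps: float, est_rtt_s: float) -> float:
    # Hypothetical auto-tuning rule: one BDP at the estimated RTT.
    return rate_bps / 8 * est_rtt_s


def slow_start_peak_queue_bytes(rate_bps: float, true_rtt_s: float) -> float:
    # Rough model: in its last slow-start round the sender emits about twice
    # the path BDP, so roughly one BDP (at the true RTT) lands in the queue
    # before the ECN signal can take effect one true RTT later.
    return rate_bps / 8 * true_rtt_s


def overflows(rate_bps: float, est_rtt_s: float, true_rtt_s: float) -> bool:
    # True if the slow-start burst would exceed the auto-tuned limit.
    return (slow_start_peak_queue_bytes(rate_bps, true_rtt_s)
            > queue_limit_bytes(rate_bps, est_rtt_s))
```

At 10 Mbit/s this gives a limit of about 37.5 kB against a peak of about 750 kB, so the flow bounces off the tail of the queue despite being ECN-responsive.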

2) Something downstream is munging the ECN bits and erasing Codel’s signal.  This should be rare in practice, but it effectively leaves BLUE in sole charge of managing the queue length via packet drops.  It should be adequate at that job.
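Case 2 can be illustrated with the ECN field itself. Per RFC 3168, the ECN codepoint occupies the two low bits of the IPv4 TOS / IPv6 Traffic Class byte: 00 Not-ECT, 01 ECT(1), 10 ECT(0), 11 CE. The sketch below shows an AQM setting CE and a hypothetical "bleaching" middlebox clearing the field, so the receiver never sees the mark and never echoes it. The function names are illustrative only.

```python
ECN_MASK = 0b11
DSCP_MASK = 0xFC  # upper six bits of the TOS / Traffic Class byte
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def ecn_field(tos: int) -> int:
    # Extract the two-bit ECN codepoint.
    return tos & ECN_MASK


def aqm_mark(tos: int) -> int:
    # An AQM may only set CE on ECN-capable packets;
    # a Not-ECT packet would have to be dropped instead.
    if ecn_field(tos) in (ECT0, ECT1):
        return tos | CE
    return tos


def bleach_ecn(tos: int) -> int:
    # Hypothetical misbehaving middlebox: zeroes the ECN field, keeps DSCP.
    return tos & DSCP_MASK
```

A packet sent as ECT(0) (0x02) leaves the AQM as CE (0x03) and arrives as Not-ECT (0x00), so the only congestion signal that survives is an actual drop.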

3) The flow is managed by DCTCP rather than a standard TCP.  DCTCP’s response to ECN signalling is much softer than standard, and I suspect Codel will be unable to control it, because Codel doesn’t signal the way DCTCP expects (DCTCP expects something more akin to a specially configured RED).  However, DCTCP is (supposed to be) responsive to packet drops to the standard extent, so as in case 2, BLUE will take sole charge.
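The "much softer" response in case 3 can be quantified with the DCTCP rules from RFC 8257. alpha is an EWMA (gain g = 1/16) of the fraction of marked traffic per window, and on congestion cwnd is scaled by (1 - alpha/2). This contrasts with the Reno-style halving that RFC 3168 prescribes for a standard TCP. Codel marks sparsely: roughly once per interval at first, with the gap shrinking only as interval/sqrt(count). So the scenario below assumes about one mark per 100-packet window. It counts packets rather than bytes, and the starting alpha is a simplification chosen for the demo.

```python
def update_alpha(alpha: float, marked: int, acked: int, g: float = 1 / 16) -> float:
    """RFC 8257 EWMA: alpha <- (1 - g) * alpha + g * (fraction marked this window)."""
    frac = marked / acked if acked else 0.0
    return (1 - g) * alpha + g * frac


def dctcp_cwnd_after_mark(cwnd: float, alpha: float) -> float:
    """DCTCP reduction: cwnd <- cwnd * (1 - alpha / 2)."""
    return cwnd * (1 - alpha / 2)


def reno_cwnd_after_mark(cwnd: float) -> float:
    """Standard response: treat CE like a loss and halve cwnd."""
    return cwnd / 2


def sparse_marking_demo(cwnd: int = 100, marks_per_window: int = 1, windows: int = 200):
    # Hypothetical scenario: a Codel-like AQM marks ~1 packet per window.
    alpha = 0.0  # assumed starting value for the demo
    for _ in range(windows):
        alpha = update_alpha(alpha, marks_per_window, cwnd)
    return alpha, dctcp_cwnd_after_mark(cwnd, alpha), reno_cwnd_after_mark(cwnd)
```

Here alpha settles near 0.01, so each mark trims a 100-packet window to about 99.5 packets, whereas a standard TCP would drop to 50. Sparse Codel marking therefore barely restrains DCTCP.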

I don’t really see how leaving Codel “primed” to take over from BLUE helps in any of these situations.  However, Codel’s additional action superimposed on BLUE might result in a stable state emerging, rather than the queue length oscillating between “empty” and “full”.  This is best achieved by having BLUE’s down-trigger (the queue level at which it starts backing off) *below* Codel’s target.

 - Jonathan Morton
