[Cerowrt-devel] [Cake] openwrt build available with latest cake and fq_pie

Sun Jun 14 13:19:15 EDT 2015

> On 14 Jun, 2015, at 19:09, Dave Taht <dave.taht at gmail.com> wrote:
> 
> I do pretty strongly think count - 1 is the rightest thing still.

I really don’t.  Here’s why:

Every time Codel triggers the dropping state, it will mark or drop at least one packet, and increment count by that number.  With count decremented only by 1 on recovery, it will effectively remain constant *if*, by some miracle, the queue empties before the second signal is sent; it cannot decrease between episodes unless it resets or wraps.

With count decremented by 2 on recovery, it is possible for count to decrease slowly in that ideal case, but it’ll remain constant if two signals are sent before the queue clears, and - this is important - it will always continue to increase if three or more signals are sent before the queue empties.
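
The arithmetic for the two additive schemes can be sketched as a toy model in Python.  This is not Codel's code: run_episodes and its parameters are hypothetical names, and the model assumes count resumes from its previous value minus a fixed decrement (floored at 1, which is an assumption), then rises by one for every signal sent in the episode.

```python
def run_episodes(count, decrement, signals_per_episode, episodes):
    """Return the value of count at the end of each congestion episode."""
    history = []
    for _ in range(episodes):
        # On recovery, resume from the previous count minus the decrement.
        # The floor of 1 is an assumption of this sketch.
        count = max(1, count - decrement)
        # Each mark or drop sent during the episode increments count by one.
        count += signals_per_episode
        history.append(count)
    return history


print(run_episodes(10, 1, 1, 5))  # count - 1, one signal per episode
print(run_episodes(10, 2, 1, 5))  # count - 2, one signal per episode
print(run_episodes(10, 2, 2, 5))  # count - 2, two signals per episode
print(run_episodes(10, 2, 3, 5))  # count - 2, three signals per episode
```

Under those assumptions the first and third runs stay at 10, the second falls by one per episode, and the fourth rises by one per episode.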

If one signal did suffice to clear the queue, then logically the value of count was irrelevant to that congestion episode and shouldn’t be preserved.  This is true regardless of the actual reason the queue emptied.

The problem arises when more than one signal is sent before the queue is observed to clear.  This could be a sign of several distinct network conditions:

- The RTT is longer than interval / sqrt(count), in which case one signal would still have been sufficient, and the ideal value of count is less than its current value.  On non-ECN TCP flows, this results in more retransmissions than necessary.

- The RTT is much shorter than interval / sqrt(count), so the congestion window is recovering faster than the signalling rate, and count needs to increase to compensate for that.

- There is more than one flow sharing the queue, and it was necessary to signal to all of them, in which case count should reflect the flow count and be capable of adjusting both up and down.

- The flow is unresponsive, so count should adjust to provide the correct dropping rate, and RTT is irrelevant.  With default parameters, the maximum drop rate is presently 25600 pps (which would cause count to wrap after a few seconds, until I put in the saturating arithmetic).
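
For reference, the interval / sqrt(count) spacing used in the cases above, and the difference between a wrapping and a saturating counter, can be sketched in Python.  The function names are hypothetical, the 100 ms interval is assumed to be the default, the 50 ms RTT is arbitrary, and the counter width is deliberately tiny for illustration; it is not the width cake actually uses.

```python
import math

INTERVAL = 0.100  # seconds; assumed default interval


def signal_gap(count, interval=INTERVAL):
    """Spacing between successive signals: interval / sqrt(count)."""
    return interval / math.sqrt(count)


def saturating_increment(count, max_count):
    """Increment count, sticking at max_count instead of wrapping to 0."""
    return count + 1 if count < max_count else max_count


def wrapping_increment(count, max_count):
    """Increment count the way a fixed-width unsigned counter would."""
    return (count + 1) % (max_count + 1)


rtt = 0.050  # seconds; an arbitrary example RTT
for count in (1, 4, 16, 64):
    gap = signal_gap(count)
    relation = "longer" if rtt > gap else "not longer"
    print(f"count={count:3d} gap={gap * 1000:6.2f} ms  RTT is {relation} than the gap")

HYPOTHETICAL_MAX = 255  # small width chosen only to keep the demo readable
print(wrapping_increment(HYPOTHETICAL_MAX, HYPOTHETICAL_MAX))    # wraps to 0
print(saturating_increment(HYPOTHETICAL_MAX, HYPOTHETICAL_MAX))  # stays at 255
```

The loop shows how quickly the gap between signals shrinks as count grows, which is why the same count can be too aggressive for a long-RTT flow and too timid for a short-RTT one.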

How does Codel distinguish between those cases?  It can’t - at least, not reliably.  So it must allow count to increase until the queue is observed to be controlled, and then decrease count by some other means to cover the case where it was overestimated.  For this latter phase, count-2 is obviously insufficient to cope with the case where count is actually correct, but more than one signal per episode is required.

*That* is why I put in count/2.  A multiplicative decrease allows count to stabilise at some value which adequately controls the queue, rather than continuously increasing past it.  For the typical cake case where there is one flow per Codel instance and the RTT is of Internet scale, this should work at least as well as an additive decrease; in particular, the behaviour is identical where count ended at 2, 3 or 4 (it can’t end at 1).
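
A rough Python sketch of that comparison follows.  It is a toy model rather than the cake code: the function names are hypothetical, both schemes are assumed to floor the resumed count at 1, and every episode is assumed to need the same number of signals.

```python
def resume_halve(count):
    """Multiplicative decrease: resume from count / 2 (floor of 1 assumed)."""
    return max(1, count // 2)


def resume_minus_two(count):
    """Additive decrease: resume from count - 2 (floor of 1 assumed)."""
    return max(1, count - 2)


def settle(resume, signals_per_episode, count=2, episodes=50):
    """Run repeated episodes and return the final value of count."""
    for _ in range(episodes):
        count = resume(count) + signals_per_episode
    return count


# Both schemes resume from the same value when count ended at 2, 3 or 4.
for ended_at in (2, 3, 4):
    print(ended_at, resume_halve(ended_at), resume_minus_two(ended_at))

# With three signals needed per episode, compare where count ends up.
print(settle(resume_halve, 3), settle(resume_minus_two, 3))
```

In this model, halving settles at roughly twice the number of signals needed per episode, whatever the starting value, while count - 2 keeps climbing by one per episode for as long as three signals are needed each time.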

Of course, hard data would help to evaluate it, but I do think it’s theoretically sound.

 - Jonathan Morton