[Cake] [Codel] Proposing COBALT

Kevin Darbyshire-Bryant kevin at darbyshire-bryant.me.uk
Mon Jun 27 11:18:57 EDT 2016

On 27/06/16 04:56, Jonathan Morton wrote:
>> On 4 Jun, 2016, at 22:55, Jonathan Morton <chromatix99 at gmail.com> wrote:
>> COBALT should turn out to be a reasonable antidote to sender-side cheating, due to the way BLUE works; the drop probability remains steady until the queue has completely emptied, and then decays slowly.  Assuming the congestion-control response to packet drops is normal, BLUE should find a stable operating point where the queue is kept partly full on average.  The resulting packet loss will be higher than for a dumb FIFO or a naive ECN AQM, but lower than for a loss-based AQM with a tight sojourn-time target.
>> For this reason, I’m putting off drafting such an explanation to Valve until I have a chance to evaluate COBALT’s performance against the faulty traffic.
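
The BLUE behaviour described in the quote above (probability held steady while the queue is merely non-empty, raised on overflow, decayed slowly once the queue has drained) can be sketched roughly as follows. This is a minimal illustration, not the code in the cobalt branch: the names `blue_state`, `blue_on_overflow`, `blue_on_empty` and `blue_should_drop`, and the constants, are all assumptions chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning values, chosen only for illustration. */
#define BLUE_P_INC       (UINT32_MAX / 256)   /* step up on queue overflow      */
#define BLUE_P_DEC       (UINT32_MAX / 4096)  /* smaller step down: slow decay  */
#define BLUE_INTERVAL_NS 100000000ULL         /* minimum gap between updates    */

struct blue_state {
    uint32_t p_drop;          /* drop probability, scaled to 0..UINT32_MAX */
    uint64_t last_update_ns;  /* time of the last probability change       */
};

/* Queue overflowed: raise the drop probability, saturating at the maximum. */
static void blue_on_overflow(struct blue_state *b, uint64_t now_ns)
{
    if (now_ns - b->last_update_ns < BLUE_INTERVAL_NS)
        return;  /* too soon since the last change: hold steady */
    if (UINT32_MAX - b->p_drop < BLUE_P_INC)
        b->p_drop = UINT32_MAX;
    else
        b->p_drop += BLUE_P_INC;
    b->last_update_ns = now_ns;
}

/* Queue drained completely: lower the drop probability, flooring at zero. */
static void blue_on_empty(struct blue_state *b, uint64_t now_ns)
{
    if (now_ns - b->last_update_ns < BLUE_INTERVAL_NS)
        return;
    b->p_drop = (b->p_drop < BLUE_P_DEC) ? 0 : b->p_drop - BLUE_P_DEC;
    b->last_update_ns = now_ns;
}

/* Drop decision; rnd is a uniform 32-bit random number supplied by the caller.
 * ECN capability is deliberately not consulted: the drop is ECN-blind. */
static bool blue_should_drop(const struct blue_state *b, uint32_t rnd)
{
    return rnd < b->p_drop;
}
```

Between those two events nothing changes `p_drop`, which is why a sender that ignores ECN marks still sees a steady loss rate until it backs off far enough to let the queue empty.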
> The COBALTified Cake is now working quite nicely, after I located and excised some annoying lockup bugs.  As a side-effect of these fixes (which introduced a third, lightly-serviced flowchain for “decaying flows”, which are counted as “sparse” in the stats report), the sparse and bulk flow counts should be somewhat less jittery and more useful.
Encouraging.  I've a patch ready to go into LEDE for the tc side of 
things and I've just locally built a new package for the 'cake/cobalt' 
alternative implementation.  I was just about to issue the pull request 
into LEDE for the tc update when a thought occurred.....
> I replaced the defunct “last_len” stat with a new “un_flows”, meaning “unresponsive flows”, to indicate when the BLUE part of COBALT is active.  This lights up nicely when passing Steam traffic, which no longer has anywhere near as detrimental an effect on my Internet connection as it did with only Codel; this indicates that BLUE’s ECN-blind dropping is successfully keeping the upstream queue empty.  (Of course it wouldn’t help against a UDP flood, but nothing can do that in this topology.)
Bearing in mind 'last_len' was originally a u32, and something makes me 
think that 'unresponsive flows' really shouldn't go above 65535...and 
all the other flow category stats are u16..... would it be better to 
split the u32 space into two u16, i.e. u16 un_flows and u16 spare?  It's 
a very minor nitpick but I think worth doing now as it highlights potential 
spare space should another type of flow stat be required.
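
To make the suggestion concrete, here is a rough sketch of the split. The struct names and the `fill_stats` helper are hypothetical, made up for this example; they are not the real sch_cake or tc stats definitions, which have many more fields.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical before/after layouts showing only the slot under discussion. */
struct stats_old {
    uint32_t last_len;   /* defunct field */
};

struct stats_new {
    uint16_t un_flows;   /* unresponsive flows, same width as the other flow counts */
    uint16_t spare;      /* reserved for a future flow stat; kept at zero */
};

/* The split must not change the size of the stats blob. */
_Static_assert(sizeof(struct stats_old) == sizeof(struct stats_new),
               "splitting the u32 must keep the layout size");

/* Saturate a wider internal counter into the u16 field. */
static inline uint16_t clamp_u16(uint32_t n)
{
    return n > UINT16_MAX ? UINT16_MAX : (uint16_t)n;
}

/* Hypothetical helper: populate the new layout from an internal count. */
static struct stats_new fill_stats(uint32_t unresponsive)
{
    struct stats_new s;
    memset(&s, 0, sizeof(s));          /* spare stays zero */
    s.un_flows = clamp_u16(unresponsive);
    return s;
}
```

One caveat with this approach: a tc binary that still reads the slot as a single u32 would see a value that depends on byte order, so the kernel and tc sides would want to change together.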

Similarly I updated the 'no last_len' pull request into sch_cake which I 
think is more in line with the direction taken on 'cobalt' - will only 
merge with permission.  I'd be happy to also update the LEDE related 
package to point to that merge should it happen.

How do you feel about switching that package to the cobalt variant for 
wider stress testing?


> While working on this, I also noticed that the triple-isolation logic is probably quite CPU-intensive.  It should be feasible to do better, so I’ll have a go at that soon.  Also on the to-do list is enhancing the overhead logic with new data, and adding a three-class Diffserv mode which Dave has wanted for a while.
> I’ve also come up with a tentative experimental setup to test the “85% rule” more robustly than the Chinese paper found recently.  I should be able to do it with just three hosts, one having dual NICs, and using only Cake and netem qdiscs.
> Now if only the sauna were not the *coolest* part of my residence right now…
>   - Jonathan Morton
