[Cerowrt-devel] The next slice of cake

Tue Mar 17 16:08:39 EDT 2015

After far too long, it looks like I’ll have the opportunity to work on sch_cake a bit more.  So here’s a little bit of a “state of the union” speech about what we’ve got and what I’m planning to add to it.

So far we’ve got a deficit-mode, non-bursting shaper that works pretty well, and an integrated implementation of fq_codel that tunes itself (that is, the target delay) to the bandwidth set on the shaper.  The configuration is “as easy as cake”; the intention is that you can just specify one parameter (the bandwidth to shape at) and leave everything else at the defaults; there simply aren’t very many visible knobs, because they aren’t needed.
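To make "non-bursting shaper" and "target tuned to the bandwidth" concrete, here is a minimal Python sketch. It is not cake's code. It reads "deficit-mode" as a virtual-clock shaper: each packet pushes the next allowed send time forward by its own serialization time, and idle periods are never banked as credit. The target rule is also an assumption for illustration: keep CoDel's usual 5 ms, but raise it to 1.5 MTU serialization times when the link is slow enough that 5 ms is less than about one packet. `MTU_BYTES`, `codel_target_ns` and `VirtualClockShaper` are hypothetical names.

```python
NS_PER_S = 1_000_000_000
MIN_TARGET_NS = 5_000_000   # assumed floor: CoDel's customary 5 ms target
MTU_BYTES = 1514            # assumed maximum Ethernet frame size


def serialization_ns(nbytes: int, rate_bps: int) -> int:
    """Time needed to put nbytes on the wire at rate_bps, in nanoseconds."""
    return nbytes * 8 * NS_PER_S // rate_bps


def codel_target_ns(rate_bps: int) -> int:
    """Assumed rule: 5 ms, or 1.5 MTU transmission times if that is longer."""
    return max(MIN_TARGET_NS, serialization_ns(MTU_BYTES, rate_bps) * 3 // 2)


class VirtualClockShaper:
    """Releases each packet no earlier than the previous packet's finish time."""

    def __init__(self, rate_bps: int):
        self.rate_bps = rate_bps
        self.time_next_ns = 0  # earliest time the next packet may leave

    def may_send(self, now_ns: int) -> bool:
        return now_ns >= self.time_next_ns

    def on_send(self, now_ns: int, nbytes: int) -> None:
        # Start from whichever is later, so time spent idle never becomes burst credit.
        start = max(now_ns, self.time_next_ns)
        self.time_next_ns = start + serialization_ns(nbytes, self.rate_bps)
```

Under the assumed rule the target stays at 5 ms down to roughly 3.6 Mbit/s (where 1.5 × 1514 bytes takes 5 ms), and grows above that only on slower links. This is why a single bandwidth parameter can be enough to configure both the shaper and the AQM.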

We’ve also got Diffserv classification, and that part hasn’t been so successful.  Each class grabs all traffic carrying some subset of the codepoints and stuffs it into a separate shaper+fq_codel instance, and the higher-priority shapers steal bandwidth from the lower ones to enforce priority.  High-priority classes can only use a limited amount of bandwidth, exactly as specified in generic Diffserv PHBs.

It works exactly as designed, but the resulting behaviour isn’t particularly desirable from an end-user perspective.  In particular, people run tests using best-effort traffic to see how much bandwidth they’re getting, resulting in complaints that cake had to be given a bigger number to get the correct throughput - which of course also stops it from functioning correctly when background traffic is added to the mix.  So that needed a rethink.

Incidentally, the existing Diffserv implementation can be disabled by specifying the “besteffort” keyword.  This lumps all traffic into a single class, handled by a single shaper at the configured rate.  Cake already works pretty well in that mode; sometimes I turn the shaper down to analogue-modem speeds and note, with some satisfaction, that everything *still* works.  Except YouTube, but that’s only because streaming video really does need more than analogue-modem bandwidth.

As for performance, I’m able to make my ancient Pentium-MMX shape at over 50 Mbps (traffic in both directions combined) between two bridged Fast Ethernet cards.  This limitation is probably a combination of timer latency and context-switch overhead.  I don’t expect it to improve much, unless we find a way to seriously reduce those overheads (which are already quite low for a modern desktop OS).  A faster machine with better timers gets better performance, of course.

So there are two big things I want to change in the next version:

The easy part (at least in terms of how many unknowns there are) is adjusting the flow-queueing part so that it uses set-associative hashing instead of straight hashing when selecting a queue.  This should reduce the incidence of hash collisions considerably for a given number of flow queues, or conversely provide equivalent collision performance with a smaller number of queues.
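A minimal Python sketch of what set-associative queue selection could look like. The table geometry (1024 queues in 8-way sets), the reuse policy (a way is free once its queue is empty) and the fallback index on a genuine collision are all assumptions for illustration. The flow hash is assumed to be computed elsewhere, for example from the packet's 5-tuple. `SetAssociativeFlowTable` is a hypothetical name.

```python
class SetAssociativeFlowTable:
    """Hypothetical set-associative mapping from flow hashes to queue indices."""

    def __init__(self, n_queues: int = 1024, ways: int = 8):
        if n_queues % ways:
            raise ValueError("n_queues must be a multiple of ways")
        self.ways = ways
        self.n_sets = n_queues // ways
        self.tags = [None] * n_queues  # full flow hash currently owning each queue
        self.backlog = [0] * n_queues  # packets queued; 0 means the queue may be reused

    def _lookup(self, flow_hash: int) -> int:
        base = (flow_hash % self.n_sets) * self.ways
        ways = range(base, base + self.ways)
        # 1. This flow already owns a queue in its set.
        for i in ways:
            if self.tags[i] == flow_hash:
                return i
        # 2. Otherwise claim any idle queue in the same set.
        for i in ways:
            if self.backlog[i] == 0:
                self.tags[i] = flow_hash
                return i
        # 3. Every way is busy with other flows: a genuine collision, so share one.
        return base + (flow_hash // self.n_sets) % self.ways

    def enqueue(self, flow_hash: int) -> int:
        idx = self._lookup(flow_hash)
        self.backlog[idx] += 1
        return idx

    def dequeue(self, idx: int) -> None:
        self.backlog[idx] -= 1
```

With 16 queues and direct hashing, flows hashing to 3 and 19 would both land in queue 3. In the sketch, with four sets of four ways, they share set 3 but get separate queues (12 and 13). Two flows only collide once more than four active flows map to the same set. The cost is a short linear search over the ways on every lookup.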

The more interesting part is to rework the Diffserv prioritiser so that it behaves more usefully.  I think I’ve hit upon the right idea to make this work in practice: rather than hard-shaping each class individually, use the shaper logic only as a threshold function between high and low priority, and implement a single shaper to handle all traffic.  The priority function can then be handled by a weighted DRR system - which is already in place, but doesn’t do much - with one small modification: the weights change based on the shaper state.

So high-priority traffic gets high priority - but only if it limits itself to a reasonable bandwidth.  Above that bandwidth, it gets low priority, but is still able to use the full shaped bandwidth if nobody else contends for it.  And (unlike, say, HFSC) we need precisely two parameters per class to do this, both specified as ratios rather than hard bandwidth numbers: a bandwidth share (which determines both the shaper setting and the low-priority-mode DRR weighting) and a priority factor (which determines the high-priority-mode DRR weighting).  So if those knobs end up being exposed to userspace, they’ll be easier to understand and thus use correctly.
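Here is a rough Python sketch of the two-parameter scheme, under several assumptions. Each class keeps its own virtual-clock "shaper", but it never delays packets. It only decides whether the class currently gets its priority factor or its bandwidth share as its DRR weight. The DRR loop is an fq_codel-style deficit rotation whose quantum is scaled by that weight. A single overall shaper, not shown here, would decide when `dequeue()` may be called at all. The class names, weights, the 1514-byte base quantum and every identifier below are hypothetical, and shares are assumed to be positive.

```python
from collections import deque
from dataclasses import dataclass, field

NS_PER_S = 1_000_000_000


@dataclass
class TrafficClass:
    """One Diffserv class described by two ratios (values here are assumptions)."""
    name: str
    share: float            # bandwidth share: threshold rate and low-priority weight
    priority_factor: float  # weight while the class stays under its threshold
    rate_bps: int = 0       # threshold rate = share * total rate (share must be > 0)
    time_next_ns: int = 0   # virtual clock of the per-class threshold "shaper"
    deficit: int = 0
    queue: deque = field(default_factory=deque)  # queued packet sizes in bytes

    def within_threshold(self, now_ns: int) -> bool:
        return now_ns >= self.time_next_ns

    def weight(self, now_ns: int) -> float:
        # The per-class shaper only selects a weight; it never holds packets back.
        return self.priority_factor if self.within_threshold(now_ns) else self.share

    def charge(self, now_ns: int, nbytes: int) -> None:
        start = max(now_ns, self.time_next_ns)
        self.time_next_ns = start + nbytes * 8 * NS_PER_S // self.rate_bps


class ThresholdDRR:
    """Deficit rotation whose per-class quantum is scaled by the current weight."""

    def __init__(self, classes, total_rate_bps: int, base_quantum: int = 1514):
        self.classes = classes
        for c in classes:
            c.rate_bps = int(total_rate_bps * c.share)
        self.base_quantum = base_quantum
        self.cursor = 0

    def _advance(self) -> None:
        self.cursor = (self.cursor + 1) % len(self.classes)

    def dequeue(self, now_ns: int):
        """Return (class name, bytes) for the next packet, or None if all are empty."""
        if not any(c.queue for c in self.classes):
            return None
        while True:
            c = self.classes[self.cursor]
            if not c.queue:
                c.deficit = 0  # idle classes don't bank credit
                self._advance()
            elif c.deficit <= 0:
                c.deficit += max(1, int(self.base_quantum * c.weight(now_ns)))
                self._advance()
            else:
                nbytes = c.queue.popleft()
                c.deficit -= nbytes
                c.charge(now_ns, nbytes)
                return c.name, nbytes
```

With an assumed "voice" class (share 0.25, priority factor 4.0) and "bulk" class (share 0.75, priority factor 1.0), voice's first refill is four times larger than bulk's, so it goes first. Once voice overruns its 25% threshold, its refills shrink to its share weight. Bulk still gets the leftover capacity whenever voice is quiet, because nothing is hard-limited.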

All of this feeds my main goal with Diffserv, which is to start giving applications natural incentives to mark their traffic appropriately.  Each class has both an advantage, and a tradeoff which must be accepted to realise that advantage.  If you need absolutely minimal latency, you can choose a high-priority class, but you’ll have to be frugal about bandwidth.  If you need maximum throughput, you’ll have to put up with reduced priority compared to latency-sensitive traffic.  And if you want to be altruistic, you can choose to mark your stuff as bulk, background traffic, and it’ll be treated accordingly.  All of this is in accordance with existing RFCs.

A small caveat: cake is not designed for wifi.  It’s designed for links that can at least be treated as full-duplex to a close approximation.  Shared-medium links *can* behave like that, if they’re shaped to a miserly enough degree, but we really need something different for wifi - although several of cake’s components and ideas could be used in such a qdisc.

Roll on cake3.

 - Jonathan Morton