[Cake] cake's flaws

Wed Jun 1 12:09:14 EDT 2016

I just got back from vacation - and did not intend message 1 to come
across as this cranky.

Certainly my take on "finishing cake" was to get more users using it
and providing feedback. Getting it into lede mainline will bring in
more users and make it possible for me and others to easily test
again. Once it has shown at least a few benefits, and perhaps grown or
lost a few more features, it can be pushed towards mainline linux.

I applaud this. Keep at it. No matter how grouchy I sound below, I am
rooting for y'all to get it right.

But:

*My* principal design goal for "cake" was to pour the existing
sqm-scripts into C, where it would be faster, and to scale well across
a bandwidth range of 0-40Gbit+ with older and modern hardware.
Nobody else here seems to need performance much higher than a few
Mbit, and that colors your viewpoints. I'd at least like to get
inbound shaping to work well at 400Mbit...

Cake *was* - last June - significantly faster than htb + fq_codel,
enough so to do 100Mbit inbound queue management where htb+fq_codel
fell over at 60Mbit on things like the wndr3800 and archer c7v2. No
longer. It benched as slower when I last benched it (in December),
with only incremental, sub-percentage-point improvements or
disimprovements, and with many issues in the codel implementation
"presumed" fixed.

It was a huge percentage slower than pfifo_fast on 10GigE and higher.

At which point I gave up, went back to htb+fq_codel, and focused all
my energies on building up my ability to work on wifi, where we are
now showing comfortable order-of-magnitude gains and real progress.

I do see that many - undertested - features "improving" this, that, or
the other thing have landed since I last paid attention. I do fear it
is expected of me and Toke to take cake through a serious string of
tests, and my pushback has been to ask that those working on it and
testing it first work with the fleet of flent servers worldwide to
test at least the codel implementation, and preferably have a few
boxes locally to be able to test other features, or something in the
cloud, perhaps leveraging mahi-mahi or some other framework.

It will be easier for me to do drive-by tests of cake again once it
hits lede mainline.

Open issues:

cake still lacks a "sqm" mode - 3 tiers of shaping - which makes it
impossible to benchmark properly vs the sqm-scripts. I still see no
proof that more tiers help in any way, nor any testing or proof that
it (or the hfsc-fq_codel stuff that landed more recently) is
any better. fq_codel's natural characteristics solve for VOIP just
fine, in particular. 3 tiers has been enough for every other qdisc
(:cough: pfifo_fast, mqprio) since the dawn of linux time.

I also preferred to statically generate the parameters for each
diffserv-related model, saving tons of code AND resulting in shared
data for it (increasingly important with hw mq). I also thought ISPs
and some users would want a more strict prio queue model available,
similar to what free.fr is using, which makes managing tv multicast
easier.
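
To make the "statically generated" idea concrete, here is a rough
sketch (Python for brevity, not kernel C). The three tier names and
the choice of which codepoints land in which tier are illustrative
assumptions, not the tables cake or the sqm-scripts actually use; only
the DSCP numbers themselves (CS1=8, EF=46, CS6=48, CS7=56) are
standard values. The point is that the table is built once and then
only read, so every queue can share it.

```python
# Hypothetical 3-tier model; names and assignments are assumptions.
BACKGROUND, BEST_EFFORT, PRIORITY = 0, 1, 2

# Standard DSCP codepoints: CS1=8, EF=46, CS6=48, CS7=56.
_SPECIAL = {8: BACKGROUND, 46: PRIORITY, 48: PRIORITY, 56: PRIORITY}

# Built once at import time: a 64-entry read-only table indexed by
# DSCP, shared by every queue instead of recomputed per instance.
DSCP_TO_TIER = tuple(_SPECIAL.get(dscp, BEST_EFFORT) for dscp in range(64))


def tier_for_tos(tos_byte: int) -> int:
    """Map an IPv4 TOS / IPv6 traffic-class byte to a tier.

    The DSCP is the upper six bits; the lower two are ECN and ignored.
    """
    return DSCP_TO_TIER[(tos_byte >> 2) & 0x3F]
```

For example, tier_for_tos(0xB8) (EF) lands in the priority tier and
tier_for_tos(0x20) (CS1) in background; everything unlisted is best
effort.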

I think the quantum should be even more dynamic than it is today,
scaling up to 3028 as sch_fq does (say, starting at 200-500mbit), and
it should go back to peeling less hard. I am aware I am the one that
ripped it out (in favor of testing better what we had)...
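
One way to read the quantum suggestion, as a sketch only: hold the
quantum at one full frame up to some rate, then ramp it linearly to
3028 bytes. The 200Mbit and 500Mbit breakpoints are hypothetical
picks from the range mentioned above, the 1514-byte floor is an
assumption, and the function name is made up. It ignores whatever
scaling is done at low rates.

```python
MTU_QUANTUM = 1514             # one full Ethernet frame (assumed floor)
MAX_QUANTUM = 3028             # two frames, the ceiling named above
RAMP_START_BPS = 200_000_000   # hypothetical: ramp begins at 200Mbit
RAMP_END_BPS = 500_000_000     # hypothetical: ramp ends at 500Mbit


def flow_quantum(rate_bps: int) -> int:
    """Return a per-flow quantum in bytes for a given shaped rate."""
    if rate_bps <= RAMP_START_BPS:
        return MTU_QUANTUM
    if rate_bps >= RAMP_END_BPS:
        return MAX_QUANTUM
    # Integer linear interpolation between the two breakpoints.
    span = RAMP_END_BPS - RAMP_START_BPS
    extra = (MAX_QUANTUM - MTU_QUANTUM) * (rate_bps - RAMP_START_BPS) // span
    return MTU_QUANTUM + extra
```

Whether a linear ramp is the right shape, and where it should start,
is exactly the kind of thing that needs benchmarking.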

I do not see any proof that the triple isolation mode for torrents
does any better than the regular mode for torrents, against real
torrents - or any other forms of normal multi-user traffic. Somebody
prove that cake's mode for this actually makes a difference, please.

I thought the invsqrt cache was pointless, and most of the other
tweaks to codel needed testing and evaluation, and all of them cost
cpu.
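
For anyone who has not looked at this part of codel: the control law
schedules the next drop at interval/sqrt(count) after the current one,
and an invsqrt cache just precomputes 1/sqrt(count) for small counts.
A floating-point sketch follows; the kernel does this in fixed point
with a Newton step, and the cache size of 16 and all names here are
assumptions for illustration.

```python
import math

INTERVAL_US = 100_000   # codel's usual 100 ms interval, in microseconds
CACHE_SIZE = 16         # hypothetical cache size

# Precomputed 1/sqrt(count) for counts 1..CACHE_SIZE-1; index 0 unused.
INVSQRT_CACHE = [0.0] + [1.0 / math.sqrt(n) for n in range(1, CACHE_SIZE)]


def invsqrt(count: int) -> float:
    """Return 1/sqrt(count), from the table when count is small."""
    if 0 < count < CACHE_SIZE:
        return INVSQRT_CACHE[count]
    return 1.0 / math.sqrt(count)


def next_drop_time(now_us: int, count: int) -> int:
    """Codel control law: the drop spacing shrinks as count grows."""
    return now_us + int(INTERVAL_US * invsqrt(count))
```

The cached and uncached paths return the same value, so the only
thing the table can buy is cycles, which is a measurable claim.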

Register usage was poor on arm and mips architectures.

I did not see a functional use for the rate estimator.

I felt nearly all of the statistics collection could be dropped.

In terms of API - the rate limiter does not work above 40Gbit.

*All* the new hardware I have played with of late does 4 or more
hardware queues (on inbound and outbound), and finding ways to handle
those within a single qdisc across those cpus sounds like an
increasingly good idea. They are doing that for CPU efficiency, not
QoS.

That said, even basic support for BQL has been lacking in those arches
(I'm looking at you, linksys ac1200!).

And I'd hoped that sane ways of leveraging cake from an ISP's
perspective would emerge, which seems to involve lots more tc or
iptables magic yet to be written.

Conceptually I do love the idea of a set associative cache, but as for
actual measurements of its helpfulness, I have very little to show
for its benefits thus far.
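
As a reminder of what "set associative" means for flow queues, here is
a toy sketch: the flow hash picks a small set of queues, and within
that set a flow first looks for its own tag, then for an idle queue,
and only shares a queue when the whole set is busy. The sizes (8 ways,
1024 queues), the class, and the field names are all assumptions for
illustration, not cake's data structures.

```python
WAYS = 8            # hypothetical associativity
NUM_QUEUES = 1024   # hypothetical; must be a multiple of WAYS


class FlowTable:
    def __init__(self):
        self.tags = [None] * NUM_QUEUES   # flow hash owning each queue
        self.backlog = [0] * NUM_QUEUES   # caller-maintained byte counts

    def lookup(self, flow_hash: int):
        """Return (queue index, collided) for a flow hash."""
        direct = flow_hash % NUM_QUEUES
        base = direct - direct % WAYS
        # 1. Hit: the flow already owns a queue in its set.
        for i in range(base, base + WAYS):
            if self.tags[i] == flow_hash:
                return i, False
        # 2. Miss: claim any idle (empty) queue in the set.
        for i in range(base, base + WAYS):
            if self.backlog[i] == 0:
                self.tags[i] = flow_hash
                return i, False
        # 3. Set is full: fall back to sharing the direct-mapped queue.
        return direct, True
```

The interesting number to measure is how often step 3 is reached
under real traffic compared with plain direct-mapped hashing.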

Please keep banging the rocks together, and *please* benchmark the
thing at higher rates and RTTs.