I wanted to look at a few things: cpu usage, 4 different tcps, different server schedulers, ecn vs non-ecn, sqm fq_codel simplest.qos vs cake, etc, etc. For starters I just did tcp_ndown tests of 128 flows to see what happened. I also tried to capture tcp_cwnd and related stats, but those seem not to be plotting in flent....

My first objective, of course, was to make sure upstream cake didn't crash. It didn't. No real surprises: cake ecn causes some more collateral inter-packet latency damage, and cake (aside from general cpu over-usage) is a mild win across the board... including, surprisingly, tail drop queue depth (see below).

metric ton of flent files: http://www.taht.net/~d/cake_128flows.tgz (script therein)

topology: server -> apu2 -> switch -> client (no switch between the apu2 and server)

A couple of notes:

1) ecn_vs_bbr_ineffective.png: vs bbr in ecn mode (thus rendering codel or cake ineffective and reverting to tail drop), "sqm simplest.qos" at this speed (100mbit) uses a packet limit that is too large compared to what cake uses. I will argue in favor of using the new "memlimit" parameter to fq_codel in the sqm scripts to better limit the buffer size. bbr is remarkably good for tail drop, especially given how gnarly and unreasonable this set of tests is. I was tempted to cut cake's memlimit value in half... but certainly also to tweak sqm. See assumptiontaildrop.png.

Also, cake oscillates interestingly against itself in ecn vs noecn on a couple of tests and tcps. bbr noecn wins in terms of queue depth when cake's aqm is doing the dropping...

2) cpuwise, on egress, cake besteffort flows bandwidth 100mbit is ~5-20% slower than the equivalent htb+fq_codel. gso-splitting or not does not appear to change cpu much at these speeds, but I imagine it will start to matter as we approach a gbit.

3) cake starts marking ecn sooner than fq_codel does (gso? cobalt? no idea).
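To make the memlimit argument above concrete, here's a hand-rolled sketch (not the actual sqm-scripts change) of bounding both queues by bytes rather than packets. eth0 and the 4mb figure are placeholders, and note that fq_codel's knob is spelled memory_limit in tc (kernel/iproute2 4.8 or later):

```shell
# Sketch only; placeholder interface and byte limit, not the real
# sqm-scripts patch.

# htb + fq_codel, roughly what simplest.qos builds, but with an
# explicit byte cap instead of the default (large) packet limit:
tc qdisc replace dev eth0 root handle 1: htb default 1
tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit
tc qdisc add dev eth0 parent 1:1 fq_codel memory_limit 4mb ecn

# cake already scales its memory limit with the shaped rate, but it
# can be pinned explicitly for an apples-to-apples comparison:
tc qdisc replace dev eth0 root cake bandwidth 100mbit besteffort memlimit 4mb
```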
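Side note on the tcp_cwnd stats mentioned at the top: the sketch below is roughly how I'd expect the capture to be invoked (hostname and run length are placeholders; this is from memory, not the script in the tgz). If I recall correctly, flent's --socket-stats only parses ss output for flows that originate locally, which may be why a pure ndown run has nothing to plot:

```shell
# Hedged sketch, not the script from the tarball. server.example.net
# and -l 60 are placeholders. --socket-stats polls ss(8) during the
# run, which is what populates flent's tcp_cwnd/tcp_rtt series (for
# locally-originated flows only).
flent tcp_ndown -H server.example.net -l 60 \
      --socket-stats \
      --test-parameter download_streams=128 \
      -t "cake-128flows"
```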
Cake's earlier marking doesn't do any good on this 128-flows-through-100mbit test (at least on everything but bbr), as I think we are bound by the once-per-RTT cap on cwnd reductions. bbr uses gain to control the pacing rate and largely ignores cwnd. I'm thinking that getting more than one CE per observed RTT should lead to even larger rate reductions in cubic, etc.: more CEs per RTT means your RTT estimate is way out of whack. Too bad we can't have fractional cwnds!

4) You can clearly see the effect of the giant GSO burp from starting 128 flows at the same time; not that anybody sane would do that....

anyway, back to the day job

--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619