I wanted to look at a few things: cpu usage, 4 different tcps, different server schedulers, ecn vs non-ecn, sqm fq_codel simplest.qos vs cake, etc, etc. For starters I just did tcp_ndown tests of 128 flows to see what happened. I also tried to capture tcp_cwnd and related stats, but those seem not to be plotting in flent....

My first objective, of course, was to make sure upstream cake didn't crash. It didn't. No real surprises: cake ecn causes some more collateral inter-packet latency damage, and cake (aside from general cpu over-usage) is a mild win across the board... including, surprisingly, tail drop queue depth (see below).

metric ton of flent files: http://www.taht.net/~d/cake_128flows.tgz (script therein)

topology: server -> apu2 -> switch -> client (no switch between the apu2 and server)

A couple of notes:

1) ecn_vs_bbr_ineffective.png: vs bbr in ecn mode (thus rendering codel or cake ineffective and reverting to tail drop), "sqm simplest.qos" at this speed (100mbit) uses a packet limit that is too large compared to what cake uses. I will argue in favor of using the new "memlimit" parameter to fq_codel in the sqm scripts to better limit the buffer size. bbr is remarkably good for tail drop, especially given how gnarly and unreasonable this set of tests is. I was tempted to cut cake's memlimit value in half... but certainly also to tweak sqm. See assumptiontaildrop.png.

Also, cake oscillates interestingly against itself in ecn vs noecn on a couple of tests and tcps. bbr noecn wins in terms of queue depth when cake's aqm is doing the dropping...

2) cpuwise, on egress, cake besteffort flows bandwidth 100mbit is ~5-20% slower than the equivalent htb+fq_codel. gso-splitting or not does not appear to change cpu much at these speeds, but I imagine it will start to matter as we approach a gbit.

3) cake starts marking ecn sooner than fq_codel does (gso? cobalt? no idea).
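To make the memlimit argument above concrete, here's a hand-rolled sketch (not the actual sqm-scripts change) of bounding both queues by bytes rather than packets. eth0 and the 4mb figure are placeholders, and note that fq_codel's knob is spelled memory_limit in tc (kernel/iproute2 4.8 or later):

```shell
# Sketch only; placeholder interface and byte limit, not the real
# sqm-scripts patch.

# htb + fq_codel, roughly what simplest.qos builds, but with an
# explicit byte cap instead of the default (large) packet limit:
tc qdisc replace dev eth0 root handle 1: htb default 1
tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit
tc qdisc add dev eth0 parent 1:1 fq_codel memory_limit 4mb ecn

# cake already scales its memory limit with the shaped rate, but it
# can be pinned explicitly for an apples-to-apples comparison:
tc qdisc replace dev eth0 root cake bandwidth 100mbit besteffort memlimit 4mb
```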
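Side note on the tcp_cwnd stats mentioned at the top: the sketch below is roughly how I'd expect the capture to be invoked (hostname and run length are placeholders; this is from memory, not the script in the tgz). If I recall correctly, flent's --socket-stats only parses ss output for flows that originate locally, which may be why a pure ndown run has nothing to plot:

```shell
# Hedged sketch, not the script from the tarball. server.example.net
# and -l 60 are placeholders. --socket-stats polls ss(8) during the
# run, which is what populates flent's tcp_cwnd/tcp_rtt series (for
# locally-originated flows only).
flent tcp_ndown -H server.example.net -l 60 \
      --socket-stats \
      --test-parameter download_streams=128 \
      -t "cake-128flows"
```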
Cake's earlier marking doesn't do any good on this 128-flows-through-100mbit test (at least on everything but bbr), as I think we are bound by the once-per-RTT cap on cwnd reductions. bbr uses gain to control the pacing rate and largely ignores cwnd. I'm thinking that getting more than one CE per observed RTT should lead to even larger rate reductions in cubic, etc.: more CEs per RTT means your RTT estimate is way out of whack. Too bad we can't have fractional cwnds!

4) You can clearly see the effect of the giant GSO burp from starting 128 flows at the same time; not that anybody sane would do that....

anyway, back to the day job

--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619