http://www.drhleny.cz/bufferbloat/cake/round1/

Round 1 Tarball: http://www.drhleny.cz/bufferbloat/cake/round1.tgz

Round 0 Tarball (previous run): http://www.drhleny.cz/bufferbloat/cake/round0.tgz

*** Notes/Analysis ***

* Do the new bql tests show the effectiveness of cake's TSO/GSO/GRO "peeling" vs fq_codel? Or am I just seeing an mq artifact on my 4-queue device?

http://www.drhleny.cz/bufferbloat/cake/round1/bql_csrt_rrulbe_eg_fq_codel_nolimit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/bql_csrt_rrulbe_eg_cakeeth_nolimit/index.html

* Cake holds TCP RTT to half that of fq_codel at 10mbit bandwidth. I like to call this technique of rate limiting well below the interface's maximum "over-limiting", and it seems to work well with stable point-to-point WiFi connections. (Point-to-multipoint or unstable rates require the new ath9k/ath10k driver changes instead, as limiting this way would not be effective there; that's well explained here: https://www.youtube.com/watch?v=Rb-UnHDw02o)

http://www.drhleny.cz/bufferbloat/cake/round1/eg_csrt_rrulbe_eg_sfq_10.0mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/eg_csrt_rrulbe_eg_fq_codel_10.0mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/eg_csrt_rrulbe_eg_cakeeth_10.0mbit/index.html
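For reference, the egress setups being compared here look roughly like this (eth0 stands in for the real interface, and I'm showing HTB as the rate limiter for the non-cake qdiscs, which may differ in detail from what my scripts actually do):

  # cake, using its built-in shaper
  tc qdisc replace dev eth0 root cake bandwidth 10mbit

  # fq_codel (or sfq) as a leaf under an HTB rate limiter
  tc qdisc replace dev eth0 root handle 1: htb default 1
  tc class add dev eth0 parent 1: classid 1:1 htb rate 10mbit
  tc qdisc add dev eth0 parent 1:1 fq_codel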
* Cake at 950mbit performed just as well as fq_codel, vs the round0 runs where fq_codel had a bit of an advantage. Perhaps the addition of the "ethernet" keyword did this?

http://www.drhleny.cz/bufferbloat/cake/round1/eg_csrt_rrulbe_eg_fq_codel_950mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/eg_csrt_rrulbe_eg_cakeeth_950mbit/index.html

** I'm finding the "32 Flows, RRUL Best-Effort" tests fascinating to look at. It might be possible to spot implementation differences between fq_codel and cake from them.

* At 10mbit, cake and fq_codel are better at most things than sfq by an order of magnitude or more. Interestingly, though, at this bandwidth fq_codel's results look a bit better than cake's: total bandwidth is higher (4.78/9.12mbit for fq_codel vs 3.91/8.63mbit for cake), ping latency is a bit lower (1.79ms vs 1.92ms), and TCP RTT is significantly better (~30ms vs ~45ms). Maybe cake's "ethernet" keyword affects a test like this disproportionately at these low bandwidths?

http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_sfq_10.0mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_fq_codel_10.0mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_cakeeth_10.0mbit/index.html

* At 100mbit, the situation reverses, with fq_codel's TCP RTT above 10ms and cake's around 4.75ms.

http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_fq_codel_100mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_cakeeth_100mbit/index.html
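In case anyone wants to reproduce the 32/32 flow runs, I believe something like the following flent invocation comes close to what my harness does (the rrul_be_nflows test name, stream counts and server host are my shorthand here, not copied from the scripts):

  flent rrul_be_nflows -H <netperf-server> -l 60 \
      --test-parameter upload_streams=32 \
      --test-parameter download_streams=32 \
      -t 32flows_eg_cakeeth_100mbit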
* And then above 200mbit, fq_codel performs considerably better than cake in the 32/32 flow tests. At 900mbit, UDP/ping latency is 1.1ms for fq_codel and 10ms for cake, and TCP RTT is ~6.5ms for fq_codel and ~12ms for cake. Dave's earlier explanation probably applies here: "Since fq_codel supports superpackets and cake peels them, we have a cpu and latency hit that originates from that. Also the codel-derived algorithm in cake differs quite significantly from mainline codel, and my principal gripe about it has been that it has not been extensively tested against higher delays."

http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_fq_codel_900mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_cakeeth_900mbit/index.html

* On the cake RTT tests, we take about a 15% hit in total TCP throughput at rtt 1ms vs rtt 10ms (1454mbit vs 1700mbit), and a 55% hit at rtt 100us (which is why you'd probably only consider that on 10gbit links). If we don't remove the "ethernet" keyword altogether, I guess I'd like to see it be at least 10ms, as TCP RTT only goes from around 0.8ms to 1.8ms, which I don't think makes a huge latency difference in real-world terms. Or it might be another argument for removing datacentre, ethernet and metro altogether, because there are tradeoffs to decide about.

http://www.drhleny.cz/bufferbloat/cake/round1/cake_rtt_10ms_rrulbe_eg_cake_900mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/cake_rtt_1ms_rrulbe_eg_cake_900mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/cake_rtt_100us_rrulbe_eg_cake_900mbit/index.html
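These three runs only vary cake's rtt parameter at the same shaped rate, i.e. roughly (eth0 again a placeholder):

  tc qdisc replace dev eth0 root cake bandwidth 900mbit rtt 10ms
  tc qdisc replace dev eth0 root cake bandwidth 900mbit rtt 1ms
  tc qdisc replace dev eth0 root cake bandwidth 900mbit rtt 100us

If I remember right, the default without any keyword corresponds to rtt 100ms (the "internet" setting).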
* I wonder if the UDP flood tests really work at 900mbit:

http://www.drhleny.cz/bufferbloat/cake/round1/udpflood_eg_fq_codel_900mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/udpflood_eg_cakeeth_900mbit/index.html

* As before, I'm surprised that srchost/dsthost is much more fair than dual-srchost/dual-dsthost. The numbers that follow are 1-flow/12-flow throughput: for srchost/dsthost it's 413/439mbit up and 413/447mbit down, while for dual-srchost/dual-dsthost it's 126/647mbit up and 77/749mbit down. Rampant speculation: does this have to do with the "peeling"? And should we (do we?) even do peeling with soft rate limiting? I think I saw it help in the bql tests, but I'm not sure I've seen it help when rate limited below the interface's rate.

http://www.drhleny.cz/bufferbloat/cake/round1/hostiso_eg_cake_src_cake_dst_900mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/hostiso_eg_cake_dsrc_cake_ddst_900mbit/index.html

* I still need a better understanding of what triple-isolate does; it isn't clear to me from the man page. The results here are similar to dual-srchost/dual-dsthost:

http://www.drhleny.cz/bufferbloat/cake/round1/hostiso_eg_cake_dsrc_cake_ddst_900mbit/index.html


*** Round 2 Plans ***

- Add bql tests anywhere rate limiting is used
- Add the ethernet keyword to the host isolation tests
- Add ethtool output to the host info (rough commands sketched at the end of this mail)
- Remove or improve the flow isolation tests
- Add host isolation tests with rtt variation (to look again at a problem I reported in an earlier thread)

*** Future Plans ***

- Use netem to make a spread of rtts and bandwidths
- Add VoIP tests (I hope to do this with irtt)
- Add ack filtering tests
- Test BBR
- Use qemu to test other archs (I may never get to this, honestly)
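For the ethtool item above (and as a sanity check on the bql setup), this is roughly what I plan to record per host, with eth0 as a placeholder; the 20ms/5ms netem values are just an example of the kind of rtt variation I have in mind:

  # offload settings relevant to peeling (TSO/GSO/GRO)
  ethtool -k eth0 | grep -E 'segmentation-offload|receive-offload'

  # current BQL limits, per hardware queue
  grep . /sys/class/net/eth0/queues/tx-*/byte_queue_limits/limit*

  # netem delay with jitter, applied somewhere on the path rather than on the qdisc under test
  tc qdisc add dev eth0 root netem delay 20ms 5ms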