[Cake] cake flenter results round 1

Pete Heist peteheist at gmail.com
Mon Nov 27 06:04:19 EST 2017


http://www.drhleny.cz/bufferbloat/cake/round1/

Round 1 Tarball: http://www.drhleny.cz/bufferbloat/cake/round1.tgz

Round 0 Tarball (previous run): http://www.drhleny.cz/bufferbloat/cake/round0.tgz

*** Notes/Analysis ***

* Do the new bql tests show the effectiveness of cake’s TSO/GSO/GRO “peeling” vs fq_codel, or am I seeing an mq artifact on my 4-queue device?

http://www.drhleny.cz/bufferbloat/cake/round1/bql_csrt_rrulbe_eg_fq_codel_nolimit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/bql_csrt_rrulbe_eg_cakeeth_nolimit/index.html
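To help tell a real peeling effect from an mq artifact, one idea is to watch the per-queue BQL state during a run. This is just a sketch; “eth0” is a placeholder, and the tx-0..tx-3 layout is an assumption based on my 4-queue device:

```shell
#!/bin/sh
# Sketch: dump per-TX-queue BQL state. On an mq device each hardware
# TX queue has its own BQL limit, so per-queue differences here would
# point at an mq artifact rather than a qdisc difference.
for q in /sys/class/net/eth0/queues/tx-*; do
    printf '%s: limit=%s inflight=%s\n' "$q" \
        "$(cat "$q/byte_queue_limits/limit")" \
        "$(cat "$q/byte_queue_limits/inflight")"
done
```

Sampling this a few times per second alongside flent would show whether the limits settle differently under cake vs fq_codel.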

* Cake holds TCP RTT to half that of fq_codel at 10mbit bandwidth. I like to call this technique of rate limiting well below the interface’s maximum “over-limiting”, which seems to work well with stable point-to-point WiFi connections. (Of course, point-to-multipoint or unstable rates require the new ath9k/10k driver changes, as limiting in this way would not be effective; well explained here: https://www.youtube.com/watch?v=Rb-UnHDw02o):

http://www.drhleny.cz/bufferbloat/cake/round1/eg_csrt_rrulbe_eg_sfq_10.0mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/eg_csrt_rrulbe_eg_fq_codel_10.0mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/eg_csrt_rrulbe_eg_cakeeth_10.0mbit/index.html
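For reference, the shaping setups behind runs like these look roughly as follows. A sketch, not the exact test harness commands: “eth0” is a placeholder, and the fq_codel case assumes an HTB parent for the rate limit, since fq_codel has no shaper of its own:

```shell
# cake: built-in shaper, set well below the link rate ("over-limiting")
tc qdisc replace dev eth0 root cake bandwidth 10mbit

# fq_codel: no built-in shaper, so pair it with HTB (or another shaper)
tc qdisc replace dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 10mbit
tc qdisc add dev eth0 parent 1:10 fq_codel
```

The integrated shaper is part of why cake is attractive here: one qdisc replaces the HTB+fq_codel pair.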

* Cake at 950mbit performed just as well as fq_codel, versus the round 0 runs, where fq_codel had a bit of an advantage. Perhaps the addition of the “ethernet” keyword accounts for this?

http://www.drhleny.cz/bufferbloat/cake/round1/eg_csrt_rrulbe_eg_fq_codel_950mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/eg_csrt_rrulbe_eg_cakeeth_950mbit/index.html

** I’m finding the “32 Flows, RRUL Best-Effort” tests fascinating to look at. It might be possible to spot implementation differences between fq_codel and cake from these.

* At 10mbit, cake and fq_codel are better at most things than sfq by an order of magnitude or more. Interestingly, though, at this bandwidth fq_codel’s results look a bit better than cake’s: total bandwidth is higher (4.78/9.12mbit for fq_codel vs 3.91/8.63mbit for cake), ping latency a bit lower (1.79ms vs 1.92ms), and TCP RTT significantly better (~30ms vs ~45ms). Maybe cake’s “ethernet” keyword affects a test like this disproportionately at low bandwidths?

http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_sfq_10.0mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_fq_codel_10.0mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_cakeeth_10.0mbit/index.html

* At 100mbit, the situation reverses, with fq_codel’s TCP RTT above 10ms and cake’s around 4.75ms.

http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_fq_codel_100mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_cakeeth_100mbit/index.html

* And then above 200mbit, fq_codel performs considerably better than cake in the 32/32 flow tests. At 900mbit, UDP/ping is 1.1ms for fq_codel and 10ms for cake. TCP RTT is ~6.5ms for fq_codel and ~12ms for cake. Dave’s earlier explanation probably applies here: "Since fq_codel supports superpackets and cake peels them, we have a cpu and latency hit that originates from that. Also the code derived algorithm in cake differs quite significantly from mainline codel, and my principal gripe about it has been that it has not been extensively tested against higher delays."

http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_fq_codel_900mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/32flows_eg_cakeeth_900mbit/index.html
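One way to test the peeling part of that hypothesis (my idea, not something from Dave’s mail) would be to re-run these with offloads disabled, so neither qdisc sees superpackets. A sketch, with “eth0” as a placeholder:

```shell
# Disable offloads so packets reach the qdisc already segmented;
# if peeling is the cost, cake's gap to fq_codel should shrink.
ethtool -K eth0 tso off gso off gro off
# Verify the new settings:
ethtool -k eth0 | grep -E 'tcp-segmentation|generic-(segmentation|receive)'
```

Throughput will likely drop for both qdiscs from the extra per-packet cost, but the relative gap is what would be informative.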

* On the Cake RTT tests, we take about a 15% hit in total TCP throughput at rtt 1ms vs rtt 10ms (1454mbit vs 1700mbit), and a 55% hit at rtt 100us (which is why you’d probably only consider that on 10gbit links). If we don’t remove the ‘ethernet’ keyword altogether, I’d at least like to see it be 10ms, as TCP RTT only goes from around 0.8ms to 1.8ms, which I don’t think makes a huge latency difference in real-world terms. Or this may be another argument for removing datacentre, ethernet and metro altogether, since there are tradeoffs to weigh.

http://www.drhleny.cz/bufferbloat/cake/round1/cake_rtt_10ms_rrulbe_eg_cake_900mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/cake_rtt_1ms_rrulbe_eg_cake_900mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/cake_rtt_100us_rrulbe_eg_cake_900mbit/index.html
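For anyone reproducing these, the three runs above correspond to cake’s explicit rtt parameter, which sets the AQM target/interval directly instead of via a named profile. A sketch, with “eth0” as a placeholder:

```shell
# Explicit-rtt forms of the three 900mbit runs above:
tc qdisc replace dev eth0 root cake bandwidth 900mbit rtt 10ms
tc qdisc replace dev eth0 root cake bandwidth 900mbit rtt 1ms
tc qdisc replace dev eth0 root cake bandwidth 900mbit rtt 100us
```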

* I wonder if the UDP flood tests really work at 900mbit:

http://www.drhleny.cz/bufferbloat/cake/round1/udpflood_eg_fq_codel_900mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/udpflood_eg_cakeeth_900mbit/index.html

* As before, I’m surprised that srchost/dsthost is much more fair. The numbers that follow are 1-flow/12-flow throughput. For srchost/dsthost it’s 413/439mbit up and 413/447mbit down, while for dual-srchost/dual-dsthost it’s 126/647mbit up and 77/749mbit down. Rampant speculation: does this have to do with the “peeling”? And should we (do we even?) peel with soft rate limiting? I think I saw it help with bql(?), but I’m not sure I’ve seen it help when rate limited below the interface’s rate.

http://www.drhleny.cz/bufferbloat/cake/round1/hostiso_eg_cake_src_cake_dst_900mbit/index.html
http://www.drhleny.cz/bufferbloat/cake/round1/hostiso_eg_cake_dsrc_cake_ddst_900mbit/index.html
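For context, the two setups being compared correspond to cake’s host-isolation keywords. A sketch of the egress side only; “eth0” is a placeholder, and the dsthost/dual-dsthost variants would go on the reverse direction:

```shell
# srchost: fairness between local source hosts only
tc qdisc replace dev eth0 root cake bandwidth 900mbit srchost

# dual-srchost: fairness between source hosts first, then
# per-flow fairness within each host's share
tc qdisc replace dev eth0 root cake bandwidth 900mbit dual-srchost
```

Naively, the dual- variants should be the fairer ones per host, which is what makes the numbers above surprising.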

* I still need a better understanding of what triple-isolate does, as it isn’t clear to me from the man page. The results here are similar to dual-srchost/dual-dsthost:

http://www.drhleny.cz/bufferbloat/cake/round1/hostiso_eg_cake_dsrc_cake_ddst_900mbit/index.html


*** Round 2 Plans ***

- Add bql tests to anywhere rate limiting is used
- Add ethernet keyword to host isolation tests
- Add ethtool output to host info
- Remove or improve flow isolation tests
- Add host isolation tests with rtt variation (to look again at the problem I reported in an earlier thread)

*** Future Plans ***

- Use netem to make a spread of rtts and bandwidths
- Add VoIP tests (I hope to do this with irtt)
- Add ack filtering tests
- Test BBR
- Use qemu to test other archs (I may never get to this, honestly)


