[Cake] Cake latency update

Pete Heist peteheist at gmail.com
Sun Feb 12 07:43:21 EST 2017


As an update on this, I now suspect a problem with either the Ethernet hardware or (more likely) the sky2 driver on ‘mbp’, my 2007 MBP that acts as the Flent server and where I’m often running a qdisc. I should have looked at dmesg earlier, as there are log entries like this:

-----
[  221.478753] eth0: hw csum failure
[  221.478756] CPU: 1 PID: 1890 Comm: netserver Tainted: G        W       4.8.0-37-generic #39-Ubuntu
[  221.478757] Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS    MBP41.88Z.00C1.B03.0802271651 02/27/08
[  221.478762]  0000000000000286 000000003844a735 ffff9c293fd03ba8 ffffffffb5c30e12
[  221.478765]  ffff9c293a505000 ffffffffb66fb5c0 ffff9c293fd03bc0 ffffffffb5f7c028
[  221.478769]  ffff9c29399ea800 ffff9c293fd03be0 ffffffffb5f71f26 af75267500000000
[  221.478770] Call Trace:
[  221.478775]  <IRQ>  [<ffffffffb5c30e12>] dump_stack+0x63/0x81
[  221.478778]  [<ffffffffb5f7c028>] netdev_rx_csum_fault+0x38/0x40
[  221.478781]  [<ffffffffb5f71f26>] __skb_checksum_complete+0xb6/0xc0
…
[  226.478373] net_ratelimit: 386 callbacks suppressed
[  226.478378] eth0: hw csum failure
[  226.479523] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.8.0-37-generic #39-Ubuntu
[  226.479527] Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS    MBP41.88Z.00C1.B03.0802271651 02/27/08
[  226.479533]  0000000000000286 f78e43dca42a09d0 ffff9c293fd03b88 ffffffffb5c30e12
[  226.479542]  ffff9c293a505000 ffffffffb66fb5c0 ffff9c293fd03ba0 ffffffffb5f7c028
[  226.479549]  ffff9c2932093b00 ffff9c293fd03bc0 ffffffffb5f71f26 46898f6100000000
[  226.479557] Call Trace:
[  226.479560]  <IRQ>  [<ffffffffb5c30e12>] dump_stack+0x63/0x81
[  226.479581]  [<ffffffffb5f7c028>] netdev_rx_csum_fault+0x38/0x40
-----

What’s interesting is that they only occur during testing, and only when QoS with rate limiting is applied (Cake, or HTB+X as well). It’s also interesting that they fall on exact 5 second boundaries: not necessarily every 5 seconds, since the gap is sometimes 10 or 15 seconds, but always a multiple of 5 seconds. I went back and looked at my results, and realized that a very large number of the latency and throughput shifts I saw are also quantized to 5 second intervals. I don’t think that’s a coincidence.
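
The 5 second spacing can be checked mechanically against a saved dmesg dump. The following is only a minimal sketch: the function names, the 0.05 second tolerance and the embedded sample (copied from the log excerpt above) are assumptions made for illustration, not part of any existing tool.

```python
import re

# Matches lines such as "[  221.478753] eth0: hw csum failure" and
# captures the kernel timestamp in seconds.
CSUM_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\S+: hw csum failure")


def csum_failure_times(dmesg_text):
    """Return the timestamps of all 'hw csum failure' lines, in log order."""
    times = []
    for line in dmesg_text.splitlines():
        match = CSUM_RE.match(line.strip())
        if match:
            times.append(float(match.group(1)))
    return times


def gaps_on_interval(times, interval=5.0, tolerance=0.05):
    """For each pair of consecutive events, return (gap, on_grid).

    on_grid is True when the gap is within `tolerance` seconds of a
    positive whole multiple of `interval` (5, 10, 15, ... by default).
    """
    result = []
    for prev, cur in zip(times, times[1:]):
        gap = cur - prev
        multiple = round(gap / interval)
        on_grid = multiple >= 1 and abs(gap - multiple * interval) <= tolerance
        result.append((gap, on_grid))
    return result


# Hypothetical sample input: a few lines taken from the excerpt above.
SAMPLE = """\
[  221.478753] eth0: hw csum failure
[  221.478756] CPU: 1 PID: 1890 Comm: netserver Tainted: G        W       4.8.0-37-generic #39-Ubuntu
[  226.478373] net_ratelimit: 386 callbacks suppressed
[  226.478378] eth0: hw csum failure
"""

if __name__ == "__main__":
    for gap, on_grid in gaps_on_interval(csum_failure_times(SAMPLE)):
        print(f"gap {gap:.3f} s, multiple of 5 s: {on_grid}")
```

To run it on a real log, the text of a saved dmesg dump would be passed to csum_failure_times() in place of SAMPLE. One caveat: the net_ratelimit line in the excerpt shows that the kernel was suppressing repeated messages, so the spacing of the log entries may partly reflect that rate limiting rather than the timing of the failures themselves.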

I saw that Dave posted about a similar 'hw csum failure' that he saw on a raspi earlier in 2016:

https://github.com/raspberrypi/linux/issues/1371

but I’ve since also seen more reports of this over the years, with no clear solution.

I don’t know why I saw it more with Cake than with other qdiscs, but I think it’s safe to say there’s no point in you trying to reproduce this until I can get past this problem with my hardware. I’m also likely going to have to re-run all of my tests once this is sorted out.

Pete

> On Feb 10, 2017, at 1:21 PM, Pete Heist <peteheist at gmail.com> wrote:
> 
> 
>> On Feb 10, 2017, at 12:35 PM, Sebastian Moeller <moeller0 at gmx.de> wrote:
>> 
>> Hi Pete,
>> 
>>> On Feb 10, 2017, at 12:08, Pete Heist <peteheist at gmail.com> wrote:
>>> 
>>> Not a problem. I’ll run a spread of Cake and fq_codel over Ethernet at various bandwidths. It will be through their Apple USB Ethernet adapters (used now for management), which are also connected through a switch, but I think that setup should be fine for this purpose. Should be done in an hour or so and we’ll see…
>> 
>> 	I believe the Apple USB dongles are Fast Ethernet only, at least the USB2 types I have available here. That would work for your tested bandwidths, but it will not allow you to test at what shaper rate things go pear-shaped… Also, since Wi-Fi creates a bit more CPU load than wired Ethernet, it _might_ make sense to concurrently exercise the Wi-Fi cards just to re-create the SIRQ load (but probably not as the first experiment ;) ).
>> 
>> Best Regards
>> 	Sebastian 
> 
> Hi Sebastian, yes, they’re only 100 Mbit, but that’s enough to cover the rates where I was seeing the problem with Wi-Fi. Also in my test setup there are four nodes connected as described under Configuration #1:
> 
> http://www.drhleny.cz/bufferbloat/wifi_bufferbloat.html
> 
> I’m running Cake on ‘mini’ and ‘mbp’, and the Wi-Fi radios are only on ‘om1’ and ‘om2’, so the CPU load shouldn’t be different for mini and mbp when connected directly via Ethernet, instead of via Ethernet and a Wi-Fi link, I suppose.
> 
> I think we just wanted to see if the throughput shifting would reproduce over Ethernet at the same rates, but so far it didn’t for me, although there are other anomalies that don’t look like the throughput shifts I sent before (there’s a throughput anomaly for Cake 20Mbit and latency anomalies for fq_codel 60Mbit and 90Mbit):
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_10mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_20mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_30mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_40mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_50mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_60mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_70mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_75mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_80mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_85mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_90mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/cake_hd-eth_100mbit/index.html
> 
> fq_codel:
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_10mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_20mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_30mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_40mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_50mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_60mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_70mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_75mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_80mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_85mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_90mbit/index.html
> 
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth_100mbit/index.html
> 
> So that suggests that the throughput shifting problem may also be somehow related to Wi-Fi. I’m still going to be testing Chaos Calmer, as well as two Ubiquiti NanoStation M5’s, though this will take some more time. We might learn some more from this, or if you can reproduce it with ath9k hardware that would be good too...
> 
> Thanks,
> Pete
> 
