[Cake] some comprehensive arm64 w/cake results

David P. Reed dpreed at deepplum.com
Mon Sep 18 16:24:50 EDT 2023

On Monday, September 18, 2023 3:50pm, "dave seddon via Cake" <cake at lists.bufferbloat.net> said:

> G'day Mr David Reed,
> Thanks for the comments.
> Definitely agree with your sentiments and the tests definitely do NOT
> simply represent Intel versus ARM.
> Perhaps I should have been more clear about the objectives of the testing:

It's just an issue I'm sensitive to, because throughout my career I've read "Brand X is slow" when the test was actually testing something else. (An annoying post popped up on Medium today that claimed "WebAssembly doesn't speed up Web applications" based on a badly designed Linux Foundation-commissioned study that the poster misunderstood. The poster also seemed to think that running web applications using the laptop's cycles is bad compared to running web applications exclusively on the server in the cloud). This already had me in a sour mood.

> I'm curious to understand the performance of these lower end SoC devices,
> because these are the types of devices that act as home gateway routers, as
> access points, and such.  There are many many millions of these devices out
> there and I don't know how well understood their performance is:
> e.g. How bad is my Spectrum Internet cable modem?
> e.g. I have a Unifi security gateway and its "smart queue" performance is
> pretty poor ( <200 Mb/s ).  Why is it so poor?

I'm curious, too! We know that on older home routers, with really slow MIPS processors, Cake struggles with GigE. As these old MIPS designs get phased out and replaced by ARM designs, it will matter.
Raspberry Pi 4s just aren't very good at networking because of the I/O architecture on the board, just as they are slow at USB in general. That's why the CM4 is interesting. It's also notable that the PiHole has gotten so popular; it would run better on a Pi with a better network architecture.

> Obviously, with real servers ( and even virtual AWS ones ) which have real
> NICs, you get things like multi-queues with RSS, and a lot more tuning
> knobs, and so they can go a lot faster.
> In the tests so far, the Asus CN60 device with the r8169 performs pretty
> well, where the NIC is likely to be contributing positively.  The default
> configuration has a bunch of off-loading enabled:
> root@asus-cn60-2:/home/das# ethtool --show-features enp1s0 | grep ": on"
> rx-checksumming: on
> tx-checksumming: on
> tx-checksum-ipv4: on
> tx-checksum-ipv6: on
> generic-receive-offload: on
> rx-vlan-offload: on
> tx-vlan-offload: on
> highdma: on [fixed]
> However, based on these initial tests, which are not complete, it's
> certainly curious that the Pi4 is doing ~923Mbit/s with pfifo_fast and then
> doing significantly less ( ~621 Mbits/sec ) with cake.  I'm interested to
> understand this in more detail, where DaveT has recommended adding 20ms or
> 40ms.  The cake tests so far had rtt 1ms and rtt 3ms, which might be too
> low.  ( If it is too low, then maybe it would make sense to remove "rtt lan
> = rtt 1ms" option, as it's a misleading configuration option? )
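
As a rough aid for thinking about the rtt values discussed above, here is a minimal sketch of the bandwidth-delay product: how much data has to be in flight to keep a link full at a given round-trip time. The 923 Mbit/s figure is the pfifo_fast result quoted above; the 1514-byte frame size and the function names are assumptions for illustration only, and the sketch says nothing about how cake behaves internally.

```python
# Bandwidth-delay product sketch (illustrative; not taken from the test setup).
FRAME_BYTES = 1514  # assumed full-size Ethernet frame, without preamble/FCS


def bdp_bytes(rate_mbit: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to fill a link of rate_mbit at rtt_ms."""
    return rate_mbit * 1e6 / 8 * (rtt_ms / 1e3)


def bdp_frames(rate_mbit: float, rtt_ms: float) -> float:
    """The same quantity expressed in assumed full-size frames."""
    return bdp_bytes(rate_mbit, rtt_ms) / FRAME_BYTES


if __name__ == "__main__":
    # RTTs mentioned in the thread: 1 ms and 3 ms so far, 20 ms and 40 ms planned.
    for rtt_ms in (1, 3, 20, 40):
        kilobytes = bdp_bytes(923, rtt_ms) / 1e3
        frames = bdp_frames(923, rtt_ms)
        print(f"rtt {rtt_ms:>2} ms: {kilobytes:8.1f} kB, about {frames:6.0f} frames")
```

At 1 ms the pipe holds only a few dozen full-size frames, so there is very little room between "link full" and "queue empty" on a path that short.
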
> Definitely, during the testing these little devices have the NIC IRQs all
> going through core 0, so I want to explore tuning options.
> root@rpi4b:/home/das# cat /proc/interrupts | grep -E '(CPU0|eth0)'
>            CPU0       CPU1       CPU2       CPU3
>  30:   38651749          0          0          0     GICv2 189 Level     eth0      <--- IRQs only going to CPU0
>  31:   20418643          0          0          0     GICv2 190 Level     eth0
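
To put a number on that imbalance before and after any tuning, something like the following sketch could be used. It parses text in the /proc/interrupts layout and reports each CPU's share of the interrupts for one device. The function name and the embedded sample (copied from the output above) are illustrative assumptions, not part of the test setup.

```python
# Sketch: per-CPU share of interrupts for one device, from /proc/interrupts text.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 30:   38651749          0          0          0     GICv2 189 Level     eth0
 31:   20418643          0          0          0     GICv2 190 Level     eth0
"""


def irq_share(text: str, device: str) -> list[float]:
    """Return each CPU's fraction of the interrupts on lines naming `device`."""
    lines = text.splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * n_cpus
    for line in lines[1:]:
        fields = line.split()
        if device not in fields:
            continue
        # fields[0] is the IRQ label ("30:"); the next n_cpus fields are counts.
        for cpu, count in enumerate(fields[1:1 + n_cpus]):
            totals[cpu] += int(count)
    grand_total = sum(totals)
    return [t / grand_total if grand_total else 0.0 for t in totals]


if __name__ == "__main__":
    print(irq_share(SAMPLE, "eth0"))
```

On a live system the same function could be fed the contents of /proc/interrupts instead of the sample string.
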
> Some ideas include:
> - Moving most processes off core0. e.g. Configure all the systemd slices NOT
> to use core0, so core0 is essentially freed to only service the IRQs
> - RPS (
> https://www.kernel.org/doc/html/latest/networking/scaling.html#rps-receive-packet-steering
> ). e.g. Can the other cores get more involved?
> - Tuning ideas from here:
> https://github.com/leandromoreira/linux-network-performance-parameters.
> Specifically, I was wondering about increasing netdev_budget sysctls.
> The defaults are shown here
> root@rpi4b:/home/das# sysctl -a | grep netdev_budget
> net.core.netdev_budget = 300
> net.core.netdev_budget_usecs = 8000
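
For the RPS idea in the list above, the kernel scaling document linked there describes a per-queue rps_cpus file that takes a hexadecimal CPU bitmap. A minimal sketch of building that bitmap follows, assuming a 4-core board, interface eth0 and receive queue 0. The helper names are hypothetical, and the script only prints the command rather than writing to sysfs.

```python
# Sketch: build the hex CPU bitmap that the RPS rps_cpus file expects.
# Limitation: masks wider than 32 CPUs use comma-separated groups in the
# kernel's format, which this sketch does not produce.
def cpu_mask(cpus) -> str:
    """Hex bitmap with one bit set per CPU index, e.g. [1, 2, 3] -> 'e'."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def rps_path(dev: str, rx_queue: int = 0) -> str:
    """sysfs file controlling RPS for one receive queue of `dev`."""
    return f"/sys/class/net/{dev}/queues/rx-{rx_queue}/rps_cpus"


if __name__ == "__main__":
    # Steer receive processing to CPUs 1-3, leaving CPU0 for the hard IRQs.
    print(f"echo {cpu_mask([1, 2, 3])} > {rps_path('eth0')}")
```

Whether leaving CPU0 out of the mask actually helps on these boards is exactly the kind of thing the retest would have to show.
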
> "Armbian's kernel isn't a particularly high performance kernel build."
> Happy to discuss any recommended tuning.  Armbian is very easy to install
> on the microSD card.  ( Actually, I have the LicheePi 4A RISC-V, but can't
> find an easy image to just load on a microSD card. )
> Over the weekend, I reconfigured the testing setup using a lot more VLANs.
> Now each device has ALL the different qdiscs configured on different VLANs
> and IPs, allowing the iperf/flent tests to be run one after the other with
> no need to change the qdiscs between tests.  I'm currently repeating every
> combination of test, before adding the netem 20/40ms latency as DaveT
> suggested.  ( Tests take a while: 8 devices * 6 qdiscs = 48 tests, by 10
> minute tests = 480 minutes = 8 hours )
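
The run-time arithmetic above can be sketched as a small test matrix, which also makes it easy to see what extra netem delays would cost. The device and qdisc names below are placeholders, and treating each added delay as a full extra pass over every combination is an assumption, not something stated in the plan.

```python
# Sketch: size of the test matrix and its wall-clock cost.
from itertools import product

DEVICES = [f"device{i}" for i in range(1, 9)]  # placeholder names, 8 devices
QDISCS = [f"qdisc{i}" for i in range(1, 7)]    # placeholder names, 6 qdiscs
MINUTES_PER_TEST = 10


def plan(devices, qdiscs, delays_ms=(0,)):
    """Every (device, qdisc, delay) combination, run one after the other."""
    return list(product(devices, qdiscs, delays_ms))


def total_hours(tests, minutes_per_test=MINUTES_PER_TEST):
    """Wall-clock hours if the tests run strictly in sequence."""
    return len(tests) * minutes_per_test / 60


if __name__ == "__main__":
    baseline = plan(DEVICES, QDISCS)
    with_delay = plan(DEVICES, QDISCS, delays_ms=(20, 40))
    print(len(baseline), "tests,", total_hours(baseline), "hours")
    print(len(with_delay), "tests,", total_hours(with_delay), "hours")
```

Under that assumption the 20 ms and 40 ms passes together would take about twice as long as the baseline run.
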
> Roughly the plan is:
> 1. Retest all combinations.  This is to confirm the starting position. <---
> running now
> 2. Add netem latency 20 and 40ms, and retest all combinations.  I'm hoping
> Pi4 cake performance will be closer to > 900 Mb/s
> 3. Apply some tuning options, and retest all combinations

I'm very interested in seeing your results after this.
Great job so far.

> Kind regards,
> Dave Seddon
> On Sun, Sep 17, 2023 at 6:05 PM Dave Taht <dave.taht at gmail.com> wrote:
>> A huge thanks to dave seddon for buckling down and doing some
>> comprehensive testing of a variety of arm64 gear!
>> https://docs.google.com/document/d/1HxIU_TEBI6xG9jRHlr8rzyyxFEN43zMcJXUFlRuhiUI/edit#heading=h.bpvv3vr500nw
>> --
>> Oct 30:
>> https://netdevconf.info/0x17/news/the-maestro-and-the-music-bof.html
>> Dave Täht CSO, LibreQos
> --
> Regards,
> Dave Seddon
> +1 415 857 5102
