[Bloat] [Cerowrt-devel] beating the drum for BQL

Fri Aug 24 07:24:42 EDT 2018

Mikael Abrahamsson <swmike at swm.pp.se> writes:

> On Thu, 23 Aug 2018, Dave Taht wrote:
>
>> I should also point out that the routing latency numbers in
>> those blog entries were measured on very high-end Intel hardware. It would be
>> good to re-run those sorts of tests on the Armada and others for
>> 1, 10, 100, and 1000 routes. Clever, complicated algorithms tend
>> to bloat the icache and, fairly often, cost more than they are worth on
>> hardware that typically has 32k I/D caches and a small L2.
>
> My testing has been on OpenWrt with kernel 4.14 on Intel x86-64. Looking at
> how the box behaves, I'd say it's limited by context switching / interrupt
> load, and not actually by the CPU being busy doing "hard work".
>
> All of the fast routing implementations (snabbswitch, FD.IO/VPP, etc.)
> take CPU cores and devices away from Linux and run a busy loop, polling
> much of the time and never context switching, which means the L1
> cache is never churned. This is how they become fast. I see potential
> to do "XDP offload" of forwarding here, basically doing a similar job to
> what a hardware packet accelerator does.

Yup, that would help; we see basically a 2-3x improvement in routing
performance with XDP over the regular stack. I don't think there's XDP
support in any of the low-end Ethernet drivers yet, though...

-Toke
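For anyone wanting to script the 1, 10, 100, 1000-route sweep suggested above, here is a minimal Python sketch of the shape such a test might take. Everything in it is hypothetical: `add_route`, `build_table`, `lpm_lookup` and `time_lookups` are illustrative names, the routes are random /8 to /24 prefixes plus a default route, and the lookup is a plain linear-scan longest-prefix match, not the kernel's FIB trie. Interpreter timings say nothing about icache pressure on the Armada; real numbers would have to come from the kernel forwarding path on the target hardware.

```python
import ipaddress
import random
import time

# Hypothetical route entry: (network as int, netmask as int, prefix length, next-hop label).


def add_route(table, prefix, next_hop):
    """Append an IPv4 prefix such as "10.0.0.0/8" to the table."""
    net = ipaddress.IPv4Network(prefix)
    table.append((int(net.network_address), int(net.netmask), net.prefixlen, next_hop))


def build_table(n_routes, seed=0):
    """Default route plus n_routes random prefixes between /8 and /24 (illustrative data only)."""
    rng = random.Random(seed)
    table = []
    add_route(table, "0.0.0.0/0", "default")
    for i in range(n_routes):
        plen = rng.randint(8, 24)
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        table.append((rng.getrandbits(32) & mask, mask, plen, f"nh{i}"))
    return table


def lpm_lookup(table, dst):
    """Linear-scan longest-prefix match: O(n) per lookup, but a tiny code footprint."""
    addr = int(ipaddress.IPv4Address(dst))  # accepts "a.b.c.d" strings or ints
    best_len, best_hop = -1, None
    for net, mask, plen, hop in table:
        if (addr & mask) == net and plen > best_len:
            best_len, best_hop = plen, hop
    return best_hop


def time_lookups(n_routes, n_lookups=1000, seed=1):
    """Mean wall-clock nanoseconds per lookup against a table of n_routes random prefixes."""
    table = build_table(n_routes)
    rng = random.Random(seed)
    dsts = [rng.getrandbits(32) for _ in range(n_lookups)]
    start = time.perf_counter()
    for dst in dsts:
        lpm_lookup(table, dst)
    return (time.perf_counter() - start) / n_lookups * 1e9


if __name__ == "__main__":
    # The sweep from the thread: 1, 10, 100 and 1000 routes.
    for n in (1, 10, 100, 1000):
        print(f"{n:5d} routes: {time_lookups(n):10.0f} ns/lookup")
```

A linear scan is chosen deliberately as the "simple" baseline: any cleverer structure (trie, hash per prefix length) would be compared against it on the same route counts.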