[Bloat] [Cerowrt-devel] beating the drum for BQL

Fri Aug 24 03:05:38 EDT 2018

On Thu, 23 Aug 2018, Dave Taht wrote:

> I should also point out that the kinds of routing latency numbers in
> those blog entries was on very high end intel hardware. It would be
> good to re-run those sort of tests on the armada and others for
> 1,10,100, 1000 routes. Clever complicated algorithms have a tendency
> to bloat icache and cost more than they are worth, fairly often, on
> hardware that typically has 32k i/d caches, and a small L2.

My testing has been on OpenWrt with 4.14 on intel x86-64. Looking at how the 
box behaves, I'd say it's limited by context switching / interrupt load, 
and not actually by the CPU being busy doing "hard work".
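
One rough way to check a claim like this is to sample the kernel counters
in /proc/stat, which has a "ctxt" line (total context switches since boot)
and an "intr" line whose first number is the total interrupt count. The
following is a minimal sketch, assuming a Linux box with Python available;
the function names are hypothetical, and this is not necessarily how the
observation above was made.

```python
import time


def parse_proc_stat(text):
    """Return (context_switches, interrupts) totals from /proc/stat text."""
    ctxt = intr = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ctxt":
            ctxt = int(fields[1])
        elif fields[0] == "intr":
            # The first number on the intr line is the grand total.
            intr = int(fields[1])
    if ctxt is None or intr is None:
        raise ValueError("ctxt/intr lines not found")
    return ctxt, intr


def rates(before, after, seconds):
    """Per-second context switch and interrupt rates between two samples."""
    return ((after[0] - before[0]) / seconds,
            (after[1] - before[1]) / seconds)


def measure(interval=1.0, path="/proc/stat"):
    """Sample the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    return rates(first, second, interval)
```

Running measure() while forwarding traffic and comparing the rates with
CPU utilisation gives a first hint of whether the box is interrupt-bound
rather than compute-bound.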

All of the fast routing implementations (snabbswitch, FD.IO/VPP etc) take 
CPU and devices away from Linux and run a busy loop, polling a lot of the 
time and never context switching, which means the L1 cache is never 
churned. This is how they become fast. I see potential to do "XDP 
offload" of forwarding here, basically doing a similar job to what a 
hardware packet accelerator does. Then we can potentially optimise 
forwarding by using lessons learnt from the other projects. We need to keep 
the bufferbloat work in mind when doing this though, so we don't make that 
bad again.
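
The polling pattern described above can be illustrated with a toy model.
This is only a sketch under stated assumptions, not code from snabbswitch,
VPP or XDP: a deque stands in for a NIC receive ring, packets are reduced
to their destination address, and RouteTable and poll_once are
hypothetical names. A real fast path would call poll_once in a tight loop
on a dedicated core and never sleep or wait for an interrupt.

```python
import ipaddress
from collections import deque


class RouteTable:
    """Tiny longest-prefix-match table (linear scan, illustration only)."""

    def __init__(self):
        self.routes = []  # list of (network, next_hop)

    def add(self, prefix, next_hop):
        self.routes.append((ipaddress.ip_network(prefix), next_hop))
        # Keep the longest prefixes first so the first match wins.
        self.routes.sort(key=lambda r: r[0].prefixlen, reverse=True)

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        for net, hop in self.routes:
            if addr in net:
                return hop
        return None


def poll_once(rx_ring, table, budget=32):
    """Drain up to `budget` packets from rx_ring without blocking.

    Returns a list of (destination, next_hop) forwarding decisions.
    """
    decisions = []
    for _ in range(budget):
        if not rx_ring:
            break  # ring is empty, return at once instead of waiting
        dst = rx_ring.popleft()
        decisions.append((dst, table.lookup(dst)))
    return decisions
```

The budget caps how many packets are handled per pass, so the batch size
stays bounded. That is one knob that matters if forwarding latency is to
stay low while throughput goes up.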

-- 
Mikael Abrahamsson    email: swmike at swm.pp.se