Hello,

On Fri, Feb 15, 2019 at 10:45 PM Dave Taht wrote:
> I still regard inbound shaping as our biggest deployment problem,
> especially on cheap hardware.
>
> Some days I want to go back to revisiting the ideas in the "bobbie"
> shaper, other days...
>
> In terms of speeding up cake:
>
> * At higher speeds (e.g. > 200mbit) cake tends to bottleneck on a
> single cpu, in softirq. A lwn article just went by about a proposed
> set of improvements for that:
> https://lwn.net/SubscriberLink/779738/771e8f7050c26ade/

Will this help devices with a single-core CPU?

> * Hardware multiqueue is more and more common (APU2 has 4). FQ_codel
> is inherently parallel and could take advantage of hardware
> multiqueue, if there was a better way to express it. What happens
> nowadays is you get the "mq" scheduler with 4 fq_codel instances, when
> running at line rate, but I tend to think with 64 hardware queues,
> increasingly common in the >10GigE, having 64k fq_codel queues is
> excessive. I'd love it if there was a way to have there be a divisor
> in the mq -> subqdisc code so that we would have, oh, 32 queues per hw
> queue in this case.
>
> Worse, there's no way to attach a global shaped instance to that
> hardware, e.g. in cake, which forces all those hardware queues (even
> across cpus) into one. The ingress mirred code, here, is also a
> problem. a "cake-mq" seemed feasible (basically you just turn the
> shaper tracking into an atomic operation in three places), but the
> overlying qdisc architecture for sch_mq -> subqdiscs has to be
> extended or bypassed, somehow. (there's no way for sch_mq to
> automagically pass sub-qdisc options to the next qdisc, and there's no
> reason to have sch_mq

The problem I deal with is performance on even lower-end hardware with
a single queue. My experience with mq has been limited.

> * I really liked the ingress "skb list" rework, but I'm not sure how
> to get that from A to B.

What was this skb list rework? Is there a patch somewhere?

> * and I have a long standing dream of being able to kill off mirred
> entirely and just be able to write
>
> tc qdisc add dev eth0 ingress cake bandwidth X

Ingress on its own seems to be a performance hit. Do you think this
would reduce that hit?

> * native codel is 32 bit, cake is 64 bit. I

Was there something else you forgot to write here?

> * hashing three times as cake does is expensive. Getting a partial
> hash and combining it into a final would be faster.

Could you elaborate on how this would look, please? I read the code a
while ago, and I may not have found all the places where hashing is
done.

> * 8 way set associative is slower than 4 way and almost
> indistinguishable from 8. Even direct mapping

This should be easy to address by changing the 8 ways to 4. Was there
something else you wanted to write here?

> * The cake blue code is rarely triggered and inline
>
> I really did want cake to be faster than htb+fq_codel, I started a
> project to basically ressurrect "early cake" - which WAS 40% faster
> than htb+fq_codel and add in the idea *only* of an atomic builtin
> hw-mq shaper a while back, but haven't got back to it.
>
> https://github.com/dtaht/fq_codel_fast
>
> with everything I ripped out in that it was about 5% less cpu to start
> with.

Perhaps further improvements made to the codel_vars struct will also
help fq_codel_fast. Do you think this could be improved further? A
cake_fast might be worth a shot.

> I can't tell you how many times I've looked over
>
> https://elixir.bootlin.com/linux/latest/source/net/sched/sch_mqprio.c
>
> hoping that enlightment would strike and there was a clean way to get
> rid of that layer of abstraction.
>
> But coming up with how to run more stuff in parallel was beyond my rcu-foo.
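
P.S. To make the question about hashing concrete, here is one possible
reading of the "partial hash" suggestion quoted above, as a plain
userspace C sketch. It is not cake's code. The struct layouts, the
function names (hash_once, bucket_of) and the mixing function (the
murmur3 32-bit finalizer, used only as a stand-in) are all assumptions
made for illustration. The idea shown is to hash the source and
destination addresses once each, then derive the flow hash from those
two partial results plus the ports and protocol, instead of running
three full hashes over the key.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical IPv4 flow key; not the kernel's flow dissector layout. */
struct flow_key {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t sport;
    uint16_t dport;
    uint8_t  proto;
};

/* The three values a host-aware scheduler would want from one packet. */
struct flow_hashes {
    uint32_t src_host;
    uint32_t dst_host;
    uint32_t flow;
};

/* murmur3 32-bit finalizer, standing in for whatever hash is really used. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

static uint32_t rotl32(uint32_t x, unsigned int r)
{
    return (x << r) | (x >> (32u - r));
}

/* Compute the two host hashes, then reuse them to build the flow hash. */
static struct flow_hashes hash_once(const struct flow_key *k, uint32_t seed)
{
    struct flow_hashes out;
    uint32_t ports = ((uint32_t)k->sport << 16) | k->dport;
    uint32_t l4 = mix32(ports + (uint32_t)k->proto * 0x9e3779b9u);

    out.src_host = mix32(k->saddr ^ seed);
    out.dst_host = mix32(k->daddr ^ seed);
    /* The rotation keeps (src, dst) and (dst, src) from cancelling out. */
    out.flow = mix32(out.src_host ^ rotl32(out.dst_host, 11) ^ l4);
    return out;
}

/* Map a hash onto a table of nbuckets queues. */
static uint32_t bucket_of(uint32_t hash, uint32_t nbuckets)
{
    assert(nbuckets > 0);
    return hash % nbuckets;
}
```

With this layout, two flows from the same source address share
src_host but get different flow values, which is what per-host
fairness on top of per-flow queueing needs. Whether it is actually
cheaper than what cake does today would have to be measured.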