Hello,

This answers some of my own questions.

It seems the mirred and ifb combination is indeed what reduces performance
in my case. None of the optimizations made to fq_codel helped with ingress.

A simple fq_police would be a better solution for ingress than cake or
fq_codel.
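
For reference, here is a rough sketch of the two kinds of ingress setup
being compared. The interface names, the rate and the burst size are
placeholders, not values from my tests. The second variant uses the stock
tc policer, which is not flow-aware, so it only approximates what an
fq_police would do. By default the script just prints the commands.

```shell
#!/bin/sh
# Sketch only: IFACE, IFB, RATE and BURST are placeholder values.
IFACE="${IFACE:-eth0}"
IFB="${IFB:-ifb0}"
RATE="${RATE:-100mbit}"
BURST="${BURST:-100k}"
# Prefix for every command. Defaults to "echo", so nothing is changed;
# run with RUN="" as root to actually apply the commands.
RUN="${RUN-echo}"

# Variant A: redirect ingress traffic to an ifb device via mirred
# and shape it there with cake.
shape_via_ifb() {
    $RUN ip link add "$IFB" type ifb
    $RUN ip link set dev "$IFB" up
    $RUN tc qdisc add dev "$IFACE" handle ffff: ingress
    $RUN tc filter add dev "$IFACE" parent ffff: protocol all matchall \
        action mirred egress redirect dev "$IFB"
    $RUN tc qdisc add dev "$IFB" root cake bandwidth "$RATE"
}

# Variant B: police directly on the ingress hook, without ifb or mirred.
police_ingress() {
    $RUN tc qdisc add dev "$IFACE" handle ffff: ingress
    $RUN tc filter add dev "$IFACE" parent ffff: protocol all matchall \
        action police rate "$RATE" burst "$BURST" conform-exceed drop
}

# Pick a variant with the first argument: "ifb" or "police".
case "${1:-}" in
    ifb)    shape_via_ifb ;;
    police) police_ingress ;;
esac
```

Variant A is the path that goes through mirred and ifb; variant B avoids
both, at the cost of dropping packets without any per-flow fairness.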


On Sat, Feb 16, 2019 at 11:35 AM Adrian Popescu <adriannnpopescu@gmail.com>
wrote:

> Hello,
>
> On Fri, Feb 15, 2019 at 10:45 PM Dave Taht <dave.taht@gmail.com> wrote:
>
>> I still regard inbound shaping as our biggest deployment problem,
>> especially on cheap hardware.
>>
>> Some days I want to go back to revisiting the ideas in the "bobbie"
>> shaper, other days...
>>
>> In terms of speeding up cake:
>>
>> * At higher speeds (e.g. > 200mbit) cake tends to bottleneck on a
>> single cpu, in softirq. An LWN article just went by about a proposed
>> set of improvements for that:
>> https://lwn.net/SubscriberLink/779738/771e8f7050c26ade/
>
> Will this help devices with a single core CPU?
>
>
>>
>>
>> * Hardware multiqueue is more and more common (APU2 has 4). FQ_codel
>> is inherently parallel and could take advantage of hardware
>> multiqueue, if there was a better way to express it. What happens
>> nowadays is you get the "mq" scheduler with 4 fq_codel instances, when
>> running at line rate, but I tend to think with 64 hardware queues,
>> increasingly common in the >10GigE, having 64k fq_codel queues is
>> excessive. I'd love it if there was a way to have there be a divisor
>> in the mq -> subqdisc code so that we would have, oh, 32 queues per hw
>> queue in this case.
>>
>> Worse, there's no way to attach a global shaped instance to that
>> hardware, e.g. in cake, which forces all those hardware queues (even
>> across cpus) into one. The ingress mirred code, here, is also a
>> problem. A "cake-mq" seemed feasible (basically you just turn the
>> shaper tracking into an atomic operation in three places), but the
>> overlying qdisc architecture for sch_mq -> subqdiscs has to be
>> extended or bypassed, somehow. (there's no way for sch_mq to
>> automagically pass sub-qdisc options to the next qdisc, and there's no
>> reason to have sch_mq
>>
>
> The problem I deal with is performance on even lower end hardware with a
> single queue. My experience with mq has been limited.
>
>
>>
>> * I really liked the ingress "skb list" rework, but I'm not sure how
>> to get that from A to B.
>>
>
> What was this skb list rework? Is there a patch somewhere?
>
>
>>
>> * and I have a long standing dream of being able to kill off mirred
>> entirely and just be able to write
>>
>> tc qdisc add dev eth0 ingress cake bandwidth X
>>
>
> Ingress on its own seems to be a performance hit. Do you think this would
> reduce the performance hit?
>
>
>>
>> *  native codel is 32 bit, cake is 64 bit. I
>>
>
> Was there something else you forgot to write here?
>
>
>>
>> * hashing three times as cake does is expensive. Getting a partial
>> hash and combining it into a final would be faster.
>>
>
> Could you elaborate how this would look, please? I've read the code a
> while ago. It might be that I didn't figure out all the places where
> hashing is done.
>
>
>>
>> * 8 way set associative is slower than 4 way and almost
>> indistinguishable from 8. Even direct mapping
>>
>
> This should be easy to address by changing the 8 ways to 4. Was there
> something else you wanted to write here?
>
>
>>
>> * The cake blue code is rarely triggered and inline
>>
>> I really did want cake to be faster than htb+fq_codel, I started a
>> project to basically resurrect "early cake" - which WAS 40% faster
>> than htb+fq_codel and add in the idea *only* of an atomic builtin
>> hw-mq shaper a while back, but haven't got back to it.
>>
>> https://github.com/dtaht/fq_codel_fast
>>
>> with everything I ripped out in that it was about 5% less cpu to start
>> with.
>>
>
> Perhaps further improvements made to the codel_vars struct will also help
> fq_codel_fast. Do you think this could be improved further?
>
> A cake_fast might be worth a shot.
>
>
>>
>> I can't tell you how many times I've looked over
>>
>> https://elixir.bootlin.com/linux/latest/source/net/sched/sch_mqprio.c
>>
>> hoping that enlightenment would strike and there was a clean way to get
>> rid of that layer of abstraction.
>>
>> But coming up with how to run more stuff in parallel was beyond my
>> rcu-foo.
>>
>