[Cake] cake at 60gbit

Toke Høiland-Jørgensen toke at toke.dk
Fri Jul 6 08:04:56 EDT 2018


Pete Heist <pete at heistp.net> writes:

>> On Jul 6, 2018, at 1:33 PM, Toke Høiland-Jørgensen <toke at toke.dk> wrote:
>> 
>> AHA! Found the culprit!
>> 
>> The bulk dequeue mechanism in sch_generic.c will dequeue a bunch of
>> packets at once, then check if they belong on the same hardware txq. If
>> they don't, they will be put back on a separate queue in the qdisc
>> structure (sch->skb_bad_txq), and the qlen will be increased, without
>> telling the qdisc about it.
>
> Solid, nice work!

Thanks :)
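
For anyone following along, here is a toy userspace model of that
accounting mismatch. It is only a sketch and not the code in
sch_generic.c: every name in it (toy_qdisc, toy_bulk_dequeue, the
"parked" counter standing in for sch->skb_bad_txq) is made up for
illustration, and the real logic is more involved. The point is just
that the qdisc's own view of its contents and the shared qlen counter
can drift apart when a packet is parked behind its back.

```c
#include <stdio.h>

#define TOY_CAP 16

/* Hypothetical packet: only the hardware TX queue it maps to matters here. */
struct toy_pkt { int txq; };

/* Hypothetical qdisc: a FIFO plus the counter the generic layer also touches. */
struct toy_qdisc {
    struct toy_pkt fifo[TOY_CAP];
    int head, tail;   /* the qdisc's private view of what it holds */
    int qlen;         /* shared length counter */
    int parked;       /* packets set aside by the bulk dequeue path */
};

/* What the qdisc itself believes it still holds. */
static int toy_private_count(const struct toy_qdisc *q)
{
    return q->tail - q->head;
}

static int toy_enqueue(struct toy_qdisc *q, int txq)
{
    if (q->tail == TOY_CAP)
        return -1;            /* toy FIFO is full */
    q->fifo[q->tail++].txq = txq;
    q->qlen++;
    return 0;
}

static int toy_dequeue(struct toy_qdisc *q, struct toy_pkt *out)
{
    if (q->head == q->tail)
        return 0;             /* nothing left from the qdisc's point of view */
    *out = q->fifo[q->head++];
    q->qlen--;
    return 1;
}

/*
 * Pull up to 'budget' packets. If one maps to a different TXQ than the
 * first, park it and bump qlen again, without informing the qdisc.
 * Returns the number of packets that would go to the driver.
 */
static int toy_bulk_dequeue(struct toy_qdisc *q, int budget)
{
    struct toy_pkt first, next;
    int sent;

    if (budget < 1 || !toy_dequeue(q, &first))
        return 0;
    sent = 1;
    while (sent < budget && toy_dequeue(q, &next)) {
        if (next.txq != first.txq) {
            q->parked++;      /* set aside for later */
            q->qlen++;        /* counter goes up; private state does not */
            break;
        }
        sent++;
    }
    return sent;
}

/* Print both views so the disagreement is visible. */
static void toy_report(const struct toy_qdisc *q)
{
    printf("private=%d qlen=%d parked=%d\n",
           toy_private_count(q), q->qlen, q->parked);
}
```

Enqueueing three packets with TXQs 0, 0 and 1 into a zero-initialised
toy_qdisc and then calling toy_bulk_dequeue() with a budget of 8 leaves
the private count and qlen disagreeing, which toy_report() will show.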

>> This obviously only happens on hardware with multiple TXQs, which is why
>> the bug doesn't happen on veth.
>
>
> It would be nice if veth were mq capable.
>
> For whatever reason, I didn’t see this on my i210at’s (1gbit ethernet
> with 4 transmit and 4 receive queues).

Well, you have to hit the exact conditions; i.e., a bulk dequeue that
happens to get a bunch of packets that hit different TX queues. So that
depends on both the TXQ hashing, and the queue state, number of flows
etc. I only get a handful of "lockups" (debug lines) on a 10-sec netperf
test with 6 flows.

> I’m now playing with netem, cake and veth for the first time (two
> namespaces with netem as the parent qdisc to cake for each namespace).
> I’ve gotten the setup not to lock up in an infinite loop, but it
> occasionally stops passing traffic after a netperf test. This
> could easily be a problem specific to netns though, so I’ll be playing
> with it some more and will post if I can narrow it down to something
> specific.

Yay, more fun! :P

Please do see if you can narrow this down; it would be good to fix this
as well before we submit another version upstream...

-Toke

