lockup with cake and veth

Toke Høiland-Jørgensen toke at toke.dk
Fri Jul 6 09:29:28 EDT 2018

Pete Heist <pete at heistp.net> writes:

> I don’t know if we want to call this an issue, but...
> I’m seeing a lockup with cake (and also sfq, but not with pfifo or
> fq_codel) when run over veth devices. Two network namespaces are
> created, one for the client and one for the server, each with one veth
> device. Netem is added as the root qdisc with a delay of 1ms, and a
> leaf qdisc may be added below it. Lockups occur on my box when the leaf
> qdisc is either cake or sfq and I'm running flent’s tcp_ndown test with
> >= 4 download streams. Note that I happen to be running on a quad-core.
> - If no leaf qdisc is added below netem, no lockup occurs.
> - If either pfifo or fq_codel is added below netem, no lockup occurs.
> - If either cake or sfq is the leaf, the lockup occurs.
> The symptoms (lockup with >= 4 streams on a quad-core box), and the
> fact that it occurs with both cake and sfq, make me think that it may
> simply come down to the code not being re-entrant, which may be the
> case for veth and may simply be by design? Maybe it's something we
> should consider fixing, but not a show-stopper? That should be
> confirmed, though.
> I’ll keep investigating, but in the meantime I’m sharing the scripts
> I’m running in case anyone else wants to look. See README.txt in the
> attached...

Thanks for investigating! I'll take a look later. The fact that it
happens with sfq as well means it's probably not cake-specific, though,
so I don't think we should hold off on the upstream submission until
we've figured it out. Using leaf qdiscs with netem has been dodgy for a
while IIRC...