[Cake] lockup with multiple cake instances on 3.16.7

Pete Heist pete at heistp.net
Thu Jan 31 18:18:13 EST 2019


> On Feb 1, 2019, at 12:09 AM, Toke Høiland-Jørgensen <toke at redhat.com> wrote:
>> 
>> 1) Why is nla_put_u32 suddenly failing for TARGET_US after adding five
>> cake instances?
> 
> Probably because it's running out of kernel memory? How much system
> memory do you have on the system you are testing this on?

Plenty of memory (used 131308, free 1911900). I’m guessing this is by design, with earlier kernels allocating a smaller initial size for tail space, but that’s only a guess, as I haven’t found where that’s done.
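
For reference, as far as I understand it, nla_put_u32 fails with -EMSGSIZE when the skb being filled has too little tail room left for the attribute, regardless of how much system memory is free. A minimal user-space model of that check follows. The struct, the function names and the macros are all made up for illustration; this is not the kernel code, only a sketch of the size arithmetic (4-byte attribute header, payload padded to 4 bytes).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the netlink attribute layout constants. */
#define MODEL_NLA_ALIGNTO 4u
#define MODEL_NLA_HDRLEN  4u
#define MODEL_NLA_ALIGN(len) \
	(((len) + MODEL_NLA_ALIGNTO - 1u) & ~(MODEL_NLA_ALIGNTO - 1u))

/* Hypothetical, heavily simplified message buffer: only tracks sizes. */
struct fake_skb {
	size_t len;      /* bytes already used */
	size_t capacity; /* total bytes available */
};

/* Remaining space at the end of the buffer. */
size_t fake_tailroom(const struct fake_skb *skb)
{
	return skb->capacity - skb->len;
}

/* Model of putting one u32 attribute: header plus aligned payload. */
int fake_nla_put_u32(struct fake_skb *skb)
{
	size_t need = MODEL_NLA_ALIGN(MODEL_NLA_HDRLEN + sizeof(uint32_t));

	assert(skb != NULL);
	if (fake_tailroom(skb) < need)
		return -EMSGSIZE; /* not enough tail room: buffer is left unchanged */
	skb->len += need;
	return 0;
}
```

In this model each u32 attribute costs 8 bytes, so a buffer that is already mostly full from earlier attributes rejects the next one even though memory is plentiful.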

>> 2) Is calling sch_tree_unlock the right thing to do in the failure
>> case, or am I working around a kernel bug, and doing something that
>> would fail in other kernels?
> 
> Yes, I think you are working around a kernel bug. See
> https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_api.c#L1330
> 
> The lock is taken in gnet_stats_start_copy_compat() and released in
> gnet_stats_finish_copy(). The latter is skipped in the failure path. It
> seems this bug is present all the way up to Eric's change to remove the
> locking entirely (which went into 4.8). So I guess you could get a patch
> accepted for the stable trees in 3.16 and 4.4; not that this would help
> you much if you are stuck on 3.16.7…

Hehe, “crossing the streams” here. :) That’s what I gathered after looking at that code for a while, but I’m glad to be sure about it.

Would you accept my workaround in cake_dump_stats, or would you rather not?
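
To make the imbalance concrete, here is a small user-space sketch of the pattern as I understand it from the discussion above. It is not the kernel code: the lock is modeled as a plain counter, and every function name is a made-up stand-in (stats_start_copy for gnet_stats_start_copy_compat, stats_finish_copy for gnet_stats_finish_copy, qdisc_dump_stats for a qdisc's dump_stats callback, fill_qdisc for the caller in sch_api.c). The unlock_on_failure flag models the workaround of unlocking in the failure case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock: counts how many times it is held. */
static int lock_depth;

void fake_lock(void)
{
	lock_depth++;
}

void fake_unlock(void)
{
	assert(lock_depth > 0); /* unlocking an unheld lock would be a bug */
	lock_depth--;
}

/* Stand-in for the start-copy helper: takes the lock. */
void stats_start_copy(void)
{
	fake_lock();
}

/* Stand-in for the finish-copy helper: releases the lock. */
void stats_finish_copy(void)
{
	fake_unlock();
}

/* Stand-in for a qdisc's dump_stats callback. 'fail' models an
 * attribute put failing; 'unlock_on_failure' models the workaround. */
int qdisc_dump_stats(bool fail, bool unlock_on_failure)
{
	if (fail) {
		if (unlock_on_failure)
			fake_unlock(); /* workaround: drop the lock ourselves */
		return -1;
	}
	return 0;
}

/* Stand-in for the caller: the failure path returns early and
 * never reaches stats_finish_copy(). */
int fill_qdisc(bool fail, bool unlock_on_failure)
{
	stats_start_copy();
	if (qdisc_dump_stats(fail, unlock_on_failure) < 0)
		return -1; /* lock is still held unless the callback dropped it */
	stats_finish_copy();
	return 0;
}
```

In this model, a failing dump without the workaround leaves lock_depth at 1, which corresponds to the lockup. With the workaround the count returns to 0, but only because the caller skips its own unlock on failure; on a kernel where the caller releases the lock in that path (or takes no lock at all), the extra unlock in the callback would be wrong.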