On Feb 1, 2019, at 12:09 AM, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
1) Why is nla_put_u32 suddenly failing for TARGET_US after adding five
cake instances?
Probably because it's running out of kernel memory? How much system
memory do you have on the system you are testing this on?
Plenty of memory (used 131308, free 1911900). I’m guessing this was by design where earlier kernels allocated a smaller initial size for tail space, but that’s only a guess as I haven’t found where that’s done.
2) Is calling sch_tree_unlock the right thing to do in the failure
case, or am I working around a kernel bug, and doing something that
would fail in other kernels?
Yes, I think you are working around a kernel bug. See
https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_api.c#L1330
The lock is taken in gnet_stats_start_copy_compat() and released in
gnet_stats_finish_copy(). The latter is skipped in the failure path. It
seems this bug is present all the way up to Eric's change to remove the
locking entirely (which went into 4.8). So I guess you could get a patch
accepted for the stable trees in 3.16 and 4.4; not that this would help
you much if you are stuck on 3.16.7…
Hehe, “crossing the streams” here. :) That’s what I gathered after looking at that code for a while, but I’m glad to be sure about it.
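
For anyone following along, here is a rough user-space sketch of the lock imbalance discussed above. It is not kernel code: every model_* and fill_qdisc_* name is made up for illustration, the lock is just a counter, and exactly where the extra unlock belongs (in the qdisc's dump path or in its caller) is an assumption of the sketch. It only shows why a failure path that skips the "finish" step leaves the lock held.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; none of these are real kernel functions. */
static int lock_depth;                    /* 0 = unlocked, 1 = held */

static void model_lock(void)   { lock_depth++; }
static void model_unlock(void) { assert(lock_depth > 0); lock_depth--; }

static int  model_lock_depth(void) { return lock_depth; }
static void model_reset(void)      { lock_depth = 0; }

/* Stands in for gnet_stats_start_copy_compat(): takes the lock. */
static void model_start_copy(void)  { model_lock(); }

/* Stands in for gnet_stats_finish_copy(): releases the lock. */
static void model_finish_copy(void) { model_unlock(); }

/* Stands in for the qdisc stats dump; put_fails models nla_put_u32() failing. */
static int model_dump_stats(bool put_fails) { return put_fails ? -1 : 0; }

/* Failure path as described in the thread: finish_copy is skipped. */
static int fill_qdisc_buggy(bool put_fails)
{
    model_start_copy();
    if (model_dump_stats(put_fails) < 0)
        goto failure;
    model_finish_copy();
    return 0;
failure:
    return -1;                            /* lock is still held here */
}

/* Same flow, but the failure path releases the lock before returning. */
static int fill_qdisc_fixed(bool put_fails)
{
    model_start_copy();
    if (model_dump_stats(put_fails) < 0)
        goto failure;
    model_finish_copy();
    return 0;
failure:
    model_unlock();                       /* the workaround discussed above */
    return -1;
}
```

Calling fill_qdisc_buggy(true) leaves model_lock_depth() at 1, while fill_qdisc_fixed(true) brings it back to 0. On 4.8 and later, where the locking was removed, the extra unlock would not apply.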