[Cake] lockup with multiple cake instances on 3.16.7

Pete Heist pete at heistp.net
Thu Jan 31 18:10:29 EST 2019


> On Jan 31, 2019, at 9:25 PM, Pete Heist <pete at heistp.net> wrote:
> 
> 1) Why is nla_put_u32 suddenly failing for TARGET_US after adding five cake instances?

nla_put_u32 is returning -EMSGSIZE, so the skb space in tailroom isn’t large enough (per nla_put doc).

After it fails, cake_dump_stats is called a second time right away, which succeeds. I _think_ what’s happening here is that after it sees -EMSGSIZE, the kernel allocates more tailroom and calls cake_dump_stats again. This doesn’t happen for kernel 4.9.0, it always succeeds, so presumably the initial size is larger.

> 2) Is calling sch_tree_unlock the right thing to do in the failure case, or am I working around a kernel bug, and doing something that would fail in other kernels?

I don’t think we’d want to do this after that same edb09eb17 commit. :) So at a minimum, to unlock after error:

 nla_put_failure:
        nla_nest_cancel(d->skb, stats);
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 8, 0)
+       sch_tree_unlock(sch);
+#endif
        return -1;
 }

The question still is, would that break other kernel versions if an error occurs? If the tree is unlocked again elsewhere in some kernel versions after an error, that would end badly. It looks like in tc_fill_qdisc (sch_api.c) that gnet_stats_finish_copy, which unlocks, is not called if q->ops->dump_stats returns < 0. So I’m not sure how it ever unlocked properly in case of error. Hrm.

Do you think this is a correct patch?



More information about the Cake mailing list