* [Cake] NLA_F_NESTED is missing
From: Dean Scarff @ 2020-11-01 10:15 UTC
To: cake

Hi,

I've been happily running the out-of-tree sch_cake on my Raspberry Pi since 2015. However, I recently upgraded my kernel (to 5.4.72 from Raspbian's raspberrypi-kernel 1.20201022-1), which comes with the sch_cake in mainline. Now, when running:

  sudo /sbin/tc qdisc add dev ppp0 root cake

I get the error:

  Error: NLA_F_NESTED is missing.

I get this error with the sch_cake in mainline, and also with sch_cake built out-of-tree. I also get the error with both Debian's iproute2 5.9.0-1 (built myself via debian/rules) and "tc" from dtaht's tc-adv repo.

Any ideas on what this error means and how to fix it?

* Re: [Cake] NLA_F_NESTED is missing
From: Y @ 2020-11-01 16:53 UTC
To: cake, Dean Scarff

My Pi doesn't show this error when using cake on eth0.

On Sunday, 1 November 2020 at 19:15:54 UTC+9, Dean Scarff <dos@scarff.id.au> wrote:

> Hi,
>
> I've been happily running the out-of-tree sch_cake on my Raspberry Pi
> since 2015. However, I recently upgraded my kernel (to 5.4.72 from
> Raspbian's raspberrypi-kernel 1.20201022-1), which comes with the
> sch_cake in mainline. Now, when running:
>
> sudo /sbin/tc qdisc add dev ppp0 root cake
>
> I get the error:
>
> Error: NLA_F_NESTED is missing.
>
> I get this error with the sch_cake in mainline, and also with sch_cake
> built out-of-tree. I also get the error with both Debian's iproute2
> 5.9.0-1 (built myself via debian/rules) and "tc" from dtaht's tc-adv
> repo.
>
> Any ideas on what this error means and how to fix it?

* Re: [Cake] NLA_F_NESTED is missing
From: Toke Høiland-Jørgensen @ 2020-11-02 12:37 UTC
To: Dean Scarff, cake

Dean Scarff <dos@scarff.id.au> writes:

> Hi,
>
> I've been happily running the out-of-tree sch_cake on my Raspberry Pi
> since 2015. However, I recently upgraded my kernel (to 5.4.72 from
> Raspbian's raspberrypi-kernel 1.20201022-1), which comes with the
> sch_cake in mainline. Now, when running:
>
> sudo /sbin/tc qdisc add dev ppp0 root cake
>
> I get the error:
>
> Error: NLA_F_NESTED is missing.
>
> I get this error with the sch_cake in mainline, and also with sch_cake
> built out-of-tree. I also get the error with both Debian's iproute2
> 5.9.0-1 (built myself via debian/rules) and "tc" from dtaht's tc-adv
> repo.
>
> Any ideas on what this error means and how to fix it?

I just tried building a 5.4.72 kernel and couldn't reproduce this, so it seems it's a fault with the raspberry pi kernel; I guess opening a bug against that would be the way to go?

As for what's actually causing this, I couldn't find anything obvious that touches this code in the qdisc layer; but I suppose it has something to do with the core qdisc netlink parsing code?

-Toke

* Re: [Cake] NLA_F_NESTED is missing
From: Dean Scarff @ 2020-11-03 1:11 UTC
To: cake

On Mon, 02 Nov 2020 13:37:00 +0100, Toke wrote:
> I just tried building a 5.4.72 kernel and couldn't reproduce this, so
> it seems it's a fault with the raspberry pi kernel; I guess opening a
> bug against that would be the way to go?
>
> As for what's actually causing this, I couldn't find anything obvious
> that touches this code in the qdisc layer; but I suppose it has
> something to do with the core qdisc netlink parsing code?
>
> -Toke

Thanks for the data point.

For the record, the relevant kernel source is:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/net/netlink.h?h=v5.4.72#n1143
and the Pi branch:
https://github.com/raspberrypi/linux/blob/raspberrypi-kernel_1.20201022-1/include/net/netlink.h#L1143

It seems very unlikely that the Pi folks are patching the netlink stuff, so I don't think I'll get much traction there unless I can call out something specifically wrong with their patchset.

My current theory (despite the 4 combinations I tried) is that there's some mismatch between Raspbian/Debian's tc and the kernel (somewhere in the tc's qdisc code it's calling nla_parse_nested but not setting nla_type), but it's odd that nobody else can repro. tbh the Debian patches look pretty innocent too:
https://salsa.debian.org/debian/iproute2/-/tree/558bae88bd0befc1bf3e1070733bafd522e44992/debian/patches

I should be able to figure it out by poking around in tc with gdb.

* Re: [Cake] NLA_F_NESTED is missing
From: Dean Scarff @ 2020-11-03 8:07 UTC
To: cake

On Tue, 03 Nov 2020 12:11:06 +1100, Dean Scarff wrote:
> I should be able to figure it out by poking around in tc with gdb.

I did this, and I confirmed that tc isn't trying to send any nested attributes. So I think the problem is on the kernel side, since it seems to be hallucinating attributes it expects to be nested but aren't.

Note that "tc" does send an empty options attribute:

  addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);

https://salsa.debian.org/debian/iproute2/-/blob/v5.7.0/tc/q_cake.c#L356

It's the same in upstream iproute2 and iproute2-next:
https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/tc/q_cake.c#n356
https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git/tree/tc/q_cake.c#n356

This looks valid to me. While I'm less sure about all the other attributes being added in cake_parse_opt (i.e. whether they should be nested under TCA_OPTIONS), that's moot in my repro case, because they're not being set anyway.

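As a rough illustration of what that empty `addattr_l(n, 1024, TCA_OPTIONS, NULL, 0)` call puts on the wire, here is a small Python sketch. It is not iproute2 code: `pack_rtattr` is a hypothetical helper, the numeric constants are what I understand the kernel UAPI headers to define, and a little-endian host (such as the Pi) is assumed.

```python
import struct

TCA_KIND = 1             # linux/rtnetlink.h (assumed value)
TCA_OPTIONS = 2          # linux/rtnetlink.h (assumed value)
NLA_F_NESTED = 1 << 15   # linux/netlink.h (assumed value)
RTA_ALIGNTO = 4


def pack_rtattr(rta_type, data=b""):
    """Hypothetical helper: encode one rtattr as len, type, data, padding."""
    rta_len = 4 + len(data)           # the header is two 16-bit fields
    pad = (-rta_len) % RTA_ALIGNTO    # attributes are padded to 4 bytes
    return struct.pack("<HH", rta_len, rta_type) + data + b"\x00" * pad


# What tc sends for a bare "cake": an empty TCA_OPTIONS without the flag.
plain = pack_rtattr(TCA_OPTIONS)
# The same attribute with NLA_F_NESTED or-ed into the type field.
flagged = pack_rtattr(TCA_OPTIONS | NLA_F_NESTED)
# The qdisc name attribute, NUL-terminated.
kind = pack_rtattr(TCA_KIND, b"cake\x00")

print(plain.hex(" "))
print(flagged.hex(" "))
print(kind.hex(" "))
```

The only difference between the two TCA_OPTIONS encodings is the top bit of the 16-bit type field, which is the bit the error message complains about.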
--- Interesting parts of the gdb session:

(gdb) run qdisc add dev ppp0 root cake
Starting program: /home/dean/iproute2/tc/tc qdisc add dev ppp0 root cake

Breakpoint 10, rtnl_talk (rtnl=0xc72d0 <rth>, n=0x7efefb78, answer=0x0) at libnetlink.c:1048
1048        return __rtnl_talk(rtnl, n, answer, true, NULL);
(gdb) p *rtnl
$14 = {fd = 3, local = {nl_family = 16, nl_pad = 0, nl_pid = 18698, nl_groups = 0}, peer = {nl_family = 0, nl_pad = 0, nl_pid = 0, nl_groups = 0}, seq = 1604370876, dump = 1604370876, proto = 0, dump_fp = 0x0, flags = 0}
(gdb) p *n
$15 = {nlmsg_len = 52, nlmsg_type = 36, nlmsg_flags = 1537, nlmsg_seq = 0, nlmsg_pid = 0}
(gdb) p sizeof(struct nlmsghdr)
$16 = 16
(gdb) call print_qdisc(n, stdout)
added qdisc cake 0: dev ppp0 root refcnt 0 nonat nowash no-ack-filter no-split-gso noatm overhead 0
$17 = 0

I've annotated the following to show the structure of the request. There are only two attributes, TCA_KIND and TCA_OPTIONS, and neither of those is nested.

(gdb) x/52xb n
nlmsghdr:
0x7efefb78: [0x34] 0x00 0x00 0x00 [0x24] 0x00 0x01 0x06
            len=52, RTM_NEWQDISC
0x7efefb80: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
payload:
family header:
0x7efefb88: [0x00][0x00][0x00 0x00][0x05 0x00][0x00 0x00]
            family=AF_UNSPEC, pad1, pad2, ifindex=ppp0, alignment
0x7efefb90: [0x00 0x00 0x00 0x00][0xff 0xff 0xff 0xff]
            handle=0, parent=TC_H_ROOT
attributes:
0x7efefb98: [0x00 0x00 0x00 0x00][0x09 0x00][0x01 0x00]
            info=0, rta_len=9, rta_type=TCA_KIND
0x7efefba0: [0x63 0x61 0x6b 0x65 0x00][0x00 0x00 0x00]
            rta_data="cake", alignment
0x7efefba8: [0x04 0x00][0x02 0x00]
            rta_len=4, rta_type=TCA_OPTIONS

(gdb) up
#1  0x000199a4 in tc_qdisc_modify (cmd=36, flags=1536, argc=0, argv=0x7efffd70) at tc_qdisc.c:208
208         if (rtnl_talk(&rth, &req.n, NULL) < 0)
(gdb) p req.t
$19 = {tcm_family = 0 '\000', tcm__pad1 = 0 '\000', tcm__pad2 = 0, tcm_ifindex = 5, tcm_handle = 0, tcm_parent = 4294967295, tcm_info = 0}

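The 52 bytes from the `x/52xb n` dump can also be decoded mechanically, which is a useful cross-check on the hand annotation. The sketch below is only an illustration: `parse_newqdisc` is a hypothetical helper, the struct layouts (16-byte `nlmsghdr`, 20-byte `tcmsg` with a 4-byte `tcm_ifindex`, 4-byte `rtattr` header) are my reading of the UAPI headers, and little-endian byte order is assumed.

```python
import struct

NLA_F_NESTED = 1 << 15          # linux/netlink.h (assumed value)
NLA_F_NET_BYTEORDER = 1 << 14   # linux/netlink.h (assumed value)
NLA_TYPE_MASK = 0xFFFF & ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)

# The request bytes, transcribed from the gdb dump.
MSG = bytes.fromhex(
    "34000000 2400 0106 00000000 00000000"            # nlmsghdr
    "00 00 0000 05000000 00000000 ffffffff 00000000"  # tcmsg
    "0900 0100 63616b6500 000000"                     # TCA_KIND = "cake"
    "0400 0200"                                       # TCA_OPTIONS, empty
)


def parse_newqdisc(msg):
    """Hypothetical helper: split a qdisc request into header fields and attributes."""
    nlmsg_len, nlmsg_type, nlmsg_flags, _seq, _pid = struct.unpack_from("<IHHII", msg, 0)
    _fam, _pad1, _pad2, ifindex, handle, parent, _info = struct.unpack_from("<BBHiIII", msg, 16)
    header = {"len": nlmsg_len, "type": nlmsg_type, "flags": nlmsg_flags,
              "ifindex": ifindex, "handle": handle, "parent": parent}
    attrs = []
    off = 16 + 20                       # attributes follow nlmsghdr + tcmsg
    while off + 4 <= nlmsg_len:
        rta_len, rta_type = struct.unpack_from("<HH", msg, off)
        if rta_len < 4:                 # malformed attribute, stop
            break
        attrs.append({
            "type": rta_type & NLA_TYPE_MASK,
            "nested": bool(rta_type & NLA_F_NESTED),
            "data": msg[off + 4:off + rta_len],
        })
        off += (rta_len + 3) & ~3       # advance to the next 4-byte boundary
    return header, attrs


header, attrs = parse_newqdisc(MSG)
print(header)
for attr in attrs:
    print(attr)
```

Decoded this way, the request carries exactly two attributes (types 1 and 2), and neither has the NLA_F_NESTED bit set, which agrees with the annotation.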
* Re: [Cake] NLA_F_NESTED is missing
From: Toke Høiland-Jørgensen @ 2020-11-03 11:00 UTC
To: Dean Scarff, cake

Dean Scarff <dos@scarff.id.au> writes:

> Thanks for the data point.
>
> For the record, the relevant kernel source is:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/net/netlink.h?h=v5.4.72#n1143
> and the Pi branch:
> https://github.com/raspberrypi/linux/blob/raspberrypi-kernel_1.20201022-1/include/net/netlink.h#L1143
>
> It seems very unlikely that the Pi folks are patching the netlink
> stuff, so I don't think I'll get much traction there unless I can call
> out something specifically wrong with their patchset.

Well, something odd is certainly going on. The error message you're quoting comes from a part of the netlink parsing code (in the kernel) that shouldn't even be hit by the qdisc addition: NLA_F_NESTED parsing is only enabled in 'strict' validation mode, which is not used for qdiscs.

So IDK, maybe a compiler issue or a bit that gets set wrong somewhere? Bisecting the kernel may be the only option here, I don't think you're going to find anything in userspace...

-Toke

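To make the strict versus non-strict distinction concrete, here is a toy Python model of the check being described. It is not kernel code: `parse_nested` and `NetlinkError` are hypothetical names, and the behaviour modelled (the strict entry point rejecting an attribute whose type lacks NLA_F_NESTED, the lenient one skipping that test) is my reading of `include/net/netlink.h` in 5.4, so treat it as an assumption.

```python
NLA_F_NESTED = 1 << 15   # linux/netlink.h (assumed value)
TCA_OPTIONS = 2          # linux/rtnetlink.h (assumed value)


class NetlinkError(Exception):
    """Hypothetical stand-in for an -EINVAL return with an extack message."""


def parse_nested(nla_type, strict):
    """Toy model: only strict validation insists on the NLA_F_NESTED bit."""
    if strict and not (nla_type & NLA_F_NESTED):
        raise NetlinkError("NLA_F_NESTED is missing")
    return "parsed"


# Lenient path, which the qdisc code is expected to take: accepted.
print(parse_nested(TCA_OPTIONS, strict=False))

# Strict path with the un-flagged attribute that tc sends: rejected.
try:
    parse_nested(TCA_OPTIONS, strict=True)
except NetlinkError as exc:
    print("Error:", exc)
```

Under this model the reported error can only appear if the strict path is somehow taken for a request that should go through the lenient one.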
* Re: [Cake] NLA_F_NESTED is missing
From: Dean Scarff @ 2020-11-04 5:48 UTC
To: cake

On Tue, 03 Nov 2020 12:00:55 +0100, Toke Høiland-Jørgensen wrote:
> Well, something odd is certainly going on. The error message you're
> quoting comes from a part of the netlink parsing code (in the kernel)
> that shouldn't even be hit by the qdisc addition: NLA_F_NESTED parsing
> is only enabled in 'strict' validation mode, which is not used for
> qdiscs.
>
> So IDK, maybe a compiler issue or a bit that gets set wrong somewhere?
> Bisecting the kernel may be the only option here, I don't think you're
> going to find anything in userspace...

Yeah, I came to the same conclusion. I verified the userspace was sane via gdb (see earlier post), and I also read through the sch_api.c and nlattr.c kernel code and it sure looks impossible for the strict validation to be getting hit.

Safe to say this was random corruption: I downgraded the kernel, things worked as expected, then I upgraded back to the 5.4.72 and it worked too! Interestingly, the problem persisted across reboots (so it wasn't just RAM corruption), and all the kernel files also matched their "dpkg" MD5s (so it wasn't like the binaries were obviously corrupt on disk). I've replaced the Pi's microSD card just to be safe, though... kernel corruption is scary.

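For anyone wanting to repeat the dpkg MD5 comparison, a minimal Python sketch follows. It assumes the usual `<package>.md5sums` layout under `/var/lib/dpkg/info/` (one `<md5>  <relative path>` pair per line); `verify_md5sums` is a hypothetical helper, not part of dpkg.

```python
import hashlib
from pathlib import Path
from typing import List


def verify_md5sums(md5sums_text: str, root: Path = Path("/")) -> List[str]:
    """Return the listed paths that are missing or whose MD5 differs."""
    bad = []
    for line in md5sums_text.splitlines():
        if not line.strip():
            continue
        digest, rel = line.split(None, 1)   # "<md5>  <relative path>"
        rel = rel.strip()
        try:
            actual = hashlib.md5((root / rel).read_bytes()).hexdigest()
        except OSError:
            bad.append(rel)                  # unreadable or missing file
            continue
        if actual != digest:
            bad.append(rel)                  # content differs from the record
    return bad
```

A call such as `verify_md5sums(Path("/var/lib/dpkg/info/raspberrypi-kernel.md5sums").read_text())` would then list any mismatching files; the exact file name is an assumption based on the package named in this thread. As the message notes, a clean result only shows that the files read back as packaged, not that nothing else was corrupted.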
* Re: [Cake] NLA_F_NESTED is missing
From: Toke Høiland-Jørgensen @ 2020-11-04 11:27 UTC
To: Dean Scarff, cake

Dean Scarff <dos@scarff.id.au> writes:

> Yeah, I came to the same conclusion. I verified the userspace was sane
> via gdb (see earlier post), and I also read through the sch_api.c and
> nlattr.c kernel code and it sure looks impossible for the strict
> validation to be getting hit.
>
> Safe to say this was random corruption: I downgraded the kernel, things
> worked as expected, then I upgraded back to the 5.4.72 and it worked
> too! Interestingly, the problem persisted across reboots (so it wasn't
> just RAM corruption), and all the kernel files also matched their "dpkg"
> MD5s (so it wasn't like the binaries were obviously corrupt on disk).
> I've replaced the Pi's microSD card just to be safe, though... kernel
> corruption is scary.

Ugh, Heisenbugs are the worst! Great to hear you managed to resolve it, though :)

-Toke

* Re: [Cake] NLA_F_NESTED is missing
From: Jonathan Morton @ 2020-11-03 1:14 UTC
To: Dean Scarff
Cc: cake

> On 1 Nov, 2020, at 12:15 pm, Dean Scarff <dos@scarff.id.au> wrote:
>
> Error: NLA_F_NESTED is missing.

Since you're running an up-to-date kernel, you should check you are also running up-to-date userspace tools. That flag is associated with the interface between the two.

- Jonathan Morton

* Re: [Cake] NLA_F_NESTED is missing
From: Dean Scarff @ 2020-11-03 1:51 UTC
To: cake

On Tue, 3 Nov 2020 03:14:37 +0200, Jonathan Morton wrote:
>> On 1 Nov, 2020, at 12:15 pm, Dean Scarff <dos@scarff.id.au> wrote:
>>
>> Error: NLA_F_NESTED is missing.
>
> Since you're running an up-to-date kernel, you should check you are
> also running up-to-date userspace tools. That flag is associated with
> the interface between the two.
>
> - Jonathan Morton

Thanks. I figured the same thing (see my other post today), but if anything, one of the userspace versions I tested (iproute2 5.9.0) is *too* new (released Oct 19 for 5.9 kernels, see: https://lwn.net/Articles/834755/ ). For good measure, I also tested with Debian's iproute2_5.7.0-1 ;)

Either way though, I can debug the userspace tools, which should get me to the root cause.