[Cake] NLA_F_NESTED is missing

Toke Høiland-Jørgensen toke at toke.dk
Wed Nov 4 06:27:53 EST 2020


Dean Scarff <dos at scarff.id.au> writes:

>  On Tue, 03 Nov 2020 12:00:55 +0100, Toke Høiland-Jørgensen wrote:
>> Dean Scarff <dos at scarff.id.au> writes:
>>
>>>  On Mon, 02 Nov 2020 13:37:00 +0100, Toke wrote:
>>>> Dean Scarff <dos at scarff.id.au> writes:
>>>>
>>>>>  Hi,
>>>>>
>>>>>  I've been happily running the out-of-tree sch_cake on my Raspberry Pi
>>>>>  since 2015.  However, I recently upgraded my kernel (to 5.4.72 from
>>>>>  Raspbian's raspberrypi-kernel 1.20201022-1), which comes with the
>>>>>  sch_cake in mainline.  Now, when running:
>>>>>
>>>>>    sudo /sbin/tc qdisc add dev ppp0 root cake
>>>>>
>>>>>  I get the error:
>>>>>
>>>>>    Error: NLA_F_NESTED is missing.
>>>>>
>>>>>  I get this error with the sch_cake in mainline, and also with sch_cake
>>>>>  built out-of-tree.  I also get the error with both Debian's iproute2
>>>>>  5.9.0-1 (built myself via debian/rules) and "tc" from dtaht's tc-adv
>>>>>  repo.
>>>>>
>>>>>  Any ideas on what this error means and how to fix it?
>>>>
>>>> I just tried building a 5.4.72 kernel and couldn't reproduce this, so it
>>>> seems it's a fault with the raspberry pi kernel; I guess opening a bug
>>>> against that would be the way to go?
>>>>
>>>> As for what's actually causing this, I couldn't find anything obvious
>>>> that touches this code in the qdisc layer; but I suppose it has
>>>> something to do with the core qdisc netlink parsing code?
>>>>
>>>> -Toke
>>>
>>>  Thanks for the data point.
>>>
>>>  For the record, the relevant kernel source is:
>>>  
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/net/netlink.h?h=v5.4.72#n1143
>>>  and the Pi branch:
>>>  
>>> https://github.com/raspberrypi/linux/blob/raspberrypi-kernel_1.20201022-1/include/net/netlink.h#L1143
>>>
>>>  It seems very unlikely that the Pi folks are patching the netlink
>>>  stuff, so I don't think I'll get much traction there unless I can call
>>>  out something specifically wrong with their patchset.
>>
>> Well, something odd is certainly going on. The error message you're
>> quoting comes from a part of the netlink parsing code (in the kernel)
>> that shouldn't even be hit by the qdisc addition: NLA_F_NESTED parsing
>> is only enabled in 'strict' validation mode, which is not used for
>> qdiscs.
>>
>> So IDK, maybe a compiler issue or a bit that gets set wrong somewhere?
>> Bisecting the kernel may be the only option here, I don't think you're
>> going to find anything in userspace...
>
>  Yeah, I came to the same conclusion.  I verified the userspace was sane 
>  via gdb (see earlier post), and I also read through the sch_api.c and 
>  nlattr.c kernel code and it sure looks impossible for the strict 
>  validation to be getting hit.
>
>  Safe to say this was random corruption: I downgraded the kernel, things 
>  worked as expected, then I upgraded back to the 5.4.72 and it worked 
>  too!  Interestingly, the problem persisted across reboots (so it wasn't 
>  just RAM corruption), and all the kernel files also matched their "dpkg" 
>  MD5s (so it wasn't like the binaries were obviously corrupt on disk).  
>  I've replaced the Pi's microSD card just to be safe, though... kernel 
>  corruption is scary.
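
For readers following the strict-validation point in the quoted text, here is
a rough userspace model of the check, written in Python rather than kernel C.
It is only a sketch: pack_attr, parse_nested and ValidationError are
hypothetical names invented for illustration, not kernel or iproute2 APIs. The
only details taken from real headers are the attribute layout (a u16 length
and a u16 type in host byte order, padded to 4 bytes), the NLA_F_NESTED flag
bit, and the numeric value of TCA_OPTIONS.

```python
import struct

NLA_F_NESTED = 1 << 15   # flag bit carried in the attribute type field
NLA_HDRLEN = 4           # header is two u16 fields: length, then type
TCA_OPTIONS = 2          # attribute that carries the qdisc's nested options


class ValidationError(Exception):
    """Stands in for the kernel returning -EINVAL with an extack message."""


def pack_attr(attr_type, payload, nested_flag=False):
    """Build one netlink attribute: header, payload, padding to 4 bytes."""
    if nested_flag:
        attr_type |= NLA_F_NESTED
    length = NLA_HDRLEN + len(payload)   # length excludes the padding
    padding = b"\x00" * (-length % 4)
    return struct.pack("=HH", length, attr_type) + payload + padding


def parse_nested(attr, strict):
    """Return the nested payload; only strict mode insists on the flag."""
    nla_len, nla_type = struct.unpack_from("=HH", attr)
    if strict and not (nla_type & NLA_F_NESTED):
        raise ValidationError("NLA_F_NESTED is missing")
    return attr[NLA_HDRLEN:nla_len]


# A sender that leaves the flag unset, as assumed here for the tc request.
options = pack_attr(TCA_OPTIONS, b"\x01\x02\x03\x04")

print(parse_nested(options, strict=False))   # lenient mode accepts it
try:
    parse_nested(options, strict=True)
except ValidationError as err:
    print("Error:", err)                     # strict mode rejects it
```

In this model the same bytes pass in lenient mode and fail in strict mode,
which is why the error is surprising on a path that is not supposed to use
strict validation.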

Ugh, Heisenbugs are the worst! Great to hear you managed to resolve it,
though :)

-Toke
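
The dpkg MD5 comparison mentioned in the quoted message can be approximated
with a short script. This is a minimal sketch, not the procedure Dean actually
used: it assumes a Debian-style md5sums listing in which each line holds a hex
digest followed by a path relative to the filesystem root, and the names
md5_of and find_mismatches are made up for this example.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large kernel images need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(md5sums_text, root="/"):
    """Return the listed paths that are missing or whose MD5 differs."""
    bad = []
    for line in md5sums_text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        expected, rel_path = line.split(None, 1)
        target = Path(root) / rel_path
        if not target.is_file() or md5_of(target) != expected:
            bad.append(rel_path)
    return bad
```

On a Debian-based system the listing for a package is normally found at
/var/lib/dpkg/info/<package>.md5sums, so a call might look like
find_mismatches(Path(listing_path).read_text()), with listing_path pointing at
the kernel package's file. An empty result only says that the files read back
with the recorded digests at that moment, which matches the observation in the
thread that the check passed while the problem was still present.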

