From: Toke Høiland-Jørgensen
To: Dean Scarff, cake@lists.bufferbloat.net
Subject: Re: [Cake] NLA_F_NESTED is missing
Date: Wed, 04 Nov 2020 12:27:53 +0100
Message-ID: <87tuu5uzt2.fsf@toke.dk>
In-Reply-To: <6737e53394e4608f26677644d062bb23@scarff.id.au>
References: <202fa41a446859d714728d90e890d1d2@scarff.id.au> <87d00wkk9f.fsf@toke.dk> <87k0v2k8m0.fsf@toke.dk> <6737e53394e4608f26677644d062bb23@scarff.id.au>

Dean Scarff writes:

> On Tue, 03 Nov 2020 12:00:55 +0100, Toke Høiland-Jørgensen wrote:
>> Dean Scarff writes:
>>
>>> On Mon, 02 Nov 2020 13:37:00 +0100, Toke wrote:
>>>> Dean Scarff writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've been happily running the out-of-tree sch_cake on my Raspberry
>>>>> Pi since 2015. However, I recently upgraded my kernel (to 5.4.72
>>>>> from Raspbian's raspberrypi-kernel 1.20201022-1), which comes with
>>>>> the sch_cake in mainline. Now, when running:
>>>>>
>>>>>   sudo /sbin/tc qdisc add dev ppp0 root cake
>>>>>
>>>>> I get the error:
>>>>>
>>>>>   Error: NLA_F_NESTED is missing.
>>>>>
>>>>> I get this error with the sch_cake in mainline, and also with
>>>>> sch_cake built out-of-tree. I also get the error with both Debian's
>>>>> iproute2 5.9.0-1 (built myself via debian/rules) and "tc" from
>>>>> dtaht's tc-adv repo.
>>>>>
>>>>> Any ideas on what this error means and how to fix it?
>>>>
>>>> I just tried building a 5.4.72 kernel and couldn't reproduce this,
>>>> so it seems it's a fault with the raspberry pi kernel; I guess
>>>> opening a bug against that would be the way to go?
>>>>
>>>> As for what's actually causing this, I couldn't find anything
>>>> obvious that touches this code in the qdisc layer; but I suppose it
>>>> has something to do with the core qdisc netlink parsing code?
>>>>
>>>> -Toke
>>>
>>> Thanks for the data point.
>>>
>>> For the record, the relevant kernel source is:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/net/netlink.h?h=v5.4.72#n1143
>>> and the Pi branch:
>>>
>>> https://github.com/raspberrypi/linux/blob/raspberrypi-kernel_1.20201022-1/include/net/netlink.h#L1143
>>>
>>> It seems very unlikely that the Pi folks are patching the netlink
>>> stuff, so I don't think I'll get much traction there unless I can
>>> call out something specifically wrong with their patchset.
>>
>> Well, something odd is certainly going on. The error message you're
>> quoting comes from a part of the netlink parsing code (in the kernel)
>> that shouldn't even be hit by the qdisc addition: NLA_F_NESTED
>> parsing is only enabled in 'strict' validation mode, which is not
>> used for qdiscs.
>>
>> So IDK, maybe a compiler issue or a bit that gets set wrong
>> somewhere? Bisecting the kernel may be the only option here, I don't
>> think you're going to find anything in userspace...
>
> Yeah, I came to the same conclusion. I verified the userspace was sane
> via gdb (see earlier post), and I also read through the sch_api.c and
> nlattr.c kernel code and it sure looks impossible for the strict
> validation to be getting hit.
>
> Safe to say this was random corruption: I downgraded the kernel, things
> worked as expected, then I upgraded back to the 5.4.72 and it worked
> too! Interestingly, the problem persisted across reboots (so it wasn't
> just RAM corruption), and all the kernel files also matched their "dpkg"
> MD5s (so it wasn't like the binaries were obviously corrupt on disk).
> I've replaced the Pi's microSD card just to be safe, though... kernel
> corruption is scary.

Ugh, Heisenbugs are the worst! Great to hear you managed to resolve it,
though :)

-Toke