From: Toke Høiland-Jørgensen
To: Pete Heist
Cc: Kevin Darbyshire-Bryant, Cake List
Subject: Re: [Cake] Cake on openwrt - falling behind
Date: Mon, 02 Jul 2018 21:31:29 +0200
Message-ID: <874lhhigdq.fsf@toke.dk>
References: <6DF9A5E0-EFD5-4519-9889-BC0A7B9BD48E@darbyshire-bryant.me.uk>
 <1A8BA286-6B31-4581-86C9-6855AC28C245@heistp.net>
 <673EAD3F-AB09-4B90-88BB-5DCE0BD65534@heistp.net>
 <6FE8D434-01BE-41A1-BD6B-EFFD67AC8784@heistp.net>
 <94C9790F-E9BC-4D59-9845-17C305E4B910@darbyshire-bryant.me.uk>
 <17AF79A0-0213-44E3-95B9-62795A644A47@heistp.net>
 <87lgatj13k.fsf@toke.dk> <87fu11ipir.fsf@toke.dk>
 <8815D90E-DEAB-4211-B4B4-7058178DEA47@heistp.net>

Pete Heist writes:

>> On Jul 2, 2018, at 7:04 PM, Pete Heist wrote:
>>
>>
>>
>>> On Jul 2, 2018, at 6:14 PM, Toke Høiland-Jørgensen wrote:
>>>
>>> Aha! I think I figured out what is going on:
>>>
>>> The gen_stats facility will add an nlattr header at the beginning of the
>>> qdisc stats, which is the toplevel TLV that contains all stats (and that
>>> we put our stats inside). It stores a reference to this header, and when
>>> all the per-qdisc callbacks have finished adding their stats, it goes
>>> back and fixes up the length of the containing header.
>>>
>>> The problem is that on architectures that need padding, the padding TLV
>>> is added *first*, which means that the nlattr pointer that is stored
>>> before the callbacks are performed points to the padding TLV and not the
>>> stats TLV. And so, when the header is fixed up, the result (from the
>>> parser's perspective) is just a very big padding TLV.
>>>
>>> The options TLV is before the stats TLV, so the bug only occurs if the
>>> options happen to have a length that means the stats will need padding.
>>> Which is why messing with the number of options "fixes" the bug.
>>>
>>> Could you try applying the patch below (to the kernel) and see if that
>>> resolves the issue, please?
>>
>> Awesome Toke! It looks like from Kevin’s email that it works for him,
>> but it didn’t work for me the first time around. This may have to do
>> with how I added the patch as I’m still not that familiar with
>> OpenWRT’s build system (first kernel patch I tried). I wasn’t sure if
>> it should go into generic or platform, for one, so I tried generic…is
>> that right?
>
> Ok, I got it to work after re-flashing with tftp. :) It looks like the
> OM2P is not always successfully performing sysupgrades, perhaps due to
> its limited memory (64M), but I’m not sure.

Great, thanks for testing, both of you!

> I still have my debugging in place and do still have one question. The
> pointer in TCA_STATS2 is now valid, but there is still a pointer value
> in TCA_PAD, which is pointing to a place 32 bits before TCA_STATS2. Is
> that expected?

Yes, that is expected. The PAD is there to align the subsequent STATS
TLV, so it is being parsed but is unused.

This is the offending spot from Kevin's pcap files:

Working:

The header starts at byte 0x17c with a TLV of type 7 (TCA_STATS2) of
length 0x268. No padding TLV needed, so stats work:

00000170: 0000 0000 0008 000a 0000 0000 0268 0007  .............h..
00000180: 0234 0004 0008 0002 0025 f4cc 0008 0003  .4.......%......

Broken:

The header starts at byte 0x180 with a PAD TLV (type 9) followed by the
TCA_STATS2 TLV. Both start out with 4-byte lengths (just the header),
but because the PAD TLV is the one being extended, that gets a new
length of 0x29c, which means it now contains the TCA_STATS2 TLV as the
first nested TLV. What should have happened was that the TCA_PAD TLV
should have stayed at 4 bytes, thus aligning the payload of the
TCA_STATS2 TLV at 0x188 bytes, and the TCA_STATS2 TLV should have been
length 0x298. Which is what happens after the patch, and why TCA_PAD is
still set by the parser.

00000180: 029c 0009 0004 0007 0260 0004 000c 0002  .........`......

-Toke