[Cake] the Cake stalemate

Kevin Darbyshire-Bryant kevin at darbyshire-bryant.me.uk
Tue Jun 19 09:41:36 EDT 2018



> On 19 Jun 2018, at 13:26, Pete Heist <pete at heistp.net> wrote:
> 
> 
>> On Jun 19, 2018, at 1:54 PM, Toke Høiland-Jørgensen <toke at toke.dk> wrote:
>> 
>> We also saw a bug on 32-bit MIPS where some combinations of 64-bit
>> netlink attributes would cause stats display in tc to fail. However, I
>> believe this is more a case of Cake exposing a latent bug somewhere in
>> the tc or kernel netlink code (alignment issues, perhaps?), and so I'm
>> not sure it is necessarily a blocker for merging Cake. However, if
>> someone could take a look that would be very helpful. I forget if the
>> current head of the cobalt branch exposes the bug, but I think it does.
>> It's quite obvious when it happens: no stats output whatsoever...
> 
> I have a 32-bit MIPS in my ER-X, but it sounds like what I saw (outrageous refcnt values) was something different:
<snip>

Yes, it was.  At one point iproute2’s tc was silently promoting values from 32-bit to 64-bit types when printing, without updating the printf format string to match, so on big-endian systems printf read its arguments from the wrong place in memory.  This came in as part of the move to JSON output.
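
To make the failure mode concrete, here is a rough model in Python (not the actual tc code).  It lays out a 64-bit value followed by a 32-bit one, then reads them back as two 32-bit values, as a formatter that was only told about 32-bit arguments would.  The function name and the sample values are made up for illustration, and real calling conventions add register and alignment rules that this ignores.

```python
import struct

def read_as_two_u32(value64, next32, byteorder):
    """Store a u64 then a u32, but read back two u32s (hypothetical model)."""
    prefix = ">" if byteorder == "big" else "<"
    # What the caller actually stored: one 64-bit value, then one 32-bit value.
    area = struct.pack(prefix + "QI", value64, next32)
    # What a formatter expecting two 32-bit arguments would pick up.
    first, second = struct.unpack_from(prefix + "II", area, 0)
    return first, second

refcnt, rate = 2, 1000  # made-up sample values
print("big endian:   ", read_as_two_u32(refcnt, rate, "big"))     # (0, 2)
print("little endian:", read_as_two_u32(refcnt, rate, "little"))  # (2, 0)
```

In the big-endian layout the first field reads as 0 and the real value turns up one slot late.  In the little-endian layout the first field happens to look right, which is how this sort of bug can go unnoticed on x86.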

Toke took my bug report & patch and made it acceptable to upstream, where it now lives as https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=4db2ff0db46f6368d89cfb3498a700e1256d2a04 and is included in iproute2 v4.17.

> However, if there’s a way I should try to reproduce something on this hardware to take a look, send any info you’ve got (how to add 64-bit netlink attributes?). I even have a spare ER-X on which I could put OpenWRT in case I need to be working with a more modern kernel…

The lack of stats with recent sch_cake (i.e. after https://github.com/dtaht/sch_cake/commit/af1d7cde7046af55ec867b29854d754816b64bc8, May 15th) on 32-bit MIPS, both BE & LE, is a mystery.  My hack workaround for my own personal OpenWrt builds is https://github.com/ldir-EDB0/openwrt/tree/tokesiproutedebug, which also includes a debug commit from Toke.

I considered bumping OpenWrt’s master branch to point at the latest commit of ‘cobalt’, as my build does, so we could judge from the resultant screaming whether only MIPS was affected or other 32-bit architectures too.  I was dissuaded from doing so.

I got a little further in collecting info on this courtesy of ‘kmod-netnl’, which allows capture of netlink packets as if on a network interface.  The captures were sent to Toke IIRC, but they require disassembly by hand to determine where the packet formatting is going wrong.  And there $real_life intervened: I’ve had some more pressing bugs to ponder and have not looked at it since.
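
The hand disassembly mentioned above could in principle be scripted.  As a rough sketch (Python, not tied to any particular capture tool), the walker below steps through a buffer of netlink attributes using the standard layout: a 4-byte header of 16-bit length and 16-bit type in the sender's byte order, with each attribute padded to a 4-byte boundary.  The function name, the attribute type numbers and the sample buffer are made up for illustration, and getting the attribute bytes out of a capture file is left out entirely.

```python
import struct

NLA_HDRLEN = 4          # u16 nla_len + u16 nla_type
NLA_ALIGNTO = 4         # attributes are padded to 4-byte boundaries
NLA_TYPE_MASK = 0x3FFF  # top two bits are the nested / net-byteorder flags

def walk_attrs(buf, byteorder="big"):
    """Yield (offset, type, payload) for each attribute in buf."""
    prefix = ">" if byteorder == "big" else "<"
    offset = 0
    while offset + NLA_HDRLEN <= len(buf):
        nla_len, nla_type = struct.unpack_from(prefix + "HH", buf, offset)
        if nla_len < NLA_HDRLEN or offset + nla_len > len(buf):
            raise ValueError(f"bad attribute length {nla_len} at offset {offset}")
        payload = buf[offset + NLA_HDRLEN:offset + nla_len]
        yield offset, nla_type & NLA_TYPE_MASK, payload
        # Advance to the next 4-byte boundary.
        offset += (nla_len + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)

# Made-up big-endian sample: a u32, an empty padding attribute, then a u64.
sample = (
    struct.pack(">HHI", 8, 1, 42)
    + struct.pack(">HH", 4, 2)
    + struct.pack(">HHQ", 12, 3, 2**40)
)
for offset, attr_type, payload in walk_attrs(sample):
    print(offset, attr_type, payload.hex())
```

The empty attribute in the sample is there so that the 64-bit payload starts at offset 16, an 8-byte boundary.  With only a 4-byte header in front of it, such a payload would otherwise be just 4-byte aligned, which is the sort of thing worth checking in the captures.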

OpenWrt has nearly bumped to iproute2 v4.17, but I haven’t yet got around to seeing if that makes any difference.  It looks like netlink_parse_nested cannot cope with 64-bit netlink attributes... but going any further requires someone who can code, rather than me.

RE: the stalemate.  I swing between an absolute hatred of anything Linux/open source/mailing lists and finding some people *incredibly* helpful and thinking ‘it’s not so bad, actually this is fun’.  A very recent example: I worked with David Woodhouse on a kernel PPPoATM bug (caused by a ticking timebomb that one E Dumazet left behind ;-) that stretched me to my absolute limits but was carried out in a spirit of helpfulness, curiosity & fun.  So it seems to be about finding the right person in kernel land who can see the errors in our code but also the value and effort in what we have achieved.  Maybe I’m being unfair and not interpreting the kernel mailing list environment correctly, but to me it comes across as abrasive at best (and I swore I’d put my head in a tiger’s mouth and tickle its testicles with a spanner before I even think of trying to submit another patch upstream).

On the other hand, I can also see that had we approached/involved the kernel people earlier on, some of the blind alleys we’ve travelled (I’m thinking of the passing of netlink stats here) could have been avoided.  Instead we’ve invested years of work and just presented a fait accompli.  I very much doubt that involving them earlier would have yielded some of the layer-breaking stuff we’ve ended up with, though, and cobalt would have been much, much poorer as a result.

The beauty of cake/cobalt is that it does a number of sensible things all in one command line (and has to work around some of Linux’s layering decisions, e.g. IFB).

Anyway, there’s my opinion.

KDB



Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
