From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Taht
To: Pete Heist
Cc: cake@lists.bufferbloat.net
References: <87vaic8vv1.fsf@nemesis.taht.net>
Date: Wed, 15 Nov 2017 11:44:45 -0800
In-Reply-To: (Pete Heist's message of "Wed, 15 Nov 2017 15:41:52 +0100")
Message-ID: <87bmk372du.fsf_-_@nemesis.taht.net>
Subject: [Cake] Cake upstream Planning
List-Id: Cake - FQ_codel the next generation

(note changed topic thread)

I dearly would like to try and submit cake to mainline Linux in
December. Getting it done is going to take a group effort. Trying to
cover all the corner cases is going to take coordination and
scripting, and perhaps we should switch to Google Docs to pull it
together.

Also, it might be fun to schedule a dramatic reading of the source code
via videoconference, because there's a lot in cake that not enough
people (except maybe Jonathan) understand.

Pete Heist writes:

> On Nov 14, 2017, at 9:10 PM, Dave Taht wrote:
>
>     Pete Heist writes:
>
>         By the way, what or how much is needed to get Cake mainlined?
>
>     I'd like us to give it a go when net-next reopens in two weeks,
>     we'd then have 6 weeks or so to get it right.
>
>     We need:
>
>     * Someone to do the heavy lifting. Which I suspect would be me.
>     * Someones with various hardware platforms that current kernels can be
>       run on. qemu?
>     * I'd like to see the ack filtering work get tested on lede at low
>       bandwidths on dsl especially.
>     * A whole lotta tests at various RTTs
>
> I can offer some testing time, and can script or batch a range of RTTs. netns
> would be useful here. For completeness, I suggest a product of rrul_be runs:
>
> Rates: 128 / 256 / 512Kbit, 1 / 2 / 4 / 8 / 16 / 32 / 64 / 128 / 256 / 512Mbit,
> 1Gbit
>
> RTTs: 150 / 300 / 600us, 1 / 2 / 4 / 8 / 16 / 32 / 64 / 128 / 256 / 512 / 1024ms

Well, we need simple basic single tcp download tests, and I would love
to also reuse the http and voip tests Toke used in the first paper.

> Opinions? Some of those might be rough (I’m looking at you 128Kbit / 1024ms),
> but it would be good to know what happens. For hardware, I could turn my Mac
> Mini into a qemu box. I guess this list is about right:

Doing a few qemu setups would be good. In particular it helps with
letting us test a net-next kernel. If we could make qemu images
available, all the better.

> https://www.debian.org/releases/stable/i386/ch02s01.html.en. I don’t know if all
> tests need to be tried on all platforms.

My principal requirement for multi-arch testing is that it "not crash"
and "compile". More direct testing, like with the mvneta and other odd
ethernet devices, kind of requires real hardware.

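To get a feel for the size of the rates x RTTs product Pete proposes
above, here is a minimal sketch that only enumerates the combinations.
The rate and RTT values are copied from his lists; the function names,
the run label format, and the seconds_per_run parameter are hypothetical
and are not flent conventions or options. It does not invoke flent or
set up netns.

```python
from itertools import product

# Rates and RTTs exactly as listed in the proposal quoted above.
RATES = ["128Kbit", "256Kbit", "512Kbit", "1Mbit", "2Mbit", "4Mbit",
         "8Mbit", "16Mbit", "32Mbit", "64Mbit", "128Mbit", "256Mbit",
         "512Mbit", "1Gbit"]
RTTS = ["150us", "300us", "600us", "1ms", "2ms", "4ms", "8ms", "16ms",
        "32ms", "64ms", "128ms", "256ms", "512ms", "1024ms"]


def test_matrix(rates=RATES, rtts=RTTS):
    """Return every (rate, rtt) pair, in rate-major order."""
    return list(product(rates, rtts))


def run_label(rate, rtt, test="rrul_be"):
    """Hypothetical naming scheme for one run (an assumption, not flent's)."""
    return f"{test}-{rate}-{rtt}"


def estimated_hours(runs, seconds_per_run):
    """Wall-clock hours if the runs execute back to back on one box."""
    return runs * seconds_per_run / 3600


if __name__ == "__main__":
    matrix = test_matrix()
    print(f"{len(matrix)} runs per qdisc configuration")
    # seconds_per_run=60 is an assumed figure, purely for illustration.
    print(f"about {estimated_hours(len(matrix), 60):.1f} hours at 60s per run")
    for rate, rtt in matrix[:3]:
        print(run_label(rate, rtt))
```

The full product is 14 rates times 14 RTTs, and it multiplies again for
every qdisc setting tried, which is one reason to split the load across
several machines.
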
> Testing could go much further, with host fairness, diffserv keywords, rtt
> settings (more on that later), overheads, nat, etc. We could also test
> underpowered hardware with rate limiting to see if it degrades gracefully. For
> sanity, we could just test a smattering of these things.

This is a case where flent's batch facility would help. And we can divvy
up the load among servers using the new netns technique.

Assuming I get a bit of funding we can also grab some servers in the
cloud, but I'm not expecting that, so... I do plan on getting a box to
replace snapon also in this timeframe.

>     Blockers:
>
>     * Ripping out all the backward compatibility cruft for submission to
>       mainline and following netdev formatting conventions for comments and
>       indentation. I'd like any new features in the backport to get
>       backported, though (sigh), as lede looks to be shipping a 4.9 based
>       kernel.
>
> Argh, but probably has to be done.

That turned out to not be hard. I'm about to test that result today.
Folding the result sanely back into the main repo did turn out to be
hard. I also have no idea how to fold together the cobalt and regular
cake branches at the moment, so I'm sticking with cobalt.

>     * tc-cake man page needs to be updated.
>
>     * tc-adv related code updated to latest iproute2

I will start a repo for this.

>     * There is some work going on here to add ack filtering to cake, which
>       looks VERY promising: https://github.com/dtaht/sch_cake/pull/63
>
>       I'm going to add something like this to netem also. It may be that
>       merely leveraging the hash would be enough in cake's case.
>
>     * Testing against the net-next kernel on x86, x86_64, arm, mips, and
>       aarch architectures.
>       (I just got bit by not testing 32 bit arches, sigh)
>
> Regarding the target and interval settings Cake uses, here are the current
> keywords available and their settings:
>
> datacentre: 19us / 114us (us yanks might like ‘datacenter’ as a synonym)
> lan: 50us / 1ms
> metro: 500us / 10ms
> regional: 1.5ms / 30ms
> internet: 5ms / 100ms
> oceanic: 15ms / 300ms
> satellite: 50ms / 1s
> interplanetary: 5ms / 3600s
>
> About a year ago I raised a concern that these values were outside what the
> CoDel authors intended. The counter-argument at the time was that
> experimentally, we can show that TCP RTT can be reduced on a Gbit LAN with the
> ‘lan’ keyword. And that argument seems to hold, so far. On two BQLd systems (2x
> PCEngines APU2s) connected with GigE, I can run the same experiment now and show
> that:
>
> TCP RTT ~= 8ms with default qdisc, throughput ~= 940 Mbit
> TCP RTT ~= 4.5ms with ‘cake unlimited’, throughput ~= 920 Mbit
> TCP RTT ~= 1ms with ‘cake unlimited lan’, throughput ~= 920 Mbit
>
> So yes, we can lower TCP RTT with these more aggressive settings. But just to
> make sure, we’re confident that there are no other side effects from these lower
> targets and intervals? Is there anything else I should test for to be sure? For
> example, when I rate limit to 950 Mbit and try the same test above, ‘lan’ causes
> a 20% drop in throughput vs the defaults. That may be from an overtaxed CPU, but
> I don’t know. I also wonder how this affects routed vs local traffic. I’ll try
> to test this at some point, as I want to understand it better anyway to know how
> backhaul links should be configured...
>
>     Non-Blockers:
>
>     * I don't believe in cobalt, or rather, I won't believe in it until we
>       have data at many RTTs.
>       That said, what I'd propose would be a
>       monolithic cobalt.h file rather than codel5.h.
>
> The netns stuff will make simulating RTTs and bandwidths much easier….
>
>     * I think the fq_codel batch drop facility is better than what cake uses
>       in case of floods. Partially due to the need to handle backports the
>       mechanism fq_codel uses is hard to use - but going mainline we could add
>       this.
>
>     * The autorate_ingress code should be marked experimental. I keep hoping
>       it can be improved by better looking for "smoothness" inbound, but
>       algorithms escape me. This doesn't bother me much, as tcp continues to
>       be improved over the past 50 years, perhaps we can find ways to improve
>       this with more users.
>
>     * It is possible to tune the quantum and peeling functions to not peel
>       to the extent they do. Particularly there is usually no need (aside from
>       wanting accurate statistics) to peel below 1500 bytes (except perhaps
>       with the new ack filter mode). We experimented a lot with this in the
>       early days but could never come to a resolution.
>
>     * I don't have any use for precedence mode and would like to remove it.
>
> Regarding non-blockers, for FreeNet’s purposes, I wanted to see if I could add
> the option to use packet marks as one of the identifiers for host isolation, but
> I’ve not had time to explore it yet. This would be helpful for ISPs that want to
> ensure fairness when there isn’t a one-to-one mapping between IP address and
> customer. I’ll see if I can at least try it.
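
As a cross-check on the keyword list Pete quotes earlier in this
message, here is a minimal sketch that computes each keyword's target as
a fraction of its interval. It assumes the two values on each line are
target then interval, in the order the sentence introducing the list
names them. The dictionary and function names are made up for
illustration and are not part of cake or tc.

```python
# (target, interval) in microseconds, copied from the keyword list quoted
# earlier in this message. Assumption: first value is target, second is
# interval.
KEYWORDS = {
    "datacentre": (19, 114),
    "lan": (50, 1_000),
    "metro": (500, 10_000),
    "regional": (1_500, 30_000),
    "internet": (5_000, 100_000),
    "oceanic": (15_000, 300_000),
    "satellite": (50_000, 1_000_000),
    "interplanetary": (5_000, 3_600_000_000),
}


def target_ratio(keyword):
    """Target divided by interval for one keyword."""
    target_us, interval_us = KEYWORDS[keyword]
    return target_us / interval_us


def follows_five_percent(keyword, tol=1e-9):
    """True if target is 5% of interval, as with 'internet' (5ms / 100ms)."""
    return abs(target_ratio(keyword) - 0.05) <= tol


if __name__ == "__main__":
    for name in KEYWORDS:
        marker = "" if follows_five_percent(name) else "  <- not 5%"
        print(f"{name:15s} {target_ratio(name):.6f}{marker}")
```

Under that assumption, lan through satellite all share the 5% ratio of
internet (5ms / 100ms), while datacentre and interplanetary do not, so
those two may deserve a separate look in any RTT sweep.
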