From: Pete Heist
Date: Wed, 15 Nov 2017 15:41:52 +0100
To: Dave Taht
Cc: cake@lists.bufferbloat.net
Subject: Re: [Cake] Donation
In-Reply-To: <87vaic8vv1.fsf@nemesis.taht.net>

> On Nov 14, 2017, at 9:10 PM, Dave Taht <dave@taht.net> wrote:
>
> Pete Heist <peteheist@gmail.com> writes:
>
>> By the way, what or how much is needed to get Cake mainlined?
>
> I'd like us to give it a go when net-next reopens in two weeks,
> we'd then have 6 weeks or so to get it right.
>
> We need:
>
> * Someone to do the heavy lifting. Which I suspect would be me.
> * Someones with various hardware platforms that current kernels can be
>   run on. qemu?
> * I'd like to see the ack filtering work get tested on lede at low
>   bandwidths on dsl especially.
> * A whole lotta tests at various RTTs

I can offer some testing time, and can script or batch a range of RTTs. netns would be useful here. For completeness, I suggest a product of rrul_be runs:

Rates: 128 / 256 / 512Kbit, 1 / 2 / 4 / 8 / 16 / 32 / 64 / 128 / 256 / 512Mbit, 1Gbit

RTTs: 150 / 300 / 600us, 1 / 2 / 4 / 8 / 16 / 32 / 64 / 128 / 256 / 512 / 1024ms

Opinions? Some of those might be rough (I’m looking at you, 128Kbit / 1024ms), but it would be good to know what happens. For hardware, I could turn my Mac Mini into a qemu box. I guess this list is about right:
https://www.debian.org/releases/stable/i386/ch02s01.html.en
I don’t know if all tests need to be tried on all platforms.

Testing could go much further, with host fairness, diffserv keywords, rtt settings (more on that later), overheads, nat, etc. We could also test underpowered hardware with rate limiting to see if it degrades gracefully. For sanity, we could just test a smattering of these things.
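
As a rough sketch of what the proposed rate x RTT product looks like, the snippet below enumerates the combinations and computes the bandwidth-delay product for the two extreme corners. It assumes tc's convention that Kbit/Mbit/Gbit are powers of 1000. The `netem_delay_cmd` helper and the interface name passed to it are hypothetical; it only builds a command string that splits the RTT into a one-way netem delay, and is not a tested harness.

```python
from itertools import product

# Rates from the list above, in bits per second (Kbit = 1000 bit, as tc uses).
RATES_BPS = (
    [k * 1_000 for k in (128, 256, 512)]
    + [m * 1_000_000 for m in (1, 2, 4, 8, 16, 32, 64, 128, 256, 512)]
    + [1_000_000_000]
)

# RTTs from the list above, in microseconds.
RTTS_US = [150, 300, 600] + [
    ms * 1_000 for ms in (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)
]


def bdp_bytes(rate_bps, rtt_us):
    """Bandwidth-delay product in bytes for one rate/RTT pair."""
    return rate_bps * rtt_us // 8_000_000


def netem_delay_cmd(dev, rtt_us):
    """Build a tc command string adding half the RTT as one-way delay.

    `dev` is a hypothetical interface name; the same delay would be
    applied in the other direction to reach the full RTT.
    """
    return f"tc qdisc replace dev {dev} root netem delay {rtt_us // 2}us"


# Every rate paired with every RTT: one rrul_be run per pair.
matrix = list(product(RATES_BPS, RTTS_US))
print(len(matrix), "rrul_be runs")

# The two extreme corners of the matrix.
for rate, rtt in [(128_000, 1_024_000), (1_000_000_000, 150)]:
    print(rate, "bit/s at", rtt, "us RTT:", bdp_bytes(rate, rtt), "bytes in flight")
```

With 14 rates and 14 RTTs this is 196 runs per platform and configuration, which is one argument for only testing a smattering of the other options.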

> Blockers:
>
> * Ripping out all the backward compatibility cruft for submission to
>   mainline and following netdev formatting conventions for comments and
>   indentation. I'd like any new features in the backport to get
>   backported, though (sigh), as lede looks to be shipping a 4.9 based
>   kernel.

Argh, but probably has to be done.

> * tc-cake man page needs to be updated.
>
> * tc-adv related code updated to latest iproute2
>
> * There is some work going on here to add ack filtering to cake, which
>   looks VERY promising: https://github.com/dtaht/sch_cake/pull/63
>
>   I'm going to add something like this to netem also. It may be that
>   merely leveraging the hash would be enough in cake's case.
>
> * Testing against the net-next kernel on x86, x86_64, arm, mips, and
>   aarch architectures. (I just got bit by not testing 32 bit arches, sigh)

Regarding the target and interval settings Cake uses, here are the current keywords available and their settings (target / interval):

datacentre: 19us / 114us (us yanks might like ‘datacenter’ as a synonym)
lan: 50us / 1ms
metro: 500us / 10ms
regional: 1.5ms / 30ms
internet: 5ms / 100ms
oceanic: 15ms / 300ms
satellite: 50ms / 1s
interplanetary: 5ms / 3600s
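
To compare how aggressive each keyword is, a small sketch can express target as a percentage of interval. The values are copied from the list above; the dictionary and function names are made up for illustration and are not Cake identifiers.

```python
# (target, interval) in microseconds, copied from the keyword list above.
PRESETS_US = {
    "datacentre": (19, 114),
    "lan": (50, 1_000),
    "metro": (500, 10_000),
    "regional": (1_500, 30_000),
    "internet": (5_000, 100_000),
    "oceanic": (15_000, 300_000),
    "satellite": (50_000, 1_000_000),
    "interplanetary": (5_000, 3_600_000_000),
}


def target_pct(keyword):
    """Target as a percentage of interval for one keyword."""
    target, interval = PRESETS_US[keyword]
    return 100.0 * target / interval


for name in PRESETS_US:
    print(f"{name:15s} target is {target_pct(name):.4g}% of interval")
```

By this arithmetic every keyword from lan through satellite keeps target at 5% of interval, so they differ only in absolute scale; datacentre (about 17%) and interplanetary (a tiny fraction of a percent) are the two that break the ratio.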

About a year ago I raised a concern that these values were outside what the CoDel authors intended. The counter-argument at the time was that, experimentally, we can show that TCP RTT can be reduced on a Gbit LAN with the ‘lan’ keyword. And that argument seems to hold, so far. On two BQLd systems (2x PCEngines APU2s) connected with GigE, I can run the same experiment now and show that:

TCP RTT ~= 8ms with default qdisc, throughput ~= 940 Mbit
TCP RTT ~= 4.5ms with ‘cake unlimited’, throughput ~= 920 Mbit
TCP RTT ~= 1ms with ‘cake unlimited lan’, throughput ~= 920 Mbit

So yes, we can lower TCP RTT with these more aggressive settings. But just to make sure, are we confident that there are no other side effects from these lower targets and intervals? Is there anything else I should test for to be sure? For example, when I rate limit to 950 Mbit and run the same test as above, ‘lan’ causes a 20% drop in throughput vs the defaults. That may be from an overtaxed CPU, but I don’t know. I also wonder how this affects routed vs local traffic. I’ll try to test this at some point, as I want to understand it better anyway to know how backhaul links should be configured...
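
For a sense of scale, the unlimited-rate results above can be put in relative terms. This is only arithmetic on the approximate figures quoted; the variable and function names are illustrative.

```python
# (label, TCP RTT in ms, throughput in Mbit/s): approximate figures from above.
RESULTS = [
    ("default qdisc", 8.0, 940),
    ("cake unlimited", 4.5, 920),
    ("cake unlimited lan", 1.0, 920),
]

_, base_rtt, base_tput = RESULTS[0]


def relative_change(value, baseline):
    """Percentage change of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline


for label, rtt, tput in RESULTS[1:]:
    print(
        f"{label}: RTT {relative_change(rtt, base_rtt):+.1f}%, "
        f"throughput {relative_change(tput, base_tput):+.1f}%"
    )
```

In the unlimited case, ‘lan’ cuts TCP RTT by 87.5% against the default qdisc for roughly a 2% throughput cost, which is what makes the 20% drop seen with the 950 Mbit limit stand out.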

> Non-Blockers:
>
> * I don't believe in cobalt, or rather, I won't believe in it until we
>   have data at many RTTs. That said, what I'd propose would be a
>   monolithic cobalt.h file rather than codel5.h.
>
> The netns stuff will make simulating RTTs and bandwidths much easier…
>
> * I think the fq_codel batch drop facility is better than what cake uses
>   in case of floods. Partially due to the need to handle backports the
>   mechanism fq_codel uses is hard to use - but going mainline we could add this.
>
> * The autorate_ingress code should be marked experimental. I keep hoping
>   it can be improved by better looking for "smoothness" inbound, but
>   algorithms escape me. This doesn't bother me much, as tcp continues to
>   be improved over the past 50 years, perhaps we can find ways to improve
>   this with more users.
>
> * It is possible to tune the quantum and peeling functions to not peel
>   to the extent they do. Particularly there is usually no need (aside from
>   wanting accurate statistics) to peel below 1500 bytes (except perhaps
>   with the new ack filter mode). We experimented a lot with this in the
>   early days but could never come to a resolution.
>
> * I don't have any use for precedence mode and would like to remove it.

Regarding non-blockers, for FreeNet’s purposes, I wanted to see if I could add the option to use packet marks as one of the identifiers for host isolation, but I’ve not had time to explore it yet. This would be helpful for ISPs that want to ensure fairness when there isn’t a one-to-one mapping between IP address and customer. I’ll see if I can at least try it.
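
A toy illustration of why the isolation key matters, assuming every isolation key gets an equal share of a saturated link. This is not Cake's code: the packet tuples, addresses, marks and function names are all hypothetical, and the packet mark is taken to identify the customer.

```python
from collections import defaultdict

# Hypothetical packets: (source IP, packet mark, size in bytes).
# The customer with mark 1 uses two addresses; mark 2 uses one.
PACKETS = [
    ("10.0.0.1", 1, 1500),
    ("10.0.0.2", 1, 1500),
    ("10.0.0.3", 2, 1500),
]


def shares_by_customer(packets, key):
    """Link share per customer (mark) if each isolation key gets an equal share."""
    # Map each distinct isolation key to the customer it belongs to.
    keys = {key(ip, mark): mark for ip, mark, _ in packets}
    shares = defaultdict(float)
    for mark in keys.values():
        shares[mark] += 1.0 / len(keys)
    return dict(shares)


by_ip = shares_by_customer(PACKETS, key=lambda ip, mark: ip)
by_mark = shares_by_customer(PACKETS, key=lambda ip, mark: mark)
print("keyed by IP:  ", by_ip)
print("keyed by mark:", by_mark)
```

Keyed by IP, the two-address customer ends up with two thirds of the link; keyed by mark, both customers get half.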