From: Jonathan Morton
Date: Mon, 11 May 2015 14:34:16 +0300
To: Sebastian Moeller
Cc: cake@lists.bufferbloat.net
Subject: Re: [Cake] Control theory and congestion control

>>>> Congestion control looks like a simple problem too. If there is no congestion, increase the amount of data in flight; if there is, reduce it. We even have Explicit Congestion Notification now to tell us that crucial data point, but we could always infer it from dropped packets before.
>>>
>>> I think we critically depend on being able to interpret lost packets as well, as a) not all network nodes use ECN signaling, and b) even those that do can go into “drop-everything” mode if overloaded.
>>
>> Yes, but I consider that a degraded mode of operation. Even if it is, for the time being, the dominant mode.
>>
>>> 1) Competition with simple greedy non-ECN flows: if these push the router into the dropping regime, how will well-behaved ECN flows be able to compete?
>>
>> Backwards compatibility for current ECN means dropping non-ECN packets that would have been marked. That works, so we can use it as a model.
>
> Let me elaborate. What I mean is: if we get an ECN reduce-slowly signal on the ECN flow and the router goes into overload, what guarantees that our flow - with the double reduce-slowly ECN signal plus the reduce-hard drop - will not end up at a disadvantage against greedy non-ECN flows? It is probably quite simple, but I cannot see it right now.

There are two possible answers to this:

1) The most restrictive signal seen during an RTT is the one to react to. So a “fast down” signal overrides anything else.

2) If ELR signals are being received which indicate that the bottleneck queue is basically under control, then it might be reasonable to assume that packet drops in the same RTT are *not* congestion related, but due to random losses. This is not in itself novel behaviour: Westwood+ uses RTT variation to infer the same thing.

>> Backwards compatibility for “enhanced” ECN - let’s call it ELR, for Explicit Load Regulation - would mean providing legacy ECN signals to legacy ECN traffic. But, in the absence of flow isolation, if we only marked packets with ECN when they fell into the “fast down” category (which corresponds to their actual behaviour), then they’d get a clear advantage over ELR, similar to TCP Vegas vs. Reno back in the day (and for basically the same reason).
>
> In other words, ELR will be outcompeted by ECN classic?

Given such a naive implementation, yes. Bear in mind that I’m essentially thinking out loud here; the details are *not* all worked out.

>> The solution is to provide robust flow isolation, and/or to ECN-mark packets in “hold” and “slow down” states as well as “fast down”. This ensures that legacy ECN does not unfairly outcompete ELR, although it might reduce ECN traffic’s throughput.
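The “most restrictive signal wins” rule in answer (1) above can be sketched as follows. This is a toy illustration, not an ELR specification: the signal names are taken from the discussion (“hold”, “slow down”, “fast down”), while FAST_UP and the numeric ordering are my own assumptions.

```python
from enum import IntEnum

class Signal(IntEnum):
    # Hypothetical ELR signal levels, ordered least to most restrictive.
    FAST_UP = 0
    HOLD = 1
    SLOW_DOWN = 2
    FAST_DOWN = 3  # a packet drop would also map to this

def most_restrictive(signals_this_rtt):
    """React only to the strictest signal seen during one RTT;
    with no signals at all, assume 'hold'."""
    return max(signals_this_rtt, default=Signal.HOLD)

# Example: one drop-equivalent signal overrides any number of milder ones.
seen = [Signal.HOLD, Signal.SLOW_DOWN, Signal.FAST_DOWN, Signal.HOLD]
assert most_restrictive(seen) is Signal.FAST_DOWN
```

Because the levels are totally ordered, a single `max` per RTT implements the override rule directly.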
>
> Well, if we want ELR to be the next big thing, we should aim to make it more competitive than classic ECN (assuming we get enough “buy-in” from the regulating parties, like the IETF and friends).

It’s one possible approach. Unambiguous throughput improvements probably do sell well.

I’m also now thinking about how to approximate fairness between ELR flows *without* flow isolation. Since ELR would aim to provide a continuous signal rather than a stochastic one, this is actually a harder problem than it sounds; naively, a new flow would stay at minimum cwnd as long as a competing flow was saturating the link, since both would be given the same up/down signals. There might need to be some non-obvious properties in the way the signal is provided to overcome that; I have the beginnings of an idea, but need to work it out.

>> Edge routers are rather more capable of keeping sufficient per-flow state for effective flow isolation, as cake and fq_codel do.
>
> But we already have a hard time convincing the operators of the edge routers (telcos, cable cos…) to actually implement something saner than deep buffers in those devices. If they would at least own up to the head-end buffers for the downlink, we would be in much better shape; and if they would offer to handle uplink bufferbloat as part of their optional ISP-router-thingy, the issue would be stamped out already. But did you look inside a typical CPE recently? Still a kernel from the 2.x series, so no codel/fq_codel and whatever other fixes were found in the several years since 2.x was the hot new thing…

For CPE at least, there exists a market opportunity for somebody to fill. OpenWRT shows what can be done with existing hardware and some user engagement. In principle, it’s only a short step from there to a new commercial product that Does the Right Things.
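The fairness problem described above - a new flow stuck at minimum cwnd because every flow sees the same continuous signal - can be made concrete with a toy model. Assuming (my assumption, for illustration) that each flow responds to the shared signal by scaling its cwnd multiplicatively, the cwnd *ratio* between flows is preserved forever, so the late-arriving flow never converges toward its fair share:

```python
def run_rtts(cwnds, factors):
    """Apply the same multiplicative signal to every flow, one factor
    per RTT; cwnd is floored at one segment."""
    for f in factors:
        cwnds = [max(1.0, c * f) for c in cwnds]
    return cwnds

# Established flow at cwnd 100, new flow at cwnd 2, identical signals
# for 40 RTTs of mixed up/down adjustments.
a, b = run_rtts([100.0, 2.0], [1.1, 0.9, 1.05, 0.95] * 10)

# The 50:1 ratio is preserved: the new flow never catches up.
assert abs(a / b - 50.0) < 1e-6
```

This is exactly why AIMD converges to fairness (the *additive* increase shrinks the ratio each RTT) while a purely multiplicative shared signal does not - which is the non-obvious property a real ELR signal would have to reintroduce somehow.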
>>> Is the steady state, potentially outside of the home link, truly likely enough that a non-oscillating congestion controller will effectively work better? In other words, would the intermediate node ever signal “hold” sufficiently often that implementing this stage seems reasonable?
>>
>> It’s a fair question, and probably requires further research to answer reliably. However, you should also probably consider the typical nature of the *bottleneck* link, rather than every possible Internet link. It’s usually the last mile.
>
> I wish that was true… I switched to a 100/40 link and since then suffer from bad peering by my ISP (this seems to be on purpose, to incentivise content providers to agree to paid peering with my ISP; but it seems only very few content providers went along, and so I feel that even the routers connecting different networks could work much better/fairer under saturating load… but I have no real data nor any way to measure it, so this is conjecture).

>> Core routers don’t track flow state, but they are typically provisioned not to saturate their links in the first place.
>
> This I hear quite often; it always makes me wonder whether there is a better way to design a network to work well at capacity, instead of working around this by simply over-provisioning. I thought it was called network engineering, not network “brute-forcing”…

Peering points are one of the few “core-like” locations where adequate capacity cannot be relied on. Fortunately, what I hear is that peering links are often made using a set of 10GbE cables. At 10Gbps, it’s entirely feasible to run fq_codel (probably based on IP addresses, not individual flows) in software, never mind in hardware. So that’s a solvable problem at the technical level.

The fact that certain ISPs are *deliberately* restricting capacity is a thornier problem, and one that’s entirely political.
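The idea of running fq_codel at a peering point “based on IP addresses, not individual flows” amounts to changing the queue-selection hash key. A minimal sketch of that variant (my construction, not cake’s or fq_codel’s actual hash):

```python
import ipaddress
import zlib

def host_pair_bucket(src_ip, dst_ip, nbuckets=1024):
    """Pick a queue by hashing only the source/destination address
    pair, ignoring ports and protocol, so that all flows between the
    same two hosts share one queue (a hypothetical coarse-grained
    variant of fq_codel's usual 5-tuple hash)."""
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed)
    return zlib.crc32(key) % nbuckets

# Any two flows between the same hosts map to the same bucket, because
# ports are simply not part of the key.
b = host_pair_bucket("192.0.2.1", "198.51.100.7")
assert 0 <= b < 1024
```

The appeal at 10Gbps is that per-host-pair state grows far more slowly than per-flow state, while still isolating one heavy host pair from the rest.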
True core networks are, I hear, often made using optical switches rather than routers per se. It’s a very alien environment. I wouldn’t be surprised if there was difficulty even running something as simple as RED at the speeds they use. I’m perfectly happy with the idea of them aiming to keep the bottlenecks elsewhere - at the peering points if nowhere else.

>>> True, but how stable is a network path, actually, over timescales of seconds?
>>
>> Stable enough for VoIP and multiplayer twitch games to work already, if the link is idle.
>
> Both of which pretty much try to keep constant-bitrate UDP traffic flows going, I believe, so they only care that the immediate network path and/or alternatives a) have sufficient headroom for the data, and b) latency changes due to path re-routing stay inside the de-jitter/de-lag buffer systems in use. Or, put differently, these traffic types will not attempt to saturate a given link by themselves, so they are not the most sensitive probes of network path stability, no?

I fully appreciate that *some* network paths may be unstable, and any congestion control system will need to chase the sweet spot up and down under such conditions.

Most of the time, however, baseline RTT is stable over timescales of the order of minutes, and available bandwidth is dictated by the last-mile link as the bottleneck. The BDP, and therefore the ideal cwnd, is a simple function of baseline RTT and bandwidth. Hence there are common scenarios in which a steady-state condition can exist. That’s enough to justify the “hold” signal.

>>> Could an intermediate router actually figure out what signal to send all flows realistically?
>>
>> I described a possible method of doing so, using information already available in fq_codel and cake.
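The “ideal cwnd as a simple function of baseline RTT and bandwidth” claim above is just the bandwidth-delay product. A quick worked example (the MSS value is an illustrative assumption):

```python
import math

def ideal_cwnd_bytes(bandwidth_bps, baseline_rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return bandwidth_bps / 8 * baseline_rtt_s

def ideal_cwnd_segments(bandwidth_bps, baseline_rtt_s, mss=1448):
    """The same, expressed in MSS-sized segments (1448 assumes a
    typical Ethernet MTU minus TCP/IP headers and timestamps)."""
    return math.ceil(ideal_cwnd_bytes(bandwidth_bps, baseline_rtt_s) / mss)

# e.g. a 100 Mbit/s last-mile link with a 20 ms baseline RTT:
# 100e6 / 8 * 0.020 = 250,000 bytes, or 173 segments of 1448 bytes.
```

If both inputs are stable on a minutes timescale, the target cwnd is a constant over that window - which is exactly the steady state a “hold” signal would describe.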
>
> We are back at the issue of how to make sure the big routers learn codel/fq_codel as options in their AQM subsystems… It would be interesting to know what the Ciscos/Junipers/Huaweis of the world actually test in their private labs ;)