From: Sebastian Moeller
To: Jonathan Morton
Cc: cake@lists.bufferbloat.net
Subject: Re: [Cake] Control theory and congestion control
Date: Mon, 11 May 2015 09:36:30 +0200
In-Reply-To: <1F323E22-817A-4212-A354-C6A14D2F1DBB@gmail.com>
References: <152DD781-725D-4DD7-AB94-C7412D92F82C@gmx.de> <1F323E22-817A-4212-A354-C6A14D2F1DBB@gmail.com>
List-Id: Cake - FQ_codel the next generation

Hi Jonathan,

On May 10, 2015, at 20:32, Jonathan Morton wrote:
>
>> On 10 May, 2015, at 19:48, Sebastian Moeller wrote:
>>
>>> Congestion control looks like a simple problem too. If there is no congestion, increase the amount of data in flight; if there is, reduce it.
>>> We even have Explicit Congestion Notification now to tell us that crucial data point, but we could always infer it from dropped packets before.
>>
>> I think we critically depend on being able to interpret lost packets as well, as a) not all network nodes use ECN signaling, and b) even those that do can go into “drop-everything” mode if overloaded.
>
> Yes, but I consider that a degraded mode of operation. Even if it is, for the time being, the dominant mode.
>
>> 1) Competition with simple greedy non-ECN flows: if these push the router into the dropping regime, how will well-behaved ECN flows be able to compete?
>
> Backwards compatibility for current ECN means dropping non-ECN packets that would have been marked. That works, so we can use it as a model.

Let me elaborate: what I mean is, if we get an ECN “reduce slowly” signal on the ECN flow and the router then goes into overload, what guarantees that our flow, hit by both the reduce-slowly ECN signal and the reduce-hard drop, will not end up at a disadvantage relative to greedy non-ECN flows? It probably is quite simple, but I cannot see it right now.

>
> Backwards compatibility for “enhanced” ECN - let’s call it ELR for Explicit Load Regulation - would mean providing legacy ECN signals to legacy ECN traffic. But, in the absence of flow isolation, if we only marked packets with ECN when they fell into the “fast down” category (which corresponds to their actual behaviour), then they’d get a clear advantage over ELR, similar to TCP Vegas vs. Reno back in the day (and for basically the same reason).

In other words, ELR will be outcompeted by classic ECN?

>
> The solution is to provide robust flow isolation, and/or to ECN-mark packets in “hold” and “slow down” states as well as “fast down”. This ensures that legacy ECN does not unfairly outcompete ELR, although it might reduce ECN traffic’s throughput.
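
A minimal sketch of the marking rule from the quoted paragraph above, in Python. All names (`State`, `Kind`, `action`, `mark_legacy_early`) are hypothetical, ELR has no defined wire format so the returned strings are placeholders, and it is an assumption of this sketch that non-ECN packets are dropped in exactly the states where legacy ECN packets would be marked.

```python
from enum import Enum


class State(Enum):
    # HOLD, SLOW_DOWN and FAST_DOWN are the states named in the thread;
    # NO_SIGNAL is an assumed placeholder for "nothing to signal".
    NO_SIGNAL = 0
    HOLD = 1
    SLOW_DOWN = 2
    FAST_DOWN = 3


class Kind(Enum):
    NOT_ECT = 0  # sender does not support ECN
    ECN = 1      # legacy ECN sender
    ELR = 2      # hypothetical ELR-capable sender


def action(kind, state, overloaded=False, mark_legacy_early=True):
    """Return what the router does to a packet it has chosen to signal on."""
    if overloaded:
        # Safety valve: under overload, ignore ECN and ELR and just drop.
        return "drop"
    if state is State.NO_SIGNAL:
        return "forward"
    if kind is Kind.ELR:
        # ELR senders get the fine-grained state.
        return "elr:" + state.name.lower()
    # Legacy senders only understand one signal, equivalent to "fast down".
    legacy_states = {State.FAST_DOWN}
    if mark_legacy_early:
        # Also signal in "hold" and "slow down" so that legacy ECN
        # does not outcompete ELR when there is no flow isolation.
        legacy_states |= {State.HOLD, State.SLOW_DOWN}
    if state not in legacy_states:
        return "forward"
    # Backwards compatibility: mark ECN packets, drop non-ECN ones.
    return "mark_ce" if kind is Kind.ECN else "drop"
```

With `mark_legacy_early=False` the legacy ECN flow is only slowed in the “fast down” state, which is the case where it would gain the advantage over ELR described above.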

Well, if we want ELR to be the next big thing, we should aim to make it more competitive than classic ECN (assuming we get enough “buy-in” from the regulating parties, like the IETF and friends).

>
> The other side of the compatibility coin is what happens when ELR traffic hits a legacy router (whether ECN enabled or not). Such a router should be able to recognise ELR packets as ECN and perform ECN marking when appropriate, to be interpreted as a “fast down” signal. Or, of course, to simply drop packets if it doesn’t even support ECN.
>
>> And how can the intermediate router control/check that a flow truly is well-behaved, especially with all the allergies against keeping per-flow state that routers seem to have?
>
> Core routers don’t track flow state, but they are typically provisioned to not saturate their links in the first place.

This I have heard quite often; it always makes me wonder whether there is a better way to design a network to work well at capacity instead of working around the problem by simply over-provisioning. I thought it was called network engineering, not network “brute-forcing”...

> Adequate backwards-compatibility handling will do here.
>
> Edge routers are rather more capable of keeping sufficient per-flow state for effective flow isolation, as cake and fq_codel do.

But we already have a hard time convincing the operators of the edge routers (telcos, cable cos…) to actually implement something saner than deep buffers at those devices. If they would at least own up to the head-end buffers for the downlink we would be in much better shape, and if they would offer to handle up-link buffer bloat as part of their optional ISP-router-thingy the issue would be stamped out already.
But did you look inside a typical CPE recently? Still a kernel from the 2.X series, so no codel/fq_codel or whatever other fixes were found in the several years since 2.X was the hot new thing…

>
> Unresponsive flows are already just as much of a problem with ECN as they would be with ELR. Flow isolation contains the problem neatly. Transitioning to packet drops (ignoring both ECN and ELR) under overload conditions is also a good safety valve.
>
>> Is the steady state, potentially outside of the home, link truly likely enough that a non-oscillating congestion controller will effectively work better? In other words would the intermediate node ever signal hold sufficiently often that implementing this stage seems reasonable?
>
> It’s a fair question, and probably requires further research to answer reliably. However, you should also probably consider the typical nature of the *bottleneck* link, rather than every possible Internet link. It’s usually the last mile.

I wish that was true… I switched to a 100/40 link and since then suffer from bad peering of my ISP. (This seems to be on purpose, to incentivise content providers to agree to paid peering with my ISP, but it seems only very few of the content providers went along.) So I feel that even the routers connecting different networks could work much better/fairer under saturating load… but I have no real data nor ways to measure it, so this is conjecture.

>
>> True, but how stable is a network path actually over seconds time frames?
>
> Stable enough for VoIP and multiplayer twitch games to work already, if the link is idle.

Both of which pretty much try to keep constant-bitrate UDP traffic flows going, I believe, so they only care whether the immediate network path and/or its alternatives a) have sufficient headroom for the data and b) keep latency changes due to path re-routing inside the de-jitter/de-lag buffer systems that are in use. Put differently, these traffic types will not attempt to saturate a given link by themselves, so they are not the most sensitive probes for network path stability, no?

>
>> Could an intermediate router actually figure out what signal to send all flows realistically?
>
> I described a possible method of doing so, using information already available in fq_codel and cake.

We are back at the issue of how to make sure big routers learn codel/fq_codel as options in their AQM subsystems… It would be interesting to know what the Ciscos/Junipers/Huaweis of the world actually test in their private labs ;)

Best Regards
	Sebastian

> Whether they would work satisfactorily in practice is an open question.
>
> - Jonathan Morton
>