From: Jonathan Morton
Date: Mon, 11 May 2015 14:34:16 +0300
To: Sebastian Moeller
Cc: cake@lists.bufferbloat.net
Subject: Re: [Cake] Control theory and congestion control

>>>> Congestion control looks like a simple problem too. If there is no congestion, increase the amount of data in flight; if there is, reduce it. We even have Explicit Congestion Notification now to tell us that crucial data point, but we could always infer it from dropped packets before.
>>>
>>> I think we critically depend on being able to interpret lost packets as well, as a) not all network nodes use ECN signaling, and b) even those that do can go into “drop-everything” mode if overloaded.
>>
>> Yes, but I consider that a degraded mode of operation. Even if it is, for the time being, the dominant mode.
>>
>>> 1) Competition with simple greedy non-ECN flows: if these push the router into the dropping regime, how will well-behaved ECN flows be able to compete?
>>
>> Backwards compatibility for current ECN means dropping non-ECN packets that would have been marked. That works, so we can use it as a model.
>
> Let me elaborate. What I mean is: if we get an ECN reduce-slowly signal on the ECN flow and the router goes into overload, what guarantees that our flow - with the double reduce-slowly ECN signal plus the reduce-hard drop - will not end up at a disadvantage against greedy non-ECN flows? It is probably quite simple, but I cannot see it right now.

There are two possible answers to this:

1) The most restrictive signal seen during an RTT is the one to react to. So a “fast down” signal overrides anything else.

2) If ELR signals are being received which indicate that the bottleneck queue is basically under control, then it might be reasonable to assume that packet drops in the same RTT are *not* congestion related, but due to random losses. This is not in itself novel behaviour: Westwood+ uses RTT variation to infer the same thing.

>> Backwards compatibility for “enhanced” ECN - let’s call it ELR, for Explicit Load Regulation - would mean providing legacy ECN signals to legacy ECN traffic. But, in the absence of flow isolation, if we only marked packets with ECN when they fell into the “fast down” category (which corresponds to their actual behaviour), then they’d get a clear advantage over ELR, similar to TCP Vegas vs. Reno back in the day (and for basically the same reason).
>
> In other words, ELR will be outcompeted by ECN classic?

Given such a naive implementation, yes. Bear in mind that I’m essentially thinking out loud here; the details are *not* all worked out.

>> The solution is to provide robust flow isolation, and/or to ECN-mark packets in “hold” and “slow down” states as well as “fast down”. This ensures that legacy ECN does not unfairly outcompete ELR, although it might reduce ECN traffic’s throughput.
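The “most restrictive signal wins” rule in answer (1) above can be sketched as follows. This is a toy illustration, not an ELR specification: the signal names are taken from the discussion (“hold”, “slow down”, “fast down”), while FAST_UP and the numeric ordering are my own assumptions.

```python
from enum import IntEnum

class Signal(IntEnum):
    # Hypothetical ELR signal levels, ordered least to most restrictive.
    FAST_UP = 0
    HOLD = 1
    SLOW_DOWN = 2
    FAST_DOWN = 3  # a packet drop would also map to this

def most_restrictive(signals_this_rtt):
    """React only to the strictest signal seen during one RTT;
    with no signals at all, assume 'hold'."""
    return max(signals_this_rtt, default=Signal.HOLD)

# Example: one drop-equivalent signal overrides any number of milder ones.
seen = [Signal.HOLD, Signal.SLOW_DOWN, Signal.FAST_DOWN, Signal.HOLD]
assert most_restrictive(seen) is Signal.FAST_DOWN
```

Because the levels are totally ordered, a single `max` per RTT implements the override rule directly.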
>
> Well, if we want ELR to be the next big thing, we should aim to make it more competitive than classic ECN (assuming we get enough “buy-in” from the regulating parties, like the IETF and friends).

It’s one possible approach. Unambiguous throughput improvements probably do sell well.

I’m also now thinking about how to approximate fairness between ELR flows *without* flow isolation. Since ELR would aim to provide a continuous signal rather than a stochastic one, this is actually a harder problem than it sounds; naively, a new flow would stay at minimum cwnd as long as a competing flow was saturating the link, since both would be given the same up/down signals. There might need to be some non-obvious properties in the way the signal is provided to overcome that; I have the beginnings of an idea, but need to work it out.

>> Edge routers are rather more capable of keeping sufficient per-flow state for effective flow isolation, as cake and fq_codel do.
>
> But we already have a hard time convincing the operators of the edge routers (telcos, cable cos…) to actually implement something saner than deep buffers in those devices. If they would at least own up to the head-end buffers for the downlink, we would be in much better shape; and if they would offer to handle uplink bufferbloat as part of their optional ISP-router-thingy, the issue would be stamped out already. But did you look inside a typical CPE recently? Still a kernel from the 2.x series, so no codel/fq_codel and whatever other fixes were found in the several years since 2.x was the hot new thing…

For CPE at least, there exists a market opportunity for somebody to fill. OpenWRT shows what can be done with existing hardware and some user engagement. In principle, it’s only a short step from there to a new commercial product that Does the Right Things.
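The fairness problem described above - a new flow stuck at minimum cwnd because every flow sees the same continuous signal - can be made concrete with a toy model. Assuming (my assumption, for illustration) that each flow responds to the shared signal by scaling its cwnd multiplicatively, the cwnd *ratio* between flows is preserved forever, so the late-arriving flow never converges toward its fair share:

```python
def run_rtts(cwnds, factors):
    """Apply the same multiplicative signal to every flow, one factor
    per RTT; cwnd is floored at one segment."""
    for f in factors:
        cwnds = [max(1.0, c * f) for c in cwnds]
    return cwnds

# Established flow at cwnd 100, new flow at cwnd 2, identical signals
# for 40 RTTs of mixed up/down adjustments.
a, b = run_rtts([100.0, 2.0], [1.1, 0.9, 1.05, 0.95] * 10)

# The 50:1 ratio is preserved: the new flow never catches up.
assert abs(a / b - 50.0) < 1e-6
```

This is exactly why AIMD converges to fairness (the *additive* increase shrinks the ratio each RTT) while a purely multiplicative shared signal does not - which is the non-obvious property a real ELR signal would have to reintroduce somehow.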
>>> Is the steady state, potentially outside of the home link, truly likely enough that a non-oscillating congestion controller will effectively work better? In other words, would the intermediate node ever signal “hold” sufficiently often that implementing this stage seems reasonable?
>>
>> It’s a fair question, and probably requires further research to answer reliably. However, you should also probably consider the typical nature of the *bottleneck* link, rather than every possible Internet link. It’s usually the last mile.
>
> I wish that was true… I switched to a 100/40 link and since then suffer from bad peering by my ISP (this seems to be on purpose, to incentivise content providers to agree to paid peering with my ISP; but it seems only very few content providers went along, and so I feel that even the routers connecting different networks could work much better/fairer under saturating load… but I have no real data nor any way to measure it, so this is conjecture).

>> Core routers don’t track flow state, but they are typically provisioned not to saturate their links in the first place.
>
> This I hear quite often; it always makes me wonder whether there is a better way to design a network to work well at capacity, instead of working around this by simply over-provisioning. I thought it was called network engineering, not network “brute-forcing”…

Peering points are one of the few “core-like” locations where adequate capacity cannot be relied on. Fortunately, what I hear is that peering links are often made using a set of 10GbE cables. At 10Gbps, it’s entirely feasible to run fq_codel (probably based on IP addresses, not individual flows) in software, never mind in hardware. So that’s a solvable problem at the technical level.

The fact that certain ISPs are *deliberately* restricting capacity is a thornier problem, and one that’s entirely political.
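The idea of running fq_codel at a peering point “based on IP addresses, not individual flows” amounts to changing the queue-selection hash key. A minimal sketch of that variant (my construction, not cake’s or fq_codel’s actual hash):

```python
import ipaddress
import zlib

def host_pair_bucket(src_ip, dst_ip, nbuckets=1024):
    """Pick a queue by hashing only the source/destination address
    pair, ignoring ports and protocol, so that all flows between the
    same two hosts share one queue (a hypothetical coarse-grained
    variant of fq_codel's usual 5-tuple hash)."""
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed)
    return zlib.crc32(key) % nbuckets

# Any two flows between the same hosts map to the same bucket, because
# ports are simply not part of the key.
b = host_pair_bucket("192.0.2.1", "198.51.100.7")
assert 0 <= b < 1024
```

The appeal at 10Gbps is that per-host-pair state grows far more slowly than per-flow state, while still isolating one heavy host pair from the rest.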
True core networks are, I hear, often made using optical switches rather than routers per se. It’s a very alien environment. I wouldn’t be surprised if there was difficulty even running something as simple as RED at the speeds they use. I’m perfectly happy with the idea of them aiming to keep the bottlenecks elsewhere - at the peering points if nowhere else.

>>> True, but how stable is a network path, actually, over timescales of seconds?
>>
>> Stable enough for VoIP and multiplayer twitch games to work already, if the link is idle.
>
> Both of which pretty much try to keep constant-bitrate UDP traffic flows going, I believe, so they only care that the immediate network path and/or alternatives a) have sufficient headroom for the data, and b) latency changes due to path re-routing stay inside the de-jitter/de-lag buffer systems in use. Or, put differently, these traffic types will not attempt to saturate a given link by themselves, so they are not the most sensitive probes of network path stability, no?

I fully appreciate that *some* network paths may be unstable, and any congestion control system will need to chase the sweet spot up and down under such conditions.

Most of the time, however, baseline RTT is stable over timescales of the order of minutes, and available bandwidth is dictated by the last-mile link as the bottleneck. The BDP, and therefore the ideal cwnd, is a simple function of baseline RTT and bandwidth. Hence there are common scenarios in which a steady-state condition can exist. That’s enough to justify the “hold” signal.

>>> Could an intermediate router actually figure out what signal to send all flows realistically?
>>
>> I described a possible method of doing so, using information already available in fq_codel and cake.
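The “ideal cwnd as a simple function of baseline RTT and bandwidth” claim above is just the bandwidth-delay product. A quick worked example (the MSS value is an illustrative assumption):

```python
import math

def ideal_cwnd_bytes(bandwidth_bps, baseline_rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return bandwidth_bps / 8 * baseline_rtt_s

def ideal_cwnd_segments(bandwidth_bps, baseline_rtt_s, mss=1448):
    """The same, expressed in MSS-sized segments (1448 assumes a
    typical Ethernet MTU minus TCP/IP headers and timestamps)."""
    return math.ceil(ideal_cwnd_bytes(bandwidth_bps, baseline_rtt_s) / mss)

# e.g. a 100 Mbit/s last-mile link with a 20 ms baseline RTT:
# 100e6 / 8 * 0.020 = 250,000 bytes, or 173 segments of 1448 bytes.
```

If both inputs are stable on a minutes timescale, the target cwnd is a constant over that window - which is exactly the steady state a “hold” signal would describe.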
>
> We are back at the issue of how to make sure the big routers learn codel/fq_codel as options in their AQM subsystems… It would be interesting to know what the Ciscos/Junipers/Huaweis of the world actually test in their private labs ;)