From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jonathan Morton
Date: Sun, 10 May 2015 09:55:41 +0300
Message-Id: <88693200-1BB3-4E79-9CCF-788041A5713C@gmail.com>
To: Dave Taht
Cc: cake@lists.bufferbloat.net, "codel@lists.bufferbloat.net", bloat
Subject: Re: [Cake] Control theory and congestion control
List-Id: Cake - FQ_codel the next generation

> On 10 May, 2015, at 06:35, Dave Taht wrote:
>
> On Sat, May 9, 2015 at 12:02 PM, Jonathan Morton wrote:
>>> The "right" amount of buffering is *1* packet, all the time (the goal is
>>> nearly 0 latency with 100% utilization). We are quite far from achieving
>>> that on anything...
>>
>> And control theory shows, I think, that we never will unless the mechanisms
>> available to us for signalling congestion improve. ECN is good, but it's not
>> sufficient to achieve that ultimate goal. I'll try to explain why.
>
> The conex and dctcp work explored using ecn for multi-bit signalling.

A quick glance at those indicates that they’re focusing on the echo path - getting the data back from the receiver to the sender. That’s the *easy* part; all you need is a small TCP option, which can be slotted into the padding left by TCP Timestamps and/or SACK, so it doesn’t even take any extra space.

But they do nothing to address the problem of allowing routers to provide a “hold” signal.
Even a single ECN mark has to be taken to mean “back off”; being able to signal that more than one ECN mark happened in one RTT simply means that you now have a way to say “back off harder”.

The problem is that we need a three-bit signal (five new-style signalling states, two states indicating legacy ECN support, and one “ECN unsupported” state) at the IP layer to do it properly, and we’re basically out of bits there, at least in IPv4. The best solution I can think of right now is to use both of the ECT states somehow, but we’d have to make sure that doesn’t conflict too badly with existing uses of ECT(1), such as the “nonce sum”. Backwards and forwards compatibility here is essential. I’m thinking about the problem.

>> Bufferbloat is fundamentally about having insufficient information at the
>> endpoints about conditions in the network.
>
> Well said.
>
>> We've done a lot to improve that,
>> by moving from zero information to one bit per RTT. But to achieve that holy
>> grail, we need more information still.
>
> context being aqm + ecn, fq, fq+aqm, fq+aqm+ecn, dctcp, conex, etc.
>
>> Specifically, we need to know when we're at the correct BDP, not just when
>> it's too high. And it'd be nice if we also knew if we were close to it. But
>> there is currently no way to provide that information from the network to
>> the endpoints.
>
> This is where I was pointing out that FQ and the behavior of multiple
> flows in their two phases (slow start and congestion avoidance)
> provides a few pieces of useful information that could possibly be
> used to get closer to the ideal.

There certainly is enough information available in fq_codel and cake to derive a five-state congestion signal, rather than a two-state one, with very little extra effort.

Flow is sparse -> “Fast up”
Flow is saturating, but no standing queue -> “Slow up”
Flow is saturating, with small standing queue -> “Hold”
Flow is saturating, with large standing queue -> “Slow down”
Flow is saturating, with large, *persistent* standing queue -> “Fast down”

In simple terms, “fast” here means “multiplicative” and “slow” means “additive”, in the sense of AIMD being the current standard for TCP behaviour. AIMD itself is a result of the two-state “bang-bang” control model introduced back in the 1980s.

It’s worth remembering that the Great Internet Congestion Collapse Event was 30 years ago, and ECN was specified 15 years ago.

> A control theory-ish issue with codel is that it depends on an arbitrary ideal (5ms) as a definition for "good queue", where "a
> gooder queue” is, in my definition at the moment, "1 packet outstanding ever closer to 100% of the time while there is 100% utilization”.

As the above table shows, Codel reacts (by design) only to the most extreme situation that we would want to plug into an improved congestion-control model. It’s really quite remarkable, in that context, that it works as well as it does. I don’t think we can hope to do significantly better until a better signalling mechanism is available.

But it does highlight that the correct meaning of an ECN mark is “back off hard, now”. That’s how it’s currently interpreted by TCPs, in accordance with the ECN RFCs, and Codel relies on that behaviour too. We have to use some other, deliberately softer signal to give a “hold” or even a “slow down” indication.

 - Jonathan Morton
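
The five-state table in the message can be made concrete with a rough sketch. This is not code from fq_codel or cake, only an illustration of the mapping and of reading “fast” as multiplicative and “slow” as additive. The names `Signal`, `classify` and `next_cwnd`, the two queue thresholds, and the factor of 2 and the one-segment step are all assumptions made for the example; the message gives no numbers for “small” or “large”.

```python
from enum import Enum


class Signal(Enum):
    """The five states from the table (names are illustrative)."""
    FAST_UP = "fast up"
    SLOW_UP = "slow up"
    HOLD = "hold"
    SLOW_DOWN = "slow down"
    FAST_DOWN = "fast down"


# Hypothetical thresholds on queue sojourn time, in milliseconds.
# The 5 ms value only echoes Codel's default target as a placeholder.
SMALL_QUEUE_MS = 1.0
LARGE_QUEUE_MS = 5.0


def classify(sparse: bool, sojourn_ms: float, persistent: bool) -> Signal:
    """Map what a flow-queueing AQM observes about one flow to a signal."""
    if sparse:
        return Signal.FAST_UP
    if sojourn_ms < SMALL_QUEUE_MS:       # saturating, no standing queue
        return Signal.SLOW_UP
    if sojourn_ms < LARGE_QUEUE_MS:       # saturating, small standing queue
        return Signal.HOLD
    # Saturating with a large standing queue; persistence picks the severity.
    return Signal.FAST_DOWN if persistent else Signal.SLOW_DOWN


def next_cwnd(cwnd: float, signal: Signal) -> float:
    """Sender-side response: multiplicative for 'fast', additive for 'slow'.

    cwnd is in segments; the window never drops below one segment.
    """
    if signal is Signal.FAST_UP:
        return cwnd * 2.0                 # assumed doubling, as in slow start
    if signal is Signal.SLOW_UP:
        return cwnd + 1.0                 # assumed one segment per RTT
    if signal is Signal.HOLD:
        return cwnd
    if signal is Signal.SLOW_DOWN:
        return max(1.0, cwnd - 1.0)
    return max(1.0, cwnd / 2.0)           # assumed halving, as on an ECN mark
```

In this sketch only the last branch of `next_cwnd` corresponds to what a current ECN mark can express; the other four states are the ones that would need the additional signalling bits discussed in the message.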