From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <chromatix99@gmail.com>
Received: from mail-lb0-x235.google.com (mail-lb0-x235.google.com
	[IPv6:2a00:1450:4010:c04::235])
	(using TLSv1 with cipher RC4-SHA (128/128 bits))
	(Client CN "smtp.gmail.com",
	Issuer "Google Internet Authority G2" (verified OK))
	by huchra.bufferbloat.net (Postfix) with ESMTPS id 276E921FBBA
	for <codel@lists.bufferbloat.net>; Mon, 20 Jul 2015 12:35:08 -0700 (PDT)
Received: by lblf12 with SMTP id f12so101554356lbl.2
	for <codel@lists.bufferbloat.net>; Mon, 20 Jul 2015 12:35:06 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=content-type:mime-version:subject:from:in-reply-to:date:cc
	:content-transfer-encoding:message-id:references:to;
	bh=pyrBPJdSO+VGLN1l+OHx4NnpLYaCye91Tb0d8MCBBYY=;
	b=ZiYzRO9pCJLgHdrGaZIP4oCsfnF/FWxhT4URJCUJZB/+KE3iuOTkyvUid7qYi7tbFT
	1HJos2SmQ3L9U3sNPmswTbmXTarPwwYAigTtIAGen/mfWP483Rw5E0R6nclxtdxR9VA8
	yDPsCn1sPmkezVYj1ny5/ybMBNTrRL5s1zRgGkoKKwjt+enNq5tpuTTalK9PyV5cUL0M
	3+8XRgcDHGUoSkkFSqIrVARp5bm+fnEraLdjLPgWnrZyz8B47DLlQiyQatxDdW+py44e
	fDkGHRYWDhXCcamFhgSSDKB5lTYuWuliAWTOwyueutsVwchgH0nHGkEPEUa8MaOFiHPq
	RcIg==
X-Received: by 10.152.178.229 with SMTP id db5mr15830450lac.55.1437420906321; 
	Mon, 20 Jul 2015 12:35:06 -0700 (PDT)
Received: from bass.home.chromatix.fi (188-67-145-250.bb.dnainternet.fi.
	[188.67.145.250]) by smtp.gmail.com with ESMTPSA id
	pl1sm3062999lbc.30.2015.07.20.12.35.03
	(version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
	Mon, 20 Jul 2015 12:35:05 -0700 (PDT)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2102\))
From: Jonathan Morton <chromatix99@gmail.com>
In-Reply-To: <55AD2695.8050605@kit.edu>
Date: Mon, 20 Jul 2015 22:34:59 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <07D540E8-1184-4DBD-B372-40C55A485C40@gmail.com>
References: <55AD2695.8050605@kit.edu>
To: Roland Bless <roland.bless@kit.edu>,
 Anil.Agarwal@viasat.com
X-Mailer: Apple Mail (2.2102)
Cc: codel@lists.bufferbloat.net, aqm@ietf.org
Subject: Re: [Codel] [aqm] Codel's count variable and re-entering dropping
	state at small time intervals
X-BeenThere: codel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: CoDel AQM discussions <codel.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/codel>,
	<mailto:codel-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/codel>
List-Post: <mailto:codel@lists.bufferbloat.net>
List-Help: <mailto:codel-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/codel>,
	<mailto:codel-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Mon, 20 Jul 2015 19:35:37 -0000

I can=E2=80=99t speak for Kathy, but as Cake=E2=80=99s author I have =
some insight into the changes I=E2=80=99ve made to the associated =
version of Codel.

> Ronald Bless:

> 1. after exiting dropping state, count is usually reset, unless "the
> next drop state is entered too close to the previous one=E2=80=9D.

In fact, in most cases (ie. where the true RTT is near to or less than =
the one assumed by the interval parameter, and the flow is persistent) =
the drop state will be re-entered sufficiently soon to prevent the =
reset.  The backoff behaviour for that case is thus more significant in =
practice.

The standard backoff behaviour is to reduce count by just 2.  This is =
obviously wrong in the case where count has already reached a =
significant value; with a high drop frequency, count will climb a lot =
during a single RTT.  But if the elevated drop frequency is necessary =
and sufficient to control the flow, we don=E2=80=99t want it to grow =
indefinitely.

Cake=E2=80=99s version halves the count, reducing the drop frequency by =
a factor of sqrt(2) after a pause.  This multiplicative backoff prevents =
count from growing indefinitely, and allows it to reduce gradually to a =
lower value if that becomes appropriate due to reduced network load.

Anil's suggestion of continuously reducing count during the non-dropping =
state is an interesting alternative which might also make sense.  It has =
the pleasing property of relating the backoff rate to the length of time =
spent outside the dropping state.

> 2. is 'count' supposed to be reset or saturated on overflow and what
> should be its maximum value (it makes a difference whether you are =
using
> 16-, 32-, or 64-bit counter)?

It=E2=80=99s clear to me that a saturating increment is necessary to =
deal with unresponsive flows adequately.  Otherwise, the drop frequency =
will wrap around to a low value periodically when it needs to be =
consistently high.

My test case for this involves 50 upstream TCP flows on a narrow link =
(say 10Mbit) and a single downstream TCP flow on a wide link (say =
100Mbit).  In this configuration, the acks for the downstream flow =
constitute an unresponsive (in fact, anti-responsive) upstream flow of =
greater throughput demand than the individual upstream TCP flows.  =
Anti-responsive means that dropping packets actually increases the =
network load, because reducing the ack density permits higher throughput =
on the downlink, which in turn generates additional acks on the uplink.

Since Cake uses very robust flow isolation, simply delivering all the =
acks in order, using a fair-share of the bandwidth, would severely limit =
the throughput of the downstream flow.  However, the standard Codel =
behaviour resulted in wildly oscillating throughput on the downstream =
flow.  Straightforward calculations suggested that the count variable =
was probably wrapping.

At first, I only put in the modified backoff behaviour mentioned above.  =
Since the downstream flow was often reaching full throughput, I thought =
that occasionally the drop rate was managing to empty the ack queue, =
making the backoff behaviour significant. This assumption turned out to =
be correct, but the backoff modification was insufficient to completely =
correct the behaviour.  Only when I changed count to use a saturating =
increment instead of a wrapping one did the downstream flow achieve full =
throughput consistently.

It=E2=80=99s possible that if count was a wider variable (32 instead of =
16 bits) that the backoff modification would have been sufficient to =
prevent wrapping in this case.  However, using saturating arithmetic is =
inexpensive on modern CPUs (many have dedicated instructions which could =
be used by highly optimised implementations) and is the most robust =
solution.

Note that in the absence of robust flow isolation, eg. with plain Codel =
or PIE, the upstream TCP flows will back off to make room for all the =
acks.  This does not require a particularly high drop rate, but it =
reduces the upstream goodput significantly.  Cake instead achieves high =
goodput in both directions in this especially challenging case.

> Anil Agarwal:

> I have attached the updated document.

There are indeed a lot of suggestions here, which I haven=E2=80=99t yet =
fully assimilated.  I=E2=80=99ll cherry-pick a few that stood out:

> When RTT is small compared to the default Interval, CoDel is slow in =
reacting and controlling queue size/delay during TCP slow start.

> For example, if Interval =3D 100 ms and RTT =3D 20 ms, CoDel will not =
drop or mark a packet for the first 100 ms of a new TCP connection; if =
link rate =3D 10 Mbps, initial window size =3D 14600 bytes, then during =
this period, the TCP cwnd will grow to 104 kbytes; queuing delay will be =
63 ms.

Cake=E2=80=99s modified Codel does vary its sensitivity according to how =
quickly the queue is filling.  With the default parameters, it may =
trigger as quickly as 35ms after a large burst arrives, or as slowly as =
200ms if the queue is growing very slowly.

It could be improved further in theory, by giving Codel visibility of =
the length and drain rate of the queue (and thus the ability to predict =
future delay) rather than only observing the exit delay.  However, in =
the general case the drain rate is variable, so this may be difficult to =
make robust.

Ultimately, triggering instantly when the queue begins to fill is =
optimal to deal with slow-start, which will double its cwnd further =
while the congestion signal is in transit, before responding with a =
halving.  However, an instant trigger is suboptimal for the =
congestion-avoidance phase of TCP, which is why Codel always delays its =
response.  Flow isolation is a good (if imperfect) countermeasure for =
the suboptimal slow-start behaviour this causes.

My =E2=80=9CExplicit Load Regulation=E2=80=9D proposal, which hasn=E2=80=99=
t yet been widely circulated, aims to get closer to optimal behaviour in =
both situations.  It is an extension to ECN behaviour, giving the option =
of more moderate signals than ECN permits, while being easier to deploy =
incrementally than DCTCP is.  Hopefully I=E2=80=99ll be able to write it =
up properly later.

> The draft rfc states - =E2=80=9CFor example, there is about an 86% =
probability that no more than two of the 100 VoIP sessions will be =
involved in any given collision=E2=80=9D
> Based on analytical equations for hash collision probabilities (I =
derived them independently some time ago), the probability of no =
collision =3D 90.78%=E2=80=A6

Cake=E2=80=99s set-associative hash function reduces the probability of =
flow collisions further, to negligible values for the above number of =
flows.

> Alternatively, codel_dequeue() can be re-structured as two routines=E2=80=
=A6

Certainly a refactoring would be beneficial for readability.  I =
haven=E2=80=99t got around to doing it yet.

 - Jonathan Morton