From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <chromatix99@gmail.com>
Received: from mail-wi0-f175.google.com (mail-wi0-f175.google.com
	[209.85.212.175]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
	(Client CN "smtp.gmail.com",
	Issuer "Google Internet Authority" (verified OK))
	by huchra.bufferbloat.net (Postfix) with ESMTPS id 9E18E2006AA;
	Wed,  9 May 2012 19:34:57 -0700 (PDT)
Received: by wibhn19 with SMTP id hn19so968990wib.10
	for <multiple recipients>; Wed, 09 May 2012 19:34:55 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=subject:mime-version:content-type:from:in-reply-to:date:cc
	:content-transfer-encoding:message-id:references:to:x-mailer;
	bh=32g0vYMqReJDih1qxnb5VwZUB4TkYVfLYU+KxdBLoCA=;
	b=eJApni8ghIFjByKnocHrM4uQv4LWvPHl+9BML2F/NfsQhs+f/BlJOMDvjLxh9Tsw1m
	wPjiNe23NJAt4tQ2s49KdkQpZmbu2ZKocvnmPJMYsfNqyn/oWphY2mTCWjo22YrJjxqX
	qb84WIPbsguvgl+cKDUbK3pbHOsMbUq9zG5mx14MpIX06Mti7ZsTloGuPiRduQoCPtAs
	JyRRPY1l+HCxy/8aJhR8NiXt6kozipOJr4CRRNwW5zkFerFo96KJrrBYthYNgO9KRjK5
	9OsRGyYYj0vXtA2AH7z6SkGU3tpUPtQ2r6gb+s6S0gT3xSQucVbaP/tLtRnxArDhFJfX
	3jOg==
Received: by 10.216.27.200 with SMTP id e50mr1387590wea.23.1336617295635;
	Wed, 09 May 2012 19:34:55 -0700 (PDT)
Received: from [192.168.1.4] (xdsl-83-150-84-172.nebulazone.fi.
	[83.150.84.172])
	by mx.google.com with ESMTPS id fn2sm19413wib.0.2012.05.09.19.34.53
	(version=TLSv1/SSLv3 cipher=OTHER);
	Wed, 09 May 2012 19:34:55 -0700 (PDT)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Jonathan Morton <chromatix99@gmail.com>
In-Reply-To: <CAA93jw7EUSf-NAg9q6oxRms5LQopO-VseYUtXAyCkUi=Yitx-g@mail.gmail.com>
Date: Thu, 10 May 2012 05:34:52 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <532BA404-30FF-4395-9575-709A68E4C723@gmail.com>
References: <CAA93jw7EUSf-NAg9q6oxRms5LQopO-VseYUtXAyCkUi=Yitx-g@mail.gmail.com>
To: Dave Taht <dave.taht@gmail.com>
X-Mailer: Apple Mail (2.1084)
Cc: codel@lists.bufferbloat.net, bloat <bloat@lists.bufferbloat.net>
Subject: Re: [Codel] [Bloat] The challenge
X-BeenThere: codel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: CoDel AQM discussions <codel.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/codel>,
	<mailto:codel-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/codel>
List-Post: <mailto:codel@lists.bufferbloat.net>
List-Help: <mailto:codel-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/codel>,
	<mailto:codel-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Thu, 10 May 2012 02:34:58 -0000


On 9 May, 2012, at 4:04 am, Dave Taht wrote:

> Both the acm queue article and jim's blog entry this morning were way
> above mensa's standards.

Does that mean we're allowed to feel extra smart if we understood it =
anyway?  :-)

> Nobody has attempted to explain the elegant simplicity of the
> algorithm itself in the inverse sqrt however!  I have a good grip on
> it, and am trying, but can barely explain it to myself. Anyone else
> care to dig through the codel code and try to put it into english?

Sounds like fun!

I have a suspicion myself of what the quantitative reasoning behind the =
inverse sqrt might be, but I also suspect that such details are much =
less important than some of the *qualitative* properties of the =
algorithm.  Most likely the people that most need to be convinced will =
respond better to that than to hard maths anyway.

I think however that everyone has missed the single most important =
property: the fact that drops occur on *dequeue* rather than enqueue, =
and the result that this keeps the "resonant frequency" of each flow =
near the physical RTT by minimising the delay between a packet drop and =
the TCP's detection of and response to it.

It is counter-intuitive to keep a queue filled with packets that you =
might drop later - as an engineer you tend to think it is more efficient =
to drop a packet *before* it starts to consume space in your precious =
queue.  This is a good deal of what needs to be solved in wireless =
drivers - there is a wrong assumption in there that dropping at the tail =
is all that is needed, and that leads to wrong design decisions.

Tail-drop and RED drop on enqueue (at tail), not dequeue (at head), so =
the missing packet cannot be detected by the TCP until it's "bubble" has =
passed through the queue.  This sets the resonant frequency lower than =
the true RTT and delays the TCP's response, so the queue remains =
overfilled for longer (triggering extra packet drops which require extra =
recovery) and recovery (through retransmission) also takes longer and is =
less reliable.

Another point that might be missed by people naively reading the paper - =
as opposed to regular readers of AQM literature - is that "drop" can =
actually mean "mark with ECN".  Looking at the code, ECN is indeed used =
when appropriate.  This is good because it signals the TCP without =
forcing a retransmission - and without "wasting" the space in the queue =
occupied by the packet.  This is a good reinforcement of the principle =
that tail-dropping is no longer the right choice.

Important property #3 is that the queue responds immediately when it's =
length decreases below the threshold, by stopping the marking/dropping =
process entirely.  This helps to maintain the high-frequency signal (of =
"too fast" versus "slow enough") to the TCPs.  The response to exceeding =
the threshold is slower, but only to the extent of avoiding =
over-reaction to a natural burst of traffic.

Now about the inverse sqrt: qualitatively, it is just a convenient =
method of gradually varying the mark/drop rate in terms of a time =
interval rather than a packet count interval.  The longer the queue =
remains overfull - controlled by the number of mark/drop events that =
have occurred - the higher the marking/dropping rate gets.  If the queue =
then empties (and stays empty-ish for a few RTTs), the rate is reset.  =
Meanwhile, if the queue regularly oscillates between "full" and "empty" =
states, the marking/dropping rate is remembered so that aggressive TCPs =
receive the correct magnitude signal.

Now as for the *quantitative* reason behind it, it is because as the =
interval between drops gets shorter, the number of increments per RTT =
goes up because there is one increment per drop.  If the interval is =
shortened linearly per increment, that means it will in practice shorten =
exponentially faster, such that the drop rate runs away faster than the =
TCP can reasonably be expected to respond.  This would result in =
excessive drop rates and oscillatory behaviour typical of an =
over-sensitive control system.  By basing the drop rate on the square =
root of the incremented variable, the runaway behaviour is curtailed =
since the drop interval now shortens linearly per RTT.

Maybe a diagram or animation would help to clarify that last paragraph.  =
I'm fairly sure I can draw one.

So overall we have an AQM that provides a low-latency signal of =
appropriate magnitude to TCPs when the link is genuinely congested, and =
gets out of the way when it isn't.  Combined with a fair queueing =
discipline (eg. SFQ or QFQ), I think this will turn out to be an =
excellent default setting for all sorts of equipment.  What would it =
take to get this into a DSL modem (at either end of the link)?

 - Jonathan Morton