From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f175.google.com (mail-wi0-f175.google.com [209.85.212.175]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority" (verified OK)) by huchra.bufferbloat.net (Postfix) with ESMTPS id 9E18E2006AA; Wed, 9 May 2012 19:34:57 -0700 (PDT) Received: by wibhn19 with SMTP id hn19so968990wib.10 for ; Wed, 09 May 2012 19:34:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=32g0vYMqReJDih1qxnb5VwZUB4TkYVfLYU+KxdBLoCA=; b=eJApni8ghIFjByKnocHrM4uQv4LWvPHl+9BML2F/NfsQhs+f/BlJOMDvjLxh9Tsw1m wPjiNe23NJAt4tQ2s49KdkQpZmbu2ZKocvnmPJMYsfNqyn/oWphY2mTCWjo22YrJjxqX qb84WIPbsguvgl+cKDUbK3pbHOsMbUq9zG5mx14MpIX06Mti7ZsTloGuPiRduQoCPtAs JyRRPY1l+HCxy/8aJhR8NiXt6kozipOJr4CRRNwW5zkFerFo96KJrrBYthYNgO9KRjK5 9OsRGyYYj0vXtA2AH7z6SkGU3tpUPtQ2r6gb+s6S0gT3xSQucVbaP/tLtRnxArDhFJfX 3jOg== Received: by 10.216.27.200 with SMTP id e50mr1387590wea.23.1336617295635; Wed, 09 May 2012 19:34:55 -0700 (PDT) Received: from [192.168.1.4] (xdsl-83-150-84-172.nebulazone.fi. [83.150.84.172]) by mx.google.com with ESMTPS id fn2sm19413wib.0.2012.05.09.19.34.53 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 09 May 2012 19:34:55 -0700 (PDT) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Jonathan Morton In-Reply-To: Date: Thu, 10 May 2012 05:34:52 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <532BA404-30FF-4395-9575-709A68E4C723@gmail.com> References: To: Dave Taht X-Mailer: Apple Mail (2.1084) Cc: codel@lists.bufferbloat.net, bloat Subject: Re: [Codel] [Bloat] The challenge X-BeenThere: codel@lists.bufferbloat.net X-Mailman-Version: 2.1.13 Precedence: list List-Id: CoDel AQM discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 May 2012 02:34:58 -0000 On 9 May, 2012, at 4:04 am, Dave Taht wrote: > Both the acm queue article and jim's blog entry this morning were way > above mensa's standards. Does that mean we're allowed to feel extra smart if we understood it = anyway? :-) > Nobody has attempted to explain the elegant simplicity of the > algorithm itself in the inverse sqrt however! I have a good grip on > it, and am trying, but can barely explain it to myself. Anyone else > care to dig through the codel code and try to put it into english? Sounds like fun! I have a suspicion myself of what the quantitative reasoning behind the = inverse sqrt might be, but I also suspect that such details are much = less important than some of the *qualitative* properties of the = algorithm. Most likely the people that most need to be convinced will = respond better to that than to hard maths anyway. I think however that everyone has missed the single most important = property: the fact that drops occur on *dequeue* rather than enqueue, = and the result that this keeps the "resonant frequency" of each flow = near the physical RTT by minimising the delay between a packet drop and = the TCP's detection of and response to it. It is counter-intuitive to keep a queue filled with packets that you = might drop later - as an engineer you tend to think it is more efficient = to drop a packet *before* it starts to consume space in your precious = queue. This is a good deal of what needs to be solved in wireless = drivers - there is a wrong assumption in there that dropping at the tail = is all that is needed, and that leads to wrong design decisions. Tail-drop and RED drop on enqueue (at tail), not dequeue (at head), so = the missing packet cannot be detected by the TCP until it's "bubble" has = passed through the queue. This sets the resonant frequency lower than = the true RTT and delays the TCP's response, so the queue remains = overfilled for longer (triggering extra packet drops which require extra = recovery) and recovery (through retransmission) also takes longer and is = less reliable. Another point that might be missed by people naively reading the paper - = as opposed to regular readers of AQM literature - is that "drop" can = actually mean "mark with ECN". Looking at the code, ECN is indeed used = when appropriate. This is good because it signals the TCP without = forcing a retransmission - and without "wasting" the space in the queue = occupied by the packet. This is a good reinforcement of the principle = that tail-dropping is no longer the right choice. Important property #3 is that the queue responds immediately when it's = length decreases below the threshold, by stopping the marking/dropping = process entirely. This helps to maintain the high-frequency signal (of = "too fast" versus "slow enough") to the TCPs. The response to exceeding = the threshold is slower, but only to the extent of avoiding = over-reaction to a natural burst of traffic. Now about the inverse sqrt: qualitatively, it is just a convenient = method of gradually varying the mark/drop rate in terms of a time = interval rather than a packet count interval. The longer the queue = remains overfull - controlled by the number of mark/drop events that = have occurred - the higher the marking/dropping rate gets. If the queue = then empties (and stays empty-ish for a few RTTs), the rate is reset. = Meanwhile, if the queue regularly oscillates between "full" and "empty" = states, the marking/dropping rate is remembered so that aggressive TCPs = receive the correct magnitude signal. Now as for the *quantitative* reason behind it, it is because as the = interval between drops gets shorter, the number of increments per RTT = goes up because there is one increment per drop. If the interval is = shortened linearly per increment, that means it will in practice shorten = exponentially faster, such that the drop rate runs away faster than the = TCP can reasonably be expected to respond. This would result in = excessive drop rates and oscillatory behaviour typical of an = over-sensitive control system. By basing the drop rate on the square = root of the incremented variable, the runaway behaviour is curtailed = since the drop interval now shortens linearly per RTT. Maybe a diagram or animation would help to clarify that last paragraph. = I'm fairly sure I can draw one. So overall we have an AQM that provides a low-latency signal of = appropriate magnitude to TCPs when the link is genuinely congested, and = gets out of the way when it isn't. Combined with a fair queueing = discipline (eg. SFQ or QFQ), I think this will turn out to be an = excellent default setting for all sorts of equipment. What would it = take to get this into a DSL modem (at either end of the link)? - Jonathan Morton