From: Jonathan Morton
Date: Thu, 24 Mar 2011 14:40:27 +0200
To: Juliusz Chroboczek
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Thoughts on Stochastic Fair Blue

On 24 Mar, 2011, at 3:03 am, Juliusz Chroboczek wrote:

> (I'm the original author of sch_sfb.)
>
>> Having read some more documents and code, I have some extra insight into
>> SFB that might prove helpful. Note that I haven't actually tried it
>> yet, but it looks good anyway. In control-systems parlance, this is
>> effectively a multichannel I-type controller, where RED is
>> a single-channel P-type controller.
>
> Methinks that it would be worthwhile to implement plain BLUE, in order to
> see how it compares. (Of course, once Jim comes down from Mount Sinai
> and hands us RED-Lite, it might also be worth thinking about SFRed.)

I'd be interested to see if you can make a BLUE implementation which doesn't throw a wobbler with lossy child qdiscs. Because there's only one queue, you should be able to query the child's queue length instead of maintaining it internally.
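Something like this sketch, say - illustrative C only, with made-up names throughout (none of these types or functions come from the kernel, and real BLUE also rate-limits its probability updates with a freeze time, which I've omitted):

    /* Illustrative BLUE enqueue path that treats the child qdisc as
     * the sole authority on queue length.  Packets the child drops
     * internally are already reflected in child->qlen, so BLUE's
     * view of the queue can never drift out of sync with reality. */

    #define P_INCREMENT 256     /* illustrative fixed-point steps */
    #define P_DECREMENT 16

    struct packet;

    struct qdisc {
        unsigned int qlen;      /* current queue length (packets)  */
        unsigned int limit;     /* configured capacity             */
        unsigned int p_mark;    /* BLUE mark/drop probability      */
        struct qdisc *child;
        int (*enqueue)(struct qdisc *sch, struct packet *pkt);
    };

    enum { BLUE_QUEUED = 0, BLUE_DROPPED = 1 };

    static int blue_enqueue(struct qdisc *sch, struct packet *pkt)
    {
        struct qdisc *child = sch->child;

        if (child->qlen >= child->limit) {
            sch->p_mark += P_INCREMENT;  /* overflow: raise p      */
            return BLUE_DROPPED;
        }
        if (child->qlen == 0 && sch->p_mark >= P_DECREMENT)
            sch->p_mark -= P_DECREMENT;  /* queue empty: lower p   */

        return child->enqueue(child, pkt);
    }

The point is simply that there is no private length counter anywhere for a lossy child to invalidate.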
I'd *also* be interested in an SFB implementation which has the packet-reordering characteristics of SFQ built in, so that applying child qdiscs to it would be unnecessary. I'm just about to try putting this combination together manually on a live network.

Finally, it might be interesting and useful to add bare-bones ECN support to the existing "dumb" qdiscs, such as SFQ and the FIFO family: simply start marking (and dropping packets of non-supporting flows) when the queue is more than half full.

>> My first thought after reading just the paper was that unconditionally
>> dropping the packets which increase the marking probability was suspect.
>> It should be quite possible to manage a queue using just ECN, without
>> any packet loss, in simple cases such as a single bulk TCP flow. Thus
>> I am pleased to see that the SFB code in the debloat tree has separate
>> thresholds for increasing the marking rate and tail-dropping. They are
>> fairly close together, but they are at least distinct.
>
> I hesitated for a long time before doing that, and would dearly like to
> see some conclusive experimental data that this is a good idea. The
> trouble is -- when the drop rate is too low, we risk receiving a burst
> of packets from a traditional TCP sender. Having the drop threshold
> larger than the increase threshold will get such bursts into our
> buffer. I'm not going to explain on this particular list why such
> bursts are ungood ;-)

Actually, we *do* need to support momentary bursts of packets, although with ECN we should expect these excursions to be smaller and less frequent than without it. The primary cause of a large packet burst is presumably packet-loss recovery, although some broken TCPs can produce bursts with no provocation.

At the bare minimum, you need to support ECN-marking the first triggering packet rather than dropping it. The goal here is to have functioning congestion control without packet loss (a concept which should theoretically please the Cisco crowd). With BLUE as described in the paper, a packet would always be dropped before ECN marking started, and that is what I was concerned about. With even a small extra buffer on top, the TCP has some chance to back off before loss occurs.

With packet reordering as in SFQ, the effects of a burst are (mostly) isolated to the flow that produced it. I think it's better to accommodate bursts than to squash them, especially as dropping packets will lead to more bursts as the sending TCP tries to compensate and recover.

> The other, somewhat unrelated, issue you should be aware of is that
> ECN marking has some issues in highly congested networks [1]; this is
> the reason why sch_sfb will start dropping after the mark probability
> has gone above 1/2.

I haven't had time to read the paper thoroughly, but I don't argue with this - if the marking probability goes above 1/2, you probably have an unresponsive flow anyway. I can't imagine any sane TCP responding so aggressively to the equivalent of a 50% packet loss.
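Putting those pieces together, the decision logic we're converging on might look like this sketch - again purely illustrative, with invented names and constants rather than anything lifted from sch_sfb:

    /* Illustrative per-bucket mark/drop decision: a marking threshold
     * below the drop threshold, ECN marking instead of dropping for
     * the first triggering packets, and a fallback to dropping once
     * the probability exceeds 1/2.  Fixed-point, SCALE == 1.0. */

    #define SCALE           (1 << 16)
    #define P_STEP          (SCALE / 1024)
    #define INCREASE_THRESH 20   /* packets: start raising p here   */
    #define DROP_THRESH     24   /* packets: tail-drop above this   */

    enum verdict { PASS, MARK, DROP };

    static enum verdict sfb_decide(unsigned int qlen, unsigned int *p,
                                   int ecn_capable, unsigned int rnd)
    {
        if (qlen >= DROP_THRESH)
            return DROP;                 /* hard limit: always drop */

        if (qlen >= INCREASE_THRESH && *p < SCALE)
            *p += P_STEP;                /* queue building: raise p */
        else if (qlen == 0 && *p >= P_STEP)
            *p -= P_STEP;                /* queue drained: lower p  */

        if (rnd % SCALE < *p) {
            /* Past p = 1/2, ECN feedback evidently isn't slowing
             * the flow down, so drop instead; non-ECN flows are
             * always dropped. */
            if (!ecn_capable || *p > SCALE / 2)
                return DROP;
            return MARK;                 /* congestion signal, no loss */
        }
        return PASS;
    }

The gap between the two thresholds is exactly the "small extra buffer on top" I mentioned: room for the TCP to back off before loss starts.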
>> the length of the queue - which does not appear to be self-tuned by
>> the flow rate. However, the default values appear to be sensible.
>
> Please clarify.

The consensus seems to be that queue length should depend on bandwidth: if we assume that link latency is negligible, then the RTT is dominated by the general Internet, assumed constant at 100 ms. On the other hand, there is another school of thought which says that queue length must *also* depend on the number of flows, with a greater number of flows calling for a shorter queue (because the bandwidth, and thus the burst size, of an individual flow is smaller).

But tuning the queue length might not actually be necessary, provided the qdisc is sufficiently sophisticated in other ways. We shall see.

>> The major concern with this arrangement is the incompatibility with
>> qdiscs that can drop packets internally, since this is not necessarily
>> obvious to end-user admins.
>
> Agreed. More generally, Linux' qdisc setup is error-prone, and
> certainly beyond the abilities of the people we're targeting; we need to
> get a bunch of reasonable defaults into distributions. (Please start
> with OpenWRT, whose qos-scripts package [2] is used by a fair number of
> people.)

Something better than pfifo_fast is definitely warranted by default, except on the tiniest embedded devices which cannot cope with the memory requirements - but those are always a corner case.

>> I also thought of a different way to implement the hash rotation.
>> Instead of shadowing the entire set of buckets, simply replace the hash
>> on one row at a time. This requires that the next-to-minimum values for
>> q_len and p_mark are used, rather than the strict minima. It is still
>> necessary to calculate two hash values for each packet, but the memory
>> requirements are reduced, at the expense of effectively removing one row
>> from the Bloom filter.
>
> Interesting idea.

- Jonathan