Date: Tue, 27 Nov 2012 16:27:10 -0800
From: "Paul E. McKenney"
To: Greg White
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond, "codel@lists.bufferbloat.net", "cerowrt-devel@lists.bufferbloat.net", bloat, John Crispin
Subject: Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
Message-ID: <20121128002710.GS2474@linux.vnet.ibm.com>
References: <20121127224915.GM2474@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
List-Id: CoDel AQM discussions

On Tue, Nov 27, 2012 at 04:53:34PM -0700, Greg White wrote:
> BTW, I've heard some use the term "stochastic flow queueing" as a
> replacement to avoid the term "fair". Seems like a more apt term anyway.

Would that mean that FQ-CoDel is Flow Queue Controlled Delay? ;-)

Thanx, Paul

> -Greg
>
>
> On 11/27/12 3:49 PM, "Paul E. McKenney" wrote:
>
> >Thank you for the review and comments, Jim! I will apply them when
> >I get the pen back from Dave. And yes, that is the thing about
> >"fairness" -- there are a great many definitions, many of the most
> >useful of which appear to many to be patently unfair. ;-)
> >
> >As you suggest, it might well be best to drop discussion of fairness,
> >or to at the least supply the corresponding definition.
> >
> > Thanx, Paul
> >
> >On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
> >> Some points worth making:
> >>
> >> 1) It is important to point out that (and how) fq_codel avoids starvation:
> >> unpleasant as elephant flows are, it would be very unfriendly to never
> >> service them at all until they time out.
> >>
> >> 2) "fairness" is not necessarily what we ultimately want at all; you'd
> >> really like to penalize those who induce congestion the most. But we don't
> >> currently have a solution (though Bob Briscoe at BT thinks he does, and is
> >> seeing if he can get it out from under a BT patent), so the current
> >> fq_codel round robins ultimately until/unless we can do something like
> >> Bob's idea. This is a local-information-only subset of the ideas he's been
> >> working on in the congestion exposure (conex) group at the IETF.
> >>
> >> 3) "fairness" is always in the eyes of the beholder (and should be left to
> >> the beholder to determine). "fairness" depends on where in the network you
> >> are. While being "fair" among TCP flows is a sensible default policy for a
> >> host, elsewhere in the network it may not be/usually isn't.
> >>
> >> Two examples:
> >> o at a home router, you probably want to be "fair" according to transmit
> >> opportunities. We really don't want a single system remote from the router
> >> to be able to starve the network so that devices near the router get much
> >> less bandwidth than you might hope/expect.
> >>
> >> What is more, you probably want to account for a single host using many
> >> flows, and regulate that they not be able to "hog" bandwidth in the home
> >> environment, but only use their "fair" share.
> >>
> >> o at an ISP, you must be "fair" between customers; it is best to leave
> >> the judgement of "fairness" at finer granularity (e.g. host and TCP flows)
> >> to the points closer to the customer's systems, so that they can enforce
> >> whatever definition of "fair" they need to themselves.
> >>
> >>
> >> Algorithms like fq_codel can be/should be adjusted to the circumstances.
> >>
> >> And therefore exactly what you choose to hash against to form the buckets
> >> will vary depending on where you are.
> >> That at least one step (at the
> >> user's device) of this be TCP flow "fair" does have the great advantage of
> >> helping the RTT unfairness problem that violates the principle of "least
> >> surprise", such as that routinely seen in places like New Zealand.
> >>
> >> This is why I have so many problems using the word "fair" near this
> >> algorithm. "fair" is impossible to define, overloaded in people's minds
> >> with TCP fair queuing, not even desirable much of the time, and by
> >> definition and design, even today's fq_codel isn't fair to lots of things,
> >> and the same basic algorithm can/should be tweaked in lots of directions
> >> depending on what we need to do. Calling this "smart" queuing or some such
> >> would be better.
> >>
> >> When you've done another round on the document, I'll do a more detailed
> >> read.
> >> - Jim
> >>
> >>
> >>
> >>
> >> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> >>
> >> > On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> >> > > David Woodhouse and I fiddled a lot with adsl and openwrt and a
> >> > > variety of drivers and network layers in a typical bonded adsl stack
> >> > > yesterday. The complexity of it all makes my head hurt. I'm happy that
> >> > > a newly BQL'd ethernet driver (for the geos and qemu) emerged from it,
> >> > > which he submitted to netdev...
> >> >
> >> > Cool!!! ;-)
> >> >
> >> > > I made a recording of us last night discussing the layers, which I
> >> > > will produce and distribute later...
> >> > >
> >> > > Anyway, along the way, we fiddled a lot with trying to analyze where
> >> > > the 350ms or so of added latency was coming from in the traverse geo's
> >> > > adsl implementation and overlying stack....
> >> > >
> >> > > Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> >> > >
> >> > > Note 1:
> >> > >
> >> > > The netperf sample rate on the rrul test needs to be higher than
> >> > > 100ms in order to get a decent result at sub 10Mbit speeds.
> >> > >
> >> > > Note 2:
> >> > >
> >> > > The two nicest graphs here are nofq.svg vs fq.svg, which were taken on
> >> > > a gigE link from a Mac running Linux to another gigE link. (in other
> >> > > words, NOT on the friggin adsl link) (firefox can display svg, I don't
> >> > > know what else) I find the T+10 delay before stream start in the
> >> > > fq.svg graph suspicious and think the "throw out the outlier" code in
> >> > > the netperf-wrapper code is at fault. Prior to that, codel is merely
> >> > > buffering up things madly, which can also be seen in the pfifo_fast
> >> > > behavior, with its default of 1000 packets.
> >> >
> >> > I am using these two in a new "Effectiveness of FQ-CoDel" section.
> >> > Chrome can display .svg, and if it becomes a problem, I am sure that
> >> > they can be converted. Please let me know if some other data would
> >> > make the point better.
> >> >
> >> > I am assuming that the colored throughput spikes are due to occasional
> >> > packet losses. Please let me know if this interpretation is overly naive.
> >> >
> >> > Also, I know what ICMP is, but the UDP variants are new to me. Could
> >> > you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> >> >
> >> > > (Arguably, the default queue length in codel can be reduced from 10k
> >> > > packets to something more reasonable at GigE speeds)
> >> > >
> >> > > (the indicator that it's the graph, not the reality, is that the
> >> > > fq.svg pings and udp start at T+5 and grow minimally, as is usual with
> >> > > fq_codel.)
> >> >
> >> > All sessions were started at T+5, then?
> >> >
> >> > > As for the *.ps graphs, well, they would take david's network topology
> >> > > to explain, and were conducted over a variety of circumstances,
> >> > > including wifi, with more variables in play than I care to think
> >> > > about.
> >> > >
> >> > > We didn't really get anywhere on digging deeper. As we got to purer
> >> > > tests - with a minimal number of boxes, running pure ethernet,
> >> > > switched over a couple of switches, even in the simplest two box case,
> >> > > my HTB based "ceroshaper" implementation had multiple problems in
> >> > > cutting median latencies below 100ms, on this very slow ADSL link.
> >> > > David suspects problems on the path along the carrier backbone as a
> >> > > potential issue, and the only way to measure that is with two one-way
> >> > > trip time measurements (rather than rtt), time synced via ntp... I
> >> > > keep hoping to find an rtp test, but I'm open to just about any option
> >> > > at this point. anyone?
> >> > >
> >> > > We also found a probable bug in mtr in that multiple mtrs on the same
> >> > > box don't co-exist.
> >> >
> >> > I must confess that I am not seeing all that clear a difference between
> >> > the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better latencies
> >> > for FQ-CoDel, but not unambiguously so.
> >> >
> >> > > Moving back to more scientific clarity and simpler tests...
> >> > >
> >> > > The two graphs, taken a few weeks back, on pages 5 and 6 of this:
> >> > >
> >> > >
> >> > > http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf
> >> > >
> >> > > appear to show the advantage of fq_codel (fq + codel + head drop) over
> >> > > tail drop during the slow start period on a 10Mbit link - (see how
> >> > > squiggly slow start is on pfifo_fast?) as well as the marvelous
> >> > > interstream latency that can be achieved with BQL=3000 (on a 10 mbit
> >> > > link.)
> >> > > Even that latency can be halved by reducing BQL to 1500, which
> >> > > is just fine on a 10mbit. Below those rates I'd like to be rid of BQL
> >> > > entirely, and just have a single packet outstanding... in everything
> >> > > from adsl to cable...
> >> > >
> >> > > That said, I'd welcome other explanations of the squiggly slowstart
> >> > > pfifo_fast behavior before I put that explanation on the slide.... ECN
> >> > > was in play here, too. I can redo this test easily, it's basically
> >> > > running a netperf TCP_RR for 70 seconds, and starting up a TCP_MAERTS
> >> > > and TCP_STREAM for 60 seconds at T+5, after hammering down on BQL's
> >> > > limit and the link speeds on two sides of a directly connected laptop
> >> > > connection.
> >> >
> >> > I must defer to others on this one. I do note the much lower latencies
> >> > on slide 6 compared to slide 5, though.
> >> >
> >> > Please see attached for update including .git directory.
> >> >
> >> > Thanx, Paul
> >> >
> >> > > ethtool -s eth0 advertise 0x002 # 10 Mbit
> >> > >
> >> >
> >> > _______________________________________________
> >> > Cerowrt-devel mailing list
> >> > Cerowrt-devel@lists.bufferbloat.net
> >> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
> >> >
> >>
> >
> >_______________________________________________
> >Codel mailing list
> >Codel@lists.bufferbloat.net
> >https://lists.bufferbloat.net/listinfo/codel
>
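
A rough back-of-the-envelope sketch of the BQL point quoted above (the latency at BQL=3000 on a 10 Mbit link being halved at BQL=1500). It assumes the BQL limit is an upper bound on the bytes sitting in the driver queue and ignores Ethernet framing overhead. The function name and constant are hypothetical, chosen only for illustration, and are not taken from the thread or from any tool mentioned in it.

```python
def serialization_delay_ms(queued_bytes: int, link_bits_per_sec: float) -> float:
    """Time needed to clock queued_bytes onto the wire at the given rate, in ms."""
    return queued_bytes * 8 / link_bits_per_sec * 1000.0


# Assumption: a link forced to 10 Mbit/s, as in the quoted ethtool command.
LINK_10MBIT = 10_000_000

for bql_limit in (3000, 1500):
    delay = serialization_delay_ms(bql_limit, LINK_10MBIT)
    print(f"BQL limit {bql_limit} bytes: up to {delay:.1f} ms queued in the driver")
```

Under these assumptions the worst-case driver-queue delay is about 2.4 ms at 3000 bytes and about 1.2 ms at 1500 bytes, which is consistent with the "halved" remark. Delay measured in a real test would also include everything queued above the driver.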