From: Kathleen Nichols
Date: Tue, 27 Nov 2012 19:43:56 -0800
To: paulmck@linux.vnet.ibm.com
Cc: Paolo Valente, Toke Høiland-Jørgensen, codel@lists.bufferbloat.net, cerowrt-devel@lists.bufferbloat.net, bloat, John Crispin
Subject: Re: [Bloat] [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
Message-ID: <50B5887C.7010605@pollere.com>
In-Reply-To: <20121128002710.GS2474@linux.vnet.ibm.com>

It would be me that tries to say "stochastic flow queuing with CoDel", as I like to be accurate. But I think FQ-CoDel is flow queuing with CoDel.
JimG suggests "smart flow queuing" because he is ever mindful of the big audience.

On 11/27/12 4:27 PM, Paul E. McKenney wrote:
> On Tue, Nov 27, 2012 at 04:53:34PM -0700, Greg White wrote:
>> BTW, I've heard some use the term "stochastic flow queueing" as a
>> replacement to avoid the term "fair". Seems like a more apt term anyway.
>
> Would that mean that FQ-CoDel is Flow Queue Controlled Delay? ;-)
>
> 							Thanx, Paul
>
>> -Greg
>>
>> On 11/27/12 3:49 PM, "Paul E. McKenney" wrote:
>>
>>> Thank you for the review and comments, Jim! I will apply them when
>>> I get the pen back from Dave. And yes, that is the thing about
>>> "fairness" -- there are a great many definitions, many of the most
>>> useful of which appear to many to be patently unfair. ;-)
>>>
>>> As you suggest, it might well be best to drop discussion of fairness,
>>> or at the least supply the corresponding definition.
>>>
>>> 							Thanx, Paul
>>>
>>> On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
>>>> Some points worth making:
>>>>
>>>> 1) It is important to point out that (and how) fq_codel avoids
>>>> starvation: unpleasant as elephant flows are, it would be very
>>>> unfriendly to never service them at all until they time out.
>>>>
>>>> 2) "fairness" is not necessarily what we ultimately want at all; you'd
>>>> really like to penalize those who induce congestion the most. But we
>>>> don't currently have a solution (though Bob Briscoe at BT thinks he
>>>> does, and is seeing if he can get it out from under a BT patent), so
>>>> the current fq_codel round-robins until/unless we can do something
>>>> like Bob's idea. This is a local-information-only subset of the ideas
>>>> he has been working on in the congestion exposure (conex) group at
>>>> the IETF.
>>>>
>>>> 3) "fairness" is always in the eyes of the beholder (and should be
>>>> left to the beholder to determine). "fairness" depends on where in
>>>> the network you are.
>>>> While being "fair" among TCP flows is a sensible default policy for a
>>>> host, elsewhere in the network it may not be/usually isn't.
>>>>
>>>> Two examples:
>>>>
>>>> o At a home router, you probably want to be "fair" according to
>>>> transmit opportunities. We really don't want a single system remote
>>>> from the router to be able to starve the network so that devices near
>>>> the router get much less bandwidth than you might hope/expect.
>>>>
>>>> What is more, you probably want to account for a single host using
>>>> many flows, and regulate that they not be able to "hog" bandwidth in
>>>> the home environment, but only use their "fair" share.
>>>>
>>>> o At an ISP, you must be "fair" between customers; it is best to
>>>> leave the judgement of "fairness" at finer granularity (e.g. host and
>>>> TCP flows) to the points closer to the customer's systems, so that
>>>> they can enforce whatever definition of "fair" they need themselves.
>>>>
>>>> Algorithms like fq_codel can be/should be adjusted to the
>>>> circumstances, and therefore exactly what you choose to hash against
>>>> to form the buckets will vary depending on where you are. That at
>>>> least one step (at the user's device) be TCP-flow "fair" does have
>>>> the great advantage of helping the RTT unfairness problem that
>>>> violates the principle of "least surprise", such as that routinely
>>>> seen in places like New Zealand.
>>>>
>>>> This is why I have so many problems using the word "fair" near this
>>>> algorithm. "fair" is impossible to define, overloaded in people's
>>>> minds with TCP fair queuing, not even desirable much of the time,
>>>> and by definition and design even today's fq_codel isn't fair to
>>>> lots of things, and the same basic algorithm can/should be tweaked
>>>> in lots of directions depending on what we need to do. Calling this
>>>> "smart" queuing or some such would be better.
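[Editor's note: the "what you hash against to form the buckets" idea above can be sketched as follows. This is a hypothetical illustration, not the actual Linux fq_codel code (which uses a Jenkins hash over the connection tuple with a random perturbation); the function and field names are invented for the example. The point is that the choice of key fields, not the hash itself, encodes the local notion of "fairness".]

```python
import hashlib

def bucket_for(packet, key_fields, num_buckets=1024, perturbation=b"boot-seed"):
    """Hash selected header fields to pick a queue bucket, SFQ-style.

    key_fields decides the granularity of "fairness" at this point in
    the network: which header fields identify a "flow" here.
    """
    key = b"|".join(str(packet[f]).encode() for f in key_fields) + perturbation
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big") % num_buckets

pkt = {"src_ip": "10.0.0.2", "dst_ip": "93.184.216.34",
       "proto": 6, "src_port": 40000, "dst_port": 443}

# At a host or home router: per-TCP-flow buckets (5-tuple).
flow_bucket = bucket_for(
    pkt, ("src_ip", "dst_ip", "proto", "src_port", "dst_port"))

# At an ISP aggregation point: per-customer buckets (source address only),
# leaving finer-grained "fairness" to boxes closer to the customer.
customer_bucket = bucket_for(pkt, ("src_ip",))
```

Two packets from the same customer but different TCP connections land in the same bucket under the second key and (almost always) different buckets under the first.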
>>>>
>>>> When you've done another round on the document, I'll do a more
>>>> detailed read.
>>>>     - Jim
>>>>
>>>> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
>>>> paulmck@linux.vnet.ibm.com> wrote:
>>>>
>>>>> On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
>>>>>> David Woodhouse and I fiddled a lot with adsl and openwrt and a
>>>>>> variety of drivers and network layers in a typical bonded adsl stack
>>>>>> yesterday. The complexity of it all makes my head hurt. I'm happy
>>>>>> that a newly BQL'd ethernet driver (for the geos and qemu) emerged
>>>>>> from it, which he submitted to netdev...
>>>>>
>>>>> Cool!!! ;-)
>>>>>
>>>>>> I made a recording of us last night discussing the layers, which I
>>>>>> will produce and distribute later...
>>>>>>
>>>>>> Anyway, along the way, we fiddled a lot with trying to analyze where
>>>>>> the 350ms or so of added latency was coming from in the traverse
>>>>>> geo's adsl implementation and overlying stack....
>>>>>>
>>>>>> Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
>>>>>>
>>>>>> Note 1:
>>>>>>
>>>>>> The netperf sample interval on the rrul test needs to be larger than
>>>>>> 100ms in order to get a decent result at sub-10Mbit speeds.
>>>>>>
>>>>>> Note 2:
>>>>>>
>>>>>> The two nicest graphs here are nofq.svg vs fq.svg, which were taken
>>>>>> on a gigE link from a Mac running Linux to another gigE link. (In
>>>>>> other words, NOT on the friggin adsl link.) (Firefox can display
>>>>>> svg; I don't know what else can.) I find the T+10 delay before
>>>>>> stream start in the fq.svg graph suspicious and think the "throw
>>>>>> out the outlier" code in netperf-wrapper is at fault. Prior to
>>>>>> that, codel is merely buffering up things madly, which can also be
>>>>>> seen in the pfifo_fast behavior, with its default of 1000 packets.
>>>>>
>>>>> I am using these two in a new "Effectiveness of FQ-CoDel" section.
>>>>> Chrome can display .svg, and if it becomes a problem, I am sure that
>>>>> they can be converted. Please let me know if some other data would
>>>>> make the point better.
>>>>>
>>>>> I am assuming that the colored throughput spikes are due to
>>>>> occasional packet losses. Please let me know if this interpretation
>>>>> is overly naive.
>>>>>
>>>>> Also, I know what ICMP is, but the UDP variants are new to me. Could
>>>>> you please expand the "EF", "BK", "BE", and "CS5" acronyms?
>>>>>
>>>>>> (Arguably, the default queue length in codel can be reduced from
>>>>>> 10k packets to something more reasonable at GigE speeds.)
>>>>>>
>>>>>> (The indicator that it's the graph, not the reality, is that the
>>>>>> fq.svg pings and udp start at T+5 and grow minimally, as is usual
>>>>>> with fq_codel.)
>>>>>
>>>>> All sessions were started at T+5, then?
>>>>>
>>>>>> As for the *.ps graphs, well, they would take david's network
>>>>>> topology to explain, and were conducted over a variety of
>>>>>> circumstances, including wifi, with more variables in play than I
>>>>>> care to think about.
>>>>>>
>>>>>> We didn't really get anywhere on digging deeper. As we got to purer
>>>>>> tests - with a minimal number of boxes, running pure ethernet,
>>>>>> switched over a couple of switches, even in the simplest two-box
>>>>>> case - my HTB-based "ceroshaper" implementation had multiple
>>>>>> problems in cutting median latencies below 100ms on this very slow
>>>>>> ADSL link. David suspects problems on the path along the carrier
>>>>>> backbone as a potential issue, and the only way to measure that is
>>>>>> with two one-way trip time measurements (rather than rtt),
>>>>>> time-synced via ntp... I keep hoping to find an rtp test, but I'm
>>>>>> open to just about any option at this point. Anyone?
>>>>>>
>>>>>> We also found a probable bug in mtr in that multiple mtrs on the
>>>>>> same box don't co-exist.
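[Editor's note: a minimal sketch of the kind of one-way trip time measurement Dave mentions above. Everything here is hypothetical illustration (names, port number); it assumes the two hosts' clocks are already synchronized via NTP - without that, the clock offset is silently folded into the result, which is exactly why rtt is usually measured instead.]

```python
import socket
import struct
import time

PORT = 9000  # arbitrary port chosen for this sketch


def send_probe(dst_ip, port=PORT):
    """Send one UDP probe stamped with the local send time."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.sendto(struct.pack("!d", time.time()), (dst_ip, port))
    s.close()


def receive_probe(sock):
    """Return the one-way delay of one probe, in seconds.

    Only meaningful if sender and receiver clocks are synchronized
    (e.g. via NTP); the receiver subtracts the sender's timestamp
    from its own clock.
    """
    data, _ = sock.recvfrom(64)
    (sent,) = struct.unpack("!d", data)
    return time.time() - sent
```

Run the receiver on each end and send probes in both directions to see whether the upstream and downstream halves of the path contribute asymmetrically to the total rtt.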
>>>>>
>>>>> I must confess that I am not seeing all that clear a difference
>>>>> between the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat
>>>>> better latencies for FQ-CoDel, but not unambiguously so.
>>>>>
>>>>>> Moving back to more scientific clarity and simpler tests...
>>>>>>
>>>>>> The two graphs, taken a few weeks back, on pages 5 and 6 of
>>>>>> http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf
>>>>>> appear to show the advantage of fq_codel (fq + codel + head drop)
>>>>>> over tail drop during the slow-start period on a 10Mbit link (see
>>>>>> how squiggly slow start is on pfifo_fast?), as well as the marvelous
>>>>>> interstream latency that can be achieved with BQL=3000 (on a 10mbit
>>>>>> link). Even that latency can be halved by reducing BQL to 1500,
>>>>>> which is just fine at 10mbit. Below those rates I'd like to be rid
>>>>>> of BQL entirely, and just have a single packet outstanding... in
>>>>>> everything from adsl to cable...
>>>>>>
>>>>>> That said, I'd welcome other explanations of the squiggly slow-start
>>>>>> pfifo_fast behavior before I put that explanation on the slide...
>>>>>> ECN was in play here, too. I can redo this test easily; it's
>>>>>> basically running a netperf TCP_RR for 70 seconds, and starting up
>>>>>> a TCP_MAERTS and TCP_STREAM for 60 seconds at T+5, after hammering
>>>>>> down on BQL's limit and the link speeds on two sides of a directly
>>>>>> connected laptop connection.
>>>>>
>>>>> I must defer to others on this one. I do note the much lower
>>>>> latencies on slide 6 compared to slide 5, though.
>>>>>
>>>>> Please see attached for update including .git directory.
>>>>>
>>>>> 							Thanx, Paul
>>>>>
>>>>>> ethtool -s eth0 advertise 0x002   # 10 Mbit
>>>>>
>>>>> _______________________________________________
>>>>> Cerowrt-devel mailing list
>>>>> Cerowrt-devel@lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel

_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel