From: Greg White
To: paulmck@linux.vnet.ibm.com, Jim Gettys
Cc: Paolo Valente, Toke Høiland-Jørgensen, codel@lists.bufferbloat.net,
 cerowrt-devel@lists.bufferbloat.net, bloat, John Crispin
Date: Tue, 27 Nov 2012 16:53:34 -0700
Subject: Re: [Bloat] [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review

BTW, I've heard some use the term "stochastic flow queueing" as a
replacement, to avoid the term "fair". Seems like a more apt term anyway.

-Greg

On 11/27/12 3:49 PM, "Paul E. McKenney" wrote:

>Thank you for the review and comments, Jim! I will apply them when
>I get the pen back from Dave. And yes, that is the thing about
>"fairness" -- there are a great many definitions, many of the most
>useful of which appear to many to be patently unfair. ;-)
>
>As you suggest, it might well be best to drop discussion of fairness,
>or at the least to supply the corresponding definition.
>
>                                               Thanx, Paul
>
>On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
>> Some points worth making:
>>
>> 1) It is important to point out that (and how) fq_codel avoids
>> starvation: unpleasant as elephant flows are, it would be very
>> unfriendly to never service them at all until they time out.
>>
>> 2) "fairness" is not necessarily what we ultimately want at all; you'd
>> really like to penalize those who induce congestion the most. But we
>> don't currently have a solution (though Bob Briscoe at BT thinks he
>> does, and is seeing if he can get it out from under a BT patent), so
>> the current fq_codel simply round-robins until/unless we can do
>> something like Bob's idea. This is a local-information-only subset of
>> the ideas he's been working on in the congestion exposure (conex)
>> group at the IETF.
>>
>> 3) "fairness" is always in the eyes of the beholder (and should be
>> left to the beholder to determine); it depends on where in the network
>> you are. While being "fair" among TCP flows is a sensible default
>> policy for a host, elsewhere in the network it may not be, and usually
>> isn't.
>>
>> Two examples:
>>
>> o At a home router, you probably want to be "fair" according to
>> transmit opportunities. We really don't want a single system remote
>> from the router to be able to starve the network so that devices near
>> the router get much less bandwidth than you might hope/expect.
>>
>> What is more, you probably want to account for a single host using
>> many flows, and ensure that it cannot "hog" bandwidth in the home
>> environment but only use its "fair" share.
>>
>> o At an ISP, you must be "fair" between customers; it is best to leave
>> the judgement of "fairness" at finer granularity (e.g. host and TCP
>> flows) to the points closer to the customer's systems, so that they
>> can enforce whatever definition of "fair" they need themselves.
>>
>> Algorithms like fq_codel can be/should be adjusted to the
>> circumstances, and therefore exactly what you choose to hash against
>> to form the buckets will vary depending on where you are. That at
>> least one step of this (at the user's device) be TCP-flow "fair" does
>> have the great advantage of helping with the RTT unfairness problem,
>> which violates the principle of "least surprise" and is routinely seen
>> in places like New Zealand.
>>
>> This is why I have so many problems using the word "fair" near this
>> algorithm. "fair" is impossible to define, is overloaded in people's
>> minds with TCP fair queuing, is not even desirable much of the time,
>> and, by definition and design, even today's fq_codel isn't fair to
>> lots of things; the same basic algorithm can/should be tweaked in lots
>> of directions depending on what we need to do. Calling this "smart"
>> queuing or some such would be better.
>>
>> When you've done another round on the document, I'll do a more
>> detailed read.
>>     - Jim
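
Both the starvation point and the "what do you hash against" question
are concrete in the Linux implementation. A minimal sketch, assuming a
recent kernel, the iproute2 tc utility, and an interface named eth0
(the interface name, the divisor, and the exact flow-classifier syntax
are assumptions worth checking against your iproute2 version):

    # Attach fq_codel; by default it hashes the 5-tuple into 1024
    # per-flow buckets.
    tc qdisc add dev eth0 root handle 1: fq_codel

    # Under load, the qdisc statistics expose the new/old flow lists
    # that prevent starvation: a long-lived elephant flow sits on the
    # old-flows list and keeps receiving its quantum every round.
    tc -s qdisc show dev eth0    # reports new_flows_len / old_flows_len

    # To bucket on something other than the 5-tuple (e.g. roughly
    # per-host at a home router), a filter can override the default
    # classification so the hash covers only the source address:
    tc filter add dev eth0 parent 1: protocol ip prio 1 \
        flow hash keys src divisor 1024

The quantum/bucket machinery is the same in both cases; only the hash
key changes, which is Jim's point.
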
>>
>> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>>
>> > On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
>> > > David Woodhouse and I fiddled a lot with adsl and openwrt and a
>> > > variety of drivers and network layers in a typical bonded adsl
>> > > stack yesterday. The complexity of it all makes my head hurt. I'm
>> > > happy that a newly BQL'd ethernet driver (for the geos and qemu)
>> > > emerged from it, which he submitted to netdev...
>> >
>> > Cool!!! ;-)
>> >
>> > > I made a recording of us last night discussing the layers, which I
>> > > will produce and distribute later...
>> > >
>> > > Anyway, along the way, we fiddled a lot with trying to analyze
>> > > where the 350ms or so of added latency was coming from in the
>> > > traverse geo's adsl implementation and overlying stack....
>> > >
>> > > Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
>> > >
>> > > Note 1:
>> > >
>> > > The netperf sample rate on the rrul test needs to be higher than
>> > > 100ms in order to get a decent result at sub-10Mbit speeds.
>> > >
>> > > Note 2:
>> > >
>> > > The two nicest graphs here are nofq.svg vs fq.svg, which were
>> > > taken on a gigE link from a Mac running Linux to another gigE
>> > > link (in other words, NOT on the friggin adsl link). (Firefox can
>> > > display svg; I don't know what else can.) I find the T+10 delay
>> > > before stream start in the fq.svg graph suspicious, and think the
>> > > "throw out the outlier" code in the netperf-wrapper code is at
>> > > fault. Prior to that, codel is merely buffering up things madly,
>> > > which can also be seen in the pfifo_fast behavior with its
>> > > default of 1000 packets.
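
Dave's Note 1 is about the test harness's sampling step. A hedged
example of driving the RRUL test with a coarser sampling interval (the
server name is a placeholder, and the option spellings should be
verified against your netperf-wrapper version):

    # Run RRUL for 70 seconds, sampling every 200ms rather than the
    # default; at sub-10Mbit rates a finer interval gives noisy plots.
    netperf-wrapper -H netperf.example.org -l 70 -s 0.2 rrul
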
>> >
>> > I am using these two in a new "Effectiveness of FQ-CoDel" section.
>> > Chrome can display .svg, and if it becomes a problem, I am sure that
>> > they can be converted. Please let me know if some other data would
>> > make the point better.
>> >
>> > I am assuming that the colored throughput spikes are due to
>> > occasional packet losses. Please let me know if this interpretation
>> > is overly naive.
>> >
>> > Also, I know what ICMP is, but the UDP variants are new to me. Could
>> > you please expand the "EF", "BK", "BE", and "CSS" acronyms?
>> >
>> > > (Arguably, the default queue length in codel can be reduced from
>> > > 10k packets to something more reasonable at GigE speeds.)
>> > >
>> > > (The indicator that it's the graph, not the reality, is that the
>> > > fq.svg pings and udp start at T+5 and grow minimally, as is usual
>> > > with fq_codel.)
>> >
>> > All sessions were started at T+5, then?
>> >
>> > > As for the *.ps graphs, well, they would take david's network
>> > > topology to explain, and were conducted over a variety of
>> > > circumstances, including wifi, with more variables in play than I
>> > > care to think about.
>> > >
>> > > We didn't really get anywhere on digging deeper. As we got to
>> > > purer tests - with a minimal number of boxes, running pure
>> > > ethernet, switched over a couple of switches, even in the simplest
>> > > two-box case - my HTB-based "ceroshaper" implementation had
>> > > multiple problems cutting median latencies below 100ms on this
>> > > very slow ADSL link. David suspects problems on the path along the
>> > > carrier backbone as a potential issue, and the only way to measure
>> > > that is with two one-way trip time measurements (rather than rtt),
>> > > time-synced via ntp... I keep hoping to find a rtp test, but I'm
>> > > open to just about any option at this point. Anyone?
>> > >
>> > > We also found a probable bug in mtr, in that multiple mtrs on the
>> > > same box don't co-exist.
>> >
>> > I must confess that I am not seeing all that clear a difference
>> > between the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat
>> > better latencies for FQ-CoDel, but not unambiguously so.
>> >
>> > > Moving back to more scientific clarity and simpler tests...
>> > >
>> > > The two graphs, taken a few weeks back, on pages 5 and 6 of this:
>> > >
>> > > http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf
>> > >
>> > > appear to show the advantage of fq_codel (fq + codel + head drop)
>> > > over tail drop during the slow-start period on a 10Mbit link (see
>> > > how squiggly slow start is on pfifo_fast?), as well as the
>> > > marvelous interstream latency that can be achieved with BQL=3000
>> > > (on a 10mbit link). Even that latency can be halved by reducing
>> > > BQL to 1500, which is just fine at 10mbit. Below those rates I'd
>> > > like to be rid of BQL entirely, and just have a single packet
>> > > outstanding... in everything from adsl to cable...
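
For anyone wanting to try the same knob: BQL's limits are exposed per
transmit queue in sysfs. A minimal sketch (eth0 and tx-0 are
placeholders for the interface and queue under test; root required):

    # Inspect the current byte limit on the first transmit queue...
    cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
    # ...then cap it, e.g. at the 3000 bytes discussed above, or 1500
    # for a single full-size Ethernet frame outstanding.
    echo 3000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max
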
>> >
>> > > That said, I'd welcome other explanations of the squiggly
>> > > slow-start pfifo_fast behavior before I put that explanation on
>> > > the slide.... ECN was in play here, too. I can redo this test
>> > > easily; it's basically running a netperf TCP_RR for 70 seconds,
>> > > and starting up a TCP_MAERTS and TCP_STREAM for 60 seconds at
>> > > T+5, after hammering down on BQL's limit and the link speeds on
>> > > two sides of a directly connected laptop connection.
>> >
>> > I must defer to others on this one. I do note the much lower
>> > latencies on slide 6 compared to slide 5, though.
>> >
>> > Please see attached for update including .git directory.
>> >
>> >                                             Thanx, Paul
>> >
>> > > ethtool -s eth0 advertise 0x002   # 10 Mbit
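
Putting Dave's recipe together, a rough reproduction sketch. The peer
hostname is a placeholder, both machines are assumed to be running
netperf's netserver, and the sequencing mirrors his description above:

    # Pin both sides of the direct link to 10 Mbit first:
    ethtool -s eth0 advertise 0x002    # 10 Mbit

    PEER=peer.example.org    # placeholder for the far laptop

    # Latency-measuring request/response flow for the full 70 seconds:
    netperf -H $PEER -t TCP_RR -l 70 &

    # At T+5, start bulk transfers in both directions for 60 seconds:
    sleep 5
    netperf -H $PEER -t TCP_MAERTS -l 60 &
    netperf -H $PEER -t TCP_STREAM -l 60 &
    wait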