From: Greg White
To: paulmck@linux.vnet.ibm.com, Jim Gettys
Cc: Paolo Valente, Toke Høiland-Jørgensen, codel@lists.bufferbloat.net,
 cerowrt-devel@lists.bufferbloat.net, bloat, John Crispin
Date: Tue, 27 Nov 2012 16:53:34 -0700
Subject: Re: [Bloat] [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review

BTW, I've heard some use the term "stochastic flow queueing" as a
replacement, to avoid the term "fair". Seems like a more apt term anyway.

-Greg

On 11/27/12 3:49 PM, "Paul E. McKenney" wrote:

>Thank you for the review and comments, Jim! I will apply them when
>I get the pen back from Dave. And yes, that is the thing about
>"fairness" -- there are a great many definitions, many of the most
>useful of which appear to many to be patently unfair. ;-)
>
>As you suggest, it might well be best to drop discussion of fairness,
>or at the least to supply the corresponding definition.
>
>                                               Thanx, Paul
>
>On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
>> Some points worth making:
>>
>> 1) It is important to point out that (and how) fq_codel avoids
>> starvation: unpleasant as elephant flows are, it would be very
>> unfriendly to never service them at all until they time out.
>>
>> 2) "fairness" is not necessarily what we ultimately want at all; you'd
>> really like to penalize those who induce congestion the most. But we
>> don't currently have a solution (though Bob Briscoe at BT thinks he
>> does, and is seeing if he can get it out from under a BT patent), so
>> the current fq_codel simply round-robins until/unless we can do
>> something like Bob's idea. This is a local-information-only subset of
>> the ideas he's been working on in the congestion exposure (conex)
>> group at the IETF.
>>
>> 3) "fairness" is always in the eyes of the beholder (and should be
>> left to the beholder to determine); it depends on where in the network
>> you are. While being "fair" among TCP flows is a sensible default
>> policy for a host, elsewhere in the network it may not be, and usually
>> isn't.
>>
>> Two examples:
>>
>> o At a home router, you probably want to be "fair" according to
>> transmit opportunities. We really don't want a single system remote
>> from the router to be able to starve the network so that devices near
>> the router get much less bandwidth than you might hope/expect.
>>
>> What is more, you probably want to account for a single host using
>> many flows, and ensure that it cannot "hog" bandwidth in the home
>> environment but only use its "fair" share.
>>
>> o At an ISP, you must be "fair" between customers; it is best to leave
>> the judgement of "fairness" at finer granularity (e.g. host and TCP
>> flows) to the points closer to the customer's systems, so that they
>> can enforce whatever definition of "fair" they need themselves.
>>
>> Algorithms like fq_codel can be/should be adjusted to the
>> circumstances, and therefore exactly what you choose to hash against
>> to form the buckets will vary depending on where you are. That at
>> least one step of this (at the user's device) be TCP-flow "fair" does
>> have the great advantage of helping with the RTT unfairness problem,
>> which violates the principle of "least surprise" and is routinely seen
>> in places like New Zealand.
>>
>> This is why I have so many problems using the word "fair" near this
>> algorithm. "fair" is impossible to define, is overloaded in people's
>> minds with TCP fair queuing, is not even desirable much of the time,
>> and, by definition and design, even today's fq_codel isn't fair to
>> lots of things; the same basic algorithm can/should be tweaked in lots
>> of directions depending on what we need to do. Calling this "smart"
>> queuing or some such would be better.
>>
>> When you've done another round on the document, I'll do a more
>> detailed read.
>>     - Jim
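
Both the starvation point and the "what do you hash against" question
are concrete in the Linux implementation. A minimal sketch, assuming a
recent kernel, the iproute2 tc utility, and an interface named eth0
(the interface name, the divisor, and the exact flow-classifier syntax
are assumptions worth checking against your iproute2 version):

    # Attach fq_codel; by default it hashes the 5-tuple into 1024
    # per-flow buckets.
    tc qdisc add dev eth0 root handle 1: fq_codel

    # Under load, the qdisc statistics expose the new/old flow lists
    # that prevent starvation: a long-lived elephant flow sits on the
    # old-flows list and keeps receiving its quantum every round.
    tc -s qdisc show dev eth0    # reports new_flows_len / old_flows_len

    # To bucket on something other than the 5-tuple (e.g. roughly
    # per-host at a home router), a filter can override the default
    # classification so the hash covers only the source address:
    tc filter add dev eth0 parent 1: protocol ip prio 1 \
        flow hash keys src divisor 1024

The quantum/bucket machinery is the same in both cases; only the hash
key changes, which is Jim's point.
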
>>
>> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>>
>> > On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
>> > > David Woodhouse and I fiddled a lot with adsl and openwrt and a
>> > > variety of drivers and network layers in a typical bonded adsl
>> > > stack yesterday. The complexity of it all makes my head hurt. I'm
>> > > happy that a newly BQL'd ethernet driver (for the geos and qemu)
>> > > emerged from it, which he submitted to netdev...
>> >
>> > Cool!!! ;-)
>> >
>> > > I made a recording of us last night discussing the layers, which I
>> > > will produce and distribute later...
>> > >
>> > > Anyway, along the way, we fiddled a lot with trying to analyze
>> > > where the 350ms or so of added latency was coming from in the
>> > > traverse geo's adsl implementation and overlying stack....
>> > >
>> > > Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
>> > >
>> > > Note 1:
>> > >
>> > > The netperf sample rate on the rrul test needs to be higher than
>> > > 100ms in order to get a decent result at sub-10Mbit speeds.
>> > >
>> > > Note 2:
>> > >
>> > > The two nicest graphs here are nofq.svg vs fq.svg, which were
>> > > taken on a gigE link from a Mac running Linux to another gigE
>> > > link (in other words, NOT on the friggin adsl link). (Firefox can
>> > > display svg; I don't know what else can.) I find the T+10 delay
>> > > before stream start in the fq.svg graph suspicious, and think the
>> > > "throw out the outlier" code in the netperf-wrapper code is at
>> > > fault. Prior to that, codel is merely buffering up things madly,
>> > > which can also be seen in the pfifo_fast behavior with its
>> > > default of 1000 packets.
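
Dave's Note 1 is about the test harness's sampling step. A hedged
example of driving the RRUL test with a coarser sampling interval (the
server name is a placeholder, and the option spellings should be
verified against your netperf-wrapper version):

    # Run RRUL for 70 seconds, sampling every 200ms rather than the
    # default; at sub-10Mbit rates a finer interval gives noisy plots.
    netperf-wrapper -H netperf.example.org -l 70 -s 0.2 rrul
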
>> >
>> > I am using these two in a new "Effectiveness of FQ-CoDel" section.
>> > Chrome can display .svg, and if it becomes a problem, I am sure that
>> > they can be converted. Please let me know if some other data would
>> > make the point better.
>> >
>> > I am assuming that the colored throughput spikes are due to
>> > occasional packet losses. Please let me know if this interpretation
>> > is overly naive.
>> >
>> > Also, I know what ICMP is, but the UDP variants are new to me. Could
>> > you please expand the "EF", "BK", "BE", and "CSS" acronyms?
>> >
>> > > (Arguably, the default queue length in codel can be reduced from
>> > > 10k packets to something more reasonable at GigE speeds.)
>> > >
>> > > (The indicator that it's the graph, not the reality, is that the
>> > > fq.svg pings and udp start at T+5 and grow minimally, as is usual
>> > > with fq_codel.)
>> >
>> > All sessions were started at T+5, then?
>> >
>> > > As for the *.ps graphs, well, they would take david's network
>> > > topology to explain, and were conducted over a variety of
>> > > circumstances, including wifi, with more variables in play than I
>> > > care to think about.
>> > >
>> > > We didn't really get anywhere on digging deeper. As we got to
>> > > purer tests - with a minimal number of boxes, running pure
>> > > ethernet, switched over a couple of switches, even in the simplest
>> > > two-box case - my HTB-based "ceroshaper" implementation had
>> > > multiple problems cutting median latencies below 100ms on this
>> > > very slow ADSL link. David suspects problems on the path along the
>> > > carrier backbone as a potential issue, and the only way to measure
>> > > that is with two one-way trip time measurements (rather than rtt),
>> > > time-synced via ntp... I keep hoping to find a rtp test, but I'm
>> > > open to just about any option at this point. Anyone?
>> > >
>> > > We also found a probable bug in mtr, in that multiple mtrs on the
>> > > same box don't co-exist.
>> >
>> > I must confess that I am not seeing all that clear a difference
>> > between the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat
>> > better latencies for FQ-CoDel, but not unambiguously so.
>> >
>> > > Moving back to more scientific clarity and simpler tests...
>> > >
>> > > The two graphs, taken a few weeks back, on pages 5 and 6 of this:
>> > >
>> > > http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf
>> > >
>> > > appear to show the advantage of fq_codel (fq + codel + head drop)
>> > > over tail drop during the slow-start period on a 10Mbit link (see
>> > > how squiggly slow start is on pfifo_fast?), as well as the
>> > > marvelous interstream latency that can be achieved with BQL=3000
>> > > (on a 10mbit link). Even that latency can be halved by reducing
>> > > BQL to 1500, which is just fine at 10mbit. Below those rates I'd
>> > > like to be rid of BQL entirely, and just have a single packet
>> > > outstanding... in everything from adsl to cable...
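
For anyone wanting to try the same knob: BQL's limits are exposed per
transmit queue in sysfs. A minimal sketch (eth0 and tx-0 are
placeholders for the interface and queue under test; root required):

    # Inspect the current byte limit on the first transmit queue...
    cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
    # ...then cap it, e.g. at the 3000 bytes discussed above, or 1500
    # for a single full-size Ethernet frame outstanding.
    echo 3000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max
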
>> >
>> > > That said, I'd welcome other explanations of the squiggly
>> > > slow-start pfifo_fast behavior before I put that explanation on
>> > > the slide.... ECN was in play here, too. I can redo this test
>> > > easily; it's basically running a netperf TCP_RR for 70 seconds,
>> > > and starting up a TCP_MAERTS and TCP_STREAM for 60 seconds at
>> > > T+5, after hammering down on BQL's limit and the link speeds on
>> > > two sides of a directly connected laptop connection.
>> >
>> > I must defer to others on this one. I do note the much lower
>> > latencies on slide 6 compared to slide 5, though.
>> >
>> > Please see attached for update including .git directory.
>> >
>> >                                             Thanx, Paul
>> >
>> > > ethtool -s eth0 advertise 0x002   # 10 Mbit
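
Putting Dave's recipe together, a rough reproduction sketch. The peer
hostname is a placeholder, both machines are assumed to be running
netperf's netserver, and the sequencing mirrors his description above:

    # Pin both sides of the direct link to 10 Mbit first:
    ethtool -s eth0 advertise 0x002    # 10 Mbit

    PEER=peer.example.org    # placeholder for the far laptop

    # Latency-measuring request/response flow for the full 70 seconds:
    netperf -H $PEER -t TCP_RR -l 70 &

    # At T+5, start bulk transfers in both directions for 60 seconds:
    sleep 5
    netperf -H $PEER -t TCP_MAERTS -l 60 &
    netperf -H $PEER -t TCP_STREAM -l 60 &
    wait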