From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Tue, 27 Nov 2012 14:49:16 -0800
To: Jim Gettys
Cc: Paolo Valente, Toke Høiland-Jørgensen, Eric Raymond,
    codel@lists.bufferbloat.net, cerowrt-devel@lists.bufferbloat.net,
    bloat, John Crispin
Subject: Re: [Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
Message-ID: <20121127224915.GM2474@linux.vnet.ibm.com>
References: <20121123221842.GD2829@linux.vnet.ibm.com>

Thank you for the review and comments, Jim!  I will apply them when I
get the pen back from Dave.

And yes, that is the thing about "fairness" -- there are a great many
definitions, and many of the most useful ones strike many people as
patently unfair.  ;-)  As you suggest, it might well be best to drop
the discussion of fairness, or at the least to supply the
corresponding definition.

							Thanx, Paul

On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
> Some points worth making:
> 
> 1) It is important to point out that (and how) fq_codel avoids
> starvation: unpleasant as elephant flows are, it would be very
> unfriendly never to service them at all until they time out.
> 
> 2) "Fairness" is not necessarily what we ultimately want at all;
> you'd really like to penalize those who induce congestion the most.
> But we don't currently have a solution (though Bob Briscoe at BT
> thinks he does, and is seeing if he can get it out from under a BT
> patent), so the current fq_codel ultimately round-robins until/unless
> we can do something like Bob's idea.
> This is a local-information-only subset of the ideas he's been
> working on in the congestion exposure (conex) group at the IETF.
> 
> 3) "Fairness" is always in the eyes of the beholder (and should be
> left to the beholder to determine).  "Fairness" depends on where in
> the network you are.  While being "fair" among TCP flows is a
> sensible default policy for a host, elsewhere in the network it may
> not be/usually isn't.
> 
> Two examples:
> 
> o At a home router, you probably want to be "fair" according to
> transmit opportunities.  We really don't want a single system remote
> from the router to be able to starve the network so that devices
> near the router get much less bandwidth than you might hope/expect.
> 
> What is more, you probably want to account for a single host using
> many flows, and ensure that it cannot "hog" bandwidth in the home
> environment, but only use its "fair" share.
> 
> o At an ISP, you want to be "fair" between customers; it is best to
> leave the judgement of "fairness" at finer granularity (e.g., host
> and TCP flows) to the points closer to the customers' systems, so
> that they can enforce whatever definition of "fair" they need
> themselves.
> 
> Algorithms like fq_codel can be/should be adjusted to the
> circumstances, and therefore exactly what you choose to hash against
> to form the buckets will vary depending on where you are.  That at
> least one step (at the user's device) be TCP-flow "fair" does have
> the great advantage of helping with the RTT unfairness problem that
> violates the principle of "least surprise", such as is routinely
> seen in places like New Zealand.
> 
> This is why I have so many problems using the word "fair" near this
> algorithm.  "Fair" is impossible to define, overloaded in people's
> minds with TCP fair queuing, not even desirable much of the time,
> and by definition and design even today's fq_codel isn't fair to
> lots of things; the same basic algorithm can/should be tweaked in
> lots of directions depending on what we need to do.  Calling this
> "smart" queuing or some such would be better.
> 
> When you've done another round on the document, I'll do a more
> detailed read.
> 
> - Jim
> 
> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> > > David Woodhouse and I fiddled a lot with adsl and openwrt and a
> > > variety of drivers and network layers in a typical bonded adsl
> > > stack yesterday.  The complexity of it all makes my head hurt.
> > > I'm happy that a newly BQL'd ethernet driver (for the geos and
> > > qemu) emerged from it, which he submitted to netdev...
> >
> > Cool!!!  ;-)
> >
> > > I made a recording of us last night discussing the layers, which
> > > I will produce and distribute later...
> > >
> > > Anyway, along the way, we fiddled a lot with trying to analyze
> > > where the 350ms or so of added latency was coming from in the
> > > traverse geo's adsl implementation and overlying stack....
> > >
> > > Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> > >
> > > Note 1:
> > >
> > > The netperf sample interval on the rrul test needs to be finer
> > > than 100ms in order to get a decent result at sub-10Mbit speeds.
> > >
> > > Note 2:
> > >
> > > The two nicest graphs here are nofq.svg vs fq.svg, which were
> > > taken on a gigE link from a Mac running Linux to another gigE
> > > link.  (In other words, NOT on the friggin adsl link.)  (Firefox
> > > can display svg; I don't know what else can.)  I find the T+10
> > > delay before stream start in the fq.svg graph suspicious, and
> > > think the "throw out the outlier" code in the netperf-wrapper
> > > code is at fault.
> > > Prior to that, codel is merely buffering things up madly, which
> > > can also be seen in the pfifo_fast behavior with its default of
> > > 1000 packets.
> >
> > I am using these two in a new "Effectiveness of FQ-CoDel" section.
> > Chrome can display .svg, and if it becomes a problem, I am sure
> > that they can be converted.  Please let me know if some other data
> > would make the point better.
> >
> > I am assuming that the colored throughput spikes are due to
> > occasional packet losses.  Please let me know if this
> > interpretation is overly naive.
> >
> > Also, I know what ICMP is, but the UDP variants are new to me.
> > Could you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> >
> > > (Arguably, the default queue length in codel can be reduced from
> > > 10k packets to something more reasonable at GigE speeds.)
> > >
> > > (The indicator that it's the graph, not the reality, is that the
> > > fq.svg pings and udp start at T+5 and grow minimally, as is
> > > usual with fq_codel.)
> >
> > All sessions were started at T+5, then?
> >
> > > As for the *.ps graphs, well, they would take david's network
> > > topology to explain, and were conducted over a variety of
> > > circumstances, including wifi, with more variables in play than
> > > I care to think about.
> > >
> > > We didn't really get anywhere on digging deeper.  As we got to
> > > purer tests - with a minimal number of boxes, running pure
> > > ethernet, switched over a couple of switches - even in the
> > > simplest two-box case, my HTB-based "ceroshaper" implementation
> > > had multiple problems cutting median latencies below 100ms on
> > > this very slow ADSL link.  David suspects problems on the path
> > > along the carrier backbone as a potential issue, and the only
> > > way to measure that is with two one-way trip-time measurements
> > > (rather than rtt), time-synced via ntp...  I keep hoping to find
> > > an rtp test, but I'm open to just about any option at this
> > > point.  Anyone?
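[Editor's aside: the one-way measurement Dave asks for above can be
sketched in a few lines.  With both hosts' clocks synchronized (e.g.
via ntp), each direction's delay is simply receive time minus send
time, so an asymmetric carrier path shows up directly.  The helper
name and all timestamps below are invented for illustration, not data
from these tests.]

```python
# Sketch of one-way-delay (OWD) measurement between two hosts whose
# clocks are synced (e.g. via NTP).  RTT lumps both directions
# together; separate OWDs localize where the delay actually is.

def one_way_delays(send_times_a, recv_times_b):
    """Per-packet one-way delay from host A to host B, in seconds."""
    return [rx - tx for tx, rx in zip(send_times_a, recv_times_b)]

# Hypothetical synced timestamps (seconds), forward and reverse paths.
fwd = one_way_delays([0.000, 0.100, 0.200], [0.030, 0.135, 0.240])
rev = one_way_delays([0.050, 0.150, 0.250], [0.070, 0.172, 0.275])

# Forward delay is growing while reverse stays flat: an asymmetry
# that a plain RTT measurement could not attribute to either path.
print([round(d, 3) for d in fwd])  # -> [0.03, 0.035, 0.04]
print([round(d, 3) for d in rev])  # -> [0.02, 0.022, 0.025]
```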
> > >
> > > We also found a probable bug in mtr, in that multiple mtrs on
> > > the same box don't co-exist.
> >
> > I must confess that I am not seeing all that clear a difference
> > between the behaviors of ceroshaper and FQ-CoDel.  Maybe somewhat
> > better latencies for FQ-CoDel, but not unambiguously so.
> >
> > > Moving back to more scientific clarity and simpler tests...
> > >
> > > The two graphs, taken a few weeks back, on pages 5 and 6 of
> > > this:
> > >
> > > http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf
> > >
> > > appear to show the advantage of fq_codel (fq + codel + head
> > > drop) over tail drop during the slow-start period on a 10Mbit
> > > link (see how squiggly slow start is on pfifo_fast?), as well as
> > > the marvelous interstream latency that can be achieved with
> > > BQL=3000 (on a 10Mbit link).  Even that latency can be halved by
> > > reducing BQL to 1500, which is just fine at 10Mbit.  Below those
> > > rates I'd like to be rid of BQL entirely and just have a single
> > > packet outstanding... in everything from adsl to cable...
> > >
> > > That said, I'd welcome other explanations of the squiggly
> > > slow-start pfifo_fast behavior before I put that explanation on
> > > the slide....  ECN was in play here, too.  I can redo this test
> > > easily; it's basically running a netperf TCP_RR for 70 seconds,
> > > and starting up a TCP_MAERTS and a TCP_STREAM for 60 seconds at
> > > T+5, after hammering down on BQL's limit and the link speeds on
> > > two sides of a directly connected laptop connection.
> >
> > I must defer to others on this one.  I do note the much lower
> > latencies on slide 6 compared to slide 5, though.
> >
> > Please see attached for an update, including the .git directory.
> >
> > 							Thanx, Paul
> >
> > > ethtool -s eth0 advertise 0x002 # 10 Mbit
> >
> > _______________________________________________
> > Cerowrt-devel mailing list
> > Cerowrt-devel@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
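[Editor's aside: for readers unfamiliar with the mechanism Jim
alludes to in his point 1, here is a deliberately simplified Python
sketch (not the kernel's C code) of fq_codel-style scheduling:
packets are hashed into buckets, and a deficit round-robin services
"new" flows ahead of "old" ones, so thin flows see low latency while
elephant flows are still serviced every round rather than starved.
The class name, constants, and packet format are invented for
illustration.]

```python
from collections import deque

QUANTUM = 1514  # bytes of deficit granted per round (roughly one MTU)

class FlowQueueSketch:
    """Toy fq_codel-style scheduler: hash into buckets, then DRR."""

    def __init__(self, buckets=1024, key=lambda pkt: pkt["flow"]):
        # 'key' is the fairness knob Jim describes: hash the 5-tuple
        # for per-flow fairness, or just the source host for per-host
        # fairness at a home router or an ISP.
        self.buckets = buckets
        self.key = key
        self.queues = {}          # bucket id -> deque of packets
        self.deficit = {}         # bucket id -> remaining byte credit
        self.new_flows = deque()  # serviced ahead of old_flows
        self.old_flows = deque()

    def enqueue(self, pkt):
        b = hash(self.key(pkt)) % self.buckets
        self.queues.setdefault(b, deque()).append(pkt)
        if b not in self.new_flows and b not in self.old_flows:
            self.new_flows.append(b)  # a new/thin flow jumps the line
            self.deficit[b] = QUANTUM

    def dequeue(self):
        while self.new_flows or self.old_flows:
            lst = self.new_flows if self.new_flows else self.old_flows
            b = lst[0]
            if self.deficit[b] <= 0:
                # Out of credit: recharge and rotate to the back of
                # the old list.  This rotation is what avoids
                # starvation: an elephant is deferred, never abandoned.
                self.deficit[b] += QUANTUM
                lst.popleft()
                self.old_flows.append(b)
            elif not self.queues[b]:
                lst.popleft()         # empty flow leaves the schedule
            else:
                pkt = self.queues[b].popleft()
                self.deficit[b] -= pkt["size"]
                return pkt
        return None

# An elephant flow queues five full-size packets, then a thin flow
# queues one small packet.  The thin packet goes out after only one
# elephant packet, and the elephant still drains completely.
sched = FlowQueueSketch()
for i in range(5):
    sched.enqueue({"flow": 1, "size": 1514, "id": "big%d" % i})
sched.enqueue({"flow": 2, "size": 100, "id": "thin"})

order = []
while (p := sched.dequeue()) is not None:
    order.append(p["id"])
print(order)  # -> ['big0', 'thin', 'big1', 'big2', 'big3', 'big4']
```

The choice of `key` is exactly the per-deployment tuning Jim argues
for: the same scheduling core serves TCP-flow "fairness" on a host,
per-host "fairness" on a home router, or per-customer "fairness" at
an ISP, simply by changing what gets hashed.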