[Codel] [Cerowrt-devel] FQ_Codel lwn draft article review
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Wed Nov 28 11:01:33 EST 2012
Dave gave me back the pen, so I looked to see what I had expanded
FQ-CoDel to. The answer was... Nothing. Nothing at all.
So I added a Quick Quiz as follows:
Quick Quiz 2: What does the FQ-CoDel acronym expand to?
Answer: There are some differences of opinion on this. The
comment header in net/sched/sch_fq_codel.c says
“Fair Queue CoDel” (presumably by analogy to SFQ's
expansion of “Stochastic Fairness Queueing”), and
“CoDel” is generally agreed to expand to “controlled
delay”. However, some prefer “Flow Queue Controlled
Delay” and still others prefer to prepend a silent and
invisible "S", expanding to “Stochastic Flow Queue
Controlled Delay” or “Smart Flow Queue Controlled
Delay”. No doubt additional expansions will appear in
the fullness of time.
In the meantime, this article focuses on the concepts,
implementation, and performance, leaving naming debates
to others.
This level snarkiness would go over reasonably well in an LWN article,
I would -not- suggest this approach in an academic paper, just in case
you were wondering. But if there is too much discomfort with snarking,
I just might be convinced to take another approach.
Thanx, Paul
On Tue, Nov 27, 2012 at 08:38:38PM -0800, Paul E. McKenney wrote:
> I guess I just have to be grateful that people mostly agree on the acronym,
> regardless of the expansion.
>
> Thanx, Paul
>
> On Tue, Nov 27, 2012 at 07:43:56PM -0800, Kathleen Nichols wrote:
> >
> > It would be me that tries to say "stochastic flow queuing with CoDel"
> > as I like to be accurate. But I think FQ-Codel is Flow queuing with CoDel.
> > JimG suggests "smart flow queuing" because he is ever mindful of the
> > big audience.
> >
> > On 11/27/12 4:27 PM, Paul E. McKenney wrote:
> > > On Tue, Nov 27, 2012 at 04:53:34PM -0700, Greg White wrote:
> > >> BTW, I've heard some use the term "stochastic flow queueing" as a
> > >> replacement to avoid the term "fair". Seems like a more apt term anyway.
> > >
> > > Would that mean that FQ-CoDel is Flow Queue Controlled Delay? ;-)
> > >
> > > Thanx, Paul
> > >
> > >> -Greg
> > >>
> > >>
> > >> On 11/27/12 3:49 PM, "Paul E. McKenney" <paulmck at linux.vnet.ibm.com> wrote:
> > >>
> > >>> Thank you for the review and comments, Jim! I will apply them when
> > >>> I get the pen back from Dave. And yes, that is the thing about
> > >>> "fairness" -- there are a great many definitions, many of the most
> > >>> useful of which appear to many to be patently unfair. ;-)
> > >>>
> > >>> As you suggest, it might well be best to drop discussion of fairness,
> > >>> or to at the least supply the corresponding definition.
> > >>>
> > >>> Thanx, Paul
> > >>>
> > >>> On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
> > >>>> Some points worth making:
> > >>>>
> > >>>> 1) It is important to point out that (and how) fq_codel avoids
> > >>>> starvation:
> > >>>> unpleasant as elephant flows are, it would be very unfriendly to never
> > >>>> service them at all until they time out.
> > >>>>
> > >>>> 2) "fairness" is not necessarily what we ultimately want at all; you'd
> > >>>> really like to penalize those who induce congestion the most. But we
> > >>>> don't
> > >>>> currently have a solution (though Bob Briscoe at BT thinks he does, and
> > >>>> is
> > >>>> seeing if he can get it out from under a BT patent), so the current
> > >>>> fq_codel round robins ultimately until/unless we can do something like
> > >>>> Bob's idea. This is a local information only subset of the ideas he's
> > >>>> been
> > >>>> working on in the congestion exposure (conex) group at the IETF.
> > >>>>
> > >>>> 3) "fairness" is always in the eyes of the beholder (and should be left
> > >>>> to
> > >>>> the beholder to determine). "fairness" depends on where in the network
> > >>>> you
> > >>>> are. While being "fair" among TCP flows is sensible default policy for
> > >>>> a
> > >>>> host, else where in the network it may not be/usually isn't.
> > >>>>
> > >>>> Two examples:
> > >>>> o at a home router, you probably want to be "fair" according to transmit
> > >>>> opportunities. We really don't want a single system remote from the
> > >>>> router
> > >>>> to be able to starve the network so that devices near the router get
> > >>>> much
> > >>>> less bandwidth than you might hope/expect.
> > >>>>
> > >>>> What is more, you probably want to account for a single host using many
> > >>>> flows, and regulate that they not be able to "hog" bandwidth in the home
> > >>>> environment, but only use their "fair" share.
> > >>>>
> > >>>> o at an ISP, you must to be "fair" between customers; it is best to
> > >>>> leave
> > >>>> the judgement of "fairness" at finer granularity (e.g. host and TCP
> > >>>> flows)
> > >>>> to the points closer to the customer's systems, so that they can enforce
> > >>>> whatever definition of "fair" they need to themselves.
> > >>>>
> > >>>>
> > >>>> Algorithms like fq_codel can be/should be adjusted to the circumstances.
> > >>>>
> > >>>> And therefore exactly what you choose to hash against to form the
> > >>>> buckets
> > >>>> will vary depending on where you are. That at least one step (at the
> > >>>> user's device) of this be TCP flow "fair" does have the great advantage
> > >>>> of
> > >>>> helping the RTT unfairness problem that violates the principle of "least
> > >>>> surprise", such as that routinely seen in places like New Zealand.
> > >>>>
> > >>>> This is why I have so many problems using the word "fair" near this
> > >>>> algorithm. "fair" is impossible to define, overloaded in people's mind
> > >>>> with TCP fair queuing, not even desirable much of the time, and by
> > >>>> definition and design, even today's fq_codel isn't fair to lots of
> > >>>> things,
> > >>>> and the same basic algorithm can/should be tweaked in lots of directions
> > >>>> depending on what we need to do. Calling this "smart" queuing or some
> > >>>> such
> > >>>> would be better.
> > >>>>
> > >>>> When you've done another round on the document, I'll do a more detailed
> > >>>> read.
> > >>>> - Jim
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
> > >>>> paulmck at linux.vnet.ibm.com> wrote:
> > >>>>
> > >>>>> On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> > >>>>>> David Woodhouse and I fiddled a lot with adsl and openwrt and a
> > >>>>>> variety of drivers and network layers in a typical bonded adsl stack
> > >>>>>> yesterday. The complexity of it all makes my head hurt. I'm happy
> > >>>> that
> > >>>>>> a newly BQL'd ethernet driver (for the geos and qemu) emerged from
> > >>>> it,
> > >>>>>> which he submitted to netdev...
> > >>>>>
> > >>>>> Cool!!! ;-)
> > >>>>>
> > >>>>>> I made a recording of us last night discussing the layers, which I
> > >>>>>> will produce and distribute later...
> > >>>>>>
> > >>>>>> Anyway, along the way, we fiddled a lot with trying to analyze where
> > >>>>>> the 350ms or so of added latency was coming from in the traverse
> > >>>> geo's
> > >>>>>> adsl implementation and overlying stack....
> > >>>>>>
> > >>>>>> Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> > >>>>>>
> > >>>>>> Note: 1:
> > >>>>>>
> > >>>>>> The netperf sample rate on the rrul test needs to be higher than
> > >>>>>> 100ms in order to get a decent result at sub 10Mbit speeds.
> > >>>>>>
> > >>>>>> Note 2:
> > >>>>>>
> > >>>>>> The two nicest graphs here are nofq.svg vs fq.svg, which were taken
> > >>>> on
> > >>>>>> a gigE link from a Mac running Linux to another gigE link. (in other
> > >>>>>> words, NOT on the friggin adsl link) (firefox can display svg, I
> > >>>> don't
> > >>>>>> know what else) I find the T+10 delay before stream start in the
> > >>>>>> fq.svg graph suspicious and think the "throw out the outlier" code
> > >>>> in
> > >>>>>> the netperf-wrapper code is at fault. Prior to that, codel is merely
> > >>>>>> buffering up things madly, which can also be seen in the pfifo_fast
> > >>>>>> behavior, with 1000pkts it's default.
> > >>>>>
> > >>>>> I am using these two in a new "Effectiveness of FQ-CoDel" section.
> > >>>>> Chrome can display .svg, and if it becomes a problem, I am sure that
> > >>>>> they can be converted. Please let me know if some other data would
> > >>>>> make the point better.
> > >>>>>
> > >>>>> I am assuming that the colored throughput spikes are due to occasional
> > >>>>> packet losses. Please let me know if this interpretation is overly
> > >>>> naive.
> > >>>>>
> > >>>>> Also, I know what ICMP is, but the UDP variants are new to me. Could
> > >>>>> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> > >>>>>
> > >>>>>> (Arguably, the default queue length in codel can be reduced from 10k
> > >>>>>> packets to something more reasonable at GigE speeds)
> > >>>>>>
> > >>>>>> (the indicator that it's the graph, not the reality, is that the
> > >>>>>> fq.svg pings and udp start at T+5 and grow minimally, as is usual
> > >>>> with
> > >>>>>> fq_codel.)
> > >>>>>
> > >>>>> All sessions were started at T+5, then?
> > >>>>>
> > >>>>>> As for the *.ps graphs, well, they would take david's network
> > >>>> topology
> > >>>>>> to explain, and were conducted over a variety of circumstances,
> > >>>>>> including wifi, with more variables in play than I care to think
> > >>>>>> about.
> > >>>>>>
> > >>>>>> We didn't really get anywhere on digging deeper. As we got to purer
> > >>>>>> tests - with a minimal number of boxes, running pure ethernet,
> > >>>>>> switched over a couple of switches, even in the simplest two box
> > >>>> case,
> > >>>>>> my HTB based "ceroshaper" implementation had multiple problems in
> > >>>>>> cutting median latencies below 100ms, on this very slow ADSL link.
> > >>>>>> David suspects problems on the path along the carrier backbone as a
> > >>>>>> potential issue, and the only way to measure that is with two one
> > >>>> way
> > >>>>>> trip time measurements (rather than rtt), time synced via ntp... I
> > >>>>>> keep hoping to find a rtp test, but I'm open to just about any
> > >>>> option
> > >>>>>> at this point. anyone?
> > >>>>>>
> > >>>>>> We also found a probable bug in mtr in that multiple mtrs on the
> > >>>> same
> > >>>>>> box don't co-exist.
> > >>>>>
> > >>>>> I must confess that I am not seeing all that clear a difference
> > >>>> between
> > >>>>> the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better
> > >>>> latencies
> > >>>>> for FQ-CoDel, but not unambiguously so.
> > >>>>>
> > >>>>>> Moving back to more scientific clarity and simpler tests...
> > >>>>>>
> > >>>>>> The two graphs, taken a few weeks back, on pages 5 and 6 of this:
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>> http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Buff
> > >>>> erbloat_on_wifi.pdf
> > >>>>>>
> > >>>>>> appear to show the advantage of fq_codel fq + codel + head drop over
> > >>>>>> tail drop during the slow start period on a 10Mbit link - (see how
> > >>>>>> squiggly slow start is on pfifo fast?) as well as the marvelous
> > >>>>>> interstream latency that can be achieved with BQL=3000 (on a 10 mbit
> > >>>>>> link.) Even that latency can be halved by reducing BQL to 1500,
> > >>>> which
> > >>>>>> is just fine on a 10mbit. Below those rates I'd like to be rid of
> > >>>> BQL
> > >>>>>> entirely, and just have a single packet outstanding... in everything
> > >>>>>> from adsl to cable...
> > >>>>>>
> > >>>>>> That said, I'd welcome other explanations of the squiggly slowstart
> > >>>>>> pfifo_fast behavior before I put that explanation on the slide....
> > >>>> ECN
> > >>>>>> was in play here, too. I can redo this test easily, it's basically
> > >>>>>> running a netperf TCP_RR for 70 seconds, and starting up a
> > >>>> TCP_MAERTS
> > >>>>>> and TCP_STREAM for 60 seconds a T+5, after hammering down on BQL's
> > >>>>>> limit and the link speeds on two sides of a directly connected
> > >>>> laptop
> > >>>>>> connection.
> > >>>>>
> > >>>>> I must defer to others on this one. I do note the much lower
> > >>>> latencies
> > >>>>> on slide 6 compared to slide 5, though.
> > >>>>>
> > >>>>> Please see attached for update including .git directory.
> > >>>>>
> > >>>>> Thanx, Paul
> > >>>>>
> > >>>>>> ethtool -s eth0 advertise 0x002 # 10 Mbit
> > >>>>>>
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> Cerowrt-devel mailing list
> > >>>>> Cerowrt-devel at lists.bufferbloat.net
> > >>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> > >>>>>
> > >>>>>
> > >>>
> > >>> _______________________________________________
> > >>> Codel mailing list
> > >>> Codel at lists.bufferbloat.net
> > >>> https://lists.bufferbloat.net/listinfo/codel
> > >>
> > >
> > > _______________________________________________
> > > Codel mailing list
> > > Codel at lists.bufferbloat.net
> > > https://lists.bufferbloat.net/listinfo/codel
> > >
> >
More information about the Codel
mailing list