[Cerowrt-devel] [Codel] FQ_Codel lwn draft article review

Paul E. McKenney paulmck at linux.vnet.ibm.com
Wed Nov 28 12:44:40 EST 2012


You lost me on this one.  It looks to me like net/sched/sch_fq_codel.c
in fact does hash packets into flows, so FQ-CoDel is stochastic in the
the same sense that SFQ is.  In particular, FQ-CoDel can hash a thin
session into the same flow as a thick session, which really is the
birthday effect.

Now FQ-CoDel uses a 1024-bucket hash table compared to SFQ's default
of 128 buckets, so FQ-CoDel will have smaller collision probabilities
than will SFQ on a given set of flows.  In addition, FQ-CoDel seems
to be able to tolerate a limited number of collisions among thin flows,
while SFQ doesn't distinguish thin from thick.

But the possibility of stochastic collision behavior really is there
with FQ-CoDel.  I hasten to add that in practice, I do not expect this
possibility of stochastic behavior to be a problem in the common case.

Or am I missing your point?  Or perhaps your definition of either
fairness or stochastic?

							Thanx, Paul

On Wed, Nov 28, 2012 at 06:16:08PM +0200, Jonathan Morton wrote:
> It may be worth noting that fq-codel is not stochastic in it's fairness
> mechanism. SFQ suffers from the birthday effect because it hashes packets
> into buffers, which is what makes it stochastic.
> 
> - Jonathan Morton
>  On Nov 28, 2012 6:02 PM, "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
> wrote:
> 
> > Dave gave me back the pen, so I looked to see what I had expanded
> > FQ-CoDel to.  The answer was...  Nothing.  Nothing at all.
> >
> > So I added a Quick Quiz as follows:
> >
> >         Quick Quiz 2: What does the FQ-CoDel acronym expand to?
> >
> >         Answer: There are some differences of opinion on this. The
> >                 comment header in net/sched/sch_fq_codel.c says
> >                 “Fair Queue CoDel” (presumably by analogy to SFQ's
> >                 expansion of “Stochastic Fairness Queueing”), and
> >                 “CoDel” is generally agreed to expand to “controlled
> >                 delay”. However, some prefer “Flow Queue Controlled
> >                 Delay” and still others prefer to prepend a silent and
> >                 invisible "S", expanding to “Stochastic Flow Queue
> >                 Controlled Delay” or “Smart Flow Queue Controlled
> >                 Delay”. No doubt additional expansions will appear in
> >                 the fullness of time.
> >
> >                 In the meantime, this article focuses on the concepts,
> >                 implementation, and performance, leaving naming debates
> >                 to others.
> >
> > This level snarkiness would go over reasonably well in an LWN article,
> > I would -not- suggest this approach in an academic paper, just in case
> > you were wondering.  But if there is too much discomfort with snarking,
> > I just might be convinced to take another approach.
> >
> >                                                         Thanx, Paul
> >
> > On Tue, Nov 27, 2012 at 08:38:38PM -0800, Paul E. McKenney wrote:
> > > I guess I just have to be grateful that people mostly agree on the
> > acronym,
> > > regardless of the expansion.
> > >
> > >                                                       Thanx, Paul
> > >
> > > On Tue, Nov 27, 2012 at 07:43:56PM -0800, Kathleen Nichols wrote:
> > > >
> > > > It would be me that tries to say "stochastic flow queuing with CoDel"
> > > > as I like to be accurate. But I think FQ-Codel is Flow queuing with
> > CoDel.
> > > > JimG suggests "smart flow queuing" because he is ever mindful of the
> > > > big audience.
> > > >
> > > > On 11/27/12 4:27 PM, Paul E. McKenney wrote:
> > > > > On Tue, Nov 27, 2012 at 04:53:34PM -0700, Greg White wrote:
> > > > >> BTW, I've heard some use the term "stochastic flow queueing" as a
> > > > >> replacement to avoid the term "fair".  Seems like a more apt term
> > anyway.
> > > > >
> > > > > Would that mean that FQ-CoDel is Flow Queue Controlled Delay?  ;-)
> > > > >
> > > > >                                                   Thanx, Paul
> > > > >
> > > > >> -Greg
> > > > >>
> > > > >>
> > > > >> On 11/27/12 3:49 PM, "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
> > wrote:
> > > > >>
> > > > >>> Thank you for the review and comments, Jim!  I will apply them when
> > > > >>> I get the pen back from Dave.  And yes, that is the thing about
> > > > >>> "fairness" -- there are a great many definitions, many of the most
> > > > >>> useful of which appear to many to be patently unfair.  ;-)
> > > > >>>
> > > > >>> As you suggest, it might well be best to drop discussion of
> > fairness,
> > > > >>> or to at the least supply the corresponding definition.
> > > > >>>
> > > > >>>                                                         Thanx, Paul
> > > > >>>
> > > > >>> On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
> > > > >>>> Some points worth making:
> > > > >>>>
> > > > >>>> 1) It is important to point out that (and how) fq_codel avoids
> > > > >>>> starvation:
> > > > >>>> unpleasant as elephant flows are, it would be very unfriendly to
> > never
> > > > >>>> service them at all until they time out.
> > > > >>>>
> > > > >>>> 2) "fairness" is not necessarily what we ultimately want at all;
> > you'd
> > > > >>>> really like to penalize those who induce congestion the most.
> >  But we
> > > > >>>> don't
> > > > >>>> currently have a solution (though Bob Briscoe at BT thinks he
> > does, and
> > > > >>>> is
> > > > >>>> seeing if he can get it out from under a BT patent), so the
> > current
> > > > >>>> fq_codel round robins ultimately until/unless we can do something
> > like
> > > > >>>> Bob's idea.  This is a local information only subset of the ideas
> > he's
> > > > >>>> been
> > > > >>>> working on in the congestion exposure (conex) group at the IETF.
> > > > >>>>
> > > > >>>> 3) "fairness" is always in the eyes of the beholder (and should
> > be left
> > > > >>>> to
> > > > >>>> the beholder to determine). "fairness" depends on where in the
> > network
> > > > >>>> you
> > > > >>>> are.  While being "fair" among TCP flows is sensible default
> > policy for
> > > > >>>> a
> > > > >>>> host, else where in the network it may not be/usually isn't.
> > > > >>>>
> > > > >>>> Two examples:
> > > > >>>> o at a home router, you probably want to be "fair" according to
> > transmit
> > > > >>>> opportunities.  We really don't want a single system remote from
> > the
> > > > >>>> router
> > > > >>>> to be able to starve the network so that devices near the router
> > get
> > > > >>>> much
> > > > >>>> less bandwidth than you might hope/expect.
> > > > >>>>
> > > > >>>> What is more, you probably want to account for a single host
> > using many
> > > > >>>> flows, and regulate that they not be able to "hog" bandwidth in
> > the home
> > > > >>>> environment, but only use their "fair" share.
> > > > >>>>
> > > > >>>> o at an ISP, you must to be "fair" between customers; it is best
> > to
> > > > >>>> leave
> > > > >>>> the judgement of "fairness" at finer granularity (e.g. host and
> > TCP
> > > > >>>> flows)
> > > > >>>> to the points closer to the customer's systems, so that they can
> > enforce
> > > > >>>> whatever definition of "fair" they need to themselves.
> > > > >>>>
> > > > >>>>
> > > > >>>> Algorithms like fq_codel can be/should be adjusted to the
> > circumstances.
> > > > >>>>
> > > > >>>> And therefore exactly what you choose to hash against to form the
> > > > >>>> buckets
> > > > >>>> will vary depending on where you are.  That at least one step (at
> > the
> > > > >>>> user's device) of this be TCP flow "fair" does have the great
> > advantage
> > > > >>>> of
> > > > >>>> helping the RTT unfairness problem that violates the principle of
> > "least
> > > > >>>> surprise", such as that routinely seen in places like New Zealand.
> > > > >>>>
> > > > >>>> This is why I have so many problems using the word "fair" near
> > this
> > > > >>>> algorithm.  "fair" is impossible to define, overloaded in
> > people's mind
> > > > >>>> with TCP fair queuing, not even desirable much of the time, and by
> > > > >>>> definition and design, even today's fq_codel isn't fair to lots of
> > > > >>>> things,
> > > > >>>> and the same basic algorithm can/should be tweaked in lots of
> > directions
> > > > >>>> depending on what we need to do.  Calling this "smart" queuing or
> > some
> > > > >>>> such
> > > > >>>> would be better.
> > > > >>>>
> > > > >>>> When you've done another round on the document, I'll do a more
> > detailed
> > > > >>>> read.
> > > > >>>>                              - Jim
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
> > > > >>>> paulmck at linux.vnet.ibm.com> wrote:
> > > > >>>>
> > > > >>>>> On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> > > > >>>>>> David Woodhouse and I fiddled a lot with adsl and openwrt and a
> > > > >>>>>> variety of drivers and network layers in a typical bonded adsl
> > stack
> > > > >>>>>> yesterday. The complexity of it all makes my head hurt. I'm
> > happy
> > > > >>>> that
> > > > >>>>>> a newly BQL'd ethernet driver (for the geos and qemu) emerged
> > from
> > > > >>>> it,
> > > > >>>>>> which he submitted to netdev...
> > > > >>>>>
> > > > >>>>> Cool!!!  ;-)
> > > > >>>>>
> > > > >>>>>> I made a recording of us last night discussing the layers,
> > which I
> > > > >>>>>> will produce and distribute later...
> > > > >>>>>>
> > > > >>>>>> Anyway, along the way, we fiddled a lot with trying to analyze
> > where
> > > > >>>>>> the 350ms or so of added latency was coming from in the traverse
> > > > >>>> geo's
> > > > >>>>>> adsl implementation and overlying stack....
> > > > >>>>>>
> > > > >>>>>> Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> > > > >>>>>>
> > > > >>>>>> Note: 1:
> > > > >>>>>>
> > > > >>>>>> The  netperf sample rate on the rrul test needs to be higher
> > than
> > > > >>>>>> 100ms in order to get a decent result at sub 10Mbit speeds.
> > > > >>>>>>
> > > > >>>>>> Note 2:
> > > > >>>>>>
> > > > >>>>>> The two nicest graphs here are nofq.svg vs fq.svg, which were
> > taken
> > > > >>>> on
> > > > >>>>>> a gigE link from a Mac running Linux to another gigE link. (in
> > other
> > > > >>>>>> words, NOT on the friggin adsl link) (firefox can display svg, I
> > > > >>>> don't
> > > > >>>>>> know what else) I find the T+10 delay before stream start in the
> > > > >>>>>> fq.svg graph suspicious and think the "throw out the outlier"
> > code
> > > > >>>> in
> > > > >>>>>> the netperf-wrapper code is at fault. Prior to that, codel is
> > merely
> > > > >>>>>> buffering up things madly, which can also be seen in the
> > pfifo_fast
> > > > >>>>>> behavior, with 1000pkts it's default.
> > > > >>>>>
> > > > >>>>> I am using these two in a new "Effectiveness of FQ-CoDel"
> > section.
> > > > >>>>> Chrome can display .svg, and if it becomes a problem, I am sure
> > that
> > > > >>>>> they can be converted.  Please let me know if some other data
> > would
> > > > >>>>> make the point better.
> > > > >>>>>
> > > > >>>>> I am assuming that the colored throughput spikes are due to
> > occasional
> > > > >>>>> packet losses.  Please let me know if this interpretation is
> > overly
> > > > >>>> naive.
> > > > >>>>>
> > > > >>>>> Also, I know what ICMP is, but the UDP variants are new to me.
> >  Could
> > > > >>>>> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> > > > >>>>>
> > > > >>>>>> (Arguably, the default queue length in codel can be reduced
> > from 10k
> > > > >>>>>> packets to something more reasonable at GigE speeds)
> > > > >>>>>>
> > > > >>>>>> (the indicator that it's the graph, not the reality, is that the
> > > > >>>>>> fq.svg pings and udp start at T+5 and grow minimally, as is
> > usual
> > > > >>>> with
> > > > >>>>>> fq_codel.)
> > > > >>>>>
> > > > >>>>> All sessions were started at T+5, then?
> > > > >>>>>
> > > > >>>>>> As for the *.ps graphs, well, they would take david's network
> > > > >>>> topology
> > > > >>>>>> to explain, and were conducted over a variety of circumstances,
> > > > >>>>>> including wifi, with more variables in play than I care to think
> > > > >>>>>> about.
> > > > >>>>>>
> > > > >>>>>> We didn't really get anywhere on digging deeper. As we got to
> > purer
> > > > >>>>>> tests - with a minimal number of boxes, running pure ethernet,
> > > > >>>>>> switched over a couple of switches, even in the simplest two box
> > > > >>>> case,
> > > > >>>>>> my HTB based "ceroshaper" implementation had multiple problems
> > in
> > > > >>>>>> cutting median latencies below 100ms, on this very slow ADSL
> > link.
> > > > >>>>>> David suspects problems on the path along the carrier backbone
> > as a
> > > > >>>>>> potential issue, and the only way to measure that is with two
> > one
> > > > >>>> way
> > > > >>>>>> trip time measurements (rather than rtt), time synced via
> > ntp... I
> > > > >>>>>> keep hoping to find a rtp test, but I'm open to just about any
> > > > >>>> option
> > > > >>>>>> at this point. anyone?
> > > > >>>>>>
> > > > >>>>>> We also found a probable bug in mtr in that multiple mtrs on the
> > > > >>>> same
> > > > >>>>>> box don't co-exist.
> > > > >>>>>
> > > > >>>>> I must confess that I am not seeing all that clear a difference
> > > > >>>> between
> > > > >>>>> the behaviors of ceroshaper and FQ-CoDel.  Maybe somewhat better
> > > > >>>> latencies
> > > > >>>>> for FQ-CoDel, but not unambiguously so.
> > > > >>>>>
> > > > >>>>>> Moving back to more scientific clarity and simpler tests...
> > > > >>>>>>
> > > > >>>>>> The two graphs, taken a few weeks back, on pages 5 and 6 of
> > this:
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Buff
> > > > >>>> erbloat_on_wifi.pdf
> > > > >>>>>>
> > > > >>>>>> appear to show the advantage of fq_codel fq + codel + head drop
> > over
> > > > >>>>>> tail drop during the slow start period on a 10Mbit link - (see
> > how
> > > > >>>>>> squiggly slow start is on pfifo fast?) as well as the marvelous
> > > > >>>>>> interstream latency that can be achieved with BQL=3000 (on a 10
> > mbit
> > > > >>>>>> link.)  Even that latency can be halved by reducing BQL to 1500,
> > > > >>>> which
> > > > >>>>>> is just fine on a 10mbit. Below those rates I'd like to be rid
> > of
> > > > >>>> BQL
> > > > >>>>>> entirely, and just have a single packet outstanding... in
> > everything
> > > > >>>>>> from adsl to cable...
> > > > >>>>>>
> > > > >>>>>> That said, I'd welcome other explanations of the squiggly
> > slowstart
> > > > >>>>>> pfifo_fast behavior before I put that explanation on the
> > slide....
> > > > >>>> ECN
> > > > >>>>>> was in play here, too. I can redo this test easily, it's
> > basically
> > > > >>>>>> running a netperf TCP_RR for 70 seconds, and starting up a
> > > > >>>> TCP_MAERTS
> > > > >>>>>> and TCP_STREAM for 60 seconds a T+5, after hammering down on
> > BQL's
> > > > >>>>>> limit and the link speeds on two sides of a directly connected
> > > > >>>> laptop
> > > > >>>>>> connection.
> > > > >>>>>
> > > > >>>>> I must defer to others on this one.  I do note the much lower
> > > > >>>> latencies
> > > > >>>>> on slide 6 compared to slide 5, though.
> > > > >>>>>
> > > > >>>>> Please see attached for update including .git directory.
> > > > >>>>>
> > > > >>>>>                                                         Thanx,
> > Paul
> > > > >>>>>
> > > > >>>>>> ethtool -s eth0 advertise 0x002 # 10 Mbit
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>> _______________________________________________
> > > > >>>>> Cerowrt-devel mailing list
> > > > >>>>> Cerowrt-devel at lists.bufferbloat.net
> > > > >>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> > > > >>>>>
> > > > >>>>>
> > > > >>>
> > > > >>> _______________________________________________
> > > > >>> Codel mailing list
> > > > >>> Codel at lists.bufferbloat.net
> > > > >>> https://lists.bufferbloat.net/listinfo/codel
> > > > >>
> > > > >
> > > > > _______________________________________________
> > > > > Codel mailing list
> > > > > Codel at lists.bufferbloat.net
> > > > > https://lists.bufferbloat.net/listinfo/codel
> > > > >
> > > >
> >
> > _______________________________________________
> > Codel mailing list
> > Codel at lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/codel
> >




More information about the Cerowrt-devel mailing list