[Cerowrt-devel] FQ_Codel lwn draft article review

Jim Gettys jg at freedesktop.org
Tue Nov 27 17:03:02 EST 2012


Some points worth making:

1) It is important to point out that (and how) fq_codel avoids starvation:
unpleasant as elephant flows are, it would be very unfriendly to never
service them at all until they time out.

2) "fairness" is not necessarily what we ultimately want at all; you'd
really like to penalize those who induce congestion the most.  But we don't
currently have a solution (though Bob Briscoe at BT thinks he does, and is
seeing if he can get it out from under a BT patent), so the current
fq_codel round robins ultimately until/unless we can do something like
Bob's idea.  This is a local information only subset of the ideas he's been
working on in the congestion exposure (conex) group at the IETF.

3) "fairness" is always in the eyes of the beholder (and should be left to
the beholder to determine). "fairness" depends on where in the network you
are.  While being "fair" among TCP flows is sensible default policy for a
host, else where in the network it may not be/usually isn't.

Two examples:
o at a home router, you probably want to be "fair" according to transmit
opportunities.  We really don't want a single system remote from the router
to be able to starve the network so that devices near the router get much
less bandwidth than you might hope/expect.

What is more, you probably want to account for a single host using many
flows, and regulate that they not be able to "hog" bandwidth in the home
environment, but only use their "fair" share.

o at an ISP, you must to be "fair" between customers; it is best to leave
the judgement of "fairness" at finer granularity (e.g. host and TCP flows)
to the points closer to the customer's systems, so that they can enforce
whatever definition of "fair" they need to themselves.


Algorithms like fq_codel can be/should be adjusted to the circumstances.

And therefore exactly what you choose to hash against to form the buckets
will vary depending on where you are.  That at least one step (at the
user's device) of this be TCP flow "fair" does have the great advantage of
helping the RTT unfairness problem that violates the principle of "least
surprise", such as that routinely seen in places like New Zealand.

This is why I have so many problems using the word "fair" near this
algorithm.  "fair" is impossible to define, overloaded in people's mind
with TCP fair queuing, not even desirable much of the time, and by
definition and design, even today's fq_codel isn't fair to lots of things,
and the same basic algorithm can/should be tweaked in lots of directions
depending on what we need to do.  Calling this "smart" queuing or some such
would be better.

When you've done another round on the document, I'll do a more detailed
read.
                             - Jim




On Fri, Nov 23, 2012 at 5:18 PM, Paul E. McKenney <
paulmck at linux.vnet.ibm.com> wrote:

> On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> > David Woodhouse and I fiddled a lot with adsl and openwrt and a
> > variety of drivers and network layers in a typical bonded adsl stack
> > yesterday. The complexity of it all makes my head hurt. I'm happy that
> > a newly BQL'd ethernet driver (for the geos and qemu) emerged from it,
> > which he submitted to netdev...
>
> Cool!!!  ;-)
>
> > I made a recording of us last night discussing the layers, which I
> > will produce and distribute later...
> >
> > Anyway, along the way, we fiddled a lot with trying to analyze where
> > the 350ms or so of added latency was coming from in the traverse geo's
> > adsl implementation and overlying stack....
> >
> > Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> >
> > Note: 1:
> >
> > The  netperf sample rate on the rrul test needs to be higher than
> > 100ms in order to get a decent result at sub 10Mbit speeds.
> >
> > Note 2:
> >
> > The two nicest graphs here are nofq.svg vs fq.svg, which were taken on
> > a gigE link from a Mac running Linux to another gigE link. (in other
> > words, NOT on the friggin adsl link) (firefox can display svg, I don't
> > know what else) I find the T+10 delay before stream start in the
> > fq.svg graph suspicious and think the "throw out the outlier" code in
> > the netperf-wrapper code is at fault. Prior to that, codel is merely
> > buffering up things madly, which can also be seen in the pfifo_fast
> > behavior, with 1000pkts it's default.
>
> I am using these two in a new "Effectiveness of FQ-CoDel" section.
> Chrome can display .svg, and if it becomes a problem, I am sure that
> they can be converted.  Please let me know if some other data would
> make the point better.
>
> I am assuming that the colored throughput spikes are due to occasional
> packet losses.  Please let me know if this interpretation is overly naive.
>
> Also, I know what ICMP is, but the UDP variants are new to me.  Could
> you please expand the "EF", "BK", "BE", and "CSS" acronyms?
>
> > (Arguably, the default queue length in codel can be reduced from 10k
> > packets to something more reasonable at GigE speeds)
> >
> > (the indicator that it's the graph, not the reality, is that the
> > fq.svg pings and udp start at T+5 and grow minimally, as is usual with
> > fq_codel.)
>
> All sessions were started at T+5, then?
>
> > As for the *.ps graphs, well, they would take david's network topology
> > to explain, and were conducted over a variety of circumstances,
> > including wifi, with more variables in play than I care to think
> > about.
> >
> > We didn't really get anywhere on digging deeper. As we got to purer
> > tests - with a minimal number of boxes, running pure ethernet,
> > switched over a couple of switches, even in the simplest two box case,
> > my HTB based "ceroshaper" implementation had multiple problems in
> > cutting median latencies below 100ms, on this very slow ADSL link.
> > David suspects problems on the path along the carrier backbone as a
> > potential issue, and the only way to measure that is with two one way
> > trip time measurements (rather than rtt), time synced via ntp... I
> > keep hoping to find a rtp test, but I'm open to just about any option
> > at this point. anyone?
> >
> > We also found a probable bug in mtr in that multiple mtrs on the same
> > box don't co-exist.
>
> I must confess that I am not seeing all that clear a difference between
> the behaviors of ceroshaper and FQ-CoDel.  Maybe somewhat better latencies
> for FQ-CoDel, but not unambiguously so.
>
> > Moving back to more scientific clarity and simpler tests...
> >
> > The two graphs, taken a few weeks back, on pages 5 and 6 of this:
> >
> >
> http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf
> >
> > appear to show the advantage of fq_codel fq + codel + head drop over
> > tail drop during the slow start period on a 10Mbit link - (see how
> > squiggly slow start is on pfifo fast?) as well as the marvelous
> > interstream latency that can be achieved with BQL=3000 (on a 10 mbit
> > link.)  Even that latency can be halved by reducing BQL to 1500, which
> > is just fine on a 10mbit. Below those rates I'd like to be rid of BQL
> > entirely, and just have a single packet outstanding... in everything
> > from adsl to cable...
> >
> > That said, I'd welcome other explanations of the squiggly slowstart
> > pfifo_fast behavior before I put that explanation on the slide.... ECN
> > was in play here, too. I can redo this test easily, it's basically
> > running a netperf TCP_RR for 70 seconds, and starting up a TCP_MAERTS
> > and TCP_STREAM for 60 seconds a T+5, after hammering down on BQL's
> > limit and the link speeds on two sides of a directly connected laptop
> > connection.
>
> I must defer to others on this one.  I do note the much lower latencies
> on slide 6 compared to slide 5, though.
>
> Please see attached for update including .git directory.
>
>                                                         Thanx, Paul
>
> > ethtool -s eth0 advertise 0x002 # 10 Mbit
> >
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.bufferbloat.net/pipermail/cerowrt-devel/attachments/20121127/eeda9e61/attachment-0002.html>


More information about the Cerowrt-devel mailing list