[Cerowrt-devel] FQ_Codel lwn draft article review
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Fri Nov 23 17:18:42 EST 2012
On Fri, Nov 23, 2012 at 09:57:34AM +0100, Dave Taht wrote:
> David Woodhouse and I fiddled a lot with ADSL and OpenWrt and a
> variety of drivers and network layers in a typical bonded ADSL stack
> yesterday. The complexity of it all makes my head hurt. I'm happy that
> a newly BQL'd ethernet driver (for the Geos and QEMU) emerged from it,
> which he submitted to netdev...
> I made a recording of us last night discussing the layers, which I
> will produce and distribute later...
> Anyway, along the way, we fiddled a lot with trying to analyze where
> the 350ms or so of added latency was coming from in the Traverse Geos'
> ADSL implementation and overlying stack...
> Plots: http://david.woodhou.se/dwmw2-netperf-plots.tar.gz
> Note 1:
> The netperf sampling interval on the rrul test needs to be longer than
> 100ms in order to get a decent result at sub-10Mbit speeds.
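To see why short samples get noisy at low rates, here is back-of-the-envelope arithmetic for how many full-size packets fit into one 100 ms sample. It assumes 1500-byte packets and ignores link-layer overhead; it is an illustration of the quantization problem, not a description of how netperf computes its samples.

```python
# Assumption: every packet is a full 1500-byte frame; link-layer overhead ignored.
PACKET_BYTES = 1500

def packets_per_sample(rate_bps: float, interval_s: float) -> float:
    """Full-size packets that can cross the link during one sampling interval."""
    return rate_bps * interval_s / (PACKET_BYTES * 8)

for rate_mbit in (100, 10, 1):
    n = packets_per_sample(rate_mbit * 1e6, 0.100)
    print(f"{rate_mbit:>4} Mbit/s, 100 ms sample: ~{n:.1f} packets")
```

At 1 Mbit/s a 100 ms sample holds only about eight full-size packets, so a single packet more or less moves the reported throughput by roughly 12%.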
> Note 2:
> The two nicest graphs here are nofq.svg vs fq.svg, which were taken on
> a GigE link from a Mac running Linux to another GigE link. (In other
> words, NOT on the friggin ADSL link.) (Firefox can display SVG; I don't
> know what else can.) I find the T+10 delay before stream start in the
> fq.svg graph suspicious, and think the "throw out the outlier" code in
> the netperf-wrapper code is at fault. Prior to that, codel is merely
> buffering things up madly, which can also be seen in the pfifo_fast
> behavior, with its default of 1000 packets.
I am using these two in a new "Effectiveness of FQ-CoDel" section.
Chrome can display .svg, and if it becomes a problem, I am sure that
they can be converted. Please let me know if some other data would
make the point better.
I am assuming that the colored throughput spikes are due to occasional
packet losses. Please let me know if this interpretation is overly naive.
Also, I know what ICMP is, but the UDP variants are new to me. Could
you please expand the "EF", "BK", "BE", and "CSS" acronyms?
> (Arguably, the default queue length in codel can be reduced from 10k
> packets to something more reasonable at GigE speeds)
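One way to see how generous a 10k-packet limit is at GigE is to compute how long a queue filled to that limit would take to drain. This is the ceiling the limit permits, not typical fq_codel behavior: CoDel drops based on how long packets sit in the queue and normally keeps the standing queue far below the packet limit. Full-size 1500-byte packets are assumed.

```python
# Worst-case drain time of a queue filled to its packet limit.
# Assumption: every queued packet is a full 1500-byte frame.
def drain_time_ms(packets: int, rate_bps: float, packet_bytes: int = 1500) -> float:
    """Milliseconds needed to transmit `packets` full-size packets at `rate_bps`."""
    return packets * packet_bytes * 8 * 1000 / rate_bps

# 10k: the codel limit mentioned above; 1000: the pfifo_fast default from Note 2.
for limit in (10_000, 1_000):
    for rate_bps, label in ((1e9, "GigE"), (10e6, "10Mbit")):
        print(f"{limit:>6} packets at {label}: {drain_time_ms(limit, rate_bps):.0f} ms")
```

That works out to roughly 120 ms at GigE for 10k packets versus 12 ms for 1000, and about 12 seconds for 10k packets at 10Mbit, so at these rates the packet limit acts mainly as a backstop.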
> (the indicator that it's the graph, not the reality, is that the
> fq.svg pings and udp start at T+5 and grow minimally, as is usual with
All sessions were started at T+5, then?
> As for the *.ps graphs, well, they would take David's network topology
> to explain, and were conducted over a variety of circumstances,
> including wifi, with more variables in play than I care to think
> We didn't really get anywhere on digging deeper. As we moved to purer
> tests (a minimal number of boxes, pure Ethernet, switched over a couple
> of switches), even in the simplest two-box case my HTB-based
> "ceroshaper" implementation had multiple problems cutting median
> latencies below 100ms on this very slow ADSL link.
> David suspects problems on the path along the carrier backbone as a
> potential issue, and the only way to measure that is with two one-way
> delay measurements (rather than RTT), time-synced via NTP... I
> keep hoping to find an RTP test, but I'm open to just about any option
> at this point. Anyone?
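A minimal sketch of why one-way measurements need synced clocks, assuming a hypothetical probe record that carries send and receive timestamps from both ends (an RTP-style or custom tool would have to supply these; no particular tool's format is implied). RTT subtracts timestamps taken on the same clock, so any clock offset cancels; each one-way leg mixes the two clocks, so the offset lands directly in the result.

```python
from dataclasses import dataclass

@dataclass
class Probe:
    """One request/response exchange between hosts A and B.

    Hypothetical record format; timestamps in milliseconds. A's clock
    stamps t_send_a and t_recv_a, B's clock stamps t_recv_b and t_send_b.
    """
    t_send_a: float  # A transmits the request
    t_recv_b: float  # B receives it
    t_send_b: float  # B transmits the reply
    t_recv_a: float  # A receives the reply

def one_way_delays(p: Probe) -> tuple[float, float]:
    """(forward A->B, reverse B->A). Any residual clock offset between A and B
    is added to one leg and subtracted from the other."""
    return p.t_recv_b - p.t_send_a, p.t_recv_a - p.t_send_b

def round_trip(p: Probe) -> float:
    """RTT minus B's turnaround time. Each difference uses a single clock,
    so the A/B clock offset cancels."""
    return (p.t_recv_a - p.t_send_a) - (p.t_send_b - p.t_recv_b)
```

With made-up timestamps such as `Probe(0, 310, 311, 351)`, `one_way_delays` returns `(310, 40)` while `round_trip` returns `350`: the RTT alone cannot say that almost all of the delay is on the forward path. The catch is that a 10 ms NTP error shifts 10 ms from one leg to the other, so the clocks need to agree much more closely than the delays being measured.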
> We also found a probable bug in mtr in that multiple mtrs on the same
> box don't co-exist.
I must confess that I am not seeing all that clear a difference between
the behaviors of ceroshaper and FQ-CoDel. Maybe somewhat better latencies
for FQ-CoDel, but not unambiguously so.
> Moving back to more scientific clarity and simpler tests...
> The two graphs, taken a few weeks back, on pages 5 and 6 of this:
> appear to show the advantage of fq_codel (fq + codel + head drop) over
> tail drop during the slow start period on a 10Mbit link (see how
> squiggly slow start is on pfifo_fast?), as well as the marvelous
> interstream latency that can be achieved with BQL=3000 (on a 10Mbit
> link). Even that latency can be halved by reducing BQL to 1500, which
> is just fine on a 10Mbit link. Below those rates I'd like to be rid of BQL
> entirely, and just have a single packet outstanding... in everything
> from ADSL to cable...
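The BQL numbers above translate into delay with simple arithmetic: the bytes allowed in the device queue, divided by the link rate, approximates how long a newly queued packet waits behind them. This is a rough sketch; BQL's limit is adaptive and the queue can briefly exceed it by up to one packet, so treat the results as approximate.

```python
def bql_delay_ms(limit_bytes: int, rate_bps: float) -> float:
    """Approximate time to drain `limit_bytes` from the device queue at `rate_bps`."""
    return limit_bytes * 8 * 1000 / rate_bps

for limit in (3000, 1500):
    for rate_bps, label in ((10e6, "10Mbit"), (1e6, "1Mbit")):
        print(f"BQL={limit} at {label}: {bql_delay_ms(limit, rate_bps):.1f} ms")
```

At 10Mbit, BQL=3000 holds about 2.4 ms of data and BQL=1500 about 1.2 ms, matching the "halved" observation. At 1Mbit a single 1500-byte packet already takes 12 ms to serialize, which is consistent with wanting just one packet outstanding at those rates.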
> That said, I'd welcome other explanations of the squiggly slow start
> pfifo_fast behavior before I put that explanation on the slide... ECN
> was in play here, too. I can redo this test easily; it's basically
> running a netperf TCP_RR for 70 seconds, and starting up a TCP_MAERTS
> and TCP_STREAM for 60 seconds at T+5, after hammering down on BQL's
> limit and the link speeds on two sides of a directly connected laptop
I must defer to others on this one. I do note the much lower latencies
on slide 6 compared to slide 5, though.
Please see attached for update including .git directory.
> ethtool -s eth0 advertise 0x002 # 10 Mbit
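The test recipe above could be scripted roughly as follows. This is a sketch, not the script actually used: the peer address and interface name are placeholders, BQL's cap is set through the kernel's standard `limit_max` sysfs file, link speed is forced with the `ethtool` command quoted above, and netperf is driven only through its `-H`, `-t` and `-l` options. Writing the sysfs file and running ethtool require root.

```python
import subprocess
import threading

PEER = "192.0.2.2"  # placeholder (TEST-NET-1) address for the directly connected box
DEV = "eth0"        # placeholder interface name

def bql_limit_path(dev: str, queue: int = 0) -> str:
    """sysfs file that caps BQL's dynamic limit for one transmit queue."""
    return f"/sys/class/net/{dev}/queues/tx-{queue}/byte_queue_limits/limit_max"

def build_schedule(host: str) -> list[tuple[float, list[str]]]:
    """(start offset in seconds, netperf argv): TCP_RR for 70 s from T+0,
    TCP_MAERTS and TCP_STREAM for 60 s from T+5."""
    return [
        (0.0, ["netperf", "-H", host, "-t", "TCP_RR", "-l", "70"]),
        (5.0, ["netperf", "-H", host, "-t", "TCP_MAERTS", "-l", "60"]),
        (5.0, ["netperf", "-H", host, "-t", "TCP_STREAM", "-l", "60"]),
    ]

def run(schedule: list[tuple[float, list[str]]]) -> None:
    """Start each command at its offset, then wait for all of them to exit."""
    procs: list[subprocess.Popen] = []
    lock = threading.Lock()

    def launch(argv: list[str]) -> None:
        proc = subprocess.Popen(argv)
        with lock:
            procs.append(proc)

    timers = [threading.Timer(delay, launch, args=(argv,)) for delay, argv in schedule]
    for t in timers:
        t.start()
    for t in timers:
        t.join()  # once every timer has finished, every command has been launched
    for proc in procs:
        proc.wait()

def main(dev: str = DEV, host: str = PEER, bql_bytes: int = 3000) -> None:
    """Needs root. The peer's BQL cap and link speed would need the same treatment."""
    with open(bql_limit_path(dev), "w") as f:
        f.write(str(bql_bytes))
    subprocess.run(["ethtool", "-s", dev, "advertise", "0x002"], check=True)  # 10 Mbit
    run(build_schedule(host))
```

Calling `main()` as root caps BQL at 3000 bytes, forces the 10 Mbit link, and starts the three flows with the 5-second offset described above.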