[Codel] useful netem-based testing?

Dave Taht dave.taht at gmail.com
Thu Jul 11 14:37:35 EDT 2013


My results with netem were generally so dismal that I'd abandoned it.
I just finished building a bunch of kernels from net-next with Eric's
improvements to it, so I'm about to try it again.

My intent is to duplicate most of the RTTs CableLabs simulated with,
under essentially the same workloads, as well as things like rrul,
rrul_rtt_fair, etc. I intend to use IPv6 primarily, just to make
more headaches for myself... and I just finished a tc-specific script
for moving traffic around at varying RTTs to different virtual servers
here:

https://github.com/dtaht/netem-ipv6/blob/master/cablelabs.sh

I figure I should switch from prio to drr in order to make this work right?

(and obviously the netem limit needs to be high enough for the
bandwidth-delay product; I haven't done that yet. See the sketch below.)
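
A minimal sketch of the sort of setup I mean (device name, addresses,
delays, and limits are all illustrative, and the limit math assumes
1500-byte packets):

IF=eth0

tc qdisc add dev $IF root handle 1: drr
tc class add dev $IF parent 1: classid 1:1 drr
tc class add dev $IF parent 1: classid 1:2 drr

# netem's limit is in packets; size it to at least the path BDP.
# At 1Gbit and 80ms: 10^9/8 * 0.08s = 10MB / 1500 bytes =~ 6700 packets.
tc qdisc add dev $IF parent 1:1 handle 10: netem delay 10ms limit 1000
tc qdisc add dev $IF parent 1:2 handle 20: netem delay 80ms limit 10000

# drr drops anything that doesn't match a class, so filters are
# mandatory (example IPv6 destinations, purely hypothetical):
tc filter add dev $IF parent 1: protocol ipv6 u32 \
   match ip6 dst 2001:db8::1 flowid 1:1
tc filter add dev $IF parent 1: protocol ipv6 u32 \
   match ip6 dst 2001:db8::2 flowid 1:2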

If anyone has any suggestions as to useful things to do with netem
(assuming I get sane results with it this time around), now's the time
to ask! I certainly find reordering and packet loss interesting things
to dink with, but I'd settle for sane delays to start with....
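
For reorder/loss I'm imagining something along these lines (numbers
purely illustrative): 25% of packets sent immediately, the rest
delayed 20ms (i.e. reordered, with 50% correlation), plus a little
random loss:

tc qdisc change dev eth0 parent 1:1 handle 10: \
   netem delay 20ms reorder 25% 50% loss 0.3%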

Now: I haven't started working with netem yet.

I'm still validating the behavior of the atom boxes under various
workloads, merely talking to each other directly at first, then with
the other boxes in series:

  testdriver -> router -> delays -> router(s) -> server(s)
  (fast box) -> atom   -> atom   -> cerowrt   -> atom/beaglebone/other boxes

One oddity that cropped up: with offloads turned off entirely at
GigE on the two atom boxes, the netserver processes peg both CPUs (kind
of expected) and can't achieve full bandwidth (also expected), but
latencies then creep up over a very long test cycle (unexpected) on
fq_codel. I figure this is a bug of some sort (CPU scheduler? memory
leak somewhere?). It doesn't (visibly) happen with offloads on, and I
haven't tried 100Mbit yet.

http://results.lab.taht.net/maxpacketlimit_300s_NoOffloads_IPv6_1Gbit_Atoms-5889/maxpacketlimit_300s_Offloads_IPv6_1Gbit_Atoms-fq_codel-all_scaled-rrul.svg
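
For the record, "offloads off" above means roughly the following
(assuming eth0; exact flag support varies by driver):

ethtool -K eth0 tso off gso off gro off
ethtool -k eth0    # verify what actually stuck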

Thinking that perhaps BQL was growing its estimate because it couldn't
keep the ethernet device saturated, I set it to a fixed
amount... but I don't think that's it...

http://results.lab.taht.net/maxpacketlimit_300s_NoOffloads_BQL20000_IPv6_1Gbit_Atoms-7228/maxpacketlimit_300s_NoOffloads_BQL20000_IPv6_1Gbit_Atoms-fq_codel-all_scaled-rrul.svg
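
Pinning BQL was done via sysfs, roughly like this (assuming eth0;
20000 bytes was the value used in the test above):

# pin BQL's byte limit on every tx queue of the device
for q in /sys/class/net/eth0/queues/tx-*/byte_queue_limits; do
    echo 20000 > $q/limit_max
    echo 20000 > $q/limit_min
done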

The slope of the curve for codel also shows a slight overall growth:

http://results.lab.taht.net/maxpacketlimit_300s_NoOffloads_IPv6_1Gbit_Atoms-5889/maxpacketlimit_300s_Offloads_IPv6_1Gbit_Atoms-codel-all_scaled-rrul.svg

I am running even longer-period tests now. I might just backport the
netem changes to 3.10 and try again.

The quick-and-dirty test scripts are here:
http://results.lab.taht.net/revised_nfq/

-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


