[Codel] RRUL

Dave Taht dave.taht at gmail.com
Tue Jul 9 00:42:21 EDT 2013


I note that this convo would be saner over on the codel or bloat lists.

Jim and Dave had said some nice things about rrul, but I object to
them calling it "simple". :) It took a lot of thought to come up with
something that showed the intrinsic relationship of delay to
congestion avoidance and I still think it presents too much data by
default. I do like that it can show a ton of interesting things
possibly "wrong" in a dataset before leaping off into cdf-plot-land,
which obscures the reality in a result.

For comparison, an industry standard is mrtg, which samples data on a
30 second period, averages it down to 5 minutes, 2 hours, days, and
weeks, and is totally misleading as to peak traffic behavior on even a
5 minute timescale. I care about timescales down to about 2ms in the
general case, yet I can't think of a way to collect that sort of data
usefully via snmp...
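
To make the averaging problem concrete, here's a tiny illustration
(synthetic numbers, not real mrtg output) of how a short burst vanishes
once 30-second samples are rolled up into 5-minute averages:

    # Illustrative only: synthetic traffic, not real mrtg data.
    samples = [10.0] * 120            # Mbit/s, 120 x 30s samples = 1 hour of baseline
    samples[60:64] = [100.0] * 4      # a 2-minute burst at 100 Mbit/s

    # Roll the 30s samples up into 5-minute (10-sample) averages, mrtg-style.
    five_min = [sum(samples[i:i + 10]) / 10 for i in range(0, len(samples), 10)]

    print("peak 30s sample:   %.0f Mbit/s" % max(samples))    # 100
    print("peak 5min average: %.0f Mbit/s" % max(five_min))   # 46 - half the burst is already gone

And a queue spike lasting a few milliseconds never even registers at
the 30-second sampling layer, let alone after the averaging.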

Other issues with netperf-wrapper so far:

-1) Not enough people are running it.

0) There are a ton of other useful tests currently in the suite. The
simpler ones are useful by themselves; I regard the ipv4 vs ipv6
competing tests as particularly so, and I love the ability to compare
different forms of tcp in another one of the tests. The RTT fairness
test is very interesting...

1) Tons of full rate flows are not how the internet actually works
today (with the exception of bittorrent, which I can't exercise due to
bugs in the TCP_LEDBAT code).

You typically have one or two at most, and the rest are things living
in slow start. It is a valid criticism of rrul to say it's not
realistic, and I keep stressing that its modest purpose in life is to
load up a link, fast, and then you should run some sort of other test
against it, like the chrome benchmarks or (what I'm fiddling with
currently) webrtc... or a web workload, or a fileserver, or...

After thinking about how to represent 32 flows, I think a rrul-like
presentation averaging 28 of the flows and then presenting the two
highest rate and two lowest rate ones will show the kind of outliers I
tend to look for. But then you have huge variability in one flow's
performance over another, and I'd rather know why a given flow dropped
through the floor - was it a timeout, a string of dropped packets, a
bug, what?
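
A minimal sketch of the selection I have in mind (made-up per-flow mean
rates, nothing measured):

    def summarize_flows(rates):
        """Return the two fastest flows, the two slowest, and the average of
        the remaining flows - the proposed rrul-like 32-flow view."""
        ordered = sorted(rates)                  # slowest first
        lowest, highest = ordered[:2], ordered[-2:]
        middle = ordered[2:-2]
        return highest, lowest, sum(middle) / len(middle)

    # Hypothetical example: 32 flows, two of which have fallen through the floor.
    rates = [2.9, 3.1, 3.0, 0.2, 3.2] + [3.0] * 26 + [0.1]
    top, bottom, rest = summarize_flows(rates)
    print("top two:", top, "bottom two:", bottom, "avg of other 28: %.2f" % rest)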

2) RRUL itself can be misleading on the timescale I selected by
default. I try to look at things at both 20 seconds and 300 seconds to
ensure I'm not fooling myself and that the data is good. Recently, for
example, I found that the beaglebones had a cron job, running once a
minute, that would mess up a test.

20 seconds is very interesting from a slow start analysis perspective,
and even then the load looks nothing like what happens with a heavily
sharded website, nor do I have a grip on iw10, which is something
that's creeping higher on my list of things to come up with ways to
look at. I flat out don't believe the google result in defense of iw10,
because they aren't measuring a post-fq_codel reality but our present,
highly overbuffered one, under circumstances that seem to exist only in
a lab, not in places like Nicaragua.

The 20 second test is adequate on short RTT links, as I will show
later in this email, and saves you a lot of time... but I strongly
encourage 300-second or longer tests too!

3) RRUL is otherwise incomplete. I'm very happy that toke took it upon
himself to generate such useful code (and unhappy that I haven't had
time to tackle some of the harder problems left over) and is having
such great results... but I wish I knew a way to move it along so it
could be a test an ordinary sysadmin could run, and ultimately
something that could be an app on ios and android, windows, and
osx...

But in particular, my original intent with rrul was not to use ping
but a one-way delay measurement in each direction. It's looking like
owamp (http://www.internet2.edu/performance/owamp/)
will do the job, but I have to admit I'd hoped to add a test like
that to netperf rather than have a separate infrastructure. The use of
ping doesn't scale well to 100+ RTTs, and the bandwidth used by ping
grows rapidly as you get shorter and shorter RTTs. Also, the current
UDP ping in rrul dies after the loss of a single packet in that flow.
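
Back-of-the-envelope arithmetic (assuming one probe in flight per RTT
and roughly 100 bytes on the wire per probe - both assumptions, not
measurements):

    PROBE_BYTES = 100      # assumed on-the-wire size of one ping, headers included

    for rtt_ms in (100, 10, 2):
        probes_per_sec = 1000.0 / rtt_ms               # one probe per RTT
        kbit = probes_per_sec * PROBE_BYTES * 8 / 1000
        print("RTT %3d ms -> %5.0f probes/s -> %5.1f kbit/s per measured path"
              % (rtt_ms, probes_per_sec, kbit))
    # Multiply by 100+ measured paths and the probe traffic itself starts to matter.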

This is somewhat useful in showing "survivability" of low rate flows
on the current graph, but I'd like some other measurement and graphing
method that shows the "bunchiness" of packet loss on low rate flows
while keeping them sustained, which can only be done via an
isochronous stream.
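
A sketch of the kind of isochronous probe I mean: sequence-numbered,
timestamped UDP packets sent at a fixed interval, with a receiver that
survives loss and records the length of each loss run. The port,
interval, and packet format here are made up for illustration; owamp
does this properly, with real one-way timestamps.

    import socket, struct, time

    INTERVAL = 0.01    # 10ms spacing: isochronous, not clocked off the RTT
    PORT = 9999        # arbitrary

    def sender(dest):
        """Send timestamped, sequence-numbered probes at a fixed rate."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        seq = 0
        while True:
            sock.sendto(struct.pack("!Qd", seq, time.time()), (dest, PORT))
            seq += 1
            time.sleep(INTERVAL)

    def receiver():
        """Record per-packet delay (clock-sync caveats apply) and loss-run lengths."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", PORT))
        expected, loss_runs = 0, []
        while True:
            data, _ = sock.recvfrom(64)
            seq, sent = struct.unpack("!Qd", data)
            if seq > expected:                  # a gap: one "bunch" of lost packets
                loss_runs.append(seq - expected)
            expected = seq + 1
            print("seq %d delay %.1f ms, recent loss runs: %s"
                  % (seq, (time.time() - sent) * 1000, loss_runs[-5:]))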

There are a few other things in the "spec", like mtr, that it would be
good to be running at the same time... and when we get into things
like looking at rtt fairness to different sites, things get harder.

Lastly,

Like toke, I have been gradually expanding the range of tests, and
tediously working through all the bugs involved. I documented a few of
them recently on g+. Here's another one, induced merely by running
the following configuration:

1Atom -> 2Atom -> Switch -> Beaglebone
 GigE    GigE      GigE      100Mbit

So naively, you'd think that the bottleneck in the rightwards
direction would be in the switch, and in the leftwards one, the
beaglebone, right?

Already mentioned the cron job...

...

So, I ran this:

http://results.lab.taht.net/20s_NoOffloads_BQL9000_IPv6_Atoms_via_switch_beagle-1125/20s_NoOffloads_BQL9000_IPv6_Atoms_via_switch_beagle-pfifo_fast-all_scaled-rrul_noclassification.svg

http://results.lab.taht.net/20s_NoOffloads_BQL9000_IPv6_Atoms_via_switch_beagle-1125/20s_NoOffloads_BQL9000_IPv6_Atoms_via_switch_beagle-fq_codel-all_scaled-rrul_noclassification.svg

Wow, 90ms worth of buffering in one direction or the other. I knew,
going in, that the switch didn't have that much... so where's it
coming from?
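
For scale, some simple arithmetic (nothing measured here) on how many
bytes have to be sitting in a queue to induce that much delay:

    def queue_bytes(delay_ms, rate_mbit):
        """Bytes that must be queued ahead of you to induce delay_ms at rate_mbit."""
        return delay_ms / 1000.0 * rate_mbit * 1e6 / 8

    for rate_mbit in (100, 1000):
        b = queue_bytes(90, rate_mbit)
        print("90 ms at %4d Mbit/s = %7.0f KB = ~%4d full-size (1500B) packets"
              % (rate_mbit, b / 1000, b / 1500))

Either way, that's far more buffering than I'd expect the switch to
hold, which is why the question stands.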

OK, so I put nfq_codel in on the second atom and repeated the tests. I
note that it turned out gro offloads were still enabled on 2atom, and
that pushed codel's maxpacket out to 16k eventually, which did
interesting things to tcp's window and codel's drop strategy, which
you can see in the codel tests during that test series...
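
The maxpacket effect, roughly: the Linux codel code tracks the largest
packet it has dequeued and won't drop while the backlog is no bigger
than that, so GRO-built 16k super-packets raise the floor under which
codel leaves the queue alone. A simplified sketch of that check (my
paraphrase, not the real kernel code):

    def codel_may_drop(sojourn_ms, backlog_bytes, maxpacket_bytes, target_ms=5):
        """Paraphrase of codel's first gate: below target, or only one max-size
        packet's worth of backlog, means don't drop (interval logic omitted)."""
        if sojourn_ms < target_ms or backlog_bytes <= maxpacket_bytes:
            return False
        return True

    # With GRO off, maxpacket is ~1514 bytes and a 10 KB standing queue is fair game:
    print(codel_may_drop(sojourn_ms=20, backlog_bytes=10000, maxpacket_bytes=1514))   # True
    # With GRO aggregating to 16 KB super-packets, the same queue is left untouched:
    print(codel_may_drop(sojourn_ms=20, backlog_bytes=10000, maxpacket_bytes=16384))  # False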

Now I get results that almost make sense...

http://results.lab.taht.net/20s_NoOffloads_really_BQL9000_IPv6_Atoms_via_enki_to_beagle-3578/20s_NoOffloads_really_BQL9000_IPv6_Atoms_via_enki_to_beagle-pfifo_fast-all_scaled-rrul_noclassification.svg

http://results.lab.taht.net/300s_NoOffloads_really_BQL9000_IPv6_Atoms_via_enki_to_beagle-4330/300s_NoOffloads_really_BQL9000_IPv6_Atoms_via_enki_to_beagle-fq_codel-all_scaled-rrul.svg

but - in cutting observed latency by 66% - what this seems to show is
either that fq_codel helps even in a situation where it's not on the
bottleneck link, or that the bottleneck link actually was the GigE
2atom box!?

Part of the problem here is that I'm still measuring things
bidirectionally, so I can't tell which side the latency comes from...
but it also points to another queue: NAPI, running on one or both of
the atom boxes, establishes an ingress queue that gets serviced
relatively infrequently, and the OS then pushes everything in a bunch
to the egress interface, where a micro-queue forms for fq_codel to
work against... or it points to BQL being set too low, so that one
interface is starving (but that doesn't make sense, as it's GigE).
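
One way to rule the BQL theory in or out is just to read the limits
back out of sysfs on the atom boxes; a quick sketch (the eth* pattern
is an assumption about the interface names, and the test names above
suggest I had the limit pinned at 9000 bytes):

    import glob

    # BQL exposes its per-tx-queue limits under sysfs; read them all back.
    for path in sorted(glob.glob(
            "/sys/class/net/eth*/queues/tx-*/byte_queue_limits/limit")):
        with open(path) as f:
            print(path, f.read().strip(), "bytes")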

but it does make it appear that fq_codel could be a good thing, even
if it's not on the bottleneck link (the switch), and I'm still sad
that ~17ms worth of buffering appears to lie in the switch, but I'm
not done isolating variables... and, well, if I use htb to make 2atom
the real bottleneck at 92Mbits in each direction using nfq_codel...

http://results.lab.taht.net/20s_NoOffloads_really_BQL9000_IPv6_Atoms_via_enki_simplest.qos_via_enki_to_beagle-3123/20s_NoOffloads_really_BQL9000_IPv6_Atoms_via_enki_simplest.qos_via_enki_to_beagle-fq_codel-all_scaled-rrul_noclassification.svg

http://results.lab.taht.net/20s_NoOffloads_really_BQL9000_IPv6_Atoms_via_enki_simplest.qos_via_enki_to_beagle-3123/20s_NoOffloads_really_BQL9000_IPv6_Atoms_via_enki_simplest.qos_via_enki_to_beagle-pfifo_fast-all_scaled-rrul_noclassification.svg

I get this peculiar behavior with one stream no matter what qdisc is on
the endpoints, and get 3ms worth of latency e2e.

I mean, I ultimately want to get to where I have netem fired up on
2atom with long RTTs to multiple "servers", and random packet
loss/delay/reordering, but just getting to where 4 boxes produced
results that made a tiny bit of sense has taken weeks....

On Mon, Jul 8, 2013 at 8:24 PM, Dave Taht <dave.taht at gmail.com> wrote:
> On Mon, Jul 8, 2013 at 10:03 AM, Toke Høiland-Jørgensen <toke at toke.dk> wrote:
>> Mikael Abrahamsson <swmike at swm.pp.se> writes:
>>
>>> I have not so far seen tests with FQ_CODEL with a simulated 100ms
>>> extra latency one-way (200ms RTT). They might be out there, but I have
>>> not seen them. I encourage these tests to be done.
>>
>> Did a few test runs on my setup. Here are some figures (can't go higher
>> than 100mbit with the hardware I have, sorry).
>>
>> Note that I haven't done tests at 100mbit on this setup before, so can't
>> say whether something weird is going on there.
>
> It looks to me as though one direction on the path is running at
> 10Mbit, and the other at 100Mbit. So I think you typoed an ethtool or
> netem line....
>
> Incidentally, I'd like to know if accidental results like that are
> repeatable. I'm not a big fan of asymmetric links in the first place
> (6x1 being about the worst I ever thought semi-sane), and if behavior
> like this:
>
> http://archive.tohojo.dk/bufferbloat-data/long-rtt/rrul-100mbit-pfifo_fast.png
>
> and particularly this:
>
> http://archive.tohojo.dk/bufferbloat-data/long-rtt/tcp_bidirectional-100mbit-pfifo_fast.png
>
> holds up over these longer (200ms) RTT links, you are onto something.
>
>
>> I'm a little bit puzzled
>> as to why the flows don't seem to get going at all in one direction for
>> the rrul test.
>
> At high levels of utilization, it is certainly possible to so saturate
> the queues that other flows cannot start at all...
>
>>I'm guessing it has something to do with TSQ.
>
> Don't think so. I have incidentally been tuning that way up so as to
> get pre-linux 3.6 behavior on several tests. On the other hand the
> advent of TSQ makes Linux hosts almost have a pure pull through stack.
> If UDP had the same behavior we could almost get rid of the txqueue
> entirely (on hosts) and apply fq and codel techniques directly to the
> highest levels of the kernel stack.
>
> TSQ might be more effective if it was capped at (current BQL limit *
> 2)/(number of flows active)... this would start reducing the amount of
> data that floods the tso/gso offloads at higher numbers of streams.
>
>> Attaching graphs makes the listserv bounce my mail, so instead they're
>> here: http://archive.tohojo.dk/bufferbloat-data/long-rtt/ with
>> throughput data below. Overall, it looks pretty good for fq_codel I'd
>> say :)
>
> One of your results for fq_codel is impossible, as you get 11Mbit of
> throughput out of a 10Mbit link.
>
>>
>> I can put up the data files as well if you'd like.
>>
>> -Toke
>>
>>
>> Throughput data:
>>
>>
>> 10mbit:
>>
>> rrul test (4 flows each way), pfifo_fast qdisc:
>>  TCP download sum:
>>   Data points: 299
>>   Total:       375.443728 Mbits
>>   Mean:        6.278323 Mbits/s
>>   Median:      6.175466 Mbits/s
>>   Min:         0.120000 Mbits/s
>>   Max:         9.436373 Mbits/s
>>   Std dev:     1.149514
>>   Variance:    1.321382
>> --
>>  TCP upload sum:
>>   Data points: 300
>>   Total:       401.740454 Mbits
>>   Mean:        6.695674 Mbits/s
>>   Median:      6.637576 Mbits/s
>>   Min:         2.122827 Mbits/s
>>   Max:         16.892302 Mbits/s
>>   Std dev:     1.758319
>>   Variance:    3.091687
>>
>>
>> rrul test (4 flows each way), fq_codel qdisc:
>>  TCP download sum:
>>   Data points: 301
>>   Total:       492.824346 Mbits
>>   Mean:        8.186451 Mbits/s
>>   Median:      8.416901 Mbits/s
>>   Min:         0.120000 Mbits/s
>>   Max:         9.965051 Mbits/s
>>   Std dev:     1.244959
>>   Variance:    1.549924
>> --
>>  TCP upload sum:
>>   Data points: 305
>>   Total:       717.499994 Mbits
>>   Mean:        11.762295 Mbits/s
>>   Median:      8.630924 Mbits/s
>>   Min:         2.513799 Mbits/s
>>   Max:         323.180000 Mbits/s
>>   Std dev:     31.056047
>>   Variance:    964.478066
>>
>>
>> TCP test (one flow each way), pfifo_fast qdisc:
>>  TCP download:
>>   Data points: 301
>>   Total:       263.445418 Mbits
>>   Mean:        4.376170 Mbits/s
>>   Median:      4.797729 Mbits/s
>>   Min:         0.030000 Mbits/s
>>   Max:         5.757982 Mbits/s
>>   Std dev:     1.135209
>>   Variance:    1.288699
>> ---
>>  TCP upload:
>>   Data points: 302
>>   Total:       321.090853 Mbits
>>   Mean:        5.316074 Mbits/s
>>   Median:      5.090142 Mbits/s
>>   Min:         0.641123 Mbits/s
>>   Max:         24.390000 Mbits/s
>>   Std dev:     2.126472
>>   Variance:    4.521882
>>
>>
>> TCP test (one flow each way), fq_codel qdisc:
>>  TCP download:
>>   Data points: 302
>>   Total:       365.357123 Mbits
>>   Mean:        6.048959 Mbits/s
>>   Median:      6.550488 Mbits/s
>>   Min:         0.030000 Mbits/s
>>   Max:         9.090000 Mbits/s
>>   Std dev:     1.316275
>>   Variance:    1.732579
>> ---
>>  TCP upload:
>>   Data points: 303
>>   Total:       466.550695 Mbits
>>   Mean:        7.698856 Mbits/s
>>   Median:      6.144435 Mbits/s
>>   Min:         0.641154 Mbits/s
>>   Max:         127.690000 Mbits/s
>>   Std dev:     12.075298
>>   Variance:    145.812812
>>
>>
>> 100 mbit:
>>
>> rrul test (4 flows each way), pfifo_fast qdisc:
>>  TCP download sum:
>>   Data points: 301
>>   Total:       291.718140 Mbits
>>   Mean:        4.845816 Mbits/s
>>   Median:      4.695355 Mbits/s
>>   Min:         0.120000 Mbits/s
>>   Max:         10.774475 Mbits/s
>>   Std dev:     1.818852
>>   Variance:    3.308222
>> --
>>  TCP upload sum:
>>   Data points: 305
>>   Total:       5468.339961 Mbits
>>   Mean:        89.644917 Mbits/s
>>   Median:      90.731214 Mbits/s
>>   Min:         2.600000 Mbits/s
>>   Max:         186.362429 Mbits/s
>>   Std dev:     21.782436
>>   Variance:    474.474532
>>
>>
>> rrul test (4 flows each way), fq_codel qdisc:
>>  TCP download sum:
>>   Data points: 304
>>   Total:       427.064699 Mbits
>>   Mean:        7.024090 Mbits/s
>>   Median:      7.074768 Mbits/s
>>   Min:         0.150000 Mbits/s
>>   Max:         17.870000 Mbits/s
>>   Std dev:     2.079303
>>   Variance:    4.323501
>> --
>>  TCP upload sum:
>>   Data points: 305
>>   Total:       5036.774674 Mbits
>>   Mean:        82.570077 Mbits/s
>>   Median:      82.782532 Mbits/s
>>   Min:         2.600000 Mbits/s
>>   Max:         243.990000 Mbits/s
>>   Std dev:     22.566052
>>   Variance:    509.226709
>>
>>
>> TCP test (one flow each way), pfifo_fast qdisc:
>>  TCP download:
>>   Data points: 160
>>   Total:       38.477172 Mbits
>>   Mean:        1.202412 Mbits/s
>>   Median:      1.205256 Mbits/s
>>   Min:         0.020000 Mbits/s
>>   Max:         4.012585 Mbits/s
>>   Std dev:     0.728299
>>   Variance:    0.530419
>>  TCP upload:
>>   Data points: 165
>>   Total:       2595.453489 Mbits
>>   Mean:        78.650106 Mbits/s
>>   Median:      92.387832 Mbits/s
>>   Min:         0.650000 Mbits/s
>>   Max:         102.610000 Mbits/s
>>   Std dev:     30.432215
>>   Variance:    926.119728
>>
>>
>>
>> TCP test (one flow each way), fq_codel qdisc:
>>  TCP download:
>>   Data points: 301
>>   Total:       396.307606 Mbits
>>   Mean:        6.583183 Mbits/s
>>   Median:      7.786816 Mbits/s
>>   Min:         0.030000 Mbits/s
>>   Max:         15.270000 Mbits/s
>>   Std dev:     3.034477
>>   Variance:    9.208053
>>  TCP upload:
>>   Data points: 302
>>   Total:       4238.768131 Mbits
>>   Mean:        70.178280 Mbits/s
>>   Median:      74.722554 Mbits/s
>>   Min:         0.650000 Mbits/s
>>   Max:         91.901862 Mbits/s
>>   Std dev:     17.860375
>>   Variance:    318.993001
>>
>>
>> _______________________________________________
>> Cerowrt-devel mailing list
>> Cerowrt-devel at lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>
>
>
>
> --
> Dave Täht
>
> Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

