[Cerowrt-devel] tc-stab versus htb on ADSL 8000/700

Dave Taht dave.taht at gmail.com
Thu Aug 15 11:17:57 EDT 2013


On Thu, Aug 15, 2013 at 2:32 AM, Sebastian Moeller <moeller0 at gmx.de> wrote:

> Hi Dave, hi Fred,
>
>
> On Aug 15, 2013, at 04:15 , Dave Taht <dave.taht at gmail.com> wrote:
>
> 0) What cero version is this? I have a slight optimization for codel in
> 3.10.6 that I'd hoped would improve < 4mbit behavior... basically it turns
> off maxpacket (and was discussed earlier this month on the codel list) as
> not being useful outside of the ns2 environment.
>
>
> Interesting, I did my latest tests with 3.10.6-1 and saw decent behavior
> for 14.7Mbit/s down, 2.4Mbit/s up, 40ms ping RTTs up from 25-30ms unloaded.
> Alas, I have no comparison data for 3.10.1-1, as my router wiped out while I
> wanted to run those tests and I had to reflash it via TFTP (and I decided
> to go to 3.10.6-1 directly, not knowing about the differences in fq_codel,
> even though I think you announced them somewhere)
>
>
That's pretty good. The best you can do would be +10 ms (as you are adding
5ms in each direction).

I note that the nstat utility (under Linux) will give a total count of
packets on the link and a variety of other statistics (most of which I don't
understand), which makes it more possible to understand what else is
going on on that host and TCP's underlying behavior: packet loss, fast
recovery, ECN usage, etc. (ECN tracking was just added to it, btw.)

So future versions of rrul will probably run something like:

nstat > /dev/null # wipe out the statistics
some_rrul_related_test
nstat > saveitsomewhere

How to actually present the data? Damned if I know. I also have no idea whether a
similar tool exists for other OSes. It's using some semi-standard SNMP
counters... it might be possible to pull them off the router... dunno
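A minimal sketch (in Python) of the before/after diff that the nstat workflow above implies. The "CounterName value rate" line format is the usual nstat output; the sample counter names and values here are invented for illustration:

```python
def parse_nstat(text):
    """Parse nstat-style output ("CounterName  value  rate") into a dict."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # skip the "#kernel" header and anything malformed
        if len(fields) >= 2 and not line.startswith('#'):
            try:
                counters[fields[0]] = int(fields[1])
            except ValueError:
                continue
    return counters

def nstat_delta(before, after):
    """Counters that changed between two snapshots taken around a test run."""
    return {name: after[name] - before.get(name, 0)
            for name in after
            if after[name] != before.get(name, 0)}
```

Run `nstat > /dev/null` to reset, run the test, save `nstat` output, then feed both snapshots through `parse_nstat` and diff them with `nstat_delta` to see what actually moved during the test.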



>
> 1) I kind of prefer graphs get stuck on a website somewhere, rather than
> email. Having to approve big postings manually adds to the 10 spams I have
> to deal with per day, per list.
>
>
> I will look around to find a way to post things, would google+ work?
>
>
No tiffs, please? :) PNGs are MUCH smaller, SVGs show more detail....

>
> We would certainly like to add an "upload this test" feature to rrul one
> day, that captures more data about the user's configuration and the test
> data (from both sides!), but that project and servers remain unfunded, and
> Toke's really busy with his master's thesis...
>
> 2) Test #1 at T+48 or so appears to have a glitch - either caused by local
> traffic on the link or something else. The long diagonal lines that you see
> are bugs in the python-matplotlib library; they are fixed in Ubuntu 13.04
> and the latest versions of Arch.
>
>
> Ah, but you can install matplotlib version 1.3 under ubuntu 12.04 in the
> terminal:
> 1) sudo apt-get build-dep python-matplotlib
>
> 2) potentially required:
> sudo pip install --upgrade freetype-py
>
> 3)  sudo pip install --upgrade matplotlib
>
> (I might have forgotten a required step, so should anyone get stuck, just
> contact me)
>
>
good to know, thx.


>
> The choppy resolution of the second graph in each chart is due to the
> sample interval being kind of small relative to the bandwidth and the RTT.
> That's sort of fixable, but it's readable without it….
>
>
> Is there a simple way to fix this in netperf-wrapper?
>
>
Well, it kind of comes down to presenting raw data as raw data. The
academic and sysadmin universe is in the habit of presenting highly processed
data - showing averages, eliminating the 95th percentile, etc. - and by
constantly promoting looking hard at the raw data rather than the processed
stuff, I have hoped that at least I'd be able to consistently show people
that latency and bandwidth utilization are tightly interrelated, and that
raw data is very important when outliers are present - which they always
are, in networking.

As you noticed, you had a pattern that formed on an interval. I see such
patterns all the time - one caused by cron every 60 seconds, another caused
by instruction traps on ipv6 (these turned out to be very important!) -
and I've seen other odd patterns that formed on various intervals as well,
that were useful to understand.

And you see interesting patterns if you do things like run other traffic at
the same time as rrul. I'd be very interested if you ran the chrome web
page benchmarker against the alexa top 10 during a rrul or the simpler
tcp_upload or bidirectional tests on your setup (hint, hint)

An example of "smoothing" that makes me crazy is the five-minute
averages that mrtg reports, when networks run at microsecond scales. At
least things like smokeping are a lot closer to being able to pick up on
outliers.

So, no, I'd rather not smooth that plot or change the sample interval (yes
you can change the sample interval)

I would like to have box plots one day.  Those can be genuinely useful but
also require a trained eye to read....

http://en.wikipedia.org/wiki/Box_plot
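For what it's worth, the five-number summary a box plot draws (min, Q1, median, Q3, max) plus its usual outlier rule are easy to compute; a sketch with hypothetical RTT samples (the numbers are invented for illustration):

```python
import statistics

# hypothetical ping RTT samples in ms, with one outlier - the kind of
# value a smoothed average would hide but a box plot flags
rtt_ms = [28, 29, 30, 30, 31, 32, 35, 40, 120]

# quartiles; method='inclusive' interpolates within the observed range
q1, median, q3 = statistics.quantiles(rtt_ms, n=4, method='inclusive')
summary = (min(rtt_ms), q1, median, q3, max(rtt_ms))

# standard 1.5*IQR whisker rule: anything outside is drawn as an outlier dot
iqr = q3 - q1
outliers = [x for x in rtt_ms
            if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
```

The trained-eye caveat is exactly this: the box itself hides that 120ms point unless you also look at the outlier dots beyond the whiskers.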


> Moving to what I see here, you are approximately 50ms (?) or so from the
> icei.org server which is located on the east coast of the US (NJ, in
> linode's co-location facility) (services on this box are graciously donated
> by the icei organization that has been working in the background to help
> out in many ways)
>
>
> The black line is an average of 4 streams in the first and second graphs
> in each chart. So you can multiply by 4 to get a rough estimate of actual
> bandwidth on this link, but you do have to factor in the measurement
> streams (graph 3), and the overhead of acks in each direction (which is
> usually 66 bytes every other packet for ipv4), which are hard to measure.
>
>
> Ah, so these ACKs will cause around 38 bytes of padding in a packet of
> 3 ATM cells, leading to a 26.4% increase in effective bandwidth used by the
> ATM stream versus what is sent out over ge00 (ethernet). Together with the
> small ping and UDP RTT probes this explains nicely why proper link layer
> adjustments decrease ping times as well as the TCP rates.
>
>
Yeah, you are getting close to ideal. Could you re-run your setup set to
Fred's settings? The ATM etc. mods won't kick in, but it would be
interesting to see if you have problems converging below 100ms...
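Sebastian's cell arithmetic above can be sketched as follows. The 32 bytes of per-packet encapsulation overhead is an assumption (PPPoE/LLC-SNAP overheads vary by link); the 8-byte AAL5 trailer and 48/53-byte cell geometry are fixed by ATM:

```python
import math

def atm_cells(payload_bytes, per_packet_overhead=32):
    """How a packet expands on an ADSL/ATM link.

    AAL5 frame = payload + encapsulation overhead + 8-byte trailer,
    padded up to a whole number of 48-byte cell payloads; each cell
    then costs 53 bytes on the wire.
    """
    aal5 = payload_bytes + per_packet_overhead + 8
    cells = math.ceil(aal5 / 48)
    padding = cells * 48 - aal5
    wire_bytes = cells * 53
    return cells, padding, wire_bytes
```

With the assumed 32-byte overhead, a 66-byte TCP ACK lands in 3 cells with 38 bytes of padding (matching the figures quoted above), and a 1500-byte packet needs 33 cells.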


> So you are showing 6Mbit of raw bandwidth down, and about 480 up.
> Factoring the ack overhead of the down into the up gets pretty
> close to your set limit of 700. You experienced packet loss at time T+6
> (not unusual) that killed the non-ping measurement flows. (Some day in the
> future we will add one-way ping measurements and NOT stop measuring after
> the first loss.)
>
> (There are multiple other plot types; do a --list-plots rrul. You can
> generate a new plot on the same data by taking the *json.gz file and
> supplying a different output filename (-o filename.svg) and plot type (-p
> totals).)
>
>
> I regard the cdf plots as the most useful, but ALWAYS check this main
> graph type to see glitches. Otherwise a cdf can be very misleading.)
>
>
> Ah, for one I can really relate, in MRI analysis the mantra to teach
> newcomers is always "look at your raw data".
>
>
Outliers can kill you. Fact. I've pointed to Frank Rowand's talks on this
subject a couple times.

The list of space launch failures due to stupid stuff like off-by-one bugs
and misplaced decimal points is rather long. And measurement error - or
worse, measuring the wrong thing - can mess up your whole day.

http://www.youtube.com/watch?v=2eGiqqoYP5E



>
>
> So in this test latency spikes by about 100ms. Why does it do that? Well,
> you have to fit 6k bytes (4 1500 byte flows), + 132 bytes (2 acks), + 65
> bytes (ping) into the queues, and at 700kbit/s that queue takes far longer
> to drain than the default 5ms target we start with. A 1500 byte packet
> takes about 13ms to transmit at 1Mbit, so we are ending up here with
> sufficient "standing queue" to keep latency well above the target.
>
> Frankly, it should, eventually, achieve a TCP window size that will reduce
> the latency to something lower than 100ms, but it obviously isn't.
> nfq_codel is "tighter", but who knows, we're still in very early days of
> trying to optimize for these bandwidths, and, like I said, I just killed
> the maxpacket thing, which might help some. A longer test (-l 300) at this
> RTT might be more revealing.
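The transmit-time arithmetic above is easy to sketch. This is pure serialization delay; ATM cell tax and encapsulation (assumed, not measured here) push the 12ms raw figure toward the ~13ms quoted:

```python
def serialization_ms(packet_bytes, link_bps):
    """Milliseconds to clock one packet onto a link of the given bitrate."""
    return packet_bytes * 8 * 1000 / link_bps
```

At 1Mbit/s a 1500-byte packet takes 12ms on the wire; at the 700kbit/s upload limit discussed here it takes about 17ms, i.e. a single full-size packet already blows past a 5ms codel target.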
>
>
>
> So here is my result against "an unnamed netperf server in Germany" for
> 300 seconds using 3.10.6-1 with simple.qos and fq_codel. Ping times only
> increase from ~30ms to 40ms (the same link gets ~300ms ping RTT without
> AQM, and ~80ms without proper linklayer adaptation). (Plus you can see my
> Macbook is obviously doing something periodically (roughly every 15
> seconds) that eats bandwidth and causes RT
>

Excellent. If you re-run that test with "simple.qos" instead, you can see
the classification classes "doing something", at least on upload. On
download, if you don't see it "doing anything", it generally means that
your ToS values were stomped on in transit.



> Ts to increase a lot).
>

To get closer to an accurate value for the traffic on the link, do the nstat
trick I mentioned above.


>
>
>
> Clear as mud?
>
>
>
> On Wed, Aug 14, 2013 at 2:28 PM, Fred Stratton <fredstratton at imap.cc
> > wrote:
>
>
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>


-- 
Dave Täht

Fixing bufferbloat with cerowrt:
http://www.teklibre.com/cerowrt/subscribe.html

