[Cake] [Make-wifi-fast] Flent results for point-to-point Wi-Fi on LEDE/OM2P-HS available

Wed Feb 1 09:48:59 EST 2017

Pete Heist <peteheist at gmail.com> writes:

>  On Jan 30, 2017, at 10:44 PM, Toke Høiland-Jørgensen <toke at toke.dk> wrote:
>
>  Oh my, this is quite a lot of tests. Nice :)
>
> It’s also a thumbs up for the ath9k driver changes that nothing went
> wrong during the testing. It takes about 15 hours for a full run and I
> probably did that 4-5 times total.

Cool, thanks for confirming that :)

>
>  Few general points on running tests:
>
>  - Yeah, as you note Flent has a batch facility. Did you not use this
>   simply because you couldn't find it, or was there some other reason?
>   Would love some feedback on what I can do to make that more useful to
>   people... While I have no doubt that your 'flenter.py' works, wrapping
>   a wrapper in this sense makes me cringe a little bit ;)
>
> I actually didn’t notice it existed until I was about 85% done and
> scanning the Flent man page for some other reason. I cringed, but at
> that point I just stuck with what I had. I don’t know if Flent can
> also make some basic html report with the graphs and setup output, but
> that was useful to write for myself. Flent’s metadata feature sounds
> useful and I’ll try that.

Right, so first thing is advertising it better. I'll look into that;
really do need to freshen up the web site some...

I will look over your automation script in more detail and see if
there's anything that might be worth adding to Flent's capabilities. Not
sure if HTML report generation can be generalised sufficiently to be
useful outside your specific scenario, but I'll think it over :)

>  - I'm not sure if you're checking that applying your qdiscs actually
>   works? For the WiFi interfaces with 'noqueue' you *cannot* apply a
>   different qdisc (which also answers your question #2).
>
> Hmm. Unless I’m missing something, what I’m seeing is that I _can_ add
> another qdisc, only that it’s ineffective unless soft rate limiting is
> used. As evidence, here's my nolimit test of fq_codel:
>
> http://www.drhleny.cz/bufferbloat/fq_codel_nolimit/index.html

Ah, totally missed that the qdisc information was available on those
pages as well; guess my scroll bar must have been broken. ;)

And yeah, just verified that you can indeed install a qdisc on a noqueue
device; how odd.
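
A minimal sketch of how a test script could check which qdisc is actually installed on an interface after applying it, assuming the usual `tc qdisc show dev <iface>` line layout of `qdisc <kind> <handle>: root ...`. The function names and the sample strings are illustrative and are not taken from the test setup discussed here.

```python
import subprocess


def parse_root_qdisc(tc_output):
    """Return the kind of the root qdisc found in `tc qdisc show` output, or None."""
    for line in tc_output.splitlines():
        fields = line.split()
        # Expected layout (assumption): qdisc <kind> <handle>: root ...
        if len(fields) >= 4 and fields[0] == "qdisc" and "root" in fields[3:]:
            return fields[1]
    return None


def root_qdisc(iface):
    """Run tc for one interface and return its root qdisc kind (needs tc installed)."""
    result = subprocess.run(
        ["tc", "qdisc", "show", "dev", iface],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_root_qdisc(result.stdout)


if __name__ == "__main__":
    # Illustrative output only, not captured from the devices in this thread.
    sample = "qdisc fq_codel 8001: root refcnt 2 limit 10240p flows 1024\n"
    print(parse_root_qdisc(sample))
```

Comparing the returned kind against the one the script tried to install shows whether the `tc qdisc add` or `replace` call took effect. It does not show whether the qdisc is actually the bottleneck queue.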

>  Question 1 (and partly #13): Yeah, the version of LEDE you're running
>  already has the FQ-CoDel-based queueing in the ath9k driver. The
>  baseline you're seeing is consistent with the results we've been getting
>  in testing. This is also seen by any gains you get being paired with
>  quite a hefty hit in throughput. So with this driver, I would say it's
>  not worth it. However, this is going to be different on a setup without
>  the WiFi queueing fixes.
>
> Ok, that explains a lot, thanks. I was still able to see about a 50%
> reduction in latency (from ~25 ms to ~12 ms) with a 13% drop in
> throughput (from ~92 Mbps to ~80 Mbps), when doing half-duplex rate
> limiting to 85 Mbps and fq_codel’ing on the external router. See:
>
> http://www.drhleny.cz/bufferbloat/fq_codel_hd-eth-ap_85mbit/index.html
>
> vs the default:
>
> http://www.drhleny.cz/bufferbloat/default/index.html
>
> I can get down to 10ms if I give up another 5 Mbps, or lower values
> with more severe throughput sacrifices.
>
> But this is with a stable RSSI of around -50 and low noise. I
> understand that fq-codel’ing in the driver must be superior in its
> handling of rate changes, retries or other external factors, and that
> point-to-multipoint is a different story. But maybe some of FreeNet’s
> line-of-sight point-to-point links may also be stable enough such that
> fixed software rate limiting is usable for them, I’m not sure yet.

Yeah, whether that is worth it depends on your requirements of course,
and how stable your link is. That's very much a YMMV issue, I think.

> It’s not critical, but why am I able to see this level of reduction
> when there’s already fq-codel in the driver? 25ms is very good, I only
> wonder where I’m getting the extra 10-15ms from, out of interest. :)

The driver queues up two aggregates beneath the queue to keep the
hardware busy. It may be possible to improve slightly upon this, but we
have not gotten around to trying yet.

>  Question 5: For TCP you can't get packet loss from user space; you'll
>  need packet captures for that. So no way to get it from Flent either.
>  You can, however, get average throughput. Look at the box plots; if you
>  run multiple iterations of the same test, you can plot several data
>  files in a single box_combine plot, to get error bars. `flent
>  file.flent.gz -f summary` (which is the default if you don't specify a
>  plot) will get you averages per data series; or you can extract it from
>  the metadata.
>
> Ok, so far, I was doing `cat file.flent.gz | grep null | wc -l`, which
> is a very crude count of the nulls recorded, which seem to happen for
> the udp and icmp flows with packet loss. There are always some nulls
> from before the test starts and after it ends, but if the count jumps
> up I speculate that there’s more packet loss. It’s pretty weak but
> it’s a hint.

This is very unlikely to get you anything resembling a right answer. The
null values recorded by Flent are simply *sampling periods* without any
output from netperf. There will be a bunch of those at the start or end
of each test, and there can be other reasons apart from packet loss that
will give null values.

You can get packet loss for ICMP by looking at the sequence numbers in
the raw_values object in the data file (you can get that as a Python
object by doing 'from flent import resultset; r =
resultset.load(filename)' and poking around in that object). Don't think
there's currently a way to export that as a loss measure...
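
A minimal sketch of the sequence-number approach described above. It reads the data file directly as gzipped JSON instead of going through `flent.resultset`, so that it has no dependency on Flent being installed. It assumes that each entry in a `raw_values` series is a dict carrying a `seq` key, and that the ICMP series is named something like `"Ping (ms) ICMP"`. Both are assumptions to check against your own data file, and the helper names are hypothetical.

```python
import gzip
import json


def load_raw_values(path):
    """Read the raw_values object from a gzip-compressed Flent data file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh).get("raw_values", {})


def icmp_loss(samples):
    """Estimate loss from gaps in the sequence numbers of one series.

    Returns (received, expected, loss_fraction), or None when no sample
    carries a 'seq' value. Replies lost before the first or after the
    last received one are not counted, and wrap-around of the sequence
    counter is not handled.
    """
    seqs = sorted({int(s["seq"]) for s in samples if s.get("seq") is not None})
    if not seqs:
        return None
    expected = seqs[-1] - seqs[0] + 1
    received = len(seqs)
    return received, expected, (expected - received) / expected


if __name__ == "__main__":
    # Synthetic samples for illustration: sequence number 3 is missing.
    demo = [{"t": 0.2 * i, "val": 12.0, "seq": i} for i in (1, 2, 4, 5)]
    print(icmp_loss(demo))
```

With a real file this would be called as `icmp_loss(load_raw_values("file.flent.gz")["Ping (ms) ICMP"])`, where both the file name and the series name are placeholders.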

-Toke