[Cerowrt-devel] cerowrt 3.3.8-17: nice latency improvements, some issues with bind

Sat Aug 25 14:09:10 EDT 2012

On Sat, Aug 25, 2012 at 6:56 AM, Török Edwin
<edwin+ml-cerowrt at etorok.net> wrote:
> On 08/18/2012 08:07 PM, Dave Taht wrote:
>> Thx again for the benchmarks on your hardware! Can I get you to go one
>> more time to the well?
>
> Yes, but you have to wait until I have some time to do it.

No worries. Doing good science takes time.

>
>> Stripping out some of the incremental steps will save you time
>> on benchmarking, so let's go with 3, 4, 12, 35, 100. Wireless data is
>> incredibly noisy and I usually end up going with CDF plots like this
>> old one
>>> To get twice the speed, a qlen=11 is already enough, and to get all the speed back a qlen=35 is needed.
>>
>> This is an incomplete conclusion. It is incomplete in that A) these
>> tests were done under laboratory conditions at the highest data rate
>> (MCS15), B) it was with a single point-to-point link to an AP,
>> which normally would be handling more than one client, and C) it tests a
>> single full-throttle TCP stream, when typical websites and usage
>> involve 70+ DNS lookups and 70 separate short streams.
>>
>> I can live with B) and C) for now, although I note that running the Chrome
>> benchmark and ping while a full-blown stream test (as you are doing now)
>> runs in the background is quite useful for looking at C. Let's tackle
>> A...
>>
>>>
>>> And here are the results with fq_codel on the laptop too (just nttcp -t, as that's the one affected):
>>>
>>> fq_codel on laptop, cerowrt defaults,  nttcp -t:  1.248/12.960/108.490/16.733 ms; 90 Mbps
>>> fq_codel on laptop, cerowrt qlen_*=4,  nttcp -t:  1.205/10.843/ 76.983/12.460 ms; 105 Mbps
>>> fq_codel on laptop, cerowrt qlen_*=8,  nttcp -t:  4.034/16.088/ 98.611/17.050 ms; 120 Mbps
>>> fq_codel on laptop, cerowrt qlen_*=11, nttcp -t:  3.766/15.687/ 56.684/11.135 ms; 114 Mbps
>>> fq_codel on laptop, cerowrt qlen_*=35, nttcp -t: 11.360/26.742/ 48.051/ 7.489 ms; 113 Mbps
>>
>> So, could you move your laptop to where it gets MCS4 on a fairly
>> reliable basis, and repeat the tests? A wall or three will do it.
>
> I've put my laptop in a place where I got MCS4 on TX most of the time.
> RX is MCS4 most of the time too, but it is switching to MCS5, 7, 11, 12 and back to MCS4
> quite a lot.
>
>> Please don't change your kernel before trying that test... (and I
>> make no warranties about the reliability/usefulness of an rc2!)
>
> Here are the results with fq_codel on the laptop, and same 3.5.0 kernel:
>
> qlen 100, nttcp -t:  5.966/57.104/192.017/26.674 ms; 52.2376 Mbps
> qlen  35, nttcp -t: 15.636/54.823/108.921/19.762 ms; 52.4675 Mbps
> qlen  12, nttcp -t:  4.768/29.439/132.924/27.159 ms; 51.2619 Mbps
> qlen   4, nttcp -t:  2.631/20.500/152.741/31.549 ms; 40.3949 Mbps
> qlen def, nttcp -t:  2.010/21.851/317.085/49.323 ms; 35.8268 Mbps
>
> qlen 100, nttcp -r: 23.225/44.101/142.835/21.181 ms; 36.6789 Mbps
> qlen  35, nttcp -r:  3.755/23.413/ 83.530/15.329 ms; 35.4602 Mbps
> qlen  12, nttcp -r:  4.318/10.251/ 96.773/12.008 ms; 31.1557 Mbps
> qlen   4, nttcp -r:  2.733/ 4.507/ 16.353/ 1.917 ms; 24.6688 Mbps
> qlen def, nttcp -r:  2.119/ 4.999/ 64.968/ 7.275 ms; 27.3645 Mbps
>
> Note that the laptop was on battery this time, so that may add some jitter
> (CPU freq switching, wifi power saving?), but shouldn't matter for >10ms quantities.

Thank you for so clearly showing the trendline and the relationship between
overbuffering, bandwidth, latency, and jitter on Linux wifi with this
combination of drivers and OSes!

(I am inclined to throw out the second qlen 4 result as anomalous, however.)

(Did I add enough qualifications to the above statement?)

It does look like qlen 12 (presently) fits within my overall goals.
However, to me the next step is switching the ath9k driver's own buffering
from a straight FIFO to something (a tree?) that inspects its queue(s)
for packets that could be aggregated, and fq's the result (again).
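A rough Python sketch of the "inspect the queue for aggregate-able packets" idea, assuming a driver FIFO of packets tagged with a destination station. `Packet` and `build_aggregate` are hypothetical names for illustration, not ath9k code, and the re-fq'ing step across stations is left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    dest: str  # destination station (hypothetical field)
    size: int  # bytes, ignoring 802.11 framing overhead


def build_aggregate(fifo, max_packets, max_bytes):
    """Pull packets bound for the same station as the FIFO head.

    Unlike a plain FIFO, this scans past packets for other stations, so
    they no longer cut the aggregate short. Everything left behind keeps
    its relative order; `fifo` (a deque) is modified in place.
    """
    if not fifo:
        return []
    target = fifo[0].dest
    aggregate, remaining, total, full = [], deque(), 0, False
    while fifo:
        pkt = fifo.popleft()
        if not full and pkt.dest == target:
            # The head packet is always taken so the queue makes progress.
            if not aggregate or (len(aggregate) < max_packets
                                 and total + pkt.size <= max_bytes):
                aggregate.append(pkt)
                total += pkt.size
                continue
            full = True  # stop here so this station's packets stay in order
        remaining.append(pkt)
    fifo.extend(remaining)
    return aggregate
```

Packets for the next station then naturally become the new head of the queue for the following aggregate.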

A better method (probably) would be for the driver to tell the overlying qdisc
"I want up to x packets or y bytes for station z", and for the
qdisc to do that job; the codel notion of "maxpacket"
could then apply to each station.
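The pull-based interface could look something like the following. `PerStationQdisc` and `dequeue_for` are hypothetical names, used only to illustrate the driver asking for "up to x packets or y bytes for station z"; no existing Linux qdisc API is implied.

```python
from collections import defaultdict, deque


class PerStationQdisc:
    """Hypothetical qdisc keeping one FIFO of packet sizes per station."""

    def __init__(self):
        self.queues = defaultdict(deque)  # station -> deque of sizes (bytes)

    def enqueue(self, station, size):
        self.queues[station].append(size)

    def dequeue_for(self, station, max_packets, max_bytes):
        """Driver-initiated pull: up to max_packets or max_bytes for station.

        Limits are treated as hard here, so a head packet larger than
        max_bytes is never returned; a real design would need a rule for that.
        """
        q = self.queues[station]
        out, total = [], 0
        while q and len(out) < max_packets and total + q[0] <= max_bytes:
            total += q[0]
            out.append(q.popleft())
        return out
```

Because the qdisc holds per-station queues, per-station statistics (such as maxpacket) fall out naturally.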

"maxpacket" is kind of misnamed; what it means is the maximum number
of bytes that can be delivered in one go. So it is the MTU for devices
that don't have TSO or GSO enabled, the size of a TSO/GSO packet (something less than
64k) for devices that do, and it should (probably!!! we're not there yet) be
equal to "proposed next aggregate-able size/bytes for this destination" in wifi.
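To make the "maxpacket" point concrete, here is a simplified check loosely modeled on `codel_should_drop()` in the Linux CoDel code of this era, which refuses to drop while the remaining backlog is no larger than the largest packet seen so far. The class and method names are hypothetical, and CoDel's interval timing and control law are omitted.

```python
class CodelStats:
    """Per-queue (or, as proposed, per-station) maxpacket tracking."""

    def __init__(self, target_us=5000):
        self.target_us = target_us  # CoDel's usual 5 ms target
        self.maxpacket = 0

    def ok_to_drop(self, sojourn_us, pkt_len, backlog_after):
        # Learn the largest unit delivered in one go: an MTU-sized frame
        # without TSO/GSO, a TSO/GSO super-packet with it, or (per the
        # proposal above) the next aggregate's size for this station.
        self.maxpacket = max(self.maxpacket, pkt_len)
        # Below target, or holding at most one "packet" worth of data:
        # there is no standing queue worth dropping from.
        if sojourn_us < self.target_us or backlog_after <= self.maxpacket:
            return False
        return True  # real CoDel also requires this to persist for an interval
```

With a TSO-sized maxpacket, a queue can hold tens of kilobytes before CoDel even considers dropping, which is why getting this number right per wifi station matters.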

> Looks like the iwl4965 is somewhat bloated, with those 100ms+ latencies.

Ya think? It turns out one of my laptops has the 5100AGN, which is similar.
Somewhere on this list last year was a long discussion and some
proposed patches for the iwl series... I think the guy working on it
got swamped by some PhD work, though.

Right now that box is used as a wired endpoint (and has an SR71 card in it).

A few x86 boxes were just donated that I can replace it with, so I can do
a bit more wireless testing next week than I presently do (and the
x86 boxes will dramatically expand my ability to test longer RTTs, which I care
about a lot).

(THANK YOU VYATTA!)

Regrettably, first I have to get those boxes here, then set up a
working OS and kernel on them, and then get netem running...

> I don't know what happened there, but with the default qlen (2,3,3,3) I get the 317 ms max latency,
> whereas with qlen 4 I get 152 ms max latency on TX. The average is also better with qlen 4.
> Same observation goes for the RX side.

We have a potential interaction with the default quantums I'm using on cero,
which are 256 rather than 1500 (the usual default). In that case,
we can end up with 3 timestamped IPv4 ACKs in a row, but not 4, so a
given stream can "leak over" into the next potential aggregate, which
might be arbitrarily shortened by incorporating part of another
stream for another destination.
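A toy deficit round robin (DRR) model of how the quantum limits the number of back-to-back packets a flow gets per turn. It follows the general shape of fq_codel's dequeue loop (a flow keeps sending while its deficit is positive, then gets one more quantum and moves to the tail) but ignores fq_codel's new/old flow lists and CoDel itself; `drr_turns` is a hypothetical helper, and any packet sizes fed to it are illustrative, not measured ACK sizes.

```python
from collections import deque


def drr_turns(flows, quantum):
    """Simulate DRR and return (flow, packets_sent) for each turn.

    `flows` maps a flow name to a list of packet lengths in bytes.
    A flow keeps sending while its deficit is positive (so it may
    overshoot); once the deficit is zero or negative it is refilled
    by one quantum and rotated to the tail.
    """
    queues = {name: deque(pkts) for name, pkts in flows.items()}
    deficit = {name: quantum for name in flows}
    active = deque(name for name, q in queues.items() if q)
    turns, sent = [], 0
    while active:
        name = active[0]
        if deficit[name] <= 0:
            if sent:
                turns.append((name, sent))
                sent = 0
            deficit[name] += quantum
            active.rotate(-1)  # move this flow to the tail
            continue
        deficit[name] -= queues[name].popleft()
        sent += 1
        if not queues[name]:
            turns.append((name, sent))  # flow drained: end its turn
            sent = 0
            active.popleft()
    return turns
```

For example, with 100-byte packets competing against a bulk flow, a quantum of 256 never lets the small-packet flow send more than 3 in a row (ceil(256/100)), while a quantum of 1514 lets it send 16 in its first turn. Smaller quanta interleave flows more finely, which is exactly what splits one stream's packets across potential aggregates.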

So, I'm inclined to bump it up to 12 for the cerowrt userbase, as the
cost in normal usage is low and the benefit high. (Note that the same
problem described above will still occur, just slightly less often, and on average the
aggregates will be larger, so it's a win.)

But in the interest of science and continuing to analyze codel's
behavior, I'm going to keep it at 3 for a while longer. (Feel free to use
values that make you happy, just clearly tell me when you do, please!)

Fiddling with the qlen is a very blunt hammer for the real job that
needs to be happening in the qdisc and driver, regardless. I hope we
can get much smarter about it soon, but at least in my case that
requires more insight into the ath9k than I currently have. Felix is
probably pretty wrapped up in the OpenWrt freeze, and Andrew has another
day job...

>>
>> I will predict several things:
>>
>> 1) the bulk of the buffering problem is going to move to your laptop,
>> as it has weaker antennas than the WNDRs. Most likely you will end up
>> with TX on the one side higher than RX on the other.
>
> Yes the laptop TX latencies are worse.
>
>>
>> 2) you will see much higher jitter and latency and much lower
>> throughput. Your results will also get wildly more variable run to
>> run. (I tend to run tests for 2 minutes or longer and toss out the
>> first few seconds)
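For the "toss out the first few seconds" step and the CDF plots mentioned earlier, a small Python helper might look like this. `ping_summary` and `empirical_cdf` are hypothetical helpers; warm-up is counted in samples rather than seconds, and mdev is the population standard deviation (square root of the mean square minus the squared mean), which is how iputils ping computes its mdev.

```python
import math


def ping_summary(rtts_ms, warmup=0):
    """Return (min, avg, max, mdev) after dropping `warmup` leading samples."""
    samples = rtts_ms[warmup:]
    n = len(samples)  # caller must leave at least one sample
    mean = sum(samples) / n
    var = sum(x * x for x in samples) / n - mean * mean
    return min(samples), mean, max(samples), math.sqrt(max(var, 0.0))


def empirical_cdf(samples):
    """Return (value, fraction of samples <= value) pairs, ready to plot."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]
```

With a 1-second ping interval, dropping the first 5 seconds is simply `ping_summary(rtts, warmup=5)`.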
>
> On TX it is quite consistently in MCS4 (according to watch iw wlan0 station dump),
> but on RX it's jumping around quite a lot.
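To quantify how much the rate jumps around, one could capture `iw ... station dump` periodically and tally the MCS indices seen. The regex below is written against the sample "tx bitrate"/"rx bitrate" lines quoted later in this thread and may need adjusting for other `iw` versions; `mcs_histogram` is a hypothetical helper and the capture loop is not shown.

```python
import re
from collections import Counter

# Matches e.g. "tx bitrate:     300.0 MBit/s MCS 15 40Mhz short GI".
# Case-insensitive because some output says "Mbit/s"; legacy (non-HT)
# rates have no MCS field and are skipped.
RATE_RE = re.compile(r"^\s*(tx|rx) bitrate:\s+([\d.]+) MBit/s(?: MCS (\d+))?", re.I)


def mcs_histogram(snapshots):
    """Count how often each MCS index appears for tx and rx.

    `snapshots` is an iterable of station-dump texts, e.g. one per second.
    """
    counts = {"tx": Counter(), "rx": Counter()}
    for text in snapshots:
        for line in text.splitlines():
            m = RATE_RE.match(line)
            if m and m.group(3) is not None:
                counts[m.group(1).lower()][int(m.group(3))] += 1
    return counts
```

A broad RX histogram next to a narrow TX one would show the asymmetry described above in numbers rather than impressions.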

As good as the minstrel algorithm is, I've often felt it could be improved
with deeper analysis of what really happens in the wireless-n cases,
particularly in the case of retries and within-aggregate packet loss.

Tuning it for -g took a year of data collection and a ton of analysis
and cash... and the (excellent) paper on it is unpublishable because
it so far exceeds the MPU. I doubt there are 12 people in the world
that deeply understand how minstrel works, and I wish there were
thousands... there is a wealth of information in it that could be used
for other things, like improving the behavior of mesh routing
protocols.

>>
>> 3) The lower fixed buffering sizes on cero's qlens will start making a
>> lot more sense, but it may be hard to see due to 1 and 2.
>
> qlen 12 and 4 look good. The default looks worse though.
>
>>
>> What I honestly don't know is how well fq_codel reacts to sudden
>> bandwidth changes when the underlying device driver (the iwl in this
>> case) is overbuffered, or how well codel's target idea really works in
>> the wifi case in general. It would be nice to have some data on it.
>> (hint, hint)
>
> The bandwidth varies quite a lot on RX even if both the laptop and router
> are perfectly still. So the -r numbers above should be what you are looking for.
> If you want some other data let me know.

I'll try not to abuse your time, but if I can convince you to set things up so you can
duplicate your experiments exactly when needed, it would be an enormous help.

>>
>> Some work was done on debloating the iwl last year, I don't know if
>> any of the work made it into mainline.
>>
>> Lastly, I put a version of Linux 3.6-rc2 up here.
>>
>> http://snapon.lab.bufferbloat.net/~cero1/deb/
>>
>> It has a fix to codel in it that was needed (I think; I have not
>> checked whether it's in 3.5.1), and it also incorporates "TCP small
>> queues", which reduces TCP-related buffering in pfifo_fast enormously,
>> and helps on other qdiscs as well. Switching to it will invalidate the
>> testing you've done so far...
>
> I assume these are in the upstream 3.6-rc3 too, right?

Yes. The rc3 I just put up there, however, has some subtle changes to codel in
it that differ from mainline. I'll have to distinguish more clearly
between that and mainline in the future.

> Here is just one measurement done with 3.6-rc3 on the laptop and fq_codel
> (same location as above tests, approx MCS4):
> qlen def, nttcp -t, 2.871/15.655/375.777/44.212 ms; 35.2776 Mbps
> qlen def, nttcp -r, 1.406/ 3.434/ 12.763/ 1.649 ms; 24.3334 Mbps
>
> It looks somewhat better.

12 remains the sanest win right now, but it's too early to change it this month.

thx again!

>>
>> (another reason why I'm reluctant to post graphs on codel/fq_codel
>> right now is that good stuff keeps happening above/below it in Linux)
>>
>>
>>
>>> Shouldn't wireless N be able to do 200-300 Mbps though? If I enable debugging in iwl4965 I see that it
>>> starts TX aggregation, so I'm not sure what's wrong (router or laptop?). With encryption off I can get at most 160 Mbps.
>>
>> A UDP test will get you in the 270Mbit range usually.
>
> nttcp -T -u -D -n2000 gives ~180 Mbps at most, and with -r I can't make sense of it (looks like most gets dropped):
>      Bytes  Real s   CPU s Real-MBit/s  CPU-MBit/s   Calls  Real-C/s   CPU-C/s
> l    16384    0.08    0.00      1.6090  13107.2000       5     61.38  500000.0
> 1  8192000    0.08    0.04    845.8113   1820.6973    2003  25850.83   55646.6

I'll think about this on another day. Feel free to run this test with pfifo_fast
a couple of times in either direction to get a baseline.

Doing badly on this test right now doesn't bother me at all...

>
>>
>>>
>>> iw dev sw10 station dump shows:
>>> ...
>>>         signal:         -56 [-60, -59] dBm
>>>         signal avg:     -125 [-65, -58] dBm
>>>         tx bitrate:     300.0 MBit/s MCS 15 40Mhz short GI
>>>         rx bitrate:     300.0 MBit/s MCS 15 40Mhz short GI
>>>
>>> On laptop:
>>>         tx bitrate:     300.0 Mbit/s MCS 15 40Mhz short GI
>>
>> In non-lab conditions you generally don't lock into a rate. The
>> minstrel algorithm tries various strategies to get the packets
>> through, and you can get a grip on what's really happening by looking
>> at the rc_stats file for your particular device.
>>
>> Example here:
>>
>> http://www.bufferbloat.net/projects/cerowrt/wiki/Minstrel_Wireless_Rate_Selection
>>
>
> I looked at the rc_stats file by cd-ing into the stations dir on the router. After disabling/enabling the radio,
> the stations subdir was empty though:
> root@OpenWrt:~# ls /sys/kernel/debug/ieee80211/phy1/netdev\:sw10/stations/ -al
> drwxr-xr-x    2 root     root             0 Aug 25 10:28 .
> drwxr-xr-x    3 root     root             0 Aug 25 10:28 ..
>
> So unfortunately I'm without an rc_stats now (until I reboot the router probably?).
>
> Best regards,
> --Edwin

-- 
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
with fq_codel!"