[Bloat] [Cerowrt-devel] cerowrt 3.3.8-17: nice latency improvements, some issues with bind

Török Edwin edwin+ml-cerowrt at etorok.net
Sat Aug 25 09:56:04 EDT 2012


On 08/18/2012 08:07 PM, Dave Taht wrote:
> Thx again for the benchmarks on your hardware! Can I get you to go one
> more time to the well?

Yes, but you have to wait until I have some time to do it.

> Stripping out the incremental steps some will save you some time
> on benchmarking, so let's go with 3,4,12,35,100. Wireless data is
> incredibly noisy and I usually end up going with cdf plots like this
> old one
>> To get twice the speed a qlen=11 is enough already, and to get all the speed back a qlen=35 is needed.
> 
> This is an incomplete conclusion. It is incomplete in that A) these
> tests were done under laboratory conditions at the highest data rate
> (MCS15), and B), it was with a single point to point link to an AP
> which normally would be handling more than one client. C) it tests a
> single full throttle TCP stream when typical websites and usage
> involve 70+ dns lookups and 70 separate short streams.
> 
> I can live with B and C) for now, although I note that the chrome
> benchmark while doing a full blown stream test as you are doing now in
> the background and ping is quite useful for looking at C. Let's tackle
> A...
> 
>>
>> And here are the results with fq_codel on the laptop too (just nttcp -t as that's the one affected):
>>
>> fq_codel on laptop, cerowrt defaults,  nttcp -t:  1.248/12.960/108.490/16.733 ms; 90 Mbps
>> fq_codel on laptop, cerowrt qlen_*=4,  nttcp -t:  1.205/10.843/ 76.983/12.460 ms; 105 Mbps
>> fq_codel on laptop, cerowrt qlen_*=8,  nttcp -t:  4.034/16.088/ 98.611/17.050 ms; 120 Mbps
>> fq_codel on laptop, cerowrt qlen_*=11, nttcp -t:  3.766/15.687/ 56.684/11.135 ms; 114 Mbps
>> fq_codel on laptop, cerowrt qlen_*=35, nttcp -t: 11.360/26.742/ 48.051/ 7.489 ms; 113 Mbps
> 
> So, if you could move your laptop to where it gets MCS4 on a fairly
> reliable basis, and repeat the tests? a wall or three will do it.

I've put my laptop in a place where I get MCS4 on TX most of the time.
RX is MCS4 most of the time too, but it keeps switching to MCS5, 7, 11, 12 and back
to MCS4 quite a lot.

> please don't change your kernel out before trying that test... (and I
> make no warranties about the reliability/usefulness of a rc2!)

Here are the results with fq_codel on the laptop, and the same 3.5.0 kernel:

qlen 100, nttcp -t:  5.966/57.104/192.017/26.674 ms; 52.2376 Mbps
qlen  35, nttcp -t: 15.636/54.823/108.921/19.762 ms; 52.4675 Mbps
qlen  12, nttcp -t:  4.768/29.439/132.924/27.159 ms; 51.2619 Mbps
qlen   4, nttcp -t:  2.631/20.500/152.741/31.549 ms; 40.3949 Mbps
qlen def, nttcp -t:  2.010/21.851/317.085/49.323 ms; 35.8268 Mbps

qlen 100, nttcp -r: 23.225/44.101/142.835/21.181 ms; 36.6789 Mbps
qlen  35, nttcp -r:  3.755/23.413/ 83.530/15.329 ms; 35.4602 Mbps
qlen  12, nttcp -r:  4.318/10.251/ 96.773/12.008 ms; 31.1557 Mbps
qlen   4, nttcp -r:  2.733/ 4.507/ 16.353/ 1.917 ms; 24.6688 Mbps
qlen def, nttcp -r:  2.119/ 4.999/ 64.968/ 7.275 ms; 27.3645 Mbps

Note that the laptop was on battery this time, so that may add some jitter
(CPU freq switching, wifi power saving?), but it shouldn't matter for >10 ms quantities.

Looks like the iwl4965 is somewhat bloated, with those 100ms+ latencies.

I don't know what happened there, but with the default qlen (2,3,3,3) I get a 317 ms max latency on TX,
whereas with qlen 4 I get about 153 ms. The average is also slightly better with qlen 4 (20.5 vs 21.9 ms).
The same holds on the RX side (max 16 ms vs 65 ms, average 4.5 ms vs 5.0 ms).
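
For anyone who wants to compare the runs without eyeballing the columns, here is a minimal Python sketch that parses the result lines above and picks the qlen with the lowest average and maximum latency per direction. It assumes the four slash-separated figures are ping's min/avg/max/mdev; the names RESULTS, parse and best are just illustrative helpers, not part of any tool.

```python
import re

# Result lines copied from the MCS4 runs in this mail.
# Assumption: the four slash-separated numbers are ping min/avg/max/mdev in ms.
RESULTS = """\
qlen 100, nttcp -t:  5.966/57.104/192.017/26.674 ms; 52.2376 Mbps
qlen  35, nttcp -t: 15.636/54.823/108.921/19.762 ms; 52.4675 Mbps
qlen  12, nttcp -t:  4.768/29.439/132.924/27.159 ms; 51.2619 Mbps
qlen   4, nttcp -t:  2.631/20.500/152.741/31.549 ms; 40.3949 Mbps
qlen def, nttcp -t:  2.010/21.851/317.085/49.323 ms; 35.8268 Mbps
qlen 100, nttcp -r: 23.225/44.101/142.835/21.181 ms; 36.6789 Mbps
qlen  35, nttcp -r:  3.755/23.413/ 83.530/15.329 ms; 35.4602 Mbps
qlen  12, nttcp -r:  4.318/10.251/ 96.773/12.008 ms; 31.1557 Mbps
qlen   4, nttcp -r:  2.733/ 4.507/ 16.353/ 1.917 ms; 24.6688 Mbps
qlen def, nttcp -r:  2.119/ 4.999/ 64.968/ 7.275 ms; 27.3645 Mbps
"""

LINE = re.compile(
    r"qlen\s+(\w+),\s+nttcp\s+(-[tr]):\s*"
    r"([\d.]+)/\s*([\d.]+)/\s*([\d.]+)/\s*([\d.]+) ms;\s*([\d.]+) Mbps"
)


def parse(text):
    """Turn each result line into a dict of qlen, direction and numeric fields."""
    rows = []
    for m in LINE.finditer(text):
        qlen, direction, *nums = m.groups()
        lo, avg, hi, mdev, mbps = map(float, nums)
        rows.append({"qlen": qlen, "dir": direction, "min": lo, "avg": avg,
                     "max": hi, "mdev": mdev, "mbps": mbps})
    return rows


def best(rows, direction, key):
    """Return the row with the smallest value of `key` for one direction."""
    return min((r for r in rows if r["dir"] == direction), key=lambda r: r[key])


rows = parse(RESULTS)
for direction in ("-t", "-r"):
    for key in ("avg", "max"):
        r = best(rows, direction, key)
        print(f"nttcp {direction}: lowest {key} latency at qlen {r['qlen']} "
              f"({r[key]} ms, {r['mbps']} Mbps)")
```

This only ranks single runs, so given how noisy wireless data is it should be read as a rough comparison, not a conclusion.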

> 
> I will predict several things:
> 
> 1) the bulk of the buffering problem is going to move to your laptop,
> as it has weaker antennas than the wndrs. Most likely you will end up
> with tx on the one side higher than rx on the other.

Yes, the laptop TX latencies are worse.

> 
> 2) you will see much higher jitter and latency and much lower
> throughput. Your results will also get wildly more variable run to
> run. (I tend to run tests for 2 minutes or longer and toss out the
> first few seconds)

On TX it is quite consistently in MCS4 (according to watch iw wlan0 station dump),
but on RX it's jumping around quite a lot.
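
Instead of watching the dump by eye, the MCS index could be pulled out of saved station dumps and tallied. The following is a rough Python sketch, assuming the bitrate lines look like the "tx bitrate: 300.0 MBit/s MCS 15 40Mhz short GI" lines quoted elsewhere in this thread; mcs_indices and tally are hypothetical helper names, and how the dumps get collected (e.g. saving the iw output once per second) is left open.

```python
import re
from collections import Counter

# Matches lines such as "tx bitrate:     300.0 MBit/s MCS 15 40Mhz short GI".
# The MCS part is optional because legacy rates carry no MCS index.
BITRATE = re.compile(
    r"^\s*(tx|rx) bitrate:\s+[\d.]+ MBit/s(?:\s+MCS (\d+))?",
    re.IGNORECASE | re.MULTILINE,
)


def mcs_indices(dump_text):
    """Return e.g. {'tx': 15, 'rx': 15}; the value is None when no MCS is shown."""
    found = {}
    for direction, mcs in BITRATE.findall(dump_text):
        found[direction.lower()] = int(mcs) if mcs else None
    return found


def tally(dumps, direction="rx"):
    """Count how often each MCS index appears across a list of saved dumps."""
    return Counter(mcs_indices(d).get(direction) for d in dumps)


# Sample taken from the station dump lines quoted in this thread.
SAMPLE = """\
        tx bitrate:     300.0 MBit/s MCS 15 40Mhz short GI
        rx bitrate:     300.0 MBit/s MCS 15 40Mhz short GI
"""

print(mcs_indices(SAMPLE))
print(tally([SAMPLE, SAMPLE], direction="rx"))
```

A tally over a two-minute run would show how much of the time RX actually spends at MCS4 versus the higher rates.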

> 
> 3) The lower fixed buffering sizes on cero's qlens will start making a
> lot more sense, but it may be hard to see due to 1 and 2.

qlen 12 and 4 look good. The default looks worse though.

> 
> The thing I don't honestly know is how well fq_codel reacts to sudden
> bandwidth changes when the underlying device driver (the iwl in this
> case) is overbuffered or how well codel's target idea really works in
> the wifi case in general. It would be nice to have some data on it.
> (hint, hint)

The bandwidth varies quite a lot on RX even if both the laptop and router
are perfectly still. So the -r numbers above should be what you are looking for.
If you want some other data, let me know.

> 
> Some work was done on debloating the iwl last year, I don't know if
> any of the work made it into mainline.
> 
> Lastly, I put a version of Linux 3.6-rc2 up here.
> 
> http://snapon.lab.bufferbloat.net/~cero1/deb/
> 
> It has a fix to codel in it that was needed (I think but have not
> checked to see if it's in 3.5.1), and it also incorporates "TCP small
> queues", which reduces tcp-related buffering in pfifo_fast enormously,
> and helps on other qdiscs as well. Switching to it will invalidate the
> testing you've done so far...

I assume these are in the upstream 3.6-rc3 too, right?

Here is just one measurement done with 3.6-rc3 on the laptop and fq_codel
(same location as above tests, approx MCS4):
qlen def, nttcp -t:  2.871/15.655/375.777/44.212 ms; 35.2776 Mbps
qlen def, nttcp -r:  1.406/ 3.434/ 12.763/ 1.649 ms; 24.3334 Mbps

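It looks somewhat better: the average latency is lower in both directions
(15.7 vs 21.9 ms on TX, 3.4 vs 5.0 ms on RX), although the TX max is still high.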

> 
> (another reason why I'm reluctant to post graphs on codel/fq_codel
> right now is that good stuff keeps happening above/below it in Linux),
> 
> 
> 
>> Shouldn't wireless N be able to do 200 - 300 Mbps though? If I enable debugging in iwl4965 I see that it
>> starts TX aggregation, so not sure what's wrong (router or laptop?). With encryption off I can get at most 160 Mbps.
> 
> A UDP test will get you in the 270Mbit range usually.

nttcp -T -u -D -n2000 gives ~180 Mbps at most, and with -r I can't make sense of it (looks like most gets dropped):
     Bytes  Real s   CPU s Real-MBit/s  CPU-MBit/s   Calls  Real-C/s   CPU-C/s
l    16384    0.08    0.00      1.6090  13107.2000       5     61.38  500000.0
1  8192000    0.08    0.04    845.8113   1820.6973    2003  25850.83   55646.6
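
To put a number on "most gets dropped", the byte counts in the two rows can be compared directly. This small Python sketch assumes the "l" row is the local (receiving) side and the "1" row is the remote (sending) side, and that -n2000 means 2000 buffers were sent; both are assumptions about how to read the output, not something verified here.

```python
# Byte counts copied from the nttcp -r -u output above.
# Assumption: "l" row = local receiver, "1" row = remote sender.
local_bytes = 16384
remote_bytes = 8192000
buffers_sent = 2000  # from -n2000, assuming -n is the number of buffers

buffer_size = remote_bytes // buffers_sent      # bytes per buffer
buffers_received = local_bytes // buffer_size   # whole buffers that arrived
delivered = local_bytes / remote_bytes          # fraction of bytes that arrived

print(f"buffer size: {buffer_size} bytes")
print(f"received {buffers_received} of {buffers_sent} buffers "
      f"({delivered:.1%} delivered)")
```

Read that way, only a handful of buffers arrived, so the 845 Mbit/s figure on the sending row says little about what actually crossed the air.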

> 
>>
>> iw dev sw10 station dump shows:
>> ...
>>         signal:         -56 [-60, -59] dBm
>>         signal avg:     -125 [-65, -58] dBm
>>         tx bitrate:     300.0 MBit/s MCS 15 40Mhz short GI
>>         rx bitrate:     300.0 MBit/s MCS 15 40Mhz short GI
>>
>> On laptop:
>>         tx bitrate:     300.0 Mbit/s MCS 15 40Mhz short GI
> 
> In non-lab conditions you generally don't lock into a rate. The
> minstrel algorithm tries various strategies to get the packets
> through, so you can
> get a grip on what's really happening by looking at the rc_stats file
> for your particular device.
> 
> example here:
> 
> 
> http://www.bufferbloat.net/projects/cerowrt/wiki/Minstrel_Wireless_Rate_Selection
> 

I looked at the rc_stats file by cd-ing into the stations dir on the router. After disabling/enabling the radio
the stations subdir was empty though:
root@OpenWrt:~# ls /sys/kernel/debug/ieee80211/phy1/netdev\:sw10/stations/ -al
drwxr-xr-x    2 root     root             0 Aug 25 10:28 .
drwxr-xr-x    3 root     root             0 Aug 25 10:28 ..

So unfortunately I'm without an rc_stats now (until I reboot the router probably?).

Best regards,
--Edwin


