[Cerowrt-devel] Managed to break 802.11n (on a 3800)

Sebastian Moeller moeller0 at gmx.de
Thu Jan 16 17:56:39 EST 2014


Hi Dave,

many thanks for all the information & elucidation, as always.

On Jan 16, 2014, at 23:30 , Dave Taht <dave.taht at gmail.com> wrote:

> On Thu, Jan 16, 2014 at 10:29 AM, Sebastian Moeller <moeller0 at gmx.de> wrote:
>> Hi Aaron,
>> 
>> On Jan 16, 2014, at 16:03 , Aaron Wood <woody77 at gmail.com> wrote:
>> 
>>> All,
>>> 
>>> I'm noting this here in case anyone is interested.  After I write this up, I'm going to start from scratch on the configuration, and factory-reset the router.
>>> 
>>> =====
>>> 
>>> The 5GHz radio on my 3800 seems to be in a very odd state.  I'm not quite sure what state it's in, but it seems to be only doing HT20 1x1.  And in a fairly broken manner at that.
>>> 
>>> Running the rrul test (over wifi directly to the router as the netserver), tcp uploads were 25Mbps or so, but download was 5Mbps.
>> 
>>        This is with your Mac? Try rrul_noclassification; Mac OS X (at least 10.8) will not run RRUL fairly against a fast host. Why, I do not know… It always prioritizes the upload, as if it did not see/trust the downstream markings (heck, maybe it is so busy using all bandwidth for upstream that it literally never sees the markings on the downstream packets).
> 
> rrul with classification blows up 802.11e on all devices, everywhere.
> The VO and VI queues generally get all the bandwidth.
> Been saying that a while. VO and VI should be strictly admission
> controlled and are not, anywhere. All the queues fill
> and bad things happen. What should happen in a 802.11n world is that a
> set of packets should wind up in the best queue for the TXOP, and VO
> used not at all.
> 
> rrul_noclassification looks more like what classification was
> intended for in 802.11e, and thus works better. There are a couple
> of other tests in the netperf-wrapper suite that don't use
> classification at all, that might be saner to use.
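
A rough sketch of why marked flows end up in the VI and VO queues, assuming the driver derives the 802.1d user priority from the top three bits of the TOS byte and then applies the usual WMM mapping. The helper name `tos_to_access_category` is made up for illustration, and the DSCP values in the loop are just examples, not a statement of what rrul actually sends.

```python
# WMM mapping from 802.1d user priority (0-7) to 802.11e access category.
WMM_AC_BY_USER_PRIORITY = {
    1: "BK", 2: "BK",   # background
    0: "BE", 3: "BE",   # best effort
    4: "VI", 5: "VI",   # video
    6: "VO", 7: "VO",   # voice
}


def tos_to_access_category(tos: int) -> str:
    """Map an IP TOS byte to an access category.

    Assumption: user priority = top three bits of the TOS byte.
    A given driver or a custom QoS map may do something else.
    """
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in one byte")
    user_priority = tos >> 5
    return WMM_AC_BY_USER_PRIORITY[user_priority]


if __name__ == "__main__":
    # Example DSCP code points; the TOS byte is the DSCP shifted left by 2.
    for name, dscp in (("CS0", 0), ("CS1", 8), ("CS5", 40), ("EF", 46), ("CS6", 48)):
        tos = dscp << 2
        print(f"{name}: tos=0x{tos:02x} -> {tos_to_access_category(tos)}")
```

Under that assumption anything marked CS4 or higher leaves the best-effort queue, which fits the observation that VI and VO take the bandwidth when nothing does admission control.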

	Ah, so in rrul_noclassification the UDP flows are still tos marked (at least that is what the plots report and show), but even using tcp_bidirectional I see a crazy imbalance of 80:1. So this laptop's Broadcom BCM43xx (Apple is not as informative as I would like about the components, but the firmware marker points at Broadcom, I would say) isn't better than the Intel wifi in yours…


> 
> lastly, if you are doing a test over the internet, many providers pee
> on the tos bits. Unless you've done a packet capture, you can't trust
> that you are actually seeing classified packets coming back from the
> internet.

	Good point. Comparing the local rrul plots with the ones to demo, I see what you mean: a tiny bit of the priority classes is visible in the uplink (but barely) and none at all in the downlink, so my ISP does not think too much of the tos bits. (I guess the tos effect on the uplink comes from what cero is doing; since cero controls the bottleneck, some "imprint" remains visible at packet reception time at demo, or so I think...)

> 
> One of the things I hope to fix with the twd effort is to detect tos
> bit preservation and note it in the test.
> 
> I'm delighted y'all are seeing these results for yourselves. Getting
> dinged on bandwidth by the public after aiming for low latency is not
> something I'd wanted to happen with a "stable" release. Regrettably,
> only Felix is working on fixing the drivers, in his spare time, and
> I've been trying to clear my plate for months to help do the delicate
> rework required (or recruit others to help).

	I would love to help, but this is far out of my league and area of expertise…

best
	Sebastian

> 
> 
>> About the other issue I do not know anything…
>> 
>> Best Regards
>>        Sebastian
>> 
>>> This is me 1-2 meters from the router.  Load was never more than 0.33.  (I can share the results if people are interested).
>>> 
>>> After a full power cycle, wifi isn't coming up at all.
>>> 
>>> =====
>>> 
>>> How I got here:
>>> 
>>> 
>>> I'm in France, and had dutifully set my unit with the FR country code when setting up CeroWRT.  I had noticed some odd latencies (periodic 100-200ms latency every 10-20 seconds over wifi) on the 5GHz network.  The router was on channel 36, and I wanted to move it up to the far-upper ranges, so I tried to specify a "custom" channel to do so (140).  This was the channel I thought I had been using with stock (Netgear) firmware.
>>> 
>>> Wifi didn't come back up after applying the changes, and the luci interface seemed to be tripping up over stuff that it was reading out of the configuration files.
>>> 
>>> I ssh'd in via ethernet, and fixed up the configurations by hand.
>>> 
>>> Except the driver is still reporting that the 5GHz network won't kick into 802.11n modes, and won't use HT40.  It seems to be sure it's configured for it, but isn't using it.
>>> 
>>> Further, digging into the rc_stats files with the minstrel speeds, I found some very odd data (not what I was expecting to see):
>>> 
>>> (laptop, which can do 2x2 HT40)
>>> rate      throughput  ewma prob  this prob  this succ/attempt   success    attempts
>>>   D   6         6.0       99.9      100.0             2(  2)        65          65
>>>       9         0.0        0.0        0.0             0(  0)         0           0
>>>      12         2.9       25.0      100.0             0(  0)         1           1
>>>      18         4.3       25.0      100.0             0(  0)         1           1
>>>      24         5.6       25.0      100.0             0(  0)         1           1
>>> A   P 36        32.4       99.9      100.0             0(  0)        51          51
>>>  C   48        10.4       25.0      100.0             0(  0)         1           1
>>> B    54        11.5       25.0      100.0             0(  0)         1           1
>>> 
>>> Total packet count::    ideal 53      lookaround 7
>>> 
>>> (AppleTV, 1x1 HT20)
>>> root at cerowrt:/sys/kernel/debug/ieee80211/phy1/netdev:sw10# cat stations/58\:55\:ca\:51\:b5\:4b/rc_stats
>>> rate      throughput  ewma prob  this prob  this succ/attempt   success    attempts
>>>       6         3.5       57.8      100.0             0(  0)         6           6
>>>       9         3.9       43.7      100.0             0(  0)         2           2
>>>      12         5.1       43.7      100.0             0(  0)         2           2
>>>      18        10.0       57.8      100.0             0(  0)         3           3
>>>   D  24        13.1       57.8      100.0             0(  0)         3           3
>>>  C   36        14.2       43.7      100.0             0(  0)         2           2
>>> B    48        18.2       43.7      100.0             0(  0)         2           2
>>> A   P 54        46.2       99.9      100.0             1(  1)       348         367
>>> 
> 
> No AMPDUs. Hmm. Might be a bug.
> 
>>> Total packet count::    ideal 331      lookaround 37
> 
> Hmm. The radios are set for HT20 for the 2.4GHz and HT40+ for the
> 5GHz. I note that for HT40 in wireless-n the 8 channels used up
> need to be contiguous.
> 
> HT40+ is 36+40, and 44+48 for example. You can't do 40+44.
> 
> Availability of HTXX is dependent upon your regulatory domain.
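
The pairing rule above can be written down as a small lookup. This is only a sketch: the pair table is the commonly used 5GHz 40MHz channel layout, the function name `ht40_secondary` is invented for illustration, and nothing here models what a particular regulatory domain actually permits.

```python
# Commonly used 5 GHz 40 MHz pairs as (lower, upper) 20 MHz channel numbers.
# Assumption: regulatory limits are NOT modelled; a listed pair may still be
# forbidden in a given country.
HT40_PAIRS_5GHZ = [
    (36, 40), (44, 48), (52, 56), (60, 64),
    (100, 104), (108, 112), (116, 120), (124, 128),
    (132, 136), (140, 144), (149, 153), (157, 161),
]


def ht40_secondary(primary: int, mode: str):
    """Return the secondary channel for a primary channel and HT40 mode.

    mode is "HT40+" (secondary above) or "HT40-" (secondary below).
    Returns None when the combination does not fall on a valid pair.
    """
    if mode not in ("HT40+", "HT40-"):
        raise ValueError("mode must be 'HT40+' or 'HT40-'")
    for lower, upper in HT40_PAIRS_5GHZ:
        if mode == "HT40+" and primary == lower:
            return upper
        if mode == "HT40-" and primary == upper:
            return lower
    return None


if __name__ == "__main__":
    for channel, mode in ((36, "HT40+"), (40, "HT40+"), (48, "HT40+"), (48, "HT40-")):
        print(channel, mode, "->", ht40_secondary(channel, mode))
```

If the table is right, channel 48 (as in the radio1 block quoted further down) only pairs downward, so it would need HT40- rather than HT40+. Channel 140 would pair upward with 144, and whether that is allowed depends on the regulatory domain.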
> 
>>> Whereas what I'm seeing for the 2.4GHz radio is:
>>> 
>>> root at cerowrt:/sys/kernel/debug/ieee80211/phy0/netdev:sw00/stations# cat 10\:9a\:dd\:30\:96\:34/rc_stats
>>> type         rate     throughput  ewma prob   this prob  retry   this succ/attempt   success    attempts
>>> CCK/LP        1.0M           0.7      100.0       100.0      0              0(  0)         2           2
>>> CCK/SP        2.0M           0.0        0.0         0.0      0              0(  0)         0           0
>>> CCK/SP        5.5M           0.0        0.0         0.0      0              0(  0)         0           0
>>> CCK/SP       11.0M           0.0        0.0         0.0      0              0(  0)         0           0
>>> HT20/LGI     MCS0            5.6      100.0       100.0      1              0(  0)         2           2
>>> HT20/LGI     MCS1            0.0        0.0         0.0      0              0(  0)         0           0
>>> HT20/LGI     MCS2            0.0        0.0         0.0      0              0(  0)         0           0
>>> HT20/LGI     MCS3            0.0        0.0         0.0      0              0(  0)         0           0
>>> HT20/LGI     MCS4            0.0        0.0         0.0      0              0(  0)         0           0
>>> HT20/LGI     MCS5           30.3      100.0       100.0      5              0(  0)         1           1
>>> HT20/LGI  t  MCS6           32.5      100.0       100.0      5              0(  0)        11          11
>>> HT20/LGI T P MCS7           35.0      100.0       100.0      5              6(  6)        34          34
>>> 
>>> Total packet count::    ideal 45      lookaround 3
>>> Average A-MPDU length: 1.3
> 
> You are doing good at the highest possible rate. However packet
> aggregation is pretty terrible.
> 
>>> 
>>> And here are radio blocks from the current /etc/config/wireless:
>>> 
>>> config wifi-device 'radio1'
>>>      option type 'mac80211'
>>>      option macaddr '28:c6:8e:bb:9a:49'
>>>      list ht_capab 'SHORT-GI-40'
>>>      list ht_capab 'TX-STBC'
>>>      list ht_capab 'RX-STBC1'
>>>      list ht_capab 'DSSS_CCK-40'
>>>      option txpower '17'
>>>      option distance '25'
>>>      option channel '48'
>>>      option country 'US'
>>> 
>>> config wifi-device 'radio0'
>>>      option type 'mac80211'
>>>      option hwmode '11ng'
>>>      option macaddr '28:c6:8e:bb:9a:47'
>>>      option htmode 'HT20'
>>>      list ht_capab 'SHORT-GI-40'
>>>      list ht_capab 'TX-STBC'
>>>      list ht_capab 'RX-STBC1'
>>>      list ht_capab 'DSSS_CCK-40'
>>>      option txpower '26'
>>>      option country 'FR'
>>>      option distance '15'
>>>      option channel 'auto'
> 
> I don't know anyone that has fiddled with distance to such an extent.
> Your country codes need to be the same and you should look at what
> is allowed in FR.
> 
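
A quick way to spot the country-code mismatch is to scan the wifi-device sections of the config. The sketch below is a minimal, hypothetical checker: it only understands the single-quoted `config`/`option` lines seen in the blocks quoted above (real UCI syntax is more permissive), and the function names are made up for illustration. The sample text is trimmed from the posted config.

```python
import re

# Trimmed from the radio blocks quoted above.
SAMPLE = """
config wifi-device 'radio1'
     option type 'mac80211'
     option channel '48'
     option country 'US'

config wifi-device 'radio0'
     option type 'mac80211'
     option htmode 'HT20'
     option country 'FR'
     option channel 'auto'
"""


def countries_by_radio(text: str) -> dict:
    """Return {radio name: country code or None} for each wifi-device section."""
    radios = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        section = re.match(r"config\s+wifi-device\s+'([^']+)'", line)
        if section:
            current = section.group(1)
            radios[current] = None
            continue
        if line.startswith("config "):
            current = None  # some other section type, ignore its options
            continue
        country = re.match(r"option\s+country\s+'([^']+)'", line)
        if country and current is not None:
            radios[current] = country.group(1)
    return radios


def country_mismatch(text: str) -> bool:
    """True when the radios do not all share one country setting."""
    return len(set(countries_by_radio(text).values())) > 1


if __name__ == "__main__":
    print(countries_by_radio(SAMPLE))
    print("mismatch:", country_mismatch(SAMPLE))
```

On a router one would feed it the contents of /etc/config/wireless instead of the sample string.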
>>> ======
>>> 
>>> Some notes after having repaired the situation:
>>> 
>>> - The pci paths to the radios were missing from /etc/config/wireless, that's the only thing that I saw that seemed grossly out of place.
>>> 
>>> - Back up and running, and yes, it's much happier, now.  Over wifi I get 60-70Mbps upload and ~40Mbps download (running rrul).  Latency sucks.  Wifi has some ugly bufferbloat.  (although these results are somewhat in question when the router has a 1m load average over 5.0...)
> 
> Trying to measure the one-way delay here is important (and hard: the
> only tool I've found for it so far was owamp, so I'm trying to write
> that test in twd). A TON of your delay is coming from your client. A
> network connection is like a fountain, or a toilet: both sides of the
> flow count...
> 
>>> 
>>> - Enabling all the SQM features I had previously also considerably cleaned up wifi performance.  It's more balanced, but still not nearly as balanced as I see on gigabit ethernet.
>>> 
>>> 
>>> 
>>> -Aaron
>>> _______________________________________________
>>> Cerowrt-devel mailing list
>>> Cerowrt-devel at lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>> 
> 
> 
> 
> -- 
> Dave Täht
> 
> Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tcp_bidirectional_hms-beagle_2_cerowrt.png
Type: image/png
Size: 38902 bytes
Desc: not available
URL: <https://lists.bufferbloat.net/pipermail/cerowrt-devel/attachments/20140116/ad5e09d1/attachment-0002.png>

