[Cerowrt-devel] Managed to break 802.11n (on a 3800)

Dave Taht dave.taht at gmail.com
Thu Jan 16 15:12:38 PST 2014


On Thu, Jan 16, 2014 at 5:56 PM, Sebastian Moeller <moeller0 at gmx.de> wrote:

> Hi Dave,
>
> many thanks for all the information & elucidation, as always.
>

I enjoy trying to find the words to explain.


>
> On Jan 16, 2014, at 23:30 , Dave Taht <dave.taht at gmail.com> wrote:
>
> On Thu, Jan 16, 2014 at 10:29 AM, Sebastian Moeller <moeller0 at gmx.de>
> wrote:
>
> Hi Aaron,
>
> On Jan 16, 2014, at 16:03 , Aaron Wood <woody77 at gmail.com> wrote:
>
> All,
>
> I'm noting this here in case anyone is interested.  After I write this up,
> I'm going to start from scratch on the configuration, and factory-reset the
> router.
>
> =====
>
> The 5GHz radio on my 3800 seems to be in a very odd state.  I'm not quite
> sure what state it's in, but it seems to be only doing HT20 1x1.  And in a
> fairly broken manner at that.
>
> Running the rrul test (over wifi directly to the router as the netserver),
> tcp uploads were 25Mbps or so, but download was 5Mbps.
>
>
>        Is this with your Mac? Try rrul_noclassification; Mac OS X (at least
> 10.8) will not run RRUL fairly against a fast host. Why, I do not know… it
> always prioritizes the upload, as if it did not see/trust the downstream
> markings (heck, maybe it is so busy using all the bandwidth for upstream
> that it literally never sees the markings on the downstream packets..)
>
>
> rrul with classification blows up 802.11e on all devices, everywhere.
> The VO and VI queues generally get all the bandwidth.
> Been saying that a while. VO and VI should be strictly admission
> controlled and are not, anywhere. All the queues fill
> and bad things happen. What should happen in an 802.11n world is that a
> set of packets should wind up in the best queue for the TXOP, and VO
> used not at all.
>
> rrul_noclassification better matches how classification was intended
> to work with 802.11e, and thus works better. There are a couple of
> other tests in the netperf-wrapper suite that don't use classification
> at all, which might be saner to use.
>
>
> Ah, so in rrul_noclassification the UDP flows are still tos marked (at
> least that is what the plots report), but even using tcp_bidirectional I
> see a crazy 80:1 imbalance, so this laptop's Broadcom BCM43xx (Apple is
> not as informative as I would like about the components, but the firmware
> marker points at Broadcom) isn't any better than the Intel wifi in yours,
> I would say…
>

the iwl is a nightmare. the 802.11ac stuff is looking bad too.

Another issue with the current implementation of rrul is that my intent with
the specification was to test voip-like streams: an isochronous 10ms packet
in each direction.

The implementation currently sends measurement flows based on the RTT, just
like ping. As the RTT gets shorter, the amount of "space" used up by the
measurement flow gets bigger and bigger. At a 3ms RTT, just the EF
measurement flow eats ~2/3 of the available txops as it runs through the VO
queue, which is limited to a single packet per txop. The other measurement
flows, like the CS5 flow, eat the VI queue, and the BE and BK queues get
starved for txops.
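To make the scaling concrete, here is a small Python sketch. It assumes (hypothetically; this is not rrul's actual code) that each measurement flow keeps exactly one probe outstanding and sends the next one as soon as the reply arrives, so its packet rate is 1/RTT, and compares that with the fixed 10ms isochronous schedule the specification intended.

```python
# A minimal sketch, assuming an RTT-paced probe: the next probe goes out as
# soon as the previous reply arrives, one probe outstanding per flow.
# This illustrates the scaling only; it is not rrul's implementation.

ISOCHRONOUS_INTERVAL_S = 0.010  # the intended voip-like 10 ms spacing


def rtt_paced_rate(rtt_s: float) -> float:
    """Probes per second per direction when pacing follows the RTT."""
    return 1.0 / rtt_s


def isochronous_rate(interval_s: float = ISOCHRONOUS_INTERVAL_S) -> float:
    """Probes per second with a fixed send interval, independent of RTT."""
    return 1.0 / interval_s


if __name__ == "__main__":
    for rtt_ms in (100.0, 30.0, 10.0, 3.0):
        rtt = rtt_ms / 1000.0
        ratio = rtt_paced_rate(rtt) / isochronous_rate()
        print(f"RTT {rtt_ms:5.1f} ms: RTT-paced {rtt_paced_rate(rtt):6.1f} pkt/s, "
              f"isochronous {isochronous_rate():5.1f} pkt/s ({ratio:.1f}x)")
```

At a 3ms RTT this works out to roughly 333 probes/s per direction versus 100 for the 10ms schedule; how many txops that costs depends on the queue and its aggregation rules, which the sketch does not model.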

I can barely explain to myself how the queues are supposed to get airtime
scheduled (see the 802.11e page on Wikipedia). I thought 802.11e was a bad
idea in the first place... but what rrul does is try to get txops on all 4
queues, which means it needs 4x as much airtime (this is not accurate), and
grabs airtime for its VO queue first most of the time, followed by VI, BE,
and BK.
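For a rough sense of why VO and VI get to the air first, the sketch below uses the default EDCA parameter set from the 802.11 standard for a non-AP station on an OFDM PHY, assumed here for illustration; an AP can advertise different values, and retries and collisions are ignored. It estimates the mean wait before the first transmission attempt in each access category as AIFS plus half the minimum contention window.

```python
# Default 802.11 EDCA parameters for a non-AP station (OFDM PHY), assumed
# for illustration; an AP may advertise a different parameter set.
# Mean initial access wait ~= AIFS + (CWmin / 2) * slot,
# with AIFS = SIFS + AIFSN * slot. Retries and collisions are ignored.

SLOT_US = 9.0    # OFDM slot time (5 GHz)
SIFS_US = 16.0   # OFDM SIFS (5 GHz)

# access category: (AIFSN, CWmin, CWmax, TXOP limit in microseconds)
EDCA = {
    "AC_VO": (2, 3, 7, 1504),
    "AC_VI": (2, 7, 15, 3008),
    "AC_BE": (3, 15, 1023, 0),  # 0: no TXOP bursting, one exchange per access
    "AC_BK": (7, 15, 1023, 0),
}


def mean_access_delay_us(ac: str) -> float:
    """Average wait before the first attempt, assuming an idle medium."""
    aifsn, cwmin, _cwmax, _txop = EDCA[ac]
    aifs = SIFS_US + aifsn * SLOT_US
    # The backoff counter is drawn uniformly from [0, CWmin], mean CWmin / 2.
    return aifs + (cwmin / 2.0) * SLOT_US


if __name__ == "__main__":
    for ac in EDCA:
        print(f"{ac}: ~{mean_access_delay_us(ac):.1f} us before the first attempt")
```

With these numbers VO (47.5us) and VI (65.5us) will usually win contention against BE (110.5us) and BK (146.5us) whenever all four queues have traffic, which is the starvation pattern described above.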

I think for wifi testing with the current rrul test there needs to be a new
test that does everything in BE. (toke?)
Classification is very rarely used in the real world anyway.

Most of the usage of rrul to date has been over longer RTTs over
ethernet... (again, I'm delighted y'all are doing this,
and I do hope to get a more voip-like test)


>
>
>
> lastly, if you are doing a test over the internet, many providers pee
> on the tos bits. Unless you've done a packet capture, you can't trust
> that you are actually seeing classified packets coming back from the
> internet.
>
>
> Good point. Comparing just the local rrul plots with the ones to demo, I
> see what you mean: there is a tiny bit of the priority classes visible in
> the uplink (but barely) and none at all in the downlink, so my ISP does not
> think too much of the tos bits (I guess the tos effect on the uplink is
> from what cero is doing, and since cero controls the bottleneck some
> "imprint" remains to be seen at packet reception time at demo, or so I
> think...).
>

simple.qos respects 3 of the 4 tiers that wifi does.

simplest does not.


>
>
> One of the things I hope to fix with the twd effort is to detect tos
> bit preservation and note it in the test.
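A minimal sketch of the bookkeeping such a check might do, assuming the TOS byte of each test packet is known on both ends (for example from a packet capture at each side). The helper names are hypothetical and not part of twd or netperf-wrapper. Only the DSCP (upper six bits) is compared, because the lower two bits are ECN and may legitimately change in transit.

```python
from enum import Enum


class TosResult(Enum):
    PRESERVED = "preserved"
    BLEACHED = "bleached"   # DSCP zeroed somewhere on the path
    REMARKED = "remarked"   # DSCP changed to some other non-zero value


def dscp(tos_byte: int) -> int:
    """Upper six bits of the old TOS byte; the low two bits are ECN."""
    return (tos_byte >> 2) & 0x3F


def classify(sent_tos: int, received_tos: int) -> TosResult:
    """Compare the DSCP that was sent with the DSCP that arrived."""
    sent, got = dscp(sent_tos), dscp(received_tos)
    if sent == got:
        return TosResult.PRESERVED
    if got == 0:
        return TosResult.BLEACHED
    return TosResult.REMARKED


if __name__ == "__main__":
    EF = 0xB8  # DSCP 46 shifted into the TOS byte
    print(classify(EF, 0x00))  # a provider that zeroes the tos bits
    print(classify(EF, 0xB9))  # same DSCP, ECN bit set in transit
```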
>
>
> I'm delighted y'all are seeing these results for yourselves. Getting
> dinged by the public on bandwidth after aiming for low latency is not
> something I'd wanted to happen with a "stable" release. Regrettably,
> only Felix is working on fixing the drivers to work better, in his spare
> time, and I've been trying to clear my plate for months to help do the
> delicate rework required. (or recruit others to help)
>
>
> I would love to help, but this is far out of my league and area of
> expertise…
>
>
yer helping plenty, and the more people that "get this", the sooner people
will work on fixing it. I have enjoyed trying to explain these behaviors
today. Someday, once we have words that match the concepts, they will make
sense to a CTO.

I have been very pleased by googling for bufferbloat of late. Almost
everyone that has talked about it on the web for the past month seems to
get it.

So if we start now, and make this the year of "make-wifi-fast", in a couple
of years maybe the world will get it...

... sadly long after 802.11ac is fully deployed and messing up everything
for everybody.

> best
> Sebastian
>
>
>
> About the other issue I do not know anything…
>
> Best Regards
>        Sebastian
>
> This is me 1-2 meters from the router.  Load was never more than 0.33.  (I
> can share the results if people are interested).
>
> After a full power cycle, wifi isn't coming up at all.
>
> =====
>
> How I got here:
>
>
> I'm in France, and had dutifully set my unit with the FR country code when
> setting up CeroWRT.  I had noticed some odd latencies (periodic 100-200ms
> latency every 10-20 seconds over wifi) on the 5GHz network.  The router was
> on channel 36, and I wanted to move it up to the far-upper ranges, so I
> tried to specify a "custom" channel to do so (140).  This was the channel I
> thought I had been using with stock (Netgear) firmware.
>
> Wifi didn't come back up after applying the changes, and the luci
> interface seemed to be tripping up over stuff that it was reading out of
> the configuration files.
>
> I ssh'd in via ethernet, and fixed up the configurations by hand.
>
> Except that the driver still reports that the 5GHz network won't kick into
> 802.11n modes, and won't use HT40.  It seems to be sure it's configured for
> it, but isn't using it.
>
> Further, digging into the rc_stats files with the minstrel speeds, I found
> some very odd data (not what I was expecting to see):
>
> (laptop, which can do 2x2 HT40)
> rate      throughput  ewma prob  this prob  this succ/attempt   success    attempts
>   D   6         6.0       99.9      100.0             2(  2)        65          65
>       9         0.0        0.0        0.0             0(  0)         0           0
>      12         2.9       25.0      100.0             0(  0)         1           1
>      18         4.3       25.0      100.0             0(  0)         1           1
>      24         5.6       25.0      100.0             0(  0)         1           1
> A   P 36        32.4       99.9      100.0             0(  0)        51          51
>  C   48        10.4       25.0      100.0             0(  0)         1           1
> B    54        11.5       25.0      100.0             0(  0)         1           1
>
> Total packet count::    ideal 53      lookaround 7
>
> (AppleTV, 1x1 HT20)
> root@cerowrt:/sys/kernel/debug/ieee80211/phy1/netdev:sw10# cat stations/58\:55\:ca\:51\:b5\:4b/rc_stats
> rate      throughput  ewma prob  this prob  this succ/attempt   success    attempts
>       6         3.5       57.8      100.0             0(  0)         6           6
>       9         3.9       43.7      100.0             0(  0)         2           2
>      12         5.1       43.7      100.0             0(  0)         2           2
>      18        10.0       57.8      100.0             0(  0)         3           3
>   D  24        13.1       57.8      100.0             0(  0)         3           3
>  C   36        14.2       43.7      100.0             0(  0)         2           2
> B    48        18.2       43.7      100.0             0(  0)         2           2
> A   P 54        46.2       99.9      100.0             1(  1)       348         367
>
>
> No AMPDUs. Hmm. Might be a bug.
>
> Total packet count::    ideal 331      lookaround 37
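The odd thing about the laptop dump is that a 2x2 HT40-capable client shows only the legacy OFDM rates (6-54 Mbit/s) and no MCS rows at all. A rough Python sketch of a check for that, assuming the column layouts of the dumps quoted in this thread (minstrel rows without a retry column, minstrel_ht rows with one); the function name is hypothetical, and other kernel versions may print different columns.

```python
import re
from typing import List, Optional, Tuple

# Row shapes assumed from the rc_stats dumps quoted in this thread.
# Legacy minstrel: [flags] rate throughput ewma this succ(att) success attempts
LEGACY_ROW = re.compile(
    r"(\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)"
    r"\s+\d+\(\s*\d+\)\s+\d+\s+\d+\s*$")
# minstrel_ht: type [flags] MCSn throughput ewma this retry succ(att) success attempts
HT_ROW = re.compile(
    r"(MCS\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)"
    r"\s+\d+\s+\d+\(\s*\d+\)\s+\d+\s+\d+\s*$")


def summarize(rc_stats: str) -> Tuple[bool, Optional[Tuple[str, float]]]:
    """Return (any HT/MCS rates seen?, (rate label, throughput) of the best row)."""
    rows: List[Tuple[str, float]] = []
    uses_ht = False
    for line in rc_stats.splitlines():
        m = HT_ROW.search(line)
        if m:
            uses_ht = True
            rows.append((m.group(1), float(m.group(2))))
            continue
        m = LEGACY_ROW.search(line)
        if m:
            rows.append((m.group(1), float(m.group(2))))  # legacy rate in Mbit/s
    best = max(rows, key=lambda r: r[1]) if rows else None
    return uses_ht, best
```

Fed the laptop's table, this reports no HT rates and 36 Mbit/s as the best legacy rate, which is the symptom Aaron describes.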
>
>
> Hmm. The radios are set for HT20 for the 2.4ghz and HT40+ for the
> 5ghz. I note that for HT40 in wireless-n, the 8 channel numbers used up
> (two adjacent 20MHz channels) need to be aligned on fixed pairs.
>
> HT40+ is 36+40, and 44+48 for example. You can't do 40+44.
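A sketch of the pairing rule in Python, assuming the standard 5GHz HT40 channel pairs. Whether a given pair is usable still depends on the regulatory domain (channel 144, for instance, is not permitted everywhere). The helper name is hypothetical.

```python
from typing import Optional

# 5 GHz HT40 pairs: (lower 20 MHz channel, upper 20 MHz channel).
# Regulatory rules decide which of these are actually allowed.
HT40_PAIRS = [(36, 40), (44, 48), (52, 56), (60, 64),
              (100, 104), (108, 112), (116, 120), (124, 128),
              (132, 136), (140, 144), (149, 153), (157, 161)]


def secondary_channel(primary: int, mode: str) -> Optional[int]:
    """Secondary channel for HT40+ / HT40-, or None if the primary channel
    is not on the matching side of one of the fixed pairs."""
    for low, high in HT40_PAIRS:
        if mode == "HT40+" and primary == low:
            return high
        if mode == "HT40-" and primary == high:
            return low
    return None


if __name__ == "__main__":
    for primary, mode in ((36, "HT40+"), (40, "HT40+"), (48, "HT40+"), (48, "HT40-")):
        print(primary, mode, "->", secondary_channel(primary, mode))
```

Under this rule channel 48 only works as HT40- (paired with 44), and the custom channel 140 would need 144 as its HT40+ partner.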
>
> Availability of HTXX is dependent upon your regulatory domain.
>
> Whereas what I'm seeing for the 2.4GHz radio is:
>
> root@cerowrt:/sys/kernel/debug/ieee80211/phy0/netdev:sw00/stations# cat 10\:9a\:dd\:30\:96\:34/rc_stats
> type         rate     throughput  ewma prob   this prob  retry   this succ/attempt   success    attempts
> CCK/LP        1.0M           0.7      100.0       100.0      0             0(  0)         2           2
> CCK/SP        2.0M           0.0        0.0         0.0      0             0(  0)         0           0
> CCK/SP        5.5M           0.0        0.0         0.0      0             0(  0)         0           0
> CCK/SP       11.0M           0.0        0.0         0.0      0             0(  0)         0           0
> HT20/LGI     MCS0            5.6      100.0       100.0      1             0(  0)         2           2
> HT20/LGI     MCS1            0.0        0.0         0.0      0             0(  0)         0           0
> HT20/LGI     MCS2            0.0        0.0         0.0      0             0(  0)         0           0
> HT20/LGI     MCS3            0.0        0.0         0.0      0             0(  0)         0           0
> HT20/LGI     MCS4            0.0        0.0         0.0      0             0(  0)         0           0
> HT20/LGI     MCS5           30.3      100.0       100.0      5             0(  0)         1           1
> HT20/LGI  t  MCS6           32.5      100.0       100.0      5             0(  0)        11          11
> HT20/LGI T P MCS7           35.0      100.0       100.0      5             6(  6)        34          34
>
> Total packet count::    ideal 45      lookaround 3
> Average A-MPDU length: 1.3
>
>
> You are doing good at the highest possible rate. However packet
> aggregation is pretty terrible.
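To see why an average A-MPDU length of 1.3 hurts, here is a back-of-the-envelope Python sketch. The 65 Mbit/s PHY rate is the standard HT20 MCS7 long-guard-interval rate; the 150us fixed per-access overhead (contention, preamble, SIFS, block ack) is an assumed round number for illustration only, and per-subframe delimiters and padding are ignored.

```python
# Hypothetical airtime arithmetic; OVERHEAD_US is an assumption, not a measurement.
PHY_RATE_MBPS = 65.0   # HT20, MCS7, long guard interval
FRAME_BYTES = 1500     # full-size frame
OVERHEAD_US = 150.0    # assumed fixed cost per channel access


def payload_time_us(frame_bytes: int = FRAME_BYTES,
                    rate_mbps: float = PHY_RATE_MBPS) -> float:
    """Airtime of one frame's payload: bits / (Mbit/s) gives microseconds."""
    return frame_bytes * 8 / rate_mbps


def efficiency(ampdu_len: float) -> float:
    """Fraction of airtime carrying payload when each access sends ampdu_len frames."""
    data = ampdu_len * payload_time_us()
    return data / (data + OVERHEAD_US)


if __name__ == "__main__":
    for n in (1.0, 1.3, 4.0, 16.0, 32.0):
        eff = efficiency(n)
        print(f"A-MPDU length {n:4.1f}: {eff:5.1%} of airtime useful, "
              f"~{eff * PHY_RATE_MBPS:4.1f} Mbit/s")
```

The exact numbers move with the overhead assumption, but the shape does not: the fixed per-access cost is paid once per A-MPDU, so short aggregates waste a large share of the airtime.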
>
>
> And here are radio blocks from the current /etc/config/wireless:
>
> config wifi-device 'radio1'
>      option type 'mac80211'
>      option macaddr '28:c6:8e:bb:9a:49'
>      list ht_capab 'SHORT-GI-40'
>      list ht_capab 'TX-STBC'
>      list ht_capab 'RX-STBC1'
>      list ht_capab 'DSSS_CCK-40'
>      option txpower '17'
>      option distance '25'
>      option channel '48'
>      option country 'US'
>
> config wifi-device 'radio0'
>      option type 'mac80211'
>      option hwmode '11ng'
>      option macaddr '28:c6:8e:bb:9a:47'
>      option htmode 'HT20'
>      list ht_capab 'SHORT-GI-40'
>      list ht_capab 'TX-STBC'
>      list ht_capab 'RX-STBC1'
>      list ht_capab 'DSSS_CCK-40'
>      option txpower '26'
>      option country 'FR'
>      option distance '15'
>      option channel 'auto'
>
>
> I don't know anyone who has fiddled with distance to such an extent.
> Your country codes need to be the same, and you should look at what
> is allowed in FR.
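A minimal sketch of a sanity check for the country-code mismatch, in Python. It is a deliberately simplistic parser, assuming single-quoted values as in the config above; it is not the real UCI parser, it skips `list` lines, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional

CONFIG_RE = re.compile(r"^\s*config\s+(\S+)\s+'([^']*)'")
OPTION_RE = re.compile(r"^\s*option\s+(\S+)\s+'([^']*)'")


def parse_wifi_devices(uci_text: str) -> Dict[str, Dict[str, str]]:
    """Collect single-quoted 'option' values for each wifi-device section."""
    devices: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for line in uci_text.splitlines():
        m = CONFIG_RE.match(line)
        if m:
            section_type, name = m.groups()
            # Only wifi-device sections are tracked; other sections reset it.
            current = devices.setdefault(name, {}) if section_type == "wifi-device" else None
            continue
        m = OPTION_RE.match(line)
        if m and current is not None:
            key, value = m.groups()
            current[key] = value
    return devices


def country_mismatch(devices: Dict[str, Dict[str, str]]) -> bool:
    """True if the radios do not all carry the same country code
    (a radio with no country option counts as different)."""
    codes = {opts.get("country") for opts in devices.values()}
    return len(codes) > 1
```

Run against the two radio blocks above, it would flag radio1 ('US') and radio0 ('FR').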
>
> ======
>
> Some notes after having repaired the situation:
>
> - The pci paths to the radios were missing from /etc/config/wireless;
> that's the only thing I saw that seemed grossly out of place.
>
> - Back up and running, and yes, it's much happier now.  Over wifi I get
> 60-70Mbps upload and ~40Mbps download (running rrul).  Latency sucks.  Wifi
> has some ugly bufferbloat.  (although these results are somewhat in
> question when the router has a 1-minute load average over 5.0...)
>
>
> Trying to measure the one-way delay here is important (and hard; the
> only tool I've found for it so far is owamp, so I'm trying to write
> that test in twd). A TON of your delay is coming from your client. A
> network connection is like a fountain, or a toilet: both sides of the
> flow count...
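Without synchronized clocks, only the variation of one-way delay can be recovered from sender and receiver timestamps: the unknown clock offset is constant (assuming no drift over the test), so subtracting the minimum observed value cancels it. The Python sketch below shows that arithmetic with a hypothetical helper name; tools such as owamp aim at absolute one-way delay and rely on synchronized clocks for that.

```python
from typing import List, Sequence


def relative_owd(send_ts: Sequence[float], recv_ts: Sequence[float]) -> List[float]:
    """One-way delay of each packet minus the smallest one observed.

    send_ts are taken on the sender's clock, recv_ts on the receiver's clock.
    Each raw difference is true_delay + clock_offset; the offset is assumed
    constant (no drift), so subtracting the minimum removes it and leaves the
    extra delay (mostly queueing) each packet saw relative to the best case.
    """
    if len(send_ts) != len(recv_ts) or not send_ts:
        raise ValueError("need matching, non-empty timestamp lists")
    raw = [r - s for s, r in zip(send_ts, recv_ts)]
    base = min(raw)
    return [d - base for d in raw]
```

Measured separately in each direction, this shows which side of the flow the queueing delay is building up on.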
>
>
> - Enabling all the SQM features I was having previously also considerably
> cleaned up wifi performance.  It's more balanced, but still not nearly as
> balanced as I see on gigabit ethernet.
>
>
>
> -Aaron
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
>
>
>
>
>
> --
> Dave Täht
>
> Fixing bufferbloat with cerowrt:
> http://www.teklibre.com/cerowrt/subscribe.html
>
>
>


-- 
Dave Täht

Fixing bufferbloat with cerowrt:
http://www.teklibre.com/cerowrt/subscribe.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tcp_bidirectional_hms-beagle_2_cerowrt.png
Type: image/png
Size: 38902 bytes
Desc: not available
URL: <https://lists.bufferbloat.net/pipermail/cerowrt-devel/attachments/20140116/de8fd4b9/attachment-0001.png>

