From: Sebastian Moeller
To: Dave Taht
Cc: cerowrt-devel
Date: Fri, 17 Jan 2014 00:25:31 +0100
Subject: Re: [Cerowrt-devel] Managed to break 802.11n (on a 3800)
Message-Id: <28FC2E07-1E58-4613-8446-E05DB34FC5DB@gmx.de>
References: <43C8C069-3FC9-465D-A554-011DBCE7132D@gmx.de>

Hi Dave,

On Jan 17, 2014, at 00:12 , Dave Taht wrote:

>
> On Thu, Jan 16, 2014 at 5:56 PM, Sebastian Moeller wrote:
> Hi Dave,
>
> many thanks for all the information & elucidation, as always.
>
> I enjoy trying to find the words to explain.
>
> On Jan 16, 2014, at 23:30 , Dave Taht wrote:
>
>> On Thu, Jan 16, 2014 at 10:29 AM, Sebastian Moeller wrote:
>>> Hi Aaron,
>>>
>>> On Jan 16, 2014, at 16:03 , Aaron Wood wrote:
>>>
>>>> All,
>>>>
>>>> I'm noting this here in case anyone is interested. After I write this up, I'm going to start from scratch on the configuration, and factory-reset the router.
>>>>
>>>> =====
>>>>
>>>> The 5GHz radio on my 3800 seems to be in a very odd state. I'm not quite sure what state it's in, but it seems to be only doing HT20 1x1. And in a fairly broken manner at that.
>>>>
>>>> Running the rrul test (over wifi directly to the router as the netserver), tcp uploads were 25Mbps or so, but download was 5Mbps.
>>>
>>> This is with your mac? Try rrul_noclassification, macosx (at least 10.8) will not do RRUL fair to a fast host. Why I do not know… it always prioritizes the upload, as if it did not see/trust the downstream markings (heck maybe it is busy using all bandwidth for upstream so that it literally never sees the markings on the downstream packets..)
>>
>> rrul with classification blows up 802.11e on all devices, everywhere.
>> The VO and VI queues generally get all the bandwidth.
>> Been saying that a while. VO and VI should be strictly admission
>> controlled and are not, anywhere. All the queues fill
>> and bad things happen. What should happen in a 802.11n world is that a
>> set of packets should wind up in the best queue for the TXOP, and VO
>> used not at all.
>>
>> rrul_noclassification better looks like the intent for classification
>> was for 802.11e and thus works better. There are a couple
>> other tests in the netperf-wrapper suite that don't use classification
>> at all, that might be saner to use.
>
> Ah, so in rrul_noclassification, the UDP flows still are tos marked (at least that is reported in the plots and visible in the plots), but even using tcp_bidirectional I see a crazy imbalance 80:1, so this laptop's Broadcom BCM43xx (apple is not as informative as I would like about the components, but the firmware marker points at broadcom I would say) isn't better than the intel wifi in yours I would say…
>
> the iwl is a nightmare. the 802.11ac stuff is looking bad too.
>
> Another issue with the current implementation of rrul is my intent with the specification was to test voip-like streams, an
> isochronous 10ms packet in each direction.
>
> The implementation currently sends measurement flows based on the RTT, just like ping. As the RTT declines in length,
> the amount of "space" used up by the measurement flow gets bigger and bigger. At a 3ms RTT, just the EF measurement
> flow eats ~2/3s of the available txops as it runs through the VO queue, which is limited to a single packet per txop.

So, how much data could one fit into a txop? Would it make sense for the driver to "pad" the VO txop with other data just to efficiently use the air bottleneck?

> The other measurement flows like the CS5 flow, eat the VI queue, and the BE and BK queues get starved for txops.

Ah so this is why I only see the TOS UDP data in the rrul_noclassification test, as they are otherwise crowded out by the tcp streams of same class, and not reported after the first drop...

>
> I can barely explain to myself how the queues are supposed to get airtime scheduled, see the 802.11e page on wikipedia. I thought 802.11e was a bad idea in the first place... but what rrul does is try to get txops on all 4 queues, which means it
> needs 4x as much airtime (this is not accurate), and grabs airtime for its VO queue first most of the time, followed by
> VI, BE, and bk.
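
A rough sketch of the arithmetic behind the RTT-paced measurement flow described above, assuming (as the thread says) that a ping-like flow sends one packet per RTT while the intended voip-like flow sends one packet every 10 ms. The function names are hypothetical, and the sketch only compares packet rates; it does not model txop contention, so it does not reproduce the ~2/3 figure quoted above.

```python
# Hypothetical helpers; the 3 ms RTT and 10 ms interval come from the thread.

def packets_per_second(interval_ms: float) -> float:
    """Packets per second for a flow that emits one packet per interval."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return 1000.0 / interval_ms


def rtt_paced_vs_isochronous(rtt_ms: float, voip_interval_ms: float = 10.0) -> float:
    """Ratio of packets sent by an RTT-paced (ping-like) flow to those
    sent by a fixed-interval voip-like flow."""
    return packets_per_second(rtt_ms) / packets_per_second(voip_interval_ms)


if __name__ == "__main__":
    for rtt in (30.0, 10.0, 3.0):
        print(f"RTT {rtt:4.1f} ms: {packets_per_second(rtt):6.1f} pkt/s, "
              f"{rtt_paced_vs_isochronous(rtt):.2f}x the 10 ms flow")
```

Under these assumptions the shorter the RTT, the more often the measurement flow asks for a txop, which is the effect described above for short wifi paths.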
>
> I think for wifi testing with the current rrul test there needs to be a new test that does everything in BE. (toke?)
> Classification is very rarely used in the real world anyway.

So that means the UDP streams as well?

>
> Most of the usage of rrul to date has been over longer RTTs over ethernet... (again, I'm delighted y'all are doing this,
> and I do hope to get a more voip-like test)

Yeah netperf-wrapper has been a delight in getting the ATM mess sorted out, great work. And now with the successor in the works things will get even better :)

>
>>
>> lastly, if you are doing a test over the internet, many providers pee
>> on the tos bits. Unless you've done a packet capture, you can't trust
>> that you are actually seeing classified packets coming back from the
>> internet.
>
> Good point, comparing just the local rrul plots with the ones to demo, I see what you mean, there is a tiny bit of the priority classes visible in the uplink (but barely) and none at all in the downlink, so my ISP does not think too much of the tos bits (I guess the tos effect on the uplink is from what cero is doing and since cero controls the bottleneck some "imprint" remains to be seen at packet reception time at demo, or so I think...).
>
> simple.qos respects 3 of the 4 tiers that wifi does.
>
> simplest does not.

I know, even though I have no real use case I like the general idea of having dedicated bandwidth-limited channels for normal, important, and background traffic. Sort of just in case, belt and suspenders kind of thinking.

>
>>
>> One of the things I hope to fix with the twd effort is to detect tos
>> bit preservation and note it in the test.
>>
>> I'm delighted you'all are seeing these results for yourselves. Getting
>> dinged on bandwidth after aiming for low latency by the public is not
>> something I'd wanted to happen with a "stable" release.
>> Regrettably fixing the drivers to work better only has
>> felix working on it in his spare time, and I've been trying to clear
>> my plate for months to help do the delicate rework
>> required. (or recruit others to help)
>
> I would love to help, but this is far out of my league and area of expertise…
>
>
> yer helping plenty, and the more people that "get this", the sooner people will work
> on fixing it. I have enjoyed trying to explain these behaviors today. Someday
> once we have words that match the concepts they will make sense to a CTO.
>
> I have been very pleased by googling for bufferbloat of late. Almost everyone that
> has talked about it on the web for the past month seems to get it.
>
> So if we start now, and make this the year of "make-wifi-fast", in a couple years
> maybe the world will get it...
>
> ... sadly long after 802.11ac is fully deployed and messing up everything for
> everybody.

"Make wifi fast" is a pretty good motto…

Best regards
Sebastian

> best
> Sebastian
>
>>
>>
>>> About the other issue I do not know anything…
>>>
>>> Best Regards
>>> Sebastian
>>>
>>>> This is me 1-2 meters from the router. Load was never more than 0.33. (I can share the results if people are interested).
>>>>
>>>> After a full power cycle, wifi isn't coming up at all.
>>>>
>>>> =====
>>>>
>>>> How I got here:
>>>>
>>>>
>>>> I'm in France, and had dutifully set my unit with the FR country code when setting up CeroWRT. I had noticed some odd latencies (periodic 100-200ms latency every 10-20 seconds over wifi) on the 5GHz network. The router was on channel 36, and I wanted to move it up to the far-upper ranges, so I tried to specify a "custom" channel to do so (140). This was the channel I thought I had been using with stock (Netgear) firmware.
>>>>
>>>> Wifi didn't come back up after applying the changes, and the luci interface seemed to be tripping up over stuff that it was reading out of the configuration files.
>>>>
>>>> I ssh'd in via ethernet, and fixed up the configurations by hand.
>>>>
>>>> Except the driver is still reporting that the 5GHz network won't kick into 802.11n modes, and won't use HT40. It seems to be sure it's configured for it, but isn't using it.
>>>>
>>>> Further, digging into the rc_stats files with the minstrel speeds, I found some very odd data (not what I was expecting to see):
>>>>
>>>> (laptop, which can do 2x2 HT40)
>>>>      rate  throughput  ewma prob  this prob  this succ/attempt  success  attempts
>>>> D       6         6.0       99.9      100.0             2(  2)       65        65
>>>>         9         0.0        0.0        0.0             0(  0)        0         0
>>>>        12         2.9       25.0      100.0             0(  0)        1         1
>>>>        18         4.3       25.0      100.0             0(  0)        1         1
>>>>        24         5.6       25.0      100.0             0(  0)        1         1
>>>> A P    36        32.4       99.9      100.0             0(  0)       51        51
>>>> C      48        10.4       25.0      100.0             0(  0)        1         1
>>>> B      54        11.5       25.0      100.0             0(  0)        1         1
>>>>
>>>> Total packet count::    ideal 53      lookaround 7
>>>>
>>>> (AppleTV, 1x1 HT20)
>>>> root@cerowrt:/sys/kernel/debug/ieee80211/phy1/netdev:sw10# cat stations/58\:55\:ca\:51\:b5\:4b/rc_stats
>>>>      rate  throughput  ewma prob  this prob  this succ/attempt  success  attempts
>>>>         6         3.5       57.8      100.0             0(  0)        6         6
>>>>         9         3.9       43.7      100.0             0(  0)        2         2
>>>>        12         5.1       43.7      100.0             0(  0)        2         2
>>>>        18        10.0       57.8      100.0             0(  0)        3         3
>>>> D      24        13.1       57.8      100.0             0(  0)        3         3
>>>> C      36        14.2       43.7      100.0             0(  0)        2         2
>>>> B      48        18.2       43.7      100.0             0(  0)        2         2
>>>> A P    54        46.2       99.9      100.0             1(  1)      348       367
>>>>
>>
>> No AMPDUs. Hmm. Might be a bug.
>>
>>>> Total packet count::    ideal 331      lookaround 37
>>
>> Hmm. The radios are set for HT20 for the 2.4ghz and HT40+ for the
>> 5ghz. I note that
>> HT40 in wireless-n the 8 channels used up need to be congruent.
>>
>> HT40+ is 36+40, and 44+48 for example. You can't do 40+44.
>>
>> Availability of HTXX is dependent upon your regulatory domain.
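
The HT40+ pairing rule quoted above (36+40 and 44+48 are fine, 40+44 is not) can be sketched as a small lookup. The set of primary channels below is an assumption based on the commonly tabulated 5 GHz HT40+ primaries, not something taken from this thread, and the regulatory domain may forbid some of them; the function name is hypothetical.

```python
# Assumption: commonly tabulated 5 GHz primary channels on which the
# secondary 20 MHz channel may sit above the primary (HT40+).
# A given regulatory domain may allow only a subset of these.
HT40_PLUS_PRIMARIES = frozenset(
    {36, 44, 52, 60, 100, 108, 116, 124, 132, 149, 157}
)


def ht40_plus_pair(primary: int):
    """Return (primary, secondary) if HT40+ is plausible on this primary
    channel, otherwise None."""
    if primary in HT40_PLUS_PRIMARIES:
        # 5 GHz channel numbers step by 4 per 20 MHz channel.
        return (primary, primary + 4)
    return None


if __name__ == "__main__":
    for channel in (36, 40, 44, 48):
        print(channel, ht40_plus_pair(channel))
```

With this table, 36 and 44 yield the pairs named above, while 40 and 48 yield nothing for HT40+, since their 40 MHz partner sits below them.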
>>
>>>> Whereas what I'm seeing for the 2.4GHz radio is:
>>>>
>>>> root@cerowrt:/sys/kernel/debug/ieee80211/phy0/netdev:sw00/stations# cat 10\:9a\:dd\:30\:96\:34/rc_stats
>>>> type            rate  throughput  ewma prob  this prob  retry  this succ/attempt  success  attempts
>>>> CCK/LP          1.0M         0.7      100.0      100.0      0             0(  0)        2         2
>>>> CCK/SP          2.0M         0.0        0.0        0.0      0             0(  0)        0         0
>>>> CCK/SP          5.5M         0.0        0.0        0.0      0             0(  0)        0         0
>>>> CCK/SP         11.0M         0.0        0.0        0.0      0             0(  0)        0         0
>>>> HT20/LGI        MCS0         5.6      100.0      100.0      1             0(  0)        2         2
>>>> HT20/LGI        MCS1         0.0        0.0        0.0      0             0(  0)        0         0
>>>> HT20/LGI        MCS2         0.0        0.0        0.0      0             0(  0)        0         0
>>>> HT20/LGI        MCS3         0.0        0.0        0.0      0             0(  0)        0         0
>>>> HT20/LGI        MCS4         0.0        0.0        0.0      0             0(  0)        0         0
>>>> HT20/LGI        MCS5        30.3      100.0      100.0      5             0(  0)        1         1
>>>> HT20/LGI t      MCS6        32.5      100.0      100.0      5             0(  0)       11        11
>>>> HT20/LGI T P    MCS7        35.0      100.0      100.0      5             6(  6)       34        34
>>>>
>>>> Total packet count::    ideal 45      lookaround 3
>>>> Average A-MPDU length: 1.3
>>
>> You are doing good at the highest possible rate. However packet
>> aggregation is pretty terrible.
>>
>>>>
>>>> And here are radio blocks from the current /etc/config/wireless:
>>>>
>>>> config wifi-device 'radio1'
>>>>         option type 'mac80211'
>>>>         option macaddr '28:c6:8e:bb:9a:49'
>>>>         list ht_capab 'SHORT-GI-40'
>>>>         list ht_capab 'TX-STBC'
>>>>         list ht_capab 'RX-STBC1'
>>>>         list ht_capab 'DSSS_CCK-40'
>>>>         option txpower '17'
>>>>         option distance '25'
>>>>         option channel '48'
>>>>         option country 'US'
>>>>
>>>> config wifi-device 'radio0'
>>>>         option type 'mac80211'
>>>>         option hwmode '11ng'
>>>>         option macaddr '28:c6:8e:bb:9a:47'
>>>>         option htmode 'HT20'
>>>>         list ht_capab 'SHORT-GI-40'
>>>>         list ht_capab 'TX-STBC'
>>>>         list ht_capab 'RX-STBC1'
>>>>         list ht_capab 'DSSS_CCK-40'
>>>>         option txpower '26'
>>>>         option country 'FR'
>>>>         option distance '15'
>>>>         option channel 'auto'
>>
>> I don't know anyone that has fiddled with distance to such an extent.
>> your country codes need to be the same and you should look at what
>> is allowed in FR.
>>
>>>> ======
>>>>
>>>> Some notes after having repaired the situation:
>>>>
>>>> - The pci paths to the radios were missing from /etc/config/wireless, that's the only thing that I saw that seemed grossly out of place.
>>>>
>>>> - Back up and running, and yes, it's much happier, now. Over wifi I get 60-70Mbps upload and ~40Mbps download (running rrul). Latency sucks. Wifi has some ugly bufferbloat. (although these results are somewhat in question when the router has a 1m load average over 5.0...)
>>
>> Trying to measure the one way delay here is important (and hard. The
>> only tool I've found for it so far was owamp, so I'm trying to write
>> that test in twd). A TON of your delay is coming from your client. A
>> network connection is like a fountain, or a toilet, both sides of the
>> flow count...
>>
>>>>
>>>> - Enabling all the SQM features I was having previously also considerably cleaned up wifi performance. It's more balanced, but still not nearly as balanced as I see on gigabit ethernet.
>>>>
>>>>
>>>>
>>>> -Aaron
>>>> _______________________________________________
>>>> Cerowrt-devel mailing list
>>>> Cerowrt-devel@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>
>>
>>
>> --
>> Dave Täht
>>
>> Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
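
For the advice in the thread that the country codes of both radios need to be the same, here is a minimal sketch that reads a UCI-style wireless config and reports the country set on each wifi-device section. The function name and the trimmed sample text are illustrative assumptions; only the radio names and country values are taken from the config quoted in the thread. It is a plain text scan, not a use of any OpenWrt tool.

```python
import shlex

# Trimmed from the radio blocks quoted in the thread.
SAMPLE = """
config wifi-device 'radio1'
        option type 'mac80211'
        option channel '48'
        option country 'US'

config wifi-device 'radio0'
        option type 'mac80211'
        option htmode 'HT20'
        option country 'FR'
        option channel 'auto'
"""


def radio_countries(uci_text: str) -> dict:
    """Map each wifi-device section name to its 'country' option."""
    countries = {}
    current = None
    for raw in uci_text.splitlines():
        parts = shlex.split(raw, comments=True)
        if not parts:
            continue
        if parts[0] == "config":
            # Only named wifi-device sections are tracked.
            is_radio = len(parts) >= 3 and parts[1] == "wifi-device"
            current = parts[2] if is_radio else None
        elif (current and len(parts) >= 3
              and parts[0] == "option" and parts[1] == "country"):
            countries[current] = parts[2]
    return countries


if __name__ == "__main__":
    found = radio_countries(SAMPLE)
    print(found)
    if len(set(found.values())) > 1:
        print("country codes differ between radios")
```

Run against the sample, this flags the US/FR mismatch pointed out in the reply.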