[Cerowrt-devel] Recording RF management info _and_ associated traffic?

David Lang david at lang.hm
Thu Jan 22 04:18:01 EST 2015


On Thu, 22 Jan 2015, Richard Smith wrote:

> On 01/21/2015 06:58 PM, David Lang wrote:
>> On Wed, 21 Jan 2015, Richard Smith wrote:
>> 
>
> Thanks for the response.  First I want to say that I'm sensitive to the fact 
> that this is the Cerowrt-devel list and not the small business WiFi help 
> list.  If things go too far off-topic or people get tired of the discussion 
> let me know and I'll take it off the list.
>
>> Ok, this would suggest that you are unlikely to have interference
>> causing your problems. I don't have the earlier part of this thread
>> still in my mailbox, what is the problem that you are trying to solve
>> again?
>
> I didn't really describe the problem(s) in detail (see above note) but I'll 
> provide a detailed description of my woes.
>
> We have a small network of about 30 people with ~60 devices connected, 
> most of which are wireless of some sort (both 2.4GHz and 5GHz). Here are my 
> issues + my story. :)
>
> 1) Periodic reports of poor "Internet". However, it's not the Internet uplink. 
> I set up a netperf-wrapper test that runs every 10 minutes with a brief 
> speed+latency test to a well-connected host. Tracked across several weeks, 
> the uplink/downlink was always exactly as expected. So I suspect it's poor 
> wireless rather than poor Internet.
>
> 2) Occasional total loss of WiFi. This is a bit fuzzy since I have multiple 
> hardware permutations and currently no consistent failure.
>
> The story:
>
> Originally we had an Engenius 2.4/5GHz AP and a Netgear AP/router (WiFi 
> turned off). I can't remember the original router model number. I didn't set 
> up any of the original hardware.
>
> Several times a week the Engenius AP would stop passing traffic.  A power 
> cycle or reboot would fix it.  The Engenius forums had lots of people 
> reporting similar problems.  We did firmware upgrades which seemed to help 
> but not eliminate the issue.
>
> Some time later we added VoIP phones, but bufferbloat in the cable modem 
> caused large latencies under load and VoIP was unhappy.
>
> Enter the trusty WNDR3700v2 from my stash with OpenWRT (a pre-Barrier Breaker 
> build). I replaced both the original router and the Engenius AP with it.
>
> QoS solved the VoIP issues and for the most part wireless was happy. Still, 
> 5GHz would occasionally stop working, though much less frequently than on the 
> Engenius. Rebooting the box would fix it. I suspected that a single box 
> running AP + DHCP + DNS + routing might not have had the resources for 
> our load, or perhaps the Barrier Breaker pre-release had issues.
>
> Replaced the routing/DHCP/DNS/QoS portion with an x86 box running OpenWRT x86 
> (the released Barrier Breaker, but locally built). Now the WNDR3700v2 was 
> just an AP. This also allowed us to actually get our rated cable modem speed. 
> QoS on the WNDR was capping out at ~60Mbps, a well-known limit among members 
> of this list.
>
> Around the same time I also added a 2nd AP on a different 5GHz channel 
> (TP-Link AC1750) to spread the connected clients across multiple channels. 
> They have different ESSIDs. Things seem to be happy. I got the TP-Link 
> because it's on track to be supported by OpenWRT and has 3 external antennas, 
> which I thought might provide a path for testing different antennas.
>
> Recently, we picked up the 11th floor as well and moved many people up there. 
> I got a 3rd AP (another TP-Link AC1750) and set that one up on a free channel 
> with a different ESSID.

I like to put all the APs on the same ESSID so that people can roam between 
them. This requires that the APs act as bridges to a dedicated common network, 
not as routers.

> Then about a week before my original post I got notified that the Internet was 
> down. Both 10th floor APs had stopped working. The 11th floor (where I am) 
> was still working. On the 10th floor, I could connect to the TP-Link via 
> its IP address on its wired interface, but it did not seem to be passing 
> wireless traffic. A reboot fixed it.

There has been an ongoing bug with Apple devices on 5GHz that causes the wifi 
chipset to lock up. We think we've fixed it in the current Cerowrt, but I don't 
know which kernel versions have this problem. This is likely to affect multiple 
vendors who use the same chipset (check the OpenWrt hardware list for details of 
the chipsets in each model).

> The WNDR3700 was completely unresponsive both via WiFi and when I tried its 
> IP while connected directly to its switch with a Cat-5. I also have a serial 
> port mod on that WNDR3700, so I connected to that instead.

hmm, it's not common to have it be unresponsive on the wired network.

> From the serial port everything appeared to be running fine, only no traffic 
> would pass across the bridge. Dropping the interfaces with ifconfig and then 
> bringing them back up had no effect, and I didn't see anything unusual in the 
> system logs. A power cycle fixed it. I've never seen my WNDR3700 do something 
> like that.
>
> So then I really began to wonder... that's 3 different hardware vendors with 
> 3 very different firmwares, all of which had similar issues, 2 of them at 
> exactly the same time.
>
> I considered the possibility of a power event, but the 2 APs are on different 
> circuits and in physically different locations. The power connection for the 
> WNDR3700 also has the x86 router, 2 switches, the cable modem, and a Linux 
> box plugged in, and all of those devices were still working.
>
> That's when I figured I needed to start looking at what was going on in RF 
> land. At that time I didn't have anything like horst to verify that wireless 
> really was broken and not some other mysterious network gremlin. So I started 
> tooling up, so that when it happens again I can investigate deeper. I have a 
> 2nd WNDR3700v2 at my disposal, set up in monitor mode on that channel, that I 
> can run horst on when the next total loss happens.
>
> It hasn't happened again. While I'm waiting I've been trying to look into 
> issue 1 by trying to understand what is really happening on the RF channel 
> it's on. Hence my query about wanting to see associated network traffic 
> decoded along with the radiotap info.
>
>> When you do a wifi survey, you are not just looking at one spot, or near
>> the APs for what you see. You should also be going to all the areas your
>> users are going to be trying to access your network and see if you have
>> a strong enough signal from at least one AP everywhere.
>
> I have taken readings at multiple points in the office, but it was not a very 
> rigorous survey. I should repeat it with more care. The wireless signal 
> indicators on most clients I've messed with show good strength.
>
> Our floor(s) are fairly small and almost completely open. There are no 
> cubicles and very few internal walls. There are some offices and conference 
> rooms, but each of them has large walls of glass that look into the center of 
> the room. The only big obstruction is a large concrete pillar in the center 
> of the room. The 10th floor TP-Link AP is located in a ceiling cable tray 
> very close to the center of the room. All the stations are in about a 40 foot 
> radius and all but 1 or 2 have line of sight to the AP.  The wndr3700 is in a 
> closet on the side of the room with other equipment so it might be 80 feet 
> away from the furthest station or so.
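One way to make the repeat survey more rigorous is to apply the "strong enough signal from at least one AP everywhere" test literally: record the signal from each AP at every spot where people actually work, then flag spots where even the best AP falls short. A minimal sketch, assuming readings in dBm are entered by hand per location; the -67 dBm threshold is an assumed rule of thumb often quoted for voice-grade coverage (not something from this thread), and the location and AP names are hypothetical.

```python
from typing import Dict, List

# Assumed threshold: -67 dBm is a commonly quoted target for voice-grade
# coverage. It is not taken from this thread; adjust to taste.
MIN_SIGNAL_DBM = -67.0


def weak_spots(readings: Dict[str, Dict[str, float]],
               threshold: float = MIN_SIGNAL_DBM) -> List[str]:
    """Return locations where no AP is heard at or above `threshold`.

    `readings` maps a (hypothetical) location name to a dict of
    {AP name: signal in dBm} measured at that spot.
    """
    weak = []
    for location, per_ap in readings.items():
        # "At least one AP" criterion: a spot is judged by its best reading.
        # A spot with no readings at all counts as uncovered.
        best = max(per_ap.values(), default=float("-inf"))
        if best < threshold:
            weak.append(location)
    return weak


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    survey = {
        "desk-row-A": {"tplink-10": -52.0, "wndr-10": -70.0},
        "conf-room-east": {"tplink-10": -71.0, "wndr-10": -74.0},
    }
    print(weak_spots(survey))
```

Keep in mind this only measures what the client hears; the AP's view of the client can be weaker, which is the point made further down about mobile devices.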

this doesn't sound unreasonable unless your users are trying to use a LOT of 
bandwidth (although the fact that you refer to the ~60Mbps bottleneck indicates 
that you may be).
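As a rough sanity check on those distances, the free-space path loss formula, FSPL = 20·log10(4πdf/c), gives a best-case estimate of the signal at the nearer (~40 ft) and farther (~80 ft) stations. A minimal sketch under loudly stated assumptions: free space (no pillar, glass, or cable-tray losses), a 5180 MHz channel (channel 36; the thread doesn't say which 5GHz channels are in use), 17 dBm transmit power (roughly the 50mW the WNDR reports), and no antenna gain. Real indoor signal will be lower than this.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s
FEET_TO_METERS = 0.3048


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def received_dbm(tx_dbm: float, distance_m: float, freq_hz: float) -> float:
    """Best-case received power: transmit power minus free-space loss.

    Ignores antenna gains, walls, people, and multipath entirely.
    """
    return tx_dbm - fspl_db(distance_m, freq_hz)


if __name__ == "__main__":
    freq = 5.180e9  # assumed: 5GHz channel 36
    tx = 17.0       # assumed: ~50 mW expressed in dBm
    for feet in (40, 80):
        d = feet * FEET_TO_METERS
        print(f"{feet} ft: FSPL {fspl_db(d, freq):.1f} dB, "
              f"rx ~{received_dbm(tx, d, freq):.1f} dBm")
```

Each doubling of distance adds about 6 dB of loss in this model, so the stations ~80 feet from the WNDR start roughly 6 dB behind those ~40 feet from the TP-Link before any obstructions are counted.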

>> Also note that
>> if you have high-power APs,
>
> What Tx level qualifies as a high-power AP? The WNDR says 50mW. The TP-Link 
> just gives me low, medium, and high as choices. It's still at the default of 
> high.
>
>> you may hear a signal from them, but they
>> may not be able to hear the signal from the mobile device very well.
>> Mobile devices tend to have lousy antennas, and try to operate at lower
>> power levels to save battery power. So you may need to look at the stats
>> on the AP showing the signal it sees from the client.
>
> I can see those for things connected to the wndr unit but sadly the stock 
> tplink firmware does not show me rx strength.
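To put the 50mW figure in the dBm units most tools report, and to illustrate the asymmetry described above: over the same path, the loss and antenna gains are the same in both directions, so the client's signal arrives at the AP weaker than the AP's signal arrives at the client by roughly the difference in transmit power. A minimal sketch; the 15 mW client transmit power is a hypothetical phone value, not a measurement, and differences in receiver sensitivity are ignored.

```python
import math


def mw_to_dbm(milliwatts: float) -> float:
    """Convert power in milliwatts to dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(milliwatts)


def dbm_to_mw(dbm: float) -> float:
    """Convert dBm back to milliwatts."""
    return 10 ** (dbm / 10)


def uplink_deficit_db(ap_tx_mw: float, client_tx_mw: float) -> float:
    """How many dB weaker the client's signal arrives at the AP than the
    AP's signal arrives at the client.

    Assumes a reciprocal path (same path loss and antenna gains both ways),
    so only the transmit powers differ.
    """
    return mw_to_dbm(ap_tx_mw) - mw_to_dbm(client_tx_mw)


if __name__ == "__main__":
    print(f"50 mW = {mw_to_dbm(50):.1f} dBm")  # the WNDR's reported setting
    # Hypothetical phone transmit power, for illustration only.
    print(f"uplink deficit: {uplink_deficit_db(50, 15):.1f} dB")
```

Every 3 dB is roughly a doubling of power, so a client running at a fraction of the AP's power can look fine on its own signal meter while the AP struggles to hear it.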
>
> Can I perhaps approximate signal strength by looking at the bitrate for 
> packets that station sends?  The theory being that higher quality RF links 
> should use the higher bitrate encodings when sending.

not reliably; too many other things factor into that.
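Rather than inferring signal from bitrate, a monitor-mode capture (such as the second WNDR3700v2 on the AP's channel) records both directly in the radiotap header prepended to each captured frame. A minimal sketch of pulling the legacy Rate, Channel, and dBm Antenna Signal fields out of a raw radiotap header with Python's struct module. It decodes only the first six standard fields, walks past extended present bitmaps without interpreting them, and leaves reading the raw bytes (for example from a pcap with link type 127) out of scope. For 802.11n/ac frames the rate is usually carried in the MCS or VHT fields instead, which this sketch does not decode.

```python
import struct
from typing import Dict, Optional

# (size, alignment) of the first six radiotap fields, keyed by present bit.
# Only these are handled; later fields (MCS, VHT, ...) are not decoded.
_FIELDS = {
    0: (8, 8),  # TSFT, u64
    1: (1, 1),  # Flags, u8
    2: (1, 1),  # Rate, u8, units of 500 kbps
    3: (4, 2),  # Channel: u16 frequency (MHz) + u16 flags
    4: (2, 1),  # FHSS: u8 hop set + u8 hop pattern
    5: (1, 1),  # dBm antenna signal, s8
}


def parse_radiotap(buf: bytes) -> Dict[str, Optional[float]]:
    """Extract rate (Mbps), channel frequency (MHz) and signal (dBm).

    Fields absent from the header come back as None.
    """
    version, _pad, length, present = struct.unpack_from("<BBHI", buf, 0)
    if version != 0 or length > len(buf):
        raise ValueError("not a valid radiotap header")
    # Skip extended present bitmaps: bit 31 set means another u32 follows.
    offset = 8
    word = present
    while word & (1 << 31):
        (word,) = struct.unpack_from("<I", buf, offset)
        offset += 4
    result: Dict[str, Optional[float]] = {
        "rate_mbps": None, "freq_mhz": None, "signal_dbm": None}
    # Field data follows in present-bit order; the first word's bits come first.
    for bit in range(6):
        if not present & (1 << bit):
            continue
        size, align = _FIELDS[bit]
        offset += (-offset) % align  # pad to natural alignment (from header start)
        if bit == 2:
            result["rate_mbps"] = buf[offset] * 0.5
        elif bit == 3:
            (result["freq_mhz"],) = struct.unpack_from("<H", buf, offset)
        elif bit == 5:
            (result["signal_dbm"],) = struct.unpack_from("<b", buf, offset)
        offset += size
    return result
```

Grouping the signal values by transmitter address would give per-station receive strength that the stock TP-Link firmware doesn't show, but as heard from the monitor's location; co-locating the monitor with the AP, as suggested above for the WNDR, makes it a closer stand-in for the AP's view.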

> If need be I can move the wndr to the same location as the tplink and then 
> have stations connect to the wndr so I can watch the rx signal strength.
>
>> Assuming that you have enough signal, the next question is how many
>> people are going to be trying to use the network at one time. You may be
>> better off with more APs operating at lower power levels so that you
>> have fewer people talking to each one.
>
> The TP-Link is better located, so in general people tend to use it over 
> the WNDR. At last check it had around 20 stations connected to it during the 
> day. The rest are connected to the 2 other APs.
>
> Thanks again for any insights you have.
>
> Lastly, I've been doing some reading on getting enterprise-class APs from 
> Cisco, HP, etc. A large number of them seem to require a lot of extra 
> infrastructure: wireless controllers and special software you have to run 
> to set them up.
>
> Any recommendations for something that's a step above consumer grade devices 
> but that does not require additional controllers or licensed software would 
> be appreciated.

There is still a lot of room for improvement with consumer-grade equipment 
from where you currently are. The "Enterprise Grade" systems do have a lot of 
infrastructure to coordinate the different APs.

David Lang



