[Cerowrt-devel] CeroWrt 3.10.50-1 diagnostic help
alan.christopher.jenkins at gmail.com
Sat Aug 1 06:00:52 EDT 2015
On 01/08/15 00:56, Rich Brown wrote:
> I would like some suggestions for debugging a problem I have with CeroWrt.
> I have deployed CeroWrt 3.10.50-1 on two WNDR3800's at a hospitality business nearby. These routers have worked fine in my house in the past. WNDR3800 #1 talks to my DSL modem (wifi disabled), and WNDR3800 #2 has its WAN wired to the LAN side of #1 (routed, no NAT). I also have a third router (Netgear something or another, running stock firmware and NAT) with its WAN port wired to WNDR3800 #2 LAN, at the far end of the property. While in operation, they work as expected, and fq_codel is doing its job (also as expected). The setup - all dashed lines are Ethernet:
> [ Internet ] --- [Fairpoint DSL Modem] --- [WNDR3800 #1] --- [WNDR3800 #2] --- [Netgear ?]
> The problem is that the Wifi locks up on either/both WNDR3800's after a while (a day or so). Guests complain that they cannot connect to the wifi. If the innkeepers reboot the router, Presto! it's fine for a while longer.
> I have only been present once when it was in the stuck state, and wired access to/through the WNDR3800 #1 was fine. My Macbook was *not* able to get a connection through wifi, but both Wifi Explorer on the Mac and Wifi-Analyzer on android could see a healthy signal level (and no overlapping channels) on the expected channel. Here's the wifi setup:
> - I only have one interface on each of the 2.4 and 5 GHz radios. (I turn off babel and the other wifi channel)
> - All SSIDs (on each of the routers) are the same string "Loch Lyme Lodge"
> - The wifi channels are different (1, 6, 11 for 2.4GHz, 36 & 44 for 5GHz) for all the routers
> My questions:
> - Any thoughts about what might be causing this?
Sorry to hear that Rich. Be prepared to give up :).
My brother's router is immediately allergic to one of his wifi devices
(not sure if the effect was limited to wifi though). That's the
variable I'd instinctively blame - wifi driver / hardware and
"incompatibility" bugs. Two incompatibilities I've seen were "known
problems". If it happens with the original firmware on a popular
device, there's likely a report of it online somewhere, though not
necessarily a fix.
I wouldn't know how to fix it. If my instinct is right you ideally want
to reproduce the exact chipset that breaks the AP. Which I wouldn't
know how to check unless I could pin it down to a laptop and look at
that :(. Don't know about phones.
Since the "signal" stays up, you can't even run it in parallel with an
automatic fallback. A manual poweroff would still be required.
> - What should I look for (log files, symptoms, etc) next time I get the word that it has happened?
> Many thanks!
Given your symptoms, you could see if the hostapd process has crashed
and isn't running any more (in "ps"), or is looping (100% cpu in
"top"). Unfortunately procd doesn't seem to log daemon deaths.
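A rough sketch of that check, to run over ssh on the wired side while the wifi is stuck. daemon_status is a made-up helper name, and I'm assuming the busybox "pidof" and "top -b -n 1" applets are present in the build:

```shell
#!/bin/sh
# Sketch only: daemon_status is a hypothetical helper, not part of CeroWrt.
# Reports whether a daemon is running and prints its line(s) from one
# batch-mode snapshot of top, so a process stuck near 100% CPU stands out.
daemon_status() {
    name="$1"
    # pidof prints nothing if no such process exists (or pidof is missing).
    pids=$(pidof "$name" 2>/dev/null || true)
    if [ -z "$pids" ]; then
        echo "$name: not running"
        return 1
    fi
    echo "$name: running, pid(s) $pids"
    # -b = batch mode, -n 1 = a single iteration.
    top -b -n 1 2>/dev/null | grep "$name" | grep -v grep || true
    return 0
}

daemon_status hostapd || true
```

If it reports "not running" while the radio is still beaconing, that points at a crash rather than a loop.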
At the most basic level you could make sure connection logs are enabled
in hostapd (seems so by default) and perhaps send them
somewhere permanent. Logs are always nice. Each entry includes the
client's MAC address. Fwiw you could then look up the MAC online to see
the "OUI" - the vendor, e.g. Broadcom.
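To pull the OUIs out of a log in one go, something like the following might do. ouis_from_log is a made-up name, and the sample log line only illustrates the general shape of a hostapd message; the MAC in it is invented:

```shell
#!/bin/sh
# Sketch only: ouis_from_log is a hypothetical helper.
# Reads log text on stdin, finds colon-separated MAC addresses, and prints
# the unique OUIs (first three octets) in upper case, one per line.
ouis_from_log() {
    grep -oE '([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}' \
        | cut -d: -f1-3 \
        | tr 'a-f' 'A-F' \
        | sort -u
}

# Illustrative input; the MAC address is invented.
echo "hostapd: wlan0: STA ac:de:48:00:11:22 IEEE 802.11: authenticated" \
    | ouis_from_log
# prints: AC:DE:48
```

On the router itself the input would be "logread | ouis_from_log", and each OUI can then be looked up by hand.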
Thought: to confirm exact failure times, leave an old phone /
Raspberry Pi w/wifi plugged in with <waves hand vaguely> a ping
monitor. On the AP, log to a USB stick to avoid filling the NAND? "mount
/mnt/usb-stick; cd /mnt/usb-stick; nohup ping HOST >>ping.log &"
(HOST being the address of the wifi device).
* nohup may require installing coreutils-nohup
** coreutils-nohup not present in cero package list :'(. Maybe try
grabbing packages from a matching version of openwrt.
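A slightly less hand-wavy version of the ping monitor, as a sketch. It assumes the monitor runs on the AP and pings the always-on wifi device. The function names, the 192.0.2.10 address and the log path are all placeholders, and "ping -c 1 -W 2" assumes the busybox (or iputils) ping options:

```shell
#!/bin/sh
# Sketch only: all names, addresses and paths here are placeholders.

# One probe: a single echo request with a 2 second timeout.
ping_once() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Append one timestamped up/DOWN line for TARGET to LOGFILE.
log_probe() {
    target="$1"
    logfile="$2"
    if ping_once "$target"; then
        status=up
    else
        status=DOWN
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') $target $status" >> "$logfile"
}

# Probe forever, one line every INTERVAL seconds (default 60).
monitor() {
    target="$1"
    logfile="$2"
    interval="${3:-60}"
    while true; do
        log_probe "$target" "$logfile"
        sleep "$interval"
    done
}

# Example invocation, left commented out because it never returns:
# monitor 192.0.2.10 /mnt/usb-stick/ping.log 60 &
```

One line per minute keeps the log small, and the first DOWN after a run of up lines gives the failure time to within the interval.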
Syslog to USB: http://wiki.openwrt.org/doc/howto/log.essentials#output
I guess you'd want the same "nohup CMD>>logfile &" treatment with the
command they suggest, put in the /etc/rc.local boot script. The same
"logread" will also show any default-enabled messages from the kernel.
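Roughly what I have in mind for /etc/rc.local, as a sketch. start_bg is a made-up helper and the mount point and file name are placeholders. I'm also assuming "logread -f" (follow mode) is available on this build; check "logread -h" first:

```shell
#!/bin/sh
# Sketch only: start_bg is a hypothetical helper; paths are placeholders.
# Runs CMD in the background, appending its stdout and stderr to LOGFILE.
# Uses nohup when it is installed and falls back to a plain "&" otherwise.
start_bg() {
    logfile="$1"
    shift
    if command -v nohup >/dev/null 2>&1; then
        nohup "$@" </dev/null >> "$logfile" 2>&1 &
    else
        "$@" </dev/null >> "$logfile" 2>&1 &
    fi
}

# In /etc/rc.local, before the final "exit 0" (commented out here):
# mount /mnt/usb-stick && start_bg /mnt/usb-stick/syslog.log logread -f
```

The "&&" means nothing gets started if the stick fails to mount, so the log can't end up on the NAND by accident.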