From: Alan Jenkins
To: Rich Brown, cerowrt-devel
Date: Sat, 1 Aug 2015 11:00:52 +0100
Message-ID: <55BC98D4.4020507@gmail.com>
Subject: Re: [Cerowrt-devel] CeroWrt 3.10.50-1 diagnostic help

On 01/08/15 00:56, Rich Brown wrote:
> Folks,
>
> I would like some suggestions for debugging a problem I have with CeroWrt.
>
> I have deployed CeroWrt 3.10.50-1 on two WNDR3800's at a hospitality business nearby. These routers have worked fine in my house in the past. WNDR3800 #1 talks to my DSL modem (wifi disabled), and WNDR3800 #2 has its WAN wired to the LAN side of #1 (routed, no NAT). I also have a third router (Netgear something or another, running stock firmware and NAT) with its WAN port wired to WNDR3800 #2 LAN, at the far end of the property. While in operation, they work as expected, and fq_codel is doing its job (also as expected). The setup - all dashed lines are Ethernet:
>
> [ Internet ] --- [Fairpoint DSL Modem] --- [WNDR3800 #1] --- [WNDR3800 #2] --- [Netgear ?]
>
> The problem is that the Wifi locks up on either/both WNDR3800's after a while (a day or so). Guests complain that they cannot connect to the wifi. If the innkeepers reboot the router, Presto! it's fine for a while longer.
>
> I have only been present once when it was in the stuck state, and wired access to/through the WNDR3800 #1 was fine. My Macbook was *not* able to get a connection through wifi, but both Wifi Explorer on the Mac and Wifi-Analyzer on android could see a healthy signal level (and no overlapping channels) on the expected channel. Here's the wifi setup:
>
> - I only have one interface on each of the 2.4 and 5 GHz radios. (I turn off babel and the other wifi channel)
> - All SSIDs (on each of the routers) are the same string "Loch Lyme Lodge"
> - The wifi channels are different (1, 6, 11 for 2.4GHz, 36 & 44 for 5GHz) for all the routers
>
> My questions:
>
> - Any thoughts about what might be causing this?

Sorry to hear that Rich. Be prepared to give up :).

My brother's router is immediately allergic to one of his wifi devices (not sure if the effect was limited to wifi though). That's the variable I'd instinctively blame - wifi driver / hardware and "incompatibility" bugs.

Two incompatibilities I've seen were "known problems". If it happens with the original firmware on a popular device, there's likely a report of it online somewhere, though not necessarily a fix.

I wouldn't know how to fix it. If my instinct is right you ideally want to reproduce the exact chipset that breaks the AP. Which I wouldn't know how to check unless I could pin it down to a laptop and look at that :(. Don't know about phones.

Since the "signal" stays up, you can't even run it in parallel with an automatic fallback. A manual poweroff would still be required.

> - What should I look for (log files, symptoms, etc) next time I get the word that it has happened?
>
> Many thanks!
>
> Rich

Given your symptoms, you could see if the hostapd process has crashed and isn't running any more (in "ps"), or is looping (100% cpu in "top"). Unfortunately procd doesn't seem to log daemon deaths.
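
A rough sketch of that hostapd check, so that a dead daemon at least leaves a trace somewhere. It assumes "pidof" and "logger" are available on the router (I believe both are normally busybox applets on OpenWrt-based firmware, but check). The function name "check_daemon" and the "wifi-watch" log tag are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a daemon (default: hostapd) is running.
# "check_daemon" is an illustrative name, not part of CeroWrt or OpenWrt.
check_daemon() {
    name=${1:-hostapd}
    # pidof prints the matching PIDs and exits non-zero when there are none
    if pids=$(pidof "$name"); then
        echo "$name running (pid $pids)"
        return 0
    fi
    echo "$name NOT running"
    return 1
}

# Possible use from cron or a sleep loop: note a missing hostapd in syslog.
# check_daemon hostapd >/dev/null || logger -t wifi-watch "hostapd is not running"
```

This only covers the "crashed" case. For the "looping" case, eyeballing "top" while the wifi is stuck is still the simplest test.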

At the most basic level you could make sure connection logs are enabled in hostapd (seems so by default) and perhaps send them somewhere permanent[1]. Logs are always nice. It logs the connecting device's unique MAC. Fwiw you could then look up the MAC online to see the "OUI" - the vendor, e.g. Broadcom.

Thought: to confirm exact failure times, leave an old phone / raspberry-PI w/wifi plugged in with a ping monitor. On the AP, using a usb stick to avoid filling the nand? "mount /mnt/usb-stick; cd /mnt/usb-stick; nohup ping >>ping.log &".

* nohup may require installing coreutils-nohup
** coreutils-nohup not present in cero package list :'(. Maybe try grabbing packages from a matching version of openwrt.

[1] syslog to usb: http://wiki.openwrt.org/doc/howto/log.essentials#output

I guess you'd want the same "nohup CMD>>logfile &" treatment with the command they suggest, put in the /etc/rc.local boot script. The same "logread" will also show any default-enabled messages from the kernel.

Alan
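
On the ping monitor idea above: plain ping output carries no wall-clock time by default, which makes it hard to date a failure afterwards. A minimal sketch that stamps each probe, assuming the local ping accepts "-c" (count) and "-W" (timeout in seconds). "log_probe" and the "PING_CMD" override are hypothetical names, and the address in the example loop is a placeholder, not one from this thread.

```shell
#!/bin/sh
# Hypothetical helper: probe TARGET once and print a timestamped up/DOWN line.
# PING_CMD is an illustrative override hook; the default flags are an
# assumption about the ping installed on the monitoring box.
log_probe() {
    target=$1
    # Left unquoted on purpose so the default command splits into words
    if ${PING_CMD:-ping -c 1 -W 2} "$target" >/dev/null 2>&1; then
        state=up
    else
        state=DOWN
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') $target $state"
}

# Example loop (192.0.2.1 is a placeholder address):
# while :; do log_probe 192.0.2.1; sleep 10; done >>ping.log
```

Grepping the log for "DOWN" then gives the first failed probe, accurate to within the sleep interval.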