From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ie0-x231.google.com (mail-ie0-x231.google.com [IPv6:2607:f8b0:4001:c03::231]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by huchra.bufferbloat.net (Postfix) with ESMTPS id 9A1532021DD for ; Fri, 24 Jan 2014 15:23:29 -0800 (PST) Received: by mail-ie0-f177.google.com with SMTP id at1so3593435iec.22 for ; Fri, 24 Jan 2014 15:23:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=O7+KX3Pt0Cpl9LEfZ3OhYOtIMArGaLiJItsjqkFPbhQ=; b=EKQZa5C0Gxmqaksd8dQGhU55PUeAUKCKIPJ0Y/pAPxs9fkmxchibYPxzEYPmMZLdMI /k0xNVOpXw7D5329txRejTe4457zu7TGTCOU4fsPEDsJV3RlHHCxMydhTV+G9nW7Av3R dtviryByfSnKOyMyDMRth2Y9/j/tbrXZULKBVb0pGJqOnYhpjvX4pArbNN9T8CRZ47Hk CNHIvi/aY97GPagG9BPjDkwFZ2CtQc4XjUIR547LWu0FHPyTtdSQpA0s+5lNND6pmK47 oZ5hTkdCBNtBWcpJQvFhHtS2nKnDXvRDqronQ3GnKFscYwu9EmxtkypPRMQAy3sTFIqU ZyXg== MIME-Version: 1.0 X-Received: by 10.42.88.79 with SMTP id b15mr3114105icm.65.1390605808792; Fri, 24 Jan 2014 15:23:28 -0800 (PST) Received: by 10.64.145.67 with HTTP; Fri, 24 Jan 2014 15:23:28 -0800 (PST) In-Reply-To: References: Date: Fri, 24 Jan 2014 18:23:28 -0500 Message-ID: From: Dave Taht To: Steve Jenson Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: "cerowrt-devel@lists.bufferbloat.net" Subject: Re: [Cerowrt-devel] cerowrt issues (3.10.24-8) X-BeenThere: cerowrt-devel@lists.bufferbloat.net X-Mailman-Version: 2.1.13 Precedence: list List-Id: Development issues regarding the cerowrt test router project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Jan 2014 23:23:29 -0000 On Fri, Jan 24, 2014 at 6:08 PM, Steve Jenson wrote: > Hi everybody, > > I've been using cerowrt as a secondary wifi network (just a single AP for > now) for a few weeks now. Recently, my wndr3800 got stuck in a bad state = and > eventually rebooted. I've had this happen a few times now and am looking = for > ways to debug the issue. I'm new to cerowrt and openwrt so any advice is > appreciated. > > Since I use it as a secondary network, this is no way critical. Yea! I appreciate caution before putting alpha software on your gw. > I'm not > looking for free tech support but I couldn't find anything on the wiki ab= out > troubleshooting. I'd love to start a page and write some shell scripts to > diagnose and report issues. I know that a cerowrt router is meant to be a > research project rather a consumer device but these things seem helpful > regardless. Sure, let me know your wiki account. I have been lax about granting access of late as the signup process is overrun by spammers. > Please let me know if you'd prefer I not email the list with these issues= or > if you'd rather I used trac or a different forum. > The list is where most stuff happens. Also in the irc channel. If it gets to where it needs to be tracked we have a bugtracker at http://www.bufferbloat.net/projects/cerowrt/issues The first question I have is: Are you on comcast? Cerowrt had a dhcpv6-pd implementation that "just worked" from feburary through december. Regrettably they changed the RA announcement interval to a really low number around then... and this triggers a firewall reload every minute on everything prior to the release I point to below. If there is a memory leak somewhere that would have triggered it. > -- > > In this state, I can connect to the cerowrt base station via wifi but am > unable to route packets to the internet. I can connect to :81 and see the > login page but logging in results in a lua error at `/cgi-bin/luci` > > > /usr/lib/lua/luci/dispatcher.lua:448: Failed to execute function > dispatcher target for entry '/'. > The called action terminated with an exception: > /usr/lib/lua/luci/sauth.lua:87: Session data invalid! > stack traceback: > [C]: in function 'assert' > /usr/lib/lua/luci/dispatcher.lua:448: in function 'dispatch' > /usr/lib/lua/luci/dispatcher.lua:195: in function > > > I can ssh into the device and cat various log files until the router hang= s > and reboots. here's a few relevant lines from my terminal history before = the > device rebooted (I'm assuming a watchdog kicked in and rebooted it). > > root@buffy2-1:~# ping google.com > ping: bad address 'google.com' > root@buffy2-1:~# free > total used free shared buffers > Mem: 126336 110332 16004 0 5616 > -/+ buffers: 104716 21620 > Swap: 0 0 0 > root@buffy2-1:~# uptime > 02:08:54 up 2 days, 1:26, load average: 0.10, 0.21, 0.17 > root@buffy2-1:~# dmesg > [ 0.000000] Linux version 3.10.24 (cero2@snapon) (gcc version 4.6.4 > (OpenWrt/Linaro GCC 4.6-2013.05 r38226) ) #1 Tue Dec 24 > 10:50:15 PST 2013 > [skipping some lines] > > [ 13.156250] Error: Driver 'gpio-keys-polled' is already registered, > aborting... > [ 19.414062] IPv6: ADDRCONF(NETDEV_UP): ge00: link is not ready > [ 19.421875] ar71xx: pll_reg 0xb8050010: 0x11110000 > [ 19.429687] se00: link up (1000Mbps/Full duplex) > [ 22.140625] IPv6: ADDRCONF(NETDEV_UP): sw00: link is not ready > [ 23.351562] IPv6: ADDRCONF(NETDEV_CHANGE): sw00: link becomes ready > [ 23.757812] ar71xx: pll_reg 0xb8050014: 0x11110000 > [ 23.757812] ge00: link up (1000Mbps/Full duplex) > [ 23.773437] IPv6: ADDRCONF(NETDEV_CHANGE): ge00: link becomes ready > > root@buffy2-1:~# ifconfig > ge00 Link encap:Ethernet HWaddr 2C:B0:5D:A0:C5:B1 > inet addr:192.168.1.138 Bcast:192.168.1.255 Mask:255.255.255.= 0 > inet6 addr: fe80::2eb0:5dff:fea0:c5b1/64 Scope:Link > inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Glob= al > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:1469670 errors:0 dropped:8 overruns:0 frame:0 > TX packets:547733 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:229243410 (218.6 MiB) TX bytes:57304808 (54.6 MiB) > Interrupt:5 > > lo Link encap:Local Loopback > inet addr:127.0.0.1 Mask:255.0.0.0 > inet6 addr: ::1/128 Scope:Host > UP LOOPBACK RUNNING MTU:65536 Metric:1 > RX packets:23689 errors:0 dropped:0 overruns:0 frame:0 > TX packets:23689 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:0 > RX bytes:2612713 (2.4 MiB) TX bytes:2612713 (2.4 MiB) > > pimreg Link encap:UNSPEC HWaddr > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 > UP RUNNING NOARP MTU:1472 Metric:1 > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:0 > RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) > > se00 Link encap:Ethernet HWaddr 2E:B0:5D:A0:C5:B0 > inet addr:172.30.42.1 Bcast:172.30.42.31 Mask:255.255.255.224 > inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Glob= al How are you assigning your ipv6 addresses? > inet6 addr: fe80::2cb0:5dff:fea0:c5b0/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > TX packets:191740 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:0 (0.0 B) TX bytes:42184988 (40.2 MiB) > Interrupt:4 > > sw00 Link encap:Ethernet HWaddr 2C:B0:5D:A0:C5:B0 > inet addr:172.30.42.65 Bcast:172.30.42.95 Mask:255.255.255.22= 4 > inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Glob= al > inet6 addr: fe80::2eb0:5dff:fea0:c5b0/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:70239 errors:0 dropped:0 overruns:0 frame:0 > TX packets:286967 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:15590189 (14.8 MiB) TX bytes:127357293 (121.4 MiB) > > root@buffy2-1:~# less /var/log/babeld.log > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > send: Cannot assign requested address > send: Cannot assign requested address > send: Cannot assign requested address > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > netlink_read: recvmsg(): No buffer space available > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. > Couldn't determine channel of interface sw00: Invalid argument. This is a problem in babel detecting the channel on a "normal" rather than a mesh interface. It's bugged me a long while, but haven't got around to finding what triggers it. Might "fix" it by acquiring the channel at babel start time from /etc/config/wireless. It messes up the diversity routing calculation, grump. There is a possibility a logfile got really big, but this one generally doesn't, but I should turn off logging in some future release... > > > and then the system hang. Note: the first time I ran 'less > /var/log/babeld.log', it was Killed. I would assume that's an OOM > killer? Yes it really sounds like you were out of memory. Your free does not indicate that however. > I'm running cerowrt 3.10.24-8 on a wndr3800. I have a spare wndr3800 > that I'm getting ready to flash with the same version and I can see if I = get > the same issue. What's the best way to clone a configuration? scp -r therouter:/overlay somewheresafe then reflash it clean > > --- > > Here are some log lines from a separate instance of the same problem, I > recovered these log lines from the web server (I was able to login to > the web interface for another hour until I got the same lua exception > as in the beginning of this email) You can setup remote syslog logging if you lke. > This time I was completely unable to connect to wifi from my iPhone > and you can see where hostapd disassociates the iPhone soon after > it connects. > > the kernel log: > > [158995.550781] icmp6_send: no reply to icmp error > > [160930.527343] ath: phy0: Failed to stop TX DMA, queues=3D0x004! We see this all the time for 3 years now. > [160930.542968] ath: phy0: DMA failed to stop in 10 ms AR_CR=3D0x00000024 > AR_DIAG_SW=3D0x42000020 DMADBG_7=3D0x000084c0 that is kind of new. > > [160930.554687] ath: phy0: Could not stop RX, we could be confusing the D= MA > engine when we start RX up So is that. > > and in the system log: > > Fri Jan 24 15:51:15 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0):= No > buffer space available > > Fri Jan 24 15:51:15 2014 daemon.err avahi-daemon[1273]: netlink.c: recvms= g() > failed: No buffer space available > > Fri Jan 24 15:52:18 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0):= No > buffer space available > > > root@buffy2-1:/tmp/log# free > > total used free shared buffers > > Mem: 126336 110128 16208 0 5616 > > -/+ buffers: 104512 21824 > > Swap: 0 0 0 > > > Fri Jan 24 17:14:53 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0):= No > buffer space available > > Fri Jan 24 17:15:53 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0):= No > buffer space available > > Fri Jan 24 17:16:18 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6 > IEEE 802.11: authenticated > > Fri Jan 24 17:16:18 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6 > IEEE 802.11: associated (aid 1) > > Fri Jan 24 17:16:18 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6 > RADIUS: starting accounting session 52DC70A7-00000010 > > Fri Jan 24 17:16:18 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6 > WPA: pairwise key handshake completed (RSN) > > Fri Jan 24 17:16:18 2014 daemon.info dnsmasq-dhcp[4999]: DHCPREQUEST(sw00= ) > 192.168.1.114 90:72:40:e9:9c:b6 > > Fri Jan 24 17:16:18 2014 daemon.info dnsmasq-dhcp[4999]: DHCPNAK(sw00) > 192.168.1.114 90:72:40:e9:9c:b6 wrong network > > Fri Jan 24 17:16:21 2014 daemon.info dnsmasq-dhcp[4999]: DHCPDISCOVER(sw0= 0) > 90:72:40:e9:9c:b6 > > Fri Jan 24 17:16:21 2014 daemon.info dnsmasq-dhcp[4999]: DHCPOFFER(sw00) > 172.30.42.82 90:72:40:e9:9c:b6 > > Fri Jan 24 17:16:21 2014 daemon.info dnsmasq-dhcp[4999]: DHCPDISCOVER(sw0= 0) > 90:72:40:e9:9c:b6 > > Fri Jan 24 17:16:21 2014 daemon.info dnsmasq-dhcp[4999]: DHCPOFFER(sw00) > 172.30.42.82 90:72:40:e9:9c:b6 > > Fri Jan 24 17:16:22 2014 daemon.info dnsmasq-dhcp[4999]: DHCPREQUEST(sw00= ) > 172.30.42.82 90:72:40:e9:9c:b6 > > Fri Jan 24 17:16:22 2014 daemon.info dnsmasq-dhcp[4999]: DHCPACK(sw00) > 172.30.42.82 90:72:40:e9:9c:b6 stevejs-iPhone > > Fri Jan 24 17:16:35 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6 > IEEE 802.11: disassociated > > Fri Jan 24 17:16:36 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6 > IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE) > > Fri Jan 24 17:16:57 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0):= No > buffer space available > > Fri Jan 24 17:17:59 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0):= No > buffer space available > > Fri Jan 24 17:19:04 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0):= No > buffer space available > > Fri Jan 24 17:19:04 2014 daemon.err avahi-daemon[1273]: netlink.c: recvms= g() > failed: No buffer space available > > Fri Jan 24 17:19:45 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6 > IEEE 802.11: authenticated > > Fri Jan 24 17:19:45 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6 > IEEE 802.11: associated (aid 1) > > Fri Jan 24 17:19:45 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6 > RADIUS: starting accounting session 52DC70A7-00000011 > > Fri Jan 24 17:19:45 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6 > WPA: pairwise key handshake completed (RSN) > > Fri Jan 24 17:19:45 2014 daemon.info dnsmasq-dhcp[4999]: DHCPDISCOVER(sw0= 0) > 90:72:40:e9:9c:b6 > > Fri Jan 24 17:19:45 2014 daemon.info dnsmasq-dhcp[4999]: DHCPOFFER(sw00) > 172.30.42.82 90:72:40:e9:9c:b6 > > Fri Jan 24 17:19:47 2014 daemon.info dnsmasq-dhcp[4999]: DHCPREQUEST(sw00= ) > 172.30.42.82 90:72:40:e9:9c:b6 > > Fri Jan 24 17:19:47 2014 daemon.info dnsmasq-dhcp[4999]: DHCPACK(sw00) > 172.30.42.82 90:72:40:e9:9c:b6 stevejs-iPhone > > Fri Jan 24 17:20:01 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6 > IEEE 802.11: disassociated > > Fri Jan 24 17:20:02 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6 > IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE) > > Fri Jan 24 17:20:09 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0):= No > buffer space available > > Fri Jan 24 17:21:10 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0):= No > buffer space available > > and then it hung and rebooted. You are running out of ram somewhere unobservable, and yes, I'd be a little nervous that it isn't actually a memory or hw problem. > > I bought this particular wndr3800 used so it's possible I have a bad piec= e > of hardware. Are there any hardware testing scripts or anything like > memtest86 for these devices? Are there entries in /sys that I should be > looking at? I've noticed about 9k unaligned_instructions on boot but that > never grows. I have only observed 4 in the last 24 hours on this build: https://lists.bufferbloat.net/pipermail/cerowrt-devel/2014-January/002093.h= tml > > Thanks! > Steve Jenson > @stevej > > > > > _______________________________________________ > Cerowrt-devel mailing list > Cerowrt-devel@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/cerowrt-devel > --=20 Dave T=E4ht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.= html