[Cerowrt-devel] cerowrt issues (3.10.24-8)

Dave Taht dave.taht at gmail.com
Fri Jan 24 18:23:28 EST 2014


On Fri, Jan 24, 2014 at 6:08 PM, Steve Jenson <stevej at fruitless.org> wrote:
> Hi everybody,
>
> I've been using cerowrt as a secondary wifi network (just a single AP for
> now) for a few weeks now. Recently, my wndr3800 got stuck in a bad state and
> eventually rebooted. I've had this happen a few times now and am looking for
> ways to debug the issue. I'm new to cerowrt and openwrt so any advice is
> appreciated.
>
> Since I use it as a secondary network, this is no way critical.

Yea! I appreciate caution before putting alpha software on your gw.

> I'm not
> looking for free tech support but I couldn't find anything on the wiki about
> troubleshooting. I'd love to start a page and write some shell scripts to
> diagnose and report issues. I know that a cerowrt router is meant to be a
> research project rather a consumer device but these things seem helpful
> regardless.

Sure, let me know your wiki account. I have been lax about granting
access of late as the signup process is overrun by spammers.

> Please let me know if you'd prefer I not email the list with these issues or
> if you'd rather I used trac or a different forum.
>

The list is where most stuff happens. Also in the irc channel.

If it gets to where it needs to be tracked we have a bugtracker at

http://www.bufferbloat.net/projects/cerowrt/issues

The first question I have is: Are you on comcast? Cerowrt
had a dhcpv6-pd implementation that "just worked" from feburary through
december. Regrettably they changed the RA announcement interval
to a really low number around then... and this triggers a firewall reload
every minute on everything prior to the release I point to below.

If there is a memory leak somewhere that would have triggered it.

> --
>
> In this state, I can connect to the cerowrt base station via wifi but am
> unable to route packets to the internet. I can connect to :81 and see the
> login page but logging in results in a lua error at `/cgi-bin/luci`
>
>
>   /usr/lib/lua/luci/dispatcher.lua:448: Failed to execute function
> dispatcher target for entry '/'.
>   The called action terminated with an exception:
>   /usr/lib/lua/luci/sauth.lua:87: Session data invalid!
>   stack traceback:
>   [C]: in function 'assert'
>   /usr/lib/lua/luci/dispatcher.lua:448: in function 'dispatch'
>   /usr/lib/lua/luci/dispatcher.lua:195: in function
> </usr/lib/lua/luci/dispatcher.lua:194>
>
> I can ssh into the device and cat various log files until the router hangs
> and reboots. here's a few relevant lines from my terminal history before the
> device rebooted (I'm assuming a watchdog kicked in and rebooted it).
>
> root at buffy2-1:~# ping google.com
> ping: bad address 'google.com'
> root at buffy2-1:~# free
>              total         used         free       shared      buffers
> Mem:        126336       110332        16004            0         5616
> -/+ buffers:             104716        21620
> Swap:            0            0            0
> root at buffy2-1:~# uptime
>  02:08:54 up 2 days,  1:26,  load average: 0.10, 0.21, 0.17
> root at buffy2-1:~# dmesg
> [    0.000000] Linux version 3.10.24 (cero2 at snapon) (gcc version 4.6.4
> (OpenWrt/Linaro GCC 4.6-2013.05 r38226) ) #1 Tue Dec 24
> 10:50:15 PST 2013
> [skipping some lines]
>
> [   13.156250] Error: Driver 'gpio-keys-polled' is already registered,
> aborting...
> [   19.414062] IPv6: ADDRCONF(NETDEV_UP): ge00: link is not ready
> [   19.421875] ar71xx: pll_reg 0xb8050010: 0x11110000
> [   19.429687] se00: link up (1000Mbps/Full duplex)
> [   22.140625] IPv6: ADDRCONF(NETDEV_UP): sw00: link is not ready
> [   23.351562] IPv6: ADDRCONF(NETDEV_CHANGE): sw00: link becomes ready
> [   23.757812] ar71xx: pll_reg 0xb8050014: 0x11110000
> [   23.757812] ge00: link up (1000Mbps/Full duplex)
> [   23.773437] IPv6: ADDRCONF(NETDEV_CHANGE): ge00: link becomes ready
>
> root at buffy2-1:~# ifconfig
> ge00      Link encap:Ethernet  HWaddr 2C:B0:5D:A0:C5:B1
>           inet addr:192.168.1.138  Bcast:192.168.1.255  Mask:255.255.255.0
>           inet6 addr: fe80::2eb0:5dff:fea0:c5b1/64 Scope:Link
>           inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Global
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1469670 errors:0 dropped:8 overruns:0 frame:0
>           TX packets:547733 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:229243410 (218.6 MiB)  TX bytes:57304808 (54.6 MiB)
>           Interrupt:5
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:23689 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:23689 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:2612713 (2.4 MiB)  TX bytes:2612713 (2.4 MiB)
>
> pimreg    Link encap:UNSPEC  HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>           UP RUNNING NOARP  MTU:1472  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>
> se00      Link encap:Ethernet  HWaddr 2E:B0:5D:A0:C5:B0
>           inet addr:172.30.42.1  Bcast:172.30.42.31  Mask:255.255.255.224
>           inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Global

How are you assigning your ipv6 addresses?

>           inet6 addr: fe80::2cb0:5dff:fea0:c5b0/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:191740 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:0 (0.0 B)  TX bytes:42184988 (40.2 MiB)
>           Interrupt:4
>
> sw00      Link encap:Ethernet  HWaddr 2C:B0:5D:A0:C5:B0
>           inet addr:172.30.42.65  Bcast:172.30.42.95  Mask:255.255.255.224
>           inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Global
>           inet6 addr: fe80::2eb0:5dff:fea0:c5b0/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:70239 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:286967 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:15590189 (14.8 MiB)  TX bytes:127357293 (121.4 MiB)
>
> root at buffy2-1:~# less /var/log/babeld.log
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> send: Cannot assign requested address
> send: Cannot assign requested address
> send: Cannot assign requested address
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> netlink_read: recvmsg(): No buffer space available
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.
> Couldn't determine channel of interface sw00: Invalid argument.

This is a problem in babel detecting the channel on a "normal"
rather than a mesh interface. It's bugged me a long while, but
haven't got around to finding what triggers it. Might "fix" it by
acquiring the channel at babel start time from /etc/config/wireless.

It messes up the diversity routing calculation, grump.

There is a possibility a logfile got really big, but this one
generally doesn't, but I should turn off logging in some
future release...

> </quote>
>
> and then the system hang. Note: the first time I ran 'less
> /var/log/babeld.log', it was Killed. I would assume that's an OOM
> killer?

Yes it really sounds like you were out of memory. Your free does not
indicate that however.

> I'm running cerowrt 3.10.24-8 on a wndr3800. I have a spare wndr3800
> that I'm getting ready to flash with the same version and I can see if I get
> the same issue. What's the best way to clone a configuration?

scp -r therouter:/overlay somewheresafe

then reflash it clean

>
> ---
>
> Here are some log lines from a separate instance of the same problem, I
> recovered these log lines from the web server (I was able to login to
> the web interface for another hour until I got the same lua exception
> as in the beginning of this email)

You can setup remote syslog logging if you lke.

> This time I was completely unable to connect to wifi from my iPhone
> and you can see where hostapd disassociates the iPhone soon after
> it connects.
>
> the kernel log:
>
> [158995.550781] icmp6_send: no reply to icmp error
>
> [160930.527343] ath: phy0: Failed to stop TX DMA, queues=0x004!

We see this all the time for 3 years now.

> [160930.542968] ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024
> AR_DIAG_SW=0x42000020 DMADBG_7=0x000084c0

that is kind of new.

>
> [160930.554687] ath: phy0: Could not stop RX, we could be confusing the DMA
> engine when we start RX up

So is that.

>
> and in the system log:
>
> Fri Jan 24 15:51:15 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0): No
> buffer space available
>
> Fri Jan 24 15:51:15 2014 daemon.err avahi-daemon[1273]: netlink.c: recvmsg()
> failed: No buffer space available
>
> Fri Jan 24 15:52:18 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0): No
> buffer space available
>
>
> root at buffy2-1:/tmp/log# free
>
>              total         used         free       shared      buffers
>
> Mem:        126336       110128        16208            0         5616
>
> -/+ buffers:             104512        21824
>
> Swap:            0            0            0
>
>
> Fri Jan 24 17:14:53 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0): No
> buffer space available
>
> Fri Jan 24 17:15:53 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0): No
> buffer space available
>
> Fri Jan 24 17:16:18 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6
> IEEE 802.11: authenticated
>
> Fri Jan 24 17:16:18 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6
> IEEE 802.11: associated (aid 1)
>
> Fri Jan 24 17:16:18 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6
> RADIUS: starting accounting session 52DC70A7-00000010
>
> Fri Jan 24 17:16:18 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6
> WPA: pairwise key handshake completed (RSN)
>
> Fri Jan 24 17:16:18 2014 daemon.info dnsmasq-dhcp[4999]: DHCPREQUEST(sw00)
> 192.168.1.114 90:72:40:e9:9c:b6
>
> Fri Jan 24 17:16:18 2014 daemon.info dnsmasq-dhcp[4999]: DHCPNAK(sw00)
> 192.168.1.114 90:72:40:e9:9c:b6 wrong network
>
> Fri Jan 24 17:16:21 2014 daemon.info dnsmasq-dhcp[4999]: DHCPDISCOVER(sw00)
> 90:72:40:e9:9c:b6
>
> Fri Jan 24 17:16:21 2014 daemon.info dnsmasq-dhcp[4999]: DHCPOFFER(sw00)
> 172.30.42.82 90:72:40:e9:9c:b6
>
> Fri Jan 24 17:16:21 2014 daemon.info dnsmasq-dhcp[4999]: DHCPDISCOVER(sw00)
> 90:72:40:e9:9c:b6
>
> Fri Jan 24 17:16:21 2014 daemon.info dnsmasq-dhcp[4999]: DHCPOFFER(sw00)
> 172.30.42.82 90:72:40:e9:9c:b6
>
> Fri Jan 24 17:16:22 2014 daemon.info dnsmasq-dhcp[4999]: DHCPREQUEST(sw00)
> 172.30.42.82 90:72:40:e9:9c:b6
>
> Fri Jan 24 17:16:22 2014 daemon.info dnsmasq-dhcp[4999]: DHCPACK(sw00)
> 172.30.42.82 90:72:40:e9:9c:b6 stevejs-iPhone
>
> Fri Jan 24 17:16:35 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6
> IEEE 802.11: disassociated
>
> Fri Jan 24 17:16:36 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6
> IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
>
> Fri Jan 24 17:16:57 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0): No
> buffer space available
>
> Fri Jan 24 17:17:59 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0): No
> buffer space available
>
> Fri Jan 24 17:19:04 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0): No
> buffer space available
>
> Fri Jan 24 17:19:04 2014 daemon.err avahi-daemon[1273]: netlink.c: recvmsg()
> failed: No buffer space available
>
> Fri Jan 24 17:19:45 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6
> IEEE 802.11: authenticated
>
> Fri Jan 24 17:19:45 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6
> IEEE 802.11: associated (aid 1)
>
> Fri Jan 24 17:19:45 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6
> RADIUS: starting accounting session 52DC70A7-00000011
>
> Fri Jan 24 17:19:45 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6
> WPA: pairwise key handshake completed (RSN)
>
> Fri Jan 24 17:19:45 2014 daemon.info dnsmasq-dhcp[4999]: DHCPDISCOVER(sw00)
> 90:72:40:e9:9c:b6
>
> Fri Jan 24 17:19:45 2014 daemon.info dnsmasq-dhcp[4999]: DHCPOFFER(sw00)
> 172.30.42.82 90:72:40:e9:9c:b6
>
> Fri Jan 24 17:19:47 2014 daemon.info dnsmasq-dhcp[4999]: DHCPREQUEST(sw00)
> 172.30.42.82 90:72:40:e9:9c:b6
>
> Fri Jan 24 17:19:47 2014 daemon.info dnsmasq-dhcp[4999]: DHCPACK(sw00)
> 172.30.42.82 90:72:40:e9:9c:b6 stevejs-iPhone
>
> Fri Jan 24 17:20:01 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6
> IEEE 802.11: disassociated
>
> Fri Jan 24 17:20:02 2014 daemon.info hostapd: sw00: STA 90:72:40:e9:9c:b6
> IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
>
> Fri Jan 24 17:20:09 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0): No
> buffer space available
>
> Fri Jan 24 17:21:10 2014 daemon.err minissdpd[4933]: recvmsg(s, &hdr, 0): No
> buffer space available
>
> and then it hung and rebooted.

You are running out of ram somewhere unobservable, and yes, I'd be
a little nervous that it isn't actually a memory or hw problem.

>
> I bought this particular wndr3800 used so it's possible I have a bad piece
> of hardware. Are there any hardware testing scripts or anything like
> memtest86 for these devices?  Are there entries in /sys that I should be
> looking at? I've noticed about 9k unaligned_instructions on boot but that
> never grows.

I have only observed 4 in the last 24 hours on this build:
https://lists.bufferbloat.net/pipermail/cerowrt-devel/2014-January/002093.html

>
> Thanks!
> Steve Jenson
> @stevej
>
>
>
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html



More information about the Cerowrt-devel mailing list