[Cerowrt-devel] cerowrt issues (3.10.24-8)

Steve Jenson stevej at fruitless.org
Mon Jan 27 16:06:55 EST 2014


On Fri, Jan 24, 2014 at 3:23 PM, Dave Taht <dave.taht at gmail.com> wrote:

> On Fri, Jan 24, 2014 at 6:08 PM, Steve Jenson <stevej at fruitless.org>
> wrote:
> > Hi everybody,
> >
> > I've been using cerowrt as a secondary wifi network (just a single AP for
> > now) for a few weeks now. Recently, my wndr3800 got stuck in a bad state
> and
> > eventually rebooted. I've had this happen a few times now and am looking
> for
> > ways to debug the issue. I'm new to cerowrt and openwrt so any advice is
> > appreciated.
> >
> > Since I use it as a secondary network, this is no way critical.
>
> Yea! I appreciate caution before putting alpha software on your gw.
>
> > I'm not
> > looking for free tech support but I couldn't find anything on the wiki
> about
> > troubleshooting. I'd love to start a page and write some shell scripts to
> > diagnose and report issues. I know that a cerowrt router is meant to be a
> > research project rather a consumer device but these things seem helpful
> > regardless.
>
> Sure, let me know your wiki account. I have been lax about granting
> access of late as the signup process is overrun by spammers.


My username is stevej on the wiki. Thanks!



>  > Please let me know if you'd prefer I not email the list with these
> issues or
> > if you'd rather I used trac or a different forum.
> >
>
> The list is where most stuff happens. Also in the irc channel.
>
> If it gets to where it needs to be tracked we have a bugtracker at
>
> http://www.bufferbloat.net/projects/cerowrt/issues
>
> The first question I have is: Are you on comcast? Cerowrt
> had a dhcpv6-pd implementation that "just worked" from feburary through
> december. Regrettably they changed the RA announcement interval
> to a really low number around then... and this triggers a firewall reload
> every minute on everything prior to the release I point to below.
>
> If there is a memory leak somewhere that would have triggered it.


I am on AT&T ADSL2+ with a Motorola NVG510 modem.



> > In this state, I can connect to the cerowrt base station via wifi but am
> > unable to route packets to the internet. I can connect to :81 and see the
> > login page but logging in results in a lua error at `/cgi-bin/luci`
> >
> >
> >   /usr/lib/lua/luci/dispatcher.lua:448: Failed to execute function
> > dispatcher target for entry '/'.
> >   The called action terminated with an exception:
> >   /usr/lib/lua/luci/sauth.lua:87: Session data invalid!
> >   stack traceback:
> >   [C]: in function 'assert'
> >   /usr/lib/lua/luci/dispatcher.lua:448: in function 'dispatch'
> >   /usr/lib/lua/luci/dispatcher.lua:195: in function
> > </usr/lib/lua/luci/dispatcher.lua:194>
> >
> > I can ssh into the device and cat various log files until the router
> hangs
> > and reboots. here's a few relevant lines from my terminal history before
> the
> > device rebooted (I'm assuming a watchdog kicked in and rebooted it).
> >
> > root at buffy2-1:~# ping google.com
> > ping: bad address 'google.com'
> > root at buffy2-1:~# free
> >              total         used         free       shared      buffers
> > Mem:        126336       110332        16004            0         5616
> > -/+ buffers:             104716        21620
> > Swap:            0            0            0
> > root at buffy2-1:~# uptime
> >  02:08:54 up 2 days,  1:26,  load average: 0.10, 0.21, 0.17
> > root at buffy2-1:~# dmesg
> > [    0.000000] Linux version 3.10.24 (cero2 at snapon) (gcc version 4.6.4
> > (OpenWrt/Linaro GCC 4.6-2013.05 r38226) ) #1 Tue Dec 24
> > 10:50:15 PST 2013
> > [skipping some lines]
> >
> > [   13.156250] Error: Driver 'gpio-keys-polled' is already registered,
> > aborting...
> > [   19.414062] IPv6: ADDRCONF(NETDEV_UP): ge00: link is not ready
> > [   19.421875] ar71xx: pll_reg 0xb8050010: 0x11110000
> > [   19.429687] se00: link up (1000Mbps/Full duplex)
> > [   22.140625] IPv6: ADDRCONF(NETDEV_UP): sw00: link is not ready
> > [   23.351562] IPv6: ADDRCONF(NETDEV_CHANGE): sw00: link becomes ready
> > [   23.757812] ar71xx: pll_reg 0xb8050014: 0x11110000
> > [   23.757812] ge00: link up (1000Mbps/Full duplex)
> > [   23.773437] IPv6: ADDRCONF(NETDEV_CHANGE): ge00: link becomes ready
> >
> > root at buffy2-1:~# ifconfig
> > ge00      Link encap:Ethernet  HWaddr 2C:B0:5D:A0:C5:B1
> >           inet addr:192.168.1.138  Bcast:192.168.1.255
>  Mask:255.255.255.0
> >           inet6 addr: fe80::2eb0:5dff:fea0:c5b1/64 Scope:Link
> >           inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64
> Scope:Global
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:1469670 errors:0 dropped:8 overruns:0 frame:0
> >           TX packets:547733 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:229243410 (218.6 MiB)  TX bytes:57304808 (54.6 MiB)
> >           Interrupt:5
> >
> > lo        Link encap:Local Loopback
> >           inet addr:127.0.0.1  Mask:255.0.0.0
> >           inet6 addr: ::1/128 Scope:Host
> >           UP LOOPBACK RUNNING  MTU:65536  Metric:1
> >           RX packets:23689 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:23689 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:0
> >           RX bytes:2612713 (2.4 MiB)  TX bytes:2612713 (2.4 MiB)
> >
> > pimreg    Link encap:UNSPEC  HWaddr
> > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> >           UP RUNNING NOARP  MTU:1472  Metric:1
> >           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:0
> >           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
> >
> > se00      Link encap:Ethernet  HWaddr 2E:B0:5D:A0:C5:B0
> >           inet addr:172.30.42.1  Bcast:172.30.42.31  Mask:255.255.255.224
> >           inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64
> Scope:Global
>
> How are you assigning your ipv6 addresses?
>

It's been a while since I messed with this but I think IPv6 is assigned
thanks to 6relayd? My modem has IPv6 enabled but no DHCPv6 options that I
can find. Here's how cerowrt is configured.

root at buffy2-1:/overlay/etc/config# cat 6relayd
config server 'default'
  option fallback_relay 'rd dhcpv6 ndp'
  list network 'ge00'
  list network 'ge01'
  list network 'gw00'
  list network 'gw01'
  list network 'gw10'
  list network 'gw11'
  list network 'se00'
  list network 'sw00'
  list network 'sw10'
  option rd 'relay'
  option dhcpv6 'relay'
  option ndp 'relay'
  option master 'ge00'


> >           inet6 addr: fe80::2cb0:5dff:fea0:c5b0/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:191740 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:0 (0.0 B)  TX bytes:42184988 (40.2 MiB)
> >           Interrupt:4
> >
> > sw00      Link encap:Ethernet  HWaddr 2C:B0:5D:A0:C5:B0
> >           inet addr:172.30.42.65  Bcast:172.30.42.95
>  Mask:255.255.255.224
> >           inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64
> Scope:Global
> >           inet6 addr: fe80::2eb0:5dff:fea0:c5b0/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:70239 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:286967 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:15590189 (14.8 MiB)  TX bytes:127357293 (121.4 MiB)
> >
> > root at buffy2-1:~# less /var/log/babeld.log
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > send: Cannot assign requested address
> > send: Cannot assign requested address
> > send: Cannot assign requested address
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > netlink_read: recvmsg(): No buffer space available
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
> > Couldn't determine channel of interface sw00: Invalid argument.
>
> This is a problem in babel detecting the channel on a "normal"
> rather than a mesh interface. It's bugged me a long while, but
> haven't got around to finding what triggers it. Might "fix" it by
> acquiring the channel at babel start time from /etc/config/wireless.
>
> It messes up the diversity routing calculation, grump.
>
> There is a possibility a logfile got really big, but this one
> generally doesn't, but I should turn off logging in some
> future release...


I believe I've tracked down part of what's going on. It looks like my tmpfs
is filling up 100% and then the device enters a bad state:

After 24 hours, with tmpfs at 50%, babeld.log is the largest file by far in
tmpfs and the only file that appears to be growing (based on `du`). It
takes about 48 hours from reboot to fill up tmpfs on my device.

#  sort babeld.log | uniq -c |sort -rn |head

 503236 Couldn't determine channel of interface sw00: Invalid argument.

   1376 netlink_read: recvmsg(): No buffer space available

      3 send: Cannot assign requested address

# wc -l babeld.log

504617 babeld.log
I sped up system failure by using `dd` to fill up tmpfs and the system
became immediately unusable.

This also explains the luci session store errors as sessions are stored in
tmpfs.

The other buffer issues may or may not be related to this.

Best,
Steve
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.bufferbloat.net/pipermail/cerowrt-devel/attachments/20140127/a1f20094/attachment-0002.html>


More information about the Cerowrt-devel mailing list