<div dir="ltr">Looking more closely, the buffer errors show up in syslog well before tmpfs fills up. Is the memtester OpenWrt package available for cerowrt? I don't see it under `Available packages`.<div><br></div><div>
Thanks,</div><div>Steve</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Jan 27, 2014 at 1:06 PM, Steve Jenson <span dir="ltr"><<a href="mailto:stevej@fruitless.org" target="_blank">stevej@fruitless.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div class="im">On Fri, Jan 24, 2014 at 3:23 PM, Dave Taht <span dir="ltr"><<a href="mailto:dave.taht@gmail.com" target="_blank">dave.taht@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div>On Fri, Jan 24, 2014 at 6:08 PM, Steve Jenson <<a href="mailto:stevej@fruitless.org" target="_blank">stevej@fruitless.org</a>> wrote:<br>
> Hi everybody,<br>
><br>
> I've been using cerowrt as a secondary wifi network (just a single AP for<br>
> now) for a few weeks now. Recently, my wndr3800 got stuck in a bad state and<br>
> eventually rebooted. I've had this happen a few times now and am looking for<br>
> ways to debug the issue. I'm new to cerowrt and openwrt so any advice is<br>
> appreciated.<br>
><br>
> Since I use it as a secondary network, this is in no way critical.<br>
<br>
</div>Yea! I appreciate caution before putting alpha software on your gw.<br>
<div><br>
> I'm not<br>
> looking for free tech support but I couldn't find anything on the wiki about<br>
> troubleshooting. I'd love to start a page and write some shell scripts to<br>
> diagnose and report issues. I know that a cerowrt router is meant to be a<br>
> research project rather than a consumer device, but these things seem helpful<br>
> regardless.<br>
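<div>A minimal sketch of the kind of diagnostic script proposed here, assuming a BusyBox/OpenWrt shell. The names `report_section` and `cero_report` are hypothetical. It only strings together commands already used in this thread (`uptime`, `free`, `df`, `du`, `dmesg`) plus OpenWrt's `logread` syslog reader, and it keeps going if any of them is missing or fails.</div>

```shell
#!/bin/sh
# Hypothetical one-shot diagnostic snapshot for a cerowrt/OpenWrt router.
# Nothing here is fatal: command errors are folded into the report.

report_section() {
    echo
    echo "=== $1 ==="
}

cero_report() {
    report_section "uptime"
    uptime 2>&1

    report_section "memory"
    free 2>&1

    report_section "tmpfs (/tmp)"
    df /tmp 2>&1

    # Largest files/directories first, to spot a runaway log.
    report_section "largest entries in /tmp"
    du -a /tmp 2>/dev/null | sort -rn | head -n 10

    report_section "kernel log (last 20 lines)"
    dmesg 2>&1 | tail -n 20

    report_section "syslog (last 20 lines)"
    if command -v logread >/dev/null 2>&1; then
        logread | tail -n 20
    else
        echo "(logread not available)"
    fi
}
```

<div>Since /tmp may already be full when the router misbehaves, it is probably safer to run this over ssh and save the output on another machine than to write the report to /tmp on the router.</div>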
<br>
</div>Sure, let me know your wiki account. I have been lax about granting<br>
access of late as the signup process is overrun by spammers.</blockquote><div><br></div></div><div>My username is stevej on the wiki. Thanks! </div><div class="im"><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div>
> Please let me know if you'd prefer I not email the list with these issues or<br>
> if you'd rather I used trac or a different forum.<br>
><br>
<br>
</div>The list is where most stuff happens, along with the IRC channel.<br>
<br>
If it gets to where it needs to be tracked we have a bugtracker at<br>
<br>
<a href="http://www.bufferbloat.net/projects/cerowrt/issues" target="_blank">http://www.bufferbloat.net/projects/cerowrt/issues</a><br>
<br>
The first question I have is: Are you on Comcast? Cerowrt<br>
had a dhcpv6-pd implementation that "just worked" from February through<br>
December. Regrettably they changed the RA announcement interval<br>
to a really low number around then... and this triggers a firewall reload<br>
every minute on everything prior to the release I point to below.<br>
<br>
If there is a memory leak somewhere, that would have triggered it.</blockquote><div><br></div></div><div>I am on AT&T ADSL2+ with a Motorola NVG510 modem. </div><div><div class="h5"><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div><div>> In this state, I can connect to the cerowrt base station via wifi but am<br>
> unable to route packets to the internet. I can connect to :81 and see the<br>
> login page but logging in results in a lua error at `/cgi-bin/luci`<br>
><br>
><br>
> /usr/lib/lua/luci/dispatcher.lua:448: Failed to execute function<br>
> dispatcher target for entry '/'.<br>
> The called action terminated with an exception:<br>
> /usr/lib/lua/luci/sauth.lua:87: Session data invalid!<br>
> stack traceback:<br>
> [C]: in function 'assert'<br>
> /usr/lib/lua/luci/dispatcher.lua:448: in function 'dispatch'<br>
> /usr/lib/lua/luci/dispatcher.lua:195: in function<br>
> </usr/lib/lua/luci/dispatcher.lua:194><br>
><br>
> I can ssh into the device and cat various log files until the router hangs<br>
> and reboots. Here are a few relevant lines from my terminal history before the<br>
> device rebooted (I'm assuming a watchdog kicked in and rebooted it).<br>
><br>
> root@buffy2-1:~# ping google.com<br>
> ping: bad address 'google.com'<br>
> root@buffy2-1:~# free<br>
> total used free shared buffers<br>
> Mem: 126336 110332 16004 0 5616<br>
> -/+ buffers: 104716 21620<br>
> Swap: 0 0 0<br>
> root@buffy2-1:~# uptime<br>
> 02:08:54 up 2 days, 1:26, load average: 0.10, 0.21, 0.17<br>
> root@buffy2-1:~# dmesg<br>
> [ 0.000000] Linux version 3.10.24 (cero2@snapon) (gcc version 4.6.4<br>
> (OpenWrt/Linaro GCC 4.6-2013.05 r38226) ) #1 Tue Dec 24<br>
> 10:50:15 PST 2013<br>
> [skipping some lines]<br>
><br>
> [ 13.156250] Error: Driver 'gpio-keys-polled' is already registered,<br>
> aborting...<br>
> [ 19.414062] IPv6: ADDRCONF(NETDEV_UP): ge00: link is not ready<br>
> [ 19.421875] ar71xx: pll_reg 0xb8050010: 0x11110000<br>
> [ 19.429687] se00: link up (1000Mbps/Full duplex)<br>
> [ 22.140625] IPv6: ADDRCONF(NETDEV_UP): sw00: link is not ready<br>
> [ 23.351562] IPv6: ADDRCONF(NETDEV_CHANGE): sw00: link becomes ready<br>
> [ 23.757812] ar71xx: pll_reg 0xb8050014: 0x11110000<br>
> [ 23.757812] ge00: link up (1000Mbps/Full duplex)<br>
> [ 23.773437] IPv6: ADDRCONF(NETDEV_CHANGE): ge00: link becomes ready<br>
><br>
> root@buffy2-1:~# ifconfig<br>
> ge00 Link encap:Ethernet HWaddr 2C:B0:5D:A0:C5:B1<br>
> inet addr:192.168.1.138 Bcast:192.168.1.255 Mask:255.255.255.0<br>
> inet6 addr: fe80::2eb0:5dff:fea0:c5b1/64 Scope:Link<br>
> inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Global<br>
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1<br>
> RX packets:1469670 errors:0 dropped:8 overruns:0 frame:0<br>
> TX packets:547733 errors:0 dropped:0 overruns:0 carrier:0<br>
> collisions:0 txqueuelen:1000<br>
> RX bytes:229243410 (218.6 MiB) TX bytes:57304808 (54.6 MiB)<br>
> Interrupt:5<br>
><br>
> lo Link encap:Local Loopback<br>
> inet addr:127.0.0.1 Mask:255.0.0.0<br>
> inet6 addr: ::1/128 Scope:Host<br>
> UP LOOPBACK RUNNING MTU:65536 Metric:1<br>
> RX packets:23689 errors:0 dropped:0 overruns:0 frame:0<br>
> TX packets:23689 errors:0 dropped:0 overruns:0 carrier:0<br>
> collisions:0 txqueuelen:0<br>
> RX bytes:2612713 (2.4 MiB) TX bytes:2612713 (2.4 MiB)<br>
><br>
> pimreg Link encap:UNSPEC HWaddr<br>
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00<br>
> UP RUNNING NOARP MTU:1472 Metric:1<br>
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0<br>
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0<br>
> collisions:0 txqueuelen:0<br>
> RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)<br>
><br>
> se00 Link encap:Ethernet HWaddr 2E:B0:5D:A0:C5:B0<br>
> inet addr:172.30.42.1 Bcast:172.30.42.31 Mask:255.255.255.224<br>
> inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Global<br>
<br>
</div></div>How are you assigning your ipv6 addresses?<br></blockquote><div><br></div></div></div><div>It's been a while since I messed with this but I think IPv6 is assigned thanks to 6relayd? My modem has IPv6 enabled but no DHCPv6 options that I can find. Here's how cerowrt is configured.<br>
</div><div><br></div>root@buffy2-1:/overlay/etc/config# cat 6relayd<br>config server 'default'<br> option fallback_relay 'rd dhcpv6 ndp'<br> list network 'ge00'<br> list network 'ge01'<br>
list network 'gw00'<br> list network 'gw01'<br> list network 'gw10'<br> list network 'gw11'<br> list network 'se00'<br> list network 'sw00'<br> list network 'sw10'<br>
option rd 'relay'<br> option dhcpv6 'relay'<br> option ndp 'relay'<br> option master 'ge00'<div><div class="h5"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div><div><br>
> inet6 addr: fe80::2cb0:5dff:fea0:c5b0/64 Scope:Link<br>
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1<br>
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0<br>
> TX packets:191740 errors:0 dropped:0 overruns:0 carrier:0<br>
> collisions:0 txqueuelen:1000<br>
> RX bytes:0 (0.0 B) TX bytes:42184988 (40.2 MiB)<br>
> Interrupt:4<br>
><br>
> sw00 Link encap:Ethernet HWaddr 2C:B0:5D:A0:C5:B0<br>
> inet addr:172.30.42.65 Bcast:172.30.42.95 Mask:255.255.255.224<br>
> inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Global<br>
> inet6 addr: fe80::2eb0:5dff:fea0:c5b0/64 Scope:Link<br>
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1<br>
> RX packets:70239 errors:0 dropped:0 overruns:0 frame:0<br>
> TX packets:286967 errors:0 dropped:0 overruns:0 carrier:0<br>
> collisions:0 txqueuelen:1000<br>
> RX bytes:15590189 (14.8 MiB) TX bytes:127357293 (121.4 MiB)<br>
><br>
> root@buffy2-1:~# less /var/log/babeld.log<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> send: Cannot assign requested address<br>
> send: Cannot assign requested address<br>
> send: Cannot assign requested address<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> netlink_read: recvmsg(): No buffer space available<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
> Couldn't determine channel of interface sw00: Invalid argument.<br>
<br>
</div></div>This is a problem in babel detecting the channel on a "normal"<br>
rather than a mesh interface. It's bugged me for a long while, but I<br>
haven't gotten around to finding what triggers it. I might "fix" it by<br>
acquiring the channel at babel start time from /etc/config/wireless.<br>
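<div>A rough sketch of the "read the channel from /etc/config/wireless" idea, assuming the usual UCI layout in which each `wifi-iface` section names its `device` (e.g. `radio0`) and `ifname` (e.g. `sw00`), and the matching `wifi-device` section carries `option channel`. `channel_for_iface` is a hypothetical helper, not part of babeld or cerowrt, and how babeld would consume the value is left open. On a running router, `uci get wireless.radio0.channel` reads a single option directly if the section name is already known.</div>

```shell
#!/bin/sh
# Hypothetical helper: print the channel configured for a wireless
# interface name (e.g. sw00) by parsing a UCI-format wireless config.
# Assumes wifi-iface sections set 'option ifname' and 'option device',
# and the referenced wifi-device section sets 'option channel'.
# Prints nothing if no match is found.

channel_for_iface() {
    iface=$1
    conf=${2:-/etc/config/wireless}
    awk -v want="$iface" -v q="'" '
        # Strip UCI single/double quotes from a value.
        function unq(s) { gsub(q, "", s); gsub(/"/, "", s); return s }
        # At the end of a wifi-iface section, remember its device
        # if its ifname is the one we want.
        function end_section() { if (stype == "wifi-iface" && ifn == want) found = dev }
        $1 == "config" { end_section(); stype = $2; sname = unq($3); ifn = ""; dev = ""; next }
        $1 != "option" { next }
        stype == "wifi-device" && $2 == "channel" { chan[sname] = unq($3) }
        stype == "wifi-iface" && $2 == "ifname" { ifn = unq($3) }
        stype == "wifi-iface" && $2 == "device" { dev = unq($3) }
        END { end_section(); if (found != "" && (found in chan)) print chan[found] }
    ' "$conf"
}
```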
<br>
It messes up the diversity routing calculation, grump.<br>
<br>
It's possible a logfile got really big. This one generally doesn't,<br>
but I should turn off that logging in some<br>
future release...</blockquote><div><br></div></div></div><div>I believe I've tracked down part of what's going on. It looks like my tmpfs fills up to 100%, and then the device enters a bad state:</div><div><br></div>
<div>After 24 hours, with tmpfs at 50%, babeld.log is the largest file by far in tmpfs and the only file that appears to be growing (based on `du`). It takes about 48 hours from reboot to fill up tmpfs on my device.</div>
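<div>The growth pattern above (about 50% after a day, full after about two) suggests a simple early warning. A minimal sketch, assuming `df -P` (POSIX output format) is available and using an arbitrary example threshold of 80%; `tmpfs_use_pct` and `check_tmpfs` are illustrative names only.</div>

```shell
#!/bin/sh
# Hypothetical early-warning check for a filling tmpfs.

# Print the integer "Use%" of the filesystem holding directory $1.
tmpfs_use_pct() {
    df -P "${1:-/tmp}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn (and list the largest entries) if usage is at or above $2 percent.
# Returns 0 when below the threshold, 1 otherwise.
check_tmpfs() {
    dir=${1:-/tmp}
    limit=${2:-80}
    pct=$(tmpfs_use_pct "$dir")
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $dir is ${pct}% full (threshold ${limit}%)"
        du -a "$dir" 2>/dev/null | sort -rn | head -n 5
        return 1
    fi
    echo "OK: $dir is ${pct}% full"
}
```

<div>If growth is roughly linear, 50% in 24 hours is about 2 percentage points per hour, so an 80% threshold checked periodically (for example from cron) would leave on the order of 10 hours before tmpfs is full.</div>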
<div><br></div><div>
<p># sort babeld.log | uniq -c | sort -rn | head<br></p></div><div>
<p> 503236 Couldn't determine channel of interface sw00: Invalid argument.</p>
<p> 1376 netlink_read: recvmsg(): No buffer space available</p>
<p> 3 send: Cannot assign requested address</p></div><div><div><p># wc -l babeld.log</p><p>504617 babeld.log</p></div></div><div>I reproduced the failure faster by using `dd` to fill up tmpfs, and the system became unusable immediately.</div>
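<div>Until babeld's logging is quieted, one stopgap is to cap the log's size. A minimal sketch with a hypothetical `cap_log` helper and made-up default limits. It truncates the file in place rather than deleting it, since a daemon that keeps its log open (as babeld presumably does) would otherwise keep the deleted file's tmpfs space allocated. It also assumes the writer opened the log in append mode; otherwise truncation could leave a sparse gap.</div>

```shell
#!/bin/sh
# Hypothetical stopgap: if a log exceeds max_bytes, keep only its last
# keep_lines lines. The file is truncated in place (not removed), so a
# daemon holding it open in append mode keeps writing to the same file.

cap_log() {
    log=$1
    max_bytes=${2:-524288}   # example limit: 512 KiB
    keep_lines=${3:-1000}
    [ -f "$log" ] || return 0
    size=$(wc -c < "$log" | tr -d ' ')
    [ "$size" -gt "$max_bytes" ] || return 0
    # Copy the tail aside, then overwrite the original through the same inode.
    tail -n "$keep_lines" "$log" > "$log.tail" &&
        cat "$log.tail" > "$log" &&
        rm -f "$log.tail"
}
```

<div>Run periodically against /var/log/babeld.log, this would keep the file bounded; the underlying fix is still to stop the repeated "Couldn't determine channel" message.</div>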
<div><br></div><div>This also explains the LuCI session-store errors, since sessions are stored in tmpfs.</div><div><br></div><div>The other buffer issues may or may not be related to this.</div><div><br></div><div>Best,</div>
<div>Steve</div></div></div></div>
</blockquote></div><br></div>