From: Dave Taht
To: Steve Jenson
Cc: "cerowrt-devel@lists.bufferbloat.net"
Date: Mon, 27 Jan 2014 16:14:02 -0500
Subject: Re: [Cerowrt-devel] cerowrt issues (3.10.24-8)

certainly turn off the babeld log! I will leave it off in the next release.

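For anyone who wants to watch for this in the meantime, here is a minimal sketch of a check built on the same `sort | uniq -c` pipeline Steve ran in the message quoted below. The function names (top_messages, fs_use_percent, check_log) and the 50% default threshold are illustrative assumptions, not anything that ships with cerowrt, and the sketch assumes a `df` that accepts `-P`.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; none of these names exist in cerowrt.

# Print the N most frequent distinct lines in a log, most common first.
top_messages() {
    log="$1"
    n="${2:-10}"
    sort "$log" | uniq -c | sort -rn | head -n "$n"
}

# Print the Use% of the filesystem holding a path, without the % sign.
fs_use_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when the filesystem holding the log reaches a threshold
# (default 50, an assumed value) and show the three noisiest messages.
check_log() {
    log="$1"
    limit="${2:-50}"
    used=$(fs_use_percent "$(dirname "$log")")
    # Bail out quietly if df gave us nothing numeric to compare.
    case "$used" in
        ''|*[!0-9]*) return 1 ;;
    esac
    if [ "$used" -ge "$limit" ]; then
        echo "warning: filesystem holding $log is ${used}% full"
        top_messages "$log" 3
    fi
}
```

Run from cron or by hand, something like `check_log /var/log/babeld.log 50` would be one way to use it; the path is the one from Steve's report.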
On Mon, Jan 27, 2014 at 4:10 PM, Steve Jenson wrote:
> Looking more, the buffer errors are showing up in syslog well before tmpfs
> fills up. Is the memtester openwrt package available for cerowrt? I don't
> see it under `Available packages`.
>
> Thanks,
> Steve
>
>
> On Mon, Jan 27, 2014 at 1:06 PM, Steve Jenson wrote:
>>
>> On Fri, Jan 24, 2014 at 3:23 PM, Dave Taht wrote:
>>>
>>> On Fri, Jan 24, 2014 at 6:08 PM, Steve Jenson wrote:
>>> > Hi everybody,
>>> >
>>> > I've been using cerowrt as a secondary wifi network (just a single AP
>>> > for now) for a few weeks now. Recently, my wndr3800 got stuck in a bad
>>> > state and eventually rebooted. I've had this happen a few times now and
>>> > am looking for ways to debug the issue. I'm new to cerowrt and openwrt
>>> > so any advice is appreciated.
>>> >
>>> > Since I use it as a secondary network, this is in no way critical.
>>>
>>> Yea! I appreciate caution before putting alpha software on your gw.
>>>
>>> > I'm not looking for free tech support but I couldn't find anything on
>>> > the wiki about troubleshooting. I'd love to start a page and write some
>>> > shell scripts to diagnose and report issues. I know that a cerowrt
>>> > router is meant to be a research project rather than a consumer device
>>> > but these things seem helpful regardless.
>>>
>>> Sure, let me know your wiki account. I have been lax about granting
>>> access of late as the signup process is overrun by spammers.
>>
>>
>> My username is stevej on the wiki. Thanks!
>>
>>
>>>
>>> > Please let me know if you'd prefer I not email the list with these
>>> > issues or if you'd rather I used trac or a different forum.
>>> >
>>>
>>> The list is where most stuff happens. Also in the irc channel.
>>>
>>> If it gets to where it needs to be tracked we have a bugtracker at
>>>
>>> http://www.bufferbloat.net/projects/cerowrt/issues
>>>
>>> The first question I have is: Are you on Comcast? Cerowrt
>>> had a dhcpv6-pd implementation that "just worked" from February through
>>> December. Regrettably they changed the RA announcement interval
>>> to a really low number around then... and this triggers a firewall reload
>>> every minute on everything prior to the release I point to below.
>>>
>>> If there is a memory leak somewhere that would have triggered it.
>>
>>
>> I am on AT&T ADSL2+ with a Motorola NVG510 modem.
>>
>>
>>>
>>> > In this state, I can connect to the cerowrt base station via wifi but
>>> > am unable to route packets to the internet. I can connect to :81 and
>>> > see the login page but logging in results in a lua error at
>>> > `/cgi-bin/luci`
>>> >
>>> >
>>> > /usr/lib/lua/luci/dispatcher.lua:448: Failed to execute function
>>> > dispatcher target for entry '/'.
>>> > The called action terminated with an exception:
>>> > /usr/lib/lua/luci/sauth.lua:87: Session data invalid!
>>> > stack traceback:
>>> > [C]: in function 'assert'
>>> > /usr/lib/lua/luci/dispatcher.lua:448: in function 'dispatch'
>>> > /usr/lib/lua/luci/dispatcher.lua:195: in function
>>> >
>>> >
>>> > I can ssh into the device and cat various log files until the router
>>> > hangs and reboots. Here are a few relevant lines from my terminal
>>> > history before the device rebooted (I'm assuming a watchdog kicked in
>>> > and rebooted it).
>>> >
>>> > root@buffy2-1:~# ping google.com
>>> > ping: bad address 'google.com'
>>> > root@buffy2-1:~# free
>>> >              total         used         free       shared      buffers
>>> > Mem:        126336       110332        16004            0         5616
>>> > -/+ buffers:             104716        21620
>>> > Swap:            0            0            0
>>> > root@buffy2-1:~# uptime
>>> >  02:08:54 up 2 days,  1:26,  load average: 0.10, 0.21, 0.17
>>> > root@buffy2-1:~# dmesg
>>> > [    0.000000] Linux version 3.10.24 (cero2@snapon) (gcc version 4.6.4 (OpenWrt/Linaro GCC 4.6-2013.05 r38226) ) #1 Tue Dec 24 10:50:15 PST 2013
>>> > [skipping some lines]
>>> >
>>> > [   13.156250] Error: Driver 'gpio-keys-polled' is already registered, aborting...
>>> > [   19.414062] IPv6: ADDRCONF(NETDEV_UP): ge00: link is not ready
>>> > [   19.421875] ar71xx: pll_reg 0xb8050010: 0x11110000
>>> > [   19.429687] se00: link up (1000Mbps/Full duplex)
>>> > [   22.140625] IPv6: ADDRCONF(NETDEV_UP): sw00: link is not ready
>>> > [   23.351562] IPv6: ADDRCONF(NETDEV_CHANGE): sw00: link becomes ready
>>> > [   23.757812] ar71xx: pll_reg 0xb8050014: 0x11110000
>>> > [   23.757812] ge00: link up (1000Mbps/Full duplex)
>>> > [   23.773437] IPv6: ADDRCONF(NETDEV_CHANGE): ge00: link becomes ready
>>> >
>>> > root@buffy2-1:~# ifconfig
>>> > ge00      Link encap:Ethernet  HWaddr 2C:B0:5D:A0:C5:B1
>>> >           inet addr:192.168.1.138  Bcast:192.168.1.255  Mask:255.255.255.0
>>> >           inet6 addr: fe80::2eb0:5dff:fea0:c5b1/64 Scope:Link
>>> >           inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Global
>>> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>> >           RX packets:1469670 errors:0 dropped:8 overruns:0 frame:0
>>> >           TX packets:547733 errors:0 dropped:0 overruns:0 carrier:0
>>> >           collisions:0 txqueuelen:1000
>>> >           RX bytes:229243410 (218.6 MiB)  TX bytes:57304808 (54.6 MiB)
>>> >           Interrupt:5
>>> >
>>> > lo        Link encap:Local Loopback
>>> >           inet addr:127.0.0.1  Mask:255.0.0.0
>>> >           inet6 addr: ::1/128 Scope:Host
>>> >           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>>> >           RX packets:23689 errors:0 dropped:0 overruns:0 frame:0
>>> >           TX packets:23689 errors:0 dropped:0 overruns:0 carrier:0
>>> >           collisions:0 txqueuelen:0
>>> >           RX bytes:2612713 (2.4 MiB)  TX bytes:2612713 (2.4 MiB)
>>> >
>>> > pimreg    Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>> >           UP RUNNING NOARP  MTU:1472  Metric:1
>>> >           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>> >           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>> >           collisions:0 txqueuelen:0
>>> >           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>>> >
>>> > se00      Link encap:Ethernet  HWaddr 2E:B0:5D:A0:C5:B0
>>> >           inet addr:172.30.42.1  Bcast:172.30.42.31  Mask:255.255.255.224
>>> >           inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Global
>>>
>>> How are you assigning your ipv6 addresses?
>>
>>
>> It's been a while since I messed with this but I think IPv6 is assigned
>> thanks to 6relayd? My modem has IPv6 enabled but no DHCPv6 options that I
>> can find. Here's how cerowrt is configured.
>>
>> root@buffy2-1:/overlay/etc/config# cat 6relayd
>> config server 'default'
>>         option fallback_relay 'rd dhcpv6 ndp'
>>         list network 'ge00'
>>         list network 'ge01'
>>         list network 'gw00'
>>         list network 'gw01'
>>         list network 'gw10'
>>         list network 'gw11'
>>         list network 'se00'
>>         list network 'sw00'
>>         list network 'sw10'
>>         option rd 'relay'
>>         option dhcpv6 'relay'
>>         option ndp 'relay'
>>         option master 'ge00'
>>
>>>
>>> >           inet6 addr: fe80::2cb0:5dff:fea0:c5b0/64 Scope:Link
>>> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>> >           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>> >           TX packets:191740 errors:0 dropped:0 overruns:0 carrier:0
>>> >           collisions:0 txqueuelen:1000
>>> >           RX bytes:0 (0.0 B)  TX bytes:42184988 (40.2 MiB)
>>> >           Interrupt:4
>>> >
>>> > sw00      Link encap:Ethernet  HWaddr 2C:B0:5D:A0:C5:B0
>>> >           inet addr:172.30.42.65  Bcast:172.30.42.95  Mask:255.255.255.224
>>> >           inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Global
>>> >           inet6 addr: fe80::2eb0:5dff:fea0:c5b0/64 Scope:Link
>>> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>> >           RX packets:70239 errors:0 dropped:0 overruns:0 frame:0
>>> >           TX packets:286967 errors:0 dropped:0 overruns:0 carrier:0
>>> >           collisions:0 txqueuelen:1000
>>> >           RX bytes:15590189 (14.8 MiB)  TX bytes:127357293 (121.4 MiB)
>>> >
>>> > root@buffy2-1:~# less /var/log/babeld.log
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > send: Cannot assign requested address
>>> > send: Cannot assign requested address
>>> > send: Cannot assign requested address
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > netlink_read: recvmsg(): No buffer space available
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>> > Couldn't determine channel of interface sw00: Invalid argument.
>>>
>>> This is a problem in babel detecting the channel on a "normal"
>>> rather than a mesh interface. It's bugged me a long while, but
>>> haven't got around to finding what triggers it. Might "fix" it by
>>> acquiring the channel at babel start time from /etc/config/wireless.
>>>
>>> It messes up the diversity routing calculation, grump.
>>>
>>> There is a possibility a logfile got really big, but this one
>>> generally doesn't, but I should turn off logging in some
>>> future release...
>>
>>
>> I believe I've tracked down part of what's going on. It looks like my
>> tmpfs is filling up 100% and then the device enters a bad state:
>>
>> After 24 hours, with tmpfs at 50%, babeld.log is the largest file by far
>> in tmpfs and the only file that appears to be growing (based on `du`). It
>> takes about 48 hours from reboot to fill up tmpfs on my device.
>>
>> # sort babeld.log | uniq -c |sort -rn |head
>>
>> 503236 Couldn't determine channel of interface sw00: Invalid argument.
>>
>> 1376 netlink_read: recvmsg(): No buffer space available
>>
>> 3 send: Cannot assign requested address
>>
>> # wc -l babeld.log
>>
>> 504617 babeld.log
>>
>> I sped up system failure by using `dd` to fill up tmpfs and the system
>> became immediately unusable.
>>
>> This also explains the luci session store errors as sessions are stored in
>> tmpfs.
>>
>> The other buffer issues may or may not be related to this.
>>
>> Best,
>> Steve
>
>

-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html