From: Sebastian Moeller
Date: Wed, 29 Jan 2014 18:44:52 +0100
To: Dave Taht
Cc: "cerowrt-devel@lists.bufferbloat.net"
Subject: Re: [Cerowrt-devel] cerowrt issues (3.10.24-8)
List-Id: Development issues regarding the cerowrt test router project

On January 29, 2014 5:10:18 PM CET, Dave Taht wrote:
>On Wed, Jan 29, 2014 at 4:45 AM, Sebastian Moeller wrote:
>> Hi Dave,
>>
>> quick question, how does one turn off logging for babeld? It seems
>> that if daemonized it defaults to logging to /var/log/babeld.log (or
>> similar). Is setting the log file to /dev/null really the answer?
>
>seems so.

Okay, I guess I will try that then...

>
>> (Since I have not yet resolved the IPv6 issue, I assume babeld is
>> unhappy.) I resorted to stopping babeld completely, but that feels
>> like a crutch...
>
>no daemon in an embedded system should ever write to flash in an
>uncontrollable manner.

What bugs me is that it basically keeps repeating the same error over and over again. If it rate-limited and/or pushed messages to the system log, it would be nicer.

>
>I will also argue that not being able to find the channel is a bug
>that messes with diversity routing in particular.

I have actually not yet understood what it wants to tell me ;). Since I have your attention, is there an easy way to run a babel client under Mac OS X?

Best Regards
Sebastian

>
>>
>> Best
>> Sebastian
>>
>>
>>
>> On Jan 27, 2014, at 22:14 , Dave Taht wrote:
>>
>>> certainly turn off the babeld log! I will leave it off in the next release.
>>>
>>> On Mon, Jan 27, 2014 at 4:10 PM, Steve Jenson wrote:
>>>> Looking more, the buffer errors are showing up in syslog well before tmpfs
>>>> fills up. Is the memtester openwrt package available for cerowrt? I don't
>>>> see it under `Available packages`.
>>>>
>>>> Thanks,
>>>> Steve
>>>>
>>>>
>>>> On Mon, Jan 27, 2014 at 1:06 PM, Steve Jenson wrote:
>>>>>
>>>>> On Fri, Jan 24, 2014 at 3:23 PM, Dave Taht wrote:
>>>>>>
>>>>>> On Fri, Jan 24, 2014 at 6:08 PM, Steve Jenson wrote:
>>>>>>> Hi everybody,
>>>>>>>
>>>>>>> I've been using cerowrt as a secondary wifi network (just a single AP
>>>>>>> for now) for a few weeks now. Recently, my wndr3800 got stuck in a bad
>>>>>>> state and eventually rebooted. I've had this happen a few times now and
>>>>>>> am looking for ways to debug the issue. I'm new to cerowrt and openwrt
>>>>>>> so any advice is appreciated.
>>>>>>>
>>>>>>> Since I use it as a secondary network, this is in no way critical.
>>>>>>
>>>>>> Yea! I appreciate caution before putting alpha software on your gw.
>>>>>>
>>>>>>> I'm not looking for free tech support but I couldn't find anything on
>>>>>>> the wiki about troubleshooting. I'd love to start a page and write some
>>>>>>> shell scripts to diagnose and report issues. I know that a cerowrt
>>>>>>> router is meant to be a research project rather than a consumer device
>>>>>>> but these things seem helpful regardless.
>>>>>>
>>>>>> Sure, let me know your wiki account. I have been lax about granting
>>>>>> access of late as the signup process is overrun by spammers.
>>>>>
>>>>>
>>>>> My username is stevej on the wiki. Thanks!
>>>>>
>>>>>
>>>>>>
>>>>>>> Please let me know if you'd prefer I not email the list with these
>>>>>>> issues or if you'd rather I used trac or a different forum.
>>>>>>>
>>>>>>
>>>>>> The list is where most stuff happens. Also in the irc channel.
>>>>>>
>>>>>> If it gets to where it needs to be tracked we have a bugtracker at
>>>>>>
>>>>>> http://www.bufferbloat.net/projects/cerowrt/issues
>>>>>>
>>>>>> The first question I have is: Are you on comcast? Cerowrt
>>>>>> had a dhcpv6-pd implementation that "just worked" from February through
>>>>>> December. Regrettably they changed the RA announcement interval
>>>>>> to a really low number around then... and this triggers a firewall
>>>>>> reload every minute on everything prior to the release I point to below.
>>>>>>
>>>>>> If there is a memory leak somewhere that would have triggered it.
>>>>>
>>>>>
>>>>> I am on AT&T ADSL2+ with a Motorola NVG510 modem.
>>>>>
>>>>>
>>>>>>
>>>>>>> In this state, I can connect to the cerowrt base station via wifi but
>>>>>>> am unable to route packets to the internet. I can connect to :81 and
>>>>>>> see the login page but logging in results in a lua error at
>>>>>>> `/cgi-bin/luci`
>>>>>>>
>>>>>>>
>>>>>>> /usr/lib/lua/luci/dispatcher.lua:448: Failed to execute function
>>>>>>> dispatcher target for entry '/'.
>>>>>>> The called action terminated with an exception:
>>>>>>> /usr/lib/lua/luci/sauth.lua:87: Session data invalid!
>>>>>>> stack traceback:
>>>>>>> [C]: in function 'assert'
>>>>>>> /usr/lib/lua/luci/dispatcher.lua:448: in function 'dispatch'
>>>>>>> /usr/lib/lua/luci/dispatcher.lua:195: in function
>>>>>>>
>>>>>>>
>>>>>>> I can ssh into the device and cat various log files until the router
>>>>>>> hangs and reboots. here's a few relevant lines from my terminal history
>>>>>>> before the device rebooted (I'm assuming a watchdog kicked in and
>>>>>>> rebooted it).
>>>>>>>
>>>>>>> root@buffy2-1:~# ping google.com
>>>>>>> ping: bad address 'google.com'
>>>>>>> root@buffy2-1:~# free
>>>>>>>              total         used         free       shared      buffers
>>>>>>> Mem:        126336       110332        16004            0         5616
>>>>>>> -/+ buffers:             104716        21620
>>>>>>> Swap:            0            0            0
>>>>>>> root@buffy2-1:~# uptime
>>>>>>>  02:08:54 up 2 days, 1:26, load average: 0.10, 0.21, 0.17
>>>>>>> root@buffy2-1:~# dmesg
>>>>>>> [    0.000000] Linux version 3.10.24 (cero2@snapon) (gcc version 4.6.4
>>>>>>> (OpenWrt/Linaro GCC 4.6-2013.05 r38226) ) #1 Tue Dec 24 10:50:15 PST 2013
>>>>>>> [skipping some lines]
>>>>>>>
>>>>>>> [   13.156250] Error: Driver 'gpio-keys-polled' is already registered, aborting...
>>>>>>> [   19.414062] IPv6: ADDRCONF(NETDEV_UP): ge00: link is not ready
>>>>>>> [   19.421875] ar71xx: pll_reg 0xb8050010: 0x11110000
>>>>>>> [   19.429687] se00: link up (1000Mbps/Full duplex)
>>>>>>> [   22.140625] IPv6: ADDRCONF(NETDEV_UP): sw00: link is not ready
>>>>>>> [   23.351562] IPv6: ADDRCONF(NETDEV_CHANGE): sw00: link becomes ready
>>>>>>> [   23.757812] ar71xx: pll_reg 0xb8050014: 0x11110000
>>>>>>> [   23.757812] ge00: link up (1000Mbps/Full duplex)
>>>>>>> [   23.773437] IPv6: ADDRCONF(NETDEV_CHANGE): ge00: link becomes ready
>>>>>>>
>>>>>>> root@buffy2-1:~# ifconfig
>>>>>>> ge00      Link encap:Ethernet  HWaddr 2C:B0:5D:A0:C5:B1
>>>>>>>           inet addr:192.168.1.138  Bcast:192.168.1.255  Mask:255.255.255.0
>>>>>>>           inet6 addr: fe80::2eb0:5dff:fea0:c5b1/64 Scope:Link
>>>>>>>           inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Global
>>>>>>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>>>>           RX packets:1469670 errors:0 dropped:8 overruns:0 frame:0
>>>>>>>           TX packets:547733 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>           collisions:0 txqueuelen:1000
>>>>>>>           RX bytes:229243410 (218.6 MiB)  TX bytes:57304808 (54.6 MiB)
>>>>>>>           Interrupt:5
>>>>>>>
>>>>>>> lo        Link encap:Local Loopback
>>>>>>>           inet addr:127.0.0.1  Mask:255.0.0.0
>>>>>>>           inet6 addr: ::1/128 Scope:Host
>>>>>>>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>>>>>>>           RX packets:23689 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>           TX packets:23689 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>           collisions:0 txqueuelen:0
>>>>>>>           RX bytes:2612713 (2.4 MiB)  TX bytes:2612713 (2.4 MiB)
>>>>>>>
>>>>>>> pimreg    Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>>>>>>           UP RUNNING NOARP  MTU:1472  Metric:1
>>>>>>>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>           collisions:0 txqueuelen:0
>>>>>>>           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>>>>>>>
>>>>>>> se00      Link encap:Ethernet  HWaddr 2E:B0:5D:A0:C5:B0
>>>>>>>           inet addr:172.30.42.1  Bcast:172.30.42.31  Mask:255.255.255.224
>>>>>>>           inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Global
>>>>>>
>>>>>> How are you assigning your ipv6 addresses?
>>>>>
>>>>>
>>>>> It's been a while since I messed with this but I think IPv6 is assigned
>>>>> thanks to 6relayd? My modem has IPv6 enabled but no DHCPv6 options that I
>>>>> can find. Here's how cerowrt is configured.
>>>>>
>>>>> root@buffy2-1:/overlay/etc/config# cat 6relayd
>>>>> config server 'default'
>>>>>         option fallback_relay 'rd dhcpv6 ndp'
>>>>>         list network 'ge00'
>>>>>         list network 'ge01'
>>>>>         list network 'gw00'
>>>>>         list network 'gw01'
>>>>>         list network 'gw10'
>>>>>         list network 'gw11'
>>>>>         list network 'se00'
>>>>>         list network 'sw00'
>>>>>         list network 'sw10'
>>>>>         option rd 'relay'
>>>>>         option dhcpv6 'relay'
>>>>>         option ndp 'relay'
>>>>>         option master 'ge00'
>>>>>
>>>>>>
>>>>>>>           inet6 addr: fe80::2cb0:5dff:fea0:c5b0/64 Scope:Link
>>>>>>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>>>>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>           TX packets:191740 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>           collisions:0 txqueuelen:1000
>>>>>>>           RX bytes:0 (0.0 B)  TX bytes:42184988 (40.2 MiB)
>>>>>>>           Interrupt:4
>>>>>>>
>>>>>>> sw00      Link encap:Ethernet  HWaddr 2C:B0:5D:A0:C5:B0
>>>>>>>           inet addr:172.30.42.65  Bcast:172.30.42.95  Mask:255.255.255.224
>>>>>>>           inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 Scope:Global
>>>>>>>           inet6 addr: fe80::2eb0:5dff:fea0:c5b0/64 Scope:Link
>>>>>>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>>>>           RX packets:70239 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>           TX packets:286967 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>           collisions:0 txqueuelen:1000
>>>>>>>           RX bytes:15590189 (14.8 MiB)  TX bytes:127357293 (121.4 MiB)
>>>>>>>
>>>>>>> root@buffy2-1:~# less /var/log/babeld.log
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> send: Cannot assign requested address
>>>>>>> send: Cannot assign requested address
>>>>>>> send: Cannot assign requested address
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> netlink_read: recvmsg(): No buffer space available
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>
>>>>>> This is a problem in babel detecting the channel on a "normal"
>>>>>> rather than a mesh interface. It's bugged me a long while, but
>>>>>> haven't got around to finding what triggers it. Might "fix" it by
>>>>>> acquiring the channel at babel start time from /etc/config/wireless.
>>>>>>
>>>>>> It messes up the diversity routing calculation, grump.
>>>>>>
>>>>>> There is a possibility a logfile got really big, but this one
>>>>>> generally doesn't, but I should turn off logging in some
>>>>>> future release...
>>>>>
>>>>>
>>>>> I believe I've tracked down part of what's going on. It looks like my
>>>>> tmpfs is filling up 100% and then the device enters a bad state:
>>>>>
>>>>> After 24 hours, with tmpfs at 50%, babeld.log is the largest file by far
>>>>> in tmpfs and the only file that appears to be growing (based on `du`). It
>>>>> takes about 48 hours from reboot to fill up tmpfs on my device.
>>>>>
>>>>> # sort babeld.log | uniq -c |sort -rn |head
>>>>>
>>>>> 503236 Couldn't determine channel of interface sw00: Invalid argument.
>>>>>
>>>>>   1376 netlink_read: recvmsg(): No buffer space available
>>>>>
>>>>>      3 send: Cannot assign requested address
>>>>>
>>>>> # wc -l babeld.log
>>>>>
>>>>> 504617 babeld.log
>>>>>
>>>>> I sped up system failure by using `dd` to fill up tmpfs and the system
>>>>> became immediately unusable.
>>>>>
>>>>> This also explains the luci session store errors as sessions are stored
>>>>> in tmpfs.
>>>>>
>>>>> The other buffer issues may or may not be related to this.
>>>>>
>>>>> Best,
>>>>> Steve
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Dave Täht
>>>
>>> Fixing bufferbloat with cerowrt:
>>> http://www.teklibre.com/cerowrt/subscribe.html
>>> _______________________________________________
>>> Cerowrt-devel mailing list
>>> Cerowrt-devel@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>> Hi Dave,

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
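
For anyone who wants to repeat Steve's tally on a copy of babeld.log off the router, here is a minimal Python sketch of roughly what `sort babeld.log | uniq -c | sort -rn | head` does. The function name `tally_log_lines` and the sample lines are made up for illustration; nothing here is part of babeld or cerowrt.

```python
from collections import Counter


def tally_log_lines(lines, top=10):
    """Count identical lines and return the most frequent ones first,
    roughly what `sort | uniq -c | sort -rn | head` does."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return counts.most_common(top)


# Hypothetical sample for illustration only, not the real log contents.
sample = [
    "Couldn't determine channel of interface sw00: Invalid argument.\n",
    "Couldn't determine channel of interface sw00: Invalid argument.\n",
    "netlink_read: recvmsg(): No buffer space available\n",
    "Couldn't determine channel of interface sw00: Invalid argument.\n",
    "send: Cannot assign requested address\n",
]

for message, count in tally_log_lines(sample):
    print(f"{count:7d} {message}")
```

To run it against a real file, pass an open file object instead of the sample list, e.g. `tally_log_lines(open("babeld.log"))`; the path is an assumption and depends on where the log was copied to.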