[Cerowrt-devel] cerowrt issues (3.10.24-8)
Dave Taht
dave.taht at gmail.com
Wed Jan 29 10:21:42 PST 2014
On Wed, Jan 29, 2014 at 9:44 AM, Sebastian Moeller <moeller0 at gmx.de> wrote:
> On January 29, 2014 5:10:18 PM CET, Dave Taht <dave.taht at gmail.com> wrote:
>>On Wed, Jan 29, 2014 at 4:45 AM, Sebastian Moeller <moeller0 at gmx.de>
>>wrote:
>>> Hi Dave,
>>>
>>> quick question, how does one turn off logging for babeld? It seems
>>that if daemonized it defaults to logging to /var/log/babeld.log (or
>>similar). Is setting the log file to /dev/null really the answer?
>>
>>seems so.
>
> Okay, I guess I will try that then...
>
>
>>
>>> (Since I have not yet resolved the IPv6 issue, I assume
>>babeld is unhappy) I resorted to stopping babeld completely, but that
>>feels like a crutch...
>>
>>no daemon in an embedded system should ever write to flash in an
>>uncontrollable manner.
>
> What bugs me is that it basically keeps repeating the same error over and over again. If it rate-limited and/or pushed messages to the system log, that would be nicer.
>
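A rough illustration of the rate limiting asked for above, written in Python rather than babeld's C and not taken from babeld itself. The class name `RepeatLimiter`, its `feed` method, and the `max_repeats` threshold are all made up for this sketch. It passes the first few copies of a repeated message, drops the rest, and emits a one-line summary once a different message arrives.

```python
class RepeatLimiter:
    """Suppress runs of identical log messages (hypothetical helper)."""

    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats  # copies of a message allowed through
        self.last = None                # most recent message seen
        self.count = 0                  # length of the current run

    def feed(self, message):
        """Return the lines that should actually be written for this message."""
        if message == self.last:
            self.count += 1
            # Still under the threshold: write it. Otherwise stay silent.
            return [message] if self.count <= self.max_repeats else []

        out = []
        suppressed = self.count - self.max_repeats
        if self.last is not None and suppressed > 0:
            # Summarise what was dropped before moving on to the new message.
            out.append(f"last message repeated {suppressed} more time(s)")
        self.last = message
        self.count = 1
        out.append(message)
        return out
```

Fed half a million copies of the "Couldn't determine channel" line, a limiter like this would write three of them plus one summary, instead of filling tmpfs.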
>
>>
>>I will also argue that not being able to find the channel is a bug
>>that messes with diversity routing
>>in particular.
>
> I have actually not yet understood what it wants to tell me ;). Since I have your attention, is there an easy way to run a babel client under Mac OS X?

For coping with the Mac I use MacPorts to get a compiler and support
for open source software.
I haven't ever tried to run babeld on the Mac I have, so I will put it on
my list...
>
>
> Best Regards
> Sebastian
>
>
>>
>>>
>>> Best
>>> Sebastian
>>>
>>>
>>>
>>> On Jan 27, 2014, at 22:14 , Dave Taht <dave.taht at gmail.com> wrote:
>>>
>>>> certainly turn off the babeld log! I will leave it off in the next
>>release.
>>>>
>>>> On Mon, Jan 27, 2014 at 4:10 PM, Steve Jenson <stevej at fruitless.org>
>>wrote:
>>>>> Looking more, the buffer errors are showing up in syslog well
>>before tmpfs
>>>>> fills up. Is the memtester openwrt package available for cerowrt? I
>>don't
>>>>> see it under `Available packages`.
>>>>>
>>>>> Thanks,
>>>>> Steve
>>>>>
>>>>>
>>>>> On Mon, Jan 27, 2014 at 1:06 PM, Steve Jenson
>><stevej at fruitless.org> wrote:
>>>>>>
>>>>>> On Fri, Jan 24, 2014 at 3:23 PM, Dave Taht <dave.taht at gmail.com>
>>wrote:
>>>>>>>
>>>>>>> On Fri, Jan 24, 2014 at 6:08 PM, Steve Jenson
>><stevej at fruitless.org>
>>>>>>> wrote:
>>>>>>>> Hi everybody,
>>>>>>>>
>>>>>>>> I've been using cerowrt as a secondary wifi network (just a
>>single AP
>>>>>>>> for
>>>>>>>> now) for a few weeks now. Recently, my wndr3800 got stuck in a
>>bad
>>>>>>>> state and
>>>>>>>> eventually rebooted. I've had this happen a few times now and am
>>>>>>>> looking for
>>>>>>>> ways to debug the issue. I'm new to cerowrt and openwrt so any
>>advice
>>>>>>>> is
>>>>>>>> appreciated.
>>>>>>>>
>>>>>>>> Since I use it as a secondary network, this is in no way critical.
>>>>>>>
>>>>>>> Yea! I appreciate caution before putting alpha software on your
>>gw.
>>>>>>>
>>>>>>>> I'm not
>>>>>>>> looking for free tech support but I couldn't find anything on
>>the wiki
>>>>>>>> about
>>>>>>>> troubleshooting. I'd love to start a page and write some shell
>>scripts
>>>>>>>> to
>>>>>>>> diagnose and report issues. I know that a cerowrt router is
>>meant to be
>>>>>>>> a
>>>>>>>> research project rather than a consumer device but these things seem
>>helpful
>>>>>>>> regardless.
>>>>>>>
>>>>>>> Sure, let me know your wiki account. I have been lax about
>>granting
>>>>>>> access of late as the signup process is overrun by spammers.
>>>>>>
>>>>>>
>>>>>> My username is stevej on the wiki. Thanks!
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Please let me know if you'd prefer I not email the list with
>>these
>>>>>>>> issues or
>>>>>>>> if you'd rather I used trac or a different forum.
>>>>>>>>
>>>>>>>
>>>>>>> The list is where most stuff happens. Also in the irc channel.
>>>>>>>
>>>>>>> If it gets to where it needs to be tracked we have a bugtracker
>>at
>>>>>>>
>>>>>>> http://www.bufferbloat.net/projects/cerowrt/issues
>>>>>>>
>>>>>>> The first question I have is: Are you on comcast? Cerowrt
>>>>>>> had a dhcpv6-pd implementation that "just worked" from February
>>through
>>>>>>> December. Regrettably they changed the RA announcement interval
>>>>>>> to a really low number around then... and this triggers a
>>firewall reload
>>>>>>> every minute on everything prior to the release I point to below.
>>>>>>>
>>>>>>> If there is a memory leak somewhere that would have triggered it.
>>>>>>
>>>>>>
>>>>>> I am on AT&T ADSL2+ with a Motorola NVG510 modem.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> In this state, I can connect to the cerowrt base station via
>>wifi but
>>>>>>>> am
>>>>>>>> unable to route packets to the internet. I can connect to :81
>>and see
>>>>>>>> the
>>>>>>>> login page but logging in results in a lua error at
>>`/cgi-bin/luci`
>>>>>>>>
>>>>>>>>
>>>>>>>> /usr/lib/lua/luci/dispatcher.lua:448: Failed to execute
>>function
>>>>>>>> dispatcher target for entry '/'.
>>>>>>>> The called action terminated with an exception:
>>>>>>>> /usr/lib/lua/luci/sauth.lua:87: Session data invalid!
>>>>>>>> stack traceback:
>>>>>>>> [C]: in function 'assert'
>>>>>>>> /usr/lib/lua/luci/dispatcher.lua:448: in function 'dispatch'
>>>>>>>> /usr/lib/lua/luci/dispatcher.lua:195: in function
>>>>>>>> </usr/lib/lua/luci/dispatcher.lua:194>
>>>>>>>>
>>>>>>>> I can ssh into the device and cat various log files until the
>>router
>>>>>>>> hangs
>>>>>>>> and reboots. here's a few relevant lines from my terminal
>>history
>>>>>>>> before the
>>>>>>>> device rebooted (I'm assuming a watchdog kicked in and rebooted
>>it).
>>>>>>>>
>>>>>>>> root at buffy2-1:~# ping google.com
>>>>>>>> ping: bad address 'google.com'
>>>>>>>> root at buffy2-1:~# free
>>>>>>>>              total       used       free     shared    buffers
>>>>>>>> Mem:        126336     110332      16004          0       5616
>>>>>>>> -/+ buffers:           104716      21620
>>>>>>>> Swap:            0          0          0
>>>>>>>> root at buffy2-1:~# uptime
>>>>>>>> 02:08:54 up 2 days, 1:26, load average: 0.10, 0.21, 0.17
>>>>>>>> root at buffy2-1:~# dmesg
>>>>>>>> [ 0.000000] Linux version 3.10.24 (cero2 at snapon) (gcc version
>>4.6.4
>>>>>>>> (OpenWrt/Linaro GCC 4.6-2013.05 r38226) ) #1 Tue Dec 24
>>>>>>>> 10:50:15 PST 2013
>>>>>>>> [skipping some lines]
>>>>>>>>
>>>>>>>> [ 13.156250] Error: Driver 'gpio-keys-polled' is already
>>registered,
>>>>>>>> aborting...
>>>>>>>> [ 19.414062] IPv6: ADDRCONF(NETDEV_UP): ge00: link is not
>>ready
>>>>>>>> [ 19.421875] ar71xx: pll_reg 0xb8050010: 0x11110000
>>>>>>>> [ 19.429687] se00: link up (1000Mbps/Full duplex)
>>>>>>>> [ 22.140625] IPv6: ADDRCONF(NETDEV_UP): sw00: link is not
>>ready
>>>>>>>> [ 23.351562] IPv6: ADDRCONF(NETDEV_CHANGE): sw00: link becomes
>>ready
>>>>>>>> [ 23.757812] ar71xx: pll_reg 0xb8050014: 0x11110000
>>>>>>>> [ 23.757812] ge00: link up (1000Mbps/Full duplex)
>>>>>>>> [ 23.773437] IPv6: ADDRCONF(NETDEV_CHANGE): ge00: link becomes
>>ready
>>>>>>>>
>>>>>>>> root at buffy2-1:~# ifconfig
>>>>>>>> ge00 Link encap:Ethernet HWaddr 2C:B0:5D:A0:C5:B1
>>>>>>>> inet addr:192.168.1.138 Bcast:192.168.1.255
>>>>>>>> Mask:255.255.255.0
>>>>>>>> inet6 addr: fe80::2eb0:5dff:fea0:c5b1/64 Scope:Link
>>>>>>>> inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64
>>>>>>>> Scope:Global
>>>>>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>>>>>>> RX packets:1469670 errors:0 dropped:8 overruns:0
>>frame:0
>>>>>>>> TX packets:547733 errors:0 dropped:0 overruns:0
>>carrier:0
>>>>>>>> collisions:0 txqueuelen:1000
>>>>>>>> RX bytes:229243410 (218.6 MiB) TX bytes:57304808 (54.6
>>MiB)
>>>>>>>> Interrupt:5
>>>>>>>>
>>>>>>>> lo Link encap:Local Loopback
>>>>>>>> inet addr:127.0.0.1 Mask:255.0.0.0
>>>>>>>> inet6 addr: ::1/128 Scope:Host
>>>>>>>> UP LOOPBACK RUNNING MTU:65536 Metric:1
>>>>>>>> RX packets:23689 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>> TX packets:23689 errors:0 dropped:0 overruns:0
>>carrier:0
>>>>>>>> collisions:0 txqueuelen:0
>>>>>>>> RX bytes:2612713 (2.4 MiB) TX bytes:2612713 (2.4 MiB)
>>>>>>>>
>>>>>>>> pimreg Link encap:UNSPEC HWaddr
>>>>>>>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>>>>>>> UP RUNNING NOARP MTU:1472 Metric:1
>>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>> collisions:0 txqueuelen:0
>>>>>>>> RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
>>>>>>>>
>>>>>>>> se00 Link encap:Ethernet HWaddr 2E:B0:5D:A0:C5:B0
>>>>>>>> inet addr:172.30.42.1 Bcast:172.30.42.31
>>>>>>>> Mask:255.255.255.224
>>>>>>>> inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64
>>>>>>>> Scope:Global
>>>>>>>
>>>>>>> How are you assigning your ipv6 addresses?
>>>>>>
>>>>>>
>>>>>> It's been a while since I messed with this but I think IPv6 is
>>assigned
>>>>>> thanks to 6relayd? My modem has IPv6 enabled but no DHCPv6 options
>>that I
>>>>>> can find. Here's how cerowrt is configured.
>>>>>>
>>>>>> root at buffy2-1:/overlay/etc/config# cat 6relayd
>>>>>> config server 'default'
>>>>>> option fallback_relay 'rd dhcpv6 ndp'
>>>>>> list network 'ge00'
>>>>>> list network 'ge01'
>>>>>> list network 'gw00'
>>>>>> list network 'gw01'
>>>>>> list network 'gw10'
>>>>>> list network 'gw11'
>>>>>> list network 'se00'
>>>>>> list network 'sw00'
>>>>>> list network 'sw10'
>>>>>> option rd 'relay'
>>>>>> option dhcpv6 'relay'
>>>>>> option ndp 'relay'
>>>>>> option master 'ge00'
>>>>>>
>>>>>>>
>>>>>>>> inet6 addr: fe80::2cb0:5dff:fea0:c5b0/64 Scope:Link
>>>>>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>> TX packets:191740 errors:0 dropped:0 overruns:0
>>carrier:0
>>>>>>>> collisions:0 txqueuelen:1000
>>>>>>>> RX bytes:0 (0.0 B) TX bytes:42184988 (40.2 MiB)
>>>>>>>> Interrupt:4
>>>>>>>>
>>>>>>>> sw00 Link encap:Ethernet HWaddr 2C:B0:5D:A0:C5:B0
>>>>>>>> inet addr:172.30.42.65 Bcast:172.30.42.95
>>>>>>>> Mask:255.255.255.224
>>>>>>>> inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64
>>>>>>>> Scope:Global
>>>>>>>> inet6 addr: fe80::2eb0:5dff:fea0:c5b0/64 Scope:Link
>>>>>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>>>>>>> RX packets:70239 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>> TX packets:286967 errors:0 dropped:0 overruns:0
>>carrier:0
>>>>>>>> collisions:0 txqueuelen:1000
>>>>>>>> RX bytes:15590189 (14.8 MiB) TX bytes:127357293 (121.4
>>MiB)
>>>>>>>>
>>>>>>>> root at buffy2-1:~# less /var/log/babeld.log
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> send: Cannot assign requested address
>>>>>>>> send: Cannot assign requested address
>>>>>>>> send: Cannot assign requested address
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> netlink_read: recvmsg(): No buffer space available
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>> Couldn't determine channel of interface sw00: Invalid argument.
>>>>>>>
>>>>>>> This is a problem in babel detecting the channel on a "normal"
>>>>>>> rather than a mesh interface. It's bugged me a long while, but I
>>>>>>> haven't got around to finding what triggers it. Might "fix" it by
>>>>>>> acquiring the channel at babel start time from
>>/etc/config/wireless.
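For what it's worth, reading the channel out of /etc/config/wireless is not much code. The following is only a sketch in Python (babeld itself is C, and this is not how babeld does anything today). It assumes the usual UCI layout of `config wifi-device '<name>'` sections containing an `option channel` line; the function name `radio_channels` is invented here, and mapping an interface such as sw00 back to its radio is left out.

```python
import re

# UCI lines look like:  config wifi-device 'radio0'  /  option channel '11'
SECTION_RE = re.compile(r"^\s*config\s+(\S+)(?:\s+(\S+))?")
OPTION_RE = re.compile(r"^\s*option\s+(\S+)\s+(.+?)\s*$")


def strip_quotes(value):
    """Remove surrounding whitespace and single or double quotes."""
    return value.strip().strip("'\"")


def radio_channels(text):
    """Map each wifi-device section name to its configured channel string."""
    channels = {}
    section_type = section_name = None
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            # A new section starts; remember its type and (optional) name.
            section_type = strip_quotes(match.group(1))
            section_name = strip_quotes(match.group(2) or "")
            continue
        match = OPTION_RE.match(line)
        if (match and section_type == "wifi-device"
                and strip_quotes(match.group(1)) == "channel"):
            # Keep the value as a string, since it need not be numeric.
            channels[section_name] = strip_quotes(match.group(2))
    return channels
```

Called on the contents of the wireless config, this returns something like a dict from radio name to channel, which a daemon could consult once at start time instead of querying the driver repeatedly.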
>>>>>>>
>>>>>>> It messes up the diversity routing calculation, grump.
>>>>>>>
>>>>>>> There is a possibility a logfile got really big, but this one
>>>>>>> generally doesn't. Still, I should turn off logging in some
>>>>>>> future release...
>>>>>>
>>>>>>
>>>>>> I believe I've tracked down part of what's going on. It looks like
>>my
>>>>>> tmpfs is filling up 100% and then the device enters a bad state:
>>>>>>
>>>>>> After 24 hours, with tmpfs at 50%, babeld.log is the largest file
>>by far
>>>>>> in tmpfs and the only file that appears to be growing (based on
>>`du`). It
>>>>>> takes about 48 hours from reboot to fill up tmpfs on my device.
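The check described above (how full is tmpfs, and which file is eating it) could be scripted roughly as follows. This is a sketch only: the helper names `usage_percent` and `largest_files` are invented, and it assumes the tmpfs in question is mounted at a path you pass in, such as /tmp on a typical OpenWrt-style system.

```python
import os
import shutil


def usage_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def largest_files(root, limit=5):
    """Return up to `limit` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(sizes, reverse=True)[:limit]
```

Run periodically against the tmpfs mount point, the two together would show both the fill level and whether babeld.log is still the file that is growing.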
>>>>>>
>>>>>> # sort babeld.log | uniq -c |sort -rn |head
>>>>>>
>>>>>> 503236 Couldn't determine channel of interface sw00: Invalid
>>argument.
>>>>>>
>>>>>> 1376 netlink_read: recvmsg(): No buffer space available
>>>>>>
>>>>>> 3 send: Cannot assign requested address
>>>>>>
>>>>>> # wc -l babeld.log
>>>>>>
>>>>>> 504617 babeld.log
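The `sort | uniq -c | sort -rn | head` tally above has a direct Python counterpart, sketched here in case it is useful on a box without the full set of shell tools. The function name `tally_lines` is made up for the example, and lines with equal counts may come out in a different order than the shell pipeline gives.

```python
from collections import Counter


def tally_lines(lines, limit=10):
    """Count identical lines and return the `limit` most common as (line, count)."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return counts.most_common(limit)
```

Usage would be along the lines of opening babeld.log and passing the file object to `tally_lines`, then printing each count next to its line.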
>>>>>>
>>>>>> I sped up system failure by using `dd` to fill up tmpfs and the
>>system
>>>>>> became immediately unusable.
>>>>>>
>>>>>> This also explains the luci session store errors as sessions are
>>stored in
>>>>>> tmpfs.
>>>>>>
>>>>>> The other buffer issues may or may not be related to this.
>>>>>>
>>>>>> Best,
>>>>>> Steve
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Dave Täht
>>>>
>>>> Fixing bufferbloat with cerowrt:
>>http://www.teklibre.com/cerowrt/subscribe.html
>>>> _______________________________________________
>>>> Cerowrt-devel mailing list
>>>> Cerowrt-devel at lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>
>
> Hi Dave,
> --
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html