[Cerowrt-devel] 3.3.6-2

Sebastian Moeller moeller0 at gmx.de
Sat Jun 2 03:03:36 EDT 2012


Hi Robert,

It took me some time to get a bit further with more testing...

On May 25, 2012, at 3:38 PM, Robert Bradley wrote:

> On 25/05/12 19:25, Sebastian Moeller wrote:
>> Hi Robert,
>> 
>> 
>> On May 25, 2012, at 4:11 AM, Robert Bradley wrote:
>> 
>>> That said, unless we can
>>> find an obvious reason for /tmp overfilling, I'm not sure we should do
>>> that, since it will cause problems upgrading.
>> 	But if I create a file of 30000 1KB blocks in /tmp (so that around 400 KB stay available), the router goes into OOM, so I do not think upgrading would work well if it really needs that much memory. I have a hunch that the OpenWrt base under cerowrt does not expect something as big and demanding as the 11MB bind9 named process to be running :)
> The flash memory size is about 16MB for the WNDR3700, so it's probably ok for normal use.  It's less certain with BIND and everything else running, although it'd be possible to restart the router, stop BIND and then update.

	From my totally unscientific testing I am quite convinced that even 16MB of /tmp in use will make the router spiral into reboot when traffic goes over the 5GHz radio to the WAN port. However, if I use one of the wired ports instead, I get plenty of the following (not always from hostapd):


Jun  1 23:41:08 nacktmulle kern.warn kernel: [185428.417968] hostapd: page allocation failure: order:0, mode:0x4020
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] Call Trace:
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<802850a4>] dump_stack+0x8/0x34
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800b4548>] warn_alloc_failed+0xe8/0x10c
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800b684c>] __alloc_pages_nodemask+0x5a0/0x600
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800da070>] new_slab+0xa8/0x280
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<80286b18>] __slab_alloc.isra.60.constprop.63+0x25c/0x2fc
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800dba48>] __kmalloc_track_caller+0x88/0x140
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e0854>] __alloc_skb+0x80/0x140
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e0930>] dev_alloc_skb+0x1c/0x48
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801d0c74>] ag71xx_poll+0x430/0x65c
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e8c10>] net_rx_action+0x88/0x1c8
Jun  1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] hostapd: page allocation failure: order:0, mode:0x4020
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Call Trace:
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<802850a4>] dump_stack+0x8/0x34
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800b4548>] warn_alloc_failed+0xe8/0x10c
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800b684c>] __alloc_pages_nodemask+0x5a0/0x600
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800da070>] new_slab+0xa8/0x280
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<80286b18>] __slab_alloc.isra.60.constprop.63+0x25c/0x2fc
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800dba48>] __kmalloc_track_caller+0x88/0x140
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801e0854>] __alloc_skb+0x80/0x140
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801e0930>] dev_alloc_skb+0x1c/0x48
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801d0c74>] ag71xx_poll+0x430/0x65c
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Mem-Info:
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal per-cpu:
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] CPU    0: hi:   18, btch:   3 usd:  18
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] active_anon:3826 inactive_anon:63 isolated_anon:0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]  active_file:683 inactive_file:561 isolated_file:0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]  unevictable:0 dirty:0 writeback:0 unstable:0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]  free:96 slab_reclaimable:408 slab_unreclaimable:7706
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]  mapped:501 shmem:109 pagetables:142 bounce:0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal free:384kB min:1016kB low:1268kB high:1524kB active_anon:15304kB inactive_anon:252kB active_file:2732kB inactive_file:2244kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:65024kB mlocked:0k
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] lowmem_reserve[]: 0 0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal: 42*4kB 15*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 384kB
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1353 total pagecache pages
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 0 pages in swap cache
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Swap cache stats: add 0, delete 0, find 0/0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Free swap  = 0kB
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Total swap = 0kB
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 16384 pages RAM
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 965 pages reserved
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1399 pages shared
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 14306 pages non-shared
Jun  1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
Jun  1 23:41:09 nacktmulle kern.warn kernel: [185429.484375]   cache: kmalloc-2048, object size: 2048, buffer size: 2048, default order: 2, min order: 0
Jun  1 23:41:09 nacktmulle kern.warn kernel: [185429.484375]   node 0: slabs: 0, objs: 0, free: 0
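For anyone who wants to double-check the Mem-Info numbers above, here is a rough Python sketch that redoes the arithmetic. The helper names are made up for illustration, and the parsing only assumes the 3.3-era format seen in this log. The first block of counters is in pages (active_anon:3826 x 4kB = 15304kB, matching the zone line), so the buddy lists add up to the same 384kB the zone reports as free, which is below the 1016kB min watermark. If I read it correctly, slab_unreclaimable:7706 pages is about 30MB of the 64MB of RAM. That would fit a kernel-side leak, but it does not prove one.

```python
import re

# Page size taken from the log itself: the order-0 bucket is "42*4kB".
PAGE_KB = 4

# Lines copied (the zone line shortened) from the Mem-Info dump above.
buddy = ("Normal: 42*4kB 15*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB "
         "0*512kB 0*1024kB 0*2048kB 0*4096kB = 384kB")
zone = ("Normal free:384kB min:1016kB low:1268kB high:1524kB "
        "active_anon:15304kB inactive_anon:252kB present:65024kB")


def buddy_free_kb(line):
    """Sum the 'count*sizekB' buckets of a buddy-allocator line (text before '=')."""
    body = line.split("=")[0]
    return sum(int(n) * int(size) for n, size in re.findall(r"(\d+)\*(\d+)kB", body))


def zone_fields_kb(line):
    """Collect 'name:NNNkB' pairs from a zone line into a dict of kB values."""
    return {name: int(kb) for name, kb in re.findall(r"(\w+):(\d+)kB", line)}


fields = zone_fields_kb(zone)
print("buddy lists sum to", buddy_free_kb(buddy), "kB")            # 384
print("free below min watermark:", fields["free"] < fields["min"])  # True
print("RAM:", 16384 * PAGE_KB // 1024, "MB")                        # 64
print("free pages:", 96 * PAGE_KB, "kB")                            # 384
print("slab_unreclaimable:", 7706 * PAGE_KB, "kB")                  # 30824
```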

But the box seems to survive this… Heck, it even survives my test case with 16000 KB of /tmp in use. Under that amount of memory pressure named and ntpd get killed, but the router does not reboot automatically; it just stays up and running, albeit somewhat useless without named.
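In case someone wants to repeat the /tmp fill test, here is a small Python sketch of the same kind of test. On the router itself a plain dd if=/dev/zero of=/tmp/fill bs=1k count=16000 should do the same job, since stock cerowrt probably has no Python; the function names below are only for illustration. Keep in mind that tmpfs lives in RAM (there is no swap here), so every KB written to /tmp is a KB taken away from the rest of the system.

```python
import os
import tempfile

BLOCK = 1024  # 1 KB blocks, like the /tmp test described above


def fill(path, blocks):
    """Write `blocks` zero-filled 1 KB blocks to `path`, similar in spirit to
    dd if=/dev/zero of=PATH bs=1k count=BLOCKS. Returns the resulting file size."""
    chunk = b"\0" * BLOCK
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(chunk)
    return os.path.getsize(path)


def free_kb(directory):
    """Space still available on the filesystem holding `directory`, in KB."""
    st = os.statvfs(directory)
    return st.f_bavail * st.f_frsize // 1024


if __name__ == "__main__":
    # Harmless demo in a scratch directory; the real test would target /tmp
    # with 16000 or 30000 blocks instead.
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "fill.bin")
        print("wrote", fill(target, 100), "bytes; free:", free_kb(scratch), "KB")
        os.remove(target)  # on tmpfs, deleting the file gives the RAM back
```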



> 
>> 	Oh, I agree the /tmp issue is a tangent, but it does not seem healthy that the router spirals into reboot once /tmp fills up (BTW, if I remove my 30000KB file from /tmp while the first OOM is in progress, the router recovers). My hunch is that the almost fully instantiated tmpfs takes too much memory away from the system for it to handle its usual business.
>> 	On top of that are the wireless issues: what about a kernel memory leak caused by ath wireless that grows and grows until the problematic /tmp size is down in the single-digit MBs, which then starts the spiral into reboot?
> 
> No, definitely not healthy!  I'm thinking that maybe setting tmpfs to 20MB would be a good compromise, at least until the presumed memory leak can be tracked down.

	The way I interpret my latest test results, the "assumed leak" should be restricted to the wireless driver; does that sound right to you? Also, with cerowrt 3.3.6-2 even 16MB seems too much for /tmp. I will see what happens if I add some swap space to the router; I hope it will then be quite happy with a 31MB /tmp and actual usage of that space :). Since Dave only recommends full tftp reflashes, maybe the upgrade scenario is not such a big issue for cerowrt?
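To check whether the presumed leak really tracks wireless traffic, one option is to log a few /proc/meminfo fields over time, first while pushing traffic over 5GHz and then over a wired port. Below is a minimal Python sketch, assuming Python is available wherever it runs; otherwise a shell loop around cat /proc/meminfo does the same. The choice of fields is my own: Shmem covers tmpfs pages, and SUnreclaim is the unreclaimable slab that the kernel dump above shows as slab_unreclaimable.

```python
import time

# /proc/meminfo fields worth watching here: free RAM, tmpfs/shared memory, and slab.
FIELDS = ("MemFree", "Shmem", "Slab", "SUnreclaim")


def parse_meminfo(text):
    """Turn /proc/meminfo text into a {field: value} dict (values are in kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def sample(count=10, interval=60, path="/proc/meminfo"):
    """Print the selected fields `count` times, `interval` seconds apart."""
    for i in range(count):
        with open(path) as f:
            info = parse_meminfo(f.read())
        stamp = time.strftime("%H:%M:%S")
        print(stamp, " ".join(f"{k}={info.get(k, '?')}kB" for k in FIELDS))
        if i + 1 < count:
            time.sleep(interval)
```

If SUnreclaim keeps climbing during the wireless runs while Shmem stays flat, that would at least be consistent with a leak in the wireless path. On its own it would not pin the leak on a particular driver.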

> 
>>> I'm thinking that maybe flooding wireless->wired with UDP traffic for
>>> 5-10 minutes is the right approach, and then vice-versa (restarting
>>> the router in between?).  If there are problems like infinite retries
>>> or packet memory leaks, that might show them up quickly.
>> 	That sounds like the right way to proceed, except I am no expert at setting netperf up, so it might take a while until I get around to actually testing that hypothesis. (Do you by any chance know of a publicly available netserver process running on the internet to which I could point a local netperf, and do you have any recommendations on how to create the UDP flood with netperf?)
>> 
>> 
> 
> I don't know of any myself.  There's a possible tutorial on setting it up at http://www.tonymacx86.com/viewtopic.php?t=5700, but assuming you have it installed on two computers already, it should just be a case of running:
> 
> user@computer1$ netperf -t UDP_STREAM -H computer2
> 
> and possibly running "netserver -p 12865" on computer2 if necessary.  (It should in theory be started via inetd.)
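Until netperf is set up on both ends, a crude UDP flood can also be produced with nothing but Python's standard socket module. This is only a stand-in for netperf's UDP_STREAM test, not a replacement. The port number (50999) and the 1472-byte payload (1500-byte Ethernet MTU minus 20 bytes of IPv4 and 8 bytes of UDP header) are my own assumptions. The sender merely counts datagrams the local kernel accepted, so the receiver's count is the number that matters.

```python
import socket
import threading
import time

PAYLOAD = 1472  # 1500-byte Ethernet MTU minus 20-byte IPv4 and 8-byte UDP headers


def udp_flood(host, port, seconds, size=PAYLOAD):
    """Send fixed-size datagrams to host:port as fast as possible for `seconds`.
    Returns how many sendto() calls succeeded (queued locally, not necessarily delivered)."""
    payload = b"\0" * size
    sent = 0
    deadline = time.monotonic() + seconds
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        while time.monotonic() < deadline:
            try:
                s.sendto(payload, (host, port))
                sent += 1
            except OSError:
                pass  # e.g. ENOBUFS when the local queue is full; keep going
    return sent


def udp_sink(port, seconds):
    """Count datagrams arriving on `port` for about `seconds`."""
    received = 0
    deadline = time.monotonic() + seconds
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("", port))
        s.settimeout(0.5)
        while time.monotonic() < deadline:
            try:
                s.recvfrom(65535)
                received += 1
            except socket.timeout:
                pass
    return received


if __name__ == "__main__":
    # Local self-test over loopback with the hypothetical port 50999.
    counts = {}
    sink = threading.Thread(target=lambda: counts.update(rx=udp_sink(50999, 1.5)))
    sink.start()
    time.sleep(0.2)  # give the sink time to bind
    tx = udp_flood("127.0.0.1", 50999, 0.5)
    sink.join()
    print("sent", tx, "received", counts["rx"])
```

For the 5-10 minute runs suggested above, that would mean something like udp_sink(50999, 660) on the wired machine and udp_flood("<wired machine's address>", 50999, 600) on the wireless one, then swapping roles after restarting the router.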


	I am still trying to get a second machine on my network so I can test the UDP hypothesis, but that will take a while longer…

Best
	Sebastian


