[Cerowrt-devel] 3.3.6-2
Sebastian Moeller
moeller0 at gmx.de
Wed Jun 6 19:03:15 EDT 2012
Hi Robert,
On Jun 3, 2012, at 3:24 PM, Robert Bradley wrote:
> On 02/06/12 08:03, Sebastian Moeller wrote:
>> From my totally unscientific testing I am quite convinced that even 16MB of /tmp used will make the router spiral into reboot if used over the 5GHz radio to the wan port. However, if I use one of the wired ports I get plenty of the following (not always hostapd):
>>
>>
>> Jun 1 23:41:08 nacktmulle kern.warn kernel: [185428.417968] hostapd: page allocation failure: order:0, mode:0x4020
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] Call Trace:
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<802850a4>] dump_stack+0x8/0x34
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800b4548>] warn_alloc_failed+0xe8/0x10c
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800b684c>] __alloc_pages_nodemask+0x5a0/0x600
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800da070>] new_slab+0xa8/0x280
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<80286b18>] __slab_alloc.isra.60.constprop.63+0x25c/0x2fc
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800dba48>] __kmalloc_track_caller+0x88/0x140
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e0854>] __alloc_skb+0x80/0x140
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e0930>] dev_alloc_skb+0x1c/0x48
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801d0c74>] ag71xx_poll+0x430/0x65c
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e8c10>] net_rx_action+0x88/0x1c8
>> Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] hostapd: page allocation failure: order:0, mode:0x4020
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Call Trace:
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<802850a4>] dump_stack+0x8/0x34
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800b4548>] warn_alloc_failed+0xe8/0x10c
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800b684c>] __alloc_pages_nodemask+0x5a0/0x600
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800da070>] new_slab+0xa8/0x280
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<80286b18>] __slab_alloc.isra.60.constprop.63+0x25c/0x2fc
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800dba48>] __kmalloc_track_caller+0x88/0x140
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801e0854>] __alloc_skb+0x80/0x140
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801e0930>] dev_alloc_skb+0x1c/0x48
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801d0c74>] ag71xx_poll+0x430/0x65c
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Mem-Info:
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal per-cpu:
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] CPU 0: hi: 18, btch: 3 usd: 18
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] active_anon:3826 inactive_anon:63 isolated_anon:0
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] active_file:683 inactive_file:561 isolated_file:0
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] unevictable:0 dirty:0 writeback:0 unstable:0
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] free:96 slab_reclaimable:408 slab_unreclaimable:7706
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] mapped:501 shmem:109 pagetables:142 bounce:0
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal free:384kB min:1016kB low:1268kB high:1524kB active_anon:15304kB inactive_anon:252kB active_file:2732kB inactive_file:2244kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:65024kB mlocked:0k
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] lowmem_reserve[]: 0 0
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal: 42*4kB 15*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 384kB
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1353 total pagecache pages
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 0 pages in swap cache
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Swap cache stats: add 0, delete 0, find 0/0
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Free swap = 0kB
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Total swap = 0kB
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 16384 pages RAM
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 965 pages reserved
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1399 pages shared
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 14306 pages non-shared
>> Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
>> Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] cache: kmalloc-2048, object size: 2048, buffer size: 2048, default order: 2, min order: 0
>> Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] node 0: slabs: 0, objs: 0, free: 0
>>
>> But the box seems to survive this… Heck, it even survives my test case with 16000 KB of /tmp in use. Under that amount of memory pressure named and ntpd get killed, but the router does not go into an automatic reboot; it just stays up and running, albeit somewhat useless without named.
>>
>
> Yes - that stack trace is because the ag71xx driver can't allocate the memory for a skb structure. Unlike the wireless driver though, the ag71xx_poll function simply returns immediately with ENOMEM. I had no real success in tracing what the equivalent is in ath9k.
>
> I noticed a possible issue in ath9k_rx_tasklet, since if bf->bf_mpdu=NULL (bf being an Atheros-specific buffer type) you could potentially get an infinite loop. I can't see though if that can ever occur in reality. I *think* it uses a list of skb structures preallocated at init-time for incoming frames, but I'm still trying to interpret that part of the code. (The exact behaviour is hardware-dependent.)
I see, that is out of my league then (I can only read C badly, so I will not necessarily recognize a bug when I look at it); unless I can run some (simple) tests I do not see how I can actually help fix this… (That said, I will try to get the proper kernel sources and start digging through the ath9k driver, if only to learn something new.)
Also, I will try to repeat my simplistic tests with some swap space hooked up to see whether that makes the issue go away. (In my view it is quite acceptable to require swap for a fully tricked-out router distribution like cerowrt.)
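As a sanity check on the Mem-Info dump quoted above, here is a minimal Python sketch that redoes the arithmetic from the log. It assumes 4 kB pages (inferred from "free:96" pages matching "Normal free:384kB"); the function and variable names are my own and not from any tool.

```python
import re

# Assumption: 4 kB pages, inferred from free:96 pages == "Normal free:384kB".
PAGE_KB = 4

# Buddy-allocator free-list line copied from the quoted log.
BUDDY_LINE = ("Normal: 42*4kB 15*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB "
              "0*512kB 0*1024kB 0*2048kB 0*4096kB = 384kB")


def buddy_free_kb(line):
    """Sum the count*size terms of a buddy free-list line, in kB."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count from the dump into kB."""
    return pages * page_kb


free_kb = buddy_free_kb(BUDDY_LINE)          # from the free lists
free_pages_kb = pages_to_kb(96)              # from "free:96"
total_ram_kb = pages_to_kb(16384)            # from "16384 pages RAM"
slab_unreclaimable_kb = pages_to_kb(7706)    # from "slab_unreclaimable:7706"
min_watermark_kb = 1016                      # from "min:1016kB"

print("free (buddy lists):", free_kb, "kB")
print("free (page count): ", free_pages_kb, "kB")
print("total RAM:         ", total_ram_kb, "kB")
print("unreclaimable slab:", slab_unreclaimable_kb, "kB")
print("below min watermark:", free_kb < min_watermark_kb)
```

Both ways of counting give 384 kB free, well under the 1016 kB min watermark, on a box with 64 MB of RAM in total. Unreclaimable slab comes to roughly 30 MB, which would fit the idea of memory being held inside the kernel rather than by user-space processes, but on its own it does not prove a leak.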
>
>> The way I interpret my latest test results is that the "assumed leak" should be restricted to the wireless driver; does that sound right to you? Also, with cerowrt 3.3.6-2 even 16MB seems too much for /tmp. I will see what happens if I add some swap space to the router; I hope it will be quite happy with 31MB /tmp and actual usage of that space :). Since Dave only recommends full tftp reflashes, maybe the update scenario is not such a big issue for cerowrt?
>>
>
> I'll leave that to Dave to say - I was assuming that the firmware would be stored in memory first and then flashed. (There's always tftp at boot time as an alternative flashing method.)
Well, maybe the next kernel base for cerowrt will be more forgiving :)
> --
> Robert Bradley