From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sebastian Moeller
To: Robert Bradley
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] 3.3.6-2
Date: Wed, 6 Jun 2012 16:03:15 -0700
In-Reply-To: <4FCBE421.5050400@gmail.com>
References: <00404BC8-3761-409D-A1C8-9213D7D9A3DF@gmx.de>
 <1E435715-5C95-49AF-99D0-E8AD6EAD5B44@gmx.de>
 <4FBE5767.6080704@gmail.com>
 <4D0F5C65-2401-470F-A6D8-BE18E8BA25C7@gmx.de>
 <4FBE6290.9000701@freedesktop.org>
 <0E4C11DB-2B8A-411B-A61F-34B2A6BF57B9@gmx.de>
 <4FBE7AAB.5080307@freedesktop.org>
 <4FBE84C4.80607@gmail.com>
 <61BEA217-79A6-47C8-888D-101BC0EAFB45@gmx.de>
 <844EF766-4E37-4B31-AA5D-B51FB22A05A8@gmx.de>
 <4FC009F6.7070707@gmail.com>
 <3E3324C9-CF06-4BB3-A7FB-8B2E47A44C0C@gmx.de>
 <4FCBE421.5050400@gmail.com>
List-Id: Development issues regarding the cerowrt test router project

Hi Robert,

On Jun 3, 2012, at 3:24 PM, Robert Bradley wrote:

> On 02/06/12 08:03, Sebastian Moeller wrote:
>> From my totally unscientific testing I am quite convinced that even 16MB of /tmp used will make the router spiral into reboot if used
>> over the 5GHz radio to the wan port. However, if I use one of the wired ports I get plenty of the following (not always hostapd):
>>
>>
>> Jun 1 23:41:08 nacktmulle kern.warn kernel: [185428.417968] hostapd: page allocation failure: order:0, mode:0x4020
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] Call Trace:
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<802850a4>] dump_stack+0x8/0x34
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800b4548>] warn_alloc_failed+0xe8/0x10c
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800b684c>] __alloc_pages_nodemask+0x5a0/0x600
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800da070>] new_slab+0xa8/0x280
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<80286b18>] __slab_alloc.isra.60.constprop.63+0x25c/0x2fc
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800dba48>] __kmalloc_track_caller+0x88/0x140
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e0854>] __alloc_skb+0x80/0x140
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e0930>] dev_alloc_skb+0x1c/0x48
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801d0c74>] ag71xx_poll+0x430/0x65c
>> Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e8c10>] net_rx_action+0x88/0x1c8
>> Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] hostapd: page allocation failure: order:0, mode:0x4020
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Call Trace:
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<802850a4>] dump_stack+0x8/0x34
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800b4548>] warn_alloc_failed+0xe8/0x10c
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800b684c>] __alloc_pages_nodemask+0x5a0/0x600
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]
>>   [<800da070>] new_slab+0xa8/0x280
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<80286b18>] __slab_alloc.isra.60.constprop.63+0x25c/0x2fc
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800dba48>] __kmalloc_track_caller+0x88/0x140
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801e0854>] __alloc_skb+0x80/0x140
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801e0930>] dev_alloc_skb+0x1c/0x48
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801d0c74>] ag71xx_poll+0x430/0x65c
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Mem-Info:
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal per-cpu:
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] CPU 0: hi: 18, btch: 3 usd: 18
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] active_anon:3826 inactive_anon:63 isolated_anon:0
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] active_file:683 inactive_file:561 isolated_file:0
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] unevictable:0 dirty:0 writeback:0 unstable:0
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] free:96 slab_reclaimable:408 slab_unreclaimable:7706
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] mapped:501 shmem:109 pagetables:142 bounce:0
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal free:384kB min:1016kB low:1268kB high:1524kB active_anon:15304kB inactive_anon:252kB active_file:2732kB inactive_file:2244kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:65024kB mlocked:0k
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] lowmem_reserve[]: 0 0
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal: 42*4kB 15*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
>>   0*4096kB = 384kB
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1353 total pagecache pages
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 0 pages in swap cache
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Swap cache stats: add 0, delete 0, find 0/0
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Free swap = 0kB
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Total swap = 0kB
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 16384 pages RAM
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 965 pages reserved
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1399 pages shared
>> Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 14306 pages non-shared
>> Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
>> Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] cache: kmalloc-2048, object size: 2048, buffer size: 2048, default order: 2, min order: 0
>> Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] node 0: slabs: 0, objs: 0, free: 0
>>
>> But the box seems to survive this... Heck, this even survives my test case with 16000 KB of /tmp used. Under that amount of memory pressure named and ntpd get killed, but the router does not go into an automatic reboot; it just stays up and running, albeit somewhat useless without named.
>>
>
> Yes - that stack trace is because the ag71xx driver can't allocate the memory for a skb structure. Unlike the wireless driver though, the ag71xx_poll function simply returns immediately with ENOMEM. I had no real success in tracing what the equivalent is in ath9k.
>
> I noticed a possible issue in ath9k_rx_tasklet, since if bf->bf_mpdu == NULL (bf being an Atheros-specific buffer type) you could potentially get an infinite loop.
> I can't see though if that can ever occur in reality. I *think* it uses a list of skb structures preallocated at init-time for incoming frames, but I'm still trying to interpret that part of the code. (The exact behaviour is hardware-dependent.)

I see, that is out of my league then (I read C badly, so I will not necessarily recognize a bug when I look at it); unless I can run some (simple) tests I do not see how I can actually help fix this... (That said, I will try to get the proper kernel sources and start digging through the ath9k driver, if only to learn something new.)
Also, I will try to repeat my simplistic tests with some swap space hooked up to see whether that makes the issue go away. (In my view it is quite acceptable to require swap to be present for a fully "tricked out" router distribution like cerowrt.)

>
>> The way I interpret my latest test results is that the "assumed leak" should be restricted to the wireless driver; does that sound right to you? Also, with cerowrt 3.3.6-2 even 16MB seems to be too much for /tmp. I will see what happens if I add some swap space to the router; I hope it will be quite happy with 31MB /tmp and actual usage of that space :). Since Dave only recommends full tftp reflashes, maybe the update scenario might not be such a big issue for cerowrt?
>>
>
> I'll leave that to Dave to say - I was assuming that the firmware would be stored in memory first and then flashed. (There's always tftp at boot time as an alternative flashing method.)

Well, maybe the next kernel base for cerowrt will be more forgiving :)

> -- 
> Robert Bradley
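
P.S. For anyone reading the Mem-Info dump quoted above: the "Normal:" line lists the free blocks per size, and its terms can be summed to check the reported totals. The following is only a minimal sketch in Python; the helper name free_kb is made up for illustration, the numbers are copied from the quoted log, and the 4 kB page size is an assumption inferred from "free:96" matching "free:384kB".

```python
import re

# Free-list summary copied from the "Normal:" line of the quoted log.
BUDDY_LINE = ("42*4kB 15*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB "
              "0*512kB 0*1024kB 0*2048kB 0*4096kB")


def free_kb(buddy_line):
    """Sum count*size over every 'N*SkB' term (hypothetical helper)."""
    return sum(int(count) * int(size)
               for count, size in re.findall(r"(\d+)\*(\d+)kB", buddy_line))


total_free_kb = free_kb(BUDDY_LINE)
min_watermark_kb = 1016  # "min:1016kB" from the same Mem-Info dump
page_kb = 4              # assumption: 4 kB pages (free:96 pages <-> 384kB)

print("free kB:", total_free_kb)
print("free pages:", total_free_kb // page_kb)
print("below min watermark:", total_free_kb < min_watermark_kb)
```

If I read the dump correctly, the sum agrees with the 384kB the kernel reports, and that is well below the min watermark of 1016kB, which at least fits the picture of a box that is simply out of free pages when ag71xx_poll asks for a new skb.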