From: Sebastian Moeller
To: Robert Bradley
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] 3.3.6-2
Date: Sat, 2 Jun 2012 00:03:36 -0700

Hi Robert,

it took me some time to get a bit further with more testing...
On May 25, 2012, at 3:38 PM, Robert Bradley wrote:

> On 25/05/12 19:25, Sebastian Moeller wrote:
>> Hi Robert,
>>
>>
>> On May 25, 2012, at 4:11 AM, Robert Bradley wrote:
>>
>>> That said, unless we can
>>> find an obvious reason for /tmp overfilling, I'm not sure we should do
>>> that, since it will cause problems upgrading.
>> But if I create a file of 30000 1KB blocks in /tmp (so that around
>> 400 KB stay available), the router goes into OOM, so I do not think
>> that upgrading would work well if it really needs so much memory?
>> I have a hunch that the OpenWrt base under cerowrt does not assume
>> something as big and demanding as the 11MB bind9 named process
>> running :)
> The flash memory size is about 16MB for the WNDR3700, so it's probably
> ok for normal use. It's less certain with BIND and everything else
> running, although it'd be possible to restart the router, stop BIND and
> then update.

From my totally unscientific testing I am quite convinced that even 16MB of /tmp in use will make the router spiral into reboot when traffic flows from the 5GHz radio to the WAN port.
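For anyone who wants to repeat the /tmp fill test, here is a minimal sketch in Python, assuming an interpreter is available on the test machine (on the router itself, BusyBox `dd if=/dev/zero of=/tmp/fill bs=1024 count=30000` is the more likely tool). The helper name `fill_dir` and the file name `fill.bin` are made up for illustration. It writes the requested number of 1 KB blocks of zeros and then reports how much space the filesystem still has available according to `os.statvfs`; on a tmpfs every one of those blocks lives in RAM (there is no swap on the router).

```python
import os


def fill_dir(directory: str, kib: int, name: str = "fill.bin") -> tuple[str, int]:
    """Hypothetical helper: write `kib` 1 KiB blocks of zeros to directory/name.

    On a tmpfs such as /tmp the written pages are held in memory, so this
    applies memory pressure. Returns the file path and the space still
    available to unprivileged users on that filesystem, in KiB.
    """
    if kib < 0:
        raise ValueError("kib must be non-negative")
    path = os.path.join(directory, name)
    block = b"\0" * 1024
    with open(path, "wb") as f:
        for _ in range(kib):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # hand all data to the filesystem before measuring
    st = os.statvfs(directory)
    free_kib = st.f_bavail * st.f_frsize // 1024
    return path, free_kib
```

`fill_dir("/tmp", 30000)` corresponds to the 30000 x 1KB case above, so only run it where you can delete the file again with `os.remove(path)`; removing the file while the first OOM was in progress let my router recover.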
However, if I use one of the wired ports instead, I get plenty of the following (not always from hostapd):

Jun 1 23:41:08 nacktmulle kern.warn kernel: [185428.417968] hostapd: page allocation failure: order:0, mode:0x4020
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] Call Trace:
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<802850a4>] dump_stack+0x8/0x34
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800b4548>] warn_alloc_failed+0xe8/0x10c
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800b684c>] __alloc_pages_nodemask+0x5a0/0x600
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800da070>] new_slab+0xa8/0x280
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<80286b18>] __slab_alloc.isra.60.constprop.63+0x25c/0x2fc
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800dba48>] __kmalloc_track_caller+0x88/0x140
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e0854>] __alloc_skb+0x80/0x140
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e0930>] dev_alloc_skb+0x1c/0x48
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801d0c74>] ag71xx_poll+0x430/0x65c
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e8c10>] net_rx_action+0x88/0x1c8
Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] hostapd: page allocation failure: order:0, mode:0x4020
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Call Trace:
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<802850a4>] dump_stack+0x8/0x34
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800b4548>] warn_alloc_failed+0xe8/0x10c
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800b684c>] __alloc_pages_nodemask+0x5a0/0x600
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800da070>] new_slab+0xa8/0x280
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<80286b18>] __slab_alloc.isra.60.constprop.63+0x25c/0x2fc
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800dba48>] __kmalloc_track_caller+0x88/0x140
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801e0854>] __alloc_skb+0x80/0x140
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801e0930>] dev_alloc_skb+0x1c/0x48
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801d0c74>] ag71xx_poll+0x430/0x65c
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Mem-Info:
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal per-cpu:
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] CPU 0: hi: 18, btch: 3 usd: 18
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] active_anon:3826 inactive_anon:63 isolated_anon:0
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] active_file:683 inactive_file:561 isolated_file:0
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] unevictable:0 dirty:0 writeback:0 unstable:0
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] free:96 slab_reclaimable:408 slab_unreclaimable:7706
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] mapped:501 shmem:109 pagetables:142 bounce:0
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal free:384kB min:1016kB low:1268kB high:1524kB active_anon:15304kB inactive_anon:252kB active_file:2732kB inactive_file:2244kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:65024kB mlocked:0k
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] lowmem_reserve[]: 0 0
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal: 42*4kB 15*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 384kB
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1353 total pagecache pages
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 0 pages in swap cache
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Swap cache stats: add 0, delete 0, find 0/0
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Free swap  = 0kB
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Total swap = 0kB
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 16384 pages RAM
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 965 pages reserved
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1399 pages shared
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 14306 pages non-shared
Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] cache: kmalloc-2048, object size: 2048, buffer size: 2048, default order: 2, min order: 0
Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] node 0: slabs: 0, objs: 0, free: 0

But the box seems to survive this… Heck, it even survives my test case with 16000 KB of /tmp in use. Under that amount of memory pressure named and ntpd get killed, but the router does not reboot automatically; it just stays up and running, albeit somewhat useless without named.

>
>> Oh I agree the /tmp issue is a tangent, but it does not seem healthy
>> that the router spirals into reboot once /tmp fills up (BTW if I remove
>> my 30000KB file from /tmp while the first OOM is in progress, the router
>> recovers). My hunch is that the almost fully instantiated tmpfs takes
>> too much memory from the system for it to handle its usual business.
>> On top of that are the wireless issues: what about a kernel memory
>> leak caused by ath wireless that grows and grows until the problematic
>> /tmp size is in the single-digit MBs that starts the spiral to reboot?
>
> No, definitely not healthy!
> I'm thinking that maybe setting tmpfs to 20MB would be a good
> compromise, at least until the presumed memory leak can be tracked down.

The way I interpret my latest test results is that the "assumed leak" should be restricted to the wireless driver; does that sound right to you? Also, with cerowrt 3.3.6-2 even 16MB seems too much for /tmp. I will see what happens if I add some swap space to the router; I hope it will be quite happy with a 31MB /tmp and actual usage of that space :). Since Dave only recommends full tftp reflashes, maybe the update scenario is not such a big issue for cerowrt?

>
>>> I'm thinking that maybe flooding wireless->wired with UDP traffic for
>>> 5-10 minutes is the right approach, and then vice-versa (restarting
>>> the router in between?). If there are problems like infinite retries
>>> or packet memory leaks, that might show them up quickly.
>> That sounds like the right way to proceed, except I am no expert at
>> setting netperf up, so it might take a while until I get around to
>> actually testing that hypothesis. (Do you by any chance know of a
>> publicly available netserver process running on the internet to which
>> I could point a local netperf, and do you have any recommendations on
>> how to create the UDP flood with netperf?)
>
> I don't know of any myself. There's a possible tutorial on setting it
> up at http://www.tonymacx86.com/viewtopic.php?t=5700, but assuming you
> have it installed on two computers already, it should just be a case of
> running:
>
> user@computer1$ netperf -t UDP_STREAM -H computer2
>
> and possibly running "netserver -p 12865" on computer2 if necessary.
> (It should in theory be started via inetd.)

I am still trying to get a second machine on my network so I can test the UDP hypothesis, but that will take a while longer…

Best
	Sebastian