From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <moeller0@gmx.de>
Received: from mailout-de.gmx.net (mailout-de.gmx.net [213.165.64.23])
	by huchra.bufferbloat.net (Postfix) with SMTP id 9274C21F0BA
	for <cerowrt-devel@lists.bufferbloat.net>;
	Sat,  2 Jun 2012 00:03:44 -0700 (PDT)
Received: (qmail invoked by alias); 02 Jun 2012 07:03:41 -0000
Received: from 75-142-58-156.static.mtpk.ca.charter.com (EHLO
	dhcp-112.home.lan) [75.142.58.156]
	by mail.gmx.net (mp071) with SMTP; 02 Jun 2012 09:03:41 +0200
X-Authenticated: #24211782
X-Provags-ID: V01U2FsdGVkX1/sz9510Mbs69sFIzozQV7EGn4CyN8PUgmarDmmIV
	j1FOmo/wrLTN/A
Mime-Version: 1.0 (Apple Message framework v1278)
Content-Type: text/plain; charset=windows-1252
From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <4FC009F6.7070707@gmail.com>
Date: Sat, 2 Jun 2012 00:03:36 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <3E3324C9-CF06-4BB3-A7FB-8B2E47A44C0C@gmx.de>
References: <00404BC8-3761-409D-A1C8-9213D7D9A3DF@gmx.de>
	<1E435715-5C95-49AF-99D0-E8AD6EAD5B44@gmx.de>
	<4FBE5767.6080704@gmail.com>
	<4D0F5C65-2401-470F-A6D8-BE18E8BA25C7@gmx.de>
	<4FBE6290.9000701@freedesktop.org>
	<0E4C11DB-2B8A-411B-A61F-34B2A6BF57B9@gmx.de>
	<4FBE7AAB.5080307@freedesktop.org> <4FBE84C4.80607@gmail.com>
	<61BEA217-79A6-47C8-888D-101BC0EAFB45@gmx.de>
	<CAA=Zby7hmoZdZrmERNfbYbDm6C6eCWU2KcSEr96iQisZLzDMGQ@mail.gmail.com>
	<844EF766-4E37-4B31-AA5D-B51FB22A05A8@gmx.de>
	<4FC009F6.7070707@gmail.com>
To: Robert Bradley <robert.bradley1@gmail.com>
X-Mailer: Apple Mail (2.1278)
X-Y-GMX-Trusted: 0
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] 3.3.6-2
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
	<cerowrt-devel.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/cerowrt-devel>,
	<mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/cerowrt-devel>
List-Post: <mailto:cerowrt-devel@lists.bufferbloat.net>
List-Help: <mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/cerowrt-devel>,
	<mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Sat, 02 Jun 2012 07:03:46 -0000

Hi Robert,

it took me some time to get a bit further with more testing...

On May 25, 2012, at 3:38 PM, Robert Bradley wrote:

> On 25/05/12 19:25, Sebastian Moeller wrote:
>> Hi Robert,
>>=20
>>=20
>> On May 25, 2012, at 4:11 AM, Robert Bradley wrote:
>>=20
>>> That said, unless we can
>>> find an obvious reason for /tmp overfilling, I'm not sure we should =
do
>>> that, since it will cause problems upgrading.
>> 	But if I create a file of 30000 1KB blocks in /tmp (so that =
around 400 KB stay available), the router goes into OOM, so I do not =
think that upgrading would work well if it really needs so much memory? =
I have a hunch that the OpenWrt base under cerowrt does not assume =
something as big and demanding as the 11MB bind9 named process running =
:)
> The flash memory size is about 16MB for the WNDR3700, so it's probably =
ok for normal use.  It's less certain with BIND and everything else =
running, although it'd be possible to restart the router, stop BIND and =
then update.

	=46rom my totally unscientific testing I am quite convinced that =
even 16MB of /tmp used will make the router spiral into a reboot if it =
is used over the 5GHz radio to the WAN port. However, if I use one of =
the wired ports I get plenty of the following (not always hostapd):


Jun  1 23:41:08 nacktmulle kern.warn kernel: [185428.417968] hostapd: =
page allocation failure: order:0, mode:0x4020
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] Call =
Trace:
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] =
[<802850a4>] dump_stack+0x8/0x34
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] =
[<800b4548>] warn_alloc_failed+0xe8/0x10c
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] =
[<800b684c>] __alloc_pages_nodemask+0x5a0/0x600
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] =
[<800da070>] new_slab+0xa8/0x280
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] =
[<80286b18>] __slab_alloc.isra.60.constprop.63+0x25c/0x2fc
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] =
[<800dba48>] __kmalloc_track_caller+0x88/0x140
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] =
[<801e0854>] __alloc_skb+0x80/0x140
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] =
[<801e0930>] dev_alloc_skb+0x1c/0x48
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] =
[<801d0c74>] ag71xx_poll+0x430/0x65c
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] =
[<801e8c10>] net_rx_action+0x88/0x1c8
Jun  1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] hostapd: =
page allocation failure: order:0, mode:0x4020
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Call =
Trace:
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] =
[<802850a4>] dump_stack+0x8/0x34
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] =
[<800b4548>] warn_alloc_failed+0xe8/0x10c
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] =
[<800b684c>] __alloc_pages_nodemask+0x5a0/0x600
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] =
[<800da070>] new_slab+0xa8/0x280
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] =
[<80286b18>] __slab_alloc.isra.60.constprop.63+0x25c/0x2fc
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] =
[<800dba48>] __kmalloc_track_caller+0x88/0x140
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] =
[<801e0854>] __alloc_skb+0x80/0x140
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] =
[<801e0930>] dev_alloc_skb+0x1c/0x48
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] =
[<801d0c74>] ag71xx_poll+0x430/0x65c
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]=20
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Mem-Info:
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal =
per-cpu:
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] CPU    0: =
hi:   18, btch:   3 usd:  18
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] =
active_anon:3826 inactive_anon:63 isolated_anon:0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]  =
active_file:683 inactive_file:561 isolated_file:0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]  =
unevictable:0 dirty:0 writeback:0 unstable:0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]  free:96 =
slab_reclaimable:408 slab_unreclaimable:7706
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]  =
mapped:501 shmem:109 pagetables:142 bounce:0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal =
free:384kB min:1016kB low:1268kB high:1524kB active_anon:15304kB =
inactive_anon:252kB active_file:2732kB inactive_file:2244kB =
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:65024kB =
mlocked:0k
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] =
lowmem_reserve[]: 0 0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal: =
42*4kB 15*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB =
0*2048kB 0*4096kB =3D 384kB
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1353 total =
pagecache pages
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 0 pages in =
swap cache
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Swap cache =
stats: add 0, delete 0, find 0/0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Free swap  =
=3D 0kB
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Total swap =
=3D 0kB
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 16384 =
pages RAM
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 965 pages =
reserved
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1399 pages =
shared
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 14306 =
pages non-shared
Jun  1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] SLUB: =
Unable to allocate memory on node -1 (gfp=3D0x20)
Jun  1 23:41:09 nacktmulle kern.warn kernel: [185429.484375]   cache: =
kmalloc-2048, object size: 2048, buffer size: 2048, default order: 2, =
min order: 0
Jun  1 23:41:09 nacktmulle kern.warn kernel: [185429.484375]   node 0: =
slabs: 0, objs: 0, free: 0

But the box seems to survive this=85 Heck, it even survives my test =
case with 16000 KB of /tmp in use. Under that amount of memory pressure =
named and ntpd get killed, but the router does not automatically reboot; =
it just stays up and running, albeit somewhat useless without named.
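
As a rough cross-check of the Mem-Info block above, the page counters
can be converted to kB. This is only a sketch: it assumes 4 kB pages
(consistent with free:96 pages being reported as free:384kB), and the
helper names are made up for illustration.

```python
# Rough reading of the Mem-Info counters quoted in the log above.
# Assumption: 4 kB pages (free:96 pages is reported as free:384kB).
PAGE_KB = 4

# Page counts copied verbatim from the log.
counters = {
    "total RAM": 16384,
    "free": 96,
    "active_anon": 3826,
    "slab_unreclaimable": 7706,
    "shmem": 109,
}


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count to kilobytes."""
    return pages * page_kb


def share_of_ram(pages, total_pages=counters["total RAM"]):
    """Fraction of all RAM pages taken by this counter."""
    return pages / total_pages


for name, pages in counters.items():
    print(f"{name:>20}: {pages_to_kb(pages):6d} kB "
          f"({share_of_ram(pages):6.1%} of RAM)")
```

Read this way, slab_unreclaimable comes to 30824 kB, a bit under half
of the 64 MB of RAM, while only 384 kB are free. That says nothing yet
about which allocator user is holding the slab memory.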


>=20
>> 	Oh I agree the /tmp issue is a tangent, but it does not seem =
healthy that the router spirals into reboot once /tmp fills up (BTW if I =
remove my 30000KB file from /tmp while the first OOM is in process the =
router recovers) My hunch is that the almost fully instantiated tmpfs =
takes too much memory from the system for it to handle its usual =
business.
>> 	On top of that are the wireless issues, say what about a kernel =
memory leak caused by ath wireless that grows and grows until the =
problematic /tmp size is in the single digit MBs that starts the spiral =
to reboot?
>=20
> No, definitely not healthy!  I'm thinking that maybe setting tmpfs to =
20MB would be a good compromise, at least until the presumed memory leak =
can be tracked down.

	The way I interpret my latest test results is that the "assumed =
leak" should be restricted to the wireless driver, does that sound right =
to you? Also, with cerowrt 3.3.6-2 even 16MB seems too much for /tmp. I =
will see what happens if I add some swap space to the router; I hope it =
will be quite happy with 31MB /tmp and actual usage of that space :). =
Since Dave only recommends full tftp reflashes, maybe the update =
scenario is not such a big issue for cerowrt?

>=20
>>> I'm thinking that maybe flooding wireless->wired with UDP traffic =
for
>>> 5-10 minutes is the right approach, and then vice-versa (restarting
>>> the router inbetween?).  If there are problems like infinite retries
>>> or packet memory leaks, that might show them up quickly.
>> 	That sounds like the right way to proceed, except I am no expert =
at setting netperf up so that might take a while until I get around to =
actually testing that hypothesis. (Do you by any chance know a publicly =
available netserver process running in the internets to which I could =
point a local netperf, and do you have any recommendations how to create =
the UDP flood with netperf ?)
>>=20
>>=20
>=20
> I don't know of any myself.  There's a possible tutorial on setting it =
up at http://www.tonymacx86.com/viewtopic.php?t=3D5700, but assuming you =
have it installed on two computers already, it should just be a case of =
running:
>=20
> user@computer1$ netperf -t UDP_STREAM -H computer2
>=20
> and possibly running "netserver -p 12865" on computer2 if necessary.  =
(It should in theory be started via inetd.)


	I am still trying to get a second machine on my network so I can =
test the UDP hypothesis, but that will take a while longer=85
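
Until netperf runs on both ends, a plain UDP sender might serve as a
crude stand-in for the flood. The sketch below is not a netperf
replacement and measures nothing on the receiving side; the function
name, target address, port and payload size are all placeholders.

```python
import socket
import time


def udp_flood(host, port, seconds, payload_size=1400):
    """Send UDP datagrams to host:port as fast as the stack allows.

    Returns (datagrams_sent, bytes_sent). Nothing has to listen on
    the receiving port; the datagrams only need to cross the router.
    """
    payload = b"\x00" * payload_size
    sent = 0
    deadline = time.monotonic() + seconds
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        while time.monotonic() < deadline:
            try:
                sock.sendto(payload, (host, port))
                sent += 1
            except OSError:
                # For example a full send buffer or an unreachable
                # network; back off briefly and carry on.
                time.sleep(0.001)
    return sent, sent * payload_size


# Hypothetical use from a wireless client, aimed at a wired machine
# for 5 minutes (address and port are made up):
#   n, nbytes = udp_flood("192.168.1.2", 9, seconds=300)
#   print(f"sent {n} datagrams, {nbytes / 1e6:.1f} MB")
```

Running it once from the wireless side and once from the wired side,
while watching the router's log, would roughly match the
wireless->wired and wired->wireless runs Robert suggested.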

Best
	Sebastian