From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <moeller0@gmx.de>
Received: from mailout-de.gmx.net (mailout-de.gmx.net [213.165.64.23])
	by huchra.bufferbloat.net (Postfix) with SMTP id 53B6E200B49
	for <cerowrt-devel@lists.bufferbloat.net>;
	Wed, 15 Aug 2012 22:15:39 -0700 (PDT)
Received: (qmail invoked by alias); 16 Aug 2012 05:15:37 -0000
Received: from 75-142-58-156.static.mtpk.ca.charter.com (EHLO
	dhcp-112.home.lan) [75.142.58.156]
	by mail.gmx.net (mp037) with SMTP; 16 Aug 2012 07:15:37 +0200
X-Authenticated: #24211782
X-Provags-ID: V01U2FsdGVkX1+DfDci7OiD/BsMpYcjdRJz/eQf76Pt9sHuq8xzra
	7+hK9Y4tdP/LZF
Mime-Version: 1.0 (Apple Message framework v1278)
Content-Type: text/plain; charset=windows-1252
From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <CAA93jw5JV0L0MLYxuO-ok2joOkodie_JkxNs_VAJat_eYhV1gw@mail.gmail.com>
Date: Wed, 15 Aug 2012 22:15:35 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <6329D77B-9803-453A-A34F-7B6EA02FE9AA@gmx.de>
References: <CAA93jw50MeqWH6TVKditFGfg-V-mOi-UUtsABqd+WHs2vedHQw@mail.gmail.com>
	<36D61FDC-9AA9-46CC-ACBB-2D28B250C660@gmx.de>
	<CAA93jw5JV0L0MLYxuO-ok2joOkodie_JkxNs_VAJat_eYhV1gw@mail.gmail.com>
To: Dave Taht <dave.taht@gmail.com>
X-Mailer: Apple Mail (2.1278)
X-Y-GMX-Trusted: 0
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] cerowrt 3.3.8-17 is released
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
	<cerowrt-devel.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/cerowrt-devel>,
	<mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/cerowrt-devel>
List-Post: <mailto:cerowrt-devel@lists.bufferbloat.net>
List-Help: <mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/cerowrt-devel>,
	<mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Thu, 16 Aug 2012 05:15:39 -0000

Hi Dave,

thanks for the detailed response...

On Aug 15, 2012, at 9:08 PM, Dave Taht wrote:

> re: ath: skbuff alloc of size 1926 failed
>=20
> as for the ath skbuff problem, I've seen that a lot. I had put hard
> packet limits (~600) on fq_codel in -11 and prior that were too low
> and it mostly went away, but I hit tail drop behavior everywhere,
> instead of codel behavior. What I have now (typically 1200) may well
> be too high, but not as overly high as the default (10k packets).

	Question: is this limit per interface, per flow, or per fq bin?
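
A rough sketch of the arithmetic behind this question. The assumptions are not from measurements: each queued packet pins one 2048-byte slab object (the failing allocation above was 1926 bytes), the packet limit applies once per fq_codel instance, and 10240 stands in for the "10k" default. Real per-packet overhead will differ.

```python
# Rough worst-case memory pinned by full queues.
# Assumptions (not measured): one 2048-byte slab object per queued
# packet, and one packet limit per fq_codel instance.
SLAB_BYTES = 2048


def queue_memory_mib(packet_limit, instances=1, slab_bytes=SLAB_BYTES):
    """Return the worst-case queue memory in MiB."""
    return packet_limit * instances * slab_bytes / (1024 * 1024)


if __name__ == "__main__":
    for limit in (600, 1200, 10240):
        mib = queue_memory_mib(limit)
        print(f"limit {limit:>5}: {mib:6.2f} MiB per instance")
```

Under these assumptions 10240 packets come to 20 MiB per instance and 1200 packets to about 2.3 MiB, so the answer to the per-interface question changes the total considerably on a router with several interfaces.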

> There may be another means of increasing the size of that slab pool or
> making it less onerous.

	Interesting idea, I will have a look at that...

>=20
> I would like it if codel "kicked in" earlier than it currently does.
> The code in ns2 is currently using half the period that the linux code
> is. This would control things better, or so I hope (planning on trying
> this as I get time)
>=20
> I am also considering means of artificially upscaling the drop
> scheduler when we get close to queue limits.
>=20
> See some discussions on the codel list for these issues. (sims are
> easier to deal with than cerowrt, too!)

	Ah great, more goodness on the way to cerowrt I hope :)
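
A small sketch of why a shorter period would make codel kick in earlier. It assumes that "period" above means CoDel's interval parameter and that the Linux default is 100 ms; it only models the published control law (next drop after interval / sqrt(count)), not the actual kernel or ns2 code. The function name is made up for illustration.

```python
import math


def drop_times_ms(interval_ms, drops):
    """Times (ms after the delay first exceeds target) of the first drops.

    Simplified CoDel model: the first drop comes one interval after the
    delay stays above target; each later drop follows the previous one
    by interval / sqrt(count), where count is the number of drops so far.
    """
    times = []
    now = float(interval_ms)  # first drop after one full interval
    for count in range(1, drops + 1):
        times.append(round(now, 1))
        now += interval_ms / math.sqrt(count)
    return times


if __name__ == "__main__":
    for interval in (100, 50):
        print(interval, drop_times_ms(interval, 5))
```

In this simplified model every drop time scales linearly with the interval, so halving it halves the time until the first drop and until every later one.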

>=20
> as for bind, it should be automagically restarted from xinetd, no need
> to fiddle with anything. However, since you are already under massive
> memory pressure, it may well fail to start up that way, too.

	Well, once bind is gone and the measurement is over, the memory
pressure is gone and there should be enough memory for bind to start
(I will check that hypothesis later). But trying to start it manually
with something like 23MB free did not bring bind up again, so I was
certainly doing something wrong (or the OOM killer hit more than just
bind; that is hard to say, as nothing about the OOM killer showed up in
dmesg or in logread -f, so maybe bind died from other causes).
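
One way to check the logs more systematically is to count the relevant messages in saved dmesg or logread output. This is a minimal sketch: the skbuff pattern is taken from the log line quoted further down in this thread, while the OOM patterns are typical kernel messages and are an assumption that may not match every kernel version.

```python
import re

# The first pattern comes from the kernel log line quoted in this thread.
# The OOM patterns are typical kernel messages (an assumption) and may
# differ between kernel versions.
PATTERNS = {
    "skb_alloc_failed": re.compile(r"ath: skbuff alloc of size \d+ failed"),
    "oom_killer": re.compile(r"invoked oom-killer|Out of memory: Kill process"),
}


def count_events(lines):
    """Count the log lines that match each pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

It could be fed with something like `count_events(open("logread.txt"))`, where logread.txt is a hypothetical file holding the saved log output. A zero count for oom_killer next to many skb_alloc_failed lines would support the idea that bind died from other causes.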

> At the
> moment, I've largely given up on bind on anything but a more core home
> gw, and am running dnsmasq on everything (3700v2, picostations,
> nanostations) but the 3800s. (and the ones I run it on, aren't being
> used for wifi right now).

	Ah, that should free some MBs for queues to grow in :)

>=20
> Lastly: Swap space won't help you on exhausting kernel limits.

	I had the naive hope that swap would allow bind's memory to be
pushed out to the page file and give the kernel some more room to
breathe, but that only worked to some degree. (In 3.3.8-6 one of the
UDP storm tests I did made the router reboot roughly every other day;
adding swap turned this into survival with a killed bind and
non-functional DNS. In retrospect I am not sure whether adding swap was
such a good idea, as after the sudden reboots the router was at least
functional again :))
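
To watch how much free memory and swap remain while such a test runs, /proc/meminfo can be sampled. A minimal sketch, with helper names made up for illustration; it only reports overall free memory and swap and says nothing about individual kernel slabs.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def free_summary(text):
    """Return MemFree and SwapFree in kB (None if a field is missing)."""
    info = parse_meminfo(text)
    return {key: info.get(key) for key in ("MemFree", "SwapFree")}
```

On the router this could be called as `free_summary(open("/proc/meminfo").read())`, for example once per second during the UDP test.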

>=20
> I'm glad you can reproduce the ath: slab problem - I can get it too at
> high rates using netperf over wifi.

	I always wanted to stress this with netperf, but somehow I was
never able to find a netperf server outside of my cable modem with
which to recreate my failure mode...

> I will try a 3700v2 with and
> without bind to see if it's still there in 3.3.8-17. In the meantime
> if anyone knows how to get more allocations in that (2048? 4096?) slab
> by default, perhaps that will help?

Thanks so much for all the hard work and such a fun toy to play with...

Sebastian

>=20
>=20
>=20
> On Wed, Aug 15, 2012 at 10:23 AM, Sebastian Moeller <moeller0@gmx.de> =
wrote:
>> Hi Dave,
>>=20
>> great work, as always I upgraded my production router to the latest =
and greatest (since I only have one router=85). And it works quite well =
for normal usage=85
>> Netalyzr reports around 2800 ms of uplink buffering, yet saturating
>> the uplink does not noticeably affect ping times to a remote target,
>> basically the same as for all codellized cero versions I tested so
>> far...
>>=20
>> Some notes and a question:
>> I noticed that even given plenty of swap space (1GB on a usb stick), =
using http://broadband.mpi-sws.org/residential/ to exercise UDP stress =
(on the uplink I assume) I can easily produce (I run the test from a =
macosx via 5GHz wireless over 1.5 yards):
>> Aug 15 01:16:29 nacktmulle kern.err kernel: [175395.132812] ath: =
skbuff alloc of size 1926 failed
>> (and plenty of those=85).
>> What then happens is that the OOM killer will aim for bind =
(reasonable since it is the largest single process) and kill it. When I =
try to restart bind by:
>> root@nacktmulle:~# /etc/rc.d/S47namedprep start
>> root@nacktmulle:~# /etc/rc.d/S48named restart
>> Stopping isc-bind
>>  /etc/chroot/named//var/run/named/named.pid not found, trying brute =
force
>> killall: named: no process killed
>> Kicking isc-bind in xinetd
>> rndc: connect failed: 127.0.0.1#953: connection refused
>> And bind does not start again and the router becomes less than useful.
>> I assume I am doing something wrong, but what? If you have any idea how
>> to solve this short of rebooting the router (my current method), I
>> would be happy to learn.
>>=20
>>=20
>>=20
>> best regards
>>        sebastian
>>=20
>> On Aug 12, 2012, at 11:08 PM, Dave Taht wrote:
>>=20
>>> I'm too tired to write up a full set of release notes, but I've been
>>> testing it all day,
>>> and it looks better than -10 and certainly better than -11, but I =
won't know
>>> until some more folk sit down and test it, so here it is.
>>>=20
>>> http://huchra.bufferbloat.net/~cero1/3.3/3.3.8-17/
>>>=20
>>> fresh merge with openwrt, fix to a bind CVE, fixes for 6in4 and =
quagga
>>> routing problems,
>>> and a few tweaks to fq_codel setup that might make voip better.
>>>=20
>>> Go forth and break things!
>>>=20
>>> In other news:
>>>=20
>>> Van Jacobson gave a great talk about bufferbloat, BQL, codel, and =
fq_codel
>>> at last week's ietf meeting. Well worth watching. At the end he =
outlines
>>> the deployment problems in particular.
>>>=20
>>> http://recordings.conf.meetecho.com/Recordings/watch.jsp?recording=IETF84_TSVAREA&chapter=part_3
>>>=20
>>> Far more interesting than this email!
>>>=20
>>>=20
>>> --
>>> Dave Täht
>>> http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
>>> with fq_codel!"
>>> _______________________________________________
>>> Cerowrt-devel mailing list
>>> Cerowrt-devel@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>=20
>=20
>=20
>=20
> --=20
> Dave Täht
> http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
> with fq_codel!"