[Cerowrt-devel] cerowrt 3.3.8-17 is released
moeller0 at gmx.de
Thu Aug 16 01:15:35 EDT 2012
thanks for the detailed response...
On Aug 15, 2012, at 9:08 PM, Dave Taht wrote:
> re: ath: skbuff alloc of size 1926 failed
> as for the ath skbuff problem, I've seen that a lot. I had put hard
> packet limits (~600) on fq_codel in -11 and prior that were too low
> and it mostly went away, but I hit tail drop behavior everywhere,
> instead of codel behavior. What I have now (typically 1200) may well
> be too high, but not as overly high as the default (10k packets).
Question: is this limit per interface, per flow, or per fq bin?
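To put the packet limits above in perspective, here is a rough back-of-the-envelope sketch of the worst-case buffer memory a full queue could pin. It assumes each queued packet occupies one 2048-byte buffer (an assumption based on the 1926-byte allocation in the log landing in a 2048-byte slab; real skb overhead will differ), and it uses 10240 for the "10k" default. The function name is hypothetical.

```python
def worst_case_queue_bytes(limit_packets, bytes_per_packet=2048):
    """Upper bound on memory pinned by a full queue.

    bytes_per_packet=2048 is an assumption (one 2048-byte slab object
    per packet); it ignores skb metadata and smaller packets.
    """
    return limit_packets * bytes_per_packet


# The three limits mentioned in the thread: ~600, ~1200, and the 10k default.
for limit in (600, 1200, 10240):
    mib = worst_case_queue_bytes(limit) / (1024 * 1024)
    print(f"limit={limit:>5} packets -> up to {mib:.1f} MiB")
```

Under these assumptions, the default limit alone could account for roughly as much memory as the ~23MB that was free on the router, which is why the lower limits matter on small devices.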
> There may be another means of increasing the size of that slab pool or
> making it less onerous.
Interesting idea, I will have a look at that...
> I would like it if codel "kicked in" earlier than it currently does.
> The code in ns2 is currently using half the period that the linux code
> is. This would control things better, or so I hope (planning on trying
> this as I get time)
> I am also considering means of artificially upscaling the drop
> scheduler when we get close to queue limits.
> See some discussions on the codel list for these issues. (sims are
> easier to deal with than cerowrt, too!)
Ah great, more goodness on the way to cerowrt I hope :)
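To illustrate why halving the period would make codel "kick in" earlier, here is a minimal sketch of the codel drop schedule. It assumes the standard control law (the first drop comes one interval after the delay first exceeds the target, and each following drop comes interval/sqrt(count) after the previous one), a 100 ms interval for the Linux code, and 50 ms as the halved value. It also assumes the queue stays above target the whole time, which a real queue would not. The function name is hypothetical.

```python
import math


def codel_drop_times_ms(interval_ms, drops):
    """Drop times in ms, measured from when delay first exceeds target.

    Simplified: assumes the queue never falls back below target, so
    codel stays in its dropping state for all `drops` drops.
    """
    times = []
    t = float(interval_ms)  # first drop after one full interval above target
    for count in range(1, drops + 1):
        times.append(t)
        t += interval_ms / math.sqrt(count)  # control law: gaps shrink as count grows
    return times


for interval in (100, 50):
    schedule = ", ".join(f"{t:.0f}" for t in codel_drop_times_ms(interval, 5))
    print(f"interval={interval:>3} ms -> drops at {schedule} ms")
```

In this simplified model, every drop time scales linearly with the interval, so halving the interval halves the time to the first drop and to every later one.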
> as for bind, it should be automagically restarted from xinetd, no need
> to fiddle with anything. However, since you are already under massive
> memory pressure, it may well fail to start up that way, too.
Well, once bind is gone and the measurement is over, the memory pressure is gone and there should be enough memory for bind to start (I will check that hypothesis later). But trying to start it manually with something like 23MB free did not bring bind up again, so I was certainly doing something wrong (or the OOM killer took out more than just bind, but that is hard to say, as nothing about the OOM killer showed up in dmesg or in logread -f, so maybe bind died from other causes).
> At the
> moment, I've largely given up on bind on anything but a more core home
> gw, and am running dnsmasq on everything (3700v2, picostations,
> nanostations) but the 3800s. (and the ones I run it on, aren't being
> used for wifi right now).
Ah, that should free some MBs for queues to grow in :)
> Lastly: Swap space won't help you on exhausting kernel limits.
I had the naive hope that swap would allow bind's memory to be pushed out to the page file and give the kernel some more room to breathe, but that only worked to some degree. (In 3.3.8-6, one of the UDP storm tests I did made the router reboot about every other day; adding swap turned this into survival with a killed bind and non-functional DNS. In retrospect I am not sure whether adding swap was such a good idea, as after the sudden reboots the router was at least functional again :))
> I'm glad you can reproduce the ath: slab problem - I can get it too at
> high rates using netperf over wifi.
I always wanted to stress this with netperf, but somehow was never able to find a netperf server outside of my cable modem with which to recreate my failure mode...
> I will try a 3700v2 with and
> without bind to see if it's still there in 3.3.8-17. In the meantime
> if anyone knows how to get more allocations in that (2048? 4096?) slab
> by default, perhaps that will help?
Thanks so much for all the hard work and such a fun toy to play with…
> On Wed, Aug 15, 2012 at 10:23 AM, Sebastian Moeller <moeller0 at gmx.de> wrote:
>> Hi Dave,
>> great work, as always. I upgraded my production router to the latest and greatest (since I only have one router…), and it works quite well for normal usage…
>> Netalyzr reports around 2800 ms of uplink buffering, yet saturating the uplink does not noticeably affect ping times to a remote target, basically the same as for all codel-ized cero versions I tested so far...
>> Some notes and a question:
>> I noticed that even with plenty of swap space (1GB on a usb stick), when I use http://broadband.mpi-sws.org/residential/ to exercise UDP stress (on the uplink, I assume), I can easily produce the following (I run the test from a Mac OS X machine via 5GHz wireless over 1.5 yards):
>> Aug 15 01:16:29 nacktmulle kern.err kernel: [175395.132812] ath: skbuff alloc of size 1926 failed
>> (and plenty of those…).
>> What then happens is that the OOM killer will aim for bind (reasonable since it is the largest single process) and kill it. When I try to restart bind by:
>> root at nacktmulle:~# /etc/rc.d/S47namedprep start
>> root at nacktmulle:~# /etc/rc.d/S48named restart
>> Stopping isc-bind
>> /etc/chroot/named//var/run/named/named.pid not found, trying brute force
>> killall: named: no process killed
>> Kicking isc-bind in xinetd
>> rndc: connect failed: 127.0.0.1#953: connection refused
>> And bind does not start again, and the router becomes less than useful. I assume I am doing something wrong, but what? If you have any idea how to solve this short of rebooting the router (my current method), I would be happy to learn.
>> best regards
>> On Aug 12, 2012, at 11:08 PM, Dave Taht wrote:
>>> I'm too tired to write up a full set of release notes, but I've been
>>> testing it all day,
>>> and it looks better than -10 and certainly better than -11, but I won't know
>>> until some more folk sit down and test it, so here it is.
>>> fresh merge with openwrt, fix to a bind CVE, fixes for 6in4 and quagga
>>> routing problems,
>>> and a few tweaks to fq_codel setup that might make voip better.
>>> Go forth and break things!
>>> In other news:
>>> Van Jacobson gave a great talk about bufferbloat, BQL, codel, and fq_codel
>>> at last week's ietf meeting. Well worth watching. At the end he outlines
>>> the deployment problems in particular.
>>> Far more interesting than this email!
>>> Dave Täht
>>> http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
>>> with fq_codel!"
>>> Cerowrt-devel mailing list
>>> Cerowrt-devel at lists.bufferbloat.net
> Dave Täht
> http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
> with fq_codel!"