<font face="arial" size="2"><p style="margin:0;padding:0;">Ahhh, thanks, Sebastian. Now I understand that what you are doing is a stress test.</p>
<p style="margin:0;padding:0;"> </p>
<p style="margin:0;padding:0;">-----Original Message-----<br />From: "Sebastian Moeller" <moeller0@gmx.de><br />Sent: Tuesday, August 21, 2012 1:28am<br />To: "Marchon" <marchon@gmail.com><br />Cc: "dpreed@reed.com" <dpreed@reed.com>, "cerowrt-devel@lists.bufferbloat.net" <cerowrt-devel@lists.bufferbloat.net><br />Subject: Re: [Cerowrt-devel] cerowrt 3.3.8-17 is released<br /><br /></p>
<div id="SafeStyles1345659559">
<p style="margin:0;padding:0;">Hi there,<br /><br />again the UDP flood test was not about the cable modem but just about resilience under (extreme) load. So just plain bread and butter functionality of a router, nothing fancy :) As it is, it causes a mix of very slow processing due to allocation errors, occasional OOM situations, and full reboots. IMHO surviving the stress without these unfortunate outcomes would be preferable…<br /><br /><br />best<br /> Sebastian<br /><br />On Aug 20, 2012, at 7:44 PM, Marchon wrote:<br /><br />> The Fq_codel settings prevent excess buffering when pushing data out over the cable modem itself. They also prevent an unnecessary, premature reduction of the TCP sliding window size, which in turn prevents a cascading backlog that ends up further reducing the sliding window and slowing down the overall outbound transfer rate. <br />> <br />> A buffering problem in the cable modem only happens if you feed it data too quickly. <br />> <br />> Sent from my iPhone<br />> <br />> On Aug 20, 2012, at 10:33 PM, dpreed@reed.com wrote:<br />> <br />>> I'm confused. Fq_codel does not (to my knowledge) fix bufferbloat *in your cable modem* or *in the CMTS head-end*.<br />>> <br />>> How could it? In order for that to be fixed, you need to manage the buffer in the cable modem itself.<br />>> <br />>> -----Original Message-----<br />>> From: "Sebastian Moeller" <moeller0@gmx.de><br />>> Sent: Monday, August 20, 2012 2:24pm<br />>> To: "Dave Taht" <dave.taht@gmail.com><br />>> Cc: cerowrt-devel@lists.bufferbloat.net<br />>> Subject: Re: [Cerowrt-devel] cerowrt 3.3.8-17 is released<br />>> <br />>> Hi Dave,<br />>> <br />>> so I went to play around with this a bit more. I turned to UDP flooding my cable modem through the router and this surely allows me to create enough load on the wndr3700v2 to cause the allocation errors and as a "bonus" also to drive the router to reboot (driven by the watchdog timer?). 
Here is the script I used over 5G wireless (from http://blog.ioshints.info/2008/03/udp-flood-in-perl.html)<br />>> <br />>> #!/usr/bin/perl<br />>> ##############<br />>> <br />>> # udp flood.<br />>> ##############<br />>> <br />>> use Socket;<br />>> use strict;<br />>> <br />>> if ($#ARGV != 3) {<br />>> print "flood.pl <ip> <port> <size> <time>\n\n";<br />>> print " port=0: use random ports\n";<br />>> print " size=0: use random size between 64 and 1024\n";<br />>> print " time=0: continuous flood\n";<br />>> exit(1);<br />>> }<br />>> <br />>> my ($ip,$port,$size,$time) = @ARGV;<br />>> <br />>> my ($iaddr,$endtime,$psize,$pport);<br />>> <br />>> $iaddr = inet_aton("$ip") or die "Cannot resolve hostname $ip\n";<br />>> $endtime = time() + ($time ? $time : 1000000);<br />>> <br />>> socket(flood, PF_INET, SOCK_DGRAM, 17);<br />>> <br />>> <br />>> print "Flooding $ip " . ($port ? $port : "random") . " port with " . <br />>> ($size ? "$size-byte" : "random size") . " packets" . <br />>> ($time ? " for $time seconds" : "") . "\n";<br />>> print "Break with Ctrl-C\n" unless $time;<br />>> <br />>> for (;time() <= $endtime;) {<br />>> $psize = $size ? $size : int(rand(1024-64)+64) ;<br />>> $pport = $port ? 
$port : int(rand(65500))+1;<br />>> <br />>> send(flood, pack("a$psize","flood"), 0, pack_sockaddr_in($pport, $iaddr));}<br />>> <br />>> called as either<br />>> udp_flood.pl 192.168.100.1 0 1024 240<br />>> or <br />>> udp_flood.pl 192.168.100.1 32000 1024 240<br />>> <br />>> The first version with randomized port number spreads the load nicely over many fq_codel bins/flows and seems slightly more likely to cause allocation errors and reboots than the 2nd invocation which restricts itself to port 32000 and presumably just one flow.<br />>> I wonder how to make cerowrt survive this kind of stress test… <br />>> <br />>> best<br />>> Sebastian<br />>> <br />>> <br />>> On Aug 15, 2012, at 9:08 PM, Dave Taht wrote:<br />>> <br />>> > re: ath: skbuff alloc of size 1926 failed<br />>> > <br />>> > as for the ath skbuff problem, I've seen that a lot. I had put hard<br />>> > packet limits (~600) on fq_codel in -11 and prior that were too low<br />>> > and it mostly went away, but I hit tail drop behavior everywhere,<br />>> > instead of codel behavior. What I have now (typically 1200) may well<br />>> > be too high, but not as overly high as the default (10k packets).<br />>> > There may be another means of increasing the size of that slab pool or<br />>> > making it less onerous.<br />>> > <br />>> > I would like it if codel "kicked in" earlier than it currently does.<br />>> > The code in ns2 is currently using half the period that the linux code<br />>> > is. This would control things better, or so I hope (planning on trying<br />>> > this as I get time)<br />>> > <br />>> > I am also considering means of artificially upscaling the drop<br />>> > scheduler when we get close to queue limits.<br />>> > <br />>> > See some discussions on the codel list for these issues. (sims are<br />>> > easier to deal with than cerowrt, too!)<br />>> > <br />>> > as for bind, it should be automagically restarted from xinetd, no need<br />>> > to fiddle with anything. 
However, since you are already under massive<br />>> > memory pressure, it may well fail to start up that way, too. At the<br />>> > moment, I've largely given up on bind on anything but a more core home<br />>> > gw, and am running dnsmasq on everything (3700v2, picostations,<br />>> > nanostations) but the 3800s (and the ones I run it on aren't being<br />>> > used for wifi right now).<br />>> > <br />>> > Lastly: swap space won't help you when you are exhausting kernel limits.<br />>> > <br />>> > I'm glad you can reproduce the ath: slab problem - I can get it too at<br />>> > high rates using netperf over wifi. I will try a 3700v2 with and<br />>> > without bind to see if it's still there in 3.3.8-17. In the meantime<br />>> > if anyone knows how to get more allocations in that (2048? 4096?) slab<br />>> > by default, perhaps that will help?<br />>> > <br />>> > <br />>> > <br />>> > On Wed, Aug 15, 2012 at 10:23 AM, Sebastian Moeller <moeller0@gmx.de> wrote:<br />>> >> Hi Dave,<br />>> >> <br />>> >> great work, as always. I upgraded my production router to the latest and greatest (since I only have one router…). And it works quite well for normal usage…<br />>> >> Netalyzr reports around 2800 ms of uplink buffering, yet saturating the uplink does not noticeably affect ping times to a remote target, basically the same as for all codellized cero versions I tested so far...<br />>> >> <br />>> >> Some notes and a question:<br />>> >> I noticed that even given plenty of swap space (1GB on a usb stick), using http://broadband.mpi-sws.org/residential/ to exercise UDP stress (on the uplink I assume) I can easily produce the following (I run the test from a Mac OS X machine via 5GHz wireless over 1.5 yards):<br />>> >> Aug 15 01:16:29 nacktmulle kern.err kernel: [175395.132812] ath: skbuff alloc of size 1926 failed<br />>> >> (and plenty of those…).<br />>> >> What then happens is that the OOM killer will aim for bind (reasonable since it is the largest single process) and kill it. 
When I try to restart bind with:<br />>> >> root@nacktmulle:~# /etc/rc.d/S47namedprep start<br />>> >> root@nacktmulle:~# /etc/rc.d/S48named restart<br />>> >> Stopping isc-bind<br />>> >> /etc/chroot/named//var/run/named/named.pid not found, trying brute force<br />>> >> killall: named: no process killed<br />>> >> Kicking isc-bind in xinetd<br />>> >> rndc: connect failed: 127.0.0.1#953: connection refused<br />>> >> bind does not start again and the router becomes less than useful. I assume I am doing something wrong, but what? If you have any idea how to solve this short of rebooting the router (my current method), I would be happy to learn.<br />>> >> <br />>> >> <br />>> >> <br />>> >> best regards<br />>> >> sebastian<br />>> >> <br />>> >> On Aug 12, 2012, at 11:08 PM, Dave Taht wrote:<br />>> >> <br />>> >>> I'm too tired to write up a full set of release notes, but I've been<br />>> >>> testing it all day,<br />>> >>> and it looks better than -10 and certainly better than -11, but I won't know<br />>> >>> until some more folk sit down and test it, so here it is.<br />>> >>> <br />>> >>> http://huchra.bufferbloat.net/~cero1/3.3/3.3.8-17/<br />>> >>> <br />>> >>> fresh merge with openwrt, fix to a bind CVE, fixes for 6in4 and quagga<br />>> >>> routing problems,<br />>> >>> and a few tweaks to fq_codel setup that might make voip better.<br />>> >>> <br />>> >>> Go forth and break things!<br />>> >>> <br />>> >>> In other news:<br />>> >>> <br />>> >>> Van Jacobson gave a great talk about bufferbloat, BQL, codel, and fq_codel<br />>> >>> at last week's ietf meeting. Well worth watching. 
At the end he outlines<br />>> >>> the deployment problems in particular.<br />>> >>> <br />>> >>> http://recordings.conf.meetecho.com/Recordings/watch.jsp?recording=IETF84_TSVAREA&chapter=part_3<br />>> >>> <br />>> >>> Far more interesting than this email!<br />>> >>> <br />>> >>> <br />>> >>> --<br />>> >>> Dave Täht<br />>> >>> http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out<br />>> >>> with fq_codel!"<br />>> >>> _______________________________________________<br />>> >>> Cerowrt-devel mailing list<br />>> >>> Cerowrt-devel@lists.bufferbloat.net<br />>> >>> https://lists.bufferbloat.net/listinfo/cerowrt-devel<br />>> >> <br />>> > <br />>> > <br />>> > <br />>> > -- <br />>> > Dave Täht<br />>> > http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out<br />>> > with fq_codel!"<br /><br /></p>
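<p style="margin:0;padding:0;">The fq_codel packet limits discussed in the thread (about 600 in -11 and prior, typically 1200 now, and a default of roughly 10k packets) can be turned into a rough memory estimate. The sketch below is only a back-of-the-envelope illustration, not a measurement from the router: it assumes that every queued packet pins one 2048-byte slab object (the size class guessed at in the thread for the 1926-byte skbuff allocation), it ignores the sk_buff bookkeeping overhead, and it covers a single fq_codel instance. The names <code>SLAB_BYTES</code>, <code>LIMITS</code>, <code>queue_bytes</code> and <code>to_mib</code> are hypothetical and exist only for this example.</p>
<pre>
```python
# Rough upper bound on kernel memory pinned by one full fq_codel queue.
# Assumption (not measured): every queued packet occupies one slab
# object of SLAB_BYTES, the 2048-byte size class guessed at in the thread.

SLAB_BYTES = 2048  # assumed slab size class for a 1926-byte skbuff alloc

# Packet limits quoted in the thread; the labels are illustrative.
LIMITS = {
    "hard limit in -11 and prior": 600,
    "current typical limit": 1200,
    "default (about 10k packets)": 10_000,
}


def queue_bytes(packet_limit: int, slab_bytes: int = SLAB_BYTES) -> int:
    """Worst-case bytes held if the queue fills up to packet_limit."""
    return packet_limit * slab_bytes


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / (1024 * 1024)


if __name__ == "__main__":
    for label, limit in LIMITS.items():
        mib = to_mib(queue_bytes(limit))
        print(f"{label}: {limit} packets, about {mib:.1f} MiB")
```
</pre>
<p style="margin:0;padding:0;">Under these assumptions the three limits come out at roughly 1.2 MiB, 2.3 MiB and 19.5 MiB per queue, which suggests why a limit near the default could plausibly contribute to the allocation failures on a small router, while the lower limits trade memory pressure for earlier drops.</p>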
</div></font>