From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mailout-de.gmx.net (mailout-de.gmx.net [213.165.64.22])
 by huchra.bufferbloat.net (Postfix) with SMTP id 0B86920221E
 for ; Mon, 20 Aug 2012 12:55:29 -0700 (PDT)
Received: (qmail invoked by alias); 20 Aug 2012 19:55:28 -0000
Received: from tsaolab-fw.caltech.edu (EHLO [192.168.50.16]) [131.215.9.89]
 by mail.gmx.net (mp020) with SMTP; 20 Aug 2012 21:55:28 +0200
X-Authenticated: #24211782
X-Provags-ID: V01U2FsdGVkX1+DUsUGz5eD1pn6bIBOUbuEq4+pkxAbjmSU2Ptu/A
 u1xnpVMsnuPFzU
Mime-Version: 1.0 (Apple Message framework v1278)
Content-Type: text/plain; charset=windows-1252
From: Sebastian Moeller
In-Reply-To:
Date: Mon, 20 Aug 2012 12:55:26 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <7E05F1F9-948B-42F5-99DD-28FB60F2332C@gmx.de>
References:
To: Dave Taht
X-Mailer: Apple Mail (2.1278)
X-Y-GMX-Trusted: 0
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] Coping with router memory limitations in fq_codel
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 20 Aug 2012 19:55:30 -0000

Hi Dave,

sorry for accidentally taking this private, so here it is again.

On Aug 20, 2012, at 12:41 PM, Sebastian Moeller wrote:

> Hi Dave,
>
> thanks for the long and thoughtful response.
>
>
> On Aug 20, 2012, at 12:12 PM, Dave Taht wrote:
>
>> Dear Sebastian:
>>
>> In addition to your udp flooding DoS attack, I attacked cero also by
>> using diffserv marking in netperf (-Y codepoint,codepoint) to saturate
>> all 4 wifi fq_codel queues, and also would get the router to have
>> memory allocation failures and ultimately crash in the same way you
>> are crashing it. I can similarly do what you just did with rtp
>> flooding.
>> You are correct that codel is tuned for tcp, and that
>> fq_codel by maintaining many queues is even more susceptible to a
>> tuned udp flooding attack on a memory limited device such as this.
>
> Ah, I did not think that I reported something new with regard to crashing the router; it was more about me having found a way to reproduce it without netperf/iperf (which I never really got to run, due to a lack of endpoints) as well as without using http://broadband.mpi-sws.org/residential/ (as this only allows around 5 runs per 24-hour period).
>
>>
>> I tried to cope with this in 3.3.8-10/11 by reducing the packet
>> limits, which helped a lot. Unfortunately the settings I used then
>> were below codel's reaction time, which invoked "interesting" tail
>> drop behavior, so I arbitrarily doubled them in -17. To invoke more of
>> the kind of problems you are encountering...
>
> That would be limit 600? Is 600 a problem for a single flow, or due to the limit being for the sum of all flows? Would an additional per-flow limit be able to help deal with this issue?
>
>>
>> 0) Since then I have been looking into ways to improve codel's
>> reaction time that are in the ns2 model presently, also fixing an
>> assumption about newton's method that didn't hold in reverse, and also
>> means to incorporate more aggressive codel behavior when queue limits
>> are near to being exceeded.
>
> I see, ramping up the drop frequency once space gets tight...
>
>>
>> Unfortunately as the memory pressure problem starts in the driver,
>> it's not communicated up the stack to where it could be controlled
>> better...
>
> Argh, sounds like fun :)
>
>>
>> 0) I would like to avoid having to determine if a queue is tcp or
>> "other", and then having different kinds of drop strategies for each.
>> That said, it seems possible to implement that...
>
> Since the flows are filled by hash, a flow might contain both, correct?
> So being more firm with flows containing non-TCP traffic might hurt some TCP in shared bins.
>
>>
>> 1) A workaround of sorts for the 64MB 3700v2 has been to give up on
>> named and get some memory back that way.
>
> Since I am a layman, what is the quick and dirty (and reversible) way to do so, so I can test this?
>
>>
>> 2) I believe, but am not sure, that Linux 3.6(5?) has some stuff in it
>> to get skb memory allocations done more efficiently. Eric and I and
>> felix had talked about it, I don't know what was implemented.
>
> ISTR there was something about fixing the accounting of drivers so they track all buffers and not just part of the payload (truesize was the word). Which totally went over my head, but sounds like something that might help...
>
>>
>> 3) It may be possible to improve how the memory allocations from the
>> 2048 slab work in general. I imagine that half of memory is being
>> wasted on big packets otherwise.
>
> I had a quick look at the SLUB documentation and see no way to do so that I can understand.
>
>>
>> 4) some options for improving fq_codel for more memory constrained
>> home environments better.
>>
>> 4a) On the wifi front (as well as other devices with multiple hardware
>> queues), I envision something like "mfq_codel", which would have an
>> overall similar packet limit to a single fq_codel, but be able to
>> deliver (and fair queue) packets to the underlying hardware queues
>> independently.
>
> Sounds like something to test I guess (but out of my league)
>
>>
>> 4b) On the home to-ISP gateway qos front, a rate limited (tbf)
>> mfq_codel with 2-4 queues would replace the complexity of hfsc or htb
>> with a default qdisc that "just worked" without any scripting. It
>> could be mildly more responsive (htb buffers up some data and has its
>> own notion of time and quantums), thus cpu and memory usage would be
>> lower than htb + multiple fq_codel queues.
>
> But I thought that being able to arbitrarily prioritize some traffic in a home router is a good thing; and that will require some hierarchical system and will bring along some complexity...
>
>
>>
>> Getting something that scaled down to 10s of kbits and up to gigabits
>> would be hard, tho. HTB needs to be tuned when running lower or higher
>> than its original operating range, presently, and that is where, in
>> part, the simple_qos.sh effort is "stuck".
>
> Can't this be divined from the configured up and downlink rates? Or are you thinking about dynamic changes in link-rates?
>
>>
>> 4c) Another thought would be to have a weighted packet (to handle
>> classification) oriented sfq codel or qfq_codel rather than separate
>> fq_codel queues that are each byte-aware... we have CPU to burn, but
>> not memory...
>
> That I admit I do not understand.
>
> Thanks a lot & best regards
> Sebastian
>
>>
>> On Mon, Aug 20, 2012 at 11:24 AM, Sebastian Moeller wrote:
>>> Hi Dave,
>>>
>>> so I went to play around with this a bit more. I turned to UDP flooding my cable modem through the router and this surely allows me to create enough load on the wndr3700v2 to cause the allocation errors and as a "bonus" also to drive the router to reboot (driven by the watchdog timer?). Here is the script I used over 5 GHz wireless (from http://blog.ioshints.info/2008/03/udp-flood-in-perl.html)
>>>
>>> #!/usr/bin/perl
>>
>> It would be nice to have a C or lua version of this sort of test.
>>
>>> ##############
>>>
>>> # udp flood.
>>> ##############
>>>
>>> use Socket;
>>> use strict;
>>>
>>> if ($#ARGV != 3) {
>>> print "flood.pl