From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <20141017210641.1615F1CD@taggart.lackof.org>
References: <20141017210641.1615F1CD@taggart.lackof.org>
Date: Fri, 17 Oct 2014 14:59:22 -0700
From: Dave Taht
To: Matt Taggart
Cc: "cerowrt-devel@lists.bufferbloat.net"
Subject: Re: [Cerowrt-devel] fq_codel tuning on distros
Content-Type: text/plain; charset=UTF-8
List-Id: Development issues regarding the cerowrt test router project

On Fri, Oct 17, 2014 at
2:06 PM, Matt Taggart wrote:
> Hi,
>
> http://www.bufferbloat.net/projects/codel/wiki/Best_practices_for_benchmarking_Codel_and_FQ_Codel#Tuning-fq_codel
>
> explains that the default packet limit of 10000 is designed for 10GigE
> speeds and that for slower links it should be turned down. Is that still
> true? Looking at 3.16 source in fq_codel_init I see:
>
> sch->limit = 10*1024;

This is the 10k packet limit.

My take on the fq_codel in fedora discussion was that if a machine has
enough memory to run systemd, it has more than enough memory to run
fq_codel with the default packet limit. :)

Most of our work here has been colored by the pain of streamlining this
stuff to fit into boxes with 32MB of ram or less, and run at rates well
below 50mbit. I don't feel any intense urges to fiddle with anything in
fq_codel above those speeds and memory sizes, which is what fedora
largely targets.

I haven't seen any real difference in benchmarks from fiddling with the
fq_codel quantum at the higher rates. Does anybody here running fq_codel
on their desktops or servers fiddle with any fq_codel parameters more?

Certainly we have a backlog of tweaks to the algorithm that could use
wider testing, which I expected to mostly fold into cake and then back
into fq_codel. It's all second or third order stuff.

But: I have longed for more field data from bleeding edge desktop and
server users, which is why fedora would be a wonderful place to nail down
other things that could be optimized better, if needed.

I'm not on their relevant mailing lists, am busy today, and I tried to
dump what info I could into the lwn.net article so that those fiddling
with it over there might show up here or on the codel list for more
discussion. (Someone feel free to join over there...)

IF fedora goes down this route...

I also do wish we could reduce TSO/GSO sizes generally, have tcp small
queues autosize to load, and have BQL on everything, in addition to fq or
fq_codel. And I'd like a pony!
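
To put rough numbers on the packet-limit point above, here is a
back-of-the-envelope sketch. It assumes every queued packet carries a
full 1500-byte payload and ignores kernel per-packet overhead and
TSO/GSO super-packets, so real memory use can be higher. The function
names, the "eth0" device and the limit of 1000 are illustrative choices,
not values taken from the kernel or the wiki. The second helper only
builds a tc command line, it does not run it.

```python
# Rough upper bound on fq_codel backlog for a given packet limit.
# Assumption: each queued packet holds a full 1500-byte payload; kernel
# buffer overhead and TSO/GSO super-packets are NOT accounted for.

DEFAULT_LIMIT_PACKETS = 10 * 1024  # sch->limit in fq_codel_init (3.16)


def worst_case_backlog_bytes(limit_packets, bytes_per_packet=1500):
    """Return the payload bytes queued if the qdisc fills to its limit."""
    return limit_packets * bytes_per_packet


def tc_fq_codel_command(dev, limit_packets):
    """Build (but do not run) a tc command that sets the packet limit."""
    return ["tc", "qdisc", "replace", "dev", dev, "root",
            "fq_codel", "limit", str(limit_packets)]


if __name__ == "__main__":
    # 1000 is a hypothetical smaller limit, used only for comparison.
    for limit in (DEFAULT_LIMIT_PACKETS, 1000):
        mib = worst_case_backlog_bytes(limit) / 2**20
        print(f"limit {limit}: up to ~{mib:.1f} MiB of payload")
    print(" ".join(tc_fq_codel_command("eth0", 1000)))
```

Under those assumptions the default limit works out to roughly 15 MB of
queued payload in the worst case, which is a lot on a 32MB box and
barely noticeable on a typical desktop or server.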

I do have some concerns with just blithely turning fq_codel on on
everything. One is that savvy administrators tend to build complex
qdiscs - or as in our case, we tend to turn on SQM on a given interface -
and it would be annoying for a "systemd" to arbitrarily change it back
on some sort of reload.

But: the 99.99% of the users not doing any tuning need a usually sane
default.

In terms of where to turn a different qdisc on, systemd does not strike
me as the right place. I'd argue that you'd choose a default sysctl qdisc
once, at hardware detection time, and only change it when the hardware
changes.

At a rough cut... something like this pseudocode would do:

    if (have_good_clocksource) { // I have no idea how to get this from userspace
        if (!am_forwarding && 10gigE) {
            enable(fq);
        } else {
            enable(fq_codel);
        }
        if ((am_forwarding || have_wifi || non_bql_device || non_web_workload)
            && 10gigE) {
            enable(fq_codel);
        }
        // The circumstances under which sch_fq can be used effectively
        // are not thoroughly defined. For example, I think it's a bad
        // idea to use the giant quantums it uses at lower speeds, but I
        // do like what fq+pacing does, and love it scaling to millions
        // of flows. I'd like the srtt and pacing stuff to end up back in
        // fq_codel actually.
    } else {
        enable(pfifo_fast);
    }

> q->flows_cnt = 1024;

There is some work showing we could improve the hash. I don't see any
reason to change the default number of queues.

Also the odd slow start results I got with a huge packet limit were from,
like, 2012 - prior to tcp small queues and some TSO/GSO offload fixes.
They should be revisited. I'll put it on my list.

> But I don't know what those correspond to. Are they sysfs tunable or only
> at compile time?

I would have liked it if there was a sysctl for default options, and it
was more inheritable across devices, much like how ip_forwarding is
implemented.

> If Linux distros are going to turn on fq_codel by default, are these

Up until this morning I would never have believed it.
Aside from the folks doing routers, we'd heard nothing but crickets back
from the mainstream distros. See for example:

https://bugs.launchpad.net/ubuntu/+bug/940541

I was convinced we had to make cake into a completely backward compatible
equivalent of pfifo_fast, and we had to make it run as fast as
pfifo_fast.

Stephen's presentation must have been VERY convincing!

> reasonable values for the installed base (which I am assuming is mostly
> 1GigE)? What recommendations should the distro documentation make for
> tuning on various speeds?

I run with the defaults.

> I'm excited for this to go into distros, what needs to be done to make that
> easier?

Most of my stuff was constructed to fall into /etc/network/if-pre-up.d.

I would like the folk doing the work to benchmark their results, be
convinced by them, and then be committed to work through whatever
problems are raised by the users in the field. And/or subject themselves
to the pain of pfifo_fast, as mostafa did....

I certainly have done enough benchmarking personally to convince :me:
and the lovely, dedicated bunch of users we have here.

>
> Thanks,
>
> --
> Matt Taggart
> matt@lackof.org

-- 
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks