From: Dave Taht
To: Eric Dumazet
Cc: codel@lists.bufferbloat.net
Date: Thu, 11 Jul 2013 14:18:31 -0700
Subject: Re: [Codel] hardware multiqueue in fq_codel?
In-Reply-To: <1373568848.4600.66.camel@edumazet-glaptop>
References: <1373564673.4600.55.camel@edumazet-glaptop> <1373568848.4600.66.camel@edumazet-glaptop>
List-Id: CoDel AQM discussions

On Thu, Jul 11, 2013 at 11:54 AM, Eric Dumazet wrote:
> On Thu, 2013-07-11 at 11:06 -0700, Dave Taht wrote:
>
>> Gotcha. So what I actually did (felix did, in openwrt, actually) was
>> just make fq_codel the default qdisc to avoid having to inspect things
>> to set the number of queues in mq and mqprio. I see, for example, that
>> mq is the default for tg3...
>>
>> http://snapon.lab.bufferbloat.net/~cero2/deb/patches/0003-Use-FQ_codel-by-default.patch
>>
>> I just added it to htb and hfsc too:
>>
>> http://snapon.lab.bufferbloat.net/~cero2/deb/patches/0008-Make-fq_codel-the-default-qdisc-for-htb-and-hfsc.patch
>>
>> There's a patch to obsolete pfifo_fast entirely in openwrt, which is a
>> tad premature.
>>
>> A remaining concern is to what this affects:
>>
>> A) people that expect ifconfig X txqueuelen Y to do anything will be
>> misled. Perhaps this could be fixed by having the fq_codel default
>> limit be txqueuelen rather than the default (and overlarge) limit of
>> 10k, but as tons of people are supplying oddball txqueuelens, I tend
>> to think just ignoring txqueuelen going forward is more the right
>> thing.
>>
>> Do you actually get close to 10k packets outstanding in 10GigE under
>> any sane circumstances?
>
>
> 10GigE can send 10.000.000 packets per second.
>
> 10k is only 1ms of buffering, which is pretty low considering the cpu
> able to restart a queue might be blocked ~10 ms in a softirq handler.

I have incidentally long thought that you are also tweaking target and
interval for your environment?

> Whole point of codel is that number of packets in the queue is
> irrelevant. Only sojourn time is.

Which is a lovely thing in worlds with infinite amounts of memory.
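
As a rough check on the 10GigE numbers quoted above, here is a minimal sketch of the arithmetic. The function names are made up for illustration and are not kernel APIs. The frame sizes are my assumption of standard Ethernet on-wire overhead: 84 bytes for a minimum-size frame and 1538 bytes for a full 1500-byte MTU frame, both including preamble and inter-frame gap.

```c
#include <stdint.h>

/* Packets per second a link can carry, given its bit rate and the
 * on-wire size of each frame in bytes (frame plus preamble and gap).
 * Hypothetical helper, for illustration only. */
double link_pps(double link_bps, double wire_bytes_per_frame)
{
    return link_bps / (wire_bytes_per_frame * 8.0);
}

/* Milliseconds needed to drain `packets` queued packets when the
 * link sends `pps` packets per second. Hypothetical helper. */
double drain_time_ms(uint64_t packets, double pps)
{
    return (double)packets / pps * 1000.0;
}
```

At the 10,000,000 packets per second quoted, drain_time_ms(10000, 10e6) comes to 1 ms, matching the figure above. With full-size frames, link_pps(10e9, 1538.0) is a little over 800k packets per second, so the same 10k packets represent roughly 12 ms of buffering. That gap is why a packet-count limit says little about delay by itself.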
> Now if your host has memory concerns, that's a separate issue, and you
> can adjust the qdisc limit, or add a callback from mm to be able to
> shrink queue in case of memory pressure, if you deal with non elastic
> flows.

So my take on this is that the default limit should be 1k on devices
with less than 256MB of ram overall, and 10k (or more) elsewhere. This
matches current txqueuelen behavior and has the least surprise.

It does strike me as useful but probably hurtful to try and resize the
queue when it gets too large as a callback from the mm subsystem,
better to just drop packets?

There are other patches out there to reduce memory pressure under load
(also used in openwrt) by reducing skb size, those have also worked
out well... typically they look like:

 static int pfifo_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
-       if (likely(skb_queue_len(&sch->q) < sch->limit))
+       if (likely(skb_queue_len(&sch->q) < sch->limit)) {
+               if (skb_queue_len(&sch->q) > 128)
+                       skb = skb_reduce_truesize(skb);
                return qdisc_enqueue_tail(skb, sch);
-
+       }

If these were wrapped in a define

>
>>
>> B) people that expect pfifo_fast semantics, for which substituting
>> fq_codel behaves oddly in two ways -
>>
>> 1) if you are explicitly setting skb->priority for the default
>> pfifo_fast 3 bands and expecting a result, nothing happens - but in
>> the general case, people setting skb->priority are trying to get
>> better latency in the first place, and I really don't think almost
>> anybody will notice.

I can also sit down and go through all the various overloaded uses
that skb->priority has which make life really confusing and difficult.
I really don't mind ignoring it entirely by default. :)

>> 2) if you are using a filter on pfifo_fast that expects 3 bands, and
>> end up using fq_codel by default anyway we get DRR-like behavior over
>> codel rather than strict prioritization and lose fq_codel's full
>> benefits... which is still a win IMHO.
>> I am not fond of being able to
>> starve the other two bands....
>> 3) trying to explicitly set pfifo_fast via tc doesn't work with this patch.
>>
>> 4) ECN processing is enabled by default (but off by default in sysctl)
>
> There is no 'one solution fits every needs'.
>
> codel is _not_ a replacement of pfifo_fast, its a replacement for pfifo.

Semantically here I'm trying to "replace the default qdisc" that
99.98% of people use, not "replace pfifo_fast" (that 99.99% of people
use) or rather, come up with a strategy for doing such, one day, in
some more easily deployable fashion.

> If you want to replace pfifo_fast, you want PRIO + 3 codel, because
> pfifo_fast is really PRIO + 3 pfifo.

This is where this dialog died last time. This time, however, I'm trying
to assemble consensus as to the steps required to build a viable
*default* qdisc that is better than pfifo_fast, for desktops, servers,
android boxes, routers, etc - which fq_codel seems to win at (nearly)
across the board. Certainly those users that override pfifo_fast
should be allowed to continue to do so.

I agree that a three tier system on top of fq_codel would be a pure
superset of pfifo_fast, and probably better in a few respects than
pure fq_codel, but I disagree strongly that aping the existing pfifo_fast
PRIO-like behavior is desirable in a replacement for the default
qdisc.

Given the actual frequency of prio 1 traffic in your data it seems
reasonable, but given the actual frequency of prio 3 (background) in
mine, completely starving the background queue in the presence of a
prio 1 or 2 flow is highly undesirable. I'm just repeating my position
from last time...

My problem with writing a prio_fq_codel qdisc that is a little more
like pfifo_fast is multifold.

1) there is a cpu hit in a level of drr/sfq + fq_codel that is kind of
unknown (I imagine sch_prio is trivial)

2) there is a memory hit in adding 3 fq_codel-like queues to every
qdisc.
In the case of mq on wifi, you end up with 12 queues per ssid
and that's a rather huge hit on memory...

I kind of hope that we are in agreement, at least, that it would be
nice if pfifo_fast went the way of the dodo?

So based on this dialog here, and over on lwn.net
(http://lwn.net/Articles/558603/) I think the beginnings of a way
forward would be for me to

A) change my existing patch converting all instances of pfifo_fast to
fq_codel to be a configurable define instead (CONFIG_QDISC_DEFAULT)
that can be set more easily in a distro than dealing with tc directly,
which makes it vastly easier to apply to mq devices by default, in
particular.

B) work on some sort of fq_codel derivative that has the desired 3
tier behavior. The simplest would be something that has a single queue
for prio 1 and a single queue for background and services the first in
the fast queue, and the background queue, well, darn it, I dunno.

C) come up with more ways of meeting the memory needs of both teeny
and ginormous devices as per above

>
>

-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
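
To illustrate point C and the default-limit proposal earlier in this message (1k below 256MB of RAM, 10k or more elsewhere), a minimal userspace sketch might look like the following. Everything here is hypothetical: the function and macro names do not exist in the kernel, and reading "1k" and "10k" as 1024 and 10240 packets is an assumption.

```c
#include <stdint.h>

/* Hypothetical thresholds taken from the proposal in this message.
 * The exact values (1024 and 10240 packets) are assumptions. */
#define SMALL_RAM_THRESHOLD_BYTES (256ULL * 1024 * 1024)
#define SMALL_RAM_PACKET_LIMIT    1024U
#define DEFAULT_PACKET_LIMIT      10240U

/* Pick a default qdisc packet limit from total system RAM:
 * the small limit below 256MB, the large one at or above it. */
unsigned int pick_default_limit(uint64_t total_ram_bytes)
{
    if (total_ram_bytes < SMALL_RAM_THRESHOLD_BYTES)
        return SMALL_RAM_PACKET_LIMIT;
    return DEFAULT_PACKET_LIMIT;
}
```

A 64MB router would get the 1024-packet limit under this sketch, while anything with 256MB or more keeps 10240. A real version would have to decide when RAM is sampled and whether an explicit limit given via tc overrides the computed default.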