From: Dave Taht
To: Eric Dumazet
Cc: codel@lists.bufferbloat.net
Date: Thu, 11 Jul 2013 11:06:58 -0700
Subject: Re: [Codel] hardware multiqueue in fq_codel?

On Thu, Jul 11, 2013 at 10:44 AM, Eric Dumazet wrote:
> On Thu, 2013-07-11 at 10:09 -0700, Dave Taht wrote:
>> In my default environments (wifi, mainly) the hardware queues have
>> very different properties.
>>
>> I'm under the impression that in at least a few Ethernet devices they
>> are essentially the same. That said, in the sch_mq case, an entirely
>> separate qdisc is created per hardware queue, and it has always
>> puzzled me how to use them within a single qdisc in the
>> pull-through manner.
>>
>> Logically, you should be able to take the fq_codel hash index, compute
>> idx % dev->num_tx_queues, and spread flows across the hardware queues
>> that way, but I have no idea where that info would go (the skb? the
>> flow?) or even whether it is possible, given the pull-through problem...
>>
>> (This does not mean that I necessarily think hardware multiqueues are
>> a good idea; certainly the results I get out of 802.11e are terrible.
>> But it would be nice to have a unified solution for hw multiqueue
>> devices.)
>>
>
> We do not have a fixed/unified queue selection.
>
> It can be tweaked by many different things, depending on exact needs.
>
> MQ is not really a qdisc; it is only a fake one, a demux if you like,
> that gives each tx queue its own qdisc lock.
>
> If you stick one fq_codel at the top of the hierarchy (instead of MQ),
> then you lose all the benefits of having multiple locks: sending packets
> from fq_codel to different queues on hardware makes no sense, since the
> single qdisc lock is the bottleneck.
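The queue-spreading idea in the quoted question amounts to a modulo over fq_codel's flow buckets. Below is a minimal bash sketch, assuming fq_codel's default of 1024 flow buckets and a hypothetical NIC with 16 tx queues; the function name txq_for_bucket is made up for illustration and is not a kernel interface. It only shows the arithmetic; as Eric points out, with one fq_codel at the root the single qdisc lock would still be the bottleneck.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of "idx % dev->num_tx_queues":
# map an fq_codel flow-bucket index onto a hardware tx queue.
# This is not kernel code; it only shows the arithmetic.

txq_for_bucket() {
    local idx=$1 num_tx_queues=$2
    echo $(( idx % num_tx_queues ))
}

# Assumed sizes: fq_codel's default 1024 buckets, a 16-queue NIC.
FLOWS=1024
NUM_TX_QUEUES=16

# Count how many buckets land on each tx queue.
declare -a per_queue
for (( idx = 0; idx < FLOWS; idx++ )); do
    q=$(txq_for_bucket "$idx" "$NUM_TX_QUEUES")
    per_queue[q]=$(( ${per_queue[q]:-0} + 1 ))
done

for (( q = 0; q < NUM_TX_QUEUES; q++ )); do
    echo "txq $q: ${per_queue[q]} buckets"
done
```

With these sizes every queue receives 1024 / 16 = 64 buckets.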
>
> So if you want fq_codel with MQ, to drive 40G links from many
> CPUs, just use:
>
> ETH=eth0
> NQUEUES=16 # or more, check how many tx queues your NIC supports
> tc qd del dev $ETH root 2>/dev/null
> tc qd add dev $ETH root handle 1: mq
> for i in $(seq 1 "$NQUEUES")
> do
>     # tc parses class minor numbers as hex, so queue 10 is class 1:a
>     tc qd add dev $ETH parent 1:$(printf '%x' "$i") fq_codel
> done
>
> That only replaces the default pfifo_fast on each slave qdisc with
> fq_codel.

Gotcha. What I actually did (Felix did it in OpenWrt, actually) was simply
make fq_codel the default qdisc, to avoid having to inspect each device to
set the number of queues in mq and mqprio. I see, for example, that mq is
the default for tg3...

http://snapon.lab.bufferbloat.net/~cero2/deb/patches/0003-Use-FQ_codel-by-default.patch

I also made it the default for htb and hfsc:

http://snapon.lab.bufferbloat.net/~cero2/deb/patches/0008-Make-fq_codel-the-default-qdisc-for-htb-and-hfsc.patch

There's a patch in OpenWrt to obsolete pfifo_fast entirely, which is a tad
premature.

The remaining concern is what this change affects:

A) People who expect "ifconfig X txqueuelen Y" to do anything will be
misled. Perhaps this could be fixed by making fq_codel's default limit
txqueuelen rather than its current (and overlarge) default of 10k
packets, but since so many people supply oddball txqueuelens, I tend to
think simply ignoring txqueuelen going forward is the right thing. Do you
actually get close to 10k packets outstanding on 10GigE under any sane
circumstances?

B) People who expect pfifo_fast semantics, for whom substituting fq_codel
behaves oddly in several ways:

1) If you are explicitly setting skb->priority to select one of
pfifo_fast's 3 default bands and expecting a result, nothing happens.
But in the general case, people setting skb->priority are trying to get
better latency in the first place, and I really don't think many people
will notice.

2) If you are using a filter on pfifo_fast that expects 3 bands and end
up with fq_codel by default anyway, you get DRR-like behavior over codel
rather than strict prioritization, and don't get fq_codel's full
benefits... which is still a win IMHO. I am not fond of strict
prioritization's ability to starve the other two bands anyway.

3) Trying to explicitly set pfifo_fast via tc doesn't work with this
patch.

4) fq_codel's ECN processing is enabled by default (though ECN is still
off by default in sysctl).

-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
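Eric's mq + fq_codel recipe hard-codes NQUEUES and asks you to check the NIC yourself. Below is a minimal bash sketch that instead counts the tx-* directories under /sys/class/net/<dev>/queues and prints (rather than runs) the tc commands, so they can be reviewed first. Two assumptions are worth flagging: that the sysfs tx-* count is the right number of mq classes to populate, and that class minors must be written in hex (tc parses them that way, so queue 10 is class 1:a). The function names count_tx_queues and mq_fq_codel_cmds are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: attach fq_codel to every tx queue under an mq root qdisc.
# Prints the tc commands instead of executing them (dry run).

# Count tx queues as exposed in sysfs (/sys/class/net/<dev>/queues/tx-0,
# tx-1, ...). The optional second argument points at a different sysfs
# root, which is handy for testing.
count_tx_queues() {
    local dev=$1 sysfs=${2:-/sys/class/net}
    local n=0 q
    for q in "$sysfs/$dev"/queues/tx-*; do
        # An unmatched glob stays literal, so check that the path exists.
        [ -e "$q" ] && n=$(( n + 1 ))
    done
    echo "$n"
}

# Emit tc commands for an mq root with one fq_codel child per queue.
# mq's classes are 1:1 .. 1:N, and tc reads the minor number as hex.
mq_fq_codel_cmds() {
    local dev=$1 nqueues=$2 i
    echo "tc qdisc del dev $dev root 2>/dev/null"
    echo "tc qdisc add dev $dev root handle 1: mq"
    for (( i = 1; i <= nqueues; i++ )); do
        printf 'tc qdisc add dev %s parent 1:%x fq_codel\n' "$dev" "$i"
    done
}

# Example usage: review the output, then pipe it to sh as root.
#   mq_fq_codel_cmds eth0 "$(count_tx_queues eth0)"
```

Printing instead of executing keeps the sketch safe to run unprivileged; for 16 queues the last child line reads "parent 1:10", which tc interprets as queue 16.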