From: Dave Taht
To: Eric Dumazet
Cc: codel@lists.bufferbloat.net
Date: Fri, 31 Aug 2012 09:59:17 -0700
Subject: Re: [Codel] fq_codel : interval servo
In-Reply-To: <1346430207.7996.11.camel@edumazet-glaptop>
References: <1346396137.2586.301.camel@edumazet-glaptop> <5040DDE9.7030507@hp.com> <1346430207.7996.11.camel@edumazet-glaptop>

On Fri, Aug 31, 2012 at 9:23 AM, Eric Dumazet wrote:
> On Fri, 2012-08-31 at 08:53 -0700, Rick
Jones wrote:
>> On 08/30/2012 11:55 PM, Eric Dumazet wrote:
>> > On locally generated TCP traffic (host), we can override the 100 ms
>> > interval value using the more accurate RTT estimation maintained by TCP
>> > stack (tp->srtt)
>> >
>> > Datacenter workload benefits using shorter feedback (say if RTT is below
>> > 1 ms, we can react 100 times faster to a congestion)
>> >
>> > Idea from Yuchung Cheng.
>>
>> Mileage varies of course, but what are the odds of a datacenter's
>> end-system's NIC(s) being the bottleneck point? Is it worth pinging a
>> couple additional cache lines around (looking at the v2 email, I'm
>> ass-u-me-ing that sk_protocol and tp->srtt are on different cache lines)?
>>
>
> It's certainly worth pinging additional cache lines.
>
> A host consumes almost no cpu in qdisc layer (at least not in fq_codel)
>
> A router won't use this stuff (as skb->sk will be NULL)
>
>> If fq_codel is going to be a little bit pregnant and "layer violate" :)
>> why stop at TCP?
>
> Who said I would stop at TCP ? ;)

Heh. "Vith Codel I vill rule ze vorld!"

I have not, of late, been focused on TCP (torn up about uTP, actually),
and have been looking more at the kinds of problems that occur in the
home gateway.

I realize that 10GigE and datacenter host based work is sexy and fun,
but getting stuff that runs well in today's 1-20Mbit environments is
my own priority, going up to 100Mbit, with something that can be
embedded in a SoC. The latest generation of SoCs all do QoS in
hardware... badly.

Secondly, finding things that would work on the head ends (CMTSes and
DSLAMs) also ranks way up there. Fixing wifi follows that in
priority...

While many of the things under tweak have commonality, I'm thinking
more and more that it would be saner for me to fork off for a while
and do a "mfq_codel", which would let me A) test codel, fq_codel, and
mfq_codel enhancements on the same build and testbed, B) play with
stuff that works better at lower bandwidths, and C) with wifi.

Two notes from my slow and often buggy work that I haven't mentioned
on this list yet:

1) The current fq_codel implementation wipes out the codel state
every time a queue is emptied. This happens rarely at 10GigE speeds,
but quite often below 100Mbit. Keeping the state around helps a LOT on
queue depth - for example (with the rest of my buggy patchset) I end
up with a very nice avg 50 packet backlog at 100Mbit, low median,
stddev, etc, with 8 streams, and with 150 streams, 70 or so...

"Fixing" that just involved removing codel_init_vars from the "is this
a new queue" routine.

Some of the other patches I've thrown around are about seeing odd
behaviors on temporarily empty queues, like:

2) maxpacket can get set to the largest packet ever seen by a queue
(like a TSO packet)

....
	} else {
+		stats->maxpacket = qdisc_pkt_len(skb);
		vars->count = 1;
		vars->rec_inv_sqrt = ~0U >> REC_INV_SQRT_SHIFT;
	}

So resetting it will A) set a fq_codel queue's maxpacket to the actual
packet size (acks, for example), which might make codel more accurate
on those sorts of streams... and B) let changes made with ethtool
(turning off gso/tso/etc) show up in codel's estimates, which they
previously did not. B) has occasionally caused a great deal of
headscratching.

>
>>
>> Is this change rectifying an "unfairness" with the existing fq_codel and
>> the 100ms for all when two TCP flows have very different srtts?
>>
>
> codel has to use a single interval value, and we use an average value.
> It seems to work quite well.
>
> fq_codel has the opportunity to get a per tcp flow interval value.
> And this should give better behavior.
>
>> Some perhaps overly paranoid questions:
>>
>> Does it matter that the value of tp->srtt at the time fq_codel dequeues
>> will not necessarily be the same as when that segment was queued?
>>
>
> It matters we use the last srtt value/estimation, which is done in this
> patch.
>
>> Is there any chance of the socket going away between the time the packet
>> was queued and the time it was dequeued?
>> (Or tp->srtt becoming "undefined?")
>
> When skb->sk is non NULL, we hold a reference to the socket, it cannot
> disappear under us.
>

--
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out with fq_codel!"
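
P.S. The per-flow interval choice discussed in the quoted exchange can be sketched roughly as below. This is a user-space illustration only, not the patch under discussion: struct sketch_sock, struct sketch_pkt, and sketch_interval_us() are hypothetical stand-ins for the kernel's socket, skb, and helper code, srtt is assumed to be kept in microseconds, and clamping to the 100 ms default is an assumption as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_INTERVAL_US 100000u	/* the usual 100 ms CoDel interval */

/* Hypothetical stand-in for the socket state the qdisc would consult. */
struct sketch_sock {
	int is_tcp;		/* nonzero for a TCP socket */
	uint32_t srtt_us;	/* smoothed RTT in microseconds, 0 if no estimate yet */
};

/* Hypothetical stand-in for a queued packet. */
struct sketch_pkt {
	const struct sketch_sock *sk;	/* NULL for forwarded (routed) packets */
};

/*
 * Choose the interval when the packet is dequeued, so the most recent
 * srtt estimate is used rather than the one current at enqueue time.
 */
uint32_t sketch_interval_us(const struct sketch_pkt *pkt)
{
	const struct sketch_sock *sk;

	assert(pkt != NULL);
	sk = pkt->sk;

	/* Routed traffic, non-TCP sockets, or no RTT sample: keep the default. */
	if (sk == NULL || !sk->is_tcp || sk->srtt_us == 0)
		return DEFAULT_INTERVAL_US;

	/* Assumption: a flow's interval never exceeds the default. */
	return sk->srtt_us < DEFAULT_INTERVAL_US ? sk->srtt_us : DEFAULT_INTERVAL_US;
}
```

With these assumptions, a local TCP flow with a 500 us srtt gets a 500 us interval, while a forwarded packet (sk == NULL) falls back to 100 ms, which matches the "a router won't use this stuff" remark above.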