Subject: Re: [Codel] fq_codel : interval servo
From: Dave Taht
To: Eric Dumazet
Cc: codel@lists.bufferbloat.net
Date: Sun, 2 Sep 2012 11:08:19 -0700

On Sat, Sep 1, 2012 at 5:53 AM, Eric Dumazet wrote:
> On Fri, 2012-08-31 at 09:59 -0700, Dave Taht wrote:
>
>> I realize that 10GigE and datacenter host-based work is sexy and fun,
>> but getting stuff that runs well in today's 1-20Mbit environments is
>> my own priority, going up to 100Mbit, with something that can be
>> embedded in a SoC. The latest generation of SoCs all do QoS in
>> hardware... badly.
>
> Maybe the word 'datacenter' was badly chosen, and you obviously jumped
> on it because it means different things to you.

I am hypersensitive about optimizing for sub-ms problems when there are
huge multi-second problems in cable, wifi, and cellular. Recent paper:

http://conferences.sigcomm.org/sigcomm/2012/paper/cellnet/p1.pdf

Sorry.

If the srtt idea can scale UP as well as down sanely, cool. I'm concerned
about how different TCPs might react to it, and I have a longer comment
about placing this at that layer at the bottom of this email.

> Point was that when your machine has flows with quite different RTTs,
> 1 ms on your local LAN and 100 ms to a different continent, the current
> control law might clamp the long-distance communications, or respond
> slowly to the LAN traffic.

With fq_codel that's far less likely, and if a long-distance and a local
stream do collide in a single queue, what happens there if you fiddle
with srtt?

> The shorter the path you have, the sooner you should drop packets,
> because losses have much less impact on latencies.

Sure.
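To make that concrete: the reason interval matters so much is baked into
the drop-scheduling arithmetic. A rough sketch (illustrative names only,
this is not the actual net/sched code):

#include <math.h>
#include <stdint.h>

typedef uint64_t ns_t;  /* times in nanoseconds */

/*
 * CoDel only starts dropping once the sojourn time has stayed above
 * target for a full interval; while in dropping state, the gap between
 * drops shrinks as interval / sqrt(count).
 */
static ns_t codel_next_drop(ns_t now, ns_t interval, uint32_t count)
{
    if (count == 0)
        count = 1;  /* count is >= 1 once we are in dropping state */
    return now + (ns_t)((double)interval / sqrt((double)count));
}

With the default interval of 100 ms, a 1 ms-RTT LAN flow sits behind a
standing queue for 100 ms before the first drop, and the next drops come
~70 ms and ~57 ms apart; with interval at 5 ms the same flow gets its
congestion signal within a handful of RTTs. That's the asymmetry Eric is
pointing at.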
> Yuchung's idea sounds very good, and my intuition is it will give
> tremendous results for standard linux qdisc setups (a single qdisc per
> device).

I tend to agree.

> To get similar effects, you could use two (or more) fq_codels per
> ethernet device.

Ugh.

> One fq_codel with interval = 1 or 5 ms for LAN communications
> One fq_codel with interval = 100 ms for other communications

and one mfq_codel with a calculated maxpacket, weird interval, etc. for
wifi.

> tc filters to select the right qdisc by destination addresses

Meh. A simple default might be "Am I going out the default route for
this?"

> Then we are a bit far from the codel spirit (no-knob qdisc).
>
> I am pretty sure you noticed that if your ethernet adapter is only used
> for LAN communications, you have to set the codel interval to a much
> smaller value than the 100 ms default to get a reasonably fast response
> to congestion.

At 100Mbit (as I've noted elsewhere), BQL chooses defaults about double
the optimum (6-7k), and GSO is currently left on. With those disabled, I
tend to run a pretty congested network and rarely notice. That does not
mean reaction time isn't an issue; it is merely masked so well that I
don't care.

> Just make this automatic, because people don't want to think about it.

Like you, I want one qdisc to rule them all, with sane defaults.

I do feel it is very necessary to add one pfifo_fast-like behavior to
fq_codel: deprioritizing background traffic into its own set of fq'd
flows. A simple way to do that is to have a bkweight of, say, 20, and
only check "q->slow_flows" on that interval of packet deliveries. This
is the only way I can think of to survive bittorrent-like flows, and to
capture the intent of traffic marked background.

However, I did want to speak to the using-codel-to-solve-everything
issue for fixing host bufferbloat...

Fixing host bufferbloat by adding local TCP awareness is a neat idea,
don't let me stop you! But...

Codel will push queueing latency down to, but not below, 5 ms (or
target). In fq_codel you will typically end up with 1 packet outstanding
in each active queue under heavy load. At 10Mbit it's pretty easy to
have it strain mightily and fail to get to 5 ms, particularly on
torrent-like workloads.

The "right" amount of host latency to aim for is... 0, or as close to it
as you can get. Fiddling with the codel target and interval on the host
to get less host latency is well and good, but you can't get to 0 that
way... The best queue on a host is no extra queue.

I spent some time evaluating Linux fq_codel vs the ns2 nfq_codel version
I just got working. With 150 bidirectional competing streams at 100Mbit,
it retained about 30% fewer packets in queue (110 vs 140). Next up on my
list is longer RTTs and wifi, but everything else was pretty equivalent.

The effect of fiddling with /proc/sys/net/ipv4/tcp_limit_output_bytes
was even more remarkable. At 6000, I would get down to a nice steady
71-81 packets in queue on that 150-stream workload.

So, I started thinking through and playing with how TSQ works:

At one hop, 100Mbit, with a BQL limit of 3000 and tcp_limit_output_bytes
of 6000, all offloads off, nfq_codel on both ends, I get single-stream
throughput of 92.85Mbit. Backlog in the qdisc is 0.

2 netperf streams, bidirectional: 91.47 each, darn close to theoretical,
less than one packet in the backlog.

4 streams: backlog a little over 3 (and they sum to 91.94 in each
direction).

8 streams: backlog of 8 (optimal throughput).

Repeating the 8-stream test with tcp_limit_output_bytes at 1500, I get
around 3 packets outstanding and optimal throughput. (1 stream: 42Mbit,
obviously starved; 150 streams: 82...)

8 streams with the limit set to 127k: 50 packets outstanding in the
queue, and the same throughput. (150 streams: ~100.)

So I might argue that a more "right" number for tcp_limit_output_bytes
is not 128k per TCP socket, but (BQL_limit * 2 / active_sockets), in
conjunction with fq_codel.
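Expressed as a back-of-envelope helper, just to pin the idea down
(purely illustrative: bql_limit_bytes, active_sockets, and the one-MTU
floor are my own names and guesses, not anything in the stack today):

#include <stdint.h>

/*
 * Illustrative sketch of a per-socket TSQ limit derived from the BQL
 * limit and the number of sockets currently transmitting, instead of a
 * fixed 128k.
 */
static uint32_t tsq_limit_bytes(uint32_t bql_limit_bytes,
                                uint32_t active_sockets,
                                uint32_t mtu)
{
    uint32_t limit;

    if (active_sockets == 0)
        active_sockets = 1;
    limit = (bql_limit_bytes * 2) / active_sockets;

    /* never allow less than one full-sized packet per socket */
    return limit < mtu ? mtu : limit;
}

With BQL at 3000 and 8 active sockets that comes out to 750, floored to
one MTU, i.e. roughly the 1500-byte setting above that gave ~3 packets
of backlog and full throughput.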
I realize that raises interesting questions about when to use TSO/GSO
and how to schedule TCP packet releases, and that it pushes the
window-reduction issue all the way up into the TCP stack rather than
responding to indications from the qdisc... but it does get you closer
to a 0 backlog in the qdisc. And *usually* the bottleneck link is not on
the host but on something in between, and that's where your signalling
comes from, anyway.

-- 
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out with fq_codel!"