Subject: Re: [Codel] fq_codel : interval servo
From: Dave Taht
To: Eric Dumazet
Cc: codel@lists.bufferbloat.net
Date: Sun, 2 Sep 2012 11:08:19 -0700

On Sat, Sep 1, 2012 at 5:53 AM, Eric Dumazet wrote:
> On Fri, 2012-08-31 at 09:59 -0700, Dave Taht wrote:
>
>> I realize that 10GigE and datacenter host-based work is sexy and fun,
>> but getting stuff that runs well in today's 1-20Mbit environments is
>> my own priority, going up to 100Mbit, with something that can be
>> embedded in a SoC. The latest generation of SoCs all do QoS in
>> hardware... badly.
>
> Maybe the word 'datacenter' was badly chosen, and you obviously jumped
> on it because it means different things to you.

I am hypersensitive about optimizing for sub-ms problems when there are
huge multi-second problems in cable, wifi, and cellular. Recent paper:

http://conferences.sigcomm.org/sigcomm/2012/paper/cellnet/p1.pdf

Sorry.

If the srtt idea can scale UP as well as down sanely, cool. I'm concerned
about how different TCPs might react to it, and I have a longer comment
about placing this at that layer at the bottom of this email.

> Point was that when your machine has flows with quite different RTTs,
> 1 ms on your local LAN and 100 ms to a different continent, the current
> control law might clamp the long-distance communications, or respond
> slowly to the LAN traffic.

With fq_codel that's far less likely, and if a long-distance and a local
stream do collide in a single queue, what happens there if you fiddle
with srtt?

> The shorter the path you have, the sooner you should drop packets,
> because losses have much less impact on latencies.

Sure.
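To make that concrete: the reason interval matters so much is baked into
the drop-scheduling arithmetic. A rough sketch (illustrative names only,
this is not the actual net/sched code):

#include <math.h>
#include <stdint.h>

typedef uint64_t ns_t;  /* times in nanoseconds */

/*
 * CoDel only starts dropping once the sojourn time has stayed above
 * target for a full interval; while in dropping state, the gap between
 * drops shrinks as interval / sqrt(count).
 */
static ns_t codel_next_drop(ns_t now, ns_t interval, uint32_t count)
{
    if (count == 0)
        count = 1;  /* count is >= 1 once we are in dropping state */
    return now + (ns_t)((double)interval / sqrt((double)count));
}

With the default interval of 100 ms, a 1 ms-RTT LAN flow sits behind a
standing queue for 100 ms before the first drop, and the next drops come
~70 ms and ~57 ms apart; with interval at 5 ms the same flow gets its
congestion signal within a handful of RTTs. That's the asymmetry Eric is
pointing at.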
> Yuchung's idea sounds very good, and my intuition is it will give
> tremendous results for standard linux qdisc setups (a single qdisc per
> device).

I tend to agree.

> To get similar effects, you could use two (or more) fq_codels per
> ethernet device.

Ugh.

> One fq_codel with interval = 1 or 5 ms for LAN communications
> One fq_codel with interval = 100 ms for other communications

and one mfq_codel with a calculated maxpacket, weird interval, etc. for
wifi.

> tc filters to select the right qdisc by destination addresses

Meh. A simple default might be "Am I going out the default route for
this?"

> Then we are a bit far from the codel spirit (no-knob qdisc).
>
> I am pretty sure you noticed that if your ethernet adapter is only used
> for LAN communications, you have to set the codel interval to a much
> smaller value than the 100 ms default to get a reasonably fast response
> to congestion.

At 100Mbit (as I've noted elsewhere), BQL chooses defaults about double
the optimum (6-7k), and GSO is currently left on. With those disabled, I
tend to run a pretty congested network and rarely notice. That does not
mean reaction time isn't an issue; it is merely masked so well that I
don't care.

> Just make this automatic, because people don't want to think about it.

Like you, I want one qdisc to rule them all, with sane defaults.

I do feel it is very necessary to add one pfifo_fast-like behavior to
fq_codel: deprioritizing background traffic into its own set of fq'd
flows. A simple way to do that is to have a bkweight of, say, 20, and
only check "q->slow_flows" on that interval of packet deliveries. This
is the only way I can think of to survive bittorrent-like flows, and to
capture the intent of traffic marked background.

However, I did want to speak to the using-codel-to-solve-everything
issue for fixing host bufferbloat...

Fixing host bufferbloat by adding local TCP awareness is a neat idea,
don't let me stop you! But...

Codel will push queueing latency down to, but not below, 5 ms (or
target). In fq_codel you will typically end up with 1 packet outstanding
in each active queue under heavy load. At 10Mbit it's pretty easy to
have it strain mightily and fail to get to 5 ms, particularly on
torrent-like workloads.

The "right" amount of host latency to aim for is... 0, or as close to it
as you can get. Fiddling with the codel target and interval on the host
to get less host latency is well and good, but you can't get to 0 that
way... The best queue on a host is no extra queue.

I spent some time evaluating Linux fq_codel vs the ns2 nfq_codel version
I just got working. With 150 bidirectional competing streams at 100Mbit,
it retained about 30% fewer packets in queue (110 vs 140). Next up on my
list is longer RTTs and wifi, but everything else was pretty equivalent.

The effect of fiddling with /proc/sys/net/ipv4/tcp_limit_output_bytes
was even more remarkable. At 6000, I would get down to a nice steady
71-81 packets in queue on that 150-stream workload.

So, I started thinking through and playing with how TSQ works:

At one hop, 100Mbit, with a BQL limit of 3000 and tcp_limit_output_bytes
of 6000, all offloads off, nfq_codel on both ends, I get single-stream
throughput of 92.85Mbit. Backlog in the qdisc is 0.

2 netperf streams, bidirectional: 91.47 each, darn close to theoretical,
less than one packet in the backlog.

4 streams: backlog a little over 3 (and they sum to 91.94 in each
direction).

8 streams: backlog of 8 (optimal throughput).

Repeating the 8-stream test with tcp_limit_output_bytes at 1500, I get
around 3 packets outstanding and optimal throughput. (1 stream: 42Mbit,
obviously starved; 150 streams: 82...)

8 streams with the limit set to 127k: 50 packets outstanding in the
queue, and the same throughput. (150 streams: ~100.)

So I might argue that a more "right" number for tcp_limit_output_bytes
is not 128k per TCP socket, but (BQL_limit * 2 / active_sockets), in
conjunction with fq_codel.
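Expressed as a back-of-envelope helper, just to pin the idea down
(purely illustrative: bql_limit_bytes, active_sockets, and the one-MTU
floor are my own names and guesses, not anything in the stack today):

#include <stdint.h>

/*
 * Illustrative sketch of a per-socket TSQ limit derived from the BQL
 * limit and the number of sockets currently transmitting, instead of a
 * fixed 128k.
 */
static uint32_t tsq_limit_bytes(uint32_t bql_limit_bytes,
                                uint32_t active_sockets,
                                uint32_t mtu)
{
    uint32_t limit;

    if (active_sockets == 0)
        active_sockets = 1;
    limit = (bql_limit_bytes * 2) / active_sockets;

    /* never allow less than one full-sized packet per socket */
    return limit < mtu ? mtu : limit;
}

With BQL at 3000 and 8 active sockets that comes out to 750, floored to
one MTU, i.e. roughly the 1500-byte setting above that gave ~3 packets
of backlog and full throughput.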
I realize that raises interesting questions about when to use TSO/GSO
and how to schedule TCP packet releases, and that it pushes the
window-reduction issue all the way up into the TCP stack rather than
responding to indications from the qdisc... but it does get you closer
to a 0 backlog in the qdisc. And *usually* the bottleneck link is not on
the host but on something in between, and that's where your signalling
comes from, anyway.

-- 
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out with fq_codel!"