From: Dave Taht <dave.taht@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: codel@lists.bufferbloat.net
Subject: Re: [Codel] fq_codel : interval servo
Date: Sun, 2 Sep 2012 11:08:19 -0700 [thread overview]
Message-ID: <CAA93jw4gAUXYHOSHC6A15nEgdm9Fh2tze5PehK1M-qyksYAqGQ@mail.gmail.com> (raw)
In-Reply-To: <1346504012.7996.68.camel@edumazet-glaptop>
On Sat, Sep 1, 2012 at 5:53 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2012-08-31 at 09:59 -0700, Dave Taht wrote:
>
>> I realize that 10GigE and datacenter host based work is sexy and fun,
>> but getting stuff that runs well in today's 1-20Mbit environments is
>> my own priority, going up to 100Mbit, with something that can be
>> embedded in a SoC. The latest generation of SoCs all do QoS in
>> hardware... badly.
>
> Maybe the word 'datacenter' was badly chosen, and you obviously jumped
> on it, because it means something different to you.
I am hypersensitive about optimizing for sub-ms problems when there are
huge multi-second problems in cable, wifi, and cellular. Recent paper:
http://conferences.sigcomm.org/sigcomm/2012/paper/cellnet/p1.pdf
Sorry.
If the srtt idea can scale UP as well as down sanely, cool. I'm
concerned about how different TCPs might react to it, and I have a
longer comment at the bottom of this email about placing this logic
at this layer.
> Point was that when your machine has flows with quite different RTT, 1
> ms on your local LAN, and 100 ms on different continent, current control
> law might clamp long distance communications, or have slow response time
> for the LAN traffic.
With fq_codel that is far less likely, and if a long-distance and a local
stream do collide in a single queue there, what will happen when you
fiddle with srtt?
> The shorter the path, the sooner you should drop packets, because
> losses have much less impact on latency.
Sure.
> Yuchung's idea sounds very good, and my intuition is it will give
> tremendous results for standard linux qdisc setups (a single qdisc per
> device).
I tend to agree.
> To get similar effects, you could use two (or more) fq codels per
> ethernet device.
Ugh.
> One fq_codel with interval = 1 or 5 ms for LAN communications
> One fq_codel with interval = 100 ms for other communications
and one mfq_codel with a calculated maxpacket, weird interval, etc
for wifi.
> tc filters to select the right qdisc by destination addresses
Meh. A simple default might be "Am I going out the default route for this?"
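For concreteness, Eric's split would look something like this in tc. This
is an untested sketch; eth0, the handles, and the 192.168.0.0/16 "LAN"
match are placeholders, not anything from the thread:

```shell
# Two fq_codel instances with different intervals under a 2-band prio,
# selected by destination address (all placeholder values).
tc qdisc add dev eth0 root handle 1: prio bands 2 \
    priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
tc qdisc add dev eth0 parent 1:1 handle 10: fq_codel interval 5ms
tc qdisc add dev eth0 parent 1:2 handle 20: fq_codel interval 100ms
# Steer LAN-destined traffic to the short-interval instance; everything
# else falls through to band 2 via the priomap above.
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dst 192.168.0.0/16 flowid 1:1
```

Needs root and iproute2 with fq_codel support, and it is exactly the kind
of knob-laden setup the next quoted line complains about.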
> Then we are a bit far from codel spirit (no knob qdisc)
>
> I am pretty sure you noticed that if your ethernet adapter is only used
> for LAN communications, you have to set up the codel interval to a much
> smaller value than the 100 ms default to get a reasonably fast answer
> to congestion.
At 100Mbit (as I've noted elsewhere), BQL chooses defaults about double
the optimum (6-7k), and GSO is currently left on. With those fixed (BQL
lowered, GSO off), I tend to run a pretty congested network and rarely
notice. That does not mean reaction time isn't an issue; it is merely
masked so well that I don't care.
> Just make this automatic, because people don't want to think about it.
Like you, I want one qdisc to rule them all, with sane defaults.
I do feel it is very necessary to add one pfifo_fast-like behavior to
fq_codel: deprioritizing background traffic into its own set of fq'd
flows. A simple way to do that is to have a bkweight of, say, 20, and
to check "q->slow_flows" only once per that many packet deliveries.
This is the only way I can think of to survive bittorrent-like flows, and to
capture the intent of traffic marked background.
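To make the bkweight idea concrete, here is a toy dequeue loop (pure
arithmetic, not kernel code; bkweight and slow_flows are just the names
proposed above):

```shell
# Toy model: of every bkweight dequeue opportunities, the background
# set (q->slow_flows) gets exactly one; the rest go to the normal
# fq'd flows, so background traffic gets ~1/20 of the service slots.
bkweight=20
served_fg=0
served_bg=0
for i in $(seq 1 200); do
  if [ $((i % bkweight)) -eq 0 ]; then
    served_bg=$((served_bg + 1))   # poll q->slow_flows this round
  else
    served_fg=$((served_fg + 1))   # normal flow rotation
  fi
done
echo "normal=$served_fg background=$served_bg"   # normal=190 background=10
```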
However, I did want to address the use-codel-to-solve-everything
approach to fixing host bufferbloat...
Fixing host bufferbloat by adding local tcp awareness is a neat idea,
don't let me stop you! But...
Codel will push queue delay down to, but not below, 5ms (or whatever
target is set to). In fq_codel you will typically end up with 1 packet
outstanding in each active queue under heavy load. At 10Mbit it's pretty
easy to have it strain mightily and still fail to reach 5ms, particularly
on torrent-like workloads.
The "right" amount of host latency to aim for is ... 0, or as close to it as
you can get. Fiddling with codel target and interval on the host to
get less host latency is well and good, but you can't get to 0 that way...
The best queue on a host is no extra queue.
I spent some time evaluating linux fq_codel vs the ns2 nfq_codel version
I just got working. With 150 competing bidirectional streams at 100Mbit,
it retained fewer packets in queue (110 vs 140, about 20% fewer). Next up
on my list is longer RTTs and wifi, but all else was pretty equivalent.
The effect of fiddling with /proc/sys/net/ipv4/tcp_limit_output_bytes
was even more remarkable. At 6000, I would get down to
a nice steady 71-81 packets in queue on that 150-stream workload.
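For anyone who wants to reproduce the fiddling, the knobs live roughly
here (needs root; eth0 and tx-0 are placeholders for your device and
queue, and the values are the ones from the tests below):

```shell
# Per-socket TSQ limit, in bytes:
echo 6000 > /proc/sys/net/ipv4/tcp_limit_output_bytes
# Cap BQL's tx-ring byte limit:
echo 3000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max
# Segmentation offloads off for the low-bandwidth tests:
ethtool -K eth0 gso off tso off
```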
So, I started thinking through and playing with how TSQ works:
At one hop, 100Mbit, with a BQL limit of 3000 and tcp_limit_output_bytes
of 6000, all offloads off, and nfq_codel on both ends, I get
single-stream throughput of 92.85Mbit. Backlog in the qdisc: 0.
2 netperf streams, bidirectional: 91.47 each, darn close to theoretical,
with less than one packet in the backlog.
4 streams: backlog a little over 3 (summing to 91.94 in each direction).
8 streams: backlog of 8 (optimal throughput).
Repeating the 8-stream test with tcp_limit_output_bytes of 1500, I get
around 3 packets outstanding, and optimal throughput. (1-stream test:
42Mbit throughput, obviously starved; 150 streams: 82...)
With 8 streams and the limit set to 127k, I get 50 packets outstanding
in the queue, and the same throughput. (150 streams: ~100.)
So I might argue that a more "right" number for tcp_limit_output_bytes
is not 128k per TCP socket, but (BQL_limit*2/active_sockets), in
conjunction with fq_codel. I realize that raises interesting questions
as to when to use TSO/GSO, and how to schedule tcp packet releases, and
that it pushes the window-reduction issue all the way up into the tcp
stack rather than responding to indications from the qdisc... but it
does get you closer to a 0 backlog in the qdisc.
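Back of the envelope, with the BQL limit of 3000 used above (the
floor-at-one-MTU caveat in the comment is my own, not something
measured here):

```shell
# Proposed per-socket limit: BQL_limit * 2 / active_sockets.
# Note it lands on 6000 for a single socket, matching the sweet spot
# found above; a real implementation would presumably also floor the
# result at one MTU, since 8 sockets already drops below 1500.
tsq_limit() { echo $(( $1 * 2 / $2 )); }
bql=3000
for n in 1 2 4 8; do
  echo "$n active sockets -> $(tsq_limit $bql $n) bytes per socket"
done
```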
And *usually* the bottleneck link is not on the host but on something
in between, and that's where your signalling comes from anyway.
--
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
with fq_codel!"