From: Ben Greear <greearb@candelatech.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: nanditad@google.com, netdev@vger.kernel.org,
    mattmathis@google.com, codel@lists.bufferbloat.net,
    ncardwell@google.com, David Miller <davem@davemloft.net>
Subject: Re: [Codel] [RFC PATCH v2] tcp: TCP Small Queues
Date: Wed, 11 Jul 2012 08:16:58 -0700
Message-ID: <4FFD98EA.1040301@candelatech.com>
In-Reply-To: <1342019518.3265.8116.camel@edumazet-glaptop>

On 07/11/2012 08:11 AM, Eric Dumazet wrote:
> On Tue, 2012-07-10 at 17:13 +0200, Eric Dumazet wrote:
>> This introduce TSQ (TCP Small Queues)
>>
>> TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
>> device queues), to reduce RTT and cwnd bias, part of the bufferbloat
>> problem.
>>
>> sk->sk_wmem_alloc not allowed to grow above a given limit,
>> allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
>> given time.
>>
>> TSO packets are sized/capped to half the limit, so that we have two
>> TSO packets in flight, allowing better bandwidth use.
>>
>> As a side effect, setting the limit to 40000 automatically reduces the
>> standard gso max limit (65536) to 40000/2 : It can help to reduce
>> latencies of high prio packets, having smaller TSO packets.
>>
>> This means we divert sock_wfree() to a tcp_wfree() handler, to
>> queue/send following frames when skb_orphan() [2] is called for the
>> already queued skbs.
>>
>> Results on my dev machine (tg3 nic) are really impressive, using
>> standard pfifo_fast, and with or without TSO/GSO. Without reduction of
>> nominal bandwidth.
>>
>> I no longer have 3MBytes backlogged in qdisc by a single netperf
>> session, and both side socket autotuning no longer use 4 Mbytes.
>>
>> As skb destructor cannot restart xmit itself ( as qdisc lock might be
>> taken at this point ), we delegate the work to a tasklet. We use one
>> tasklet per cpu for performance reasons.
>>
>>
>>
>> [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
>> [2] skb_orphan() is usually called at TX completion time,
>> but some drivers call it in their start_xmit() handler.
>> These drivers should at least use BQL, or else a single TCP
>> session can still fill the whole NIC TX ring, since TSQ will
>> have no effect.
>
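
The "half the limit" sizing quoted above can be checked with a few lines of arithmetic. This is only a sketch of the rule as described in the patch text, not the kernel code; `tso_cap` is a hypothetical helper name, and 65536 is the standard gso max limit mentioned in the quote.

```python
GSO_MAX_SIZE = 65536  # standard gso max limit, as quoted in the patch description


def tso_cap(limit_output_bytes: int, gso_max: int = GSO_MAX_SIZE) -> int:
    """Hypothetical helper: largest TSO packet when capped to half the limit.

    Assumes the cap is simply min(gso_max, limit / 2), which is how the
    quoted description reads; the real patch may round differently.
    """
    return min(gso_max, limit_output_bytes // 2)


# 40000 is the example from the quote; 131072 is the ~128KB default.
for limit in (40000, 131072):
    print(limit, tso_cap(limit))
```

With a limit of 40000 the cap drops to 20000, matching the "40000/2" in the quote; with the ~128KB default the cap stays at the usual 65536.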
> I am going to send an official patch (I'll put a v3 tag in it)
>
> I believe I did a full implementation, including the xmit() done
> by the user at release_sock() time, if the tasklet found socket owned by
> the user.
>
> Some bench results about the choice of 128KB being the default value:
>
> 64KB seems the 'good' value on 10Gb links to reach max throughput on my
> lab machines (ixgbe adapters).
>
> Using 128KB is a very conservative value to allow link rate on 20Gbps.
>
> Still, it allows less than 1ms of buffering on a Gbit link, and less
> than 8ms on 100Mbit link (instead of 130ms without Small Queues)
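
For reference, the buffering figures quoted above follow from dividing the queued bytes by the link rate. The sketch below is plain arithmetic under simplifying assumptions: `drain_time_ms` is a hypothetical helper, the whole limit is treated as bytes on the wire, and framing and skb overhead are ignored.

```python
def drain_time_ms(queued_bytes: int, link_bits_per_sec: float) -> float:
    """Time in milliseconds for a link to drain queued_bytes at the given rate.

    Assumption: every queued byte is sent on the wire at the nominal link
    rate; protocol and skb overhead are not modelled.
    """
    return queued_bytes * 8 / link_bits_per_sec * 1000.0


LIMIT = 128 * 1024  # the ~128KB default discussed above

for name, rate in (("100 Mbit", 100e6), ("1 Gbit", 1e9), ("10 Gbit", 10e9)):
    print(f"{name}: {drain_time_ms(LIMIT, rate):.2f} ms")
```

Under these assumptions 128KB drains in about 1.05 ms at 1 Gbit and about 10.5 ms at 100 Mbit, the same order as the quoted figures; the exact numbers depend on how much of the limit is payload rather than overhead.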

I haven't read your patch in detail, but I was wondering if this feature
would cause trouble for applications that are servicing many sockets at once
and so might take several ms between handling each individual socket.

Or, applications that for other reasons cannot service sockets quite
as fast. Without this feature, they could poke more data into the
xmit queues to be handled by the kernel while the app goes about its
other user-space work?

Maybe this feature could be enabled/tuned on a per-socket basis?
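
To put a number on the concern, here is a rough sketch of how many bytes a link could send while an application is away from a socket. It is arithmetic only: `bytes_sendable_during_gap` is a hypothetical helper, the 5 ms gap is an assumed example value, and the sketch says nothing about where unsent data waits (for instance in the socket send buffer rather than the xmit queues), which decides whether the limit actually matters.

```python
def bytes_sendable_during_gap(gap_us: int, link_bits_per_sec: int) -> int:
    """Bytes a link can transmit during gap_us microseconds at the given rate.

    Integer arithmetic is used on purpose to avoid floating-point rounding.
    """
    return link_bits_per_sec * gap_us // 8_000_000


LIMIT = 128 * 1024          # the ~128KB default discussed in the thread
GAP_US = 5_000              # assumed example: 5 ms between visits to a socket
ONE_GBIT = 1_000_000_000

needed = bytes_sendable_during_gap(GAP_US, ONE_GBIT)
print(f"link can send {needed} bytes in {GAP_US} us; limit is {LIMIT} bytes")
```

For a single flow on a 1 Gbit link, a 5 ms gap corresponds to 625000 bytes, which is well above 128KB of queued data.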

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com