CoDel AQM discussions
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: nanditad@google.com, netdev@vger.kernel.org, ycheng@google.com,
	codel@lists.bufferbloat.net, mattmathis@google.com,
	ncardwell@google.com
Subject: Re: [Codel] [RFC PATCH v2] tcp: TCP Small Queues
Date: Tue, 10 Jul 2012 19:06:27 +0200	[thread overview]
Message-ID: <1341939987.3265.5741.camel@edumazet-glaptop> (raw)
In-Reply-To: <1341933215.3265.5476.camel@edumazet-glaptop>

On Tue, 2012-07-10 at 17:13 +0200, Eric Dumazet wrote:
> This introduce TSQ (TCP Small Queues)
> 
> TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
> device queues), to reduce RTT and cwnd bias, part of the bufferbloat
> problem.
> 
> sk->sk_wmem_alloc not allowed to grow above a given limit,
> allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
> given time.
> 
> TSO packets are sized/capped to half the limit, so that we have two
> TSO packets in flight, allowing better bandwidth use.
> 
> As a side effect, setting the limit to 40000 automatically reduces the
> standard gso max limit (65536) to 40000/2 : It can help to reduce
> latencies of high prio packets, having smaller TSO packets.
> 
> This means we divert sock_wfree() to a tcp_wfree() handler, to
> queue/send following frames when skb_orphan() [2] is called for the
> already queued skbs.
> 
> Results on my dev machine (tg3 nic) are really impressive, using
> standard pfifo_fast, and with or without TSO/GSO. Without reduction of
> nominal bandwidth.
> 
> I no longer have 3MBytes backlogged in qdisc by a single netperf
> session, and both side socket autotuning no longer use 4 Mbytes.
> 
> As skb destructor cannot restart xmit itself ( as qdisc lock might be
> taken at this point ), we delegate the work to a tasklet. We use one
> tasklest per cpu for performance reasons.
> 
> 
> 
> [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
> [2] skb_orphan() is usually called at TX completion time,
>   but some drivers call it in their start_xmit() handler.
>   These drivers should at least use BQL, or else a single TCP
>   session can still fill the whole NIC TX ring, since TSQ will
>   have no effect.
> 
> Not-Yet-Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

By the way, Rick Jones asked me :

"Is there also any chance in service demand?"

I copy here my answer since its a very good point:

I worked on the idea of a CoDel like feedback, to have a timed limit
instead of byte limit ("allow up to 1ms" delay in qdisc/dev queue.)

But it seemed a bit complex : I would need to add skb fields to properly
track the residence time (sojourn time) of queued packets.

Alternative would be to have a per tcp socket tracking array,
but it might be expensive to search a packet in it...

With multi queue devices or bad qdiscs, we can have reordering in skb
orphanings. So the lookup can be relatively expensive.




  reply	other threads:[~2012-07-10 17:06 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-28 17:07 [Codel] [PATCH net-next] fq_codel: report congestion notification at enqueue time Eric Dumazet
2012-06-28 17:51 ` Dave Taht
2012-06-28 18:12   ` Eric Dumazet
2012-06-28 22:56     ` Yuchung Cheng
2012-06-28 23:47       ` Dave Taht
2012-06-29  4:50         ` Eric Dumazet
2012-06-29  5:24           ` Dave Taht
2012-07-04 10:11           ` [Codel] [RFC PATCH] tcp: limit data skbs in qdisc layer Eric Dumazet
2012-07-09  7:08             ` David Miller
2012-07-09  8:03               ` Eric Dumazet
2012-07-09  8:48                 ` Eric Dumazet
2012-07-09 14:55               ` Eric Dumazet
2012-07-10 13:28                 ` Lin Ming
2012-07-10 15:13                 ` [Codel] [RFC PATCH v2] tcp: TCP Small Queues Eric Dumazet
2012-07-10 17:06                   ` Eric Dumazet [this message]
2012-07-10 17:37                   ` Yuchung Cheng
2012-07-10 18:32                     ` Eric Dumazet
2012-07-11 15:11                   ` Eric Dumazet
2012-07-11 15:16                     ` Ben Greear
2012-07-11 15:25                       ` Eric Dumazet
2012-07-11 15:43                         ` Ben Greear
2012-07-11 15:54                           ` Eric Dumazet
2012-07-11 16:03                             ` Ben Greear
2012-07-11 18:23                     ` Rick Jones
2012-07-11 23:38                       ` Eric Dumazet
2012-07-11 18:44                     ` Rick Jones
2012-07-11 23:49                       ` Eric Dumazet
2012-07-12  7:34                         ` Eric Dumazet
2012-07-12  7:37                           ` David Miller
2012-07-12  7:51                             ` Eric Dumazet
2012-07-12 14:55                               ` Tom Herbert
2012-07-12 13:33                   ` John Heffner
2012-07-12 13:46                     ` Eric Dumazet
2012-07-12 16:44                       ` John Heffner
2012-07-12 16:54                         ` Jim Gettys
2012-06-28 23:52 ` [Codel] [PATCH net-next] fq_codel: report congestion notification at enqueue time Nandita Dukkipati
2012-06-29  4:18   ` Eric Dumazet
2012-06-29  4:53 ` Eric Dumazet
2012-06-29  5:12   ` David Miller
2012-06-29  5:24     ` Eric Dumazet
2012-06-29  5:29       ` David Miller
2012-06-29  5:50         ` Eric Dumazet
2012-06-29  7:53           ` David Miller
2012-06-29  8:04           ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.bufferbloat.net/postorius/lists/codel.lists.bufferbloat.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1341939987.3265.5741.camel@edumazet-glaptop \
    --to=eric.dumazet@gmail.com \
    --cc=codel@lists.bufferbloat.net \
    --cc=davem@davemloft.net \
    --cc=mattmathis@google.com \
    --cc=nanditad@google.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox