Cake - FQ_codel the next generation
From: Dave Taht <dave.taht@gmail.com>
To: Cake List <cake@lists.bufferbloat.net>
Subject: Re: [Cake] [net-next PATCH 00/14] lockless qdisc series
Date: Thu, 7 Dec 2017 11:07:45 -0800
Message-ID: <CAA93jw74HcPi3qZbc=rCbN5m2opNshhHo_i5_C8Y99Sk0P3MVQ@mail.gmail.com>
In-Reply-To: <CAA93jw4SAszy0MgC7hGFAY27_uVGkvD0FcZTm=_y69=-h_datw@mail.gmail.com>

I'm forwarding this sort of stuff because I keep hoping to find more
optimizations for cake, and cheap multicore hardware really does seem
to have become very common. The unquoted sketches interleaved below
are my own illustrations of the ideas, not John's actual code.

On Thu, Dec 7, 2017 at 10:53 AM, Dave Taht <dave.taht@gmail.com> wrote:
> ---------- Forwarded message ----------
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Thu, Dec 7, 2017 at 9:53 AM
> Subject: [net-next PATCH 00/14] lockless qdisc series
> To: willemdebruijn.kernel@gmail.com, daniel@iogearbox.net,
> eric.dumazet@gmail.com, davem@davemloft.net
> Cc: netdev@vger.kernel.org, jiri@resnulli.us, xiyou.wangcong@gmail.com
>
>
> This series adds support for building lockless qdiscs. This is
> the result of noticing the qdisc lock is a common hot-spot in
> perf analysis of the Linux network stack, especially when testing
> with high packet per second rates. However, nothing is free and
> most qdiscs rely on the qdisc lock for their data structures, so
> each qdisc must be converted on a case-by-case basis. In this
> series, to kick things off, we make pfifo_fast, mq, and mqprio
> lockless. Follow-up series can address additional qdiscs as needed;
> for example, sch_tbf might be a useful next target. To allow this,
> the lockless design is gated behind an opt-in flag. In some future
> utopia we convert all qdiscs and get to drop this case analysis,
> but in order to make progress we live in the real world.
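To make the opt-in concrete, here is a minimal sketch of how a flag
like TCQ_F_NOLOCK can gate the per-qdisc spinlock on the transmit
path. This is my illustration in the spirit of __dev_xmit_skb(), not
a verbatim excerpt from the series, and the function name is made up.

    /* Sketch: an opt-in TCQ_F_NOLOCK flag lets a qdisc that
     * serializes internally (e.g. via skb_array) skip the big
     * qdisc lock that every other enqueue/dequeue contends on. */
    static int xmit_skb_sketch(struct sk_buff *skb, struct Qdisc *q)
    {
            struct sk_buff *to_free = NULL;
            int rc;

            if (q->flags & TCQ_F_NOLOCK) {
                    /* Lockless path: no qdisc_lock(q) taken. */
                    rc = q->enqueue(skb, q, &to_free);
                    qdisc_run(q);
            } else {
                    /* Legacy path: all CPUs contend on one lock. */
                    spin_lock(qdisc_lock(q));
                    rc = q->enqueue(skb, q, &to_free);
                    qdisc_run(q);
                    spin_unlock(qdisc_lock(q));
            }
            if (unlikely(to_free))
                    kfree_skb(to_free);
            return rc;
    }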
>
> There are also a handful of optimizations I have behind this
> series, and a few code cleanups that I couldn't figure out how
> to fit neatly into this series without increasing the patch
> count. Once this is in, additional patches can address them. The
> most notable is that in skb_dequeue we can push the consumer lock
> out a bit and consume multiple skbs off the skb_array in pfifo_fast
> per iteration. Ideally we could push arrays of packets at
> drivers as well, but we would need the infrastructure for this.
> The other notable improvement is to do less locking in the
> overrun cases where the bad tx queue list and gso_skb are being
> hit. Although nice in theory, in practice this is the error
> case, and I haven't found a benchmark where it matters yet.
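The batching idea reads roughly like this. Again a sketch only, with
an invented name and batch size; skb_array wraps a ptr_ring, and
__ptr_ring_consume() expects the caller to hold the consumer lock.

    /* Sketch: take the consumer lock once and pull a small batch
     * of skbs, instead of locking and unlocking per packet. */
    #define DEQUEUE_BATCH 8

    static int dequeue_batch_sketch(struct skb_array *a,
                                    struct sk_buff **batch)
    {
            int n = 0;

            spin_lock(&a->ring.consumer_lock);
            while (n < DEQUEUE_BATCH) {
                    struct sk_buff *skb = __ptr_ring_consume(&a->ring);

                    if (!skb)
                            break;
                    batch[n++] = skb;
            }
            spin_unlock(&a->ring.consumer_lock);
            return n;
    }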
>
> For testing...
>
> My first test case uses multiple containers (via cilium) where
> multiple client containers use 'wrk' to benchmark connections to
> a server container running lighttpd, with lighttpd configured
> to use multiple threads, one per core. Additionally this test has
> a proxy agent running, so all traffic takes an extra hop through a
> proxy container. In these cases each TCP packet traverses the egress
> qdisc layer at least four times and the ingress qdisc layer an
> additional four times. This makes for a good stress test IMO; perf
> details below.
>
> The other micro-benchmark I run is injecting packets directly into
> qdisc layer using pktgen. This uses the benchmark script,
>
>  ./pktgen_bench_xmit_mode_queue_xmit.sh
>
> Benchmarks were taken in two cases: "base", running the latest
> net-next with no changes to the qdisc layer, and "qdisc", run with
> the lockless qdisc updates. Numbers are reported in req/sec. All
> virtual 'veth' devices run with pfifo_fast in the qdisc test case.
>
> `wrk -t16 -c $conns -d30 "http://[$SERVER_IP4]:80"`
>
> conns    16      32     64   1024
> -----------------------------------------------
> base:   18831  20201  21393  29151
> qdisc:  19309  21063  23899  29265
>
> Notice that in all cases we see a performance improvement when
> running the lockless qdisc case.
>
> Microbenchmarks using pktgen are as follows,
>
> `pktgen_bench_xmit_mode_queue_xmit.sh -t 1 -i eth2 -c 20000000`
>
> base(mq):          2.1Mpps
> base(pfifo_fast):  2.1Mpps
> qdisc(mq):         2.6Mpps
> qdisc(pfifo_fast): 2.6Mpps
>
> Notice the numbers are the same for mq and pfifo_fast because we
> are only testing a single thread here. In both tests we see a nice
> bump in performance. The key with 'mq' is that it is already a
> per-txq ring, so contention is minimal in the above cases. Qdiscs
> such as tbf or htb, which have more contention, will likely show
> larger gains when/if lockless versions are implemented.
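For anyone who hasn't read sch_mq: "per-txq ring" means mq fans out
into one child qdisc per hardware TX queue, so each queue has its own
lock to begin with. A rough sketch of that fan-out; the names are
illustrative and the qdisc_create_dflt() signature is the 4.14-era
one.

    /* Sketch: why mq sees little contention. One child qdisc is
     * created per TX queue, so CPUs steered to different queues
     * take different locks (or, with TCQ_F_NOLOCK, none at all). */
    static int mq_fanout_sketch(struct net_device *dev,
                                struct Qdisc *sch,
                                struct Qdisc **children)
    {
            unsigned int i;

            for (i = 0; i < dev->num_tx_queues; i++) {
                    struct netdev_queue *txq = netdev_get_tx_queue(dev, i);

                    children[i] = qdisc_create_dflt(txq, &pfifo_fast_ops,
                                    TC_H_MAKE(TC_H_MAJ(sch->handle),
                                              TC_H_MIN(i + 1)));
                    if (!children[i])
                            return -ENOMEM;
            }
            return 0;
    }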
>
> Thanks to everyone who helped with this work especially Daniel
> Borkmann, Eric Dumazet and Willem de Bruijn for discussing the
> design and reviewing versions of the code.
>
> Changes from the RFC: dropped a couple of patches off the end;
> fixed a bug with skb_queue_walk_safe not unlinking the skb in all
> cases; fixed a lockdep splat caused by pfifo_fast_destroy not
> calling the *_bh lock variant; and addressed _most_ of Willem's
> comments (there was a bug in the bulk locking, in the final patch
> of the RFC series).
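The lockdep splat mentioned above is the classic BH-safety rule; a
minimal sketch of the rule itself, not of the actual
pfifo_fast_destroy fix:

    /* Sketch: if a lock is also taken from softirq (BH) context,
     * process-context code must use the _bh variant. Otherwise a
     * softirq can fire on the CPU that holds the lock and spin on
     * it forever; lockdep flags this as a potential deadlock. */
    static void destroy_path_sketch(spinlock_t *lock)
    {
            spin_lock_bh(lock);     /* disables BHs locally */
            /* ... teardown that touches softirq-shared state ... */
            spin_unlock_bh(lock);
    }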
>
> @Willem, I left out lockdep annotation for a follow on series
> to add lockdep more completely, rather than just in code I
> touched.
>
> Comments and feedback welcome.
>
> Thanks,
> John
>
> ---
>
> John Fastabend (14):
>       net: sched: cleanup qdisc_run and __qdisc_run semantics
>       net: sched: allow qdiscs to handle locking
>       net: sched: remove remaining uses for qdisc_qlen in xmit path
>       net: sched: provide per cpu qstat helpers
>       net: sched: a dflt qdisc may be used with per cpu stats
>       net: sched: explicit locking in gso_cpu fallback
>       net: sched: drop qdisc_reset from dev_graft_qdisc
>       net: sched: use skb list for skb_bad_tx
>       net: sched: check for frozen queue before skb_bad_txq check
>       net: sched: helpers to sum qlen and qlen for per cpu logic
>       net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
>       net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio
>       net: skb_array: expose peek API
>       net: sched: pfifo_fast use skb_array
>
>
>  include/linux/skb_array.h |    5 +
>  include/net/gen_stats.h   |    3
>  include/net/pkt_sched.h   |   10 +
>  include/net/sch_generic.h |   79 +++++++-
>  net/core/dev.c            |   31 +++
>  net/core/gen_stats.c      |    9 +
>  net/sched/sch_api.c       |    8 +
>  net/sched/sch_generic.c   |  440 ++++++++++++++++++++++++++++++++-------------
>  net/sched/sch_mq.c        |   34 +++
>  net/sched/sch_mqprio.c    |   69 +++++--
>  10 files changed, 512 insertions(+), 176 deletions(-)
>
> --
> Signature

-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
