From: Dave Taht
Date: Thu, 7 Dec 2017 11:07:45 -0800
To: Cake List
Subject: Re: [Cake] [net-next PATCH 00/14] lockless qdisc series
References: <20171207173500.5771.41198.stgit@john-Precision-Tower-5810>
List-Id: Cake - FQ_codel the next generation

I'm forwarding this sort of stuff 'cause I keep hoping to find more
optimizations for cake, and it really seems like cheap multicores have
grown very common.

On Thu, Dec 7, 2017 at 10:53 AM, Dave Taht wrote:
> ---------- Forwarded message ----------
> From: John Fastabend
> Date: Thu, Dec 7, 2017 at 9:53 AM
> Subject: [net-next PATCH 00/14] lockless qdisc series
> To: willemdebruijn.kernel@gmail.com, daniel@iogearbox.net,
> eric.dumazet@gmail.com, davem@davemloft.net
> Cc: netdev@vger.kernel.org, jiri@resnulli.us, xiyou.wangcong@gmail.com
>
>
> This series adds support for building lockless qdiscs. This is
> the result of noticing the qdisc lock is a common hot-spot in
> perf analysis of the Linux network stack, especially when testing
> with high packet-per-second rates. However, nothing is free and
> most qdiscs rely on the qdisc lock for their data structures, so
> each qdisc must be converted on a case-by-case basis. In this
> series, to kick things off, we make pfifo_fast, mq, and mqprio
> lockless. Follow-up series can address additional qdiscs as needed.
> For example, sch_tbf might be useful. To allow this, the lockless
> design is an opt-in flag. In some future utopia we convert all
> qdiscs and we get to drop this case analysis, but in order to
> make progress we live in the real world.
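
As a rough illustration of the opt-in design described above, here is a
userspace Python model. It is not kernel code: ToyQdisc, xmit and the
NOLOCK value are made up for this sketch, and only the idea of a
per-qdisc flag (TCQ_F_NOLOCK in the patch titles) comes from the series.
The caller takes the shared lock only when the queue has not opted in.

```python
import threading
from collections import deque

# Hypothetical flag value for this model. The kernel's TCQ_F_NOLOCK is a
# qdisc flag bit whose actual value is not given in this message.
NOLOCK = 0x1


class ToyQdisc:
    """Userspace model of a queue that may opt out of the shared lock."""

    def __init__(self, flags=0):
        self.flags = flags
        self.lock = threading.Lock()  # stands in for the qdisc lock
        self.queue = deque()          # deque appends/pops are thread-safe
        self.lock_acquisitions = 0    # how often the shared lock was taken

    def enqueue(self, pkt):
        self.queue.append(pkt)

    def dequeue(self):
        try:
            return self.queue.popleft()
        except IndexError:
            return None               # queue was empty


def xmit(q, pkt):
    """Enqueue one packet, locking only for queues that did not opt in."""
    if q.flags & NOLOCK:
        q.enqueue(pkt)                # converted: queue handles its own safety
        return
    with q.lock:                      # unconverted: caller serializes access
        q.lock_acquisitions += 1
        q.enqueue(pkt)


legacy = ToyQdisc()
lockless = ToyQdisc(flags=NOLOCK)
for i in range(4):
    xmit(legacy, i)
    xmit(lockless, i)

print(legacy.lock_acquisitions, lockless.lock_acquisitions)
```

Unconverted queues keep the old locked path, which is what lets the
conversion proceed one qdisc at a time.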
>
> There are also a handful of optimizations I have behind this
> series and a few code cleanups that I couldn't figure out how
> to fit neatly into this series without increasing the patch
> count. Once this is in, additional patches can address this. The
> most notable is in skb_dequeue, where we can push the consumer lock
> out a bit and consume multiple skbs off the skb_array in pfifo_fast
> per iteration. Ideally we could push arrays of packets at
> drivers as well, but we would need the infrastructure for this.
> The other notable improvement is to do less locking in the
> overrun cases where the bad tx queue list and gso_skb are being
> hit. Although nice in theory, in practice this is the error
> case and I haven't found a benchmark where this matters yet.
>
> For testing...
>
> My first test case uses multiple containers (via cilium) where
> multiple client containers use 'wrk' to benchmark connections with
> a server container running lighttpd, where lighttpd is configured
> to use multiple threads, one per core. Additionally this test has
> a proxy agent running so all traffic takes an extra hop through a
> proxy container. In these cases each TCP packet traverses the egress
> qdisc layer at least four times and the ingress qdisc layer an
> additional four times. This makes for a good stress test IMO, perf
> details below.
>
> The other micro-benchmark I run is injecting packets directly into
> the qdisc layer using pktgen. This uses the benchmark script,
>
> ./pktgen_bench_xmit_mode_queue_xmit.sh
>
> Benchmarks were taken in two cases: "base", running the latest net-next
> with no changes to the qdisc layer, and "qdisc", run with the lockless
> qdisc updates. Numbers reported in req/sec. All virtual 'veth' devices
> run with pfifo_fast in the qdisc test case.
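
A sketch of the batching idea mentioned further up (consuming several
skbs per acquisition of the consumer lock instead of one). This is a
Python toy and not skb_array: ToyRing, consume_batch and the budget of 8
are assumptions picked for illustration only.

```python
import threading
from collections import deque


class ToyRing:
    """Bounded FIFO guarded by a consumer lock; a stand-in for a ring."""

    def __init__(self, size):
        self.size = size
        self.items = deque()
        self.consumer_lock = threading.Lock()
        self.lock_acquisitions = 0

    def produce(self, pkt):
        if len(self.items) >= self.size:
            return False              # ring full, caller must handle it
        self.items.append(pkt)
        return True

    def consume_one(self):
        """Take the consumer lock once per packet."""
        with self.consumer_lock:
            self.lock_acquisitions += 1
            return self.items.popleft() if self.items else None

    def consume_batch(self, budget):
        """Take the consumer lock once for up to `budget` packets."""
        batch = []
        with self.consumer_lock:
            self.lock_acquisitions += 1
            while self.items and len(batch) < budget:
                batch.append(self.items.popleft())
        return batch


def drain_single(ring):
    out = []
    while (pkt := ring.consume_one()) is not None:
        out.append(pkt)
    return out


def drain_batched(ring, budget):
    out = []
    while (batch := ring.consume_batch(budget)):
        out.extend(batch)
    return out


single_ring, batch_ring = ToyRing(64), ToyRing(64)
for i in range(16):
    single_ring.produce(i)
    batch_ring.produce(i)

# Both strategies must deliver the same packets in the same order.
assert drain_single(single_ring) == drain_batched(batch_ring, budget=8)
print(single_ring.lock_acquisitions, batch_ring.lock_acquisitions)
```

The per-packet drain takes the lock once per packet plus once more to
discover the ring is empty, while the batched drain takes it once per
batch. How much that saves in the real dequeue path is not something
this toy can show.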
>
> `wrk -t16 -c $conns -d30 "http://[$SERVER_IP4]:80"`
>
> conns      16      32      64    1024
> -----------------------------------------------
> base:   18831   20201   21393   29151
> qdisc:  19309   21063   23899   29265
>
> Notice that in all cases we see a performance improvement when
> running with the qdisc case.
>
> Microbenchmarks using pktgen are as follows,
>
> `pktgen_bench_xmit_mode_queue_xmit.sh -t 1 -i eth2 -c 20000000`
>
> base(mq):          2.1Mpps
> base(pfifo_fast):  2.1Mpps
> qdisc(mq):         2.6Mpps
> qdisc(pfifo_fast): 2.6Mpps
>
> Notice the numbers are the same for mq and pfifo_fast because we are
> only testing a single thread here. In both tests we see a nice bump
> in performance. The key with 'mq' is that it is already per
> txq ring, so contention is minimal in the above cases. Qdiscs
> such as tbf or htb, which have more contention, will likely show
> larger gains when/if lockless versions are implemented.
>
> Thanks to everyone who helped with this work, especially Daniel
> Borkmann, Eric Dumazet and Willem de Bruijn, for discussing the
> design and reviewing versions of the code.
>
> Changes from the RFC: dropped a couple of patches off the end,
> fixed a bug with skb_queue_walk_safe not unlinking skb in all
> cases, fixed a lockdep splat with pfifo_fast_destroy not calling
> the *_bh lock variant, addressed _most_ of Willem's comments; there
> was also a bug in the bulk locking (final patch) of the RFC series.
>
> @Willem, I left out lockdep annotation for a follow-on series
> to add lockdep more completely, rather than just in code I
> touched.
>
> Comments and feedback welcome.
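
For reference, the relative gains implied by the numbers above can be
worked out as follows. The figures are copied from the wrk and pktgen
results in this message; gain_pct is a helper name made up here.

```python
# Figures copied from the benchmark results quoted in this message.
conns = [16, 32, 64, 1024]
base = [18831, 20201, 21393, 29151]    # req/sec, net-next without the series
qdisc = [19309, 21063, 23899, 29265]   # req/sec, with the lockless series


def gain_pct(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


wrk_gains = {c: round(gain_pct(b, q), 1) for c, b, q in zip(conns, base, qdisc)}
pktgen_gain = round(gain_pct(2.1, 2.6), 1)   # Mpps, single pktgen thread

for c, g in wrk_gains.items():
    print(f"conns={c:5d}  gain={g:5.1f}%")
print(f"pktgen       gain={pktgen_gain:5.1f}%")
```

This gives roughly 2.5%, 4.3%, 11.7% and 0.4% for 16, 32, 64 and 1024
connections, and about 23.8% for the single-thread pktgen run.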
>
> Thanks,
> John
>
> ---
>
> John Fastabend (14):
>   net: sched: cleanup qdisc_run and __qdisc_run semantics
>   net: sched: allow qdiscs to handle locking
>   net: sched: remove remaining uses for qdisc_qlen in xmit path
>   net: sched: provide per cpu qstat helpers
>   net: sched: a dflt qdisc may be used with per cpu stats
>   net: sched: explicit locking in gso_cpu fallback
>   net: sched: drop qdisc_reset from dev_graft_qdisc
>   net: sched: use skb list for skb_bad_tx
>   net: sched: check for frozen queue before skb_bad_txq check
>   net: sched: helpers to sum qlen and qlen for per cpu logic
>   net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
>   net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio
>   net: skb_array: expose peek API
>   net: sched: pfifo_fast use skb_array
>
>
>  include/linux/skb_array.h |    5 +
>  include/net/gen_stats.h   |    3
>  include/net/pkt_sched.h   |   10 +
>  include/net/sch_generic.h |   79 +++++++-
>  net/core/dev.c            |   31 +++
>  net/core/gen_stats.c      |    9 +
>  net/sched/sch_api.c       |    8 +
>  net/sched/sch_generic.c   |  440 ++++++++++++++++++++++++++++++++-------------
>  net/sched/sch_mq.c        |   34 +++
>  net/sched/sch_mqprio.c    |   69 +++++--
>  10 files changed, 512 insertions(+), 176 deletions(-)
>
> --
> Signature
>
>
> --
>
> Dave Täht
> CEO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-669-226-2619

-- 
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619