From: Dave Taht
Date: Fri, 21 Aug 2020 18:56:44 -0700
To: Make-Wifi-fast
Subject: [Make-wifi-fast] Fwd: [PATCH v3 1/2] net: add support for threaded NAPI polling

I do not understand why napi is needed at all for wifi.

---------- Forwarded message ---------
From: Felix Fietkau
Date: Fri, Aug 21, 2020 at 12:03 PM
Subject: [PATCH v3 1/2] net: add support for threaded NAPI polling
To:
Cc: Eric Dumazet, Hillf Danton

For some drivers (especially 802.11 drivers), doing a lot of work in the
NAPI poll function does not perform well. Since NAPI poll is bound to the
CPU it was scheduled from, we can easily end up with a few very busy CPUs
spending most of their time in softirq/ksoftirqd and some idle ones.

Introduce threaded NAPI for such drivers based on a workqueue. The API is
the same except for using netif_threaded_napi_add instead of
netif_napi_add.
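For illustration, a minimal sketch of what opting in might look like from
a driver's perspective; the "my_priv", "my_poll" and "my_setup_napi" names
are hypothetical and not part of the patch:

struct my_priv {
	struct napi_struct napi;
	/* ... device state ... */
};

static int my_poll(struct napi_struct *napi, int budget)
{
	int done = 0;

	/* ... process up to "budget" packets, counting them in "done" ... */

	if (done < budget)
		napi_complete_done(napi, done);

	return done;
}

static void my_setup_napi(struct net_device *dev, struct my_priv *priv)
{
	/* Same arguments as netif_napi_add(); the only behavioural change
	 * is that my_poll() runs from the high-priority workqueue instead
	 * of softirq context on the scheduling CPU.
	 */
	netif_threaded_napi_add(dev, &priv->napi, my_poll, NAPI_POLL_WEIGHT);
}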
In my tests with mt76 on MT7621 using threaded NAPI + a thread for tx
scheduling improves LAN->WLAN bridging throughput by 10-50%. Throughput
without threaded NAPI is wildly inconsistent, depending on the CPU that
runs the tx scheduling thread. With threaded NAPI, throughput seems stable
and consistent (and higher than the best results I got without it).

Based on a patch by Hillf Danton
Cc: Hillf Danton
Signed-off-by: Felix Fietkau
---
Changes since PATCH v2:
- Split sysfs attribute into a separate patch
- Take RTNL on attribute show
- make napi_threaded attribute static

Changes since PATCH v1:
- use WQ_SYSFS to make workqueue configurable from user space
- cancel work in netif_napi_del
- add a sysfs file to enable/disable threaded NAPI for a netdev (see the
  sketch below the diffstat)

Changes since RFC v2:
- fix unused but set variable reported by kbuild test robot

Changes since RFC:
- disable softirq around threaded poll functions
- reuse most parts of napi_poll()
- fix re-schedule condition

 include/linux/netdevice.h |  23 +++++++
 net/core/dev.c            | 139 +++++++++++++++++++++++++++-----------
 2 files changed, 124 insertions(+), 38 deletions(-)
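As an aside, the user-space toggle mentioned in the changelog was split out
into patch 2/2. Assuming it lands as a "napi_threaded" attribute under the
netdev's sysfs directory (the attribute name and path are my assumption
from the changelog, not something this patch defines), a small helper to
flip it could look like:

/* Hypothetical helper; the sysfs path below is an assumption. */
#include <stdio.h>

static int set_napi_threaded(const char *ifname, int enable)
{
	char path[256];
	FILE *f;
	int ret = 0;

	snprintf(path, sizeof(path), "/sys/class/net/%s/napi_threaded",
		 ifname);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%d\n", enable) < 0)
		ret = -1;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}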
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b0e303f6603f..69507e6d4dc8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -347,6 +347,7 @@ struct napi_struct {
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
+	struct work_struct	work;
 };
 
 enum {
@@ -357,6 +358,7 @@ enum {
 	NAPI_STATE_HASHED,	/* In NAPI hash (busy polling possible) */
 	NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
 	NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
+	NAPI_STATE_THREADED,	/* Use threaded NAPI */
 };
 
 enum {
@@ -367,6 +369,7 @@ enum {
 	NAPIF_STATE_HASHED	 = BIT(NAPI_STATE_HASHED),
 	NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
 	NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
+	NAPIF_STATE_THREADED	 = BIT(NAPI_STATE_THREADED),
 };
 
 enum gro_result {
@@ -2327,6 +2330,26 @@ static inline void *netdev_priv(const struct net_device *dev)
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight);
 
+/**
+ * netif_threaded_napi_add - initialize a NAPI context
+ * @dev: network device
+ * @napi: NAPI context
+ * @poll: polling function
+ * @weight: default weight
+ *
+ * This variant of netif_napi_add() should be used from drivers using NAPI
+ * with CPU intensive poll functions.
+ * This will schedule polling from a high priority workqueue
+ */
+static inline void netif_threaded_napi_add(struct net_device *dev,
+					   struct napi_struct *napi,
+					   int (*poll)(struct napi_struct *, int),
+					   int weight)
+{
+	set_bit(NAPI_STATE_THREADED, &napi->state);
+	netif_napi_add(dev, napi, poll, weight);
+}
+
 /**
  * netif_tx_napi_add - initialize a NAPI context
  * @dev: network device
diff --git a/net/core/dev.c b/net/core/dev.c
index b5d1129d8310..b6165309617c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -157,6 +157,7 @@ static DEFINE_SPINLOCK(offload_lock);
 struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 struct list_head ptype_all __read_mostly;	/* Taps */
 static struct list_head offload_base __read_mostly;
+static struct workqueue_struct *napi_workq __read_mostly;
 
 static int netif_rx_internal(struct sk_buff *skb);
 static int call_netdevice_notifiers_info(unsigned long val,
@@ -6282,6 +6283,11 @@ void __napi_schedule(struct napi_struct *n)
 {
 	unsigned long flags;
 
+	if (test_bit(NAPI_STATE_THREADED, &n->state)) {
+		queue_work(napi_workq, &n->work);
+		return;
+	}
+
 	local_irq_save(flags);
 	____napi_schedule(this_cpu_ptr(&softnet_data), n);
 	local_irq_restore(flags);
@@ -6329,6 +6335,11 @@ EXPORT_SYMBOL(napi_schedule_prep);
  */
 void __napi_schedule_irqoff(struct napi_struct *n)
 {
+	if (test_bit(NAPI_STATE_THREADED, &n->state)) {
+		queue_work(napi_workq, &n->work);
+		return;
+	}
+
 	____napi_schedule(this_cpu_ptr(&softnet_data), n);
 }
 EXPORT_SYMBOL(__napi_schedule_irqoff);
@@ -6597,6 +6608,86 @@ static void init_gro_hash(struct napi_struct *napi)
 	napi->gro_bitmask = 0;
 }
 
+static int __napi_poll(struct napi_struct *n, bool *repoll)
+{
+	int work, weight;
+
+	weight = n->weight;
+
+	/* This NAPI_STATE_SCHED test is for avoiding a race
+	 * with netpoll's poll_napi(). Only the entity which
+	 * obtains the lock and sees NAPI_STATE_SCHED set will
+	 * actually make the ->poll() call. Therefore we avoid
+	 * accidentally calling ->poll() when NAPI is not scheduled.
+	 */
+	work = 0;
+	if (test_bit(NAPI_STATE_SCHED, &n->state)) {
+		work = n->poll(n, weight);
+		trace_napi_poll(n, work, weight);
+	}
+
+	if (unlikely(work > weight))
+		pr_err_once("NAPI poll function %pS returned %d, exceeding its budget of %d.\n",
+			    n->poll, work, weight);
+
+	if (likely(work < weight))
+		return work;
+
+	/* Drivers must not modify the NAPI state if they
+	 * consume the entire weight. In such cases this code
+	 * still "owns" the NAPI instance and therefore can
+	 * move the instance around on the list at-will.
+	 */
+	if (unlikely(napi_disable_pending(n))) {
+		napi_complete(n);
+		return work;
+	}
+
+	if (n->gro_bitmask) {
+		/* flush too old packets
+		 * If HZ < 1000, flush all packets.
+		 */
+		napi_gro_flush(n, HZ >= 1000);
+	}
+
+	gro_normal_list(n);
+
+	*repoll = true;
+
+	return work;
+}
+
+static void napi_workfn(struct work_struct *work)
+{
+	struct napi_struct *n = container_of(work, struct napi_struct, work);
+	void *have;
+
+	for (;;) {
+		bool repoll = false;
+
+		local_bh_disable();
+
+		have = netpoll_poll_lock(n);
+		__napi_poll(n, &repoll);
+		netpoll_poll_unlock(have);
+
+		local_bh_enable();
+
+		if (!repoll)
+			return;
+
+		if (!need_resched())
+			continue;
+
+		/*
+		 * have to pay for the latency of task switch even if
+		 * napi is scheduled
+		 */
+		queue_work(napi_workq, work);
+		return;
+	}
+}
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
@@ -6617,6 +6708,7 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 #ifdef CONFIG_NETPOLL
 	napi->poll_owner = -1;
 #endif
+	INIT_WORK(&napi->work, napi_workfn);
 	set_bit(NAPI_STATE_SCHED, &napi->state);
 	napi_hash_add(napi);
 }
@@ -6655,6 +6747,7 @@ static void flush_gro_hash(struct napi_struct *napi)
 void netif_napi_del(struct napi_struct *napi)
 {
 	might_sleep();
+	cancel_work_sync(&napi->work);
 	if (napi_hash_del(napi))
 		synchronize_net();
 	list_del_init(&napi->dev_list);
@@ -6667,53 +6760,19 @@ EXPORT_SYMBOL(netif_napi_del);
 
 static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 {
+	bool do_repoll = false;
 	void *have;
-	int work, weight;
+	int work;
 
 	list_del_init(&n->poll_list);
 
 	have = netpoll_poll_lock(n);
 
-	weight = n->weight;
+	work = __napi_poll(n, &do_repoll);
 
-	/* This NAPI_STATE_SCHED test is for avoiding a race
-	 * with netpoll's poll_napi(). Only the entity which
-	 * obtains the lock and sees NAPI_STATE_SCHED set will
-	 * actually make the ->poll() call. Therefore we avoid
-	 * accidentally calling ->poll() when NAPI is not scheduled.
-	 */
-	work = 0;
-	if (test_bit(NAPI_STATE_SCHED, &n->state)) {
-		work = n->poll(n, weight);
-		trace_napi_poll(n, work, weight);
-	}
-
-	if (unlikely(work > weight))
-		pr_err_once("NAPI poll function %pS returned %d, exceeding its budget of %d.\n",
-			    n->poll, work, weight);
-
-	if (likely(work < weight))
+	if (!do_repoll)
 		goto out_unlock;
 
-	/* Drivers must not modify the NAPI state if they
-	 * consume the entire weight. In such cases this code
-	 * still "owns" the NAPI instance and therefore can
-	 * move the instance around on the list at-will.
-	 */
-	if (unlikely(napi_disable_pending(n))) {
-		napi_complete(n);
-		goto out_unlock;
-	}
-
-	if (n->gro_bitmask) {
-		/* flush too old packets
-		 * If HZ < 1000, flush all packets.
-		 */
-		napi_gro_flush(n, HZ >= 1000);
-	}
-
-	gro_normal_list(n);
-
 	/* Some drivers may have called napi_schedule
 	 * prior to exhausting their budget.
 	 */
@@ -10975,6 +11034,10 @@ static int __init net_dev_init(void)
 		sd->backlog.weight = weight_p;
 	}
 
+	napi_workq = alloc_workqueue("napi_workq", WQ_UNBOUND | WQ_HIGHPRI,
+				     WQ_UNBOUND_MAX_ACTIVE | WQ_SYSFS);
+	BUG_ON(!napi_workq);
+
 	dev_boot_phase = 0;
 
 	/* The loopback device is special if any other network devices
-- 
2.28.0


-- 
"For a successful technology, reality must take precedence over public
relations, for Mother Nature cannot be fooled" - Richard Feynman

dave@taht.net CTO, TekLibre, LLC Tel: 1-831-435-0729