From: Dave Taht
Date: Tue, 20 Dec 2022 07:10:23 -0800
To: Abhishek Kumar
Cc: kvalo@kernel.org, ath10k@lists.infradead.org,
    linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
    netdev@vger.kernel.org, "David S. Miller", Eric Dumazet,
    Jakub Kicinski, Paolo Abeni, Make-Wifi-fast
Subject: Re: [Make-wifi-fast] [PATCH] ath10k: snoc: enable threaded napi on WCN3990
In-Reply-To: <20221220075215.1.Ic12e347e0d61a618124b742614e82bbd5d770173@changeid>

I am always interested in flent.org tcp_nup, tcp_ndown, and rrul_be
tests on wifi hardware, especially in AP mode against a few clients in
rtt_fair, using the "ending the anomaly" test suite at the bottom of
this link: https://www.cs.kau.se/tohojo/airtime-fairness/ . Of these,
what increasingly bothers me the most is sharing bandwidth more fairly
and keeping latencies low when 4 or more stations are trying to
transmit (in a world with 16 or more stations online). I'm seeing 5+
seconds on some rtt_fair-like tests nowadays.
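
For anyone wanting to reproduce those runs, a minimal sketch of the
flent invocations might look like the following. It only builds and
prints command lines; it does not run anything. The host name, stream
count, duration and title are placeholders, and it assumes a netperf
server is reachable on the target and that tcp_nup and tcp_ndown take
their stream counts from the upload_streams and download_streams test
parameters.

```python
import shlex


def flent_commands(server, streams=4, length=60, title="ath10k-threaded-napi"):
    """Build (but do not run) flent command lines for the three tests.

    All argument values are illustrative placeholders, not settings
    taken from the original thread.
    """
    # Options shared by every run: target host, test length in seconds, plot title.
    common = ["-H", server, "-l", str(length), "-t", title]
    return [
        # N parallel TCP uploads.
        ["flent", "tcp_nup", "--test-parameter", f"upload_streams={streams}", *common],
        # N parallel TCP downloads.
        ["flent", "tcp_ndown", "--test-parameter", f"download_streams={streams}", *common],
        # Bidirectional best-effort load with latency measurement.
        ["flent", "rrul_be", *common],
    ]


if __name__ == "__main__":
    # "netperf.example.org" is a hypothetical server name.
    for argv in flent_commands("netperf.example.org"):
        print(shlex.join(argv))
```

Running each command once with threaded NAPI off and once with it on,
under the same title scheme, would make the before/after plots directly
comparable.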

I was also seeing huge simultaneous upload vs download disparities on
the latest kernels, as discussed in various threads over here:
https://forum.openwrt.org/t/aql-and-the-ath10k-is-lovely/59002
and more recently here:
https://forum.openwrt.org/t/reducing-multiplexing-latencies-still-further-in-wifi/133605

I don't understand why napi with the default budget (64) is even needed
on the ath10k, as a single txop takes a minimum of ~200us, but perhaps
your patch will help. Still, measuring the TCP statistics in-band would
be nice to see. Some new tools that can do this are appearing (Apple's
goresponsiveness, crusader...) and they are simpler to use than flent.

On Tue, Dec 20, 2022 at 12:17 AM Abhishek Kumar wrote:
>
> NAPI poll can be done in threaded context along with soft irq
> context. Threaded context can be scheduled efficiently, thus
> creating less of bottleneck during Rx processing. This patch is
> to enable threaded NAPI on ath10k driver.
>
> Based on testing, it was observed that on WCN3990, the CPU0 reaches
> 100% utilization when napi runs in softirq context. At the same
> time the other CPUs are at low consumption percentage. This
> does not allow device to reach its maximum throughput potential.
> After enabling threaded napi, CPU load is balanced across all CPUs
> and following improvements were observed:
> - UDP_RX increase by ~22-25%
> - TCP_RX increase by ~15%
>
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2-00696-QCAHLSWMTPL-1
> Signed-off-by: Abhishek Kumar
> ---
>
>  drivers/net/wireless/ath/ath10k/core.c | 16 ++++++++++++++++
>  drivers/net/wireless/ath/ath10k/hw.h   |  2 ++
>  drivers/net/wireless/ath/ath10k/snoc.c |  3 +++
>  3 files changed, 21 insertions(+)
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> index 5eb131ab916fd..ee4b6ba508c81 100644
> --- a/drivers/net/wireless/ath/ath10k/core.c
> +++ b/drivers/net/wireless/ath/ath10k/core.c
> @@ -100,6 +100,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA988X_HW_2_0_VERSION,
> @@ -140,6 +141,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA9887_HW_1_0_VERSION,
> @@ -181,6 +183,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA6174_HW_3_2_VERSION,
> @@ -217,6 +220,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA6174_HW_2_1_VERSION,
> @@ -257,6 +261,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA6174_HW_2_1_VERSION,
> @@ -297,6 +302,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA6174_HW_3_0_VERSION,
> @@ -337,6 +343,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA6174_HW_3_2_VERSION,
> @@ -381,6 +388,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA99X0_HW_2_0_DEV_VERSION,
> @@ -427,6 +435,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA9984_HW_1_0_DEV_VERSION,
> @@ -480,6 +489,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA9888_HW_2_0_DEV_VERSION,
> @@ -530,6 +540,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA9377_HW_1_0_DEV_VERSION,
> @@ -570,6 +581,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA9377_HW_1_1_DEV_VERSION,
> @@ -612,6 +624,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA9377_HW_1_1_DEV_VERSION,
> @@ -645,6 +658,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = QCA4019_HW_1_0_DEV_VERSION,
> @@ -692,6 +706,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = false,
>  		.use_fw_tx_credits = true,
>  		.delay_unmap_buffer = false,
> +		.enable_threaded_napi = false,
>  	},
>  	{
>  		.id = WCN3990_HW_1_0_DEV_VERSION,
> @@ -725,6 +740,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
>  		.hw_restart_disconnect = true,
>  		.use_fw_tx_credits = false,
>  		.delay_unmap_buffer = true,
> +		.enable_threaded_napi = true,
>  	},
>  };
>
> diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
> index 9643031a4427a..adf3076b96503 100644
> --- a/drivers/net/wireless/ath/ath10k/hw.h
> +++ b/drivers/net/wireless/ath/ath10k/hw.h
> @@ -639,6 +639,8 @@ struct ath10k_hw_params {
>  	bool use_fw_tx_credits;
>
>  	bool delay_unmap_buffer;
> +
> +	bool enable_threaded_napi;
>  };
>
>  struct htt_resp;
> diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
> index cfcb759a87dea..b94150fb6ef06 100644
> --- a/drivers/net/wireless/ath/ath10k/snoc.c
> +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> @@ -927,6 +927,9 @@ static int ath10k_snoc_hif_start(struct ath10k *ar)
>
>  	bitmap_clear(ar_snoc->pending_ce_irqs, 0, CE_COUNT_MAX);
>
> +	if (ar->hw_params.enable_threaded_napi)
> +		dev_set_threaded(&ar->napi_dev, true);
> +
>  	ath10k_core_napi_enable(ar);
>  	ath10k_snoc_irq_enable(ar);
>  	ath10k_snoc_rx_post(ar);
> --
> 2.39.0.314.g84b9a713c41-goog
>

-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC