From: Dave Taht
Date: Mon, 28 Nov 2022 00:11:05 -0800
To: libreqos
Subject: [LibreQoS] Fwd: [PATCH v2 0/4] cpumask: improve on cpumask_local_spread() locality
References: <20221112190946.728270-1-yury.norov@gmail.com>
In-Reply-To: <20221112190946.728270-1-yury.norov@gmail.com>
List-Id: Many ISPs need the kinds of quality shaping cake can do

---------- Forwarded message ---------
From: Yury Norov
Date: Sat, Nov 12, 2022 at 11:14 AM
Subject: [PATCH v2 0/4] cpumask: improve on cpumask_local_spread() locality
To: David S. Miller, Andy Shevchenko, Barry Song, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Ingo Molnar, Jakub Kicinski, Jason Gunthorpe, Jesse Brandeburg, Jonathan Cameron, Juri Lelli, Leon Romanovsky, Mel Gorman, Peter Zijlstra, Rasmus Villemoes, Saeed Mahameed, Steven Rostedt, Tariq Toukan, Tariq Toukan, Tony Luck, Valentin Schneider, Vincent Guittot
Cc: Yury Norov

cpumask_local_spread() currently checks the local node for the presence of the i'th CPU and, if it finds nothing, does a flat search among all non-local CPUs. We can do better by checking CPUs per NUMA hops.

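To make the "local node first, then flat search" behaviour concrete, here is a minimal Python model of it. This is only a sketch and not kernel code: the names NODE_CPUS, local_spread_flat and flat_order are hypothetical, the node layout is taken from the numactl output quoted later in this message, and the wrap-around of i is an assumption of the sketch.

```python
# Hypothetical model of the pre-series behaviour described above (not kernel code).
# Node layout copied from the numactl output quoted later in this message.
NODE_CPUS = {
    0: [0, 1, 2, 3],
    1: [4, 5],
    2: [6, 7],
    3: [8, 9, 10, 11, 12, 13, 14, 15],
}


def local_spread_flat(i, node):
    """Return the i'th CPU for `node`: local CPUs first, then all others by CPU number."""
    all_cpus = sorted(cpu for cpus in NODE_CPUS.values() for cpu in cpus)
    i %= len(all_cpus)  # assumption of this sketch: wrap so a CPU is always returned
    local = NODE_CPUS[node]
    if i < len(local):
        return local[i]
    # Flat search: NUMA distance is ignored, remote CPUs come in ascending order.
    remote = [cpu for cpu in all_cpus if cpu not in local]
    return remote[i - len(local)]


def flat_order(node):
    """Full traversal order for `node` under the flat fallback."""
    total = sum(len(cpus) for cpus in NODE_CPUS.values())
    return [local_spread_flat(i, node) for i in range(total)]


if __name__ == "__main__":
    for node in NODE_CPUS:
        print(f"node {node}:", *flat_order(node))
```

In this model node 1 falls back to CPUs 0-3 right after its own CPUs 4 and 5, even though the distance table quoted later puts node 0 at distance 50 from node 1 and node 3 at distance 30.
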
This series is inspired by Tariq Toukan and Valentin Schneider's "net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints"

https://patchwork.kernel.org/project/netdevbpf/patch/20220728191203.4055-3-tariqt@nvidia.com/

According to their measurements, for mlx5e:

        Bottleneck in RX side is released, reached linerate (~1.8x speedup).
        ~30% less cpu util on TX.

This patch makes cpumask_local_spread() traverse CPUs based on NUMA distance as well, and I expect a comparable improvement for its users, as in the case of mlx5e.

I tested the new behavior on my VM with the following NUMA configuration:

root@debian:~# numactl -H
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3
node 0 size: 3869 MB
node 0 free: 3740 MB
node 1 cpus: 4 5
node 1 size: 1969 MB
node 1 free: 1937 MB
node 2 cpus: 6 7
node 2 size: 1967 MB
node 2 free: 1873 MB
node 3 cpus: 8 9 10 11 12 13 14 15
node 3 size: 7842 MB
node 3 free: 7723 MB
node distances:
node   0   1   2   3
  0:  10  50  30  70
  1:  50  10  70  30
  2:  30  70  10  50
  3:  70  30  50  10

And the cpumask_local_spread() traversal for each node and offset looks like this:

node 0:   0   1   2   3   6   7   4   5   8   9  10  11  12  13  14  15
node 1:   4   5   8   9  10  11  12  13  14  15   0   1   2   3   6   7
node 2:   6   7   0   1   2   3   8   9  10  11  12  13  14  15   4   5
node 3:   8   9  10  11  12  13  14  15   4   5   6   7   0   1   2   3

v1: https://lore.kernel.org/lkml/20221111040027.621646-5-yury.norov@gmail.com/T/
v2:
 - use bsearch() in sched_numa_find_nth_cpu();
 - fix missing 'static inline' in 3rd patch.

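The per-node traversal listed above can be reproduced from the quoted distance table with a short Python model. This is a sketch under stated assumptions, not the kernel implementation: NODE_CPUS, DISTANCE, hop_order and local_spread_by_hops are hypothetical names, nodes are simply sorted by distance from the starting node, and the bsearch() over hop levels mentioned in the v2 notes is not modelled.

```python
# Hypothetical model of a NUMA-distance-ordered traversal (not kernel code).
# Topology and distances are copied from the numactl output quoted above.
NODE_CPUS = {
    0: [0, 1, 2, 3],
    1: [4, 5],
    2: [6, 7],
    3: [8, 9, 10, 11, 12, 13, 14, 15],
}

DISTANCE = [
    [10, 50, 30, 70],
    [50, 10, 70, 30],
    [30, 70, 10, 50],
    [70, 30, 50, 10],
]


def hop_order(node):
    """CPUs ordered by the NUMA distance of their node from `node`, nearest first."""
    # Simplification: equal distances are not merged into one hop level;
    # this table has no ties, so the result is unaffected here.
    nodes_by_distance = sorted(NODE_CPUS, key=lambda other: DISTANCE[node][other])
    return [cpu for other in nodes_by_distance for cpu in NODE_CPUS[other]]


def local_spread_by_hops(i, node):
    """Return the i'th CPU for `node` in hop order (wrap-around is an assumption)."""
    order = hop_order(node)
    return order[i % len(order)]


if __name__ == "__main__":
    for node in NODE_CPUS:
        print(f"node {node}:", *hop_order(node))
```

For node 0 the model visits node 2 (distance 30) before node 1 (distance 50), which gives the 0 1 2 3 6 7 4 5 ... order shown in the listing.
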
Yury Norov (4):
  lib/find: introduce find_nth_and_andnot_bit
  cpumask: introduce cpumask_nth_and_andnot
  sched: add sched_numa_find_nth_cpu()
  cpumask: improve on cpumask_local_spread() locality

 include/linux/cpumask.h  | 20 +++++++++++++++
 include/linux/find.h     | 33 ++++++++++++++++++++++++
 include/linux/topology.h |  8 ++++++
 kernel/sched/topology.c  | 55 ++++++++++++++++++++++++++++++++++++++++
 lib/cpumask.c            | 12 ++-------
 lib/find_bit.c           |  9 +++++++
 6 files changed, 127 insertions(+), 10 deletions(-)

--
2.34.1

-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC