From: Dave Taht
Date: Tue, 28 Feb 2023 13:50:17 -0800
To: libreqos
Subject: [LibreQoS] Fwd: [patch 1/3] net: dst: Prevent false sharing vs. dst_entry::__refcnt
References: <20230228132118.978145284@linutronix.de> <20230228132910.934296889@linutronix.de>
In-Reply-To: <20230228132910.934296889@linutronix.de>

---------- Forwarded message ---------
From: Thomas Gleixner
Date: Tue, Feb 28, 2023 at 6:43 AM
Subject: [patch 1/3] net: dst: Prevent false sharing vs. dst_entry::__refcnt
To: LKML
Cc: Linus Torvalds, Wangyang Guo, Arjan van De Ven, David S. Miller,
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, Will Deacon,
    Peter Zijlstra, Boqun Feng, Mark Rutland, Marc Zyngier

From: Wangyang Guo

dst_entry::__refcnt is highly contended in scenarios where many
connections happen from and to the same IP. The reference count is an
atomic_t, so the reference count operations have to take the cache-line
exclusive.

Aside from the unavoidable reference count contention there is another
significant problem which is caused by that: False sharing.

perf top identified two affected read accesses: dst_entry::lwtstate and
rtable::rt_genid.

dst_entry::__refcnt is located at offset 64 of dst_entry, which puts it
into a separate cacheline vs. the read mostly members located at the
beginning of the struct.

That prevents false sharing vs.
the struct members in the first 64 bytes of the structure, but there is
also dst_entry::lwtstate which is located after the reference count and
in the same cache line. This member is read after a reference count has
been acquired.

struct rtable embeds a struct dst_entry at offset 0. struct dst_entry
has a size of 112 bytes, which means that the struct members of rtable
which follow the dst member share the same cache line as
dst_entry::__refcnt. Especially rtable::rt_genid is also read by the
contexts which have a reference count acquired already.

When dst_entry::__refcnt is incremented or decremented via an atomic
operation these read accesses stall. This was found when analysing the
memtier benchmark in 1:100 mode, which amplifies the problem extremely.

Rearrange and pad the structure so that the lwtstate member is in the
next cache-line. This increases the struct size from 112 to 136 bytes
on 64bit.

The resulting improvement depends on the micro-architecture and the
number of CPUs. It ranges from +20% to +120% with a localhost
memtier/memcached benchmark.

[ tglx: Rearrange struct ]

Signed-off-by: Wangyang Guo
Signed-off-by: Arjan van De Ven
Signed-off-by: Thomas Gleixner
Cc: "David S. Miller"
Cc: Eric Dumazet
Cc: Jakub Kicinski
Cc: Paolo Abeni
Cc: netdev@vger.kernel.org
---
 include/net/dst.h |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -69,15 +69,25 @@ struct dst_entry {
 #endif
 	int			__use;
 	unsigned long		lastuse;
-	struct lwtunnel_state   *lwtstate;
 	struct rcu_head		rcu_head;
 	short			error;
 	short			__pad;
 	__u32			tclassid;
 #ifndef CONFIG_64BIT
+	struct lwtunnel_state   *lwtstate;
 	atomic_t		__refcnt;	/* 32-bit offset 64 */
 #endif
 	netdevice_tracker	dev_tracker;
+#ifdef CONFIG_64BIT
+	/*
+	 * Ensure that lwtstate is not in the same cache line as __refcnt,
+	 * because that would lead to false sharing under high contention
+	 * of __refcnt. This also ensures that rtable::rt_genid is not
+	 * sharing the same cache-line.
+	 */
+	int			pad2[6];
+	struct lwtunnel_state   *lwtstate;
+#endif
 };
 
 struct dst_metrics {

-- 
A pithy note on VOQs vs SQM: https://blog.cerowrt.org/post/juniper/
Dave Täht CEO, TekLibre, LLC