From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg1-x530.google.com (mail-pg1-x530.google.com [IPv6:2607:f8b0:4864:20::530]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.bufferbloat.net (Postfix) with ESMTPS id ECC453CB37 for ; Thu, 13 Jul 2023 11:02:38 -0400 (EDT) Received: by mail-pg1-x530.google.com with SMTP id 41be03b00d2f7-55bc29a909dso481382a12.3 for ; Thu, 13 Jul 2023 08:02:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1689260557; x=1691852557; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=BiKD2rq1/O16l0hMqLH+e+jMOTXURNazhfIQqC9+1vc=; b=XtR5NoPuHuDs5QzZW/gtEymAm/yZgn0xyHRUX5uF5eL3j209pKtccf7MgAluLZDkxC Vso85x/pgnSNtxJ/iJKbI8BHebaWW30opDt3ySxNqJqfxJnekL5rrknCjKOi4Iw+w5H2 PqwwMncN4iykY9+aEGqVfu/TnVU8I6qsKQiHiVs1YG+kYq+wIPKNeOXBe2X15m6M80mK QSdhY+MWNWDTNt+IOMPVJqUM4X/TM8z2ErGnOcsDYHNBciPRs9DOkDxvsoQTDcRHy73x Nakf4akEkF++BuGBXnmfgt0vGAvbbN9VrCBZnJDBTuz42YfllBJfBEHHNCu64hE4wUo+ gi1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689260557; x=1691852557; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=BiKD2rq1/O16l0hMqLH+e+jMOTXURNazhfIQqC9+1vc=; b=VELtgoxioOocaO/ZT8eb5pmVe6krolfOiBGrKhQYRLZuBJskb5BrrPxQ5HboTWp1mC 0Scw1B6khUOIWwpCh3SoGhmtJ2rNgR6mg6JHUy7RyPMHoFRD45ivCBsUkRz0X6WVcIWZ K7r6YkxR/SwYCj4/hBvKoA3u+WifYzLEg01jOTnRkvSPpU1XxhunxDjflqw+XMb1PHTl 8NvcON9L4J6GeYJjHwXjULagnbA6f/S/EhJp/DUUFw5f0bQRUNxd23InI/M7FhdlGLJP BXwM6NdbeO0iwW+Yg4rJUhxkgoaQtjNN3pJ/0O5iiX+MIbYaphRAfVt5hq+mY16ow4Ha di3w== X-Gm-Message-State: ABy/qLbSFzrTx7kQpALbpaPp+wY+8qE4lXojQ9dqZoTYVLmpUh2Auc03 +d5iZyJyD92NBJNO+pGFSA9xMLGNs3byrfbYPoKQqwG2f88= X-Google-Smtp-Source: 
APBJJlExEwDIoKqplKfms4lIWF7+JOMrcoREE0Bq7S9dAoxPVqd39itO5VypJ4nNPQ/Zhmdlm0CpktxYl/dXQy25SC0= X-Received: by 2002:a17:90a:ce94:b0:262:edb9:bf44 with SMTP id g20-20020a17090ace9400b00262edb9bf44mr1013196pju.19.1689260557270; Thu, 13 Jul 2023 08:02:37 -0700 (PDT) MIME-Version: 1.0 References: <00ca7beb7fe054a3ba1a36c61c1e3b1314369f11.camel@nvidia.com> <6d47e22e-f128-ec8f-bbdc-c030483a8783@redhat.com> <324a5a08-3053-6ab6-d47e-7413d9f2f443@redhat.com> In-Reply-To: <324a5a08-3053-6ab6-d47e-7413d9f2f443@redhat.com> From: Dave Taht Date: Thu, 13 Jul 2023 09:02:25 -0600 Message-ID: To: libreqos Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Subject: [LibreQoS] Fwd: mlx5 XDP redirect leaking memory on kernel 6.3 X-BeenThere: libreqos@lists.bufferbloat.net X-Mailman-Version: 2.1.20 Precedence: list List-Id: Many ISPs need the kinds of quality shaping cake can do List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Jul 2023 15:02:39 -0000

not that this applies to our mlx problem... probably...

---------- Forwarded message ---------
From: Jesper Dangaard Brouer
Date: Thu, Jul 13, 2023 at 8:59 AM
Subject: Re: mlx5 XDP redirect leaking memory on kernel 6.3
To: Dragos Tatulea, Tariq Toukan, jbrouer@redhat.com, Saeed Mahameed, saeed@kernel.org, netdev@vger.kernel.org, Greg KH
Cc: maxtram95@gmail.com, lorenzo@kernel.org, alexander.duyck@gmail.com, kheib@redhat.com, ilias.apalodimas@linaro.org, mkabat@redhat.com, atzin@redhat.com, fmaurer@redhat.com, bpf@vger.kernel.org, jbenc@redhat.com, linyunsheng@huawei.com, ttoukan.linux@gmail.com

On 13/07/2023 12.11, Dragos Tatulea wrote:
> Hi Jesper,
> On Thu, 2023-07-13 at 11:20 +0200, Jesper Dangaard Brouer wrote:
>> Hi Dragos,
>>
>> Below you promised to work on a fix for XDP redirect memory leak...
>> What is the status?
>>
> The fix got merged into net a week ago:
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/drivers/net/ethernet/mellanox/mlx5/core?id=7abd955a58fb0fcd4e756fa2065c03ae488fcfa7
>
> Just forgot to follow up on this thread. Sorry about that...
>

Good to see it being fixed in net.git commit:
7abd955a58fb ("net/mlx5e: RX, Fix page_pool page fragment tracking for XDP")

This needs to be backported into stable tree 6.3, but I can see 6.3.13 is
marked EOL (End-of-Life). Can we still get this fix applied? (Cc. GregKH)

--Jesper

>
>> On 23/05/2023 18.35, Dragos Tatulea wrote:
>>>
>>> On Tue, 2023-05-23 at 17:55 +0200, Jesper Dangaard Brouer wrote:
>>>>
>>>> When the mlx5 driver runs an XDP program doing XDP_REDIRECT, then memory
>>>> is getting leaked. Other XDP actions, like XDP_DROP, XDP_PASS and XDP_TX,
>>>> work correctly. I tested both redirecting back out the same mlx5 device and
>>>> cpumap redirect (with XDP_PASS), which both cause leaking.
>>>>
>>>> After removing the XDP prog, which also causes the page_pool to be
>>>> released by mlx5, the leaks are visible via the page_pool periodic
>>>> inflight reports. I have this bpftrace[1] tool that I also use to detect
>>>> the problem faster (not waiting 60 sec for a report).
>>>>
>>>> [1]
>>>> https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/page_pool_track_shutdown01.bt
>>>>
>>>> I've been debugging and reading through the code for a couple of days,
>>>> but I've not found the root cause yet. I would appreciate new ideas
>>>> where to look and fresh eyes on the issue.
>>>>
>>>>
>>>> To Lin, it looks like mlx5 uses PP_FLAG_PAGE_FRAG, and my current
>>>> suspicion is that the mlx5 driver doesn't fully release the bias count
>>>> (hint: see MLX5E_PAGECNT_BIAS_MAX).
>>>>
>>>
>>> Thanks for the report, Jesper. Incidentally, I've just picked up this issue
>>> today as well.
>>>
>>> On XDP redirect and tx, the page is set to skip the bias counter release with
>>> the expectation that page_pool_put_defragged_page will be called from [1]. But,
>>> as I found out now, during XDP redirect only one fragment of the page is
>>> released in xdp core [2]. This is where the leak is coming from.
>>>
>>> We'll provide a fix soon.
>>>
>>> [1]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c#n665
>>>
>>> [2]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/core/xdp.c#n390
>>>
>>> Thanks,
>>> Dragos
>>>
>>>
>>
>

-- 
Podcast: https://www.linkedin.com/feed/update/urn:li:activity:7058793910227111937/
Dave Täht CSO, LibreQos
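
For readers following the quoted thread, the accounting Dragos describes (a page starts with a bias of fragment references, XDP core releases only one of them on redirect, and the driver skips its own release) can be illustrated with a toy model. This is only a sketch in Python, not kernel code: ToyPage, redirect, inflight and the value of BIAS_MAX are all hypothetical names and numbers chosen for illustration, and the second case merely shows what draining the leftover bias achieves. It does not model how commit 7abd955a58fb actually restructures the driver.

```python
BIAS_MAX = 8  # arbitrary illustrative value, NOT the driver's real constant


class ToyPage:
    """Hypothetical stand-in for a page whose fragment counter starts at a bias."""

    def __init__(self, bias=BIAS_MAX):
        self.frag_count = bias
        self.returned = False  # True once the page is back in the pool

    def release_frags(self, n):
        """Drop n fragment references; the page returns to the pool at zero."""
        if n > self.frag_count:
            raise ValueError("releasing more fragments than are held")
        self.frag_count -= n
        if self.frag_count == 0:
            self.returned = True


def redirect(page, driver_skips_release):
    """Model one redirected packet that lives in `page`."""
    frags_used = 1                   # the redirected packet holds one fragment
    page.release_frags(frags_used)   # the only release done outside the driver
    if not driver_skips_release:
        # The driver drains the bias it never handed out.
        page.release_frags(BIAS_MAX - frags_used)
    return page


def inflight(pages):
    """Count pages that never made it back to the pool."""
    return sum(1 for p in pages if not p.returned)


leaky = [redirect(ToyPage(), driver_skips_release=True) for _ in range(1000)]
drained = [redirect(ToyPage(), driver_skips_release=False) for _ in range(1000)]
print("inflight when the driver skips its release:", inflight(leaky))
print("inflight when the leftover bias is drained:", inflight(drained))
```

In the first case every page keeps BIAS_MAX - 1 references forever, which is the kind of non-zero inflight count the page_pool shutdown reports mentioned in the thread would surface.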