From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
From: Dave Taht
Date: Tue, 23 May 2023 11:00:19 -0600
To: libreqos
Content-Type: multipart/alternative; boundary="00000000000030e2db05fc5f537b"
Subject: [LibreQoS] Fwd: mlx5 XDP redirect leaking memory on kernel 6.3
X-BeenThere: libreqos@lists.bufferbloat.net
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Many ISPs need the kinds of quality shaping cake can do
X-List-Received-Date: Tue, 23 May 2023 17:00:33 -0000

--00000000000030e2db05fc5f537b
Content-Type: text/plain; charset="UTF-8"

Not sure what driver our friends in NZ are using...

---------- Forwarded message ---------
From: Jesper Dangaard Brouer <jbrouer@redhat.com>
Date: Tue, May 23, 2023, 9:55 AM
Subject: mlx5 XDP redirect leaking memory on kernel 6.3
To: Dragos Tatulea <dtatulea@nvidia.com>, Saeed Mahameed <saeed@kernel.org>, Saeed Mahameed <saeedm@nvidia.com>, Tariq Toukan <tariqt@nvidia.com>, Tariq Toukan <ttoukan.linux@gmail.com>, Netdev <netdev@vger.kernel.org>, Yunsheng Lin <linyunsheng@huawei.com>
Cc: <brouer@redhat.com>, <atzin@redhat.com>, <mkabat@redhat.com>, <kheib@redhat.com>, Jiri Benc <jbenc@redhat.com>, bpf <bpf@vger.kernel.org>, Felix Maurer <fmaurer@redhat.com>, Alexander Duyck <alexander.duyck@gmail.com>, Ilias Apalodimas <ilias.apalodimas@linaro.org>, Lorenzo Bianconi <lorenzo@kernel.org>, Maxim Mikityanskiy <maxtram95@gmail.com>

When the mlx5 driver runs an XDP program doing XDP_REDIRECT, memory is
leaked. Other XDP actions, like XDP_DROP, XDP_PASS and XDP_TX, work
correctly. I tested both redirecting back out the same mlx5 device and
cpumap redirect (with XDP_PASS); both leak.

After removing the XDP prog, which also causes the page_pool to be
released by mlx5, the leaks are visible via the page_pool periodic
inflight reports. I have this bpftrace[1] tool that I also use to detect
the problem faster (not waiting 60 sec for a report).

  [1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/page_pool_track_shutdown01.bt

I've been debugging and reading through the code for a couple of days,
but I've not found the root cause yet. I would appreciate new ideas on
where to look and fresh eyes on the issue.

To Lin, it looks like mlx5 uses PP_FLAG_PAGE_FRAG, and my current
suspicion is that the mlx5 driver doesn't fully release the bias count
(hint: see MLX5E_PAGECNT_BIAS_MAX).

--Jesper

Extra info about my device. Providing these as the mlx5 driver can have
different allocation modes depending on HW and device priv-flags setup.

$ ethtool --show-priv-flags mlx5p1
Private flags for mlx5p1:
rx_cqe_moder       : on
tx_cqe_moder       : off
rx_cqe_compress    : off
rx_striding_rq     : on
rx_no_csum_complete: off
xdp_tx_mpwqe       : on
skb_tx_mpwqe       : on
tx_port_ts         : off

$ ethtool -i mlx5p1
driver: mlx5_core
version: 6.4.0-rc2-net-next-vm-lock-dbg+
firmware-version: 16.23.1020 (MT_0000000009)
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

$ lspci -v | grep 03:00.0
03:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]

--00000000000030e2db05fc5f537b
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Not sure what driver our friends in NZ are using...

---------- Forwarded message ---------
From: Jesper Dangaard Brouer <jbrouer@redhat.com>
Date: Tue, May 23, 2023, 9:55 AM
Subject: mlx5 XDP redirect leaking memory on kernel 6.3
To: Dragos Tatulea <dtatulea@nvidia.com>, Saeed Mahameed <saeed@kernel.org>, Saeed Mahameed <saeedm@nvidia.com>, Tariq Toukan <tariqt@nvidia.com>, Tariq Toukan <ttoukan.linux@gmail.com>, Netdev <netdev@vger.kernel.org>, Yunsheng Lin <linyunsheng@huawei.com>
Cc: <brouer@redhat.com>, <atzin@redhat.com>, <mkabat@redhat.com>, <kheib@redhat.com>, Jiri Benc <jbenc@redhat.com>, bpf <bpf@vger.kernel.org>, Felix Maurer <fmaurer@redhat.com>, Alexander Duyck <alexander.duyck@gmail.com>, Ilias Apalodimas <ilias.apalodimas@linaro.org>, Lorenzo Bianconi <lorenzo@kernel.org>, Maxim Mikityanskiy <maxtram95@gmail.com>



When the mlx5 driver runs an XDP program doing XDP_REDIRECT, memory is leaked. Other XDP actions, like XDP_DROP, XDP_PASS and XDP_TX, work correctly. I tested both redirecting back out the same mlx5 device and cpumap redirect (with XDP_PASS); both leak.

After removing the XDP prog, which also causes the page_pool to be
released by mlx5, the leaks are visible via the page_pool periodic
inflight reports. I have this bpftrace[1] tool that I also use to detect the problem faster (not waiting 60 sec for a report).
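
To make the "inflight" figure concrete, here is a minimal sketch of hold/release accounting in Python. It is a toy model, not kernel or mlx5 code: the class, function names, and counters are hypothetical, and the only assumption carried over is the general idea that a pool counts pages handed out versus pages returned and reports the difference at shutdown.

```python
U32_MASK = 0xFFFFFFFF


def distance(a: int, b: int) -> int:
    """Signed 32-bit distance a - b, tolerant of u32 counter wraparound."""
    diff = (a - b) & U32_MASK
    return diff - (1 << 32) if diff & 0x80000000 else diff


class ToyPagePool:
    """Hypothetical pool that only counts pages handed out and returned."""

    def __init__(self) -> None:
        self.hold_cnt = 0      # pages given to the driver
        self.release_cnt = 0   # pages that came back

    def alloc_page(self) -> None:
        self.hold_cnt = (self.hold_cnt + 1) & U32_MASK

    def return_page(self) -> None:
        self.release_cnt = (self.release_cnt + 1) & U32_MASK

    def inflight(self) -> int:
        # Non-zero after shutdown means some pages never came back.
        return distance(self.hold_cnt, self.release_cnt)


def simulate(pages: int, leaked: int) -> int:
    """Allocate `pages`, return all but `leaked`, report what is in flight."""
    pool = ToyPagePool()
    for _ in range(pages):
        pool.alloc_page()
    for _ in range(pages - leaked):
        pool.return_page()
    return pool.inflight()


if __name__ == "__main__":
    print("clean shutdown inflight:", simulate(pages=64, leaked=0))
    print("leaky shutdown inflight:", simulate(pages=64, leaked=3))
```

In this model a leak shows up as an inflight count that stays above zero after every consumer should have returned its pages, which is the kind of signal the periodic report surfaces.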

  [1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/page_pool_track_shutdown01.bt

I've been debugging and reading through the code for a couple of days,
but I've not found the root cause yet. I would appreciate new ideas on where to look and fresh eyes on the issue.

To Lin, it looks like mlx5 uses PP_FLAG_PAGE_FRAG, and my current
suspicion is that the mlx5 driver doesn't fully release the bias count
(hint: see MLX5E_PAGECNT_BIAS_MAX).
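
The suspected failure mode can be illustrated with a toy model of bias-based fragment counting. This is a sketch under stated assumptions, not the mlx5 implementation: `TOY_BIAS_MAX`, `ToyFragPage`, and `page_is_freed` are hypothetical names, and the bias value is arbitrary. The assumption is that a page is pre-charged with a large fragment count, fragments are handed out cheaply, and the unused part of the bias must be drained when the driver is done with the page.

```python
TOY_BIAS_MAX = 64  # arbitrary illustrative value, not the driver's constant


class ToyFragPage:
    """Hypothetical page whose fragment count is pre-charged with a bias."""

    def __init__(self) -> None:
        self.frag_count = TOY_BIAS_MAX  # references charged up front
        self.handed_out = 0             # fragments actually given away

    def take_frag(self) -> None:
        # Handing out a fragment only bumps a driver-local counter.
        self.handed_out += 1

    def put_frag(self) -> None:
        # A consumer returns one fragment.
        self.frag_count -= 1

    def driver_release(self, drain_unused: bool) -> None:
        # The driver is done with the page; the unused bias must be dropped.
        if drain_unused:
            self.frag_count -= TOY_BIAS_MAX - self.handed_out


def page_is_freed(n_frags: int, drain_unused: bool) -> bool:
    """Return True if the page's count reaches zero after all frags return."""
    page = ToyFragPage()
    for _ in range(n_frags):
        page.take_frag()
    page.driver_release(drain_unused)
    for _ in range(n_frags):
        page.put_frag()
    return page.frag_count == 0


if __name__ == "__main__":
    print("bias drained:     freed =", page_is_freed(4, drain_unused=True))
    print("bias not drained: freed =", page_is_freed(4, drain_unused=False))
```

If any path skips the drain step, the count never reaches zero and the page stays in flight, which would match the symptom described above for that path only.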

--Jesper


Extra info about my device. Providing these as the mlx5 driver can have
different allocation modes depending on HW and device priv-flags setup.

$ ethtool --show-priv-flags mlx5p1
Private flags for mlx5p1:
rx_cqe_moder       : on
tx_cqe_moder       : off
rx_cqe_compress    : off
rx_striding_rq     : on
rx_no_csum_complete: off
xdp_tx_mpwqe       : on
skb_tx_mpwqe       : on
tx_port_ts         : off

$ ethtool -i mlx5p1
driver: mlx5_core
version: 6.4.0-rc2-net-next-vm-lock-dbg+
firmware-version: 16.23.1020 (MT_0000000009)
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

$ lspci -v | grep 03:00.0
03:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]


--00000000000030e2db05fc5f537b--