Not sure what driver our friends in NZ are using...

---------- Forwarded message ---------
From: Jesper Dangaard Brouer
Date: Tue, May 23, 2023, 9:55 AM
Subject: mlx5 XDP redirect leaking memory on kernel 6.3
To: Dragos Tatulea, Saeed Mahameed, Saeed Mahameed, Tariq Toukan, Tariq Toukan, Netdev, Yunsheng Lin
Cc: <kheib@redhat.com>, Jiri Benc, bpf, Felix Maurer, Alexander Duyck <alexander.duyck@gmail.com>, Ilias Apalodimas, Lorenzo Bianconi, Maxim Mikityanskiy <maxtram95@gmail.com>

When the mlx5 driver runs an XDP program doing XDP_REDIRECT, memory is leaked. Other XDP actions, like XDP_DROP, XDP_PASS and XDP_TX, work correctly. I tested both redirecting back out the same mlx5 device and cpumap redirect (with XDP_PASS); both leak.

After removing the XDP prog, which also causes the page_pool to be released by mlx5, the leaks are visible via the page_pool periodic inflight reports. I have this bpftrace[1] tool that I also use to detect the problem faster (not waiting 60 sec for a report).

 [1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/page_pool_track_shutdown01.bt

I've been debugging and reading through the code for a couple of days, but I've not found the root cause yet. I would appreciate new ideas on where to look and fresh eyes on the issue.

To Lin: it looks like mlx5 uses PP_FLAG_PAGE_FRAG, and my current suspicion is that the mlx5 driver doesn't fully release the bias count (hint: see MLX5E_PAGECNT_BIAS_MAX).

--Jesper

Extra info about my device. I'm providing these because the mlx5 driver can have different allocation modes depending on HW and device priv-flags setup.

$ ethtool --show-priv-flags mlx5p1
Private flags for mlx5p1:
rx_cqe_moder       : on
tx_cqe_moder       : off
rx_cqe_compress    : off
rx_striding_rq     : on
rx_no_csum_complete: off
xdp_tx_mpwqe       : on
skb_tx_mpwqe       : on
tx_port_ts         : off

$ ethtool -i mlx5p1
driver: mlx5_core
version: 6.4.0-rc2-net-next-vm-lock-dbg+
firmware-version: 16.23.1020 (MT_0000000009)
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

$ lspci -v | grep 03:00.0
03:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
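
For anyone not familiar with the bias-count idea behind the suspicion in the message above, here is a toy model of it in Python. It is only a sketch of the general technique (pre-charge a page's fragment counter with a large bias, hand references out to packets, drain the unused remainder on release), not the mlx5 code and not a diagnosis. The class, the method names, the `BIAS_MAX` value and the "forgot to drain one reference" bug are all hypothetical and chosen just to show how a page can stay in flight forever if the bias is not fully released.

```python
# Toy model only: names and numbers are illustrative, not taken from the mlx5 driver.
BIAS_MAX = 64  # hypothetical stand-in for MLX5E_PAGECNT_BIAS_MAX


class FragPage:
    """One page whose fragment counter is pre-charged with a bias."""

    def __init__(self):
        self.frag_count = BIAS_MAX  # driver takes BIAS_MAX references up front
        self.handed_out = 0         # fragments given to packets so far

    def hand_out_fragment(self):
        # The driver passes one of its pre-charged references to a packet.
        self.handed_out += 1

    def consumer_release(self):
        # Whoever consumed the packet drops the reference it was given.
        self.frag_count -= 1

    def driver_release(self, drain_fully=True):
        # The driver returns the references it never handed out.
        unused = BIAS_MAX - self.handed_out
        if not drain_fully:
            unused -= 1  # hypothetical bug: one bias reference is never dropped
        self.frag_count -= unused

    def in_flight(self):
        # The pool only gets the page back once the counter reaches zero.
        return self.frag_count != 0


def run(drain_fully):
    """Hand out three fragments, release them, then release the page."""
    page = FragPage()
    for _ in range(3):
        page.hand_out_fragment()
    for _ in range(3):
        page.consumer_release()
    page.driver_release(drain_fully)
    return page.in_flight()


print(run(drain_fully=True))   # False: counter reached zero, page returned
print(run(drain_fully=False))  # True: one reference left, page stays in flight
```

In this model a single undrained reference is enough to keep the page counted as in flight after the pool is asked to shut down, which is the kind of symptom the periodic inflight reports would show. Whether that is what actually happens on the XDP_REDIRECT path is exactly the open question.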