[LibreQoS] Fwd: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata

Dave Taht dave.taht at gmail.com
Thu Dec 8 20:27:53 EST 2022

---------- Forwarded message ---------
From: Toke Høiland-Jørgensen <toke at redhat.com>
Date: Thu, Dec 8, 2022 at 3:06 PM
Subject: Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
To: Stanislav Fomichev <sdf at google.com>, <bpf at vger.kernel.org>
Cc: <ast at kernel.org>, <daniel at iogearbox.net>, <andrii at kernel.org>,
<martin.lau at linux.dev>, <song at kernel.org>, <yhs at fb.com>,
<john.fastabend at gmail.com>, <kpsingh at kernel.org>, <sdf at google.com>,
<haoluo at google.com>, <jolsa at kernel.org>, Saeed Mahameed
<saeedm at nvidia.com>, David Ahern <dsahern at gmail.com>, Jakub Kicinski
<kuba at kernel.org>, Willem de Bruijn <willemb at google.com>, Jesper
Dangaard Brouer <brouer at redhat.com>, Anatoly Burakov
<anatoly.burakov at intel.com>, Alexander Lobakin
<alexandr.lobakin at intel.com>, Magnus Karlsson
<magnus.karlsson at gmail.com>, Maryam Tahhan <mtahhan at redhat.com>,
<xdp-hints at xdp-project.net>, <netdev at vger.kernel.org>

Stanislav Fomichev <sdf at google.com> writes:

> From: Toke Høiland-Jørgensen <toke at redhat.com>
> Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> XDP ctx to do this.

So I finally managed to get enough ducks in a row to actually benchmark
this. With the caveat that I suddenly can't get the timestamp support to
work (it was working in an earlier version, but now
timestamp_supported() just returns false). I'm not sure if this is an
issue with the enablement patch, or if I just haven't gotten the
hardware configured properly. I'll investigate some more, but figured
I'd post these results now:

Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
Overhead:                   1,754,153 pps /  2.86 ns/pkt

As per the above, this is with calling three kfuncs/pkt
(metadata_supported(), rx_hash_supported() and rx_hash()). So that's
~0.95 ns per function call, which is a bit less, but not far off from
the ~1.2 ns that I'm used to. The tests where I accidentally called the
default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
definitely in that ballpark.
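
For anyone who wants to re-check the arithmetic, here is a minimal sketch that derives the ns/pkt and per-call figures from the two pps numbers above. The function and variable names are made up for illustration, and it assumes the whole delta is attributable to the three kfunc calls, split evenly between them.

```python
# Sketch only: names are hypothetical, the inputs are the pps figures
# quoted in the benchmark results above.

def ns_per_packet(pps):
    """Convert a packets-per-second rate into nanoseconds per packet."""
    return 1e9 / pps

baseline_pps = 25_678_262   # XDP_DROP only
metadata_pps = 23_924_109   # XDP_DROP + read metadata
kfunc_calls = 3             # metadata_supported(), rx_hash_supported(), rx_hash()

baseline_ns = ns_per_packet(baseline_pps)
metadata_ns = ns_per_packet(metadata_pps)
overhead_ns = metadata_ns - baseline_ns

# Assumption: the overhead is divided evenly across the kfunc calls.
per_call_ns = overhead_ns / kfunc_calls

print(f"baseline:  {baseline_ns:.2f} ns/pkt")
print(f"metadata:  {metadata_ns:.2f} ns/pkt")
print(f"overhead:  {overhead_ns:.2f} ns/pkt")
print(f"per kfunc: {per_call_ns:.2f} ns")
```

The even split is a simplification; the individual calls need not cost the same.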

I'm not doing anything with the data, just reading it into an on-stack
buffer, so this is the smallest possible delta from just getting the
data out of the driver. I did confirm that the call instructions are
still in the BPF program bytecode when it's dumped back out from the


This song goes out to all the folk that thought Stadia would work:
Dave Täht CEO, TekLibre, LLC
