[LibreQoS] Fwd: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata

Dave Taht dave.taht at gmail.com
Thu Dec 8 20:27:53 EST 2022


---------- Forwarded message ---------
From: Toke Høiland-Jørgensen <toke at redhat.com>
Date: Thu, Dec 8, 2022 at 3:06 PM
Subject: Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
To: Stanislav Fomichev <sdf at google.com>, <bpf at vger.kernel.org>
Cc: <ast at kernel.org>, <daniel at iogearbox.net>, <andrii at kernel.org>,
<martin.lau at linux.dev>, <song at kernel.org>, <yhs at fb.com>,
<john.fastabend at gmail.com>, <kpsingh at kernel.org>, <sdf at google.com>,
<haoluo at google.com>, <jolsa at kernel.org>, Saeed Mahameed
<saeedm at nvidia.com>, David Ahern <dsahern at gmail.com>, Jakub Kicinski
<kuba at kernel.org>, Willem de Bruijn <willemb at google.com>, Jesper
Dangaard Brouer <brouer at redhat.com>, Anatoly Burakov
<anatoly.burakov at intel.com>, Alexander Lobakin
<alexandr.lobakin at intel.com>, Magnus Karlsson
<magnus.karlsson at gmail.com>, Maryam Tahhan <mtahhan at redhat.com>,
<xdp-hints at xdp-project.net>, <netdev at vger.kernel.org>


Stanislav Fomichev <sdf at google.com> writes:

> From: Toke Høiland-Jørgensen <toke at redhat.com>
>
> Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> XDP ctx to do this.

So I finally managed to get enough ducks in a row to actually benchmark
this. With the caveat that I suddenly can't get the timestamp support to
work (it was working in an earlier version, but now
timestamp_supported() just returns false). I'm not sure if this is an
issue with the enablement patch, or if I just haven't gotten the
hardware configured properly. I'll investigate some more, but figured
I'd post these results now:

Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
Overhead:                   1,754,153 pps /  2.86 ns/pkt
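For readers checking the table: the ns/pkt column is the reciprocal of the packet rate (10^9 / pps). The overhead row is the difference between the two rows in each unit. The short sketch below reproduces those figures from the two reported rates. The variable names are illustrative only.

```python
# Reproduce the ns/pkt and overhead figures from the reported packet rates.
NS_PER_SEC = 1e9

def ns_per_pkt(pps: float) -> float:
    """Average time per packet implied by a packet rate (reciprocal of pps)."""
    return NS_PER_SEC / pps

baseline_pps = 25_678_262   # XDP_DROP only
metadata_pps = 23_924_109   # XDP_DROP + reading metadata via kfuncs

baseline_ns = ns_per_pkt(baseline_pps)
metadata_ns = ns_per_pkt(metadata_pps)

print(f"baseline: {baseline_ns:.2f} ns/pkt")
print(f"metadata: {metadata_ns:.2f} ns/pkt")
print(f"overhead: {baseline_pps - metadata_pps:,} pps / "
      f"{metadata_ns - baseline_ns:.2f} ns/pkt")
```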

As per the above, this is with calling three kfuncs/pkt
(metadata_supported(), rx_hash_supported() and rx_hash()). So that's
~0.95 ns per function call, which is a bit less than, but not far off
from, the ~1.2 ns that I'm used to. The tests where I accidentally
called the default kfuncs made one fewer kfunc call and came in ~1.3 ns
faster, so it's definitely in that ballpark.
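The per-call estimate is the measured per-packet overhead divided by the number of kfunc calls per packet. This rough sketch assumes that all of the 2.86 ns overhead comes from the three calls themselves. The ~1.2 ns reference value is the one quoted in the text.

```python
# Rough per-kfunc cost, assuming the whole per-packet overhead is due to the calls.
overhead_ns = 2.86      # measured overhead per packet (from the table above)
kfunc_calls = 3         # metadata_supported(), rx_hash_supported(), rx_hash()
typical_call_ns = 1.2   # per-call cost the author is used to seeing

per_call_ns = overhead_ns / kfunc_calls
print(f"~{per_call_ns:.2f} ns per kfunc call "
      f"(vs ~{typical_call_ns} ns typical)")
```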

I'm not doing anything with the data, just reading it into an on-stack
buffer, so this is the smallest possible delta from just getting the
data out of the driver. I did confirm that the call instructions are
still in the BPF program bytecode when it's dumped back out from the
kernel.

-Toke



-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

