[LibreQoS] In BPF pping - so far

Mon Oct 17 10:13:26 EDT 2022

[ Adding Simon to Cc ]

Herbert Wolverson via LibreQoS <libreqos at lists.bufferbloat.net> writes:

> Hey,
>
> I've had some pretty good success with merging xdp-pping (
> https://github.com/xdp-project/bpf-examples/blob/master/pping/pping.h )
> into xdp-cpumap-tc ( https://github.com/xdp-project/xdp-cpumap-tc ).
>
> I ported over most of the xdp-pping code, and then changed the entry point
> and packet parsing code to make use of the work already done in
> xdp-cpumap-tc (it's already parsed a big chunk of the packet, no need to do
> it twice). Then I switched the maps to per-cpu maps, and had to pin them -
> otherwise the two tc instances don't properly share data. Right now, output
> is just stubbed - I've still got to port the perfmap output code. Instead,
> I'm dumping a bunch of extra data to the kernel debug pipe, so I can see
> roughly what the output would look like.
>
> With debug enabled and just logging I'm now getting about 4.9 Gbits/sec on
> single-stream iperf between two VMs (with a shaper VM in the middle). :-)

Just FYI, that "just logging" is probably the biggest source of
overhead, then. What Simon found was that sending the data from kernel
to userspace is one of the most expensive parts of epping, at least when
the number of data points goes up (which it does as additional flows are
added).

> So my question: how would you prefer to receive this data? I'll have to
> write a daemon that provides userspace control (periodic cleanup as well as
> reading the performance stream), so the world's kinda our oyster. I can
> stick to Kathie's original format (and dump it to a named pipe, perhaps?),
> a condensed format that only shows what you want to use, an efficient
> binary format if you feel like parsing that...

It would be great if we could combine efforts a bit here so we don't
fork the codebase more than we have to. I.e., if "upstream" epping and
whatever daemon you end up writing can agree on the data format etc.,
that would be fantastic! I've added Simon to Cc to facilitate this :)

Briefly, what I've discussed with Simon before is the ability to
aggregate the metrics in the kernel (WiP PR [0]) and have a userspace
utility periodically pull them out. We discussed doing this using an
LPM map (which is not in that PR yet). The idea would be that userspace
populates the LPM map with the keys (prefixes) it wants statistics for
(in a LibreQoS context that could be one key per customer, for
instance). Epping would then do a lookup in the LPM map, and if it gets
a match it would update the statistics in that map entry (basically,
keeping a histogram of the latency values seen). Simon's PR [0] uses a
technique where userspace "resets" the histogram every time it reads it,
by swapping between two different map entries on each read. This lets
you control the sampling rate from userspace, and each poll returns only
the data gathered since the previous one.

I was thinking that if we can all agree on the map format, then your
polling daemon could be one userspace "client" for it and the epping
binary itself could be another, with the two kept compatible so we
don't duplicate effort.

Similarly, refactoring the epping code itself so it can be plugged
into the cpumap-tc code would be a good goal...

-Toke

[0] https://github.com/xdp-project/bpf-examples/pull/59