From: Herbert Wolverson
Date: Mon, 17 Oct 2022 09:59:36 -0500
Subject: Re: [LibreQoS] In BPF pping - so far
Cc: libreqos@lists.bufferbloat.net, Simon Sundberg

I have no doubt that logging is the biggest slow-down, followed by some
dumb things (e.g. I just significantly increased performance by not
accidentally copying addresses twice...). I'm honestly pleasantly
surprised by how performant the debug logging is!

In the short term, this is a fork. I'm not planning on keeping it that
way, but I'm early enough into the task that I need the freedom to
really mess things up without upsetting upstream. ;-) At some point very
soon, I'll post a temporary GitHub repo with the hacked-up, messy
version in it, with a view to getting more eyes on it before it
transforms into something more generally useful, and to cleaning up the
more embarrassing "written in a hurry" code.

The per-stream RTT buffer looks great. I'll definitely try to use that.
I was a little alarmed to discover that running clean-up on the kernel
side is practically impossible, making a management daemon a necessity
(since the XDP maps are long-lived, the packet timing is likely to keep
running whether or not LibreQoS is actively reading from it). A
ready-summarized buffer format makes a LOT of sense. At least until I
run out of memory. ;-)

Thanks,
Herbert

On Mon, Oct 17, 2022 at 9:13 AM Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> [ Adding Simon to Cc ]
>
> Herbert Wolverson via LibreQoS <libreqos@lists.bufferbloat.net> writes:
>
> > Hey,
> >
> > I've had some pretty good success with merging xdp-pping
> > ( https://github.com/xdp-project/bpf-examples/blob/master/pping/pping.h )
> > into xdp-cpumap-tc ( https://github.com/xdp-project/xdp-cpumap-tc ).
> >
> > I ported over most of the xdp-pping code, and then changed the entry
> > point and packet parsing code to make use of the work already done in
> > xdp-cpumap-tc (it has already parsed a big chunk of the packet, so
> > there's no need to do it twice). Then I switched the maps to per-CPU
> > maps, and had to pin them - otherwise the two tc instances don't
> > properly share data. Right now, output is just stubbed - I've still
> > got to port the perfmap output code. Instead, I'm dumping a bunch of
> > extra data to the kernel debug pipe, so I can see roughly what the
> > output would look like.
> >
> > With debug enabled and just logging, I'm now getting about 4.9
> > Gbits/sec on single-stream iperf between two VMs (with a shaper VM
> > in the middle). :-)
>
> Just FYI, that "just logging" is probably the biggest source of
> overhead, then. What Simon found was that sending the data from kernel
> to userspace is one of the most expensive bits of epping, at least
> when the number of data points goes up (which it does as additional
> flows are added).
>
> > So my question: how would you prefer to receive this data? I'll have
> > to write a daemon that provides userspace control (periodic cleanup
> > as well as reading the performance stream), so the world's kinda our
> > oyster. I can stick to Kathie's original format (and dump it to a
> > named pipe, perhaps?), a condensed format that only shows what you
> > want to use, an efficient binary format if you feel like parsing
> > that...
>
> It would be great if we could combine efforts a bit here so we don't
> fork the codebase more than we have to. I.e., if "upstream" epping and
> whatever daemon you end up writing can agree on a data format etc.,
> that would be fantastic! Added Simon to Cc to facilitate this :)
>
> Briefly, what I've discussed before with Simon was to have the ability
> to aggregate the metrics in the kernel (WiP PR [0]) and have a
> userspace utility periodically pull them out. What we discussed was
> doing this using an LPM map (which is not in that PR yet). The idea
> would be that userspace would populate the LPM map with the keys
> (prefixes) it wants statistics for (in a LibreQoS context that could
> be one key per customer, for instance). Epping would then do a map
> lookup into the LPM map, and if it gets a match it would update the
> statistics in that map entry (keeping a histogram of latency values
> seen, basically). Simon's PR below uses this technique: userspace
> "resets" the histogram every time it reads it by swapping out two
> different map entries on each read; this lets you control the sampling
> rate from userspace, and you just get the data since the last time you
> polled.
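
(Replying inline here.) To make sure I'm reading the LPM-map idea right,
I'm picturing something along these lines on the BPF side. This is only
a sketch built from your description, not the code from Simon's PR,
which I haven't read yet; every name in it (rtt_agg, rtt_hist,
record_rtt, HIST_BUCKETS, the max_entries value) is invented:

/* Sketch only, not the code from Simon's PR. All names are invented. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define HIST_BUCKETS 32          /* e.g. log2-spaced latency buckets */

struct lpm_key {
    __u32 prefixlen;             /* LPM-trie keys must start with prefixlen */
    __u32 addr;                  /* IPv4 only here; real epping also does v6 */
};

struct rtt_hist {
    __u64 samples;
    __u64 buckets[HIST_BUCKETS];
};

struct {
    __uint(type, BPF_MAP_TYPE_LPM_TRIE);
    __uint(map_flags, BPF_F_NO_PREALLOC);    /* required for LPM tries */
    __uint(max_entries, 16384);
    __type(key, struct lpm_key);
    __type(value, struct rtt_hist);
    __uint(pinning, LIBBPF_PIN_BY_NAME);     /* so a daemon can find it */
} rtt_agg SEC(".maps");

/* Called from the pping path once an RTT sample has been computed. */
static __always_inline void record_rtt(__u32 daddr, __u64 rtt_ns)
{
    struct lpm_key key = { .prefixlen = 32, .addr = daddr };
    struct rtt_hist *hist = bpf_map_lookup_elem(&rtt_agg, &key);
    __u64 usec = rtt_ns / 1000;
    __u32 bucket = 0;

    if (!hist)
        return;                  /* prefix not configured by userspace */

    /* Crude log2 bucketing; bounded loop, so it needs a 5.3+ kernel. */
    while (usec > 1 && bucket < HIST_BUCKETS - 1) {
        usec >>= 1;
        bucket++;
    }
    if (bucket >= HIST_BUCKETS)
        bucket = HIST_BUCKETS - 1;

    __sync_fetch_and_add(&hist->samples, 1);
    __sync_fetch_and_add(&hist->buckets[bucket], 1);
}

If I'm reading the map types right there's no per-CPU flavour of the LPM
trie, so I've guessed at atomic adds on the shared entry; that's the
part I'd most like a sanity check on. The two-entry swap you describe
for resetting isn't shown above.
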
> I was thinking that if we all can agree on the map format, then your
> polling daemon could be one userspace "client" for that, and the
> epping binary itself could be another; but we could keep compatibility
> between the two, so we don't duplicate effort.
>
> Similarly, refactoring of the epping code itself so it can be plugged
> into the cpumap-tc code would be a good goal...
>
> -Toke
>
> [0] https://github.com/xdp-project/bpf-examples/pull/59
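
And on the daemon side I'm picturing something roughly like the below,
reusing the same made-up struct and map names as the sketch above; the
pin path under /sys/fs/bpf is also just an assumption:

/* Equally rough sketch of the polling side; names and pin path assumed. */
#include <stdio.h>
#include <arpa/inet.h>
#include <linux/types.h>
#include <bpf/bpf.h>

#define HIST_BUCKETS 32

struct lpm_key  { __u32 prefixlen; __u32 addr; };
struct rtt_hist { __u64 samples; __u64 buckets[HIST_BUCKETS]; };

int main(void)
{
    int fd = bpf_obj_get("/sys/fs/bpf/rtt_agg");   /* assumed pin path */
    struct lpm_key cur, next;
    struct rtt_hist hist;
    void *prev = NULL;
    char ip[INET_ADDRSTRLEN];

    if (fd < 0) {
        perror("bpf_obj_get");
        return 1;
    }

    /* Walk every configured prefix and dump its sample count. A real
     * daemon would do this periodically, and would also do the
     * two-entry swap described above to reset histograms between
     * polls. */
    while (bpf_map_get_next_key(fd, prev, &next) == 0) {
        if (bpf_map_lookup_elem(fd, &next, &hist) == 0) {
            inet_ntop(AF_INET, &next.addr, ip, sizeof(ip));
            printf("%s/%u: %llu samples\n", ip, next.prefixlen,
                   (unsigned long long)hist.samples);
        }
        cur = next;
        prev = &cur;
    }
    return 0;
}

Does that roughly match what you and Simon had in mind? If so, agreeing
on the key/value layout is really the only contract we need between the
daemon and upstream epping.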