[LibreQoS] Integration system, aka fun with graph theory

Mon Oct 31 17:19:53 EDT 2022

I'd agree with color coding (when it exists - no rush, IMO) being
configurable.
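
A minimal sketch of what configurable color coding might look like. The
defaults are the bands proposed in the quoted thread below. The function
name, the data layout, and the rural override are hypothetical
illustrations, not anything that exists in LibreQoS today.

```python
# Default bands: (upper bound in ms, color), taken from the proposal
# in the quoted thread. Anything at or above the last bound is "red".
DEFAULT_BANDS = [
    (8, "blue"),
    (20, "green"),
    (50, "yellow"),
    (70, "orange"),
]
DEFAULT_WORST = "red"


def latency_color(rtt_ms, bands=DEFAULT_BANDS, worst=DEFAULT_WORST):
    """Return the color of the first band whose upper bound exceeds rtt_ms."""
    for upper_ms, color in bands:
        if rtt_ms < upper_ms:
            return color
    return worst


# Hypothetical override for a rural plan where anything under 100 ms
# counts as good. The 150 ms yellow band is an invented example value.
RURAL_BANDS = [(100, "green"), (150, "yellow")]

print(latency_color(65))               # orange with the default bands
print(latency_color(65, RURAL_BANDS))  # green with the rural override
```

Keeping the bands as plain data means an operator could load them from a
config file without touching the classification logic.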

From the "how much delay are we adding" discussion earlier, I thought I'd
do a little bit of profiling of the BPF programs themselves. This is with
the latest round of performance updates (
https://github.com/thebracket/cpumap-pping/issues/2), so it's not measuring
anything in production. I simply added a call to get the clock at the
start and again at the end, and logged the difference, for both the XDP
and TC BPF programs. Execution goes: (packet arrives)->(XDP cpumap sends it
to the right CPU)->(egress)->(TC sends it to the right classifier, on the
correct CPU, and measures RTT latency). This adds two clock
checks and a debug log entry to execution time, so the measurement itself
slows things down.

The results are interesting, and mostly tell me to try a different
measurement system. I'm seeing a pretty wide variance. Hammering it with an
iperf session and a queue capped at 5 gbit/s: most of the TC timings were
40 nanoseconds - not a packet that requires extra tracking, already in
cache, so proceed. When the TCP RTT tracker fired and recorded a
performance event, it peaked at 5,900 nanoseconds. So the TC BPF program
seems to be adding a worst case of 0.0059 ms to packet times. The XDP side
of things is typically in the 300-400 nanosecond range; I saw a handful of
worst-case numbers in the 3,400 nanosecond range. So the XDP side is adding
0.0034 ms. So - assuming worst case (and keeping the overhead added by the
not-so-great monitoring), we're adding *0.0093 ms* to packet transit time
with the BPF programs.
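
The worst-case arithmetic can be checked with a few lines. This is only a
sketch of the unit conversion, using the 5,900 ns (TC) and 3,400 ns (XDP)
peaks quoted above; the constant and function names are made up for
illustration.

```python
# Worst-case timings from the 5 gbit/s test, in nanoseconds.
TC_WORST_NS = 5_900
XDP_WORST_NS = 3_400

NS_PER_MS = 1_000_000


def ns_to_ms(ns):
    """Convert nanoseconds to milliseconds."""
    return ns / NS_PER_MS


total_ns = TC_WORST_NS + XDP_WORST_NS

print(f"TC:    {ns_to_ms(TC_WORST_NS):.4f} ms")
print(f"XDP:   {ns_to_ms(XDP_WORST_NS):.4f} ms")
print(f"Total: {ns_to_ms(total_ns):.4f} ms")  # 9,300 ns in total
```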

With a much more sedate queue (ceiling 500 mbit/s), I saw much more
consistent numbers. The vast majority of XDP timings were in the 75-150
nanosecond range, and TC was a consistent 50-55 nanoseconds when it didn't
have an update to perform - peaking very occasionally at 1500 nanoseconds.
Only adding 0.00155 ms to packet times is pretty good.

It definitely performs best on long streams, probably because the previous
lookups are all in cache. This is also making me question the answer I
found to "how long does it take to read the clock?" I'd seen ballpark
estimates of 53 nanoseconds. Given that this reads the clock twice and the
whole TC path often measured only 40-55 nanoseconds, that can't be right.
(I'm *really* not sure how to measure that one)
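
One common way to approximate the cost of a clock read is to read the clock
back-to-back many times and look at the smallest and median gaps. The
sketch below shows the method in user-space Python with
time.perf_counter_ns(). That is an assumption made for illustration only:
the interpreter overhead will dwarf the real clock cost, and the BPF
programs would be using a kernel helper (presumably something like
bpf_ktime_get_ns(), though that is not stated here), so the numbers are
not comparable to the timings above.

```python
import statistics
import time


def clock_read_deltas(samples=100_000):
    """Return the gaps (in ns) between consecutive back-to-back clock reads."""
    read = time.perf_counter_ns  # local alias avoids repeated attribute lookups
    deltas = []
    for _ in range(samples):
        start = read()
        end = read()
        deltas.append(end - start)
    return deltas


deltas = clock_read_deltas()

# The minimum approximates the best-case cost of one read plus loop overhead;
# the median is less sensitive to scheduler noise than the mean or the max.
print(f"min:    {min(deltas)} ns")
print(f"median: {statistics.median(deltas)} ns")
print(f"max:    {max(deltas)} ns")
```

The same back-to-back approach should carry over to a compiled language or
to a BPF program, where the result would be much closer to the true cost.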

Again - not a great test (I'll have to learn the perf system to do this
properly - which in turn opens up the potential for flame graphs and some
proper tracing). Interesting ballpark, though.

On Mon, Oct 31, 2022 at 10:56 AM dan <dandenson at gmail.com> wrote:

>
>
> On Sun, Oct 30, 2022 at 8:21 PM Dave Taht via LibreQoS <
> libreqos at lists.bufferbloat.net> wrote:
>
>> How about the idea of "metaverse-ready" metrics, with one table that is
>> preseem-like and another that's
>>
>> blue =  < 8ms
>> green = < 20ms
>> yellow = < 50ms
>> orange  = < 70ms
>> red = > 70ms
>>
>
> These need to be configurable.  There are a lot of WISPs that would have
> everything orange/red.  We're considering anything under 100ms good on the
> rural plans.   Also keep in mind that if you're tracking latency via pping
> etc, then you need some buffer in there for the internet at large.  <70ms
> to Amazon is one thing, they're very well connected, but <70ms to most of
> the internet probably isn't very realistic and would make most charts look
> like poop.
>