[LibreQoS] Integration system, aka fun with graph theory

dan dandenson at gmail.com
Mon Oct 31 19:31:59 EDT 2022


Preseem's numbers are 0-74 green, 75-124 yellow, and 125-200 red, and they just
consolidate everything >200 down to 200, basically so there's no 'terrible'
color lol.  I think these numbers are reasonable for standard internet
service these days, for a 'default' value anyway.  >100ms isn't bad
service for most people, and most WISPs will have a LOT of traffic coming
through with >100ms from the far reaches of the internet.
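
Roughly what I mean, as a minimal sketch (not Preseem or LibreQoS code - the
cutoffs are just the defaults quoted above and would be operator-configurable):

    /* Bucket a measured RTT into a display color.  Cutoffs mirror the
     * Preseem-style defaults above: 0-74 green, 75-124 yellow, 125+ red,
     * with the displayed value clamped at 200 ms. */
    #include <stdio.h>

    enum rtt_color { RTT_GREEN, RTT_YELLOW, RTT_RED };

    static enum rtt_color rtt_bucket(double rtt_ms)
    {
        if (rtt_ms < 75.0)
            return RTT_GREEN;
        if (rtt_ms < 125.0)
            return RTT_YELLOW;
        return RTT_RED;
    }

    int main(void)
    {
        const char *names[] = { "green", "yellow", "red" };
        double samples[] = { 12.0, 90.0, 180.0, 450.0 };
        for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
            /* display clamp: everything above 200 ms is shown as 200 */
            double shown = samples[i] > 200.0 ? 200.0 : samples[i];
            printf("%6.1f ms -> %s\n", shown, names[rtt_bucket(samples[i])]);
        }
        return 0;
    }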

Maybe just use reasonable defaults like Preseem's for the integrated 'generic'
tracking, but then have a separate graph hitting some target services - i.e.,
try to get game servers on there, plus AWS, Cloudflare, Azure, and Google
Cloud.  Show a radar graphic or similar.
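
For that target-services graph, something like this sketch could feed it (not
LibreQoS code, and the host list is made up - a real version would probably
reuse the pping RTT data or ICMP instead of timing TCP connects):

    /* Rough latency probe: time a TCP handshake (~1 RTT) to a few
     * well-connected services.  DNS is resolved before the timing starts;
     * a negative result means the probe failed. */
    #include <stdio.h>
    #include <time.h>
    #include <netdb.h>
    #include <unistd.h>
    #include <sys/socket.h>

    static double connect_ms(const char *host, const char *port)
    {
        struct addrinfo hints = {0}, *res;
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &res) != 0)
            return -1.0;

        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0) {
            freeaddrinfo(res);
            return -1.0;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        int rc = connect(fd, res->ai_addr, res->ai_addrlen);  /* SYN/SYN-ACK */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        close(fd);
        freeaddrinfo(res);
        if (rc != 0)
            return -1.0;
        return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    }

    int main(void)
    {
        const char *targets[] = { "www.amazon.com", "www.cloudflare.com",
                                  "azure.microsoft.com", "cloud.google.com" };
        for (unsigned i = 0; i < sizeof(targets) / sizeof(targets[0]); i++)
            printf("%-24s %8.1f ms\n", targets[i], connect_ms(targets[i], "443"));
        return 0;
    }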

On Mon, Oct 31, 2022 at 3:57 PM Robert Chacón via LibreQoS <
libreqos at lists.bufferbloat.net> wrote:

> > I'd agree with color coding (when it exists - no rush, IMO) being
> configurable.
>
> Thankfully it will be configurable, and easily, through the InfluxDB
> interface.
> Any operator will be able to click the Gear icon above the tables and set
> the thresholds to whatever is desired.
> I've set it to include both a standard table and "metaverse-ready" table
> based on Dave's threshold recommendations.
>
>    - Standard (Preseem-like)
>       - green = < 75 ms
>       - yellow = < 100 ms
>       - red = > 100 ms
>    - Metaverse-Ready
>       - blue = < 8 ms
>       - green = < 20 ms
>       - yellow = < 50 ms
>       - orange = < 70 ms
>       - red = > 70 ms
>
> Are the defaults here reasonable at least? Should we change the Standard
> table thresholds a bit?
>
> > Only adding 0.00155 ms to packet times is pretty good.
>
> Agreed! That's excellent. Great work on this so far; it's looking like
> you're making tremendous progress.
>
> On Mon, Oct 31, 2022 at 3:20 PM Herbert Wolverson via LibreQoS <
> libreqos at lists.bufferbloat.net> wrote:
>
>> I'd agree with color coding (when it exists - no rush, IMO) being
>> configurable.
>>
>> From the "how much delay are we adding" discussion earlier, I thought I'd
>> do a little bit of profiling of the BPF programs themselves. This is with
>> the latest round of performance updates (
>> https://github.com/thebracket/cpumap-pping/issues/2), so it's not
>> measuring anything in production. I simply added a call to get the clock at
>> the start, and again at the end - and log the difference. Measuring both
>> XDP and TC BPF programs. (Execution goes (packet arrives)->(XDP cpumap
>> sends it to the right CPU)->(egress)->(TC sends it to the right classifier,
>> on the correct CPU and measures RTT latency). This is adding about two
>> clock checks and a debug log entry to execution time, so measuring it is
>> slowing it down.
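>>
>> In case it helps to picture it, the instrumentation is essentially just
>> this (a stripped-down sketch, not the actual cpumap-pping code - the real
>> programs do the redirect/classification work in the middle):
>>
>>     /* eBPF (C) sketch: read the clock on entry and exit and log the
>>      * difference.  bpf_ktime_get_ns() and bpf_printk() are standard
>>      * helpers; bpf_printk output lands in the kernel trace pipe. */
>>     #include <linux/bpf.h>
>>     #include <bpf/bpf_helpers.h>
>>
>>     SEC("xdp")
>>     int xdp_timed(struct xdp_md *ctx)
>>     {
>>         __u64 start = bpf_ktime_get_ns();
>>
>>         /* ... real work: cpumap redirect, flow lookup, etc. ... */
>>
>>         __u64 elapsed = bpf_ktime_get_ns() - start;
>>         bpf_printk("xdp pass took %llu ns", elapsed);
>>         return XDP_PASS;
>>     }
>>
>>     char _license[] SEC("license") = "GPL";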
>>
>> The results are interesting, and mostly tell me to try a different
>> measurement system. I'm seeing a pretty wide variance. Hammering it with an
>> iperf session and a queue capped at 5 Gbit/s: most of the TC timings were
>> 40 nanoseconds - not a packet that requires extra tracking, already in
>> cache, so proceed. When the TCP RTT tracker fired and recorded a
>> performance event, it peaked at 5,900 nanoseconds. So the TC BPF program
>> seems to be adding a worst case of 0.0059 ms to packet times. The XDP side
>> of things is typically in the 300-400 nanosecond range; I saw a handful of
>> worst-case numbers in the 3,400 nanosecond range, so the XDP side is adding
>> 0.0034 ms. So - assuming the worst case (and keeping the overhead added by
>> the not-so-great monitoring) - we're adding *0.0093 ms* to packet transit
>> time with the BPF programs.
>>
>> With a much more sedate queue (ceiling 500 Mbit/s), I saw much more
>> consistent numbers. The vast majority of XDP timings were in the 75-150
>> nanosecond range, and TC was a consistent 50-55 nanoseconds when it didn't
>> have an update to perform - peaking very occasionally at 1500 nanoseconds.
>> Only adding 0.00155 ms to packet times is pretty good.
>>
>> It definitely performs best on long streams, probably because the
>> previous lookups are all in cache. This is also making me question the
>> answer I found to "how long does it take to read the clock?" I'd seen
>> ballpark estimates of 53 nanoseconds. Given that this reads the clock
>> twice, that can't be right. (I'm *really* not sure how to measure that one)
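>>
>> For what it's worth, the crude userspace way to ballpark the clock-read
>> cost is to time a tight loop of back-to-back reads - just a sketch, and
>> the in-kernel bpf_ktime_get_ns() won't necessarily cost the same as
>> clock_gettime() from userspace:
>>
>>     /* Ballpark the per-call cost of reading the monotonic clock. */
>>     #include <stdio.h>
>>     #include <time.h>
>>
>>     int main(void)
>>     {
>>         enum { N = 10 * 1000 * 1000 };
>>         struct timespec t0, t1, scratch;
>>
>>         clock_gettime(CLOCK_MONOTONIC, &t0);
>>         for (int i = 0; i < N; i++)
>>             clock_gettime(CLOCK_MONOTONIC, &scratch);
>>         clock_gettime(CLOCK_MONOTONIC, &t1);
>>
>>         double total_ns = (t1.tv_sec - t0.tv_sec) * 1e9
>>                         + (t1.tv_nsec - t0.tv_nsec);
>>         printf("~%.1f ns per clock_gettime() call\n", total_ns / N);
>>         return 0;
>>     }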
>>
>> Again - not a great test (I'll have to learn the perf system to do this
>> properly - which in turn opens up the potential for flame graphs and some
>> proper tracing). Interesting ballpark, though.
>>
>> On Mon, Oct 31, 2022 at 10:56 AM dan <dandenson at gmail.com> wrote:
>>
>>>
>>>
>>> On Sun, Oct 30, 2022 at 8:21 PM Dave Taht via LibreQoS <
>>> libreqos at lists.bufferbloat.net> wrote:
>>>
>>>> How about the idea of "metaverse-ready" metrics, with one table that is
>>>> preseem-like and another that's
>>>>
>>>> blue = < 8 ms
>>>> green = < 20 ms
>>>> yellow = < 50 ms
>>>> orange = < 70 ms
>>>> red = > 70 ms
>>>>
>>>
>>> These need to be configurable.  There are a lot of WISPs that would have
>>> everything orange/red.  We're considering anything under 100ms good on the
>>> rural plans.  Also keep in mind that if you're tracking latency via pping
>>> etc., then you need some buffer in there for the internet at large.  <70ms
>>> to Amazon is one thing - they're very well connected - but <70ms to most of
>>> the internet probably isn't very realistic and would make most charts look
>>> like poop.
>>>
>
>
> --
> Robert Chacón
> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
> Dev | LibreQoS.io
>