[LibreQoS] LibreQoS v1.5 Beta 1 Progress & LibreQoS as a part of Network measurements at Callio FutureMINE in Finland

Frantisek Borsik frantisek.borsik at gmail.com
Sun Jun 16 12:46:25 EDT 2024


Hello to all,

As always, the most up-to-date info can be found in our support chat:
https://chat.libreqos.io/join/fvu3cerayyaumo377xwvpev6/

Let's start with the *v1.5 Beta 1 Progress notes* from Herbert, and see *LibreQoS
as a part of Network measurements at Callio FutureMINE in Finland* below!

"Wanted to give you guys an update. We just finished merging in all of the
feature branches into develop for 1.5 - except for the one that actually
updates the version number (we do that last so you don't see "Upgrade
available" before there is, in fact, an upgrade available).
We're still going through a bunch of testing on different systems, so far
it looks good (we've tested all the parts separately, integration testing
sometimes finds some gaps at the edges)." Herbert Wolverson:

*Draft Changelog Since 1.4*


*Unified Configuration System*
* Replace `ispConfig.py` with a single `/etc/lqos.conf`
* Automatically migrate previous configuration
* In-memory cache system (load the config once and share it, detect changes) - a small sketch of this pattern follows this list
* Shared configuration between Python and Rust
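
To illustrate the "load once, share, reload on change" pattern, here is a minimal Rust sketch. The field names and the use of `once_cell` + `toml` are assumptions for illustration, not the actual `/etc/lqos.conf` schema or the lqosd implementation:

```rust
use std::sync::RwLock;
use once_cell::sync::Lazy;
use serde::Deserialize;

/// Hypothetical subset of /etc/lqos.conf; the real schema has many more fields.
#[derive(Deserialize, Clone, Debug)]
struct LqosConfig {
    node_name: String,
    to_internet: String, // ISP-facing interface (assumed field name)
    to_network: String,  // customer-facing interface (assumed field name)
}

/// Parsed once at first use, then shared by every caller.
static CONFIG: Lazy<RwLock<LqosConfig>> = Lazy::new(|| {
    let raw = std::fs::read_to_string("/etc/lqos.conf").expect("config missing");
    RwLock::new(toml::from_str(&raw).expect("config malformed"))
});

/// Cheap accessor: clone the cached copy instead of re-reading the file.
fn config() -> LqosConfig {
    CONFIG.read().unwrap().clone()
}
```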


*Dynamic Binpacking*
* It was a common problem for the CPU-queue assignment (in tree mode) to allocate too many resources to a single CPU.
* Each circuit is assigned a "weight". If you have Long-Term Stats (LTS), the weight is calculated from past usage AND the assigned speed plan. Without LTS, it's just the assigned speed plan.
* The total weight - at this time of day - is then calculated for each top-level entry.
* A "binpacking" algorithm then attempts to equalize load between CPUs (see the sketch after this list).
* Significant performance improvement on many networks.
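
The exact algorithm isn't spelled out in the changelog; as a rough sketch, a greedy "heaviest item onto the lightest CPU" packer captures the idea (this is an assumption, not the LibreQoS implementation):

```rust
/// Assign weighted top-level nodes to CPUs so the total weight per CPU is
/// roughly equal. Greedy LPT: sort heaviest first, then always place the
/// next item on the currently lightest CPU.
fn binpack(mut weights: Vec<(String, u64)>, cpus: usize) -> Vec<Vec<String>> {
    weights.sort_by(|a, b| b.1.cmp(&a.1));
    let mut loads = vec![0u64; cpus];
    let mut assignment: Vec<Vec<String>> = vec![Vec::new(); cpus];
    for (node, weight) in weights {
        // Index of the least-loaded CPU so far.
        let idx = loads
            .iter()
            .enumerate()
            .min_by_key(|(_, load)| **load)
            .map(|(i, _)| i)
            .unwrap();
        loads[idx] += weight;
        assignment[idx].push(node);
    }
    assignment
}
```

For example, packing weights 900, 400 and 350 onto two CPUs puts the 900 on one CPU and the other two (total 750) on the other, rather than splitting by list order.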


*Per-Flow Tracking System*
* "Flows" are detected as TCP connections, UDP connections that reuse a source/destination, and ICMP between a source/destination.
* Rather than just tracking per-host statistics, statistics are attached to a flow (see the sketch after this list).
* Flows maintain a rate estimate at all times, in-kernel.
* Flows calculate Round-Trip Time (RTT) continuously.
* Flows spot timestamp duplications that indicate a TCP retry (or duplicate).
* Much of the kernel code moved from the TC part of eBPF to the XDP part, giving a modest speed-up and an improvement in overall throughput.
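
The real per-flow state lives in eBPF maps written in C; purely as an illustration of keying statistics by flow rather than by host, a userspace mock might look like this (field names are assumptions):

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// A flow is identified by its endpoints, protocol, and ports.
#[derive(Hash, PartialEq, Eq, Clone)]
struct FlowKey {
    src: IpAddr,
    dst: IpAddr,
    protocol: u8, // 6 = TCP, 17 = UDP, 1 = ICMP
    src_port: u16,
    dst_port: u16,
}

/// Per-flow counters kept up to date as packets pass.
#[derive(Default)]
struct FlowStats {
    bytes: u64,
    packets: u64,
    last_rtt_ns: u64,
    tcp_retries: u32,
}

/// Account one packet against its flow, creating the entry on first sight.
fn account(flows: &mut HashMap<FlowKey, FlowStats>, key: FlowKey, len: u64) {
    let stats = flows.entry(key).or_default();
    stats.bytes += len;
    stats.packets += 1;
}
```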


*Per-Flow Kernel-Userland System*
* Rather than reporting RTT via a giant data structure, individual reports are fed from the kernel to userspace through a callback system (see the sketch after this list).
* Flows "closing" (clean closure) result in a kernel-to-userspace notification.
* Flows also expire on a periodic tick if no data has arrived in a given time period.
* This decreased *kernel side* overhead significantly (the eBPF kernel-to-userspace send is non-blocking).
* This increased userspace CPU usage very slightly, but removed the processing overhead from the packet-flow execution path.
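
As a stand-in for the actual eBPF ringbuffer/callback plumbing, a plain channel shows the shape of the hand-off: the sender never blocks, and the heavier analysis happens on a consumer off the packet path (the event names here are assumptions):

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical events the kernel side might emit for a flow.
enum FlowEvent {
    RttSample { flow_id: u64, rtt_ns: u64 },
    Closed { flow_id: u64 },
}

fn main() {
    let (tx, rx) = mpsc::channel::<FlowEvent>();

    // Stand-in for the eBPF side: sends are non-blocking, so the packet
    // path never waits on the analysis code.
    thread::spawn(move || {
        let _ = tx.send(FlowEvent::RttSample { flow_id: 1, rtt_ns: 12_000_000 });
        let _ = tx.send(FlowEvent::Closed { flow_id: 1 });
    });

    // Userspace consumer: do the expensive work here, not in the hot path.
    for event in rx {
        match event {
            FlowEvent::RttSample { flow_id, rtt_ns } => {
                println!("flow {flow_id}: RTT {:.1} ms", rtt_ns as f64 / 1e6);
            }
            FlowEvent::Closed { flow_id } => println!("flow {flow_id} closed"),
        }
    }
}
```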


*Per-Flow Reporting System*
* RTT is compiled per-flow into a ringbuffer. Results from very-low-traffic (mostly idle) flows are ignored. RTT is calculated as the median of the last hundred reports. Significant accuracy improvement (see the sketch after this list).
* Per-flow TCP retries are recorded.
* When flows "close", they are submitted for additional analysis.
* A simple protocol naming system maps ethertype/port to known protocols.
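
A minimal sketch of "median of the last hundred reports" - the buffer size matches the text, but the structure and nanosecond units are assumptions:

```rust
/// Fixed-size ring of the most recent RTT samples for one flow.
struct RttRing {
    samples: Vec<u64>, // nanoseconds
    next: usize,
}

impl RttRing {
    fn new() -> Self {
        Self { samples: Vec::with_capacity(100), next: 0 }
    }

    /// Record a new sample, overwriting the oldest once 100 are stored.
    fn push(&mut self, rtt_ns: u64) {
        if self.samples.len() < 100 {
            self.samples.push(rtt_ns);
        } else {
            self.samples[self.next] = rtt_ns;
        }
        self.next = (self.next + 1) % 100;
    }

    /// The median is far less noisy than a mean when a few samples are outliers.
    fn median(&self) -> Option<u64> {
        if self.samples.is_empty() {
            return None;
        }
        let mut sorted = self.samples.clone();
        sorted.sort_unstable();
        Some(sorted[sorted.len() / 2])
    }
}
```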

*Export Flow Data in netflow version 5 and 9 (IPFIX)*
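
For orientation, the Netflow v5 packet header is a well-known 24-byte, big-endian structure; the struct below is illustrative and not the LibreQoS exporter's own type:

```rust
/// Netflow v5 header, as sent on the wire (all fields big-endian).
#[repr(C)]
struct NetflowV5Header {
    version: u16,        // always 5
    count: u16,          // number of flow records in this packet
    sys_uptime_ms: u32,  // milliseconds since the exporter started
    unix_secs: u32,      // export timestamp, whole seconds
    unix_nsecs: u32,     // export timestamp, residual nanoseconds
    flow_sequence: u32,  // running count of exported flows (loss detection)
    engine_type: u8,
    engine_id: u8,
    sampling_interval: u16,
}
```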


*Closed Flow Reporting System*
* Created "geo.bin", a compiled list of by-ASN and by-IP geolocations.
* `lqosd` will download a refreshed `geo.bin` periodically.
* Closed flows are mapped to an ASN, giving per-ASN performance reports (see the sketch after this list).
* Closed flows are mapped to a geolocation, giving geographic performance reports.
* Closed flows are mapped to ethertype and protocol.
* The user interface in `lqos_node_manager` has been expanded to display all of this.
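
The `geo.bin` format isn't described here; assuming it boils down to sorted, non-overlapping IP ranges in memory, the per-flow ASN lookup could be a simple binary search (a sketch under that assumption, IPv4 only for brevity):

```rust
use std::net::Ipv4Addr;

/// Hypothetical in-memory form of the ASN table: sorted, non-overlapping
/// IPv4 ranges, each owned by one ASN.
struct AsnRange {
    start: u32,
    end: u32,
    asn: u32,
}

/// Binary-search the sorted ranges for the one containing `ip`.
fn lookup_asn(table: &[AsnRange], ip: Ipv4Addr) -> Option<u32> {
    let ip = u32::from(ip);
    let idx = table.partition_point(|r| r.end < ip);
    table
        .get(idx)
        .filter(|r| r.start <= ip && ip <= r.end)
        .map(|r| r.asn)
}
```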


*Preflight Checks for lqosd*
* Prior to startup, common configuration and hardware-support issues are checked.
* Single-queue NICs now get a proper error message (see the sketch after this list).
* If the user tries to run both a Linux bridge and an XDP bridge on the same interface pair, the XDP bridge is disabled and a warning is emitted.
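
As one example of what such a check might look like - counting RX queue directories in sysfs is an assumption about the mechanism, not how lqosd actually does it:

```rust
use std::fs;

/// Count the RX queues the kernel exposes for a NIC (e.g. "eth0").
fn rx_queue_count(iface: &str) -> std::io::Result<usize> {
    let path = format!("/sys/class/net/{iface}/queues");
    Ok(fs::read_dir(path)?
        .filter_map(Result::ok)
        .filter(|e| e.file_name().to_string_lossy().starts_with("rx-"))
        .count())
}

/// Fail preflight with a clear message instead of a confusing runtime error.
fn preflight_queues(iface: &str) -> Result<(), String> {
    match rx_queue_count(iface) {
        Ok(n) if n > 1 => Ok(()),
        Ok(_) => Err(format!(
            "{iface} exposes only one RX queue; a multi-queue NIC is required"
        )),
        Err(e) => Err(format!("could not inspect {iface}: {e}")),
    }
}
```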


*XDP "Hot Cache"** Much CPU time was spent running a longest-prefix match
check on every ISP-facing IP address.
* Added a least-recently-used cache that matches IP adddresses to circuits
with a much less
  expensive fast lookup.

* Added a "negative cache" entry to speed up "this IP still isn't mapped"
* Added cache invalidation code to handle the IP mappings changing
* This resulted in a 20-30% CPU usage reduction under heavy load.
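
The real cache lives in eBPF maps; to show the idea - including the negative entries and invalidation - here is a userspace sketch with an ordinary hash map standing in (LRU eviction omitted, structure assumed):

```rust
use std::collections::HashMap;

/// What a cached lookup can tell us about an IP.
#[derive(Clone, Copy)]
enum CacheEntry {
    Circuit(u32), // fast path: already mapped to a circuit id
    NotMapped,    // negative entry: the expensive LPM walk already said "no"
}

struct HotCache {
    map: HashMap<u32, CacheEntry>, // keyed by IPv4-as-u32 for brevity
}

impl HotCache {
    /// Answer from the cache if possible; otherwise do the expensive
    /// longest-prefix match once and remember the result either way.
    fn lookup(&mut self, ip: u32, slow_lpm: impl Fn(u32) -> Option<u32>) -> Option<u32> {
        if let Some(entry) = self.map.get(&ip) {
            return match entry {
                CacheEntry::Circuit(id) => Some(*id),
                CacheEntry::NotMapped => None,
            };
        }
        let result = slow_lpm(ip);
        let entry = result.map(CacheEntry::Circuit).unwrap_or(CacheEntry::NotMapped);
        self.map.insert(ip, entry);
        result
    }

    /// Called when the IP-to-circuit mappings change: drop everything.
    fn invalidate(&mut self) {
        self.map.clear();
    }
}
```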


*Config UI*
* `lqos_node_manager` is now aware of the entire configuration system.
* All configuration items may be edited.
* `ShapedDevices.csv` can be edited from the web UI.
* `network.json` can be edited from the web UI.
* Heavy validation, ensuring that devices have matching `network.json` entries, IPs aren't duplicated, etc. (see the sketch after this list).
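
Two of the checks that validation implies - duplicate IPs and parent nodes missing from `network.json` - might look roughly like this (field names are assumptions, not the actual lqos_node_manager code):

```rust
use std::collections::HashSet;

/// Assumed, simplified view of one ShapedDevices.csv row.
struct ShapedDevice {
    circuit_id: String,
    parent_node: String,
    ipv4: Vec<String>,
}

/// Return human-readable validation errors; an empty Vec means "looks fine".
fn validate(devices: &[ShapedDevice], network_nodes: &HashSet<String>) -> Vec<String> {
    let mut errors = Vec::new();
    let mut seen_ips = HashSet::new();
    for device in devices {
        if !network_nodes.contains(&device.parent_node) {
            errors.push(format!(
                "{}: parent node '{}' is not in network.json",
                device.circuit_id, device.parent_node
            ));
        }
        for ip in &device.ipv4 {
            if !seen_ips.insert(ip.clone()) {
                errors.push(format!("{}: duplicate IP {}", device.circuit_id, ip));
            }
        }
    }
    errors
}
```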


*LQTop*
* New `lqtop` CLI tool with a much prettier text UI and support for flows.

*UISP Integration 2*
* An all-new, massively faster UISP integration system.
* Includes much better network map traversal.


*Support Tools*
* CLI tool for running a "sanity check" on common issues.
* Gather configuration into a bundle for sending.
* View the bundle.
* Submit the bundle to LibreQoS for analysis.
* A web UI (`lqos_node_manager`) version of the same thing, using shared
code.


*Misc*
* Improvements and fixes to all integrations, especially Splynx.
* Update back-end code to the latest versions.

Our lines-of-code counts are approximately:
-----------------------------------
Language                       code
Rust                          18251
Python                         5859
-----------------------------------

And I forgot one: if you're running the 6.x kernel line, XDP metadata kicks
in and you get another decent performance boost - but it still works with
the older kernels (just without the boost).

Also... with all of these improvements stacked, we've cracked the 10 Gbps
single-flow barrier on Payne.

[image: image.png]


*"Payne" (the test server at Equinix) is:*

16 cores
Intel(R) Xeon(R) E-2378G CPU @ 2.80GHz
16 MB cache
It has an Intel 25 Gbps NIC

The number of cores won't matter much for single-flow throughput
(obviously, you still need to have enough for your traffic load).

That test was without other traffic also trying to take resources, so it's
hard to say exactly what's required until we've got a bit more hardware
under our belt.


And now for *LibreQoS as a part of Network measurements at Callio
FutureMINE in Finland!*

Read about it on Twitter/X:
https://x.com/sksauli/status/1802122743434875069


or on LinkedIn:
https://www.linkedin.com/pulse/network-measurements-callio-futuremine-delta-cygni-labs-dqhdf/

[image: 1718487787971.jpeg]
[image: 1718489068506.jpeg]
[image: 1718489172381.jpeg]
[image: 1718489216151.jpeg]
[image: 1718490403924.jpeg]


All the best,

Frank

Frantisek (Frank) Borsik



https://www.linkedin.com/in/frantisekborsik

Signal, Telegram, WhatsApp: +421919416714

iMessage, mobile: +420775230885

Skype: casioa5302ca

frantisek.borsik at gmail.com