* [LibreQoS] Integration system, aka fun with graph theory
@ 2022-10-27 21:33 Herbert Wolverson
  2022-10-27 21:41 ` Dave Taht
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-27 21:33 UTC (permalink / raw)
  To: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 1738 bytes --]

So I've been doing some work on getting UISP integration (and integrations
in general) to work a bit more smoothly.

I started by implementing a graph structure that mirrors both the networks
and sites system. It's not done yet, but the basics are coming together
nicely. You can see my progress so far at:
https://github.com/thebracket/LibreQoS/tree/integration-common-graph
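
(For anyone curious what "a graph structure that mirrors the networks and
sites system" means in practice, here's a minimal sketch of the idea - the
names are illustrative only, not the actual code on that branch:)

class Node:
    def __init__(self, id: str, parent: str = ""):
        self.id = id          # unique identifier for a site/AP/client
        self.parent = parent  # empty string marks a root candidate

class Graph:
    def __init__(self):
        self.nodes = {}       # id -> Node

    def addNode(self, node: Node):
        self.nodes[node.id] = node

    def children(self, id: str):
        # Recover the tree shape from the flat node list.
        return [n for n in self.nodes.values() if n.parent == id]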

Our UISP instance is a *great* test case for torturing the system. I even
found a case where UISP somehow auto-generated a circular portion of the
tree (see the loop-guard sketch after the list below). We have:

   - Non-Ubiquiti devices as "other devices"
   - Sections that need shaping by subnet (e.g. "all of 192.168.1.0/24
   shared 100 Mbit")
   - Bridge-mode devices using Option 82 to always allocate the same IP,
   with a "service IP" entry
   - Various bits of infrastructure mapped
   - Sites that go to client sites, which go to other client sites

In other words, over the years we've unleashed a bit of a monster. Cleaning
it up is a useful task, but I wanted the integration to be able to handle
pathological cases like ours!
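
(Given that UISP can hand you a circular "tree", any importer walking parent
links needs to guard against loops. A minimal sketch of such a guard,
assuming a simple id -> parent-id mapping; this is not the branch's actual
code:)

def find_cycle(parents: dict) -> list:
    # Walk up the parent chain from every node; if we revisit a node on the
    # same walk, we've found the circular portion.
    for start in parents:
        seen = []
        node = start
        while node in parents and parents[node]:
            if node in seen:
                return seen[seen.index(node):]
            seen.append(node)
            node = parents[node]
    return []

# find_cycle({"A": "B", "B": "C", "C": "A"}) -> ["A", "B", "C"]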

So I fed our network into the current graph generator, and used graphviz to
spit out a directed graph:
[image: image.png]
That doesn't include client sites! Legend:


   - Green = the root site
   - Red = a site
   - Blue = an access point
   - Magenta = a client site that has children
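
(For anyone wanting to reproduce that kind of render: the Python graphviz
package makes it a few lines. A sketch using the legend's colors, with
made-up node names:)

import graphviz  # pip install graphviz; needs the system graphviz package too

dot = graphviz.Digraph()
dot.node("Root_Site", color="green")    # the root site
dot.node("Tower_1", color="red")        # a site
dot.node("AP_1", color="blue")          # an access point
dot.node("Client_A", color="magenta")   # a client site with children
dot.edge("Root_Site", "Tower_1")
dot.edge("Tower_1", "AP_1")
dot.edge("AP_1", "Client_A")
dot.render("network", format="png")     # writes network.png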

So the part in "common" is designed heavily to reduce repetition. When it's
done, you should be able to feed in sites, APs, clients, devices, etc. in a
pretty flexible manner. Given how much code is shared between the UISP and
Splynx integration code, I'm pretty sure both will be cut to a tiny
fraction of the total code. :-)

I can't post the full tree; it's full of client names.

[-- Attachment #1.2: Type: text/html, Size: 2233 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]


* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-27 21:33 [LibreQoS] Integration system, aka fun with graph theory Herbert Wolverson
@ 2022-10-27 21:41 ` Dave Taht
  2022-10-27 21:44 ` Dave Taht
  2022-10-27 21:48 ` Robert Chacón
  2 siblings, 0 replies; 33+ messages in thread
From: Dave Taht @ 2022-10-27 21:41 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos, Richard E. Brown


[-- Attachment #1.1: Type: text/plain, Size: 2561 bytes --]

One of bufferbloat.net's main folk was (and remains) Rich Brown, who helped
create "InterMapper" so many years ago. I think he sold it off when he
retired... I don't know if anyone uses it anymore... hey Rich!!! check this
out!!!



-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

[-- Attachment #1.2: Type: text/html, Size: 3714 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]


* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-27 21:33 [LibreQoS] Integration system, aka fun with graph theory Herbert Wolverson
  2022-10-27 21:41 ` Dave Taht
@ 2022-10-27 21:44 ` Dave Taht
  2022-10-27 21:48 ` Robert Chacón
  2 siblings, 0 replies; 33+ messages in thread
From: Dave Taht @ 2022-10-27 21:44 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos

Not necessarily useful in this context, but one of my all-time
favorite graphing tools was the Gource animations for commit logs and
developer interest. You think that's kind of a boring subject, yes?
Well, play one of these animations back...

https://gource.io/

I've always kind of wanted to see a network evolve over time, in much
the same way.






-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC


* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-27 21:33 [LibreQoS] Integration system, aka fun with graph theory Herbert Wolverson
  2022-10-27 21:41 ` Dave Taht
  2022-10-27 21:44 ` Dave Taht
@ 2022-10-27 21:48 ` Robert Chacón
  2022-10-28  0:27   ` dan
  2 siblings, 1 reply; 33+ messages in thread
From: Robert Chacón @ 2022-10-27 21:48 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 2472 bytes --]

This is awesome! Way to go here. Thank you for contributing this.
Being able to map out these complex topologies will help ISPs a ton, and
I really like that it shares common features between the Splynx and
UISP integrations.

Thanks,
Robert



-- 
Robert Chacón
CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>

[-- Attachment #1.2: Type: text/html, Size: 3503 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]


* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-27 21:48 ` Robert Chacón
@ 2022-10-28  0:27   ` dan
  2022-10-28 12:40     ` Herbert Wolverson
  0 siblings, 1 reply; 33+ messages in thread
From: dan @ 2022-10-28  0:27 UTC (permalink / raw)
  To: Robert Chacón; +Cc: Herbert Wolverson, libreqos


[-- Attachment #1.1: Type: text/plain, Size: 3524 bytes --]

We're pretty similar in that we've made UISP a mess: multiple paths to a
pop, multiple pops on the network, failover between pops, lots of 'other'
devices, handing out /29s etc. to customers.

Some sort of discovery would be nice. Ideally, though, we'd pull something
from SNMP or router APIs etc. to build the paths, while keeping a 'network
elements' list with each of the links described - i.e., backhaul 12 has MACs
..01 and ...02 at 300x100 (sketched below) - and then build the topology
around that from discovery.

I've also thought about doing routine traceroutes, or watching TTLs or
something like that, to get some indication that the topology has changed,
and then doing another discovery and a potential tree rebuild.
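
(Something like that 'network elements' list could be as simple as this - a
sketch of the idea only, with made-up field names:)

from dataclasses import dataclass

@dataclass
class Link:
    name: str            # e.g. "backhaul 12"
    mac_a: str           # MAC on one end of the link
    mac_b: str           # MAC on the other end
    download_mbps: int
    upload_mbps: int

# "backhaul 12 has MACs ..01 and ...02 at 300x100"
elements = [
    Link("backhaul 12", "..01", "...02", 300, 100),
]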


[-- Attachment #1.2: Type: text/html, Size: 4969 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]


* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-28  0:27   ` dan
@ 2022-10-28 12:40     ` Herbert Wolverson
  2022-10-28 17:43       ` Herbert Wolverson
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-28 12:40 UTC (permalink / raw)
  Cc: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 4917 bytes --]

Dave: I love those Gource animations! Game development is my other hobby; I
could easily get lost for weeks tweaking the shaders to make the glow "just
right". :-)

Dan: Discovery would be nice, but I don't think we're ready to look in that
direction yet. I'm trying to build a "common grammar" to make it easier to
express network layout from integrations; that would be another form/layer
of integration, and a lot easier to work with once there's a solid
foundation. Preseem does some of this (admittedly over-eagerly; nothing
needs to query SNMP that often!), and the SNMP route is quite remarkably
convoluted. Their support turned on a few "extra" modules to deal with
things like PMP450 clients that change MAC when you put them in bridge mode
vs NAT mode (and report the bridge-mode CPE in some places either way), and
Elevate CPEs that almost, but not quite, make sense. Robert's code has the
beginnings of some of this, scanning Mikrotik routers for IPv6 allocations
by MAC (this is also the hardest part for me to test, since I don't
currently have any v6 to test with).

We tend to use UISP as the "source of truth" and treat it like a database
for a ton of external tools (mostly ones we've created).


[-- Attachment #1.2: Type: text/html, Size: 6664 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]


* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-28 12:40     ` Herbert Wolverson
@ 2022-10-28 17:43       ` Herbert Wolverson
  2022-10-28 19:05         ` Robert Chacón
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-28 17:43 UTC (permalink / raw)
  Cc: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 8313 bytes --]

The integration is coming along nicely. Some progress updates:

   - You can specify a variable in ispConfig.py named "uispSite". This sets
   where in the topology you want the tree to start. This has two purposes:
      - It's hard to be psychic and know for sure where the shaper is in
      the network.
      - You could run multiple shapers at different egress points, with
      failover - and rebuild the entire topology from the point of view of a
      network node.
   - "Child node with children" are now automatically converted into a
   "(Generated Site) name" site, and their children rearranged. This:
      - Allows you to set the "site" bandwidth independently of the client
      site bandwidth.
      - Makes for easier trees, because we're inserting the site that
      really should be there.
   - Network.json generation (not the shaped devices file yet) is
   automatically generated from a tree, once PrepareTree() and
   createNetworkJson() are called.
      - There's a unit test that generates the network.example.json file
      and compares it with the original to ensure that they match.
   - Unit test coverage hits every function in the graph system, now.
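
(A sketch of the shape that test might take - assuming createNetworkJson()
writes network.json to the working directory, and abbreviating the node
list; the branch's actual test may differ:)

import json
import unittest
from integrationCommon import NetworkGraph, NetworkNode, NodeType

class TestNetworkJson(unittest.TestCase):
    def test_network_json_matches_example(self):
        net = NetworkGraph()
        # Build the example tree (abbreviated here; the full node list is in
        # the snippet below).
        net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
        net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
        net.prepareTree()
        net.createNetworkJson()
        with open("network.json") as generated, open("network.example.json") as reference:
            self.assertEqual(json.load(generated), json.load(reference))

if __name__ == "__main__":
    unittest.main()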

I'm liking this setup. With the non-vendor-specific logic contained inside
the NetworkGraph type, the actual UISP code to generate the example tree is
down to 65 lines of code, including comments. That'll grow a bit as I
re-insert some automatic speed limit determination and the AP/Site speed
overrides (i.e. the integrationUISPbandwidths.csv file). Still pretty clean.

Creating the network.example.json file only requires:
from integrationCommon import NetworkGraph, NetworkNode, NodeType
import json
net = NetworkGraph()
# NetworkNode(id, displayName, parentId, type, download, upload)
net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
net.prepareTree()
net.createNetworkJson()

(The id and name fields are duplicated right now; I'm using readable names
to keep me sane. The third string is the parent, and the last two numbers
are the download/upload bandwidth limits.)
The nice, readable keyword form being:
NetworkNode(id="Site_1", displayName="Site_1", parentId="", type=NodeType.site, download=1000, upload=1000)

That in turn gives you the example network:
[image: image.png]



[-- Attachment #1.2: Type: text/html, Size: 24123 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]


* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-28 17:43       ` Herbert Wolverson
@ 2022-10-28 19:05         ` Robert Chacón
  2022-10-28 19:54           ` Herbert Wolverson
  0 siblings, 1 reply; 33+ messages in thread
From: Robert Chacón @ 2022-10-28 19:05 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 8992 bytes --]

Wow. This is very nicely done. Awesome work!



-- 
Robert Chacón
CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
Dev | LibreQoS.io

[-- Attachment #1.2: Type: text/html, Size: 25229 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]


* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-28 19:05         ` Robert Chacón
@ 2022-10-28 19:54           ` Herbert Wolverson
  2022-10-28 21:15             ` Robert Chacón
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-28 19:54 UTC (permalink / raw)
  Cc: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 10246 bytes --]

One more update, and I'm going to sleep until "pick up daughter" time. :-)

The tree at
https://github.com/thebracket/LibreQoS/tree/integration-common-graph can
now build a network.json, ShapedDevices.csv, and
integrationUISPBandwidth.csv, and follows pretty much the same logic as the
previous importer - other than using data links to build the hierarchy and
letting (currently requiring) you specify the root node. It's handling our
bizarre UISP setup pretty well now, so if anyone wants to test it (I
recommend just running integrationUISP.py and checking the output rather
than throwing it into production), I'd appreciate any feedback.

Still on my list: handling the Mikrotik IPv6 connections, and exceptionCPE
and site exclusion.

If you want the pretty graphics, you need to "pip install graphviz" and
"sudo apt install graphviz". It *should* detect when these aren't present
and not try to draw pictures otherwise.
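
(The detection can be as simple as a guarded import - a sketch, not
necessarily the exact check on the branch:)

try:
    import graphviz
    hasGraphviz = True
except ImportError:
    hasGraphviz = False  # no graphviz; skip drawing pictures entirely

def maybeDrawGraph(net):
    if not hasGraphviz:
        return  # pictures are optional; carry on without them
    # ... build and render the graphviz.Digraph here ...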


[-- Attachment #1.2: Type: text/html, Size: 26755 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]


* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-28 19:54           ` Herbert Wolverson
@ 2022-10-28 21:15             ` Robert Chacón
  2022-10-29 15:57               ` Herbert Wolverson
  0 siblings, 1 reply; 33+ messages in thread
From: Robert Chacón @ 2022-10-28 21:15 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 11603 bytes --]

Awesome work. It succeeded in building the topology and creating
ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
I notice that in ShapedDevices.csv it does add CPE radios (which in our
case we don't shape - they are in bridge mode) with both IPv4 and IPv6
being empty lists [].
This is not necessarily bad, but it may lead to empty leaf classes being
created on LibreQoS.py runs. Not a huge deal; it just makes the minor class
counter increment toward the 32k limit faster.
Do you think perhaps we should check:

if (len(IPv4) == 0) and (len(IPv6) == 0):
    # Skip adding this entry to ShapedDevices.csv

or something similar around line 329 of integrationCommon.py?
Open to your suggestions there.
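
(A minimal sketch of that guard - hypothetical placement and naming, not the
actual integrationCommon.py code:)

def shouldShapeDevice(ipv4: list, ipv6: list) -> bool:
    # A device with no addresses would only produce an empty leaf class,
    # pushing the minor class counter toward the 32k limit for no benefit.
    return len(ipv4) > 0 or len(ipv6) > 0

# e.g. skip writing the ShapedDevices.csv row when
# shouldShapeDevice(device.ipv4, device.ipv6) is False.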



>>>>>>> UISP and Splynx integration code, I'm pretty sure both will be cut to a
>>>>>>> tiny fraction of the total code. :-)
>>>>>>>
>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>> _______________________________________________
>>>>>>> LibreQoS mailing list
>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Robert Chacón
>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>> _______________________________________________
>>>>>> LibreQoS mailing list
>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>
>>>>> _______________________________________________
>>> LibreQoS mailing list
>>> LibreQoS@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>
>>
>>
>> --
>> Robert Chacón
>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>> Dev | LibreQoS.io
>>
>> _______________________________________________
> LibreQoS mailing list
> LibreQoS@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/libreqos
>


-- 
Robert Chacón
CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
Dev | LibreQoS.io

[-- Attachment #1.2: Type: text/html, Size: 28658 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-28 21:15             ` Robert Chacón
@ 2022-10-29 15:57               ` Herbert Wolverson
  2022-10-29 19:05                 ` Robert Chacón
  2022-10-29 19:18                 ` Dave Taht
  0 siblings, 2 replies; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-29 15:57 UTC (permalink / raw)
  Cc: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 15207 bytes --]

Alright, the UISP side of the common integrations is pretty much feature
complete. I'll update the tracking issue in a bit.

   - Per your suggestion, devices with no IP addresses (v4 or v6) are not
   added.
   - Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
   side of things, so it can be used in other integrations also. I don't
   have a setup on which to test it, but if I'm reading the code right,
   the unit test exercises it appropriately.
   - excludeSites is supported as a common API feature. If a node is added
   with a name that matches an excluded site, it won't be added. The tree
   builder is smart enough to replace invalid "parentId" references with the
   shaper root, so if you have other tree items that rely on this site, they
   will still be added to the tree. Was that the intent? (It looks pretty
   useful; we have a child site down the tree with a HUGE amount of load, and
   bumping it to the top level with excludeSites would probably help our load
   balancing quite a bit.)
      - If the intent was to exclude the site and everything underneath it,
      I'd have to rework things a bit. Let me know; it wasn't quite clear.
   - exceptionCPEs is also supported as a common API feature. It simply
   overrides the "parentId" of incoming nodes with the new parent. Another
   potentially useful feature; if I got excludeSites the wrong way around,
   I'd add a "my_big_site":"" entry to push it to the top.
   - UISP integration now supports a "flat" topology option (set via
   uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py to
   include this entry. There's a sketch of these options just below.
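
For anyone who wants to see it in one place, the new ispConfig.py knobs
could look roughly like this (sketched from memory - the option names come
from the list above, but the exact value shapes here are illustrative;
treat ispConfig.example.py as authoritative):

# ispConfig.py (sketch - illustrative values only)
uispStrategy = "full"              # or "flat" to skip topology entirely
excludeSites = ["Site_2"]          # matching nodes are dropped; their
                                   # children re-parent to the shaper root
exceptionCPEs = {
    "some_cpe": "Site_1",          # force this node's parentId to Site_1
    "my_big_site": "",             # an empty parent pushes a node to the top
}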

I'll look and see how much of the Splynx code I can shorten with the new
API; I don't have a Splynx setup to test against, making that tricky. I
*think* the new API should shorten things a lot. I think routers act as
node parents, with clients underneath them? Otherwise, a "flat" setup
should be a little shorter (the CSV code can be replaced with a call to the
graph builder). Most of the Splynx (and VISP) users I've talked to layer
MPLS+VPLS to pretend to have a big, flat network and then connect via a
RADIUS call in the DHCP server; I've always assumed that's because those
systems prefer the telecom model of "pretend everything is equal" to trying
to model topology.*

I need to clean things up a bit (there's still a bit of duplicated code,
and I believe in the DRY principle - don't repeat yourself; Dave Thomas -
my boss at PragProg - coined the term in The Pragmatic Programmer, and I
feel obliged to use it everywhere!), and do a quick rebase (I accidentally
parented the branch off of a branch instead of main) - but I think I can
have this as a PR for you on Monday.

* - The first big wireless network I set up used a Motorola WiMAX setup.
They *required* that every single AP share two VLANs (management and
bearer) with every other AP - all the way to the core. It kinda worked once
they remembered, in a patch, that client isolation was a thing... Then
again, their installation instructions included connecting two ports of a
router together with a jumper cable, because their localhost implementation
didn't quite work. :-|

On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
robert.chacon@jackrabbitwireless.com> wrote:

> Awesome work. It succeeded in building the topology and creating
> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
> I notice that in ShapedDevices.csv it does add CPE radios (which in our
> case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
> being empty lists [].
> This is not necessarily bad, but it may lead to empty leaf classes being
> created on LibreQoS.py runs. Not a huge deal, it just makes the minor class
> counter increment toward the 32k limit faster.
> Do you think perhaps we should check:
> *if (len(IPv4) == 0) and (len(IPv6) == 0):*
> *   # Skip adding this entry to ShapedDevices.csv*
> Or something similar around line 329 of integrationCommon.py?
> Open to your suggestions there.
>
>
>
> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
> libreqos@lists.bufferbloat.net> wrote:
>
>> One more update, and I'm going to sleep until "pick up daughter" time. :-)
>>
>> The tree at
>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph can
>> now build a network.json, ShapedDevices.csv, and
>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>> previous importer - other than using data links to build the hierarchy and
>> letting (requiring, currently) you specify the root node. It's handling our
>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>> recommend just running integrationUISP.py and checking the output rather
>> than throwing it into production), I'd appreciate any feedback.
>>
>> Still on my list: handling the Mikrotik IPv6 connections, and
>> exceptionCPE and site exclusion.
>>
>> If you want the pretty graphics, you need to "pip install graphviz" and
>> "sudo apt install graphviz". It *should* detect that these aren't present
>> and not try to draw pictures, otherwise.
>>
>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>> robert.chacon@jackrabbitwireless.com> wrote:
>>
>>> Wow. This is very nicely done. Awesome work!
>>>
>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>> libreqos@lists.bufferbloat.net> wrote:
>>>
>>>> The integration is coming along nicely. Some progress updates:
>>>>
>>>>    - You can specify a variable in ispConfig.py named "uispSite". This
>>>>    sets where in the topology you want the tree to start. This has two
>>>>    purposes:
>>>>       - It's hard to be psychic and know for sure where the shaper is
>>>>       in the network.
>>>>       - You could run multiple shapers at different egress points,
>>>>       with failover - and rebuild the entire topology from the point of view of a
>>>>       network node.
>>>>    - "Child node with children" are now automatically converted into a
>>>>    "(Generated Site) name" site, and their children rearranged. This:
>>>>       - Allows you to set the "site" bandwidth independently of the
>>>>       client site bandwidth.
>>>>       - Makes for easier trees, because we're inserting the site that
>>>>       really should be there.
>>>>    - Network.json generation (not the shaped devices file yet) is
>>>>    automatically generated from a tree, once PrepareTree() and
>>>>    createNetworkJson() are called.
>>>>       - There's a unit test that generates the network.example.json
>>>>       file and compares it with the original to ensure that they match.
>>>>    - Unit test coverage hits every function in the graph system, now.
>>>>
>>>> I'm liking this setup. With the non-vendor-specific logic contained
>>>> inside the NetworkGraph type, the actual UISP code to generate the example
>>>> tree is down to 65
>>>> lines of code, including comments. That'll grow a bit as I re-insert
>>>> some automatic speed limit determination, AP/Site speed overrides (
>>>> i.e. the integrationUISPbandwidths.csv file). Still pretty clean.
>>>>
>>>> Creating the network.example.json file only requires:
>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>> import json
>>>>
>>>> net = NetworkGraph()
>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
>>>> net.prepareTree()
>>>> net.createNetworkJson()
>>>>
>>>> (The id and name fields are duplicated right now; I'm using readable
>>>> names to keep me sane. The third string is the parent, and the last two
>>>> numbers are bandwidth limits.)
>>>> The nice, readable format being:
>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="",
>>>>             type=NodeType.site, download=1000, upload=1000)
>>>>
>>>> That in turn gives you the example network:
>>>> [image: image.png]
>>>>
>>>>
>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>> herberticus@gmail.com> wrote:
>>>>
>>>>> Dave: I love those Gource animations! Game development is my other
>>>>> hobby, I could easily get lost for weeks tweaking the shaders to make the
>>>>> glow "just right". :-)
>>>>>
>>>>> Dan: Discovery would be nice, but I don't think we're ready to look in
>>>>> that direction yet. I'm trying to build a "common grammar" to make it
>>>>> easier to express network layout from integrations; that would be another
>>>>> form/layer of integration and a lot easier to work with once there's a
>>>>> solid foundation. Preseem does some of this (admittedly over-eagerly;
>>>>> nothing needs to query SNMP that often!), and the SNMP route is quite
>>>>> remarkably convoluted. Their support turned on a few "extra" modules to
>>>>> deal with things like PMP450 clients that change MAC when you put them in
>>>>> bridge mode vs NAT mode (and report the bridge mode CPE in some places
>>>>> either way), Elevate CPEs that almost but not quite make sense. Robert's
>>>>> code has the beginnings of some of this, scanning Mikrotik routers for IPv6
>>>>> allocations by MAC (this is also the hardest part for me to test, since I
>>>>> don't have any v6 to test, currently).
>>>>>
>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>
>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote:
>>>>>
>>>>>> we're pretty similar in that we've made UISP a mess.  Multiple paths
>>>>>> to a pop.  multiple pops on the network.  failover between pops.  Lots of
>>>>>> 'other' devices. handing out /29 etc to customers.
>>>>>>
>>>>>> Some sort of discovery would be nice.  Ideally though, pulling
>>>>>> something from SNMP or router APIs etc to build the paths, but having a
>>>>>> 'network elements' list with each of the links described.  ie, backhaul 12
>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that
>>>>>> from discovery.
>>>>>>
>>>>>> I've also thought about doing routine trace routes or watching TTLs
>>>>>> or something like that to get some indication that topology has changed and
>>>>>> then do another discovery and potential tree rebuild.
>>>>>>
>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>
>>>>>>> This is awesome! Way to go here. Thank you for contributing this.
>>>>>>> Being able to map out these complex integrations will help ISPs a
>>>>>>> ton, and I really like that it is sharing common features between the
>>>>>>> Splynx and UISP integrations.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Robert
>>>>>>>
>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS <
>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>
>>>>>>>> So I've been doing some work on getting UISP integration (and
>>>>>>>> integrations in general) to work a bit more smoothly.
>>>>>>>>
>>>>>>>> I started by implementing a graph structure that mirrors both the
>>>>>>>> networks and sites system. It's not done yet, but the basics are coming
>>>>>>>> together nicely. You can see my progress so far at:
>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>
>>>>>>>> Our UISP instance is a *great* testcase for torturing the system.
>>>>>>>> I even found a case of UISP somehow auto-generating a circular portion of
>>>>>>>> the tree. We have:
>>>>>>>>
>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>    - Bridge mode devices using Option 82 to always allocate the
>>>>>>>>    same IP, with a "service IP" entry
>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>    - Sites that go to client sites, which go to other client sites
>>>>>>>>
>>>>>>>> In other words, over the years we've unleashed a bit of a monster.
>>>>>>>> Cleaning it up is a useful talk, but I wanted the integration to be able to
>>>>>>>> handle pathological cases like us!
>>>>>>>>
>>>>>>>> So I fed our network into the current graph generator, and used
>>>>>>>> graphviz to spit out a directed graph:
>>>>>>>> [image: image.png]
>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>
>>>>>>>>
>>>>>>>>    - Green = the root site.
>>>>>>>>    - Red = a site
>>>>>>>>    - Blue = an access point
>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>
>>>>>>>> So the part in "common" is designed heavily to reduce repetition.
>>>>>>>> When it's done, you should be able to feed in sites, APs, clients, devices,
>>>>>>>> etc. in a pretty flexible manner. Given how much code is shared between the
>>>>>>>> UISP and Splynx integration code, I'm pretty sure both will be cut to a
>>>>>>>> tiny fraction of the total code. :-)
>>>>>>>>
>>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>>> _______________________________________________
>>>>>>>> LibreQoS mailing list
>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Robert Chacón
>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>> _______________________________________________
>>>>>>> LibreQoS mailing list
>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>
>>>>>> _______________________________________________
>>>> LibreQoS mailing list
>>>> LibreQoS@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>
>>>
>>>
>>> --
>>> Robert Chacón
>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>> Dev | LibreQoS.io
>>>
>>> _______________________________________________
>> LibreQoS mailing list
>> LibreQoS@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/libreqos
>>
>
>
> --
> Robert Chacón
> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
> Dev | LibreQoS.io
>
>

[-- Attachment #1.2: Type: text/html, Size: 32499 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-29 15:57               ` Herbert Wolverson
@ 2022-10-29 19:05                 ` Robert Chacón
  2022-10-29 19:43                   ` Dave Taht
  2022-10-29 19:18                 ` Dave Taht
  1 sibling, 1 reply; 33+ messages in thread
From: Robert Chacón @ 2022-10-29 19:05 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 19218 bytes --]

> Per your suggestion, devices with no IP addresses (v4 or v6) are not
> added.
> Mikrotik "4 to 6" mapping is implemented. I put it in the "common" side
> of things, so it can be used in other integrations also. I don't have a
> setup on which to test it, but if I'm reading the code right then the unit
> test is testing it appropriately.

Fantastic.
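
(For anyone following along at home, my mental model of that merge is
something like the sketch below - the function and field names here are
invented for illustration; the real logic lives in integrationCommon:)

def mapMikrotikIPv6(devices, bindings):
    # bindings: { mac: [ipv6 prefixes] } scraped from Mikrotik DHCPv6
    # servers. Attach each prefix to the device we already know by MAC.
    byMac = {d["mac"].upper(): d for d in devices if d.get("mac")}
    for mac, prefixes in bindings.items():
        device = byMac.get(mac.upper())
        if device is not None:
            device["ipv6"] = sorted(set(device.get("ipv6", [])) | set(prefixes))
    return devices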

> excludeSites is supported as a common API feature. If a node is added
> with a name that matches an excluded site, it won't be added. The tree
> builder is smart enough to replace invalid "parentId" references with the
> shaper root, so if you have other tree items that rely on this site - they
> will be added to the tree. Was that the intent? (It looks pretty useful; we
> have a child site down the tree with a HUGE amount of load, and bumping it
> to the top-level with excludeSites would probably help our load balancing
> quite a bit)

Very cool approach, I like it! Yeah, we have some cases where we need to
balance out high-load child nodes across CPUs, so that's perfect.
Originally I thought of it as just excluding sites that don't fit into the
shaped topology, but this approach is more useful.
Should we rename excludeSites to moveSitesToTop or something similar? That
functionality of distributing across top-level nodes / CPU cores seems more
important anyway.
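
(To make that behavior concrete for the list: the re-parenting pass Herbert
describes amounts to something like this - the node fields and function
name are invented for illustration:)

def promoteOrphans(nodes, rootId=""):
    # Any node whose parentId no longer resolves (e.g. its parent site was
    # excluded) is re-parented to the shaper root - which is exactly what
    # pushes a heavy child site to the top level for CPU balancing.
    knownIds = {n["id"] for n in nodes}
    for n in nodes:
        if n["parentId"] and n["parentId"] not in knownIds:
            n["parentId"] = rootId
    return nodes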

> exceptionCPEs is also supported as a common API feature. It simply
> overrides the "parentId" of incoming nodes with the new parent. Another
> potentially useful feature; if I got excludeSites the wrong way around,
> I'd add a "my_big_site":"" entry to push it to the top.

Awesome

> UISP integration now supports a "flat" topology option (set via
> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py to
> include this entry.

Nice!

> I'll look and see how much of the Splynx code I can shorten with the new
> API; I don't have a Splynx setup to test against, making that tricky.

I'll send you the Splynx login they gave us.

> I *think* the new API should shorten things a lot. I think routers act as
> node parents, with clients underneath them? Otherwise, a "flat" setup
> should be a little shorter (the CSV code can be replaced with a call to the
> graph builder). Most of the Splynx (and VISP) users I've talked to layer
> MPLS+VPLS to pretend to have a big, flat network and then connect via a
> RADIUS call in the DHCP server; I've always assumed that's because those
> systems prefer the telecom model of "pretend everything is equal" to trying
> to model topology.*

Yeah, Splynx doesn't seem to natively support any topology mapping or even
AP designation; one person I spoke to said they track corresponding APs in
RADIUS anyway. So for now the flat model may be fine.
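
(If so, the whole Splynx import could plausibly collapse to a loop like
this - the fetch call, field names, and NodeType.client are assumptions on
my part; addRawNode and the prepare/create calls are from Herbert's example
upthread:)

from integrationCommon import NetworkGraph, NetworkNode, NodeType

net = NetworkGraph()
for c in fetchSplynxCustomers():   # hand-waved API call, hypothetical
    # An empty parentId hangs the client directly off the shaper root:
    # that's the whole "flat" strategy.
    net.addRawNode(NetworkNode(c["id"], c["name"], "", NodeType.client,
                               c["downloadMbps"], c["uploadMbps"]))
net.prepareTree()
net.createNetworkJson()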

> I need to clean things up a bit (there's still a bit of duplicated code,
> and I believe in the DRY principle - don't repeat yourself; Dave Thomas -
> my boss at PragProg - coined the term in The Pragmatic Programmer, and I
> feel obliged to use it everywhere!), and do a quick rebase (I accidentally
> parented the branch off of a branch instead of main) - but I think I can
> have this as a PR for you on Monday.

This is really great work and will make future integrations much cleaner
and nicer to work with. Thank you!


On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS <
libreqos@lists.bufferbloat.net> wrote:

> Alright, the UISP side of the common integrations is pretty much feature
> complete. I'll update the tracking issue in a bit.
>
>    - Per your suggestion, devices with no IP addresses (v4 or v6) are not
>    added.
>    - Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>    side of things, so it can be used in other integrations also. I don't have
>    a setup on which to test it, but if I'm reading the code right then the
>    unit test is testing it appropriately.
>    - excludeSites is supported as a common API feature. If a node is
>    added with a name that matches an excluded site, it won't be added. The
>    tree builder is smart enough to replace invalid "parentId" references with
>    the shaper root, so if you have other tree items that rely on this site -
>    they will be added to the tree. Was that the intent? (It looks pretty
>    useful; we have a child site down the tree with a HUGE amount of load, and
>    bumping it to the top-level with excludeSites would probably help our load
>    balancing quite a bit)
>       - If the intent was to exclude the site and everything underneath
>       it, I'd have to rework things a bit. Let me know; it wasn't quite clear.
>       - exceptionCPEs is also supported as a common API feature. It
>    simply overrides the "parentId" of incoming nodes with the new parent.
>    Another potentially useful feature; if I got excludeSites the wrong way
>    around, I'd add a "my_big_site":"" entry to push it to the top.
>    - UISP integration now supports a "flat" topology option (set via
>    uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>    to include this entry.
>
> I'll look and see how much of the Splynx code I can shorten with the new
> API; I don't have a Splynx setup to test against, making that tricky. I
> *think* the new API should shorten things a lot. I think routers act as
> node parents, with clients underneath them? Otherwise, a "flat" setup
> should be a little shorter (the CSV code can be replaced with a call to the
> graph builder). Most of the Splynx (and VISP) users I've talked to layer
> MPLS+VPLS to pretend to have a big, flat network and then connect via a
> RADIUS call in the DHCP server; I've always assumed that's because those
> systems prefer the telecom model of "pretend everything is equal" to trying
> to model topology.*
>
> I need to clean things up a bit (there's still a bit of duplicated code,
> and I believe in the DRY principle - don't repeat yourself; Dave Thomas -
> my boss at PragProg - coined the term in The Pragmatic Programmer, and I
> feel obliged to use it everywhere!), and do a quick rebase (I accidentally
> parented the branch off of a branch instead of main) - but I think I can
> have this as a PR for you on Monday.
>
> * - The first big wireless network I setup used a Motorola WiMAX setup.
> They *required* that every single AP share two VLANs (management and
> bearer) with every other AP - all the way to the core. It kinda worked once
> they remembered client isolation was a thing in a patch... Then again,
> their installation instructions included connecting two ports of a router
> together with a jumper cable, because their localhost implementation didn't
> quite work. :-|
>
> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
> robert.chacon@jackrabbitwireless.com> wrote:
>
>> Awesome work. It succeeded in building the topology and creating
>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>> I notice that in ShapedDevices.csv it does add CPE radios (which in our
>> case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
>> being empty lists [].
>> This is not necessarily bad, but it may lead to empty leaf classes being
>> created on LibreQoS.py runs. Not a huge deal, it just makes the minor class
>> counter increment toward the 32k limit faster.
>> Do you think perhaps we should check:
>> *if (len(IPv4) == 0) and (len(IPv6) == 0):*
>> *   # Skip adding this entry to ShapedDevices.csv*
>> Or something similar around line 329 of integrationCommon.py?
>> Open to your suggestions there.
>>
>>
>>
>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>> libreqos@lists.bufferbloat.net> wrote:
>>
>>> One more update, and I'm going to sleep until "pick up daughter" time.
>>> :-)
>>>
>>> The tree at
>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>> can now build a network.json, ShapedDevices.csv, and
>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>>> previous importer - other than using data links to build the hierarchy and
>>> letting (requiring, currently) you specify the root node. It's handling our
>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>>> recommend just running integrationUISP.py and checking the output rather
>>> than throwing it into production), I'd appreciate any feedback.
>>>
>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>> exceptionCPE and site exclusion.
>>>
>>> If you want the pretty graphics, you need to "pip install graphviz" and
>>> "sudo apt install graphviz". It *should* detect that these aren't present
>>> and not try to draw pictures, otherwise.
>>>
>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>
>>>> Wow. This is very nicely done. Awesome work!
>>>>
>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>
>>>>> The integration is coming along nicely. Some progress updates:
>>>>>
>>>>>    - You can specify a variable in ispConfig.py named "uispSite".
>>>>>    This sets where in the topology you want the tree to start. This has two
>>>>>    purposes:
>>>>>       - It's hard to be psychic and know for sure where the shaper is
>>>>>       in the network.
>>>>>       - You could run multiple shapers at different egress points,
>>>>>       with failover - and rebuild the entire topology from the point of view of a
>>>>>       network node.
>>>>>    - "Child node with children" are now automatically converted into
>>>>>    a "(Generated Site) name" site, and their children rearranged. This:
>>>>>       - Allows you to set the "site" bandwidth independently of the
>>>>>       client site bandwidth.
>>>>>       - Makes for easier trees, because we're inserting the site that
>>>>>       really should be there.
>>>>>    - Network.json generation (not the shaped devices file yet) is
>>>>>    automatically generated from a tree, once PrepareTree() and
>>>>>    createNetworkJson() are called.
>>>>>       - There's a unit test that generates the network.example.json
>>>>>       file and compares it with the original to ensure that they match.
>>>>>    - Unit test coverage hits every function in the graph system, now.
>>>>>
>>>>> I'm liking this setup. With the non-vendor-specific logic contained
>>>>> inside the NetworkGraph type, the actual UISP code to generate the example
>>>>> tree is down to 65
>>>>> lines of code, including comments. That'll grow a bit as I re-insert
>>>>> some automatic speed limit determination, AP/Site speed overrides (
>>>>> i.e. the integrationUISPbandwidths.csv file). Still pretty clean.
>>>>>
>>>>> Creating the network.example.json file only requires:
>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>> import json
>>>>>
>>>>> net = NetworkGraph()
>>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
>>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
>>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
>>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
>>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
>>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
>>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
>>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
>>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
>>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
>>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
>>>>> net.prepareTree()
>>>>> net.createNetworkJson()
>>>>>
>>>>> (The id and name fields are duplicated right now; I'm using readable
>>>>> names to keep me sane. The third string is the parent, and the last two
>>>>> numbers are bandwidth limits.)
>>>>> The nice, readable format being:
>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="",
>>>>>             type=NodeType.site, download=1000, upload=1000)
>>>>>
>>>>> That in turn gives you the example network:
>>>>> [image: image.png]
>>>>>
>>>>>
>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>> herberticus@gmail.com> wrote:
>>>>>
>>>>>> Dave: I love those Gource animations! Game development is my other
>>>>>> hobby, I could easily get lost for weeks tweaking the shaders to make the
>>>>>> glow "just right". :-)
>>>>>>
>>>>>> Dan: Discovery would be nice, but I don't think we're ready to look
>>>>>> in that direction yet. I'm trying to build a "common grammar" to make it
>>>>>> easier to express network layout from integrations; that would be another
>>>>>> form/layer of integration and a lot easier to work with once there's a
>>>>>> solid foundation. Preseem does some of this (admittedly over-eagerly;
>>>>>> nothing needs to query SNMP that often!), and the SNMP route is quite
>>>>>> remarkably convoluted. Their support turned on a few "extra" modules to
>>>>>> deal with things like PMP450 clients that change MAC when you put them in
>>>>>> bridge mode vs NAT mode (and report the bridge mode CPE in some places
>>>>>> either way), Elevate CPEs that almost but not quite make sense. Robert's
>>>>>> code has the beginnings of some of this, scanning Mikrotik routers for IPv6
>>>>>> allocations by MAC (this is also the hardest part for me to test, since I
>>>>>> don't have any v6 to test, currently).
>>>>>>
>>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>>
>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote:
>>>>>>
>>>>>>> we're pretty similar in that we've made UISP a mess.  Multiple paths
>>>>>>> to a pop.  multiple pops on the network.  failover between pops.  Lots of
>>>>>>> 'other' devices. handing out /29 etc to customers.
>>>>>>>
>>>>>>> Some sort of discovery would be nice.  Ideally though, pulling
>>>>>>> something from SNMP or router APIs etc to build the paths, but having a
>>>>>>> 'network elements' list with each of the links described.  ie, backhaul 12
>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that
>>>>>>> from discovery.
>>>>>>>
>>>>>>> I've also thought about doing routine trace routes or watching TTLs
>>>>>>> or something like that to get some indication that topology has changed and
>>>>>>> then do another discovery and potential tree rebuild.
>>>>>>>
>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>
>>>>>>>> This is awesome! Way to go here. Thank you for contributing this.
>>>>>>>> Being able to map out these complex integrations will help ISPs a
>>>>>>>> ton, and I really like that it is sharing common features between the
>>>>>>>> Splynx and UISP integrations.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Robert
>>>>>>>>
>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS <
>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>
>>>>>>>>> So I've been doing some work on getting UISP integration (and
>>>>>>>>> integrations in general) to work a bit more smoothly.
>>>>>>>>>
>>>>>>>>> I started by implementing a graph structure that mirrors both the
>>>>>>>>> networks and sites system. It's not done yet, but the basics are coming
>>>>>>>>> together nicely. You can see my progress so far at:
>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>>
>>>>>>>>> Our UISP instance is a *great* testcase for torturing the system.
>>>>>>>>> I even found a case of UISP somehow auto-generating a circular portion of
>>>>>>>>> the tree. We have:
>>>>>>>>>
>>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>>    - Bridge mode devices using Option 82 to always allocate the
>>>>>>>>>    same IP, with a "service IP" entry
>>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>>    - Sites that go to client sites, which go to other client sites
>>>>>>>>>
>>>>>>>>> In other words, over the years we've unleashed a bit of a monster.
>>>>>>>>> Cleaning it up is a useful talk, but I wanted the integration to be able to
>>>>>>>>> handle pathological cases like us!
>>>>>>>>>
>>>>>>>>> So I fed our network into the current graph generator, and used
>>>>>>>>> graphviz to spit out a directed graph:
>>>>>>>>> [image: image.png]
>>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - Green = the root site.
>>>>>>>>>    - Red = a site
>>>>>>>>>    - Blue = an access point
>>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>>
>>>>>>>>> So the part in "common" is designed heavily to reduce repetition.
>>>>>>>>> When it's done, you should be able to feed in sites, APs, clients, devices,
>>>>>>>>> etc. in a pretty flexible manner. Given how much code is shared between the
>>>>>>>>> UISP and Splynx integration code, I'm pretty sure both will be cut to a
>>>>>>>>> tiny fraction of the total code. :-)
>>>>>>>>>
>>>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>>>> _______________________________________________
>>>>>>>>> LibreQoS mailing list
>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Robert Chacón
>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>> _______________________________________________
>>>>>>>> LibreQoS mailing list
>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>
>>>>>>> _______________________________________________
>>>>> LibreQoS mailing list
>>>>> LibreQoS@lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>
>>>>
>>>>
>>>> --
>>>> Robert Chacón
>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>> Dev | LibreQoS.io
>>>>
>>>> _______________________________________________
>>> LibreQoS mailing list
>>> LibreQoS@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>
>>
>>
>> --
>> Robert Chacón
>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>> Dev | LibreQoS.io
>>
>> _______________________________________________
> LibreQoS mailing list
> LibreQoS@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/libreqos
>


-- 
Robert Chacón
CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
Dev | LibreQoS.io

[-- Attachment #1.2: Type: text/html, Size: 37451 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-29 15:57               ` Herbert Wolverson
  2022-10-29 19:05                 ` Robert Chacón
@ 2022-10-29 19:18                 ` Dave Taht
  2022-10-30  1:10                   ` Herbert Wolverson
  1 sibling, 1 reply; 33+ messages in thread
From: Dave Taht @ 2022-10-29 19:18 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 17425 bytes --]

On Sat, Oct 29, 2022 at 8:57 AM Herbert Wolverson via LibreQoS <
libreqos@lists.bufferbloat.net> wrote:

> Alright, the UISP side of the common integrations is pretty much feature
> complete. I'll update the tracking issue in a bit.
>
>    - Per your suggestion, devices with no IP addresses (v4 or v6) are not
>    added.
>
Every device that is IPv6-ready comes up with a link-local address derived
from the MAC, like fe80::6f16:fa94:f32b:e2e.
Some will actually accept things like ssh to that address.
Not that this is necessarily relevant to this bit of code. Dr. Irrelevant I
am today.
(In the context of babel, at least, you can route IPv4 and IPv6 without
either an IPv6 or IPv4 address, and hnetd can handle the configuration.)
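
(The classic modified-EUI-64 derivation, for the curious - a sketch only;
note that a lot of modern gear uses stable-privacy addresses instead, which
is why the example above doesn't look EUI-64-ish:)

def linkLocalFromMac(mac: str) -> str:
    # Flip the universal/local bit of the first octet, splice ff:fe into
    # the middle of the MAC, and prepend the fe80::/64 link-local prefix.
    b = bytes(int(x, 16) for x in mac.split(":"))
    eui = bytes([b[0] ^ 0x02]) + b[1:3] + b"\xff\xfe" + b[3:6]
    return "fe80::" + ":".join(f"{(eui[i] << 8) | eui[i+1]:x}"
                               for i in range(0, 8, 2))

# linkLocalFromMac("00:11:22:33:44:55") -> "fe80::211:22ff:fe33:4455"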

I am kind of curious as to what weird configuration protocols are in common
use today.

Painfully common are "smart switches" that don't listen to DHCP by default
AND come up on 192.168.1.1.
Ubnt gear comes up on 192.168.1.20 by default,
a lot of CPE comes up on 192.168.1.100 (like cable and Starlink),
I've seen stuff that uses ancient IEEE protocols,
and bootp and tftp are still things.

I've always kind of wanted a daemon on every device that would probe all
possible IP addresses with a TTL of 2, to find rogue devices etc.
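
(A sketch of what that probe could look like - one UDP datagram per address
with IP_TTL=2, so nothing escapes past the second hop; you'd still need to
watch for ICMP replies or ARP activity to see who answered:)

import socket

def probe(addr: str, port: int = 33434, ttl: int = 2) -> None:
    # A junk high port; the goal is the ICMP reaction, not the payload.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    s.sendto(b"anyone-home?", (addr, port))
    s.close()

for host in range(1, 255):          # e.g. sweep 192.168.1.0/24
    probe(f"192.168.1.{host}")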

>
>    - Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>    side of things, so it can be used in other integrations also. I don't have
>    a setup on which to test it, but if I'm reading the code right then the
>    unit test is testing it appropriately.
>
>
Are you talking about the relevant RFC?


>
>    - excludeSites is supported as a common API feature. If a node is
>    added with a name that matches an excluded site, it won't be added. The
>    tree builder is smart enough to replace invalid "parentId" references with
>    the shaper root, so if you have other tree items that rely on this site -
>    they will be added to the tree. Was that the intent? (It looks pretty
>    useful; we have a child site down the tree with a HUGE amount of load, and
>    bumping it to the top-level with excludeSites would probably help our load
>    balancing quite a bit)
>       - If the intent was to exclude the site and everything underneath
>       it, I'd have to rework things a bit. Let me know; it wasn't quite clear.
>       - exceptionCPEs is also supported as a common API feature. It
>    simply overrides the "parentId" of incoming nodes with the new parent.
>    Another potentially useful feature; if I got excludeSites the wrong way
>    around, I'd add a "my_big_site":"" entry to push it to the top.
>
>
There seems to be a need for some level of exclusion by device type - e.g.
(at least per your report), don't run the ack-filter on a Cambium path.
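
(Nothing like this exists in LibreQoS today as far as I know, but the
config surface could be as small as a device-type/feature table that the
queue builder consults before emitting cake options - purely hypothetical:)

# Hypothetical per-device-type feature exclusions.
featureExclusions = {
    "cambium": {"ack-filter"},   # per the report above: no ack-filter here
}

def cakeOptions(deviceType: str) -> str:
    opts = ["diffserv4"]
    if "ack-filter" not in featureExclusions.get(deviceType, set()):
        opts.append("ack-filter")
    return " ".join(opts)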


>
>    - UISP integration now supports a "flat" topology option (set via
>    uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>    to include this entry.
>
> I'll look and see how much of the Splynx code I can shorten with the new
> API; I don't have a Splynx setup to test against, making that tricky. I
> *think* the new API should shorten things a lot. I think routers act as
> node parents, with clients underneath them? Otherwise, a "flat" setup
> should be a little shorter (the CSV code can be replaced with a call to the
> graph builder). Most of the Splynx (and VISP) users I've talked to layer
> MPLS+VPLS to pretend to have a big, flat network and then connect via a
> RADIUS call in the DHCP server;
>

Is there any particularly common set of RADIUS servers in use?


> I've always assumed that's because those systems prefer the telecom model
> of "pretend everything is equal" to trying to model topology.*
>

Except the billing. Always the billing. Our Tuesday golden-plate special
is: you can download all the pr0n from our special partner netblix for 24
hours a week! $9.95!


>
> I need to clean things up a bit (there's still a bit of duplicated code,
> and I believe in the DRY principle - don't repeat yourself; Dave Thomas -
> my boss at PragProg - coined the term in The Pragmatic Programmer, and I
> feel obliged to use it everywhere!), and do a quick rebase (I accidentally
> parented the branch off of a branch instead of main) - but I think I can
> have this as a PR for you on Monday.
>
> * - The first big wireless network I setup used a Motorola WiMAX setup.
> They *required* that every single AP share two VLANs (management and
> bearer) with every other AP - all the way to the core. It kinda worked once
> they remembered client isolation was a thing in a patch... Then again,
> their installation instructions included connecting two ports of a router
> together with a jumper cable, because their localhost implementation didn't
> quite work. :-|
>
> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
> robert.chacon@jackrabbitwireless.com> wrote:
>
>> Awesome work. It succeeded in building the topology and creating
>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>> I notice that in ShapedDevices.csv it does add CPE radios (which in our
>> case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
>> being empty lists [].
>> This is not necessarily bad, but it may lead to empty leaf classes being
>> created on LibreQoS.py runs. Not a huge deal, it just makes the minor class
>> counter increment toward the 32k limit faster.
>> Do you think perhaps we should check:
>> *if (len(IPv4) == 0) and (len(IPv6) == 0):*
>> *   # Skip adding this entry to ShapedDevices.csv*
>> Or something similar around line 329 of integrationCommon.py?
>> Open to your suggestions there.
>>
>>
>>
>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>> libreqos@lists.bufferbloat.net> wrote:
>>
>>> One more update, and I'm going to sleep until "pick up daughter" time.
>>> :-)
>>>
>>> The tree at
>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>> can now build a network.json, ShapedDevices.csv, and
>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>>> previous importer - other than using data links to build the hierarchy and
>>> letting (requiring, currently) you specify the root node. It's handling our
>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>>> recommend just running integrationUISP.py and checking the output rather
>>> than throwing it into production), I'd appreciate any feedback.
>>>
>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>> exceptionCPE and site exclusion.
>>>
>>> If you want the pretty graphics, you need to "pip install graphviz" and
>>> "sudo apt install graphviz". It *should* detect that these aren't present
>>> and not try to draw pictures, otherwise.
>>>
>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>
>>>> Wow. This is very nicely done. Awesome work!
>>>>
>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>
>>>>> The integration is coming along nicely. Some progress updates:
>>>>>
>>>>>    - You can specify a variable in ispConfig.py named "uispSite".
>>>>>    This sets where in the topology you want the tree to start. This has two
>>>>>    purposes:
>>>>>       - It's hard to be psychic and know for sure where the shaper is
>>>>>       in the network.
>>>>>       - You could run multiple shapers at different egress points,
>>>>>       with failover - and rebuild the entire topology from the point of view of a
>>>>>       network node.
>>>>>    - "Child node with children" are now automatically converted into
>>>>>    a "(Generated Site) name" site, and their children rearranged. This:
>>>>>       - Allows you to set the "site" bandwidth independently of the
>>>>>       client site bandwidth.
>>>>>       - Makes for easier trees, because we're inserting the site that
>>>>>       really should be there.
>>>>>    - Network.json generation (not the shaped devices file yet) is
>>>>>    automatically generated from a tree, once PrepareTree() and
>>>>>    createNetworkJson() are called.
>>>>>       - There's a unit test that generates the network.example.json
>>>>>       file and compares it with the original to ensure that they match.
>>>>>    - Unit test coverage hits every function in the graph system, now.
>>>>>
>>>>> I'm liking this setup. With the non-vendor-specific logic contained
>>>>> inside the NetworkGraph type, the actual UISP code to generate the example
>>>>> tree is down to 65
>>>>> lines of code, including comments. That'll grow a bit as I re-insert
>>>>> some automatic speed limit determination, AP/Site speed overrides (
>>>>> i.e. the integrationUISPbandwidths.csv file). Still pretty clean.
>>>>>
>>>>> Creating the network.example.json file only requires:
>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>> import json
>>>>>
>>>>> net = NetworkGraph()
>>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
>>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
>>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
>>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
>>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
>>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
>>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
>>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
>>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
>>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
>>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
>>>>> net.prepareTree()
>>>>> net.createNetworkJson()
>>>>>
>>>>> (The id and name fields are duplicated right now; I'm using readable
>>>>> names to keep me sane. The third string is the parent, and the last two
>>>>> numbers are bandwidth limits.)
>>>>> The nice, readable format being:
>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="",
>>>>>             type=NodeType.site, download=1000, upload=1000)
>>>>>
>>>>> That in turn gives you the example network:
>>>>> [image: image.png]
>>>>>
>>>>>
>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>> herberticus@gmail.com> wrote:
>>>>>
>>>>>> Dave: I love those Gource animations! Game development is my other
>>>>>> hobby, I could easily get lost for weeks tweaking the shaders to make the
>>>>>> glow "just right". :-)
>>>>>>
>>>>>> Dan: Discovery would be nice, but I don't think we're ready to look
>>>>>> in that direction yet. I'm trying to build a "common grammar" to make it
>>>>>> easier to express network layout from integrations; that would be another
>>>>>> form/layer of integration and a lot easier to work with once there's a
>>>>>> solid foundation. Preseem does some of this (admittedly over-eagerly;
>>>>>> nothing needs to query SNMP that often!), and the SNMP route is quite
>>>>>> remarkably convoluted. Their support turned on a few "extra" modules to
>>>>>> deal with things like PMP450 clients that change MAC when you put them in
>>>>>> bridge mode vs NAT mode (and report the bridge mode CPE in some places
>>>>>> either way), Elevate CPEs that almost but not quite make sense. Robert's
>>>>>> code has the beginnings of some of this, scanning Mikrotik routers for IPv6
>>>>>> allocations by MAC (this is also the hardest part for me to test, since I
>>>>>> don't have any v6 to test, currently).
>>>>>>
>>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>>
>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote:
>>>>>>
>>>>>>> we're pretty similar in that we've made UISP a mess.  Multiple paths
>>>>>>> to a pop.  multiple pops on the network.  failover between pops.  Lots of
>>>>>>> 'other' devices. handing out /29 etc to customers.
>>>>>>>
>>>>>>> Some sort of discovery would be nice.  Ideally though, pulling
>>>>>>> something from SNMP or router APIs etc to build the paths, but having a
>>>>>>> 'network elements' list with each of the links described.  ie, backhaul 12
>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that
>>>>>>> from discovery.
>>>>>>>
>>>>>>> I've also thought about doing routine trace routes or watching TTLs
>>>>>>> or something like that to get some indication that topology has changed and
>>>>>>> then do another discovery and potential tree rebuild.
>>>>>>>
>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>
>>>>>>>> This is awesome! Way to go here. Thank you for contributing this.
>>>>>>>> Being able to map out these complex integrations will help ISPs a
>>>>>>>> ton, and I really like that it is sharing common features between the
>>>>>>>> Splynx and UISP integrations.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Robert
>>>>>>>>
>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS <
>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>
>>>>>>>>> So I've been doing some work on getting UISP integration (and
>>>>>>>>> integrations in general) to work a bit more smoothly.
>>>>>>>>>
>>>>>>>>> I started by implementing a graph structure that mirrors both the
>>>>>>>>> networks and sites system. It's not done yet, but the basics are coming
>>>>>>>>> together nicely. You can see my progress so far at:
>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>>
>>>>>>>>> Our UISP instance is a *great* testcase for torturing the system.
>>>>>>>>> I even found a case of UISP somehow auto-generating a circular portion of
>>>>>>>>> the tree. We have:
>>>>>>>>>
>>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>>    - Bridge mode devices using Option 82 to always allocate the
>>>>>>>>>    same IP, with a "service IP" entry
>>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>>    - Sites that go to client sites, which go to other client sites
>>>>>>>>>
>>>>>>>>> In other words, over the years we've unleashed a bit of a monster.
>>>>>>>>> Cleaning it up is a useful talk, but I wanted the integration to be able to
>>>>>>>>> handle pathological cases like us!
>>>>>>>>>
>>>>>>>>> So I fed our network into the current graph generator, and used
>>>>>>>>> graphviz to spit out a directed graph:
>>>>>>>>> [image: image.png]
>>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - Green = the root site.
>>>>>>>>>    - Red = a site
>>>>>>>>>    - Blue = an access point
>>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>>
>>>>>>>>> So the part in "common" is designed heavily to reduce repetition.
>>>>>>>>> When it's done, you should be able to feed in sites, APs, clients, devices,
>>>>>>>>> etc. in a pretty flexible manner. Given how much code is shared between the
>>>>>>>>> UISP and Splynx integration code, I'm pretty sure both will be cut to a
>>>>>>>>> tiny fraction of the total code. :-)
>>>>>>>>>
>>>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>>>> _______________________________________________
>>>>>>>>> LibreQoS mailing list
>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Robert Chacón
>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>> _______________________________________________
>>>>>>>> LibreQoS mailing list
>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>
>>>>>>> _______________________________________________
>>>>> LibreQoS mailing list
>>>>> LibreQoS@lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>
>>>>
>>>>
>>>> --
>>>> Robert Chacón
>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>> Dev | LibreQoS.io
>>>>
>>>> _______________________________________________
>>> LibreQoS mailing list
>>> LibreQoS@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>
>>
>>
>> --
>> Robert Chacón
>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>> Dev | LibreQoS.io
>>
>> _______________________________________________
> LibreQoS mailing list
> LibreQoS@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/libreqos
>


-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

[-- Attachment #1.2: Type: text/html, Size: 36267 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-29 19:05                 ` Robert Chacón
@ 2022-10-29 19:43                   ` Dave Taht
  2022-10-30  1:45                     ` Herbert Wolverson
  0 siblings, 1 reply; 33+ messages in thread
From: Dave Taht @ 2022-10-29 19:43 UTC (permalink / raw)
  To: Robert Chacón; +Cc: Herbert Wolverson, libreqos


[-- Attachment #1.1: Type: text/plain, Size: 21255 bytes --]

For starters, let me also offer praise for this work which is so ahead of
schedule!

I am (perhaps cluelessly) thinking about bigger pictures, and still stuck
in my mindset of distributing the packet processing, representing the
network topology and plans, and compensating for the physics.

So you have a major tower - does a separate libreqos instance go there? Or
does libreqos output rules compatible with mikrotik or vyatta or whatever
is there? Or are you basically thinking one device rules them all, and
shapes everything off the only interface?

Or:

You have another pop with a separate connection to the internet that you
inherited from a buyout, or you wanted physical redundancy for your BGP
AS's internet access, maybe just between DCs in the same town or...
cloud -> pop -> customers      customers <- pop <- cloud
          \                               /
           \----- leased fiber or wireless -----/


I'm also a little puzzled as to what's the ISP->internet link - juniper?
cisco? mikrotik? - and what role and services it is expected to have.



On Sat, Oct 29, 2022 at 12:06 PM Robert Chacón via LibreQoS <
libreqos@lists.bufferbloat.net> wrote:

> > Per your suggestion, devices with no IP addresses (v4 or v6) are not
> added.
> > Mikrotik "4 to 6" mapping is implemented. I put it in the "common" side
> of things, so it can be used in other integrations also. I don't have a
> setup on which to test it, but if I'm reading the code right then the unit
> test is testing it appropriately.
>
> Fantastic.
>
> > excludeSites is supported as a common API feature. If a node is added
> with a name that matches an excluded site, it won't be added. The tree
> builder is smart enough to replace invalid "parentId" references with the
> shaper root, so if you have other tree items that rely on this site - they
> will be added to the tree. Was that the intent? (It looks pretty useful; we
> have a child site down the tree with a HUGE amount of load, and bumping it
> to the top-level with excludeSites would probably help our load balancing
> quite a bit)
>
> Very cool approach, I like it! Yeah we have some cases where we need to
> balance out high load child nodes across CPUs so that's perfect.
> Originally I thought of it to just exclude sites that don't fit into the
> shaped topology but this approach is more useful.
> Should we rename excludeSites to moveSitesToTop or something similar? That
> functionality of distributing across top level nodes / cpu cores seems more
> important anyway.
>
> >exceptionCPEs is also supported as a common API feature. It simply
> overrides the "parentId'' of incoming nodes with the new parent. Another
> potentially useful feature; if I got excludeSites the wrong away around,
> I'd add a "my_big_site":"" entry to push it to the top.
>
> Awesome
>
> > UISP integration now supports a "flat" topology option (set via
> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py to
> include this entry.
>
> Nice!
>
> > I'll look and see how much of the Spylnx code I can shorten with the new
> API; I don't have a Spylnx setup to test against, making that tricky.
>
> I'll send you the Splynx login they gave us.
>
> > I *think* the new API should shorten things a lot. I think routers act
> as node parents, with clients underneath them? Otherwise, a "flat" setup
> should be a little shorter (the CSV code can be replaced with a call to the
> graph builder). Most of the Spylnx (and VISP) users I've talked to layer
> MPLS+VPLS to pretend to have a big, flat network and then connect via a
> RADIUS call in the DHCP server; I've always assumed that's because those
> systems prefer the telecom model of "pretend everything is equal" to trying
> to model topology.*
>
> Yeah splynx doesn't seem to natively support any topology mapping or even
> AP designation, one person I spoke to said they track corresponding APs in
> radius anyway. So for now the flat model may be fine.
>
> > I need to clean things up a bit (there's still a bit of duplicated code,
> and I believe in the DRY principle - don't repeat yourself; Dave Thomas -
> my boss at PragProg - coined the term in The Pragmatic Programmer, and I
> feel obliged to use it everywhere!), and do a quick rebase (I accidentally
> parented the branch off of a branch instead of main) - but I think I can
> have this as a PR for you on Monday.
>
> This is really great work and will make future integrations much cleaner
> and nicer to work with. Thank you!
>
>
> On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS <
> libreqos@lists.bufferbloat.net> wrote:
>
>> Alright, the UISP side of the common integrations is pretty much feature
>> complete. I'll update the tracking issue in a bit.
>>
>>    - Per your suggestion, devices with no IP addresses (v4 or v6) are
>>    not added.
>>    - Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>>    side of things, so it can be used in other integrations also. I don't have
>>    a setup on which to test it, but if I'm reading the code right then the
>>    unit test is testing it appropriately.
>>    - excludeSites is supported as a common API feature. If a node is
>>    added with a name that matches an excluded site, it won't be added. The
>>    tree builder is smart enough to replace invalid "parentId" references with
>>    the shaper root, so if you have other tree items that rely on this site -
>>    they will be added to the tree. Was that the intent? (It looks pretty
>>    useful; we have a child site down the tree with a HUGE amount of load, and
>>    bumping it to the top-level with excludeSites would probably help our load
>>    balancing quite a bit)
>>       - If the intent was to exclude the site and everything underneath
>>       it, I'd have to rework things a bit. Let me know; it wasn't quite clear.
>>       - exceptionCPEs is also supported as a common API feature. It
>>    simply overrides the "parentId'' of incoming nodes with the new parent.
>>    Another potentially useful feature; if I got excludeSites the wrong away
>>    around, I'd add a "my_big_site":"" entry to push it to the top.
>>    - UISP integration now supports a "flat" topology option (set via
>>    uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>>    to include this entry.
>>
>> I'll look and see how much of the Spylnx code I can shorten with the new
>> API; I don't have a Spylnx setup to test against, making that tricky. I
>> *think* the new API should shorten things a lot. I think routers act as
>> node parents, with clients underneath them? Otherwise, a "flat" setup
>> should be a little shorter (the CSV code can be replaced with a call to the
>> graph builder). Most of the Spylnx (and VISP) users I've talked to layer
>> MPLS+VPLS to pretend to have a big, flat network and then connect via a
>> RADIUS call in the DHCP server; I've always assumed that's because those
>> systems prefer the telecom model of "pretend everything is equal" to trying
>> to model topology.*
>>
>> I need to clean things up a bit (there's still a bit of duplicated code,
>> and I believe in the DRY principle - don't repeat yourself; Dave Thomas -
>> my boss at PragProg - coined the term in The Pragmatic Programmer, and I
>> feel obliged to use it everywhere!), and do a quick rebase (I accidentally
>> parented the branch off of a branch instead of main) - but I think I can
>> have this as a PR for you on Monday.
>>
>> * - The first big wireless network I setup used a Motorola WiMAX setup.
>> They *required* that every single AP share two VLANs (management and
>> bearer) with every other AP - all the way to the core. It kinda worked once
>> they remembered client isolation was a thing in a patch... Then again,
>> their installation instructions included connecting two ports of a router
>> together with a jumper cable, because their localhost implementation didn't
>> quite work. :-|
>>
>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
>> robert.chacon@jackrabbitwireless.com> wrote:
>>
>>> Awesome work. It succeeded in building the topology and creating
>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>>> I notice that in ShapedDevices.csv it does add CPE radios (which in our
>>> case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
>>> being empty lists [].
>>> This is not necessarily bad, but it may lead to empty leaf classes being
>>> created on LibreQoS.py runs. Not a huge deal, it just makes the minor class
>>> counter increment toward the 32k limit faster.
>>> Do you think perhaps we should check:
>>> *if (len(IPv4) == 0) and (len(IPv6) == 0):*
>>> *   # Skip adding this entry to ShapedDevices.csv*
>>> Or something similar around line 329 of integrationCommon.py?
>>> Open to your suggestions there.
>>>
>>>
>>>
>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>>> libreqos@lists.bufferbloat.net> wrote:
>>>
>>>> One more update, and I'm going to sleep until "pick up daughter" time.
>>>> :-)
>>>>
>>>> The tree at
>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>> can now build a network.json, ShapedDevices.csv, and
>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>>>> previous importer - other than using data links to build the hierarchy and
>>>> letting (requiring, currently) you specify the root node. It's handling our
>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>>>> recommend just running integrationUISP.py and checking the output rather
>>>> than throwing it into production), I'd appreciate any feedback.
>>>>
>>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>>> exceptionCPE and site exclusion.
>>>>
>>>> If you want the pretty graphics, you need to "pip install graphviz" and
>>>> "sudo apt install graphviz". It *should* detect that these aren't present
>>>> and not try to draw pictures, otherwise.
>>>>
>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>>
>>>>> Wow. This is very nicely done. Awesome work!
>>>>>
>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>
>>>>>> The integration is coming along nicely. Some progress updates:
>>>>>>
>>>>>>    - You can specify a variable in ispConfig.py named "uispSite".
>>>>>>    This sets where in the topology you want the tree to start. This has two
>>>>>>    purposes:
>>>>>>       - It's hard to be psychic and know for sure where the shaper
>>>>>>       is in the network.
>>>>>>       - You could run multiple shapers at different egress points,
>>>>>>       with failover - and rebuild the entire topology from the point of view of a
>>>>>>       network node.
>>>>>>    - "Child node with children" are now automatically converted into
>>>>>>    a "(Generated Site) name" site, and their children rearranged. This:
>>>>>>       - Allows you to set the "site" bandwidth independently of the
>>>>>>       client site bandwidth.
>>>>>>       - Makes for easier trees, because we're inserting the site
>>>>>>       that really should be there.
>>>>>>    - Network.json generation (not the shaped devices file yet) is
>>>>>>    automatically generated from a tree, once PrepareTree() and
>>>>>>    createNetworkJson() are called.
>>>>>>       - There's a unit test that generates the network.example.json
>>>>>>       file and compares it with the original to ensure that they match.
>>>>>>    - Unit test coverage hits every function in the graph system, now.
>>>>>>
>>>>>> I'm liking this setup. With the non-vendor-specific logic contained
>>>>>> inside the NetworkGraph type, the actual UISP code to generate the example
>>>>>> tree is down to 65
>>>>>> lines of code, including comments. That'll grow a bit as I re-insert
>>>>>> some automatic speed limit determination, AP/Site speed overrides (
>>>>>> i.e. the integrationUISPbandwidths.csv file). Still pretty clean.
>>>>>>
>>>>>> Creating the network.example.json file only requires:
>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>>         import json
>>>>>>         net = NetworkGraph()
>>>>>>         net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.
>>>>>> site, 1000, 1000))
>>>>>>         net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.
>>>>>> site, 500, 500))
>>>>>>         net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType
>>>>>> .ap, 500, 500))
>>>>>>         net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1",
>>>>>> NodeType.site, 500, 500))
>>>>>>         net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3",
>>>>>> NodeType.site, 200, 200))
>>>>>>         net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.
>>>>>> ap, 120, 120))
>>>>>>         net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5",
>>>>>> NodeType.site, 60, 60))
>>>>>>         net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6",
>>>>>> NodeType.ap, 30, 30))
>>>>>>         net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2",
>>>>>> NodeType.site, 200, 200))
>>>>>>         net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.
>>>>>> ap, 100, 100))
>>>>>>         net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType
>>>>>> .ap, 150, 150))
>>>>>>         net.prepareTree()
>>>>>>         net.createNetworkJson()
>>>>>>
>>>>>> (The id and name fields are duplicated right now, I'm using readable
>>>>>> names to keep me sane. The third string is the parent, and the last two
>>>>>> numbers are bandwidth limits)
>>>>>> The nice, readable format being:
>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="", type=
>>>>>> NodeType.site, download=1000, upload=1000)
>>>>>>
>>>>>> That in turns gives you the example network:
>>>>>> [image: image.png]
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>>> herberticus@gmail.com> wrote:
>>>>>>
>>>>>>> Dave: I love those Gource animations! Game development is my other
>>>>>>> hobby, I could easily get lost for weeks tweaking the shaders to make the
>>>>>>> glow "just right". :-)
>>>>>>>
>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to look
>>>>>>> in that direction yet. I'm trying to build a "common grammar" to make it
>>>>>>> easier to express network layout from integrations; that would be another
>>>>>>> form/layer of integration and a lot easier to work with once there's a
>>>>>>> solid foundation. Preseem does some of this (admittedly over-eagerly;
>>>>>>> nothing needs to query SNMP that often!), and the SNMP route is quite
>>>>>>> remarkably convoluted. Their support turned on a few "extra" modules to
>>>>>>> deal with things like PMP450 clients that change MAC when you put them in
>>>>>>> bridge mode vs NAT mode (and report the bridge mode CPE in some places
>>>>>>> either way), Elevate CPEs that almost but not quite make sense. Robert's
>>>>>>> code has the beginnings of some of this, scanning Mikrotik routers for IPv6
>>>>>>> allocations by MAC (this is also the hardest part for me to test, since I
>>>>>>> don't have any v6 to test, currently).
>>>>>>>
>>>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>>>
>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote:
>>>>>>>
>>>>>>>> we're pretty similar in that we've made UISP a mess.  Multiple
>>>>>>>> paths to a pop.  multiple pops on the network.  failover between pops.
>>>>>>>> Lots of 'other' devices. handing out /29 etc to customers.
>>>>>>>>
>>>>>>>> Some sort of discovery would be nice.  Ideally though, pulling
>>>>>>>> something from SNMP or router APIs etc to build the paths, but having a
>>>>>>>> 'network elements' list with each of the links described.  ie, backhaul 12
>>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that
>>>>>>>> from discovery.
>>>>>>>>
>>>>>>>> I've also thought about doing routine trace routes or watching TTLs
>>>>>>>> or something like that to get some indication that topology has changed and
>>>>>>>> then do another discovery and potential tree rebuild.
>>>>>>>>
>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>
>>>>>>>>> This is awesome! Way to go here. Thank you for contributing this.
>>>>>>>>> Being able to map out these complex integrations will help ISPs a
>>>>>>>>> ton, and I really like that it is sharing common features between the
>>>>>>>>> Splynx and UISP integrations.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Robert
>>>>>>>>>
>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS <
>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>
>>>>>>>>>> So I've been doing some work on getting UISP integration (and
>>>>>>>>>> integrations in general) to work a bit more smoothly.
>>>>>>>>>>
>>>>>>>>>> I started by implementing a graph structure that mirrors both the
>>>>>>>>>> networks and sites system. It's not done yet, but the basics are coming
>>>>>>>>>> together nicely. You can see my progress so far at:
>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>>>
>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the
>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular
>>>>>>>>>> portion of the tree. We have:
>>>>>>>>>>
>>>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>>>    - Bridge mode devices using Option 82 to always allocate the
>>>>>>>>>>    same IP, with a "service IP" entry
>>>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>>>    - Sites that go to client sites, which go to other client
>>>>>>>>>>    sites
>>>>>>>>>>
>>>>>>>>>> In other words, over the years we've unleashed a bit of a
>>>>>>>>>> monster. Cleaning it up is a useful talk, but I wanted the integration to
>>>>>>>>>> be able to handle pathological cases like us!
>>>>>>>>>>
>>>>>>>>>> So I fed our network into the current graph generator, and used
>>>>>>>>>> graphviz to spit out a directed graph:
>>>>>>>>>> [image: image.png]
>>>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    - Green = the root site.
>>>>>>>>>>    - Red = a site
>>>>>>>>>>    - Blue = an access point
>>>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>>>
>>>>>>>>>> So the part in "common" is designed heavily to reduce repetition.
>>>>>>>>>> When it's done, you should be able to feed in sites, APs, clients, devices,
>>>>>>>>>> etc. in a pretty flexible manner. Given how much code is shared between the
>>>>>>>>>> UISP and Splynx integration code, I'm pretty sure both will be cut to a
>>>>>>>>>> tiny fraction of the total code. :-)
>>>>>>>>>>
>>>>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Robert Chacón
>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>> _______________________________________________
>>>>>>>>> LibreQoS mailing list
>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>> LibreQoS mailing list
>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Robert Chacón
>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>> Dev | LibreQoS.io
>>>>>
>>>>> _______________________________________________
>>>> LibreQoS mailing list
>>>> LibreQoS@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>
>>>
>>>
>>> --
>>> Robert Chacón
>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>> Dev | LibreQoS.io
>>>
>>> _______________________________________________
>> LibreQoS mailing list
>> LibreQoS@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/libreqos
>>
>
>
> --
> Robert Chacón
> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
> Dev | LibreQoS.io
>
> _______________________________________________
> LibreQoS mailing list
> LibreQoS@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/libreqos
>


-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

[-- Attachment #1.2: Type: text/html, Size: 40156 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-29 19:18                 ` Dave Taht
@ 2022-10-30  1:10                   ` Herbert Wolverson
  0 siblings, 0 replies; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-30  1:10 UTC (permalink / raw)
  Cc: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 22416 bytes --]

> You talking about the relevant rfc?

In this case, the "4 to 6" mapping refers to some integration code that was
already present - named "mikrotikFindIpv6.py". I probably should've made
that more
clear. It connects to Mikrotik routers, and performs a MAC address search
in their DHCPv6 tables - finding known MAC addresses and providing the
allocated IPv6 address-space. Looks like a handy tool, and a good
work-around for UISP (Ubiquiti's combined management and CRM tool) only
kind-of supporting IPv6. The database format supports v6 addresses, but it
doesn't consistently put any data in there; worse, it doesn't show it
on-screen when it has it!
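
To make the idea concrete, here's a minimal sketch of that lookup logic.
This is not the actual mikrotikFindIpv6.py code; getDhcpv6Bindings() is a
hypothetical helper you'd back with whichever RouterOS API client you
prefer, and I'm assuming the binding table exposes a MAC-like field:

def findIpv6ByMac(routers, knownMacs):
    """Return {mac: ipv6Prefix} for every known CPE MAC we can find."""
    found = {}
    for router in routers:
        # Hypothetical helper: fetch the DHCPv6 binding table from one
        # Mikrotik router, as a list of dicts.
        for binding in getDhcpv6Bindings(router):
            mac = binding.get("mac-address", "").upper()
            if mac in knownMacs:
                found[mac] = binding["address"]  # e.g. "2001:db8:1::/56"
    return found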

> Seems to be a need for some level of exclusions for device type, e.g. (at
least per your report), don't run ack-filter on a cambium path.

I agree with that longer-term. For now, I'm trying to get the existing
integrations up-to-speed and easy to work with. The whole "build on a good
foundation" thing. That's one thing I've learned the hard way over the
decades; it's a *lot* easier to shoot for the moon if you take the time to
come up with a good launch platform!

Longer-term, it's looking more and more like we'll need a more robust
discovery system. I've some ideas, but they are way too formative to be
useful yet. Some early thinking: there's a big disparity between the
back-ends WISPs (and ISPs in general) use to manage and monitor their
networks, and the systems that handle CRM (billing, ticketing, customer
interaction, etc.). Splynx and its ilk are great billing systems, but don't
really know a lot about your network arrangement - it wouldn't surprise me
if there are Splynx and VISP users who also have UISP (just the network
management mode) going as well. On the other
extreme, PowerCode tries to write directly to your Mikrotik routers and
wants to know everything right down to your underwear colour.

In my mind:
* Step 1 (we're nearly there!) is to build a good foundation for
representing an IPv4/IPv6 network that's really agnostic to all the crazy
things a WISP may be doing. It should automate all the tedious parts
(figuring out a tree from a soup of sites, access points, users -
rearranging the tree to have a "starting point", emitting the various
control files, etc.), be easy enough to use that someone could say "wow, I
need to support my management system" and be able to do so with a little
bit of hand-holding - encouraging participation.
* Step 2 would be to provide some great manual tools for the DIY crowd, and
some really good documentation to make their life easy.
* Step 3 is some kind of way to mix-and-match systems. Say you have Splynx
AND the management part of UISP. Wouldn't it be great if Splynx could
provide all of the "plan" data, and the topology come from UISP's
management side? (There's a rough sketch of the idea below.) It seems like
that's quite do-able with a little work. We may need to think about a
management GUI at this point, just to help hold hands a bit.
* Step 4 would be something Dan keeps asking about, ways to query hardware
that exists and build some topology around it. That would be great, and is
quite the undertaking (best tackled incrementally, and in a modular
fashion, IMHO).

This is still just the musings of a sleep-deprived brain. :-)
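
For the Step 3 idea, a rough, hypothetical sketch - assuming the
NetworkGraph/NetworkNode/NodeType API from integrationCommon, two imaginary
helpers (fetchUispTopology, fetchSplynxPlans) that don't exist yet, and
that the graph exposes its node list the way I've written it:

from integrationCommon import NetworkGraph, NetworkNode, NodeType

def buildMixedGraph():
    net = NetworkGraph()
    # UISP (management side) is the source of truth for topology.
    for site in fetchUispTopology():  # imaginary helper
        net.addRawNode(NetworkNode(
            site["id"], site["name"], site["parentId"],
            NodeType.site, site["download"], site["upload"]))
    # Splynx is the source of truth for customer plans: override the
    # bandwidth on any node it knows about.
    plans = fetchSplynxPlans()  # imaginary helper: id -> (down, up)
    for node in net.nodes:  # assumes the node list is exposed like this
        if node.id in plans:
            node.download, node.upload = plans[node.id]
    net.prepareTree()
    net.createNetworkJson()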

> Is there any particularly common set of radius servers in use?

It seems like when I poke deeply enough, most people are running FreeRADIUS
or something vendor-supplied (which is sometimes FreeRADIUS with a badge on
it). Then there are crazy people paying $10k for super high-end RADIUS
servers that aren't actually much better than the free ones. RADIUS is a
tough one, because LibreQoS isn't really well placed to directly utilize
it. Typically, RADIUS is basically a "yes or no" box, with options
attached. RADIUS queries happen on network entry (either as part of the
admissions process, part of the Ethernet security step, or from the DHCP
server) and the reply is basically "yes, you're admitted - these are your
options". The problem is, Libre doesn't necessarily see any of that - it's
inside the network. That's why we have API dependencies, even though Splynx
and VISP are basically really big billing systems that come bundled with a
RADIUS server. (Unfortunately, Mikrotik interprets the RADIUS replies to
make a simple queue on the router that made the request - you can script
that, but it gets messy fast).
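
As an illustration of what those replies carry, here's a hedged sketch of
turning Mikrotik's Rate-Limit reply attribute (e.g. "20M/100M") into plain
Mbps numbers. The attribute format is Mikrotik's; the parsing helper is
mine, not LibreQoS code, and the rx/tx ordering is from the router's point
of view - double-check it against your deployment:

def parseRateLimit(rateLimit):
    """Return (rx_mbps, tx_mbps) from e.g. '20M/100M' or '20M/100M 40M/200M'."""
    units = {"k": 0.001, "K": 0.001, "M": 1.0, "G": 1000.0}
    def toMbps(token):
        token = token.strip().split(" ")[0]  # drop Mikrotik burst fields
        if token[-1] in units:
            return float(token[:-1]) * units[token[-1]]
        return float(token) / 1_000_000  # bare numbers are bits/second
    rx, tx = rateLimit.split("/")[:2]
    return toMbps(rx), toMbps(tx)

assert parseRateLimit("20M/100M") == (20.0, 100.0)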

I'll answer the second email in a bit.



On Sat, Oct 29, 2022 at 2:18 PM Dave Taht <dave.taht@gmail.com> wrote:

>
>
> On Sat, Oct 29, 2022 at 8:57 AM Herbert Wolverson via LibreQoS <
> libreqos@lists.bufferbloat.net> wrote:
>
>> Alright, the UISP side of the common integrations is pretty much feature
>> complete. I'll update the tracking issue in a bit.
>>
>>    - Per your suggestion, devices with no IP addresses (v4 or v6) are
>>    not added.
>>
>> Every device that is ipv6-ready comes up with a link-local address
> derived from the mac like fe80::6f16:fa94:f32b:e2e
> Some actually will accept things like ssh to that address
> Not that this is necessarily relevant to this bit of code. Dr irrelevant I
> am today.
> (in the context of babel, at least, you can route ipv4 and ipv6 without
> either an ipv6 or ipv4 address, and hnetd configure)
>
> I am kind of curious as to what weird configuration protocols are in
> common use today
>
> Painfully common are "smart switches" that don't listen to dhcp by default
> AND come up on 192.168.1.1
> ubnt comes up on 192.168.1.20 by default
> a lot of cpe comes up on 192.168.1.100 (like cable and starlink)
> I've seen stuff that uses ancient ieee protocols
> bootp and tftp are still things
>
> I've always kind of wanted a daemon on every device that would probe all
> possible ip addresses with a ttl of 2, to find rogue
> devices etc.
>
>>
>>    - Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>>    side of things, so it can be used in other integrations also. I don't have
>>    a setup on which to test it, but if I'm reading the code right then the
>>    unit test is testing it appropriately.
>>
>>
> You talking about the relevant rfc?
>
>
>>
>>    - excludeSites is supported as a common API feature. If a node is
>>    added with a name that matches an excluded site, it won't be added. The
>>    tree builder is smart enough to replace invalid "parentId" references with
>>    the shaper root, so if you have other tree items that rely on this site -
>>    they will be added to the tree. Was that the intent? (It looks pretty
>>    useful; we have a child site down the tree with a HUGE amount of load, and
>>    bumping it to the top-level with excludeSites would probably help our load
>>    balancing quite a bit)
>>       - If the intent was to exclude the site and everything underneath
>>       it, I'd have to rework things a bit. Let me know; it wasn't quite clear.
>>       - exceptionCPEs is also supported as a common API feature. It
>>    simply overrides the "parentId'' of incoming nodes with the new parent.
>>    Another potentially useful feature; if I got excludeSites the wrong away
>>    around, I'd add a "my_big_site":"" entry to push it to the top.
>>
>>
> Seems to be a need for some level of exclusions for device type, e.g. (at
> least per your report), don't run ack-filter on a cambium path.
>
>
>>
>>    - UISP integration now supports a "flat" topology option (set via
>>    uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>>    to include this entry.
>>
>> I'll look and see how much of the Spylnx code I can shorten with the new
>> API; I don't have a Spylnx setup to test against, making that tricky. I
>> *think* the new API should shorten things a lot. I think routers act as
>> node parents, with clients underneath them? Otherwise, a "flat" setup
>> should be a little shorter (the CSV code can be replaced with a call to the
>> graph builder). Most of the Spylnx (and VISP) users I've talked to layer
>> MPLS+VPLS to pretend to have a big, flat network and then connect via a
>> RADIUS call in the DHCP server;
>>
>
> Is there any particularly common set of radius servers in use?
>
>
>> I've always assumed that's because those systems prefer the telecom model
>> of "pretend everything is equal" to trying to model topology.*
>>
>
> Except the billing. Always the billing. Our tuesday golden plate special
> is you can download all the pr0n from our special partner netblix for 24
> hours a week! 9.95!
>
>
>>
>> I need to clean things up a bit (there's still a bit of duplicated code,
>> and I believe in the DRY principle - don't repeat yourself; Dave Thomas -
>> my boss at PragProg - coined the term in The Pragmatic Programmer, and I
>> feel obliged to use it everywhere!), and do a quick rebase (I accidentally
>> parented the branch off of a branch instead of main) - but I think I can
>> have this as a PR for you on Monday.
>>
>> * - The first big wireless network I setup used a Motorola WiMAX setup.
>> They *required* that every single AP share two VLANs (management and
>> bearer) with every other AP - all the way to the core. It kinda worked once
>> they remembered client isolation was a thing in a patch... Then again,
>> their installation instructions included connecting two ports of a router
>> together with a jumper cable, because their localhost implementation didn't
>> quite work. :-|
>>
>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
>> robert.chacon@jackrabbitwireless.com> wrote:
>>
>>> Awesome work. It succeeded in building the topology and creating
>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>>> I notice that in ShapedDevices.csv it does add CPE radios (which in our
>>> case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
>>> being empty lists [].
>>> This is not necessarily bad, but it may lead to empty leaf classes being
>>> created on LibreQoS.py runs. Not a huge deal, it just makes the minor class
>>> counter increment toward the 32k limit faster.
>>> Do you think perhaps we should check:
>>> *if (len(IPv4) == 0) and (len(IPv6) == 0):*
>>> *   # Skip adding this entry to ShapedDevices.csv*
>>> Or something similar around line 329 of integrationCommon.py?
>>> Open to your suggestions there.
>>>
>>>
>>>
>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>>> libreqos@lists.bufferbloat.net> wrote:
>>>
>>>> One more update, and I'm going to sleep until "pick up daughter" time.
>>>> :-)
>>>>
>>>> The tree at
>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>> can now build a network.json, ShapedDevices.csv, and
>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>>>> previous importer - other than using data links to build the hierarchy and
>>>> letting (requiring, currently) you specify the root node. It's handling our
>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>>>> recommend just running integrationUISP.py and checking the output rather
>>>> than throwing it into production), I'd appreciate any feedback.
>>>>
>>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>>> exceptionCPE and site exclusion.
>>>>
>>>> If you want the pretty graphics, you need to "pip install graphviz" and
>>>> "sudo apt install graphviz". It *should* detect that these aren't present
>>>> and not try to draw pictures, otherwise.
>>>>
>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>>
>>>>> Wow. This is very nicely done. Awesome work!
>>>>>
>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>
>>>>>> The integration is coming along nicely. Some progress updates:
>>>>>>
>>>>>>    - You can specify a variable in ispConfig.py named "uispSite".
>>>>>>    This sets where in the topology you want the tree to start. This has two
>>>>>>    purposes:
>>>>>>       - It's hard to be psychic and know for sure where the shaper
>>>>>>       is in the network.
>>>>>>       - You could run multiple shapers at different egress points,
>>>>>>       with failover - and rebuild the entire topology from the point of view of a
>>>>>>       network node.
>>>>>>    - "Child node with children" are now automatically converted into
>>>>>>    a "(Generated Site) name" site, and their children rearranged. This:
>>>>>>       - Allows you to set the "site" bandwidth independently of the
>>>>>>       client site bandwidth.
>>>>>>       - Makes for easier trees, because we're inserting the site
>>>>>>       that really should be there.
>>>>>>    - Network.json generation (not the shaped devices file yet) is
>>>>>>    automatically generated from a tree, once PrepareTree() and
>>>>>>    createNetworkJson() are called.
>>>>>>       - There's a unit test that generates the network.example.json
>>>>>>       file and compares it with the original to ensure that they match.
>>>>>>    - Unit test coverage hits every function in the graph system, now.
>>>>>>
>>>>>> I'm liking this setup. With the non-vendor-specific logic contained
>>>>>> inside the NetworkGraph type, the actual UISP code to generate the example
>>>>>> tree is down to 65
>>>>>> lines of code, including comments. That'll grow a bit as I re-insert
>>>>>> some automatic speed limit determination, AP/Site speed overrides (
>>>>>> i.e. the integrationUISPbandwidths.csv file). Still pretty clean.
>>>>>>
>>>>>> Creating the network.example.json file only requires:
>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>>         import json
>>>>>>         net = NetworkGraph()
>>>>>>         net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.
>>>>>> site, 1000, 1000))
>>>>>>         net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.
>>>>>> site, 500, 500))
>>>>>>         net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType
>>>>>> .ap, 500, 500))
>>>>>>         net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1",
>>>>>> NodeType.site, 500, 500))
>>>>>>         net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3",
>>>>>> NodeType.site, 200, 200))
>>>>>>         net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.
>>>>>> ap, 120, 120))
>>>>>>         net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5",
>>>>>> NodeType.site, 60, 60))
>>>>>>         net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6",
>>>>>> NodeType.ap, 30, 30))
>>>>>>         net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2",
>>>>>> NodeType.site, 200, 200))
>>>>>>         net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.
>>>>>> ap, 100, 100))
>>>>>>         net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType
>>>>>> .ap, 150, 150))
>>>>>>         net.prepareTree()
>>>>>>         net.createNetworkJson()
>>>>>>
>>>>>> (The id and name fields are duplicated right now, I'm using readable
>>>>>> names to keep me sane. The third string is the parent, and the last two
>>>>>> numbers are bandwidth limits)
>>>>>> The nice, readable format being:
>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="", type=
>>>>>> NodeType.site, download=1000, upload=1000)
>>>>>>
>>>>>> That in turns gives you the example network:
>>>>>> [image: image.png]
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>>> herberticus@gmail.com> wrote:
>>>>>>
>>>>>>> Dave: I love those Gource animations! Game development is my other
>>>>>>> hobby, I could easily get lost for weeks tweaking the shaders to make the
>>>>>>> glow "just right". :-)
>>>>>>>
>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to look
>>>>>>> in that direction yet. I'm trying to build a "common grammar" to make it
>>>>>>> easier to express network layout from integrations; that would be another
>>>>>>> form/layer of integration and a lot easier to work with once there's a
>>>>>>> solid foundation. Preseem does some of this (admittedly over-eagerly;
>>>>>>> nothing needs to query SNMP that often!), and the SNMP route is quite
>>>>>>> remarkably convoluted. Their support turned on a few "extra" modules to
>>>>>>> deal with things like PMP450 clients that change MAC when you put them in
>>>>>>> bridge mode vs NAT mode (and report the bridge mode CPE in some places
>>>>>>> either way), Elevate CPEs that almost but not quite make sense. Robert's
>>>>>>> code has the beginnings of some of this, scanning Mikrotik routers for IPv6
>>>>>>> allocations by MAC (this is also the hardest part for me to test, since I
>>>>>>> don't have any v6 to test, currently).
>>>>>>>
>>>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>>>
>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote:
>>>>>>>
>>>>>>>> we're pretty similar in that we've made UISP a mess.  Multiple
>>>>>>>> paths to a pop.  multiple pops on the network.  failover between pops.
>>>>>>>> Lots of 'other' devices. handing out /29 etc to customers.
>>>>>>>>
>>>>>>>> Some sort of discovery would be nice.  Ideally though, pulling
>>>>>>>> something from SNMP or router APIs etc to build the paths, but having a
>>>>>>>> 'network elements' list with each of the links described.  ie, backhaul 12
>>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that
>>>>>>>> from discovery.
>>>>>>>>
>>>>>>>> I've also thought about doing routine trace routes or watching TTLs
>>>>>>>> or something like that to get some indication that topology has changed and
>>>>>>>> then do another discovery and potential tree rebuild.
>>>>>>>>
>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>
>>>>>>>>> This is awesome! Way to go here. Thank you for contributing this.
>>>>>>>>> Being able to map out these complex integrations will help ISPs a
>>>>>>>>> ton, and I really like that it is sharing common features between the
>>>>>>>>> Splynx and UISP integrations.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Robert
>>>>>>>>>
>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS <
>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>
>>>>>>>>>> So I've been doing some work on getting UISP integration (and
>>>>>>>>>> integrations in general) to work a bit more smoothly.
>>>>>>>>>>
>>>>>>>>>> I started by implementing a graph structure that mirrors both the
>>>>>>>>>> networks and sites system. It's not done yet, but the basics are coming
>>>>>>>>>> together nicely. You can see my progress so far at:
>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>>>
>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the
>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular
>>>>>>>>>> portion of the tree. We have:
>>>>>>>>>>
>>>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>>>    - Bridge mode devices using Option 82 to always allocate the
>>>>>>>>>>    same IP, with a "service IP" entry
>>>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>>>    - Sites that go to client sites, which go to other client
>>>>>>>>>>    sites
>>>>>>>>>>
>>>>>>>>>> In other words, over the years we've unleashed a bit of a
>>>>>>>>>> monster. Cleaning it up is a useful talk, but I wanted the integration to
>>>>>>>>>> be able to handle pathological cases like us!
>>>>>>>>>>
>>>>>>>>>> So I fed our network into the current graph generator, and used
>>>>>>>>>> graphviz to spit out a directed graph:
>>>>>>>>>> [image: image.png]
>>>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    - Green = the root site.
>>>>>>>>>>    - Red = a site
>>>>>>>>>>    - Blue = an access point
>>>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>>>
>>>>>>>>>> So the part in "common" is designed heavily to reduce repetition.
>>>>>>>>>> When it's done, you should be able to feed in sites, APs, clients, devices,
>>>>>>>>>> etc. in a pretty flexible manner. Given how much code is shared between the
>>>>>>>>>> UISP and Splynx integration code, I'm pretty sure both will be cut to a
>>>>>>>>>> tiny fraction of the total code. :-)
>>>>>>>>>>
>>>>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Robert Chacón
>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>> _______________________________________________
>>>>>>>>> LibreQoS mailing list
>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>> LibreQoS mailing list
>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Robert Chacón
>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>> Dev | LibreQoS.io
>>>>>
>>>>> _______________________________________________
>>>> LibreQoS mailing list
>>>> LibreQoS@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>
>>>
>>>
>>> --
>>> Robert Chacón
>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>> Dev | LibreQoS.io
>>>
>>> _______________________________________________
>> LibreQoS mailing list
>> LibreQoS@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/libreqos
>>
>
>
> --
> This song goes out to all the folk that thought Stadia would work:
>
> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
> Dave Täht CEO, TekLibre, LLC
>

[-- Attachment #1.2: Type: text/html, Size: 41564 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-29 19:43                   ` Dave Taht
@ 2022-10-30  1:45                     ` Herbert Wolverson
  2022-10-31  0:15                       ` Dave Taht
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-30  1:45 UTC (permalink / raw)
  To: Dave Taht; +Cc: Robert Chacón, libreqos


[-- Attachment #1.1: Type: text/plain, Size: 24316 bytes --]

> For starters, let me also offer praise for this work which is so ahead of
schedule!

Thank you. I'm enjoying a short period while I wait for my editor to finish
up with a couple of chapters of my next book (working title More Hands-on
Rust; it's intermediate to advanced Rust, taught through the lens of game
development).

I think at least initially, the primary focus is on what WISPs are used to
(and ask for): a fat shaper box that sits between a WISP and their Internet
connection(s). Usually in the topology: (router connected to upstream) <-->
(LibreQoS) <--> (core site router, connected to the WISP's network as a
whole). That's a simplification; there's usually a bypass (in case LibreQoS
dies, is being updated, etc.), sometimes multiple connections that need
shaping, etc. That's how Preseem (and the others) tend to insert themselves
- shape everything on the way out.

I think there's a lot to be said for the possibility of LibreQoS at towers
that need it the most, also. That might require a bit of MPLS support (I
can do the xdp-cpumap-tc part; I'm not sure what the classifier does if it
receives a packet with the TCP/UDP header stuck behind some MPLS headers?),
but has the potential to really clean things up. Especially for a really
busy tower site. (On a similar note, WISPs with multiple Internet
connections at different sites would benefit from LibreQoS on each of
them).
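
To make the classifier question concrete, here's a scapy sketch (assuming
scapy is installed; this is not LibreQoS code) of the header stack an XDP
program would have to walk when MPLS is in play:

from scapy.all import Ether, IP, TCP
from scapy.contrib.mpls import MPLS  # binds MPLS to ethertype 0x8847

# Two stacked labels; s=1 marks the bottom of the MPLS stack.
pkt = Ether() / MPLS(label=100, s=0) / MPLS(label=200, s=1) \
      / IP(dst="192.0.2.1") / TCP(dport=443)
pkt.show()  # Ether -> MPLS -> MPLS -> IP -> TCP

# Each MPLS shim header is 4 bytes, so the IP header is no longer at a
# fixed offset - a classifier has to pop labels until it sees s=1 before
# it can find the TCP/UDP ports.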

Generally, the QoS box doesn't really care what you are running in the way
of a router. We run mostly Mikrotik (with a bit of FreeBSD, and a tiny bit
of Cisco in the mix too!), I know of people who love Juniper, use Cisco,
etc. Since we're shaping in the "router sandwich" (which can be one router
with a bit of care), we don't necessarily need to worry too much about
their innards.

With that said, some future SNMP support (please, not polling everything
all the time... that's a monitoring program's job!) is probably hard to
avoid. At least that's relatively vendor-agnostic (even if Ubiquiti seem to
be trying to cease supporting it, ugh).
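
If/when that happens, something like this pysnmp sketch (a minimal sketch,
assuming the pysnmp package is available; the OID is the standard
sysName.0) is roughly the level of polling I'd want - one cheap query, on a
long interval, not a firehose:

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

def pollSysName(host, community="public"):
    """One SNMPv2c GET of sysName.0; returns None on any failure."""
    errInd, errStatus, _, varBinds = next(getCmd(
        SnmpEngine(),
        CommunityData(community, mpModel=1),  # SNMP v2c
        UdpTransportTarget((host, 161), timeout=2, retries=1),
        ContextData(),
        ObjectType(ObjectIdentity("1.3.6.1.2.1.1.5.0"))))  # sysName.0
    if errInd or errStatus:
        return None
    return str(varBinds[0][1])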

I could see some support for outputting rules for routers, especially if
the goal is to get Cake managing bufferbloat in many places down the line.
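
As a toy example of what "outputting rules" might look like - the RouterOS
syntax here is from memory (I believe max-limit is upload/download;
double-check before trusting it), and the function is hypothetical, not a
planned API:

def mikrotikSimpleQueue(name, ip, downloadMbps, uploadMbps):
    """Emit one RouterOS simple-queue line for a single shaped customer."""
    return (f"/queue simple add name={name} target={ip}/32 "
            f"max-limit={uploadMbps}M/{downloadMbps}M")

print(mikrotikSimpleQueue("cust-42", "100.64.1.23", 100, 20))
# /queue simple add name=cust-42 target=100.64.1.23/32 max-limit=20M/100M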

Incidentally, using my latest build of cpumap-pping (and no separate pping
running, eating a CPU), my average network latency has dropped to 24 ms at
peak time (from 40 ms) - while pulling 1.8 gbps of real customer traffic
through the system. :-)




On Sat, Oct 29, 2022 at 2:43 PM Dave Taht <dave.taht@gmail.com> wrote:

> For starters, let me also offer praise for this work which is so ahead of
> schedule!
>
> I am (perhaps cluelessly) thinking about bigger pictures, and still stuck
> in my mindset involving distributing the packet processing,
> and representing the network topology, plans and compensating for the
> physics.
>
> So you have a major tower, a separate libreqos instance goes there. Or
> libreqos outputs rules compatible with mikrotik or vyatta or whatever is
> there. Or are you basically thinking one device rules them all and off the
> only interface, shapes them?
>
> Or:
>
> You have another pop with a separate connection to the internet that you
> inherited from a buyout, or you wanted physical redundancy for your BGP
> AS's internet access, maybe just between DCs in the same town or...
> cloud -> pop -> customers      customers <- pop <- cloud
>           \                               /
>            \----- leased fiber or wireless -----/
>
>
> I'm also a little puzzled as to whats the ISP->internet link? juniper?
> cisco? mikrotik, and what role and services that is expected to have.
>
>
>
> On Sat, Oct 29, 2022 at 12:06 PM Robert Chacón via LibreQoS <
> libreqos@lists.bufferbloat.net> wrote:
>
>> > Per your suggestion, devices with no IP addresses (v4 or v6) are not
>> added.
>> > Mikrotik "4 to 6" mapping is implemented. I put it in the "common" side
>> of things, so it can be used in other integrations also. I don't have a
>> setup on which to test it, but if I'm reading the code right then the unit
>> test is testing it appropriately.
>>
>> Fantastic.
>>
>> > excludeSites is supported as a common API feature. If a node is added
>> with a name that matches an excluded site, it won't be added. The tree
>> builder is smart enough to replace invalid "parentId" references with the
>> shaper root, so if you have other tree items that rely on this site - they
>> will be added to the tree. Was that the intent? (It looks pretty useful; we
>> have a child site down the tree with a HUGE amount of load, and bumping it
>> to the top-level with excludeSites would probably help our load balancing
>> quite a bit)
>>
>> Very cool approach, I like it! Yeah we have some cases where we need to
>> balance out high load child nodes across CPUs so that's perfect.
>> Originally I thought of it to just exclude sites that don't fit into the
>> shaped topology but this approach is more useful.
>> Should we rename excludeSites to moveSitesToTop or something similar?
>> That functionality of distributing across top level nodes / cpu cores seems
>> more important anyway.
>>
>> >exceptionCPEs is also supported as a common API feature. It simply
>> overrides the "parentId'' of incoming nodes with the new parent. Another
>> potentially useful feature; if I got excludeSites the wrong away around,
>> I'd add a "my_big_site":"" entry to push it to the top.
>>
>> Awesome
>>
>> > UISP integration now supports a "flat" topology option (set via
>> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py to
>> include this entry.
>>
>> Nice!
>>
>> > I'll look and see how much of the Spylnx code I can shorten with the
>> new API; I don't have a Spylnx setup to test against, making that tricky.
>>
>> I'll send you the Splynx login they gave us.
>>
>> > I *think* the new API should shorten things a lot. I think routers act
>> as node parents, with clients underneath them? Otherwise, a "flat" setup
>> should be a little shorter (the CSV code can be replaced with a call to the
>> graph builder). Most of the Spylnx (and VISP) users I've talked to layer
>> MPLS+VPLS to pretend to have a big, flat network and then connect via a
>> RADIUS call in the DHCP server; I've always assumed that's because those
>> systems prefer the telecom model of "pretend everything is equal" to trying
>> to model topology.*
>>
>> Yeah splynx doesn't seem to natively support any topology mapping or even
>> AP designation, one person I spoke to said they track corresponding APs in
>> radius anyway. So for now the flat model may be fine.
>>
>> > I need to clean things up a bit (there's still a bit of duplicated
>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>> accidentally parented the branch off of a branch instead of main) - but I
>> think I can have this as a PR for you on Monday.
>>
>> This is really great work and will make future integrations much cleaner
>> and nicer to work with. Thank you!
>>
>>
>> On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS <
>> libreqos@lists.bufferbloat.net> wrote:
>>
>>> Alright, the UISP side of the common integrations is pretty much feature
>>> complete. I'll update the tracking issue in a bit.
>>>
>>>    - Per your suggestion, devices with no IP addresses (v4 or v6) are
>>>    not added.
>>>    - Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>>>    side of things, so it can be used in other integrations also. I don't have
>>>    a setup on which to test it, but if I'm reading the code right then the
>>>    unit test is testing it appropriately.
>>>    - excludeSites is supported as a common API feature. If a node is
>>>    added with a name that matches an excluded site, it won't be added. The
>>>    tree builder is smart enough to replace invalid "parentId" references with
>>>    the shaper root, so if you have other tree items that rely on this site -
>>>    they will be added to the tree. Was that the intent? (It looks pretty
>>>    useful; we have a child site down the tree with a HUGE amount of load, and
>>>    bumping it to the top-level with excludeSites would probably help our load
>>>    balancing quite a bit)
>>>       - If the intent was to exclude the site and everything underneath
>>>       it, I'd have to rework things a bit. Let me know; it wasn't quite clear.
>>>       - exceptionCPEs is also supported as a common API feature. It
>>>    simply overrides the "parentId'' of incoming nodes with the new parent.
>>>    Another potentially useful feature; if I got excludeSites the wrong away
>>>    around, I'd add a "my_big_site":"" entry to push it to the top.
>>>    - UISP integration now supports a "flat" topology option (set via
>>>    uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>>>    to include this entry.
>>>
>>> I'll look and see how much of the Spylnx code I can shorten with the new
>>> API; I don't have a Spylnx setup to test against, making that tricky. I
>>> *think* the new API should shorten things a lot. I think routers act as
>>> node parents, with clients underneath them? Otherwise, a "flat" setup
>>> should be a little shorter (the CSV code can be replaced with a call to the
>>> graph builder). Most of the Spylnx (and VISP) users I've talked to layer
>>> MPLS+VPLS to pretend to have a big, flat network and then connect via a
>>> RADIUS call in the DHCP server; I've always assumed that's because those
>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>> to model topology.*
>>>
>>> I need to clean things up a bit (there's still a bit of duplicated code,
>>> and I believe in the DRY principle - don't repeat yourself; Dave Thomas -
>>> my boss at PragProg - coined the term in The Pragmatic Programmer, and I
>>> feel obliged to use it everywhere!), and do a quick rebase (I accidentally
>>> parented the branch off of a branch instead of main) - but I think I can
>>> have this as a PR for you on Monday.
>>>
>>> * - The first big wireless network I setup used a Motorola WiMAX setup.
>>> They *required* that every single AP share two VLANs (management and
>>> bearer) with every other AP - all the way to the core. It kinda worked once
>>> they remembered client isolation was a thing in a patch... Then again,
>>> their installation instructions included connecting two ports of a router
>>> together with a jumper cable, because their localhost implementation didn't
>>> quite work. :-|
>>>
>>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>
>>>> Awesome work. It succeeded in building the topology and creating
>>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>>>> I notice that in ShapedDevices.csv it does add CPE radios (which in our
>>>> case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
>>>> being empty lists [].
>>>> This is not necessarily bad, but it may lead to empty leaf classes
>>>> being created on LibreQoS.py runs. Not a huge deal, it just makes the minor
>>>> class counter increment toward the 32k limit faster.
>>>> Do you think perhaps we should check:
>>>> *if (len(IPv4) == 0) and (len(IPv6) == 0):*
>>>> *   # Skip adding this entry to ShapedDevices.csv*
>>>> Or something similar around line 329 of integrationCommon.py?
>>>> Open to your suggestions there.
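
For reference, a minimal sketch of the proposed check (field names are
assumptions; the real device structure in integrationCommon.py may differ):

    def should_shape(device):
        # Skip adding this entry to ShapedDevices.csv when both lists are empty
        return len(device.get("ipv4", [])) > 0 or len(device.get("ipv6", [])) > 0

    devices = [{"name": "cpe-1", "ipv4": [], "ipv6": []}]
    shaped = [d for d in devices if should_shape(d)]  # -> []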
>>>>
>>>>
>>>>
>>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>
>>>>> One more update, and I'm going to sleep until "pick up daughter" time.
>>>>> :-)
>>>>>
>>>>> The tree at
>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>> can now build a network.json, ShapedDevices.csv, and
>>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>>>>> previous importer - other than using data links to build the hierarchy and
>>>>> letting (requiring, currently) you specify the root node. It's handling our
>>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>>>>> recommend just running integrationUISP.py and checking the output rather
>>>>> than throwing it into production), I'd appreciate any feedback.
>>>>>
>>>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>>>> exceptionCPE and site exclusion.
>>>>>
>>>>> If you want the pretty graphics, you need to "pip install graphviz"
>>>>> and "sudo apt install graphviz". It *should* detect that these aren't
>>>>> present and not try to draw pictures, otherwise.
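
A sketch of how that detection could look (assumed approach; the branch may
do it differently):

    try:
        import graphviz
        hasGraphviz = True
    except ImportError:
        hasGraphviz = False  # python package missing: skip drawing entirely

    if hasGraphviz:
        dot = graphviz.Digraph()  # rendering also needs `apt install graphviz`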
>>>>>
>>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>>>
>>>>>> Wow. This is very nicely done. Awesome work!
>>>>>>
>>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>
>>>>>>> The integration is coming along nicely. Some progress updates:
>>>>>>>
>>>>>>>    - You can specify a variable in ispConfig.py named "uispSite"
>>>>>>>    (see the example after this list). This sets where in the topology
>>>>>>>    you want the tree to start. This has two purposes:
>>>>>>>       - It's hard to be psychic and know for sure where the shaper
>>>>>>>       is in the network.
>>>>>>>       - You could run multiple shapers at different egress points,
>>>>>>>       with failover - and rebuild the entire topology from the point of view of a
>>>>>>>       network node.
>>>>>>>    - "Child node with children" are now automatically converted
>>>>>>>    into a "(Generated Site) name" site, and their children rearranged. This:
>>>>>>>       - Allows you to set the "site" bandwidth independently of the
>>>>>>>       client site bandwidth.
>>>>>>>       - Makes for easier trees, because we're inserting the site
>>>>>>>       that really should be there.
>>>>>>>    - Network.json generation (not the shaped devices file yet) is
>>>>>>>    automatically generated from a tree, once PrepareTree() and
>>>>>>>    createNetworkJson() are called.
>>>>>>>       - There's a unit test that generates the network.example.json
>>>>>>>       file and compares it with the original to ensure that they match.
>>>>>>>    - Unit test coverage hits every function in the graph system,
>>>>>>>    now.
>>>>>>>
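The ispConfig.py entries referenced above would look something like this
(values are examples; only uispSite and uispStrategy = "flat" are named in
this thread):

    uispSite = "Site_1"    # root the shaped tree at this UISP site
    uispStrategy = "flat"  # or leave the default to build the full tree
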
>>>>>>> I'm liking this setup. With the non-vendor-specific logic contained
>>>>>>> inside the NetworkGraph type, the actual UISP code to generate the
>>>>>>> example tree is down to 65 lines of code, including comments. That'll
>>>>>>> grow a bit as I re-insert some automatic speed limit determination and
>>>>>>> AP/Site speed overrides (i.e. the integrationUISPbandwidths.csv file).
>>>>>>> Still pretty clean.
>>>>>>>
>>>>>>> Creating the network.example.json file only requires:
>>>>>>>
>>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>>> import json
>>>>>>>
>>>>>>> net = NetworkGraph()
>>>>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
>>>>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
>>>>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
>>>>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
>>>>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
>>>>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
>>>>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
>>>>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
>>>>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
>>>>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
>>>>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
>>>>>>> net.prepareTree()
>>>>>>> net.createNetworkJson()
>>>>>>>
>>>>>>> (The id and name fields are duplicated right now; I'm using readable
>>>>>>> names to keep me sane. The third string is the parent, and the last two
>>>>>>> numbers are bandwidth limits.)
>>>>>>> The nice, readable format being:
>>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="",
>>>>>>>             type=NodeType.site, download=1000, upload=1000)
>>>>>>>
>>>>>>> That in turn gives you the example network:
>>>>>>> [image: image.png]
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>>>> herberticus@gmail.com> wrote:
>>>>>>>
>>>>>>>> Dave: I love those Gource animations! Game development is my other
>>>>>>>> hobby, I could easily get lost for weeks tweaking the shaders to make the
>>>>>>>> glow "just right". :-)
>>>>>>>>
>>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to look
>>>>>>>> in that direction yet. I'm trying to build a "common grammar" to make it
>>>>>>>> easier to express network layout from integrations; that would be another
>>>>>>>> form/layer of integration and a lot easier to work with once there's a
>>>>>>>> solid foundation. Preseem does some of this (admittedly over-eagerly;
>>>>>>>> nothing needs to query SNMP that often!), and the SNMP route is quite
>>>>>>>> remarkably convoluted. Their support turned on a few "extra" modules to
>>>>>>>> deal with things like PMP450 clients that change MAC when you put them in
>>>>>>>> bridge mode vs NAT mode (and report the bridge-mode CPE in some places
>>>>>>>> either way), and Elevate CPEs that almost, but not quite, make sense. Robert's
>>>>>>>> code has the beginnings of some of this, scanning Mikrotik routers for IPv6
>>>>>>>> allocations by MAC (this is also the hardest part for me to test, since I
>>>>>>>> don't have any v6 to test, currently).
>>>>>>>>
>>>>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>>>>
>>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> we're pretty similar in that we've made UISP a mess.  Multiple
>>>>>>>>> paths to a pop.  multiple pops on the network.  failover between pops.
>>>>>>>>> Lots of 'other' devices. handing out /29 etc to customers.
>>>>>>>>>
>>>>>>>>> Some sort of discovery would be nice.  Ideally though, pulling
>>>>>>>>> something from SNMP or router APIs etc to build the paths, but having a
>>>>>>>>> 'network elements' list with each of the links described.  ie, backhaul 12
>>>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that
>>>>>>>>> from discovery.
>>>>>>>>>
>>>>>>>>> I've also thought about doing routine traceroutes or watching
>>>>>>>>> TTLs or something like that to get some indication that topology has
>>>>>>>>> changed, and then do another discovery and potential tree rebuild.
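
One way dan's change-detection idea could look, as a rough sketch (the
tooling and parsing here are assumptions; it shells out to the system
traceroute binary):

    import hashlib
    import subprocess

    def path_fingerprint(host):
        # Hash the hop list; a changed hash hints that the topology changed.
        out = subprocess.run(["traceroute", "-n", "-q", "1", host],
                             capture_output=True, text=True).stdout
        hops = [l.split()[1] for l in out.splitlines()[1:] if len(l.split()) > 1]
        return hashlib.sha256(",".join(hops).encode()).hexdigest()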
>>>>>>>>>
>>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>
>>>>>>>>>> This is awesome! Way to go here. Thank you for contributing this.
>>>>>>>>>> Being able to map out these complex integrations will help ISPs a
>>>>>>>>>> ton, and I really like that it is sharing common features between the
>>>>>>>>>> Splynx and UISP integrations.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Robert
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS <
>>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>>
>>>>>>>>>>> So I've been doing some work on getting UISP integration (and
>>>>>>>>>>> integrations in general) to work a bit more smoothly.
>>>>>>>>>>>
>>>>>>>>>>> I started by implementing a graph structure that mirrors both
>>>>>>>>>>> the networks and sites system. It's not done yet, but the basics are coming
>>>>>>>>>>> together nicely. You can see my progress so far at:
>>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>>>>
>>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the
>>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular
>>>>>>>>>>> portion of the tree. We have:
>>>>>>>>>>>
>>>>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>>>>    - Bridge mode devices using Option 82 to always allocate the
>>>>>>>>>>>    same IP, with a "service IP" entry
>>>>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>>>>    - Sites that go to client sites, which go to other client
>>>>>>>>>>>    sites
>>>>>>>>>>>
>>>>>>>>>>> In other words, over the years we've unleashed a bit of a
>>>>>>>>>>> monster. Cleaning it up is a useful talk, but I wanted the integration to
>>>>>>>>>>> be able to handle pathological cases like us!
>>>>>>>>>>>
>>>>>>>>>>> So I fed our network into the current graph generator, and used
>>>>>>>>>>> graphviz to spit out a directed graph:
>>>>>>>>>>> [image: image.png]
>>>>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - Green = the root site.
>>>>>>>>>>>    - Red = a site
>>>>>>>>>>>    - Blue = an access point
>>>>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>>>>
>>>>>>>>>>> So the part in "common" is designed heavily to reduce
>>>>>>>>>>> repetition. When it's done, you should be able to feed in sites, APs,
>>>>>>>>>>> clients, devices, etc. in a pretty flexible manner. Given how much code is
>>>>>>>>>>> shared between the UISP and Splynx integration code, I'm pretty sure both
>>>>>>>>>>> will be cut to a tiny fraction of the total code. :-)
>>>>>>>>>>>
>>>>>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Robert Chacón
>>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>> LibreQoS mailing list
>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Robert Chacón
>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>> Dev | LibreQoS.io
>>>>>>
>>>>>> _______________________________________________
>>>>> LibreQoS mailing list
>>>>> LibreQoS@lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>
>>>>
>>>>
>>>> --
>>>> Robert Chacón
>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>> Dev | LibreQoS.io
>>>>
>>>> _______________________________________________
>>> LibreQoS mailing list
>>> LibreQoS@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>
>>
>>
>> --
>> Robert Chacón
>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>> Dev | LibreQoS.io
>>
>> _______________________________________________
>> LibreQoS mailing list
>> LibreQoS@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/libreqos
>>
>
>
> --
> This song goes out to all the folk that thought Stadia would work:
>
> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
> Dave Täht CEO, TekLibre, LLC
>

[-- Attachment #1.2: Type: text/html, Size: 43273 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-30  1:45                     ` Herbert Wolverson
@ 2022-10-31  0:15                       ` Dave Taht
  2022-10-31  1:15                         ` Robert Chacón
  2022-10-31  1:26                         ` Herbert Wolverson
  0 siblings, 2 replies; 33+ messages in thread
From: Dave Taht @ 2022-10-31  0:15 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: Robert Chacón, libreqos


[-- Attachment #1.1: Type: text/plain, Size: 27688 bytes --]

On Sat, Oct 29, 2022 at 6:45 PM Herbert Wolverson <herberticus@gmail.com>
wrote:

> > For starters, let me also offer praise for this work which is so ahead
> of schedule!
>
> Thank you. I'm enjoying a short period while I wait for my editor to
> finish up with a couple of chapters of my next book (working title More
> Hands-on Rust; it's intermediate to advanced Rust, taught through the lens
> of game development).
>

cool. I'm 32 years into my PhD thesis.


>
> I think at least initially, the primary focus is on what WISPs are used to
> (and ask for): a fat shaper box that sits between a WISP and their Internet
> connection(s). Usually in the topology: (router connected to upstream) <-->
> (LibreQoS) <--> (core site router, connected to the WISP's network as a
> whole). That's a simplification; there's usually a bypass (in case LibreQoS
> dies, is being updated, etc.), sometimes multiple connections that need
> shaping, etc. That's how Preseem (and the others) tend to insert themselves
> - shape everything on the way out.
>

Presently LibreQoS appears to be inserting about 200us of delay into the
path, for the sparsest packets. Every box on the path adds delay (five
such hops would add a full millisecond), though cut-through switches are
common. Don't talk to me about network slicing and disaggregated this or
that in the 3GPP world, tho... ugh.

I guess, for every "box" (or virtual machine) on the path I have Amdahl's
law stuck in my head.

This is in part why the K8s crowd makes me a little crazy.


>
> I think there's a lot to be said for the possibility of LibreQoS at towers
> that need it the most, also. That might require a bit of MPLS support (I
> can do the xdp-cpumap-tc part; I'm not sure what the classifier does if it
> receives a packet with the TCP/UDP header stuck behind some MPLS headers?),
> but has the potential to really clean things up. Especially for a really
> busy tower site. (On a similar note, WISPs with multiple Internet
> connections at different sites would benefit from LibreQoS on each of
> them).
>
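For the curious, the classifier would have to walk the MPLS label stack to
find the inner IP header. A pure-Python illustration of the layout (the
real hot path would be eBPF C; this sketch is only illustrative):

    import struct

    def skip_mpls(buf, offset=0, max_labels=8):
        # Each MPLS stack entry is 4 bytes: label(20) | TC(3) | S(1) | TTL(8)
        for _ in range(max_labels):
            (entry,) = struct.unpack_from("!I", buf, offset)
            offset += 4
            if entry & 0x100:  # S (bottom-of-stack) bit: inner header follows
                return offset
        raise ValueError("MPLS label stack too deep")
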
> Generally, the QoS box doesn't really care what you are running in the way
> of a router.
>

It is certainly simpler to have a transparent middlebox for this stuff,
initially, and it would take a great leap of faith, for many, to just plug
in an lqos box as the main box... but cumulus did succeed at a lot of
that... they open-sourced a bfd daemon... numerous other tools...

https://www.nvidia.com/en-us/networking/ethernet-switching/cumulus-linux/


> We run mostly Mikrotik (with a bit of FreeBSD, and a tiny bit of Cisco in
> the mix too!), I know of people who love Juniper, use Cisco, etc. Since
> we're shaping in the "router sandwich" (which can be one router with a bit
> of care), we don't necessarily need to worry too much about their innards.
>
>
An ISP in an SDN shaping whitebox that does all that juniper/cisco stuff, or
a pair, perhaps using a fiber optic splitter for failover:

http://www.comlaninc.com/products/fiber-optic-products/id/23/cl-fos




> With that said, some future SNMP support (please, not polling everything
> all the time... that's a monitoring program's job!) is probably hard to
> avoid. At least that's relatively vendor agnostic (even if Ubiquiti seem to
> be trying to cease supporting  it, ugh)
>
>
Building on this initial core strength - sampling RTT - would be a
differentiator.

Examples:

   - RTT per AP
   - RTT P1 per AP (what's the effective minimum?)
   - RTT P99 per AP (what's the worst case?)
   - RTT variance, P1 to P99, per internet IP (worst 20 performers), or per
     AS number or /24

(variance is a very important concept)
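
As a sketch, nearest-rank percentiles over per-AP RTT samples are cheap to
compute (the data below is made up):

    def percentile(samples, q):
        s = sorted(samples)
        return s[min(len(s) - 1, int(q / 100 * len(s)))]

    ap_rtts = [12.1, 13.4, 11.9, 48.0, 12.6]  # ms, samples for one AP
    p1, p99 = percentile(ap_rtts, 1), percentile(ap_rtts, 99)
    print(f"P1={p1}ms P99={p99}ms spread={p99 - p1}ms")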





> I could see some support for outputting rules for routers, especially if
> the goal is to get Cake managing buffer-bloat in many places down the line.
>
> Incidentally, using my latest build of cpumap-pping (and no separate pping
> running, eating a CPU) my average network latency has dropped to 24ms at
> peak time (from 40ms). At peak time, while pulling 1.8 gbps of real
> customer traffic through the system. :-)
>

OK, this is something that "triggers" my inner pedant. Forgive me in
advance?

"average" of "what"?

Changing the monitoring tool shouldn't have affected the average latency,
unless how it is calculated is different, or the sample
population (more likely) has changed. If you are tracking now far more
short flows, the observed latency will decline, but the
higher latencies you were observing in the first place are still there.

Also... between where and where? Across the network? From the customer to
their typical set of server IP addresses?
On wireless? vs fiber? (Transiting a fiber network to your pop's edge
should take under 2ms). Wifi hops at the end of the link are
probably adding the most delay...

If you consider 24ms "good" - however you calculate it - going for ever
less, via whatever means these analyses suggest, is useful. But there are
some things I don't think make as much sense as they used to - a Netflix
cache hit rate must be so low nowadays that it costs you just as much to
fetch it from upstream as to host a box...




>
>
>
>
> On Sat, Oct 29, 2022 at 2:43 PM Dave Taht <dave.taht@gmail.com> wrote:
>
>> For starters, let me also offer praise for this work which is so ahead of
>> schedule!
>>
>> I am (perhaps cluelessly) thinking about bigger pictures, and still stuck
>> in my mindset involving distributing the packet processing,
>> and representing the network topology, plans and compensating for the
>> physics.
>>
>> So you have a major tower, a separate libreqos instance goes there. Or
>> libreqos outputs rules compatible with mikrotik or vyatta or whatever is
>> there. Or are you basically thinking one device rules them all and off the
>> only interface, shapes them?
>>
>> Or:
>>
>> You have another pop with a separate connection to the internet that you
>> inherited from a buyout, or you wanted physical redundancy for your BGP
>> AS's internet access, maybe just between DCs in the same town or...
>>      ____________________________________________
>>     /                                            \
>> cloud -> pop -> customers - customers <- pop <- cloud
>>                  \  ----- leased fiber or wireless   /
>>
>>
>> I'm also a little puzzled as to what's the ISP->internet link? juniper?
>> cisco? mikrotik, and what role and services that is expected to have.
>>
>>
>>
>> On Sat, Oct 29, 2022 at 12:06 PM Robert Chacón via LibreQoS <
>> libreqos@lists.bufferbloat.net> wrote:
>>
>>> > Per your suggestion, devices with no IP addresses (v4 or v6) are not
>>> added.
>>> > Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>>> side of things, so it can be used in other integrations also. I don't have
>>> a setup on which to test it, but if I'm reading the code right then the
>>> unit test is testing it appropriately.
>>>
>>> Fantastic.
>>>
>>> > excludeSites is supported as a common API feature. If a node is added
>>> with a name that matches an excluded site, it won't be added. The tree
>>> builder is smart enough to replace invalid "parentId" references with the
>>> shaper root, so if you have other tree items that rely on this site - they
>>> will be added to the tree. Was that the intent? (It looks pretty useful; we
>>> have a child site down the tree with a HUGE amount of load, and bumping it
>>> to the top-level with excludeSites would probably help our load balancing
>>> quite a bit)
>>>
>>> Very cool approach, I like it! Yeah we have some cases where we need to
>>> balance out high load child nodes across CPUs so that's perfect.
>>> Originally I thought of it to just exclude sites that don't fit into the
>>> shaped topology but this approach is more useful.
>>> Should we rename excludeSites to moveSitesToTop or something similar?
>>> That functionality of distributing across top level nodes / cpu cores seems
>>> more important anyway.
>>>
>>> >exceptionCPEs is also supported as a common API feature. It simply
>>> overrides the "parentId'' of incoming nodes with the new parent. Another
>>> potentially useful feature; if I got excludeSites the wrong away around,
>>> I'd add a "my_big_site":"" entry to push it to the top.
>>>
>>> Awesome
>>>
>>> > UISP integration now supports a "flat" topology option (set via
>>> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py to
>>> include this entry.
>>>
>>> Nice!
>>>
>>> > I'll look and see how much of the Splynx code I can shorten with the
>>> new API; I don't have a Splynx setup to test against, making that tricky.
>>>
>>> I'll send you the Splynx login they gave us.
>>>
>>> > I *think* the new API should shorten things a lot. I think routers
>>> act as node parents, with clients underneath them? Otherwise, a "flat"
>>> setup should be a little shorter (the CSV code can be replaced with a call
>>> to the graph builder). Most of the Splynx (and VISP) users I've talked to
>>> layer MPLS+VPLS to pretend to have a big, flat network and then connect via
>>> a RADIUS call in the DHCP server; I've always assumed that's because those
>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>> to model topology.*
>>>
>>> Yeah, Splynx doesn't seem to natively support any topology mapping or
>>> even AP designation; one person I spoke to said they track corresponding
>>> APs in RADIUS anyway. So for now the flat model may be fine.
>>>
>>> > I need to clean things up a bit (there's still a bit of duplicated
>>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>>> accidentally parented the branch off of a branch instead of main) - but I
>>> think I can have this as a PR for you on Monday.
>>>
>>> This is really great work and will make future integrations much cleaner
>>> and nicer to work with. Thank you!
>>>
>>>
>>> On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS <
>>> libreqos@lists.bufferbloat.net> wrote:
>>>
>>>> Alright, the UISP side of the common integrations is pretty much
>>>> feature complete. I'll update the tracking issue in a bit.
>>>>
>>>>    - Per your suggestion, devices with no IP addresses (v4 or v6) are
>>>>    not added.
>>>>    - Mikrotik "4 to 6" mapping is implemented. I put it in the
>>>>    "common" side of things, so it can be used in other integrations also. I
>>>>    don't have a setup on which to test it, but if I'm reading the code right
>>>>    then the unit test is testing it appropriately.
>>>>    - excludeSites is supported as a common API feature. If a node is
>>>>    added with a name that matches an excluded site, it won't be added. The
>>>>    tree builder is smart enough to replace invalid "parentId" references with
>>>>    the shaper root, so if you have other tree items that rely on this site -
>>>>    they will be added to the tree. Was that the intent? (It looks pretty
>>>>    useful; we have a child site down the tree with a HUGE amount of load, and
>>>>    bumping it to the top-level with excludeSites would probably help our load
>>>>    balancing quite a bit)
>>>>       - If the intent was to exclude the site and everything
>>>>       underneath it, I'd have to rework things a bit. Let me know; it wasn't
>>>>       quite clear.
>>>>    - exceptionCPEs is also supported as a common API feature. It
>>>>    simply overrides the "parentId" of incoming nodes with the new parent.
>>>>    Another potentially useful feature; if I got excludeSites the wrong way
>>>>    around, I'd add a "my_big_site":"" entry to push it to the top.
>>>>    - UISP integration now supports a "flat" topology option (set via
>>>>    uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>>>>    to include this entry.
>>>>
>>>> I'll look and see how much of the Splynx code I can shorten with the
>>>> new API; I don't have a Splynx setup to test against, making that tricky. I
>>>> *think* the new API should shorten things a lot. I think routers act
>>>> as node parents, with clients underneath them? Otherwise, a "flat" setup
>>>> should be a little shorter (the CSV code can be replaced with a call to the
>>>> graph builder). Most of the Splynx (and VISP) users I've talked to layer
>>>> MPLS+VPLS to pretend to have a big, flat network and then connect via a
>>>> RADIUS call in the DHCP server; I've always assumed that's because those
>>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>>> to model topology.*
>>>>
>>>> I need to clean things up a bit (there's still a bit of duplicated
>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>>>> accidentally parented the branch off of a branch instead of main) - but I
>>>> think I can have this as a PR for you on Monday.
>>>>
>>>> * - The first big wireless network I set up used a Motorola WiMAX system.
>>>> They *required* that every single AP share two VLANs (management and
>>>> bearer) with every other AP - all the way to the core. It kinda worked once
>>>> they remembered client isolation was a thing in a patch... Then again,
>>>> their installation instructions included connecting two ports of a router
>>>> together with a jumper cable, because their localhost implementation didn't
>>>> quite work. :-|
>>>>
>>>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
>>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>>
>>>>> Awesome work. It succeeded in building the topology and creating
>>>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>>>>> I notice that in ShapedDevices.csv it does add CPE radios (which in
>>>>> our case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
>>>>> being empty lists [].
>>>>> This is not necessarily bad, but it may lead to empty leaf classes
>>>>> being created on LibreQoS.py runs. Not a huge deal, it just makes the minor
>>>>> class counter increment toward the 32k limit faster.
>>>>> Do you think perhaps we should check:
>>>>> *if (len(IPv4) == 0) and (len(IPv6) == 0):*
>>>>> *   # Skip adding this entry to ShapedDevices.csv*
>>>>> Or something similar around line 329 of integrationCommon.py?
>>>>> Open to your suggestions there.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>
>>>>>> One more update, and I'm going to sleep until "pick up daughter"
>>>>>> time. :-)
>>>>>>
>>>>>> The tree at
>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>> can now build a network.json, ShapedDevices.csv, and
>>>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>>>>>> previous importer - other than using data links to build the hierarchy and
>>>>>> letting (requiring, currently) you specify the root node. It's handling our
>>>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>>>>>> recommend just running integrationUISP.py and checking the output rather
>>>>>> than throwing it into production), I'd appreciate any feedback.
>>>>>>
>>>>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>>>>> exceptionCPE and site exclusion.
>>>>>>
>>>>>> If you want the pretty graphics, you need to "pip install graphviz"
>>>>>> and "sudo apt install graphviz". It *should* detect that these aren't
>>>>>> present and not try to draw pictures, otherwise.
>>>>>>
>>>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>>>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>>>>
>>>>>>> Wow. This is very nicely done. Awesome work!
>>>>>>>
>>>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>
>>>>>>>> The integration is coming along nicely. Some progress updates:
>>>>>>>>
>>>>>>>>    - You can specify a variable in ispConfig.py named "uispSite".
>>>>>>>>    This sets where in the topology you want the tree to start. This has two
>>>>>>>>    purposes:
>>>>>>>>       - It's hard to be psychic and know for sure where the shaper
>>>>>>>>       is in the network.
>>>>>>>>       - You could run multiple shapers at different egress points,
>>>>>>>>       with failover - and rebuild the entire topology from the point of view of a
>>>>>>>>       network node.
>>>>>>>>    - "Child node with children" are now automatically converted
>>>>>>>>    into a "(Generated Site) name" site, and their children rearranged. This:
>>>>>>>>       - Allows you to set the "site" bandwidth independently of
>>>>>>>>       the client site bandwidth.
>>>>>>>>       - Makes for easier trees, because we're inserting the site
>>>>>>>>       that really should be there.
>>>>>>>>    - Network.json generation (not the shaped devices file yet) is
>>>>>>>>    automatically generated from a tree, once PrepareTree() and
>>>>>>>>    createNetworkJson() are called.
>>>>>>>>       - There's a unit test that generates the
>>>>>>>>       network.example.json file and compares it with the original to ensure that
>>>>>>>>       they match.
>>>>>>>>    - Unit test coverage hits every function in the graph system,
>>>>>>>>    now.
>>>>>>>>
>>>>>>>> I'm liking this setup. With the non-vendor-specific logic contained
>>>>>>>> inside the NetworkGraph type, the actual UISP code to generate the example
>>>>>>>> tree is down to 65 lines of code, including comments. That'll grow a
>>>>>>>> bit as I re-insert some automatic speed limit determination and
>>>>>>>> AP/Site speed overrides (i.e. the integrationUISPbandwidths.csv
>>>>>>>> file). Still pretty clean.
>>>>>>>>
>>>>>>>> Creating the network.example.json file only requires:
>>>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>>>> import json
>>>>>>>>
>>>>>>>> net = NetworkGraph()
>>>>>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
>>>>>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
>>>>>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
>>>>>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
>>>>>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
>>>>>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
>>>>>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
>>>>>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
>>>>>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
>>>>>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
>>>>>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
>>>>>>>> net.prepareTree()
>>>>>>>> net.createNetworkJson()
>>>>>>>>
>>>>>>>> (The id and name fields are duplicated right now; I'm using
>>>>>>>> readable names to keep me sane. The third string is the parent, and the
>>>>>>>> last two numbers are bandwidth limits.)
>>>>>>>> The nice, readable format being:
>>>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="",
>>>>>>>>             type=NodeType.site, download=1000, upload=1000)
>>>>>>>>
>>>>>>>> That in turn gives you the example network:
>>>>>>>> [image: image.png]
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>>>>> herberticus@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Dave: I love those Gource animations! Game development is my other
>>>>>>>>> hobby, I could easily get lost for weeks tweaking the shaders to make the
>>>>>>>>> glow "just right". :-)
>>>>>>>>>
>>>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to
>>>>>>>>> look in that direction yet. I'm trying to build a "common grammar" to make
>>>>>>>>> it easier to express network layout from integrations; that would be
>>>>>>>>> another form/layer of integration and a lot easier to work with once
>>>>>>>>> there's a solid foundation. Preseem does some of this (admittedly
>>>>>>>>> over-eagerly; nothing needs to query SNMP that often!), and the SNMP route
>>>>>>>>> is quite remarkably convoluted. Their support turned on a few "extra"
>>>>>>>>> modules to deal with things like PMP450 clients that change MAC when you
>>>>>>>>> put them in bridge mode vs NAT mode (and report the bridge-mode CPE in some
>>>>>>>>> places either way), and Elevate CPEs that almost, but not quite, make sense.
>>>>>>>>> Robert's code has the beginnings of some of this, scanning Mikrotik routers
>>>>>>>>> for IPv6 allocations by MAC (this is also the hardest part for me to test,
>>>>>>>>> since I don't have any v6 to test, currently).
>>>>>>>>>
>>>>>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>>>>>
>>>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> we're pretty similar in that we've made UISP a mess.  Multiple
>>>>>>>>>> paths to a pop.  multiple pops on the network.  failover between pops.
>>>>>>>>>> Lots of 'other' devices. handing out /29 etc to customers.
>>>>>>>>>>
>>>>>>>>>> Some sort of discovery would be nice.  Ideally though, pulling
>>>>>>>>>> something from SNMP or router APIs etc to build the paths, but having a
>>>>>>>>>> 'network elements' list with each of the links described.  ie, backhaul 12
>>>>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that
>>>>>>>>>> from discovery.
>>>>>>>>>>
>>>>>>>>>> I've also thought about doing routine traceroutes or watching
>>>>>>>>>> TTLs or something like that to get some indication that topology has
>>>>>>>>>> changed, and then do another discovery and potential tree rebuild.
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>>
>>>>>>>>>>> This is awesome! Way to go here. Thank you for contributing this.
>>>>>>>>>>> Being able to map out these complex integrations will help ISPs
>>>>>>>>>>> a ton, and I really like that it is sharing common features between the
>>>>>>>>>>> Splynx and UISP integrations.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Robert
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS <
>>>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> So I've been doing some work on getting UISP integration (and
>>>>>>>>>>>> integrations in general) to work a bit more smoothly.
>>>>>>>>>>>>
>>>>>>>>>>>> I started by implementing a graph structure that mirrors both
>>>>>>>>>>>> the networks and sites system. It's not done yet, but the basics are coming
>>>>>>>>>>>> together nicely. You can see my progress so far at:
>>>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>>>>>
>>>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the
>>>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular
>>>>>>>>>>>> portion of the tree. We have:
>>>>>>>>>>>>
>>>>>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>>>>>    - Bridge mode devices using Option 82 to always allocate
>>>>>>>>>>>>    the same IP, with a "service IP" entry
>>>>>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>>>>>    - Sites that go to client sites, which go to other client
>>>>>>>>>>>>    sites
>>>>>>>>>>>>
>>>>>>>>>>>> In other words, over the years we've unleashed a bit of a
>>>>>>>>>>>> monster. Cleaning it up is a useful talk, but I wanted the integration to
>>>>>>>>>>>> be able to handle pathological cases like us!
>>>>>>>>>>>>
>>>>>>>>>>>> So I fed our network into the current graph generator, and used
>>>>>>>>>>>> graphviz to spit out a directed graph:
>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - Green = the root site.
>>>>>>>>>>>>    - Red = a site
>>>>>>>>>>>>    - Blue = an access point
>>>>>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>>>>>
>>>>>>>>>>>> So the part in "common" is designed heavily to reduce
>>>>>>>>>>>> repetition. When it's done, you should be able to feed in sites, APs,
>>>>>>>>>>>> clients, devices, etc. in a pretty flexible manner. Given how much code is
>>>>>>>>>>>> shared between the UISP and Splynx integration code, I'm pretty sure both
>>>>>>>>>>>> will be cut to a tiny fraction of the total code. :-)
>>>>>>>>>>>>
>>>>>>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Robert Chacón
>>>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>> LibreQoS mailing list
>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Robert Chacón
>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>> Dev | LibreQoS.io
>>>>>>>
>>>>>>> _______________________________________________
>>>>>> LibreQoS mailing list
>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Robert Chacón
>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>> Dev | LibreQoS.io
>>>>>
>>>>> _______________________________________________
>>>> LibreQoS mailing list
>>>> LibreQoS@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>
>>>
>>>
>>> --
>>> Robert Chacón
>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>> Dev | LibreQoS.io
>>>
>>> _______________________________________________
>>> LibreQoS mailing list
>>> LibreQoS@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>
>>
>>
>> --
>> This song goes out to all the folk that thought Stadia would work:
>>
>> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>> Dave Täht CEO, TekLibre, LLC
>>
>

-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

[-- Attachment #1.2: Type: text/html, Size: 48489 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31  0:15                       ` Dave Taht
@ 2022-10-31  1:15                         ` Robert Chacón
  2022-10-31  1:26                         ` Herbert Wolverson
  1 sibling, 0 replies; 33+ messages in thread
From: Robert Chacón @ 2022-10-31  1:15 UTC (permalink / raw)
  To: Dave Taht; +Cc: Herbert Wolverson, libreqos


[-- Attachment #1.1: Type: text/plain, Size: 29972 bytes --]

> RTT per AP
> RTT P1 per AP (what's the effective minimum)
> RTT P99 (what's the worst case?)
> RTT variance P1 to P99 per internet IP (worst 20 performers) or AS number
> or /24

Working on it. RTT per AP is actually already there in v1.3 - graphed in
InfluxDB.
We just need to keep testing cpumap-pping with more real-world traffic.
When I tried it today it worked great for 99% of users, and it was very
resource efficient.
There's a small issue when clients have plans past 500Mbps
<https://github.com/thebracket/cpumap-pping/issues/2> but that's admittedly
an edge case for most small ISPs.
For now we could implement a toggle in ispConfig.py between xdp-cpumap-tc
and cpumap-pping until that's figured out (see the sketch below).
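
A sketch of what that toggle might look like in ispConfig.py (the variable
name is an assumption; nothing like it is merged yet):

    # Which eBPF/XDP program LibreQoS loads on the TC hook:
    xdpVariant = "xdp-cpumap-tc"   # current, safe default
    # xdpVariant = "cpumap-pping"  # adds RTT sampling; see the issue above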

> "average" of "what"?

Probably average RTT from end-user households to CDNs and major IXs?
I think he means that when he ran Pollere's PPing instead of his much
faster XDP-based cpumap-pping, PPing was hammering the CPU so hard that it
negatively affected end-user RTT.
The original PPing chokes after 1Gbps or so and uses way too much CPU,
which likely hindered the HTB and CAKE instances running on the same
cores.

> on wireless? vs fiber? (Transiting a fiber network to your pop's edge
> should take under 2ms). Wifi hops at the end of the link are
> probably adding the most delay...

I think they're mostly wireless. 24ms ain't bad at all from the end user to
the IX!

On Sun, Oct 30, 2022 at 6:15 PM Dave Taht <dave.taht@gmail.com> wrote:

>
>
> On Sat, Oct 29, 2022 at 6:45 PM Herbert Wolverson <herberticus@gmail.com>
> wrote:
>
>> > For starters, let me also offer praise for this work which is so ahead
>> of schedule!
>>
>> Thank you. I'm enjoying a short period while I wait for my editor to
>> finish up with a couple of chapters of my next book (working title More
>> Hands-on Rust; it's intermediate to advanced Rust, taught through the lens
>> of game development).
>>
>
> cool. I'm 32 years into my PhD thesis.
>
>
>>
>> I think at least initially, the primary focus is on what WISPs are used
>> to (and ask for): a fat shaper box that sits between a WISP and their
>> Internet connection(s). Usually in the topology: (router connected to
>> upstream) <--> (LibreQoS) <--> (core site router, connected to the WISP's
>> network as a whole). That's a simplification; there's usually a bypass (in
>> case LibreQoS dies, is being updated, etc.), sometimes multiple connections
>> that need shaping, etc. That's how Preseem (and the others) tend to insert
>> themselves - shape everything on the way out.
>>
>
> Presently LibreQoS appears to be inserting about 200us of delay into the
> path, for the sparsest packets. Every box on the path adds delay (five
> such hops would add a full millisecond), though cut-through switches are
> common. Don't talk to me about network slicing and disaggregated this or
> that in the 3GPP world, tho... ugh.
>
> I guess, for every "box" (or virtual machine) on the path I have Amdahl's
> law stuck in my head.
>
> This is in part why the K8s crowd makes me a little crazy.
>
>
>>
>> I think there's a lot to be said for the possibility of LibreQoS at
>> towers that need it the most, also. That might require a bit of MPLS
>> support (I can do the xdp-cpumap-tc part; I'm not sure what the classifier
>> does if it receives a packet with the TCP/UDP header stuck behind some MPLS
>> headers?), but has the potential to really clean things up. Especially for
>> a really busy tower site. (On a similar note, WISPs with multiple Internet
>> connections at different sites would benefit from LibreQoS on each of
>> them).
>>
>> Generally, the QoS box doesn't really care what you are running in the
>> way of a router.
>>
>
> It is certainly simpler to have a transparent middlebox for this stuff,
> initially, and it would take a great leap of faith,
> for many, to just plug in an lqos box as the main box... but cumulus did
> succeed at a lot of that... they open-sourced a bfd daemon... numerous
> other tools...
>
> https://www.nvidia.com/en-us/networking/ethernet-switching/cumulus-linux/
>
>
>> We run mostly Mikrotik (with a bit of FreeBSD, and a tiny bit of Cisco in
>> the mix too!), I know of people who love Juniper, use Cisco, etc. Since
>> we're shaping in the "router sandwich" (which can be one router with a bit
>> of care), we don't necessarily need to worry too much about their innards.
>>
>>
> An ISP in an SDN shaping whitebox that does all that juniper/cisco stuff,
> or a pair, perhaps using a fiber optic splitter for failover:
>
> http://www.comlaninc.com/products/fiber-optic-products/id/23/cl-fos
>
>
>
>
>> With that said, some future SNMP support (please, not polling everything
>> all the time... that's a monitoring program's job!) is probably hard to
>> avoid. At least that's relatively vendor agnostic (even if Ubiquiti seem to
>> be trying to cease supporting  it, ugh)
>>
>>
> Building on this initial core strength - sampling RTT - would be a
> differentiator.
>
> Examples:
>
> RTT per AP
> RTT P1 per AP (what's the effective minimum)
> RTT P99 (what's the worst case?)
> RTT variance  P1 to P99 per internet IP (worst 20 performers) or AS number
> or /24
>
> (variance is a very important concept)
>
>
>
>
>
>> I could see some support for outputting rules for routers, especially if
>> the goal is to get Cake managing buffer-bloat in many places down the line.
>>
>> Incidentally, using my latest build of cpumap-pping (and no separate
>> pping running, eating a CPU) my average network latency has dropped to 24ms
>> at peak time (from 40ms). At peak time, while pulling 1.8 gbps of real
>> customer traffic through the system. :-)
>>
>
> OK, this is something that "triggers" my inner pedant. Forgive me in
> advance?
>
> "average" of "what"?
>
> Changing the monitoring tool shouldn't have affected the average latency,
> unless how it is calculated is different, or the sample
> population (more likely) has changed. If you are tracking now far more
> short flows, the observed latency will decline, but the
> higher latencies you were observing in the first place are still there.
>
> Also... between where and where? Across the network? To the customer to
> their typical set of IP addresses of their servers?
> on wireless? vs fiber? ( Transiting a fiber network to your pop's edge
> should take under 2ms). Wifi hops at the end of the link are
> probably adding the most delay...
>
> If you consider 24ms "good" - however you calculate it - going for ever
> less, via whatever means these analyses suggest, is useful. But there are
> some things I don't think make as much sense as they used to - a Netflix
> cache hit rate must be so low nowadays that it costs you just as much to
> fetch it from upstream as to host a box...
>
>
>
>
>>
>>
>>
>>
>> On Sat, Oct 29, 2022 at 2:43 PM Dave Taht <dave.taht@gmail.com> wrote:
>>
>>> For starters, let me also offer praise for this work which is so ahead
>>> of schedule!
>>>
>>> I am (perhaps cluelessly) thinking about bigger pictures, and still
>>> stuck in my mindset involving distributing the packet processing,
>>> and representing the network topology, plans and compensating for the
>>> physics.
>>>
>>> So you have a major tower, a separate libreqos instance goes there. Or
>>> libreqos outputs rules compatible with mikrotik or vyatta or whatever is
>>> there. Or are you basically thinking one device rules them all and off the
>>> only interface, shapes them?
>>>
>>> Or:
>>>
>>> You have another pop with a separate connection to the internet that you
>>> inherited from a buyout, or you wanted physical redundancy for your BGP
>>> AS's internet access, maybe just between DCs in the same town or...
>>>      ____________________________________________
>>>     /                                            \
>>> cloud -> pop -> customers - customers <- pop <- cloud
>>>                  \  ----- leased fiber or wireless   /
>>>
>>>
>>> I'm also a little puzzled as to what's the ISP->internet link? juniper?
>>> cisco? mikrotik, and what role and services that is expected to have.
>>>
>>>
>>>
>>> On Sat, Oct 29, 2022 at 12:06 PM Robert Chacón via LibreQoS <
>>> libreqos@lists.bufferbloat.net> wrote:
>>>
>>>> > Per your suggestion, devices with no IP addresses (v4 or v6) are not
>>>> added.
>>>> > Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>>>> side of things, so it can be used in other integrations also. I don't have
>>>> a setup on which to test it, but if I'm reading the code right then the
>>>> unit test is testing it appropriately.
>>>>
>>>> Fantastic.
>>>>
>>>> > excludeSites is supported as a common API feature. If a node is added
>>>> with a name that matches an excluded site, it won't be added. The tree
>>>> builder is smart enough to replace invalid "parentId" references with the
>>>> shaper root, so if you have other tree items that rely on this site - they
>>>> will be added to the tree. Was that the intent? (It looks pretty useful; we
>>>> have a child site down the tree with a HUGE amount of load, and bumping it
>>>> to the top-level with excludeSites would probably help our load balancing
>>>> quite a bit)
>>>>
>>>> Very cool approach, I like it! Yeah we have some cases where we need to
>>>> balance out high load child nodes across CPUs so that's perfect.
>>>> Originally I thought of it to just exclude sites that don't fit into
>>>> the shaped topology but this approach is more useful.
>>>> Should we rename excludeSites to moveSitesToTop or something similar?
>>>> That functionality of distributing across top level nodes / cpu cores seems
>>>> more important anyway.
>>>>
>>>> >exceptionCPEs is also supported as a common API feature. It simply
>>>> overrides the "parentId'' of incoming nodes with the new parent. Another
>>>> potentially useful feature; if I got excludeSites the wrong away around,
>>>> I'd add a "my_big_site":"" entry to push it to the top.
>>>>
>>>> Awesome
>>>>
>>>> > UISP integration now supports a "flat" topology option (set via
>>>> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>>>> to include this entry.
>>>>
>>>> Nice!
>>>>
>>>> > I'll look and see how much of the Splynx code I can shorten with the
>>>> new API; I don't have a Splynx setup to test against, making that tricky.
>>>>
>>>> I'll send you the Splynx login they gave us.
>>>>
>>>> > I *think* the new API should shorten things a lot. I think routers
>>>> act as node parents, with clients underneath them? Otherwise, a "flat"
>>>> setup should be a little shorter (the CSV code can be replaced with a call
>>>> to the graph builder). Most of the Splynx (and VISP) users I've talked to
>>>> layer MPLS+VPLS to pretend to have a big, flat network and then connect via
>>>> a RADIUS call in the DHCP server; I've always assumed that's because those
>>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>>> to model topology.*
>>>>
>>>> Yeah, Splynx doesn't seem to natively support any topology mapping or
>>>> even AP designation; one person I spoke to said they track corresponding
>>>> APs in RADIUS anyway. So for now the flat model may be fine.
>>>>
>>>> > I need to clean things up a bit (there's still a bit of duplicated
>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>>>> accidentally parented the branch off of a branch instead of main) - but I
>>>> think I can have this as a PR for you on Monday.
>>>>
>>>> This is really great work and will make future integrations much
>>>> cleaner and nicer to work with. Thank you!
>>>>
>>>>
>>>> On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS <
>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>
>>>>> Alright, the UISP side of the common integrations is pretty much
>>>>> feature complete. I'll update the tracking issue in a bit.
>>>>>
>>>>>    - Per your suggestion, devices with no IP addresses (v4 or v6) are
>>>>>    not added.
>>>>>    - Mikrotik "4 to 6" mapping is implemented. I put it in the
>>>>>    "common" side of things, so it can be used in other integrations also. I
>>>>>    don't have a setup on which to test it, but if I'm reading the code right
>>>>>    then the unit test is testing it appropriately.
>>>>>    - excludeSites is supported as a common API feature. If a node is
>>>>>    added with a name that matches an excluded site, it won't be added. The
>>>>>    tree builder is smart enough to replace invalid "parentId" references with
>>>>>    the shaper root, so if you have other tree items that rely on this site -
>>>>>    they will be added to the tree. Was that the intent? (It looks pretty
>>>>>    useful; we have a child site down the tree with a HUGE amount of load, and
>>>>>    bumping it to the top-level with excludeSites would probably help our load
>>>>>    balancing quite a bit)
>>>>>       - If the intent was to exclude the site and everything
>>>>>       underneath it, I'd have to rework things a bit. Let me know; it wasn't
>>>>>       quite clear.
>>>>>    - exceptionCPEs is also supported as a common API feature. It
>>>>>    simply overrides the "parentId" of incoming nodes with the new parent.
>>>>>    Another potentially useful feature; if I got excludeSites the wrong way
>>>>>    around, I'd add a "my_big_site":"" entry to push it to the top.
>>>>>    - UISP integration now supports a "flat" topology option (set via
>>>>>    uispStrategy = "flat" in ispConfig). I expanded
>>>>>    ispConfig.example.py to include this entry.
>>>>>
>>>>> I'll look and see how much of the Splynx code I can shorten with the
>>>>> new API; I don't have a Splynx setup to test against, making that tricky. I
>>>>> *think* the new API should shorten things a lot. I think routers act
>>>>> as node parents, with clients underneath them? Otherwise, a "flat" setup
>>>>> should be a little shorter (the CSV code can be replaced with a call to the
>>>>> graph builder). Most of the Splynx (and VISP) users I've talked to layer
>>>>> MPLS+VPLS to pretend to have a big, flat network and then connect via a
>>>>> RADIUS call in the DHCP server; I've always assumed that's because those
>>>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>>>> to model topology.*
>>>>>
>>>>> I need to clean things up a bit (there's still a bit of duplicated
>>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>>>>> accidentally parented the branch off of a branch instead of main) - but I
>>>>> think I can have this as a PR for you on Monday.
>>>>>
>>>>> * - The first big wireless network I set up used a Motorola WiMAX
>>>>> system. They *required* that every single AP share two VLANs
>>>>> (management and bearer) with every other AP - all the way to the core. It
>>>>> kinda worked once they remembered client isolation was a thing in a
>>>>> patch... Then again, their installation instructions included connecting
>>>>> two ports of a router together with a jumper cable, because their localhost
>>>>> implementation didn't quite work. :-|
>>>>>
>>>>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
>>>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>>>
>>>>>> Awesome work. It succeeded in building the topology and creating
>>>>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>>>>>> I notice that in ShapedDevices.csv it does add CPE radios (which in
>>>>>> our case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
>>>>>> being empty lists [].
>>>>>> This is not necessarily bad, but it may lead to empty leaf classes
>>>>>> being created on LibreQoS.py runs. Not a huge deal, it just makes the minor
>>>>>> class counter increment toward the 32k limit faster.
>>>>>> Do you think perhaps we should check:
>>>>>> *if (len(IPv4) == 0) and (len(IPv6) == 0):*
>>>>>> *   # Skip adding this entry to ShapedDevices.csv*
>>>>>> Or something similar around line 329 of integrationCommon.py?
>>>>>> Open to your suggestions there.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>
>>>>>>> One more update, and I'm going to sleep until "pick up daughter"
>>>>>>> time. :-)
>>>>>>>
>>>>>>> The tree at
>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>> can now build a network.json, ShapedDevices.csv, and
>>>>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>>>>>>> previous importer - other than using data links to build the hierarchy and
>>>>>>> letting (requiring, currently) you specify the root node. It's handling our
>>>>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>>>>>>> recommend just running integrationUISP.py and checking the output rather
>>>>>>> than throwing it into production), I'd appreciate any feedback.
>>>>>>>
>>>>>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>>>>>> exceptionCPE and site exclusion.
>>>>>>>
>>>>>>> If you want the pretty graphics, you need to "pip install graphviz"
>>>>>>> and "sudo apt install graphviz". It *should* detect that these aren't
>>>>>>> present and not try to draw pictures, otherwise.
>>>>>>>
>>>>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>>>>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>>>>>
>>>>>>>> Wow. This is very nicely done. Awesome work!
>>>>>>>>
>>>>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>
>>>>>>>>> The integration is coming along nicely. Some progress updates:
>>>>>>>>>
>>>>>>>>>    - You can specify a variable in ispConfig.py named "uispSite".
>>>>>>>>>    This sets where in the topology you want the tree to start. This has two
>>>>>>>>>    purposes:
>>>>>>>>>       - It's hard to be psychic and know for sure where the
>>>>>>>>>       shaper is in the network.
>>>>>>>>>       - You could run multiple shapers at different egress
>>>>>>>>>       points, with failover - and rebuild the entire topology from the point of
>>>>>>>>>       view of a network node.
>>>>>>>>>    - "Child node with children" are now automatically converted
>>>>>>>>>    into a "(Generated Site) name" site, and their children rearranged. This:
>>>>>>>>>       - Allows you to set the "site" bandwidth independently of
>>>>>>>>>       the client site bandwidth.
>>>>>>>>>       - Makes for easier trees, because we're inserting the site
>>>>>>>>>       that really should be there.
>>>>>>>>>    - Network.json generation (not the shaped devices file yet) is
>>>>>>>>>    automatically generated from a tree, once PrepareTree() and
>>>>>>>>>    createNetworkJson() are called.
>>>>>>>>>       - There's a unit test that generates the
>>>>>>>>>       network.example.json file and compares it with the original to ensure that
>>>>>>>>>       they match.
>>>>>>>>>    - Unit test coverage hits every function in the graph system,
>>>>>>>>>    now.
>>>>>>>>>
>>>>>>>>> I'm liking this setup. With the non-vendor-specific logic
>>>>>>>>> contained inside the NetworkGraph type, the actual UISP code to generate
>>>>>>>>> the example tree is down to 65
>>>>>>>>> lines of code, including comments. That'll grow a bit as I
>>>>>>>>> re-insert some automatic speed limit determination, AP/Site speed overrides
>>>>>>>>> (
>>>>>>>>> i.e. the integrationUISPbandwidths.csv file). Still pretty clean.
>>>>>>>>>
>>>>>>>>> Creating the network.example.json file only requires:
>>>>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>>>>> import json
>>>>>>>>> net = NetworkGraph()
>>>>>>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
>>>>>>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
>>>>>>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
>>>>>>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
>>>>>>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
>>>>>>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
>>>>>>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
>>>>>>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
>>>>>>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
>>>>>>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
>>>>>>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
>>>>>>>>> net.prepareTree()
>>>>>>>>> net.createNetworkJson()
>>>>>>>>>
>>>>>>>>> (The id and name fields are duplicated right now; I'm using
>>>>>>>>> readable names to keep me sane. The third string is the parent, and the
>>>>>>>>> last two numbers are bandwidth limits.)
>>>>>>>>> The nice, readable format being:
>>>>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="", type=
>>>>>>>>> NodeType.site, download=1000, upload=1000)
>>>>>>>>>
>>>>>>>>> That in turn gives you the example network:
>>>>>>>>> [image: image.png]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>>>>>> herberticus@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Dave: I love those Gource animations! Game development is my
>>>>>>>>>> other hobby, I could easily get lost for weeks tweaking the shaders to make
>>>>>>>>>> the glow "just right". :-)
>>>>>>>>>>
>>>>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to
>>>>>>>>>> look in that direction yet. I'm trying to build a "common grammar" to make
>>>>>>>>>> it easier to express network layout from integrations; that would be
>>>>>>>>>> another form/layer of integration and a lot easier to work with once
>>>>>>>>>> there's a solid foundation. Preseem does some of this (admittedly
>>>>>>>>>> over-eagerly; nothing needs to query SNMP that often!), and the SNMP route
>>>>>>>>>> is quite remarkably convoluted. Their support turned on a few "extra"
>>>>>>>>>> modules to deal with things like PMP450 clients that change MAC when you
>>>>>>>>>> put them in bridge mode vs NAT mode (and report the bridge mode CPE in some
>>>>>>>>>> places either way), Elevate CPEs that almost but not quite make sense.
>>>>>>>>>> Robert's code has the beginnings of some of this, scanning Mikrotik routers
>>>>>>>>>> for IPv6 allocations by MAC (this is also the hardest part for me to test,
>>>>>>>>>> since I don't have any v6 to test, currently).
>>>>>>>>>>
>>>>>>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>>>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> We're pretty similar in that we've made UISP a mess.  Multiple
>>>>>>>>>>> paths to a pop.  Multiple pops on the network.  Failover between pops.
>>>>>>>>>>> Lots of 'other' devices.  Handing out /29s etc. to customers.
>>>>>>>>>>>
>>>>>>>>>>> Some sort of discovery would be nice.  Ideally though, pulling
>>>>>>>>>>> something from SNMP or router APIs etc to build the paths, but having a
>>>>>>>>>>> 'network elements' list with each of the links described.  ie, backhaul 12
>>>>>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that
>>>>>>>>>>> from discovery.
>>>>>>>>>>>
>>>>>>>>>>> I've also thought about doing routine trace routes or watching
>>>>>>>>>>> TTLs or something like that to get some indication that topology has
>>>>>>>>>>> changed and then do another discovery and potential tree rebuild.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> This is awesome! Way to go here. Thank you for contributing
>>>>>>>>>>>> this.
>>>>>>>>>>>> Being able to map out these complex integrations will help ISPs
>>>>>>>>>>>> a ton, and I really like that it is sharing common features between the
>>>>>>>>>>>> Splynx and UISP integrations.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Robert
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS <
>>>>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> So I've been doing some work on getting UISP integration (and
>>>>>>>>>>>>> integrations in general) to work a bit more smoothly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I started by implementing a graph structure that mirrors both
>>>>>>>>>>>>> the networks and sites system. It's not done yet, but the basics are coming
>>>>>>>>>>>>> together nicely. You can see my progress so far at:
>>>>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>>>>>>
>>>>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the
>>>>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular
>>>>>>>>>>>>> portion of the tree. We have:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>>>>>>    - Bridge mode devices using Option 82 to always allocate
>>>>>>>>>>>>>    the same IP, with a "service IP" entry
>>>>>>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>>>>>>    - Sites that go to client sites, which go to other client
>>>>>>>>>>>>>    sites
>>>>>>>>>>>>>
>>>>>>>>>>>>> In other words, over the years we've unleashed a bit of a
>>>>>>>>>>>>> monster. Cleaning it up is a useful task, but I wanted the integration to
>>>>>>>>>>>>> be able to handle pathological cases like us!
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I fed our network into the current graph generator, and
>>>>>>>>>>>>> used graphviz to spit out a directed graph:
>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - Green = the root site.
>>>>>>>>>>>>>    - Red = a site
>>>>>>>>>>>>>    - Blue = an access point
>>>>>>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the part in "common" is designed heavily to reduce
>>>>>>>>>>>>> repetition. When it's done, you should be able to feed in sites, APs,
>>>>>>>>>>>>> clients, devices, etc. in a pretty flexible manner. Given how much code is
>>>>>>>>>>>>> shared between the UISP and Splynx integration code, I'm pretty sure both
>>>>>>>>>>>>> will be cut to a tiny fraction of the total code. :-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Robert Chacón
>>>>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>> LibreQoS mailing list
>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Robert Chacón
>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>> Dev | LibreQoS.io
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>> LibreQoS mailing list
>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Robert Chacón
>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>> Dev | LibreQoS.io
>>>>>>
>>>>>> _______________________________________________
>>>>> LibreQoS mailing list
>>>>> LibreQoS@lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>
>>>>
>>>>
>>>> --
>>>> Robert Chacón
>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>> Dev | LibreQoS.io
>>>>
>>>> _______________________________________________
>>>> LibreQoS mailing list
>>>> LibreQoS@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>
>>>
>>>
>>> --
>>> This song goes out to all the folk that thought Stadia would work:
>>>
>>> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>>> Dave Täht CEO, TekLibre, LLC
>>>
>>
>
> --
> This song goes out to all the folk that thought Stadia would work:
>
> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
> Dave Täht CEO, TekLibre, LLC
>


-- 
Robert Chacón
CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
Dev | LibreQoS.io

[-- Attachment #1.2: Type: text/html, Size: 50992 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31  0:15                       ` Dave Taht
  2022-10-31  1:15                         ` Robert Chacón
@ 2022-10-31  1:26                         ` Herbert Wolverson
  2022-10-31  1:36                           ` Herbert Wolverson
  1 sibling, 1 reply; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-31  1:26 UTC (permalink / raw)
  To: Dave Taht; +Cc: Robert Chacón, libreqos


[-- Attachment #1.1: Type: text/plain, Size: 30812 bytes --]

> "average" of "what"?

Mean TCP RTT, as measured by pping-cpumap. There have been two steps of
improvement: the original "pping" started to eat a bunch of CPU at higher
traffic levels, and I had a feeling - not entirely quantified - that the
excess CPU usage was causing some latency. Switching to pping-cpumap showed
that my hunch was correct. On top of that, as Robert had observed, the
previous version caused a slight "stutter" when it filled the tracking
buffers (and then recovered fine). My most recent build scales the tracking
buffers up a LOT - which I worried would cause some slowdown, since the
program is now searching a much larger hashmap space, making it less
cache-friendly. The buffer increase fixed the stutter issue. I should have
been clearer about what I meant. I'm still trying to figure out the optimal
buffer size, and the optimal stats-collection period (collecting the stats
"resets" the buffers, eliminating any resource depletion).
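
Conceptually, the collection cycle is just a drain-and-reset. A throwaway
userspace Python sketch of the idea (the real buffers are eBPF maps, and
every name below is invented):

    import time

    tracking = {}  # stand-in for the kernel-side flow -> RTT-samples map

    def collect(period_seconds=10):
        """Drain the tracking buffer each period; the drain is what
        keeps the buffer from ever filling up."""
        global tracking
        while True:
            time.sleep(period_seconds)
            snapshot, tracking = tracking, {}  # the "reset"
            for flow, samples in snapshot.items():
                if samples:
                    print(flow, sum(samples) / len(samples))  # mean RTT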

I'm also experimenting with a few other ideas to keep the measurement
latency more consistent. I tried "dump it all into a perfmap and figure it
out in userspace" which went spectacularly badly. :-|

The RTT measurements are from the customer to whatever the heck they are
using on the Internet. So customers using a slow service that's
bottlenecked far outside of my control will negatively affect the results -
but there's nothing I can do about that. Coincidentally, it's the same
"QoE" metric that Preseem uses - so Preseem to LibreQoS refugees (myself
included) tend to have a "feel" for it. If I remember rightly, Preseem
(which is basically fq-codel queues per customer, with an optional layer of
AP queues above) ranks 0-74 ms as "green", 75-100 ms as "yellow", and 100+ ms
as "red" - and a lot of WISPs have become used to that grading. I always
thought that an average of 70ms seemed pretty excessive to be "good". The
idea is that it's quantifying the customer's *experience* - the lower the
average, the snappier the connection "feels". You can have a pretty happy
customer with very low latency and a low speed plan, if they aren't doing
anything that needs to exhaust their speed plan. (This contrasts with a lot
of other solutions - notably Sandvine - which have always focused heavily
on "how much less upsteam does the ISP need to buy?")

On Sun, Oct 30, 2022 at 7:15 PM Dave Taht <dave.taht@gmail.com> wrote:

>
>
> On Sat, Oct 29, 2022 at 6:45 PM Herbert Wolverson <herberticus@gmail.com>
> wrote:
>
>> > For starters, let me also offer praise for this work which is so ahead
>> of schedule!
>>
>> Thank you. I'm enjoying a short period while I wait for my editor to
>> finish up with a couple of chapters of my next book (working title More
>> Hands-on Rust; it's intermediate to advanced Rust, taught through the lens
>> of game development).
>>
>
> cool. I'm 32 years into my PhD thesis.
>
>
>>
>> I think at least initially, the primary focus is on what WISPs are used
>> to (and ask for): a fat shaper box that sits between a WISP and their
>> Internet connection(s). Usually in the topology: (router connected to
>> upstream) <--> (LibreQoS) <--> (core site router, connected to the WISP's
>> network as a whole). That's a simplification; there's usually a bypass (in
>> case LibreQoS dies, is being updated, etc.), sometimes multiple connections
>> that need shaping, etc. That's how Preseem (and the others) tend to insert
>> themselves - shape everything on the way out.
>>
>
> Presently LibreQoS appears to be inserting about 200us of delay into the
> path, for the sparsest packets. Every box on the path adds
> delay, though cut-through switches are common. Don't talk to me about
> network slicing and disaggregated this or that in the 3GPP world, tho...
> ugh.
>
> I guess, for every "box" (or virtual machine) on the path I have amdah's
> law stuck in my head.
>
> This is in part why the K8s crowd makes me a little crazy.
>
>
>>
>> I think there's a lot to be said for the possibility of LibreQoS at
>> towers that need it the most, also. That might require a bit of MPLS
>> support (I can do the xdp-cpumap-tc part; I'm not sure what the classifier
>> does if it receives a packet with the TCP/UDP header stuck behind some MPLS
>> headers?), but has the potential to really clean things up. Especially for
>> a really busy tower site. (On a similar note, WISPs with multiple Internet
>> connections at different sites would benefit from LibreQoS on each of
>> them).
>>
>> Generally, the QoS box doesn't really care what you are running in the
>> way of a router.
>>
>
> It is certainly simpler to have a transparent middlebox for this stuff,
> initially, and it would take a great leap of faith,
> for many, to just plug in a lqos box as the main box... but cumulus did
> succeed at a lot of that... they open sourced a bfd daemon... numerous
> other tools...
>
> https://www.nvidia.com/en-us/networking/ethernet-switching/cumulus-linux/
>
>
>> We run mostly Mikrotik (with a bit of FreeBSD, and a tiny bit of Cisco in
>> the mix too!), I know of people who love Juniper, use Cisco, etc. Since
>> we're shaping in the "router sandwich" (which can be one router with a bit
>> of care), we don't necessarily need to worry too much about their innards.
>>
>>
> An ISP in an SDN shaping whitebox that does all that juniper/cisco stuff,
> or a pair perhaps using a fiber optic splitter for failover
>
> http://www.comlaninc.com/products/fiber-optic-products/id/23/cl-fos
>
>
>
>
>> With that said, some future SNMP support (please, not polling everything
>> all the time... that's a monitoring program's job!) is probably hard to
>> avoid. At least that's relatively vendor agnostic (even if Ubiquiti seem to
>> be trying to cease supporting  it, ugh)
>>
>>
> Building on this initial core strength - sampling RTT - would be a
> differentiator.
>
> Examples:
>
> RTT per AP
> RTT P1 per AP (what's the effective minimum)
> RTT P99 (what's the worst case?)
> RTT variance  P1 to P99 per internet IP (worst 20 performers) or AS number
> or /24
>
> (variance is a very important concept)
>
>
>
>
>
>> I could see some support for outputting rules for routers, especially if
>> the goal is to get Cake managing buffer-bloat in many places down the line.
>>
>> Incidentally, using my latest build of cpumap-pping (and no separate
>> pping running, eating a CPU) my average network latency has dropped to 24ms
>> at peak time (from 40ms). At peak time, while pulling 1.8 gbps of real
>> customer traffic through the system. :-)
>>
>
> OK, this is something that "triggers" my inner pedant. Forgive me in
> advance?
>
> "average" of "what"?
>
> Changing the monitoring tool shouldn't have affected the average latency,
> unless how it is calculated is different, or the sample
> population (more likely) has changed. If you are tracking now far more
> short flows, the observed latency will decline, but the
> higher latencies you were observing in the first place are still there.
>
> Also... between where and where? Across the network? To the customer to
> their typical set of IP addresses of their servers?
> on wireless? vs fiber? ( Transiting a fiber network to your pop's edge
> should take under 2ms). Wifi hops at the end of the link are
> probably adding the most delay...
>
> If you consider 24ms "good" - however you calculate -  going for ever less
> via whatever means can be obtained from these
> analyses, is useful. But there are some things I don't think make as much
> sense as they used to - a netflix cache hitrate must
> be so low nowadays that it costs you just as much to fetch it from upstream
> as to host a box...
>
>
>
>
>>
>>
>>
>>
>> On Sat, Oct 29, 2022 at 2:43 PM Dave Taht <dave.taht@gmail.com> wrote:
>>
>>> For starters, let me also offer praise for this work which is so ahead
>>> of schedule!
>>>
>>> I am (perhaps cluelessly) thinking about bigger pictures, and still
>>> stuck in my mindset involving distributing the packet processing,
>>> and representing the network topology, plans and compensating for the
>>> physics.
>>>
>>> So you have a major tower, a separate libreqos instance goes there. Or
>>> libreqos outputs rules compatible with mikrotik or vyatta or whatever is
>>> there. Or are you basically thinking one device rules them all and off the
>>> only interface, shapes them?
>>>
>>> Or:
>>>
>>> You have another pop with a separate connection to the internet that you
>>> inherited from a buyout, or you wanted physical redundancy for your BGP
>>> AS's internet access, maybe just between DCs in the same town or...
>>>      ____________________________________________
>>>     /                                            \
>>> cloud -> pop -> customers - customers <- pop <- cloud
>>>           \ ----- leased fiber or wireless ----- /
>>>
>>>
>>> I'm also a little puzzled as to what's the ISP->internet link? juniper?
>>> cisco? mikrotik, and what role and services that is expected to have.
>>>
>>>
>>>
>>> On Sat, Oct 29, 2022 at 12:06 PM Robert Chacón via LibreQoS <
>>> libreqos@lists.bufferbloat.net> wrote:
>>>
>>>> > Per your suggestion, devices with no IP addresses (v4 or v6) are not
>>>> added.
>>>> > Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>>>> side of things, so it can be used in other integrations also. I don't have
>>>> a setup on which to test it, but if I'm reading the code right then the
>>>> unit test is testing it appropriately.
>>>>
>>>> Fantastic.
>>>>
>>>> > excludeSites is supported as a common API feature. If a node is added
>>>> with a name that matches an excluded site, it won't be added. The tree
>>>> builder is smart enough to replace invalid "parentId" references with the
>>>> shaper root, so if you have other tree items that rely on this site - they
>>>> will be added to the tree. Was that the intent? (It looks pretty useful; we
>>>> have a child site down the tree with a HUGE amount of load, and bumping it
>>>> to the top-level with excludeSites would probably help our load balancing
>>>> quite a bit)
>>>>
>>>> Very cool approach, I like it! Yeah we have some cases where we need to
>>>> balance out high load child nodes across CPUs so that's perfect.
>>>> Originally I thought of it to just exclude sites that don't fit into
>>>> the shaped topology but this approach is more useful.
>>>> Should we rename excludeSites to moveSitesToTop or something similar?
>>>> That functionality of distributing across top level nodes / cpu cores seems
>>>> more important anyway.
>>>>
>>>> >exceptionCPEs is also supported as a common API feature. It simply
>>>> overrides the "parentId'' of incoming nodes with the new parent. Another
>>>> potentially useful feature; if I got excludeSites the wrong away around,
>>>> I'd add a "my_big_site":"" entry to push it to the top.
>>>>
>>>> Awesome
>>>>
>>>> > UISP integration now supports a "flat" topology option (set via
>>>> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>>>> to include this entry.
>>>>
>>>> Nice!
>>>>
>>>> > I'll look and see how much of the Splynx code I can shorten with the
>>>> new API; I don't have a Splynx setup to test against, making that tricky.
>>>>
>>>> I'll send you the Splynx login they gave us.
>>>>
>>>> > I *think* the new API should shorten things a lot. I think routers
>>>> act as node parents, with clients underneath them? Otherwise, a "flat"
>>>> setup should be a little shorter (the CSV code can be replaced with a call
>>>> to the graph builder). Most of the Splynx (and VISP) users I've talked to
>>>> layer MPLS+VPLS to pretend to have a big, flat network and then connect via
>>>> a RADIUS call in the DHCP server; I've always assumed that's because those
>>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>>> to model topology.*
>>>>
>>>> Yeah Splynx doesn't seem to natively support any topology mapping or
>>>> even AP designation, one person I spoke to said they track corresponding
>>>> APs in radius anyway. So for now the flat model may be fine.
>>>>
>>>> > I need to clean things up a bit (there's still a bit of duplicated
>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>>>> accidentally parented the branch off of a branch instead of main) - but I
>>>> think I can have this as a PR for you on Monday.
>>>>
>>>> This is really great work and will make future integrations much
>>>> cleaner and nicer to work with. Thank you!
>>>>
>>>>
>>>> On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS <
>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>
>>>>> Alright, the UISP side of the common integrations is pretty much
>>>>> feature complete. I'll update the tracking issue in a bit.
>>>>>
>>>>>    - Per your suggestion, devices with no IP addresses (v4 or v6) are
>>>>>    not added.
>>>>>    - Mikrotik "4 to 6" mapping is implemented. I put it in the
>>>>>    "common" side of things, so it can be used in other integrations also. I
>>>>>    don't have a setup on which to test it, but if I'm reading the code right
>>>>>    then the unit test is testing it appropriately.
>>>>>    - excludeSites is supported as a common API feature. If a node is
>>>>>    added with a name that matches an excluded site, it won't be added. The
>>>>>    tree builder is smart enough to replace invalid "parentId" references with
>>>>>    the shaper root, so if you have other tree items that rely on this site -
>>>>>    they will be added to the tree. Was that the intent? (It looks pretty
>>>>>    useful; we have a child site down the tree with a HUGE amount of load, and
>>>>>    bumping it to the top-level with excludeSites would probably help our load
>>>>>    balancing quite a bit)
>>>>>       - If the intent was to exclude the site and everything
>>>>>       underneath it, I'd have to rework things a bit. Let me know; it wasn't
>>>>>       quite clear.
>>>>>       - exceptionCPEs is also supported as a common API feature. It
>>>>>    simply overrides the "parentId'' of incoming nodes with the new parent.
>>>>>    Another potentially useful feature; if I got excludeSites the wrong away
>>>>>    around, I'd add a "my_big_site":"" entry to push it to the top.
>>>>>    - UISP integration now supports a "flat" topology option (set via
>>>>>    uispStrategy = "flat" in ispConfig). I expanded
>>>>>    ispConfig.example.py to include this entry.
>>>>>
>>>>> I'll look and see how much of the Splynx code I can shorten with the
>>>>> new API; I don't have a Splynx setup to test against, making that tricky. I
>>>>> *think* the new API should shorten things a lot. I think routers act
>>>>> as node parents, with clients underneath them? Otherwise, a "flat" setup
>>>>> should be a little shorter (the CSV code can be replaced with a call to the
>>>>> graph builder). Most of the Splynx (and VISP) users I've talked to layer
>>>>> MPLS+VPLS to pretend to have a big, flat network and then connect via a
>>>>> RADIUS call in the DHCP server; I've always assumed that's because those
>>>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>>>> to model topology.*
>>>>>
>>>>> I need to clean things up a bit (there's still a bit of duplicated
>>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>>>>> accidentally parented the branch off of a branch instead of main) - but I
>>>>> think I can have this as a PR for you on Monday.
>>>>>
>>>>> * - The first big wireless network I set up used a Motorola WiMAX
>>>>> setup. They *required* that every single AP share two VLANs
>>>>> (management and bearer) with every other AP - all the way to the core. It
>>>>> kinda worked once they remembered client isolation was a thing in a
>>>>> patch... Then again, their installation instructions included connecting
>>>>> two ports of a router together with a jumper cable, because their localhost
>>>>> implementation didn't quite work. :-|
>>>>>
>>>>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
>>>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>>>
>>>>>> Awesome work. It succeeded in building the topology and creating
>>>>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>>>>>> I notice that in ShapedDevices.csv it does add CPE radios (which in
>>>>>> our case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
>>>>>> being empty lists [].
>>>>>> This is not necessarily bad, but it may lead to empty leaf classes
>>>>>> being created on LibreQoS.py runs. Not a huge deal, it just makes the minor
>>>>>> class counter increment toward the 32k limit faster.
>>>>>> Do you think perhaps we should check:
>>>>>> *if (len(IPv4) == 0) and (len(IPv6) == 0):*
>>>>>> *   # Skip adding this entry to ShapedDevices.csv*
>>>>>> Or something similar around line 329 of integrationCommon.py?
>>>>>> Open to your suggestions there.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>
>>>>>>> One more update, and I'm going to sleep until "pick up daughter"
>>>>>>> time. :-)
>>>>>>>
>>>>>>> The tree at
>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>> can now build a network.json, ShapedDevices.csv, and
>>>>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>>>>>>> previous importer - other than using data links to build the hierarchy and
>>>>>>> letting (requiring, currently) you specify the root node. It's handling our
>>>>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>>>>>>> recommend just running integrationUISP.py and checking the output rather
>>>>>>> than throwing it into production), I'd appreciate any feedback.
>>>>>>>
>>>>>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>>>>>> exceptionCPE and site exclusion.
>>>>>>>
>>>>>>> If you want the pretty graphics, you need to "pip install graphviz"
>>>>>>> and "sudo apt install graphviz". It *should* detect that these aren't
>>>>>>> present and not try to draw pictures, otherwise.
>>>>>>>
>>>>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>>>>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>>>>>
>>>>>>>> Wow. This is very nicely done. Awesome work!
>>>>>>>>
>>>>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>
>>>>>>>>> The integration is coming along nicely. Some progress updates:
>>>>>>>>>
>>>>>>>>>    - You can specify a variable in ispConfig.py named "uispSite".
>>>>>>>>>    This sets where in the topology you want the tree to start. This has two
>>>>>>>>>    purposes:
>>>>>>>>>       - It's hard to be psychic and know for sure where the
>>>>>>>>>       shaper is in the network.
>>>>>>>>>       - You could run multiple shapers at different egress
>>>>>>>>>       points, with failover - and rebuild the entire topology from the point of
>>>>>>>>>       view of a network node.
>>>>>>>>>    - "Child node with children" are now automatically converted
>>>>>>>>>    into a "(Generated Site) name" site, and their children rearranged. This:
>>>>>>>>>       - Allows you to set the "site" bandwidth independently of
>>>>>>>>>       the client site bandwidth.
>>>>>>>>>       - Makes for easier trees, because we're inserting the site
>>>>>>>>>       that really should be there.
>>>>>>>>>    - Network.json generation (not the shaped devices file yet) is
>>>>>>>>>    automatically generated from a tree, once PrepareTree() and
>>>>>>>>>    createNetworkJson() are called.
>>>>>>>>>       - There's a unit test that generates the
>>>>>>>>>       network.example.json file and compares it with the original to ensure that
>>>>>>>>>       they match.
>>>>>>>>>    - Unit test coverage hits every function in the graph system,
>>>>>>>>>    now.
>>>>>>>>>
>>>>>>>>> I'm liking this setup. With the non-vendor-specific logic
>>>>>>>>> contained inside the NetworkGraph type, the actual UISP code to generate
>>>>>>>>> the example tree is down to 65
>>>>>>>>> lines of code, including comments. That'll grow a bit as I
>>>>>>>>> re-insert some automatic speed limit determination, AP/Site speed overrides
>>>>>>>>> (
>>>>>>>>> i.e. the integrationUISPbandwidths.csv file). Still pretty clean.
>>>>>>>>>
>>>>>>>>> Creating the network.example.json file only requires:
>>>>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>>>>> import json
>>>>>>>>> net = NetworkGraph()
>>>>>>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
>>>>>>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
>>>>>>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
>>>>>>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
>>>>>>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
>>>>>>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
>>>>>>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
>>>>>>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
>>>>>>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
>>>>>>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
>>>>>>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
>>>>>>>>> net.prepareTree()
>>>>>>>>> net.createNetworkJson()
>>>>>>>>>
>>>>>>>>> (The id and name fields are duplicated right now; I'm using
>>>>>>>>> readable names to keep me sane. The third string is the parent, and the
>>>>>>>>> last two numbers are bandwidth limits.)
>>>>>>>>> The nice, readable format being:
>>>>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="", type=
>>>>>>>>> NodeType.site, download=1000, upload=1000)
>>>>>>>>>
>>>>>>>>> That in turn gives you the example network:
>>>>>>>>> [image: image.png]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>>>>>> herberticus@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Dave: I love those Gource animations! Game development is my
>>>>>>>>>> other hobby, I could easily get lost for weeks tweaking the shaders to make
>>>>>>>>>> the glow "just right". :-)
>>>>>>>>>>
>>>>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to
>>>>>>>>>> look in that direction yet. I'm trying to build a "common grammar" to make
>>>>>>>>>> it easier to express network layout from integrations; that would be
>>>>>>>>>> another form/layer of integration and a lot easier to work with once
>>>>>>>>>> there's a solid foundation. Preseem does some of this (admittedly
>>>>>>>>>> over-eagerly; nothing needs to query SNMP that often!), and the SNMP route
>>>>>>>>>> is quite remarkably convoluted. Their support turned on a few "extra"
>>>>>>>>>> modules to deal with things like PMP450 clients that change MAC when you
>>>>>>>>>> put them in bridge mode vs NAT mode (and report the bridge mode CPE in some
>>>>>>>>>> places either way), Elevate CPEs that almost but not quite make sense.
>>>>>>>>>> Robert's code has the beginnings of some of this, scanning Mikrotik routers
>>>>>>>>>> for IPv6 allocations by MAC (this is also the hardest part for me to test,
>>>>>>>>>> since I don't have any v6 to test, currently).
>>>>>>>>>>
>>>>>>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>>>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> We're pretty similar in that we've made UISP a mess.  Multiple
>>>>>>>>>>> paths to a pop.  Multiple pops on the network.  Failover between pops.
>>>>>>>>>>> Lots of 'other' devices.  Handing out /29s etc. to customers.
>>>>>>>>>>>
>>>>>>>>>>> Some sort of discovery would be nice.  Ideally though, pulling
>>>>>>>>>>> something from SNMP or router APIs etc to build the paths, but having a
>>>>>>>>>>> 'network elements' list with each of the links described.  ie, backhaul 12
>>>>>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that
>>>>>>>>>>> from discovery.
>>>>>>>>>>>
>>>>>>>>>>> I've also thought about doing routine trace routes or watching
>>>>>>>>>>> TTLs or something like that to get some indication that topology has
>>>>>>>>>>> changed and then do another discovery and potential tree rebuild.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> This is awesome! Way to go here. Thank you for contributing
>>>>>>>>>>>> this.
>>>>>>>>>>>> Being able to map out these complex integrations will help ISPs
>>>>>>>>>>>> a ton, and I really like that it is sharing common features between the
>>>>>>>>>>>> Splynx and UISP integrations.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Robert
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS <
>>>>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> So I've been doing some work on getting UISP integration (and
>>>>>>>>>>>>> integrations in general) to work a bit more smoothly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I started by implementing a graph structure that mirrors both
>>>>>>>>>>>>> the networks and sites system. It's not done yet, but the basics are coming
>>>>>>>>>>>>> together nicely. You can see my progress so far at:
>>>>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>>>>>>
>>>>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the
>>>>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular
>>>>>>>>>>>>> portion of the tree. We have:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>>>>>>    - Bridge mode devices using Option 82 to always allocate
>>>>>>>>>>>>>    the same IP, with a "service IP" entry
>>>>>>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>>>>>>    - Sites that go to client sites, which go to other client
>>>>>>>>>>>>>    sites
>>>>>>>>>>>>>
>>>>>>>>>>>>> In other words, over the years we've unleashed a bit of a
>>>>>>>>>>>>> monster. Cleaning it up is a useful task, but I wanted the integration to
>>>>>>>>>>>>> be able to handle pathological cases like us!
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I fed our network into the current graph generator, and
>>>>>>>>>>>>> used graphviz to spit out a directed graph:
>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - Green = the root site.
>>>>>>>>>>>>>    - Red = a site
>>>>>>>>>>>>>    - Blue = an access point
>>>>>>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the part in "common" is designed heavily to reduce
>>>>>>>>>>>>> repetition. When it's done, you should be able to feed in sites, APs,
>>>>>>>>>>>>> clients, devices, etc. in a pretty flexible manner. Given how much code is
>>>>>>>>>>>>> shared between the UISP and Splynx integration code, I'm pretty sure both
>>>>>>>>>>>>> will be cut to a tiny fraction of the total code. :-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Robert Chacón
>>>>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>> LibreQoS mailing list
>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Robert Chacón
>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>> Dev | LibreQoS.io
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>> LibreQoS mailing list
>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Robert Chacón
>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>> Dev | LibreQoS.io
>>>>>>
>>>>>> _______________________________________________
>>>>> LibreQoS mailing list
>>>>> LibreQoS@lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>
>>>>
>>>>
>>>> --
>>>> Robert Chacón
>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>> Dev | LibreQoS.io
>>>>
>>>> _______________________________________________
>>>> LibreQoS mailing list
>>>> LibreQoS@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>
>>>
>>>
>>> --
>>> This song goes out to all the folk that thought Stadia would work:
>>>
>>> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>>> Dave Täht CEO, TekLibre, LLC
>>>
>>
>
> --
> This song goes out to all the folk that thought Stadia would work:
>
> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
> Dave Täht CEO, TekLibre, LLC
>

[-- Attachment #1.2: Type: text/html, Size: 51494 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31  1:26                         ` Herbert Wolverson
@ 2022-10-31  1:36                           ` Herbert Wolverson
  2022-10-31  1:46                             ` Herbert Wolverson
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-31  1:36 UTC (permalink / raw)
  Cc: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 32744 bytes --]

On a high-level, I've been playing with:

   - The brute force approach: have a bigger buffer, so exhaustion is less
   likely to ever happen.
   - A shared "config" flag that turns off monitoring once exhaustion is
   near - it costs one synchronized lookup/increment, and gets reset when you
   read the stats.
   - Per-CPU buffers for the very volatile data, which is generally faster
   (at the expense of RAM) - but is also quite hard to manage from userspace.
   It significantly reduces the likelihood of stalling, but I'm not fond of
   the complexity so far.
   - Replacing the volatile "packet buffer" with a "least recently used"
   map that automatically gets rid of old data if it isn't cleaned up (the
   original only cleans up when a TCP connection closes gracefully)
   - Maintaining two sets of buffers and keeping a pointer to each. A
   shared config variable indicates whether we are currently writing to A or
   B. "Cleanup" cleans the *other* buffer and switches the pointers. So
   we're never sharing "hot" data with a userland cleanup.

That's a lot to play with, so I'm taking my time. My gut likes the A/B
switch, currently.
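
For the sake of discussion, here's the A/B idea in userspace Python terms
(names invented, and hand-waving away the atomic flag update that makes the
real eBPF version interesting):

    class ABBuffers:
        """Writers fill the active side; collection flips sides and
        drains the now-cold one, so the reader never touches hot data."""

        def __init__(self):
            self.buffers = [{}, {}]
            self.active = 0  # the shared "which side?" flag

        def record(self, key, rtt_ms):
            self.buffers[self.active].setdefault(key, []).append(rtt_ms)

        def collect(self):
            cold = self.active
            self.active = 1 - cold    # writers move to the other side
            snapshot = self.buffers[cold]
            self.buffers[cold] = {}   # reset before the next flip
            return snapshot

The flip itself is one flag write, so the cleanup cost no longer depends on
how hot the buffers are.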

On Sun, Oct 30, 2022 at 8:26 PM Herbert Wolverson <herberticus@gmail.com>
wrote:

> > "average" of "what"?
>
> Mean TCP RTT, as measured by pping-cpumap. There have been two steps of
> improvement: the original "pping" started to eat a bunch of CPU at higher
> traffic levels, and I had a feeling - not entirely quantified - that the
> excess CPU usage was causing some latency. Switching to pping-cpumap showed
> that my hunch was correct. On top of that, as Robert had observed, the
> previous version caused a slight "stutter" when it filled the tracking
> buffers (and then recovered fine). My most recent build scales the tracking
> buffers up a LOT - which I worried would cause some slowdown, since the
> program is now searching a much larger hashmap space, making it less
> cache-friendly. The buffer increase fixed the stutter issue. I should have
> been clearer about what I meant. I'm still trying to figure out the optimal
> buffer size, and the optimal stats-collection period (collecting the stats
> "resets" the buffers, eliminating any resource depletion).
>
> I'm also experimenting with a few other ideas to keep the measurement
> latency more consistent. I tried "dump it all into a perfmap and figure it
> out in userspace" which went spectacularly badly. :-|
>
> The RTT measurements are from the customer to whatever the heck they are
> using on the Internet. So customers using a slow service that's
> bottlenecked far outside of my control will negatively affect the results -
> but there's nothing I can do about that. Coincidentally, it's the same
> "QoE" metric that Preseem uses - so Preseem to LibreQoS refugees (myself
> included) tend to have a "feel" for it. If I remember rightly, Preseem
> (which is basically fq-codel queues per customer, with an optional layer of
> AP queues above) ranks 0-74 ms as "green", 75-100 ms as "yellow", and 100+ ms
> as "red" - and a lot of WISPs have become used to that grading. I always
> thought that an average of 70ms seemed pretty excessive to be "good". The
> idea is that it's quantifying the customer's *experience* - the lower the
> average, the snappier the connection "feels". You can have a pretty happy
> customer with very low latency and a low speed plan, if they aren't doing
> anything that needs to exhaust their speed plan. (This contrasts with a lot
> of other solutions - notably Sandvine - which have always focused heavily
> on "how much less upsteam does the ISP need to buy?")
>
> On Sun, Oct 30, 2022 at 7:15 PM Dave Taht <dave.taht@gmail.com> wrote:
>
>>
>>
>> On Sat, Oct 29, 2022 at 6:45 PM Herbert Wolverson <herberticus@gmail.com>
>> wrote:
>>
>>> > For starters, let me also offer praise for this work which is so ahead
>>> of schedule!
>>>
>>> Thank you. I'm enjoying a short period while I wait for my editor to
>>> finish up with a couple of chapters of my next book (working title More
>>> Hands-on Rust; it's intermediate to advanced Rust, taught through the lens
>>> of game development).
>>>
>>
>> cool. I'm 32 years into my PhD thesis.
>>
>>
>>>
>>> I think at least initially, the primary focus is on what WISPs are used
>>> to (and ask for): a fat shaper box that sits between a WISP and their
>>> Internet connection(s). Usually in the topology: (router connected to
>>> upstream) <--> (LibreQoS) <--> (core site router, connected to the WISP's
>>> network as a whole). That's a simplification; there's usually a bypass (in
>>> case LibreQoS dies, is being updated, etc.), sometimes multiple connections
>>> that need shaping, etc. That's how Preseem (and the others) tend to insert
>>> themselves - shape everything on the way out.
>>>
>>
>> Presently LibreQoS appears to be inserting about 200us of delay into the
>> path, for the sparsest packets. Every box on the path adds
>> delay, though cut-through switches are common. Don't talk to me about
>> network slicing and disaggregated this or that in the 3GPP world, tho...
>> ugh.
>>
>> I guess, for every "box" (or virtual machine) on the path I have amdah's
>> law stuck in my head.
>>
>> This is in part why the K8s crowd makes me a little crazy.
>>
>>
>>>
>>> I think there's a lot to be said for the possibility of LibreQoS at
>>> towers that need it the most, also. That might require a bit of MPLS
>>> support (I can do the xdp-cpumap-tc part; I'm not sure what the classifier
>>> does if it receives a packet with the TCP/UDP header stuck behind some MPLS
>>> headers?), but has the potential to really clean things up. Especially for
>>> a really busy tower site. (On a similar note, WISPs with multiple Internet
>>> connections at different sites would benefit from LibreQoS on each of
>>> them).
>>>
>>> Generally, the QoS box doesn't really care what you are running in the
>>> way of a router.
>>>
>>
>> It is certainly simpler to have a transparent middlebox for this stuff,
>> initially, and it would take a great leap of faith,
>> for many, to just plug in a lqos box as the main box... but cumulus did
>> succeed at a lot of that... they open sourced a bfd daemon... numerous
>> other tools...
>>
>> https://www.nvidia.com/en-us/networking/ethernet-switching/cumulus-linux/
>>
>>
>>> We run mostly Mikrotik (with a bit of FreeBSD, and a tiny bit of Cisco
>>> in the mix too!), I know of people who love Juniper, use Cisco, etc. Since
>>> we're shaping in the "router sandwich" (which can be one router with a bit
>>> of care), we don't necessarily need to worry too much about their innards.
>>>
>>>
>> An ISP in an SDN shaping whitebox that does all that juniper/cisco stuff,
>> or a pair perhaps using a fiber optic splitter for failover
>>
>> http://www.comlaninc.com/products/fiber-optic-products/id/23/cl-fos
>>
>>
>>
>>
>>> With that said, some future SNMP support (please, not polling everything
>>> all the time... that's a monitoring program's job!) is probably hard to
>>> avoid. At least that's relatively vendor agnostic (even if Ubiquiti seem to
>>> be trying to cease supporting  it, ugh)
>>>
>>>
>> Building on this initial core strength - sampling RTT - would be a
>> differentiator.
>>
>> Examples:
>>
>> RTT per AP
>> RTT P1 per AP (what's the effective minimum)
>> RTT P99 (what's the worst case?)
>> RTT variance  P1 to P99 per internet IP (worst 20 performers) or AS
>> number or /24
>>
>> (variance is a very important concept)
>>
>>
>>
>>
>>
>>> I could see some support for outputting rules for routers, especially if
>>> the goal is to get Cake managing buffer-bloat in many places down the line.
>>>
>>> Incidentally, using my latest build of cpumap-pping (and no separate
>>> pping running, eating a CPU) my average network latency has dropped to 24ms
>>> at peak time (from 40ms). At peak time, while pulling 1.8 gbps of real
>>> customer traffic through the system. :-)
>>>
>>
>> OK, this is something that "triggers" my inner pedant. Forgive me in
>> advance?
>>
>> "average" of "what"?
>>
>> Changing the monitoring tool shouldn't have affected the average latency,
>> unless how it is calculated is different, or the sample
>> population (more likely) has changed. If you are tracking now far more
>> short flows, the observed latency will decline, but the
>> higher latencies you were observing in the first place are still there.
>>
>> Also... between where and where? Across the network? To the customer to
>> their typical set of IP addresses of their servers?
>> on wireless? vs fiber? ( Transiting a fiber network to your pop's edge
>> should take under 2ms). Wifi hops at the end of the link are
>> probably adding the most delay...
>>
>> If you consider 24ms "good" - however you calculate -  going for ever
>> less via whatever means can be obtained from these
>> analyses, is useful. But there are some things I don't think make as much
>> sense as they used to - a netflix cache hitrate must
>> be so low nowadays that it costs you just as much to fetch it from upstream
>> as to host a box...
>>
>>
>>
>>
>>>
>>>
>>>
>>>
>>> On Sat, Oct 29, 2022 at 2:43 PM Dave Taht <dave.taht@gmail.com> wrote:
>>>
>>>> For starters, let me also offer praise for this work which is so ahead
>>>> of schedule!
>>>>
>>>> I am (perhaps cluelessly) thinking about bigger pictures, and still
>>>> stuck in my mindset involving distributing the packet processing,
>>>> and representing the network topology, plans and compensating for the
>>>> physics.
>>>>
>>>> So you have a major tower, a separate libreqos instance goes there. Or
>>>> libreqos outputs rules compatible with mikrotik or vyatta or whatever is
>>>> there. Or are you basically thinking one device rules them all and off the
>>>> only interface, shapes them?
>>>>
>>>> Or:
>>>>
>>>> You have another pop with a separate connection to the internet that
>>>> you inherited from a buyout, or you wanted physical redundancy for your BGP
>>>> AS's internet access, maybe just between DCs in the same town or...
>>>>      ____________________________________________
>>>>     /                                            \
>>>> cloud -> pop -> customers - customers <- pop <- cloud
>>>>           \ ----- leased fiber or wireless ----- /
>>>>
>>>>
>>>> I'm also a little puzzled as to what's the ISP->internet link? juniper?
>>>> cisco? mikrotik, and what role and services that is expected to have.
>>>>
>>>>
>>>>
>>>> On Sat, Oct 29, 2022 at 12:06 PM Robert Chacón via LibreQoS <
>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>
>>>>> > Per your suggestion, devices with no IP addresses (v4 or v6) are not
>>>>> added.
>>>>> > Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>>>>> side of things, so it can be used in other integrations also. I don't have
>>>>> a setup on which to test it, but if I'm reading the code right then the
>>>>> unit test is testing it appropriately.
>>>>>
>>>>> Fantastic.
>>>>>
>>>>> > excludeSites is supported as a common API feature. If a node is
>>>>> added with a name that matches an excluded site, it won't be added. The
>>>>> tree builder is smart enough to replace invalid "parentId" references with
>>>>> the shaper root, so if you have other tree items that rely on this site -
>>>>> they will be added to the tree. Was that the intent? (It looks pretty
>>>>> useful; we have a child site down the tree with a HUGE amount of load, and
>>>>> bumping it to the top-level with excludeSites would probably help our load
>>>>> balancing quite a bit)
>>>>>
>>>>> Very cool approach, I like it! Yeah we have some cases where we need
>>>>> to balance out high load child nodes across CPUs so that's perfect.
>>>>> Originally I thought of it to just exclude sites that don't fit into
>>>>> the shaped topology but this approach is more useful.
>>>>> Should we rename excludeSites to moveSitesToTop or something similar?
>>>>> That functionality of distributing across top level nodes / cpu cores seems
>>>>> more important anyway.
>>>>>
>>>>> >exceptionCPEs is also supported as a common API feature. It simply
>>>>> overrides the "parentId'' of incoming nodes with the new parent. Another
>>>>> potentially useful feature; if I got excludeSites the wrong away around,
>>>>> I'd add a "my_big_site":"" entry to push it to the top.
>>>>>
>>>>> Awesome
>>>>>
>>>>> > UISP integration now supports a "flat" topology option (set via
>>>>> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>>>>> to include this entry.
>>>>>
>>>>> Nice!
>>>>>
>>>>> > I'll look and see how much of the Spylnx code I can shorten with the
>>>>> new API; I don't have a Spylnx setup to test against, making that tricky.
>>>>>
>>>>> I'll send you the Splynx login they gave us.
>>>>>
>>>>> > I *think* the new API should shorten things a lot. I think routers
>>>>> act as node parents, with clients underneath them? Otherwise, a "flat"
>>>>> setup should be a little shorter (the CSV code can be replaced with a call
>>>>> to the graph builder). Most of the Spylnx (and VISP) users I've talked to
>>>>> layer MPLS+VPLS to pretend to have a big, flat network and then connect via
>>>>> a RADIUS call in the DHCP server; I've always assumed that's because those
>>>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>>>> to model topology.*
>>>>>
>>>>> Yeah splynx doesn't seem to natively support any topology mapping or
>>>>> even AP designation, one person I spoke to said they track corresponding
>>>>> APs in radius anyway. So for now the flat model may be fine.
>>>>>
>>>>> > I need to clean things up a bit (there's still a bit of duplicated
>>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>>>>> accidentally parented the branch off of a branch instead of main) - but I
>>>>> think I can have this as a PR for you on Monday.
>>>>>
>>>>> This is really great work and will make future integrations much
>>>>> cleaner and nicer to work with. Thank you!
>>>>>
>>>>>
>>>>> On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS <
>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>
>>>>>> Alright, the UISP side of the common integrations is pretty much
>>>>>> feature complete. I'll update the tracking issue in a bit.
>>>>>>
>>>>>>    - Per your suggestion, devices with no IP addresses (v4 or v6)
>>>>>>    are not added.
>>>>>>    - Mikrotik "4 to 6" mapping is implemented. I put it in the
>>>>>>    "common" side of things, so it can be used in other integrations also. I
>>>>>>    don't have a setup on which to test it, but if I'm reading the code right
>>>>>>    then the unit test is testing it appropriately.
>>>>>>    - excludeSites is supported as a common API feature. If a node is
>>>>>>    added with a name that matches an excluded site, it won't be added. The
>>>>>>    tree builder is smart enough to replace invalid "parentId" references with
>>>>>>    the shaper root, so if you have other tree items that rely on this site -
>>>>>>    they will be added to the tree. Was that the intent? (It looks pretty
>>>>>>    useful; we have a child site down the tree with a HUGE amount of load, and
>>>>>>    bumping it to the top-level with excludeSites would probably help our load
>>>>>>    balancing quite a bit)
>>>>>>       - If the intent was to exclude the site and everything
>>>>>>       underneath it, I'd have to rework things a bit. Let me know; it wasn't
>>>>>>       quite clear.
>>>>>>       - exceptionCPEs is also supported as a common API feature. It
>>>>>>    simply overrides the "parentId'' of incoming nodes with the new parent.
>>>>>>    Another potentially useful feature; if I got excludeSites the wrong away
>>>>>>    around, I'd add a "my_big_site":"" entry to push it to the top.
>>>>>>    - UISP integration now supports a "flat" topology option (set via
>>>>>>    uispStrategy = "flat" in ispConfig). I expanded
>>>>>>    ispConfig.example.py to include this entry.
>>>>>>
>>>>>> I'll look and see how much of the Spylnx code I can shorten with the
>>>>>> new API; I don't have a Spylnx setup to test against, making that tricky. I
>>>>>> *think* the new API should shorten things a lot. I think routers act
>>>>>> as node parents, with clients underneath them? Otherwise, a "flat" setup
>>>>>> should be a little shorter (the CSV code can be replaced with a call to the
>>>>>> graph builder). Most of the Spylnx (and VISP) users I've talked to layer
>>>>>> MPLS+VPLS to pretend to have a big, flat network and then connect via a
>>>>>> RADIUS call in the DHCP server; I've always assumed that's because those
>>>>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>>>>> to model topology.*
>>>>>>
>>>>>> I need to clean things up a bit (there's still a bit of duplicated
>>>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>>>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>>>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>>>>>> accidentally parented the branch off of a branch instead of main) - but I
>>>>>> think I can have this as a PR for you on Monday.
>>>>>>
>>>>>> * - The first big wireless network I setup used a Motorola WiMAX
>>>>>> setup. They *required* that every single AP share two VLANs
>>>>>> (management and bearer) with every other AP - all the way to the core. It
>>>>>> kinda worked once they remembered client isolation was a thing in a
>>>>>> patch... Then again, their installation instructions included connecting
>>>>>> two ports of a router together with a jumper cable, because their localhost
>>>>>> implementation didn't quite work. :-|
>>>>>>
>>>>>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
>>>>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>>>>
>>>>>>> Awesome work. It succeeded in building the topology and creating
>>>>>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>>>>>>> I notice that in ShapedDevices.csv it does add CPE radios (which in
>>>>>>> our case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
>>>>>>> being empty lists [].
>>>>>>> This is not necessarily bad, but it may lead to empty leaf classes
>>>>>>> being created on LibreQoS.py runs. Not a huge deal, it just makes the minor
>>>>>>> class counter increment toward the 32k limit faster.
>>>>>>> Do you think perhaps we should check:
>>>>>>> *if (len(IPv4) == 0) and (len(IPv6) == 0):*
>>>>>>> *   # Skip adding this entry to ShapedDevices.csv*
>>>>>>> Or something similar around line 329 of integrationCommon.py?
>>>>>>> Open to your suggestions there.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>
>>>>>>>> One more update, and I'm going to sleep until "pick up daughter"
>>>>>>>> time. :-)
>>>>>>>>
>>>>>>>> The tree at
>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>> can now build a network.json, ShapedDevices.csv, and
>>>>>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>>>>>>>> previous importer - other than using data links to build the hierarchy and
>>>>>>>> letting (requiring, currently) you specify the root node. It's handling our
>>>>>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>>>>>>>> recommend just running integrationUISP.py and checking the output rather
>>>>>>>> than throwing it into production), I'd appreciate any feedback.
>>>>>>>>
>>>>>>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>>>>>>> exceptionCPE and site exclusion.
>>>>>>>>
>>>>>>>> If you want the pretty graphics, you need to "pip install graphviz"
>>>>>>>> and "sudo apt install graphviz". It *should* detect that these aren't
>>>>>>>> present and not try to draw pictures, otherwise.
>>>>>>>>
>>>>>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>>>>>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>>>>>>
>>>>>>>>> Wow. This is very nicely done. Awesome work!
>>>>>>>>>
>>>>>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>
>>>>>>>>>> The integration is coming along nicely. Some progress updates:
>>>>>>>>>>
>>>>>>>>>>    - You can specify a variable in ispConfig.py named
>>>>>>>>>>    "uispSite". This sets where in the topology you want the tree to start.
>>>>>>>>>>    This has two purposes:
>>>>>>>>>>       - It's hard to be psychic and know for sure where the
>>>>>>>>>>       shaper is in the network.
>>>>>>>>>>       - You could run multiple shapers at different egress
>>>>>>>>>>       points, with failover - and rebuild the entire topology from the point of
>>>>>>>>>>       view of a network node.
>>>>>>>>>>    - "Child node with children" are now automatically converted
>>>>>>>>>>    into a "(Generated Site) name" site, and their children rearranged. This:
>>>>>>>>>>       - Allows you to set the "site" bandwidth independently of
>>>>>>>>>>       the client site bandwidth.
>>>>>>>>>>       - Makes for easier trees, because we're inserting the site
>>>>>>>>>>       that really should be there.
>>>>>>>>>>    - Network.json generation (not the shaped devices file yet)
>>>>>>>>>>    is automatically generated from a tree, once PrepareTree() and
>>>>>>>>>>    createNetworkJson() are called.
>>>>>>>>>>       - There's a unit test that generates the
>>>>>>>>>>       network.example.json file and compares it with the original to ensure that
>>>>>>>>>>       they match.
>>>>>>>>>>    - Unit test coverage hits every function in the graph system,
>>>>>>>>>>    now.
>>>>>>>>>>
>>>>>>>>>> I'm liking this setup. With the non-vendor-specific logic
>>>>>>>>>> contained inside the NetworkGraph type, the actual UISP code to generate
>>>>>>>>>> the example tree is down to 65
>>>>>>>>>> lines of code, including comments. That'll grow a bit as I
>>>>>>>>>> re-insert some automatic speed limit determination, AP/Site speed overrides
>>>>>>>>>> (
>>>>>>>>>> i.e. the integrationUISPbandwidths.csv file). Still pretty clean.
>>>>>>>>>>
>>>>>>>>>> Creating the network.example.json file only requires:
>>>>>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>>>>>>         import json
>>>>>>>>>>         net = NetworkGraph()
>>>>>>>>>>         net.addRawNode(NetworkNode("Site_1", "Site_1", "",
>>>>>>>>>> NodeType.site, 1000, 1000))
>>>>>>>>>>         net.addRawNode(NetworkNode("Site_2", "Site_2", "",
>>>>>>>>>> NodeType.site, 500, 500))
>>>>>>>>>>         net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1",
>>>>>>>>>> NodeType.ap, 500, 500))
>>>>>>>>>>         net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1",
>>>>>>>>>> NodeType.site, 500, 500))
>>>>>>>>>>         net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3",
>>>>>>>>>> NodeType.site, 200, 200))
>>>>>>>>>>         net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5",
>>>>>>>>>> NodeType.ap, 120, 120))
>>>>>>>>>>         net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5",
>>>>>>>>>> NodeType.site, 60, 60))
>>>>>>>>>>         net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6",
>>>>>>>>>> NodeType.ap, 30, 30))
>>>>>>>>>>         net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2",
>>>>>>>>>> NodeType.site, 200, 200))
>>>>>>>>>>         net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1",
>>>>>>>>>> NodeType.ap, 100, 100))
>>>>>>>>>>         net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2",
>>>>>>>>>> NodeType.ap, 150, 150))
>>>>>>>>>>         net.prepareTree()
>>>>>>>>>>         net.createNetworkJson()
>>>>>>>>>>
>>>>>>>>>> (The id and name fields are duplicated right now, I'm using
>>>>>>>>>> readable names to keep me sane. The third string is the parent, and the
>>>>>>>>>> last two numbers are bandwidth limits)
>>>>>>>>>> The nice, readable format being:
>>>>>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="", type=
>>>>>>>>>> NodeType.site, download=1000, upload=1000)
>>>>>>>>>>
>>>>>>>>>> That in turns gives you the example network:
>>>>>>>>>> [image: image.png]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>>>>>>> herberticus@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Dave: I love those Gource animations! Game development is my
>>>>>>>>>>> other hobby, I could easily get lost for weeks tweaking the shaders to make
>>>>>>>>>>> the glow "just right". :-)
>>>>>>>>>>>
>>>>>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to
>>>>>>>>>>> look in that direction yet. I'm trying to build a "common grammar" to make
>>>>>>>>>>> it easier to express network layout from integrations; that would be
>>>>>>>>>>> another form/layer of integration and a lot easier to work with once
>>>>>>>>>>> there's a solid foundation. Preseem does some of this (admittedly
>>>>>>>>>>> over-eagerly; nothing needs to query SNMP that often!), and the SNMP route
>>>>>>>>>>> is quite remarkably convoluted. Their support turned on a few "extra"
>>>>>>>>>>> modules to deal with things like PMP450 clients that change MAC when you
>>>>>>>>>>> put them in bridge mode vs NAT mode (and report the bridge mode CPE in some
>>>>>>>>>>> places either way), Elevate CPEs that almost but not quite make sense.
>>>>>>>>>>> Robert's code has the beginnings of some of this, scanning Mikrotik routers
>>>>>>>>>>> for IPv6 allocations by MAC (this is also the hardest part for me to test,
>>>>>>>>>>> since I don't have any v6 to test, currently).
>>>>>>>>>>>
>>>>>>>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>>>>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> we're pretty similar in that we've made UISP a mess.  Multiple
>>>>>>>>>>>> paths to a pop.  multiple pops on the network.  failover between pops.
>>>>>>>>>>>> Lots of 'other' devices. handing out /29 etc to customers.
>>>>>>>>>>>>
>>>>>>>>>>>> Some sort of discovery would be nice.  Ideally though, pulling
>>>>>>>>>>>> something from SNMP or router APIs etc to build the paths, but having a
>>>>>>>>>>>> 'network elements' list with each of the links described.  ie, backhaul 12
>>>>>>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that
>>>>>>>>>>>> from discovery.
>>>>>>>>>>>>
>>>>>>>>>>>> I've also thought about doing routine trace routes or watching
>>>>>>>>>>>> TTLs or something like that to get some indication that topology has
>>>>>>>>>>>> changed and then do another discovery and potential tree rebuild.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This is awesome! Way to go here. Thank you for contributing
>>>>>>>>>>>>> this.
>>>>>>>>>>>>> Being able to map out these complex integrations will help
>>>>>>>>>>>>> ISPs a ton, and I really like that it is sharing common features between
>>>>>>>>>>>>> the Splynx and UISP integrations.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Robert
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS
>>>>>>>>>>>>> <libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I've been doing some work on getting UISP integration (and
>>>>>>>>>>>>>> integrations in general) to work a bit more smoothly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I started by implementing a graph structure that mirrors both
>>>>>>>>>>>>>> the networks and sites system. It's not done yet, but the basics are coming
>>>>>>>>>>>>>> together nicely. You can see my progress so far at:
>>>>>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the
>>>>>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular
>>>>>>>>>>>>>> portion of the tree. We have:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>>>>>>>    - Bridge mode devices using Option 82 to always allocate
>>>>>>>>>>>>>>    the same IP, with a "service IP" entry
>>>>>>>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>>>>>>>    - Sites that go to client sites, which go to other client
>>>>>>>>>>>>>>    sites
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In other words, over the years we've unleashed a bit of a
>>>>>>>>>>>>>> monster. Cleaning it up is a useful talk, but I wanted the integration to
>>>>>>>>>>>>>> be able to handle pathological cases like us!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I fed our network into the current graph generator, and
>>>>>>>>>>>>>> used graphviz to spit out a directed graph:
>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - Green = the root site.
>>>>>>>>>>>>>>    - Red = a site
>>>>>>>>>>>>>>    - Blue = an access point
>>>>>>>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So the part in "common" is designed heavily to reduce
>>>>>>>>>>>>>> repetition. When it's done, you should be able to feed in sites, APs,
>>>>>>>>>>>>>> clients, devices, etc. in a pretty flexible manner. Given how much code is
>>>>>>>>>>>>>> shared between the UISP and Splynx integration code, I'm pretty sure both
>>>>>>>>>>>>>> will be cut to a tiny fraction of the total code. :-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Robert Chacón
>>>>>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Robert Chacón
>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>> Dev | LibreQoS.io
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>> LibreQoS mailing list
>>>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Robert Chacón
>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>> Dev | LibreQoS.io
>>>>>>>
>>>>>>> _______________________________________________
>>>>>> LibreQoS mailing list
>>>>>> LibreQoS@lists.bufferbloat.net
>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Robert Chacón
>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>> Dev | LibreQoS.io
>>>>>
>>>>> _______________________________________________
>>>>> LibreQoS mailing list
>>>>> LibreQoS@lists.bufferbloat.net
>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>
>>>>
>>>>
>>>> --
>>>> This song goes out to all the folk that thought Stadia would work:
>>>>
>>>> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>>>> Dave Täht CEO, TekLibre, LLC
>>>>
>>>
>>
>> --
>> This song goes out to all the folk that thought Stadia would work:
>>
>> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>> Dave Täht CEO, TekLibre, LLC
>>
>

[-- Attachment #1.2: Type: text/html, Size: 53131 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31  1:36                           ` Herbert Wolverson
@ 2022-10-31  1:46                             ` Herbert Wolverson
  2022-10-31  2:21                               ` Dave Taht
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-31  1:46 UTC (permalink / raw)
  Cc: libreqos


[-- Attachment #1.1: Type: text/plain, Size: 35053 bytes --]

While I remember, a quick Preseem anecdote. The majority of WISPs I've
talked to who have adopted Preseem run it in "monitor only" mode for a bit,
and then turn shaping on. That way, you can see that it did something. Not
a bad idea for us to support. It's *remarkable* how many WISPs see a sea of
red when they first start monitoring - 100ms+ RTTs (for whatever customer
traffic exists) are pretty common. Just enabling fq_codel, mapped to the
customer's speed limit, tends to start bringing things down into the
green/yellow. I begged them for Cake a few times (along with the ability to
set site/backhaul hierarchies) - and was always told "it's not worth the
extra CPU load". Our experience, turning on BracketQoS (which is basically
LibreQoS, in Rust and designed for our network), was that the remaining
reds became yellows, the remaining yellows became greens, and customers
reported a "snappier" experience. The latter is hard to quantify. I could
feel the difference at my desk: fire up a video while a download was
running, and it simply "felt" like it responded better. TCP RTTs are the
best measure of "feel" I've found so far.

We've tended to go with "median" latency as a guide, rather than mean.
Because we're monitoring things beyond our control, some of the outliers
tend to be *really bad* - even if the network is fine. There's literally
nothing we can do about a customer trying to work with a malfunctioning
system somewhere (in space, for all I know!)
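
A toy illustration of the point, with invented numbers - one customer
talking to a broken far-end system drags the mean way up, while the median
still reflects what the network is doing:

from statistics import mean, median

# Hypothetical RTT samples: a healthy network plus one pathological outlier.
rtts_ms = [18, 21, 19, 24, 22, 20, 2400]

print(f"mean:   {mean(rtts_ms):.1f} ms")    # ~360.6 ms - looks like an outage
print(f"median: {median(rtts_ms):.1f} ms")  # 21.0 ms - the network is fine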

On Sun, Oct 30, 2022 at 8:36 PM Herbert Wolverson <herberticus@gmail.com>
wrote:

> On a high-level, I've been playing with:
>
>    - The brute force approach: have a bigger buffer, so exhaustion is
>    less likely to ever happen.
>    - A shared "config" flag that turns off monitoring once exhaustion is
>    near - it costs one synchronized lookup/increment, and gets reset when you
>    read the stats.
>    - Per-CPU buffers for the very volatile data, which is generally
>    faster (at the expense of RAM) - but is also quite hard to manage from
>    userspace. It significantly reduces the likelihood of stalling, but I'm not
>    fond of the complexity so far.
>    - Replacing the volatile "packet buffer" with a "least recently used"
>    map that automatically gets rid of old data if it isn't cleaned up (the
>    original only cleans up when a TCP connection closes gracefully)
>    - Maintaining two sets of buffers and keeping a pointer to each. A
>    shared config variable indicates whether we are currently writing to A or
>    B. "Cleanup" cleans the *other* buffer and switches the pointers. So
>    we're never sharing "hot" data with a userland cleanup.
>
> That's a lot to play with, so I'm taking my time. My gut likes the A/B
> switch, currently.
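
A toy model of the A/B idea in plain Python - nothing to do with the real
eBPF maps, and all names invented: the hot path always writes to the active
buffer, while collection flips the flag and drains the other one, so
cleanup never touches hot data.

# Toy sketch of the A/B buffer switch described above (assumed design).
class ABBuffers:
    def __init__(self):
        self.buffers = ([], [])
        self.active = 0  # the shared "config" flag: writing to A or B

    def record(self, sample):
        # Hot path: always appends to the currently active buffer.
        self.buffers[self.active].append(sample)

    def collect(self):
        # Cleanup path: flip the flag, then drain the now-inactive buffer.
        self.active ^= 1
        inactive = self.buffers[self.active ^ 1]
        stats = list(inactive)
        inactive.clear()
        return stats

buf = ABBuffers()
buf.record(21.5)
buf.record(19.2)
print(buf.collect())  # [21.5, 19.2]; new samples now land in the other buffer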
>
> On Sun, Oct 30, 2022 at 8:26 PM Herbert Wolverson <herberticus@gmail.com>
> wrote:
>
>> > "average" of "what"?
>>
>> Mean TCP RTTs, as measured by pping-cpumap. There were two steps of
>> improvement: the original "pping" started to eat a bunch of CPU at higher
>> traffic levels, and I had a feeling - not entirely quantified - that the
>> excess CPU usage was causing some latency. Switching to pping-cpumap showed
>> that my hunch was correct. On top of that, as Robert had observed, the
>> previous version was causing a slight "stutter" when it filled the tracking
>> buffers (and then recovered fine). My most recent build scales the tracking
>> buffers up a LOT - which I was worried would cause some slowdown (since the
>> program is now searching a much larger hashmap space, making it less
>> cache-friendly). The buffer increase fixed the stutter issue. I probably
>> should have been a little clearer about what I was talking about. I'm still
>> trying to figure out the optimal buffer size, and the optimal stats
>> collection period (collection "resets" the buffers, eliminating any
>> resource depletion).
>>
>> I'm also experimenting with a few other ideas to keep the measurement
>> latency more consistent. I tried "dump it all into a perfmap and figure it
>> out in userspace" which went spectacularly badly. :-|
>>
>> The RTT measurements are from the customer to whatever the heck they are
>> using on the Internet. So customers using a slow service that's
>> bottlenecked far outside of my control will negatively affect the results -
>> but there's nothing I can do about that. Coincidentally, it's the same
>> "QoE" metric that Preseem uses - so Preseem to LibreQoS refugees (myself
>> included) tend to have a "feel" for it. If I remember rightly, Preseem
>> (which is basically fq-codel queues per customer, with an optional layer of
>> AP queues above) ranks 0-74 ms as "green", 75-100 ms a "yellow" and 100+ ms
>> as "red" - and a lot of WISPs have become used to that grading. I always
>> thought that an average of 70ms seemed pretty excessive to be "good". The
>> idea is that it's quantifying the customer's *experience* - the lower
>> the average, the snappier the connection "feels". You can have a pretty
>> happy customer with very low latency and a low speed plan, if they aren't
>> doing anything that needs to exhaust their speed plan. (This contrasts with
>> a lot of other solutions - notably Sandvine - which have always focused
>> heavily on "how much less upsteam does the ISP need to buy?")
>>
>> On Sun, Oct 30, 2022 at 7:15 PM Dave Taht <dave.taht@gmail.com> wrote:
>>
>>>
>>>
>>> On Sat, Oct 29, 2022 at 6:45 PM Herbert Wolverson <herberticus@gmail.com>
>>> wrote:
>>>
>>>> > For starters, let me also offer praise for this work which is so
>>>> ahead of schedule!
>>>>
>>>> Thank you. I'm enjoying a short period while I wait for my editor to
>>>> finish up with a couple of chapters of my next book (working title More
>>>> Hands-on Rust; it's intermediate to advanced Rust, taught through the lens
>>>> of game development).
>>>>
>>>
>>> Cool. I'm 32 years into my PhD thesis.
>>>
>>>
>>>>
>>>> I think at least initially, the primary focus is on what WISPs are used
>>>> to (and ask for): a fat shaper box that sits between a WISP and their
>>>> Internet connection(s). Usually in the topology: (router connected to
>>>> upstream) <--> (LibreQoS) <--> (core site router, connected to the WISP's
>>>> network as a whole). That's a simplification; there's usually a bypass (in
>>>> case LibreQoS dies, is being updated, etc.), sometimes multiple connections
>>>> that need shaping, etc. That's how Preseem (and the others) tend to insert
>>>> themselves - shape everything on the way out.
>>>>
>>>
>>> Presently LibreQoS appears to be inserting about 200us of delay into the
>>> path, for the sparsest packets. Every box on the path adds
>>> delay, though cut-through switches are common. Don't talk to me about
>>> network slicing and disaggregated this or that in the 3GPP world, tho...
>>> ugh.
>>>
>>> I guess, for every "box" (or virtual machine) on the path, I have
>>> Amdahl's law stuck in my head.
>>>
>>> This is in part why the K8s crowd makes me a little crazy.
>>>
>>>
>>>>
>>>> I think there's a lot to be said for the possibility of LibreQoS at
>>>> towers that need it the most, also. That might require a bit of MPLS
>>>> support (I can do the xdp-cpumap-tc part; I'm not sure what the classifier
>>>> does if it receives a packet with the TCP/UDP header stuck behind some MPLS
>>>> headers?), but has the potential to really clean things up. Especially for
>>>> a really busy tower site. (On a similar note, WISPs with multiple Internet
>>>> connections at different sites would benefit from LibreQoS on each of
>>>> them).
>>>>
>>>> Generally, the QoS box doesn't really care what you are running in the
>>>> way of a router.
>>>>
>>>
>>> It is certainly simpler to have a transparent middlebox for this stuff,
>>> initially, and it would take a great leap of faith,
>>> for many, to just plug in an lqos box as the main box... but Cumulus did
>>> succeed at a lot of that... they open-sourced a BFD daemon... numerous
>>> other tools...
>>>
>>> https://www.nvidia.com/en-us/networking/ethernet-switching/cumulus-linux/
>>>
>>>
>>>> We run mostly Mikrotik (with a bit of FreeBSD, and a tiny bit of Cisco
>>>> in the mix too!); I know of people who love Juniper, use Cisco, etc. Since
>>>> we're shaping in the "router sandwich" (which can be one router with a bit
>>>> of care), we don't necessarily need to worry too much about their innards.
>>>>
>>>>
>>> An ISP in an SDN shaping whitebox that does all that Juniper/Cisco stuff,
>>> or perhaps a pair using a fiber optic splitter for failover:
>>>
>>> http://www.comlaninc.com/products/fiber-optic-products/id/23/cl-fos
>>>
>>>
>>>
>>>
>>>> With that said, some future SNMP support (please, not polling
>>>> everything all the time... that's a monitoring program's job!) is probably
>>>> hard to avoid. At least that's relatively vendor-agnostic (even if
>>>> Ubiquiti seem to be trying to cease supporting it, ugh)
>>>>
>>>>
>>> Building on this initial core strength - sampling RTT - would be a
>>> differentiator.
>>>
>>> Examples:
>>>
>>> RTT per AP
>>> RTT P1 per AP (what's the effective minimum?)
>>> RTT P99 (what's the worst case?)
>>> RTT variance, P1 to P99, per Internet IP (worst 20 performers) or AS
>>> number or /24
>>>
>>> (variance is a very important concept)
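
A hedged sketch of computing those per-AP numbers from a window of RTT
samples - nearest-rank percentiles, with invented data:

def percentile(samples_ms: list[float], p: float) -> float:
    # Nearest-rank percentile; crude, but adequate for monitoring.
    s = sorted(samples_ms)
    return s[min(len(s) - 1, int(p / 100 * len(s)))]

def ap_rtt_summary(samples_ms: list[float]) -> dict:
    p1 = percentile(samples_ms, 1)    # the effective minimum
    p99 = percentile(samples_ms, 99)  # the worst case
    return {"p1": p1, "p99": p99, "p1_to_p99_spread": p99 - p1}

print(ap_rtt_summary([12, 14, 15, 18, 22, 35, 80, 210]))
# {'p1': 12, 'p99': 210, 'p1_to_p99_spread': 198}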
>>>
>>>
>>>
>>>
>>>
>>>> I could see some support for outputting rules for routers, especially
>>>> if the goal is to get Cake managing bufferbloat in many places down the
>>>> line.
>>>>
>>>> Incidentally, using my latest build of cpumap-pping (and no separate
>>>> pping running, eating a CPU), my average network latency has dropped to
>>>> 24ms at peak time (from 40ms), while pulling 1.8 gbps of real
>>>> customer traffic through the system. :-)
>>>>
>>>
>>> OK, this is something that "triggers" my inner pedant. Forgive me in
>>> advance?
>>>
>>> "average" of "what"?
>>>
>>> Changing the monitoring tool shouldn't have affected the average
>>> latency, unless it is calculated differently, or the sample
>>> population (more likely) has changed. If you are now tracking far more
>>> short flows, the observed latency will decline, but the
>>> higher latencies you were observing in the first place are still there.
>>>
>>> Also... between where and where? Across the network? From the customer to
>>> the typical set of IP addresses of their servers?
>>> On wireless? vs fiber? (Transiting a fiber network to your pop's edge
>>> should take under 2ms). Wifi hops at the end of the link are
>>> probably adding the most delay...
>>>
>>> If you consider 24ms "good" - however you calculate it - going for ever
>>> less, via whatever means can be obtained from these
>>> analyses, is useful. But there are some things I don't think make as
>>> much sense as they used to - a Netflix cache hit rate must
>>> be so low nowadays that fetching from upstream costs you just as much
>>> as hosting a box...
>>>
>>>

[-- Attachment #1.2: Type: text/html, Size: 55125 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]

[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31  1:46                             ` Herbert Wolverson
@ 2022-10-31  2:21                               ` Dave Taht
  2022-10-31  3:26                                 ` Robert Chacón
                                                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Dave Taht @ 2022-10-31  2:21 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos

[-- Attachment #1: Type: text/plain, Size: 180 bytes --]

How about the idea of "metaverse-ready" metrics, with one table that is
preseem-like and another that's:

blue = < 8ms
green = < 20ms
yellow = < 50ms
orange = < 70ms
red = > 70ms

[-- Attachment #2: Type: text/html, Size: 327 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31  2:21                               ` Dave Taht
@ 2022-10-31  3:26                                 ` Robert Chacón
  2022-10-31 14:47                                 ` [LibreQoS] metaverse-ready metrics Dave Taht
  2022-10-31 15:56                                 ` [LibreQoS] Integration system, aka fun with graph theory dan
  2 siblings, 0 replies; 33+ messages in thread
From: Robert Chacón @ 2022-10-31  3:26 UTC (permalink / raw)
  To: Dave Taht; +Cc: Herbert Wolverson, libreqos

[-- Attachment #1: Type: text/plain, Size: 2067 bytes --]

> That's a lot to play with, so I'm taking my time. My gut likes the A/B
> switch, currently.

Take your time; I'm just thrilled to see this working so well so far.

> I could feel the difference at my desk; fire up a video while a download
> was running, and it simply "felt" like it responded better. TCP RTT times
> are the best measure of "feel" I've found, so far.

I've experienced the same when our network switched from LibreQoS using
fq_codel to LibreQoS using CAKE. It's really hard to quantify, but the
"snappiness" or "feel" is noticeable to end-users.

> We've tended to go with "median" latency as a guide, rather than mean.
> Thanks to monitoring things beyond our control, some of the outliers tend
> to be *really bad* - even if the network is fine. There's literally nothing
> we can do about a customer trying to work with a malfunctioning system
> somewhere (in space, for all I know!)

True. And it can be sort of helpful for troubleshooting WiFi latency issues
and bottlenecks inside the home and such.
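
For what it's worth, the median is cheap to compute from a batch of RTT
samples. A minimal sketch (illustrative only, not LibreQoS code - the
sample values are made up):

#include <stdio.h>
#include <stdlib.h>

/* qsort comparator for doubles */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n samples; sorts in place. */
static double median(double *samples, size_t n)
{
    qsort(samples, n, sizeof(double), cmp_double);
    return (n % 2) ? samples[n / 2]
                   : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

int main(void)
{
    /* One malfunctioning client drags the mean to 128 ms;
       the median stays at a truthful 22 ms. */
    double rtts_ms[] = { 18.0, 20.0, 22.0, 25.0, 555.0 };
    printf("median RTT: %.1f ms\n", median(rtts_ms, 5));
    return 0;
}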

> "monitor only" mode

Perhaps we can use ePPing just for this aspect? Or instead we could use
cpumap-pping but with all HTB classes set to high rates (no plan
enforcement) and no CAKE leaves.

> How about the idea of "metaverse-ready" metrics, with one table that is
> preseem-like and another that's:

Good idea. I've now added both a standard (preseem like) table and
"metaverse-ready" table of Node (AP) TCP Latency on the InfluxDB template.





-- 
Robert Chacón
CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
Dev | LibreQoS.io

[-- Attachment #2: Type: text/html, Size: 3217 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [LibreQoS] metaverse-ready metrics
  2022-10-31  2:21                               ` Dave Taht
  2022-10-31  3:26                                 ` Robert Chacón
@ 2022-10-31 14:47                                 ` Dave Taht
  2022-10-31 14:50                                   ` Dave Taht
  2022-10-31 15:56                                 ` [LibreQoS] Integration system, aka fun with graph theory dan
  2 siblings, 1 reply; 33+ messages in thread
From: Dave Taht @ 2022-10-31 14:47 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos

On Sun, Oct 30, 2022 at 7:21 PM Dave Taht <dave.taht@gmail.com> wrote:
>
> How about the idea of "metaverse-ready" metrics, with one table that is preseem-like and another that's

aquamarine = < 3.2ms (this is as low as it is possible to measure, as
TCP timestamps are in ms)
blue = < 8ms
green = < 20ms
yellow = < 50ms
orange = < 70ms
red = > 70ms
mordor-red = > 120ms

Is there a truly ugly tone of red, blackish, ugly as sin? (mordor-red)

The above is almost, but not quite, a:
https://en.wikipedia.org/wiki/Seven-number_summary
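
As a rough illustration of the bucketing (a sketch only - the thresholds
should of course stay operator-configurable):

/* Illustrative sketch: map a TCP RTT sample to the color bands above. */
const char *rtt_color(double rtt_ms)
{
    if (rtt_ms < 3.2)   return "aquamarine"; /* at the measurement floor */
    if (rtt_ms < 8.0)   return "blue";
    if (rtt_ms < 20.0)  return "green";
    if (rtt_ms < 50.0)  return "yellow";
    if (rtt_ms < 70.0)  return "orange";
    if (rtt_ms < 120.0) return "red";
    return "mordor-red";
}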


-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] metaverse-ready metrics
  2022-10-31 14:47                                 ` [LibreQoS] metaverse-ready metrics Dave Taht
@ 2022-10-31 14:50                                   ` Dave Taht
  0 siblings, 0 replies; 33+ messages in thread
From: Dave Taht @ 2022-10-31 14:50 UTC (permalink / raw)
  To: Herbert Wolverson, Andrew McGregor; +Cc: libreqos

Andrew, I can't remember or find the name of that algebra and
distribution you were so hot on 5? 8? years ago, that influenced BBR.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31  2:21                               ` Dave Taht
  2022-10-31  3:26                                 ` Robert Chacón
  2022-10-31 14:47                                 ` [LibreQoS] metaverse-ready metrics Dave Taht
@ 2022-10-31 15:56                                 ` dan
  2022-10-31 21:19                                   ` Herbert Wolverson
  2 siblings, 1 reply; 33+ messages in thread
From: dan @ 2022-10-31 15:56 UTC (permalink / raw)
  To: Dave Taht; +Cc: Herbert Wolverson, libreqos

[-- Attachment #1: Type: text/plain, Size: 746 bytes --]

On Sun, Oct 30, 2022 at 8:21 PM Dave Taht via LibreQoS <
libreqos@lists.bufferbloat.net> wrote:

> How about the idea of "metaverse-ready" metrics, with one table that is
> preseem-like and another that's
>
> blue =  < 8ms
> green = < 20ms
> yellow = < 50ms
> orange  = < 70ms
> red = > 70ms
>

These need to be configurable. There are a lot of WISPs that would have
everything orange/red. We're considering anything under 100ms good on the
rural plans. Also keep in mind that if you're tracking latency via pping
etc., then you need some buffer in there for the internet at large. <70ms
to Amazon is one thing, they're very well connected, but <70ms to most of
the internet probably isn't very realistic and would make most charts look
like poop.

[-- Attachment #2: Type: text/html, Size: 1217 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31 15:56                                 ` [LibreQoS] Integration system, aka fun with graph theory dan
@ 2022-10-31 21:19                                   ` Herbert Wolverson
  2022-10-31 21:54                                     ` Dave Taht
                                                       ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-31 21:19 UTC (permalink / raw)
  Cc: libreqos

[-- Attachment #1: Type: text/plain, Size: 3362 bytes --]

I'd agree with color coding (when it exists - no rush, IMO) being
configurable.

From the "how much delay are we adding" discussion earlier, I thought I'd
do a little bit of profiling of the BPF programs themselves. This is with
the latest round of performance updates
(https://github.com/thebracket/cpumap-pping/issues/2), so it's not
measuring anything in production. I simply added a call to get the clock
at the start, and again at the end, and logged the difference, measuring
both the XDP and TC BPF programs. (Execution goes: packet arrives -> XDP
cpumap sends it to the right CPU -> egress -> TC sends it to the right
classifier, on the correct CPU, and measures RTT latency.) This adds
about two clock checks and a debug log entry to execution time, so
measuring it slows it down.
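
In rough form, the instrumentation looks like this (a sketch of the
approach only, not the actual cpumap-pping patch - the real program's
names and structure differ):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tc")
int measure_tc(struct __sk_buff *skb)
{
    __u64 start = bpf_ktime_get_ns();

    /* ... the real classifier + TCP RTT tracking work goes here ... */

    __u64 elapsed = bpf_ktime_get_ns() - start;
    bpf_printk("tc path: %llu ns", elapsed); /* the debug log entry */
    return 0; /* TC_ACT_OK */
}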

The results are interesting, and mostly tell me to try a different
measurement system. I'm seeing a pretty wide variance. Hammering it with an
iperf session and a queue capped at 5 gbit/s: most of the TC timings were
40 nanoseconds - not a packet that requires extra tracking, already in
cache, so proceed. When the TCP RTT tracker fired and recorded a
performance event, it peaked at 5,900 nanoseconds. So the TC program
seems to be adding a worst case of 0.0059 ms to packet times. The XDP side
of things is typically in the 300-400 nanosecond range; I saw a handful of
worst-case numbers in the 3,400 nanosecond range. So the XDP side is adding
0.0034 ms. So, assuming the worst case (and keeping the overhead added by
the not-so-great monitoring), we're adding *0.0093 ms* to packet transit
time with the BPF programs.

With a much more sedate queue (ceiling 500 mbit/s), I saw much more
consistent numbers. The vast majority of XDP timings were in the 75-150
nanosecond range, and TC was a consistent 50-55 nanoseconds when it didn't
have an update to perform - peaking very occasionally at 1500 nanoseconds.
Only adding 0.00155 ms to packet times is pretty good.

It definitely performs best on long streams, probably because the previous
lookups are all in cache. This is also making me question the answer I
found to "how long does it take to read the clock?" I'd seen ballpark
estimates of 53 nanoseconds. Given that this reads the clock twice, that
can't be right. (I'm *really* not sure how to measure that one)

Again - not a great test (I'll have to learn the perf system to do this
properly - which in turn opens up the potential for flame graphs and some
proper tracing). Interesting ballpark, though.


[-- Attachment #2: Type: text/html, Size: 4379 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31 21:19                                   ` Herbert Wolverson
@ 2022-10-31 21:54                                     ` Dave Taht
  2022-10-31 21:57                                     ` Robert Chacón
  2022-11-01  3:31                                     ` Dave Taht
  2 siblings, 0 replies; 33+ messages in thread
From: Dave Taht @ 2022-10-31 21:54 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos

glibc added a vDSO mapping directly to the kernel time page, so
gettimeofday is not a syscall, and the results in the Linux 4.0-4.2
era were in the 40ns range.

Last I looked, musl used the syscall, which was much, much worse.
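
A quick userspace sketch for ballparking the cost of a clock read
yourself (results vary with kernel, clocksource, and libc, per the
above):

#include <stdio.h>
#include <time.h>

int main(void)
{
    enum { N = 10 * 1000 * 1000 };
    struct timespec t0, t1, scratch;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        clock_gettime(CLOCK_MONOTONIC, &scratch); /* the call under test */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("~%.1f ns per clock_gettime call\n", ns / N);
    return 0;
}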

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31 21:19                                   ` Herbert Wolverson
  2022-10-31 21:54                                     ` Dave Taht
@ 2022-10-31 21:57                                     ` Robert Chacón
  2022-10-31 23:31                                       ` dan
  2022-11-01  3:31                                     ` Dave Taht
  2 siblings, 1 reply; 33+ messages in thread
From: Robert Chacón @ 2022-10-31 21:57 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos

[-- Attachment #1: Type: text/plain, Size: 4841 bytes --]

> I'd agree with color coding (when it exists - no rush, IMO) being
configurable.

Thankfully it will be configurable, and easily, through the InfluxDB
interface.
Any operator will be able to click the Gear icon above the tables and set
the thresholds to whatever is desired.
I've set it to include both a standard table and "metaverse-ready" table
based on Dave's threshold recommendations.

   - Standard (Preseem-like)
      - green = < 75 ms
      - yellow = < 100 ms
      - red = > 100 ms
   - Metaverse-Ready
      - blue = < 8 ms
      - green = < 20 ms
      - yellow = < 50 ms
      - orange = < 70 ms
      - red = > 70 ms

Are the defaults here reasonable at least? Should we change the Standard
table thresholds a bit?

> Only adding 0.00155 ms to packet times is pretty good.

Agreed! That's excellent. Great work on this so far it's looking like
you're making tremendous progress.




[-- Attachment #2: Type: text/html, Size: 6584 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31 21:57                                     ` Robert Chacón
@ 2022-10-31 23:31                                       ` dan
  2022-10-31 23:45                                         ` Dave Taht
  0 siblings, 1 reply; 33+ messages in thread
From: dan @ 2022-10-31 23:31 UTC (permalink / raw)
  To: Robert Chacón; +Cc: Herbert Wolverson, libreqos

[-- Attachment #1: Type: text/plain, Size: 5940 bytes --]

Preseem's numbers are 0-74 green, 75-124 yellow, 125-200 red, and they just
consolidate everything >200 to 200, basically so there's no 'terrible'
color, lol. I think these numbers are reasonable for standard internet
service these days, for a 'default' value anyway. >100ms isn't bad
service for most people, and most WISPs will have a LOT of traffic coming
through with >100ms from the far reaches of the internet.

Maybe just use reasonable defaults like Preseem's for integrated 'generic'
tracking, but then have a separate graph hitting some target services, i.e.
try to get game servers on there, AWS, Cloudflare, Azure, Google Cloud.
Show a radar graphic or similar.


[-- Attachment #2: Type: text/html, Size: 8033 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31 23:31                                       ` dan
@ 2022-10-31 23:45                                         ` Dave Taht
  0 siblings, 0 replies; 33+ messages in thread
From: Dave Taht @ 2022-10-31 23:45 UTC (permalink / raw)
  To: dan; +Cc: Robert Chacón, libreqos

On Mon, Oct 31, 2022 at 4:32 PM dan via LibreQoS
<libreqos@lists.bufferbloat.net> wrote:
>
> Preseem's numbers are 0-74 green, 75-124 yellow, 125-200 red, and they just consolidate everything >200 to 200, basically so there's no 'terrible' color, lol.

I am sorry to hear those numbers are considered to be good. My numbers
are based on human factors research, some of which are cited here:
https://gettys.wordpress.com/2013/07/10/low-latency-requires-smart-queuing-traditional-aqm-is-not-enough/

> I think these numbers are reasonable for standard internet service these days, for a 'default' value anyway. >100ms isn't bad service for most people, and most WISPs will have a LOT of traffic coming through with >100ms from the far reaches of the internet.

I'm puzzled, actually. Given the rise of CDNs, I would expect most
internet connections to the ISP to have far less than 60ms latency at
this point. Google, for example, is typically 2ms away from most fiber
in the EU.

Very few transactions go to the far reaches of the planet anymore, but
I do lack real-world data on that.

>
> Maybe just reasonable defaults like preseem uses for integrated 'generic' tracking, but then have a separate graph hitting some target services.  ie, try to get game servers on there, AWS, Cloudflare, Azure, Google cloud.  Show a radar graphic or similar.

My thought for slices of the data (2nd tier support and CTO level) would be:

ISP infrastructure (aquamarine, less than 3ms)
First hop infrastructure (blue, less than 8ms)
ISP -> customer - 10-20ms (green) for wired, much worse for wifi
customer to world - ideally, sub 50ms.

I can certainly agree that the metaverse metrics are scary given the
state of things you describe, but the 8ms figure is the bare minimum to
have an acceptable experience in that virtual world.

>
> On Mon, Oct 31, 2022 at 3:57 PM Robert Chacón via LibreQoS <libreqos@lists.bufferbloat.net> wrote:
>>
>> > I'd agree with color coding (when it exists - no rush, IMO) being configurable.
>>
>> Thankfully it will be configurable, and easily, through the InfluxDB interface.
>> Any operator will be able to click the Gear icon above the tables and set the thresholds to whatever is desired.
>> I've set it to include both a standard table and "metaverse-ready" table based on Dave's threshold recommendations.
>>
>> Standard (Preseem like)
>>
>> green = < 75 ms
>> yellow = < 100 ms
>> red = > 100 ms
>>
>> Metaverse-Ready

aquamarine <= 3ms
>> blue =  < 8ms
>> green = < 20ms
>> yellow = < 50ms
>> orange  = < 70ms
>> red = > 70ms
mordor-red = >100ms

>> Are the defaults here reasonable at least? Should we change the Standard table thresholds a bit?

Following exactly Preseem's current breakdown seems best for the
"Preseem" table. Calling it "standard" kind of requires actual standards.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-31 21:19                                   ` Herbert Wolverson
  2022-10-31 21:54                                     ` Dave Taht
  2022-10-31 21:57                                     ` Robert Chacón
@ 2022-11-01  3:31                                     ` Dave Taht
  2022-11-01 13:38                                       ` Herbert Wolverson
  2 siblings, 1 reply; 33+ messages in thread
From: Dave Taht @ 2022-11-01  3:31 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos

Calling rdtsc directly used to be even faster than gettimeofday

https://github.com/dtaht/libv6/blob/master/erm/includes/get_cycles.h
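
The classic pattern is a couple of lines of inline asm (x86-64 sketch;
note raw rdtsc counts cycles, not nanoseconds, and isn't serializing -
careful measurements pair it with lfence or use rdtscp):

#include <stdint.h>

/* Read the x86 time-stamp counter: edx:eax packed into a 64-bit value. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}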


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-11-01  3:31                                     ` Dave Taht
@ 2022-11-01 13:38                                       ` Herbert Wolverson
  0 siblings, 0 replies; 33+ messages in thread
From: Herbert Wolverson @ 2022-11-01 13:38 UTC (permalink / raw)
  To: Dave Taht; +Cc: libreqos

[-- Attachment #1: Type: text/plain, Size: 11815 bytes --]

Dave: in this case, I'm running inside the eBPF VM - so I'm already in
kernel space, but have a very limited set of functions available.
bpf_ktime_get_ns() seems to be the approved way to get the clock. There was
a big debate about it using the kernel's monotonic clock, which takes longer
to sample. I'm guessing they improved that, because I'm not seeing the
delay that some people were complaining about (it's not free, but it's also
a *lot* faster than the estimates I was finding).

> > Preseem's numbers are 0-74 green, 75-124 yellow, 125-200 red, and they
> > just consolidate everything >200 to 200, basically so there's no
> > 'terrible' color, lol.
> I am sorry to hear those numbers are considered to be good.

It's interesting that you see adverts on Wisp Talk (the FB group) boasting
"wow, half my APs are now green!" (and showing about 50% green, 25% yellow,
25% red). When we had Preseem, we always took "red" to mean "oh no,
something's really wrong" - and got to work fixing it. There were a couple
of distant (many hops down the chain) APs that struggled to stay yellow,
but red was always a sign for battle stations. I think that's part of why
WISPs suffer from "jump ship as soon as something better comes along" - I'd
be jumping ship too, if my ISP expected me to "enjoy" 125-200 ms RTT
latency for any extended period of time (I'm pretty understanding about
"something went wrong, we're working on it").

Geography does play a large part. I'll see if I can resurrect a tool I had
that turned RTT latency measurements into a Google Maps heatmap overlay
(updating, so you could see the orange/red areas moving when the network
suffered). It can be pretty tough to find a good upstream far from towns,
which affects everything. But more, deep chains of backhauls add up - and
add up fast if you have any sort of congestion issue along the way. For
example:

   - We have a pretty decently connected upstream, averaging 8ms ping
   round-trip time to Cloudflare's DNS.
   - Going down our "hottest" path (60 ghz AF60 LR to a tower, and then
   another one to a 3,000 bed apartment complex - peaks at 900 mbit/s every
   night; will peak at a lot more than that as soon as their check clears for
   some Siklu gear), we worked *stupidly hard* to keep the average ping
   time there at 9ms to Cloudflare's DNS. Even then, it's closer to 16ms when
   fully loaded. They are a topic for a future Cake discussion. :-)
   - We have a few clients connected directly off of the facility with the
   upstream - and they all get great RTT times (a mix of 5.8 and 3.6 CBRS;
   Wave coming as soon as it's in stock at the same time as the guy with the
   money being at a keyboard!).
   - Our largest (by # of customers) tower is 11 miles away, currently fed
   by 2 AirFiber 5XHD (ECMP balanced). We've worked really hard to keep that
   tower's average ping time to Cloudflare at 18ms. We have some nicer radios
   (the Cambium 400C is a beast) going in soon, which should help.
      - That tower feeds 4 micro-pops. The worst is near line-of-sight
      (trees) on a 3.6 ghz Medusa; it suffers a bit at 33ms round-trip ping
      times to Cloudflare. The best averages 22ms ping times to Cloudflare.
   - We have a bunch more sites behind a 13 mile backhaul hop (followed by
   a 3 mile backhaul hop; geography meant going around a tree-covered ridge).
   We've had a heck of a time getting that up to scratch; AF5XHD kinda worked,
   but the experience was pretty wretched. They were the testbed for the
   Cambium 400C, and now average 22ms to Cloudflare.
      - There are 15 (!) small towers behind that one! We eventually got
      the most distant one to 35ms to Cloudflare pings - but ripped/replaced
      SO much hardware to get there. (Even then, customer experience at some
      of those sites isn't what I'd like; I just tried a ping test from a
      customer running a 2.4 ghz "elevated" Ubiquiti dish to an old ePMP
      1000, at a tower 5 hops in: 45-50ms to Cloudflare. Not great.)

Physics dictates that the tiny towers, separated from the core by miles of
backhaul and hops between them, aren't going to perform as well as the
nearby ones. You *can* get them going well, but it's expensive and
time-consuming.

One thing Preseem does pretty well is show daily reports in brightly
colored bars, which "gamifies" fixing the issue. If you have any gamers on
staff, they start to obsess over turning everything green. It's great. :-)

The other thing I keep running into is network management. A few years ago,
we bought a WISP with 20 towers and a few hundred customers (it was a
friendly "I'm getting too unwell to keep doing this" purchase). The guy who
set it up was pretty amazing; he had no networking experience whatsoever,
but was pretty good at building things. So he'd built most of the towers
himself, purely because he wanted to get better service out to some *very*
rural parts of Missouri (including a whole bunch of non-profits and
churches, which is our largest market). While it's impressive what he
pulled off, he'd still just lost 200 customers to an electric coop's fiber
build-out. His construction skills were awesome; his network skills - not
so much. He had 1 public IP, connected to a 100mbit/s connection at his
house. Every single tower (over a 60 mile spread) was connected to exactly
one other tower. Every tower had backhauls in bridge mode, connected to a
(netgear consumer) switch at the tower. Every AP (all of them 2.4ghz Bullet
M2) was in bridge mode with client isolation turned off, connected to an
assortment of CPEs (mostly Airgrid M2) - also in bridge mode. No DHCP; he
had every customer type in their 192.168.x.y address (he had the whole /16
set up on the one link; no VLANs). Speed limits were set by turning on
traffic shaping on the M2 CPEs... and he wondered why latency sometimes
resembled remote control of a Mars rover, or parts of the network would
randomly die when somebody accidentally plugged their net connection into
their router's LAN port. A couple of customers had foregone routers
altogether, and you could see their Windows networking broadcasts
traversing the network! I wish I could say that was unusual, but I've
helped a handful of WISPs in similar situations.

One of the first things we did was get Preseem running (after adding every
client into UNMS as it was called then). That made a big difference, and
gave good visibility into how bad it was. Then it was a long process of
breaking the network down into routed chunks, enabling DHCP, replacing
backhauls (there were a bunch of times when towers were connected in the
order they were constructed, and never connected to a new tower a mile away
- but 20 miles down the chain), switching out bullets, etc. Eventually,
it's a great network - and growing again. I'm not sure we could've done
that without a) great visibility from monitoring platforms, and b) decades
of experience between us.

Longer-term, I'm hoping that we can help networks like that one. Great
shaping and visibility go a *long* way. Building up some "best practices"
and offering advice can go a *really long* way. (And good mapping makes a
big difference; I'm not all that far from releasing a generally usable
version of my LiDAR mapping suite. An ancient version is here:
https://github.com/thebracket/rf-signals. You can get LiDAR data for
about 2/3 of the US for free now.)




[-- Attachment #2: Type: text/html, Size: 13577 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-11-01 13:39 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-27 21:33 [LibreQoS] Integration system, aka fun with graph theory Herbert Wolverson
2022-10-27 21:41 ` Dave Taht
2022-10-27 21:44 ` Dave Taht
2022-10-27 21:48 ` Robert Chacón
2022-10-28  0:27   ` dan
2022-10-28 12:40     ` Herbert Wolverson
2022-10-28 17:43       ` Herbert Wolverson
2022-10-28 19:05         ` Robert Chacón
2022-10-28 19:54           ` Herbert Wolverson
2022-10-28 21:15             ` Robert Chacón
2022-10-29 15:57               ` Herbert Wolverson
2022-10-29 19:05                 ` Robert Chacón
2022-10-29 19:43                   ` Dave Taht
2022-10-30  1:45                     ` Herbert Wolverson
2022-10-31  0:15                       ` Dave Taht
2022-10-31  1:15                         ` Robert Chacón
2022-10-31  1:26                         ` Herbert Wolverson
2022-10-31  1:36                           ` Herbert Wolverson
2022-10-31  1:46                             ` Herbert Wolverson
2022-10-31  2:21                               ` Dave Taht
2022-10-31  3:26                                 ` Robert Chacón
2022-10-31 14:47                                 ` [LibreQoS] metaverse-ready metrics Dave Taht
2022-10-31 14:50                                   ` Dave Taht
2022-10-31 15:56                                 ` [LibreQoS] Integration system, aka fun with graph theory dan
2022-10-31 21:19                                   ` Herbert Wolverson
2022-10-31 21:54                                     ` Dave Taht
2022-10-31 21:57                                     ` Robert Chacón
2022-10-31 23:31                                       ` dan
2022-10-31 23:45                                         ` Dave Taht
2022-11-01  3:31                                     ` Dave Taht
2022-11-01 13:38                                       ` Herbert Wolverson
2022-10-29 19:18                 ` Dave Taht
2022-10-30  1:10                   ` Herbert Wolverson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox