* [LibreQoS] Integration system, aka fun with graph theory
From: Herbert Wolverson @ 2022-10-27 21:33 UTC
To: libreqos

So I've been doing some work on getting UISP integration (and integrations
in general) to work a bit more smoothly.

I started by implementing a graph structure that mirrors both the networks
and sites system. It's not done yet, but the basics are coming together
nicely. You can see my progress so far at:
https://github.com/thebracket/LibreQoS/tree/integration-common-graph

Our UISP instance is a *great* test case for torturing the system. I even
found a case of UISP somehow auto-generating a circular portion of the
tree. We have:

- Non-Ubiquiti devices as "other devices"
- Sections that need shaping by subnet (e.g. "all of 192.168.1.0/24 shared
  100 mbit")
- Bridge mode devices using Option 82 to always allocate the same IP, with
  a "service IP" entry
- Various bits of infrastructure mapped
- Sites that go to client sites, which go to other client sites

In other words, over the years we've unleashed a bit of a monster. Cleaning
it up is a useful task, but I wanted the integration to be able to handle
pathological cases like us!

So I fed our network into the current graph generator, and used graphviz to
spit out a directed graph (a sketch of this kind of export follows below):
[image: image.png]
That doesn't include client sites! Legend:

- Green = the root site
- Red = a site
- Blue = an access point
- Magenta = a client site that has children

So the part in "common" is designed heavily to reduce repetition. When it's
done, you should be able to feed in sites, APs, clients, devices, etc. in a
pretty flexible manner. Given how much code is shared between the UISP and
Splynx integration code, I'm pretty sure both will be cut to a tiny
fraction of the total code. :-)

I can't post the full tree; it's full of client names.
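A minimal sketch of this kind of colored graphviz export (assuming the
"graphviz" Python package plus the graphviz system binaries; the node
names and the color mapping are illustrative, not the actual generator
code):

# Sketch: render a tiny colored topology with the graphviz package.
from graphviz import Digraph

NODE_COLORS = {
    "root": "green",
    "site": "red",
    "ap": "blue",
    "clientWithChildren": "magenta",
}

dot = Digraph(comment="Network topology")
for node_id, node_type, parent_id in [
    ("Root", "root", None),
    ("Site_1", "site", "Root"),
    ("AP_A", "ap", "Site_1"),
]:
    dot.node(node_id, label=node_id, color=NODE_COLORS[node_type])
    if parent_id is not None:
        dot.edge(parent_id, node_id)
dot.render("network", format="png")  # writes network.png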
* Re: [LibreQoS] Integration system, aka fun with graph theory
From: Dave Taht @ 2022-10-27 21:41 UTC
To: Herbert Wolverson; Cc: libreqos, Richard E. Brown

One of bufferbloat.net's main folk was (and remains) Rich Brown, who helped
create "Intermapper" so many years ago. I think he sold it off when he
retired... I don't know if anyone uses it anymore...

hey rich!!! check this out!!!

On Thu, Oct 27, 2022 at 2:33 PM Herbert Wolverson via LibreQoS
<libreqos@lists.bufferbloat.net> wrote:
> So I've been doing some work on getting UISP integration (and
> integrations in general) to work a bit more smoothly. [...]

--
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht
CEO, TekLibre, LLC
* Re: [LibreQoS] Integration system, aka fun with graph theory
From: Dave Taht @ 2022-10-27 21:44 UTC
To: Herbert Wolverson; Cc: libreqos

Not necessarily useful in this context, but one of my all-time favorite
graphing tools was the gource animations for commit logs and developer
interest. You think that's kind of a boring subject, yes? Well, play one of
these animations back...

https://gource.io/

I've always kind of wanted to see a network evolve over time, in much the
same way.

On Thu, Oct 27, 2022 at 2:33 PM Herbert Wolverson via LibreQoS
<libreqos@lists.bufferbloat.net> wrote:
> So I've been doing some work on getting UISP integration (and
> integrations in general) to work a bit more smoothly. [...]

--
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht
CEO, TekLibre, LLC
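For what it's worth, gource can replay arbitrary histories through its
custom log format (pipe-delimited: timestamp|user|A/M/D|path), so a
network's evolution could plausibly be fed to it. A hedged sketch, with
purely hypothetical snapshot data:

# Sketch: emit a gource custom log from dated topology events, so gource
# can animate the network growing. Timestamps, user, and paths are
# hypothetical examples.
events = [
    (1640995200, "add", "/Root/Site_1"),
    (1643673600, "add", "/Root/Site_1/AP_A"),
    (1646092800, "del", "/Root/Site_1/AP_A"),
]

with open("network.log", "w") as log:
    for ts, action, path in events:
        code = {"add": "A", "mod": "M", "del": "D"}[action]
        log.write(f"{ts}|netops|{code}|{path}\n")

# Play it back with: gource network.log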
* Re: [LibreQoS] Integration system, aka fun with graph theory
From: Robert Chacón @ 2022-10-27 21:48 UTC
To: Herbert Wolverson; Cc: libreqos

This is awesome! Way to go here. Thank you for contributing this.
Being able to map out these complex integrations will help ISPs a ton, and
I really like that it is sharing common features between the Splynx and
UISP integrations.

Thanks,
Robert

On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS
<libreqos@lists.bufferbloat.net> wrote:
> So I've been doing some work on getting UISP integration (and
> integrations in general) to work a bit more smoothly. [...]

--
Robert Chacón
CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
* Re: [LibreQoS] Integration system, aka fun with graph theory
From: dan @ 2022-10-28 0:27 UTC
To: Robert Chacón; Cc: Herbert Wolverson, libreqos

We're pretty similar in that we've made UISP a mess. Multiple paths to a
PoP. Multiple PoPs on the network. Failover between PoPs. Lots of 'other'
devices. Handing out /29s etc. to customers.

Some sort of discovery would be nice. Ideally though, pulling something
from SNMP or router APIs etc. to build the paths, but having a 'network
elements' list with each of the links described. I.e., backhaul 12 has MACs
..01 and ...02 at 300x100, and then build the topology around that from
discovery.

I've also thought about doing routine trace routes or watching TTLs or
something like that to get some indication that topology has changed, and
then do another discovery and potential tree rebuild (a rough sketch of
that idea follows below).

On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS
<libreqos@lists.bufferbloat.net> wrote:
> This is awesome! Way to go here. Thank you for contributing this. [...]
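A rough sketch of the "watch the path, flag a rebuild" idea (assumes the
system traceroute binary is installed; the target address and poll
interval are hypothetical):

# Periodically traceroute a known device, hash the hop list, and flag a
# rediscovery/tree rebuild when the path changes.
import hashlib
import subprocess
import time

TARGET = "100.64.0.42"  # hypothetical CPE address
last_hash = None

while True:
    out = subprocess.run(
        ["traceroute", "-n", "-w", "1", TARGET],
        capture_output=True, text=True
    ).stdout
    # Keep the first field after the hop number on each hop line
    hops = [line.split()[1] for line in out.splitlines()[1:] if line.split()]
    path_hash = hashlib.sha256("/".join(hops).encode()).hexdigest()
    if last_hash is not None and path_hash != last_hash:
        print("Path changed - trigger rediscovery and tree rebuild")
    last_hash = path_hash
    time.sleep(300)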
* Re: [LibreQoS] Integration system, aka fun with graph theory
From: Herbert Wolverson @ 2022-10-28 12:40 UTC
Cc: libreqos

Dave: I love those Gource animations! Game development is my other hobby; I
could easily get lost for weeks tweaking the shaders to make the glow "just
right". :-)

Dan: Discovery would be nice, but I don't think we're ready to look in that
direction yet. I'm trying to build a "common grammar" to make it easier to
express network layout from integrations; that would be another form/layer
of integration, and a lot easier to work with once there's a solid
foundation. Preseem does some of this (admittedly over-eagerly; nothing
needs to query SNMP that often!), and the SNMP route is quite remarkably
convoluted. Their support turned on a few "extra" modules to deal with
things like PMP450 clients that change MAC when you put them in bridge mode
vs NAT mode (and report the bridge mode CPE in some places either way), and
Elevate CPEs that almost, but not quite, make sense. Robert's code has the
beginnings of some of this, scanning Mikrotik routers for IPv6 allocations
by MAC (this is also the hardest part for me to test, since I don't have
any v6 to test with, currently; a sketch of that kind of query follows
below).

We tend to use UISP as the "source of truth" and treat it like a database
for a ton of external tools (mostly ones we've created).

On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote:
> We're pretty similar in that we've made UISP a mess. [...]
* Re: [LibreQoS] Integration system, aka fun with graph theory
From: Herbert Wolverson @ 2022-10-28 17:43 UTC
Cc: libreqos

The integration is coming along nicely. Some progress updates:

- You can specify a variable in ispConfig.py named "uispSite". This sets
  where in the topology you want the tree to start. This has two purposes:
  - It's hard to be psychic and know for sure where the shaper is in the
    network.
  - You could run multiple shapers at different egress points, with
    failover - and rebuild the entire topology from the point of view of a
    network node.
- "Child nodes with children" are now automatically converted into a
  "(Generated Site) name" site, and their children rearranged. This:
  - Allows you to set the "site" bandwidth independently of the client
    site bandwidth.
  - Makes for easier trees, because we're inserting the site that really
    should be there.
- network.json (not the shaped devices file yet) is now generated
  automatically from a tree, once prepareTree() and createNetworkJson()
  are called.
  - There's a unit test that generates the network.example.json file and
    compares it with the original to ensure that they match (a sketch of
    that kind of test follows below).
- Unit test coverage hits every function in the graph system, now.

I'm liking this setup. With the non-vendor-specific logic contained inside
the NetworkGraph type, the actual UISP code to generate the example tree is
down to 65 lines of code, including comments. That'll grow a bit as I
re-insert some automatic speed limit determination and AP/site speed
overrides (i.e. the integrationUISPbandwidths.csv file). Still pretty
clean.

Creating the network.example.json file only requires:

from integrationCommon import NetworkGraph, NetworkNode, NodeType
import json
net = NetworkGraph()
net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
net.prepareTree()
net.createNetworkJson()

(The id and name fields are duplicated right now; I'm using readable names
to keep me sane. The third string is the parent, and the last two numbers
are bandwidth limits.)

The nice, readable format being:

NetworkNode(id="Site_1", displayName="Site_1", parentId="", type=NodeType.site, download=1000, upload=1000)

That in turn gives you the example network:
[image: image.png]

On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <herberticus@gmail.com>
wrote:
> Dave: I love those Gource animations! [...]
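A sketch of the round-trip unit test described above, assuming it rebuilds
the example graph and diffs the generated JSON against the checked-in
reference (the helper's name and the output filename are illustrative):

# Build the example network, then compare the generated network.json with
# the reference network.example.json.
import json
import unittest

from integrationCommon import NetworkGraph, NetworkNode, NodeType

def buildExampleGraph():
    net = NetworkGraph()
    net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
    net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
    net.prepareTree()
    net.createNetworkJson()

class TestNetworkJson(unittest.TestCase):
    def test_example_matches_reference(self):
        buildExampleGraph()
        with open("network.json") as generated, \
             open("network.example.json") as reference:
            self.assertEqual(json.load(generated), json.load(reference))

if __name__ == "__main__":
    unittest.main()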
* Re: [LibreQoS] Integration system, aka fun with graph theory
From: Robert Chacón @ 2022-10-28 19:05 UTC
To: Herbert Wolverson; Cc: libreqos

Wow. This is very nicely done. Awesome work!

On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS
<libreqos@lists.bufferbloat.net> wrote:
> The integration is coming along nicely. Some progress updates: [...]

--
Robert Chacón
CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
Dev | LibreQoS.io
* Re: [LibreQoS] Integration system, aka fun with graph theory
From: Herbert Wolverson @ 2022-10-28 19:54 UTC
Cc: libreqos

One more update, and I'm going to sleep until "pick up daughter" time. :-)

The tree at
https://github.com/thebracket/LibreQoS/tree/integration-common-graph
can now build a network.json, ShapedDevices.csv, and
integrationUISPbandwidths.csv, and follows pretty much the same logic as
the previous importer - other than using data links to build the hierarchy
and letting (requiring, currently) you specify the root node. It's handling
our bizarre UISP setup pretty well now - so if anyone wants to test it (I
recommend just running integrationUISP.py and checking the output rather
than throwing it into production), I'd appreciate any feedback.

Still on my list: handling the Mikrotik IPv6 connections, and exceptionCPEs
and site exclusion.

If you want the pretty graphics, you need to "pip install graphviz" and
"sudo apt install graphviz". It *should* detect that these aren't present
and not try to draw pictures otherwise (a sketch of that detection follows
below).

On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón
<robert.chacon@jackrabbitwireless.com> wrote:
> Wow. This is very nicely done. Awesome work! [...]
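The graphviz detection can be as simple as an import guard; a sketch (the
function name and the node tuple layout are illustrative):

# Skip drawing entirely when the graphviz package (or the system binaries
# it wraps) is unavailable.
try:
    from graphviz import Digraph
    hasGraphviz = True
except ImportError:
    hasGraphviz = False

def maybeDrawTree(nodes):
    # nodes: iterable of (id, displayName, parentId) tuples
    if not hasGraphviz:
        return
    dot = Digraph()
    for node_id, name, parent_id in nodes:
        dot.node(str(node_id), name)
        if parent_id:
            dot.edge(str(parent_id), str(node_id))
    dot.render("network", format="png")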
* Re: [LibreQoS] Integration system, aka fun with graph theory
From: Robert Chacón @ 2022-10-28 21:15 UTC
To: Herbert Wolverson; Cc: libreqos

Awesome work. It succeeded in building the topology and creating
ShapedDevices.csv for my network. It even graphed it perfectly. Nice!

I notice that in ShapedDevices.csv it does add CPE radios (which in our
case we don't shape - they are in bridge mode) with IPv4 and IPv6 both
being empty lists []. This is not necessarily bad, but it may lead to empty
leaf classes being created on LibreQoS.py runs. Not a huge deal; it just
makes the minor class counter increment toward the 32k limit faster. Do you
think perhaps we should check:

if (len(IPv4) == 0) and (len(IPv6) == 0):
    # Skip adding this entry to ShapedDevices.csv

or something similar around line 329 of integrationCommon.py? Open to your
suggestions there (a sketch of one possible placement follows below).

On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS
<libreqos@lists.bufferbloat.net> wrote:
> One more update, and I'm going to sleep until "pick up daughter" time.
> :-) [...]

--
Robert Chacón
CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
Dev | LibreQoS.io
* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-28 21:15             ` Robert Chacón
@ 2022-10-29 15:57               ` Herbert Wolverson
  2022-10-29 19:05                 ` Robert Chacón
  2022-10-29 19:18                 ` Dave Taht
  0 siblings, 2 replies; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-29 15:57 UTC (permalink / raw)
  Cc: libreqos

[-- Attachment #1.1: Type: text/plain, Size: 15207 bytes --]

Alright, the UISP side of the common integrations is pretty much feature
complete. I'll update the tracking issue in a bit.

   - Per your suggestion, devices with no IP addresses (v4 or v6) are not
   added.
   - Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
   side of things, so it can be used in other integrations also. I don't have
   a setup on which to test it, but if I'm reading the code right then the
   unit test is testing it appropriately.
   - excludeSites is supported as a common API feature. If a node is added
   with a name that matches an excluded site, it won't be added. The tree
   builder is smart enough to replace invalid "parentId" references with the
   shaper root, so if you have other tree items that rely on this site, they
   will still be added to the tree. Was that the intent? (It looks pretty
   useful; we have a child site down the tree with a HUGE amount of load, and
   bumping it to the top level with excludeSites would probably help our load
   balancing quite a bit.)
      - If the intent was to exclude the site and everything underneath
      it, I'd have to rework things a bit. Let me know; it wasn't quite clear.
   - exceptionCPEs is also supported as a common API feature. It simply
   overrides the "parentId" of incoming nodes with the new parent. Another
   potentially useful feature; if I got excludeSites the wrong way around,
   I'd add a "my_big_site":"" entry to push it to the top.
   - UISP integration now supports a "flat" topology option (set via
   uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py to
   include this entry.

I'll look and see how much of the Splynx code I can shorten with the new
API; I don't have a Splynx setup to test against, making that tricky. I
*think* the new API should shorten things a lot. I think routers act as
node parents, with clients underneath them? Otherwise, a "flat" setup
should be a little shorter (the CSV code can be replaced with a call to the
graph builder). Most of the Splynx (and VISP) users I've talked to layer
MPLS+VPLS to pretend to have a big, flat network and then connect via a
RADIUS call in the DHCP server; I've always assumed that's because those
systems prefer the telecom model of "pretend everything is equal" to trying
to model topology.*

I need to clean things up a bit (there's still a bit of duplicated code,
and I believe in the DRY principle - don't repeat yourself; Dave Thomas -
my boss at PragProg - coined the term in The Pragmatic Programmer, and I
feel obliged to use it everywhere!), and do a quick rebase (I accidentally
parented the branch off of a branch instead of main) - but I think I can
have this as a PR for you on Monday.

* - The first big wireless network I set up used a Motorola WiMAX setup.
They *required* that every single AP share two VLANs (management and
bearer) with every other AP - all the way to the core. It kinda worked once
they remembered client isolation was a thing in a patch... Then again,
their installation instructions included connecting two ports of a router
together with a jumper cable, because their localhost implementation didn't
quite work. :-|
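To make the overrides concrete, here is a rough sketch of how they could
hang together in the common API. This is a hypothetical helper, not the
actual integrationCommon.py wiring; it assumes excludeSites and
exceptionCPEs live in ispConfig (as uispStrategy does), and the ipv4/ipv6
attribute names are illustrative only:

    from ispConfig import excludeSites, exceptionCPEs

    def applyOverrides(node):
        # Nodes whose name matches an entry in excludeSites are never
        # added; prepareTree() later re-homes any children whose parentId
        # points at a missing site to the shaper root.
        if node.displayName in excludeSites:
            return None
        # exceptionCPEs forces a new parent onto an incoming node; a ""
        # parent pushes it to the top level.
        if node.displayName in exceptionCPEs:
            node.parentId = exceptionCPEs[node.displayName]
        return node

The no-IP filter would slot in at the same stage (again, attribute names
are illustrative):

        if len(node.ipv4) == 0 and len(node.ipv6) == 0:
            return None  # no addresses, so no ShapedDevices.csv entry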
On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
robert.chacon@jackrabbitwireless.com> wrote:

> Awesome work. It succeeded in building the topology and creating
> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
> I notice that in ShapedDevices.csv it does add CPE radios (which in our
> case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
> being empty lists [].
> This is not necessarily bad, but it may lead to empty leaf classes being
> created on LibreQoS.py runs. Not a huge deal, it just makes the minor class
> counter increment toward the 32k limit faster.
> Do you think perhaps we should check:
> *if (len(IPv4) == 0) and (len(IPv6) == 0):*
> *    # Skip adding this entry to ShapedDevices.csv*
> Or something similar around line 329 of integrationCommon.py?
> Open to your suggestions there.
>
> [...]

[-- Attachment #1.2: Type: text/html, Size: 32499 bytes --]
[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]
[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-29 15:57 ` Herbert Wolverson
@ 2022-10-29 19:05   ` Robert Chacón
  2022-10-29 19:43     ` Dave Taht
  2022-10-29 19:18   ` Dave Taht
  1 sibling, 1 reply; 33+ messages in thread
From: Robert Chacón @ 2022-10-29 19:05 UTC (permalink / raw)
  To: Herbert Wolverson; +Cc: libreqos

[-- Attachment #1.1: Type: text/plain, Size: 19218 bytes --]

> Per your suggestion, devices with no IP addresses (v4 or v6) are not
added.
> Mikrotik "4 to 6" mapping is implemented. I put it in the "common" side
of things, so it can be used in other integrations also. I don't have a
setup on which to test it, but if I'm reading the code right then the unit
test is testing it appropriately.

Fantastic.

> excludeSites is supported as a common API feature. If a node is added
with a name that matches an excluded site, it won't be added. The tree
builder is smart enough to replace invalid "parentId" references with the
shaper root, so if you have other tree items that rely on this site, they
will still be added to the tree. Was that the intent? (It looks pretty
useful; we have a child site down the tree with a HUGE amount of load, and
bumping it to the top level with excludeSites would probably help our load
balancing quite a bit.)

Very cool approach, I like it! Yeah, we have some cases where we need to
balance out high-load child nodes across CPUs, so that's perfect.
Originally I thought of it as a way to just exclude sites that don't fit
into the shaped topology, but this approach is more useful.
Should we rename excludeSites to moveSitesToTop or something similar? That
functionality of distributing across top-level nodes / CPU cores seems more
important anyway.

> exceptionCPEs is also supported as a common API feature. It simply
overrides the "parentId" of incoming nodes with the new parent. Another
potentially useful feature; if I got excludeSites the wrong way around,
I'd add a "my_big_site":"" entry to push it to the top.

Awesome

> UISP integration now supports a "flat" topology option (set via
uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py to
include this entry.

Nice!

> I'll look and see how much of the Splynx code I can shorten with the new
API; I don't have a Splynx setup to test against, making that tricky.

I'll send you the Splynx login they gave us.

> I *think* the new API should shorten things a lot. I think routers act
as node parents, with clients underneath them? Otherwise, a "flat" setup
should be a little shorter (the CSV code can be replaced with a call to the
graph builder). Most of the Splynx (and VISP) users I've talked to layer
MPLS+VPLS to pretend to have a big, flat network and then connect via a
RADIUS call in the DHCP server; I've always assumed that's because those
systems prefer the telecom model of "pretend everything is equal" to trying
to model topology.*

Yeah, Splynx doesn't seem to natively support any topology mapping or even
AP designation; one person I spoke to said they track corresponding APs in
RADIUS anyway. So for now the flat model may be fine.

> I need to clean things up a bit (there's still a bit of duplicated code,
and I believe in the DRY principle - don't repeat yourself; Dave Thomas -
my boss at PragProg - coined the term in The Pragmatic Programmer, and I
feel obliged to use it everywhere!), and do a quick rebase (I accidentally
parented the branch off of a branch instead of main) - but I think I can
have this as a PR for you on Monday.
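To make the flat option concrete: under the new API, a flat import (Splynx
or otherwise) could boil down to something like the sketch below. The
splynxServices rows and NodeType.client are placeholders, not confirmed
parts of the API; only addRawNode, prepareTree, and createNetworkJson are
shown elsewhere in this thread:

    from integrationCommon import NetworkGraph, NetworkNode, NodeType

    splynxServices = [  # placeholder rows; a real import would fetch these
        {"id": "101", "name": "Customer_A", "download": 100, "upload": 20},
        {"id": "102", "name": "Customer_B", "download": 50, "upload": 10},
    ]

    net = NetworkGraph()
    for svc in splynxServices:
        net.addRawNode(NetworkNode(
            svc["id"], svc["name"],           # id and display name
            "",                               # "" parent = flat topology
            NodeType.client,                  # assumed client node type
            svc["download"], svc["upload"]))  # plan speed limits
    net.prepareTree()
    net.createNetworkJson()

Whether clients land in network.json or ShapedDevices.csv at that point is
up to the common layer; the point is that the old CSV-writing code would
collapse into graph calls.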
This is really great work and will make future integrations much cleaner
and nicer to work with. Thank you!

On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS <
libreqos@lists.bufferbloat.net> wrote:

> [...]

-- 
Robert Chacón
CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
Dev | LibreQoS.io

[-- Attachment #1.2: Type: text/html, Size: 37451 bytes --]
[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]
[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-29 19:05 ` Robert Chacón
@ 2022-10-29 19:43   ` Dave Taht
  2022-10-30  1:45     ` Herbert Wolverson
  0 siblings, 1 reply; 33+ messages in thread
From: Dave Taht @ 2022-10-29 19:43 UTC (permalink / raw)
  To: Robert Chacón; +Cc: Herbert Wolverson, libreqos

[-- Attachment #1.1: Type: text/plain, Size: 21255 bytes --]

For starters, let me also offer praise for this work, which is so ahead of
schedule!

I am (perhaps cluelessly) thinking about bigger pictures, and still stuck
in my mindset involving distributing the packet processing, representing
the network topology and plans, and compensating for the physics.

So you have a major tower, and a separate libreqos instance goes there. Or
libreqos outputs rules compatible with mikrotik or vyatta or whatever is
there. Or are you basically thinking one device rules them all and shapes
them off the only interface?

Or: you have another pop with a separate connection to the internet that
you inherited from a buyout, or you wanted physical redundancy for your BGP
AS's internet access, maybe just between DCs in the same town, or...

         ____________________________________________
        /                                            \
cloud -> pop -> customers - customers <- pop <- cloud
        \ ----- leased fiber or wireless ------------/

I'm also a little puzzled as to what's the ISP->internet link - juniper?
cisco? mikrotik? - and what role and services it is expected to have.

On Sat, Oct 29, 2022 at 12:06 PM Robert Chacón via LibreQoS <
libreqos@lists.bufferbloat.net> wrote:

> [...]
-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

[-- Attachment #1.2: Type: text/html, Size: 40156 bytes --]
[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]
[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-29 19:43 ` Dave Taht
@ 2022-10-30  1:45   ` Herbert Wolverson
  2022-10-31  0:15     ` Dave Taht
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-30 1:45 UTC (permalink / raw)
  To: Dave Taht; +Cc: Robert Chacón, libreqos

[-- Attachment #1.1: Type: text/plain, Size: 24316 bytes --]

> For starters, let me also offer praise for this work which is so ahead of
schedule!

Thank you. I'm enjoying a short period while I wait for my editor to finish
up with a couple of chapters of my next book (working title More Hands-on
Rust; it's intermediate to advanced Rust, taught through the lens of game
development).

I think at least initially, the primary focus is on what WISPs are used to
(and ask for): a fat shaper box that sits between a WISP and their Internet
connection(s). Usually in the topology: (router connected to upstream) <-->
(LibreQoS) <--> (core site router, connected to the WISP's network as a
whole). That's a simplification; there's usually a bypass (in case LibreQoS
dies, is being updated, etc.), sometimes multiple connections that need
shaping, and so on. That's how Preseem (and the others) tend to insert
themselves - shape everything on the way out.

I think there's also a lot to be said for the possibility of LibreQoS at
the towers that need it the most. That might require a bit of MPLS support
(I can do the xdp-cpumap-tc part; I'm not sure what the classifier does if
it receives a packet with the TCP/UDP header stuck behind some MPLS
headers), but it has the potential to really clean things up - especially
for a really busy tower site. (On a similar note, WISPs with multiple
Internet connections at different sites would benefit from LibreQoS on each
of them.)

Generally, the QoS box doesn't really care what you are running in the way
of a router. We run mostly Mikrotik (with a bit of FreeBSD, and a tiny bit
of Cisco in the mix too!), and I know of people who love Juniper, use
Cisco, etc. Since we're shaping in the "router sandwich" (which can be one
router with a bit of care), we don't necessarily need to worry too much
about their innards.

With that said, some future SNMP support (please, not polling everything
all the time... that's a monitoring program's job!) is probably hard to
avoid. At least that's relatively vendor agnostic (even if Ubiquiti seem to
be trying to cease supporting it, ugh). I could see some support for
outputting rules for routers, especially if the goal is to get Cake
managing bufferbloat in many places down the line.

Incidentally, using my latest build of cpumap-pping (and no separate pping
running, eating a CPU), my average network latency has dropped to 24ms at
peak time (from 40ms) - while pulling 1.8 gbps of real customer traffic
through the system. :-)
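For the MPLS question, the parsing problem looks roughly like this - a
Python sketch purely to illustrate the header walk; the real classifier is
eBPF/C inside xdp-cpumap-tc, and this assumes an untagged Ethernet frame:

    import struct

    def ipOffset(pkt: bytes) -> int:
        # Returns the byte offset of the IP header, walking an MPLS label
        # stack if one is present. Returns -1 if the payload isn't plain
        # IPv4/IPv6/MPLS.
        etherType = struct.unpack_from("!H", pkt, 12)[0]
        offset = 14  # end of the Ethernet header
        if etherType in (0x0800, 0x86DD):        # plain IPv4 / IPv6
            return offset
        if etherType not in (0x8847, 0x8848):    # MPLS unicast / multicast
            return -1
        while offset + 4 <= len(pkt):
            word = struct.unpack_from("!I", pkt, offset)[0]
            offset += 4
            if (word >> 8) & 1:  # S bit set: bottom of the label stack
                return offset    # IP header (usually) starts here
        return -1

The "usually" is the catch: with VPLS the payload under the bottom label
can be another Ethernet frame (possibly behind a control word) rather than
an IP header, so a real classifier has to guess or be told which it is.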
On Sat, Oct 29, 2022 at 2:43 PM Dave Taht <dave.taht@gmail.com> wrote: > For starters, let me also offer praise for this work which is so ahead of > schedule! > > I am (perhaps cluelessly) thinking about bigger pictures, and still stuck > in my mindset involving distributing the packet processing, > and representing the network topology, plans and compensating for the > physics. > > So you have a major tower, a separate libreqos instance goes there. Or > libreqos outputs rules compatible with mikrotik or vyatta or whatever is > there. Or are you basically thinking one device rules them all and off the > only interface, shapes them? > > Or: > > You have another pop with a separate connection to the internet that you > inherited from a buyout, or you wanted physical redundancy for your BGP > AS's internet access, maybe just between DCs in the same town or...
>
>        ____________________________________________
>       /
>      /
> cloud -> pop -> customers - customers <- pop <- cloud
>       \ ----- leased fiber or wireless /
>
> I'm also a little puzzled as to whats the ISP->internet link? juniper? > cisco? mikrotik, and what role and services that is expected to have. > > On Sat, Oct 29, 2022 at 12:06 PM Robert Chacón via LibreQoS < > libreqos@lists.bufferbloat.net> wrote: > >> > Per your suggestion, devices with no IP addresses (v4 or v6) are not >> added. >> > Mikrotik "4 to 6" mapping is implemented. I put it in the "common" side >> of things, so it can be used in other integrations also. I don't have a >> setup on which to test it, but if I'm reading the code right then the unit >> test is testing it appropriately. >> >> Fantastic. >> >> > excludeSites is supported as a common API feature. If a node is added >> with a name that matches an excluded site, it won't be added. The tree >> builder is smart enough to replace invalid "parentId" references with the >> shaper root, so if you have other tree items that rely on this site - they >> will be added to the tree. Was that the intent? (It looks pretty useful; we >> have a child site down the tree with a HUGE amount of load, and bumping it >> to the top-level with excludeSites would probably help our load balancing >> quite a bit) >> >> Very cool approach, I like it! Yeah we have some cases where we need to >> balance out high load child nodes across CPUs so that's perfect. >> Originally I thought of it to just exclude sites that don't fit into the >> shaped topology but this approach is more useful. >> Should we rename excludeSites to moveSitesToTop or something similar? >> That functionality of distributing across top level nodes / cpu cores seems >> more important anyway. >> >> >exceptionCPEs is also supported as a common API feature. It simply >> overrides the "parentId'' of incoming nodes with the new parent. Another >> potentially useful feature; if I got excludeSites the wrong away around, >> I'd add a "my_big_site":"" entry to push it to the top. >> >> Awesome >> >> > UISP integration now supports a "flat" topology option (set via >> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py to >> include this entry. >> >> Nice! >> >> > I'll look and see how much of the Spylnx code I can shorten with the >> new API; I don't have a Spylnx setup to test against, making that tricky. >> >> I'll send you the Splynx login they gave us. >> >> > I *think* the new API should shorten things a lot. I think routers act >> as node parents, with clients underneath them? Otherwise, a "flat" setup >> should be a little shorter (the CSV code can be replaced with a call to the >> graph builder). Most of the Spylnx (and VISP) users I've talked to layer >> MPLS+VPLS to pretend to have a big, flat network and then connect via a >> RADIUS call in the DHCP server; I've always assumed that's because those >> systems prefer the telecom model of "pretend everything is equal" to trying >> to model topology.* >> >> Yeah splynx doesn't seem to natively support any topology mapping or even >> AP designation, one person I spoke to said they track corresponding APs in >> radius anyway. So for now the flat model may be fine.
>> >> > I need to clean things up a bit (there's still a bit of duplicated >> code, and I believe in the DRY principle - don't repeat yourself; Dave >> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer, >> and I feel obliged to use it everywhere!), and do a quick rebase (I >> accidentally parented the branch off of a branch instead of main) - but I >> think I can have this as a PR for you on Monday. >> >> This is really great work and will make future integrations much cleaner >> and nicer to work with. Thank you! >> >> >> On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS < >> libreqos@lists.bufferbloat.net> wrote: >> >>> Alright, the UISP side of the common integrations is pretty much feature >>> complete. I'll update the tracking issue in a bit. >>> >>> - Per your suggestion, devices with no IP addresses (v4 or v6) are >>> not added. >>> - Mikrotik "4 to 6" mapping is implemented. I put it in the "common" >>> side of things, so it can be used in other integrations also. I don't have >>> a setup on which to test it, but if I'm reading the code right then the >>> unit test is testing it appropriately. >>> - excludeSites is supported as a common API feature. If a node is >>> added with a name that matches an excluded site, it won't be added. The >>> tree builder is smart enough to replace invalid "parentId" references with >>> the shaper root, so if you have other tree items that rely on this site - >>> they will be added to the tree. Was that the intent? (It looks pretty >>> useful; we have a child site down the tree with a HUGE amount of load, and >>> bumping it to the top-level with excludeSites would probably help our load >>> balancing quite a bit) >>> - If the intent was to exclude the site and everything underneath >>> it, I'd have to rework things a bit. Let me know; it wasn't quite clear. >>> - exceptionCPEs is also supported as a common API feature. It >>> simply overrides the "parentId'' of incoming nodes with the new parent. >>> Another potentially useful feature; if I got excludeSites the wrong away >>> around, I'd add a "my_big_site":"" entry to push it to the top. >>> - UISP integration now supports a "flat" topology option (set via >>> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py >>> to include this entry. >>> >>> I'll look and see how much of the Spylnx code I can shorten with the new >>> API; I don't have a Spylnx setup to test against, making that tricky. I >>> *think* the new API should shorten things a lot. I think routers act as >>> node parents, with clients underneath them? Otherwise, a "flat" setup >>> should be a little shorter (the CSV code can be replaced with a call to the >>> graph builder). Most of the Spylnx (and VISP) users I've talked to layer >>> MPLS+VPLS to pretend to have a big, flat network and then connect via a >>> RADIUS call in the DHCP server; I've always assumed that's because those >>> systems prefer the telecom model of "pretend everything is equal" to trying >>> to model topology.* >>> >>> I need to clean things up a bit (there's still a bit of duplicated code, >>> and I believe in the DRY principle - don't repeat yourself; Dave Thomas - >>> my boss at PragProg - coined the term in The Pragmatic Programmer, and I >>> feel obliged to use it everywhere!), and do a quick rebase (I accidentally >>> parented the branch off of a branch instead of main) - but I think I can >>> have this as a PR for you on Monday. 
>>> >>> * - The first big wireless network I setup used a Motorola WiMAX setup. >>> They *required* that every single AP share two VLANs (management and >>> bearer) with every other AP - all the way to the core. It kinda worked once >>> they remembered client isolation was a thing in a patch... Then again, >>> their installation instructions included connecting two ports of a router >>> together with a jumper cable, because their localhost implementation didn't >>> quite work. :-| >>> >>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón < >>> robert.chacon@jackrabbitwireless.com> wrote: >>> >>>> Awesome work. It succeeded in building the topology and creating >>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice! >>>> I notice that in ShapedDevices.csv it does add CPE radios (which in our >>>> case we don't shape - they are in bridge mode) with IPv4 and IPv6s both >>>> being empty lists []. >>>> This is not necessarily bad, but it may lead to empty leaf classes >>>> being created on LibreQoS.py runs. Not a huge deal, it just makes the minor >>>> class counter increment toward the 32k limit faster. >>>> Do you think perhaps we should check: >>>> *if (len(IPv4) == 0) and (len(IPv6) == 0):* >>>> * # Skip adding this entry to ShapedDevices.csv* >>>> Or something similar around line 329 of integrationCommon.py? >>>> Open to your suggestions there. >>>> >>>> >>>> >>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS < >>>> libreqos@lists.bufferbloat.net> wrote: >>>> >>>>> One more update, and I'm going to sleep until "pick up daughter" time. >>>>> :-) >>>>> >>>>> The tree at >>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph >>>>> can now build a network.json, ShapedDevices.csv, and >>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the >>>>> previous importer - other than using data links to build the hierarchy and >>>>> letting (requiring, currently) you specify the root node. It's handling our >>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I >>>>> recommend just running integrationUISP.py and checking the output rather >>>>> than throwing it into production), I'd appreciate any feedback. >>>>> >>>>> Still on my list: handling the Mikrotik IPv6 connections, and >>>>> exceptionCPE and site exclusion. >>>>> >>>>> If you want the pretty graphics, you need to "pip install graphviz" >>>>> and "sudo apt install graphviz". It *should* detect that these aren't >>>>> present and not try to draw pictures, otherwise. >>>>> >>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón < >>>>> robert.chacon@jackrabbitwireless.com> wrote: >>>>> >>>>>> Wow. This is very nicely done. Awesome work! >>>>>> >>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS < >>>>>> libreqos@lists.bufferbloat.net> wrote: >>>>>> >>>>>>> The integration is coming along nicely. Some progress updates: >>>>>>> >>>>>>> - You can specify a variable in ispConfig.py named "uispSite". >>>>>>> This sets where in the topology you want the tree to start. This has two >>>>>>> purposes: >>>>>>> - It's hard to be psychic and know for sure where the shaper >>>>>>> is in the network. >>>>>>> - You could run multiple shapers at different egress points, >>>>>>> with failover - and rebuild the entire topology from the point of view of a >>>>>>> network node. >>>>>>> - "Child node with children" are now automatically converted >>>>>>> into a "(Generated Site) name" site, and their children rearranged. 
This: >>>>>>> - Allows you to set the "site" bandwidth independently of the >>>>>>> client site bandwidth. >>>>>>> - Makes for easier trees, because we're inserting the site >>>>>>> that really should be there. >>>>>>> - Network.json generation (not the shaped devices file yet) is >>>>>>> automatically generated from a tree, once PrepareTree() and >>>>>>> createNetworkJson() are called. >>>>>>> - There's a unit test that generates the network.example.json >>>>>>> file and compares it with the original to ensure that they match. >>>>>>> - Unit test coverage hits every function in the graph system, >>>>>>> now. >>>>>>> >>>>>>> I'm liking this setup. With the non-vendor-specific logic contained >>>>>>> inside the NetworkGraph type, the actual UISP code to generate the example >>>>>>> tree is down to 65 lines of code, including comments. That'll grow a bit as I >>>>>>> re-insert some automatic speed limit determination, AP/Site speed overrides >>>>>>> (i.e. the integrationUISPbandwidths.csv file). Still pretty clean.
>>>>>>>
>>>>>>> Creating the network.example.json file only requires:
>>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>>> import json
>>>>>>> net = NetworkGraph()
>>>>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
>>>>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
>>>>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
>>>>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
>>>>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
>>>>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
>>>>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
>>>>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
>>>>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
>>>>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
>>>>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
>>>>>>> net.prepareTree()
>>>>>>> net.createNetworkJson()
>>>>>>>
>>>>>>> (The id and name fields are duplicated right now, I'm using readable >>>>>>> names to keep me sane. The third string is the parent, and the last two >>>>>>> numbers are bandwidth limits)
>>>>>>> The nice, readable format being:
>>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="", type=NodeType.site, download=1000, upload=1000)
>>>>>>>
>>>>>>> That in turns gives you the example network: >>>>>>> [image: image.png] >>>>>>> >>>>>>> >>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson < >>>>>>> herberticus@gmail.com> wrote: >>>>>>> >>>>>>>> Dave: I love those Gource animations! Game development is my other >>>>>>>> hobby, I could easily get lost for weeks tweaking the shaders to make the >>>>>>>> glow "just right". :-) >>>>>>>> >>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to look >>>>>>>> in that direction yet. I'm trying to build a "common grammar" to make it >>>>>>>> easier to express network layout from integrations; that would be another >>>>>>>> form/layer of integration and a lot easier to work with once there's a >>>>>>>> solid foundation.
Preseem does some of this (admittedly over-eagerly; >>>>>>>> nothing needs to query SNMP that often!), and the SNMP route is quite >>>>>>>> remarkably convoluted. Their support turned on a few "extra" modules to >>>>>>>> deal with things like PMP450 clients that change MAC when you put them in >>>>>>>> bridge mode vs NAT mode (and report the bridge mode CPE in some places >>>>>>>> either way), Elevate CPEs that almost but not quite make sense. Robert's >>>>>>>> code has the beginnings of some of this, scanning Mikrotik routers for IPv6 >>>>>>>> allocations by MAC (this is also the hardest part for me to test, since I >>>>>>>> don't have any v6 to test, currently). >>>>>>>> >>>>>>>> We tend to use UISP as the "source of truth" and treat it like a >>>>>>>> database for a ton of external tools (mostly ones we've created). >>>>>>>> >>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote: >>>>>>>> >>>>>>>>> we're pretty similar in that we've made UISP a mess. Multiple >>>>>>>>> paths to a pop. multiple pops on the network. failover between pops. >>>>>>>>> Lots of 'other' devices. handing out /29 etc to customers. >>>>>>>>> >>>>>>>>> Some sort of discovery would be nice. Ideally though, pulling >>>>>>>>> something from SNMP or router APIs etc to build the paths, but having a >>>>>>>>> 'network elements' list with each of the links described. ie, backhaul 12 >>>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that >>>>>>>>> from discovery. >>>>>>>>> >>>>>>>>> I've also thought about doing routine trace routes or watching >>>>>>>>> TTLs or something like that to get some indication that topology has >>>>>>>>> changed and then do another discovery and potential tree rebuild. >>>>>>>>> >>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS < >>>>>>>>> libreqos@lists.bufferbloat.net> wrote: >>>>>>>>> >>>>>>>>>> This is awesome! Way to go here. Thank you for contributing this. >>>>>>>>>> Being able to map out these complex integrations will help ISPs a >>>>>>>>>> ton, and I really like that it is sharing common features between the >>>>>>>>>> Splynx and UISP integrations. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Robert >>>>>>>>>> >>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS < >>>>>>>>>> libreqos@lists.bufferbloat.net> wrote: >>>>>>>>>> >>>>>>>>>>> So I've been doing some work on getting UISP integration (and >>>>>>>>>>> integrations in general) to work a bit more smoothly. >>>>>>>>>>> >>>>>>>>>>> I started by implementing a graph structure that mirrors both >>>>>>>>>>> the networks and sites system. It's not done yet, but the basics are coming >>>>>>>>>>> together nicely. You can see my progress so far at: >>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph >>>>>>>>>>> >>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the >>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular >>>>>>>>>>> portion of the tree. We have: >>>>>>>>>>> >>>>>>>>>>> - Non Ubiquiti devices as "other devices" >>>>>>>>>>> - Sections that need shaping by subnet (e.g. 
"all of >>>>>>>>>>> 192.168.1.0/24 shared 100 mbit") >>>>>>>>>>> - Bridge mode devices using Option 82 to always allocate the >>>>>>>>>>> same IP, with a "service IP" entry >>>>>>>>>>> - Various bits of infrastructure mapped >>>>>>>>>>> - Sites that go to client sites, which go to other client >>>>>>>>>>> sites >>>>>>>>>>> >>>>>>>>>>> In other words, over the years we've unleashed a bit of a >>>>>>>>>>> monster. Cleaning it up is a useful talk, but I wanted the integration to >>>>>>>>>>> be able to handle pathological cases like us! >>>>>>>>>>> >>>>>>>>>>> So I fed our network into the current graph generator, and used >>>>>>>>>>> graphviz to spit out a directed graph: >>>>>>>>>>> [image: image.png] >>>>>>>>>>> That doesn't include client sites! Legend: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> - Green = the root site. >>>>>>>>>>> - Red = a site >>>>>>>>>>> - Blue = an access point >>>>>>>>>>> - Magenta = a client site that has children >>>>>>>>>>> >>>>>>>>>>> So the part in "common" is designed heavily to reduce >>>>>>>>>>> repetition. When it's done, you should be able to feed in sites, APs, >>>>>>>>>>> clients, devices, etc. in a pretty flexible manner. Given how much code is >>>>>>>>>>> shared between the UISP and Splynx integration code, I'm pretty sure both >>>>>>>>>>> will be cut to a tiny fraction of the total code. :-) >>>>>>>>>>> >>>>>>>>>>> I can't post the full tree, it's full of client names. >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> LibreQoS mailing list >>>>>>>>>>> LibreQoS@lists.bufferbloat.net >>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Robert Chacón >>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> >>>>>>>>>> _______________________________________________ >>>>>>>>>> LibreQoS mailing list >>>>>>>>>> LibreQoS@lists.bufferbloat.net >>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>> LibreQoS mailing list >>>>>>> LibreQoS@lists.bufferbloat.net >>>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Robert Chacón >>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> >>>>>> Dev | LibreQoS.io >>>>>> >>>>>> _______________________________________________ >>>>> LibreQoS mailing list >>>>> LibreQoS@lists.bufferbloat.net >>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>> >>>> >>>> >>>> -- >>>> Robert Chacón >>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> >>>> Dev | LibreQoS.io >>>> >>>> _______________________________________________ >>> LibreQoS mailing list >>> LibreQoS@lists.bufferbloat.net >>> https://lists.bufferbloat.net/listinfo/libreqos >>> >> >> >> -- >> Robert Chacón >> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> >> Dev | LibreQoS.io >> >> _______________________________________________ >> LibreQoS mailing list >> LibreQoS@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/libreqos >> > > > -- > This song goes out to all the folk that thought Stadia would work: > > https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz > Dave Täht CEO, TekLibre, LLC > [-- Attachment #1.2: Type: text/html, Size: 43273 bytes --] [-- Attachment #2: image.png --] [-- Type: image/png, Size: 573568 bytes --] [-- Attachment #3: image.png --] [-- Type: image/png, Size: 115596 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-30 1:45 ` Herbert Wolverson @ 2022-10-31 0:15 ` Dave Taht 2022-10-31 1:15 ` Robert Chacón 2022-10-31 1:26 ` Herbert Wolverson 0 siblings, 2 replies; 33+ messages in thread From: Dave Taht @ 2022-10-31 0:15 UTC (permalink / raw) To: Herbert Wolverson; +Cc: Robert Chacón, libreqos

[-- Attachment #1.1: Type: text/plain, Size: 27688 bytes --]

On Sat, Oct 29, 2022 at 6:45 PM Herbert Wolverson <herberticus@gmail.com> wrote: > > For starters, let me also offer praise for this work which is so ahead > of schedule! > > Thank you. I'm enjoying a short period while I wait for my editor to > finish up with a couple of chapters of my next book (working title More > Hands-on Rust; it's intermediate to advanced Rust, taught through the lens > of game development). >

Cool. I'm 32 years into my PhD thesis.

> I think at least initially, the primary focus is on what WISPs are used > to (and ask for): a fat shaper box that sits between a WISP and their > Internet connection(s). Usually in the topology: (router connected to > upstream) <--> (LibreQoS) <--> (core site router, connected to the WISP's > network as a whole). That's a simplification; there's usually a bypass (in > case LibreQoS dies, is being updated, etc.), sometimes multiple connections > that need shaping, etc. That's how Preseem (and the others) tend to insert > themselves - shape everything on the way out. >

Presently LibreQoS appears to be inserting about 200us of delay into the path, for the sparsest packets. Every box on the path adds delay, though cut-through switches are common. Don't talk to me about network slicing and disaggregated this or that in the 3GPP world, tho... ugh.

I guess, for every "box" (or virtual machine) on the path I have Amdahl's law stuck in my head. This is in part why the K8s crowd makes me a little crazy.

> I think there's a lot to be said for the possibility of LibreQoS at > towers that need it the most, also. That might require a bit of MPLS > support (I can do the xdp-cpumap-tc part; I'm not sure what the classifier > does if it receives a packet with the TCP/UDP header stuck behind some MPLS > headers?), but has the potential to really clean things up. Especially for > a really busy tower site. (On a similar note, WISPs with multiple Internet > connections at different sites would benefit from LibreQoS on each of > them). > > Generally, the QoS box doesn't really care what you are running in the > way of a router. >

It is certainly simpler to have a transparent middlebox for this stuff, initially, and it would take a great leap of faith, for many, to just plug in a lqos box as the main box... but cumulus did succeed at a lot of that... they open sourced a bfd daemon... numerous other tools...

https://www.nvidia.com/en-us/networking/ethernet-switching/cumulus-linux/

> We run mostly Mikrotik (with a bit of FreeBSD, and a tiny bit of Cisco in > the mix too!), I know of people who love Juniper, use Cisco, etc. Since > we're shaping in the "router sandwich" (which can be one router with a bit > of care), we don't necessarily need to worry too much about their innards. >

An ISP in an SDN shaping whitebox that does all that juniper/cisco stuff, or a pair perhaps using a fiber optic splitter for failover:

http://www.comlaninc.com/products/fiber-optic-products/id/23/cl-fos

> With that said, some future SNMP support (please, not polling everything > all the time... that's a monitoring program's job!) is probably hard to > avoid.
> At least that's relatively vendor agnostic (even if Ubiquiti seem to > be trying to cease supporting it, ugh). >

Building on this initial core strength - sampling RTT - would be a differentiator. Examples:

RTT per AP
RTT P1 per AP (what's the effective minimum)
RTT P99 (what's the worst case?)
RTT variance P1 to P99 per internet IP (worst 20 performers) or AS number or /24

(variance is a very important concept)

> I could see some support for outputting rules for routers, especially if > the goal is to get Cake managing bufferbloat in many places down the line. > > Incidentally, using my latest build of cpumap-pping (and no separate pping > running, eating a CPU), my average network latency has dropped to 24ms at > peak time (from 40ms) - while pulling 1.8 Gbps of real customer traffic > through the system. :-) >

OK, this is something that "triggers" my inner pedant. Forgive me in advance? "average" of "what"?

Changing the monitoring tool shouldn't have affected the average latency, unless how it is calculated is different, or the sample population (more likely) has changed. If you are now tracking far more short flows, the observed latency will decline, but the higher latencies you were observing in the first place are still there.

Also... between where and where? Across the network? From the customer to the typical set of IP addresses of their servers? On wireless? Vs fiber? (Transiting a fiber network to your pop's edge should take under 2ms.) Wifi hops at the end of the link are probably adding the most delay...

If you consider 24ms "good" - however you calculate - going for ever less, via whatever means can be obtained from these analyses, is useful. But there are some things I don't think make as much sense as they used to - a netflix cache hitrate must be so low nowadays as to cost you just as much to fetch it from upstream as to host a box...
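The percentile examples above reduce to a few lines of Python - a sketch only, assuming you already have raw RTT samples grouped per AP; samples_by_ap and the function name are hypothetical, not an existing LibreQoS API:

from statistics import quantiles
from typing import Dict, List

def rtt_summary(samples_ms: List[float]) -> Dict[str, float]:
    # quantiles() with n=100 yields 99 cut points; index k-1 ~ percentile k.
    cuts = quantiles(samples_ms, n=100)
    return {"P1": cuts[0], "P99": cuts[98], "spread": cuts[98] - cuts[0]}

# Worst 20 APs by P1-to-P99 spread (the "variance" view):
# worst = sorted(samples_by_ap,
#                key=lambda ap: rtt_summary(samples_by_ap[ap])["spread"],
#                reverse=True)[:20]

The same reduction applies per internet IP, AS number, or /24 - only the grouping key changes.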
I don't have >>> a setup on which to test it, but if I'm reading the code right then the >>> unit test is testing it appropriately. >>> >>> Fantastic. >>> >>> > excludeSites is supported as a common API feature. If a node is added >>> with a name that matches an excluded site, it won't be added. The tree >>> builder is smart enough to replace invalid "parentId" references with the >>> shaper root, so if you have other tree items that rely on this site - they >>> will be added to the tree. Was that the intent? (It looks pretty useful; we >>> have a child site down the tree with a HUGE amount of load, and bumping it >>> to the top-level with excludeSites would probably help our load balancing >>> quite a bit) >>> >>> Very cool approach, I like it! Yeah we have some cases where we need to >>> balance out high load child nodes across CPUs so that's perfect. >>> Originally I thought of it to just exclude sites that don't fit into the >>> shaped topology but this approach is more useful. >>> Should we rename excludeSites to moveSitesToTop or something similar? >>> That functionality of distributing across top level nodes / cpu cores seems >>> more important anyway. >>> >>> >exceptionCPEs is also supported as a common API feature. It simply >>> overrides the "parentId'' of incoming nodes with the new parent. Another >>> potentially useful feature; if I got excludeSites the wrong away around, >>> I'd add a "my_big_site":"" entry to push it to the top. >>> >>> Awesome >>> >>> > UISP integration now supports a "flat" topology option (set via >>> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py to >>> include this entry. >>> >>> Nice! >>> >>> > I'll look and see how much of the Spylnx code I can shorten with the >>> new API; I don't have a Spylnx setup to test against, making that tricky. >>> >>> I'll send you the Splynx login they gave us. >>> >>> > I *think* the new API should shorten things a lot. I think routers >>> act as node parents, with clients underneath them? Otherwise, a "flat" >>> setup should be a little shorter (the CSV code can be replaced with a call >>> to the graph builder). Most of the Spylnx (and VISP) users I've talked to >>> layer MPLS+VPLS to pretend to have a big, flat network and then connect via >>> a RADIUS call in the DHCP server; I've always assumed that's because those >>> systems prefer the telecom model of "pretend everything is equal" to trying >>> to model topology.* >>> >>> Yeah splynx doesn't seem to natively support any topology mapping or >>> even AP designation, one person I spoke to said they track corresponding >>> APs in radius anyway. So for now the flat model may be fine. >>> >>> > I need to clean things up a bit (there's still a bit of duplicated >>> code, and I believe in the DRY principle - don't repeat yourself; Dave >>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer, >>> and I feel obliged to use it everywhere!), and do a quick rebase (I >>> accidentally parented the branch off of a branch instead of main) - but I >>> think I can have this as a PR for you on Monday. >>> >>> This is really great work and will make future integrations much cleaner >>> and nicer to work with. Thank you! >>> >>> >>> On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS < >>> libreqos@lists.bufferbloat.net> wrote: >>> >>>> Alright, the UISP side of the common integrations is pretty much >>>> feature complete. I'll update the tracking issue in a bit. 
>>>> >>>> - Per your suggestion, devices with no IP addresses (v4 or v6) are >>>> not added. >>>> - Mikrotik "4 to 6" mapping is implemented. I put it in the >>>> "common" side of things, so it can be used in other integrations also. I >>>> don't have a setup on which to test it, but if I'm reading the code right >>>> then the unit test is testing it appropriately. >>>> - excludeSites is supported as a common API feature. If a node is >>>> added with a name that matches an excluded site, it won't be added. The >>>> tree builder is smart enough to replace invalid "parentId" references with >>>> the shaper root, so if you have other tree items that rely on this site - >>>> they will be added to the tree. Was that the intent? (It looks pretty >>>> useful; we have a child site down the tree with a HUGE amount of load, and >>>> bumping it to the top-level with excludeSites would probably help our load >>>> balancing quite a bit) >>>> - If the intent was to exclude the site and everything >>>> underneath it, I'd have to rework things a bit. Let me know; it wasn't >>>> quite clear. >>>> - exceptionCPEs is also supported as a common API feature. It >>>> simply overrides the "parentId'' of incoming nodes with the new parent. >>>> Another potentially useful feature; if I got excludeSites the wrong away >>>> around, I'd add a "my_big_site":"" entry to push it to the top. >>>> - UISP integration now supports a "flat" topology option (set via >>>> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py >>>> to include this entry. >>>> >>>> I'll look and see how much of the Spylnx code I can shorten with the >>>> new API; I don't have a Spylnx setup to test against, making that tricky. I >>>> *think* the new API should shorten things a lot. I think routers act >>>> as node parents, with clients underneath them? Otherwise, a "flat" setup >>>> should be a little shorter (the CSV code can be replaced with a call to the >>>> graph builder). Most of the Spylnx (and VISP) users I've talked to layer >>>> MPLS+VPLS to pretend to have a big, flat network and then connect via a >>>> RADIUS call in the DHCP server; I've always assumed that's because those >>>> systems prefer the telecom model of "pretend everything is equal" to trying >>>> to model topology.* >>>> >>>> I need to clean things up a bit (there's still a bit of duplicated >>>> code, and I believe in the DRY principle - don't repeat yourself; Dave >>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer, >>>> and I feel obliged to use it everywhere!), and do a quick rebase (I >>>> accidentally parented the branch off of a branch instead of main) - but I >>>> think I can have this as a PR for you on Monday. >>>> >>>> * - The first big wireless network I setup used a Motorola WiMAX setup. >>>> They *required* that every single AP share two VLANs (management and >>>> bearer) with every other AP - all the way to the core. It kinda worked once >>>> they remembered client isolation was a thing in a patch... Then again, >>>> their installation instructions included connecting two ports of a router >>>> together with a jumper cable, because their localhost implementation didn't >>>> quite work. :-| >>>> >>>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón < >>>> robert.chacon@jackrabbitwireless.com> wrote: >>>> >>>>> Awesome work. It succeeded in building the topology and creating >>>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice! 
>>>>> I notice that in ShapedDevices.csv it does add CPE radios (which in >>>>> our case we don't shape - they are in bridge mode) with IPv4 and IPv6s both >>>>> being empty lists []. >>>>> This is not necessarily bad, but it may lead to empty leaf classes >>>>> being created on LibreQoS.py runs. Not a huge deal, it just makes the minor >>>>> class counter increment toward the 32k limit faster. >>>>> Do you think perhaps we should check: >>>>> *if (len(IPv4) == 0) and (len(IPv6) == 0):* >>>>> * # Skip adding this entry to ShapedDevices.csv* >>>>> Or something similar around line 329 of integrationCommon.py? >>>>> Open to your suggestions there. >>>>> >>>>> >>>>> >>>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS < >>>>> libreqos@lists.bufferbloat.net> wrote: >>>>> >>>>>> One more update, and I'm going to sleep until "pick up daughter" >>>>>> time. :-) >>>>>> >>>>>> The tree at >>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph >>>>>> can now build a network.json, ShapedDevices.csv, and >>>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the >>>>>> previous importer - other than using data links to build the hierarchy and >>>>>> letting (requiring, currently) you specify the root node. It's handling our >>>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I >>>>>> recommend just running integrationUISP.py and checking the output rather >>>>>> than throwing it into production), I'd appreciate any feedback. >>>>>> >>>>>> Still on my list: handling the Mikrotik IPv6 connections, and >>>>>> exceptionCPE and site exclusion. >>>>>> >>>>>> If you want the pretty graphics, you need to "pip install graphviz" >>>>>> and "sudo apt install graphviz". It *should* detect that these aren't >>>>>> present and not try to draw pictures, otherwise. >>>>>> >>>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón < >>>>>> robert.chacon@jackrabbitwireless.com> wrote: >>>>>> >>>>>>> Wow. This is very nicely done. Awesome work! >>>>>>> >>>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS < >>>>>>> libreqos@lists.bufferbloat.net> wrote: >>>>>>> >>>>>>>> The integration is coming along nicely. Some progress updates: >>>>>>>> >>>>>>>> - You can specify a variable in ispConfig.py named "uispSite". >>>>>>>> This sets where in the topology you want the tree to start. This has two >>>>>>>> purposes: >>>>>>>> - It's hard to be psychic and know for sure where the shaper >>>>>>>> is in the network. >>>>>>>> - You could run multiple shapers at different egress points, >>>>>>>> with failover - and rebuild the entire topology from the point of view of a >>>>>>>> network node. >>>>>>>> - "Child node with children" are now automatically converted >>>>>>>> into a "(Generated Site) name" site, and their children rearranged. This: >>>>>>>> - Allows you to set the "site" bandwidth independently of >>>>>>>> the client site bandwidth. >>>>>>>> - Makes for easier trees, because we're inserting the site >>>>>>>> that really should be there. >>>>>>>> - Network.json generation (not the shaped devices file yet) is >>>>>>>> automatically generated from a tree, once PrepareTree() and >>>>>>>> createNetworkJson() are called. >>>>>>>> - There's a unit test that generates the >>>>>>>> network.example.json file and compares it with the original to ensure that >>>>>>>> they match. >>>>>>>> - Unit test coverage hits every function in the graph system, >>>>>>>> now. >>>>>>>> >>>>>>>> I'm liking this setup. 
With the non-vendor-specific logic contained >>>>>>>> inside the NetworkGraph type, the actual UISP code to generate the example >>>>>>>> tree is down to 65 >>>>>>>> lines of code, including comments. That'll grow a bit as I >>>>>>>> re-insert some automatic speed limit determination, AP/Site speed overrides >>>>>>>> ( >>>>>>>> i.e. the integrationUISPbandwidths.csv file). Still pretty clean. >>>>>>>> >>>>>>>> Creating the network.example.json file only requires: >>>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType >>>>>>>> import json >>>>>>>> net = NetworkGraph() >>>>>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType >>>>>>>> .site, 1000, 1000)) >>>>>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType >>>>>>>> .site, 500, 500)) >>>>>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", >>>>>>>> NodeType.ap, 500, 500)) >>>>>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", >>>>>>>> NodeType.site, 500, 500)) >>>>>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", >>>>>>>> NodeType.site, 200, 200)) >>>>>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", >>>>>>>> NodeType.ap, 120, 120)) >>>>>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", >>>>>>>> NodeType.site, 60, 60)) >>>>>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", >>>>>>>> NodeType.ap, 30, 30)) >>>>>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", >>>>>>>> NodeType.site, 200, 200)) >>>>>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", >>>>>>>> NodeType.ap, 100, 100)) >>>>>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", >>>>>>>> NodeType.ap, 150, 150)) >>>>>>>> net.prepareTree() >>>>>>>> net.createNetworkJson() >>>>>>>> >>>>>>>> (The id and name fields are duplicated right now, I'm using >>>>>>>> readable names to keep me sane. The third string is the parent, and the >>>>>>>> last two numbers are bandwidth limits) >>>>>>>> The nice, readable format being: >>>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="", type= >>>>>>>> NodeType.site, download=1000, upload=1000) >>>>>>>> >>>>>>>> That in turns gives you the example network: >>>>>>>> [image: image.png] >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson < >>>>>>>> herberticus@gmail.com> wrote: >>>>>>>> >>>>>>>>> Dave: I love those Gource animations! Game development is my other >>>>>>>>> hobby, I could easily get lost for weeks tweaking the shaders to make the >>>>>>>>> glow "just right". :-) >>>>>>>>> >>>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to >>>>>>>>> look in that direction yet. I'm trying to build a "common grammar" to make >>>>>>>>> it easier to express network layout from integrations; that would be >>>>>>>>> another form/layer of integration and a lot easier to work with once >>>>>>>>> there's a solid foundation. Preseem does some of this (admittedly >>>>>>>>> over-eagerly; nothing needs to query SNMP that often!), and the SNMP route >>>>>>>>> is quite remarkably convoluted. Their support turned on a few "extra" >>>>>>>>> modules to deal with things like PMP450 clients that change MAC when you >>>>>>>>> put them in bridge mode vs NAT mode (and report the bridge mode CPE in some >>>>>>>>> places either way), Elevate CPEs that almost but not quite make sense. >>>>>>>>> Robert's code has the beginnings of some of this, scanning Mikrotik routers >>>>>>>>> for IPv6 allocations by MAC (this is also the hardest part for me to test, >>>>>>>>> since I don't have any v6 to test, currently). 
>>>>>>>>> >>>>>>>>> We tend to use UISP as the "source of truth" and treat it like a >>>>>>>>> database for a ton of external tools (mostly ones we've created). >>>>>>>>> >>>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> we're pretty similar in that we've made UISP a mess. Multiple >>>>>>>>>> paths to a pop. multiple pops on the network. failover between pops. >>>>>>>>>> Lots of 'other' devices. handing out /29 etc to customers. >>>>>>>>>> >>>>>>>>>> Some sort of discovery would be nice. Ideally though, pulling >>>>>>>>>> something from SNMP or router APIs etc to build the paths, but having a >>>>>>>>>> 'network elements' list with each of the links described. ie, backhaul 12 >>>>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that >>>>>>>>>> from discovery. >>>>>>>>>> >>>>>>>>>> I've also thought about doing routine trace routes or watching >>>>>>>>>> TTLs or something like that to get some indication that topology has >>>>>>>>>> changed and then do another discovery and potential tree rebuild. >>>>>>>>>> >>>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS < >>>>>>>>>> libreqos@lists.bufferbloat.net> wrote: >>>>>>>>>> >>>>>>>>>>> This is awesome! Way to go here. Thank you for contributing this. >>>>>>>>>>> Being able to map out these complex integrations will help ISPs >>>>>>>>>>> a ton, and I really like that it is sharing common features between the >>>>>>>>>>> Splynx and UISP integrations. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Robert >>>>>>>>>>> >>>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS < >>>>>>>>>>> libreqos@lists.bufferbloat.net> wrote: >>>>>>>>>>> >>>>>>>>>>>> So I've been doing some work on getting UISP integration (and >>>>>>>>>>>> integrations in general) to work a bit more smoothly. >>>>>>>>>>>> >>>>>>>>>>>> I started by implementing a graph structure that mirrors both >>>>>>>>>>>> the networks and sites system. It's not done yet, but the basics are coming >>>>>>>>>>>> together nicely. You can see my progress so far at: >>>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph >>>>>>>>>>>> >>>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the >>>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular >>>>>>>>>>>> portion of the tree. We have: >>>>>>>>>>>> >>>>>>>>>>>> - Non Ubiquiti devices as "other devices" >>>>>>>>>>>> - Sections that need shaping by subnet (e.g. "all of >>>>>>>>>>>> 192.168.1.0/24 shared 100 mbit") >>>>>>>>>>>> - Bridge mode devices using Option 82 to always allocate >>>>>>>>>>>> the same IP, with a "service IP" entry >>>>>>>>>>>> - Various bits of infrastructure mapped >>>>>>>>>>>> - Sites that go to client sites, which go to other client >>>>>>>>>>>> sites >>>>>>>>>>>> >>>>>>>>>>>> In other words, over the years we've unleashed a bit of a >>>>>>>>>>>> monster. Cleaning it up is a useful talk, but I wanted the integration to >>>>>>>>>>>> be able to handle pathological cases like us! >>>>>>>>>>>> >>>>>>>>>>>> So I fed our network into the current graph generator, and used >>>>>>>>>>>> graphviz to spit out a directed graph: >>>>>>>>>>>> [image: image.png] >>>>>>>>>>>> That doesn't include client sites! Legend: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> - Green = the root site. 
>>>>>>>>>>>> - Red = a site >>>>>>>>>>>> - Blue = an access point >>>>>>>>>>>> - Magenta = a client site that has children >>>>>>>>>>>> >>>>>>>>>>>> So the part in "common" is designed heavily to reduce >>>>>>>>>>>> repetition. When it's done, you should be able to feed in sites, APs, >>>>>>>>>>>> clients, devices, etc. in a pretty flexible manner. Given how much code is >>>>>>>>>>>> shared between the UISP and Splynx integration code, I'm pretty sure both >>>>>>>>>>>> will be cut to a tiny fraction of the total code. :-) >>>>>>>>>>>> >>>>>>>>>>>> I can't post the full tree, it's full of client names. >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> LibreQoS mailing list >>>>>>>>>>>> LibreQoS@lists.bufferbloat.net >>>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Robert Chacón >>>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> LibreQoS mailing list >>>>>>>>>>> LibreQoS@lists.bufferbloat.net >>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>> LibreQoS mailing list >>>>>>>> LibreQoS@lists.bufferbloat.net >>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Robert Chacón >>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> >>>>>>> Dev | LibreQoS.io >>>>>>> >>>>>>> _______________________________________________ >>>>>> LibreQoS mailing list >>>>>> LibreQoS@lists.bufferbloat.net >>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>> >>>>> >>>>> >>>>> -- >>>>> Robert Chacón >>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> >>>>> Dev | LibreQoS.io >>>>> >>>>> _______________________________________________ >>>> LibreQoS mailing list >>>> LibreQoS@lists.bufferbloat.net >>>> https://lists.bufferbloat.net/listinfo/libreqos >>>> >>> >>> >>> -- >>> Robert Chacón >>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> >>> Dev | LibreQoS.io >>> >>> _______________________________________________ >>> LibreQoS mailing list >>> LibreQoS@lists.bufferbloat.net >>> https://lists.bufferbloat.net/listinfo/libreqos >>> >> >> >> -- >> This song goes out to all the folk that thought Stadia would work: >> >> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz >> Dave Täht CEO, TekLibre, LLC >> > -- This song goes out to all the folk that thought Stadia would work: https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz Dave Täht CEO, TekLibre, LLC [-- Attachment #1.2: Type: text/html, Size: 48489 bytes --] [-- Attachment #2: image.png --] [-- Type: image/png, Size: 573568 bytes --] [-- Attachment #3: image.png --] [-- Type: image/png, Size: 115596 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 0:15 ` Dave Taht @ 2022-10-31 1:15 ` Robert Chacón 2022-10-31 1:26 ` Herbert Wolverson 1 sibling, 0 replies; 33+ messages in thread From: Robert Chacón @ 2022-10-31 1:15 UTC (permalink / raw) To: Dave Taht; +Cc: Herbert Wolverson, libreqos

[-- Attachment #1.1: Type: text/plain, Size: 29972 bytes --]

> RTT per AP
> RTT P1 per AP (what's the effective minimum)
> RTT P99 (what's the worst case?)
> RTT variance P1 to P99 per internet IP (worst 20 performers) or AS number or /24

Working on it. RTT per AP is actually already there in v1.3 - graphed in InfluxDB. We just need to keep testing cpumap-pping with more real world traffic. When I tried it today, it worked great for 99% of users, and was very resource efficient. There's a small issue when clients have plans past 500Mbps <https://github.com/thebracket/cpumap-pping/issues/2>, but that's admittedly an edge case for most small ISPs. For now we could implement a toggle in ispConfig.py between xdp-cpumap-tc and cpumap-pping until that's eventually figured out.

> "average" of "what"?

Probably average RTT from end-user households to CDNs and major IXs? I think he means that when running Pollere's PPing (instead of his much faster XDP-based cpumap-pping), PPing was hammering the CPU so hard that it was negatively affecting end-user RTT. The original PPing chokes after 1Gbps or so and uses way too much CPU, which likely hindered the functionality of HTBs and CAKE instances on the same cores.

> On wireless? Vs fiber? (Transiting a fiber network to your pop's edge should take under 2ms.) Wifi hops at the end of the link are probably adding the most delay...

I think they're mostly wireless. 24ms ain't bad at all from the end user to the IX!

On Sun, Oct 30, 2022 at 6:15 PM Dave Taht <dave.taht@gmail.com> wrote: > > > On Sat, Oct 29, 2022 at 6:45 PM Herbert Wolverson <herberticus@gmail.com> > wrote: > >> > For starters, let me also offer praise for this work which is so ahead >> of schedule! >> >> Thank you. I'm enjoying a short period while I wait for my editor to >> finish up with a couple of chapters of my next book (working title More >> Hands-on Rust; it's intermediate to advanced Rust, taught through the lens >> of game development). >> > > Cool. I'm 32 years into my PhD thesis. > > >> >> I think at least initially, the primary focus is on what WISPs are used >> to (and ask for): a fat shaper box that sits between a WISP and their >> Internet connection(s). Usually in the topology: (router connected to >> upstream) <--> (LibreQoS) <--> (core site router, connected to the WISP's >> network as a whole). That's a simplification; there's usually a bypass (in >> case LibreQoS dies, is being updated, etc.), sometimes multiple connections >> that need shaping, etc. That's how Preseem (and the others) tend to insert >> themselves - shape everything on the way out.
That might require a bit of MPLS >> support (I can do the xdp-cpumap-tc part; I'm not sure what the classifier >> does if it receives a packet with the TCP/UDP header stuck behind some MPLS >> headers?), but has the potential to really clean things up. Especially for >> a really busy tower site. (On a similar note, WISPs with multiple Internet >> connections at different sites would benefit from LibreQoS on each of >> them). >> >> Generally, the QoS box doesn't really care what you are running in the >> way of a router. >> > > It is certainly simpler to have a transparent middlebox for this stuff, > initially, and it would take a great leap of faith, > for many, to just plug in a lqos box as the main box... but cumulus did > succeed at a lot of that... they open sourced a bfd daemon... numerous > other tools... > > https://www.nvidia.com/en-us/networking/ethernet-switching/cumulus-linux/ > > >> We run mostly Mikrotik (with a bit of FreeBSD, and a tiny bit of Cisco in >> the mix too!), I know of people who love Juniper, use Cisco, etc. Since >> we're shaping in the "router sandwich" (which can be one router with a bit >> of care), we don't necessarily need to worry too much about their innards. >> >> > An ISP in a SDN shaping whitebox that does all that juniper/cisco stuff, > or a pair perhaps using a fiber optic splitter for failover > > http://www.comlaninc.com/products/fiber-optic-products/id/23/cl-fos > > > > >> With that said, some future SNMP support (please, not polling everything >> all the time... that's a monitoring program's job!) is probably hard to >> avoid. At least that's relatively vendor agnostic (even if Ubiquiti seem to >> be trying to cease supporting it, ugh) >> >> > Building on this initial core strength - sampling RTT - would be a > differentiator. > > Examples: > > RTT per AP > RTT P1 per AP (what's the effective minimum) > RTT P99 (what's the worst case?) > RTT variance P1 to P99 per internet IP (worst 20 performers) or AS number > or /24 > > (variance is a very important concept) > > > > > >> I could see some support for outputting rules for routers, especially if >> the goal is to get Cake managing buffer-bloat in many places down the line. >> >> Incidentally, using my latest build of cpumap-pping (and no separate >> pping running, eating a CPU) my average network latency has dropped to 24ms >> at peak time (from 40ms). At peak time, while pulling 1.8 gbps of real >> customer traffic through the system. :-) >> > > OK, this is something that "triggers" my inner pedant. Forgive me in > advance? > > "average" of "what"? > > Changing the monitoring tool shouldn't have affected the average latency, > unless how it is calculated is different, or the sample > population (more likely) has changed. If you are tracking now far more > short flows, the observed latency will decline, but the > higher latencies you were observing in the first place are still there. > > Also... between where and where? Across the network? To the customer to > their typical set of IP addresses of their servers? > on wireless? vs fiber? ( Transiting a fiber network to your pop's edge > should take under 2ms). Wifi hops at the end of the link are > probably adding the most delay... > > If you consider 24ms "good" - however you calculate - going for ever less > via whatever means can be obtained from these > analyses, is useful. 
But there are some things I don't think make as much > sense as they used to - a netflix cache hitrate must > be so low nowadays as to cost you just as much to fetch it from upstream > than host a box... > > > > >> >> >> >> >> On Sat, Oct 29, 2022 at 2:43 PM Dave Taht <dave.taht@gmail.com> wrote: >> >>> For starters, let me also offer praise for this work which is so ahead >>> of schedule! >>> >>> I am (perhaps cluelessly) thinking about bigger pictures, and still >>> stuck in my mindset involving distributing the packet processing, >>> and representing the network topology, plans and compensating for the >>> physics. >>> >>> So you have a major tower, a separate libreqos instance goes there. Or >>> libreqos outputs rules compatible with mikrotik or vyatta or whatever is >>> there. Or are you basically thinking one device rules them all and off the >>> only interface, shapes them? >>> >>> Or: >>> >>> You have another pop with a separate connection to the internet that you >>> inherited from a buyout, or you wanted physical redundancy for your BGP >>> AS's internet access, maybe just between DCs in the same town or... >>> ____________________________________________ >>> >>> / >>> / >>> cloud -> pop -> customers - customers <- pop <- cloud >>> \ ----- leased fiber or wireless / >>> >>> >>> I'm also a little puzzled as to whats the ISP->internet link? juniper? >>> cisco? mikrotik, and what role and services that is expected to have. >>> >>> >>> >>> On Sat, Oct 29, 2022 at 12:06 PM Robert Chacón via LibreQoS < >>> libreqos@lists.bufferbloat.net> wrote: >>> >>>> > Per your suggestion, devices with no IP addresses (v4 or v6) are not >>>> added. >>>> > Mikrotik "4 to 6" mapping is implemented. I put it in the "common" >>>> side of things, so it can be used in other integrations also. I don't have >>>> a setup on which to test it, but if I'm reading the code right then the >>>> unit test is testing it appropriately. >>>> >>>> Fantastic. >>>> >>>> > excludeSites is supported as a common API feature. If a node is added >>>> with a name that matches an excluded site, it won't be added. The tree >>>> builder is smart enough to replace invalid "parentId" references with the >>>> shaper root, so if you have other tree items that rely on this site - they >>>> will be added to the tree. Was that the intent? (It looks pretty useful; we >>>> have a child site down the tree with a HUGE amount of load, and bumping it >>>> to the top-level with excludeSites would probably help our load balancing >>>> quite a bit) >>>> >>>> Very cool approach, I like it! Yeah we have some cases where we need to >>>> balance out high load child nodes across CPUs so that's perfect. >>>> Originally I thought of it to just exclude sites that don't fit into >>>> the shaped topology but this approach is more useful. >>>> Should we rename excludeSites to moveSitesToTop or something similar? >>>> That functionality of distributing across top level nodes / cpu cores seems >>>> more important anyway. >>>> >>>> >exceptionCPEs is also supported as a common API feature. It simply >>>> overrides the "parentId'' of incoming nodes with the new parent. Another >>>> potentially useful feature; if I got excludeSites the wrong away around, >>>> I'd add a "my_big_site":"" entry to push it to the top. >>>> >>>> Awesome >>>> >>>> > UISP integration now supports a "flat" topology option (set via >>>> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py >>>> to include this entry. >>>> >>>> Nice! 
>> On Sat, Oct 29, 2022 at 2:43 PM Dave Taht <dave.taht@gmail.com> wrote:
>> [earlier quoted messages snipped; they appear in full earlier in the thread]

-- 
Robert Chacón
CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
Dev | LibreQoS.io

[-- Attachment #1.2: Type: text/html, Size: 50992 bytes --]
[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]
[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread
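Dave's per-AP percentile idea above is cheap to prototype once each RTT sample is tagged with the access point it traversed. A rough sketch, assuming a hypothetical samples dict mapping AP name to a list of RTT values in milliseconds (none of this is existing LibreQoS code):

from statistics import quantiles, pvariance

def ap_rtt_report(samples: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Per-AP summary: P1 (effective minimum), P99 (worst case),
    and the P1-to-P99 spread, plus the variance."""
    report = {}
    for ap, rtts in samples.items():
        if len(rtts) < 2:
            continue  # not enough samples to say anything about spread
        cuts = quantiles(rtts, n=100)  # 99 cut points: cuts[0] is P1, cuts[98] is P99
        report[ap] = {
            "p1": cuts[0],
            "p99": cuts[98],
            "p1_p99_spread": cuts[98] - cuts[0],
            "variance": pvariance(rtts),
        }
    return report

def worst_performers(report: dict, n: int = 20) -> list:
    """The 'worst 20 performers' view, sorted by spread."""
    return sorted(report.items(), key=lambda kv: kv[1]["p1_p99_spread"], reverse=True)[:n]

The same grouping works per internet IP, AS number, or /24 by swapping the key used to bucket the samples.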
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 0:15 ` Dave Taht 2022-10-31 1:15 ` Robert Chacón @ 2022-10-31 1:26 ` Herbert Wolverson 2022-10-31 1:36 ` Herbert Wolverson 1 sibling, 1 reply; 33+ messages in thread From: Herbert Wolverson @ 2022-10-31 1:26 UTC (permalink / raw) To: Dave Taht; +Cc: Robert Chacón, libreqos

[-- Attachment #1.1: Type: text/plain, Size: 30812 bytes --]

> "average" of "what"?

Mean TCP RTT, as measured by pping-cpumap. There are two steps of improvement: the original "pping" started to eat a bunch of CPU at higher traffic levels, and I had a feeling - not entirely quantified - that the excess CPU usage was causing some latency. Switching to pping-cpumap showed that my hunch was correct. On top of that, as Robert had observed, the previous version was causing a slight "stutter" when it filled the tracking buffers (and then recovered fine). My most recent build scales the tracking buffers up a LOT - which I was worried would cause some slowdown (since the program is now searching a much larger hashmap space, making it less cache friendly). The buffer increase fixed up the stutter issue. I probably should have been a little clearer about what I was talking about: I'm still trying to figure out the optimal buffer size, and the optimal stats-collection period (collection "resets" the buffers, eliminating any resource depletion).

I'm also experimenting with a few other ideas to keep the measurement latency more consistent. I tried "dump it all into a perfmap and figure it out in userspace", which went spectacularly badly. :-|

The RTT measurements are from the customer to whatever the heck they are using on the Internet. So customers using a slow service that's bottlenecked far outside of my control will negatively affect the results - but there's nothing I can do about that. Coincidentally, it's the same "QoE" metric that Preseem uses - so Preseem-to-LibreQoS refugees (myself included) tend to have a "feel" for it. If I remember rightly, Preseem (which is basically fq-codel queues per customer, with an optional layer of AP queues above) ranks 0-74 ms as "green", 75-100 ms as "yellow", and 100+ ms as "red" - and a lot of WISPs have become used to that grading. I always thought that an average of 70 ms seemed pretty excessive to be "good". The idea is that it's quantifying the customer's *experience* - the lower the average, the snappier the connection "feels". You can have a pretty happy customer with very low latency and a low-speed plan, if they aren't doing anything that needs to exhaust their speed plan. (This contrasts with a lot of other solutions - notably Sandvine - which have always focused heavily on "how much less upstream does the ISP need to buy?")

On Sun, Oct 30, 2022 at 7:15 PM Dave Taht <dave.taht@gmail.com> wrote:
> [earlier quoted messages snipped; they appear in full earlier in the thread]
[-- Attachment #1.2: Type: text/html, Size: 51494 bytes --]
[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]
[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread
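To make the grading and the "collection resets the buffers" idea above concrete, here is a small sketch using the Preseem-style thresholds Herbert quotes. The flow-keyed dict stands in for the kernel-side hashmap; these names are hypothetical, not the actual cpumap-pping structures:

from statistics import fmean

def grade(mean_rtt_ms: float) -> str:
    """Preseem-style traffic-light grading, per the thresholds above."""
    if mean_rtt_ms < 75.0:
        return "green"
    if mean_rtt_ms <= 100.0:
        return "yellow"
    return "red"

class RttTracker:
    """Bounded RTT sample store keyed by flow. collect() both reports
    and clears, so reading the stats is what frees tracking capacity."""

    def __init__(self, max_flows: int = 65536):
        self.max_flows = max_flows
        self.flows: dict[tuple, list[float]] = {}

    def record(self, flow_key: tuple, rtt_ms: float) -> None:
        if flow_key not in self.flows and len(self.flows) >= self.max_flows:
            return  # buffers exhausted: drop the sample instead of stalling
        self.flows.setdefault(flow_key, []).append(rtt_ms)

    def collect(self) -> dict[tuple, tuple[float, str]]:
        stats = {}
        for key, rtts in self.flows.items():
            if rtts:
                mean = fmean(rtts)
                stats[key] = (mean, grade(mean))
        self.flows.clear()  # the "reset" that eliminates resource depletion
        return stats

The tension Herbert describes falls out directly: a bigger max_flows means fewer dropped samples but a larger (less cache-friendly) map to search, and a shorter collection period resets the map more often at the cost of noisier averages.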
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 1:26 ` Herbert Wolverson @ 2022-10-31 1:36 ` Herbert Wolverson 2022-10-31 1:46 ` Herbert Wolverson 0 siblings, 1 reply; 33+ messages in thread From: Herbert Wolverson @ 2022-10-31 1:36 UTC (permalink / raw) Cc: libreqos

[-- Attachment #1.1: Type: text/plain, Size: 32744 bytes --]

At a high level, I've been playing with:

- The brute force approach: have a bigger buffer, so exhaustion is less likely to ever happen.
- A shared "config" flag that turns off monitoring once exhaustion is near - it costs one synchronized lookup/increment, and gets reset when you read the stats.
- Per-CPU buffers for the very volatile data, which is generally faster (at the expense of RAM) - but is also quite hard to manage from userspace. It significantly reduces the likelihood of stalling, but I'm not fond of the complexity so far.
- Replacing the volatile "packet buffer" with a "least recently used" map that automatically gets rid of old data if it isn't cleaned up (the original only cleans up when a TCP connection closes gracefully).
- Maintaining two sets of buffers and keeping a pointer to each. A shared config variable indicates whether we are currently writing to A or B. "Cleanup" cleans the *other* buffer and switches the pointers, so we're never sharing "hot" data with a userland cleanup.

That's a lot to play with, so I'm taking my time. My gut likes the A/B switch, currently.

On Sun, Oct 30, 2022 at 8:26 PM Herbert Wolverson <herberticus@gmail.com> wrote: > > "average" of "what"? > > Mean TCP RTT times, as measured by pping-cpumap. There's two steps of > improvement; the original "pping" started to eat a bunch of CPU at higher > traffic levels, and I had a feeling - not entirely quantified - that the > excess CPU usage was causing some latency. Switching to pping-cpumap showed > that I was correct in my hunch. On top of that,as Robert had observed, the > previous version was causing a slight "stutter" when it filled the tracking > buffers (and then recovered fine). My most recent build scales the tracking > buffers up a LOT - which I was worried would cause some slowdown (since the > program is now searching a much larger hashmap space, making it less cache > friendly). The buffer increase fixed up the stutter issue. I probably > should have been a little more clear on what I was talking about. I'm still > trying to figure out the optimal buffer size, and the optimal stats > collection (which "resets" the buffers, eliminating any resource depletion) > period. > > I'm also experimenting with a few other ideas to keep the measurement > latency more consistent. I tried "dump it all into a perfmap and figure it > out in userspace" which went spectacularly badly. :-| > > The RTT measurements are from the customer to whatever the heck they are > using on the Internet. So customers using a slow service that's > bottlenecked far outside of my control will negatively affect the results - > but there's nothing I can do about that. Coincidentally, it's the same > "QoE" metric that Preseem uses - so Preseem to LibreQoS refugees (myself > included) tend to have a "feel" for it. If I remember rightly, Preseem > (which is basically fq-codel queues per customer, with an optional layer of > AP queues above) ranks 0-74 ms as "green", 75-100 ms a "yellow" and 100+ ms > as "red" - and a lot of WISPs have become used to that grading. I always > thought that an average of 70ms seemed pretty excessive to be "good". 
[-- Attachment #1.2: Type: text/html, Size: 53131 bytes --]
[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]
[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 1:36 ` Herbert Wolverson @ 2022-10-31 1:46 ` Herbert Wolverson 2022-10-31 2:21 ` Dave Taht 0 siblings, 1 reply; 33+ messages in thread From: Herbert Wolverson @ 2022-10-31 1:46 UTC (permalink / raw) Cc: libreqos

[-- Attachment #1.1: Type: text/plain, Size: 35053 bytes --]

While I remember, a quick Preseem anecdote. The majority of WISPs I've talked to who have adopted Preseem run it in "monitor only" mode for a bit, and then turn it on. That way, you can see that it did something. Not a bad idea for us to support.

It's *remarkable* how many WISPs see a sea of red when they first start monitoring - 100ms+ RTT times (for whatever customer traffic exists) are pretty common. Just enabling FQ_CODEL, mapped to the customer's speed limit, tends to start bringing things down into the green/yellow. I begged them for Cake a few times (along with the ability to set site/backhaul hierarchies) - and was always told "it's not worth the extra CPU load".

Our experience, turning on BracketQoS (which is basically LibreQoS, in Rust and designed for our network), was that the remaining reds became yellows, the remaining yellows became greens, and customers reported a "snappier" experience. It's so hard to quantify the latter. I could feel the difference at my desk; fire up a video while a download was running, and it simply "felt" like it responded better. TCP RTT times are the best measure of "feel" I've found, so far.

We've tended to go with "median" latency as a guide, rather than mean. Thanks to monitoring things beyond our control, some of the outliers tend to be *really bad* - even if the network is fine. There's literally nothing we can do about a customer trying to work with a malfunctioning system somewhere (in space, for all I know!)
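A minimal sketch of the median-versus-mean point, with made-up numbers - a single malfunctioning far-end server drags the mean badly, while the median stays close to what customers actually feel:

from statistics import mean, median

# Hypothetical TCP RTT samples (ms) for one customer; the 900 ms
# outlier is a broken far-end server we can do nothing about.
rtt_ms = [12, 14, 15, 16, 18, 22, 25, 900]

print(mean(rtt_ms))    # 127.75 - looks like a network problem
print(median(rtt_ms))  # 17.0   - matches the actual experience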
[-- Attachment #1.2: Type: text/html, Size: 55125 bytes --]
[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]
[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 1:46 ` Herbert Wolverson @ 2022-10-31 2:21 ` Dave Taht 2022-10-31 3:26 ` Robert Chacón ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Dave Taht @ 2022-10-31 2:21 UTC (permalink / raw) To: Herbert Wolverson; +Cc: libreqos

[-- Attachment #1: Type: text/plain, Size: 180 bytes --]

How about the idea of "metaverse-ready" metrics, with one table that is preseem-like and another that's:

blue = < 8ms
green = < 20ms
yellow = < 50ms
orange = < 70ms
red = > 70ms

[-- Attachment #2: Type: text/html, Size: 327 bytes --]

^ permalink raw reply [flat|nested] 33+ messages in thread
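As a sketch, that grading is a few lines of Python - the thresholds are the ones proposed above; the function name is illustrative, and the boundary cases (exactly 70 ms, say) aren't pinned down in the proposal, so this version counts them in the worse band:

def grade_rtt(rtt_ms):
    """Map a TCP RTT sample (in ms) to the proposed color bands."""
    if rtt_ms >= 70:
        return "red"
    if rtt_ms >= 50:
        return "orange"
    if rtt_ms >= 20:
        return "yellow"
    if rtt_ms >= 8:
        return "green"
    return "blue"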
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 2:21 ` Dave Taht @ 2022-10-31 3:26 ` Robert Chacón 2022-10-31 14:47 ` [LibreQoS] metaverse-ready metrics Dave Taht 2022-10-31 15:56 ` [LibreQoS] Integration system, aka fun with graph theory dan 2 siblings, 0 replies; 33+ messages in thread From: Robert Chacón @ 2022-10-31 3:26 UTC (permalink / raw) To: Dave Taht; +Cc: Herbert Wolverson, libreqos [-- Attachment #1: Type: text/plain, Size: 2067 bytes --] > That's a lot to play with, so I'm taking my time. My gut likes the A/B switch, currently. Take your time, I'm just thrilled to see this working so well so far. > I could feel the difference at my desk; fire up a video while a download was running, and it simply "felt" like it responded better. TCP RTT times are the best measure of "feel" I've found, so far. I've experienced the same when our network switched from LibreQoS using fq_codel to LibreQoS using CAKE. Really hard to quantify it but the "snappiness" or "feel" is noticeable to end-users. > We've tended to go with "median" latency as a guide, rather than mean. Thanks to monitoring things beyond our control, some of the outliers tend to be *really bad* - even if the network is fine. There's literally nothing we can do about a customer trying to work with a malfunctioning system somewhere (in space, for all I know!) True. And it can be sort of helpful for troubleshooting WiFi latency issues and bottlenecks inside the home and such. > "monitor only" mode Perhaps we can use ePPing just for this aspect? Or instead we could use cpumap-pping but with all HTB classes set to high rates (no plan enforcement) and no CAKE leafs. > How about the idea of "metaverse-ready" metrics, with one table that is preseem-like and another that's Good idea. I've now added both a standard (preseem like) table and "metaverse-ready" table of Node (AP) TCP Latency on the InfluxDB template. On Sun, Oct 30, 2022 at 8:21 PM Dave Taht via LibreQoS < libreqos@lists.bufferbloat.net> wrote: > How about the idea of "metaverse-ready" metrics, with one table that is > preseem-like and another that's > > blue = < 8ms > green = < 20ms > yellow = < 50ms > orange = < 70ms > red = > 70ms > > _______________________________________________ > LibreQoS mailing list > LibreQoS@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/libreqos > -- Robert Chacón CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> Dev | LibreQoS.io [-- Attachment #2: Type: text/html, Size: 3217 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
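On the median-versus-mean point quoted above, a toy example of why a single stuck client skews the mean but barely moves the median (plain Python, illustrative numbers only):

    from statistics import mean, median

    # 19 healthy samples plus one pathological outlier (a malfunctioning
    # system somewhere - in space, for all we know)
    rtts_ms = [22, 25, 24, 23, 26, 25, 24, 22, 23, 25,
               24, 26, 23, 25, 24, 22, 26, 25, 23, 900]

    print(mean(rtts_ms))    # 67.85 - suggests the AP is in trouble
    print(median(rtts_ms))  # 24.0  - reflects the typical customer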
* [LibreQoS] metaverse-ready metrics 2022-10-31 2:21 ` Dave Taht 2022-10-31 3:26 ` Robert Chacón @ 2022-10-31 14:47 ` Dave Taht 2022-10-31 14:50 ` Dave Taht 2022-10-31 15:56 ` [LibreQoS] Integration system, aka fun with graph theory dan 2 siblings, 1 reply; 33+ messages in thread From: Dave Taht @ 2022-10-31 14:47 UTC (permalink / raw) To: Herbert Wolverson; +Cc: libreqos On Sun, Oct 30, 2022 at 7:21 PM Dave Taht <dave.taht@gmail.com> wrote: > > How about the idea of "metaverse-ready" metrics, with one table that is preseem-like and another that's aquamarine = < 3.2ms - this is as low as it is possible to measure, as tcp timestamps are in ms. blue = < 8ms green = < 20ms yellow = < 50ms orange = < 70ms red = > 70ms mordor-red > 120ms is there a truly ugly tone of red, blackish, ugly as sin? (mordor-red) This above is almost but not quite, a : https://en.wikipedia.org/wiki/Seven-number_summary > -- This song goes out to all the folk that thought Stadia would work: https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz Dave Täht CEO, TekLibre, LLC ^ permalink raw reply [flat|nested] 33+ messages in thread
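The seven-number summary linked above is just a fixed set of percentiles; a sketch of producing one from collected RTT samples (illustrative, not part of any LibreQoS module - the common variant uses the 2nd, 9th, 25th, 50th, 75th, 91st and 98th percentiles):

    import numpy as np

    def seven_number_summary(rtts_ms):
        # 2nd/98th and 9th/91st percentiles bracket the tails;
        # the quartiles and median describe the bulk of the samples.
        return np.percentile(rtts_ms, [2, 9, 25, 50, 75, 91, 98])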
* Re: [LibreQoS] metaverse-ready metrics 2022-10-31 14:47 ` [LibreQoS] metaverse-ready metrics Dave Taht @ 2022-10-31 14:50 ` Dave Taht 0 siblings, 0 replies; 33+ messages in thread From: Dave Taht @ 2022-10-31 14:50 UTC (permalink / raw) To: Herbert Wolverson, Andrew McGregor; +Cc: libreqos Andrew?, I can't remember or find the name of that algebra and distribution you were so hot on 5? 8? years ago, that influenced bbr. On Mon, Oct 31, 2022 at 7:47 AM Dave Taht <dave.taht@gmail.com> wrote: > > On Sun, Oct 30, 2022 at 7:21 PM Dave Taht <dave.taht@gmail.com> wrote: > > > > How about the idea of "metaverse-ready" metrics, with one table that is preseem-like and another that's > > aquamarine = < 3.2ms - this is as low as it is possible to measure, as > tcp timestamps are in ms. > blue = < 8ms > green = < 20ms > yellow = < 50ms > orange = < 70ms > red = > 70ms > mordor-red > 120ms > > is there a truly ugly tone of red, blackish, ugly as sin? (mordor-red) > > This above is almost but not quite, a : > https://en.wikipedia.org/wiki/Seven-number_summary > > > > > > > -- > This song goes out to all the folk that thought Stadia would work: > https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz > Dave Täht CEO, TekLibre, LLC -- This song goes out to all the folk that thought Stadia would work: https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz Dave Täht CEO, TekLibre, LLC ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 2:21 ` Dave Taht 2022-10-31 3:26 ` Robert Chacón 2022-10-31 14:47 ` [LibreQoS] metaverse-ready metrics Dave Taht @ 2022-10-31 15:56 ` dan 2022-10-31 21:19 ` Herbert Wolverson 2 siblings, 1 reply; 33+ messages in thread From: dan @ 2022-10-31 15:56 UTC (permalink / raw) To: Dave Taht; +Cc: Herbert Wolverson, libreqos [-- Attachment #1: Type: text/plain, Size: 746 bytes --] On Sun, Oct 30, 2022 at 8:21 PM Dave Taht via LibreQoS < libreqos@lists.bufferbloat.net> wrote: > How about the idea of "metaverse-ready" metrics, with one table that is > preseem-like and another that's > > blue = < 8ms > green = < 20ms > yellow = < 50ms > orange = < 70ms > red = > 70ms > These need to be configurable. There are a lot of wisps that would have everything orange/red. We're considering anything under 100ms good on the rural plans. Also keep in mind that if you're tracking latency via pping etc, then you need some buffer in there for the internet at large. <70ms to Amazon is one thing, they're very well connected, but <70ms to most of the internet probably isn't very realistic and would make most charts look like poop. [-- Attachment #2: Type: text/html, Size: 1217 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 15:56 ` [LibreQoS] Integration system, aka fun with graph theory dan @ 2022-10-31 21:19 ` Herbert Wolverson 2022-10-31 21:54 ` Dave Taht ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Herbert Wolverson @ 2022-10-31 21:19 UTC (permalink / raw) Cc: libreqos [-- Attachment #1: Type: text/plain, Size: 3362 bytes --] I'd agree with color coding (when it exists - no rush, IMO) being configurable. From the "how much delay are we adding" discussion earlier, I thought I'd do a little bit of profiling of the BPF programs themselves. This is with the latest round of performance updates ( https://github.com/thebracket/cpumap-pping/issues/2), so it's not measuring anything in production. I simply added a call to get the clock at the start, and again at the end - and log the difference. Measuring both XDP and TC BPF programs. (Execution goes (packet arrives)->(XDP cpumap sends it to the right CPU)->(egress)->(TC sends it to the right classifier, on the correct CPU and measures RTT latency). This is adding about two clock checks and a debug log entry to execution time, so measuring it is slowing it down. The results are interesting, and mostly tell me to try a different measurement system. I'm seeing a pretty wide variance. Hammering it with an iperf session and a queue capped at 5 gbit/s: most of the TC timings were 40 nanoseconds - not a packet that requires extra tracking, already in cache, so proceed. When the TCP RTT tracker fired and recorded a performance event, it peaked at 5,900 nanoseconds. So the tc xdp program seems to be adding a worst-case of 0.0059 ms to packet times. The XDP side of things is typically in the 300-400 nanosecond range, I saw a handful of worst-case numbers in the 3400 nanosecond range. So the XDP side is adding 0.00349 ms. So - assuming worst case (and keeping the overhead added by the not-so-great monitoring), we're adding *0.0093 ms* to packet transit time with the BPF programs. With a much more sedate queue (ceiling 500 mbit/s), I saw much more consistent numbers. The vast majority of XDP timings were in the 75-150 nanosecond range, and TC was a consistent 50-55 nanoseconds when it didn't have an update to perform - peaking very occasionally at 1500 nanoseconds. Only adding 0.00155 ms to packet times is pretty good. It definitely performs best on long streams, probably because the previous lookups are all in cache. This is also making me question the answer I found to "how long does it take to read the clock?" I'd seen ballpark estimates of 53 nanoseconds. Given that this reads the clock twice, that can't be right. (I'm *really* not sure how to measure that one) Again - not a great test (I'll have to learn the perf system to do this properly - which in turn opens up the potential for flame graphs and some proper tracing). Interesting ballpark, though. On Mon, Oct 31, 2022 at 10:56 AM dan <dandenson@gmail.com> wrote: > > > On Sun, Oct 30, 2022 at 8:21 PM Dave Taht via LibreQoS < > libreqos@lists.bufferbloat.net> wrote: > >> How about the idea of "metaverse-ready" metrics, with one table that is >> preseem-like and another that's >> >> blue = < 8ms >> green = < 20ms >> yellow = < 50ms >> orange = < 70ms >> red = > 70ms >> > > These need configurable. There are a lot of wisps that would have > everything orange/red. We're considering anything under 100ms good on the > rural plans. 
Also keep in mind that if you're tracking latence via pping > etc, then you need some buffer in there for the internet at large. <70ms > to Amazon is one thing, they're very well connected, but <70ms to most of > the internet isn't probably very realistic and would make most charts look > like poop. > [-- Attachment #2: Type: text/html, Size: 4379 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
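Not the actual measurement harness, but a sketch of how per-packet nanosecond deltas like the ones described in the profiling message above could be summarized once pulled out of the trace log (assumes a plain list of integers):

    import statistics

    def summarize_ns(deltas_ns):
        # Convert logged BPF probe timings into figures like those quoted above.
        deltas_ns = sorted(deltas_ns)
        return {
            "median_ms": statistics.median(deltas_ns) / 1e6,
            "p99_ms": deltas_ns[int(len(deltas_ns) * 0.99)] / 1e6,
            "worst_ms": deltas_ns[-1] / 1e6,
        }

    # e.g. the sedate 500 mbit/s run - mostly 50-55 ns TC samples with rare
    # 1500 ns peaks - would report worst_ms = 0.0015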
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 21:19 ` Herbert Wolverson @ 2022-10-31 21:54 ` Dave Taht 2022-10-31 21:57 ` Robert Chacón 2022-11-01 3:31 ` Dave Taht 2 siblings, 0 replies; 33+ messages in thread From: Dave Taht @ 2022-10-31 21:54 UTC (permalink / raw) To: Herbert Wolverson; +Cc: libreqos glibc added a vdso mapping directly to the kernel time page, so gettimeofday is not a syscall, and the results in the linux 4.0-4.2 era were in the 40ns range. Last I looked, musl used the syscall, which was much, much worse. ^ permalink raw reply [flat|nested] 33+ messages in thread
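Relatedly, the "how long does it take to read the clock?" question is easy to ballpark from userspace, with the caveat that this exercises the vdso path Dave describes (not bpf_ktime_get_ns) and includes Python call overhead, so it only gives a rough upper bound:

    import time

    N = 1_000_000
    start = time.perf_counter_ns()
    for _ in range(N):
        time.monotonic_ns()  # the clock read under test
    elapsed = time.perf_counter_ns() - start
    print(f"~{elapsed / N:.0f} ns per clock read (loop overhead included)")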
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 21:19 ` Herbert Wolverson 2022-10-31 21:54 ` Dave Taht @ 2022-10-31 21:57 ` Robert Chacón 2022-10-31 23:31 ` dan 2022-11-01 3:31 ` Dave Taht 2 siblings, 1 reply; 33+ messages in thread From: Robert Chacón @ 2022-10-31 21:57 UTC (permalink / raw) To: Herbert Wolverson; +Cc: libreqos [-- Attachment #1: Type: text/plain, Size: 4841 bytes --] > I'd agree with color coding (when it exists - no rush, IMO) being configurable. Thankfully it will be configurable, and easily, through the InfluxDB interface. Any operator will be able to click the Gear icon above the tables and set the thresholds to whatever is desired. I've set it to include both a standard table and "metaverse-ready" table based on Dave's threshold recommendations. - Standard (Preseem like) - green = < 75 ms - yellow = < 100 ms - red = > 100 ms - Metaverse-Ready - blue = < 8ms - green = < 20ms - yellow = < 50ms - orange = < 70ms - red = > 70ms Are the defaults here reasonable at least? Should we change the Standard table thresholds a bit? > Only adding 0.00155 ms to packet times is pretty good. Agreed! That's excellent. Great work on this so far; it's looking like you're making tremendous progress. On Mon, Oct 31, 2022 at 3:20 PM Herbert Wolverson via LibreQoS < libreqos@lists.bufferbloat.net> wrote: > I'd agree with color coding (when it exists - no rush, IMO) being > configurable. > > From the "how much delay are we adding" discussion earlier, I thought I'd > do a little bit of profiling of the BPF programs themselves. This is with > the latest round of performance updates ( > https://github.com/thebracket/cpumap-pping/issues/2), so it's not > measuring anything in production. I simply added a call to get the clock at > the start, and again at the end - and log the difference. Measuring both > XDP and TC BPF programs. (Execution goes (packet arrives)->(XDP cpumap > sends it to the right CPU)->(egress)->(TC sends it to the right classifier, > on the correct CPU and measures RTT latency). This is adding about two > clock checks and a debug log entry to execution time, so measuring it is > slowing it down. > > The results are interesting, and mostly tell me to try a different > measurement system. I'm seeing a pretty wide variance. Hammering it with an > iperf session and a queue capped at 5 gbit/s: most of the TC timings were > 40 nanoseconds - not a packet that requires extra tracking, already in > cache, so proceed. When the TCP RTT tracker fired and recorded a > performance event, it peaked at 5,900 nanoseconds. So the tc xdp program > seems to be adding a worst-case of 0.0059 ms to packet times. The XDP side > of things is typically in the 300-400 nanosecond range, I saw a handful of > worst-case numbers in the 3400 nanosecond range. So the XDP side is adding > 0.00349 ms. So - assuming worst case (and keeping the overhead added by the > not-so-great monitoring), we're adding *0.0093 ms* to packet transit time > with the BPF programs. > > With a much more sedate queue (ceiling 500 mbit/s), I saw much more > consistent numbers. The vast majority of XDP timings were in the 75-150 > nanosecond range, and TC was a consistent 50-55 nanoseconds when it didn't > have an update to perform - peaking very occasionally at 1500 nanoseconds. > Only adding 0.00155 ms to packet times is pretty good. > > It definitely performs best on long streams, probably because the previous > lookups are all in cache. 
This is also making me question the answer I > found to "how long does it take to read the clock?" I'd seen ballpark > estimates of 53 nanoseconds. Given that this reads the clock twice, that > can't be right. (I'm *really* not sure how to measure that one) > > Again - not a great test (I'll have to learn the perf system to do this > properly - which in turn opens up the potential for flame graphs and some > proper tracing). Interesting ballpark, though. > > On Mon, Oct 31, 2022 at 10:56 AM dan <dandenson@gmail.com> wrote: > >> >> >> On Sun, Oct 30, 2022 at 8:21 PM Dave Taht via LibreQoS < >> libreqos@lists.bufferbloat.net> wrote: >> >>> How about the idea of "metaverse-ready" metrics, with one table that is >>> preseem-like and another that's >>> >>> blue = < 8ms >>> green = < 20ms >>> yellow = < 50ms >>> orange = < 70ms >>> red = > 70ms >>> >> >> These need configurable. There are a lot of wisps that would have >> everything orange/red. We're considering anything under 100ms good on the >> rural plans. Also keep in mind that if you're tracking latence via pping >> etc, then you need some buffer in there for the internet at large. <70ms >> to Amazon is one thing, they're very well connected, but <70ms to most of >> the internet isn't probably very realistic and would make most charts look >> like poop. >> > _______________________________________________ > LibreQoS mailing list > LibreQoS@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/libreqos > -- Robert Chacón CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> Dev | LibreQoS.io [-- Attachment #2: Type: text/html, Size: 6584 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 21:57 ` Robert Chacón @ 2022-10-31 23:31 ` dan 2022-10-31 23:45 ` Dave Taht 0 siblings, 1 reply; 33+ messages in thread From: dan @ 2022-10-31 23:31 UTC (permalink / raw) To: Robert Chacón; +Cc: Herbert Wolverson, libreqos [-- Attachment #1: Type: text/plain, Size: 5940 bytes --] Preseem's numbers are 0-74 green, 75-124 yellow, 125-200 red, and they just consolidate everything >200 to 200, basically so there's no 'terrible' color lol. I think these numbers are reasonable for standard internet service these days, for a 'default' value anyway. >100ms isn't bad service for most people, and most wisps will have a LOT of traffic coming through with >100ms from the far reaches of the internet. Maybe just reasonable defaults like preseem uses for integrated 'generic' tracking, but then have a separate graph hitting some target services. i.e., try to get game servers on there, AWS, Cloudflare, Azure, Google cloud. Show a radar graphic or similar. On Mon, Oct 31, 2022 at 3:57 PM Robert Chacón via LibreQoS < libreqos@lists.bufferbloat.net> wrote: > > I'd agree with color coding (when it exists - no rush, IMO) being > configurable. > > Thankfully it will be configurable, and easily, through the InfluxDB > interface. > Any operator will be able to click the Gear icon above the tables and set > the thresholds to whatever is desired. > I've set it to include both a standard table and "metaverse-ready" table > based on Dave's threshold recommendations. > > - Standard (Preseem like) > - green = < 75 ms > - yellow = < 100 ms > - red = > 100 ms > - Metaverse-Ready > - blue = < 8ms > - green = < 20ms > - yellow = < 50ms > - orange = < 70ms > - red = > 70ms > > Are the defaults here reasonable at least? Should we change the Standard > table thresholds a bit? > > > Only adding 0.00155 ms to packet times is pretty good. > > Agreed! That's excellent. Great work on this so far; it's looking like > you're making tremendous progress. > > On Mon, Oct 31, 2022 at 3:20 PM Herbert Wolverson via LibreQoS < > libreqos@lists.bufferbloat.net> wrote: > >> I'd agree with color coding (when it exists - no rush, IMO) being >> configurable. >> >> From the "how much delay are we adding" discussion earlier, I thought I'd >> do a little bit of profiling of the BPF programs themselves. This is with >> the latest round of performance updates ( >> https://github.com/thebracket/cpumap-pping/issues/2), so it's not >> measuring anything in production. I simply added a call to get the clock at >> the start, and again at the end - and log the difference. Measuring both >> XDP and TC BPF programs. (Execution goes (packet arrives)->(XDP cpumap >> sends it to the right CPU)->(egress)->(TC sends it to the right classifier, >> on the correct CPU and measures RTT latency). This is adding about two >> clock checks and a debug log entry to execution time, so measuring it is >> slowing it down. >> >> The results are interesting, and mostly tell me to try a different >> measurement system. I'm seeing a pretty wide variance. Hammering it with an >> iperf session and a queue capped at 5 gbit/s: most of the TC timings were >> 40 nanoseconds - not a packet that requires extra tracking, already in >> cache, so proceed. When the TCP RTT tracker fired and recorded a >> performance event, it peaked at 5,900 nanoseconds. So the tc xdp program >> seems to be adding a worst-case of 0.0059 ms to packet times. 
The XDP side >> of things is typically in the 300-400 nanosecond range, I saw a handful of >> worst-case numbers in the 3400 nanosecond range. So the XDP side is adding >> 0.00349 ms. So - assuming worst case (and keeping the overhead added by the >> not-so-great monitoring), we're adding *0.0093 ms* to packet transit >> time with the BPF programs. >> >> With a much more sedate queue (ceiling 500 mbit/s), I saw much more >> consistent numbers. The vast majority of XDP timings were in the 75-150 >> nanosecond range, and TC was a consistent 50-55 nanoseconds when it didn't >> have an update to perform - peaking very occasionally at 1500 nanoseconds. >> Only adding 0.00155 ms to packet times is pretty good. >> >> It definitely performs best on long streams, probably because the >> previous lookups are all in cache. This is also making me question the >> answer I found to "how long does it take to read the clock?" I'd seen >> ballpark estimates of 53 nanoseconds. Given that this reads the clock >> twice, that can't be right. (I'm *really* not sure how to measure that one) >> >> Again - not a great test (I'll have to learn the perf system to do this >> properly - which in turn opens up the potential for flame graphs and some >> proper tracing). Interesting ballpark, though. >> >> On Mon, Oct 31, 2022 at 10:56 AM dan <dandenson@gmail.com> wrote: >> >>> >>> >>> On Sun, Oct 30, 2022 at 8:21 PM Dave Taht via LibreQoS < >>> libreqos@lists.bufferbloat.net> wrote: >>> >>>> How about the idea of "metaverse-ready" metrics, with one table that is >>>> preseem-like and another that's >>>> >>>> blue = < 8ms >>>> green = < 20ms >>>> yellow = < 50ms >>>> orange = < 70ms >>>> red = > 70ms >>>> >>> >>> These need configurable. There are a lot of wisps that would have >>> everything orange/red. We're considering anything under 100ms good on the >>> rural plans. Also keep in mind that if you're tracking latence via pping >>> etc, then you need some buffer in there for the internet at large. <70ms >>> to Amazon is one thing, they're very well connected, but <70ms to most of >>> the internet isn't probably very realistic and would make most charts look >>> like poop. >>> >> _______________________________________________ >> LibreQoS mailing list >> LibreQoS@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/libreqos >> > > > -- > Robert Chacón > CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> > Dev | LibreQoS.io > > _______________________________________________ > LibreQoS mailing list > LibreQoS@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/libreqos > [-- Attachment #2: Type: text/html, Size: 8033 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 23:31 ` dan @ 2022-10-31 23:45 ` Dave Taht 0 siblings, 0 replies; 33+ messages in thread From: Dave Taht @ 2022-10-31 23:45 UTC (permalink / raw) To: dan; +Cc: Robert Chacón, libreqos On Mon, Oct 31, 2022 at 4:32 PM dan via LibreQoS <libreqos@lists.bufferbloat.net> wrote: > > Preseem's numbers are 0-74 green, 75-124 yellow, 125-200 red, and they just consolidate everything >200 to 200, basically so there's no 'terrible' color lol. I am sorry to hear those numbers are considered to be good. My numbers are based on human factors research, some of which is cited here: https://gettys.wordpress.com/2013/07/10/low-latency-requires-smart-queuing-traditional-aqm-is-not-enough/ > I think these numbers are reasonable for standard internet service these days, for a 'default' value anyway. >100ms isn't bad service for most people, and most wisps will have a LOT of traffic coming through with >100ms from the far reaches of the internet. I'm puzzled, actually. Given the rise of CDNs I would expect most internet connections to the ISP to have far less than 60ms latency at this point. Google is typically 2ms away from most fiber in the EU, for example. Very few transactions go to the far reaches of the planet anymore, but I do lack real world data on that. > > Maybe just reasonable defaults like preseem uses for integrated 'generic' tracking, but then have a separate graph hitting some target services. i.e., try to get game servers on there, AWS, Cloudflare, Azure, Google cloud. Show a radar graphic or similar. My thought for slices of the data (2nd tier support and CTO level) would be: ISP infrastructure (aquamarine, less than 3ms); first hop infrastructure (blue, less than 8ms); ISP -> customer, 10-20ms (green) for wired, much worse for wifi; customer to world, ideally sub 50ms. I can certainly agree that the metaverse metrics are scary given the state of things you describe, but the 8ms figure is the bare minimum to have an acceptable experience in that virtual world. > > On Mon, Oct 31, 2022 at 3:57 PM Robert Chacón via LibreQoS <libreqos@lists.bufferbloat.net> wrote: >> >> > I'd agree with color coding (when it exists - no rush, IMO) being configurable. >> >> Thankfully it will be configurable, and easily, through the InfluxDB interface. >> Any operator will be able to click the Gear icon above the tables and set the thresholds to whatever is desired. >> I've set it to include both a standard table and "metaverse-ready" table based on Dave's threshold recommendations. >> >> Standard (Preseem like) >> >> green = < 75 ms >> yellow = < 100 ms >> red = > 100 ms >> >> Metaverse-Ready aquamarine <= 3ms >> blue = < 8ms >> green = < 20ms >> yellow = < 50ms >> orange = < 70ms >> red = > 70ms mordor-red = >100ms >> Are the defaults here reasonable at least? Should we change the Standard table thresholds a bit? Following exactly Preseem's current breakdown seems best for the "preseem" table. Calling it "standard", kind of requires actual standards. >> >> > Only adding 0.00155 ms to packet times is pretty good. >> >> Agreed! That's excellent. Great work on this so far; it's looking like you're making tremendous progress. >> >> On Mon, Oct 31, 2022 at 3:20 PM Herbert Wolverson via LibreQoS <libreqos@lists.bufferbloat.net> wrote: >>> >>> I'd agree with color coding (when it exists - no rush, IMO) being configurable. 
>>> >>> From the "how much delay are we adding" discussion earlier, I thought I'd do a little bit of profiling of the BPF programs themselves. This is with the latest round of performance updates (https://github.com/thebracket/cpumap-pping/issues/2), so it's not measuring anything in production. I simply added a call to get the clock at the start, and again at the end - and log the difference. Measuring both XDP and TC BPF programs. (Execution goes (packet arrives)->(XDP cpumap sends it to the right CPU)->(egress)->(TC sends it to the right classifier, on the correct CPU and measures RTT latency). This is adding about two clock checks and a debug log entry to execution time, so measuring it is slowing it down. >>> >>> The results are interesting, and mostly tell me to try a different measurement system. I'm seeing a pretty wide variance. Hammering it with an iperf session and a queue capped at 5 gbit/s: most of the TC timings were 40 nanoseconds - not a packet that requires extra tracking, already in cache, so proceed. When the TCP RTT tracker fired and recorded a performance event, it peaked at 5,900 nanoseconds. So the tc xdp program seems to be adding a worst-case of 0.0059 ms to packet times. The XDP side of things is typically in the 300-400 nanosecond range, I saw a handful of worst-case numbers in the 3400 nanosecond range. So the XDP side is adding 0.00349 ms. So - assuming worst case (and keeping the overhead added by the not-so-great monitoring), we're adding 0.0093 ms to packet transit time with the BPF programs. >>> >>> With a much more sedate queue (ceiling 500 mbit/s), I saw much more consistent numbers. The vast majority of XDP timings were in the 75-150 nanosecond range, and TC was a consistent 50-55 nanoseconds when it didn't have an update to perform - peaking very occasionally at 1500 nanoseconds. Only adding 0.00155 ms to packet times is pretty good. >>> >>> It definitely performs best on long streams, probably because the previous lookups are all in cache. This is also making me question the answer I found to "how long does it take to read the clock?" I'd seen ballpark estimates of 53 nanoseconds. Given that this reads the clock twice, that can't be right. (I'm *really* not sure how to measure that one) >>> >>> Again - not a great test (I'll have to learn the perf system to do this properly - which in turn opens up the potential for flame graphs and some proper tracing). Interesting ballpark, though. >>> >>> On Mon, Oct 31, 2022 at 10:56 AM dan <dandenson@gmail.com> wrote: >>>> >>>> >>>> >>>> On Sun, Oct 30, 2022 at 8:21 PM Dave Taht via LibreQoS <libreqos@lists.bufferbloat.net> wrote: >>>>> >>>>> How about the idea of "metaverse-ready" metrics, with one table that is preseem-like and another that's >>>>> >>>>> blue = < 8ms >>>>> green = < 20ms >>>>> yellow = < 50ms >>>>> orange = < 70ms >>>>> red = > 70ms >>>> >>>> >>>> These need configurable. There are a lot of wisps that would have everything orange/red. We're considering anything under 100ms good on the rural plans. Also keep in mind that if you're tracking latence via pping etc, then you need some buffer in there for the internet at large. <70ms to Amazon is one thing, they're very well connected, but <70ms to most of the internet isn't probably very realistic and would make most charts look like poop. 
>>> >>> _______________________________________________ >>> LibreQoS mailing list >>> LibreQoS@lists.bufferbloat.net >>> https://lists.bufferbloat.net/listinfo/libreqos >> >> >> >> -- >> Robert Chacón >> CEO | JackRabbit Wireless LLC >> Dev | LibreQoS.io >> >> _______________________________________________ >> LibreQoS mailing list >> LibreQoS@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/libreqos > > _______________________________________________ > LibreQoS mailing list > LibreQoS@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/libreqos -- This song goes out to all the folk that thought Stadia would work: https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz Dave Täht CEO, TekLibre, LLC ^ permalink raw reply [flat|nested] 33+ messages in thread
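Dave's per-segment budgets could also be expressed as data; a hypothetical sketch (segment names and thresholds taken from his breakdown above, everything else invented):

    # Per-segment latency budgets, in ms; purely illustrative.
    SEGMENT_BUDGETS_MS = {
        "isp_infrastructure": 3,   # aquamarine
        "first_hop": 8,            # blue
        "isp_to_customer": 20,     # green, for wired; wifi is much worse
        "customer_to_world": 50,
    }

    def over_budget(segment: str, rtt_ms: float) -> bool:
        return rtt_ms > SEGMENT_BUDGETS_MS[segment]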
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-31 21:19 ` Herbert Wolverson 2022-10-31 21:54 ` Dave Taht 2022-10-31 21:57 ` Robert Chacón @ 2022-11-01 3:31 ` Dave Taht 2022-11-01 13:38 ` Herbert Wolverson 2 siblings, 1 reply; 33+ messages in thread From: Dave Taht @ 2022-11-01 3:31 UTC (permalink / raw) To: Herbert Wolverson; +Cc: libreqos Calling rdtsc directly used to be even faster than gettimeofday https://github.com/dtaht/libv6/blob/master/erm/includes/get_cycles.h On Mon, Oct 31, 2022 at 2:20 PM Herbert Wolverson via LibreQoS <libreqos@lists.bufferbloat.net> wrote: > > I'd agree with color coding (when it exists - no rush, IMO) being configurable. > > From the "how much delay are we adding" discussion earlier, I thought I'd do a little bit of profiling of the BPF programs themselves. This is with the latest round of performance updates (https://github.com/thebracket/cpumap-pping/issues/2), so it's not measuring anything in production. I simply added a call to get the clock at the start, and again at the end - and log the difference. Measuring both XDP and TC BPF programs. (Execution goes (packet arrives)->(XDP cpumap sends it to the right CPU)->(egress)->(TC sends it to the right classifier, on the correct CPU and measures RTT latency). This is adding about two clock checks and a debug log entry to execution time, so measuring it is slowing it down. > > The results are interesting, and mostly tell me to try a different measurement system. I'm seeing a pretty wide variance. Hammering it with an iperf session and a queue capped at 5 gbit/s: most of the TC timings were 40 nanoseconds - not a packet that requires extra tracking, already in cache, so proceed. When the TCP RTT tracker fired and recorded a performance event, it peaked at 5,900 nanoseconds. So the tc xdp program seems to be adding a worst-case of 0.0059 ms to packet times. The XDP side of things is typically in the 300-400 nanosecond range, I saw a handful of worst-case numbers in the 3400 nanosecond range. So the XDP side is adding 0.00349 ms. So - assuming worst case (and keeping the overhead added by the not-so-great monitoring), we're adding 0.0093 ms to packet transit time with the BPF programs. > > With a much more sedate queue (ceiling 500 mbit/s), I saw much more consistent numbers. The vast majority of XDP timings were in the 75-150 nanosecond range, and TC was a consistent 50-55 nanoseconds when it didn't have an update to perform - peaking very occasionally at 1500 nanoseconds. Only adding 0.00155 ms to packet times is pretty good. > > It definitely performs best on long streams, probably because the previous lookups are all in cache. This is also making me question the answer I found to "how long does it take to read the clock?" I'd seen ballpark estimates of 53 nanoseconds. Given that this reads the clock twice, that can't be right. (I'm *really* not sure how to measure that one) > > Again - not a great test (I'll have to learn the perf system to do this properly - which in turn opens up the potential for flame graphs and some proper tracing). Interesting ballpark, though. > > On Mon, Oct 31, 2022 at 10:56 AM dan <dandenson@gmail.com> wrote: >> >> >> >> On Sun, Oct 30, 2022 at 8:21 PM Dave Taht via LibreQoS <libreqos@lists.bufferbloat.net> wrote: >>> >>> How about the idea of "metaverse-ready" metrics, with one table that is preseem-like and another that's >>> >>> blue = < 8ms >>> green = < 20ms >>> yellow = < 50ms >>> orange = < 70ms >>> red = > 70ms >> >> >> These need configurable. 
There are a lot of wisps that would have everything orange/red. We're considering anything under 100ms good on the rural plans. Also keep in mind that if you're tracking latence via pping etc, then you need some buffer in there for the internet at large. <70ms to Amazon is one thing, they're very well connected, but <70ms to most of the internet isn't probably very realistic and would make most charts look like poop. > > _______________________________________________ > LibreQoS mailing list > LibreQoS@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/libreqos -- This song goes out to all the folk that thought Stadia would work: https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz Dave Täht CEO, TekLibre, LLC ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-11-01 3:31 ` Dave Taht @ 2022-11-01 13:38 ` Herbert Wolverson 0 siblings, 0 replies; 33+ messages in thread From: Herbert Wolverson @ 2022-11-01 13:38 UTC (permalink / raw) To: Dave Taht; +Cc: libreqos [-- Attachment #1: Type: text/plain, Size: 11815 bytes --] Dave: in this case, I'm running inside the eBPF VM - so I'm already in kernel space, but have a very limited set of functions available. bpf_ktime_get_ns() seems to be the approved way to get the clock. There was a big debate about it using the kernel's monotonic clock, which takes longer to sample. I'm guessing they improved that, because I'm not seeing the delay that some people were complaining about (it's not free, but it's also a *lot* faster than the estimates I was finding). > > Preseem's numbers are 0-74 green, 75-124 yellow, 125-200 red, and they just consolidate everything >200 to 200, basically so there's no 'terrible' color lol. > I am sorry to hear those numbers are considered to be good. It's interesting that you see adverts on Wisp Talk (the FB group) showing "wow, half my APs are now green!" (and showing about 50% green, 25% yellow, 25% red). When we had Preseem, we always took "red" to mean "oh no, something's really wrong" - and got to work fixing it. There were a couple of distant (many hops down the chain) APs that struggled to stay yellow, but red was always a sign for battle stations. I think that's part of why WISPs suffer from "jump ship as soon as something better comes along" - I'd be jumping ship too, if my ISP expected me to "enjoy" 125-200 ms RTT latency for any extended period of time (I'm pretty understanding about "something went wrong, we're working on it"). Geography does play a large part. I'll see if I can resurrect a tool I had that turned RTT latency measurements into a Google Maps heatmap overlay (updating, so you could see the orange/red areas moving when the network suffered). It can be pretty tough to find a good upstream far from towns, which affects everything. But more, deep chains of backhauls add up - and add up fast if you have any sort of congestion issue along the way. For example: - We have a pretty decently connected upstream, averaging 8ms ping round-trip time to Cloudflare's DNS. - Going down our "hottest" path (60 ghz AF60 LR to a tower, and then another one to a 3,000 bed apartment complex - peaks at 900 mbit/s every night; will peak at a lot more than that as soon as their check clears for some Siklu gear), we worked *stupidly hard* to keep the average ping time there at 9ms to Cloudflare's DNS. Even then, it's closer to 16ms when fully loaded. They are a topic for a future Cake discussion. :-) - We have a few clients connected directly off of the facility with the upstream - and they all get great RTT times (a mix of 5.8 and 3.6 CBRS; Wave coming as soon as it's in stock at the same time as the guy with the money being at a keyboard!). - Our largest (by # of customers) tower is 11 miles away, currently fed by 2 AirFiber 5XHD (ECMP balanced). We've worked really hard to keep that tower's average ping time to Cloudflare at 18ms. We have some nicer radios (the Cambium 400C is a beast) going in soon, which should help. - That tower feeds 4 micro-pops. The worst is near line-of-sight (trees) on a 3.6 ghz Medusa. It suffers a bit at 33ms round-trip ping times to Cloudflare. The best averages 22ms ping times to Cloudflare. 
- We have a bunch more sites behind a 13 mile backhaul hop (followed by a 3 mile backhaul hop; geography meant going around a tree-covered ridge). We've had a heck of a time getting that up to scratch; AF5XHD kinda worked, but the experience was pretty wretched. They were the testbed for the Cambium 400C, and now average 22ms to Cloudflare. - There's 15 (!) small towers behind that one! We eventually got the most distant one to 35ms to Cloudflare pings - but ripped/replaced SO much hardware to get there. (Even then, customer experience at some of those sites isn't what I'd like; I just tried a ping test from a customer running a 2.4 ghz "elevated" Ubiquiti dish to an old ePMP 1000 - at a tower 5 hops in. 45-50ms to Cloudflare. Not great.) Physics dictates that the tiny towers, separated from the core by miles of backhaul and hops between them, aren't going to perform as well as the nearby ones. You *can* get them going well, but it's expensive and time-consuming. One thing Preseem does pretty well is show daily reports in brightly colored bars, which "gamifies" fixing the issue. If you have any gamers on staff, they start to obsess with turning everything green. It's great. :-) The other thing I keep running into is network management. A few years ago, we bought a WISP with 20 towers and a few hundred customers (it was a friendly "I'm getting too unwell to keep doing this" purchase). The guy who set it up was pretty amazing; he had no networking experience whatsoever, but was pretty good at building things. So he'd built most of the towers himself, purely because he wanted to get better service out to some *very* rural parts of Missouri (including a whole bunch of non-profits and churches, which is our largest market). While it's impressive what he pulled off, he'd still just lost 200 customers to an electric coop's fiber build-out. His construction skills were awesome; his network skills - not so much. He had 1 public IP, connected to a 100mbit/s connection at his house. Every single tower (over a 60 mile spread) was connected to exactly one other tower. Every tower had backhauls in bridge mode, connected to a (netgear consumer) switch at the tower. Every AP (all of them 2.4ghz Bullet M2) was in bridge mode with client isolation turned off, connected to an assortment of CPEs (mostly Airgrid M2) - also in bridge mode. No DHCP, he had every customer type in their 192.168.x.y address (he had the whole /16 setup on the one link; no VLANs). Speed limits were set by turning on traffic shaping on the M2 CPEs... and he wondered why latency sometimes resembled remote control of a Mars rover, or parts of the network would randomly die when somebody accidentally plugged their net connection into their router's LAN port. A couple of customers had foregone routers altogether, and you could see their Windows networking broadcasts traversing the network! I wish I could say that was unusual, but I've helped a handful of WISPs in similar situations. One of the first things we did was get Preseem running (after adding every client into UNMS as it was called then). That made a big difference, and gave good visibility into how bad it was. Then it was a long process of breaking the network down into routed chunks, enabling DHCP, replacing backhauls (there were a bunch of times when towers were connected in the order they were constructed, and never connected to a new tower a mile away - but 20 miles down the chain), switching out bullets, etc. Eventually, it's a great network - and growing again. 
I'm not sure we could've done that without a) great visibility from monitoring platforms, and b) decades of experience between us. Longer-term, I'm hoping that we can help networks like that one. Great shaping and visibility go a *long* way. Building up some "best practices" and offering advice can go a *really long* way. (And good mapping makes a big difference; I'm not all that far from releasing a generally usable version of my LiDAR mapping suite, an ancient version is here - https://github.com/thebracket/rf-signals ; You can get LiDAR data for about 2/3 of the US for free, now. ). On Mon, Oct 31, 2022 at 10:32 PM Dave Taht <dave.taht@gmail.com> wrote: > Calling rdtsc directly used to be even faster than gettimeofday > > https://github.com/dtaht/libv6/blob/master/erm/includes/get_cycles.h > > On Mon, Oct 31, 2022 at 2:20 PM Herbert Wolverson via LibreQoS > <libreqos@lists.bufferbloat.net> wrote: > > > > I'd agree with color coding (when it exists - no rush, IMO) being > configurable. > > > > From the "how much delay are we adding" discussion earlier, I thought > I'd do a little bit of profiling of the BPF programs themselves. This is > with the latest round of performance updates ( > https://github.com/thebracket/cpumap-pping/issues/2), so it's not > measuring anything in production. I simply added a call to get the clock at > the start, and again at the end - and log the difference. Measuring both > XDP and TC BPF programs. (Execution goes (packet arrives)->(XDP cpumap > sends it to the right CPU)->(egress)->(TC sends it to the right classifier, > on the correct CPU and measures RTT latency). This is adding about two > clock checks and a debug log entry to execution time, so measuring it is > slowing it down. > > > > The results are interesting, and mostly tell me to try a different > measurement system. I'm seeing a pretty wide variance. Hammering it with an > iperf session and a queue capped at 5 gbit/s: most of the TC timings were > 40 nanoseconds - not a packet that requires extra tracking, already in > cache, so proceed. When the TCP RTT tracker fired and recorded a > performance event, it peaked at 5,900 nanoseconds. So the tc xdp program > seems to be adding a worst-case of 0.0059 ms to packet times. The XDP side > of things is typically in the 300-400 nanosecond range, I saw a handful of > worst-case numbers in the 3400 nanosecond range. So the XDP side is adding > 0.00349 ms. So - assuming worst case (and keeping the overhead added by the > not-so-great monitoring), we're adding 0.0093 ms to packet transit time > with the BPF programs. > > > > With a much more sedate queue (ceiling 500 mbit/s), I saw much more > consistent numbers. The vast majority of XDP timings were in the 75-150 > nanosecond range, and TC was a consistent 50-55 nanoseconds when it didn't > have an update to perform - peaking very occasionally at 1500 nanoseconds. > Only adding 0.00155 ms to packet times is pretty good. > > > > It definitely performs best on long streams, probably because the > previous lookups are all in cache. This is also making me question the > answer I found to "how long does it take to read the clock?" I'd seen > ballpark estimates of 53 nanoseconds. Given that this reads the clock > twice, that can't be right. (I'm *really* not sure how to measure that one) > > > > Again - not a great test (I'll have to learn the perf system to do this > properly - which in turn opens up the potential for flame graphs and some > proper tracing). Interesting ballpark, though. 
> > > > On Mon, Oct 31, 2022 at 10:56 AM dan <dandenson@gmail.com> wrote: > >> > >> > >> > >> On Sun, Oct 30, 2022 at 8:21 PM Dave Taht via LibreQoS < > libreqos@lists.bufferbloat.net> wrote: > >>> > >>> How about the idea of "metaverse-ready" metrics, with one table that > is preseem-like and another that's > >>> > >>> blue = < 8ms > >>> green = < 20ms > >>> yellow = < 50ms > >>> orange = < 70ms > >>> red = > 70ms > >> > >> > >> These need configurable. There are a lot of wisps that would have > everything orange/red. We're considering anything under 100ms good on the > rural plans. Also keep in mind that if you're tracking latence via pping > etc, then you need some buffer in there for the internet at large. <70ms > to Amazon is one thing, they're very well connected, but <70ms to most of > the internet isn't probably very realistic and would make most charts look > like poop. > > > > _______________________________________________ > > LibreQoS mailing list > > LibreQoS@lists.bufferbloat.net > > https://lists.bufferbloat.net/listinfo/libreqos > > > > -- > This song goes out to all the folk that thought Stadia would work: > > https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz > Dave Täht CEO, TekLibre, LLC > [-- Attachment #2: Type: text/html, Size: 13577 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory 2022-10-29 15:57 ` Herbert Wolverson 2022-10-29 19:05 ` Robert Chacón @ 2022-10-29 19:18 ` Dave Taht 2022-10-30 1:10 ` Herbert Wolverson 1 sibling, 1 reply; 33+ messages in thread From: Dave Taht @ 2022-10-29 19:18 UTC (permalink / raw) To: Herbert Wolverson; +Cc: libreqos [-- Attachment #1.1: Type: text/plain, Size: 17425 bytes --] On Sat, Oct 29, 2022 at 8:57 AM Herbert Wolverson via LibreQoS < libreqos@lists.bufferbloat.net> wrote: > Alright, the UISP side of the common integrations is pretty much feature > complete. I'll update the tracking issue in a bit. > > - Per your suggestion, devices with no IP addresses (v4 or v6) are not > added. > > Every device that is ipv6-ready comes up with a link-local address derived from the mac like fe80::6f16:fa94:f32b:e2e Some actually will accept things like ssh to that address. Not that this is necessarily relevant to this bit of code. Dr irrelevant I am today. (in the context of babel, at least, you can route ipv4 and ipv6 without either an ipv6 or ipv4 address, and hnetd configure) I am kind of curious as to what weird configuration protocols are in common use today. Painfully common are "smart switches" that don't listen to dhcp by default AND come up on 192.168.1.1. ubnt comes up on 192.168.1.20 by default. a lot of cpe comes up on 192.168.1.100 (like cable and starlink). I've seen stuff that uses ancient ieee protocols. bootp and tftp are still things. I've always kind of wanted a daemon on every device that would probe all possible ip addresses with a ttl of 2, to find rogue devices etc. > > - Mikrotik "4 to 6" mapping is implemented. I put it in the "common" > side of things, so it can be used in other integrations also. I don't have > a setup on which to test it, but if I'm reading the code right then the > unit test is testing it appropriately. > > You talking about the relevant rfc? > > - excludeSites is supported as a common API feature. If a node is > added with a name that matches an excluded site, it won't be added. The > tree builder is smart enough to replace invalid "parentId" references with > the shaper root, so if you have other tree items that rely on this site - > they will be added to the tree. Was that the intent? (It looks pretty > useful; we have a child site down the tree with a HUGE amount of load, and > bumping it to the top-level with excludeSites would probably help our load > balancing quite a bit) > - If the intent was to exclude the site and everything underneath > it, I'd have to rework things a bit. Let me know; it wasn't quite clear. > - exceptionCPEs is also supported as a common API feature. It > simply overrides the "parentId" of incoming nodes with the new parent. > Another potentially useful feature; if I got excludeSites the wrong way > around, I'd add a "my_big_site":"" entry to push it to the top. > > Seems to be a need for some level of exclusions for device type, e.g. (at least per your report), don't run ack-filter on a cambium path. (A sketch of the exclude/override semantics appears after this message.) > > - UISP integration now supports a "flat" topology option (set via > uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py > to include this entry. > > I'll look and see how much of the Spylnx code I can shorten with the new > API; I don't have a Spylnx setup to test against, making that tricky. I > *think* the new API should shorten things a lot. I think routers act as > node parents, with clients underneath them? 
Otherwise, a "flat" setup > should be a little shorter (the CSV code can be replaced with a call to the > graph builder). Most of the Spylnx (and VISP) users I've talked to layer > MPLS+VPLS to pretend to have a big, flat network and then connect via a > RADIUS call in the DHCP server; > Is there any particularly common set of radius servers in use? > I've always assumed that's because those systems prefer the telecom model > of "pretend everything is equal" to trying to model topology.* > Except the billing. Always the billing. Our tuesday golden plate special is you can download all the pr0n from our special partner netblix for 24 hours a week! 9.95! > > I need to clean things up a bit (there's still a bit of duplicated code, > and I believe in the DRY principle - don't repeat yourself; Dave Thomas - > my boss at PragProg - coined the term in The Pragmatic Programmer, and I > feel obliged to use it everywhere!), and do a quick rebase (I accidentally > parented the branch off of a branch instead of main) - but I think I can > have this as a PR for you on Monday. > > * - The first big wireless network I setup used a Motorola WiMAX setup. > They *required* that every single AP share two VLANs (management and > bearer) with every other AP - all the way to the core. It kinda worked once > they remembered client isolation was a thing in a patch... Then again, > their installation instructions included connecting two ports of a router > together with a jumper cable, because their localhost implementation didn't > quite work. :-| > > On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón < > robert.chacon@jackrabbitwireless.com> wrote: > >> Awesome work. It succeeded in building the topology and creating >> ShapedDevices.csv for my network. It even graphed it perfectly. Nice! >> I notice that in ShapedDevices.csv it does add CPE radios (which in our >> case we don't shape - they are in bridge mode) with IPv4 and IPv6s both >> being empty lists []. >> This is not necessarily bad, but it may lead to empty leaf classes being >> created on LibreQoS.py runs. Not a huge deal, it just makes the minor class >> counter increment toward the 32k limit faster. >> Do you think perhaps we should check: >> *if (len(IPv4) == 0) and (len(IPv6) == 0):* >> * # Skip adding this entry to ShapedDevices.csv* >> Or something similar around line 329 of integrationCommon.py? >> Open to your suggestions there. >> >> >> >> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS < >> libreqos@lists.bufferbloat.net> wrote: >> >>> One more update, and I'm going to sleep until "pick up daughter" time. >>> :-) >>> >>> The tree at >>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph >>> can now build a network.json, ShapedDevices.csv, and >>> integrationUISPBandwidth.csv and follows pretty much the same logic as the >>> previous importer - other than using data links to build the hierarchy and >>> letting (requiring, currently) you specify the root node. It's handling our >>> bizarre UISP setup pretty well now - so if anyone wants to test it (I >>> recommend just running integrationUISP.py and checking the output rather >>> than throwing it into production), I'd appreciate any feedback. >>> >>> Still on my list: handling the Mikrotik IPv6 connections, and >>> exceptionCPE and site exclusion. >>> >>> If you want the pretty graphics, you need to "pip install graphviz" and >>> "sudo apt install graphviz". It *should* detect that these aren't present >>> and not try to draw pictures, otherwise. 
>>> >>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón < >>> robert.chacon@jackrabbitwireless.com> wrote: >>> >>>> Wow. This is very nicely done. Awesome work! >>>> >>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS < >>>> libreqos@lists.bufferbloat.net> wrote: >>>> >>>>> The integration is coming along nicely. Some progress updates: >>>>> >>>>> - You can specify a variable in ispConfig.py named "uispSite". >>>>> This sets where in the topology you want the tree to start. This has two >>>>> purposes: >>>>> - It's hard to be psychic and know for sure where the shaper is >>>>> in the network. >>>>> - You could run multiple shapers at different egress points, >>>>> with failover - and rebuild the entire topology from the point of view of a >>>>> network node. >>>>> - "Child nodes with children" are now automatically converted into >>>>> a "(Generated Site) name" site, and their children rearranged. This: >>>>> - Allows you to set the "site" bandwidth independently of the >>>>> client site bandwidth. >>>>> - Makes for easier trees, because we're inserting the site that >>>>> really should be there. >>>>> - Network.json generation (not the shaped devices file yet) is >>>>> automatically generated from a tree, once PrepareTree() and >>>>> createNetworkJson() are called. >>>>> - There's a unit test that generates the network.example.json >>>>> file and compares it with the original to ensure that they match. >>>>> - Unit test coverage hits every function in the graph system, now. >>>>> >>>>> I'm liking this setup. With the non-vendor-specific logic contained >>>>> inside the NetworkGraph type, the actual UISP code to generate the example >>>>> tree is down to 65 lines of code, including comments. That'll grow a bit as I re-insert >>>>> some automatic speed limit determination, AP/Site speed overrides >>>>> (i.e. the integrationUISPbandwidths.csv file). Still pretty clean. >>>>> >>>>> Creating the network.example.json file only requires: >>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType >>>>> import json >>>>> net = NetworkGraph() >>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000)) >>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500)) >>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500)) >>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500)) >>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200)) >>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120)) >>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60)) >>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30)) >>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200)) >>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100)) >>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150)) >>>>> net.prepareTree() >>>>> net.createNetworkJson() >>>>> >>>>> (The id and name fields are duplicated right now, I'm using readable >>>>> names to keep me sane. The third string is the parent, and the last two >>>>> numbers are bandwidth limits.) >>>>> The nice, readable format being: >>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="", type=NodeType.site, download=1000, upload=1000) >>>>> >>>>> That in turn gives you the example network: >>>>> [image: image.png] >>>>> >>>>> >>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson < >>>>> herberticus@gmail.com> wrote: >>>>> >>>>>> Dave: I love those Gource animations! Game development is my other >>>>>> hobby, I could easily get lost for weeks tweaking the shaders to make the >>>>>> glow "just right". :-) >>>>>> >>>>>> Dan: Discovery would be nice, but I don't think we're ready to look >>>>>> in that direction yet. I'm trying to build a "common grammar" to make it >>>>>> easier to express network layout from integrations; that would be another >>>>>> form/layer of integration and a lot easier to work with once there's a >>>>>> solid foundation. Preseem does some of this (admittedly over-eagerly; >>>>>> nothing needs to query SNMP that often!), and the SNMP route is quite >>>>>> remarkably convoluted. Their support turned on a few "extra" modules to >>>>>> deal with things like PMP450 clients that change MAC when you put them in >>>>>> bridge mode vs NAT mode (and report the bridge mode CPE in some places >>>>>> either way), and Elevate CPEs that almost but not quite make sense. Robert's >>>>>> code has the beginnings of some of this, scanning Mikrotik routers for IPv6 >>>>>> allocations by MAC (this is also the hardest part for me to test, since I >>>>>> don't have any v6 to test, currently). >>>>>> >>>>>> We tend to use UISP as the "source of truth" and treat it like a >>>>>> database for a ton of external tools (mostly ones we've created). >>>>>> >>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote: >>>>>> >>>>>>> we're pretty similar in that we've made UISP a mess. Multiple paths >>>>>>> to a pop. multiple pops on the network. failover between pops. Lots of >>>>>>> 'other' devices. handing out /29 etc to customers. >>>>>>> >>>>>>> Some sort of discovery would be nice. Ideally though, pulling >>>>>>> something from SNMP or router APIs etc to build the paths, but having a >>>>>>> 'network elements' list with each of the links described. ie, backhaul 12 >>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that >>>>>>> from discovery. >>>>>>> >>>>>>> I've also thought about doing routine trace routes or watching TTLs >>>>>>> or something like that to get some indication that topology has changed and >>>>>>> then do another discovery and potential tree rebuild. >>>>>>> >>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS < >>>>>>> libreqos@lists.bufferbloat.net> wrote: >>>>>>> >>>>>>>> This is awesome! Way to go here. Thank you for contributing this. >>>>>>>> Being able to map out these complex integrations will help ISPs a >>>>>>>> ton, and I really like that it is sharing common features between the >>>>>>>> Splynx and UISP integrations. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Robert >>>>>>>> >>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS < >>>>>>>> libreqos@lists.bufferbloat.net> wrote: >>>>>>>> >>>>>>>>> So I've been doing some work on getting UISP integration (and >>>>>>>>> integrations in general) to work a bit more smoothly. >>>>>>>>> >>>>>>>>> I started by implementing a graph structure that mirrors both the >>>>>>>>> networks and sites system. 
It's not done yet, but the basics are coming >>>>>>>>> together nicely. You can see my progress so far at: >>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph >>>>>>>>> >>>>>>>>> Our UISP instance is a *great* testcase for torturing the system. >>>>>>>>> I even found a case of UISP somehow auto-generating a circular portion of >>>>>>>>> the tree. We have: >>>>>>>>> >>>>>>>>> - Non Ubiquiti devices as "other devices" >>>>>>>>> - Sections that need shaping by subnet (e.g. "all of >>>>>>>>> 192.168.1.0/24 shared 100 mbit") >>>>>>>>> - Bridge mode devices using Option 82 to always allocate the >>>>>>>>> same IP, with a "service IP" entry >>>>>>>>> - Various bits of infrastructure mapped >>>>>>>>> - Sites that go to client sites, which go to other client sites >>>>>>>>> >>>>>>>>> In other words, over the years we've unleashed a bit of a monster. >>>>>>>>> Cleaning it up is a useful talk, but I wanted the integration to be able to >>>>>>>>> handle pathological cases like us! >>>>>>>>> >>>>>>>>> So I fed our network into the current graph generator, and used >>>>>>>>> graphviz to spit out a directed graph: >>>>>>>>> [image: image.png] >>>>>>>>> That doesn't include client sites! Legend: >>>>>>>>> >>>>>>>>> >>>>>>>>> - Green = the root site. >>>>>>>>> - Red = a site >>>>>>>>> - Blue = an access point >>>>>>>>> - Magenta = a client site that has children >>>>>>>>> >>>>>>>>> So the part in "common" is designed heavily to reduce repetition. >>>>>>>>> When it's done, you should be able to feed in sites, APs, clients, devices, >>>>>>>>> etc. in a pretty flexible manner. Given how much code is shared between the >>>>>>>>> UISP and Splynx integration code, I'm pretty sure both will be cut to a >>>>>>>>> tiny fraction of the total code. :-) >>>>>>>>> >>>>>>>>> I can't post the full tree, it's full of client names. 
>>>>>>>>> _______________________________________________ >>>>>>>>> LibreQoS mailing list >>>>>>>>> LibreQoS@lists.bufferbloat.net >>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Robert Chacón >>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> >>>>>>>> _______________________________________________ >>>>>>>> LibreQoS mailing list >>>>>>>> LibreQoS@lists.bufferbloat.net >>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>>>> >>>>>>> _______________________________________________ >>>>> LibreQoS mailing list >>>>> LibreQoS@lists.bufferbloat.net >>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>> >>>> >>>> >>>> -- >>>> Robert Chacón >>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> >>>> Dev | LibreQoS.io >>>> >>>> _______________________________________________ >>> LibreQoS mailing list >>> LibreQoS@lists.bufferbloat.net >>> https://lists.bufferbloat.net/listinfo/libreqos >>> >> >> >> -- >> Robert Chacón >> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com> >> Dev | LibreQoS.io >> >> _______________________________________________ > LibreQoS mailing list > LibreQoS@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/libreqos > -- This song goes out to all the folk that thought Stadia would work: https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz Dave Täht CEO, TekLibre, LLC [-- Attachment #1.2: Type: text/html, Size: 36267 bytes --] [-- Attachment #2: image.png --] [-- Type: image/png, Size: 573568 bytes --] [-- Attachment #3: image.png --] [-- Type: image/png, Size: 115596 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [LibreQoS] Integration system, aka fun with graph theory
  2022-10-29 19:18             ` Dave Taht
@ 2022-10-30  1:10               ` Herbert Wolverson
  0 siblings, 0 replies; 33+ messages in thread
From: Herbert Wolverson @ 2022-10-30  1:10 UTC (permalink / raw)
  Cc: libreqos

[-- Attachment #1.1: Type: text/plain, Size: 22416 bytes --]

> You talking about the relevant rfc?

In this case, the "4 to 6" mapping refers to some integration code that was
already present - named "mikrotikFindIpv6.py". I probably should've made
that clearer. It connects to Mikrotik routers and performs a MAC address
search in their DHCPv6 tables - finding known MAC addresses and returning
the allocated IPv6 address space. Looks like a handy tool, and a good
work-around for UISP (Ubiquiti's combined management and CRM tool) only
kind-of supporting IPv6. The database format supports v6 addresses, but
UISP doesn't consistently put any data in there; worse, it doesn't show it
on-screen even when it has it! (A rough sketch of that matching step
follows below.)

> Seems to be a need for some level of exclusions for device type, e.g. (at
> least per your report), don't run ack-filter on a cambium path.

I agree with that longer-term. For now, I'm trying to get the existing
integrations up to speed and easy to work with. The whole "build on a good
foundation" thing. That's one thing I've learned the hard way over the
decades: it's a *lot* easier to shoot for the moon if you take the time to
come up with a good launch platform!

Longer-term, it's looking more and more like we'll need a more robust
discovery system. I've some ideas, but they are way too formative to be
useful yet. Some early thinking: there's a big disparity between the
various back-ends WISPs (and ISPs in general) are using to manage and
monitor their networks, and the systems that handle CRM (billing,
ticketing, customer interaction, etc.). Splynx and its ilk are great
billing systems, but don't really know a lot about your network
arrangement - it wouldn't surprise me if there are Splynx and VISP users
who also have UISP (just the network management side) going as well. At the
other extreme, PowerCode tries to write directly to your Mikrotik routers
and wants to know everything right down to your underwear colour. In my
mind:

* Step 1 (we're nearly there!) is to build a good foundation for
  representing an IPv4/IPv6 network that's agnostic to all the crazy
  things a WISP may be doing. It should automate all the tedious parts
  (figuring out a tree from a soup of sites, access points and users -
  rearranging the tree to have a "starting point", emitting the various
  control files, etc.), and be easy enough to use that someone could say
  "wow, I need to support my management system" and do so with a little
  bit of hand-holding - encouraging participation.
* Step 2 would be to provide some great manual tools for the DIY crowd,
  and some really good documentation to make their life easy.
* Step 3 is some kind of way to mix-and-match systems. Say you have Splynx
  AND the management part of UISP. Wouldn't it be great if Splynx could
  provide all of the "plan" data, and the topology data come from UISP's
  management side? It seems like that's quite do-able with a little work.
  We may need to think about a management GUI at this point, just to help
  hold hands a bit.
* Step 4 would be something Dan keeps asking about: ways to query the
  hardware that exists and build some topology around it. That would be
  great, and is quite the undertaking (best tackled incrementally, and in
  a modular fashion, IMHO).

This is still just the musings of a sleep-deprived brain. :-)
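To make the Mikrotik lookup mentioned above concrete: a minimal,
hypothetical sketch of the MAC-to-IPv6 matching step. The binding-record
field names ("mac-address", "address") are assumptions modelled on RouterOS
output, and fetching the bindings is stubbed out entirely; the real
mikrotikFindIpv6.py may differ.

# Sketch only: match known CPE MACs against DHCPv6 bindings already
# fetched from a router. Records are plain dicts here.
def map_macs_to_ipv6(bindings, known_macs):
    def norm(mac):
        # Normalize separators and case so "AA-BB-..." matches "aa:bb:...".
        return mac.replace("-", ":").lower()

    wanted = {norm(m) for m in known_macs}
    found = {}
    for b in bindings:
        mac = norm(b.get("mac-address", ""))
        if mac in wanted and b.get("address"):
            found[mac] = b["address"]
    return found

# Example with made-up data:
bindings = [
    {"mac-address": "AA:BB:CC:00:11:22", "address": "2001:db8:abcd::/56"},
    {"mac-address": "AA:BB:CC:33:44:55", "address": ""},  # no v6 allocated
]
print(map_macs_to_ipv6(bindings, {"aa:bb:cc:00:11:22"}))
# -> {'aa:bb:cc:00:11:22': '2001:db8:abcd::/56'}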
> Is there any particularly common set of radius servers in use?

It seems like when I poke deeply enough, most people are running FreeRADIUS
or something vendor-supplied (which is sometimes FreeRADIUS with a badge on
it). Then there are crazy people paying $10k for super high-end RADIUS
servers that aren't actually much better than the free ones.

RADIUS is a tough one, because LibreQoS isn't really well placed to
directly utilize it. Typically, RADIUS is basically a "yes or no" box with
options attached. RADIUS queries happen on network entry (either as part of
the admissions process, part of the Ethernet security step, or from the
DHCP server), and the reply is basically "yes, you're admitted - these are
your options". The problem is, Libre doesn't necessarily see any of that -
it's inside the network. That's why we have API dependencies, even though
Splynx and VISP are basically really big billing systems that come bundled
with a RADIUS server. (Unfortunately, Mikrotik interprets the RADIUS
replies to make a simple queue on the router that made the request - you
can script that, but it gets messy fast.)

I'll answer the second email in a bit.

On Sat, Oct 29, 2022 at 2:18 PM Dave Taht <dave.taht@gmail.com> wrote:

> On Sat, Oct 29, 2022 at 8:57 AM Herbert Wolverson via LibreQoS <
> libreqos@lists.bufferbloat.net> wrote:
>
>> Alright, the UISP side of the common integrations is pretty much feature
>> complete. I'll update the tracking issue in a bit.
>>
>>    - Per your suggestion, devices with no IP addresses (v4 or v6) are
>>    not added.
>
> Every device that is ipv6-ready comes up with a link-local address
> derived from the mac like fe80::6f16:fa94:f32b:e2e
> Some actually will accept things like ssh to that address
> Not that this is necessarily relevant to this bit of code. Dr irrelevant
> I am today.
> (in the context of babel, at least, you can route ipv4 and ipv6 without
> either an ipv6 or ipv4 address, and hnetd configure)
>
> I am kind of curious as to what weird configuration protocols are in
> common use today
>
> Painfully common are "smart switches" that don't listen to dhcp by
> default AND come up on 192.168.1.1
> ubnt comes up on 192.168.1.20 by default
> a lot of cpe comes up on 192.168.1.100 (like cable and starlink)
> I've seen stuff that uses ancient ieee protocols
> bootp and tftp are still things
>
> I've always kind of wanted a daemon on every device that would probe all
> possible ip addresses with a ttl of 2, to find rogue devices etc.
>
>>    - Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>>    side of things, so it can be used in other integrations also. I don't
>>    have a setup on which to test it, but if I'm reading the code right
>>    then the unit test is testing it appropriately.
>
> You talking about the relevant rfc?
>
>>    - excludeSites is supported as a common API feature. If a node is
>>    added with a name that matches an excluded site, it won't be added.
>>    The tree builder is smart enough to replace invalid "parentId"
>>    references with the shaper root, so if you have other tree items that
>>    rely on this site - they will be added to the tree. Was that the
>>    intent? (It looks pretty useful; we have a child site down the tree
>>    with a HUGE amount of load, and bumping it to the top-level with
>>    excludeSites would probably help our load balancing quite a bit.)
>>    - If the intent was to exclude the site and everything underneath it,
>>    I'd have to rework things a bit. Let me know; it wasn't quite clear.
>>    - exceptionCPEs is also supported as a common API feature. It simply
>>    overrides the "parentId" of incoming nodes with the new parent.
>>    Another potentially useful feature; if I got excludeSites the wrong
>>    way around, I'd add a "my_big_site":"" entry to push it to the top.
>
> Seems to be a need for some level of exclusions for device type, e.g. (at
> least per your report), don't run ack-filter on a cambium path.
>
>>    - UISP integration now supports a "flat" topology option (set via
>>    uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>>    to include this entry.
>>
>> I'll look and see how much of the Splynx code I can shorten with the new
>> API; I don't have a Splynx setup to test against, making that tricky. I
>> *think* the new API should shorten things a lot. I think routers act as
>> node parents, with clients underneath them? Otherwise, a "flat" setup
>> should be a little shorter (the CSV code can be replaced with a call to
>> the graph builder). Most of the Splynx (and VISP) users I've talked to
>> layer MPLS+VPLS to pretend to have a big, flat network and then connect
>> via a RADIUS call in the DHCP server;
>
> Is there any particularly common set of radius servers in use?
>
>> I've always assumed that's because those systems prefer the telecom
>> model of "pretend everything is equal" to trying to model topology.*
>
> Except the billing. Always the billing. Our tuesday golden plate special
> is you can download all the pr0n from our special partner netblix for 24
> hours a week! 9.95!
>
>> I need to clean things up a bit (there's still a bit of duplicated code,
>> and I believe in the DRY principle - don't repeat yourself; Dave Thomas -
>> my boss at PragProg - coined the term in The Pragmatic Programmer, and I
>> feel obliged to use it everywhere!), and do a quick rebase (I
>> accidentally parented the branch off of a branch instead of main) - but
>> I think I can have this as a PR for you on Monday.
>>
>> * - The first big wireless network I set up used a Motorola WiMAX setup.
>> They *required* that every single AP share two VLANs (management and
>> bearer) with every other AP - all the way to the core. It kinda worked
>> once they remembered client isolation was a thing in a patch... Then
>> again, their installation instructions included connecting two ports of
>> a router together with a jumper cable, because their localhost
>> implementation didn't quite work. :-|
>>
>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
>> robert.chacon@jackrabbitwireless.com> wrote:
>>
>>> Awesome work. It succeeded in building the topology and creating
>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>>> I notice that in ShapedDevices.csv it does add CPE radios (which in our
>>> case we don't shape - they are in bridge mode) with IPv4 and IPv6 both
>>> being empty lists [].
>>> This is not necessarily bad, but it may lead to empty leaf classes
>>> being created on LibreQoS.py runs. Not a huge deal, it just makes the
>>> minor class counter increment toward the 32k limit faster.
>>> Do you think perhaps we should check:
>>> *if (len(IPv4) == 0) and (len(IPv6) == 0):*
>>> *    # Skip adding this entry to ShapedDevices.csv*
>>> Or something similar around line 329 of integrationCommon.py?
>>> Open to your suggestions there.
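A minimal sketch of the guard Robert suggests, assuming hypothetical device
records with "ipv4"/"ipv6" list fields and a simplified CSV layout (the
real structures in integrationCommon.py may differ):

import csv

# Sketch: skip devices with no addresses at all, so LibreQoS.py never
# creates an empty leaf class (each leaf consumes one of the 32k minor
# class numbers).
def write_shaped_devices(devices, path="ShapedDevices.csv"):
    written = 0
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        for d in devices:
            if len(d.get("ipv4", [])) == 0 and len(d.get("ipv6", [])) == 0:
                continue  # nothing to shape - leave it out of the CSV
            w.writerow([d["id"], ",".join(d["ipv4"]), ",".join(d["ipv6"])])
            written += 1
    return written

# A bridge-mode CPE with empty address lists is silently skipped:
devices = [
    {"id": "CPE_1", "ipv4": ["100.64.1.2"], "ipv6": []},
    {"id": "CPE_2", "ipv4": [], "ipv6": []},
]
print(write_shaped_devices(devices))  # -> 1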
>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>>> libreqos@lists.bufferbloat.net> wrote:
>>>
>>>> One more update, and I'm going to sleep until "pick up daughter" time.
>>>> :-)
>>>>
>>>> The tree at
>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>> can now build a network.json, ShapedDevices.csv, and
>>>> integrationUISPBandwidth.csv, and follows pretty much the same logic
>>>> as the previous importer - other than using data links to build the
>>>> hierarchy and letting (requiring, currently) you specify the root
>>>> node. It's handling our bizarre UISP setup pretty well now - so if
>>>> anyone wants to test it (I recommend just running integrationUISP.py
>>>> and checking the output rather than throwing it into production), I'd
>>>> appreciate any feedback.
>>>>
>>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>>> exceptionCPE and site exclusion.
>>>>
>>>> If you want the pretty graphics, you need to "pip install graphviz"
>>>> and "sudo apt install graphviz". It *should* detect that these aren't
>>>> present and not try to draw pictures, otherwise.
>>>>
>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>>> robert.chacon@jackrabbitwireless.com> wrote:
>>>>
>>>>> Wow. This is very nicely done. Awesome work!
>>>>>
>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>
>>>>>> The integration is coming along nicely. Some progress updates:
>>>>>>
>>>>>>    - You can specify a variable in ispConfig.py named "uispSite".
>>>>>>    This sets where in the topology you want the tree to start. This
>>>>>>    has two purposes:
>>>>>>       - It's hard to be psychic and know for sure where the shaper
>>>>>>       is in the network.
>>>>>>       - You could run multiple shapers at different egress points,
>>>>>>       with failover - and rebuild the entire topology from the point
>>>>>>       of view of a network node.
>>>>>>    - "Child nodes with children" are now automatically converted
>>>>>>    into a "(Generated Site) name" site, and their children
>>>>>>    rearranged. This:
>>>>>>       - Allows you to set the "site" bandwidth independently of the
>>>>>>       client site bandwidth.
>>>>>>       - Makes for easier trees, because we're inserting the site
>>>>>>       that really should be there.
>>>>>>    - network.json generation (not the shaped devices file yet) is
>>>>>>    automatic once prepareTree() and createNetworkJson() are called.
>>>>>>    - There's a unit test that generates the network.example.json
>>>>>>    file and compares it with the original to ensure that they match.
>>>>>>    - Unit test coverage hits every function in the graph system, now.
>>>>>>
>>>>>> I'm liking this setup. With the non-vendor-specific logic contained
>>>>>> inside the NetworkGraph type, the actual UISP code to generate the
>>>>>> example tree is down to 65 lines of code, including comments.
>>>>>> That'll grow a bit as I re-insert some automatic speed limit
>>>>>> determination and AP/site speed overrides (i.e. the
>>>>>> integrationUISPbandwidths.csv file). Still pretty clean.
>>>>>>
>>>>>> Creating the network.example.json file only requires:
>>>>>>
>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>> import json
>>>>>> net = NetworkGraph()
>>>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
>>>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
>>>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
>>>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
>>>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
>>>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
>>>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
>>>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
>>>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
>>>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
>>>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
>>>>>> net.prepareTree()
>>>>>> net.createNetworkJson()
>>>>>>
>>>>>> (The id and name fields are duplicated right now; I'm using readable
>>>>>> names to keep me sane. The third string is the parent, and the last
>>>>>> two numbers are bandwidth limits.)
>>>>>> The nice, readable format being:
>>>>>>
>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="", type=NodeType.site, download=1000, upload=1000)
>>>>>>
>>>>>> That in turn gives you the example network:
>>>>>> [image: image.png]
>>>>>>
>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>>> herberticus@gmail.com> wrote:
>>>>>>
>>>>>>> Dave: I love those Gource animations! Game development is my other
>>>>>>> hobby, I could easily get lost for weeks tweaking the shaders to
>>>>>>> make the glow "just right". :-)
>>>>>>>
>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to look
>>>>>>> in that direction yet. I'm trying to build a "common grammar" to
>>>>>>> make it easier to express network layout from integrations; that
>>>>>>> would be another form/layer of integration, and a lot easier to
>>>>>>> work with once there's a solid foundation. Preseem does some of
>>>>>>> this (admittedly over-eagerly; nothing needs to query SNMP that
>>>>>>> often!), and the SNMP route is quite remarkably convoluted. Their
>>>>>>> support turned on a few "extra" modules to deal with things like
>>>>>>> PMP450 clients that change MAC when you put them in bridge mode vs
>>>>>>> NAT mode (and report the bridge mode CPE in some places either
>>>>>>> way), and Elevate CPEs that almost - but not quite - make sense.
>>>>>>> Robert's code has the beginnings of some of this, scanning Mikrotik
>>>>>>> routers for IPv6 allocations by MAC (this is also the hardest part
>>>>>>> for me to test, since I don't have any v6 to test against,
>>>>>>> currently).
>>>>>>>
>>>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>>>
>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson@gmail.com> wrote:
>>>>>>>
>>>>>>>> we're pretty similar in that we've made UISP a mess. Multiple
>>>>>>>> paths to a pop. multiple pops on the network. failover between
>>>>>>>> pops. Lots of 'other' devices. handing out /29 etc to customers.
>>>>>>>>
>>>>>>>> Some sort of discovery would be nice. Ideally though, pulling
>>>>>>>> something from SNMP or router APIs etc to build the paths, but
>>>>>>>> having a 'network elements' list with each of the links described.
>>>>>>>> ie, backhaul 12 has MACs ..01 and ..02 at 300x100, and then build
>>>>>>>> the topology around that from discovery.
>>>>>>>>
>>>>>>>> I've also thought about doing routine trace routes or watching
>>>>>>>> TTLs or something like that to get some indication that topology
>>>>>>>> has changed, and then do another discovery and potential tree
>>>>>>>> rebuild.
>>>>>>>>
>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>
>>>>>>>>> This is awesome! Way to go here. Thank you for contributing this.
>>>>>>>>> Being able to map out these complex integrations will help ISPs a
>>>>>>>>> ton, and I really like that it is sharing common features between
>>>>>>>>> the Splynx and UISP integrations.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Robert
>>>>>>>>>
>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS <
>>>>>>>>> libreqos@lists.bufferbloat.net> wrote:
>>>>>>>>>
>>>>>>>>>> [...]

[-- Attachment #1.2: Type: text/html, Size: 41564 bytes --]
[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 573568 bytes --]
[-- Attachment #3: image.png --]
[-- Type: image/png, Size: 115596 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread
end of thread, other threads:[~2022-11-01 13:39 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-27 21:33 [LibreQoS] Integration system, aka fun with graph theory Herbert Wolverson
2022-10-27 21:41 ` Dave Taht
2022-10-27 21:44 ` Dave Taht
2022-10-27 21:48 ` Robert Chacón
2022-10-28  0:27 ` dan
2022-10-28 12:40 ` Herbert Wolverson
2022-10-28 17:43 ` Herbert Wolverson
2022-10-28 19:05 ` Robert Chacón
2022-10-28 19:54 ` Herbert Wolverson
2022-10-28 21:15 ` Robert Chacón
2022-10-29 15:57 ` Herbert Wolverson
2022-10-29 19:05 ` Robert Chacón
2022-10-29 19:43 ` Dave Taht
2022-10-30  1:45 ` Herbert Wolverson
2022-10-31  0:15 ` Dave Taht
2022-10-31  1:15 ` Robert Chacón
2022-10-31  1:26 ` Herbert Wolverson
2022-10-31  1:36 ` Herbert Wolverson
2022-10-31  1:46 ` Herbert Wolverson
2022-10-31  2:21 ` Dave Taht
2022-10-31  3:26 ` Robert Chacón
2022-10-31 14:47 ` [LibreQoS] metaverse-ready metrics Dave Taht
2022-10-31 14:50 ` Dave Taht
2022-10-31 15:56 ` [LibreQoS] Integration system, aka fun with graph theory dan
2022-10-31 21:19 ` Herbert Wolverson
2022-10-31 21:54 ` Dave Taht
2022-10-31 21:57 ` Robert Chacón
2022-10-31 23:31 ` dan
2022-10-31 23:45 ` Dave Taht
2022-11-01  3:31 ` Dave Taht
2022-11-01 13:38 ` Herbert Wolverson
2022-10-29 19:18 ` Dave Taht
2022-10-30  1:10 ` Herbert Wolverson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox