At a high level, I've been playing with:

- The brute-force approach: have a bigger buffer, so exhaustion is less likely to ever happen.
- A shared "config" flag that turns off monitoring once exhaustion is near - it costs one synchronized lookup/increment, and gets reset when you read the stats.
- Per-CPU buffers for the very volatile data, which is generally faster (at the expense of RAM) - but is also quite hard to manage from userspace. It significantly reduces the likelihood of stalling, but I'm not fond of the complexity so far.
- Replacing the volatile "packet buffer" with a "least recently used" map that automatically gets rid of old data if it isn't cleaned up (the original only cleans up when a TCP connection closes gracefully).
- Maintaining two sets of buffers and keeping a pointer to each. A shared config variable indicates whether we are currently writing to A or B. "Cleanup" cleans the *other* buffer and switches the pointers, so we're never sharing "hot" data with a userland cleanup.

That's a lot to play with, so I'm taking my time. My gut likes the A/B switch, currently (there's a rough sketch of the idea below).

On Sun, Oct 30, 2022 at 8:26 PM Herbert Wolverson wrote: > > "average" of "what"? > > Mean TCP RTT times, as measured by pping-cpumap. There have been two steps of > improvement; the original "pping" started to eat a bunch of CPU at higher > traffic levels, and I had a feeling - not entirely quantified - that the > excess CPU usage was causing some latency. Switching to pping-cpumap showed > that my hunch was correct. On top of that, as Robert had observed, the > previous version was causing a slight "stutter" when it filled the tracking > buffers (and then recovered fine). My most recent build scales the tracking > buffers up a LOT - which I was worried would cause some slowdown (since the > program is now searching a much larger hashmap space, making it less cache > friendly). The buffer increase fixed the stutter issue. I probably > should have been a little clearer about what I was talking about. I'm still > trying to figure out the optimal buffer size, and the optimal stats > collection period (the collection "resets" the buffers, eliminating any resource depletion). > > I'm also experimenting with a few other ideas to keep the measurement > latency more consistent. I tried "dump it all into a perfmap and figure it > out in userspace", which went spectacularly badly. :-| > > The RTT measurements are from the customer to whatever the heck they are > using on the Internet. So customers using a slow service that's > bottlenecked far outside of my control will negatively affect the results - > but there's nothing I can do about that. Coincidentally, it's the same > "QoE" metric that Preseem uses - so Preseem-to-LibreQoS refugees (myself > included) tend to have a "feel" for it. If I remember rightly, Preseem > (which is basically fq-codel queues per customer, with an optional layer of > AP queues above) ranks 0-74 ms as "green", 75-100 ms as "yellow" and 100+ ms > as "red" - and a lot of WISPs have become used to that grading. I always > thought that an average of 70 ms seemed pretty high to count as "good". The > idea is that it's quantifying the customer's *experience* - the lower the > average, the snappier the connection "feels". You can have a pretty happy > customer with very low latency and a low-speed plan, if they aren't doing > anything that needs to exhaust their speed plan. 
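For reference, the Preseem-style grading described above amounts to a trivial bucketing of the mean RTT. A minimal sketch, using the thresholds as remembered in the message (so not authoritative):

def qoe_grade(mean_rtt_ms):
    # Bucket a mean TCP RTT into the Preseem-style QoE grades quoted above.
    # The 75/100 ms boundaries are "as remembered" in this thread.
    if mean_rtt_ms < 75:
        return "green"
    if mean_rtt_ms <= 100:
        return "yellow"
    return "red"

And going back to the A/B switch mentioned at the top of this message: a userspace-level sketch of the double-buffering idea, with plain Python dicts standing in for the two kernel maps and a single flag standing in for the shared config variable (the real thing would be a pair of eBPF maps plus an atomic flag; all names here are made up for illustration):

class DoubleBufferedTracker:
    # The hot path always writes to the "active" side; the userspace cleanup
    # flips the flag and drains the *other* side, so hot data is never
    # shared with the cleaner.

    def __init__(self):
        self.buffers = [{}, {}]   # stand-ins for the two kernel maps
        self.active = 0           # stand-in for the shared A/B config variable

    def record(self, flow_key, rtt_ms):
        # Hot path: append the sample to whichever buffer is currently active.
        self.buffers[self.active].setdefault(flow_key, []).append(rtt_ms)

    def collect(self):
        # Cleanup path: flip the flag first, then drain the now-inactive side
        # at leisure. New samples keep landing in the other buffer.
        previous = self.active
        self.active = 1 - previous
        drained, self.buffers[previous] = self.buffers[previous], {}
        return drained

tracker = DoubleBufferedTracker()
tracker.record(("10.0.0.1", 443), 23.5)
stats = tracker.collect()   # drains side A; new samples now land in side B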
(This contrasts with a lot > of other solutions - notably Sandvine - which have always focused heavily > on "how much less upstream does the ISP need to buy?") > > On Sun, Oct 30, 2022 at 7:15 PM Dave Taht wrote: > >> >> >> On Sat, Oct 29, 2022 at 6:45 PM Herbert Wolverson >> wrote: >> >>> > For starters, let me also offer praise for this work which is so ahead >>> of schedule! >>> >>> Thank you. I'm enjoying a short period while I wait for my editor to >>> finish up with a couple of chapters of my next book (working title More >>> Hands-on Rust; it's intermediate to advanced Rust, taught through the lens >>> of game development). >>> >> >> cool. I'm 32 years into my PhD thesis. >> >> >>> >>> I think at least initially, the primary focus is on what WISPs are used >>> to (and ask for): a fat shaper box that sits between a WISP and their >>> Internet connection(s). Usually in the topology: (router connected to >>> upstream) <--> (LibreQoS) <--> (core site router, connected to the WISP's >>> network as a whole). That's a simplification; there's usually a bypass (in >>> case LibreQoS dies, is being updated, etc.), sometimes multiple connections >>> that need shaping, etc. That's how Preseem (and the others) tend to insert >>> themselves - shape everything on the way out. >>> >> >> Presently LibreQoS appears to be inserting about 200us of delay into the >> path, for the sparsest packets. Every box on the path adds >> delay, though cut-through switches are common. Don't talk to me about >> network slicing and disaggregated this or that in the 3GPP world, tho... >> ugh. >> >> I guess, for every "box" (or virtual machine) on the path I have Amdahl's >> law stuck in my head. >> >> This is in part why the K8s crowd makes me a little crazy. >> >> >>> >>> I think there's a lot to be said for the possibility of LibreQoS at >>> towers that need it the most, also. That might require a bit of MPLS >>> support (I can do the xdp-cpumap-tc part; I'm not sure what the classifier >>> does if it receives a packet with the TCP/UDP header stuck behind some MPLS >>> headers?), but has the potential to really clean things up. Especially for >>> a really busy tower site. (On a similar note, WISPs with multiple Internet >>> connections at different sites would benefit from LibreQoS on each of >>> them). >>> >>> Generally, the QoS box doesn't really care what you are running in the >>> way of a router. >>> >> >> It is certainly simpler to have a transparent middlebox for this stuff, >> initially, and it would take a great leap of faith, >> for many, to just plug in a lqos box as the main box... but Cumulus did >> succeed at a lot of that... they open sourced a BFD daemon... numerous >> other tools... >> >> https://www.nvidia.com/en-us/networking/ethernet-switching/cumulus-linux/ >> >> >>> We run mostly Mikrotik (with a bit of FreeBSD, and a tiny bit of Cisco >>> in the mix too!), I know of people who love Juniper, use Cisco, etc. Since >>> we're shaping in the "router sandwich" (which can be one router with a bit >>> of care), we don't necessarily need to worry too much about their innards. >>> >>> >> An ISP in an SDN shaping whitebox that does all that juniper/cisco stuff, >> or a pair perhaps using a fiber optic splitter for failover >> >> http://www.comlaninc.com/products/fiber-optic-products/id/23/cl-fos >> >> >> >> >>> With that said, some future SNMP support (please, not polling everything >>> all the time... that's a monitoring program's job!) is probably hard to >>> avoid. 
At least that's relatively vendor agnostic (even if Ubiquiti seem to >>> be trying to cease supporting it, ugh) >>> >>> >> Building on this initial core strength - sampling RTT - would be a >> differentiator. >> >> Examples: >> >> RTT per AP >> RTT P1 per AP (what's the effective minimum) >> RTT P99 (what's the worst case?) >> RTT variance P1 to P99 per internet IP (worst 20 performers) or AS >> number or /24 >> >> (variance is a very important concept) >> >> >> >> >> >>> I could see some support for outputting rules for routers, especially if >>> the goal is to get Cake managing buffer-bloat in many places down the line. >>> >>> Incidentally, using my latest build of cpumap-pping (and no separate >>> pping running, eating a CPU) my average network latency has dropped to 24ms >>> at peak time (from 40ms). At peak time, while pulling 1.8 gbps of real >>> customer traffic through the system. :-) >>> >> >> OK, this is something that "triggers" my inner pedant. Forgive me in >> advance? >> >> "average" of "what"? >> >> Changing the monitoring tool shouldn't have affected the average latency, >> unless how it is calculated is different, or the sample >> population (more likely) has changed. If you are tracking now far more >> short flows, the observed latency will decline, but the >> higher latencies you were observing in the first place are still there. >> >> Also... between where and where? Across the network? To the customer to >> their typical set of IP addresses of their servers? >> on wireless? vs fiber? ( Transiting a fiber network to your pop's edge >> should take under 2ms). Wifi hops at the end of the link are >> probably adding the most delay... >> >> If you consider 24ms "good" - however you calculate - going for ever >> less via whatever means can be obtained from these >> analyses, is useful. But there are some things I don't think make as much >> sense as they used to - a netflix cache hitrate must >> be so low nowadays as to cost you just as much to fetch it from upstream >> than host a box... >> >> >> >> >>> >>> >>> >>> >>> On Sat, Oct 29, 2022 at 2:43 PM Dave Taht wrote: >>> >>>> For starters, let me also offer praise for this work which is so ahead >>>> of schedule! >>>> >>>> I am (perhaps cluelessly) thinking about bigger pictures, and still >>>> stuck in my mindset involving distributing the packet processing, >>>> and representing the network topology, plans and compensating for the >>>> physics. >>>> >>>> So you have a major tower, a separate libreqos instance goes there. Or >>>> libreqos outputs rules compatible with mikrotik or vyatta or whatever is >>>> there. Or are you basically thinking one device rules them all and off the >>>> only interface, shapes them? >>>> >>>> Or: >>>> >>>> You have another pop with a separate connection to the internet that >>>> you inherited from a buyout, or you wanted physical redundancy for your BGP >>>> AS's internet access, maybe just between DCs in the same town or... >>>> ____________________________________________ >>>> >>>> / >>>> / >>>> cloud -> pop -> customers - customers <- pop <- cloud >>>> \ ----- leased fiber or wireless / >>>> >>>> >>>> I'm also a little puzzled as to whats the ISP->internet link? juniper? >>>> cisco? mikrotik, and what role and services that is expected to have. >>>> >>>> >>>> >>>> On Sat, Oct 29, 2022 at 12:06 PM Robert Chacón via LibreQoS < >>>> libreqos@lists.bufferbloat.net> wrote: >>>> >>>>> > Per your suggestion, devices with no IP addresses (v4 or v6) are not >>>>> added. 
>>>>> > Mikrotik "4 to 6" mapping is implemented. I put it in the "common" >>>>> side of things, so it can be used in other integrations also. I don't have >>>>> a setup on which to test it, but if I'm reading the code right then the >>>>> unit test is testing it appropriately. >>>>> >>>>> Fantastic. >>>>> >>>>> > excludeSites is supported as a common API feature. If a node is >>>>> added with a name that matches an excluded site, it won't be added. The >>>>> tree builder is smart enough to replace invalid "parentId" references with >>>>> the shaper root, so if you have other tree items that rely on this site - >>>>> they will be added to the tree. Was that the intent? (It looks pretty >>>>> useful; we have a child site down the tree with a HUGE amount of load, and >>>>> bumping it to the top-level with excludeSites would probably help our load >>>>> balancing quite a bit) >>>>> >>>>> Very cool approach, I like it! Yeah we have some cases where we need >>>>> to balance out high load child nodes across CPUs so that's perfect. >>>>> Originally I thought of it to just exclude sites that don't fit into >>>>> the shaped topology but this approach is more useful. >>>>> Should we rename excludeSites to moveSitesToTop or something similar? >>>>> That functionality of distributing across top level nodes / cpu cores seems >>>>> more important anyway. >>>>> >>>>> >exceptionCPEs is also supported as a common API feature. It simply >>>>> overrides the "parentId'' of incoming nodes with the new parent. Another >>>>> potentially useful feature; if I got excludeSites the wrong away around, >>>>> I'd add a "my_big_site":"" entry to push it to the top. >>>>> >>>>> Awesome >>>>> >>>>> > UISP integration now supports a "flat" topology option (set via >>>>> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py >>>>> to include this entry. >>>>> >>>>> Nice! >>>>> >>>>> > I'll look and see how much of the Spylnx code I can shorten with the >>>>> new API; I don't have a Spylnx setup to test against, making that tricky. >>>>> >>>>> I'll send you the Splynx login they gave us. >>>>> >>>>> > I *think* the new API should shorten things a lot. I think routers >>>>> act as node parents, with clients underneath them? Otherwise, a "flat" >>>>> setup should be a little shorter (the CSV code can be replaced with a call >>>>> to the graph builder). Most of the Spylnx (and VISP) users I've talked to >>>>> layer MPLS+VPLS to pretend to have a big, flat network and then connect via >>>>> a RADIUS call in the DHCP server; I've always assumed that's because those >>>>> systems prefer the telecom model of "pretend everything is equal" to trying >>>>> to model topology.* >>>>> >>>>> Yeah splynx doesn't seem to natively support any topology mapping or >>>>> even AP designation, one person I spoke to said they track corresponding >>>>> APs in radius anyway. So for now the flat model may be fine. >>>>> >>>>> > I need to clean things up a bit (there's still a bit of duplicated >>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave >>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer, >>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I >>>>> accidentally parented the branch off of a branch instead of main) - but I >>>>> think I can have this as a PR for you on Monday. >>>>> >>>>> This is really great work and will make future integrations much >>>>> cleaner and nicer to work with. Thank you! 
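For anyone following the excludeSites / exceptionCPEs discussion above, the behaviour being described boils down to roughly the following. A simplified sketch only - not the actual integrationCommon.py code; the helper name and its arguments are made up for illustration:

def apply_common_overrides(node, exclude_sites, exception_cpes, known_ids):
    # Roughly the behaviour discussed above, simplified:
    # - a node whose name matches excludeSites is dropped entirely;
    # - a node listed in exceptionCPEs gets its parentId overridden;
    # - a parentId that no longer resolves falls back to the shaper root ("").
    if node["name"] in exclude_sites:
        return None                      # don't add it at all
    if node["name"] in exception_cpes:
        node["parentId"] = exception_cpes[node["name"]]
    if node["parentId"] not in known_ids:
        node["parentId"] = ""            # promote to the top level / shaper root
    return node

An exceptionCPEs entry like "my_big_site": "" then behaves exactly as described in the thread: the empty parent pushes that node to the top level.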
>>>>> >>>>> >>>>> On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS < >>>>> libreqos@lists.bufferbloat.net> wrote: >>>>> >>>>>> Alright, the UISP side of the common integrations is pretty much >>>>>> feature complete. I'll update the tracking issue in a bit. >>>>>> >>>>>> - Per your suggestion, devices with no IP addresses (v4 or v6) >>>>>> are not added. >>>>>> - Mikrotik "4 to 6" mapping is implemented. I put it in the >>>>>> "common" side of things, so it can be used in other integrations also. I >>>>>> don't have a setup on which to test it, but if I'm reading the code right >>>>>> then the unit test is testing it appropriately. >>>>>> - excludeSites is supported as a common API feature. If a node is >>>>>> added with a name that matches an excluded site, it won't be added. The >>>>>> tree builder is smart enough to replace invalid "parentId" references with >>>>>> the shaper root, so if you have other tree items that rely on this site - >>>>>> they will be added to the tree. Was that the intent? (It looks pretty >>>>>> useful; we have a child site down the tree with a HUGE amount of load, and >>>>>> bumping it to the top-level with excludeSites would probably help our load >>>>>> balancing quite a bit) >>>>>> - If the intent was to exclude the site and everything >>>>>> underneath it, I'd have to rework things a bit. Let me know; it wasn't >>>>>> quite clear. >>>>>> - exceptionCPEs is also supported as a common API feature. It >>>>>> simply overrides the "parentId'' of incoming nodes with the new parent. >>>>>> Another potentially useful feature; if I got excludeSites the wrong away >>>>>> around, I'd add a "my_big_site":"" entry to push it to the top. >>>>>> - UISP integration now supports a "flat" topology option (set via >>>>>> uispStrategy = "flat" in ispConfig). I expanded >>>>>> ispConfig.example.py to include this entry. >>>>>> >>>>>> I'll look and see how much of the Spylnx code I can shorten with the >>>>>> new API; I don't have a Spylnx setup to test against, making that tricky. I >>>>>> *think* the new API should shorten things a lot. I think routers act >>>>>> as node parents, with clients underneath them? Otherwise, a "flat" setup >>>>>> should be a little shorter (the CSV code can be replaced with a call to the >>>>>> graph builder). Most of the Spylnx (and VISP) users I've talked to layer >>>>>> MPLS+VPLS to pretend to have a big, flat network and then connect via a >>>>>> RADIUS call in the DHCP server; I've always assumed that's because those >>>>>> systems prefer the telecom model of "pretend everything is equal" to trying >>>>>> to model topology.* >>>>>> >>>>>> I need to clean things up a bit (there's still a bit of duplicated >>>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave >>>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer, >>>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I >>>>>> accidentally parented the branch off of a branch instead of main) - but I >>>>>> think I can have this as a PR for you on Monday. >>>>>> >>>>>> * - The first big wireless network I setup used a Motorola WiMAX >>>>>> setup. They *required* that every single AP share two VLANs >>>>>> (management and bearer) with every other AP - all the way to the core. It >>>>>> kinda worked once they remembered client isolation was a thing in a >>>>>> patch... 
Then again, their installation instructions included connecting >>>>>> two ports of a router together with a jumper cable, because their localhost >>>>>> implementation didn't quite work. :-| >>>>>> >>>>>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón < >>>>>> robert.chacon@jackrabbitwireless.com> wrote: >>>>>> >>>>>>> Awesome work. It succeeded in building the topology and creating >>>>>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice! >>>>>>> I notice that in ShapedDevices.csv it does add CPE radios (which in >>>>>>> our case we don't shape - they are in bridge mode) with IPv4 and IPv6s both >>>>>>> being empty lists []. >>>>>>> This is not necessarily bad, but it may lead to empty leaf classes >>>>>>> being created on LibreQoS.py runs. Not a huge deal, it just makes the minor >>>>>>> class counter increment toward the 32k limit faster. >>>>>>> Do you think perhaps we should check: >>>>>>> *if (len(IPv4) == 0) and (len(IPv6) == 0):* >>>>>>> * # Skip adding this entry to ShapedDevices.csv* >>>>>>> Or something similar around line 329 of integrationCommon.py? >>>>>>> Open to your suggestions there. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS < >>>>>>> libreqos@lists.bufferbloat.net> wrote: >>>>>>> >>>>>>>> One more update, and I'm going to sleep until "pick up daughter" >>>>>>>> time. :-) >>>>>>>> >>>>>>>> The tree at >>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph >>>>>>>> can now build a network.json, ShapedDevices.csv, and >>>>>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the >>>>>>>> previous importer - other than using data links to build the hierarchy and >>>>>>>> letting (requiring, currently) you specify the root node. It's handling our >>>>>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I >>>>>>>> recommend just running integrationUISP.py and checking the output rather >>>>>>>> than throwing it into production), I'd appreciate any feedback. >>>>>>>> >>>>>>>> Still on my list: handling the Mikrotik IPv6 connections, and >>>>>>>> exceptionCPE and site exclusion. >>>>>>>> >>>>>>>> If you want the pretty graphics, you need to "pip install graphviz" >>>>>>>> and "sudo apt install graphviz". It *should* detect that these aren't >>>>>>>> present and not try to draw pictures, otherwise. >>>>>>>> >>>>>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón < >>>>>>>> robert.chacon@jackrabbitwireless.com> wrote: >>>>>>>> >>>>>>>>> Wow. This is very nicely done. Awesome work! >>>>>>>>> >>>>>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS < >>>>>>>>> libreqos@lists.bufferbloat.net> wrote: >>>>>>>>> >>>>>>>>>> The integration is coming along nicely. Some progress updates: >>>>>>>>>> >>>>>>>>>> - You can specify a variable in ispConfig.py named >>>>>>>>>> "uispSite". This sets where in the topology you want the tree to start. >>>>>>>>>> This has two purposes: >>>>>>>>>> - It's hard to be psychic and know for sure where the >>>>>>>>>> shaper is in the network. >>>>>>>>>> - You could run multiple shapers at different egress >>>>>>>>>> points, with failover - and rebuild the entire topology from the point of >>>>>>>>>> view of a network node. >>>>>>>>>> - "Child node with children" are now automatically converted >>>>>>>>>> into a "(Generated Site) name" site, and their children rearranged. This: >>>>>>>>>> - Allows you to set the "site" bandwidth independently of >>>>>>>>>> the client site bandwidth. 
>>>>>>>>>> - Makes for easier trees, because we're inserting the site >>>>>>>>>> that really should be there. >>>>>>>>>> - Network.json generation (not the shaped devices file yet) >>>>>>>>>> is automatically generated from a tree, once PrepareTree() and >>>>>>>>>> createNetworkJson() are called. >>>>>>>>>> - There's a unit test that generates the >>>>>>>>>> network.example.json file and compares it with the original to ensure that >>>>>>>>>> they match. >>>>>>>>>> - Unit test coverage hits every function in the graph system, >>>>>>>>>> now. >>>>>>>>>> >>>>>>>>>> I'm liking this setup. With the non-vendor-specific logic >>>>>>>>>> contained inside the NetworkGraph type, the actual UISP code to generate >>>>>>>>>> the example tree is down to 65 >>>>>>>>>> lines of code, including comments. That'll grow a bit as I >>>>>>>>>> re-insert some automatic speed limit determination, AP/Site speed overrides >>>>>>>>>> ( >>>>>>>>>> i.e. the integrationUISPbandwidths.csv file). Still pretty clean. >>>>>>>>>> >>>>>>>>>> Creating the network.example.json file only requires: >>>>>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType >>>>>>>>>> import json >>>>>>>>>> net = NetworkGraph() >>>>>>>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", >>>>>>>>>> NodeType.site, 1000, 1000)) >>>>>>>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", >>>>>>>>>> NodeType.site, 500, 500)) >>>>>>>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", >>>>>>>>>> NodeType.ap, 500, 500)) >>>>>>>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", >>>>>>>>>> NodeType.site, 500, 500)) >>>>>>>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", >>>>>>>>>> NodeType.site, 200, 200)) >>>>>>>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", >>>>>>>>>> NodeType.ap, 120, 120)) >>>>>>>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", >>>>>>>>>> NodeType.site, 60, 60)) >>>>>>>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", >>>>>>>>>> NodeType.ap, 30, 30)) >>>>>>>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", >>>>>>>>>> NodeType.site, 200, 200)) >>>>>>>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", >>>>>>>>>> NodeType.ap, 100, 100)) >>>>>>>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", >>>>>>>>>> NodeType.ap, 150, 150)) >>>>>>>>>> net.prepareTree() >>>>>>>>>> net.createNetworkJson() >>>>>>>>>> >>>>>>>>>> (The id and name fields are duplicated right now, I'm using >>>>>>>>>> readable names to keep me sane. The third string is the parent, and the >>>>>>>>>> last two numbers are bandwidth limits) >>>>>>>>>> The nice, readable format being: >>>>>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="", type= >>>>>>>>>> NodeType.site, download=1000, upload=1000) >>>>>>>>>> >>>>>>>>>> That in turns gives you the example network: >>>>>>>>>> [image: image.png] >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson < >>>>>>>>>> herberticus@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Dave: I love those Gource animations! Game development is my >>>>>>>>>>> other hobby, I could easily get lost for weeks tweaking the shaders to make >>>>>>>>>>> the glow "just right". :-) >>>>>>>>>>> >>>>>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to >>>>>>>>>>> look in that direction yet. 
I'm trying to build a "common grammar" to make >>>>>>>>>>> it easier to express network layout from integrations; that would be >>>>>>>>>>> another form/layer of integration and a lot easier to work with once >>>>>>>>>>> there's a solid foundation. Preseem does some of this (admittedly >>>>>>>>>>> over-eagerly; nothing needs to query SNMP that often!), and the SNMP route >>>>>>>>>>> is quite remarkably convoluted. Their support turned on a few "extra" >>>>>>>>>>> modules to deal with things like PMP450 clients that change MAC when you >>>>>>>>>>> put them in bridge mode vs NAT mode (and report the bridge mode CPE in some >>>>>>>>>>> places either way), Elevate CPEs that almost but not quite make sense. >>>>>>>>>>> Robert's code has the beginnings of some of this, scanning Mikrotik routers >>>>>>>>>>> for IPv6 allocations by MAC (this is also the hardest part for me to test, >>>>>>>>>>> since I don't have any v6 to test, currently). >>>>>>>>>>> >>>>>>>>>>> We tend to use UISP as the "source of truth" and treat it like a >>>>>>>>>>> database for a ton of external tools (mostly ones we've created). >>>>>>>>>>> >>>>>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan wrote: >>>>>>>>>>> >>>>>>>>>>>> we're pretty similar in that we've made UISP a mess. Multiple >>>>>>>>>>>> paths to a pop. multiple pops on the network. failover between pops. >>>>>>>>>>>> Lots of 'other' devices. handing out /29 etc to customers. >>>>>>>>>>>> >>>>>>>>>>>> Some sort of discovery would be nice. Ideally though, pulling >>>>>>>>>>>> something from SNMP or router APIs etc to build the paths, but having a >>>>>>>>>>>> 'network elements' list with each of the links described. ie, backhaul 12 >>>>>>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that >>>>>>>>>>>> from discovery. >>>>>>>>>>>> >>>>>>>>>>>> I've also thought about doing routine trace routes or watching >>>>>>>>>>>> TTLs or something like that to get some indication that topology has >>>>>>>>>>>> changed and then do another discovery and potential tree rebuild. >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS < >>>>>>>>>>>> libreqos@lists.bufferbloat.net> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> This is awesome! Way to go here. Thank you for contributing >>>>>>>>>>>>> this. >>>>>>>>>>>>> Being able to map out these complex integrations will help >>>>>>>>>>>>> ISPs a ton, and I really like that it is sharing common features between >>>>>>>>>>>>> the Splynx and UISP integrations. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Robert >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> So I've been doing some work on getting UISP integration (and >>>>>>>>>>>>>> integrations in general) to work a bit more smoothly. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I started by implementing a graph structure that mirrors both >>>>>>>>>>>>>> the networks and sites system. It's not done yet, but the basics are coming >>>>>>>>>>>>>> together nicely. You can see my progress so far at: >>>>>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph >>>>>>>>>>>>>> >>>>>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the >>>>>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular >>>>>>>>>>>>>> portion of the tree. We have: >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Non Ubiquiti devices as "other devices" >>>>>>>>>>>>>> - Sections that need shaping by subnet (e.g. 
"all of >>>>>>>>>>>>>> 192.168.1.0/24 shared 100 mbit") >>>>>>>>>>>>>> - Bridge mode devices using Option 82 to always allocate >>>>>>>>>>>>>> the same IP, with a "service IP" entry >>>>>>>>>>>>>> - Various bits of infrastructure mapped >>>>>>>>>>>>>> - Sites that go to client sites, which go to other client >>>>>>>>>>>>>> sites >>>>>>>>>>>>>> >>>>>>>>>>>>>> In other words, over the years we've unleashed a bit of a >>>>>>>>>>>>>> monster. Cleaning it up is a useful talk, but I wanted the integration to >>>>>>>>>>>>>> be able to handle pathological cases like us! >>>>>>>>>>>>>> >>>>>>>>>>>>>> So I fed our network into the current graph generator, and >>>>>>>>>>>>>> used graphviz to spit out a directed graph: >>>>>>>>>>>>>> [image: image.png] >>>>>>>>>>>>>> That doesn't include client sites! Legend: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Green = the root site. >>>>>>>>>>>>>> - Red = a site >>>>>>>>>>>>>> - Blue = an access point >>>>>>>>>>>>>> - Magenta = a client site that has children >>>>>>>>>>>>>> >>>>>>>>>>>>>> So the part in "common" is designed heavily to reduce >>>>>>>>>>>>>> repetition. When it's done, you should be able to feed in sites, APs, >>>>>>>>>>>>>> clients, devices, etc. in a pretty flexible manner. Given how much code is >>>>>>>>>>>>>> shared between the UISP and Splynx integration code, I'm pretty sure both >>>>>>>>>>>>>> will be cut to a tiny fraction of the total code. :-) >>>>>>>>>>>>>> >>>>>>>>>>>>>> I can't post the full tree, it's full of client names. >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> LibreQoS mailing list >>>>>>>>>>>>>> LibreQoS@lists.bufferbloat.net >>>>>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Robert Chacón >>>>>>>>>>>>> CEO | JackRabbit Wireless LLC >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> LibreQoS mailing list >>>>>>>>>>>>> LibreQoS@lists.bufferbloat.net >>>>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>> LibreQoS mailing list >>>>>>>>>> LibreQoS@lists.bufferbloat.net >>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Robert Chacón >>>>>>>>> CEO | JackRabbit Wireless LLC >>>>>>>>> Dev | LibreQoS.io >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>> LibreQoS mailing list >>>>>>>> LibreQoS@lists.bufferbloat.net >>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Robert Chacón >>>>>>> CEO | JackRabbit Wireless LLC >>>>>>> Dev | LibreQoS.io >>>>>>> >>>>>>> _______________________________________________ >>>>>> LibreQoS mailing list >>>>>> LibreQoS@lists.bufferbloat.net >>>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>>> >>>>> >>>>> >>>>> -- >>>>> Robert Chacón >>>>> CEO | JackRabbit Wireless LLC >>>>> Dev | LibreQoS.io >>>>> >>>>> _______________________________________________ >>>>> LibreQoS mailing list >>>>> LibreQoS@lists.bufferbloat.net >>>>> https://lists.bufferbloat.net/listinfo/libreqos >>>>> >>>> >>>> >>>> -- >>>> This song goes out to all the folk that thought Stadia would work: >>>> >>>> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz >>>> Dave Täht CEO, TekLibre, LLC >>>> >>> >> >> -- >> This song goes out to all the folk that thought Stadia would work: >> >> 
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz >> Dave Täht CEO, TekLibre, LLC >> >