[LibreQoS] Integration system, aka fun with graph theory

Herbert Wolverson herberticus at gmail.com
Sun Oct 30 21:36:19 EDT 2022


At a high level, I've been playing with:

   - The brute force approach: have a bigger buffer, so exhaustion is less
   likely to ever happen.
   - A shared "config" flag that turns off monitoring once exhaustion is
   near - it costs one synchronized lookup/increment, and gets reset when you
   read the stats.
   - Per-CPU buffers for the very volatile data, which is generally faster
   (at the expense of RAM) - but is also quite hard to manage from userspace.
   It significantly reduces the likelihood of stalling, but I'm not fond of
   the complexity so far.
   - Replacing the volatile "packet buffer" with a "least recently used"
   map that automatically gets rid of old data if it isn't cleaned up (the
   original only cleans up when a TCP connection closes gracefully)
   - Maintaining two sets of buffers and keeping a pointer to each. A
   shared config variable indicates whether we are currently writing to A or
   B. "Cleanup" cleans the *other* buffer and switches the pointers. So
   we're never sharing "hot" data with a userland cleanup.

That's a lot to play with, so I'm taking my time. My gut currently likes
the A/B switch; a rough sketch of the idea follows.
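
To make the A/B idea concrete, here's the shape of it in plain Python - a
minimal sketch only (the real thing would live in eBPF maps, with the
active index held in a shared config map; all names here are made up):

import threading

class ABStats:
    """Writers fill the 'hot' buffer; cleanup drains the cold one, then flips."""

    def __init__(self):
        self.buffers = [{}, {}]
        self.hot = 0                      # index the data path writes to
        self.flip_lock = threading.Lock()

    def record(self, flow, rtt_ms):
        # Stand-in for the per-packet write path.
        self.buffers[self.hot].setdefault(flow, []).append(rtt_ms)

    def collect(self):
        # Clean the *other* buffer, then switch the pointers - so the
        # reader never touches the buffer the data path is writing to.
        with self.flip_lock:
            cold = 1 - self.hot
            stats, self.buffers[cold] = self.buffers[cold], {}
            self.hot = cold
        return stats

Each collect() hands back a whole epoch of measurements without contending
with the hot path mid-read.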

On Sun, Oct 30, 2022 at 8:26 PM Herbert Wolverson <herberticus at gmail.com>
wrote:

> > "average" of "what"?
>
> Mean TCP RTT, as measured by pping-cpumap. There have been two steps of
> improvement: the original "pping" started to eat a bunch of CPU at higher
> traffic levels, and I had a feeling - not entirely quantified - that the
> excess CPU usage was causing some latency. Switching to pping-cpumap showed
> that my hunch was correct. On top of that, as Robert had observed, the
> previous version caused a slight "stutter" when it filled the tracking
> buffers (and then recovered fine). My most recent build scales the tracking
> buffers up a LOT - which I worried would cause some slowdown (since the
> program is now searching a much larger hashmap space, making it less cache
> friendly). The buffer increase fixed the stutter issue. I should probably
> have been clearer about what I was talking about. I'm still trying to
> figure out the optimal buffer size, and the optimal stats collection
> period (collecting the stats "resets" the buffers, eliminating any
> resource depletion).
>
> I'm also experimenting with a few other ideas to keep the measurement
> latency more consistent. I tried "dump it all into a perfmap and figure it
> out in userspace" which went spectacularly badly. :-|
>
> The RTT measurements are from the customer to whatever the heck they are
> using on the Internet. So customers using a slow service that's
> bottlenecked far outside of my control will negatively affect the results -
> but there's nothing I can do about that. Coincidentally, it's the same
> "QoE" metric that Preseem uses - so Preseem to LibreQoS refugees (myself
> included) tend to have a "feel" for it. If I remember rightly, Preseem
> (which is basically fq-codel queues per customer, with an optional layer of
> AP queues above) ranks 0-74 ms as "green", 75-100 ms as "yellow", and 100+ ms
> as "red" - and a lot of WISPs have become used to that grading. I always
> thought that an average of 70ms seemed pretty excessive to be "good". The
> idea is that it's quantifying the customer's *experience* - the lower the
> average, the snappier the connection "feels". You can have a pretty happy
> customer with very low latency and a low speed plan, if they aren't doing
> anything that needs to exhaust their speed plan. (This contrasts with a lot
> of other solutions - notably Sandvine - which have always focused heavily
> on "how much less upstream does the ISP need to buy?")
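>
> (As a throwaway illustration, that banding maps onto something like the
> sketch below; the exact boundary handling at 75/100 ms is my guess, not
> Preseem's documented behavior:)
>
> def qoe_grade(mean_rtt_ms):
>     # 0-74 ms "green", 75-100 ms "yellow", 100+ ms "red"
>     if mean_rtt_ms < 75:
>         return "green"
>     if mean_rtt_ms <= 100:
>         return "yellow"
>     return "red"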
>
> On Sun, Oct 30, 2022 at 7:15 PM Dave Taht <dave.taht at gmail.com> wrote:
>
>>
>>
>> On Sat, Oct 29, 2022 at 6:45 PM Herbert Wolverson <herberticus at gmail.com>
>> wrote:
>>
>>> > For starters, let me also offer praise for this work which is so ahead
>>> of schedule!
>>>
>>> Thank you. I'm enjoying a short period while I wait for my editor to
>>> finish up with a couple of chapters of my next book (working title More
>>> Hands-on Rust; it's intermediate to advanced Rust, taught through the lens
>>> of game development).
>>>
>>
>> cool. I'm 32 years into my PhD thesis.
>>
>>
>>>
>>> I think at least initially, the primary focus is on what WISPs are used
>>> to (and ask for): a fat shaper box that sits between a WISP and their
>>> Internet connection(s). Usually in the topology: (router connected to
>>> upstream) <--> (LibreQoS) <--> (core site router, connected to the WISP's
>>> network as a whole). That's a simplification; there's usually a bypass (in
>>> case LibreQoS dies, is being updated, etc.), sometimes multiple connections
>>> that need shaping, etc. That's how Preseem (and the others) tend to insert
>>> themselves - shape everything on the way out.
>>>
>>
>> Presently LibreQoS appears to be inserting about 200us of delay into the
>> path, for the sparsest packets. Every box on the path adds
>> delay, though cut-through switches are common. Don't talk to me about
>> network slicing and disaggregated this or that in the 3GPP world, tho...
>> ugh.
>>
>> I guess, for every "box" (or virtual machine) on the path, I have Amdahl's
>> law stuck in my head.
>>
>> This is in part why the K8s crowd makes me a little crazy.
>>
>>
>>>
>>> I think there's a lot to be said for the possibility of LibreQoS at
>>> towers that need it the most, also. That might require a bit of MPLS
>>> support (I can do the xdp-cpumap-tc part; I'm not sure what the classifier
>>> does if it receives a packet with the TCP/UDP header stuck behind some MPLS
>>> headers?), but has the potential to really clean things up. Especially for
>>> a really busy tower site. (On a similar note, WISPs with multiple Internet
>>> connections at different sites would benefit from LibreQoS on each of
>>> them).
>>>
>>> Generally, the QoS box doesn't really care what you are running in the
>>> way of a router.
>>>
>>
>> It is certainly simpler to have a transparent middlebox for this stuff,
>> initially, and it would take a great leap of faith,
>> for many, to just plug in an lqos box as the main box... but Cumulus did
>> succeed at a lot of that... they open-sourced a BFD daemon... numerous
>> other tools...
>>
>> https://www.nvidia.com/en-us/networking/ethernet-switching/cumulus-linux/
>>
>>
>>> We run mostly Mikrotik (with a bit of FreeBSD, and a tiny bit of Cisco
>>> in the mix too!), I know of people who love Juniper, use Cisco, etc. Since
>>> we're shaping in the "router sandwich" (which can be one router with a bit
>>> of care), we don't necessarily need to worry too much about their innards.
>>>
>>>
>> An ISP in an SDN shaping whitebox that does all that juniper/cisco stuff,
>> or a pair perhaps using a fiber optic splitter for failover
>>
>> http://www.comlaninc.com/products/fiber-optic-products/id/23/cl-fos
>>
>>
>>
>>
>>> With that said, some future SNMP support (please, not polling everything
>>> all the time... that's a monitoring program's job!) is probably hard to
>>> avoid. At least that's relatively vendor agnostic (even if Ubiquiti seems
>>> to be trying to cease supporting it, ugh)
>>>
>>>
>> Building on this initial core strength - sampling RTT - would be a
>> differentiator.
>>
>> Examples:
>>
>> RTT per AP
>> RTT P1 per AP (what's the effective minimum)
>> RTT P99 (what's the worst case?)
>> RTT variance  P1 to P99 per internet IP (worst 20 performers) or AS
>> number or /24
>>
>> (variance is a very important concept)
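>>
>> (A sketch of how those summaries could be computed from raw samples -
>> hypothetical data shape, plain Python:)
>>
>> from statistics import quantiles
>>
>> def rtt_summary(samples_by_ap):
>>     # samples_by_ap: AP name -> list of RTT samples in milliseconds
>>     out = {}
>>     for ap, rtts in samples_by_ap.items():
>>         if len(rtts) < 2:
>>             continue
>>         cuts = quantiles(rtts, n=100)  # 99 percentile cut points
>>         p1, p99 = cuts[0], cuts[98]
>>         out[ap] = {"p1": p1, "p99": p99, "p1_to_p99_spread": p99 - p1}
>>     return out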
>>
>>
>>
>>
>>
>>> I could see some support for outputting rules for routers, especially if
>>> the goal is to get Cake managing bufferbloat in many places down the line.
>>>
>>> Incidentally, using my latest build of cpumap-pping (and no separate
>>> pping running, eating a CPU) my average network latency has dropped to 24ms
>>> at peak time (from 40ms). At peak time, while pulling 1.8 gbps of real
>>> customer traffic through the system. :-)
>>>
>>
>> OK, this is something that "triggers" my inner pedant. Forgive me in
>> advance?
>>
>> "average" of "what"?
>>
>> Changing the monitoring tool shouldn't have affected the average latency,
>> unless how it is calculated is different, or the sample
>> population (more likely) has changed. If you are tracking now far more
>> short flows, the observed latency will decline, but the
>> higher latencies you were observing in the first place are still there.
>>
>> Also... between where and where? Across the network? From the customer to
>> their typical set of server IP addresses?
>> On wireless? vs fiber? (Transiting a fiber network to your pop's edge
>> should take under 2ms). Wifi hops at the end of the link are
>> probably adding the most delay...
>>
>> If you consider 24ms "good" - however you calculate it - going for ever
>> less, via whatever means these analyses suggest, is useful. But there are
>> some things I don't think make as much sense as they used to - a Netflix
>> cache hit rate must be so low nowadays that it costs just as much to fetch
>> content from upstream as to host a box...
>>
>>
>>
>>
>>>
>>>
>>>
>>>
>>> On Sat, Oct 29, 2022 at 2:43 PM Dave Taht <dave.taht at gmail.com> wrote:
>>>
>>>> For starters, let me also offer praise for this work which is so ahead
>>>> of schedule!
>>>>
>>>> I am (perhaps cluelessly) thinking about bigger pictures, and still
>>>> stuck in my mindset involving distributing the packet processing,
>>>> and representing the network topology, plans and compensating for the
>>>> physics.
>>>>
>>>> So you have a major tower, and a separate libreqos instance goes there. Or
>>>> libreqos outputs rules compatible with mikrotik or vyatta or whatever is
>>>> there. Or are you basically thinking one device rules them all, shaping
>>>> everything off the only interface?
>>>>
>>>> Or:
>>>>
>>>> You have another pop with a separate connection to the internet that
>>>> you inherited from a buyout, or you wanted physical redundancy for your BGP
>>>> AS's internet access, maybe just between DCs in the same town or...
>>>>         ____________________________________________
>>>>        /                                            \
>>>> cloud -> pop -> customers - customers <- pop <- cloud
>>>>               \ ----- leased fiber or wireless ----- /
>>>>
>>>>
>>>> I'm also a little puzzled as to what's the ISP->internet link - juniper?
>>>> cisco? mikrotik? - and what role and services it is expected to have.
>>>>
>>>>
>>>>
>>>> On Sat, Oct 29, 2022 at 12:06 PM Robert Chacón via LibreQoS <
>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>
>>>>> > Per your suggestion, devices with no IP addresses (v4 or v6) are not
>>>>> added.
>>>>> > Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>>>>> side of things, so it can be used in other integrations also. I don't have
>>>>> a setup on which to test it, but if I'm reading the code right then the
>>>>> unit test is testing it appropriately.
>>>>>
>>>>> Fantastic.
>>>>>
>>>>> > excludeSites is supported as a common API feature. If a node is
>>>>> added with a name that matches an excluded site, it won't be added. The
>>>>> tree builder is smart enough to replace invalid "parentId" references with
>>>>> the shaper root, so if you have other tree items that rely on this site -
>>>>> they will be added to the tree. Was that the intent? (It looks pretty
>>>>> useful; we have a child site down the tree with a HUGE amount of load, and
>>>>> bumping it to the top-level with excludeSites would probably help our load
>>>>> balancing quite a bit)
>>>>>
>>>>> Very cool approach, I like it! Yeah we have some cases where we need
>>>>> to balance out high load child nodes across CPUs so that's perfect.
>>>>> Originally I thought of it to just exclude sites that don't fit into
>>>>> the shaped topology but this approach is more useful.
>>>>> Should we rename excludeSites to moveSitesToTop or something similar?
>>>>> That functionality of distributing across top level nodes / cpu cores seems
>>>>> more important anyway.
>>>>>
>>>>> > exceptionCPEs is also supported as a common API feature. It simply
>>>>> overrides the "parentId" of incoming nodes with the new parent. Another
>>>>> potentially useful feature; if I got excludeSites the wrong way around,
>>>>> I'd add a "my_big_site":"" entry to push it to the top.
>>>>>
>>>>> Awesome
>>>>>
>>>>> > UISP integration now supports a "flat" topology option (set via
>>>>> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>>>>> to include this entry.
>>>>>
>>>>> Nice!
>>>>>
>>>>> > I'll look and see how much of the Splynx code I can shorten with the
>>>>> new API; I don't have a Splynx setup to test against, making that tricky.
>>>>>
>>>>> I'll send you the Splynx login they gave us.
>>>>>
>>>>> > I *think* the new API should shorten things a lot. I think routers
>>>>> act as node parents, with clients underneath them? Otherwise, a "flat"
>>>>> setup should be a little shorter (the CSV code can be replaced with a call
>>>>> to the graph builder). Most of the Splynx (and VISP) users I've talked to
>>>>> layer MPLS+VPLS to pretend to have a big, flat network and then connect via
>>>>> a RADIUS call in the DHCP server; I've always assumed that's because those
>>>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>>>> to model topology.*
>>>>>
>>>>> Yeah, Splynx doesn't seem to natively support any topology mapping or
>>>>> even AP designation; one person I spoke to said they track corresponding
>>>>> APs in RADIUS anyway. So for now the flat model may be fine.
>>>>>
>>>>> > I need to clean things up a bit (there's still a bit of duplicated
>>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>>>>> accidentally parented the branch off of a branch instead of main) - but I
>>>>> think I can have this as a PR for you on Monday.
>>>>>
>>>>> This is really great work and will make future integrations much
>>>>> cleaner and nicer to work with. Thank you!
>>>>>
>>>>>
>>>>> On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS <
>>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>>
>>>>>> Alright, the UISP side of the common integrations is pretty much
>>>>>> feature complete. I'll update the tracking issue in a bit.
>>>>>>
>>>>>>    - Per your suggestion, devices with no IP addresses (v4 or v6)
>>>>>>    are not added.
>>>>>>    - Mikrotik "4 to 6" mapping is implemented. I put it in the
>>>>>>    "common" side of things, so it can be used in other integrations also. I
>>>>>>    don't have a setup on which to test it, but if I'm reading the code right
>>>>>>    then the unit test is testing it appropriately.
>>>>>>    - excludeSites is supported as a common API feature. If a node is
>>>>>>    added with a name that matches an excluded site, it won't be added. The
>>>>>>    tree builder is smart enough to replace invalid "parentId" references with
>>>>>>    the shaper root, so if you have other tree items that rely on this site -
>>>>>>    they will be added to the tree. Was that the intent? (It looks pretty
>>>>>>    useful; we have a child site down the tree with a HUGE amount of load, and
>>>>>>    bumping it to the top-level with excludeSites would probably help our load
>>>>>>    balancing quite a bit)
>>>>>>       - If the intent was to exclude the site and everything
>>>>>>       underneath it, I'd have to rework things a bit. Let me know; it wasn't
>>>>>>       quite clear.
>>>>>>       - exceptionCPEs is also supported as a common API feature. It
>>>>>>    simply overrides the "parentId" of incoming nodes with the new parent.
>>>>>>    Another potentially useful feature; if I got excludeSites the wrong way
>>>>>>    around, I'd add a "my_big_site":"" entry to push it to the top. (See
>>>>>>    the sketch just after this list.)
>>>>>>    - UISP integration now supports a "flat" topology option (set via
>>>>>>    uispStrategy = "flat" in ispConfig). I expanded
>>>>>>    ispConfig.example.py to include this entry.
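>>>>>>
>>>>>> To make the override semantics concrete, they behave roughly like this
>>>>>> (illustrative only - the helper and config names here are mine, not the
>>>>>> actual integrationCommon implementation):
>>>>>>
>>>>>> excludeSites = ["my_big_site"]              # nodes to skip entirely
>>>>>> exceptionCPEs = {"some_cpe": "new_parent"}  # parentId overrides
>>>>>>
>>>>>> def resolve_parent(node_id, parent_id, known_ids):
>>>>>>     # Excluded nodes are never added; the tree builder later repoints
>>>>>>     # any children that referenced them at the shaper root ("").
>>>>>>     if node_id in excludeSites:
>>>>>>         return None
>>>>>>     parent = exceptionCPEs.get(node_id, parent_id)
>>>>>>     return parent if parent in known_ids else ""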
>>>>>>
>>>>>> I'll look and see how much of the Splynx code I can shorten with the
>>>>>> new API; I don't have a Splynx setup to test against, making that tricky. I
>>>>>> *think* the new API should shorten things a lot. I think routers act
>>>>>> as node parents, with clients underneath them? Otherwise, a "flat" setup
>>>>>> should be a little shorter (the CSV code can be replaced with a call to the
>>>>>> graph builder). Most of the Splynx (and VISP) users I've talked to layer
>>>>>> MPLS+VPLS to pretend to have a big, flat network and then connect via a
>>>>>> RADIUS call in the DHCP server; I've always assumed that's because those
>>>>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>>>>> to model topology.*
>>>>>>
>>>>>> I need to clean things up a bit (there's still a bit of duplicated
>>>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>>>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>>>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>>>>>> accidentally parented the branch off of a branch instead of main) - but I
>>>>>> think I can have this as a PR for you on Monday.
>>>>>>
>>>>>> * - The first big wireless network I set up used a Motorola WiMAX
>>>>>> setup. They *required* that every single AP share two VLANs
>>>>>> (management and bearer) with every other AP - all the way to the core. It
>>>>>> kinda worked once they remembered, in a patch, that client isolation was
>>>>>> a thing... Then again, their installation instructions included connecting
>>>>>> two ports of a router together with a jumper cable, because their localhost
>>>>>> implementation didn't quite work. :-|
>>>>>>
>>>>>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
>>>>>> robert.chacon at jackrabbitwireless.com> wrote:
>>>>>>
>>>>>>> Awesome work. It succeeded in building the topology and creating
>>>>>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>>>>>>> I notice that in ShapedDevices.csv it does add CPE radios (which in
>>>>>>> our case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
>>>>>>> being empty lists [].
>>>>>>> This is not necessarily bad, but it may lead to empty leaf classes
>>>>>>> being created on LibreQoS.py runs. Not a huge deal, it just makes the minor
>>>>>>> class counter increment toward the 32k limit faster.
>>>>>>> Do you think perhaps we should check:
>>>>>>> if (len(IPv4) == 0) and (len(IPv6) == 0):
>>>>>>>     # Skip adding this entry to ShapedDevices.csv
>>>>>>> Or something similar around line 329 of integrationCommon.py?
>>>>>>> Open to your suggestions there.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>>>>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>>>>
>>>>>>>> One more update, and I'm going to sleep until "pick up daughter"
>>>>>>>> time. :-)
>>>>>>>>
>>>>>>>> The tree at
>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>> can now build a network.json, ShapedDevices.csv, and
>>>>>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>>>>>>>> previous importer - other than using data links to build the hierarchy and
>>>>>>>> letting (requiring, currently) you specify the root node. It's handling our
>>>>>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>>>>>>>> recommend just running integrationUISP.py and checking the output rather
>>>>>>>> than throwing it into production), I'd appreciate any feedback.
>>>>>>>>
>>>>>>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>>>>>>> exceptionCPE and site exclusion.
>>>>>>>>
>>>>>>>> If you want the pretty graphics, you need to "pip install graphviz"
>>>>>>>> and "sudo apt install graphviz". Otherwise, it *should* detect that
>>>>>>>> these aren't present and not try to draw pictures.
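>>>>>>>>
>>>>>>>> (For reference, that kind of optional-dependency detection usually looks
>>>>>>>> something like the following - a sketch, not necessarily the exact code:)
>>>>>>>>
>>>>>>>> try:
>>>>>>>>     import graphviz
>>>>>>>>     hasGraphviz = True
>>>>>>>> except ImportError:
>>>>>>>>     hasGraphviz = False  # skip drawing pictures if graphviz is missing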
>>>>>>>>
>>>>>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>>>>>>> robert.chacon at jackrabbitwireless.com> wrote:
>>>>>>>>
>>>>>>>>> Wow. This is very nicely done. Awesome work!
>>>>>>>>>
>>>>>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>>>>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>>>>>>
>>>>>>>>>> The integration is coming along nicely. Some progress updates:
>>>>>>>>>>
>>>>>>>>>>    - You can specify a variable in ispConfig.py named
>>>>>>>>>>    "uispSite". This sets where in the topology you want the tree to start.
>>>>>>>>>>    This has two purposes:
>>>>>>>>>>       - It's hard to be psychic and know for sure where the
>>>>>>>>>>       shaper is in the network.
>>>>>>>>>>       - You could run multiple shapers at different egress
>>>>>>>>>>       points, with failover - and rebuild the entire topology from the point of
>>>>>>>>>>       view of a network node.
>>>>>>>>>>    - Child nodes with children are now automatically converted
>>>>>>>>>>    into a "(Generated Site) name" site, and their children rearranged. This:
>>>>>>>>>>       - Allows you to set the "site" bandwidth independently of
>>>>>>>>>>       the client site bandwidth.
>>>>>>>>>>       - Makes for easier trees, because we're inserting the site
>>>>>>>>>>       that really should be there.
>>>>>>>>>>    - network.json (not the shaped devices file yet) is automatically
>>>>>>>>>>    generated from a tree, once PrepareTree() and createNetworkJson() are
>>>>>>>>>>    called.
>>>>>>>>>>       - There's a unit test that generates the network.example.json
>>>>>>>>>>       file and compares it with the original to ensure that they match
>>>>>>>>>>       (a sketch of that idea follows this list).
>>>>>>>>>>    - Unit test coverage now hits every function in the graph system.
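>>>>>>>>>>
>>>>>>>>>> A minimal sketch of that round-trip test (assuming createNetworkJson()
>>>>>>>>>> writes network.json to disk; the real test builds the full example tree):
>>>>>>>>>>
>>>>>>>>>> import json
>>>>>>>>>> import unittest
>>>>>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>>>>>>
>>>>>>>>>> class TestNetworkJson(unittest.TestCase):
>>>>>>>>>>     def test_matches_reference(self):
>>>>>>>>>>         net = NetworkGraph()
>>>>>>>>>>         net.addRawNode(NetworkNode("Site_1", "Site_1", "",
>>>>>>>>>>                                    NodeType.site, 1000, 1000))
>>>>>>>>>>         net.prepareTree()
>>>>>>>>>>         net.createNetworkJson()
>>>>>>>>>>         with open("network.json") as gen, \
>>>>>>>>>>              open("network.example.json") as ref:
>>>>>>>>>>             self.assertEqual(json.load(gen), json.load(ref))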
>>>>>>>>>>
>>>>>>>>>> I'm liking this setup. With the non-vendor-specific logic
>>>>>>>>>> contained inside the NetworkGraph type, the actual UISP code to generate
>>>>>>>>>> the example tree is down to 65 lines of code, including comments. That'll
>>>>>>>>>> grow a bit as I re-insert some automatic speed limit determination,
>>>>>>>>>> AP/Site speed overrides (i.e. the integrationUISPbandwidths.csv file).
>>>>>>>>>> Still pretty clean.
>>>>>>>>>>
>>>>>>>>>> Creating the network.example.json file only requires:
>>>>>>>>>>
>>>>>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>>>>>> import json
>>>>>>>>>>
>>>>>>>>>> net = NetworkGraph()
>>>>>>>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
>>>>>>>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
>>>>>>>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
>>>>>>>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
>>>>>>>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
>>>>>>>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
>>>>>>>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
>>>>>>>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
>>>>>>>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
>>>>>>>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
>>>>>>>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
>>>>>>>>>> net.prepareTree()
>>>>>>>>>> net.createNetworkJson()
>>>>>>>>>>
>>>>>>>>>> (The id and name fields are duplicated right now; I'm using
>>>>>>>>>> readable names to keep me sane. The third string is the parent, and the
>>>>>>>>>> last two numbers are bandwidth limits.)
>>>>>>>>>> The nice, readable format being:
>>>>>>>>>>
>>>>>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="",
>>>>>>>>>>     type=NodeType.site, download=1000, upload=1000)
>>>>>>>>>>
>>>>>>>>>> That in turn gives you the example network:
>>>>>>>>>> [image: image.png]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>>>>>>> herberticus at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Dave: I love those Gource animations! Game development is my
>>>>>>>>>>> other hobby, I could easily get lost for weeks tweaking the shaders to make
>>>>>>>>>>> the glow "just right". :-)
>>>>>>>>>>>
>>>>>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to
>>>>>>>>>>> look in that direction yet. I'm trying to build a "common grammar" to make
>>>>>>>>>>> it easier to express network layout from integrations; that would be
>>>>>>>>>>> another form/layer of integration and a lot easier to work with once
>>>>>>>>>>> there's a solid foundation. Preseem does some of this (admittedly
>>>>>>>>>>> over-eagerly; nothing needs to query SNMP that often!), and the SNMP route
>>>>>>>>>>> is quite remarkably convoluted. Their support turned on a few "extra"
>>>>>>>>>>> modules to deal with things like PMP450 clients that change MAC when you
>>>>>>>>>>> put them in bridge mode vs NAT mode (and report the bridge mode CPE in some
>>>>>>>>>>> places either way), and Elevate CPEs that almost, but not quite, make sense.
>>>>>>>>>>> Robert's code has the beginnings of some of this, scanning Mikrotik routers
>>>>>>>>>>> for IPv6 allocations by MAC (this is also the hardest part for me to test,
>>>>>>>>>>> since I don't have any v6 to test, currently).
>>>>>>>>>>>
>>>>>>>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>>>>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> We're pretty similar in that we've made UISP a mess. Multiple
>>>>>>>>>>>> paths to a pop. Multiple pops on the network. Failover between pops.
>>>>>>>>>>>> Lots of 'other' devices. Handing out /29s etc. to customers.
>>>>>>>>>>>>
>>>>>>>>>>>> Some sort of discovery would be nice.  Ideally though, pulling
>>>>>>>>>>>> something from SNMP or router APIs etc to build the paths, but having a
>>>>>>>>>>>> 'network elements' list with each of the links described. I.e., backhaul 12
>>>>>>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that
>>>>>>>>>>>> from discovery.
>>>>>>>>>>>>
>>>>>>>>>>>> I've also thought about doing routine trace routes or watching
>>>>>>>>>>>> TTLs or something like that to get some indication that topology has
>>>>>>>>>>>> changed and then do another discovery and potential tree rebuild.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>>>>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This is awesome! Way to go here. Thank you for contributing
>>>>>>>>>>>>> this.
>>>>>>>>>>>>> Being able to map out these complex integrations will help
>>>>>>>>>>>>> ISPs a ton, and I really like that it is sharing common features between
>>>>>>>>>>>>> the Splynx and UISP integrations.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Robert
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS
>>>>>>>>>>>>> <libreqos at lists.bufferbloat.net> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I've been doing some work on getting UISP integration (and
>>>>>>>>>>>>>> integrations in general) to work a bit more smoothly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I started by implementing a graph structure that mirrors both
>>>>>>>>>>>>>> the networks and sites system. It's not done yet, but the basics are coming
>>>>>>>>>>>>>> together nicely. You can see my progress so far at:
>>>>>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the
>>>>>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular
>>>>>>>>>>>>>> portion of the tree. We have:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>>>>>>>    - Bridge mode devices using Option 82 to always allocate
>>>>>>>>>>>>>>    the same IP, with a "service IP" entry
>>>>>>>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>>>>>>>    - Sites that go to client sites, which go to other client
>>>>>>>>>>>>>>    sites
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In other words, over the years we've unleashed a bit of a
>>>>>>>>>>>>>> monster. Cleaning it up is a useful task, but I wanted the integration to
>>>>>>>>>>>>>> be able to handle pathological cases like us!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I fed our network into the current graph generator, and
>>>>>>>>>>>>>> used graphviz to spit out a directed graph:
>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - Green = the root site
>>>>>>>>>>>>>>    - Red = a site
>>>>>>>>>>>>>>    - Blue = an access point
>>>>>>>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So the part in "common" is designed heavily to reduce
>>>>>>>>>>>>>> repetition. When it's done, you should be able to feed in sites, APs,
>>>>>>>>>>>>>> clients, devices, etc. in a pretty flexible manner. Given how much code is
>>>>>>>>>>>>>> shared between the UISP and Splynx integration code, I'm pretty sure both
>>>>>>>>>>>>>> will be cut to a tiny fraction of the total code. :-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can't post the full tree; it's full of client names.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Robert Chacón
>>>>>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Robert Chacón
>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>> Dev | LibreQoS.io
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Robert Chacón
>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>> Dev | LibreQoS.io
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Robert Chacón
>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>> Dev | LibreQoS.io
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> This song goes out to all the folk that thought Stadia would work:
>>>>
>>>> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>>>> Dave Täht CEO, TekLibre, LLC
>>>>
>>>
>>
>> --
>> This song goes out to all the folk that thought Stadia would work:
>>
>> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>> Dave Täht CEO, TekLibre, LLC
>>
>

