[LibreQoS] Integration system, aka fun with graph theory

Herbert Wolverson herberticus at gmail.com
Sun Oct 30 21:46:57 EDT 2022


While I remember, a quick Preseem anecdote. The majority of WISPs I've
talked to who have adopted Preseem run it in "monitor only" mode for a bit,
and then turn it on. That way, you can see that it did something. Not a bad
idea for us to support. It's *remarkable* how many WISPs see a sea of red
when they first start monitoring - 100ms+ RTTs (for whatever customer
traffic exists) is pretty common. Just enabling FQ_CODEL, mapped to the
customer's speed limit, tends to start bringing things down into the
green/yellow. I begged them for Cake a few times (along with the ability to
set site/backhaul hierarchies) - and was always told "it's not worth the
extra CPU load". Our experience turning on BracketQoS (which is basically
LibreQoS, in Rust and designed for our network) was that the remaining reds
became yellows, the remaining yellows became green and customers reported a
"snappier" experience. It's so hard to quantify the latter. I could feel
the difference at my desk; fire up a video while a download was running,
and it simply "felt" like it responded better. TCP RTTs are the best
measure of "feel" I've found so far.

We've tended to go with "median" latency as a guide, rather than mean.
Thanks to monitoring things beyond our control, some of the outliers tend
to be *really bad* - even if the network is fine. There's literally nothing
we can do about a customer trying to work with a malfunctioning system
somewhere (in space, for all I know!)
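
To make that concrete, a toy example (mine, not LibreQoS code) - a couple of
broken far-end hosts drag the mean way up, while the median barely notices:

    import statistics

    # Six healthy flows, plus two talking to a malfunctioning system somewhere
    rtt_samples_ms = [18, 22, 25, 24, 19, 21, 880, 940]
    print(statistics.mean(rtt_samples_ms))    # ~243.6 ms - looks like a disaster
    print(statistics.median(rtt_samples_ms))  # 23.0 ms - the network is fine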

On Sun, Oct 30, 2022 at 8:36 PM Herbert Wolverson <herberticus at gmail.com>
wrote:

> On a high-level, I've been playing with:
>
>    - The brute force approach: have a bigger buffer, so exhaustion is
>    less likely to ever happen.
>    - A shared "config" flag that turns off monitoring once exhaustion is
>    near - it costs one synchronized lookup/increment, and gets reset when you
>    read the stats.
>    - Per-CPU buffers for the very volatile data, which is generally
>    faster (at the expense of RAM) - but is also quite hard to manage from
>    userspace. It significantly reduces the likelihood of stalling, but I'm not
>    fond of the complexity so far.
>    - Replacing the volatile "packet buffer" with a "least recently used"
>    map that automatically gets rid of old data if it isn't cleaned up (the
>    original only cleans up when a TCP connection closes gracefully)
>    - Maintaining two sets of buffers and keeping a pointer to each. A
>    shared config variable indicates whether we are currently writing to A or
>    B. "Cleanup" cleans the *other* buffer and switches the pointers. So
>    we're never sharing "hot" data with a userland cleanup.
>
> That's a lot to play with, so I'm taking my time. My gut likes the A/B
> switch, currently.
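>
> To make the A/B idea concrete, here's a minimal userspace sketch of the
> double-buffer pattern (Python standing in for the real eBPF maps, which
> need much more care around in-flight writers):
>
>     import threading
>
>     class DoubleBuffer:
>         def __init__(self):
>             self.buffers = [{}, {}]  # buffer A and buffer B
>             self.active = 0          # which buffer the hot path writes to
>             self.flip_lock = threading.Lock()
>
>         def record(self, flow, rtt_ms):
>             # Hot path: always write to the currently-active buffer.
>             self.buffers[self.active][flow] = rtt_ms
>
>         def collect(self):
>             # Cleanup: switch the pointer, then drain the buffer the hot
>             # path is no longer writing to - so userland cleanup never
>             # shares "hot" data with the writers.
>             with self.flip_lock:
>                 cold = self.active
>                 self.active ^= 1
>             stats = dict(self.buffers[cold])
>             self.buffers[cold].clear()
>             return stats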
>
> On Sun, Oct 30, 2022 at 8:26 PM Herbert Wolverson <herberticus at gmail.com>
> wrote:
>
>> > "average" of "what"?
>>
>> Mean TCP RTT, as measured by pping-cpumap. There are two steps of
>> improvement; the original "pping" started to eat a bunch of CPU at higher
>> traffic levels, and I had a feeling - not entirely quantified - that the
>> excess CPU usage was causing some latency. Switching to pping-cpumap showed
>> that I was correct in my hunch. On top of that, as Robert had observed, the
>> previous version was causing a slight "stutter" when it filled the tracking
>> buffers (and then recovered fine). My most recent build scales the tracking
>> buffers up a LOT - which I was worried would cause some slowdown (since the
>> program is now searching a much larger hashmap space, making it less cache
>> friendly). The buffer increase fixed up the stutter issue. I probably
>> should have been a little clearer about what I was talking about. I'm still
>> trying to figure out the optimal buffer size, and the optimal
>> stats-collection period (collection "resets" the buffers, eliminating any
>> resource depletion).
>>
>> I'm also experimenting with a few other ideas to keep the measurement
>> latency more consistent. I tried "dump it all into a perfmap and figure it
>> out in userspace" which went spectacularly badly. :-|
>>
>> The RTT measurements are from the customer to whatever the heck they are
>> using on the Internet. So customers using a slow service that's
>> bottlenecked far outside of my control will negatively affect the results -
>> but there's nothing I can do about that. Coincidentally, it's the same
>> "QoE" metric that Preseem uses - so Preseem to LibreQoS refugees (myself
>> included) tend to have a "feel" for it. If I remember rightly, Preseem
>> (which is basically fq-codel queues per customer, with an optional layer of
>> AP queues above) ranks 0-74 ms as "green", 75-100 ms as "yellow" and 100+ ms
>> as "red" - and a lot of WISPs have become used to that grading. I always
>> thought that an average of 70ms seemed pretty excessive to be "good". The
>> idea is that it's quantifying the customer's *experience* - the lower
>> the average, the snappier the connection "feels". You can have a pretty
>> happy customer with very low latency and a low speed plan, if they aren't
>> doing anything that needs to exhaust their speed plan. (This contrasts with
>> a lot of other solutions - notably Sandvine - which have always focused
>> heavily on "how much less upstream does the ISP need to buy?")
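>>
>> If I'm remembering those bands right, the grading reduces to a three-way
>> threshold - the bands are Preseem's, the function is just my sketch:
>>
>>     def qoe_grade(avg_rtt_ms: float) -> str:
>>         if avg_rtt_ms < 75:
>>             return "green"
>>         if avg_rtt_ms <= 100:
>>             return "yellow"
>>         return "red"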
>>
>> On Sun, Oct 30, 2022 at 7:15 PM Dave Taht <dave.taht at gmail.com> wrote:
>>
>>>
>>>
>>> On Sat, Oct 29, 2022 at 6:45 PM Herbert Wolverson <herberticus at gmail.com>
>>> wrote:
>>>
>>>> > For starters, let me also offer praise for this work which is so
>>>> ahead of schedule!
>>>>
>>>> Thank you. I'm enjoying a short period while I wait for my editor to
>>>> finish up with a couple of chapters of my next book (working title More
>>>> Hands-on Rust; it's intermediate to advanced Rust, taught through the lens
>>>> of game development).
>>>>
>>>
>>> cool. I'm 32 years into my PhD thesis.
>>>
>>>
>>>>
>>>> I think at least initially, the primary focus is on what WISPs are used
>>>> to (and ask for): a fat shaper box that sits between a WISP and their
>>>> Internet connection(s). Usually in the topology: (router connected to
>>>> upstream) <--> (LibreQoS) <--> (core site router, connected to the WISP's
>>>> network as a whole). That's a simplification; there's usually a bypass (in
>>>> case LibreQoS dies, is being updated, etc.), sometimes multiple connections
>>>> that need shaping, etc. That's how Preseem (and the others) tend to insert
>>>> themselves - shape everything on the way out.
>>>>
>>>
>>> Presently LibreQos appears to be inserting about 200us of delay into the
>>> path, for the sparsest packets. Every box on the path adds
>>> delay, though cut-through switches are common. Don't talk to me about
>>> network slicing and disaggregated this or that in the 3GPP world, tho...
>>> ugh.
>>>
>>> I guess, for every "box" (or virtual machine) on the path I have Amdahl's
>>> law stuck in my head.
>>>
>>> This is in part why the K8s crowd makes me a little crazy.
>>>
>>>
>>>>
>>>> I think there's a lot to be said for the possibility of LibreQoS at
>>>> towers that need it the most, also. That might require a bit of MPLS
>>>> support (I can do the xdp-cpumap-tc part; I'm not sure what the classifier
>>>> does if it receives a packet with the TCP/UDP header stuck behind some MPLS
>>>> headers?), but has the potential to really clean things up. Especially for
>>>> a really busy tower site. (On a similar note, WISPs with multiple Internet
>>>> connections at different sites would benefit from LibreQoS on each of
>>>> them).
>>>>
>>>> Generally, the QoS box doesn't really care what you are running in the
>>>> way of a router.
>>>>
>>>
>>> It is certainly simpler to have a transparent middlebox for this stuff,
>>> initially, and it would take a great leap of faith,
>>> for many, to just plug in an lqos box as the main box... but Cumulus did
>>> succeed at a lot of that... they open sourced a BFD daemon... numerous
>>> other tools...
>>>
>>> https://www.nvidia.com/en-us/networking/ethernet-switching/cumulus-linux/
>>>
>>>
>>>> We run mostly Mikrotik (with a bit of FreeBSD, and a tiny bit of Cisco
>>>> in the mix too!), I know of people who love Juniper, use Cisco, etc. Since
>>>> we're shaping in the "router sandwich" (which can be one router with a bit
>>>> of care), we don't necessarily need to worry too much about their innards.
>>>>
>>>>
>>> An ISP in an SDN shaping whitebox that does all that juniper/cisco stuff,
>>> or a pair perhaps using a fiber optic splitter for failover
>>>
>>> http://www.comlaninc.com/products/fiber-optic-products/id/23/cl-fos
>>>
>>>
>>>
>>>
>>>> With that said, some future SNMP support (please, not polling
>>>> everything all the time... that's a monitoring program's job!) is probably
>>>> hard to avoid. At least that's relatively vendor agnostic (even if Ubiquiti
>>> seem to be trying to cease supporting it, ugh)
>>>>
>>>>
>>> Building on this initial core strength - sampling RTT - would be a
>>> differentiator.
>>>
>>> Examples:
>>>
>>> RTT per AP
>>> RTT P1 per AP (what's the effective minimum)
>>> RTT P99 (what's the worst case?)
>>> RTT variance  P1 to P99 per internet IP (worst 20 performers) or AS
>>> number or /24
>>>
>>> (variance is a very important concept)
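>>>
>>> All of these fall straight out of the raw samples. A sketch, assuming
>>> you've already bucketed RTT samples per AP (the bucketing is the real
>>> work):
>>>
>>>     import numpy as np
>>>
>>>     def rtt_summary(samples_ms):
>>>         p1, p99 = np.percentile(samples_ms, [1, 99])
>>>         return {"p1": p1,            # effective minimum
>>>                 "p99": p99,          # worst case
>>>                 "spread": p99 - p1}  # the variance to watch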
>>>
>>>
>>>
>>>
>>>
>>>> I could see some support for outputting rules for routers, especially
>>>> if the goal is to get Cake managing buffer-bloat in many places down the
>>>> line.
>>>>
>>>> Incidentally, using my latest build of cpumap-pping (and no separate
>>>> pping running, eating a CPU) my average network latency has dropped to 24ms
>>>> at peak time (from 40ms). At peak time, while pulling 1.8 gbps of real
>>>> customer traffic through the system. :-)
>>>>
>>>
>>> OK, this is something that "triggers" my inner pedant. Forgive me in
>>> advance?
>>>
>>> "average" of "what"?
>>>
>>> Changing the monitoring tool shouldn't have affected the average
>>> latency, unless how it is calculated is different, or the sample
>>> population (more likely) has changed. If you are tracking now far more
>>> short flows, the observed latency will decline, but the
>>> higher latencies you were observing in the first place are still there.
>>>
>>> Also... between where and where? Across the network? From the customer to
>>> the typical set of IP addresses of their servers?
>>> On wireless? vs fiber? (Transiting a fiber network to your pop's edge
>>> should take under 2ms.) Wifi hops at the end of the link are
>>> probably adding the most delay...
>>>
>>> If you consider 24ms "good" - however you calculate it - going for ever
>>> less, via whatever means these analyses suggest, is useful. But some
>>> things don't make as much sense as they used to - a Netflix cache hit
>>> rate must be so low nowadays that fetching from upstream costs just as
>>> much as hosting a box...
>>>
>>>
>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Oct 29, 2022 at 2:43 PM Dave Taht <dave.taht at gmail.com> wrote:
>>>>
>>>>> For starters, let me also offer praise for this work which is so ahead
>>>>> of schedule!
>>>>>
>>>>> I am (perhaps cluelessly) thinking about bigger pictures, and still
>>>>> stuck in my mindset involving distributing the packet processing,
>>>>> and representing the network topology, plans and compensating for the
>>>>> physics.
>>>>>
>>>>> So you have a major tower, a separate libreqos instance goes there. Or
>>>>> libreqos outputs rules compatible with mikrotik or vyatta or whatever is
>>>>> there. Or are you basically thinking one device to rule them all,
>>>>> shaping everything off a single interface?
>>>>>
>>>>> Or:
>>>>>
>>>>> You have another pop with a separate connection to the internet that
>>>>> you inherited from a buyout, or you wanted physical redundancy for your BGP
>>>>> AS's internet access, maybe just between DCs in the same town or...
>>>>>        ____________________________________________
>>>>>       /                                            \
>>>>> cloud -> pop -> customers - customers <- pop <- cloud
>>>>>            \------ leased fiber or wireless ------/
>>>>>
>>>>>
>>>>> I'm also a little puzzled as to what's the ISP->internet link? juniper?
>>>>> cisco? mikrotik - and what role and services it is expected to have.
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Oct 29, 2022 at 12:06 PM Robert Chacón via LibreQoS <
>>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>>
>>>>>> > Per your suggestion, devices with no IP addresses (v4 or v6) are
>>>>>> not added.
>>>>>> > Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>>>>>> side of things, so it can be used in other integrations also. I don't have
>>>>>> a setup on which to test it, but if I'm reading the code right then the
>>>>>> unit test is testing it appropriately.
>>>>>>
>>>>>> Fantastic.
>>>>>>
>>>>>> > excludeSites is supported as a common API feature. If a node is
>>>>>> added with a name that matches an excluded site, it won't be added. The
>>>>>> tree builder is smart enough to replace invalid "parentId" references with
>>>>>> the shaper root, so if you have other tree items that rely on this site -
>>>>>> they will be added to the tree. Was that the intent? (It looks pretty
>>>>>> useful; we have a child site down the tree with a HUGE amount of load, and
>>>>>> bumping it to the top-level with excludeSites would probably help our load
>>>>>> balancing quite a bit)
>>>>>>
>>>>>> Very cool approach, I like it! Yeah we have some cases where we need
>>>>>> to balance out high load child nodes across CPUs so that's perfect.
>>>>>> Originally I thought of it to just exclude sites that don't fit into
>>>>>> the shaped topology but this approach is more useful.
>>>>>> Should we rename excludeSites to moveSitesToTop or something similar?
>>>>>> That functionality of distributing across top level nodes / cpu cores seems
>>>>>> more important anyway.
>>>>>>
>>>>>> >exceptionCPEs is also supported as a common API feature. It simply
>>>>>> overrides the "parentId'' of incoming nodes with the new parent. Another
>>>>>> potentially useful feature; if I got excludeSites the wrong way around,
>>>>>> I'd add a "my_big_site":"" entry to push it to the top.
>>>>>>
>>>>>> Awesome
>>>>>>
>>>>>> > UISP integration now supports a "flat" topology option (set via
>>>>>> uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>>>>>> to include this entry.
>>>>>>
>>>>>> Nice!
>>>>>>
>>>>>> > I'll look and see how much of the Splynx code I can shorten with
>>>>>> the new API; I don't have a Splynx setup to test against, making that
>>>>>> tricky.
>>>>>>
>>>>>> I'll send you the Splynx login they gave us.
>>>>>>
>>>>>> > I *think* the new API should shorten things a lot. I think routers
>>>>>> act as node parents, with clients underneath them? Otherwise, a "flat"
>>>>>> setup should be a little shorter (the CSV code can be replaced with a call
>>>>>> to the graph builder). Most of the Splynx (and VISP) users I've talked to
>>>>>> layer MPLS+VPLS to pretend to have a big, flat network and then connect via
>>>>>> a RADIUS call in the DHCP server; I've always assumed that's because those
>>>>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>>>>> to model topology.*
>>>>>>
>>>>>> Yeah, Splynx doesn't seem to natively support any topology mapping or
>>>>>> even AP designation; one person I spoke to said they track corresponding
>>>>>> APs in radius anyway. So for now the flat model may be fine.
>>>>>>
>>>>>> > I need to clean things up a bit (there's still a bit of duplicated
>>>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>>>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>>>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>>>>>> accidentally parented the branch off of a branch instead of main) - but I
>>>>>> think I can have this as a PR for you on Monday.
>>>>>>
>>>>>> This is really great work and will make future integrations much
>>>>>> cleaner and nicer to work with. Thank you!
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 29, 2022 at 9:57 AM Herbert Wolverson via LibreQoS <
>>>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>>>
>>>>>>> Alright, the UISP side of the common integrations is pretty much
>>>>>>> feature complete. I'll update the tracking issue in a bit.
>>>>>>>
>>>>>>>    - Per your suggestion, devices with no IP addresses (v4 or v6)
>>>>>>>    are not added.
>>>>>>>    - Mikrotik "4 to 6" mapping is implemented. I put it in the
>>>>>>>    "common" side of things, so it can be used in other integrations also. I
>>>>>>>    don't have a setup on which to test it, but if I'm reading the code right
>>>>>>>    then the unit test is testing it appropriately.
>>>>>>>    - excludeSites is supported as a common API feature. If a node
>>>>>>>    is added with a name that matches an excluded site, it won't be added. The
>>>>>>>    tree builder is smart enough to replace invalid "parentId" references with
>>>>>>>    the shaper root, so if you have other tree items that rely on this site -
>>>>>>>    they will be added to the tree. Was that the intent? (It looks pretty
>>>>>>>    useful; we have a child site down the tree with a HUGE amount of load, and
>>>>>>>    bumping it to the top-level with excludeSites would probably help our load
>>>>>>>    balancing quite a bit)
>>>>>>>       - If the intent was to exclude the site and everything
>>>>>>>       underneath it, I'd have to rework things a bit. Let me know; it wasn't
>>>>>>>       quite clear.
>>>>>>>       - exceptionCPEs is also supported as a common API feature. It
>>>>>>>    simply overrides the "parentId'' of incoming nodes with the new parent.
>>>>>>>    Another potentially useful feature; if I got excludeSites the wrong way
>>>>>>>    around, I'd add a "my_big_site":"" entry to push it to the top (see the
>>>>>>>    sketch after this list).
>>>>>>>    - UISP integration now supports a "flat" topology option (set
>>>>>>>    via uispStrategy = "flat" in ispConfig). I expanded
>>>>>>>    ispConfig.example.py to include this entry.
>>>>>>>
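>>>>>>> A sketch of how the two knobs look in configuration (assuming they sit
>>>>>>> in ispConfig alongside uispStrategy; the excluded site name is made up):
>>>>>>>
>>>>>>>     excludeSites = ["Busy_Child_Site"]   # matching nodes aren't added;
>>>>>>>                                          # orphaned children re-parent to root
>>>>>>>     exceptionCPEs = {"my_big_site": ""}  # "" re-parents the node to the top
>>>>>>>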
>>>>>>> I'll look and see how much of the Splynx code I can shorten with the
>>>>>>> new API; I don't have a Splynx setup to test against, making that tricky. I
>>>>>>> *think* the new API should shorten things a lot. I think routers
>>>>>>> act as node parents, with clients underneath them? Otherwise, a "flat"
>>>>>>> setup should be a little shorter (the CSV code can be replaced with a call
>>>>>>> to the graph builder). Most of the Splynx (and VISP) users I've talked to
>>>>>>> layer MPLS+VPLS to pretend to have a big, flat network and then connect via
>>>>>>> a RADIUS call in the DHCP server; I've always assumed that's because those
>>>>>>> systems prefer the telecom model of "pretend everything is equal" to trying
>>>>>>> to model topology.*
>>>>>>>
>>>>>>> I need to clean things up a bit (there's still a bit of duplicated
>>>>>>> code, and I believe in the DRY principle - don't repeat yourself; Dave
>>>>>>> Thomas - my boss at PragProg - coined the term in The Pragmatic Programmer,
>>>>>>> and I feel obliged to use it everywhere!), and do a quick rebase (I
>>>>>>> accidentally parented the branch off of a branch instead of main) - but I
>>>>>>> think I can have this as a PR for you on Monday.
>>>>>>>
>>>>>>> * - The first big wireless network I set up used a Motorola WiMAX
>>>>>>> setup. They *required* that every single AP share two VLANs
>>>>>>> (management and bearer) with every other AP - all the way to the core. It
>>>>>>> kinda worked once they remembered client isolation was a thing in a
>>>>>>> patch... Then again, their installation instructions included connecting
>>>>>>> two ports of a router together with a jumper cable, because their localhost
>>>>>>> implementation didn't quite work. :-|
>>>>>>>
>>>>>>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
>>>>>>> robert.chacon at jackrabbitwireless.com> wrote:
>>>>>>>
>>>>>>>> Awesome work. It succeeded in building the topology and creating
>>>>>>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>>>>>>>> I notice that in ShapedDevices.csv it does add CPE radios (which in
>>>>>>>> our case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
>>>>>>>> being empty lists [].
>>>>>>>> This is not necessarily bad, but it may lead to empty leaf classes
>>>>>>>> being created on LibreQoS.py runs. Not a huge deal, it just makes the minor
>>>>>>>> class counter increment toward the 32k limit faster.
>>>>>>>> Do you think perhaps we should check:
>>>>>>>> if (len(IPv4) == 0) and (len(IPv6) == 0):
>>>>>>>>     # Skip adding this entry to ShapedDevices.csv
>>>>>>>> Or something similar around line 329 of integrationCommon.py?
>>>>>>>> Open to your suggestions there.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>>>>>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>>>>>
>>>>>>>>> One more update, and I'm going to sleep until "pick up daughter"
>>>>>>>>> time. :-)
>>>>>>>>>
>>>>>>>>> The tree at
>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>> can now build a network.json, ShapedDevices.csv, and
>>>>>>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>>>>>>>>> previous importer - other than using data links to build the hierarchy and
>>>>>>>>> letting (requiring, currently) you specify the root node. It's handling our
>>>>>>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>>>>>>>>> recommend just running integrationUISP.py and checking the output rather
>>>>>>>>> than throwing it into production), I'd appreciate any feedback.
>>>>>>>>>
>>>>>>>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>>>>>>>> exceptionCPE and site exclusion.
>>>>>>>>>
>>>>>>>>> If you want the pretty graphics, you need to "pip install
>>>>>>>>> graphviz" and "sudo apt install graphviz". Otherwise, it *should* detect
>>>>>>>>> that these aren't present and skip drawing the pictures.
>>>>>>>>>
>>>>>>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>>>>>>>> robert.chacon at jackrabbitwireless.com> wrote:
>>>>>>>>>
>>>>>>>>>> Wow. This is very nicely done. Awesome work!
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>>>>>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>>>>>>>
>>>>>>>>>>> The integration is coming along nicely. Some progress updates:
>>>>>>>>>>>
>>>>>>>>>>>    - You can specify a variable in ispConfig.py named
>>>>>>>>>>>    "uispSite" (one-liner after this list). This sets where in the topology
>>>>>>>>>>>    you want the tree to start. This has two purposes:
>>>>>>>>>>>       - It's hard to be psychic and know for sure where the
>>>>>>>>>>>       shaper is in the network.
>>>>>>>>>>>       - You could run multiple shapers at different egress
>>>>>>>>>>>       points, with failover - and rebuild the entire topology from the point of
>>>>>>>>>>>       view of a network node.
>>>>>>>>>>>    - "Child nodes with children" are now automatically converted
>>>>>>>>>>>    into a "(Generated Site) name" site, and their children rearranged. This:
>>>>>>>>>>>       - Allows you to set the "site" bandwidth independently of
>>>>>>>>>>>       the client site bandwidth.
>>>>>>>>>>>       - Makes for easier trees, because we're inserting the
>>>>>>>>>>>       site that really should be there.
>>>>>>>>>>>    - Network.json (not the shaped devices file yet) is
>>>>>>>>>>>    automatically generated from a tree, once PrepareTree() and
>>>>>>>>>>>    createNetworkJson() are called.
>>>>>>>>>>>       - There's a unit test that generates the
>>>>>>>>>>>       network.example.json file and compares it with the original to ensure that
>>>>>>>>>>>       they match.
>>>>>>>>>>>    - Unit test coverage hits every function in the graph
>>>>>>>>>>>    system, now.
>>>>>>>>>>>
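>>>>>>>>>>> For the first bullet, the knob is a single line in ispConfig.py (the
>>>>>>>>>>> value here is illustrative):
>>>>>>>>>>>
>>>>>>>>>>>     uispSite = "Site_1"  # rebuild the topology from this node's point of view
>>>>>>>>>>>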
>>>>>>>>>>> I'm liking this setup. With the non-vendor-specific logic
>>>>>>>>>>> contained inside the NetworkGraph type, the actual UISP code to generate
>>>>>>>>>>> the example tree is down to 65 lines of code, including comments.
>>>>>>>>>>> That'll grow a bit as I re-insert some automatic speed limit
>>>>>>>>>>> determination and AP/Site speed overrides (i.e. the
>>>>>>>>>>> integrationUISPbandwidths.csv file). Still pretty clean.
>>>>>>>>>>>
>>>>>>>>>>> Creating the network.example.json file only requires:
>>>>>>>>>>>
>>>>>>>>>>>         from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>>>>>>>         import json
>>>>>>>>>>>         net = NetworkGraph()
>>>>>>>>>>>         net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
>>>>>>>>>>>         net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
>>>>>>>>>>>         net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
>>>>>>>>>>>         net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
>>>>>>>>>>>         net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
>>>>>>>>>>>         net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
>>>>>>>>>>>         net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
>>>>>>>>>>>         net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
>>>>>>>>>>>         net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
>>>>>>>>>>>         net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
>>>>>>>>>>>         net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
>>>>>>>>>>>         net.prepareTree()
>>>>>>>>>>>         net.createNetworkJson()
>>>>>>>>>>>
>>>>>>>>>>> (The id and name fields are duplicated right now, I'm using
>>>>>>>>>>> readable names to keep me sane. The third string is the parent, and the
>>>>>>>>>>> last two numbers are bandwidth limits)
>>>>>>>>>>> The nice, readable format being:
>>>>>>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="",
>>>>>>>>>>> type=NodeType.site, download=1000, upload=1000)
>>>>>>>>>>>
>>>>>>>>>>> That in turn gives you the example network:
>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>>>>>>>> herberticus at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dave: I love those Gource animations! Game development is my
>>>>>>>>>>>> other hobby, I could easily get lost for weeks tweaking the shaders to make
>>>>>>>>>>>> the glow "just right". :-)
>>>>>>>>>>>>
>>>>>>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to
>>>>>>>>>>>> look in that direction yet. I'm trying to build a "common grammar" to make
>>>>>>>>>>>> it easier to express network layout from integrations; that would be
>>>>>>>>>>>> another form/layer of integration and a lot easier to work with once
>>>>>>>>>>>> there's a solid foundation. Preseem does some of this (admittedly
>>>>>>>>>>>> over-eagerly; nothing needs to query SNMP that often!), and the SNMP route
>>>>>>>>>>>> is quite remarkably convoluted. Their support turned on a few "extra"
>>>>>>>>>>>> modules to deal with things like PMP450 clients that change MAC when you
>>>>>>>>>>>> put them in bridge mode vs NAT mode (and report the bridge mode CPE in some
>>>>>>>>>>>> places either way), and Elevate CPEs that almost, but not quite, make sense.
>>>>>>>>>>>> Robert's code has the beginnings of some of this, scanning Mikrotik routers
>>>>>>>>>>>> for IPv6 allocations by MAC (this is also the hardest part for me to test,
>>>>>>>>>>>> since I don't have any v6 to test, currently).
>>>>>>>>>>>>
>>>>>>>>>>>> We tend to use UISP as the "source of truth" and treat it like
>>>>>>>>>>>> a database for a ton of external tools (mostly ones we've created).
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> We're pretty similar in that we've made UISP a mess. Multiple
>>>>>>>>>>>>> paths to a pop. Multiple pops on the network. Failover between pops.
>>>>>>>>>>>>> Lots of 'other' devices. Handing out /29s etc. to customers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Some sort of discovery would be nice. Ideally though, pulling
>>>>>>>>>>>>> something from SNMP or router APIs etc. to build the paths, combined with a
>>>>>>>>>>>>> 'network elements' list with each of the links described - i.e., backhaul 12
>>>>>>>>>>>>> has MACs ..01 and ...02 at 300x100 - and then build the topology around that
>>>>>>>>>>>>> from discovery.
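>>>>>>>>>>>>>
>>>>>>>>>>>>> Roughly what I'm picturing for a 'network elements' entry (format
>>>>>>>>>>>>> invented on the spot, MACs left elided):
>>>>>>>>>>>>>
>>>>>>>>>>>>>     {"type": "backhaul", "id": 12,
>>>>>>>>>>>>>      "macs": ["..01", "..02"],
>>>>>>>>>>>>>      "mbps": {"down": 300, "up": 100}}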
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've also thought about doing routine trace routes or watching
>>>>>>>>>>>>> TTLs or something like that to get some indication that topology has
>>>>>>>>>>>>> changed and then do another discovery and potential tree rebuild.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>>>>>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is awesome! Way to go here. Thank you for contributing
>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>> Being able to map out these complex integrations will help
>>>>>>>>>>>>>> ISPs a ton, and I really like that it is sharing common features between
>>>>>>>>>>>>>> the Splynx and UISP integrations.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Robert
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via
>>>>>>>>>>>>>> LibreQoS <libreqos at lists.bufferbloat.net> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So I've been doing some work on getting UISP integration
>>>>>>>>>>>>>>> (and integrations in general) to work a bit more smoothly.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I started by implementing a graph structure that mirrors
>>>>>>>>>>>>>>> both the networks and sites system. It's not done yet, but the basics are
>>>>>>>>>>>>>>> coming together nicely. You can see my progress so far at:
>>>>>>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the
>>>>>>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular
>>>>>>>>>>>>>>> portion of the tree. We have:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>>>>>>>>    - Bridge mode devices using Option 82 to always allocate
>>>>>>>>>>>>>>>    the same IP, with a "service IP" entry
>>>>>>>>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>>>>>>>>    - Sites that go to client sites, which go to other
>>>>>>>>>>>>>>>    client sites
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In other words, over the years we've unleashed a bit of a
>>>>>>>>>>>>>>> monster. Cleaning it up is a useful task, but I wanted the integration to
>>>>>>>>>>>>>>> be able to handle pathological cases like us!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So I fed our network into the current graph generator, and
>>>>>>>>>>>>>>> used graphviz to spit out a directed graph:
>>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - Green = the root site
>>>>>>>>>>>>>>>    - Red = a site
>>>>>>>>>>>>>>>    - Blue = an access point
>>>>>>>>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So the part in "common" is designed heavily to reduce
>>>>>>>>>>>>>>> repetition. When it's done, you should be able to feed in sites, APs,
>>>>>>>>>>>>>>> clients, devices, etc. in a pretty flexible manner. Given how much code is
>>>>>>>>>>>>>>> shared between the UISP and Splynx integration code, I'm pretty sure both
>>>>>>>>>>>>>>> will be cut to a tiny fraction of the total code. :-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Robert Chacón
>>>>>>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Robert Chacón
>>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>>> Dev | LibreQoS.io
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Robert Chacón
>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>> Dev | LibreQoS.io
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Robert Chacón
>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>> Dev | LibreQoS.io
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> This song goes out to all the folk that thought Stadia would work:
>>>>>
>>>>> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>>>>> Dave Täht CEO, TekLibre, LLC
>>>>>
>>>>
>>>
>>> --
>>> This song goes out to all the folk that thought Stadia would work:
>>>
>>> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>>> Dave Täht CEO, TekLibre, LLC
>>>
>>

