[LibreQoS] Integration system, aka fun with graph theory

Herbert Wolverson herberticus at gmail.com
Sat Oct 29 21:10:27 EDT 2022


> You talking about the relevant rfc?

In this case, the "6 to 4" refers to some integration code that was already
present - named "mikrotikFindIpv6.py". I probably should've made that more
clear. It connects to Mikrotik routers, and performs a MAC address search
in their DHCPv6 tables - finding known MAC addresses and providing the
allocated IPv6 address-space. Looks like a handy tool, and a good
work-around for UISP (Ubiquiti's combined management and CRM tool) only
kind-of supporting IPv6. The database format supports v6 addresses, but it
doesn't consistently put any data in there; worse, it doesn't show it
on-screen when it has it!
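
For anyone who hasn't read mikrotikFindIpv6.py, the idea boils down to a join on MAC address. The sketch below is not that script and does not talk to a router; the binding records, the "mac"/"prefix" keys and the helper find_ipv6_for_macs are all made up for illustration, assuming the DHCPv6 table has already been fetched into a list of dicts.

```python
from typing import Dict, Iterable, List


def normalize_mac(mac: str) -> str:
    """Upper-case a MAC and use ':' separators so lookups are consistent."""
    return mac.strip().upper().replace("-", ":")


def find_ipv6_for_macs(known_macs: Iterable[str],
                       bindings: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each known MAC to the IPv6 prefixes bound to it.

    `bindings` is assumed to be rows already pulled from a router's DHCPv6
    table, each with hypothetical "mac" and "prefix" keys.
    """
    wanted = {normalize_mac(m) for m in known_macs}
    found: Dict[str, List[str]] = {}
    for row in bindings:
        mac = normalize_mac(row["mac"])
        if mac in wanted:
            found.setdefault(mac, []).append(row["prefix"])
    return found


# Made-up sample data (2001:db8::/32 is the documentation prefix).
bindings = [
    {"mac": "aa:bb:cc:00:00:01", "prefix": "2001:db8:1::/56"},
    {"mac": "AA-BB-CC-00-00-02", "prefix": "2001:db8:2::/56"},
    {"mac": "aa:bb:cc:00:00:99", "prefix": "2001:db8:99::/56"},
]
result = find_ipv6_for_macs(["AA:BB:CC:00:00:01", "aa:bb:cc:00:00:02"], bindings)
```

MACs that appear in the table but aren't in the known list (the ...:99 entry here) are simply ignored.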

> Seems to be a need for some level of exclusions for device type, e.g. (at
> least per your report), don't run ack-filter on a cambium path.

I agree with that longer-term. For now, I'm trying to get the existing
integrations up-to-speed and easy to work with. The whole "build on a good
foundation" thing. That's one thing I've learned the hard way over the
decades; it's a *lot* easier to shoot for the moon if you take the time to
come up with a good launch platform!

Longer-term, it's looking more and more like we'll need a more robust
discovery system. I've some ideas, but they are way too formative to be
useful yet. Some early thinking: there's a big disparity between the
back-ends that WISPs (and ISPs in general) use to manage and monitor their
networks, and the systems that handle CRM (billing, ticketing, customer
interaction, etc.). Splynx and its ilk are great billing systems, but don't
really know a lot about your network arrangement - it wouldn't surprise me
if there are Splynx and VISP users who also have UISP (just the network
management mode) going as well. On the other extreme, PowerCode tries to
write directly to your Mikrotik routers and wants to know everything right
down to your underwear colour.

In my mind:
* Step 1 (we're nearly there!) is to build a good foundation for
representing an IPv4/IPv6 network, that's really agnostic to all the crazy
things a WISP may be doing. It should automate all the tedious parts
(figuring out a tree from a soup of sites, access points and users;
rearranging the tree to have a "starting point"; emitting the various
control files, etc.) and be easy enough to use that someone could say "wow,
I need to support my management system" and be able to do so with a little
bit of hand-holding - encouraging participation.
* Step 2 would be to provide some great manual tools for the DIY crowd, and
some really good documentation to make their life easy.
* Step 3 is some kind of way to mix-and-match systems. Say you have Splynx
AND the management part of UISP. Wouldn't it be great if Splynx could
provide all of the "plan" data, and the network layout be provided from
UISP's management side? It seems like that's quite do-able with a little
work. We may need to think about a management GUI at this point, just to
help hold hands a bit.
* Step 4 would be something Dan keeps asking about, ways to query hardware
that exists and build some topology around it. That would be great, and is
quite the undertaking (best tackled incrementally, and in a modular
fashion, IMHO).
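
To make Step 3 a little more concrete, here is a rough sketch of the mix-and-match idea: plan speeds from a billing system, parents from a network management system, joined on a shared customer id. Nothing here calls Splynx or UISP; the dictionaries, keys and merge_sources are invented for illustration, and finding a key that both systems agree on is the hard part this glosses over.

```python
from typing import Dict, List, Tuple


def merge_sources(plans: Dict[str, Tuple[int, int]],
                  topology: Dict[str, str],
                  default_parent: str = "") -> List[dict]:
    """Combine per-customer plan speeds with per-customer parent nodes.

    plans:    customer id -> (download Mbps, upload Mbps), billing side.
    topology: customer id -> parent node name, management side.
    Customers missing from the topology fall back to `default_parent`
    (an empty string meaning "attach at the top level").
    """
    merged = []
    for customer_id, (down, up) in sorted(plans.items()):
        merged.append({
            "id": customer_id,
            "parent": topology.get(customer_id, default_parent),
            "download": down,
            "upload": up,
        })
    return merged


# Hypothetical sample data.
plans = {"cust-1": (100, 20), "cust-2": (50, 10), "cust-3": (25, 5)}
topology = {"cust-1": "AP_A", "cust-2": "AP_9"}
merged = merge_sources(plans, topology)
```

Here cust-3 has a plan but no known place in the network, so it lands at the top level rather than being dropped.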

This is still just the musings of a sleep-deprived brain. :-)

> Is there any particularly common set of radius servers in use?

It seems like when I poke deeply enough, most people are running FreeRADIUS
or something vendor-supplied (which is sometimes FreeRADIUS with a badge on
it). Then there's crazy people paying $10k for super high-end RADIUS
servers that aren't actually much better than the free ones. RADIUS is a
tough one, because LibreQoS isn't really well placed to directly utilize
it. Typically, RADIUS is basically a "yes or no" box, with options
attached. RADIUS queries happen on network entry (either as part of the
admissions process, part of the Ethernet security step, or from the DHCP
server) and the reply is basically "yes, you're admitted - these are your
options". The problem is, Libre doesn't necessarily see any of that - it's
inside the network. That's why we have API dependencies, even though Splynx
and VISP are each basically a really big billing system that comes bundled with
a RADIUS server. (Unfortunately, Mikrotik interprets the RADIUS replies to
make a simple queue on the router that made the request - you can script
that, but it gets messy fast).
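
The "yes or no box, with options attached" description can be modelled in a few lines. This is a toy, not a RADIUS client: the AccessReply class, the attribute name "Rate-Limit" and its "down/up" format are placeholders invented for the example, and real servers and vendors each use their own attribute dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class AccessReply:
    """Toy stand-in for a RADIUS reply: a verdict plus optional attributes."""
    accepted: bool
    attributes: Dict[str, str] = field(default_factory=dict)


def plan_from_reply(reply: AccessReply) -> Optional[Tuple[int, int]]:
    """Return (download, upload) in Mbps if the reply carries a rate limit.

    Assumes a hypothetical "Rate-Limit" attribute formatted "down/up".
    """
    if not reply.accepted:
        return None
    raw = reply.attributes.get("Rate-Limit")
    if raw is None:
        return None
    down, up = raw.split("/")
    return int(down), int(up)


accepted = AccessReply(True, {"Rate-Limit": "100/20"})
rejected = AccessReply(False)
```

The catch described above still applies: the shaper only gets to run something like this if it actually sees the reply.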

I'll answer the second email in a bit.



On Sat, Oct 29, 2022 at 2:18 PM Dave Taht <dave.taht at gmail.com> wrote:

>
>
> On Sat, Oct 29, 2022 at 8:57 AM Herbert Wolverson via LibreQoS <
> libreqos at lists.bufferbloat.net> wrote:
>
>> Alright, the UISP side of the common integrations is pretty much feature
>> complete. I'll update the tracking issue in a bit.
>>
>>    - Per your suggestion, devices with no IP addresses (v4 or v6) are
>>    not added.
>>
> Every device that is ipv6-ready comes up with a link-local address
> derived from the mac like fe80::6f16:fa94:f32b:e2e
> Some actually will accept things like ssh to that address
> Not that this is necessarily relevant to this bit of code. Dr irrelevant I
> am today.
> (in the context of babel, at least, you can route ipv4 and ipv6 without
> either an ipv6 or ipv4 address, and hnetd configure)
>
> I am kind of curious as to what weird configuration protocols are in
> common use today
>
> Painfully common are "smart switches" that don't listen to dhcp by default
> AND come up on 192.168.1.1
> ubnt comes up on 192.168.1.20 by default
> a lot of cpe comes up on 192.168.1.100 (like cable and starlink)
> I've seen stuff that uses ancient ieee protocols
> bootp and tftp are still things
>
> I've always kind of wanted a daemon on every device that would probe all
> possible ip addresses with a ttl of 2, to find rogue
> devices etc.
>
>>
>>    - Mikrotik "4 to 6" mapping is implemented. I put it in the "common"
>>    side of things, so it can be used in other integrations also. I don't have
>>    a setup on which to test it, but if I'm reading the code right then the
>>    unit test is testing it appropriately.
>>
>>
> You talking about the relevant rfc?
>
>
>>
>>    - excludeSites is supported as a common API feature. If a node is
>>    added with a name that matches an excluded site, it won't be added. The
>>    tree builder is smart enough to replace invalid "parentId" references with
>>    the shaper root, so if you have other tree items that rely on this site -
>>    they will be added to the tree. Was that the intent? (It looks pretty
>>    useful; we have a child site down the tree with a HUGE amount of load, and
>>    bumping it to the top-level with excludeSites would probably help our load
>>    balancing quite a bit)
>>       - If the intent was to exclude the site and everything underneath
>>       it, I'd have to rework things a bit. Let me know; it wasn't quite clear.
>>    - exceptionCPEs is also supported as a common API feature. It
>>    simply overrides the "parentId" of incoming nodes with the new parent.
>>    Another potentially useful feature; if I got excludeSites the wrong way
>>    around, I'd add a "my_big_site":"" entry to push it to the top.
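
As a sketch of the behaviour described in these two bullets (not the code in integrationCommon.py): excluded names are dropped, parent overrides are applied, and anything left pointing at a parent that no longer exists is re-attached to the root. The function name apply_exclusions, the dict layout and the use of an empty string for the shaper root are assumptions for illustration.

```python
from typing import Dict, Iterable


def apply_exclusions(nodes: Dict[str, str],
                     exclude_sites: Iterable[str],
                     exception_cpes: Dict[str, str],
                     root: str = "") -> Dict[str, str]:
    """Return a node -> parent map after exclusions and overrides.

    `nodes` maps each node name to its parent name ("" means the root).
    """
    excluded = set(exclude_sites)
    # 1. Drop nodes whose name matches an excluded site.
    kept = {name: parent for name, parent in nodes.items()
            if name not in excluded}
    # 2. Override parents for any node listed in exception_cpes.
    for name, new_parent in exception_cpes.items():
        if name in kept:
            kept[name] = new_parent
    # 3. Re-attach orphans (parent missing from the tree) to the root.
    for name, parent in kept.items():
        if parent != root and parent not in kept:
            kept[name] = root
    return kept


# Made-up example: Big_Site is excluded, CPE_7 is moved under Site_1.
nodes = {"Site_1": "", "Big_Site": "Site_1",
         "AP_A": "Big_Site", "CPE_7": "AP_A"}
result = apply_exclusions(nodes, ["Big_Site"], {"CPE_7": "Site_1"})
```

With this reading, AP_A survives the exclusion of Big_Site and is bumped to the top level, which matches the "children are still added" behaviour asked about above.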
>>
>>
> Seems to be a need for some level of exclusions for device type, e.g. (at
> least per your report), don't run ack-filter on a cambium path.
>
>
>>
>>    - UISP integration now supports a "flat" topology option (set via
>>    uispStrategy = "flat" in ispConfig). I expanded ispConfig.example.py
>>    to include this entry.
>>
>> I'll look and see how much of the Splynx code I can shorten with the new
>> API; I don't have a Splynx setup to test against, making that tricky. I
>> *think* the new API should shorten things a lot. I think routers act as
>> node parents, with clients underneath them? Otherwise, a "flat" setup
>> should be a little shorter (the CSV code can be replaced with a call to the
>> graph builder). Most of the Splynx (and VISP) users I've talked to layer
>> MPLS+VPLS to pretend to have a big, flat network and then connect via a
>> RADIUS call in the DHCP server;
>>
>
> Is there any particularly common set of radius servers in use?
>
>
>> I've always assumed that's because those systems prefer the telecom model
>> of "pretend everything is equal" to trying to model topology.*
>>
>
> Except the billing. Always the billing. Our tuesday golden plate special
> is you can download all the pr0n from our special partner netblix for 24
> hours a week! 9.95!
>
>
>>
>> I need to clean things up a bit (there's still a bit of duplicated code,
>> and I believe in the DRY principle - don't repeat yourself; Dave Thomas -
>> my boss at PragProg - coined the term in The Pragmatic Programmer, and I
>> feel obliged to use it everywhere!), and do a quick rebase (I accidentally
>> parented the branch off of a branch instead of main) - but I think I can
>> have this as a PR for you on Monday.
>>
>> * - The first big wireless network I set up used a Motorola WiMAX setup.
>> They *required* that every single AP share two VLANs (management and
>> bearer) with every other AP - all the way to the core. It kinda worked once
>> they remembered client isolation was a thing in a patch... Then again,
>> their installation instructions included connecting two ports of a router
>> together with a jumper cable, because their localhost implementation didn't
>> quite work. :-|
>>
>> On Fri, Oct 28, 2022 at 4:15 PM Robert Chacón <
>> robert.chacon at jackrabbitwireless.com> wrote:
>>
>>> Awesome work. It succeeded in building the topology and creating
>>> ShapedDevices.csv for my network. It even graphed it perfectly. Nice!
>>> I notice that in ShapedDevices.csv it does add CPE radios (which in our
>>> case we don't shape - they are in bridge mode) with IPv4 and IPv6s both
>>> being empty lists [].
>>> This is not necessarily bad, but it may lead to empty leaf classes being
>>> created on LibreQoS.py runs. Not a huge deal, it just makes the minor class
>>> counter increment toward the 32k limit faster.
>>> Do you think perhaps we should check:
>>>     if (len(IPv4) == 0) and (len(IPv6) == 0):
>>>         # Skip adding this entry to ShapedDevices.csv
>>> Or something similar around line 329 of integrationCommon.py?
>>> Open to your suggestions there.
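
A minimal sketch of that check, assuming each device is a dict with "ipv4" and "ipv6" lists (a made-up layout, not the structure integrationCommon.py actually uses):

```python
from typing import Dict, List


def shapeable_devices(devices: List[Dict]) -> List[Dict]:
    """Keep only devices that have at least one IPv4 or IPv6 address."""
    return [d for d in devices if d.get("ipv4") or d.get("ipv6")]


# Hypothetical sample data.
devices = [
    {"name": "CPE radio (bridge)", "ipv4": [], "ipv6": []},
    {"name": "Customer router", "ipv4": ["100.64.0.10"], "ipv6": []},
    {"name": "v6-only device", "ipv4": [], "ipv6": ["2001:db8::10"]},
]
kept = shapeable_devices(devices)
```

Filtering before the CSV is written means the bridge-mode radio never becomes an empty leaf class.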
>>>
>>>
>>>
>>> On Fri, Oct 28, 2022 at 1:55 PM Herbert Wolverson via LibreQoS <
>>> libreqos at lists.bufferbloat.net> wrote:
>>>
>>>> One more update, and I'm going to sleep until "pick up daughter" time.
>>>> :-)
>>>>
>>>> The tree at
>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>> can now build a network.json, ShapedDevices.csv, and
>>>> integrationUISPBandwidth.csv and follows pretty much the same logic as the
>>>> previous importer - other than using data links to build the hierarchy and
>>>> letting (requiring, currently) you specify the root node. It's handling our
>>>> bizarre UISP setup pretty well now - so if anyone wants to test it (I
>>>> recommend just running integrationUISP.py and checking the output rather
>>>> than throwing it into production), I'd appreciate any feedback.
>>>>
>>>> Still on my list: handling the Mikrotik IPv6 connections, and
>>>> exceptionCPE and site exclusion.
>>>>
>>>> If you want the pretty graphics, you need to "pip install graphviz" and
>>>> "sudo apt install graphviz". It *should* detect that these aren't present
>>>> and not try to draw pictures, otherwise.
>>>>
>>>> On Fri, Oct 28, 2022 at 2:06 PM Robert Chacón <
>>>> robert.chacon at jackrabbitwireless.com> wrote:
>>>>
>>>>> Wow. This is very nicely done. Awesome work!
>>>>>
>>>>> On Fri, Oct 28, 2022 at 11:44 AM Herbert Wolverson via LibreQoS <
>>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>>
>>>>>> The integration is coming along nicely. Some progress updates:
>>>>>>
>>>>>>    - You can specify a variable in ispConfig.py named "uispSite".
>>>>>>    This sets where in the topology you want the tree to start. This has two
>>>>>>    purposes:
>>>>>>       - It's hard to be psychic and know for sure where the shaper
>>>>>>       is in the network.
>>>>>>       - You could run multiple shapers at different egress points,
>>>>>>       with failover - and rebuild the entire topology from the point of view of a
>>>>>>       network node.
>>>>>>    - "Child node with children" are now automatically converted into
>>>>>>    a "(Generated Site) name" site, and their children rearranged. This:
>>>>>>       - Allows you to set the "site" bandwidth independently of the
>>>>>>       client site bandwidth.
>>>>>>       - Makes for easier trees, because we're inserting the site
>>>>>>       that really should be there.
>>>>>>    - The network.json file (not the shaped devices file yet) is
>>>>>>    automatically generated from a tree, once prepareTree() and
>>>>>>    createNetworkJson() are called.
>>>>>>       - There's a unit test that generates the network.example.json
>>>>>>       file and compares it with the original to ensure that they match.
>>>>>>    - Unit test coverage hits every function in the graph system, now.
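
The re-rooting step behind "uispSite" can be sketched as a breadth-first walk over undirected links, giving each node the neighbour it was reached from as its parent. This illustrates the idea rather than the NetworkGraph code; reroot and the link list are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def reroot(links: List[Tuple[str, str]], root: str) -> Dict[str, str]:
    """Build a node -> parent map by walking outward from `root`.

    Links are treated as undirected. Nodes already visited are skipped,
    so a circular section of the graph cannot cause an endless loop.
    """
    neighbours: Dict[str, List[str]] = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    parents = {root: ""}  # the root has no parent
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, []):
            if nxt not in parents:
                parents[nxt] = current
                queue.append(nxt)
    return parents


# Made-up links, including a loop between Site_2, PoP_1 and AP_1.
links = [("Site_1", "Site_2"), ("Site_2", "PoP_1"),
         ("PoP_1", "AP_1"), ("AP_1", "Site_2")]
from_site_1 = reroot(links, "Site_1")
from_pop_1 = reroot(links, "PoP_1")
```

The same set of links yields a different tree depending on where the shaper sits, which is the point of making the starting node configurable.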
>>>>>>
>>>>>> I'm liking this setup. With the non-vendor-specific logic contained
>>>>>> inside the NetworkGraph type, the actual UISP code to generate the example
>>>>>> tree is down to 65 lines of code, including comments. That'll grow a bit as
>>>>>> I re-insert some automatic speed limit determination and AP/Site speed
>>>>>> overrides (i.e. the integrationUISPbandwidths.csv file). Still pretty clean.
>>>>>>
>>>>>> Creating the network.example.json file only requires:
>>>>>> from integrationCommon import NetworkGraph, NetworkNode, NodeType
>>>>>> import json
>>>>>> net = NetworkGraph()
>>>>>> net.addRawNode(NetworkNode("Site_1", "Site_1", "", NodeType.site, 1000, 1000))
>>>>>> net.addRawNode(NetworkNode("Site_2", "Site_2", "", NodeType.site, 500, 500))
>>>>>> net.addRawNode(NetworkNode("AP_A", "AP_A", "Site_1", NodeType.ap, 500, 500))
>>>>>> net.addRawNode(NetworkNode("Site_3", "Site_3", "Site_1", NodeType.site, 500, 500))
>>>>>> net.addRawNode(NetworkNode("PoP_5", "PoP_5", "Site_3", NodeType.site, 200, 200))
>>>>>> net.addRawNode(NetworkNode("AP_9", "AP_9", "PoP_5", NodeType.ap, 120, 120))
>>>>>> net.addRawNode(NetworkNode("PoP_6", "PoP_6", "PoP_5", NodeType.site, 60, 60))
>>>>>> net.addRawNode(NetworkNode("AP_11", "AP_11", "PoP_6", NodeType.ap, 30, 30))
>>>>>> net.addRawNode(NetworkNode("PoP_1", "PoP_1", "Site_2", NodeType.site, 200, 200))
>>>>>> net.addRawNode(NetworkNode("AP_7", "AP_7", "PoP_1", NodeType.ap, 100, 100))
>>>>>> net.addRawNode(NetworkNode("AP_1", "AP_1", "Site_2", NodeType.ap, 150, 150))
>>>>>> net.prepareTree()
>>>>>> net.createNetworkJson()
>>>>>>
>>>>>> (The id and name fields are duplicated right now, I'm using readable
>>>>>> names to keep me sane. The third string is the parent, and the last two
>>>>>> numbers are bandwidth limits)
>>>>>> The nice, readable format being:
>>>>>> NetworkNode(id="Site_1", displayName="Site_1", parentId="", type=NodeType.site, download=1000, upload=1000)
>>>>>>
>>>>>> That in turn gives you the example network:
>>>>>> [image: image.png]
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 28, 2022 at 7:40 AM Herbert Wolverson <
>>>>>> herberticus at gmail.com> wrote:
>>>>>>
>>>>>>> Dave: I love those Gource animations! Game development is my other
>>>>>>> hobby, I could easily get lost for weeks tweaking the shaders to make the
>>>>>>> glow "just right". :-)
>>>>>>>
>>>>>>> Dan: Discovery would be nice, but I don't think we're ready to look
>>>>>>> in that direction yet. I'm trying to build a "common grammar" to make it
>>>>>>> easier to express network layout from integrations; that would be another
>>>>>>> form/layer of integration and a lot easier to work with once there's a
>>>>>>> solid foundation. Preseem does some of this (admittedly over-eagerly;
>>>>>>> nothing needs to query SNMP that often!), and the SNMP route is quite
>>>>>>> remarkably convoluted. Their support turned on a few "extra" modules to
>>>>>>> deal with things like PMP450 clients that change MAC when you put them in
>>>>>>> bridge mode vs NAT mode (and report the bridge mode CPE in some places
>>>>>>> either way), Elevate CPEs that almost but not quite make sense. Robert's
>>>>>>> code has the beginnings of some of this, scanning Mikrotik routers for IPv6
>>>>>>> allocations by MAC (this is also the hardest part for me to test, since I
>>>>>>> don't have any v6 to test, currently).
>>>>>>>
>>>>>>> We tend to use UISP as the "source of truth" and treat it like a
>>>>>>> database for a ton of external tools (mostly ones we've created).
>>>>>>>
>>>>>>> On Thu, Oct 27, 2022 at 7:27 PM dan <dandenson at gmail.com> wrote:
>>>>>>>
>>>>>>>> we're pretty similar in that we've made UISP a mess.  Multiple
>>>>>>>> paths to a pop.  multiple pops on the network.  failover between pops.
>>>>>>>> Lots of 'other' devices. handing out /29 etc to customers.
>>>>>>>>
>>>>>>>> Some sort of discovery would be nice.  Ideally though, pulling
>>>>>>>> something from SNMP or router APIs etc to build the paths, but having a
>>>>>>>> 'network elements' list with each of the links described.  ie, backhaul 12
>>>>>>>> has MACs ..01 and ...02 at 300x100 and then build the topology around that
>>>>>>>> from discovery.
>>>>>>>>
>>>>>>>> I've also thought about doing routine trace routes or watching TTLs
>>>>>>>> or something like that to get some indication that topology has changed and
>>>>>>>> then do another discovery and potential tree rebuild.
>>>>>>>>
>>>>>>>> On Thu, Oct 27, 2022 at 3:48 PM Robert Chacón via LibreQoS <
>>>>>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>>>>>
>>>>>>>>> This is awesome! Way to go here. Thank you for contributing this.
>>>>>>>>> Being able to map out these complex integrations will help ISPs a
>>>>>>>>> ton, and I really like that it is sharing common features between the
>>>>>>>>> Splynx and UISP integrations.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Robert
>>>>>>>>>
>>>>>>>>> On Thu, Oct 27, 2022 at 3:33 PM Herbert Wolverson via LibreQoS <
>>>>>>>>> libreqos at lists.bufferbloat.net> wrote:
>>>>>>>>>
>>>>>>>>>> So I've been doing some work on getting UISP integration (and
>>>>>>>>>> integrations in general) to work a bit more smoothly.
>>>>>>>>>>
>>>>>>>>>> I started by implementing a graph structure that mirrors both the
>>>>>>>>>> networks and sites system. It's not done yet, but the basics are coming
>>>>>>>>>> together nicely. You can see my progress so far at:
>>>>>>>>>> https://github.com/thebracket/LibreQoS/tree/integration-common-graph
>>>>>>>>>>
>>>>>>>>>> Our UISP instance is a *great* testcase for torturing the
>>>>>>>>>> system. I even found a case of UISP somehow auto-generating a circular
>>>>>>>>>> portion of the tree. We have:
>>>>>>>>>>
>>>>>>>>>>    - Non Ubiquiti devices as "other devices"
>>>>>>>>>>    - Sections that need shaping by subnet (e.g. "all of
>>>>>>>>>>    192.168.1.0/24 shared 100 mbit")
>>>>>>>>>>    - Bridge mode devices using Option 82 to always allocate the
>>>>>>>>>>    same IP, with a "service IP" entry
>>>>>>>>>>    - Various bits of infrastructure mapped
>>>>>>>>>>    - Sites that go to client sites, which go to other client
>>>>>>>>>>    sites
>>>>>>>>>>
>>>>>>>>>> In other words, over the years we've unleashed a bit of a
>>>>>>>>>> monster. Cleaning it up is a useful task, but I wanted the integration to
>>>>>>>>>> be able to handle pathological cases like ours!
>>>>>>>>>>
>>>>>>>>>> So I fed our network into the current graph generator, and used
>>>>>>>>>> graphviz to spit out a directed graph:
>>>>>>>>>> [image: image.png]
>>>>>>>>>> That doesn't include client sites! Legend:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    - Green = the root site.
>>>>>>>>>>    - Red = a site
>>>>>>>>>>    - Blue = an access point
>>>>>>>>>>    - Magenta = a client site that has children
>>>>>>>>>>
>>>>>>>>>> So the part in "common" is designed heavily to reduce repetition.
>>>>>>>>>> When it's done, you should be able to feed in sites, APs, clients, devices,
>>>>>>>>>> etc. in a pretty flexible manner. Given how much code is shared between the
>>>>>>>>>> UISP and Splynx integration code, I'm pretty sure both will be cut to a
>>>>>>>>>> tiny fraction of the total code. :-)
>>>>>>>>>>
>>>>>>>>>> I can't post the full tree, it's full of client names.
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LibreQoS mailing list
>>>>>>>>>> LibreQoS at lists.bufferbloat.net
>>>>>>>>>> https://lists.bufferbloat.net/listinfo/libreqos
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Robert Chacón
>>>>>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Robert Chacón
>>>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>>>> Dev | LibreQoS.io
>>>>>
>>>>
>>>
>>>
>>> --
>>> Robert Chacón
>>> CEO | JackRabbit Wireless LLC <http://jackrabbitwireless.com>
>>> Dev | LibreQoS.io
>>>
>>
>
>
> --
> This song goes out to all the folk that thought Stadia would work:
>
> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
> Dave Täht CEO, TekLibre, LLC
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 573568 bytes
Desc: not available
URL: <https://lists.bufferbloat.net/pipermail/libreqos/attachments/20221029/ff58035b/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 115596 bytes
Desc: not available
URL: <https://lists.bufferbloat.net/pipermail/libreqos/attachments/20221029/ff58035b/attachment-0003.png>

