[LibreQoS] routing protocols and daemons

Dave Taht dave.taht at gmail.com
Thu Oct 27 12:35:17 EDT 2022

On Thu, Oct 27, 2022 at 8:29 AM Mark Steckel via LibreQoS
<libreqos at lists.bufferbloat.net> wrote:
> Herbert,
> Great info!   (And just a quick shout out that this has quickly become the best signal to noise email list of any I'm currently on. Really appreciate the expertise and depth of knowledge. Thanks everyone!)

I have a golden contact list. Most of us, though, are getting close to
or past the age of retirement, but I will continue to cultivate this
list with the best people I can find. If there is anyone y'all know
into a culture of sharing information and problems and solutions,
please ask them to join also? So delighted in my doddering years to
have been learning from you all.

I was and remain heartily sick and tired of the top-down
leader/follower trend that has taken over the internet in the last
decade, and of the marketing drivel (the front page of google now reads
like the National Enquirer, where before it had things like The Economist
and stuff I wanted to read). If getting off of twitter and into
email is what it takes, with a simple signup, to get rational
discourse on the internet again, well, I'm all for it!

An easy pass/fail test for someone suited for hereabouts is whether your
potential invitee digs the song I have in my .sig.

> I also really like OSPF except when it goes wonky on me...
> I've been struggling with an OSPF issue for a while and can't seem to track down the source (other than generically grousing at Ubnt's firmware and software development practices). The short version of the story is that the ospfd process consumes 100% cpu and starts consuming memory, which is eventually exhausted. When this happens, usually 2 or more routers are involved. My working theory is that the routers involved are caught in an OSPF update storm. Our internal routing table is not horribly large.

Oh, boy, did I wrestle with this on multiple fronts a few years ago. I
ended up writing the "routing tables of death" tool:

The readme is pretty funny, I think. Enjoy. Perhaps leveraging that
tool you can isolate the problem better.

I found so many bugs in route handling (and reported most of them, to
no avail) in the linux kernel, babel, bgp, and olsr. They included:

* Excessive updates saturating other daemons, like dnsmasq and odhcpd,
  listening on the route socket
* Totally wiping out the wifi (16 seconds of multicast latency on one of
  the devices I was testing)
* The daemon not prioritizing essential work, so it fell way, way behind
* Not filtering out events you didn't care about in general
* Not failing safe (e.g. if you are way behind, at LEAST make sure you
  are announcing the default routes on time)
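To make the "prioritize essential work" and "fail safe" points concrete,
here is a minimal sketch of a priority-ordered event queue. It is not
taken from babeld, bird, or any other daemon; the priority classes, the
EventQueue name, and the per-pass budget are all assumptions made up for
illustration.

```python
import heapq
import itertools

# Lower number = more urgent. These classes are illustrative assumptions.
PRIO_DEFAULT_ROUTE = 0   # must go out on time even when we are behind
PRIO_HELLO = 1           # keeps neighbour relationships alive
PRIO_ROUTE_UPDATE = 2    # bulk updates can wait


class EventQueue:
    """Hypothetical event queue that always serves urgent work first."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so events of equal priority stay in FIFO order.
        self._seq = itertools.count()

    def push(self, prio, event):
        heapq.heappush(self._heap, (prio, next(self._seq), event))

    def pop_batch(self, budget):
        """Return up to `budget` events, most urgent first."""
        batch = []
        while self._heap and len(batch) < budget:
            _, _, event = heapq.heappop(self._heap)
            batch.append(event)
        return batch


q = EventQueue()
# Simulate an update storm of 5000 host routes arriving first...
for i in range(5000):
    q.push(PRIO_ROUTE_UPDATE, f"update 10.0.{i // 256}.{i % 256}/32")
# ...followed by the work that must not be starved.
q.push(PRIO_DEFAULT_ROUTE, "announce default route")
q.push(PRIO_HELLO, "hello to neighbours")

first = q.pop_batch(2)
print(first)
```

Even with thousands of queued updates, the first batch served contains
the default route announcement and the hello, which is the fail-safe
behaviour described in the last bullet.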

I did a pretty thorough rewrite of babeld (buggy, tho) to handle loads
like this, and ultimately got it to scale past 4096 routes on a 400 MHz
MIPS box without falling over (bird is much better here). In the end I
concluded that a rewrite in rust or go was needed, with a sharp eye
towards multi-cpu and smarter event handling, and I wasn't up for that.

The linux kernel itself didn't, at the time, let you do a
make-before-break either: you needed to do a del/add on a better route,
rather than an add/del. Some later version of the kernel fixed that,
and I think bird picked up that fix also.
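The difference between del/add and add/del can be sketched with a toy
routing table. This is only a model, not the kernel's netlink API: the
table is a plain dict, and swap_next_hop and has_blackhole are
hypothetical helpers named for this example.

```python
def swap_next_hop(table, prefix, old, new, make_before_break):
    """Replace `old` with `new` for `prefix`; return the state after each step."""
    if make_before_break:
        ops = [("add", new), ("del", old)]   # add/del: never without a route
    else:
        ops = [("del", old), ("add", new)]   # del/add: brief gap with no route
    steps = []
    for op, hop in ops:
        hops = table.setdefault(prefix, [])
        if op == "add":
            hops.append(hop)
        else:
            hops.remove(hop)
        steps.append(list(hops))
    return steps


def has_blackhole(steps):
    """True if any intermediate state had no next hop at all."""
    return any(len(hops) == 0 for hops in steps)


prefix = "10.1.0.0/16"
bbm = swap_next_hop({prefix: ["192.0.2.1"]}, prefix, "192.0.2.1", "192.0.2.2", False)
mbb = swap_next_hop({prefix: ["192.0.2.1"]}, prefix, "192.0.2.1", "192.0.2.2", True)

print("del/add leaves a gap:", has_blackhole(bbm))
print("add/del leaves a gap:", has_blackhole(mbb))
```

In the del/add ordering there is a step where the prefix has no next hop,
so packets arriving in that window have nowhere to go. The add/del
ordering always keeps at least one usable route installed.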

> mjs at PWTW04-NNNN-STREET-ST-RTR:~$ show ip route summary
> IP routing table name is Default-IP-Routing-Table(0)
> IP routing table maximum-paths   : 8
> Total number of IPv4 routes      : 861
> Total number of IPv4 paths       : 861
> Route Source    Networks
> connected       9
> ospf            852
> Total           861
> FIB             852

Well, inject a few thousand routes with rtod and make popcorn on the cpu?

> I am slowly replacing all Ubnt routers at major sites with VyOS on DC powered Supermicro servers. So far, really liking VyOS. Example: https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-9D-8CN8TP.cfm

I really like the fast-for-a-typist interface that vyos (and cisco)
and mikrotik have. Always loved how lotus 123 did things. To heck with

UBNT really goofed when they abandoned edgeos and laid off everyone in
san jose. The stock price, tho, hasn't suffered.

> A bit more below.
> ---- On Thu, 27 Oct 2022 09:29:22 -0400 Herbert Wolverson via LibreQoS  wrote ---
>  > OSPF is a pretty stable protocol, so it doesn't change all that much (one of the reasons I like it is that it is supported everywhere, and works cross-vendor consistently). In answer to your questions:
>  >
>  > * With OSPFv3 you can push IPv6 routes. I've only used this a bit, but it worked fine last time we tried it.
>  > * You have a fair amount of flexibility on how hops talk to one another (configured on the interface). The default is still multicast, but "point-to-point" is just a unicast broadcast (and VERY fast route coalescing; it's our default),
> A few years ago I briefly looked at these config options but I never implemented anything. This morning I updated 2 routers to your "point-to-point" (I think...). Is anything more than the following needed on both routers (adjusting for ports, etc)?
> set interfaces ethernet eth1 ip ospf network point-to-point
> > "point-to-multipoint" can do the same for 1:many relationships (I don't use this one, I prefer one interface to one destination - even if the interfaces are VLANs. Makes traffic analysis a whole lot easier). There's also NBMA, which has you type in the addresses of adjacent neighbors (which feels like it defeats some of the point!) - but doesn't even broadcast.
> At major sites, we typically have 2 routers running in an active-active manner: router-1 has a.b.c.2, router-2 has a.b.c.3 and vrrp is used to float a.b.c.1 between both routers.
> Is "point-to-multipoint" required in this situation?
> > There was a time that NBMA was the most consistent over old Ubiquiti equipment, but these days the other modes work fine.
>  > * ECMP (Equal Cost Multipath) is awesome. If two routes to the destination sum out at equal costs, traffic is split equally between them. Traffic is split at layer 3 (on an address-port hash), so flows stay together. It's rock solid, we have it sharing traffic over two backhauls in a few places - and it handles one interface going down flawlessly (with either BFD, or point-to-point mode which coalesces so fast).
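The "flows stay together" property of an address-port hash can be
sketched as follows. This is an illustration only: real routers use
their own hash functions and field selections, and pick_next_hop, the
choice of SHA-256, and the example addresses are assumptions.

```python
import hashlib


def pick_next_hop(src_ip, src_port, dst_ip, dst_port, proto, next_hops):
    """Map a flow's 5-tuple onto one of several equal-cost next hops."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{proto}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes of the digest as an integer bucket selector.
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


hops = ["backhaul-1", "backhaul-2"]

# Every packet of the same flow hashes to the same backhaul...
a = pick_next_hop("100.64.0.10", 50000, "198.51.100.7", 443, "tcp", hops)
b = pick_next_hop("100.64.0.10", 50000, "198.51.100.7", 443, "tcp", hops)
print(a == b)

# ...while many different flows spread across both backhauls.
used = {
    pick_next_hop("100.64.0.10", port, "198.51.100.7", 443, "tcp", hops)
    for port in range(50000, 50200)
}
print(sorted(used))
```

Because the choice depends only on the flow's addresses and ports,
packets within a flow are not reordered across links, while the overall
load is shared between them.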
>  >
>  > Dijkstra's algorithm remains a very natural approach to mapping a graph (I've waxed lyrical about it in various book/article/blog posts over the years in the game-dev world), so I find it a very comfy way to model the underlying network. Very easy to reason "this will go that way".
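As a small illustration of reasoning "this will go that way", here is a
sketch of Dijkstra's algorithm over a three-site graph. The site names
and link costs are invented for the example, and a real OSPF
implementation works from its link-state database rather than a dict.

```python
import heapq


def shortest_costs(graph, source):
    """Return the lowest total cost from `source` to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for neighbour, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return dist


# Hypothetical topology: the direct a-c link is costlier than going via b.
graph = {
    "tower-a": {"tower-b": 10, "tower-c": 30},
    "tower-b": {"tower-a": 10, "tower-c": 10},
    "tower-c": {"tower-a": 30, "tower-b": 10},
}

costs = shortest_costs(graph, "tower-a")
print(costs)
```

Traffic from tower-a to tower-c goes via tower-b at a total cost of 20,
because that sum is lower than the direct link's cost of 30.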
>  >
>  > So what's bad about it?
>  >
>  > * ECMP is equal; if you have routes with different costs it will only use the lowest cost - it won't try and "mesh" some of the traffic in other directions. Likewise, if you have two equal routes - and one of them is running at a low capacity - you wind up only utilizing twice the capacity of the slowest link. You're often better off dropping the link over the degraded circuit (hence "carrier drop" features on various radios).
>  > * OSPF has no idea what the capacities of various links are. It'll use the shortest cost route, and leave the details up to the lower layers of the stack.
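The "twice the capacity of the slowest link" arithmetic can be checked
with a few lines. The link speeds below are made-up numbers, and the
model assumes traffic is split perfectly evenly, which per-flow hashing
only approximates.

```python
def ecmp_usable_capacity(link_capacities_mbps):
    """Total traffic an even split can carry before the slowest link saturates.

    With n equal-cost links each carrying total/n, the slowest link fills
    up when total/n == min(capacities), i.e. total == n * min(capacities).
    """
    return len(link_capacities_mbps) * min(link_capacities_mbps)


healthy = ecmp_usable_capacity([1000, 1000])   # both backhauls healthy
degraded = ecmp_usable_capacity([1000, 200])   # one backhaul degraded
dropped = ecmp_usable_capacity([1000])         # degraded link taken down

print(healthy, degraded, dropped)
```

With one link degraded to 200 Mbps the pair tops out around 400 Mbps,
whereas dropping the degraded link leaves the full 1000 Mbps of the
healthy one, which is why carrier drop can be the better choice.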
>  >
>  > There have been various attempts to integrate capacity into network design, and I've yet to see one that holds up well on a multi-vendor network. If you ask a Ubiquiti Rocket AC Gen2 its capacity, it'll often give you some nice big number - but it'll stall out far before that number. The Force 400C link we just put up doesn't even try to estimate its capacity. So multi-vendor mesh routing tends to be problematic, because the advertised capability of "we'll utilize everywhere you have capacity" tends to get snarled up in figuring out what the capacity is. I understand Teragraph is supposed to do better there. MPLS traffic engineering was originally announced as solving this one, too! I don't see a lot of  hope for capacity-aware routing protocols taking off, but I imagine we'll get a few new ones announced and then quietly forgotten as before.
>  >
>  > And lastly: once your network gets really big, OSPF tables can get too large - and you're stuck either dividing your OSPF zones and/or using some BGP in interior mode. You can mitigate this with some careful design.
> What is typically considered "too large" and how do "too large" type problems typically show up?
> Thanks
> Mark
>  >
>  > On Wed, Oct 26, 2022 at 5:35 PM dan <dandenson at gmail.com> wrote:
>  > I have played with batman-adv quite a bit and there are some concepts in it I really like.  Not being shortest path for one, and rating a link quality instead of hard up/down.   I also like the layer2 model so it looks like a big switch.  It's very clean from an operational perspective as it behaves essentially like an MPLS/VPLS network administratively.
>  >
>  > What I think we're missing is the integration of network attributes and class of service.  For instance, user to 'internet' has 3 potential paths, each having its own end-to-end latency, upload throughput, download throughput, and say 'quality' or packet loss.  Then have your QoS engine able to tag packets for how it perceives they need to be routed, and have the routing engine pick routes based on availability.  So you might have a longer path that will suffer some on latency because of the hops and link type but has big bandwidth 'available' (ie large capacity and underused), so it should ask for that flow to take the underused high-capacity path (which still meets the other criteria).  Something considered realtime might prefer that 700Mbps licenced path that has lower and more predictable latency and enough available capacity for the job. By encouraging high throughput needs to take paths with a lot of availability, and with some mechanism to prevent oscillations from reroutes, you could keep lower latency links less busy and get load balancing by a more intelligent choice.  You could also have some sort of reservation number tagged onto that packet to ask the intermediate hops to reduce their available amount.  If you were going to go all out on this and have devices that spoke this everywhere, you could put your shapers everywhere as well, getting that desired egress shaping on both sides and letting the network sort of reserve a bit of bandwidth for each customer based on that.
>  >
>  > Of course this means scaling issues almost inherently because those 'available capacity' numbers and packet loss need to be communicated.  computationally intensive.
>  >
>  > batman-adv does this in a way with its OGM/ELP system.  You can take a longer path through a batman-adv network because of a saturated link and it doesn't consider that saturated link 'down'.
>  >
>  > rflo was an interesting tech that did some of these once upon a time.
>  >
>  > Just thoughts.
>  > On Wed, Oct 26, 2022 at 3:38 PM Dave Taht via LibreQoS <libreqos at lists.bufferbloat.net> wrote:
>  > On Wed, Oct 26, 2022 at 1:53 PM Herbert Wolverson via LibreQoS
>  >  <libreqos at lists.bufferbloat.net> wrote:
>  >  >
>  >  > My name is Herbert, and I'm an OSPF addict... seriously, I love OSPF. Right down to stub sites, not-so-stubby sites, and isolating IP blocks within a site into "stub" nets and ensuring they are aggregated properly. I should probably go outside more...
>  >
>  >  haha.
>  >
>  >  My name is dave, and I think all routing protocols should have evolved
>  >  much better to elegantly meet the real world problems they were trying
>  >  to solve, than they have.
>  >
>  >  To avoid burying the lede, to what extent does OSPF still rely on
>  >  multicast? How well can it carry ipv6 now? What extensions are common
>  >  in the real WISP world?
>  >
>  >  BGP needs a few more napkins.
>  >
>  >  RIP was a VERY good start but we drew the wrong lessons from its
>  >  failures, and the super-duper trendline towards centralized
>  >  controllers inherent in OSPF and ISIS that happened in the 90s
>  >  doesn't scale anywhere near as well as I'd like.
>  >
>  >  I liked the rise of meshy 802.11 networks, I know the author of AODV
>  >  well (charlie perkins is arguably one of the fathers of mesh
>  >  networking, far too few have read his books from the 90s). And I've
>  >  been involved in the "battlemesh" group for many years with those
>  >  trying to make 'em work better on networks such as guifi,
>  >  wlan slovenija, etc.
>  >
>  >  Backstory. Back in 07, in Nicaragua, I was (stupidly) trying to get
>  >  ipv6 to work over nanostation m2s or m5s I forget which, and the basic
>  >  option was to run two copies of the ospf daemon to manage 4 and 6
>  >  independently. I only had 32MB of memory and it didn't fit, so I
>  >  started looking for alternatives, found babel, corresponded with (and
>  >  frankly thoroughly annoyed) the author, and started giving it a go.
>  >  It transported 4 and 6 in the same packets, was tiny, was
>  >  distance-vector (thus, I thought, more a match for bgp), and (to me)
>  >  most importantly, solved the ipv4 and ipv6 routing problems in the
>  >  same daemon at the same time, and actually fit into less memory than
>  >  ospf did. It was good enough it seemed, to deploy to a few hundred
>  >  routers without having to play major tricks with areas and stubs and
>  >  so on.
>  >
>  >  Babel is so simple that toke wrote a near complete implementation from
>  >  the spec, in python, during a string of extremely boring IETF
>  >  meetings, over the course of a week. He later took on the bird port.
>  >  Over the years we've wedged most (but not all) of the key features I
>  >  thought a meshy wireless routing protocol should have, with
>  >  implementations in a standalone daemon, bird, and FRR. (there was a
>  >  quagga port at one point too. I forget what happened to toke's python
>  >  version).
>  >
>  >  https://www.rfc-editor.org/rfc/rfc8966.html babel
>  >  https://arxiv.org/abs/1403.0445 source specific routing
>  >  https://datatracker.ietf.org/doc/rfc8967/ HMAC authentication
>  >  https://datatracker.ietf.org/doc/html/draft-ietf-babel-rtt-extension-00
>  >  RTT metric
>  >  https://datatracker.ietf.org/meeting/99/materials/slides-99-babel-unicast-hellos-00.pdf
>  >   unicast hellos
>  >
>  >  Missing is BFD support, and the slightest bit of traction outside of
>  >  the shrinking battlemesh communities.
>  >
>  >  Althea is using babel and fq_codel in their blockchain routing thing
>  >  (I reserve comment), and I don't know where else, besides as part of
>  >  wireguard tunnels, babel is being used today. But I'm rather
>  >  interested in how OSPF evolved since I last touched it, and what use
>  >  cases it is good at and fails at?
>  >
>  >
>  >  > On Wed, Oct 26, 2022 at 3:29 PM Dave Taht via LibreQoS <libreqos at lists.bufferbloat.net> wrote:
>  >  >>
>  >  >> OK, since I'm getting such great updates on the state of the wisp
>  >  >> world, far more in a few days than I've had in 10 years... and btw, no
>  >  >> need to leap on dr science guy research questions like mine if you
>  >  >> have like, towers flooding or the phone ringing off the hook....
>  >  >>
>  >  >> What routing protocols are in use nowadays? BGP, yes, and it seems
>  >  >> ospf is popular?
>  >  >>
>  >  >> How about ISIS?
>  >  >>
>  >  >> I figure babel has zero traction or awareness despite being mandated
>  >  >> by the ietf homenet working group.
>  >  >>
>  >  >> Secondly, do you rely on BGP based on the edge router or use it in
>  >  >> software (frr? quagga? bird?). Using RPKI? Push FIBs anywhere? (route
>  >  >> 666 in particular)
>  >  >>
>  >  >> Similar question related to the IGP protocol in use, where do you rely
>  >  >> on it, vs all the tunnels you have, on what kinds of hardware?
>  >  >>
>  >  >> I note that robert at some point, somewhere, pointed out how fq_codel
>  >  >> saved his bacon when there was a major routing mishap (as there is no
>  >  >> congestion control in ospf), and I'd like to hear more of that story.
>  >  >>
>  >  >> BATMAN has been mentioned. There's other wireless protocols I've liked
>  >  >> - OLSR for example...
>  >  >>
>  >  >> Nobody knows what lies underneath many consumer wireless meshes
>  >  >> although it looks like 802.11s is a starting point, none, so far as I
>  >  >> know interoperate across brands.
>  >  >>
>  >  >> --
>  >  >> This song goes out to all the folk that thought Stadia would work:
>  >  >> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>  >  >> Rip Van Winkle COO, TekLibre, LLC
>  >  >> _______________________________________________
>  >  >> LibreQoS mailing list
>  >  >> LibreQoS at lists.bufferbloat.net
>  >  >> https://lists.bufferbloat.net/listinfo/libreqos
>  >  >
>  >
>  >
>  >
>  >  --
>  >  This song goes out to all the folk that thought Stadia would work:
>  >  https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
>  >  Dave Täht CEO, TekLibre, LLC
>  >

This song goes out to all the folk that thought Stadia would work:
Dave Täht CEO, TekLibre, LLC
