From: Dave Taht
Date: Thu, 27 Oct 2022 09:35:17 -0700
To: Mark Steckel
Cc: Herbert Wolverson, libreqos
Subject: Re: [LibreQoS] routing protocols and daemons

On Thu, Oct 27, 2022 at 8:29 AM Mark Steckel via LibreQoS wrote:
>
> Herbert,
>
> Great info! (And just a quick shout-out that this has quickly become the best signal-to-noise email list of any I'm currently on. Really appreciate the expertise and depth of knowledge. Thanks, everyone!)

I have a golden contact list. Most of us, though, are getting close to or past the age of retirement, but I will continue to cultivate this list with the best people I can find. If you know anyone who is into a culture of sharing information, problems, and solutions, please ask them to join, too.

So delighted, in my doddering years, to have been learning from you all.

I was, and remain, heartily sick and tired of the top-down leader/follower trend that has taken over the internet in the last decade, and of the marketing drivel (the front page of Google now reads like the National Enquirer, where before it had things like The Economist and stuff I wanted to read). If getting off of Twitter and into email, with a simple signup, is what it takes to get rational discourse on the internet again, well, I'm all for it!

Easy pass/fail test for someone suited for hereabouts: does your potential invitee dig the song I have in my .sig?

>
> I also really like OSPF, except when it goes wonky on me...
>
> I've been struggling with an OSPF issue for a while and can't seem to track down the source (other than generically grousing at Ubnt's firmware and software development practices). The short version of the story is that the ospfd process consumes 100% CPU and starts consuming memory until it is eventually exhausted. When this happens, usually 2 or more routers are involved. My working theory is that the routers involved are caught in an OSPF update storm. Our internal routing table is not horribly large.

Oh, boy, did I wrestle with this on multiple fronts a few years ago. I ended up writing the "routing tables of death" tool:

https://github.com/dtaht/rtod#readme

The readme is pretty funny, I think. Enjoy. Perhaps by leveraging that tool you can isolate the problem better.

I found so many bugs in route handling (and reported most of them, to no avail) in the Linux kernel, babel, BGP, and OLSR. They included:

* Excessive updates saturating other daemons, like dnsmasq and odhcpd, that listen on the route socket
* Totally wiping out the wifi (16 seconds of multicast latency on one of the devices I was testing)
* The daemon not prioritizing essential work, so it fell way, way behind
* Not filtering out events you didn't care about, in general
* Not failing safe (e.g. if you are way behind, at LEAST make sure you are announcing the default routes on time)

I did a pretty thorough rewrite of babeld (buggy, though) to handle loads like this, and ultimately got it to scale past 4096 routes on a 400MHz MIPS box without falling over (bird is much better here). In the end, though, I concluded that a rewrite in Rust or Go was needed, with a sharp eye towards multi-CPU and smarter event handling, and I wasn't up for that.
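To make the last three list items concrete, here is a minimal Python sketch of a route-update queue that filters out prefixes the consumer does not care about, coalesces repeated updates for the same prefix, and always hands back default routes first. It is illustrative only: the class, the filter, and the two-level priority are hypothetical and are not how babeld, bird, or rtod actually work.

```python
import heapq

# Hypothetical sketch: names and policy are illustrative, not taken from
# babeld, bird, rtod, or any other real daemon.
DEFAULT_ROUTES = {"0.0.0.0/0", "::/0"}


class RouteEventQueue:
    """Route-update queue that filters, coalesces, and prioritizes events."""

    def __init__(self, accept=lambda prefix: True):
        self._accept = accept  # predicate; False means ignore the event
        self._pending = {}     # prefix -> latest next hop (None means withdraw)
        self._heap = []        # (priority, arrival order, prefix)
        self._order = 0

    def push(self, prefix, next_hop):
        if not self._accept(prefix):
            return
        if prefix not in self._pending:
            # Default routes get priority 0, so they are handled first even
            # when thousands of other updates are backlogged.
            priority = 0 if prefix in DEFAULT_ROUTES else 1
            heapq.heappush(self._heap, (priority, self._order, prefix))
            self._order += 1
        # Coalescing: a newer update for an already-queued prefix replaces
        # the pending state instead of adding a second event.
        self._pending[prefix] = next_hop

    def pop(self):
        """Return (prefix, next_hop) for the most urgent pending update."""
        _, _, prefix = heapq.heappop(self._heap)
        return prefix, self._pending.pop(prefix)

    def __len__(self):
        return len(self._heap)


# Usage: a burst of 1000 updates, one flap, one ignored prefix, and a late
# default route.
q = RouteEventQueue(accept=lambda p: not p.startswith("169.254."))
for i in range(1000):
    q.push(f"10.{i // 256}.{i % 256}.0/24", "192.0.2.1")
q.push("10.0.0.0/24", "192.0.2.2")      # coalesced with the earlier update
q.push("169.254.0.0/16", "192.0.2.9")   # filtered out
q.push("0.0.0.0/0", "192.0.2.254")      # jumps the queue
print(len(q))    # 1001
print(q.pop())   # ('0.0.0.0/0', '192.0.2.254')
print(q.pop())   # ('10.0.0.0/24', '192.0.2.2')
```

Coalescing keeps a flapping prefix from generating one event per flap, and the priority heap is a cheap way to "fail safe" on the default route while the backlog drains.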

The Linux kernel itself didn't, at the time, let you do make-before-break, either: you needed to do a del/add on a better route, rather than an add/del. Some later version of the kernel fixed that, and I think bird picked up that fix also.

> mjs@PWTW04-NNNN-STREET-ST-RTR:~$ show ip route summary
> IP routing table name is Default-IP-Routing-Table(0)
> IP routing table maximum-paths : 8
> Total number of IPv4 routes : 861
> Total number of IPv4 paths : 861
> Route Source    Networks
> connected       9
> ospf            852
> Total           861
> FIB             852

Well, inject a few thousand routes with rtod and make popcorn on the CPU?

>
> I am slowly replacing all Ubnt routers at major sites with VyOS on DC-powered Supermicro servers. So far, really liking VyOS. Example: https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-9D-8CN8TP.cfm

I really liked the fast-for-a-typist interface that VyOS (and Cisco) and MikroTik have. Always loved how Lotus 1-2-3 did things. To heck with clicking. UBNT really goofed when they abandoned EdgeOS and laid off everyone in San Jose. The stock price, though, hasn't suffered.

>
> A bit more below.
>
> ---- On Thu, 27 Oct 2022 09:29:22 -0400 Herbert Wolverson via LibreQoS wrote ---
>
> > OSPF is a pretty stable protocol, so it doesn't change all that much (one of the reasons I like it is that it is supported everywhere, and works consistently cross-vendor). In answer to your questions:
> >
> > * With OSPFv3 you can push IPv6 routes. I've only used this a bit, but it worked fine last time we tried it.
> > * You have a fair amount of flexibility on how hops talk to one another (configured on the interface). The default is still multicast, but "point-to-point" is just a unicast broadcast (and VERY fast route coalescing; it's our default),
>
> A few years ago I briefly looked at these config options, but I never implemented anything. This morning I updated 2 routers to your "point-to-point" (I think...).
> Is anything more than the following needed on both routers (adjusting for ports, etc.)?
>
> set interfaces ethernet eth1 ip ospf network point-to-point
>
> > "point-to-multipoint" can do the same for 1:many relationships (I don't use this one; I prefer one interface to one destination, even if the interfaces are VLANs. Makes traffic analysis a whole lot easier). There's also NBMA, which has you type in the addresses of adjacent neighbors (which feels like it defeats some of the point!), but doesn't even broadcast.
>
> At major sites, we typically have 2 routers running in an active-active manner: router-1 has a.b.c.2, router-2 has a.b.c.3, and VRRP is used to float a.b.c.1 between both routers.
>
> Is "point-to-multipoint" required in this situation?
>
> > There was a time that NBMA was the most consistent over old Ubiquiti equipment, but these days the other modes work fine.
> >
> > * ECMP (Equal Cost Multipath) is awesome. If two routes to the destination sum out at equal costs, traffic is split equally between them. Traffic is split at layer 3 (on an address-port hash), so flows stay together. It's rock solid; we have it sharing traffic over two backhauls in a few places, and it handles one interface going down flawlessly (with either BFD, or point-to-point mode, which coalesces so fast).
> >
> > Dijkstra's algorithm remains a very natural approach to mapping a graph (I've waxed lyrical about it in various book/article/blog posts over the years in the game-dev world), so I find it a very comfy way to model the underlying network. Very easy to reason "this will go that way".
> >
> > So what's bad about it?
> >
> > * ECMP is equal; if you have routes with different costs, it will only use the lowest cost. It won't try to "mesh" some of the traffic in other directions. Likewise, if you have two equal routes and one of them is running at a low capacity, you wind up only utilizing twice the capacity of the slowest link.
> > You're often better off dropping the link over the degraded circuit (hence "carrier drop" features on various radios).
> > * OSPF has no idea what the capacities of various links are. It'll use the shortest-cost route and leave the details up to the lower layers of the stack.
> >
> > There have been various attempts to integrate capacity into network design, and I've yet to see one that holds up well on a multi-vendor network. If you ask a Ubiquiti Rocket AC Gen2 its capacity, it'll often give you some nice big number, but it'll stall out far before that number. The Force 400C link we just put up doesn't even try to estimate its capacity. So multi-vendor mesh routing tends to be problematic, because the advertised capability of "we'll utilize everywhere you have capacity" tends to get snarled up in figuring out what the capacity is. I understand Terragraph is supposed to do better there. MPLS traffic engineering was originally announced as solving this one, too! I don't see a lot of hope for capacity-aware routing protocols taking off, but I imagine we'll get a few new ones announced and then quietly forgotten as before.
> >
> > And lastly: once your network gets really big, OSPF tables can get too large, and you're stuck dividing your network into OSPF areas and/or using some BGP in interior mode. You can mitigate this with some careful design.
>
> What is typically considered "too large", and how do "too large" type problems typically show up?
>
> Thanks
> Mark
>
> > On Wed, Oct 26, 2022 at 5:35 PM dan <dandenson@gmail.com> wrote:
> > I have played with batman-adv quite a bit and there are some concepts in it I really like: not being shortest-path, for one, and rating link quality instead of a hard up/down. I also like the layer 2 model, so it looks like a big switch. It's very clean from an operational perspective, as it behaves essentially like an MPLS/VPLS network administratively.
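Herbert's description of SPF and ECMP can be sketched in a few lines of Python: a Dijkstra run that keeps every equal-cost first hop instead of just one, a flow hash that pins each 5-tuple to one of those hops, and the rough "n times the slowest link" ceiling he mentions for an even split. This is a simplification under stated assumptions: OSPF runs SPF over its link-state database per area, real routers use their own hash inputs and functions (crc32 here is an arbitrary stand-in), and the names spf_ecmp, pick_next_hop, ecmp_ceiling_mbps and the toy topology are hypothetical.

```python
import heapq
import zlib


def spf_ecmp(graph, source):
    """Dijkstra shortest-path-first that keeps every equal-cost first hop.

    graph: {node: {neighbor: cost}}, positive integer costs, every node a key.
    Returns {destination: (total_cost, frozenset_of_first_hops)}.
    """
    dist = {source: 0}
    first_hops = {source: frozenset()}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for nbr, cost in graph[node].items():
            nd = d + cost
            # Leaving the source, the first hop is the neighbor itself;
            # further out, a path inherits the first hops used to reach node.
            via = frozenset([nbr]) if node == source else first_hops[node]
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                first_hops[nbr] = via
                heapq.heappush(heap, (nd, nbr))
            elif nd == dist[nbr]:
                # Equal cost: keep both sets of first hops (ECMP).
                first_hops[nbr] = first_hops[nbr] | via
    return {n: (dist[n], first_hops[n]) for n in dist if n != source}


def pick_next_hop(first_hops, src_ip, dst_ip, proto, src_port, dst_port):
    """Hash the flow's 5-tuple so all packets of one flow use one path."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    candidates = sorted(first_hops)  # deterministic order across runs
    return candidates[zlib.crc32(key) % len(candidates)]


def ecmp_ceiling_mbps(link_capacities_mbps):
    """Rough ceiling of an even ECMP split: n times the slowest member."""
    return len(link_capacities_mbps) * min(link_capacities_mbps)


# Usage: two equal-cost backhauls from the core to one site.
graph = {
    "core":   {"tower1": 10, "tower2": 10},
    "tower1": {"core": 10, "site": 10},
    "tower2": {"core": 10, "site": 10},
    "site":   {"tower1": 10, "tower2": 10},
}
routes = spf_ecmp(graph, "core")
print(routes["site"][0], sorted(routes["site"][1]))  # 20 ['tower1', 'tower2']
# Either tower1 or tower2, but always the same one for this flow:
print(pick_next_hop(routes["site"][1], "198.51.100.7", "203.0.113.9", 6, 51515, 443))
print(ecmp_ceiling_mbps([500, 100]))  # 200
```

Because the hash depends only on the 5-tuple, packets of one TCP flow are not reordered across the two backhauls; the flip side is that a single large flow can never use more than one path.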
> >
> > What I think we're missing is the integration of network attributes and class of service. For instance, user to 'internet' has 3 potential paths, each with its own end-to-end latency, upload throughput, download throughput, and, say, 'quality' or packet loss. Then have your QoS engine able to tag packets for how it perceives they need to be routed, and have the routing engine pick routes based on availability. So you might have a longer path that will suffer some on latency because of the hops and link type, but has big bandwidth 'available' (i.e. large capacity and underused), so it should ask for that flow to take the underused, high-capacity path (which still meets the other criteria). Something considered realtime might prefer that 700Mbps licensed path that has lower and more predictable latency and enough available capacity for the job. By encouraging high-throughput needs to take paths with a lot of availability, plus some mechanism to prevent oscillations from reroutes, you could keep lower-latency links less busy and get load balancing through a more intelligent choice. You could also have some sort of reservation number tagged onto that packet to ask the intermediate hops to reduce their available amount. If you were going to go all out on this and have devices that spoke this everywhere, you could put your shapers everywhere as well, getting that desired egress shaping on both sides and letting the network sort of reserve a bit of bandwidth for each customer based on that.
> >
> > Of course this means scaling issues almost inherently, because those 'available capacity' numbers and packet loss need to be communicated. Computationally intensive.
> >
> > batman-adv does this in a way with its OGM/ELP system. You can take a longer path through a batman-adv network because of a saturated link, and it doesn't consider that saturated link 'down'.
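dan's per-class routing idea can be sketched as a selection policy, assuming the hard part he flags (measuring and distributing each path's latency, spare capacity, and loss) has already been done. Everything here is hypothetical: PathStats, choose_path, should_move_bulk_flow, the path names, the 1% loss cap, and the 25% hysteresis margin are illustrative values, not part of batman-adv or any other existing protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathStats:
    name: str
    latency_ms: float      # measured end-to-end latency
    available_mbps: float  # capacity minus current load
    loss_pct: float        # recent packet loss


def choose_path(paths, traffic_class, demand_mbps, max_loss_pct=1.0):
    """Hypothetical per-class path choice; returns None if nothing fits."""
    usable = [p for p in paths
              if p.available_mbps >= demand_mbps and p.loss_pct <= max_loss_pct]
    if not usable:
        return None
    if traffic_class == "realtime":
        # Lowest latency wins; more headroom breaks ties.
        return min(usable, key=lambda p: (p.latency_ms, -p.available_mbps))
    if traffic_class == "bulk":
        # Most headroom wins, which keeps the low-latency links less busy.
        return max(usable, key=lambda p: (p.available_mbps, -p.latency_ms))
    raise ValueError(f"unknown traffic class: {traffic_class!r}")


def should_move_bulk_flow(current, candidate, margin=0.25):
    """Hysteresis against reroute oscillation: move only if the candidate has
    at least `margin` more available capacity than the current path."""
    return candidate.available_mbps > current.available_mbps * (1 + margin)


paths = [
    PathStats("licensed-700M", latency_ms=4, available_mbps=300, loss_pct=0.0),
    PathStats("long-way-round", latency_ms=15, available_mbps=900, loss_pct=0.1),
    PathStats("lossy-backup", latency_ms=8, available_mbps=1000, loss_pct=3.0),
]
print(choose_path(paths, "realtime", demand_mbps=5).name)  # licensed-700M
print(choose_path(paths, "bulk", demand_mbps=200).name)    # long-way-round
print(choose_path(paths, "bulk", demand_mbps=2000))        # None
```

The hysteresis margin is one simple way to get the "mechanism to prevent oscillations" dan asks for: a flow only moves when the gain is large enough that the next round of measurements is unlikely to send it straight back.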
> >
> > rflo was an interesting tech that did some of these once upon a time.
> >
> > Just thoughts.
> >
> > On Wed, Oct 26, 2022 at 3:38 PM Dave Taht via LibreQoS <libreqos@lists.bufferbloat.net> wrote:
> > On Wed, Oct 26, 2022 at 1:53 PM Herbert Wolverson via LibreQoS
> > <libreqos@lists.bufferbloat.net> wrote:
> > >
> > > My name is Herbert, and I'm an OSPF addict... seriously, I love OSPF. Right down to stub sites, not-so-stubby sites, and isolating IP blocks within a site into "stub" nets and ensuring they are aggregated properly. I should probably go outside more...
> >
> > haha.
> >
> > My name is dave, and I think all routing protocols should have evolved much better than they have to elegantly meet the real-world problems they were trying to solve.
> >
> > To avoid burying the lede: to what extent does OSPF still rely on multicast? How well can it carry IPv6 now? What extensions are common in the real WISP world?
> >
> > BGP needs a few more napkins.
> >
> > RIP was a VERY good start, but we drew the wrong lessons from its failures, and the super-duper trendline towards the centralized controllers inherent in OSPF and IS-IS that happened in the 90s doesn't scale anywhere near as well as I'd like.
> >
> > I liked the rise of meshy 802.11 networks. I know the author of AODV well (Charlie Perkins is arguably one of the fathers of mesh networking; far too few have read his books from the 90s). And I've been involved in the "battlemesh" group for many years with those trying to make 'em work better on networks such as guifi, wlan-slovenija, etc.
> >
> > Backstory. Back in '07, in Nicaragua, I was (stupidly) trying to get IPv6 to work over NanoStation M2s or M5s, I forget which, and the basic option was to run two copies of the OSPF daemon to manage 4 and 6 independently.
> > I only had 32MB of memory and it didn't fit, so I started looking for alternatives, found babel, corresponded with (and frankly thoroughly annoyed) the author, and started giving it a go. It transported 4 and 6 in the same packets, was tiny, was distance-vector (thus, I thought, more a match for BGP), and (to me) most importantly, solved the IPv4 and IPv6 routing problems in the same daemon at the same time, and actually fit into less memory than OSPF did. It seemed good enough to deploy to a few hundred routers without having to play major tricks with areas and stubs and so on.
> >
> > Babel is so simple that toke wrote a near-complete implementation from the spec, in Python, during a string of extremely boring IETF meetings, over the course of a week. He later took on the bird port. Over the years we've wedged in most (but not all) of the key features I thought a meshy wireless routing protocol should have, with implementations in a standalone daemon, bird, and FRR. (There was a quagga port at one point, too. I forget what happened to toke's Python version.)
> >
> > https://www.rfc-editor.org/rfc/rfc8966.html babel
> > https://arxiv.org/abs/1403.0445 source specific routing
> > https://datatracker.ietf.org/doc/rfc8967/ HMAC authentication
> > https://datatracker.ietf.org/doc/html/draft-ietf-babel-rtt-extension-00 RTT metric
> > https://datatracker.ietf.org/meeting/99/materials/slides-99-babel-unicast-hellos-00.pdf unicast hellos
> >
> > Missing is BFD support, and the slightest bit of traction outside of the shrinking battlemesh communities.
> >
> > Althea is using babel and fq_codel in their blockchain routing thing (I reserve comment), and I don't know where else, besides as part of WireGuard tunnels, babel is being used today. But I'm rather interested in how OSPF has evolved since I last touched it, and what use cases it is good at and fails at?
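Since babel comes up here as the distance-vector alternative, a simplified Python sketch of the feasibility condition from RFC 8966 (linked above) shows how babel avoids routing loops without flooding link state: an update is accepted only if it is strictly better than the "feasibility distance" remembered for its source. It assumes 16-bit sequence numbers compared modulo 2^16 and the usual additive metric; the function names are hypothetical rather than babeld's, and the rest of route selection (source-table maintenance, seqno requests) is left out.

```python
INFINITY = 0xFFFF  # Babel's infinite metric; an update carrying it is a retraction


def seqno_less(a, b):
    """True if seqno a is older than b, compared modulo 2**16."""
    return 0 < ((b - a) % 65536) < 32768


def is_feasible(seqno, metric, fd):
    """Feasibility check for an update carrying (seqno, metric).

    fd is the feasibility distance kept for the update's source, as a
    (seqno, metric) pair, or None if no route from that source was selected.
    """
    if metric == INFINITY or fd is None:
        return True  # retractions, and updates from unknown sources
    fd_seqno, fd_metric = fd
    if seqno_less(fd_seqno, seqno):
        return True  # newer sequence number
    # Same seqno: only a strictly better metric is safe; anything else could
    # be a route that has come back around a loop.
    return seqno == fd_seqno and metric < fd_metric


def route_metric(advertised_metric, link_cost):
    """Additive distance-vector metric, saturating at INFINITY."""
    return min(INFINITY, advertised_metric + link_cost)


fd = (7, 256)
print(is_feasible(7, 200, fd))            # True: same seqno, better metric
print(is_feasible(7, 300, fd))            # False: same seqno, worse metric
print(is_feasible(8, 900, fd))            # True: newer seqno
print(is_feasible(0, 900, (65535, 256)))  # True: seqno wrapped around
print(route_metric(65000, 1000))          # 65535
```

Rejecting an infeasible update does not mean the destination is lost for good: the source can bump its sequence number, which makes its fresh routes feasible again everywhere.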
> >
> > > On Wed, Oct 26, 2022 at 3:29 PM Dave Taht via LibreQoS <libreqos@lists.bufferbloat.net> wrote:
> > >>
> > >> OK, since I'm getting such great updates on the state of the WISP world, far more in a few days than I've had in 10 years... and btw, no need to leap on dr science guy research questions like mine if you have, like, towers flooding or the phone ringing off the hook....
> > >>
> > >> What routing protocols are in use nowadays? BGP, yes, and it seems OSPF is popular?
> > >>
> > >> How about IS-IS?
> > >>
> > >> I figure babel has zero traction or awareness despite being mandated by the IETF homenet working group.
> > >>
> > >> Secondly, do you rely on BGP on the edge router, or run it in software (FRR? quagga? bird?)? Using RPKI? Push FIBs anywhere? (route 666 in particular)
> > >>
> > >> Similar question related to the IGP in use: where do you rely on it, vs all the tunnels you have, and on what kinds of hardware?
> > >>
> > >> I note that Robert at some point, somewhere, pointed out how fq_codel saved his bacon when there was a major routing mishap (as there is no congestion control in OSPF), and I'd like to hear more of that story.
> > >>
> > >> BATMAN has been mentioned. There are other wireless protocols I've liked, OLSR for example...
> > >>
> > >> Nobody knows what lies underneath many consumer wireless meshes. Although it looks like 802.11s is a starting point, none, so far as I know, interoperate across brands.
> > >>
> > >> --
> > >> This song goes out to all the folk that thought Stadia would work:
> > >> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
> > >> Rip Van Winkle COO, TekLibre, LLC
> > >> _______________________________________________
> > >> LibreQoS mailing list
> > >> LibreQoS@lists.bufferbloat.net
> > >> https://lists.bufferbloat.net/listinfo/libreqos

--
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC