Date: Thu, 27 Oct 2022 11:29:10 -0400
From: Mark Steckel
To: "Herbert Wolverson"
Cc: "libreqos"
Subject: Re: [LibreQoS] routing protocols and daemons

Herbert,

Great info! (And just a quick shout out that this has quickly become the best signal-to-noise email list of any I'm currently on. Really appreciate the expertise and depth of knowledge. Thanks everyone!)

I also really like OSPF, except when it goes wonky on me...

I've been struggling with an OSPF issue for a while and can't seem to track down the source (other than generically grousing at Ubnt's firmware and software development practices). The short version of the story is that the ospfd process consumes 100% CPU and starts consuming memory until memory is eventually exhausted. When this happens, usually 2 or more routers are involved. My working theory is that the routers involved are caught in an OSPF update storm.

Our internal router table is not horribly large.

mjs@PWTW04-NNNN-STREET-ST-RTR:~$ show ip route summary
IP routing table name is Default-IP-Routing-Table(0)
IP routing table maximum-paths : 8
Total number of IPv4 routes    : 861
Total number of IPv4 paths     : 861
Route Source    Networks
connected       9
ospf            852
Total           861
FIB             852

I am slowly replacing all Ubnt routers at major sites with VyOS on DC-powered Supermicro servers. So far, really liking VyOS. Example: https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-9D-8CN8TP.cfm

A bit more below.

---- On Thu, 27 Oct 2022 09:29:22 -0400 Herbert Wolverson via LibreQoS wrote ---

> OSPF is a pretty stable protocol, so it doesn't change all that much (one of the reasons I like it is that it is supported everywhere, and works cross-vendor consistently). In answer to your questions:
>
> * With OSPFv3 you can push IPv6 routes. I've only used this a bit, but it worked fine last time we tried it.
> * You have a fair amount of flexibility on how hops talk to one another (configured on the interface). The default is still multicast, but "point-to-point" is just a unicast broadcast (and VERY fast route coalescing; it's our default),

A few years ago I briefly looked at these config options but I never implemented anything. This morning I updated 2 routers to your "point-to-point" (I think...). Is anything more than the following needed on both routers (adjusting for ports, etc.)?

set interfaces ethernet eth1 ip ospf network point-to-point

> "point-to-multipoint" can do the same for 1:many relationships (I don't use this one, I prefer one interface to one destination - even if the interfaces are VLANs. Makes traffic analysis a whole lot easier). There's also NBMA, which has you type in the addresses of adjacent neighbors (which feels like it defeats some of the point!)
> - but doesn't even broadcast.

At major sites, we typically have 2 routers running in an active-active manner: router-1 has a.b.c.2, router-2 has a.b.c.3, and VRRP is used to float a.b.c.1 between both routers. Is "point-to-multipoint" required in this situation?

> There was a time that NBMA was the most consistent over old Ubiquiti equipment, but these days the other modes work fine.
> * ECMP (Equal Cost Multipath) is awesome. If two routes to the destination sum out at equal costs, traffic is split equally between them. Traffic is split at layer 3 (on an address-port hash), so flows stay together. It's rock solid, we have it sharing traffic over two backhauls in a few places - and it handles one interface going down flawlessly (with either BFD, or point-to-point mode which coalesces so fast).
>
> Dijkstra's algorithm remains a very natural approach to mapping a graph (I've waxed lyrical about it in various book/article/blog posts over the years in the game-dev world), so I find it a very comfy way to model the underlying network. Very easy to reason "this will go that way".
>
> So what's bad about it?
>
> * ECMP is equal; if you have routes with different costs it will only use the lowest cost - it won't try and "mesh" some of the traffic in other directions. Likewise, if you have two equal routes - and one of them is running at a low capacity - you wind up only utilizing twice the capacity of the slowest link. You're often better off dropping the link over the degraded circuit (hence "carrier drop" features on various radios).
> * OSPF has no idea what the capacities of various links are. It'll use the shortest cost route, and leave the details up to the lower layers of the stack.
>
> There have been various attempts to integrate capacity into network design, and I've yet to see one that holds up well on a multi-vendor network.
> If you ask a Ubiquiti Rocket AC Gen2 its capacity, it'll often give you some nice big number - but it'll stall out far before that number. The Force 400C link we just put up doesn't even try to estimate its capacity. So multi-vendor mesh routing tends to be problematic, because the advertised capability of "we'll utilize everywhere you have capacity" tends to get snarled up in figuring out what the capacity is. I understand Terragraph is supposed to do better there. MPLS traffic engineering was originally announced as solving this one, too! I don't see a lot of hope for capacity-aware routing protocols taking off, but I imagine we'll get a few new ones announced and then quietly forgotten as before.
>
> And lastly: once your network gets really big, OSPF tables can get too large - and you're stuck either dividing your OSPF zones and/or using some BGP in interior mode. You can mitigate this with some careful design.

What is typically considered "too large", and how do "too large" problems typically show up?

Thanks
Mark

>
> On Wed, Oct 26, 2022 at 5:35 PM dan <dandenson@gmail.com> wrote:
> I have played with batman-adv quite a bit and there are some concepts in it I really like. Not being shortest path for one, and rating a link quality instead of hard up/down. I also like the layer2 model so it looks like a big switch. It's very clean from an operational perspective as it behaves essentially like an MPLS/VPLS network administratively.
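To make the contrast between Herbert's lowest-cost Dijkstra view and dan's "rate link quality instead of hard up/down" concrete, here is a minimal Python sketch. It is not batman-adv's OGM/TQ algorithm and not any OSPF implementation. It assumes each link's quality is an independent delivery ratio between 0 and 1 (hypothetical numbers on a hypothetical five-node topology) and runs the same Dijkstra search twice: once with a uniform cost per link, standing in for equal OSPF interface costs, and once with cost -log(quality), so that the lowest-sum path is the one with the highest product of link qualities.

```python
import heapq
import math

# Hypothetical toy topology: node -> [(neighbor, link_quality)], where quality
# is an assumed delivery ratio in (0, 1]. Numbers are illustrative only.
LINKS = {
    "A": [("B", 0.50), ("C", 0.95)],
    "B": [("A", 0.50), ("D", 0.50)],
    "C": [("A", 0.95), ("E", 0.95)],
    "E": [("C", 0.95), ("D", 0.95)],
    "D": [("B", 0.50), ("E", 0.95)],
}


def dijkstra(graph, src, dst, cost):
    """Plain Dijkstra; `cost` maps a link quality to a non-negative edge cost.
    Assumes dst is reachable from src."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == dst:
            break
        for nbr, q in graph[node]:
            nd = d + cost(q)
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    # Walk predecessors back from dst to rebuild the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Equal cost on every link: the fewest-hop path wins.
hop_path = dijkstra(LINKS, "A", "D", cost=lambda q: 1.0)
# Summing -log(q) is equivalent to maximizing the product of link qualities.
quality_path = dijkstra(LINKS, "A", "D", cost=lambda q: -math.log(q))

print(hop_path)      # ['A', 'B', 'D']
print(quality_path)  # ['A', 'C', 'E', 'D']
```

With these numbers the uniform-cost search takes the two-hop path over the 50% links (end-to-end quality 0.25), while the quality-aware search takes the three-hop path (0.95 cubed, roughly 0.86), which is the "longer path" behaviour dan describes. Neither variant knows anything about capacity, which is Herbert's separate complaint.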
>
> What I think we're missing is the integration of network attributes and class of service. For instance, user to 'internet' has 3 potential paths, each with its own end-to-end latency, upload throughput, download throughput, and say 'quality' or packet loss. Then having your QoS engine able to tag packets for how it perceives them to need to be routed, and then have the routing engine pick routes based on availability. So you might have a longer path that will suffer some on latency because of the hops and link type but has big bandwidth 'available' (i.e. large capacity and underused), so it should ask for that flow to take the underused high capacity (and yet still meets other criteria) path. Something considered realtime might prefer that 700Mbps licensed path that has lower and more predictable latency and enough available capacity for the job. By encouraging high throughput needs to take paths with a lot of availability, and some mechanism to prevent oscillations from reroutes, you could keep lower latency links less busy and get load balancing by a more intelligent choice. You could also have some sort of reservation number tagged onto that packet to ask the intermediate hops to reduce their available amount. If you were going to go all out on this and have devices that spoke this everywhere, you could put your shapers everywhere as well, getting that desired egress shaping on both sides and letting the network sort of reserve a bit of bandwidth for each customer based on that.
>
> Of course this means scaling issues almost inherently, because those 'available capacity' numbers and packet loss need to be communicated, which is computationally intensive.
>
> batman-adv does this in a way with its OGM/ELP system. You can take a longer path through a batman-adv network because of a saturated link, and it doesn't consider that saturated
> link 'down'.
>
> rflo was an interesting tech that did some of these once upon a time.
>
> Just thoughts.
>
> On Wed, Oct 26, 2022 at 3:38 PM Dave Taht via LibreQoS <libreqos@lists.bufferbloat.net> wrote:
> On Wed, Oct 26, 2022 at 1:53 PM Herbert Wolverson via LibreQoS
> <libreqos@lists.bufferbloat.net> wrote:
> >
> > My name is Herbert, and I'm an OSPF addict... seriously, I love OSPF. Right down to stub sites, not-so-stubby sites, and isolating IP blocks within a site into "stub" nets and ensuring they are aggregated properly. I should probably go outside more...
>
> haha.
>
> My name is dave, and I think all routing protocols should have evolved
> much better to elegantly meet the real world problems they were trying
> to solve, than they have.
>
> To avoid burying the lede, to what extent does OSPF still rely on
> multicast? How well can it carry ipv6 now? What extensions are common
> in the real WISP world?
>
> BGP needs a few more napkins.
>
> RIP was a VERY good start but we drew the wrong lessons from its
> failures, and the super-duper-trendline towards centralized
> controllers inherent in OSPF and ISIS that happened in the 90s
> doesn't scale anywhere near as well as I'd like.
>
> I liked the rise of meshy 802.11 networks, I know the author of AODV
> well (charlie perkins is arguably one of the fathers of mesh
> networking, far too few have read his books from the 90s). And I've
> been involved in the "battlemesh" group for many years with those
> trying to make 'em work better on networks such as guifi,
> wlan-slovenija, etc.
>
> Backstory. Back in 07, in Nicaragua, I was (stupidly) trying to get
> ipv6 to work over nanostation m2s or m5s I forget which, and the basic
> option was to run two copies of the ospf daemon to manage 4 and 6
> independently.
> I only had 32MB of memory and it didn't fit, so I
> started looking for alternatives, found babel, corresponded with (and
> frankly thoroughly annoyed) the author, and started giving it a go.
> It transported 4 and 6 in the same packets, was tiny, was
> distance-vector (thus, I thought, more a match for bgp), and (to me)
> most importantly, solved the ipv4 and ipv6 routing problems in the
> same daemon at the same time, and actually fit into less memory than
> ospf did. It was good enough it seemed, to deploy to a few hundred
> routers without having to play major tricks with areas and stubs and
> so on.
>
> Babel is so simple that toke wrote a near complete implementation from
> the spec, in python, during a string of extremely boring IETF
> meetings, over the course of a week. He later took on the bird port.
> Over the years we've wedged most (but not all) of the key features I
> thought a meshy wireless routing protocol should have, with
> implementations in a standalone daemon, bird, and FRR. (there was a
> quagga port at one point too. I forget what happened to toke's python
> version).
>
> https://www.rfc-editor.org/rfc/rfc8966.html babel
> https://arxiv.org/abs/1403.0445 source specific routing
> https://datatracker.ietf.org/doc/rfc8967/ HMAC authentication
> https://datatracker.ietf.org/doc/html/draft-ietf-babel-rtt-extension-00
> RTT metric
> https://datatracker.ietf.org/meeting/99/materials/slides-99-babel-unicast-hellos-00.pdf
> unicast hellos
>
> Missing is BFD support, and the slightest bit of traction outside of
> the shrinking battlemesh communities.
>
> Althea is using babel and fq_codel in their blockchain routing thing
> (I reserve comment), and I don't know where else, besides as part of
> wireguard tunnels, babel is being used today. But I'm rather
> interested in how OSPF evolved since I last touched it, and what use
> cases it is good at and fails at?
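Source-specific routing (the arXiv link above) keys the route lookup on the packet's source address as well as its destination, which lets a multihomed edge send traffic sourced from each upstream's prefix back out through that upstream. The following is a minimal Python sketch of such a lookup, assuming destination-first ordering (the longest destination prefix wins and the source prefix only breaks ties). The table, the prefixes (taken from the 2001:db8::/32 documentation range), the next-hop names and the `lookup` helper are all hypothetical and are not taken from babeld, bird or FRR.

```python
import ipaddress

# Hypothetical table: (destination prefix, source prefix, next hop).
# A source prefix of ::/0 means "any source", i.e. an ordinary route.
ROUTES = [
    ("::/0", "2001:db8:a::/48", "isp-a"),  # default route for ISP A's addresses
    ("::/0", "2001:db8:b::/48", "isp-b"),  # default route for ISP B's addresses
    ("2001:db8:100::/48", "::/0", "lan"),  # internal destination, any source
]


def lookup(dst, src, routes=ROUTES):
    """Return the next hop for (dst, src), or None if nothing matches.

    Destination-first ordering: compare (dst prefix length, src prefix length)
    so a more specific destination always beats a more specific source.
    """
    dst = ipaddress.ip_address(dst)
    src = ipaddress.ip_address(src)
    best, best_key = None, (-1, -1)
    for d, s, next_hop in routes:
        dnet, snet = ipaddress.ip_network(d), ipaddress.ip_network(s)
        if dst in dnet and src in snet:
            key = (dnet.prefixlen, snet.prefixlen)
            if key > best_key:
                best, best_key = next_hop, key
    return best


print(lookup("2001:db8:ffff::1", "2001:db8:a::10"))  # isp-a
print(lookup("2001:db8:ffff::1", "2001:db8:b::10"))  # isp-b
print(lookup("2001:db8:100::5", "2001:db8:b::10"))   # lan
```

The linear scan is only for readability; a real forwarding table would use a prefix trie. Note that a source outside both upstream prefixes gets no route here, because the sketch deliberately has no plain default route.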
>
>
> > On Wed, Oct 26, 2022 at 3:29 PM Dave Taht via LibreQoS <libreqos@lists.bufferbloat.net> wrote:
> >>
> >> OK, since I'm getting such great updates on the state of the wisp
> >> world, far more in a few days than I've had in 10 years... and btw, no
> >> need to leap on dr science guy research questions like mine if you
> >> have like, towers flooding or the phone ringing off the hook....
> >>
> >> What routing protocols are in use nowadays? BGP, yes, and it seems
> >> ospf is popular?
> >>
> >> How about ISIS?
> >>
> >> I figure babel has zero traction or awareness despite being mandated
> >> by the ietf homenet working group.
> >>
> >> Secondly, do you rely on BGP based on the edge router or use it in
> >> software (frr? quagga? bird?). Using RPKI? Push FIBs anywhere? (route
> >> 666 in particular)
> >>
> >> Similar question related to the IGP protocol in use, where do you rely
> >> on it, vs all the tunnels you have, on what kinds of hardware?
> >>
> >> I note that robert at some point, somewhere, pointed out how fq_codel
> >> saved his bacon when there was a major routing mishap (as there is no
> >> congestion control in ospf), and I'd like to hear more of that story.
> >>
> >> BATMAN has been mentioned. There's other wireless protocols I've liked
> >> - OLSR for example...
> >>
> >> Nobody knows what lies underneath many consumer wireless meshes
> >> although it looks like 802.11s is a starting point, none, so far as I
> >> know interoperate across brands.
> >>
> >> --
> >> This song goes out to all the folk that thought Stadia would work:
> >> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
> >> Rip Van Winkle COO, TekLibre, LLC
> >> _______________________________________________
> >> LibreQoS mailing list
> >> LibreQoS@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/libreqos
>
>
>
> --
> Dave Täht CEO, TekLibre, LLC