[LibreQoS] routing protocols and daemons

Dave Taht dave.taht at gmail.com
Fri Oct 28 20:01:49 EDT 2022


On Fri, Oct 28, 2022 at 2:45 PM Juliusz Chroboczek <jch at irif.fr> wrote:
>
> > found babel, corresponded with (and frankly thoroughly annoyed) the
> > author,
>
> Being said author, I can confirm that you did thoroughly annoy me.  But
> then, you also made me think.

I'm glad. Ultimately, we also drank, and I'll always remember (somewhat
blurrily) the day/night of our toasts to the builders of Notre-Dame.

Earlier in this thread, Herbert listed his complaints about OSPF.
If you have a spare moment: have you ever had any deep thoughts about
it?

--snip snip--
Dijkstra's algorithm remains a very natural approach to mapping a
graph (I've waxed lyrical about it in various book/article/blog posts
over the years in the game-dev world), so I find it a very comfy way
to model the underlying network. Very easy to reason "this will go
that way".

So what's bad about it?

* ECMP is equal; if you have routes with different costs it will only
use the lowest cost - it won't try and "mesh" some of the traffic in
other directions. Likewise, if you have two equal routes - and one of
them is running at a low capacity - you wind up only utilizing twice
the capacity of the slowest link. You're often better off dropping the
link over the degraded circuit (hence "carrier drop" features on
various radios).
* OSPF has no idea what the capacities of various links are. It'll use
the shortest cost route, and leave the details up to the lower layers
of the stack.
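
A minimal Python sketch of the two points above, added for illustration
and not part of the quoted message. It assumes a toy topology with
integer link costs, and a simplified ECMP model in which traffic is
split evenly across the equal-cost paths, so the bundle saturates as
soon as the slowest member does. The function names
(equal_cost_next_hops, ecmp_usable_capacity) and the link speeds are
hypothetical.

```python
import heapq
from collections import defaultdict


def equal_cost_next_hops(graph, source, target):
    """Dijkstra that also records every first hop on a lowest-cost path.

    graph maps node -> {neighbor: cost}; costs are positive integers.
    Returns (lowest cost, sorted list of ECMP first hops).
    """
    dist = {source: 0}
    first_hops = defaultdict(set)
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph.get(node, {}).items():
            nd = d + cost
            # Leaving the source, the first hop is the neighbor itself;
            # further out, inherit the first hops of the current node.
            hops = {neighbor} if node == source else first_hops[node]
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                first_hops[neighbor] = set(hops)
                heapq.heappush(heap, (nd, neighbor))
            elif nd == dist[neighbor]:
                first_hops[neighbor] |= hops  # equal cost: keep both
    return dist.get(target), sorted(first_hops[target])


def ecmp_usable_capacity(link_mbps):
    """Usable throughput under an even split (simplifying assumption)."""
    return len(link_mbps) * min(link_mbps)


# Toy topology: two cost-20 paths A-B-D and A-C-D, one cost-35 path A-E-D.
graph = {
    "A": {"B": 10, "C": 10, "E": 5},
    "B": {"D": 10},
    "C": {"D": 10},
    "E": {"D": 30},
}
cost, hops = equal_cost_next_hops(graph, "A", "D")
print(cost, hops)  # the cost-35 path via E is never used

# Hypothetical link speeds: one healthy 1000 Mbps path, one degraded to 100.
print(ecmp_usable_capacity([1000, 100]))  # both links kept in ECMP
print(ecmp_usable_capacity([1000]))       # degraded link dropped
```

Under these assumptions, keeping the degraded link yields twice the
slow link's capacity, while dropping it frees the full capacity of the
healthy one, which is the argument for "carrier drop". Note that the
link speeds never enter the shortest-path computation.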

There have been various attempts to integrate capacity into network
design, and I've yet to see one that holds up well on a multi-vendor
network. If you ask a Ubiquiti Rocket AC Gen2 its capacity, it'll
often give you some nice big number - but it'll stall out far before
that number. The Force 400C link we just put up doesn't even try to
estimate its capacity. So multi-vendor mesh routing tends to be
problematic, because the advertised capability of "we'll utilize
everywhere you have capacity" tends to get snarled up in figuring out
what the capacity is. I understand Teragraph is supposed to do better
there. MPLS traffic engineering was originally announced as solving
this one, too! I don't see a lot of hope for capacity-aware routing
protocols taking off, but I imagine we'll get a few new ones announced
and then quietly forgotten as before.

And lastly: once your network gets really big, OSPF tables can get too
large - and you're stuck either dividing your network into OSPF areas
or using some BGP in interior mode (iBGP), or both. You can mitigate
this with some careful design.
-- unsnip --

I don't suppose you have ever had any ideas on how to improve things?


>
> > Babel is so simple that toke wrote a near complete implementation from
> > the spec, in python, during a string of extremely boring IETF
> > meetings, over the course of a week. He later took on the bird port.
>
> This is not correct.  Babel was first reimplemented in Python during two
> nights during an IETF meeting by Markus Stenberg.  As to Toke, he did the
> BIRD reimplementation in C during a Battlemesh meeting, and it took him
> a whole four days.  I later did a minimal implementation in C, which
> compiled to 20kB of x86 code.

You're working on galene, now, primarily? How's android treating you?

>
> > I forget what happened to toke's python version).
>
> Markus Stenberg's.  It's still available, but fairly obsolete due to
> advances in babeld and BIRD.
>
>   https://github.com/fingon/pybabel

yea!

> > Althea is using babel and fq_codel in their blockchain routing thing
> > (I reserve comment), and I don't know where else, besides as part of
> > wireguard tunnels, babel is being used today.
>
> Now that Babel is no longer a legitimate research project, the main user
> and main source of funding for Babel are Nexedi, who use it in their
> distributed cloud
>
>   https://www.nexedi.com/

If you can stomach it, I have a copy of althea's most current preso,
which I can send under separate cover. They scored 15 out of 100 on
a funding round with it, but there babel and fq_codel were, with an
rtt metric on page 16...

At least they leave breadcrumbs back to where the actual good ideas
came from. Many others in web3 haven't.

>
> But I agree with you, Dave, Babel did not take over the world.  The main
> reason, I suspect, is that OSPF is very good, and that most people are
> happy enough with it.

Everything can be improved.

> Notwithstanding that, we're still maintaining both the standalone babeld
> and Toke's BIRD module, and we've been busy extending the protocol with
> source-specific routing, with v4-via-v6 routing, with MAC protection.

I am so in love with v4 via v6. It's just having to wait 6+ years for
places like mikrotik to ship it that I can't stand.

>
> -- Juliusz


--
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

