[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60
Jesper Dangaard Brouer
brouer at redhat.com
Fri Nov 6 09:13:24 EST 2020
On Fri, 6 Nov 2020 13:53:58 +0100
Jesper Dangaard Brouer <brouer at redhat.com> wrote:
> [...]
> > >
> > > Could this be related to netlink? I have gobgpd running on these
> > > routers, which injects routes via netlink.
> > > But the churn rate during the tests is very minimal, maybe 30 - 40
> > > routes every second.
>
> Yes, this could be related. The internal data structure for FIB
> lookups is a fib_trie, which is a compressed Patricia tree, related to
> the radix-tree idea. Thus, I can imagine that the kernel has to
> rebuild/rebalance the tree with all these updates.
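To make the trie idea concrete, here is a minimal sketch of longest-prefix matching over a plain, uncompressed binary trie. It is only an illustration, not the kernel's fib_trie: the real structure adds path and level compression on top of this idea, and the node layout, field names and next-hop ids below are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified IPv4 routing trie: one address bit per level,
 * no path or level compression. */
struct trie_node {
    struct trie_node *child[2];
    int has_route;      /* 1 if a prefix ends at this node */
    int nexthop;        /* hypothetical next-hop id for that prefix */
};

static struct trie_node *node_new(void)
{
    return calloc(1, sizeof(struct trie_node));
}

/* Insert prefix/len -> nexthop, creating nodes along the path as needed.
 * Returns 0 on success, -1 on allocation failure. */
static int trie_insert(struct trie_node *root, uint32_t prefix, int len, int nexthop)
{
    struct trie_node *n = root;
    for (int i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit]) {
            n->child[bit] = node_new();
            if (!n->child[bit])
                return -1;
        }
        n = n->child[bit];
    }
    n->has_route = 1;
    n->nexthop = nexthop;
    return 0;
}

/* Longest-prefix match: walk the address bits from the most significant
 * one, remembering the deepest node that carries a route.
 * Returns -1 if no prefix matches. */
static int trie_lookup(const struct trie_node *root, uint32_t addr)
{
    const struct trie_node *n = root;
    int best = n->has_route ? n->nexthop : -1;
    for (int i = 0; i < 32 && n; i++) {
        n = n->child[(addr >> (31 - i)) & 1];
        if (n && n->has_route)
            best = n->nexthop;
    }
    return best;
}

static void trie_free(struct trie_node *n)
{
    if (!n)
        return;
    trie_free(n->child[0]);
    trie_free(n->child[1]);
    free(n);
}
```

Every route add or delete changes nodes on this path; in a compressed trie it can also force nodes to be merged or split, which is the rebuild/rebalance cost referred to above.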
I have been reading the kernel code. The IPv4 fib_trie code is very well
tuned and fully RCU-ified, meaning the read side is lock-free. The resize()
function in net/ipv4/fib_trie.c has a max_work limiter to avoid spending
too much time. And the update path also looks lock-free.
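The max_work idea can be sketched as a resize loop that stops after a fixed budget of restructuring steps and leaves any remaining imbalance for a later update. Everything here is hypothetical (SKETCH_MAX_WORK, struct node_stats and the "grow when more than half full" policy); it only mirrors the shape of such a limiter, not fib_trie's actual inflate/halve heuristics.

```c
#include <stdbool.h>

/* Hypothetical work budget, in the spirit of a max_work limiter. */
#define SKETCH_MAX_WORK 10

struct node_stats {
    int children;   /* hypothetical: number of child slots in a node */
    int used;       /* hypothetical: slots actually occupied */
};

/* Hypothetical policy: grow the node when more than half the slots are used. */
static bool should_grow(const struct node_stats *s)
{
    return s->used * 2 > s->children;
}

/* Perform at most SKETCH_MAX_WORK grow steps and return how many were done.
 * If the node is still unbalanced when the budget runs out, the loop simply
 * stops instead of stalling the caller. */
static int bounded_resize(struct node_stats *s)
{
    int max_work = SKETCH_MAX_WORK;
    int done = 0;

    while (should_grow(s) && max_work) {
        s->children *= 2;   /* one unit of (expensive) restructuring */
        max_work--;
        done++;
    }
    return done;
}
```

A small budget bounds the time spent per update; the trade-off is that the structure may stay temporarily less balanced until later updates finish the job.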
The IPv6 update path looks scarier, as the code in net/ipv6/ip6_fib.c
seems to take a "bh" spinlock that can block softirqs from running
(spin_lock_bh(&f6i->fib6_table->tb6_lock)).
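A userspace analogy of the difference: readers follow an RCU-style published pointer without taking any lock, while writers serialize on a spinlock. This is a sketch under assumptions, using C11 atomics; struct table, reader_nroutes() and writer_add_route() are hypothetical names, old copies are handed back to the caller instead of being reclaimed after an RCU grace period, and nothing here models the bottom-half (softirq) disabling that spin_lock_bh() adds in the kernel, which is the part that could delay packet processing on that CPU.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical routing-table snapshot. */
struct table {
    int nroutes;
};

/* Currently published table; NULL until the first update. */
static _Atomic(struct table *) current_table = NULL;
/* Writer-side spinlock (plain spin, no bh semantics). */
static atomic_flag writer_lock = ATOMIC_FLAG_INIT;

/* Reader: a single acquire load, never waits for writers. */
static int reader_nroutes(void)
{
    struct table *t = atomic_load_explicit(&current_table, memory_order_acquire);
    return t ? t->nroutes : 0;
}

/* Writer: take the spinlock, copy and modify, then publish the new copy with
 * a release store. The previous table is returned via *old_out; with real RCU
 * it may only be freed once no reader can still be using it.
 * Returns 0 on success, -1 on allocation failure. */
static int writer_add_route(struct table **old_out)
{
    while (atomic_flag_test_and_set_explicit(&writer_lock, memory_order_acquire))
        ; /* spin until the other writer releases the lock */

    struct table *old = atomic_load_explicit(&current_table, memory_order_relaxed);
    struct table *nt = malloc(sizeof(*nt));
    if (!nt) {
        atomic_flag_clear_explicit(&writer_lock, memory_order_release);
        return -1;
    }
    nt->nroutes = old ? old->nroutes + 1 : 1;
    atomic_store_explicit(&current_table, nt, memory_order_release);

    atomic_flag_clear_explicit(&writer_lock, memory_order_release);
    *old_out = old;
    return 0;
}
```

Readers here never contend with writers; only writers wait on each other. Holding a bh-disabling lock for a long update is different, because it also holds off the softirq work on that CPU.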
Have you tried using 'perf record' to observe what is happening on the system while these latency incidents occur? (Let me know if you want some command-line hints.)
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer