[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

Jesper Dangaard Brouer brouer at redhat.com
Fri Nov 6 09:13:24 EST 2020


On Fri, 6 Nov 2020 13:53:58 +0100
Jesper Dangaard Brouer <brouer at redhat.com> wrote:

> [...]
> > >
> > > Could this be related to netlink? I have gobgpd running on these 
> > > routers, which injects routes via netlink.
> > > But the churn rate during the tests is very minimal, maybe 30 - 40 
> > > routes every second.  
> 
> Yes, this could be related.  The internal data structure for FIB
> lookups is a fib_trie, which is a compressed Patricia tree, related to
> the radix-tree idea.  Thus, I can imagine that the kernel has to
> rebuild/rebalance the tree with all these updates.

Reading the kernel code: the IPv4 fib_trie code is very well tuned and
fully RCU-ified, meaning the read side is lock-free.  The resize() code
in net/ipv4/fib_trie.c has a max_work limiter to keep it from using too
much time, and the update path looks lock-free.

The IPv6 update path looks scarier, as it seems to take a "bh" spinlock
in net/ipv6/ip6_fib.c (spin_lock_bh(&f6i->fib6_table->tb6_lock)), which
can block softirqs from running.

Have you tried using 'perf record' to observe what is happening on the
system while these latency incidents occur?  (Let me know if you want
some command-line hints.)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


