[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

Thomas Rosenstein thomas.rosenstein at creamfinance.com
Fri Nov 6 12:04:49 EST 2020



On 6 Nov 2020, at 15:13, Jesper Dangaard Brouer wrote:

> On Fri, 6 Nov 2020 13:53:58 +0100
> Jesper Dangaard Brouer <brouer at redhat.com> wrote:
>
>> [...]
>>>>
>>>> Could this be related to netlink? I have gobgpd running on these
>>>> routers, which injects routes via netlink.
>>>> But the churn rate during the tests is very minimal, maybe 30 - 40
>>>> routes every second.
>>
>> Yes, this could be related.  The internal data structure for FIB
>> lookups is a fib_trie, which is a compressed Patricia tree, related to
>> the radix tree idea.  Thus, I can imagine that the kernel has to
>> rebuild/rebalance the tree with all these updates.
>
> Reading the kernel code: the IPv4 fib_trie code is very well tuned and
> fully RCU-ified, meaning the read side is lock-free.  The resize()
> function in net/ipv4/fib_trie.c has a max_work limiter to keep it from
> using too much time.  And the update looks lock-free.
>
> The IPv6 update looks more scary, as it seems to take a "bh" spinlock
> that can block softirq from running; see the code in net/ipv6/ip6_fib.c
> (spin_lock_bh(&f6i->fib6_table->tb6_lock)).

I'm using ping on IPv4, but I'll try to see if IPv6 makes any difference!
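
For readers less familiar with the longest-prefix-match idea behind the
FIB lookup discussed above, here is a minimal Python sketch. It is a plain,
uncompressed binary trie and not the kernel's fib_trie: there is no path
compression, no resize()/rebalancing, and no RCU. The names PrefixTrie,
insert, and lookup are hypothetical and chosen only for illustration.

```python
import ipaddress


class _Node:
    """One trie node: two children (bit 0 / bit 1) and an optional route."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None


class PrefixTrie:
    """Toy IPv4 longest-prefix-match table (illustration only)."""

    def __init__(self):
        self.root = _Node()

    def insert(self, prefix, next_hop):
        """Add a route such as insert("10.1.0.0/16", "eth1")."""
        net = ipaddress.IPv4Network(prefix)
        bits = int(net.network_address)
        node = self.root
        # Walk one bit per level, most significant bit first.
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = _Node()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, addr):
        """Return the next hop of the longest matching prefix, or None."""
        bits = int(ipaddress.IPv4Address(addr))
        node = self.root
        best = node.next_hop  # a 0.0.0.0/0 default route, if present
        for i in range(32):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # remember the deepest match so far
        return best


table = PrefixTrie()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "A")
table.insert("10.1.0.0/16", "B")
print(table.lookup("10.1.2.3"))
```

Each route injected via netlink corresponds to an insert into a structure
like this one; the real kernel code additionally compresses and resizes
nodes, which is the part that was suspected of costing time.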

>
> Have you tried to use 'perf record' to observe what is happening on
> the system while these latency incidents happen?  (Let me know if you
> want some cmdline hints.)

Haven't tried this yet. If you have some hints on which events to
monitor, I'll take them!
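
As a possible starting point until better hints arrive, here is a small
Python sketch that builds a system-wide 'perf record' invocation. These
are not the hints offered above; the choice of a 10-second window, the
default event, and the helper names build_perf_record_cmd and
build_perf_report_cmd are assumptions made for illustration.

```python
import shlex


def build_perf_record_cmd(duration_s=10, output="perf.data"):
    """Build a system-wide sampling command with call graphs.

    -a  samples all CPUs, -g records call graphs, -o names the output
    file; 'sleep N' only bounds how long perf keeps recording.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return ["perf", "record", "-a", "-g", "-o", output,
            "--", "sleep", str(duration_s)]


def build_perf_report_cmd(input_file="perf.data"):
    """Build the matching text-mode report command."""
    return ["perf", "report", "-i", input_file, "--stdio"]


# Show the command lines without executing anything.
print(shlex.join(build_perf_record_cmd()))
print(shlex.join(build_perf_report_cmd()))
```

To actually capture data, the first command would have to be run as root
(for example via subprocess.run(cmd, check=True)) while a latency incident
is in progress, and the report inspected afterwards.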

>
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

