[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60
Thomas Rosenstein
thomas.rosenstein at creamfinance.com
Fri Nov 6 12:04:49 EST 2020
On 6 Nov 2020, at 15:13, Jesper Dangaard Brouer wrote:
> On Fri, 6 Nov 2020 13:53:58 +0100
> Jesper Dangaard Brouer <brouer at redhat.com> wrote:
>
>> [...]
>>>>
>>>> Could this be related to netlink? I have gobgpd running on these
>>>> routers, which injects routes via netlink.
>>>> But the churn rate during the tests is very minimal, maybe 30 - 40
>>>> routes every second.
>>
>> Yes, this could be related. The internal data structure for FIB
>> lookups is a fibtrie, which is a compressed Patricia tree, related to
>> the radix tree idea. Thus, I can imagine that the kernel has to
>> rebuild/rebalance the tree with all these updates.
>
> Reading the kernel code: the IPv4 fib_trie code is very well tuned and
> fully RCU-ified, meaning the read side is lock-free. The resize()
> function in net/ipv4/fib_trie.c has a max_work limiter to keep it from
> using too much time. And the update looks lock-free.
>
> The IPv6 update looks more scary, as it seems to take a "bh" spinlock
> that can block softirq from running (code in net/ipv6/ip6_fib.c:
> spin_lock_bh(&f6i->fib6_table->tb6_lock)).
I'm using ping on IPv4, but I'll try to see if IPv6 makes any
difference!
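
To make the FIB lookup structure mentioned in the quoted text above a bit more concrete, here is a minimal Python sketch of longest-prefix matching in a plain, uncompressed binary trie. It is only an illustration under simplifying assumptions: the kernel's fib_trie is a compressed variant with RCU-protected reads and resizing, none of which is modelled here, and the names TrieNode, Ipv4Trie, insert and lookup are hypothetical, not kernel identifiers.

```python
import ipaddress


class TrieNode:
    """One node per prefix bit; next_hop is set only where a route ends."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None


class Ipv4Trie:
    """Uncompressed binary trie for IPv4 longest-prefix match (illustrative only)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        # Walk (and create) one node per bit of the prefix, most significant bit first.
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, addr):
        # Follow the address bits downwards, remembering the deepest route seen.
        bits = int(ipaddress.ip_address(addr))
        node = self.root
        best = node.next_hop
        for i in range(32):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best
```

In this simplified form a route insert only ever adds nodes along one path. A compressed trie additionally has to merge or split nodes on updates, which is the rebuild/rebalance cost discussed in the quoted text.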
>
> Have you tried to use 'perf record' to observe what is happening on
> the system while these latency incidents happen? (let me know if you
> want some cmdline hints)
Haven't tried this yet. If you have some hints on what events to monitor,
I'll take them!
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer