[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60
Jesper Dangaard Brouer
brouer at redhat.com
Fri Nov 6 07:53:58 EST 2020
On Fri, 06 Nov 2020 12:45:31 +0100
Toke Høiland-Jørgensen <toke at toke.dk> wrote:
> "Thomas Rosenstein" <thomas.rosenstein at creamfinance.com> writes:
>
> > On 6 Nov 2020, at 12:18, Jesper Dangaard Brouer wrote:
> >
> >> On Fri, 06 Nov 2020 10:18:10 +0100
> >> "Thomas Rosenstein" <thomas.rosenstein at creamfinance.com> wrote:
> >>
> >>>>> I just tested 5.9.4; it also seems to fix it partly. I have long
> >>>>> stretches where it looks good, and then some increases again. (3.10
> >>>>> stock has them too, but not so high, rather 1-3 ms.)
> >>>>>
> >>
> >> That you have long stretches where latency looks good is interesting
> >> information. My theory is that your system has a periodic userspace
> >> process that does a kernel syscall that takes too long, blocking the
> >> network card from processing packets. (Note it can also be a kernel
> >> thread).
> >
[...]
> >
> > Could this be related to netlink? I have gobgpd running on these
> > routers, which injects routes via netlink.
> > But the churn rate during the tests is very minimal, maybe 30 - 40
> > routes every second.
Yes, this could be related. The internal data structure for FIB
lookups is the fib_trie, which is a compressed Patricia tree, related to
the radix tree idea. Thus, I can imagine that the kernel has to
rebuild/rebalance the tree with all these updates.
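
To make the data-structure point concrete, here is a toy sketch of
longest-prefix matching in a plain binary trie. This is NOT the kernel's
fib_trie code: it has no path or level compression, is IPv4 only, and
the names ToyFib and TrieNode are made up for illustration. It only
shows that every route insert walks and possibly extends the tree, which
is the work a compressed trie additionally has to rebalance.

```python
import ipaddress


class TrieNode:
    """One bit position in the toy trie (hypothetical helper)."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and bit 1
        self.next_hop = None          # set only where a prefix ends


class ToyFib:
    """Uncompressed binary trie for IPv4 longest-prefix match."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        # Walk one bit per level, creating nodes as needed.
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address):
        bits = int(ipaddress.ip_address(address))
        node = self.root
        best = node.next_hop  # default route, if one was inserted
        for i in range(32):
            bit = (bits >> (31 - i)) & 1
            node = node.children[bit]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # remember the longest match so far
        return best
```

For example, after inserting 0.0.0.0/0, 10.0.0.0/8 and 10.1.0.0/16, a
lookup of 10.1.2.3 returns the next hop stored for the /16.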
> >
> > Otherwise we got: salt-minion, collectd, node_exporter, sshd
>
> collectd may be polling the interface stats; try turning that off?
It should be fairly easy for you to test whether any of these
services (except sshd) is causing this, by turning them off
one at a time.
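
A minimal sketch of how such a test could be scripted, under some
assumptions: the services are systemd units with exactly the names
listed below (they may differ on your distro), ping is the iputils
version that prints an "rtt min/avg/max/mdev" summary line, and the
script runs as root. The function names are hypothetical.

```python
import re
import subprocess

# Assumed systemd unit names; adjust to what the routers actually run.
SERVICES = ["salt-minion", "collectd", "node_exporter"]

# Summary line printed by iputils ping at the end of a run.
RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_rtt(output):
    """Return (min, avg, max, mdev) in ms, or None if no summary line."""
    m = RTT_RE.search(output)
    if m is None:
        return None
    return tuple(float(x) for x in m.groups())


def ping_rtt(target, count=100):
    """Ping the target `count` times and return the parsed RTT summary."""
    result = subprocess.run(
        ["ping", "-c", str(count), target],
        capture_output=True, text=True, check=False,
    )
    return parse_ping_rtt(result.stdout)


def compare_services(target, services=SERVICES):
    """Measure RTT with everything running, then with one service stopped
    at a time. Each service is started again before the next trial."""
    results = {"baseline": ping_rtt(target)}
    for svc in services:
        subprocess.run(["systemctl", "stop", svc], check=True)
        try:
            results[svc] = ping_rtt(target)
        finally:
            subprocess.run(["systemctl", "start", svc], check=True)
    return results
```

Calling compare_services() with the address you normally ping through
the router returns one RTT tuple per trial. If the max/mdev values drop
clearly while one service is stopped, that service is the suspect. The
spikes are intermittent, so a longer run per trial (larger count) is
probably needed.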
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer