[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

Toke Høiland-Jørgensen toke at toke.dk
Thu Nov 5 06:21:54 EST 2020


"Thomas Rosenstein" <thomas.rosenstein at creamfinance.com> writes:

>> If so, this sounds more like a driver issue, or maybe something to do
>> with scheduling. Does it only happen with ICMP? You could try this tool
>> for a userspace UDP measurement:
>
> It happens with all packets, so the transfer to Backblaze with 40
> threads drops to ~8 MB/s instead of >60 MB/s.

Huh, right, definitely sounds like a kernel bug; or maybe the new kernel
is getting the hardware into a state where it bugs out when there are
lots of flows or something.

You could try looking at the ethtool stats (ethtool -S) while running
the test and see if any error counters go up. Here's a handy script to
monitor changes in the counters:

https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
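
If you'd rather not pull in the script, the idea is simple enough to
sketch yourself. The following is a rough illustration in Python, not the
linked script: it assumes `ethtool -S` prints one "name: value" pair per
line, and the function names (parse_ethtool_stats, changed_counters,
monitor) are made up for this example.

```python
import subprocess
import time


def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon so counter names containing ':' survive.
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # The "NIC statistics:" header has no value and is skipped here.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def changed_counters(before, after):
    """Return {name: delta} for counters whose value changed."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


def read_stats(dev):
    """Run `ethtool -S <dev>` and return the parsed counters."""
    out = subprocess.run(
        ["ethtool", "-S", dev], check=True, capture_output=True, text=True
    ).stdout
    return parse_ethtool_stats(out)


def monitor(dev, interval=1.0, rounds=10):
    """Print the counters that changed during each interval."""
    previous = read_stats(dev)
    for _ in range(rounds):
        time.sleep(interval)
        current = read_stats(dev)
        for name, delta in sorted(changed_counters(previous, current).items()):
            print(f"{name}: {delta:+d}")
        print("---")
        previous = current
```

Calling monitor("eth0") (with "eth0" replaced by your actual interface
name) while the test is running prints only the counters that moved.
Counter names are driver-specific, so which ones count as errors or drops
depends on the NIC.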

> I'll try what that reports!
>
>> Also, what happens if you ping a host on the internet (*through* the
>> router instead of *to* it)?
>
> Same issue, but twice as pronounced, as it seems all interfaces are
> affected.
> So, ping on one interface and the second has the issue.
> Also all traffic across the host has the issue, but on both sides, so 
> ping to the internet increased by 2x

Right, so even an unloaded interface suffers? But this is the same NIC,
right? So it could still be a hardware issue...

> Yep, the default that CentOS ships. I just tested 4.12.5, and the issue
> does not happen there either. So I guess I can bisect it then... (really
> don't want to 😃)

Well that at least narrows it down :)

>>
>> How did you configure the new kernel? Did you start from scratch, or is
>> it based on the old CentOS config?
>
> First oldconfig, and from there I added additional options for IB,
> NVMe, etc. (which I don't really need on the routers)

OK, so you're probably building with roughly the same options in terms
of scheduling granularity etc. That's good. Did you enable spectre
mitigations etc on the new kernel? What's the output of
`tail /sys/devices/system/cpu/vulnerabilities/*` ?
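
If it's easier to collect programmatically (say, to compare the old and
new kernel side by side), something like the sketch below gathers the
same information. It assumes every entry in that sysfs directory is a
small text file; the function name read_vulnerabilities is made up for
this example.

```python
from pathlib import Path


def read_vulnerabilities(base="/sys/devices/system/cpu/vulnerabilities"):
    """Return {file_name: contents} for each file in the sysfs directory."""
    result = {}
    base_path = Path(base)
    # Kernels without this directory simply yield an empty dict.
    if not base_path.is_dir():
        return result
    for entry in sorted(base_path.iterdir()):
        if entry.is_file():
            result[entry.name] = entry.read_text().strip()
    return result


def print_vulnerabilities(base="/sys/devices/system/cpu/vulnerabilities"):
    """Print one 'name: status' line per entry."""
    for name, status in read_vulnerabilities(base).items():
        print(f"{name}: {status}")
```

Running print_vulnerabilities() on both kernels and diffing the output
should show whether the mitigation settings differ between them.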

-Toke

