[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

Thomas Rosenstein thomas.rosenstein at creamfinance.com
Fri Nov 6 04:18:10 EST 2020



On 5 Nov 2020, at 14:33, Jesper Dangaard Brouer wrote:

> On Thu, 05 Nov 2020 13:22:10 +0100
> Thomas Rosenstein via Bloat <bloat at lists.bufferbloat.net> wrote:
>
>> On 5 Nov 2020, at 12:21, Toke Høiland-Jørgensen wrote:
>>
>>> "Thomas Rosenstein" <thomas.rosenstein at creamfinance.com> writes:
>>>
>>>>> If so, this sounds more like a driver issue, or maybe something to
>>>>> do with scheduling. Does it only happen with ICMP? You could try 
>>>>> this
>>>>> tool for a userspace UDP measurement:
>>>>
>>>> It happens with all packets, therefore the transfer to Backblaze
>>>> with 40 threads goes down to ~8 MB/s instead of >60 MB/s.
>>>
>>> Huh, right, definitely sounds like a kernel bug; or maybe the new
>>> kernel is getting the hardware into a state where it bugs out when
>>> there are lots of flows or something.
>>>
>>> You could try looking at the ethtool stats (ethtool -S) while
>>> running the test and see if any error counters go up. Here's a
>>> handy script to monitor changes in the counters:
>>>
>>> https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
>>>
>>>> I'll try what that reports!
>>>>
>>>>> Also, what happens if you ping a host on the internet (*through*
>>>>> the router instead of *to* it)?
>>>>
>>>> Same issue, but twice as pronounced, as it seems all interfaces
>>>> are affected.
>>>> So, pinging on one interface, the second one shows the issue too.
>>>> All traffic across the host also has the issue, but on both sides,
>>>> so ping times to the internet increased by 2x.
>>>
>>> Right, so even an unloaded interface suffers? But this is the same
>>> NIC, right? So it could still be a hardware issue...
>>>
>>>> Yep, the default that CentOS ships. I just tested 4.12.5; the
>>>> issue does not happen there either. So I guess I can bisect it
>>>> then... (really don't want to 😃)
>>>
>>> Well that at least narrows it down :)
>>
>> I just tested 5.9.4; it also seems to fix it partly. I have long
>> stretches where it looks good, and then some increases again. (Stock
>> 3.10 has them too, but not as high, rather 1-3 ms.)
>>
>> for example:
>>
>> 64 bytes from x.x.x.x: icmp_seq=10 ttl=64 time=0.169 ms
>> 64 bytes from x.x.x.x: icmp_seq=11 ttl=64 time=5.53 ms
>> 64 bytes from x.x.x.x: icmp_seq=12 ttl=64 time=9.44 ms
>> 64 bytes from x.x.x.x: icmp_seq=13 ttl=64 time=0.167 ms
>> 64 bytes from x.x.x.x: icmp_seq=14 ttl=64 time=3.88 ms
>>
>> and then again:
>>
>> 64 bytes from x.x.x.x: icmp_seq=15 ttl=64 time=0.569 ms
>> 64 bytes from x.x.x.x: icmp_seq=16 ttl=64 time=0.148 ms
>> 64 bytes from x.x.x.x: icmp_seq=17 ttl=64 time=0.286 ms
>> 64 bytes from x.x.x.x: icmp_seq=18 ttl=64 time=0.257 ms
>> 64 bytes from x.x.x.x: icmp_seq=19 ttl=64 time=0.220 ms
>> 64 bytes from x.x.x.x: icmp_seq=20 ttl=64 time=0.125 ms
>> 64 bytes from x.x.x.x: icmp_seq=21 ttl=64 time=0.188 ms
>> 64 bytes from x.x.x.x: icmp_seq=22 ttl=64 time=0.202 ms
>> 64 bytes from x.x.x.x: icmp_seq=23 ttl=64 time=0.195 ms
>> 64 bytes from x.x.x.x: icmp_seq=24 ttl=64 time=0.177 ms
>> 64 bytes from x.x.x.x: icmp_seq=25 ttl=64 time=0.242 ms
>> 64 bytes from x.x.x.x: icmp_seq=26 ttl=64 time=0.339 ms
>> 64 bytes from x.x.x.x: icmp_seq=27 ttl=64 time=0.183 ms
>> 64 bytes from x.x.x.x: icmp_seq=28 ttl=64 time=0.221 ms
>> 64 bytes from x.x.x.x: icmp_seq=29 ttl=64 time=0.317 ms
>> 64 bytes from x.x.x.x: icmp_seq=30 ttl=64 time=0.210 ms
>> 64 bytes from x.x.x.x: icmp_seq=31 ttl=64 time=0.242 ms
>> 64 bytes from x.x.x.x: icmp_seq=32 ttl=64 time=0.127 ms
>> 64 bytes from x.x.x.x: icmp_seq=33 ttl=64 time=0.217 ms
>> 64 bytes from x.x.x.x: icmp_seq=34 ttl=64 time=0.184 ms
>>
>>
>> To me it now looks like there was some fix between 5.4.60 and 5.9.4
>> ... can anyone pinpoint it?

I have now retried with 3.10; here are the results:

I'm pinging between the hosts while the traffic is flowing; the upload 
is running at 55 MB/s (compared to ~20 MB/s on 5.9.4 and 8 MB/s on 
5.4.60):

64 bytes from x.x.x.x: icmp_seq=159 ttl=64 time=0.050 ms
64 bytes from x.x.x.x: icmp_seq=160 ttl=64 time=0.056 ms
64 bytes from x.x.x.x: icmp_seq=161 ttl=64 time=0.061 ms
64 bytes from x.x.x.x: icmp_seq=162 ttl=64 time=0.072 ms
64 bytes from x.x.x.x: icmp_seq=163 ttl=64 time=0.052 ms
64 bytes from x.x.x.x: icmp_seq=164 ttl=64 time=0.053 ms
64 bytes from x.x.x.x: icmp_seq=165 ttl=64 time=0.068 ms
64 bytes from x.x.x.x: icmp_seq=166 ttl=64 time=0.050 ms
64 bytes from x.x.x.x: icmp_seq=167 ttl=64 time=0.057 ms
64 bytes from x.x.x.x: icmp_seq=168 ttl=64 time=0.051 ms
64 bytes from x.x.x.x: icmp_seq=169 ttl=64 time=0.045 ms
64 bytes from x.x.x.x: icmp_seq=170 ttl=64 time=0.138 ms
64 bytes from x.x.x.x: icmp_seq=171 ttl=64 time=0.052 ms
64 bytes from x.x.x.x: icmp_seq=172 ttl=64 time=0.049 ms
64 bytes from x.x.x.x: icmp_seq=173 ttl=64 time=0.094 ms
64 bytes from x.x.x.x: icmp_seq=174 ttl=64 time=0.050 ms
64 bytes from x.x.x.x: icmp_seq=175 ttl=64 time=0.810 ms
64 bytes from x.x.x.x: icmp_seq=176 ttl=64 time=0.077 ms
64 bytes from x.x.x.x: icmp_seq=177 ttl=64 time=0.055 ms
64 bytes from x.x.x.x: icmp_seq=178 ttl=64 time=0.049 ms
64 bytes from x.x.x.x: icmp_seq=179 ttl=64 time=0.050 ms
64 bytes from x.x.x.x: icmp_seq=180 ttl=64 time=0.073 ms
64 bytes from x.x.x.x: icmp_seq=181 ttl=64 time=0.065 ms
64 bytes from x.x.x.x: icmp_seq=182 ttl=64 time=0.123 ms
64 bytes from x.x.x.x: icmp_seq=183 ttl=64 time=0.045 ms
64 bytes from x.x.x.x: icmp_seq=184 ttl=64 time=0.054 ms
64 bytes from x.x.x.x: icmp_seq=185 ttl=64 time=0.063 ms
64 bytes from x.x.x.x: icmp_seq=186 ttl=64 time=0.044 ms
64 bytes from x.x.x.x: icmp_seq=187 ttl=64 time=0.075 ms
64 bytes from x.x.x.x: icmp_seq=188 ttl=64 time=0.076 ms
64 bytes from x.x.x.x: icmp_seq=189 ttl=64 time=0.065 ms
64 bytes from x.x.x.x: icmp_seq=190 ttl=64 time=0.049 ms

The response times under traffic load are around 3 times better than on 
the newer kernels without any traffic load.
I have a feeling that maybe not all config options were carried over 
correctly to the newer kernel.
Or there's a big bug somewhere ... (though it would seem rather odd for 
me to be the first one to discover it)

I'll rebuild the 5.9 kernel based on the config from one of the 3.10 
kernels and see if it makes a difference ...
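
As a quick way to check the "missing config options" theory before the 
rebuild, something like this should show what actually differs between 
the two build configs (the /boot paths below are just my guess at where 
the configs live on this box; scripts/diffconfig ships with the kernel 
source tree):

    # Compare the config the 3.10 kernel was built with against the 5.9 one.
    # Replace the paths with the actual config files of the two builds.
    cd linux-5.9
    scripts/diffconfig /boot/config-3.10.0-*.x86_64 /boot/config-5.9.4

    # Plain diff also works if the source tree isn't at hand:
    diff <(sort /boot/config-3.10.0-*.x86_64) <(sort /boot/config-5.9.4)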

>
> I have some bpftrace tools to measure these kinds of latency spikes 
> here:
>  [1] 
> https://github.com/xdp-project/xdp-project/blob/master/areas/latency/
>
> The tool you want is: softirq_net_latency.bt
>  [2] 
> https://github.com/xdp-project/xdp-project/blob/master/areas/latency/softirq_net_latency.bt
>
> Example output see[3]:
>  [3] https://bugzilla.redhat.com/show_bug.cgi?id=1795049#c8
>
> Based on the kernel versions, I don't expect this to be the same
> latency issue as described in the bugzilla[3] case (as IIRC it was
> fixed in 4.19).  It can still be a similar issue, where some userspace
> process is reading information from the kernel
> (/sys/fs/cgroup/memory/memory.stat in the BZ case) that blocks softirq
> from running and results in these latency spikes.
>
>
>
> Install guide to bpftrace[4]:
>  [4] 
> https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/INSTALL.org
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
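
For reference, if I understand the tool right, running it during one of 
the loaded ping tests should just be the following (assuming bpftrace is 
installed as per [4]; the exact output is whatever the script prints):

    # Run the softirq latency tracer from [2] while the upload and ping
    # test are active; stop it with Ctrl-C to get the summary histogram.
    sudo bpftrace ./softirq_net_latency.bt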

