[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

Jesper Dangaard Brouer brouer at redhat.com
Fri Nov 6 05:53:18 EST 2020


On Fri, 06 Nov 2020 09:48:46 +0100
"Thomas Rosenstein" <thomas.rosenstein at creamfinance.com> wrote:

> On 5 Nov 2020, at 14:33, Jesper Dangaard Brouer wrote:
> 
> > On Thu, 05 Nov 2020 13:22:10 +0100
> > Thomas Rosenstein via Bloat <bloat at lists.bufferbloat.net> wrote:
> >  
> >> On 5 Nov 2020, at 12:21, Toke Høiland-Jørgensen wrote:
> >>  
> >>> "Thomas Rosenstein" <thomas.rosenstein at creamfinance.com> writes:
> >>>  
> >>>>> If so, this sounds more like a driver issue, or maybe something to
> >>>>> do with scheduling. Does it only happen with ICMP? You could try 
> >>>>> this
> >>>>> tool for a userspace UDP measurement:  
> >>>>
> >>>> It happens with all packets, therefore the transfer to backblaze 
> >>>> with
> >>>> 40 threads goes down to ~8MB/s instead of >60MB/s  
> >>>
> >>> Huh, right, definitely sounds like a kernel bug; or maybe the new
> >>> kernel is getting the hardware into a state where it bugs out when
> >>> there are lots of flows or something.
> >>>
> >>> You could try looking at the ethtool stats (ethtool -S) while
> >>> running the test and see if any error counters go up. Here's a
> >>> handy script to monitor changes in the counters:
> >>>
> >>> https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
> >>>  
> >>>> I'll try what that reports!
> >>>>  
> >>>>> Also, what happens if you ping a host on the internet (*through*
> >>>>> the router instead of *to* it)?  
> >>>>
> >>>> Same issue, but twice pronounced, as it seems all interfaces are
> >>>> affected.
> >>>> So, ping on one interface and the second has the issue.
> >>>> Also all traffic across the host has the issue, but on both sides,
> >>>> so ping to the internet increased by 2x  
> >>>
> >>> Right, so even an unloaded interface suffers? But this is the same
> >>> NIC, right? So it could still be a hardware issue...
> >>>  
> >>>> Yep default that CentOS ships, I just tested 4.12.5 there the
> >>>> issue also does not happen. So I guess I can bisect it
> >>>> then...(really don't want to 😃)  
> >>>
> >>> Well that at least narrows it down :)  
> >>
> >> I just tested 5.9.4 seems to also fix it partly, I have long
> >> stretches where it looks good, and then some increases again. (3.10
> >> Stock has them too, but not so high, rather 1-3 ms)
> >>
> >> for example:
> >>
> >> 64 bytes from x.x.x.x: icmp_seq=10 ttl=64 time=0.169 ms
> >> 64 bytes from x.x.x.x: icmp_seq=11 ttl=64 time=5.53 ms
> >> 64 bytes from x.x.x.x: icmp_seq=12 ttl=64 time=9.44 ms
> >> 64 bytes from x.x.x.x: icmp_seq=13 ttl=64 time=0.167 ms
> >> 64 bytes from x.x.x.x: icmp_seq=14 ttl=64 time=3.88 ms
> >>
> >> and then again:
> >>
> >> 64 bytes from x.x.x.x: icmp_seq=15 ttl=64 time=0.569 ms
> >> 64 bytes from x.x.x.x: icmp_seq=16 ttl=64 time=0.148 ms
> >> 64 bytes from x.x.x.x: icmp_seq=17 ttl=64 time=0.286 ms
> >> 64 bytes from x.x.x.x: icmp_seq=18 ttl=64 time=0.257 ms
> >> 64 bytes from x.x.x.x: icmp_seq=19 ttl=64 time=0.220 ms
> >> 64 bytes from x.x.x.x: icmp_seq=20 ttl=64 time=0.125 ms
> >> 64 bytes from x.x.x.x: icmp_seq=21 ttl=64 time=0.188 ms
> >> 64 bytes from x.x.x.x: icmp_seq=22 ttl=64 time=0.202 ms
> >> 64 bytes from x.x.x.x: icmp_seq=23 ttl=64 time=0.195 ms
> >> 64 bytes from x.x.x.x: icmp_seq=24 ttl=64 time=0.177 ms
> >> 64 bytes from x.x.x.x: icmp_seq=25 ttl=64 time=0.242 ms
> >> 64 bytes from x.x.x.x: icmp_seq=26 ttl=64 time=0.339 ms
> >> 64 bytes from x.x.x.x: icmp_seq=27 ttl=64 time=0.183 ms
> >> 64 bytes from x.x.x.x: icmp_seq=28 ttl=64 time=0.221 ms
> >> 64 bytes from x.x.x.x: icmp_seq=29 ttl=64 time=0.317 ms
> >> 64 bytes from x.x.x.x: icmp_seq=30 ttl=64 time=0.210 ms
> >> 64 bytes from x.x.x.x: icmp_seq=31 ttl=64 time=0.242 ms
> >> 64 bytes from x.x.x.x: icmp_seq=32 ttl=64 time=0.127 ms
> >> 64 bytes from x.x.x.x: icmp_seq=33 ttl=64 time=0.217 ms
> >> 64 bytes from x.x.x.x: icmp_seq=34 ttl=64 time=0.184 ms
> >>
> >>
> >> For me it looks now that there was some fix between 5.4.60 and 5.9.4
> >> ... anyone can pinpoint it?  
> 
> So, new day, same issue!
> 
> I have now upgraded all routers to 5.9.4, and the issue is back ...
> 
> here, when I stop it, it goes immediately down to 0.xx ms
> 
> 64 bytes from x.x.x.x: icmp_seq=48 ttl=64 time=1.67 ms
> 64 bytes from x.x.x.x: icmp_seq=49 ttl=64 time=12.6 ms
> 64 bytes from x.x.x.x: icmp_seq=50 ttl=64 time=13.8 ms
> 64 bytes from x.x.x.x: icmp_seq=51 ttl=64 time=5.59 ms
> 64 bytes from x.x.x.x: icmp_seq=52 ttl=64 time=5.86 ms
> 64 bytes from x.x.x.x: icmp_seq=53 ttl=64 time=9.26 ms
> 64 bytes from x.x.x.x: icmp_seq=54 ttl=64 time=8.28 ms
> 64 bytes from x.x.x.x: icmp_seq=55 ttl=64 time=12.4 ms
> 64 bytes from x.x.x.x: icmp_seq=56 ttl=64 time=0.551 ms
> 64 bytes from x.x.x.x: icmp_seq=57 ttl=64 time=4.37 ms
> 64 bytes from x.x.x.x: icmp_seq=58 ttl=64 time=12.1 ms
> 64 bytes from x.x.x.x: icmp_seq=59 ttl=64 time=5.93 ms
> 64 bytes from x.x.x.x: icmp_seq=60 ttl=64 time=6.58 ms
> 64 bytes from x.x.x.x: icmp_seq=61 ttl=64 time=9.19 ms
> 64 bytes from x.x.x.x: icmp_seq=62 ttl=64 time=0.124 ms
> 64 bytes from x.x.x.x: icmp_seq=63 ttl=64 time=7.08 ms
> 64 bytes from x.x.x.x: icmp_seq=64 ttl=64 time=9.69 ms
> 64 bytes from x.x.x.x: icmp_seq=65 ttl=64 time=7.52 ms
> 64 bytes from x.x.x.x: icmp_seq=66 ttl=64 time=14.9 ms
> 64 bytes from x.x.x.x: icmp_seq=67 ttl=64 time=12.6 ms
> 64 bytes from x.x.x.x: icmp_seq=68 ttl=64 time=2.34 ms
> 64 bytes from x.x.x.x: icmp_seq=69 ttl=64 time=8.97 ms
> 64 bytes from x.x.x.x: icmp_seq=70 ttl=64 time=0.203 ms
> 64 bytes from x.x.x.x: icmp_seq=71 ttl=64 time=9.10 ms
> 64 bytes from x.x.x.x: icmp_seq=72 ttl=64 time=3.16 ms
> 64 bytes from x.x.x.x: icmp_seq=73 ttl=64 time=1.88 ms
> 64 bytes from x.x.x.x: icmp_seq=74 ttl=64 time=11.5 ms
> 64 bytes from x.x.x.x: icmp_seq=75 ttl=64 time=11.9 ms
> 64 bytes from x.x.x.x: icmp_seq=76 ttl=64 time=3.64 ms
> 64 bytes from x.x.x.x: icmp_seq=77 ttl=64 time=5.82 ms
> 64 bytes from x.x.x.x: icmp_seq=78 ttl=64 time=0.121 ms
> 64 bytes from x.x.x.x: icmp_seq=79 ttl=64 time=0.212 ms
> 64 bytes from x.x.x.x: icmp_seq=80 ttl=64 time=0.158 ms
> 
> 
> I have also tried the bpftrace tool; there were no high-IRQ reports
> during the test. Here's the latency output:

So, you didn't see any reports of "High IRQ-to-softirq latency" ?

> 
[...cut...]

> @Jesper: how should that be interpreted?

The tool is looking for outliers: by default, anything above 2000 us (2 ms)
makes it print a statement about finding "High IRQ-to-softirq latency".
In the output below there are 3 cases that should have triggered such a
print-out line...


> @latency_usecs[9]:
> [0]                  407 |                                                  |
> [1]                 8040 |@@@@@                                             |
> [2, 4)             80780 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [4, 8)             46867 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    |
> [8, 16)             3232 |@@                                                |
> [16, 32)             614 |                                                  |
> [32, 64)             337 |                                                  |
> [64, 128)            231 |                                                  |
> [128, 256)            33 |                                                  |
> [256, 512)             0 |                                                  |
> [512, 1K)              0 |                                                  |
> [1K, 2K)               0 |                                                  |
> [2K, 4K)               0 |                                                  |
> [4K, 8K)               0 |                                                  |
> [8K, 16K)              1 |                                                  |
> 

The majority of the IRQ-to-softirq latency is within 2-8 usecs (0.002-0.008 ms).

The interesting part is the outlier: one event between 8 ms and <16 ms
(the [8K, 16K) bucket).

[... cut ...]

> @latency_usecs[17]:
> [0]                 1268 |                                                  |
> [1]                16486 |@@@@@                                             |
> [2, 4)            146148 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [4, 8)             92483 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  |
> [8, 16)            12484 |@@@@                                              |
> [16, 32)            2330 |                                                  |
> [32, 64)            1040 |                                                  |
> [64, 128)           1054 |                                                  |
> [128, 256)           234 |                                                  |
> [256, 512)             6 |                                                  |
> [512, 1K)              2 |                                                  |
> [1K, 2K)               1 |                                                  |
> [2K, 4K)               0 |                                                  |
> [4K, 8K)               1 |                                                  |

[...cut...]

The outlier here: one event between 4 ms and <8 ms (the [4K, 8K) bucket).

> 
> @latency_usecs[12]:
> [0]                 2360 |                                                  |
> [1]                33684 |@@@@@@                                            |
> [2, 4)            280464 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [4, 8)             51498 |@@@@@@@@@                                         |
> [8, 16)            10636 |@                                                 |
> [16, 32)            4374 |                                                  |
> [32, 64)            2077 |                                                  |
> [64, 128)           2414 |                                                  |
> [128, 256)           596 |                                                  |
> [256, 512)             8 |                                                  |
> [512, 1K)              0 |                                                  |
> [1K, 2K)               0 |                                                  |
> [2K, 4K)               0 |                                                  |
> [4K, 8K)               1 |                                                  |
> 
[...cut...]

The tool measures hardirq-to-softirq delays, which grow when softirq is
blocked from running, meaning the kernel was busy doing other work
(especially on non-preempt kernels).

Even though there were only a few outliers above, your ping shows a lot
more high-latency events; but each count at this level can cover multiple
packets, so the issue might still lie in this specific step.

I will think of other ways to catch these latency outliers.

We can likely find inspiration in the GitHub SRE blog post on how they
troubleshot their latency issue (which my tool is related to).
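
One complementary measurement (just a sketch of an idea, not a finished
tool): also time how long each NET_RX softirq invocation itself runs
(entry -> exit).  If NAPI polling / net_rx_action occasionally runs very
long, packets queue up even when the raise->entry latency above looks fine:

tracepoint:irq:softirq_entry /args->vec == 3/ { @start[cpu] = nsecs; }

tracepoint:irq:softirq_exit  /args->vec == 3 && @start[cpu]/
{
    @runtime_usecs[cpu] = hist((nsecs - @start[cpu]) / 1000);
    delete(@start[cpu]);
}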

> >
> > I have some bpftrace tools to measure these kind of latency spikes 
> > here:
> >  [1] 
> > https://github.com/xdp-project/xdp-project/blob/master/areas/latency/
> >
> > The tool you want is: softirq_net_latency.bt
> >  [2] 
> > https://github.com/xdp-project/xdp-project/blob/master/areas/latency/softirq_net_latency.bt
> >
> > Example output see[3]:
> >  [3] https://bugzilla.redhat.com/show_bug.cgi?id=1795049#c8
> >
> > Based on the kernel versions, I don't expect this to be same latency
> > issue as described in the bugzilla[3] case (as IIRC it was fixed in
> > 4.19).  It can still be similar issue, where some userspace process is
> > reading information from the kernel (/sys/fs/cgroup/memory/memory.stat
> > in BZ case) that blocks softirq from running, and result in these
> > latency spikes.
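
If it does turn out that something in userspace is blocking softirq, as in
that BZ/GitHub case, a generic way to see what a problem CPU is busy doing
is to sample its kernel stacks (a rough sketch; cpu 9 is just an example,
assuming the map key in the histograms above is the CPU id):

// Sample what CPU 9 executes, 99 times per second.
profile:hz:99 /cpu == 9/ { @stacks[comm, kstack] = count(); }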

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


