Date: Fri, 6 Nov 2020 11:53:18 +0100
From: Jesper Dangaard Brouer
To: "Thomas Rosenstein"
Cc: Bloat, "Toke Høiland-Jørgensen", brouer@redhat.com
Subject: Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

On Fri, 06 Nov 2020 09:48:46 +0100
"Thomas Rosenstein" wrote:

> On 5 Nov 2020, at 14:33, Jesper Dangaard Brouer wrote:
>
> > On Thu, 05 Nov 2020 13:22:10 +0100
> > Thomas Rosenstein via Bloat wrote:
> >
> >> On 5 Nov 2020, at 12:21, Toke Høiland-Jørgensen wrote:
> >>
> >>> "Thomas Rosenstein" writes:
> >>>
> >>>>> If so, this sounds more like a driver issue, or maybe something to
> >>>>> do with scheduling. Does it only happen with ICMP? You could try
> >>>>> this tool for a userspace UDP measurement:
> >>>>
> >>>> It happens with all packets, therefore the transfer to backblaze
> >>>> with 40 threads goes down to ~8 MB/s instead of >60 MB/s.
> >>>
> >>> Huh, right, definitely sounds like a kernel bug; or maybe the new
> >>> kernel is getting the hardware into a state where it bugs out when
> >>> there are lots of flows or something.
> >>>
> >>> You could try looking at the ethtool stats (ethtool -S) while
> >>> running the test and see if any error counters go up. Here's a
> >>> handy script to monitor changes in the counters:
> >>>
> >>> https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
> >>>
> >>>> I'll try what that reports!
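[Illustrative aside: a minimal shell sketch of the same counter-delta idea
suggested above. It is not the linked ethtool_stats.pl, just a
quick-and-dirty loop; "eth0" is a placeholder interface name.]

  IFACE=eth0                       # placeholder: use the router's NIC
  while true; do
      ethtool -S "$IFACE" > /tmp/ethtool.new
      if [ -f /tmp/ethtool.old ]; then
          # show only counters whose values changed in the last second
          diff /tmp/ethtool.old /tmp/ethtool.new | grep '^>'
          echo "----"
      fi
      mv /tmp/ethtool.new /tmp/ethtool.old
      sleep 1
  done

If error or drop counters keep climbing during the slow periods, that
would support the driver/hardware theory discussed above.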
> >>>>> Also, what happens if you ping a host on the internet (*through*
> >>>>> the router instead of *to* it)?
> >>>>
> >>>> Same issue, but twice as pronounced, as it seems all interfaces are
> >>>> affected.
> >>>> So, ping on one interface and the second has the issue.
> >>>> Also all traffic across the host has the issue, but on both sides,
> >>>> so ping to the internet increased by 2x.
> >>>
> >>> Right, so even an unloaded interface suffers? But this is the same
> >>> NIC, right? So it could still be a hardware issue...
> >>>
> >>>> Yep, the default that CentOS ships. I just tested 4.12.5; there the
> >>>> issue also does not happen. So I guess I can bisect it
> >>>> then... (really don't want to 😃)
> >>>
> >>> Well that at least narrows it down :)
> >>
> >> I just tested 5.9.4; it seems to also fix it partly. I have long
> >> stretches where it looks good, and then some increases again. (Stock
> >> 3.10 has them too, but not as high, rather 1-3 ms.)
> >>
> >> for example:
> >>
> >> 64 bytes from x.x.x.x: icmp_seq=10 ttl=64 time=0.169 ms
> >> 64 bytes from x.x.x.x: icmp_seq=11 ttl=64 time=5.53 ms
> >> 64 bytes from x.x.x.x: icmp_seq=12 ttl=64 time=9.44 ms
> >> 64 bytes from x.x.x.x: icmp_seq=13 ttl=64 time=0.167 ms
> >> 64 bytes from x.x.x.x: icmp_seq=14 ttl=64 time=3.88 ms
> >>
> >> and then again:
> >>
> >> 64 bytes from x.x.x.x: icmp_seq=15 ttl=64 time=0.569 ms
> >> 64 bytes from x.x.x.x: icmp_seq=16 ttl=64 time=0.148 ms
> >> 64 bytes from x.x.x.x: icmp_seq=17 ttl=64 time=0.286 ms
> >> 64 bytes from x.x.x.x: icmp_seq=18 ttl=64 time=0.257 ms
> >> 64 bytes from x.x.x.x: icmp_seq=19 ttl=64 time=0.220 ms
> >> 64 bytes from x.x.x.x: icmp_seq=20 ttl=64 time=0.125 ms
> >> 64 bytes from x.x.x.x: icmp_seq=21 ttl=64 time=0.188 ms
> >> 64 bytes from x.x.x.x: icmp_seq=22 ttl=64 time=0.202 ms
> >> 64 bytes from x.x.x.x: icmp_seq=23 ttl=64 time=0.195 ms
> >> 64 bytes from x.x.x.x: icmp_seq=24 ttl=64 time=0.177 ms
> >> 64 bytes from x.x.x.x: icmp_seq=25 ttl=64 time=0.242 ms
> >> 64 bytes from x.x.x.x: icmp_seq=26 ttl=64 time=0.339 ms
> >> 64 bytes from x.x.x.x: icmp_seq=27 ttl=64 time=0.183 ms
> >> 64 bytes from x.x.x.x: icmp_seq=28 ttl=64 time=0.221 ms
> >> 64 bytes from x.x.x.x: icmp_seq=29 ttl=64 time=0.317 ms
> >> 64 bytes from x.x.x.x: icmp_seq=30 ttl=64 time=0.210 ms
> >> 64 bytes from x.x.x.x: icmp_seq=31 ttl=64 time=0.242 ms
> >> 64 bytes from x.x.x.x: icmp_seq=32 ttl=64 time=0.127 ms
> >> 64 bytes from x.x.x.x: icmp_seq=33 ttl=64 time=0.217 ms
> >> 64 bytes from x.x.x.x: icmp_seq=34 ttl=64 time=0.184 ms
> >>
> >> To me it now looks like there was some fix between 5.4.60 and 5.9.4
> >> ... can anyone pinpoint it?
>
> So, new day, same issue!
>
> I have now upgraded all routers to 5.9.4, and the issue is back ...
>
> Here, when I stop it, it goes immediately down to 0.xx ms:
>
> 64 bytes from x.x.x.x: icmp_seq=48 ttl=64 time=1.67 ms
> 64 bytes from x.x.x.x: icmp_seq=49 ttl=64 time=12.6 ms
> 64 bytes from x.x.x.x: icmp_seq=50 ttl=64 time=13.8 ms
> 64 bytes from x.x.x.x: icmp_seq=51 ttl=64 time=5.59 ms
> 64 bytes from x.x.x.x: icmp_seq=52 ttl=64 time=5.86 ms
> 64 bytes from x.x.x.x: icmp_seq=53 ttl=64 time=9.26 ms
> 64 bytes from x.x.x.x: icmp_seq=54 ttl=64 time=8.28 ms
> 64 bytes from x.x.x.x: icmp_seq=55 ttl=64 time=12.4 ms
> 64 bytes from x.x.x.x: icmp_seq=56 ttl=64 time=0.551 ms
> 64 bytes from x.x.x.x: icmp_seq=57 ttl=64 time=4.37 ms
> 64 bytes from x.x.x.x: icmp_seq=58 ttl=64 time=12.1 ms
> 64 bytes from x.x.x.x: icmp_seq=59 ttl=64 time=5.93 ms
> 64 bytes from x.x.x.x: icmp_seq=60 ttl=64 time=6.58 ms
> 64 bytes from x.x.x.x: icmp_seq=61 ttl=64 time=9.19 ms
> 64 bytes from x.x.x.x: icmp_seq=62 ttl=64 time=0.124 ms
> 64 bytes from x.x.x.x: icmp_seq=63 ttl=64 time=7.08 ms
> 64 bytes from x.x.x.x: icmp_seq=64 ttl=64 time=9.69 ms
> 64 bytes from x.x.x.x: icmp_seq=65 ttl=64 time=7.52 ms
> 64 bytes from x.x.x.x: icmp_seq=66 ttl=64 time=14.9 ms
> 64 bytes from x.x.x.x: icmp_seq=67 ttl=64 time=12.6 ms
> 64 bytes from x.x.x.x: icmp_seq=68 ttl=64 time=2.34 ms
> 64 bytes from x.x.x.x: icmp_seq=69 ttl=64 time=8.97 ms
> 64 bytes from x.x.x.x: icmp_seq=70 ttl=64 time=0.203 ms
> 64 bytes from x.x.x.x: icmp_seq=71 ttl=64 time=9.10 ms
> 64 bytes from x.x.x.x: icmp_seq=72 ttl=64 time=3.16 ms
> 64 bytes from x.x.x.x: icmp_seq=73 ttl=64 time=1.88 ms
> 64 bytes from x.x.x.x: icmp_seq=74 ttl=64 time=11.5 ms
> 64 bytes from x.x.x.x: icmp_seq=75 ttl=64 time=11.9 ms
> 64 bytes from x.x.x.x: icmp_seq=76 ttl=64 time=3.64 ms
> 64 bytes from x.x.x.x: icmp_seq=77 ttl=64 time=5.82 ms
> 64 bytes from x.x.x.x: icmp_seq=78 ttl=64 time=0.121 ms
> 64 bytes from x.x.x.x: icmp_seq=79 ttl=64 time=0.212 ms
> 64 bytes from x.x.x.x: icmp_seq=80 ttl=64 time=0.158 ms
>
> I have also tried the bpftrace tool; there was no high IRQ latency
> reported during the test. Here is the latency output:

So, you didn't see any reports of "High IRQ-to-softirq latency"?

[...cut...]

> @Jesper: how should that be interpreted?

The tool is looking for outliers; by default, above 2000 us (2 ms) it
will print a statement about finding "High IRQ-to-softirq latency".
In the output below there are 3 cases that should have triggered such
a printout...

> @latency_usecs[9]:
> [0]               407 |                                                  |
> [1]              8040 |@@@@@                                             |
> [2, 4)          80780 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [4, 8)          46867 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
> [8, 16)          3232 |@@                                                |
> [16, 32)          614 |                                                  |
> [32, 64)          337 |                                                  |
> [64, 128)         231 |                                                  |
> [128, 256)         33 |                                                  |
> [256, 512)          0 |                                                  |
> [512, 1K)           0 |                                                  |
> [1K, 2K)            0 |                                                  |
> [2K, 4K)            0 |                                                  |
> [4K, 8K)            0 |                                                  |
> [8K, 16K)           1 |                                                  |

The majority of the IRQ-to-softirq latency is within 2-8 usecs
(0.002-0.008 ms). The interesting part is the outlier: a single event
between 8 ms and <16 ms.

[... cut ...]

> @latency_usecs[17]:
> [0]              1268 |                                                  |
> [1]             16486 |@@@@@                                             |
> [2, 4)         146148 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [4, 8)          92483 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                   |
> [8, 16)         12484 |@@@@                                              |
> [16, 32)         2330 |                                                  |
> [32, 64)         1040 |                                                  |
> [64, 128)        1054 |                                                  |
> [128, 256)        234 |                                                  |
> [256, 512)          6 |                                                  |
> [512, 1K)           2 |                                                  |
> [1K, 2K)            1 |                                                  |
> [2K, 4K)            0 |                                                  |
> [4K, 8K)            1 |                                                  |

[...cut...]

Again one outlier: a single event between 4 ms and <8 ms.

> @latency_usecs[12]:
> [0]              2360 |                                                  |
> [1]             33684 |@@@@@@                                            |
> [2, 4)         280464 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [4, 8)          51498 |@@@@@@@@@                                         |
> [8, 16)         10636 |@                                                 |
> [16, 32)         4374 |                                                  |
> [32, 64)         2077 |                                                  |
> [64, 128)        2414 |                                                  |
> [128, 256)        596 |                                                  |
> [256, 512)          8 |                                                  |
> [512, 1K)           0 |                                                  |
> [1K, 2K)            0 |                                                  |
> [2K, 4K)            0 |                                                  |
> [4K, 8K)            1 |                                                  |

[...cut...]

The tool measures hardirq-to-softirq delays, which can grow if the
softirq is blocked from running, meaning the kernel was busy doing
other stuff (especially on non-preempt kernels).

Even though there were some outliers above, your ping shows many more
high-latency samples, although each count at this level can cover
multiple packets. Thus, the issue might lie in this specific step.

I will think of other ways to catch these latency outliers. We can
likely find inspiration in the GitHub SRE blogpost on how they
troubleshot their latency issue (which my tool is related to).
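As a rough reference, here is a minimal sketch of this kind of
measurement as a bpftrace one-liner. It is not the actual
softirq_net_latency.bt; it simply times NET_RX softirq raise-to-entry
per CPU (vec == 3 is NET_RX_SOFTIRQ), and the 2000 us threshold only
mirrors the default described above:

  bpftrace -e '
  tracepoint:irq:softirq_raise /args->vec == 3/ {
      // timestamp when the NET_RX softirq is raised (hardirq context)
      @raise[cpu] = nsecs;
  }
  tracepoint:irq:softirq_entry /args->vec == 3 && @raise[cpu]/ {
      // latency until the softirq actually starts executing
      $usecs = (nsecs - @raise[cpu]) / 1000;
      @latency_usecs[cpu] = hist($usecs);
      if ($usecs > 2000) {
          printf("High IRQ-to-softirq latency: %d us (cpu %d)\n", $usecs, cpu);
      }
      delete(@raise[cpu]);
  }'

If a sketch like this (or the real script) prints the high-latency line
while the ping spikes are happening, softirq starvation on that CPU is
the likely culprit; if not, the extra delay presumably comes from
somewhere else in the path.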
> > I have some bpftrace tools to measure these kinds of latency spikes
> > here:
> > [1] https://github.com/xdp-project/xdp-project/blob/master/areas/latency/
> >
> > The tool you want is: softirq_net_latency.bt
> > [2] https://github.com/xdp-project/xdp-project/blob/master/areas/latency/softirq_net_latency.bt
> >
> > Example output, see [3]:
> > [3] https://bugzilla.redhat.com/show_bug.cgi?id=1795049#c8
> >
> > Based on the kernel versions, I don't expect this to be the same
> > latency issue as described in the bugzilla[3] case (as IIRC it was
> > fixed in 4.19). It can still be a similar issue, where some userspace
> > process is reading information from the kernel
> > (/sys/fs/cgroup/memory/memory.stat in the BZ case) that blocks softirq
> > from running, resulting in these latency spikes.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer