From: Toke Høiland-Jørgensen
To: Thomas Rosenstein
Cc: bloat@lists.bufferbloat.net
Date: Thu, 05 Nov 2020 13:38:56 +0100
Subject: Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

"Thomas Rosenstein" writes:

> On 5 Nov 2020, at 12:21, Toke Høiland-Jørgensen wrote:
>
>> "Thomas Rosenstein" writes:
>>
>>>> If so, this sounds more like a driver issue, or maybe something to
>>>> do with scheduling. Does it only happen with ICMP? You could try
>>>> this tool for a userspace UDP measurement:
>>>
>>> It happens with all packets, therefore the transfer to Backblaze with
>>> 40 threads goes down to ~8MB/s instead of >60MB/s.
>>
>> Huh, right, definitely sounds like a kernel bug; or maybe the new
>> kernel is getting the hardware into a state where it bugs out when
>> there are lots of flows or something.
>>
>> You could try looking at the ethtool stats (ethtool -S) while running
>> the test and see if any error counters go up. Here's a handy script to
>> monitor changes in the counters:
>>
>> https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
>>
>>> I'll try what that reports!
>>>
>>>> Also, what happens if you ping a host on the internet (*through* the
>>>> router instead of *to* it)?
>>>
>>> Same issue, but twice as pronounced, as it seems all interfaces are
>>> affected. So, ping on one interface and the second has the issue.
>>> Also, all traffic across the host has the issue, but on both sides,
>>> so ping to the internet increased by 2x.
>>
>> Right, so even an unloaded interface suffers? But this is the same
>> NIC, right? So it could still be a hardware issue...
>>
>>> Yep, the default that CentOS ships. I just tested 4.12.5; there the
>>> issue also does not happen. So I guess I can bisect it then...
>>> (really don't want to 😃)
>>
>> Well, that at least narrows it down :)
>
> I just tested 5.9.4; it seems to also fix it partly. I have long
> stretches where it looks good, and then some increases again. (3.10
> stock has them too, but not so high, rather 1-3 ms)
>
> for example:
>
> 64 bytes from x.x.x.x: icmp_seq=10 ttl=64 time=0.169 ms
> 64 bytes from x.x.x.x: icmp_seq=11 ttl=64 time=5.53 ms
> 64 bytes from x.x.x.x: icmp_seq=12 ttl=64 time=9.44 ms
> 64 bytes from x.x.x.x: icmp_seq=13 ttl=64 time=0.167 ms
> 64 bytes from x.x.x.x: icmp_seq=14 ttl=64 time=3.88 ms
>
> and then again:
>
> 64 bytes from x.x.x.x: icmp_seq=15 ttl=64 time=0.569 ms
> 64 bytes from x.x.x.x: icmp_seq=16 ttl=64 time=0.148 ms
> 64 bytes from x.x.x.x: icmp_seq=17 ttl=64 time=0.286 ms
> 64 bytes from x.x.x.x: icmp_seq=18 ttl=64 time=0.257 ms
> 64 bytes from x.x.x.x: icmp_seq=19 ttl=64 time=0.220 ms
> 64 bytes from x.x.x.x: icmp_seq=20 ttl=64 time=0.125 ms
> 64 bytes from x.x.x.x: icmp_seq=21 ttl=64 time=0.188 ms
> 64 bytes from x.x.x.x: icmp_seq=22 ttl=64 time=0.202 ms
> 64 bytes from x.x.x.x: icmp_seq=23 ttl=64 time=0.195 ms
> 64 bytes from x.x.x.x: icmp_seq=24 ttl=64 time=0.177 ms
> 64 bytes from x.x.x.x: icmp_seq=25 ttl=64 time=0.242 ms
> 64 bytes from x.x.x.x: icmp_seq=26 ttl=64 time=0.339 ms
> 64 bytes from x.x.x.x: icmp_seq=27 ttl=64 time=0.183 ms
> 64 bytes from x.x.x.x: icmp_seq=28 ttl=64 time=0.221 ms
> 64 bytes from x.x.x.x: icmp_seq=29 ttl=64 time=0.317 ms
> 64 bytes from x.x.x.x: icmp_seq=30 ttl=64 time=0.210 ms
> 64 bytes from x.x.x.x: icmp_seq=31 ttl=64 time=0.242 ms
> 64 bytes from x.x.x.x: icmp_seq=32 ttl=64 time=0.127 ms
> 64 bytes from x.x.x.x: icmp_seq=33 ttl=64 time=0.217 ms
> 64 bytes from x.x.x.x: icmp_seq=34 ttl=64 time=0.184 ms
>
> For me it looks now like there was some fix between 5.4.60 and 5.9.4
> ... can anyone pinpoint it?

$ git log --no-merges --oneline v5.4.60..v5.9.4|wc -l
72932

Only 73k commits; should be easy, right? :)

(In other words no, I have no idea; I'd suggest either (a) asking on
netdev, (b) bisecting, or (c) using 5.9+ and just making peace with not
knowing.)
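
If you do decide to try pinpointing it yourself, a rough sketch of a
"reverse" bisect over that range could look something like this
(assuming a linux-stable checkout that has both tags, and some
reproducible test for the latency spikes; the exact build and test
steps are yours to fill in):

$ git bisect start --term-old=broken --term-new=fixed
$ git bisect broken v5.4.60
$ git bisect fixed v5.9.4
# for each commit git picks: build it, boot it, run your test...
$ make olddefconfig && make -j"$(nproc)"
$ make modules_install install    # as root
# ...then mark the result so git can pick the next commit:
$ git bisect fixed     # if the latency spikes are gone on this kernel
$ git bisect broken    # if they are still there

That's roughly log2(73k) ≈ 17 build-and-boot cycles if nothing needs
skipping, which is part of why (a) or (c) may be the saner options.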
>>>> How did you configure the new kernel? Did you start from scratch,
>>>> or is it based on the old CentOS config?
>>>
>>> First oldconfig, and from there added additional options for IB,
>>> NVMe, etc. (which I don't really need on the routers)
>>
>> OK, so you're probably building with roughly the same options in terms
>> of scheduling granularity etc. That's good. Did you enable spectre
>> mitigations etc. on the new kernel? What's the output of
>> `tail /sys/devices/system/cpu/vulnerabilities/*` ?
>
> mitigations are off

Right, I just figured maybe you were hitting some threshold that
involved a lot of indirect calls which slowed things down due to
mitigations. Guess not, then...

-Toke
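
For reference, the mitigation status on the two kernels can be
double-checked with nothing more than the standard sysfs/procfs paths,
e.g.:

$ grep . /sys/devices/system/cpu/vulnerabilities/*   # per-vulnerability status
$ grep -o 'mitigations=[^ ]*' /proc/cmdline          # shows a mitigations= boot parameter, if any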