From: Toke Høiland-Jørgensen
To: Thomas Rosenstein
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60
Date: Thu, 05 Nov 2020 12:21:54 +0100

"Thomas Rosenstein" writes:

>> If so, this sounds more like a driver issue, or maybe something to do
>> with scheduling. Does it only happen with ICMP? You could try this
>> tool for a userspace UDP measurement:
>
> It happens with all packets, therefore the transfer to backblaze with 40
> threads goes down to ~8MB/s instead of >60MB/s

Huh, right, definitely sounds like a kernel bug; or maybe the new kernel
is getting the hardware into a state where it bugs out when there are
lots of flows or something.

You could try looking at the ethtool stats (ethtool -S) while running
the test and see if any error counters go up. Here's a handy script to
monitor changes in the counters:

https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl

> I'll try what that reports!
>
>> Also, what happens if you ping a host on the internet (*through* the
>> router instead of *to* it)?
>
> Same issue, but twice pronounced, as it seems all interfaces are
> affected.
> So, ping on one interface and the second has the issue.
> Also all traffic across the host has the issue, but on both sides, so
> ping to the internet increased by 2x

Right, so even an unloaded interface suffers? But this is the same NIC,
right? So it could still be a hardware issue...

> Yep default that CentOS ships, I just tested 4.12.5 there the issue also
> does not happen. So I guess I can bisect it then...(really don't want to
> 😃)

Well that at least narrows it down :)

>>
>> How did you configure the new kernel? Did you start from scratch, or
>> is it based on the old centos config?
>
> first oldconfig and from there then added additional options for IB,
> NVMe, etc (which I don't really need on the routers)

OK, so you're probably building with roughly the same options in terms
of scheduling granularity etc. That's good. Did you enable spectre
mitigations etc on the new kernel? What's the output of
`tail /sys/devices/system/cpu/vulnerabilities/*` ?

-Toke
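
For readers who want to see the counter-watching idea in code, here is a
minimal sketch in Python. It is not the ethtool_stats.pl script linked
above, only an illustration of the same approach: take two snapshots of
`ethtool -S` output and print the counters that changed. The function
names (`parse_ethtool_stats`, `changed_counters`, `monitor`) and the
interface name `eth0` are hypothetical, and the parser assumes the usual
"name: value" line layout of `ethtool -S`.

```python
import subprocess
import time


def parse_ethtool_stats(text):
    """Parse `ethtool -S` style output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon so counter names containing ':' survive.
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            # Header line ("NIC statistics:") or a non-numeric value.
            continue
    return stats


def changed_counters(before, after):
    """Return {name: delta} for counters that differ between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


def read_stats(iface):
    """Run `ethtool -S <iface>` and return the parsed counters."""
    result = subprocess.run(
        ["ethtool", "-S", iface], capture_output=True, text=True, check=True
    )
    return parse_ethtool_stats(result.stdout)


def monitor(iface, interval=1.0):
    """Print the counters that changed during each interval, until interrupted."""
    prev = read_stats(iface)
    while True:
        time.sleep(interval)
        cur = read_stats(iface)
        for name, delta in sorted(changed_counters(prev, cur).items()):
            print(f"{name}: {delta:+d}")
        print("---")
        prev = cur
```

Calling `monitor("eth0")` while the transfer test runs would show which
counters move each second; anything with "err", "drop" or "miss" in the
name that keeps increasing is worth a closer look. Exact counter names
are driver-specific.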