From: "Thomas Rosenstein"
To: "Toke Høiland-Jørgensen"
Cc: bloat@lists.bufferbloat.net
Date: Thu, 05 Nov 2020 09:48:33 +0100
Message-ID: <81ED2A33-D366-42FC-9344-985FEE8F11BA@creamfinance.com>
In-Reply-To: <871rh8vf1p.fsf@toke.dk>
References: <87imalumps.fsf@toke.dk> <871rh8vf1p.fsf@toke.dk>
Subject: Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

On 5 Nov 2020, at 1:10, Toke Høiland-Jørgensen wrote:

> "Thomas Rosenstein" writes:
>
>> On 4 Nov 2020, at 17:10, Toke Høiland-Jørgensen wrote:
>>
>>> Thomas Rosenstein via Bloat writes:
>>>
>>>> Hi all,
>>>>
>>>> I'm coming from the lartc mailing list, here's the original text:
>>>>
>>>> =====
>>>>
>>>> I have multiple routers which connect to multiple upstream providers.
>>>> I have noticed a high latency shift in ICMP (and generally all
>>>> connections) if I run b2 upload-file --threads 40 (and I can
>>>> reproduce this).
>>>>
>>>> What options do I have to analyze why this happens?
>>>>
>>>> General Info:
>>>>
>>>> Routers are connected between each other with 10G Mellanox Connect-X
>>>> cards via 10G SFP+ DAC cables via a 10G Switch from fs.com
>>>> Latency generally is around 0.18 ms between all routers (4).
>>>> Throughput is 9.4 Gbit/s with 0 retransmissions when tested with
>>>> iperf3.
>>>> 2 of the 4 routers are connected upstream with a 1G connection
>>>> (separate port, same network card)
>>>> All routers have the full internet routing tables, i.e. 80k entries
>>>> for IPv6 and 830k entries for IPv4
>>>> Conntrack is disabled (-j NOTRACK)
>>>> Kernel 5.4.60 (custom)
>>>> 2x Xeon X5670 @ 2.93 GHz
>>>> 96 GB RAM
>>>> No Swap
>>>> CentOS 7
>>>>
>>>> During high latency:
>>>>
>>>> Latency on routers which have the traffic flow increases to 12 - 20
>>>> ms, for all interfaces; moving the stream (via bgp disable session)
>>>> also moves the high latency
>>>> iperf3 performance plummets to 300 - 400 MBits
>>>> CPU load (user / system) is around 0.1%
>>>> RAM usage is around 3 - 4 GB
>>>> if_packets count is stable (around 8000 pkt/s more)
>>>
>>> I'm not sure I get your topology. Packets are going from where to
>>> where, and what link is the bottleneck for the transfer you're doing?
>>> Are you measuring the latency along the same path?
>>>
>>> Have you tried running 'mtr' to figure out which hop the latency is
>>> at?
>>
>> I tried to draw the topology, I hope this is okay and explains better
>> what's happening:
>>
>> https://drive.google.com/file/d/15oAsxiNfsbjB9a855Q_dh6YvFZBDdY5I/view?usp=sharing
>
> Ohh, right, you're pinging between two of the routers across a 10 Gbps
> link with plenty of capacity to spare, and *that* goes up by two orders
> of magnitude when you start the transfer, even though the transfer
> itself is <1Gbps? Am I understanding you correctly now?

Exactly :)

>
> If so, this sounds more like a driver issue, or maybe something to do
> with scheduling. Does it only happen with ICMP? You could try this tool
> for a userspace UDP measurement:

It happens with all packets, therefore the transfer to backblaze with 40
threads goes down to ~8MB/s instead of >60MB/s

>
> https://github.com/heistp/irtt/
>

I'll try what that reports!
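
For context, a userspace UDP round-trip measurement of the kind suggested above can be sketched roughly as follows. This is only an illustration of the idea, not irtt and not its protocol; the names `echo_server` and `measure_rtt`, the 4-byte sequence payload and the loopback address are all assumptions made for the example. In a real test the echo side would run on the far router.

```python
import socket
import threading
import time


def echo_server(sock: socket.socket) -> None:
    """Echo every datagram back to its sender (hypothetical helper)."""
    while True:
        try:
            data, addr = sock.recvfrom(2048)
        except OSError:
            return  # socket was closed
        sock.sendto(data, addr)


def measure_rtt(addr, count=10, timeout=1.0):
    """Send `count` UDP probes to an echo service at `addr`.

    Returns a list of round-trip times in milliseconds; a lost probe
    is recorded as None.
    """
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for seq in range(count):
            payload = seq.to_bytes(4, "big")  # sequence number as payload
            start = time.perf_counter()
            s.sendto(payload, addr)
            try:
                while True:
                    data, _ = s.recvfrom(2048)
                    if data == payload:  # skip late replies to older probes
                        break
                rtts.append((time.perf_counter() - start) * 1000.0)
            except socket.timeout:
                rtts.append(None)
    return rtts


if __name__ == "__main__":
    # Loopback demo only: start the echo side locally on an ephemeral port.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    threading.Thread(target=echo_server, args=(server,), daemon=True).start()
    for i, rtt in enumerate(measure_rtt(server.getsockname(), count=5)):
        print(i, "lost" if rtt is None else f"{rtt:.3f} ms")
```

Because both ends run in userspace over UDP, comparing these numbers with ICMP ping during the upload would hint at whether the extra delay is specific to ICMP handling or hits all forwarded traffic.
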
> Also, what happens if you ping a host on the internet (*through* the
> router instead of *to* it)?

Same issue, but twice as pronounced, as it seems all interfaces are
affected. So, ping on one interface and the second has the issue.
Also all traffic across the host has the issue, but on both sides, so
ping to the internet increased by 2x

>
> And which version of the Connect-X cards are you using (or rather, which
> driver? mlx4?)
>

It's Connect-X 4 Lx cards, specifically: MCX4121A-ACAT
Driver is mlx5_core

>> So it must be something in the kernel tacking on a delay, I could try to
>> do a bisect and build like 10 kernels :)
>
> That may ultimately end up being necessary. However, when you say 'stock
> kernel' you mean what CentOS ships, right? If so, that's not really a
> 3.10 kernel - the RHEL kernels (that centos is based on) are... somewhat
> creative... about their versioning. So if you're switched to a vanilla
> upstream kernel you may find bisecting difficult :/

Yep, the default that CentOS ships. I just tested 4.12.5; there the issue
also does not happen. So I guess I can bisect it then... (really don't
want to 😃)

>
> How did you configure the new kernel? Did you start from scratch, or is
> it based on the old centos config?

First oldconfig, and from there I added additional options for IB,
NVMe, etc. (which I don't really need on the routers)

>
> -Toke
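
As a rough illustration of why bisecting between a good 4.12.x and a bad 5.4.x kernel needs only a handful of builds at release granularity, here is a minimal binary-search sketch. The list of mainline release tags is real, but `first_bad`, `is_bad` and especially `HYPOTHETICAL_FIRST_BAD` are made up for the example: where the regression actually appeared is not known from this thread. In practice `is_bad` would mean "build and boot that kernel, rerun the upload, and check the ping times".

```python
from typing import Callable, Sequence, Tuple


def first_bad(versions: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary-search for the first bad version.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    every version after the first bad one is bad as well.
    Returns (first bad version, number of versions that had to be tested).
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi], tests


# Mainline release tags from the known-good to the known-bad series.
RELEASES = ["v4.12", "v4.13", "v4.14", "v4.15", "v4.16", "v4.17", "v4.18",
            "v4.19", "v4.20", "v5.0", "v5.1", "v5.2", "v5.3", "v5.4"]

# ASSUMPTION for the demo only: pretend the regression arrived in v4.19.
HYPOTHETICAL_FIRST_BAD = "v4.19"


def is_bad(tag: str) -> bool:
    """Stand-in for 'build, boot and measure latency under load'."""
    return RELEASES.index(tag) >= RELEASES.index(HYPOTHETICAL_FIRST_BAD)


if __name__ == "__main__":
    version, builds = first_bad(RELEASES, is_bad)
    print(f"first bad release: {version} after {builds} test builds")
```

With 14 releases in the range, at most 4 test builds narrow the problem down to one release; a commit-level `git bisect` inside that release window then adds roughly log2 of the number of commits in it.
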