From: "Thomas Rosenstein"
To: bloat@lists.bufferbloat.net
Date: Wed, 04 Nov 2020 16:23:12 +0100
Subject: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

Hi all,

I'm coming from the lartc mailing list; here's the original text:

=====

I have multiple routers which connect to multiple upstream providers. I have noticed a high latency shift in ICMP (and generally all connections) if I run `b2 upload-file --threads 40` (and I can reproduce this).

What options do I have to analyze why this happens?

General info:

- The routers are connected to each other with 10G Mellanox Connect-X cards via 10G SFP+ DAC cables through a 10G switch from fs.com.
- Latency is generally around 0.18 ms between all routers (4 in total).
- Throughput is 9.4 Gbit/s with 0 retransmissions when tested with iperf3.
- 2 of the 4 routers are connected upstream with a 1G connection (separate port, same network card).
- All routers have the full internet routing tables, i.e.
80k entries for IPv6 and 830k entries for IPv4.
- Conntrack is disabled (-j NOTRACK).
- Kernel 5.4.60 (custom)
- 2x Xeon X5670 @ 2.93 GHz, 96 GB RAM, no swap
- CentOS 7

During high latency:

- Latency on the routers carrying the traffic flow increases to 12 - 20 ms, on all interfaces; moving the stream (via a BGP session disable) moves the high latency along with it.
- iperf3 performance plummets to 300 - 400 Mbit/s.
- CPU load (user / system) is around 0.1%.
- RAM usage is around 3 - 4 GB.
- The if_packets count is stable (around 8000 pkt/s more).

With b2 upload-file and 10 threads I can achieve 60 MB/s consistently; with 40 threads the performance drops to 8 MB/s. I do not believe that 40 TCP streams should be any problem for a machine of that size.

Thanks for any ideas, help, pointers, or additional things I can verify / check / provide!

=======

So far I have tested:

1) Using the stock kernel 3.10.0-541 -> the issue does not happen.
2) Setting up fq_codel on the interfaces. Here is the tc -s qdisc output:

qdisc fq_codel 8005: dev eth4 root refcnt 193 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 8374229144 bytes 10936167 pkt (dropped 0, overlimits 0 requeues 6127)
 backlog 0b 0p requeues 6127
  maxpacket 25398 drop_overlimit 0 new_flow_count 15441 ecn_mark 0
  new_flows_len 0 old_flows_len 0

qdisc fq_codel 8008: dev eth5 root refcnt 193 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 1072480080 bytes 1012973 pkt (dropped 0, overlimits 0 requeues 735)
 backlog 0b 0p requeues 735
  maxpacket 19682 drop_overlimit 0 new_flow_count 15963 ecn_mark 0
  new_flows_len 0 old_flows_len 0

qdisc fq_codel 8004: dev eth4.2300 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 8441021899 bytes 11021070 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 68130 drop_overlimit 0 new_flow_count 257055 ecn_mark 0
  new_flows_len 0 old_flows_len 0

qdisc fq_codel 8006: dev eth5.2501 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 571984459 bytes 2148377 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 7570 drop_overlimit 0 new_flow_count 11300 ecn_mark 0
  new_flows_len 0 old_flows_len 0

qdisc fq_codel 8007: dev eth5.2502 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 1401322222 bytes 1966724 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 19682 drop_overlimit 0 new_flow_count 76653 ecn_mark 0
  new_flows_len 0 old_flows_len 0

I have no statistics / metrics that would point to a slowdown on the server; CPU / load / network / packets / memory all show normal, very low load.

Are there other (hidden) metrics I can collect to analyze this issue further?

Thanks
Thomas
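PS: for reference, the fq_codel setup in 2) was along these lines (interface names as in the output above; defaults only, no tuning):

```shell
# Attach fq_codel as the root qdisc with its defaults
# (limit 10240p, flows 1024, target 5 ms, interval 100 ms, ECN on).
for dev in eth4 eth5 eth4.2300 eth5.2501 eth5.2502; do
    tc qdisc replace dev "$dev" root fq_codel
done
```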
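PPS: one thing I still plan to check is /proc/net/softnet_stat for per-CPU backlog drops and time squeezes, since those don't show up in the usual interface counters. A small helper sketch (column meanings assumed from the kernel's softnet_stat layout: column 2 = dropped, column 3 = time_squeeze, both hex):

```shell
#!/bin/bash
# parse_softnet: read /proc/net/softnet_stat-style lines from stdin and
# print the per-CPU "dropped" and "time_squeeze" counters (hex columns 2, 3).
parse_softnet() {
    local cpu=0 processed dropped squeezed rest
    while read -r processed dropped squeezed rest; do
        printf 'cpu%d dropped=%d squeezed=%d\n' "$cpu" \
            "$((16#$dropped))" "$((16#$squeezed))"
        cpu=$((cpu + 1))
    done
}

# Typical use on the routers (sample before and during the b2 run, then diff):
#   parse_softnet < /proc/net/softnet_stat
```

A rising "squeezed" counter during the b2 run would point at the softirq budget running out rather than the NIC itself.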