From: "Thomas Rosenstein"
To: "Jesper Dangaard Brouer"
Cc: Bloat, "Toke Høiland-Jørgensen"
Date: Fri, 06 Nov 2020 09:48:46 +0100
Subject: Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60
In-Reply-To: <20201105143317.78276bbc@carbon>
References: <87imalumps.fsf@toke.dk> <871rh8vf1p.fsf@toke.dk>
 <81ED2A33-D366-42FC-9344-985FEE8F11BA@creamfinance.com>
 <87sg9ot5f1.fsf@toke.dk> <20201105143317.78276bbc@carbon>
List-Id: General list for discussing Bufferbloat

On 5 Nov 2020, at 14:33, Jesper Dangaard Brouer wrote:

> On Thu, 05 Nov 2020 13:22:10 +0100
> Thomas Rosenstein via Bloat wrote:
>
>> On 5 Nov 2020, at 12:21, Toke Høiland-Jørgensen wrote:
>>
>>> "Thomas Rosenstein" writes:
>>>
>>>>> If so, this sounds more like a driver issue, or maybe something to
>>>>> do with scheduling. Does it only happen with ICMP?
>>>>> You could try this tool for a userspace UDP measurement:
>>>>
>>>> It happens with all packets, therefore the transfer to backblaze with
>>>> 40 threads goes down to ~8 MB/s instead of >60 MB/s.
>>>
>>> Huh, right, definitely sounds like a kernel bug; or maybe the new
>>> kernel is getting the hardware into a state where it bugs out when
>>> there are lots of flows or something.
>>>
>>> You could try looking at the ethtool stats (ethtool -S) while
>>> running the test and see if any error counters go up. Here's a
>>> handy script to monitor changes in the counters:
>>>
>>> https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
>>>
>>>> I'll try what that reports!
>>>>
>>>>> Also, what happens if you ping a host on the internet (*through*
>>>>> the router instead of *to* it)?
>>>>
>>>> Same issue, but twice as pronounced, as it seems all interfaces are
>>>> affected.
>>>> So, ping on one interface and the second has the issue.
>>>> Also all traffic across the host has the issue, but on both sides,
>>>> so ping to the internet increased by 2x.
>>>
>>> Right, so even an unloaded interface suffers? But this is the same
>>> NIC, right? So it could still be a hardware issue...
>>>
>>>> Yep, the default that CentOS ships. I just tested 4.12.5; there the
>>>> issue also does not happen. So I guess I can bisect it
>>>> then... (really don't want to 😃)
>>>
>>> Well, that at least narrows it down :)
>>
>> I just tested 5.9.4; it seems to also fix it partly. I have long
>> stretches where it looks good, and then some increases again. (3.10
>> stock has them too, but not as high, rather 1-3 ms.)
>>
>> for example:
>>
>> 64 bytes from x.x.x.x: icmp_seq=10 ttl=64 time=0.169 ms
>> 64 bytes from x.x.x.x: icmp_seq=11 ttl=64 time=5.53 ms
>> 64 bytes from x.x.x.x: icmp_seq=12 ttl=64 time=9.44 ms
>> 64 bytes from x.x.x.x: icmp_seq=13 ttl=64 time=0.167 ms
>> 64 bytes from x.x.x.x: icmp_seq=14 ttl=64 time=3.88 ms
>>
>> and then again:
>>
>> 64 bytes from x.x.x.x: icmp_seq=15 ttl=64 time=0.569 ms
>> 64 bytes from x.x.x.x: icmp_seq=16 ttl=64 time=0.148 ms
>> 64 bytes from x.x.x.x: icmp_seq=17 ttl=64 time=0.286 ms
>> 64 bytes from x.x.x.x: icmp_seq=18 ttl=64 time=0.257 ms
>> 64 bytes from x.x.x.x: icmp_seq=19 ttl=64 time=0.220 ms
>> 64 bytes from x.x.x.x: icmp_seq=20 ttl=64 time=0.125 ms
>> 64 bytes from x.x.x.x: icmp_seq=21 ttl=64 time=0.188 ms
>> 64 bytes from x.x.x.x: icmp_seq=22 ttl=64 time=0.202 ms
>> 64 bytes from x.x.x.x: icmp_seq=23 ttl=64 time=0.195 ms
>> 64 bytes from x.x.x.x: icmp_seq=24 ttl=64 time=0.177 ms
>> 64 bytes from x.x.x.x: icmp_seq=25 ttl=64 time=0.242 ms
>> 64 bytes from x.x.x.x: icmp_seq=26 ttl=64 time=0.339 ms
>> 64 bytes from x.x.x.x: icmp_seq=27 ttl=64 time=0.183 ms
>> 64 bytes from x.x.x.x: icmp_seq=28 ttl=64 time=0.221 ms
>> 64 bytes from x.x.x.x: icmp_seq=29 ttl=64 time=0.317 ms
>> 64 bytes from x.x.x.x: icmp_seq=30 ttl=64 time=0.210 ms
>> 64 bytes from x.x.x.x: icmp_seq=31 ttl=64 time=0.242 ms
>> 64 bytes from x.x.x.x: icmp_seq=32 ttl=64 time=0.127 ms
>> 64 bytes from x.x.x.x: icmp_seq=33 ttl=64 time=0.217 ms
>> 64 bytes from x.x.x.x: icmp_seq=34 ttl=64 time=0.184 ms
>>
>> For me it looks now like there was some fix between 5.4.60 and 5.9.4
>> ... can anyone pinpoint it?

So, new day, same issue!

I upgraded now all routers to 5.9.4, and the issue is back ...
here, when I stop it, it goes immediately down to 0.xx ms:

64 bytes from x.x.x.x: icmp_seq=48 ttl=64 time=1.67 ms
64 bytes from x.x.x.x: icmp_seq=49 ttl=64 time=12.6 ms
64 bytes from x.x.x.x: icmp_seq=50 ttl=64 time=13.8 ms
64 bytes from x.x.x.x: icmp_seq=51 ttl=64 time=5.59 ms
64 bytes from x.x.x.x: icmp_seq=52 ttl=64 time=5.86 ms
64 bytes from x.x.x.x: icmp_seq=53 ttl=64 time=9.26 ms
64 bytes from x.x.x.x: icmp_seq=54 ttl=64 time=8.28 ms
64 bytes from x.x.x.x: icmp_seq=55 ttl=64 time=12.4 ms
64 bytes from x.x.x.x: icmp_seq=56 ttl=64 time=0.551 ms
64 bytes from x.x.x.x: icmp_seq=57 ttl=64 time=4.37 ms
64 bytes from x.x.x.x: icmp_seq=58 ttl=64 time=12.1 ms
64 bytes from x.x.x.x: icmp_seq=59 ttl=64 time=5.93 ms
64 bytes from x.x.x.x: icmp_seq=60 ttl=64 time=6.58 ms
64 bytes from x.x.x.x: icmp_seq=61 ttl=64 time=9.19 ms
64 bytes from x.x.x.x: icmp_seq=62 ttl=64 time=0.124 ms
64 bytes from x.x.x.x: icmp_seq=63 ttl=64 time=7.08 ms
64 bytes from x.x.x.x: icmp_seq=64 ttl=64 time=9.69 ms
64 bytes from x.x.x.x: icmp_seq=65 ttl=64 time=7.52 ms
64 bytes from x.x.x.x: icmp_seq=66 ttl=64 time=14.9 ms
64 bytes from x.x.x.x: icmp_seq=67 ttl=64 time=12.6 ms
64 bytes from x.x.x.x: icmp_seq=68 ttl=64 time=2.34 ms
64 bytes from x.x.x.x: icmp_seq=69 ttl=64 time=8.97 ms
64 bytes from x.x.x.x: icmp_seq=70 ttl=64 time=0.203 ms
64 bytes from x.x.x.x: icmp_seq=71 ttl=64 time=9.10 ms
64 bytes from x.x.x.x: icmp_seq=72 ttl=64 time=3.16 ms
64 bytes from x.x.x.x: icmp_seq=73 ttl=64 time=1.88 ms
64 bytes from x.x.x.x: icmp_seq=74 ttl=64 time=11.5 ms
64 bytes from x.x.x.x: icmp_seq=75 ttl=64 time=11.9 ms
64 bytes from x.x.x.x: icmp_seq=76 ttl=64 time=3.64 ms
64 bytes from x.x.x.x: icmp_seq=77 ttl=64 time=5.82 ms
64 bytes from x.x.x.x: icmp_seq=78 ttl=64 time=0.121 ms
64 bytes from x.x.x.x: icmp_seq=79 ttl=64 time=0.212 ms
64 bytes from x.x.x.x: icmp_seq=80 ttl=64 time=0.158 ms
64 bytes from x.x.x.x: icmp_seq=81 ttl=64 time=0.174 ms
64 bytes from x.x.x.x: icmp_seq=82 ttl=64 time=0.160 ms
64 bytes from x.x.x.x: icmp_seq=83 ttl=64 time=0.210 ms
64 bytes from x.x.x.x: icmp_seq=84 ttl=64 time=0.211 ms
64 bytes from x.x.x.x: icmp_seq=85 ttl=64 time=0.182 ms
64 bytes from x.x.x.x: icmp_seq=86 ttl=64 time=0.161 ms
64 bytes from x.x.x.x: icmp_seq=87 ttl=64 time=0.167 ms
64 bytes from x.x.x.x: icmp_seq=88 ttl=64 time=0.143 ms
64 bytes from x.x.x.x: icmp_seq=89 ttl=64 time=0.186 ms
64 bytes from x.x.x.x: icmp_seq=90 ttl=64 time=0.148 ms
64 bytes from x.x.x.x: icmp_seq=91 ttl=64 time=0.187 ms
64 bytes from x.x.x.x: icmp_seq=92 ttl=64 time=0.174 ms
64 bytes from x.x.x.x: icmp_seq=93 ttl=64 time=0.167 ms

I have also tried the bpftrace; there were no high IRQs reported during
the test. Here's the latency output:

@latency_usecs[21]:
[0]                   17 |                                                    |
[1]                  463 |@@@                                                 |
[2, 4)              6814 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)              5401 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           |
[8, 16)              136 |@                                                   |
[16, 32)              42 |                                                    |
[32, 64)               2 |                                                    |
[64, 128)              1 |                                                    |

@latency_usecs[13]:
[0]                  108 |                                                    |
[1]                 2089 |@@@                                                 |
[2, 4)             32112 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)              7642 |@@@@@@@@@@@@                                        |
[8, 16)              462 |                                                    |
[16, 32)             159 |                                                    |
[32, 64)              32 |                                                    |
[64, 128)             34 |                                                    |
[128, 256)            10 |                                                    |
[256, 512)             1 |                                                    |

@latency_usecs[22]:
[0]                  244 |                                                    |
[1]                 3481 |@@@@@                                               |
[2, 4)             33795 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             13848 |@@@@@@@@@@@@@@@@@@@@@                               |
[8, 16)              607 |                                                    |
[16, 32)             214 |                                                    |
[32, 64)              58 |                                                    |
[64, 128)             42 |                                                    |
[128, 256)            12 |                                                    |

@latency_usecs[16]:
[0]                  539 |                                                    |
[1]                 7601 |@@@@@@                                              |
[2, 4)             62482 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             22456 |@@@@@@@@@@@@@@@@@@                                  |
[8, 16)             1252 |@                                                   |
[16, 32)             480 |                                                    |
[32, 64)             211 |                                                    |
[64, 128)            134 |                                                    |
[128, 256)            17 |                                                    |

@latency_usecs[2]:
[0]                  352 |                                                    |
[1]                18810 |@@@@@@@@@@@@@                                       |
[2, 4)             74915 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             11504 |@@@@@@@                                             |
[8, 16)             1077 |                                                    |
[16, 32)             479 |                                                    |
[32, 64)             329 |                                                    |
[64, 128)            359 |                                                    |
[128, 256)            79 |                                                    |
[256, 512)             2 |                                                    |

@latency_usecs[23]:
[0]                 1214 |                                                    |
[1]                 9483 |@@@@@@@                                             |
[2, 4)             64949 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             34220 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
[8, 16)             1940 |@                                                   |
[16, 32)             775 |                                                    |
[32, 64)             224 |                                                    |
[64, 128)            220 |                                                    |
[128, 256)            51 |                                                    |

@latency_usecs[9]:
[0]                  407 |                                                    |
[1]                 8040 |@@@@@                                               |
[2, 4)             80780 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             46867 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
[8, 16)             3232 |@@                                                  |
[16, 32)             614 |                                                    |
[32, 64)             337 |                                                    |
[64, 128)            231 |                                                    |
[128, 256)            33 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)              1 |                                                    |

@latency_usecs[4]:
[0]                  513 |                                                    |
[1]                10016 |@@@@@                                               |
[2, 4)            100149 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             32956 |@@@@@@@@@@@@@@@@@                                   |
[8, 16)             1436 |                                                    |
[16, 32)             731 |                                                    |
[32, 64)             285 |                                                    |
[64, 128)            302 |                                                    |
[128, 256)            64 |                                                    |
[256, 512)             1 |                                                    |

@latency_usecs[18]:
[0]                  679 |                                                    |
[1]                10962 |@@@@@@                                              |
[2, 4)             93837 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             36900 |@@@@@@@@@@@@@@@@@@@@                                |
[8, 16)             2050 |@                                                   |
[16, 32)            1283 |                                                    |
[32, 64)             390 |                                                    |
[64, 128)            419 |                                                    |
[128, 256)            91 |                                                    |
[256, 512)             1 |                                                    |

@latency_usecs[3]:
[0]                  792 |                                                    |
[1]                12109 |@@@@@@@                                             |
[2, 4)             83144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             43316 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
[8, 16)             4688 |@@                                                  |
[16, 32)            2087 |@                                                   |
[32, 64)             786 |                                                    |
[64, 128)            791 |                                                    |
[128, 256)           316 |                                                    |
[256, 512)            43 |                                                    |
[512, 1K)              5 |                                                    |
[1K, 2K)               1 |                                                    |

@latency_usecs[5]:
[0]                 1990 |@                                                   |
[1]                14018 |@@@@@@@@                                            |
[2, 4)             85797 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             56720 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  |
[8, 16)             2993 |@                                                   |
[16, 32)            1239 |                                                    |
[32, 64)             536 |                                                    |
[64, 128)            567 |                                                    |
[128, 256)           106 |                                                    |
[256, 512)             5 |                                                    |
[512, 1K)             11 |                                                    |
[1K, 2K)               4 |                                                    |

@latency_usecs[15]:
[0]                  719 |                                                    |
[1]                12086 |@@@@@@                                              |
[2, 4)             95664 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             57490 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
[8, 16)             3632 |@                                                   |
[16, 32)            1417 |                                                    |
[32, 64)             695 |                                                    |
[64, 128)            696 |                                                    |
[128, 256)           182 |                                                    |
[256, 512)             2 |                                                    |
[512, 1K)              7 |                                                    |
[1K, 2K)               6 |                                                    |

@latency_usecs[6]:
[0]                  585 |                                                    |
[1]                11880 |@@@@@@@                                             |
[2, 4)             87736 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             72264 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         |
[8, 16)             2554 |@                                                   |
[16, 32)            1403 |                                                    |
[32, 64)             813 |                                                    |
[64, 128)            624 |                                                    |
[128, 256)           168 |                                                    |
[256, 512)            23 |                                                    |
[512, 1K)              3 |                                                    |

@latency_usecs[11]:
[0]                  768 |                                                    |
[1]                19378 |@@@@@@@@@                                           |
[2, 4)            104935 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             81555 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            |
[8, 16)             3645 |@                                                   |
[16, 32)            1535 |                                                    |
[32, 64)             837 |                                                    |
[64, 128)            913 |                                                    |
[128, 256)           207 |                                                    |
[256, 512)             2 |                                                    |

@latency_usecs[1]:
[0]                 1511 |                                                    |
[1]                23327 |@@@@@@@@@@                                          |
[2, 4)            107938 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     |
[4, 8)            118722 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16)             5364 |@@                                                  |
[16, 32)            2551 |@                                                   |
[32, 64)             874 |                                                    |
[64, 128)            809 |                                                    |
[128, 256)           192 |                                                    |
[256, 512)             3 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               1 |                                                    |

@latency_usecs[17]:
[0]                 1268 |                                                    |
[1]                16486 |@@@@@                                               |
[2, 4)            146148 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             92483 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    |
[8, 16)            12484 |@@@@                                                |
[16, 32)            2330 |                                                    |
[32, 64)            1040 |                                                    |
[64, 128)           1054 |                                                    |
[128, 256)           234 |                                                    |
[256, 512)             6 |                                                    |
[512, 1K)              2 |                                                    |
[1K, 2K)               1 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               1 |                                                    |

@latency_usecs[14]:
[0]                 2685 |@                                                   |
[1]               119278 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4)            113518 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |
[4, 8)             33067 |@@@@@@@@@@@@@@                                      |
[8, 16)             9740 |@@@@                                                |
[16, 32)            3348 |@                                                   |
[32, 64)            2245 |                                                    |
[64, 128)           1515 |                                                    |
[128, 256)           466 |                                                    |
[256, 512)             8 |                                                    |

@latency_usecs[10]:
[0]                 2013 |                                                    |
[1]                28900 |@@@@@@@@                                            |
[2, 4)            175104 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             71562 |@@@@@@@@@@@@@@@@@@@@@                               |
[8, 16)             6094 |@                                                   |
[16, 32)            2884 |                                                    |
[32, 64)            1481 |                                                    |
[64, 128)           1450 |                                                    |
[128, 256)           311 |                                                    |
[256, 512)             6 |                                                    |
[512, 1K)              1 |                                                    |

@latency_usecs[7]:
[0]                 4790 |@                                                   |
[1]                94276 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
[2, 4)            186012 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             44289 |@@@@@@@@@@@@                                        |
[8, 16)             6822 |@                                                   |
[16, 32)            2842 |                                                    |
[32, 64)            2361 |                                                    |
[64, 128)           3178 |                                                    |
[128, 256)           910 |                                                    |
[256, 512)            19 |                                                    |
[512, 1K)              1 |                                                    |

@latency_usecs[12]:
[0]                 2360 |                                                    |
[1]                33684 |@@@@@@                                              |
[2, 4)            280464 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             51498 |@@@@@@@@@                                           |
[8, 16)            10636 |@                                                   |
[16, 32)            4374 |                                                    |
[32, 64)            2077 |                                                    |
[64, 128)           2414 |                                                    |
[128, 256)           596 |                                                    |
[256, 512)             8 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               1 |                                                    |

@latency_usecs[0]:
[0]                 3559 |                                                    |
[1]                41168 |@@@@@@@@                                            |
[2, 4)            251224 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)            126271 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
[8, 16)            15213 |@@@                                                 |
[16, 32)            6788 |@                                                   |
[32, 64)            4227 |                                                    |
[64, 128)           4231 |                                                    |
[128, 256)          1669 |                                                    |
[256, 512)           395 |                                                    |
[512, 1K)             54 |                                                    |
[1K, 2K)               1 |                                                    |

@latency_usecs[8]:
[0]                 5280 |                                                    |
[1]                43115 |@@@@@@@                                             |
[2, 4)            319098 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)             98723 |@@@@@@@@@@@@@@@@                                    |
[8, 16)            16048 |@@                                                  |
[16, 32)            9455 |@                                                   |
[32, 64)            3755 |                                                    |
[64, 128)           4650 |                                                    |
[128, 256)          1284 |                                                    |
[256, 512)            61 |                                                    |
[512, 1K)              2 |                                                    |
[1K, 2K)               1 |                                                    |
[2K, 4K)               1 |                                                    |
[4K, 8K)               2 |                                                    |

@latency_usecs[19]:
[0]                49889 |@@@@@@@@@@@@                                        |
[1]               199560 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4)            185771 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |
[4, 8)             59143 |@@@@@@@@@@@@@@@                                     |
[8, 16)            24402 |@@@@@@                                              |
[16, 32)           13158 |@@@                                                 |
[32, 64)           12889 |@@@                                                 |
[64, 128)          12269 |@@@                                                 |
[128, 256)          6322 |@                                                   |
[256, 512)           299 |                                                    |
[512, 1K)              8 |                                                    |
[1K, 2K)               1 |                                                    |

@latency_usecs[20]:
[0]                 6314 |                                                    |
[1]                51792 |@@@@@@@                                             |
[2, 4)            353847 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)            154730 |@@@@@@@@@@@@@@@@@@@@@@                              |
[8, 16)            23434 |@@@                                                 |
[16, 32)            8441 |@                                                   |
[32, 64)            3300 |                                                    |
[64, 128)           3058 |                                                    |
[128, 256)          1017 |                                                    |
[256, 512)           124 |                                                    |
[512, 1K)             12 |                                                    |
[1K, 2K)               1 |                                                    |

@Jesper: how should that be interpreted?
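For reference, my working assumption about what @latency_usecs shows, based
on the shape of the output: a per-CPU histogram, in microseconds, of how
long the NET_RX softirq waited between being raised and actually starting
to run. A minimal bpftrace sketch of that kind of measurement (the real
softirq_net_latency.bt may differ in detail):

#!/usr/bin/env bpftrace
/* Sketch: NET_RX softirq scheduling latency, per CPU.
 * args->vec == 3 is NET_RX_SOFTIRQ in the kernel's softirq enum. */

tracepoint:irq:softirq_raise
/args->vec == 3/
{
	/* remember when NET_RX was raised on this CPU */
	@raise[cpu] = nsecs;
}

tracepoint:irq:softirq_entry
/args->vec == 3 && @raise[cpu]/
{
	/* raise -> entry delay, binned in microseconds */
	@latency_usecs[cpu] = hist((nsecs - @raise[cpu]) / 1000);
	delete(@raise[cpu]);
}

END
{
	clear(@raise);
}

Read that way, the output above looks mostly healthy: nearly everything
sits in the 1-8 usec buckets, and only a handful of events per CPU reach
the [1K, 2K) to [8K, 16K) buckets, i.e. the 1-16 ms range that would match
the ping spikes.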
>
> I have some bpftrace tools to measure these kinds of latency spikes
> here:
> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/latency/
>
> The tool you want is: softirq_net_latency.bt
> [2] https://github.com/xdp-project/xdp-project/blob/master/areas/latency/softirq_net_latency.bt
>
> Example output, see [3]:
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1795049#c8
>
> Based on the kernel versions, I don't expect this to be the same latency
> issue as described in the bugzilla[3] case (as IIRC it was fixed in
> 4.19). It can still be a similar issue, where some userspace process is
> reading information from the kernel (/sys/fs/cgroup/memory/memory.stat
> in the BZ case) that blocks softirq from running, and results in these
> latency spikes.
>
> Install guide for bpftrace[4]:
> [4] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/INSTALL.org
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
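If it is the "userspace blocks softirq" pattern described above, a useful
companion measurement is how long the NET_RX handler itself runs once it
gets the CPU: the same sketch style, switched to entry -> exit runtime
(again an assumption of mine, not one of the linked tools):

#!/usr/bin/env bpftrace
/* Sketch: NET_RX softirq runtime (entry -> exit), per CPU. */

tracepoint:irq:softirq_entry
/args->vec == 3/
{
	@entry[cpu] = nsecs;
}

tracepoint:irq:softirq_exit
/args->vec == 3 && @entry[cpu]/
{
	@runtime_usecs[cpu] = hist((nsecs - @entry[cpu]) / 1000);
	delete(@entry[cpu]);
}

END
{
	clear(@entry);
}

Long waits in @latency_usecs combined with short runtimes here would point
at something delaying softirq scheduling (as in the BZ case); long runtimes
would instead point at the handler itself, e.g. the driver's poll loop.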