From: "Thomas Rosenstein"
To: "Jesper Dangaard Brouer"
Cc: "Thomas Rosenstein via Bloat"
Date: Mon, 09 Nov 2020 15:33:46 +0100
Subject: Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60
Message-ID: <27110D8E-77DF-4D10-A5EA-6430DBD55BC7@creamfinance.com>
In-Reply-To: <20201109124030.71216677@carbon>
List-Id: General list for discussing Bufferbloat

On 9 Nov 2020, at 12:40, Jesper Dangaard Brouer wrote:

> On Mon, 09 Nov 2020 11:09:33 +0100
> "Thomas Rosenstein" wrote:
>
>> On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote:
>>
>>> On Sat, 07 Nov 2020 14:00:04 +0100
>>> Thomas Rosenstein via Bloat wrote:
>>>
>>>> Here's an extract from the ethtool https://pastebin.com/cabpWGFz
>>>> just in case there's something hidden.
>>>
>>> Yes, there is something hiding in the data from ethtool_stats.pl[1]:
>>> (10G Mellanox Connect-X cards via 10G SPF+ DAC)
>>>
>>> stat: 1 ( 1) <= outbound_pci_stalled_wr_events /sec
>>> stat: 339731557 (339,731,557) <= rx_buffer_passed_thres_phy /sec
>>>
>>> I've not seen this counter 'rx_buffer_passed_thres_phy' before, looking
>>> in the kernel driver code it is related to "rx_buffer_almost_full".
>>> The numbers per second is excessive (but it be related to a driver bug
>>> as it ends up reading "high" -> rx_buffer_almost_full_high in the
>>> extended counters).

I have now tested with a new kernel 5.9.4 build, configured from the 3.10 config with make oldconfig, and I noticed an interesting effect.

For roughly the first 2 minutes the router behaves completely normally, as with 3.10; after that the ping times go crazy.

I have recorded this with ethtool, and also the ping times.

Ethtool: (13 MB)
https://drive.google.com/file/d/1Ojp64UUw0zKwrgF_CisZb3BCdidAJYZo/view?usp=sharing

The transfer was initially doing around 50-70 MB/s; once the ping times got worse it dropped to ~12 MB/s.
At around line 74324 of the ethtool log the transfer speed drops to 12 MB/s.

Seems you are right about rx_buffer_passed_thres_phy: if you check just those lines, they appear more often once the speed has dropped.
Not sure if that's the cause or an effect of the underlying problem!

Pings:
https://drive.google.com/file/d/16phOxM5IFU6RAl4Ua4pRqMNuLYBc4RK7/view?usp=sharing

Pause frames were activated again after the restart.

(Here is a link for reference on the ethtool variables: https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)
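
For anyone who wants to check the "appear more often" observation against the ethtool log, here is a minimal Python sketch. It assumes the log consists of ethtool_stats.pl lines in the `stat: <value> (<value>) <= <counter> /sec` form quoted above, and that a counter's line only shows up in a sample when it is non-zero. The function name `occurrence_rate` is made up for illustration, and the real log may need a different pattern.

```python
import re
from typing import Iterable, Tuple

# Assumed line format, taken from the quoted ethtool_stats.pl output:
#   stat: 339731557 (339,731,557) <= rx_buffer_passed_thres_phy /sec
STAT_RE = re.compile(r"stat:\s+(\d+)\s+\(\s*[\d,]+\)\s+<=\s+(\S+)\s+/sec")


def occurrence_rate(lines: Iterable[str], counter: str,
                    split_line: int) -> Tuple[float, float]:
    """Fraction of log lines that report `counter`, before and from `split_line`.

    Line numbers are 1-based; `split_line` itself counts as "after".
    """
    hits = [0, 0]    # [before, after] lines that report the counter
    totals = [0, 0]  # [before, after] total lines seen
    for lineno, line in enumerate(lines, start=1):
        side = 0 if lineno < split_line else 1
        totals[side] += 1
        match = STAT_RE.search(line)
        if match and match.group(2) == counter:
            hits[side] += 1
    before = hits[0] / totals[0] if totals[0] else 0.0
    after = hits[1] / totals[1] if totals[1] else 0.0
    return before, after
```

Called as `occurrence_rate(open("ethtool.log"), "rx_buffer_passed_thres_phy", 74324)` (the file name is a placeholder for the downloaded log), it returns the share of lines mentioning the counter before and after the point where the transfer speed dropped. A higher second value would match the observation, but it says nothing about cause versus effect.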