From: "Thomas Rosenstein"
To: "Jesper Dangaard Brouer"
Cc: Bloat
Date: Sat, 07 Nov 2020 13:37:01 +0100
In-Reply-To: <20201106211940.4c30ccc9@carbon>
Subject: Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

On 6 Nov 2020, at 21:19, Jesper Dangaard Brouer wrote:

> On Fri, 06 Nov 2020 18:04:49 +0100
> "Thomas Rosenstein" wrote:
>
>> On 6 Nov 2020, at 15:13, Jesper Dangaard Brouer wrote:
>>
>>
>> I'm using ping on IPv4, but I'll try to see if IPv6 makes any
>> difference!
>
> I think you misunderstand me. I'm not asking you to use ping6. The
> gobgpd daemon updates will both update IPv4 and IPv6 routes, right.
> Updating IPv6 routes are more problematic than IPv4 routes. The IPv6
> route tables update can potentially stall softirq from running, which
> was the latency tool was measuring... and it did show some outliers.

Yes, I did. I assumed the latency would be introduced in the traffic path
by the lock.
Nonetheless, I tested it and no difference :)

>
>
>>> Have you tried to use 'perf record' to observe that is happening on
>>> the system while these latency incidents happen? (let me know if you
>>> want some cmdline hints)
>>
>> Haven't tried this yet. If you have some hints what events to monitor
>> I'll take them!
>
> Okay to record everything (-a) on the system and save call-graph (-g),
> and run for 5 seconds (via profiling the sleep function).
>
>  # perf record -g -a sleep 5
>
> To view the result the simply use the 'perf report', but likely you
> want to use option --no-children as you are profiling the kernel (and
> not a userspace program you want to have grouped 'children' by). I
> also include the CPU column via '--sort cpu,comm,dso,symbol' and you
> can select/zoom-in-on a specific CPU via '-C zero-indexed-cpu-num'.
>
>  # perf report --sort cpu,comm,dso,symbol --no-children
>
> When we ask you to provide the output, you can use the --stdio option,
> and provide txt-info via a pastebin link as it is very long.

Here is the output from kernel 3.10_1127 (I updated to the very latest
in that branch): https://pastebin.com/5mxirXPw

Here is the output from kernel 5.9.4: https://pastebin.com/KDZ2Ei2F

I have noticed that the delays are directly related to the traffic
flows, see below.

These tests are WITHOUT gobgpd running, so there are no updates to the
route table, but the route tables are fully populated.
Also, it's ONLY outgoing traffic; the return packets are coming in on
another router.

I then cleared the routing tables, and the issue persists; the table
has only 78 entries.

40 threads -> sometimes higher rtt times: https://pastebin.com/Y9nd0h4h
60 threads -> always high rtt times: https://pastebin.com/JFvhtLrH

So it definitely gets worse the more connections there are.

I have also tried to reproduce the issue with the kernel on a virtual
Hyper-V machine; there I don't see any adverse effects.
But it's not 100% the same, since MASQ happens on it. I will restructure
a bit to get a similar representation.

I also suspected that -j NOTRACK could be an issue and removed that
too; no change. (It's async routing anyway.)

Additionally, I have quit all applications except for sshd; no change!

>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer