From: Jonathan Morton
Date: Mon, 3 Feb 2025 16:04:58 +0200
To: Sebastian Moeller
Cc: Dave Täht, David Collier-Brown, Rich Brown via Bloat
Subject: Re: [Bloat] Comcast & L4S
Message-Id: <1CF53C3A-9028-466C-A0B0-704D9E156BAC@gmail.com>
In-Reply-To: <93E5FE09-C0C6-4912-B51B-07F4F8479F83@gmx.de>
>> Actually, the 5ms target is already too tight for efficient TCP operation on typical Internet paths - unless there is significant statistical multiplexing on the bottleneck link, which is rarely the case in a domestic context.
>
> I respectfully disagree, even at 1 Gbps we only go down to 85% utilisation with a single flow I assume. That is a trade-off I am happy to make...

At 100ms RTT, yes. But you can see that Codel has disproportionately more trouble when the RTT increases a little beyond that, and such paths are not uncommon when you look outside our usual stomping grounds of Europe and North America. This happens because each congestion episode starts to trigger more than one Multiplicative Decrease due to congestion signalling, so the average cwnd falls below what would normally be expected. This would not typically occur under high statistical multiplexing.

>> Short RTTs on a LAN allow for achieving full throughput with the queue held this small, but remember that the concept of "LAN" also includes WiFi links whose median latency is orders of magnitude greater than that of switched Ethernet. That's why I don't want to encourage going below 5ms too much.
>
> Not wanting to be contrarian, but here I believe fixing WiFi is the better path forward.

Perhaps, but you'll need to change the fundamental collision-avoidance MAC design of WiFi to do that. Until someone (and it will take more than mere *individual* contributions) gets around to that and existing WiFi hardware mostly drops out of use, we have to design for its current behaviour.

I'm not talking about the bufferbloat of some specific WiFi hardware here - we've already done all the technical work we can to fix that. It's the fundamental link protocol.
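To put some back-of-the-envelope numbers on the multiplicative-decrease point above - this is a sketch of my own, assuming an idealised Reno-style linear sawtooth rather than CUBIC's concave approach:

```python
def avg_cwnd_fraction(n_decreases, beta=0.5):
    """Average cwnd over a congestion epoch, as a fraction of the peak,
    for a linear sawtooth that suffers `n_decreases` back-to-back
    multiplicative decreases of factor `beta` at each episode."""
    trough = beta ** n_decreases   # cwnd just after the decreases
    return (1 + trough) / 2        # mean of a linear ramp from trough to peak

# One decrease per episode: average cwnd is 75% of the peak.
# Two decreases (a long-RTT path, where the AQM signals again before the
# first reduction has taken effect): the average drops to 62.5%.
```

With one decrease per episode the flow averages 75% of its peak cwnd; a second, spurious decrease pulls that down to 62.5%, which is the mechanism behind the throughput loss on long paths.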
>> DelTiC actually reverts to the 25ms queue target that has historically been typical for AQMs targeting conventional TCP.
>
> Not doubting one bit that 25ms makes a ton of sense for DelTiC, but where do these historical 25ms come from and how was this number selected?

Perhaps "historical" is putting it too strongly - it's only quite recently that AQM has used a time-based delay target at all. It is, however, the delay target that PIE uses.

The graphs I attached arise from an effort to decide what "rightsize" actually means for a dumb FIFO buffer, in which it proved convenient to also test some AQMs. The classical rule is based on Reno behaviour, and in the absence of statistical multiplexing reduces to "buffer depth equal to baseline path length" to obtain 100% throughput. Updating this for CUBIC yields a rule of "buffer depth 3/7ths of baseline path length", which for a 100ms path would be around 40ms of buffer. This is, again, for 100% throughput at steady state.

Examining the detailed behaviour of CUBIC, we realised that approximately halving this would still yield reasonably good throughput, due to CUBIC's designed-in decelerating approach to the previous highest cwnd and, particularly, its intermittent use of "fast convergence" cycles in which the inflection point is placed halfway between the peak and trough of the sawtooth. That yields a buffer size of 3/14ths of the baseline RTT. On a 100ms path, 25ms gives a reasonable engineering margin on top of this rule, and is also small enough for VoIP to easily accommodate the jitter induced by a competing traffic load.

Thus, in the graphs, you can see DelTiC staying consistently above 95% throughput at 100ms, and falling off relatively gracefully above that. Codel requires a path of 32ms or shorter to achieve that.
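Incidentally, the 3/7ths rule above falls straight out of the multiplicative-decrease factor - a quick check, using the conventional values of beta (0.5 for Reno, 0.7 for CUBIC):

```python
def min_buffer_fraction(beta):
    """Smallest buffer, as a fraction of the baseline RTT, that keeps a
    single-flow bottleneck busy through a multiplicative decrease: after
    cwnd drops to beta * peak, the queue must cover the shortfall."""
    return (1 - beta) / beta

# Reno (beta = 0.5): buffer equal to the baseline path length (the
# classical BDP rule).
# CUBIC (beta = 0.7): 3/7 of the path length, ~43ms on a 100ms path;
# halving that for fast convergence gives 3/14, ~21ms, hence 25ms with
# a little engineering margin.
```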
Even PIE, with the same delay target as DelTiC, doesn't do as well - but that is due to its incorrect marking behaviour, which we have discussed at length before.

>> As for CPU efficiency, that is indeed something to keep in mind. The scheduling logic in Cake got very complex in the end, and there are undoubtedly ways to avoid that with a fresh design.
>
> Ah, that was not my main focus here; with 1600 Gbps Ethernet already on the horizon, I assume a shaper running out of CPU is not really avoidable. I am more interested in that shaper having a graceful, latency-conserving failure mode when running out of timely CPU access. Making scheduling more efficient is something that I am fully behind, but I consider these two mostly orthogonal issues.

I suppose there are two distinct meanings of "scheduling". One is deciding which packet to send next. The other is deciding WHEN the next packet can be sent. It's the latter that might be more complicated than necessary in Cake, and that complexity could easily result in exercising the kernel timer infrastructure more than required.

However, I would also note that this behaviour is only seen on certain specific classes of hardware, and on that particular hardware I think there is another mechanism contributing to poor throughput. Cake's shaper architecture quite deliberately "pushes harder" when the throughput goes below the configured rate, and that manifests as higher CPU utilisation. HTB isn't as good at that. But the underlying reason may be a bottleneck in the I/O infrastructure between the CPU and the network. When a qdisc is not attached, this I/O bottleneck is bypassed.

 - Jonathan Morton
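P.S. The second meaning of "scheduling" - deciding when the next packet may go - can be sketched as a virtual clock. This is a deliberately minimal illustration of the idea, not Cake's actual deficit-mode shaper:

```python
def next_send_time(now, sched_time, pkt_bytes, rate_bps):
    """Virtual-clock pacing: each packet advances the schedule by its
    serialisation time at the shaped rate.  If the schedule has fallen
    behind real time, the link was idle, so restart from `now`."""
    base = max(sched_time, now)
    return base + (pkt_bytes * 8) / rate_bps
```

For example, a 1500-byte packet at 10 Mbit/s occupies 1.2ms of link time, so from an idle link the next packet becomes eligible 1.2ms later. A kernel timer is only needed when the queue is non-empty and the schedule is ahead of real time - which is also roughly where the timer-infrastructure cost comes from.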