From: Jonathan Morton
Date: Thu, 29 Aug 2019 17:42:53 +0300
To: Dave Taht
Cc: ECN-Sane <ecn-sane@lists.bufferbloat.net>
Subject: Re: [Ecn-sane] rfc3168 sec 6.1.2
List-Id: Discussion of explicit congestion notification's impact on the Internet

> On 29 Aug, 2019, at 4:51 pm, Dave Taht wrote:
>
> I am leveraging hazy memories of old work a few years back where I pounded 50? 100? flows through a 100Mbit ethernet

At 100 flows, that gives you a fair share of 1 Mbps per flow, i.e. about 80 packets per second or 12.5 ms between packets on each flow, assuming they're all saturating.  This also means you have a minimum sojourn time (for saturating flows) of 12.5 ms, which is well above the Codel target of 5 ms, so Codel will always be in dropping state and will continuously ramp up its signalling frequency (unless some mitigation is in place for this very situation, as there is in Cake).

Both Cake and fq_codel should still be able to prioritise sparse flows down to sub-millisecond delays under these conditions.  They'll be pretty strict about what counts as "sparse", though.  Your individual keystrokes and echoes should get through quickly, but output from programs may end up waiting.

> A) fq_codel with drop had MUCH lower RTTs - and would trigger RTOs etc

RTOs are bad.
They indicate that the steady flow of traffic has broken down on that flow due to tail loss, which is a particular danger at very small cwnds.

Cake tries to avoid them by never dropping the last queued packet of any given flow.  Fq_codel doesn't have that protection, so in non-ECN mode it will drop far too many packets in a desperate (and misguided) attempt to maintain the target sojourn time.  What you need to understand here is that dropped packets increase *application* latency, even when they reduce the delay of individual packets.  ECN doesn't incur that problem.

> B) cake (or fq_codel with ecn) hit, I don't remember, 40ms tcp delays.

A delay of 40 ms suggests about 3 packets per flow in the queue.  That's pretty close to the minimum cwnd of 2.  One would like to do better than that, of course, but the options for doing so are limited.

I would expect SCE to do better at staying *at* the minimum cwnd under these conditions.  That by itself would reduce your delay to 25 ms.  Combined with setting the CA pacing scale factor to 40%, it would also reduce the average number of packets per flow in the queue to 0.8; I think that's independent of whether the receiver still acks only every other segment.  The delay on each flow would then probably average about 10 ms, but I'm not going to claim anything about the variance around that value.

Since 10 ms is still well above the normal Codel target, SCE will be signalling 100% to these flows, and thus preventing them from increasing the cwnd beyond 2.

> C) The workload was such that the babel protocol (1000? routes - 4
> packet non-ecn'd udp bursts) would eventually fail - dramatically, by
> retracting the route I was on and thus acting as a circuit breaker on
> all traffic, so I'd lose connectivity for 16 sec

That's a problem with Babel, not with ECN.  A robust routing protocol should not drop the last working route to any node just because the link gets congested.
It *may* consider that link non-preferred and seek alternative routes that are less congested, but it *must* keep the route open (if it is working at all) until such an alternative is found.

But you did find that turning on ECN for the routing protocol helped.  So the problem wasn't latency per se, but packet loss from the AQM over-reacting to that latency.

> Anyway, 100 flows, no delays, straight ethernet, and babel with 1000+ routes is easy to set up as a std test, and I'd love it if y'all could have that in your testbed.

Let's put it on the todo list.  Do you have a working script we can just use?

 - Jonathan Morton
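P.S.  The fair-share arithmetic above is easy to sanity-check in a few lines of Python.  This is only a sketch: the 1500-byte packet size is my assumption (the ~80 pps / 12.5 ms figures in the mail correspond to slightly larger on-wire framing), and 5 ms is Codel's default target.

```python
# Sanity check of the per-flow fair-share arithmetic.
# Assumptions: MTU-sized (1500-byte) packets, Codel's default 5 ms target.

LINK_BPS = 100e6        # 100 Mbit/s Ethernet
FLOWS = 100
PKT_BYTES = 1500        # assumed packet size

fair_share_bps = LINK_BPS / FLOWS          # 1 Mbit/s per flow
pps = fair_share_bps / (PKT_BYTES * 8)     # ~83 packets/s per flow
gap_ms = 1000.0 / pps                      # ~12 ms between packets

CODEL_TARGET_MS = 5.0
print(f"{fair_share_bps/1e6:.1f} Mbps/flow, {pps:.0f} pps, {gap_ms:.1f} ms gap")
print("sojourn floor above Codel target:", gap_ms > CODEL_TARGET_MS)
```

With 100 saturating flows the per-flow sojourn floor (~12 ms here) always exceeds the Codel target, which is why Codel never leaves dropping state in this scenario.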
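P.P.S.  Likewise the queue-depth figures for case B).  Again a sketch, reusing the 12.5 ms per-packet gap from the fair-share calculation above:

```python
# Queue-depth arithmetic for the 40 ms observation and the SCE estimate.

GAP_MS = 12.5                                # per-packet gap at 1 Mbps fair share

pkts_queued = 40.0 / GAP_MS                  # 3.2, i.e. about 3 packets per flow

MIN_CWND = 2
delay_at_min_cwnd_ms = MIN_CWND * GAP_MS     # 25 ms if SCE holds cwnd at 2

PACING_SCALE = 0.40                          # CA pacing scale factor
avg_pkts_queued = MIN_CWND * PACING_SCALE    # 0.8 packets per flow on average
avg_delay_ms = avg_pkts_queued * GAP_MS      # ~10 ms average delay

print(pkts_queued, delay_at_min_cwnd_ms, avg_delay_ms)
```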