Re: [Ecn-sane] rfc3168 sec 6.1.2

Discussion of explicit congestion notification's impact on the Internet

From: Jonathan Morton <chromatix99@gmail.com>
To: Dave Taht <dave.taht@gmail.com>
Cc: ECN-Sane <ecn-sane@lists.bufferbloat.net>
Subject: Re: [Ecn-sane] rfc3168 sec 6.1.2
Date: Thu, 29 Aug 2019 17:42:53 +0300
Message-ID: <DF529AFE-C5C2-4553-8EC0-C64A6308FBB1@gmail.com>
In-Reply-To: <CAA93jw5G2fONfM6zfkdkugOUwUXLbk5aBtr8ii9XYQ8Mqs42Uw@mail.gmail.com>

> On 29 Aug, 2019, at 4:51 pm, Dave Taht <dave.taht@gmail.com> wrote:
> 
> I am leveraging hazy memories of old work a years back where I pounded 50 ? 100 ? flows through a 100Mbit ethernet

At 100 flows, that gives you a fair share of 1Mbps per flow, so roughly 80pps or 12.5ms between packets on each flow, assuming they're all saturating.  This also means you have a minimum sojourn time (for saturating flows) of 12.5ms, which is well above the Codel target, so Codel will always be in dropping-state and will continuously ramp up its signalling frequency (unless some mitigation is in place for this very situation, which there is in Cake).
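
As a rough check of that arithmetic, here is a minimal sketch.  The 1500-byte packet size is my assumption (no size is stated above), and the function name is purely illustrative.  With that size the exact figures are slightly off the rounded ones quoted above.

```python
def fair_share(link_bps, flows, packet_bytes=1500):
    """Per-flow fair share on a saturated link.

    packet_bytes=1500 is an assumed full-size Ethernet payload,
    not a figure taken from the message.
    """
    per_flow_bps = link_bps / flows            # equal split between flows
    pps = per_flow_bps / (packet_bytes * 8)    # packets per second per flow
    gap_ms = 1000.0 / pps                      # time between packets on one flow
    return per_flow_bps, pps, gap_ms


rate, pps, gap_ms = fair_share(100_000_000, 100)
print(f"{rate / 1e6:.1f} Mbps per flow, {pps:.1f} pps, {gap_ms:.1f} ms between packets")
# prints: 1.0 Mbps per flow, 83.3 pps, 12.0 ms between packets
```

Either way, the per-flow gap sits well above Codel's default 5ms target.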

Both Cake and fq_codel should still be able to prioritise sparse flows to sub-millisecond delays under these conditions.  They'll be pretty strict about what counts as "sparse" though.  Your individual keystrokes and echoes should get through quickly, but output from programs may end up waiting.

> A) fq_codel with drop had MUCH lower RTTs - and would trigger RTOs etc

RTOs are bad.  They indicate that the steady flow of traffic has broken down on that flow due to tail loss, which is a particular danger at very small cwnds.

Cake tries to avoid them by not dropping the last queued packet from any given flow.  Fq_codel doesn't have that protection, so in non-ECN mode it will drop way too many packets in a desperate (and misguided) attempt to maintain the target sojourn time.
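
To illustrate the difference, here is a toy model of the drop decision.  It is not the actual Cake or fq_codel code: real Codel also requires the sojourn time to stay above target for a full interval before it starts dropping, which is left out here.  The function name and the `protect_last` switch are hypothetical.

```python
from collections import deque


def should_drop(flow_queue, sojourn_ms, target_ms=5.0, protect_last=True):
    """Toy drop decision for the packet at the head of one flow's queue.

    flow_queue includes the packet under consideration.  target_ms=5.0 is
    the usual Codel default; the interval logic is deliberately omitted.
    """
    if sojourn_ms <= target_ms:
        return False                 # within target: never drop
    if protect_last and len(flow_queue) <= 1:
        return False                 # keep the flow's only queued packet
    return True


lone = deque(["pkt"])
print(should_drop(lone, 12.5))                       # False: last packet is kept
print(should_drop(lone, 12.5, protect_last=False))   # True: the flow is emptied
```

With the protection switched off, a flow holding a single queued packet at a 12.5ms sojourn time loses it, which is exactly the tail-loss case that leads to an RTO.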

What you need to understand here is that dropped packets increase *application* latency, even if they also reduce the delay to individual packets.  ECN doesn't incur that problem.

> B) cake (or fq_codel with ecn) hit, I don't remember, 40ms tcp delays.

A delay of 40ms suggests about 3 packets per flow are in the queue.  That's pretty close to the minimum cwnd of 2.  One would like to do better than that, of course, but options for doing so become limited.

I would expect SCE to do better at staying *at* the minimum cwnd in these conditions.  That by itself would reduce your delay to 25ms.  Combined with setting the CA pacing scale factor to 40%, that would also reduce the average packets per flow in the queue to 0.8.  I think that's independent of whether the receiver still acks only every other segment.  The delay on each flow would probably go down to about 10ms on average, but I'm not going to claim anything about the variance around that value.
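
Those figures follow from a simple model: queue delay is roughly the number of packets each flow has queued, multiplied by the 12.5ms per-flow packet time.  A minimal sketch of that model, with illustrative names of my own choosing:

```python
GAP_MS = 12.5  # per-flow inter-packet time at fair share, as estimated earlier


def queue_delay_ms(packets_per_flow, gap_ms=GAP_MS):
    """Approximate queue delay when every flow keeps this many packets queued."""
    return packets_per_flow * gap_ms


min_cwnd = 2         # minimum congestion window, in packets
pacing_scale = 0.40  # CA pacing scale factor of 40%

print(f"{queue_delay_ms(3):.1f} ms")                        # ~3 packets queued per flow
print(f"{queue_delay_ms(min_cwnd):.1f} ms")                 # held at the minimum cwnd
print(f"{queue_delay_ms(min_cwnd * pacing_scale):.1f} ms")  # minimum cwnd plus 40% pacing
# prints 37.5 ms, 25.0 ms and 10.0 ms
```

The first figure is close to the observed 40ms; the other two are the 25ms and 10ms estimates.  This only gives averages and says nothing about variance.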

Since 10ms is still well above the normal Codel target, SCE will be signalling 100% to these flows, and thus preventing them from increasing the cwnd from 2.

> C) The workload was such that the babel protocol (1000?  routes - 4
> packet non-ecn'd udp bursts) would eventually fail - dramatically, by
> retracting the route I was on and thus acting as a circuit breaker on
> all traffic, so I'd lose connectivity for 16 sec

That's a problem with Babel, not with ECN.  A robust routing protocol should not drop the last working route to any node, just because the link gets congested.  It *may* consider that link as non-preferred and seek alternative routes that are less congested, but it *must* keep the route open (if it is working at all) until such an alternative is found.

But you did find that turning on ECN for the routing protocol helped.  So the problem wasn't latency per se, but packet loss from the AQM over-reacting to that latency.

> Anyway, 100 flows, no delays, straight ethernet, and babel with 1000+ routes is easy to setup as a std test, and I'd love it if y'all could have that in your testbed.

Let's put it on the todo list.  Do you have a working script we can just use?

 - Jonathan Morton
