From: Dave Taht
Date: Thu, 29 Aug 2019 12:45:06 -0700
To: Jonathan Morton
Cc: ECN-Sane <ecn-sane@lists.bufferbloat.net>
Subject: Re: [Ecn-sane] rfc3168 sec 6.1.2

On Thu, Aug 29, 2019 at 12:10 PM Dave Taht wrote:
>
> On Thu, Aug 29, 2019 at 7:42 AM Jonathan Morton wrote:
> >
> > > On 29 Aug, 2019, at 4:51 pm, Dave Taht wrote:
> > >
> > > I am leveraging hazy memories of old work a few years back where I
> > > pounded 50? 100? flows through a 100Mbit ethernet
> >
> > At 100 flows, that gives you 1Mbps per flow fair share, so 80pps or
> > 12.5ms between packets on each flow, assuming they're all saturating.
> > This also means you have a minimum sojourn time (for saturating
> > flows) of 12.5ms, which is well above the Codel target, so Codel will
> > always be in dropping-state and will continuously ramp up its
> > signalling frequency (unless some mitigation is in place for this
> > very situation, which there is in Cake).
> >
> > Both Cake and fq_codel should still be able to prioritise sparse
> > flows to sub-millisecond delays under these conditions. They'll be
> > pretty strict about what counts as "sparse" though. Your individual
> > keystrokes and echoes should get through quickly, but output from
> > programs may end up waiting.
> >
> > > A) fq_codel with drop had MUCH lower RTTs - and would trigger RTOs etc
> >
> > RTOs are bad. They indicate that the steady flow of traffic has
> > broken down on that flow due to tail loss, which is a particular
> > danger at very small cwnds.
>
> They indicated that traffic has broken down for any of a zillion
> reasons.
> RTOs, for example, are what get TCP restarted after babel does the
> circuit-breaker thing on this test and restores it.
>
> RTOs are Good. :)
>
> > Cake tries to avoid them by not dropping the last queued packet from
> > any given flow. Fq_codel doesn't have that protection, so in non-ECN
> > mode it will drop way too many packets in a desperate (and misguided)
> > attempt to maintain the target sojourn time.
>
> We are trying to encourage others to stop editorializing so much. As
> the author of this behavior in fq_codel, my reasoning at the time was
> that under conditions of overload there were usually packets "in the
> network", and keeping the last packet in the queue scaled badly in
> terms of total RTT. Saying "go away, come back later" was a totally
> reasonable response, baked into TCPs since the very beginning.
>
> I'm glad that cake and fq_codel have a different response curve here.
> It's interesting. Categorizing the differences between approaches is
> good.
>
> As best as I can recall, I put this behavior into fq_codel after some
> very similar testing back in 2012.
>
> > What you need to understand here is that dropped packets increase
> > *application* latency, even if they also reduce the delay to
> > individual packets. ECN doesn't incur that problem.
>
> Well, let me point at my data here:
> http://blog.cerowrt.org/post/ecn_fq_codel_wifi_airbook/
>
> We need to be clear about what we consider an "application". I tend to
> think about things more as "human facing" or not, and optimize for
> humans first.
>
> In this case dropped packets on a 2-second flow account for a maximum
> of a 16ms increase in FCT. Imperceptible. Compared to that, making
> room for other packets from other flows at the point of contention is
> a win for those other flows.

And to wax philosophical (I'm trying really hard to keep my limbic
system out of things this time around!), in Stuart's other example of
a screen-sharing application, ecn is useful!
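Jonathan's fair-share figures from the top of the thread (100 flows on a 100Mbit link, so 1Mbps each, roughly 80pps and 12.5ms between packets) can be sanity-checked in a few lines. This is only a back-of-envelope sketch assuming full-size 1500-byte packets, which lands at ~83pps/12ms rather than exactly 80pps/12.5ms; the quoted figures presumably fold in some framing overhead:

```python
# Sanity check of the fair-share arithmetic quoted in this thread:
# 100 saturating flows sharing a 100 Mbit/s Ethernet link.
LINK_RATE_BPS = 100e6      # 100 Mbit/s link
N_FLOWS = 100
PKT_SIZE_BITS = 1500 * 8   # assumed full-size 1500-byte packets
CODEL_TARGET_MS = 5.0      # default Codel target sojourn time

fair_share_bps = LINK_RATE_BPS / N_FLOWS        # 1 Mbit/s per flow
pps_per_flow = fair_share_bps / PKT_SIZE_BITS   # ~83 packets/s per flow
inter_packet_ms = 1000.0 / pps_per_flow         # ~12 ms between packets

# With one packet per flow arriving every ~12 ms, a saturating flow's
# minimum sojourn time sits well above the 5 ms Codel target, so Codel
# stays in dropping-state, exactly as described above.
print(f"fair share:  {fair_share_bps / 1e6:.1f} Mbit/s per flow")
print(f"packet rate: {pps_per_flow:.0f} pps per flow")
print(f"min sojourn: {inter_packet_ms:.1f} ms vs {CODEL_TARGET_MS} ms target")
```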
And he made use of tcp_notsent_lowat to skip a frame when congestion
indicators told the application to do so. A very compelling example.

I think that starting to build charts of our different outlooks under
varying circumstances would help. Me, I'm all about the latency,
willing to do almost anything to hold overall latencies to a minimum.

I'd assert: on reliable transports you recover from a short-RTT loss
faster than if you get a swollen RTT and a CE within that RTT (I DO
note that both sce and l4s change this equation!!!!) - and I'll try to
show that - and I'm generally willing to accept lots of loss on voice
and gaming traffic in exchange for low jitter.

If we can improve the tcps or quic in any way - drop, loss, ecn, sce,
improving rto behavior, reducing mss, cc behavior, even adding tachyon
support - GREAT.

Anyway, with more stuff in comparison tables - and maybe we could also
channel Sally Floyd and the l4s folk - for each remarkable
circumstance. ok, I really gotta go.

One of my puzzlements in life is that I really love that option
(tcp_notsent_lowat) and I imagine it's not used as much as it could be.

> In particular (and perhaps we can show this with a heavy-load test)
> having shorter RTTs from drop makes it faster for new or existing
> flows to grab back bandwidth when part of that load exits.
>
> I've long bought the argument - for human-interactive flows that need
> a reliable transport - that ecn is good, as we did in mosh. But (being
> chicken) on doing it to everything, not so much.
>
> Anyway, the cwnd 1 + retransmit (or pacing!) idea would hopefully
> reduce the ecn'd RTTs to something more comparable to the drop in this
> particular test, which would be a step forward.
>
> I'll get to your other points below, later.
>
> > > B) cake (or fq_codel with ecn) hit, I don't remember, 40ms tcp delays.
> >
> > A delay of 40ms suggests about 3 packets per flow are in the queue.
> > That's pretty close to the minimum cwnd of 2. One would like to do
> > better than that, of course, but options for doing so become limited.
> >
> > I would expect SCE to do better at staying *at* the minimum cwnd in
> > these conditions. That by itself would reduce your delay to 25ms.
> > Combined with setting the CA pacing scale factor to 40%, that would
> > also reduce the average packets per flow in the queue to 0.8. I
> > think that's independent of whether the receiver still acks only
> > every other segment. The delay on each flow would probably go down
> > to about 10ms on average, but I'm not going to claim anything about
> > the variance around that value.
> >
> > Since 10ms is still well above the normal Codel target, SCE will be
> > signalling 100% to these flows, and thus preventing them from
> > increasing the cwnd from 2.
> >
> > > C) The workload was such that the babel protocol (1000? routes - 4
> > > packet non-ecn'd udp bursts) would eventually fail - dramatically,
> > > by retracting the route I was on and thus acting as a circuit
> > > breaker on all traffic, so I'd lose connectivity for 16 sec
> >
> > That's a problem with Babel, not with ECN. A robust routing protocol
> > should not drop the last working route to any node just because the
> > link gets congested. It *may* consider that link as non-preferred
> > and seek alternative routes that are less congested, but it *must*
> > keep the route open (if it is working at all) until such an
> > alternative is found.
> >
> > But you did find that turning on ECN for the routing protocol
> > helped. So the problem wasn't latency per se, but packet loss from
> > the AQM over-reacting to that latency.
> >
> > > Anyway, 100 flows, no delays, straight ethernet, and babel with
> > > 1000+ routes is easy to set up as a std test, and I'd love it if
> > > y'all could have that in your testbed.
> >
> > Let's put it on the todo list. Do you have a working script we can
> > just use?
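The delay figures traded in this exchange (roughly 40ms observed, 25ms at the minimum cwnd of 2, and about 10ms at 0.8 packets per flow) follow from simple backlog arithmetic. A rough sketch, again assuming 100 flows and full-size 1500-byte packets on a 100Mbit link:

```python
# Queueing delay from a per-flow backlog on a shared FQ link:
# a new arrival waits behind (packets per flow) x (number of flows)
# packets' worth of serialization time.
LINK_RATE_BPS = 100e6   # 100 Mbit/s link
N_FLOWS = 100
PKT_BITS = 1500 * 8     # assumed full-size 1500-byte packets

def queue_delay_ms(pkts_per_flow: float) -> float:
    """Delay behind pkts_per_flow queued packets from each of N_FLOWS flows."""
    backlog_bits = pkts_per_flow * N_FLOWS * PKT_BITS
    return 1000.0 * backlog_bits / LINK_RATE_BPS

print(queue_delay_ms(3))    # 36.0 ms -- close to the observed ~40 ms
print(queue_delay_ms(2))    # 24.0 ms -- minimum cwnd of 2, the ~25 ms figure
print(queue_delay_ms(0.8))  # 9.6 ms  -- 40% pacing scale, the ~10 ms figure
```

The slight gap between 36ms computed and 40ms observed is consistent with some flows momentarily holding a fourth packet, or link-layer overhead.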
> >
> >  - Jonathan Morton
>
> --
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740

--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740