From: Jonathan Morton
Date: Sun, 23 Jun 2019 02:07:08 +0300
To: "David P. Reed"
Cc: Brian E Carpenter, ecn-sane@lists.bufferbloat.net, tsvwg IETF list
Subject: Re: [Ecn-sane] [tsvwg] per-flow scheduling

> On 23 Jun, 2019, at 1:09 am, David P. Reed wrote:
>
> per-flow scheduling is appropriate on a shared link.
> However, the end-to-end argument would suggest that the network not try to divine which flows get preferred.
> And beyond the end-to-end argument, there's a practical problem - since the ideal state of a shared link means that it ought to have no local backlog in the queue, the information needed to schedule "fairly" isn't in the queue backlog itself. If there is only one packet, what's to schedule?

This is a great straw-man. Allow me to deconstruct it.

The concept that DRR++ has empirically proved is that flows can be classified into two categories - sparse and saturating - very easily, by the heuristic that a saturating flow's arrival rate exceeds its available delivery rate, while the opposite is true for a sparse flow.

An excessive arrival rate results in a standing queue; with Reno, the excess arrival rate after capacity is reached is precisely 1 segment per RTT, which is very small next to modern link capacities. If there is no overall standing queue, then by definition all of the flows passing through are currently sparse. DRR++ (as implemented in fq_codel and Cake) ensures that all sparse traffic is processed with minimum delay and no AQM activity, while saturating traffic is metered out fairly and given appropriate AQM signals.

> In fact, what the ideal queueing discipline would do is send signals to the endpoints that provide information as to what each flow's appropriate share is, and/or how far its current share is from what's fair.

The definition of which flows are sparse and which are saturating shifts dynamically in response to endpoint behaviour.

> Well, presumably the flows have definable average rates.

Today's TCP traffic exhibits the classic sawtooth behaviour - which has a different shape and period with CUBIC than with Reno, but is fundamentally similar. The sender probes capacity by increasing its send rate until a congestion signal is fed back to it, at which point it drops back sharply. With efficient AQM action, a TCP flow will therefore spend most of its time "sparse", using less than the available path capacity, with occasional excursions into "saturating" territory which are fairly promptly terminated by AQM signals.

So TCP does *not* have a definable "average rate". It grows to fill the available capacity, just like the number of cars on a motorway network.

The recent work on high-fidelity ECN (including SCE) aims to eliminate the sawtooth, so that dropping out of "saturating" mode happens sooner and by only a small margin, wasting less capacity and reducing peak delays - very close to the ideal control you describe. But it is still necessary to withhold these signals from "sparse" flows, which would otherwise back off and waste capacity, and to deliver them only to "saturating" flows that have just begun to build a queue. It is also necessary to protect these well-behaved "modern" flows from "legacy" endpoint behaviour, and vice versa. DRR++ does all of that very naturally.

> Merely re-ordering the packets on a link is just not very effective at achieving fairness.

I'm afraid this assertion is simply false. DRR++ does precisely that, and achieves near-perfect fairness.

It is important, however, to define "flow" correctly relative to the measure of fairness you want to achieve. Traditionally the unique 5-tuple is used to define a "flow", but this means applications can game the system by opening multiple flows. For an ISP, a better definition might be that each subscriber's traffic is one "flow"; or there is a tweak to DRR++ which allows a two-layer fairness definition, implemented successfully in Cake.
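To make the mechanism concrete, here is a toy sketch in Python, written just for this discussion - it is not the fq_codel or Cake source, and the flow names, quantum and packet representation are invented for illustration. It shows the core DRR++ idea: a flow arriving into an empty queue sits on a "new" list and is served ahead of flows on the "old" list, while a byte deficit meters the saturating flows out fairly. Cake's two-layer mode conceptually nests the same structure, sharing capacity first between subscribers and then between each subscriber's flows; the real implementations also handle several corner cases that this sketch glosses over.

from collections import deque

QUANTUM = 1514  # byte credit per scheduling round, roughly one full-size packet

class Flow:
    def __init__(self):
        self.pkts = deque()   # queued packet lengths, in bytes
        self.deficit = 0      # DRR byte credit

class DRRPP:
    def __init__(self):
        self.flows = {}           # flow id -> Flow
        self.new_flows = deque()  # flows currently classed as sparse: served first
        self.old_flows = deque()  # flows currently classed as saturating

    def enqueue(self, fid, pkt_len):
        f = self.flows.setdefault(fid, Flow())
        if not f.pkts and fid not in self.new_flows and fid not in self.old_flows:
            # A flow arriving into an empty queue is treated as sparse.
            f.deficit = QUANTUM
            self.new_flows.append(fid)
        f.pkts.append(pkt_len)

    def dequeue(self):
        while self.new_flows or self.old_flows:
            lst = self.new_flows or self.old_flows
            fid = lst[0]
            f = self.flows[fid]
            if f.deficit <= 0:
                # Out of credit: the flow is consuming its fair share or more,
                # so it joins the back of the saturating list and waits its turn.
                f.deficit += QUANTUM
                lst.popleft()
                self.old_flows.append(fid)
                continue
            if not f.pkts:
                # Queue drained: the flow leaves the schedule; its next packet
                # will re-enter it as sparse.
                lst.popleft()
                continue
            pkt_len = f.pkts.popleft()
            f.deficit -= pkt_len
            return fid, pkt_len
        return None   # nothing queued at all

A tiny usage example, showing a sparse arrival overtaking a standing queue:

sched = DRRPP()
for _ in range(3):
    sched.enqueue("bulk", 1514)      # a flow with a standing queue
print(sched.dequeue())               # ('bulk', 1514); bulk is now out of credit
sched.enqueue("dns", 80)             # a sparse flow arrives behind the backlog
print(sched.dequeue())               # ('dns', 80) - served ahead of bulk's remaining packets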
> So the end-to-end approach would suggest moving most of the scheduling back to the endpoints of each flow, with the role of the routers being to extract information about the competing flows that are congesting the network, and forwarding those signals (via drops or marking) to the endpoints. That's because, in the end-to-end argument that applies here - the router cannot do the entire function of managing congestion or priority.

It must be remembered that congestion signals require one RTT to circulate from the bottleneck, via the receiver, back to the sender, and for their effects then to be felt at the bottleneck. That is typically a much longer response time (say 100ms for a general Internet path) than can be achieved by packet scheduling (sub-millisecond for a 20Mbps link), and it therefore effects only a looser control (by fundamental control theory). Both mechanisms are useful, and they complement each other.

My personal interpretation of the end-to-end principle is that endpoints generally do not, cannot, and *should not* be aware of the topology of the network between them, nor of any other traffic that might be sharing that network. The network itself takes care of those details, and may send standardised control-feedback signals to the endpoints to inform them of actions they need to take. These currently take the form of ICMP error packets and the ECN field, the latter substituted by packet drops on Not-ECT flows.

 - Jonathan Morton
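P.S. To put rough numbers on those two timescales - a back-of-envelope sketch only, assuming a 1500-byte MTU and the 20Mbps / 100ms figures above:

MTU_BITS = 1500 * 8      # one full-size packet
LINK_BPS = 20e6          # 20 Mbit/s bottleneck link
RTT_S    = 0.100         # nominal general-Internet round trip

serialisation = MTU_BITS / LINK_BPS   # the granularity a packet scheduler works at
print(f"scheduler granularity: {serialisation * 1e3:.1f} ms per packet")   # ~0.6 ms
print(f"end-to-end signal loop: {RTT_S * 1e3:.0f} ms")                     # 100 ms
print(f"ratio: roughly {RTT_S / serialisation:.0f}x slower")               # ~167x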