From: Kyle Rose
Date: Tue, 23 Jul 2019 11:12:46 -0400
To: Bob Briscoe
Cc: Jonathan Morton, "David P. Reed", ecn-sane@lists.bufferbloat.net, tsvwg IETF list
Subject: Re: [Ecn-sane] [tsvwg] per-flow scheduling

On Mon, Jul 22, 2019 at 9:44 AM Bob Briscoe wrote:

> Folks,
>
> As promised, I've pulled together and uploaded the main architectural
> arguments about per-flow scheduling that cause concern:
>
> Per-Flow Scheduling and the End-to-End Argument
>
> It runs to 6 pages of reading. But I tried to make the time readers will
> have to spend worth it.

Before reading the other responses (poisoning my own thinking), I wanted to offer my own reaction. In the discussion of figure 1, you seem to imply that there's some obvious choice of bin packing for the flows involved, but that can't be right. What if the dark green flow has deadlines? Why should that be the one that gets only leftover bandwidth? I'll return to this point in a bit.
The tl;dr summary of the paper seems to be that the L4S approach leaves the allocation of limited bandwidth up to the endpoints, while FQ arbitrarily enforces equality in the presence of limited bandwidth; but in reality the bottleneck device needs to make *some* choice when there's a shortage and flows don't respond. That requires some choice of policy.

In FQ, the chosen policy is to make sure every flow has the ability to get low latency for itself, but in the absence of some other kind of trusted signaling it allocates an equal proportion of the available bandwidth to each flow. ISTM this is the best you can do in an adversarial environment, because anything else can be gamed to get more than an equal share (and depending on how "flow" is defined, even this can be gamed by opening up more flows; but this is not a problem unique to FQ).

In L4S, the policy is to assume one queue is well-behaved and one not, and to use the ECT(1) codepoint as a classifier to get into one or the other. But policy choice doesn't end there: in an uncooperative or adversarial environment, you can easily get into a situation in which the bottleneck has to apply policy to several unresponsive flows in the supposedly well-behaved queue. Note that this doesn't even have to involve bad actors misclassifying on purpose: it could be two uncooperative 200 Mb/s VR flows competing for 300 Mb/s of bandwidth. In this case, L4S falls back to classic, which with DualQ means every flow, not just the uncooperative ones, suffers. As a user, I don't want my small, responsive flows to suffer when uncooperative actors decide to exceed the bottleneck bandwidth (BBW).

Getting back to figure 1, how do you choose the right allocation? With the proposed use of ECT(1) as classifier, you have exactly one bit available to decide which queue, and therefore which policy, applies to a flow. Should all the classic flows get assigned whatever is left after the L4S flows are allocated bandwidth? That hardly seems fair to classic flows. But let's say this policy is implemented. It then escapes me how this is any different from the trust problems facing end-to-end DSCP/QoS: why wouldn't everyone just classify their classic flows as L4S, forcing everything to be treated as classic and getting access to a (greater) share of the overall BBW? Then we're left both with a spent ECT(1) codepoint and a need for FQ or some other queuing policy to arbitrate between flows, without any bits with which to implement the high-fidelity congestion signal required to achieve low latency without getting squeezed out.

The bottom line is that I see no way to escape the necessity of something FQ-like at bottlenecks outside of the sender's trust domain. If FQ can't be done in backbone-grade hardware, then the only real answer is pipes in the core big enough to force the bottleneck to live somewhere closer to the edge, where FQ does scale.

Note that, in a perfect world, FQ wouldn't trigger at all because there would always be enough bandwidth for everything users wanted to do, but in the real world it seems like the best you can possibly do in the absence of trusted information about how to prioritize traffic. IMO, best to think of FQ as a last-ditch measure indicating to the operator that they're gonna need a bigger pipe, rather than as a steady-state bandwidth allocator.

Kyle
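PS: To make the equal-share point concrete, here's a minimal deficit-round-robin sketch, the flavor of per-flow scheduling that fq_codel builds on. The flow names, packet sizes, quantum, and link budget are made-up illustrative numbers, not anything taken from Bob's paper:

    # Minimal deficit-round-robin sketch: each backlogged flow is credited an
    # equal quantum of bytes per round, so an unresponsive flood cannot starve
    # a well-behaved flow. All names and numbers here are illustrative.
    from collections import deque

    QUANTUM = 1500   # bytes credited to each backlogged flow per round

    flows = {                                   # per-flow FIFO backlogs (packet sizes in bytes)
        "responsive-web": deque([1500] * 5),    # small, well-behaved transfer
        "unresponsive-vr": deque([1500] * 50),  # floods the link regardless of congestion
    }
    deficit = {name: 0 for name in flows}
    sent = {name: 0 for name in flows}

    link_budget = 30_000                        # bytes the bottleneck can serve in this interval
    while link_budget > 0 and any(flows.values()):
        for name, queue in flows.items():
            if not queue:
                continue                        # empty queues get no credit this round
            deficit[name] += QUANTUM
            while queue and queue[0] <= deficit[name] and link_budget > 0:
                pkt = queue.popleft()
                deficit[name] -= pkt
                sent[name] += pkt
                link_budget -= pkt

    print(sent)   # responsive-web gets all 7500 of its bytes; the flood gets only what's left

While both flows stay backlogged they drain at the same byte rate, so the responsive flow finishes unharmed no matter how hard the unresponsive one pushes. In fq_codel each such per-flow queue also gets its own CoDel instance, which is the "low latency for itself" part; the byte accounting above is just the fairness part.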