Date: Mon, 24 Jun 2019 14:57:58 -0400 (EDT)
From: "David P. Reed" <dpreed@deepplum.com>
To: "Jonathan Morton" <chromatix99@gmail.com>
Cc: "Brian E Carpenter", ecn-sane@lists.bufferbloat.net, tsvwg IETF list
Subject: Re: [Ecn-sane] [tsvwg] per-flow scheduling

Jonathan - all of the things you say are kind of silly. An HTTP/1.1 connection running over TCP is not compatible with this description, except in "fantasyland".

I think you are obsessed with some idea of "proving me wrong". That's not productive.

If you have actual data describing how HTTP/1.1 connections proceed over time that disagrees with my observation, show them. Preferably taken in the wild.

I honestly can't imagine that you have actually observed any system other than the constrained single connection between a LAN and a residential ISP.

TCP doesn't have a "natural sawtooth" - that is the response of TCP to a particular "queueing discipline" in a particular kind of router. It would respond differently (and does!) if the router were to drop packets randomly on a Poisson basis, for example. No sawtooth at all.
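A purely illustrative sketch of that point, assuming an idealized Reno-style AIMD sender (the buffer size and loss probability below are made-up numbers, not measurements): under drop-on-overflow the congestion window traces the familiar periodic sawtooth, while memoryless random drops give an irregular trace with no fixed period.

    import random

    def reno_cwnd_trace(rtts, loss_occurs):
        """Toy AIMD: grow cwnd by 1 segment per RTT, halve it on loss."""
        cwnd, trace = 1.0, []
        for _ in range(rtts):
            if loss_occurs(cwnd):
                cwnd = max(1.0, cwnd / 2)   # multiplicative decrease
            else:
                cwnd += 1.0                 # additive increase
            trace.append(cwnd)
        return trace

    PIPE_PLUS_BUFFER = 40.0  # assumed path capacity plus buffer, in segments
    tail_drop   = lambda cwnd: cwnd >= PIPE_PLUS_BUFFER    # loss only on queue overflow
    random_drop = lambda cwnd: random.random() < 0.02      # memoryless (Poisson-like) loss

    print(reno_cwnd_trace(200, tail_drop))    # periodic sawtooth between ~20 and 40
    print(reno_cwnd_trace(200, random_drop))  # irregular trace, no fixed period or shape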
So you seem to see routers as part of TCP. That's not the way the Internet is designed.

On Saturday, June 22, 2019 7:07pm, "Jonathan Morton" <chromatix99@gmail.com> said:

> > On 23 Jun, 2019, at 1:09 am, David P. Reed <dpreed@deepplum.com> wrote:
> >
> > per-flow scheduling is appropriate on a shared link. However, the
> > end-to-end argument would suggest that the network not try to divine
> > which flows get preferred.
> > And beyond the end-to-end argument, there's a practical problem - since
> > the ideal state of a shared link means that it ought to have no local
> > backlog in the queue, the information needed to schedule "fairly" isn't
> > in the queue backlog itself. If there is only one packet, what's to
> > schedule?
>
> This is a great straw-man. Allow me to deconstruct it.
>
> The concept that DRR++ has empirically proved is that flows can be
> classified into two categories - sparse and saturating - very easily by
> the heuristic that a saturating flow's arrival rate exceeds its available
> delivery rate, and the opposite is true for a sparse flow.
>
> An excessive arrival rate results in a standing queue; with Reno, the
> excess arrival rate after capacity is reached is precisely 1 segment per
> RTT, very small next to modern link capacities. If there is no overall
> standing queue, then by definition all of the flows passing through are
> currently sparse. DRR++ (as implemented in fq_codel and Cake) ensures
> that all sparse traffic is processed with minimum delay and no AQM
> activity, while saturating traffic is metered out fairly and given
> appropriate AQM signals.
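A toy sketch of the mechanism described above, loosely patterned on the fq_codel/Cake structure; the class name, quantum value and list handling are illustrative assumptions rather than the actual kernel code. A flow arriving at an empty per-flow queue is treated as sparse and served from the new-flows list ahead of flows with a standing backlog:

    from collections import deque, defaultdict

    QUANTUM = 1514  # bytes of credit granted per round to a backlogged flow (assumed)

    class DrrPlusPlus:
        def __init__(self):
            self.pkts = defaultdict(deque)           # flow id -> queued packet sizes
            self.new_flows, self.old_flows = deque(), deque()
            self.deficit = defaultdict(int)

        def enqueue(self, flow, size):
            q = self.pkts[flow]
            if not q and flow not in self.new_flows and flow not in self.old_flows:
                self.new_flows.append(flow)          # empty queue on arrival => "sparse"
                self.deficit[flow] = QUANTUM
            q.append(size)

        def dequeue(self):
            while self.new_flows or self.old_flows:
                lst = self.new_flows or self.old_flows   # new flows served first
                flow, q = lst[0], self.pkts[lst[0]]
                if q and self.deficit[flow] >= q[0]:
                    self.deficit[flow] -= q[0]
                    return flow, q.popleft()
                lst.popleft()
                if q:                                # still backlogged => "saturating"
                    self.deficit[flow] += QUANTUM
                    self.old_flows.append(flow)      # rotate with the other bulk flows
            return None

A DNS query or a voice stream keeps landing on the new-flows list and leaves with minimal delay, while a bulk transfer exhausts its credit and migrates into the old-flows rotation, where it is metered out a quantum at a time.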
> > In fact, what the ideal queueing discipline would do is send signals
> > to the endpoints that provide information as to what each flow's
> > appropriate share is, and/or how far its current share is from what's
> > fair.
>
> The definition of which flows are sparse and which are saturating shifts
> dynamically in response to endpoint behaviour.
>
> > Well, presumably the flows have definable average rates.
>
> Today's TCP traffic exhibits the classic sawtooth behaviour - which has a
> different shape and period with CUBIC than Reno, but is fundamentally
> similar. The sender probes capacity by increasing send rate until a
> congestion signal is fed back to it, at which point it drops back
> sharply. With efficient AQM action, a TCP flow will therefore spend most
> of its time "sparse" and using less than the available path capacity,
> with occasional excursions into "saturating" territory which are fairly
> promptly terminated by AQM signals.
>
> So TCP does *not* have a definable "average rate". It grows to fill
> available capacity, just like the number of cars on a motorway network.
>
> The recent work on high-fidelity ECN (including SCE) aims to eliminate
> the sawtooth, so that dropping out of "saturating" mode is done faster
> and by only a small margin, wasting less capacity and reducing peak
> delays - very close to ideal control as you describe. But it's still
> necessary to avoid giving these signals unnecessarily to "sparse" flows,
> which would cause them to back off and thus waste capacity, but only to
> "saturating" flows that have just begun to build a queue. And it's also
> necessary to protect these well-behaved "modern" flows from "legacy"
> endpoint behaviour, and vice versa. DRR++ does that very naturally.
>
> > Merely re-ordering the packets on a link is just not very effective at
> > achieving fairness.
>
> I'm afraid this assertion is simply false. DRR++ does precisely that,
> and achieves near-perfect fairness.
>
> It is important however to define "flow" correctly relative to the
> measure of fairness you want to achieve. Traditionally the unique
> 5-tuple is used to define "flow", but this means applications can game
> the system by opening multiple flows. For an ISP a better definition
> might be that each subscriber's traffic is one "flow". Or there is a
> tweak to DRR++ which allows a two-layer fairness definition, implemented
> successfully in Cake.
>
> > So the end-to-end approach would suggest moving most of the scheduling
> > back to the endpoints of each flow, with the role of the routers being
> > to extract information about the competing flows that are congesting
> > the network, and forwarding those signals (via drops or marking) to
> > the endpoints. That's because, in the end-to-end argument that applies
> > here - the router cannot do the entire function of managing congestion
> > or priority.
>
> It must be remembered that congestion signals require one RTT to
> circulate from the bottleneck, via the receiver, back to the sender, and
> their effects to then be felt at the bottleneck. That's typically a much
> longer response time (say 100ms for a general Internet path) than can be
> achieved by packet scheduling (sub-millisecond for a 20Mbps link), and
> therefore effects only a looser control (by fundamental control theory).
> Both mechanisms are useful and complement each other.
>
> My personal interpretation of the end-to-end principle is that endpoints
> generally do not, cannot, and *should not* be aware of the topology of
> the network between them, nor of any other traffic that might be sharing
> that network. The network itself takes care of those details, and may
> send standardised control-feedback signals to the endpoints to inform
> them about actions they need to take. These currently take the form of
> ICMP error packets and the ECN field, the latter substituted by packet
> drops on Not-ECT flows.
>
>  - Jonathan Morton
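As a quick back-of-the-envelope check of the two timescales cited above (the 100 ms RTT and 20 Mbit/s link are the figures from the text; the 1500-byte packet size is an assumption):

    rtt_s = 0.100                      # assumed end-to-end congestion feedback delay
    pkt_s = 1500 * 8 / 20e6            # one 1500-byte packet at 20 Mbit/s = 0.6 ms
    print(f"{pkt_s*1e3:.1f} ms per scheduling decision vs {rtt_s*1e3:.0f} ms per RTT")
    print(f"~{rtt_s/pkt_s:.0f} packets forwarded before the endpoint can react")

That roughly two-orders-of-magnitude gap is what the "looser control" remark refers to: the scheduler can act on every packet, while the endpoint can only react once per round trip.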