From: "David P. Reed" <dpreed@deepplum.com>
To: "David P. Reed"
Cc: "Sebastian Moeller", "ecn-sane@lists.bufferbloat.net", "Bob Briscoe", "tsvwg IETF list"
Date: Wed, 17 Jul 2019 18:34:15 -0400 (EDT)
Subject: Re: [Ecn-sane] per-flow scheduling
Message-ID: <1563402855.88484511@apps.rackspace.com>
In-Reply-To: <1563401917.00951412@apps.rackspace.com>

A follow-up point that I think needs to be made is one more end-to-end argument:

It is NOT the job of the IP transport layer to provide free storage for low-priority packets. The end-to-end argument here says: the ends can and must hold packets until they are either delivered or no longer relevant (in RTP, packets become irrelevant once they are older than their desired delivery time, if you want an example of the latter), so the network should not provide the function of storage beyond the minimum needed to deal with transients.

That means, unfortunately, that the dream of some kind of "background" path that stores "low priority" packets in the network fails the end-to-end argument test.
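To see how little the ends need from the network to do this, here is a toy sketch (the class name and the 150 ms relevance budget are purely illustrative inventions of mine, not taken from any real RTP stack): the sender holds its own packets and simply discards anything that has outlived its usefulness, instead of asking routers to warehouse it.

    import time
    from collections import deque

    class DeadlineSender:
        """Toy endpoint that does its own storage.

        Packets wait here, at the end, and anything whose delivery deadline has
        passed is discarded at the edge -- the network is never asked to hold
        more than what is actually in flight."""

        def __init__(self, relevance_budget_s=0.150):   # illustrative RTP-style budget
            self.relevance_budget_s = relevance_budget_s
            self.queue = deque()

        def enqueue(self, payload):
            # Each packet records when it stops being useful to the receiver.
            self.queue.append((time.monotonic() + self.relevance_budget_s, payload))

        def next_packet(self):
            # Called whenever the path can accept one more packet (pacing/cwnd allows).
            now = time.monotonic()
            while self.queue:
                deadline, payload = self.queue.popleft()
                if deadline >= now:
                    return payload   # still relevant: hand it to the network now
                # expired: the endpoint absorbs the loss; no router stored it
            return None

The point is only that the holding function lives entirely in the endpoint; nothing in the network needs to know the deadline exists.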
If you think about this, it even applies to some imaginary interplanetary IP-layer network. Queueing delay is not a feature of any end-to-end requirement.

What may be desired at the router/link level in an interplanetary IP layer is holding packets because a link is actually down, or using link-level error-correction coding or retransmission to bring the error rate down to an acceptable level before declaring it down. But that's quite different - it's the link-level protocol, which aims to deliver minimum queueing delay under tough conditions, without buffering more than needed for that (the number of bits that fit in the light-speed transmission path at the transmission rate).

So the main reason I'm saying this is because, again, there are those who want to implement the TCP function of reliable delivery of each packet in the links. That's a very bad idea.

On Wednesday, July 17, 2019 6:18pm, "David P. Reed" said:

> I do want to toss in my personal observations about the "end-to-end argument" related to per-flow scheduling. (Such arguments are, of course, a class of arguments to which my name is attached. Not that I am a judge/jury of such questions...)
>
> A core principle of the Internet design is to move function out of the network, including routers and middleboxes, if those functions
>
> a) can be properly accomplished by the endpoints, and
> b) are not relevant to all uses of the Internet transport fabric being used by the ends.
>
> The rationale here has always seemed obvious to me. As Bob Briscoe suggests, we were very wary of throwing features into the network that would preclude unanticipated future interoperability needs, new applications, and new technology in the infrastructure of the Internet as a whole.
>
> So what are we talking about here (ignoring the fine points of SCE, some of which I think are debatable - especially the focus on TCP alone, since much traffic will likely move away from TCP in the near future)?
>
> A second technical requirement (necessary invariant) of the Internet's transport is that the entire Internet depends on rigorously stopping queueing delay from building up anywhere except at the endpoints, where the ends can manage it. This is absolutely critical, though it is peculiar in that many engineers, especially those who work at the IP layer and below, have a mental model of routing as essentially being about building up queueing delay (in order to manage priority in some trivial way by building up the queue on purpose, apparently).
>
> This second technical requirement cannot be satisfied merely by the endpoints. The reason is that the endpoints cannot know accurately which host-host paths share common queues.
>
> This lack of a way to "cooperate" among independent users of a queue cannot be solved by a purely end-to-end solution. (Well, I suppose some genius might invent a way, but I have not seen one in my 36 years of closely watching the Internet in operation since it went live in 1983.)
>
> So what the end-to-end argument would tend to do here, in my opinion, is to provide the most minimal mechanism in the devices that are capable of building up a queue, in order to allow all the ends sharing that queue to do their job - which is to stop filling up the queue!
>
> Only the endpoints can prevent filling up queues. And depending on the protocol, they may need to make very different, yet compatible, choices.
>
> This is a question of design at the architectural level. And the future matters.
>
> So there is an end-to-end argument to be made here, but it is a subtle one.
>
> The basic mechanism for controlling queue depth has been, and remains, quite simple: dropping packets. This has two impacts: 1) immediately reducing queueing delay, and 2) signalling to endpoints that are paying attention that they have contributed to an overfull queue.
>
> The optimum queueing delay in a steady state would always be one packet or less. Kleinrock has shown this in the last few years. Of course there aren't steady states. But we don't want a mechanism that can't converge to that steady state *quickly*, for all queues in the network.
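To be concrete about how minimal that mechanism can be, here is a toy sketch of a bottleneck queue whose only tool is dropping. The one-packet target comes from the Kleinrock observation above, but the 10 ms transient allowance, the names, and the drop-on-arrival choice are arbitrary illustrations of mine, not any deployed AQM:

    import time
    from collections import deque

    class OnePacketTargetQueue:
        """Toy bottleneck queue: the only control action is dropping packets.

        It aims for a standing queue of about one packet, tolerating brief
        transients, so that a drop both relieves delay immediately and signals
        the endpoints sharing the queue to back off."""

        def __init__(self, transient_s=0.010):   # how long >1 packet may persist (illustrative)
            self.transient_s = transient_s
            self.q = deque()
            self.over_since = None                # when the queue first exceeded one packet

        def enqueue(self, pkt):
            now = time.monotonic()
            if len(self.q) <= 1:
                self.over_since = None            # at or below target: no standing queue
            elif self.over_since is None:
                self.over_since = now             # target just exceeded: start the clock
            if self.over_since is not None and now - self.over_since > self.transient_s:
                return False                      # drop: immediate delay relief + signal to the ends
            self.q.append(pkt)
            return True

        def dequeue(self):
            return self.q.popleft() if self.q else None

Everything else - how the ends respond to the drop - stays at the endpoints, which is the whole point.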
> Another issue is that endpoints are not aware of the fact that packets can take multiple paths to any destination. In the future, alternate path choices can be made by routers (when we get smarter routing algorithms based on traffic engineering).
>
> So again, some minimal kind of information must be exposed to endpoints that will continue to communicate. Again, the routers must be able to help a wide variety of endpoints with different use cases to decide how to move queue buildup out of the network itself.
>
> Now the decisions made by the endpoints must be made in the context of information about fairness. Maybe this is what is not obvious.
>
> The most obvious notion of fairness is equal shares among source-host/destination-host pairs. There are drawbacks to that, but the benefit is that it involves the IP layer alone, and it deals with lots of boundary cases, like the case where a single host opens a zillion TCP connections or uses lots of UDP source ports or destinations to somehow "cheat" by appearing to have "lots of flows".
>
> Another way to deal with dividing up flows is to ignore higher-level protocol information entirely and put the flow identification in the IP layer. A 32-bit or 64-bit random number could be added as an "option" to IP to somehow extend the flow space.
>
> But that is not the most important thing today.
>
> I write this to say:
> 1) some kind of per-flow queueing, during the transient state where a queue is overloaded before packets are dropped, would provide much-needed information to the ends of every flow sharing a common queue.
> 2) per-flow queueing, minimized to a very low level, using IP envelope address information (plus maybe UDP and TCP addresses, i.e. ports, for those protocols in an extended address-based flow definition) is totally compatible with end-to-end arguments, but ONLY if the decisions made are certain to drive queueing delay out of the router to the endpoints.
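To illustrate how little mechanism 2) implies, here is a toy classifier - the hash function, the 1024-queue count, and the helper names are arbitrary choices of mine, not a proposal for any particular qdisc - keyed on the IP envelope alone, with an optional port-extended variant:

    import hashlib

    NUM_QUEUES = 1024   # arbitrary, for illustration only

    def host_pair_queue(src_ip: str, dst_ip: str) -> int:
        """Map a packet to a queue using only the IP envelope (source host, dest host).

        Opening a zillion TCP connections, or spraying UDP ports, buys nothing:
        every packet between the same two hosts lands in the same queue and
        therefore gets one share of the link."""
        digest = hashlib.sha256(f"{src_ip}->{dst_ip}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % NUM_QUEUES

    def extended_flow_queue(src_ip, dst_ip, proto=None, sport=None, dport=None) -> int:
        """Extended address-based flow definition: fold in transport ports for
        protocols that have them, still using only fields visible in the packet
        headers, with no per-flow state kept beyond the queue itself."""
        key = f"{src_ip}->{dst_ip}/{proto}/{sport}/{dport}".encode()
        return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % NUM_QUEUES

Either way the queues stay tiny; the classification exists only so that the ends sharing a queue can be told, promptly, to get their traffic out of it.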
>
> On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" said:
>
>> Dear Bob, dear IETF team,
>>
>>> On Jun 19, 2019, at 16:12, Bob Briscoe wrote:
>>>
>>> Jake, all,
>>>
>>> You may not be aware of my long history of concern about how per-flow scheduling within endpoints and networks will limit the Internet in future. I find per-flow scheduling a violation of the e2e principle in such a profound way - the dynamic choice of the spacing between packets - that most people don't even associate it with the e2e principle.
>>
>> This does not rhyme well with the stated L4S advantage of allowing packet reordering (due to mandating RACK for all L4S TCP endpoints), because surely changing the order of packets messes up "the dynamic choice of the spacing between packets" in a significant way. IMHO, either L4S is great because it will give intermediate hops more leeway to re-order packets, or "a sender's packet spacing" is sacred; please make up your mind which it is.
>>
>>> I detected that you were talking about FQ in a way that might have assumed my concern with it was just about implementation complexity. If you (or anyone watching) are not aware of the architectural concerns with per-flow scheduling, I can enumerate them.
>>
>> Please do not hesitate to do so after your deserved holiday, and please state a superior alternative.
>>
>> Best Regards
>> Sebastian
>>
>>> I originally started working on what became L4S to prove that it was possible to separate out reducing queuing delay from throughput scheduling. When Koen and I started working together on this, we discovered we had identical concerns on this.
>>>
>>> Bob
>>>
>>> --
>>> ________________________________________________________________
>>> Bob Briscoe                               http://bobbriscoe.net/

_______________________________________________
Ecn-sane mailing list
Ecn-sane@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/ecn-sane