From: "David P. Reed" <dpreed@deepplum.com>
To: "David P. Reed"
Cc: "Sebastian Moeller", "ecn-sane@lists.bufferbloat.net", "Bob Briscoe", "tsvwg IETF list"
Date: Wed, 17 Jul 2019 18:34:15 -0400 (EDT)
Subject: Re: [Ecn-sane] per-flow scheduling
Message-ID: <1563402855.88484511@apps.rackspace.com>
In-Reply-To: <1563401917.00951412@apps.rackspace.com>

A follow-up point that I think needs to be made is one more end-to-end argument:

It is NOT the job of the IP transport layer to provide free storage for low-priority packets. The end-to-end argument here says: the ends can and must hold packets until they are either delivered or no longer relevant (in RTP, packets become irrelevant once they are older than their desired delivery time, if you want an example of the latter), so the network should not provide the function of storage beyond the minimum needed to deal with transients.

That means, unfortunately, that the dream of some kind of "background" path that stores "low priority" packets in the network fails the end-to-end argument test.
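To see how little the ends need from the network to do this, here is a toy sketch (the class name and the 150 ms relevance budget are purely illustrative inventions of mine, not taken from any real RTP stack): the sender holds its own packets and simply discards anything that has outlived its usefulness, instead of asking routers to warehouse it.

    import time
    from collections import deque

    class DeadlineSender:
        """Toy endpoint that does its own storage.

        Packets wait here, at the end, and anything whose delivery deadline has
        passed is discarded at the edge -- the network is never asked to hold
        more than what is actually in flight."""

        def __init__(self, relevance_budget_s=0.150):   # illustrative RTP-style budget
            self.relevance_budget_s = relevance_budget_s
            self.queue = deque()

        def enqueue(self, payload):
            # Each packet records when it stops being useful to the receiver.
            self.queue.append((time.monotonic() + self.relevance_budget_s, payload))

        def next_packet(self):
            # Called whenever the path can accept one more packet (pacing/cwnd allows).
            now = time.monotonic()
            while self.queue:
                deadline, payload = self.queue.popleft()
                if deadline >= now:
                    return payload   # still relevant: hand it to the network now
                # expired: the endpoint absorbs the loss; no router stored it
            return None

The point is only that the holding function lives entirely in the endpoint; nothing in the network needs to know the deadline exists.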
If you think about this, it even applies to some imaginary interplanetary IP-layer network. Queueing delay is not a feature of any end-to-end requirement.

What may be desired at the router/link level in an interplanetary IP layer is holding packets because a link is actually down, or using link-level error-correction coding or retransmission to bring the error rate down to an acceptable level before declaring it down. But that's quite different - it's the link-level protocol, which aims to deliver minimum queueing delay under tough conditions, without buffering more than needed for that (the number of bits that fit in the light-speed transmission path at the transmission rate).

So the main reason I'm saying this is because, again, there are those who want to implement the TCP function of reliable delivery of each packet in the links. That's a very bad idea.

On Wednesday, July 17, 2019 6:18pm, "David P. Reed" said:

> I do want to toss in my personal observations about the "end-to-end argument" related to per-flow scheduling. (Such arguments are, of course, a class of arguments to which my name is attached. Not that I am a judge/jury of such questions...)
>
> A core principle of the Internet design is to move function out of the network, including routers and middleboxes, if those functions
>
> a) can be properly accomplished by the endpoints, and
> b) are not relevant to all uses of the Internet transport fabric being used by the ends.
>
> The rationale here has always seemed obvious to me. As Bob Briscoe suggests, we were very wary of throwing features into the network that would preclude unanticipated future interoperability needs, new applications, and new technology in the infrastructure of the Internet as a whole.
>
> So what are we talking about here (ignoring the fine points of SCE, some of which I think are debatable - especially the focus on TCP alone, since much traffic will likely move away from TCP in the near future)?
>
> A second technical requirement (necessary invariant) of the Internet's transport is that the entire Internet depends on rigorously stopping queueing delay from building up anywhere except at the endpoints, where the ends can manage it. This is absolutely critical, though it is peculiar in that many engineers, especially those who work at the IP layer and below, have a mental model of routing as essentially being about building up queueing delay (in order to manage priority in some trivial way by building up the queue on purpose, apparently).
>
> This second technical requirement cannot be satisfied merely by the endpoints. The reason is that the endpoints cannot know accurately which host-host paths share common queues.
>
> This lack of a way to "cooperate" among independent users of a queue cannot be solved by a purely end-to-end solution. (Well, I suppose some genius might invent a way, but I have not seen one in my 36 years of closely watching the Internet in operation since it went live in 1983.)
>
> So what the end-to-end argument would tend to do here, in my opinion, is to provide the most minimal mechanism in the devices that are capable of building up a queue, in order to allow all the ends sharing that queue to do their job - which is to stop filling up the queue!
>
> Only the endpoints can prevent filling up queues. And depending on the protocol, they may need to make very different, yet compatible, choices.
>
> This is a question of design at the architectural level. And the future matters.
>
> So there is an end-to-end argument to be made here, but it is a subtle one.
>
> The basic mechanism for controlling queue depth has been, and remains, quite simple: dropping packets. This has two impacts: 1) immediately reducing queueing delay, and 2) signalling to endpoints that are paying attention that they have contributed to an overfull queue.
>
> The optimum queueing delay in a steady state would always be one packet or less. Kleinrock has shown this in the last few years. Of course there aren't steady states. But we don't want a mechanism that can't converge to that steady state *quickly*, for all queues in the network.
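To be concrete about how minimal that mechanism can be, here is a toy sketch of a bottleneck queue whose only tool is dropping. The one-packet target comes from the Kleinrock observation above, but the 10 ms transient allowance, the names, and the drop-on-arrival choice are arbitrary illustrations of mine, not any deployed AQM:

    import time
    from collections import deque

    class OnePacketTargetQueue:
        """Toy bottleneck queue: the only control action is dropping packets.

        It aims for a standing queue of about one packet, tolerating brief
        transients, so that a drop both relieves delay immediately and signals
        the endpoints sharing the queue to back off."""

        def __init__(self, transient_s=0.010):   # how long >1 packet may persist (illustrative)
            self.transient_s = transient_s
            self.q = deque()
            self.over_since = None                # when the queue first exceeded one packet

        def enqueue(self, pkt):
            now = time.monotonic()
            if len(self.q) <= 1:
                self.over_since = None            # at or below target: no standing queue
            elif self.over_since is None:
                self.over_since = now             # target just exceeded: start the clock
            if self.over_since is not None and now - self.over_since > self.transient_s:
                return False                      # drop: immediate delay relief + signal to the ends
            self.q.append(pkt)
            return True

        def dequeue(self):
            return self.q.popleft() if self.q else None

Everything else - how the ends respond to the drop - stays at the endpoints, which is the whole point.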
> Another issue is that endpoints are not aware of the fact that packets can take multiple paths to any destination. In the future, alternate path choices can be made by routers (when we get smarter routing algorithms based on traffic engineering).
>
> So again, some minimal kind of information must be exposed to endpoints that will continue to communicate. Again, the routers must be able to help a wide variety of endpoints with different use cases to decide how to move queue buildup out of the network itself.
>
> Now the decisions made by the endpoints must be made in the context of information about fairness. Maybe this is what is not obvious.
>
> The most obvious notion of fairness is equal shares among source-host/destination-host pairs. There are drawbacks to that, but the benefit is that it involves the IP layer alone, and it deals with lots of boundary cases, like the case where a single host opens a zillion TCP connections or uses lots of UDP source ports or destinations to somehow "cheat" by appearing to have "lots of flows".
>
> Another way to deal with dividing up flows is to ignore higher-level protocol information entirely and put the flow identification in the IP layer. A 32-bit or 64-bit random number could be added as an "option" to IP to somehow extend the flow space.
>
> But that is not the most important thing today.
>
> I write this to say:
> 1) some kind of per-flow queueing, during the transient state where a queue is overloaded before packets are dropped, would provide much-needed information to the ends of every flow sharing a common queue.
> 2) per-flow queueing, minimized to a very low level, using IP envelope address information (plus maybe UDP and TCP addresses, i.e. ports, for those protocols in an extended address-based flow definition) is totally compatible with end-to-end arguments, but ONLY if the decisions made are certain to drive queueing delay out of the router to the endpoints.
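To illustrate how little mechanism 2) implies, here is a toy classifier - the hash function, the 1024-queue count, and the helper names are arbitrary choices of mine, not a proposal for any particular qdisc - keyed on the IP envelope alone, with an optional port-extended variant:

    import hashlib

    NUM_QUEUES = 1024   # arbitrary, for illustration only

    def host_pair_queue(src_ip: str, dst_ip: str) -> int:
        """Map a packet to a queue using only the IP envelope (source host, dest host).

        Opening a zillion TCP connections, or spraying UDP ports, buys nothing:
        every packet between the same two hosts lands in the same queue and
        therefore gets one share of the link."""
        digest = hashlib.sha256(f"{src_ip}->{dst_ip}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % NUM_QUEUES

    def extended_flow_queue(src_ip, dst_ip, proto=None, sport=None, dport=None) -> int:
        """Extended address-based flow definition: fold in transport ports for
        protocols that have them, still using only fields visible in the packet
        headers, with no per-flow state kept beyond the queue itself."""
        key = f"{src_ip}->{dst_ip}/{proto}/{sport}/{dport}".encode()
        return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % NUM_QUEUES

Either way the queues stay tiny; the classification exists only so that the ends sharing a queue can be told, promptly, to get their traffic out of it.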
>
> On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" said:
>
>> Dear Bob, dear IETF team,
>>
>>> On Jun 19, 2019, at 16:12, Bob Briscoe wrote:
>>>
>>> Jake, all,
>>>
>>> You may not be aware of my long history of concern about how per-flow scheduling within endpoints and networks will limit the Internet in future. I find per-flow scheduling a violation of the e2e principle in such a profound way - the dynamic choice of the spacing between packets - that most people don't even associate it with the e2e principle.
>>
>> This does not rhyme well with the stated L4S advantage of allowing packet reordering (due to mandating RACK for all L4S TCP endpoints), because surely changing the order of packets messes up "the dynamic choice of the spacing between packets" in a significant way. IMHO, either L4S is great because it will give intermediate hops more leeway to re-order packets, or "a sender's packet spacing" is sacred; please make up your mind which it is.
>>
>>> I detected that you were talking about FQ in a way that might have assumed my concern with it was just about implementation complexity. If you (or anyone watching) are not aware of the architectural concerns with per-flow scheduling, I can enumerate them.
>>
>> Please do not hesitate to do so after your deserved holiday, and please state a superior alternative.
>>
>> Best Regards
>> Sebastian
>>
>>> I originally started working on what became L4S to prove that it was possible to separate out reducing queuing delay from throughput scheduling. When Koen and I started working together on this, we discovered we had identical concerns on this.
>>>
>>> Bob
>>>
>>> --
>>> ________________________________________________________________
>>> Bob Briscoe                               http://bobbriscoe.net/

_______________________________________________
Ecn-sane mailing list
Ecn-sane@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/ecn-sane