From: "David P. Reed" <dpreed@deepplum.com>
To: "Dave Taht"
Cc: ecn-sane@lists.bufferbloat.net, "Bob Briscoe", tsvwg IETF list
Date: Thu, 18 Jul 2019 11:02:12 -0400 (EDT)
Subject: Re: [Ecn-sane] per-flow scheduling

Dave -

The context of my remarks was the end-to-end arguments for placing function in the Internet.

To that end, the fact that "you do not mind putting storage for low priority packets in the routers" doesn't matter, for two important reasons:

1) The idea that one should "throw in a feature" because people "don't mind" is exactly what leads to feature creep of the worst kind - features that serve absolutely no real purpose. That's what we rigorously objected to in the late 1970s. No, we would NOT throw in features merely because they were "requested" and we didn't mind.

2) You have made no argument that the function cannot be done properly at the ends, and no argument that putting it in the network is necessary for the ends to achieve storage.

On Wednesday, July 17, 2019 7:23pm, "Dave Taht" said:

> On Wed, Jul 17, 2019 at 3:34 PM David P. Reed wrote:
>>
>> A follow-up point that I think needs to be made is one more end-to-end argument:
>>
>> It is NOT the job of the IP transport layer to provide free storage for low-priority packets.
>> The end-to-end argument here says: the ends can and must hold packets until they are either delivered or no longer relevant (in RTP, they become irrelevant when they get older than their desired delivery time, if you want an example of the latter). SO, the network should not provide the function of storage beyond the minimum needed to deal with transients.
>>
>> That means, unfortunately, that the dream of some kind of "background" path that stores "low priority" packets in the network fails the end-to-end argument test.
>
> I do not mind reserving a tiny portion of the network for "background"
> traffic. This is different (I think?) from storing low-priority packets in the
> network. A background traffic "queue" of 1 packet would be fine....
>
>> If you think about this, it even applies to some imaginary interplanetary IP-layer network. Queueing delay is not a feature of any end-to-end requirement.
>>
>> What may be desired at the router/link level in an interplanetary IP layer is holding packets because a link is actually down, or using link-level error-correction coding or retransmission to bring the error rate down to an acceptable level before declaring it down. But that's quite different - it's the link-level protocol, which aims to deliver minimum queueing delay under tough conditions, without buffering more than needed for that (the number of bits that fit in the light-speed transmission at the transmission rate).
>
> As I outlined in my MIT wifi talk, one layer of retry at the wifi mac layer
> made it work, in 1998, and that seemed a very acceptable compromise at the
> time.
> Present-day retries at that layer, not congestion controlled, are totally out of hand.
>
> In thinking about starlink's mac, and mobility, I gradually came to the
> conclusion that one retry from satellites 550km up (3.6ms rtt) was needed,
> as much as I disliked the idea.
>
> I still dislike retries at layer 2, even for nearby sats. It really
> complicates things. So for all I know I'll be advocating ripping 'em
> out in starlink, if they are indeed in there, next week.
>
>> So, the main reason I'm saying this is because, again, there are those who want to implement the TCP function of reliable delivery of each packet in the links. That's a very bad idea.
>
> It was tried in the arpanet, and didn't work well there. There's a
> good story about many of the flaws of the Arpanet's design, including that
> problem, in the latter half of Kleinrock's second book on queueing theory,
> at least the first edition...
>
> Wifi (and 3G/4G/5G) re-introduced the same problem with retransmits and
> block acks at layer 2.
>
> And after dissecting my ecn battlemesh data and observing what the
> retries at the mac layer STILL do on wifi with the current default
> wifi codel target (20ms AFTER two txops are in the hardware) currently
> achieve (50ms, which is 10x worse than what we could do, and still
> better performance under load than any other shipping physical layer
> we have with fifos)...
> and after thinking hard about Nagle's thought
> that "every application has a right to one packet in the network", and
> this very long thread reworking the end-to-end argument in a similar,
> but not quite identical, direction, I'm coming to a couple of conclusions
> I'd possibly not quite expressed well before.
>
> 1) Transports should treat an RFC 3168 CE coupled with loss (drop and
> mark) as an even stronger signal of congestion than either alone, and
> this bit of the codel algorithm, when ecn is in use, is wrong, and has
> always been wrong:
>
> https://github.com/dtaht/fq_codel_fast/blob/master/codel_impl.h#L178
>
> (We added this arbitrarily to codel on the 5th day of development in
> 2012. Using FQ masked its effects on light traffic.)
>
> What it should do instead is peek the queue and drop until it hits a
> markable packet, at the very least.
>
> Pie has an arbitrary drop-at-10% figure, which does lighten the load
> some... cake used to have drop-and-mark also, until a year or two
> back...
>
> 2) At low rates and high contention, we really need pacing and fractional cwnd.
>
> (While I would very much like to see a dynamic reduction of MSS tried,
> that too has a bottom limit.)
>
> Even then, drop as per bullet 1.
>
> 3) In the end, I could see a world with SCE marks, and CE being
> obsoleted in favor of drop, or CE only being exerted on really light
> loads similar to (or less than!) the arbitrary 10% figure pie uses.
>
> 4) In all cases, I vastly prefer somehow ultimately shifting greedy
> transports to RTT rather than drop or CE as their primary congestion-control
> indicator. FQ makes that feasible today.
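A minimal sketch of the "peek the queue and drop until it hits a markable packet" behaviour proposed in point 1 above, in Python rather than the actual fq_codel_fast C code; the `Packet` type, its `ect`/`ce` fields, and the deque-based queue are hypothetical stand-ins for illustration only:

```python
from collections import deque

class Packet:
    """Hypothetical packet record; 'ect' means the flow negotiated RFC 3168 ECN."""
    def __init__(self, data, ect=False):
        self.data = data
        self.ect = ect     # eligible for CE marking
        self.ce = False    # CE mark applied by the AQM

def drop_until_markable(queue):
    """On a codel congestion event, drop packets from the head of the
    queue until a markable (ECT) packet is found, then CE-mark and
    deliver that packet.  Returns None if nothing was markable."""
    while queue:
        pkt = queue.popleft()
        if pkt.ect:
            pkt.ce = True   # mark instead of dropping
            return pkt
        # non-ECT packet: dropped; the loss itself is the congestion signal
    return None
```

For example, with a queue holding two non-ECT packets ahead of an ECT one, the two head packets are dropped and the ECT packet comes out CE-marked, so the sender sees both loss and a mark, the stronger combined signal described in point 1.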
> With enough FQ deployed for enough congestive scenarios and hardware, and RTT
> becoming the core indicator for more transports, single-queued designs
> become possible in the distant future.
>
>
>> On Wednesday, July 17, 2019 6:18pm, "David P. Reed" said:
>>
>> > I do want to toss in my personal observations about the "end-to-end argument" related to per-flow scheduling. (Such arguments are, of course, a class of arguments to which my name is attached. Not that I am a judge/jury of such questions...)
>> >
>> > A core principle of the Internet design is to move function out of the network, including routers and middleboxes, if those functions
>> >
>> > a) can be properly accomplished by the endpoints, and
>> > b) are not relevant to all uses of the Internet transport fabric being used by the ends.
>> >
>> > The rationale here has always seemed obvious to me. Like Bob Briscoe suggests, we were very wary of throwing features into the network that would preclude unanticipated future interoperability needs, new applications, and new technology in the infrastructure of the Internet as a whole.
>> >
>> > So what are we talking about here (ignoring the fine points of SCE, some of which I think are debatable - especially the focus on TCP alone, since much traffic will likely move away from TCP in the near future)?
>> >
>> > A second technical requirement (necessary invariant) of the Internet's transport is that the entire Internet depends on rigorously stopping queueing delay from building up anywhere except at the endpoints, where the ends can manage it. This is absolutely critical, though it is peculiar in that many engineers, especially those who work at the IP layer and below, have a mental model of routing as essentially being about
>> > building up queueing delay (in order to manage priority in some trivial way by building up the queue on purpose, apparently).
>> >
>> > This second technical requirement cannot be satisfied merely by the endpoints. The reason is that the endpoints cannot know accurately which host-to-host paths share common queues.
>> >
>> > This lack of a way to "cooperate" among independent users of a queue cannot be solved by a purely end-to-end solution. (Well, I suppose some genius might invent a way, but I have not seen one in my 36 years closely watching the Internet in operation since it went live in 1983.)
>> >
>> > So, what the end-to-end argument would tend to do here, in my opinion, is to provide the most minimal mechanism in the devices that are capable of building up a queue, in order to allow all the ends sharing that queue to do their job - which is to stop filling up the queue!
>> >
>> > Only the endpoints can prevent filling up queues. And depending on the protocol, they may need to make very different, yet compatible, choices.
>> >
>> > This is a question of design at the architectural level. And the future matters.
>> >
>> > So there is an end-to-end argument to be made here, but it is a subtle one.
>> >
>> > The basic mechanism for controlling queue depth has been, and remains, quite simple: dropping packets. This has two impacts: 1) immediately reducing queueing delay, and 2) signalling to endpoints that are paying attention that they have contributed to an overfull queue.
>> >
>> > The optimum queueing delay in a steady state would always be one packet or less. Kleinrock has shown this in the last few years. Of course there aren't steady states.
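The drop mechanism Reed describes has an endpoint half: a sender that is "paying attention" reacts to the loss signal by shrinking its window. The textbook additive-increase/multiplicative-decrease response can be sketched as follows (illustrative only, not any specific TCP stack; the function name and constants are hypothetical):

```python
def aimd_update(cwnd, loss_detected, add=1.0, mult=0.5):
    """One round's worth of a textbook AIMD reaction to the drop
    signal: on loss, back off multiplicatively (never below one
    segment); otherwise, probe gently with additive increase."""
    if loss_detected:
        return max(1.0, cwnd * mult)   # multiplicative decrease
    return cwnd + add                  # additive increase
```

The point of the sketch is the division of labor: the router's only job is to drop (or mark) when the queue builds, and this per-endpoint rule is what actually drains the queue back toward the one-packet steady state.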
>> > But we don't want a mechanism that can't converge to that steady state *quickly*, for all queues in the network.
>> >
>> > Another issue is that endpoints are not aware of the fact that packets can take multiple paths to any destination. In the future, alternate path choices can be made by routers (when we get smarter routing algorithms based on traffic engineering).
>> >
>> > So again, some minimal kind of information must be exposed to endpoints that will continue to communicate. Again, the routers must be able to help a wide variety of endpoints with different use cases to decide how to move queue buildup out of the network itself.
>> >
>> > Now the decision made by the endpoints must be made in the context of information about fairness. Maybe this is what is not obvious.
>> >
>> > The most obvious notion of fairness is equal shares among (source host, destination host) pairs. There are drawbacks to that, but the benefit is that it affects the IP layer alone, and deals with lots of boundary cases, like the case where a single host opens a zillion TCP connections, or uses lots of UDP source ports or destinations, to somehow "cheat" by appearing to have "lots of flows".
>> >
>> > Another way to deal with dividing up flows is to ignore higher-level protocol information entirely, and put the flow identification in the IP layer.
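The host-pair fairness notion above amounts to classifying packets by (source host, destination host) only, never by port. A minimal sketch of such a classifier, with a hypothetical function name and queue count:

```python
import hashlib

def host_pair_queue(src_ip, dst_ip, n_queues=1024):
    """Pick a queue keyed only on the (source host, destination host)
    pair.  Ports are deliberately not part of the key, so a host that
    opens a zillion TCP connections (or sprays UDP source ports) toward
    one destination still shares a single queue and gains nothing."""
    key = ("%s|%s" % (src_ip, dst_ip)).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_queues
```

Because the key ignores transport headers, this can live entirely at the IP layer, which is exactly the benefit Reed cites for this fairness definition.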
>> > A 32-bit or 64-bit random number could be added as an "option" to IP to somehow extend the flow space.
>> >
>> > But that is not the most important thing today.
>> >
>> > I write this to say:
>> > 1) Some kind of per-flow queueing, during the transient state where a queue is overloaded before packets are dropped, would provide much-needed information to the ends of every flow sharing a common queue.
>> > 2) Per-flow queueing, minimized to a very low level, using IP envelope address information (plus maybe UDP and TCP addresses, for those protocols, in an extended address-based flow definition) is totally compatible with end-to-end arguments, but ONLY if the decisions made are certain to drive queueing delay out of the router to the endpoints.
>> >
>> >
>> >
>> > On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" said:
>> >
>> >> Dear Bob, dear IETF team,
>> >>
>> >>
>> >>> On Jun 19, 2019, at 16:12, Bob Briscoe wrote:
>> >>>
>> >>> Jake, all,
>> >>>
>> >>> You may not be aware of my long history of concern about how per-flow scheduling within endpoints and networks will limit the Internet in future. I find per-flow scheduling a violation of the e2e principle in such a profound way - the dynamic choice of the spacing between packets - that most people don't even associate it with the e2e principle.
>> >>
>> >> This does not rhyme well with the L4S stated advantage of allowing packet reordering (due to mandating RACK for all L4S TCP endpoints). Because surely changing the order of packets messes up "the dynamic choice of the spacing between packets" in a significant way.
>> >> IMHO, either L4S is great because it will give intermediate hops more leeway to re-order packets, or "a sender's packet spacing" is sacred; please make up your mind which it is.
>> >>
>> >>>
>> >>> I detected that you were talking about FQ in a way that might have assumed my concern with it was just about implementation complexity. If you (or anyone watching) is not aware of the architectural concerns with per-flow scheduling, I can enumerate them.
>> >>
>> >> Please do not hesitate to do so after your deserved holiday, and please state a superior alternative.
>> >>
>> >> Best Regards
>> >> Sebastian
>> >>
>> >>
>> >>>
>> >>> I originally started working on what became L4S to prove that it was possible to separate out reducing queuing delay from throughput scheduling. When Koen and I started working together on this, we discovered we had identical concerns on this.
>> >>>
>> >>>
>> >>> Bob
>> >>>
>> >>> --
>> >>> ________________________________________________________________
>> >>> Bob Briscoe                               http://bobbriscoe.net/
>> >>>
>> >>> _______________________________________________
>> >>> Ecn-sane mailing list
>> >>> Ecn-sane@lists.bufferbloat.net
>> >>> https://lists.bufferbloat.net/listinfo/ecn-sane
>
>
> --
>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740