Date: Wed, 26 Jun 2019 12:53:02 -0400 (EDT)
From: "David P. Reed" <dpreed@deepplum.com>
To: "David P. Reed" <dpreed@deepplum.com>
Cc: "Sebastian Moeller" <moeller0@gmx.de>, "ecn-sane@lists.bufferbloat.net", "Brian E Carpenter", "tsvwg IETF list"
Subject: Re: [Ecn-sane] [tsvwg] per-flow scheduling

A further minor thought, maybe one that need not be said:

Flows aren't "connections". Routers are not involved in connection state management, which is purely part of the end-to-end protocol. Anything about "connections" that a router might need to know to handle a packet should be packaged into the IP header of each packet in a standard form. Routers can "store" this information, associated with the (source, destination) pair, if they want, for a short time, subject to well-understood semantics for when they run out of storage. This fits into an end-to-end argument as an optimization of a kind, as long as the function of such information is very narrowly and generally defined, to benefit all users of IP-based protocols.

For example, remembering, for a short time, when the last packet of a particular flow was forwarded, in order to calculate fairness: that seems like a very useful idea, as long as forgetting that timestamp is not unfair.

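To make that concrete, here is a toy sketch (Python, with made-up names and an illustrative expiry window - not a design for a real forwarding path) of such short-term, deliberately forgettable per-flow memory:

    import time

    FLOW_MEMORY_WINDOW = 0.5  # seconds of history to keep; illustrative only

    # (src, dst) -> time the last packet of that flow was forwarded
    last_seen = {}

    def note_packet(src, dst, now=None):
        """Record a forwarded packet and return the inter-packet gap.
        A small gap means the flow is currently sending fast - exactly
        the kind of hint a fairness decision could use."""
        now = time.monotonic() if now is None else now
        gap = now - last_seen.get((src, dst), now)
        last_seen[(src, dst)] = now
        return gap

    def expire_old_flows(now=None):
        """Forget flows not seen within the window. Running out of memory
        then just means treating a flow as new - which is not unfair."""
        now = time.monotonic() if now is None else now
        stale = [k for k, t in last_seen.items() if now - t > FLOW_MEMORY_WINDOW]
        for k in stale:
            del last_seen[k]
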
This use of the flow's IP headers to carry information into router queueing and routing decisions is analogous to the "Fate Sharing" principle of protocol design that DDC (David D. Clark) describes. Instead of having an independent control-plane protocol, which has all kinds of synchronization problems and combinatorial packet-loss problems, "Fate Sharing" of protocol information is very elegant.

On Wednesday, June 26, 2019 12:31pm, "David P. Reed" <dpreed@deepplum.com> said:

It's the limiting case, but also the optimal state given "perfect knowledge".

Yes, it requires that the source-destination pairs sharing the link in question coordinate their packet admission times so they don't "collide" at the link. Ideally the next packet would arrive during the previous packet's transmission, so it is ready to go when that packet's transmission ends.

Such exquisite coordination is feasible only when the future behavior of source and destination at the interface is known, which requires an oracle.

That's the same kind of condition most information-theoretic and queueing-theoretic optimality results require.

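A toy single-link simulation (my own construction, purely to illustrate the point) shows how knife-edge that ideal is: with oracle pacing the queue never forms even at 100% load, while a little arrival jitter at the same load immediately creates transient backlog:

    import random

    def worst_wait(arrival_times, service_time):
        """FIFO link: the worst time any packet waits behind others
        before its own transmission starts."""
        link_free_at, worst = 0.0, 0.0
        for t in sorted(arrival_times):
            worst = max(worst, link_free_at - t)  # time spent queued, if any
            link_free_at = max(link_free_at, t) + service_time
        return worst

    S = 1.0                                # transmission time per packet
    oracle = [i * S for i in range(1000)]  # each packet lands exactly as the
                                           # previous transmission ends
    jittered = [t + random.uniform(-0.2, 0.2) * S for t in oracle]

    print(worst_wait(oracle, S))    # 0.0: full utilization, yet no queue
    print(worst_wait(jittered, S))  # > 0: same load, transient queues appear
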
But this is worth keeping in mind as the overall joint goal of all users.

In particular, "link utilization" isn't a user goal at all. The link is there and is being paid for whether it is used or not (looking at the network structure as a whole). Its capacity exists to move packets out of the way. An ideal link never creates a queue because of anything other than imperfect coordination of the end-to-end flows mapped onto it. That's why a router should no more be measured by "link utilization" than a tunnel in a city during commuting hours should be measured by cars moved per hour. Clearly a tunnel can be VERY congested and still moving many cars if they are attached to each other bumper to bumper - the latency through the tunnel is then huge. If the cars were tipped on their ends and stacked, even more throughput would be achieved, and the rotating and packing would add even more delay.

The idea that "link utilization" of 100% must be achieved= is why we got bufferbloat designed into routers. It's a worm's eye perspec= tive. To this day, Arista Networks brags about how its bufferbloated featur= e design optimizes switch utilization (https://packetpushers.net/aristas-big-buffer-b-s/= ). And it selects benchmarks to "prove" it. Andy Bechtolsheim apparentl= y is such a big name that he can sell defective gear at a premium price, le= tting the datacenters who buy it discover that those switches get "clogged = up" by TCP traffic when they are the "bottleneck link". Fortunately, they a= re fast, so they are less frequently the bottleneck in datacenter daily use= .

=0A

 

=0A

In trying to understand what is going on with congestion signalling, any buffering at the entry to a link should be due only to imperfect information being fed back to the endpoints generating traffic - because a misbehaving endpoint creates denial of service for all other users.

Priority mechanisms focused on protecting high-paying users from low-paying ones don't help much - they only matter in overloaded states of the network. Which isn't to say that priority does nothing; it's just that stable assignment of a sharing level to priority levels isn't easy. (See Paris Metro Pricing, where there are only two classes, and even deciding how to manage access to the "first class" section is a problem - the idea that 15 classes with different metrics can be handled simply and interoperably between differently managed autonomous systems seems an incredibly impractical goal.)

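A back-of-the-envelope fluid model (my own illustration, not anyone's product) makes the "only matters at overload" point concrete: strict priority merely decides who absorbs the overload; it does not remove it.

    def shares(rate_first, rate_second, capacity):
        """Two-class, Paris-Metro-style link under strict priority.
        Returns the served rate of each class plus the excess traffic
        that must queue or be dropped."""
        first = min(rate_first, capacity)
        second = min(rate_second, capacity - first)
        return first, second, (rate_first + rate_second) - (first + second)

    # Underloaded: priority is irrelevant; everything gets through.
    print(shares(0.3, 0.4, 1.0))  # (0.3, 0.4, 0.0)

    # Overloaded: first class is protected, but the excess (0.3 here)
    # still piles up somewhere - priority didn't make it disappear.
    print(shares(0.6, 0.7, 1.0))  # (0.6, 0.4, 0.3)
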
Even in the priority case, buffering is NOT a desirable end-user thing.

My personal view is that the manager of a network needs to configure the network so that no link ever gets overloaded, if possible. The response to overload should be to tell the relevant flows to all slow down - not just one, because if there are 100 flows that started at roughly the same time, causing multiplicative decrease on just one of them does very little. This is an example of something where per-flow state in the router actually makes the router helpful in the large scheme of things. Maybe all flows should be equally informed, as flows - which means the router needs to know how to signal multiple flows, while not just hammering all the packets of a single flow. This case is very real, though less frequent on the client side than on the "server side", in "load balancers" and such like.

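As a sketch of what "equally informing all the flows" might look like (my illustration, not a deployed AQM), a router could spread congestion signals across distinct flows - at most one mark per flow per round of overload - with the flow identity taken straight from the packet's source/destination addresses:

    from collections import OrderedDict

    recently_signaled = OrderedDict()  # flow-id -> True, oldest first
    FLOWS_REMEMBERED = 64              # short-term memory size; illustrative

    def should_mark(flow_id, overloaded):
        """True if this packet should carry a congestion signal (for
        example an ECN CE mark): at most one per flow per memory window."""
        if not overloaded or flow_id in recently_signaled:
            return False
        recently_signaled[flow_id] = True
        if len(recently_signaled) > FLOWS_REMEMBERED:
            recently_signaled.popitem(last=False)  # forget the oldest flow
        return True

With 100 simultaneous flows, each one gets told to slow down once, so all of them back off a little, instead of one unlucky flow absorbing all the multiplicative decrease.
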
My point here is simple:

1) The endpoints already tell the routers what flows are going through a link - that's just the address information. So that information can be used for fairness pretty well, especially if a short-term memory (a Bloom filter, perhaps - see the sketch after this list) can track a sufficiently large number of flows.

2) The per-flow decisions related to congestion control within a flow are necessarily end-to-end in nature - the router can only tell the ends what is going on, but the ends (together - their admission rates and consumption rates are coupled to the use being made) must be informed and decide. The congestion management must combine information about future source and destination behavior (even if that is just taking recent history and projecting it forward as an estimate). Which is why it is quite natural to have routers signal the destination, which then signals the source, which changes its behavior.

3) There are definitely other ways to improve latency for IP and the protocols built on top of it. Routing some flows over different paths under congestion is one - call it per-flow routing. Another is scattering a flow over several paths, though that seems problematic for today's TCP, which assumes all packets of a connection take the same path.

4) A different, but closely coupled, view of IP is that any application-relevant buffering should be driven into the endpoints. At the source, buffering is useful to absorb variability in the rate at which the data to be sent is produced. At the destination, buffering is useful to minimize jitter, matching delivery to the consumption behavior of the application. But these buffers should not be pushed into the network, where they cause congestion for other flows sharing resources.

=0A<= p style=3D"margin:0;padding:0;margin: 0; padding: 0; font-family: arial; fo= nt-size: 12pt; overflow-wrap: break-word;">So buffering in the network shou= ld ONLY deal with the uncertainty in resource competition.

=0A

 

=0A

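Returning to point 1: a minimal sketch of the Bloom-filter idea (illustrative parameters; a real design would rotate two filters so that old flows are forgotten, per the short-term-memory semantics above):

    import hashlib

    class FlowFilter:
        """Tiny Bloom filter over (src, dst) pairs: fixed memory however
        many flows pass through; false positives are the only error."""

        def __init__(self, bits=1 << 16, hashes=4):
            self.bits, self.hashes = bits, hashes
            self.bitmap = bytearray(bits // 8)

        def _positions(self, src, dst):
            for i in range(self.hashes):
                h = hashlib.blake2b(f"{src}|{dst}|{i}".encode(), digest_size=8)
                yield int.from_bytes(h.digest(), "big") % self.bits

        def add(self, src, dst):
            for p in self._positions(src, dst):
                self.bitmap[p // 8] |= 1 << (p % 8)

        def seen(self, src, dst):
            return all(self.bitmap[p // 8] & (1 << (p % 8))
                       for p in self._positions(src, dst))

    flows = FlowFilter()
    flows.add("10.0.0.1", "10.0.0.2")
    print(flows.seen("10.0.0.1", "10.0.0.2"))  # True
    print(flows.seen("10.0.0.3", "10.0.0.4"))  # False (almost certainly)
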
This tripartite breakdown of buffering is protocol-independent. It applies to TCP, NTP, RTP, QUIC/UDP, ... It's what we (that is, me) had in mind when we split UDP out of TCP, allowing UDP-based protocols to manage source and destination buffering in the application, for all the things we thought UDP would be used for: packet speech, computer-to-computer remote procedure calls (what would be QUIC today), SATNET/interplanetary Internet connections, ...

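For the destination side of that breakdown - packet speech, say - the classic form is a playout buffer. A bare-bones sketch (fixed playout delay and made-up numbers, just to show the shape of it):

    import heapq

    class PlayoutBuffer:
        """Destination-side jitter buffer: hold each packet just long
        enough that the application sees a smooth, in-order stream."""

        def __init__(self, playout_delay):
            self.playout_delay = playout_delay  # jitter budget, in seconds
            self.pending = []  # heap of (seq, due_time, payload)

        def arrive(self, seq, payload, now):
            # Simplification: schedule playout relative to arrival time.
            heapq.heappush(self.pending, (seq, now + self.playout_delay, payload))

        def playable(self, now):
            """Release, in sequence order, packets whose playout time has come."""
            out = []
            while self.pending and self.pending[0][1] <= now:
                seq, _, payload = heapq.heappop(self.pending)
                out.append((seq, payload))
            return out

    buf = PlayoutBuffer(playout_delay=0.05)  # 50 ms jitter budget
    buf.arrive(1, "frame-1", now=0.000)
    buf.arrive(2, "frame-2", now=0.031)      # jittered arrival
    print(buf.playable(now=0.060))           # [(1, 'frame-1')]
    print(buf.playable(now=0.090))           # [(2, 'frame-2')]
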
Sadly, in the many years since the late 1970s, the tendency to think that file transfers between infinite-speed storage devices over TCP are the only relevant use of the Internet has penetrated the router design community. I can't seem to get anyone to recognize how far we are from that. No one runs benchmarks for such behavior; no one even measures anything other than the "hot rod" maximum-throughput cases.

And many egos seem to think that working on the hot-rod cases is going to make their career or sell product (e.g., the sad case of Arista).

On Wednesday, June 26, 2019 8:48am, "Sebastian Moeller" <moeller0@gmx.de> said:

> > On Jun 23, 2019, at 00:09, David P. Reed <dpreed@deepplum.com> wrote:
> >
> > [...]
> >
> > per-flow scheduling is appropriate on a shared link. However, the end-to-end
> > argument would suggest that the network not try to divine which flows get
> > preferred.
> > And beyond the end-to-end argument, there's a practical problem - since the
> > ideal state of a shared link means that it ought to have no local backlog in
> > the queue, the information needed to schedule "fairly" isn't in the queue
> > backlog itself. If there is only one packet, what's to schedule?
> >
> [...]
>
> Excuse my stupidity, but the "only one single packet" case is the theoretical
> limiting case, no?
> Because even on a link not running at capacity, this effectively requires a
> mechanism to "synchronize" all senders (whose packets traverse the hop we are
> looking at), as no other packet is allowed to reach the hop unless the "current"
> one has been passed to the PHY; otherwise we transiently queue 2 packets (I note
> that this rationale should hold for any small N). The more packets per second a
> hop handles, the less likely any newcomer is to avoid running into already
> existing packet(s), that is, to transiently grow the queue.
> Not having a CS background, I fail to see how this required synchronized state
> can exist outside of a few steady-state configurations where things change
> slowly enough that the seemingly required synchronization can actually happen
> (given that the feedback loop, e.g. through ACKs, seems somewhat jittery). Since
> packets never know which path they take and which hop is going to be critical,
> there seems to be no a priori way to synchronize all senders; heck, I fail to
> see whether it would be possible at all to guarantee synchronized behavior on
> more than one hop (unless all hops are extremely uniform).
> I happen to believe that L4S suffers from the same conceptual issue (plus overly
> generic promises; from the RITE website: "We are so used to the unpredictability
> of queuing delay, we don't know how good the Internet would feel without it. The
> RITE project has developed simple technology to make queuing delay a thing of
> the past—not just for a select few apps, but for all." This seems to be missing
> a "conditions apply" statement.)
>
> Best Regards
> Sebastian