From: Sebastian Moeller
Date: Thu, 27 Jun 2019 09:49:53 +0200
To: "David P. Reed"
Cc: Jonathan Morton, "ecn-sane@lists.bufferbloat.net", Brian E Carpenter, tsvwg IETF list
Subject: Re: [Ecn-sane] [tsvwg] per-flow scheduling

Hi David,

thanks for your response.
> On Jun 26, 2019, at 18:31, David P. Reed wrote:
>
> It's the limiting case, but also the optimal state given "perfect knowledge".
>
> Yes, it requires that the source-destination pairs sharing the link in question coordinate their packet admission times so they don't "collide" at the link. Ideally the next packet would arrive during the previous packet's transmission, so it is ready-to-go when that packet's transmission ends.
>
> Such exquisite coordination is feasible when future behavior by source and destination at the interface is known, which requires an Oracle.
> That's the same kind of condition most information theoretic and queueing theoretic optimality requires.

Ah, great, I had feared I had missed something.

>
> But this is worth keeping in mind as the overall joint goal of all users.
>
> In particular, "link utilization" isn't a user goal at all. The link is there and is being paid for whether it is used or not (looking from the network structure as a whole). Its capacity exists to move packets out of the way. An ideal link satisfies the requirement that it never creates a queue because of anything other than imperfect coordination of the end-to-end flows mapped onto it. That's why the router should not be measured by "link utilization" any more than a tunnel in a city during commuting hours should be measured by cars moved per hour. Clearly a tunnel can be VERY congested and moving many cars if they are attached to each other bumper to bumper - the latency through the tunnel would then be huge. If the cars were tipped on their ends and stacked, even more throughput would be achieved through the tunnel, and the delay of rotating them and packing them would add even more delay.

+1; this is the core of the movement under the "bufferbloat" moniker: putting latency back into the spotlight where it belongs (at least for common interactive network usage; bulk transfer is a different kettle of fish). Given the relatively low rates of common internet access links, running at capacity, while not a primary goal, still happens often enough to require special treatment to keep the latency increase under load under control. Both FQ solutions and L4S offer remedies for that case. (Being a non-expert home user myself, this case is also prominent on my radar; with my ISP's backbone and peerings/transits being well managed, the access link is the one point where queueing happens, just as you describe.)

>
> The idea that "link utilization" of 100% must be achieved is why we got bufferbloat designed into routers.

While I do not subscribe to this view (and actually trade away some "top speed" to keep latency sane), a considerable fraction of home users seem obsessed with maxing out their access links and comparing achievable rates; whether such behaviour should be encouraged is a different question.

> It's a worm's eye perspective. To this day, Arista Networks brags about how its bufferbloated feature design optimizes switch utilization (https://packetpushers.net/aristas-big-buffer-b-s/). And it selects benchmarks to "prove" it. Andy Bechtolsheim apparently is such a big name that he can sell defective gear at a premium price, letting the datacenters who buy it discover that those switches get "clogged up" by TCP traffic when they are the "bottleneck link". Fortunately, they are fast, so they are less frequently the bottleneck in datacenter daily use.
>
> In trying to understand what is going on with congestion signalling, any buffering at the entry to the link should be due only to imperfect information being fed back to the endpoints generating traffic. Because a misbehaving endpoint generates Denial of Service for all other users.

This is a good point, and one of the reasons why I conceptually like flow queueing, as it provides the tools to isolate bad actors; "trust, but verify" comes to mind as a principle. I also note that the _only_ currently known L4S roll-out target (Low Latency DOCSIS) actually mandates a mechanism they call "queue protection", which to me looks pretty much like an FQ system that carefully tries not to call itself FQ (it monitors each flow's queueing contribution and, if a flow exceeds a threshold, pushes it into the RFC 3168 queue; to this layman that means it needs to separately track the packets of each flow in the common queue to be able to redirect them). (I sketch further below, after this exchange, the kind of per-flow bookkeeping I have in mind.)

>
> Priority mechanisms focused on protecting high-paying users from low-paying ones don't help much - they only help at overloaded states of the network.

In principle I agree; in practice things get complicated. Mixing latency-indifferent, capacity-devouring applications like BitTorrent with, say, VoIP packets (fixed rates, but latency sensitive) over too narrow a link will make it clear that giving the VoIP packet precedence/priority over the bulk-transfer packet is a sane policy (this becomes an issue due to the difficulty of running a narrow link below capacity). I am sure you are aware of all of this; I just need to spell it out for my thinking process.

> Which isn't to say that priority does nothing - it's just that stable assignment of a sharing level to priority levels isn't easy. (See Paris Metro Pricing, where there are only two classes, and the problem of deciding how to manage the access to the "first class" section - the idea that 15 classes with different metrics can be handled simply and interoperably between differently managed autonomous systems seems to be an incredibly impractical goal).

+1; any prioritization scheme should be extremely simple, so that an end user can easily predict its behavior. Also, IMHO three classes of latency behaviour will go a long way: "normal", "don't care", and "important" should be enough. (L4S IMHO only offers "important" and "normal", so it does not offer an easy way to downgrade, say, bulk background transfers like BitTorrent; this is going to be an issue, since BitTorrent triggers its back-off only at roughly 100 ms of induced latency increase, while L4S's RFC 3168 queue uses a PIE offspring to keep induced latency well below 100 ms, but I digress.)

> Even in the priority case, buffering is NOT a desirable end user thing.

+1; IMHO again a reason for FQ: misbehaving flows will not spoil the fun for everybody else.

>
> My personal view is that the manager of a network needs to configure the network so that no link ever gets overloaded, if possible. The response to overload should be to tell the relevant flows to all slow down (not just one, because if there are 100 flows that start up roughly at the same time, causing MD on one does very little).
> This is an example of something where per-flow stuff in the router actually makes the router helpful in the large scheme of things. Maybe all flows should be equally informed, as flows. Which means the router needs to know how to signal multiple flows, while not just hammering all the packets of a single flow.

This case is very real, but not as frequent on the client side as on the "server side", in "load balancers" and the like.
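To make the per-flow bookkeeping I keep alluding to a bit more concrete, here is a deliberately naive sketch (Python; purely illustrative, and emphatically not how fq_codel, cake, or the DOCSIS queue-protection code is actually implemented): packets are hashed by the address information the endpoints already supply into per-flow queues, and the queues are served round-robin, so an over-eager flow only ever grows its own backlog.

from collections import deque, OrderedDict

class ToyFQ:
    def __init__(self, max_backlog=100):
        self.flows = OrderedDict()   # flow key -> deque of queued packets
        self.max_backlog = max_backlog

    @staticmethod
    def flow_key(pkt):
        # Only the address information the endpoints already supply.
        return (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], pkt["proto"])

    def enqueue(self, pkt):
        q = self.flows.setdefault(self.flow_key(pkt), deque())
        if len(q) >= self.max_backlog:
            return False             # drop: only this flow pays for its excess
        q.append(pkt)
        return True

    def dequeue(self):
        # One round-robin step: serve the first non-empty flow, then rotate it
        # to the back so every flow gets its turn; forget empty (idle) flows.
        for key in list(self.flows):
            q = self.flows[key]
            if q:
                pkt = q.popleft()
                self.flows.move_to_end(key)
                return pkt
            del self.flows[key]
        return None

# A greedy flow cannot starve a sparse one:
fq = ToyFQ()
for i in range(10):
    fq.enqueue({"src": "10.0.0.1", "sport": 5000, "dst": "10.0.0.2",
                "dport": 80, "proto": "tcp", "seq": i})
fq.enqueue({"src": "10.0.0.3", "sport": 6000, "dst": "10.0.0.2",
            "dport": 443, "proto": "udp", "seq": 0})
print(fq.dequeue()["src"], fq.dequeue()["src"])   # 10.0.0.1 10.0.0.3

The point of the toy is only that flow isolation falls out of the address information that is already in every packet; no deep packet inspection and no cooperation from the sender is needed.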
>
> My point here is simple:
>
> 1) the endpoints tell the routers what flows are going through a link already. That's just the address information. So that information can be used for fairness pretty well, especially if short term memory (a bloom filter, perhaps) can track a sufficiently large number of flows.
>
> 2) The per-flow decisions related to congestion control within a flow are necessarily end-to-end in nature - the router can only tell the ends what is going on, but the ends (together - their admissions rates and consumption rates are coupled to the use being made) must be informed and decide. The congestion management must combine information about the source and the destination future behavior (even if it is just taking recent history and projecting it as an estimate of future behavior at source and destination). Which is why it is quite natural to have routers signal the destination, which then signals the source, which changes its behavior.

In an ideal world the router would also signal the sender, as that would at least halve the time it takes for the congestion information to reach the most relevant party; but as I understand it, this is a) not generally possible and b) prone to abuse.

>
> 3) there are definitely other ways to improve latency for IP and protocols built on top of it - routing some flows over different paths under congestion is one. Call that per-flow routing. Another is scattering a flow over several paths (but that seems problematic for today's TCP which assumes all packets take the same path).

This is about re-ordering, no?

>
> 4) A different, but very coupled view of IP is that any application-relevant buffering should be driven into the endpoints - at the source, buffering is useful to deal with variability in the rate of production of data to be sent. At the destination, buffering is useful to minimize jitter, matching to the consumption behavior of the application. But these buffers should not be pushed into the network where they cause congestion for other flows sharing resources.
> So buffering in the network should ONLY deal with the uncertainty in resource competition.

This, at least in my understanding, is one of the underlying ideas of the L4S approach; what is your take on how well L4S achieves that goal?

>
> This tripartite breakdown of buffering is protocol independent. It applies to TCP, NTP, RTP, QUIC/UDP, ... It's what we (that is me) had in mind when we split UDP out of TCP, allowing UDP based protocols to manage source and destination buffering in the application for all the things we thought UDP would be used for - packet speech, computer-computer remote procedure calls (what would be QUIC today), SATNET/interplanetary Internet connections, ...

Like many great insights that look obvious in retrospect, I would guess that might have been controversial at its time?

>
> Sadly, in the many years since the late 1970's the tendency to think file transfers between infinite speed storage devices over TCP are the only relevant use of the Internet has penetrated the router design community. I can't seem to get anyone to recognize how far we are from that. No one runs benchmarks for such behavior, no one even measures anything other than the "hot rod" maximum throughput cases.

I would guess that this obsession might be market-driven: as long as customers only look at the top-speed numbers, increasing that number will be the priority. (A toy illustration of the kind of measurement I would rather see as standard follows below, before my sign-off.)
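To illustrate what measuring something other than the top-speed number could look like, here is a toy probe (Python; purely illustrative, the ping target and download URL are placeholders to be replaced, and real tools such as flent do this properly): it samples ping RTTs on the idle link, then again while a bulk download keeps the link busy, and prints both medians, i.e. latency under load.

#!/usr/bin/env python3
# Toy "latency under load" probe. Purely illustrative; replace the placeholder
# ping target and download URL with real, reachable ones before running.
import statistics
import subprocess
import threading
import time
import urllib.request

PING_TARGET = "192.0.2.1"         # placeholder (TEST-NET-1); will not answer as-is
BULK_URL = "http://example.com/"  # placeholder; should point at a large file

def ping_once(host):
    # One ICMP echo via the system ping; returns the RTT in ms, or None.
    out = subprocess.run(["ping", "-c", "1", host], capture_output=True, text=True)
    for token in out.stdout.split():
        if token.startswith("time="):
            return float(token[len("time="):])
    return None

def sample_rtts(host, seconds):
    rtts = []
    deadline = time.time() + seconds
    while time.time() < deadline:
        rtt = ping_once(host)
        if rtt is not None:
            rtts.append(rtt)
        time.sleep(0.2)
    return rtts

def bulk_download(url, seconds):
    # Keep the link busy for roughly `seconds` by re-fetching the URL.
    deadline = time.time() + seconds
    while time.time() < deadline:
        try:
            urllib.request.urlopen(url, timeout=5).read()
        except OSError:
            pass

idle = sample_rtts(PING_TARGET, 10)
loader = threading.Thread(target=bulk_download, args=(BULK_URL, 10))
loader.start()
loaded = sample_rtts(PING_TARGET, 10)
loader.join()

if idle and loaded:
    print("idle RTT median:   %6.1f ms" % statistics.median(idle))
    print("loaded RTT median: %6.1f ms" % statistics.median(loaded))
else:
    print("no RTT samples; adjust PING_TARGET / BULK_URL")

If the second number is much larger than the first, the bottleneck buffer is doing exactly what we have been complaining about.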
Again thanks for your insights

Sebastian

>
> And many egos seem to think that working on the hot rod cases is going to make their career or sell product. (e.g. the sad case of Arista).
>
>
> On Wednesday, June 26, 2019 8:48am, "Sebastian Moeller" said:
>
> >
> >
> > > On Jun 23, 2019, at 00:09, David P. Reed wrote:
> > >
> > > [...]
> > >
> > > per-flow scheduling is appropriate on a shared link. However, the end-to-end argument would suggest that the network not try to divine which flows get preferred.
> > > And beyond the end-to-end argument, there's a practical problem - since the ideal state of a shared link means that it ought to have no local backlog in the queue, the information needed to schedule "fairly" isn't in the queue backlog itself. If there is only one packet, what's to schedule?
> > >
> > > [...]
> >
> > Excuse my stupidity, but the "only one single packet" case is the theoretical limiting case, no?
> > Because even on a link not running at capacity this effectively requires a mechanism to "synchronize" all senders (whose packets traverse the hop we are looking at), as no other packet is allowed to reach the hop unless the "current" one has been passed to the PHY; otherwise we transiently queue 2 packets (I note that this rationale should hold for any small N). The more packets per second a hop handles, the less likely it will be that a newcomer avoids running into already existing packet(s), that is, transiently growing the queue.
> > Not having a CS background, I fail to see how this required synchronized state can exist outside of a few steady state configurations where things change slowly enough that the seemingly required synchronization can actually happen (given that the feedback loop, e.g. through ACKs, seems somewhat jittery). Since packets never know which path they take and which hop is going to be critical, there seems to be no a priori way to synchronize all senders; heck, I fail to see whether it would be possible at all to guarantee synchronized behavior on more than one hop (unless all hops are extremely uniform).
> > I happen to believe that L4S suffers from the same conceptual issue (plus overly generic promises, from the RITE website: "We are so used to the unpredictability of queuing delay, we don't know how good the Internet would feel without it. The RITE project has developed simple technology to make queuing delay a thing of the past—not just for a select few apps, but for all." This seems to be missing a "conditions apply" statement.)
> >
> > Best Regards
> > Sebastian