From: Sebastian Moeller
Date: Thu, 27 Jun 2019 09:54:21 +0200
To: "David P. Reed"
Cc: ecn-sane@lists.bufferbloat.net, Brian E Carpenter, tsvwg IETF list
Reed" X-Mailer: Apple Mail (2.3445.104.11) X-Provags-ID: V03:K1:LZg08R1ceFJ46DANecr6E+3rXQ4sfKQNr4h0eJWwOrxWWGB2GsM uHCP48KqmGJn3y1yF4Guh9ZLOGwiH/AmWEqp39LywG+Qw/GVCYGVUxtcabWZ6rZiGLn8p0L j0F1J3Dg6v8UyV4n9Ac/H9GvcNKc2M6wu3+pLZ3zyXst1gTJ5mQKLH8deLoL/IWQdqZZfYP ugTe48QO5WhGDG0945stQ== X-Spam-Flag: NO X-UI-Out-Filterresults: notjunk:1;V03:K0:9MwcDir6vzs=:mtA6WxXYFVOR6H+YDV4YLV CE+U/H2Y6+kge+CzOYd3ZKrwU/gyo9LLihwzZ6VwEOzhQ6xpuicE1gF0z9CTU/DpW4WnzlJjW OO3oAiVAbBNk0TFhhVExFM9oRbxP8/StbzItqooVaknQ0SYD53R181pfKKcnUC5oYRgX6iIC6 dhx7xokryXgGfTS9QxBeduCcmulfMmlVVaFppmH6GTUpCCloNjFRZ920Czc4qm5/5uK1/AyYM X2KLgUeW045nmgOBu4Otrgpdyu0wKWDvs/zk63xrWWi3SXiI35BIyPi6rPi5rFrTvu6ykkaAX xx2AoKSfsNDGdLeMszXNSs15sF6+9Ubt3NJFJ0vqkDyiHV638Am4YqEn2qi3O1LnOj+ji+dK3 XflQwu1uEooYCdD2FzSjju+HBIMUt7F06M9h16OfK04NS/cMX8C/VIGvNwMlvNZgS/9kDm53j R1K9N5Ea0bqNgG0fi2jkYwKhW2AjiXCp40V+F1t0+aigf6/uAnIHVvX9tXqceZzQkBQLgwrJv 4fpE7iOR3+NFVeeDHe6Gp3ynKBG1RK2ETwOpoMiSmbPOp0+tN5OcW5tRvGD2vS3a4qnJ1tat2 e2Anb4pf9QtRi6m0gM7msvvVX6ryxkIbTDM6dAJDigUZlDSvH5tV2q5H94rfD8hThqAKvZkjT HO72QnRc9ippy/urFmeuAecuACkYyMnO0f8/+hMNFjMynuM6z78ELkoqwvKj+kvT2PRo9tgKM 9SPIqOBO2fpqgU2XYT3s3gZQOzySfB8BLtyR99bkYXtLdCJA/aY/lUiBX/sDTeVRNPuMcJPsU o0o0t47Nmq+nKCITvVTHJvHG0Qy/5oLdL2LBi+fEgI+gvCNe6M++h2btu9MENtiHH7t5VIO6M kRDk4lTI1sIb4UhwNe0qNVRx/Hegwdt4BpqDx3QYno2IWcHgWNt67zIXEhA+cewng7RDUjNq+ T/YkvexDmL7eD1h6223DC++deLosSLHpICOY0j/wDw/py8bAVSjK9 Subject: Re: [Ecn-sane] [tsvwg] per-flow scheduling X-BeenThere: ecn-sane@lists.bufferbloat.net X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion of explicit congestion notification's impact on the Internet List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Jun 2019 07:54:28 -0000 Hi David, > On Jun 26, 2019, at 18:53, David P. Reed wrote: >=20 > A further minor thought, maybe one that needs not be said: > =20 > Flows aren't "connections". Routers are not involved in connection = state management, which is purely part of the end to end protocol. = Anything about "connections" that a router might need to know to handle = a packet should be packaged into the IP header of each packet in a = standard form. I read this, that your are not opposed to using IP packet data = to convey information to intermediate routers then? In a way (and please = correct me if this is wrong /too simplistic), L4S intends to use the = ECT(1) codepoint for enspoints to signal to router's their behavior = towards CE congestion signals (reduce window/rate by 50% versus a = smaller step down). > Routers can "store" this information associated with the source, = destination pair if they want, for a short time, subject to well = understood semantics when they run out of storage. This fits into an = end-to-end argument as an optiimization of a kind, as long as the = function of such information is very narrowly and generally defined to = benefit all users of IP-based protocols. Okay, that I read as fq-syatems are not in violation of e2e = then. Best Regards Sebastian > =20 > For example, remembering the last time a packet of a particular flow = was received after forwarding it, for a short time, to calculate = fairness, that seems like a very useful idea, as long as forgetting the = last time of receipt is not unfair. > =20 > This use of the flow's IP headers to carry info into router queueing = and routing decisions is analogous to the "Fate Sharing" principle of = protocol design that DDC describes. 
> On Wednesday, June 26, 2019 12:31pm, "David P. Reed" said:
> 
> It's the limiting case, but also the optimal state given "perfect knowledge".
> 
> Yes, it requires that the source-destination pairs sharing the link in question coordinate their packet admission times so they don't "collide" at the link. Ideally the next packet would arrive during the previous packet's transmission, so it is ready to go when that packet's transmission ends.
> 
> Such exquisite coordination is feasible only when the future behavior of source and destination at the interface is known, which requires an Oracle. That's the same kind of condition most information-theoretic and queueing-theoretic optimality results require.
> 
> But this is worth keeping in mind as the overall joint goal of all users.
> 
> In particular, "link utilization" isn't a user goal at all. The link is there and is being paid for whether it is used or not (looking from the network structure as a whole). Its capacity exists to move packets out of the way. An ideal link satisfies the requirement that it never creates a queue because of anything other than imperfect coordination of the end-to-end flows mapped onto it. That's why a router should not be measured by "link utilization" any more than a tunnel in a city during commuting hours should be measured by cars moved per hour (a queueing sketch below makes this quantitative). Clearly a tunnel can be VERY congested and moving many cars if they are attached to each other bumper to bumper - the latency through the tunnel would then be huge. If the cars were tipped on their ends and stacked, even more throughput would be achieved, and the rotating and packing would add even more delay.
> 
> The idea that "link utilization" of 100% must be achieved is why we got bufferbloat designed into routers. It's a worm's-eye perspective. To this day, Arista Networks brags about how its bufferbloated feature design optimizes switch utilization (https://packetpushers.net/aristas-big-buffer-b-s/), and it selects benchmarks to "prove" it. Andy Bechtolsheim apparently is such a big name that he can sell defective gear at a premium price, letting the datacenters who buy it discover that those switches get "clogged up" by TCP traffic when they are the "bottleneck link". Fortunately, they are fast, so they are less frequently the bottleneck in daily datacenter use.
> 
> In trying to understand what is going on with congestion signalling, any buffering at the entry to the link should be due only to imperfect information being fed back to the endpoints generating traffic, because a misbehaving endpoint creates denial of service for all other users.
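As a back-of-the-envelope illustration of the tunnel analogy (the queueing sketch referenced above): in a textbook M/M/1 queue - Poisson arrivals and exponential service are my simplifying assumptions - the mean time in the system is (1/mu)/(1 - rho), so delay explodes as utilization rho approaches 100%:

    # Mean sojourn time in an M/M/1 queue: T = 1/(mu - lambda),
    # equivalently (1/mu)/(1 - rho) with rho = lambda/mu.
    # Illustrative link: 1 Gbit/s, 1500-byte packets.

    MU = 1e9 / (1500 * 8)  # ~83,333 packets/s service rate

    def mean_delay_ms(rho: float) -> float:
        """Average time a packet spends queued plus in service."""
        assert 0.0 <= rho < 1.0, "delay is unbounded at 100% utilization"
        return 1000.0 / (MU * (1.0 - rho))

    for rho in (0.5, 0.9, 0.99, 0.999):
        print(f"utilization {rho:6.1%}: mean delay {mean_delay_ms(rho):8.3f} ms")

    # 50% -> ~0.024 ms, 99.9% -> ~12 ms: a roughly 500x latency price
    # for the last few percent of "utilization".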
> Priority mechanisms focused on protecting high-paying users from low-paying ones don't help much - they only help in overloaded states of the network. Which isn't to say that priority does nothing - it's just that stable assignment of a sharing level to priority levels isn't easy. (See Paris Metro Pricing, where there are only two classes, and the problem of deciding how to manage access to the "first class" section - the idea that 15 classes with different metrics can be handled simply and interoperably between differently managed autonomous systems seems an incredibly impractical goal.)
> 
> Even in the priority case, buffering is NOT a desirable end-user thing.
> 
> My personal view is that the manager of a network needs to configure the network so that no link ever gets overloaded, if possible. The response to overload should be to tell the relevant flows to all slow down (not just one, because if there are 100 flows that start up at roughly the same time, causing MD on one does very little). This is an example of something where per-flow machinery in the router actually makes the router helpful in the large scheme of things. Maybe all flows should be equally informed, as flows. Which means the router needs to know how to signal multiple flows, while not just hammering all the packets of a single flow. This case is very real, but not as frequent on the client side as on the "server side", in "load balancers" and the like.
> 
> My point here is simple:
> 
> 1) The endpoints already tell the routers what flows are going through a link. That's just the address information. So that information can be used for fairness pretty well, especially if a short-term memory (a bloom filter, perhaps; see the sketch below) can track a sufficiently large number of flows.
> 
> 2) The per-flow decisions related to congestion control within a flow are necessarily end-to-end in nature - the router can only tell the ends what is going on, but the ends (together - their admission rates and consumption rates are coupled to the use being made) must be informed and decide. The congestion management must combine information about the source's and the destination's future behavior (even if that is just taking recent history and projecting it as an estimate of future behavior at source and destination). Which is why it is quite natural to have routers signal the destination, which then signals the source, which changes its behavior.
> 
> 3) There are definitely other ways to improve latency for IP and the protocols built on top of it - routing some flows over different paths under congestion is one; call it per-flow routing. Another is scattering a flow over several paths (but that seems problematic for today's TCP, which assumes all packets take the same path).
> 
> 4) A different, but closely coupled, view of IP is that any application-relevant buffering should be driven into the endpoints: at the source, buffering is useful to deal with variability in the rate of production of the data to be sent; at the destination, buffering is useful to minimize jitter, matching the consumption behavior of the application. But these buffers should not be pushed into the network, where they cause congestion for other flows sharing resources. So buffering in the network should ONLY deal with the uncertainty in resource competition.
> 
> This tripartite breakdown of buffering is protocol independent. It applies to TCP, NTP, RTP, QUIC/UDP, ... It's what we (that is, me) had in mind when we split UDP out of TCP, allowing UDP-based protocols to manage source and destination buffering in the application, for all the things we thought UDP would be used for: packet speech, computer-computer remote procedure calls (what would be QUIC today), SATNET/interplanetary Internet connections, ...
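Regarding point 1 above, a hedged sketch of what such a short-term flow memory might look like. The two-hash filter, the sizes, and the rotate-to-forget policy are my own illustrative choices, not any deployed router's data structure:

    import hashlib
    import time

    class FlowMemory:
        """Approximate, fixed-size memory of recently seen flows."""

        ROTATE = 1.0  # seconds between wholesale forgetting; assumed value

        def __init__(self, bits: int = 1 << 16):
            self.bits = bits
            self.array = bytearray(bits // 8)
            self.last_rotate = time.monotonic()

        def _indexes(self, flow_key: bytes):
            # Two hash positions derived from one 8-byte digest.
            digest = hashlib.blake2b(flow_key, digest_size=8).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits
            yield int.from_bytes(digest[4:], "big") % self.bits

        def saw_recently(self, flow_key: bytes) -> bool:
            """Record this flow; report whether it was (probably) seen
            within the current rotation window."""
            now = time.monotonic()
            if now - self.last_rotate > self.ROTATE:
                self.array = bytearray(self.bits // 8)  # forget everything
                self.last_rotate = now
            seen = True
            for idx in self._indexes(flow_key):
                byte, bit = divmod(idx, 8)
                if not self.array[byte] & (1 << bit):
                    seen = False
                    self.array[byte] |= 1 << bit
            return seen

    mem = FlowMemory()
    key = b"10.0.0.1|10.0.0.2|6|443|53124"  # made-up 5-tuple serialization
    print(mem.saw_recently(key))  # False: first packet of this flow
    print(mem.saw_recently(key))  # True: the flow is now remembered

Zeroing the whole filter on a fixed schedule is the bluntest way to "forget the last time of receipt"; since every flow is forgotten at the same instant, no single flow is disadvantaged, which speaks to the fairness caveat in point 1.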
> 
> Sadly, in the many years since the late 1970's, the tendency to think that file transfers between infinite-speed storage devices over TCP are the only relevant use of the Internet has penetrated the router design community. I can't seem to get anyone to recognize how far we are from that. No one runs benchmarks for such behavior; no one even measures anything other than the "hot rod" maximum-throughput cases.
> 
> And many egos seem to think that working on the hot-rod cases is going to make their career or sell product (e.g. the sad case of Arista).
> 
> On Wednesday, June 26, 2019 8:48am, "Sebastian Moeller" said:
> 
> > 
> > > On Jun 23, 2019, at 00:09, David P. Reed wrote:
> > > 
> > > [...]
> > > 
> > > Per-flow scheduling is appropriate on a shared link. However, the end-to-end argument would suggest that the network not try to divine which flows get preferred.
> > > And beyond the end-to-end argument, there's a practical problem - since the ideal state of a shared link means that it ought to have no local backlog in the queue, the information needed to schedule "fairly" isn't in the queue backlog itself. If there is only one packet, what's to schedule?
> > > 
> > > [...]
> > 
> > Excuse my stupidity, but the "only one single packet" case is the theoretical limiting case, no?
> > Because even on a link not running at capacity this effectively requires a mechanism to "synchronize" all senders (whose packets traverse the hop we are looking at): no other packet may reach the hop before the "current" one has been passed to the PHY, otherwise we transiently queue 2 packets (I note that this rationale should hold for any small N; see the toy simulation at the end of this mail). The more packets per second a hop handles, the less likely it becomes that a newcomer avoids running into already existing packets, that is, transiently growing the queue.
> > Not having a CS background, I fail to see how this required synchronized state can exist outside of a few steady-state configurations where things change slowly enough that the seemingly required synchronization can actually happen (given that the feedback loop, e.g. through ACKs, seems somewhat jittery). Since packets never know which path they will take and which hop is going to be critical, there seems to be no a priori way to synchronize all senders; heck, I fail to see whether it would be possible at all to guarantee synchronized behavior on more than one hop (unless all hops are extremely uniform).
> > I happen to believe that L4S suffers from the same conceptual issue (plus overly generic promises; from the RITE website: "We are so used to the unpredictability of queuing delay, we don't know how good the Internet would feel without it. The RITE project has developed simple technology to make queuing delay a thing of the past - not just for a select few apps, but for all." This seems to be missing a "conditions apply" statement).
> > 
> > Best Regards
> > Sebastian
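P.S.: A toy simulation of the "transiently queue 2 packets" argument above. Arrival times are random (Poisson is my modeling assumption; merging two independent Poisson senders gives one Poisson stream at the summed rate) and the bottleneck is a plain FIFO. Even at 60% load the backlog repeatedly exceeds one packet:

    import random

    random.seed(42)

    LINK_RATE = 83_333.0       # packets/s; ~1 Gbit/s with 1500-byte packets
    SERVICE = 1.0 / LINK_RATE  # seconds to transmit one packet
    LOAD = 0.6                 # combined load of two 30% senders

    # One second of unsynchronized arrivals at 60% utilization.
    t, arrivals = 0.0, []
    while t < 1.0:
        t += random.expovariate(LOAD * LINK_RATE)
        arrivals.append(t)

    server_free_at = 0.0  # when the link finishes its current packet
    in_system = []        # departure times of packets queued or in service
    max_backlog = 0

    for a in arrivals:
        in_system = [d for d in in_system if d > a]  # drop departed packets
        start = max(a, server_free_at)               # wait if the link is busy
        server_free_at = start + SERVICE
        in_system.append(server_free_at)
        max_backlog = max(max_backlog, len(in_system))

    print(f"{len(arrivals)} arrivals, max packets in system: {max_backlog}")
    # Far from saturation, yet the queue repeatedly exceeds one packet;
    # holding it at "one packet, ever" would need oracle-level
    # coordination of all senders, as argued above.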