From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <moeller0@gmx.de>
Received: from mout.gmx.net (mout.gmx.net [212.227.17.20])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by lists.bufferbloat.net (Postfix) with ESMTPS id 4154F3B2A4
 for <bloat@lists.bufferbloat.net>; Sun, 17 Mar 2019 11:56:17 -0400 (EDT)
Received: from [192.168.42.220] ([77.182.103.198]) by mail.gmx.com (mrgmx102
 [212.227.17.168]) with ESMTPSA (Nemesis) id 0Meutp-1hPZ3z2utt-00OUSX; Sun, 17
 Mar 2019 16:56:02 +0100
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 11.5 \(3445.9.1\))
From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <4FA6FA39-7092-4E98-B12E-5236C8EACCE2@tzi.org>
Date: Sun, 17 Mar 2019 16:56:01 +0100
Cc: Greg White <g.white@CableLabs.com>,
 Ingemar Johansson S <ingemar.s.johansson@ericsson.com>,
 "bloat@lists.bufferbloat.net" <bloat@lists.bufferbloat.net>
Content-Transfer-Encoding: quoted-printable
Message-Id: <C18E0000-CC99-4056-BBC6-9AF9FC15EED8@gmx.de>
References: <HE1PR07MB442526730269DA318B2ED38BC24B0@HE1PR07MB4425.eurprd07.prod.outlook.com>
 <E3154B34-123E-4A64-B15A-F5F8CF5C55B4@gmx.de>
 <BF9A0862-8C25-43CC-B1C2-0D7B5BE4053B@cablelabs.com>
 <C9000B72-0F6C-4E5A-837A-A864FF773D88@gmx.de>
 <94B04C6B-5997-4971-9698-57BEA3AE5C0E@tzi.org>
 <166A7220-875F-4FA0-A8EE-17F11037EC76@gmx.de>
 <4FA6FA39-7092-4E98-B12E-5236C8EACCE2@tzi.org>
To: Carsten Bormann <cabo@tzi.org>
X-Mailer: Apple Mail (2.3445.9.1)
Subject: Re: [Bloat] Packet reordering and RACK (was The "Some Congestion
 Experienced" ECN codepoint)
X-BeenThere: bloat@lists.bufferbloat.net
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: General list for discussing Bufferbloat <bloat.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/bloat>,
 <mailto:bloat-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/bloat>
List-Post: <mailto:bloat@lists.bufferbloat.net>
List-Help: <mailto:bloat-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/bloat>,
 <mailto:bloat-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Sun, 17 Mar 2019 15:56:17 -0000

Hi Carsten,

thanks for your insights.


> On Mar 17, 2019, at 15:34, Carsten Bormann <cabo@tzi.org> wrote:
>
>>> The end-to-end argument applies:  Ultimately, there needs to be
>>> resequencing at the end anyway, so any reordering in the network would
>>> be a performance optimization.  It turns out that keeping packets lying
>>> around in some buffer somewhere in the network just to do resequencing
>>> before they exit an L2 domain (or a tunnel) is a pessimization, not an
>>> optimization.
>>
>> 	I do not buy the end-to-end argument here, because, taken to the
>> extreme, why do ARQ on individual links at all? We could just leave ARQ
>> to the end-points, and TCP does it anyway.
>
> The optimization is that the retransmission on a single link (or within
> a path segment, which is what I’m interested in) does not need to span
> the entire end-to-end path.  That is strictly better than an end-to-end
> retransmission.

	I agree, and by the same logic local resequencing is also better,
unless the re-ordering event happened at the bottleneck link.

> Also, a local segment may allow faster recovery by not implicating the
> entire e2e latency, which allows for strictly better latency.
> So, yes, there are significant optimizations in doing local
> retransmissions, but there are also interesting interactions with
> end-to-end retransmission that need to be taken care of.  This has been
> known for a long time, e.g., see
> https://tools.ietf.org/html/rfc3819#section-8 which documents things
> that were considered to be well known in the early 2000s.

	Thanks, but my understanding of this is basically that a link should
just drop a packet unless it can be retransmitted with reasonable effort
(like G.INP retransmissions on DSL links, which will give up); sure, we
can argue about what "reasonable effort" is in reality, but I fear that
if we move away from 3 dupACKs to, say, X ms, all transport links will
assume they have leeway to allow re-ordering close to X, and that will
certainly be worse than today. And since I am an end-user and do not
operate a transport network, I know what I prefer here...

>
>> The point is that transport-ARQ allows the use of link technologies
>> that otherwise would not be acceptable at all. So doing ARQ on the
>> individual links already indicates that some things are more efficient
>> when not done only e2e.
>
> Obviously.
>
>> I just happen to think that re-ordering falls into the same category,
>> at least for users stuck behind a slow link, as is typical at the edge
>> of the internet.
>
> Resequencing (which is the term I prefer for putting things back in
> sequence again, after they have been reordered) requires storing packets
> that are ahead of later packets.

	Obviously.

> This is strictly suboptimal if these packets could be delivered instead
> (in contrast, it *is* a good idea to resequence packets that are in a
> queue waiting for a transmission opportunity).

	Fair enough, but that basically expects the bottleneck link that
actually accumulates a queue to do the heavy lifting; I am not sure that
the economic incentives are properly aligned here.
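The kind of in-queue resequencing Carsten calls a good idea (restoring order only among packets that are already waiting, never idling the link to wait for a missing one) might look like the minimal sketch below. It assumes the queue can read a flow key (here a hypothetical 5-tuple) and a per-flow sequence number, such as the TCP sequence number, from each packet; plain UDP carries no such number. `Packet` and `ResequencingQueue` are illustrative names, not an existing API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical flow key: (src addr, dst addr, src port, dst port, protocol).
FlowId = Tuple[str, str, int, int, str]


@dataclass
class Packet:
    flow: FlowId
    seq: int  # assumed readable per-flow sequence number (e.g. TCP seq)


class ResequencingQueue:
    """FIFO that restores a flow's order only among packets already queued.

    Nothing is held back waiting for a missing packet; an arriving packet
    is merely slotted in front of queued packets of its own flow that have
    higher sequence numbers.
    """

    def __init__(self) -> None:
        self._q: List[Packet] = []

    def __len__(self) -> int:
        return len(self._q)

    def enqueue(self, pkt: Packet) -> None:
        for i, queued in enumerate(self._q):
            if queued.flow == pkt.flow and queued.seq > pkt.seq:
                self._q.insert(i, pkt)
                return
        self._q.append(pkt)  # no later packet of this flow is waiting

    def dequeue(self) -> Packet:
        # Called whenever the link has a transmission opportunity.
        return self._q.pop(0)


A = ("192.0.2.1", "198.51.100.7", 40000, 443, "tcp")
B = ("192.0.2.2", "198.51.100.7", 40001, 443, "tcp")
q = ResequencingQueue()
for flow, seq in [(A, 3), (B, 1), (A, 5), (A, 4), (A, 2)]:
    q.enqueue(Packet(flow, seq))
while len(q):
    pkt = q.dequeue()
    print(pkt.flow[2], pkt.seq)  # source port identifies the flow here
```

Only packets of the same flow move relative to each other; packets of other flows keep their relative order.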

> So *requiring*(*) local path segments to resequence is strictly
> suboptimal.
>
> (*) even if this is not a strict requirement, but just a statement of
> the form “the transport will be much more efficient if you deliver in
> order”.

	My point is that the transport will be much more useful if it
undertakes (reasonable) effort to deliver in order; that is slightly
different, and I understand that those responsible for transport
networks have a different viewpoint on this.

>
>> To put numbers to my example, assume I am on a 1/1 Mbps link, receiving
>> TCP data at 1 Mbps in MTU-1500 packets (I am going to keep the numbers
>> approximate), and I get a burst of, say, 10 packets containing 10
>> individual messages for my application, each telling the position of an
>> object in 3D space.
>>
>> Each packet is going to "hog" the link for:
>> 1000 ms/s * (1500 * 8 b/packet) / (1000 * 1000 b/s) = 12 ms
>> So I get access to messages/new positions every 12 ms and I can display
>> this smoothly.
>
> That is already broken by design.

	That does not matter much; a well-designed network should also allow
one to do stupid things...

> If you are not accounting for latency variation (“jitter”), you won’t
> be able to deal with it.

	Introducing, say, a 25 ms de-jitter buffer would just complicate the
issue a bit without affecting the gist of it.

> Your example also makes sure it does not work well by being based on
> 100 % utilization.

	Same here: access links certainly run closer to 100% utilization than
core links, so operation at full saturation is not completely
unrealistic, but I really just set it up that way for clarity.
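The 12 ms figure is plain serialization delay. A short Python sketch (assuming 1500-byte packets and ignoring framing overhead; `serialization_ms` is an illustrative helper) reproduces it and shows how quickly it shrinks on faster links:

```python
def serialization_ms(packet_bytes: int, rate_bps: float) -> float:
    """Time a packet occupies the link: its size in bits divided by the rate, in ms."""
    return 1000.0 * packet_bytes * 8 / rate_bps


for rate_bps in (1e6, 10e6, 100e6, 1e9):
    per_pkt = serialization_ms(1500, rate_bps)
    # A burst of 10 packets, as in the example, takes ten times as long.
    print(f"{rate_bps / 1e6:6.0f} Mbps: {per_pkt:7.3f} ms/packet, "
          f"{10 * per_pkt:8.3f} ms per 10-packet burst")
```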

>
>> Now if the first packet gets re-ordered to be last, I either drop that
>> packet
>
> …which is another nice function the network could do for you before
> expending further resources on useless delivery; see e.g.
> draft-ietf-6lo-deadline-time for one way to do this.

	Yes, but typically I do not want the network to do this, as I would
be quite interested in knowing how late the packet arrived.

>
>> and accept a 12 ms gap, or, if that is not an option, I get to wait
>> 9*12 = 108 ms before positions can be updated; that IMHO shows why
>> re-ordering is terrible even if TCP were more tolerant.
>
> You are assuming that the network can magically resequence a packet into
> place that it does not have.

	All I expect is that the network makes a reasonable effort to undo
re-ordering close to where re-ordering happened.
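The 108 ms figure, and the effect of the 25 ms de-jitter buffer mentioned above, can be checked with a small model. It assumes back-to-back 12 ms packets, message k nominally due at k * 12 ms, and TCP-style in-order delivery to the application; all function names are hypothetical.

```python
from typing import Dict, List

SER_MS = 12.0  # serialization time of one 1500-byte packet at 1 Mbps


def arrivals(order: List[int]) -> Dict[int, float]:
    """Packet at position i of `order` finishes arriving at (i + 1) * SER_MS."""
    return {pkt: (i + 1) * SER_MS for i, pkt in enumerate(order)}


def in_order_release(arr: Dict[int, float]) -> Dict[int, float]:
    """TCP-style delivery: message k reaches the app once 1..k have all arrived."""
    released: Dict[int, float] = {}
    latest = 0.0
    for k in sorted(arr):
        latest = max(latest, arr[k])
        released[k] = latest
    return released


def added_delay(release: Dict[int, float]) -> Dict[int, float]:
    """Delay relative to undisturbed arrival of message k at k * SER_MS."""
    return {k: t - k * SER_MS for k, t in release.items()}


def late(release: Dict[int, float], playout_ms: float) -> List[int]:
    """Messages that miss a playout slot at k * SER_MS + playout_ms."""
    return [k for k, t in sorted(release.items()) if t > k * SER_MS + playout_ms]


# First of 10 packets re-ordered to be last.
reordered = arrivals(list(range(2, 11)) + [1])
tcp = in_order_release(reordered)
print(added_delay(tcp)[1])    # 108.0
print(late(tcp, 25.0))        # [1, 2, 3, 4, 5, 6, 7] with a 25 ms de-jitter buffer
print(late(reordered, 25.0))  # [1] if each message is used on arrival
```

With the 25 ms buffer, in-order delivery makes seven of the ten positions late, while using each message on arrival loses only the re-ordered one, so the buffer indeed does not change the gist.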

>
> Now I do understand that forwarding an out-of-order packet will block
> the output port for the time needed to serialize it.  So if you get it
> right before what would have been an in-order packet, the latter incurs
> additional latency.  Note that this requires a bottleneck configuration,
> i.e., packets to be forwarded arrive faster than they can be serialized
> out.  Don’t do bottlenecks if you want ultra-low latency.  (And don’t do
> links where you need to retransmit, either.)

	I agree, but that is life with a home internet access link: the
bottleneck is there. This also points out a problem with the L4S
argument for end-users, as the ultra-low latency (their words, not mine)
will not come close to what the project seems to promise for end-users.

>
>> Especially in the context of L4S, something like this seems to be
>> totally unacceptable if ultra-low latency is supposed to be anything
>> more than marketing.
>
> Dropping packets that can’t be used anyway is strictly better than
> delivering them.

	Well, not for L4S, as TCP Prague is supposed to fall back to legacy
congestion control behavior upon encountering packet drops...

> But apart from that, forwarding packets that I have is strictly better
> for low latency than leaving the output port idle and waiting for
> previous-in-order packets to send them out in sequence.

	It really depends on what we mean when we talk about latency here; as
shown, for an end-user that might be quite different...

>
>>> For three decades now, we have acted as if there is no cost for
>>> in-order delivery from L2, not because that is true, but because
>>> deployed transport protocol implementations were built and tested with
>>> simple links that don’t reorder.
>>
>> 	Well, that is similar to the argument for performing non-aligned
>> loads fast in hardware: yes, this comes with a considerable cost in
>> complexity, and it is harder to make this go fast than to just allow
>> aligned loads and fix up unaligned loads by trapping to software, but
>> from a user perspective the fast hardware beats the fickle "only make
>> aligned loads go fast" approach any old day.
>
> CPUs have an abundance of transistors you can throw at this problem, so
> the support of unaligned loads has become standard practice for CPUs
> with enough transistors.
> I’m not sure this argument transfers, because this is not about
> transistors (except maybe when we talk about in-queue resequencing,
> which would be a nice feature if we had information in the packets to
> allow it).

Like the 5-tuple in TCP and UDP? This example was not meant to be taken
literally, but just to illustrate that, depending on the level of
observation, speeding up one domain can have a noticeable effect on
another one, but might still be worth the effort.

>
>>> Techniques for ECMP (equal-cost multi-path) have been developed that
>>> appease that illusion, but they actually also are pessimizations at
>>> least in some cases.
>>
>> 	Sure, but if I understand correctly, this is partly due to the fact
>> that transport people opted not to do the re-sorting on a flow-by-flow
>> basis; that would solve the blocking issue from the transport
>> perspective. Sure, the affected flow would still suffer from some
>> increased delay, but as I tried to show above, that might still be
>> smaller than the delay incurred by doing the re-sorting after the
>> bottleneck link. What is wrong with my analysis?
>
> Transport people have no control over what is happening in the network,
> so maybe I don’t understand the argument.

	If the remote end of a potentially re-ordering link implemented
fair-queueing (a big if, sure), then it should be easy to only stall the
flows that have outstanding packets, and this could be based solely on
the local retransmit ACKs, so the link would only need to clean up its
own re-orderings and could just faithfully relay a flow that entered the
link already re-ordered. This might in reality not be feasible at all...
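A sketch of this idea, under loud assumptions: the sending end of the link stamps each packet with a per-flow, link-local sequence number in the order it entered the link, and the receiving end resequences per flow. Only a flow with a hole waits; other flows are forwarded immediately, and a flow that was already re-ordered on entry is relayed in its entry order. `PerFlowResequencer`, `receive` and `abandon` are hypothetical names; a real link would also need a timeout tied to its ARQ budget.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, List

_ABANDONED = object()  # marker for a link-local sequence number the link gave up on


class PerFlowResequencer:
    """Receiving end of a link that undoes only its own re-ordering, per flow.

    Assumes the sending end numbers each flow's packets 0, 1, 2, ... in the
    order they entered the link (link-local sequence numbers).
    """

    def __init__(self) -> None:
        self._next: Dict[Hashable, int] = defaultdict(int)
        self._held: Dict[Hashable, Dict[int, object]] = defaultdict(dict)

    def receive(self, flow: Hashable, link_seq: int, payload: object) -> List[object]:
        """Accept a packet; return the payloads of this flow that can leave now."""
        if link_seq < self._next[flow]:
            return []  # duplicate of something already forwarded or abandoned
        self._held[flow][link_seq] = payload
        return self._drain(flow)

    def abandon(self, flow: Hashable, link_seq: int) -> List[object]:
        """Local ARQ gave up on link_seq: stop waiting for it."""
        if link_seq < self._next[flow]:
            return []
        self._held[flow][link_seq] = _ABANDONED
        return self._drain(flow)

    def _drain(self, flow: Hashable) -> List[object]:
        # Release the in-order run for this flow only; other flows are untouched.
        out: List[object] = []
        held = self._held[flow]
        while self._next[flow] in held:
            item = held.pop(self._next[flow])
            self._next[flow] += 1
            if item is not _ABANDONED:
                out.append(item)
        return out


r = PerFlowResequencer()
print(r.receive("A", 1, "a1"))  # flow A has a hole at 0, so it waits
print(r.receive("B", 0, "b0"))  # flow B is not stalled by flow A
print(r.receive("A", 0, "a0"))  # hole filled: a0 and a1 leave together
```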

>
>>> The question at hand is whether we can make the move back to
>>> end-to-end resequencing techniques that work well,
>>
>> 	But we can not; we can make TCP more robust, but what I predict is
>> that if RACK allows for 100 ms of delay, transports will take this as
>> the new goal and will keep pushing against that limit, and all in the
>> name of bandwidth over latency.
>
> Where does this number come from?  100 ms is pretty long as a
> reordering maximum for most paths outside of satellite links.  Instead,
> you would do something based on an RTT estimate.

	I just made that number up, as the exact value does not matter; the
argument is that whatever we set as the new threshold will be approached
by transport characteristics. Then again, having something that scales
inversely with bandwidth is certainly terrible from a transport
perspective, so I can understand the argument for a fixed temporal
threshold.
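To illustrate the point about thresholds that scale inversely with bandwidth: with a packet-count rule such as 3 dupACKs, the re-ordering delay tolerated shrinks as the link gets faster, while a fixed temporal window does not. This is a simplified model (back-to-back 1500-byte packets, one displaced packet), not RACK itself, which derives its re-ordering window from RTT estimates; the 5 ms window and all function names are arbitrary examples.

```python
DUPTHRESH = 3  # classic fast-retransmit threshold


def serialization_ms(packet_bytes: int, rate_bps: float) -> float:
    return 1000.0 * packet_bytes * 8 / rate_bps


def dupack_tolerance_ms(rate_bps: float, packet_bytes: int = 1500) -> float:
    """Displacement after which DUPTHRESH later packets have overtaken the late one."""
    return DUPTHRESH * serialization_ms(packet_bytes, rate_bps)


def dupack_declares_loss(displacement_ms: float, rate_bps: float,
                         packet_bytes: int = 1500) -> bool:
    # Each packet that overtakes the displaced one triggers one duplicate ACK.
    overtaken = int(displacement_ms // serialization_ms(packet_bytes, rate_bps))
    return overtaken >= DUPTHRESH


def time_window_declares_loss(displacement_ms: float, reo_wnd_ms: float) -> bool:
    return displacement_ms > reo_wnd_ms


REO_WND_MS = 5.0  # arbitrary example window, not a RACK default
for rate_bps in (1e6, 100e6, 1e9):
    print(f"{rate_bps / 1e6:6.0f} Mbps: dupACK rule tolerates "
          f"{dupack_tolerance_ms(rate_bps):.3f} ms, fixed window {REO_WND_MS} ms; "
          f"1 ms displacement -> dupACK loss: {dupack_declares_loss(1.0, rate_bps)}, "
          f"window loss: {time_window_declares_loss(1.0, REO_WND_MS)}")
```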

>
>>> at least within some limits that we still have to find.
>>> That probably requires some evolution at the end-to-end transport
>>> implementation layer.  We are in a better position to make that happen
>>> than we have been for a long time.
>>
>> 	Probably true, but also not very attractive from an end-user
>> perspective... unless this will allow transport innovations that will
>> allow massively more bandwidth at a smallish latency cost.
>
> The argument against in-network resequencing is mostly a latency
> argument (but, as a second order effect, that reduced latency may also
> allow more throughput), so, again, I don’t quite understand.

	As I tried to show for TCP, the flow with re-ordered packets
certainly pays a latency cost either way, but that cost could be smaller
with in-network resequencing, especially if the re-ordering happens not
on the bottleneck link but on a faster one.
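A rough model of this last point, assuming back-to-back 1500-byte packets, a packet displaced by 9 positions as in the earlier example, and ignoring queueing at the bottleneck: the wait for the missing packet scales with the serialization time of the link after which resequencing happens. `resequencing_wait_ms` is a hypothetical helper.

```python
def serialization_ms(packet_bytes: int, rate_bps: float) -> float:
    return 1000.0 * packet_bytes * 8 / rate_bps


def resequencing_wait_ms(positions_displaced: int, rate_bps: float,
                         packet_bytes: int = 1500) -> float:
    """Extra wait before the displaced packet (and everything behind it in
    sequence) can be released, if resequencing happens right after a link of
    rate_bps that carries the packets back-to-back."""
    return positions_displaced * serialization_ms(packet_bytes, rate_bps)


# First packet of a 10-packet burst re-ordered to be last (displaced by 9).
upstream = resequencing_wait_ms(9, 100e6)  # undone right after a 100 Mbps link
end_host = resequencing_wait_ms(9, 1e6)    # undone by TCP after the 1 Mbps bottleneck
print(f"resequenced at the fast link:     {upstream:.2f} ms")
print(f"resequenced after the bottleneck: {end_host:.2f} ms")
```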

Gruss
	Sebastian

>
> Grüße, Carsten
>