From: Carsten Bormann
To: Sebastian Moeller
Cc: Greg White, Ingemar Johansson S, bloat@lists.bufferbloat.net
Date: Sun, 17 Mar 2019 18:09:48 +0100
Subject: Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)

>>
>>>> The end-to-end argument applies:
>>>> Ultimately, there needs to be resequencing at the end anyway, so any reordering in the network would be a performance optimization. It turns out that keeping packets lying around in some buffer somewhere in the network just to do resequencing before they exit an L2 domain (or a tunnel) is a pessimization, not an optimization.
>>>
>>> I do not buy the end-to-end argument here, because in the extreme, why do ARQ on individual links at all? We can just leave it to the end-points, and TCP does ARQ anyway.
>>
>> The optimization is that the retransmission on a single link (or within a path segment, which is what I'm interested in) does not need to span the entire end-to-end path. That is strictly better than an end-to-end retransmission.
>
> I agree, and by the same logic local resequencing is also better,

Non sequitur. The same logic simply does not apply. A resequenced packet consumes the same transmission resources. (It also consumes more buffer resources. So it is strictly worse when just looking at network resources expended, which is the basis for the kind of logic applied here.)

> unless the re-ordering event happened at the bottleneck link.

Not sure how this comes in now.

>> Also, a local segment may allow faster recovery by not implicating the entire e2e latency, which allows for strictly better latency.
>> So, yes, there are significant optimizations in doing local retransmissions, but there are also interesting interactions with end-to-end retransmission that need to be taken care of. This has been known for a long time; see, e.g., https://tools.ietf.org/html/rfc3819#section-8, which documents things that were considered to be well known in the early 2000s.
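[To put rough numbers on the "strictly better" claim about local retransmission: a link-local repair only spans the lossy segment's round trip, while an end-to-end repair spans the whole path's. The sketch below uses made-up illustrative figures (2 ms link RTT, 50 ms path RTT, detection time on the order of one RTT are all assumptions, not measurements from the thread).]

```python
# Illustrative comparison: time from a loss to repaired delivery for
# link-local ARQ vs. end-to-end (TCP-style) retransmission.

def retx_latency_ms(detect_ms: float, rtt_ms: float) -> float:
    """Loss-detection time plus one resend round trip."""
    return detect_ms + rtt_ms

link_rtt = 2    # assumed RTT of the lossy link segment, ms
path_rtt = 50   # assumed end-to-end path RTT, ms

local = retx_latency_ms(detect_ms=link_rtt, rtt_ms=link_rtt)  # link-layer ARQ
e2e = retx_latency_ms(detect_ms=path_rtt, rtt_ms=path_rtt)    # end-to-end repair

print(f"local repair ~{local} ms, end-to-end repair ~{e2e} ms")
```

[With these assumed figures, the local repair completes more than an order of magnitude faster, which is the latency advantage the paragraph above is arguing for.]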
>
> Thanks, but my understanding of this is basically that a link should just drop a packet unless it can be retransmitted with reasonable effort (like the G.INP retransmission on DSL links will give up); sure, we can argue about what "reasonable effort" is in reality, but I fear that if we move away from 3 dupACKs to, say, X ms, all transport links will assume they have leeway to allow re-ordering close to X, and that will certainly be worse than today. And since I am an end-user and do not operate a transport network, I know what I prefer here…

I'm sorry, I grew up as a transport layer guy, so "transport" means L4 (transport layer) for me, not "transport network".
You may want to re-read my sentences with that knowledge; they might make more sense.

>> Resequencing (which is the term I prefer for putting things back in sequence again, after they have been reordered) requires storing packets that are ahead of later packets.
>
> Obviously.
>
>> This is strictly suboptimal if these packets could be delivered instead (in contrast, it *is* a good idea to resequence packets that are in a queue waiting for a transmission opportunity).
>
> Fair enough, but that basically expects the bottleneck link that actually accumulates a queue to do the heavy lifting; not sure that the economic incentives are properly aligned here.

It can actually do so more easily, because the speeds are lower.
But deployment economy arguments are interesting as well; I was making theoretical arguments first.

>> So *requiring*(*) local path segments to resequence is strictly suboptimal.
>>
>> (*) even if this is not a strict requirement, but just a statement of the form "the transport will be much more efficient if you deliver in order".
>
> My point is the transport will be much more useful if it undertakes (reasonable) effort to deliver in-order,

Please re-read as advised above.
> that is slightly different, and I understand that those responsible for transport networks have a different viewpoint on this.
>
>>
>>> To put numbers to my example, assume I am on a 1/1 Mbps link and I get TCP data at 1 Mbps rate in MTU-1500 packets (I am going to keep the numbers approximate), and I get a burst of, say, 10 packets containing 10 individual messages for my application, telling the position of, say, an object in 3D space.
>>>
>>> Each packet is going to "hog" the link for: 1000 ms/s * (1500 * 8 b/packet) / (1000 * 1000 b/s) = 12 ms.
>>> So I get access to messages/new positions every 12 ms, and I can display this smoothly.
>>
>> That is already broken by design.
>
> Does not matter much; a well-designed network should also allow one to do stupid things…

Sure, but it won't work very well then (and there is no point in optimizing for that — remember: all in-network work is just an optimization under the end-to-end principle).

>> If you are not accounting for latency variation ("jitter"), you won't be able to deal with it.
>
> Which would just complicate the issue a bit if we introduced a, say, 25 ms de-jitter buffer, without affecting the gist of it.

That buffer increases the total latency but also the (useful) packet delivery rate in the presence of reordering.

>> Your example also makes sure it does not work well by being based on 100% utilization.
>
> Same here, access links certainly run closer to 100% utilization than core links, so operation at full saturation is not completely unrealistic, but I really just set it up that way for clarity.

Please use an example that is more realistic.

>>> Now if the first packet gets re-ordered to be last, I either drop that packet
>>
>> …which is another nice function the network could do for you before expending further resources on useless delivery; see, e.g.,
>> draft-ietf-6lo-deadline-time for one way to do this.
>
> Yes, but typically I do not want the network to do this, as I would be quite interested in knowing how much too late the packet arrived.

I don't know how to make use of that knowledge; do you?
Early discarding of a late packet (e.g., by not retransmitting it in the first place) is so much better.

>>> and accept a 12 ms gap, or, if that is not an option, I get to wait 9*12 = 108 ms before positions can be updated; that IMHO shows why re-ordering is terrible even if TCP were more tolerant.
>>
>> You are assuming that the network can magically resequence a packet into place that it does not have.
>
> All I expect is that the network makes a reasonable effort to undo re-ordering close to where the re-ordering happened.

All I'm trying to say is that this is bad engineering, apparently perpetuated by bad transport layer implementations.

>> Now I do understand that forwarding an out-of-order packet will block the output port for the time needed to serialize it. So if you get it right before what would have been an in-order packet, the latter incurs additional latency. Note that this requires a bottleneck configuration, i.e., packets to be forwarded arrive faster than they can be serialized out. Don't do bottlenecks if you want ultra-low latency. (And don't do links where you need to retransmit, either.)
>
> I agree, but that is life with a home internet access link: the bottleneck is there. This also points out a problem with the L4S argument for end-users, as the ultra-low latency (their words, not mine) will not materialize for end-users close to what the project seems to promise.
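[The arithmetic in the 1 Mbps burst example above can be reproduced with a short sketch: 12 ms of serialization per packet, and a 9 * 12 = 108 ms head-of-line wait when the first packet of the burst arrives last and delivery must stay in order.]

```python
# Per-packet serialization delay on a 1 Mbit/s link with 1500-byte
# packets, and the head-of-line-blocking stall when packet 1 of a
# 10-packet burst is reordered to arrive last while the receiver
# insists on in-order delivery (the numbers from the example above).

MTU_BITS = 1500 * 8    # bits per 1500-byte packet
LINK_BPS = 1_000_000   # 1 Mbit/s, decimal, as in the example
BURST = 10             # packets in the burst

per_packet_ms = 1000 * MTU_BITS / LINK_BPS     # time each packet hogs the link
head_of_line_ms = (BURST - 1) * per_packet_ms  # wait until the last packet lands

print(per_packet_ms, head_of_line_ms)  # 12.0 108.0
```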
I think reordering is not really a problem for ultra-low latency; or, more specifically, once reordering happens, you are no longer in the ultra-low latency domain.

>>> Especially in the context of L4S, something like this seems to be totally unacceptable if ultra-low latency is supposed to be anything more than marketing.
>>
>> Dropping packets that can't be used anyway is strictly better than delivering them.
>
> Well, not for L4S, as TCP Prague is supposed to fall back to legacy congestion control behavior upon encountering packet drops…

L4S is for reliable transport, which is a different scenario than the one that benefits a lot from deadlines for packets. (Well, deadlines might be used to make sure there is no dual retransmission, both local and end-to-end, but again, this is not where you would use L4S.)

>> But apart from that, forwarding packets that I have is strictly better for low latency than leaving the output port idle and waiting for previous-in-order packets to send them out in sequence.
>
> It really depends on what we mean when we talk about latency here; as shown, for an end-user that might be quite different…

Apart from the port blocking effect I talked about (which is mostly relevant for highly scheduled transmission schemes), I really have no idea how the end-to-end latency would benefit from sitting on packets while the port is idle.

>>>> For three decades now, we have acted as if there is no cost for in-order delivery from L2 — not because that is true, but because deployed transport protocol implementations were built and tested with simple links that don't reorder.
>>>
>>> Well, that is similar to the argument for performing non-aligned loads fast in hardware: yes, this comes with a considerable cost in complexity, and it is harder to make this go fast than just allowing aligned loads and fixing up unaligned loads by trapping to software, but from a user perspective the fast hardware beats the fickle make-only-aligned-loads-go-fast approach any old day.
>>
>> CPUs have an abundance of transistors you can throw at this problem, so the support of unaligned loads has become standard practice for CPUs with enough transistors.
>> I'm not sure this argument transfers, because this is not about transistors (except maybe when we talk about in-queue resequencing, which would be a nice feature if we had information in the packets to allow it).
>
> Like the 5-tuple in TCP and UDP?

That doesn't help. I need a sequence number for resequencing, and I can't use the transport layer one because that is being encrypted. Again, this is mostly theoretical, as I don't see people rushing to do in-queue resequencing any time soon.

(Skipping some text that is not relevant to my argument here.)

>> Where does this number come from? 100 ms is pretty long as a reordering maximum for most paths outside of satellite links. Instead, you would do something based on an RTT estimate.
>
> I just made that number up, as the exact N does not matter; the argument is that whatever we set as the new threshold will be approached by transport characteristics. Then again, having something that inversely scales with bandwidth is certainly terrible from a transport perspective, so I can understand the argument for a fixed temporal threshold.

I don't follow at all here.

>>>> at least within some limits that we still have to find.
>>>> That probably requires some evolution at the end-to-end transport implementation layer.
>>>> We are in a better position to make that happen than we have been for a long time.
>>>
>>> Probably true, but also not very attractive from an end-user perspective… unless this will allow transport innovations that will allow massively more bandwidth at a smallish latency cost.
>>
>> The argument against in-network resequencing is mostly a latency argument (but, as a second-order effect, that reduced latency may also allow more throughput), so, again, I don't quite understand.
>
> As I tried to show for TCP the flow with re-ordered packets certainly pays a latency cost that especially if re-ordering does not happen on the bottleneck link but at a faster link could be smaller.

I can't parse this sentence, but my main point remains:
In-network resequencing increases latency (with a potential impact on throughput, too), unless it happens within a queue. We wouldn't want to do that, unless forced by a transport protocol that can't cope. If we can fix the transport protocols to enable (out-of-order) immediate forwarding, then let's do it; this might also enable doing more in-network recovery, with the attendant performance improvements.

Grüße, Carsten
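[The RTT-based reordering threshold debated in this thread is what RACK-style loss detection does: instead of counting 3 dupACKs, it waits a small time window before declaring a packet lost. The quarter-of-min-RTT starting point below follows the RACK-TLP specification (RFC 8985), but the functions are a deliberate simplification for illustration, not the actual algorithm, which additionally adapts the window.]

```python
# Sketch of a RACK-style time-based reordering threshold: a sent-but-
# unacknowledged packet is only declared lost once newer packets have
# been delivered and it has been outstanding longer than the reordering
# window. Assumes a fixed window of min_RTT/4 (RFC 8985 adapts this).

def reordering_window_ms(min_rtt_ms: float, factor: float = 0.25) -> float:
    """How long to wait for a late (possibly reordered) packet."""
    return factor * min_rtt_ms

def is_lost(gap_ms: float, min_rtt_ms: float) -> bool:
    """Declare loss only after the reordering window has elapsed."""
    return gap_ms > reordering_window_ms(min_rtt_ms)

# A packet 3 ms behind the latest delivery on a 50 ms-RTT path is not
# yet considered lost; the same gap on a 10 ms-RTT path is.
print(is_lost(3, 50), is_lost(3, 10))  # False True
```

[This is also why the threshold no longer inversely scales with bandwidth, as noted above: it tracks the path's RTT rather than a packet count.]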