From: Carsten Bormann
Date: Sun, 17 Mar 2019 15:34:15 +0100
To: Sebastian Moeller
Cc: Greg White, Ingemar Johansson S, bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)

>> The
end-to-end argument applies: ultimately, there needs to be resequencing at the end anyway, so any resequencing in the network would at best be a performance optimization. It turns out that keeping packets lying around in some buffer somewhere in the network just to do resequencing before they exit an L2 domain (or a tunnel) is a pessimization, not an optimization.

> I do not buy the end-to-end argument here, because in the extreme case, why do ARQ on individual links at all? We could just leave the ARQ to the end-points, and TCP does it anyway.

The optimization is that a retransmission on a single link (or within a path segment, which is what I'm interested in) does not need to span the entire end-to-end path. That is strictly better than an end-to-end retransmission. Also, a local segment may allow faster recovery by not implicating the entire e2e latency, which allows for strictly better latency. So, yes, there are significant optimizations in doing local retransmissions, but there are also interesting interactions with end-to-end retransmission that need to be taken care of. This has been known for a long time; see https://tools.ietf.org/html/rfc3819#section-8, which documents things that were considered well known in the early 2000s.

> The point is that transport-ARQ allows the use of link technologies that otherwise would not be acceptable at all. So doing ARQ on the individual links already indicates that some things are more efficient when not done only e2e.

Obviously.

> I just happen to think that re-ordering falls into the same category, at least for users stuck behind a slow link, as is typical at the edge of the Internet.

Resequencing (which is the term I prefer for putting things back in sequence again, after they have been reordered) requires storing packets that are ahead of later packets.
This is strictly suboptimal if these packets could be delivered instead (in contrast, it *is* a good idea to resequence packets that are in a queue waiting for a transmission opportunity). So *requiring*(*) local path segments to resequence is strictly suboptimal.

(*) even if this is not a strict requirement, but just a statement of the form "the transport will be much more efficient if you deliver in order".

> To put numbers to my example, assume I am on a 1/1 Mbps link, I get TCP data at a 1 Mbps rate in MTU-1500 packets (I am going to keep the numbers approximate), and I get a burst of, say, 10 packets containing 10 individual messages for my application, each telling the position of an object in 3D space.
>
> Each packet is going to "hog" the link for: 1000 ms/s * (1500 * 8 b/packet) / (1000 * 1000 b/s) = 12 ms.
> So I get access to messages/new positions every 12 ms, and I can display this smoothly.

That is already broken by design. If you are not accounting for latency variation ("jitter"), you won't be able to deal with it. Your example also makes sure it does not work well by being based on 100 % utilization.

> Now if the first packet gets re-ordered to be last, I either drop that packet

…which is another nice function the network could do for you before expending further resources on useless delivery; see e.g. draft-ietf-6lo-deadline-time for one way to do this.

> and accept a 12 ms gap, or, if that is not an option, I get to wait 9 * 12 = 108 ms before positions can be updated; that, IMHO, shows why re-ordering is terrible even if TCP were more tolerant.

You are assuming that the network can magically resequence into place a packet that it does not have.

Now I do understand that forwarding an out-of-order packet will block the output port for the time needed to serialize it.
So if you get it right before what would have been an in-order packet, the latter incurs additional latency. Note that this requires a bottleneck configuration, i.e., packets to be forwarded arrive faster than they can be serialized out. Don't do bottlenecks if you want ultra-low latency. (And don't do links where you need to retransmit, either.)

> Especially in the context of L4S, something like this seems to be totally unacceptable if ultra-low latency is supposed to be anything more than marketing.

Dropping packets that can't be used anyway is strictly better than delivering them.

But apart from that, forwarding the packets that I do have is strictly better for low latency than leaving the output port idle, waiting for previous-in-order packets, in order to send them out in sequence.

>> For three decades now, we have acted as if there is no cost for in-order delivery from L2, not because that is true, but because deployed transport protocol implementations were built and tested with simple links that don't reorder.

> Well, that is similar to the argument for performing non-aligned loads fast in hardware: yes, this comes with a considerable cost in complexity, and it is harder to make it go fast than just allowing aligned loads and fixing up unaligned loads by trapping to software; but from a user perspective, the fast hardware beats the fickle "only make aligned loads go fast" approach any old day. CPUs have an abundance of transistors to throw at this problem, so support for unaligned loads has become standard practice for CPUs with enough transistors.

I'm not sure this argument transfers, because this is not about transistors (except maybe when we talk about in-queue resequencing, which would be a nice feature if we had information in the packets to allow it).
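As an aside, Sebastian's 12 ms / 108 ms arithmetic, and the serialization-blocking effect just described, can be checked with a short sketch (the link rate, packet size, and burst length are the numbers from his example; the helper name is mine):

```python
# Serialization delay and worst-case head-of-line wait for the example:
# a 1 Mbit/s link, 1500-byte (MTU) packets, a 10-packet burst in which
# the first packet is reordered to arrive last.

LINK_BPS = 1_000_000      # 1 Mbit/s
PACKET_BYTES = 1500       # MTU-sized packet
BURST = 10                # packets in the burst

def serialization_ms(size_bytes: int, rate_bps: int) -> float:
    """Time to clock one packet onto the wire, in milliseconds."""
    return size_bytes * 8 / rate_bps * 1000

per_packet = serialization_ms(PACKET_BYTES, LINK_BPS)
print(f"per-packet serialization: {per_packet:.0f} ms")   # 12 ms

# If the receiver (or the network) resequences, the application sees
# the reordered message only after the other 9 packets have been sent.
worst_case_wait = (BURST - 1) * per_packet
print(f"worst-case wait with resequencing: {worst_case_wait:.0f} ms")  # 108 ms
```

The same per-packet figure is what an in-order packet pays when an out-of-order packet grabs the output port right ahead of it, which is why this only matters at a bottleneck.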
>> Techniques for ECMP (equal-cost multi-path) have been developed that appease that illusion, but they actually also are pessimizations, at least in some cases.

> Sure, but if I understand correctly, this is partly due to the fact that transport people opted not to do the re-sorting on a flow-by-flow basis; that would solve the blocking issue from the transport perspective. Sure, the affected flow would still suffer from some increased delay, but, as I tried to show above, that might still be smaller than the delay incurred by doing the re-sorting after the bottleneck link. What is wrong with my analysis?

Transport people have no control over what is happening in the network, so maybe I don't understand the argument.

>> The question at hand is whether we can make the move back to end-to-end resequencing techniques that work well,

> But we can not; we can make TCP more robust, but what I predict is that, if RACK allows for 100 ms of delay, transports will take this as the new goal and will keep pushing against that limit, all in the name of bandwidth over latency.

Where does this number come from? 100 ms is pretty long as a reordering maximum for most paths outside of satellite links. Instead, you would do something based on an RTT estimate.

>> at least within some limits that we still have to find.
>> That probably requires some evolution at the end-to-end transport implementation layer. We are in a better position to make that happen than we have been for a long time.

> Probably true, but also not very attractive from an end-user perspective… unless this allows transport innovations that enable massively more bandwidth at a smallish latency cost.

The argument against in-network resequencing is mostly a latency argument (though, as a second-order effect, the reduced latency may also allow more throughput), so, again, I don't quite understand.

Grüße, Carsten