From: Carsten Bormann
Date: Sun, 17 Mar 2019 15:34:15 +0100
To: Sebastian Moeller
Cc: Greg White, Ingemar Johansson S, bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)

>> The
end-to-end argument applies: ultimately, there needs to be resequencing at the end anyway, so any resequencing in the network would at best be a performance optimization. It turns out that keeping packets lying around in some buffer somewhere in the network just to do resequencing before they exit an L2 domain (or a tunnel) is a pessimization, not an optimization.

> I do not buy the end-to-end argument here, because in the extreme case, why do ARQ on individual links at all? We could just leave the ARQ to the end-points, and TCP does it anyway.

The optimization is that a retransmission on a single link (or within a path segment, which is what I'm interested in) does not need to span the entire end-to-end path. That is strictly better than an end-to-end retransmission. Also, a local segment may allow faster recovery by not implicating the entire e2e latency, which allows for strictly better latency. So, yes, there are significant optimizations in doing local retransmissions, but there are also interesting interactions with end-to-end retransmission that need to be taken care of. This has been known for a long time; see https://tools.ietf.org/html/rfc3819#section-8, which documents things that were considered well known in the early 2000s.

> The point is that transport-ARQ allows the use of link technologies that otherwise would not be acceptable at all. So doing ARQ on the individual links already indicates that some things are more efficient when not done only e2e.

Obviously.

> I just happen to think that re-ordering falls into the same category, at least for users stuck behind a slow link, as is typical at the edge of the Internet.

Resequencing (which is the term I prefer for putting things back in sequence again, after they have been reordered) requires storing packets that are ahead of later packets.
This is strictly suboptimal if these packets could be delivered instead (in contrast, it *is* a good idea to resequence packets that are in a queue waiting for a transmission opportunity). So *requiring*(*) local path segments to resequence is strictly suboptimal.

(*) even if this is not a strict requirement, but just a statement of the form "the transport will be much more efficient if you deliver in order".

> To put numbers to my example, assume I am on a 1/1 Mbps link, I get TCP data at a 1 Mbps rate in MTU-1500 packets (I am going to keep the numbers approximate), and I get a burst of, say, 10 packets containing 10 individual messages for my application, each telling the position of an object in 3D space.
>
> Each packet is going to "hog" the link for: 1000 ms/s * (1500 * 8 b/packet) / (1000 * 1000 b/s) = 12 ms.
> So I get access to messages/new positions every 12 ms, and I can display this smoothly.

That is already broken by design. If you are not accounting for latency variation ("jitter"), you won't be able to deal with it. Your example also makes sure it does not work well by being based on 100 % utilization.

> Now if the first packet gets re-ordered to be last, I either drop that packet

…which is another nice function the network could do for you before expending further resources on useless delivery; see e.g. draft-ietf-6lo-deadline-time for one way to do this.

> and accept a 12 ms gap, or, if that is not an option, I get to wait 9 * 12 = 108 ms before positions can be updated; that, IMHO, shows why re-ordering is terrible even if TCP were more tolerant.

You are assuming that the network can magically resequence into place a packet that it does not have.

Now I do understand that forwarding an out-of-order packet will block the output port for the time needed to serialize it.
So if you get it right before what would have been an in-order packet, the latter incurs additional latency. Note that this requires a bottleneck configuration, i.e., packets to be forwarded arrive faster than they can be serialized out. Don't do bottlenecks if you want ultra-low latency. (And don't do links where you need to retransmit, either.)

> Especially in the context of L4S, something like this seems to be totally unacceptable if ultra-low latency is supposed to be anything more than marketing.

Dropping packets that can't be used anyway is strictly better than delivering them.

But apart from that, forwarding the packets that I do have is strictly better for low latency than leaving the output port idle, waiting for previous-in-order packets, in order to send them out in sequence.

>> For three decades now, we have acted as if there is no cost for in-order delivery from L2, not because that is true, but because deployed transport protocol implementations were built and tested with simple links that don't reorder.

> Well, that is similar to the argument for performing non-aligned loads fast in hardware: yes, this comes with a considerable cost in complexity, and it is harder to make it go fast than just allowing aligned loads and fixing up unaligned loads by trapping to software; but from a user perspective, the fast hardware beats the fickle "only make aligned loads go fast" approach any old day. CPUs have an abundance of transistors to throw at this problem, so support for unaligned loads has become standard practice for CPUs with enough transistors.

I'm not sure this argument transfers, because this is not about transistors (except maybe when we talk about in-queue resequencing, which would be a nice feature if we had information in the packets to allow it).
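As an aside, Sebastian's 12 ms / 108 ms arithmetic, and the serialization-blocking effect just described, can be checked with a short sketch (the link rate, packet size, and burst length are the numbers from his example; the helper name is mine):

```python
# Serialization delay and worst-case head-of-line wait for the example:
# a 1 Mbit/s link, 1500-byte (MTU) packets, a 10-packet burst in which
# the first packet is reordered to arrive last.

LINK_BPS = 1_000_000      # 1 Mbit/s
PACKET_BYTES = 1500       # MTU-sized packet
BURST = 10                # packets in the burst

def serialization_ms(size_bytes: int, rate_bps: int) -> float:
    """Time to clock one packet onto the wire, in milliseconds."""
    return size_bytes * 8 / rate_bps * 1000

per_packet = serialization_ms(PACKET_BYTES, LINK_BPS)
print(f"per-packet serialization: {per_packet:.0f} ms")   # 12 ms

# If the receiver (or the network) resequences, the application sees
# the reordered message only after the other 9 packets have been sent.
worst_case_wait = (BURST - 1) * per_packet
print(f"worst-case wait with resequencing: {worst_case_wait:.0f} ms")  # 108 ms
```

The same per-packet figure is what an in-order packet pays when an out-of-order packet grabs the output port right ahead of it, which is why this only matters at a bottleneck.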
>> Techniques for ECMP (equal-cost multi-path) have been developed that appease that illusion, but they actually also are pessimizations, at least in some cases.

> Sure, but if I understand correctly, this is partly due to the fact that transport people opted not to do the re-sorting on a flow-by-flow basis; that would solve the blocking issue from the transport perspective. Sure, the affected flow would still suffer from some increased delay, but, as I tried to show above, that might still be smaller than the delay incurred by doing the re-sorting after the bottleneck link. What is wrong with my analysis?

Transport people have no control over what is happening in the network, so maybe I don't understand the argument.

>> The question at hand is whether we can make the move back to end-to-end resequencing techniques that work well,

> But we can not; we can make TCP more robust, but what I predict is that, if RACK allows for 100 ms of delay, transports will take this as the new goal and will keep pushing against that limit, all in the name of bandwidth over latency.

Where does this number come from? 100 ms is pretty long as a reordering maximum for most paths outside of satellite links. Instead, you would do something based on an RTT estimate.

>> at least within some limits that we still have to find.
>> That probably requires some evolution at the end-to-end transport implementation layer. We are in a better position to make that happen than we have been for a long time.

> Probably true, but also not very attractive from an end-user perspective… unless this allows transport innovations that enable massively more bandwidth at a smallish latency cost.

The argument against in-network resequencing is mostly a latency argument (though, as a second-order effect, the reduced latency may also allow more throughput), so, again, I don't quite understand.

Grüße, Carsten