From: Carsten Bormann
To: Sebastian Moeller
Cc: Greg White, Ingemar Johansson S, bloat@lists.bufferbloat.net
Date: Sun, 17 Mar 2019 18:09:48 +0100
Subject: Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)

>>
>>>> The end-to-end argument applies:
>>>> Ultimately, there needs to be resequencing at the end anyway, so any reordering in the network would be a performance optimization. It turns out that keeping packets lying around in some buffer somewhere in the network just to do resequencing before they exit an L2 domain (or a tunnel) is a pessimization, not an optimization.
>>>
>>> I do not buy the end-to-end argument here, because in the extreme, why do ARQ on individual links at all? We can just leave it to the end-points, and TCP does ARQ anyway.
>>
>> The optimization is that the retransmission on a single link (or within a path segment, which is what I'm interested in) does not need to span the entire end-to-end path. That is strictly better than an end-to-end retransmission.
>
> I agree, and by the same logic local resequencing is also better,

Non sequitur. The same logic simply does not apply. A resequenced packet consumes the same transmission resources. (It also consumes more buffer resources. So it is strictly worse when just looking at network resources expended, which is the basis for the kind of logic applied here.)

> unless the re-ordering event happened at the bottleneck link.

Not sure how this comes in now.

>> Also, a local segment may allow faster recovery by not implicating the entire e2e latency, which allows for strictly better latency.
>> So, yes, there are significant optimizations in doing local retransmissions, but there are also interesting interactions with end-to-end retransmission that need to be taken care of. This has been known for a long time; see, e.g., https://tools.ietf.org/html/rfc3819#section-8, which documents things that were considered to be well known in the early 2000s.
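[To put rough numbers on the "strictly better" claim about local retransmission: a link-local repair only spans the lossy segment's round trip, while an end-to-end repair spans the whole path's. The sketch below uses made-up illustrative figures (2 ms link RTT, 50 ms path RTT, detection time on the order of one RTT are all assumptions, not measurements from the thread).]

```python
# Illustrative comparison: time from a loss to repaired delivery for
# link-local ARQ vs. end-to-end (TCP-style) retransmission.

def retx_latency_ms(detect_ms: float, rtt_ms: float) -> float:
    """Loss-detection time plus one resend round trip."""
    return detect_ms + rtt_ms

link_rtt = 2    # assumed RTT of the lossy link segment, ms
path_rtt = 50   # assumed end-to-end path RTT, ms

local = retx_latency_ms(detect_ms=link_rtt, rtt_ms=link_rtt)  # link-layer ARQ
e2e = retx_latency_ms(detect_ms=path_rtt, rtt_ms=path_rtt)    # end-to-end repair

print(f"local repair ~{local} ms, end-to-end repair ~{e2e} ms")
```

[With these assumed figures, the local repair completes more than an order of magnitude faster, which is the latency advantage the paragraph above is arguing for.]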
>
> Thanks, but my understanding of this is basically that a link should just drop a packet unless it can be retransmitted with reasonable effort (like the G.INP retransmission on DSL links will give up); sure, we can argue about what "reasonable effort" is in reality, but I fear that if we move away from 3 dupACKs to, say, X ms, all transport links will assume they have leeway to allow re-ordering close to X, and that will certainly be worse than today. And since I am an end-user and do not operate a transport network, I know what I prefer here…

I'm sorry, I grew up as a transport layer guy, so "transport" means L4 (transport layer) for me, not "transport network".
You may want to re-read my sentences with that knowledge; they might make more sense.

>> Resequencing (which is the term I prefer for putting things back in sequence again, after they have been reordered) requires storing packets that are ahead of later packets.
>
> Obviously.
>
>> This is strictly suboptimal if these packets could be delivered instead (in contrast, it *is* a good idea to resequence packets that are in a queue waiting for a transmission opportunity).
>
> Fair enough, but that basically expects the bottleneck link that actually accumulates a queue to do the heavy lifting; not sure that the economic incentives are properly aligned here.

It can actually do so more easily, because the speeds are lower.
But deployment economy arguments are interesting as well; I was making theoretical arguments first.

>> So *requiring*(*) local path segments to resequence is strictly suboptimal.
>>
>> (*) even if this is not a strict requirement, but just a statement of the form "the transport will be much more efficient if you deliver in order".
>
> My point is the transport will be much more useful if it undertakes (reasonable) effort to deliver in-order,

Please re-read as advised above.
> that is slightly different, and I understand that those responsible for transport networks have a different viewpoint on this.
>
>>
>>> To put numbers to my example, assume I am on a 1/1 Mbps link and I get TCP data at 1 Mbps rate in MTU-1500 packets (I am going to keep the numbers approximate), and I get a burst of, say, 10 packets containing 10 individual messages for my application, telling the position of, say, an object in 3D space.
>>>
>>> Each packet is going to "hog" the link for: 1000 ms/s * (1500 * 8 b/packet) / (1000 * 1000 b/s) = 12 ms.
>>> So I get access to messages/new positions every 12 ms, and I can display this smoothly.
>>
>> That is already broken by design.
>
> Does not matter much; a well-designed network should also allow one to do stupid things…

Sure, but it won't work very well then (and there is no point in optimizing for that — remember: all in-network work is just an optimization under the end-to-end principle).

>> If you are not accounting for latency variation ("jitter"), you won't be able to deal with it.
>
> Which would just complicate the issue a bit if we introduced a, say, 25 ms de-jitter buffer, without affecting the gist of it.

That buffer increases the total latency but also the (useful) packet delivery rate in the presence of reordering.

>> Your example also makes sure it does not work well by being based on 100% utilization.
>
> Same here, access links certainly run closer to 100% utilization than core links, so operation at full saturation is not completely unrealistic, but I really just set it up that way for clarity.

Please use an example that is more realistic.

>>> Now if the first packet gets re-ordered to be last, I either drop that packet
>>
>> …which is another nice function the network could do for you before expending further resources on useless delivery; see, e.g.,
>> draft-ietf-6lo-deadline-time for one way to do this.
>
> Yes, but typically I do not want the network to do this, as I would be quite interested in knowing how much too late the packet arrived.

I don't know how to make use of that knowledge; do you?
Early discarding of a late packet (e.g., by not retransmitting it in the first place) is so much better.

>>> and accept a 12 ms gap, or, if that is not an option, I get to wait 9*12 = 108 ms before positions can be updated; that IMHO shows why re-ordering is terrible even if TCP were more tolerant.
>>
>> You are assuming that the network can magically resequence a packet into place that it does not have.
>
> All I expect is that the network makes a reasonable effort to undo re-ordering close to where the re-ordering happened.

All I'm trying to say is that this is bad engineering, apparently perpetuated by bad transport layer implementations.

>> Now I do understand that forwarding an out-of-order packet will block the output port for the time needed to serialize it. So if you get it right before what would have been an in-order packet, the latter incurs additional latency. Note that this requires a bottleneck configuration, i.e., packets to be forwarded arrive faster than they can be serialized out. Don't do bottlenecks if you want ultra-low latency. (And don't do links where you need to retransmit, either.)
>
> I agree, but that is life with a home internet access link: the bottleneck is there. This also points out a problem with the L4S argument for end-users, as the ultra-low latency (their words, not mine) will not materialize for end-users close to what the project seems to promise.
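[The arithmetic in the 1 Mbps burst example above can be reproduced with a short sketch: 12 ms of serialization per packet, and a 9 * 12 = 108 ms head-of-line wait when the first packet of the burst arrives last and delivery must stay in order.]

```python
# Per-packet serialization delay on a 1 Mbit/s link with 1500-byte
# packets, and the head-of-line-blocking stall when packet 1 of a
# 10-packet burst is reordered to arrive last while the receiver
# insists on in-order delivery (the numbers from the example above).

MTU_BITS = 1500 * 8    # bits per 1500-byte packet
LINK_BPS = 1_000_000   # 1 Mbit/s, decimal, as in the example
BURST = 10             # packets in the burst

per_packet_ms = 1000 * MTU_BITS / LINK_BPS     # time each packet hogs the link
head_of_line_ms = (BURST - 1) * per_packet_ms  # wait until the last packet lands

print(per_packet_ms, head_of_line_ms)  # 12.0 108.0
```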
I think reordering is not really a problem for ultra-low latency; or, more specifically, once reordering happens, you are no longer in the ultra-low latency domain.

>>> Especially in the context of L4S, something like this seems to be totally unacceptable if ultra-low latency is supposed to be anything more than marketing.
>>
>> Dropping packets that can't be used anyway is strictly better than delivering them.
>
> Well, not for L4S, as TCP Prague is supposed to fall back to legacy congestion control behavior upon encountering packet drops…

L4S is for reliable transport, which is a different scenario than the one that benefits a lot from deadlines for packets. (Well, deadlines might be used to make sure there is no dual retransmission, both local and end-to-end, but again, this is not where you would use L4S.)

>> But apart from that, forwarding packets that I have is strictly better for low latency than leaving the output port idle and waiting for previous-in-order packets to send them out in sequence.
>
> It really depends on what we mean when we talk about latency here; as shown, for an end-user that might be quite different…

Apart from the port blocking effect I talked about (which is mostly relevant for highly scheduled transmission schemes), I really have no idea how the end-to-end latency would benefit from sitting on packets while the port is idle.

>>>> For three decades now, we have acted as if there is no cost for in-order delivery from L2 — not because that is true, but because deployed transport protocol implementations were built and tested with simple links that don't reorder.
>>>
>>> Well, that is similar to the argument for performing non-aligned loads fast in hardware: yes, this comes with a considerable cost in complexity, and it is harder to make this go fast than just allowing aligned loads and fixing up unaligned loads by trapping to software, but from a user perspective the fast hardware beats the fickle make-only-aligned-loads-go-fast approach any old day.
>>
>> CPUs have an abundance of transistors you can throw at this problem, so the support of unaligned loads has become standard practice for CPUs with enough transistors.
>> I'm not sure this argument transfers, because this is not about transistors (except maybe when we talk about in-queue resequencing, which would be a nice feature if we had information in the packets to allow it).
>
> Like the 5-tuple in TCP and UDP?

That doesn't help. I need a sequence number for resequencing, and I can't use the transport layer one because that is being encrypted. Again, this is mostly theoretical, as I don't see people rushing to do in-queue resequencing any time soon.

(Skipping some text that is not relevant to my argument here.)

>> Where does this number come from? 100 ms is pretty long as a reordering maximum for most paths outside of satellite links. Instead, you would do something based on an RTT estimate.
>
> I just made that number up, as the exact N does not matter; the argument is that whatever we set as the new threshold will be approached by transport characteristics. Then again, having something that inversely scales with bandwidth is certainly terrible from a transport perspective, so I can understand the argument for a fixed temporal threshold.

I don't follow at all here.

>>>> at least within some limits that we still have to find.
>>>> That probably requires some evolution at the end-to-end transport implementation layer.
>>>> We are in a better position to make that happen than we have been for a long time.
>>>
>>> Probably true, but also not very attractive from an end-user perspective… unless this will allow transport innovations that will allow massively more bandwidth at a smallish latency cost.
>>
>> The argument against in-network resequencing is mostly a latency argument (but, as a second-order effect, that reduced latency may also allow more throughput), so, again, I don't quite understand.
>
> As I tried to show for TCP the flow with re-ordered packets certainly pays a latency cost that especially if re-ordering does not happen on the bottleneck link but at a faster link could be smaller.

I can't parse this sentence, but my main point remains:
In-network resequencing increases latency (with a potential impact on throughput, too), unless it happens within a queue. We wouldn't want to do that, unless forced by a transport protocol that can't cope. If we can fix the transport protocols to enable (out-of-order) immediate forwarding, then let's do it; this might also enable doing more in-network recovery, with the attendant performance improvements.

Grüße, Carsten
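[The RTT-based reordering threshold debated in this thread is what RACK-style loss detection does: instead of counting 3 dupACKs, it waits a small time window before declaring a packet lost. The quarter-of-min-RTT starting point below follows the RACK-TLP specification (RFC 8985), but the functions are a deliberate simplification for illustration, not the actual algorithm, which additionally adapts the window.]

```python
# Sketch of a RACK-style time-based reordering threshold: a sent-but-
# unacknowledged packet is only declared lost once newer packets have
# been delivered and it has been outstanding longer than the reordering
# window. Assumes a fixed window of min_RTT/4 (RFC 8985 adapts this).

def reordering_window_ms(min_rtt_ms: float, factor: float = 0.25) -> float:
    """How long to wait for a late (possibly reordered) packet."""
    return factor * min_rtt_ms

def is_lost(gap_ms: float, min_rtt_ms: float) -> bool:
    """Declare loss only after the reordering window has elapsed."""
    return gap_ms > reordering_window_ms(min_rtt_ms)

# A packet 3 ms behind the latest delivery on a 50 ms-RTT path is not
# yet considered lost; the same gap on a 10 ms-RTT path is.
print(is_lost(3, 50), is_lost(3, 10))  # False True
```

[This is also why the threshold no longer inversely scales with bandwidth, as noted above: it tracks the path's RTT rather than a packet count.]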