From: Sebastian Moeller
To: Carsten Bormann
Cc: Greg White, Ingemar Johansson S, "bloat@lists.bufferbloat.net"
Date: Sun, 17 Mar 2019 16:56:01 +0100
In-Reply-To: <4FA6FA39-7092-4E98-B12E-5236C8EACCE2@tzi.org>
Subject: Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)

Hi Carsten,

thanks for your insights.

> On Mar 17, 2019, at 15:34, Carsten Bormann wrote:
>
>>> The end-to-end argument applies: Ultimately, there needs to be resequencing at the end anyway, so any reordering in the network would be a performance optimization. It turns out that keeping packets lying around in some buffer somewhere in the network just to do resequencing before they exit an L2 domain (or a tunnel) is a pessimization, not an optimization.
>>
>> I do not buy the end-to-end argument here, because, taken to the extreme, why do ARQ on individual links at all? We could just leave ARQ to the end-points, as TCP does anyway.
>
> The optimization is that the retransmission on a single link (or within a path segment, which is what I’m interested in) does not need to span the entire end-to-end path. That is strictly better than an end-to-end retransmission.

I agree, and by the same logic local resequencing is also better, unless the re-ordering event happened at the bottleneck link.

> Also, a local segment may allow faster recovery by not implicating the entire e2e latency, which allows for strictly better latency.
> So, yes, there are significant optimizations in doing local retransmissions, but there are also interesting interactions with end-to-end retransmission that need to be taken care of. This has been known for a long time, e.g., see https://tools.ietf.org/html/rfc3819#section-8 which documents things that were considered to be well known in the early 2000s.

Thanks, but my understanding of this is basically that a link should just drop a packet unless it can be retransmitted with reasonable effort (like G.INP retransmission on DSL links, which eventually gives up). Sure, we can argue about what "reasonable effort" means in practice, but I fear that if we move away from 3 dupACKs to, say, X ms, all transport links will assume they have the leeway to allow re-ordering close to X, and that will certainly be worse than today. And since I am an end-user and do not operate a transport network, I know what I prefer here...

>
>> The point is that transport-ARQ allows the use of link technologies that otherwise would not be acceptable at all. So doing ARQ on the individual links already indicates that some things are better not done only end-to-end.
>
> Obviously.
>
>> I just happen to think that re-ordering falls into the same category, at least for users stuck behind a slow link, as is typical at the edge of the internet.
>
> Resequencing (which is the term I prefer for putting things back in sequence again, after they have been reordered) requires storing packets that are ahead of later packets.

Obviously.

> This is strictly suboptimal if these packets could be delivered instead (in contrast, it *is* a good idea to resequence packets that are in a queue waiting for a transmission opportunity).

Fair enough, but that basically expects the bottleneck link, the one that actually accumulates a queue, to do the heavy lifting; I am not sure the economic incentives are properly aligned here.

> So *requiring*(*) local path segments to resequence is strictly suboptimal.
>
> (*) even if this is not a strict requirement, but just a statement of the form “the transport will be much more efficient if you deliver in order”.

My point is that the transport will be much more useful if it undertakes (reasonable) effort to deliver in order; that is slightly different, and I understand that those responsible for transport networks have a different viewpoint on this.

>
>> To put numbers to my example, assume I am on a 1/1 Mbps link, I get TCP data at a 1 Mbps rate in MTU-1500 packets (I am going to keep the numbers approximate), and I get a burst of, say, 10 packets containing 10 individual messages for my application, each telling the position of, say, an object in 3D space.
>>
>> Each packet is going to "hog" the link for: 1000 ms/s * (1500 * 8 b/packet) / (1000 * 1000 b/s) = 12 ms
>> So I get access to messages/new positions every 12 ms and I can display this smoothly.
>
> That is already broken by design.

That does not matter much; a well-designed network should also allow one to do stupid things...

> If you are not accounting for latency variation (“jitter”), you won’t be able to deal with it.

Introducing, say, a 25 ms de-jitter buffer would just complicate the example a bit without affecting its gist.

> Your example also makes sure it does not work well by being based on 100 % utilization.

Same here: access links certainly run closer to 100% utilization than core links, so operation at full saturation is not completely unrealistic, but I really just set it up that way for clarity.

>
>> Now if the first packet gets re-ordered to be last, I either drop that packet
>
> …which is another nice function the network could do for you before expending further resources on useless delivery; see e.g. draft-ietf-6lo-deadline-time for one way to do this.
Yes, but typically I do not want the network to do this, as I would be quite interested in knowing how late the packet actually arrived.

>
>> and accept a 12 ms gap, or, if that is not an option, I get to wait 9*12 = 108 ms before positions can be updated; IMHO that shows why re-ordering is terrible even if TCP were more tolerant.
>
> You are assuming that the network can magically resequence a packet into place that it does not have.

All I expect is that the network makes a reasonable effort to undo re-ordering close to where the re-ordering happened.

>
> Now I do understand that forwarding an out-of-order packet will block the output port for the time needed to serialize it. So if you get it right before what would have been an in-order packet, the latter incurs additional latency. Note that this requires a bottleneck configuration, i.e., packets to be forwarded arrive faster than they can be serialized out. Don’t do bottlenecks if you want ultra-low latency. (And don’t do links where you need to retransmit, either.)

I agree, but that is life with a home internet access link: the bottleneck is there. This also points out a problem with the L4S argument for end-users, as the ultra-low latency (their words, not mine) will not materialize for end-users anywhere close to what the project seems to promise.

>
>> Especially in the context of L4S, something like this seems to be totally unacceptable if ultra-low latency is supposed to be anything more than marketing.
>
> Dropping packets that can’t be used anyway is strictly better than delivering them.

Well, not for L4S, as TCP Prague is supposed to fall back to legacy congestion control behavior upon encountering packet drops...

> But apart from that, forwarding packets that I have is strictly better for low latency than leaving the output port idle and waiting for previous-in-order packets to send them out in sequence.
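To make the arithmetic of the 1/1 Mbps example easy to re-run with other numbers, here is a minimal Python sketch. It assumes back-to-back MTU-sized packets at the full link rate, no jitter, no propagation delay, and strictly in-order release (resequencing) at the receiver; the function names are illustrative only.

```python
"""Numbers for the 1/1 Mbps example (a sketch, not a network model).

Assumptions: MTU-sized packets arrive back to back at the full link rate,
no jitter, no propagation delay, and the receiver releases data strictly
in order (resequencing at the end host)."""


def serialization_ms(packet_bytes: int, rate_bps: float) -> float:
    """Time one packet occupies the link, in milliseconds."""
    return 1000.0 * packet_bytes * 8 / rate_bps


def release_times_ms(arrival_order: list[int], slot_ms: float) -> dict[int, float]:
    """Per sequence number, the time it can be handed to the application.

    arrival_order lists sequence numbers in the order their last bit arrives;
    each arrival takes one serialization slot.
    """
    arrival = {seq: (i + 1) * slot_ms for i, seq in enumerate(arrival_order)}
    released: dict[int, float] = {}
    latest = 0.0
    for seq in sorted(arrival):
        # In-order release: not before the packet itself, nor before its predecessor.
        latest = max(latest, arrival[seq])
        released[seq] = latest
    return released


if __name__ == "__main__":
    slot = serialization_ms(1500, 1_000_000)  # 12 ms per packet at 1 Mbps
    in_order = release_times_ms(list(range(10)), slot)
    # The first packet of the burst is re-ordered to arrive last.
    reordered = release_times_ms(list(range(1, 10)) + [0], slot)
    extra = {seq: reordered[seq] - in_order[seq] for seq in in_order}
    print(slot, max(extra.values()))  # prints: 12.0 108.0
```

Within this simplified model every time scales with `slot_ms`, so the same re-ordering undone on a link ten times faster than the bottleneck would hold packets for a tenth as long, which is the intuition behind resequencing close to where the re-ordering happens.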
It really depends on what we mean by latency here; as shown above, for an end-user that might be quite different...

>
>>> For three decades now, we have acted as if there is no cost for in-order delivery from L2, not because that is true, but because deployed transport protocol implementations were built and tested with simple links that don’t reorder.
>>
>> Well, that is similar to the argument for performing unaligned loads fast in hardware. Yes, this comes at a considerable cost in complexity, and it is harder to make this go fast than to just allow aligned loads and fix up unaligned loads by trapping to software, but from a user perspective the fast hardware beats the fickle "only make aligned loads go fast" approach any old day.
>
> CPUs have an abundance of transistors you can throw at this problem, so the support of unaligned loads has become standard practice for CPUs with enough transistors.
> I’m not sure this argument transfers, because this is not about transistors (except maybe when we talk about in-queue resequencing, which would be a nice feature if we had information in the packets to allow it).

Like the 5-tuple in TCP and UDP? This example was not meant to be taken literally, but just to illustrate that, depending on the level of observation, speeding up one domain can have a noticeable effect on another one, but might still be worth the effort.

>
>>> Techniques for ECMP (equal-cost multi-path) have been developed that appease that illusion, but they actually also are pessimizations at least in some cases.
>>
>> Sure, but if I understand correctly, this is partly due to the fact that transport people opted not to do the re-sorting on a flow-by-flow basis; that would solve the blocking issue from the transport perspective. Sure, the affected flow would still suffer from some increased delay, but as I tried to show above, that might still be smaller than the delay incurred by doing the re-sorting after the bottleneck link. What is wrong with my analysis?
>
> Transport people have no control over what is happening in the network, so maybe I don’t understand the argument.

If the remote end of a potentially re-ordering link implemented fair queueing (a big if, sure), then it should be easy to stall only the flows that have outstanding packets. This could be based solely on the local retransmit ACKs, so the link would only need to clean up its own re-orderings and could just faithfully relay a flow that entered the link already re-ordered. This might not be feasible at all in reality...

>
>>> The question at hand is whether we can make the move back to end-to-end resequencing techniques that work well,
>>
>> But we cannot; we can make TCP more robust, but what I predict is that if RACK allows for 100 ms of delay, transports will take this as the new goal and will keep pushing against that limit, all in the name of bandwidth over latency.
>
> Where does this number come from? 100 ms is pretty long as a reordering maximum for most paths outside of satellite links. Instead, you would do something based on an RTT estimate.

I just made that number up, as the exact value does not matter; the argument is that whatever we set as the new threshold will be approached by transport characteristics. Then again, having something that scales inversely with bandwidth is certainly terrible from a transport perspective, so I can understand the argument for a fixed temporal threshold.
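As a rough illustration of why a packet-count threshold scales inversely with bandwidth while an RTT-based window does not, the following minimal Python sketch converts the classic 3-dupACK threshold into time for back-to-back MTU-sized packets and compares it with a window of min_RTT/4 capped at SRTT. That window is our reading of the default reordering window RACK uses in RFC 8985 (real stacks adapt it); the 20 ms RTT is an assumed example value and the function names are hypothetical.

```python
"""Reordering tolerance of a packet-count threshold versus an RTT-based window.

Sketch only. Assumptions: back-to-back MTU-sized packets at the bottleneck
rate, and an RTT-based window of min_rtt/4 capped at SRTT (our reading of the
RACK default in RFC 8985; real implementations adapt the window)."""

DUPTHRESH = 3      # classic fast-retransmit threshold, in packets
MTU_BYTES = 1500


def dupack_window_ms(rate_bps: float, dupthresh: int = DUPTHRESH,
                     packet_bytes: int = MTU_BYTES) -> float:
    """Roughly how long a packet can be delayed before `dupthresh` later
    packets overtake it and trigger a (spurious) fast retransmit."""
    return dupthresh * 1000.0 * packet_bytes * 8 / rate_bps


def rtt_window_ms(min_rtt_ms: float, srtt_ms: float) -> float:
    """RTT-based reordering window; it does not depend on the link rate."""
    return min(min_rtt_ms / 4, srtt_ms)


if __name__ == "__main__":
    rtt_ms = 20.0  # assumed example RTT
    for rate_bps in (1e6, 100e6, 10e9):
        print(f"{rate_bps / 1e6:>8.0f} Mbps: 3 dupACKs ~ "
              f"{dupack_window_ms(rate_bps):.4f} ms, "
              f"RTT-based = {rtt_window_ms(rtt_ms, rtt_ms):.1f} ms")
```

At 1 Mbps the three-packet threshold corresponds to about 36 ms, at 10 Gbps to only a few microseconds, whereas the RTT-based window stays at 5 ms for the assumed 20 ms RTT regardless of link rate.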
>
>>> at least within some limits that we still have to find.
>>> That probably requires some evolution at the end-to-end transport implementation layer. We are in a better position to make that happen than we have been for a long time.
>>
>> Probably true, but also not very attractive from an end-user perspective... unless this will allow transport innovations that allow massively more bandwidth at a smallish latency cost.
>
> The argument against in-network resequencing is mostly a latency argument (but, as a second order effect, that reduced latency may also allow more throughput), so, again, I don’t quite understand.

As I tried to show for TCP, the flow with re-ordered packets certainly pays a latency cost with in-network resequencing, but that cost could be smaller than re-sorting after the bottleneck link, especially if the re-ordering does not happen on the bottleneck link but on a faster link.

Gruss
	Sebastian

>
> Grüße, Carsten
>