* [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)
@ 2019-03-14 8:26 Ingemar Johansson S
2019-03-14 8:43 ` Sebastian Moeller
0 siblings, 1 reply; 13+ messages in thread
From: Ingemar Johansson S @ 2019-03-14 8:26 UTC (permalink / raw)
To: bloat; +Cc: Ingemar Johansson S
[-- Attachment #1: Type: text/plain, Size: 1780 bytes --]
Hi
In addition to the below, in NR (= New Radio, a part of 5G) the RLC layer
will no longer ensure in-sequence delivery to higher layers. Packet
reordering can occur at the MAC layer because several HARQ (Hybrid ARQ)
processes run simultaneously to transmit packets; when some of these
processes need to retransmit, you get packet reordering.
The PDCP layer can, however, optionally enforce in-sequence delivery.
Personally I am skeptical about the benefits of this, as it adds extra HoL
blocking to solve a problem that RACK can solve. In addition it costs more
memory in nodes that potentially need to transmit 10s of GByte of data.
/Ingemar
======
Date: Tue, 12 Mar 2019 21:39:42 -0700 (PDT)
From: David Lang <david@lang.hm>
To: Sebastian Moeller <moeller0@gmx.de>
Cc: Mikael Abrahamsson <swmike@swm.pp.se>, "Holland, Jake"
<jholland@akamai.com>, Cake List <cake@lists.bufferbloat.net>,
"codel@lists.bufferbloat.net" <codel@lists.bufferbloat.net>, bloat
<bloat@lists.bufferbloat.net>, "ecn-sane@lists.bufferbloat.net"
<ecn-sane@lists.bufferbloat.net>
Subject: Re: [Bloat] [Cake] The "Some Congestion Experienced" ECN
codepoint - a new internet draft -
Message-ID: <nycvar.QRO.7.76.6.1903122137430.6242@qynat-yncgbc>
Content-Type: text/plain; charset=US-ASCII; format=flowed
On Mon, 11 Mar 2019, Sebastian Moeller wrote:
> How is packet reordering for anybody but the folks responsible for
> operating the "conduits" in any way attractive?
It's more that not worrying about maintaining the order, and just moving the
packets as fast as possible, reduces the overhead.
The majority of the time packets will be in order, but race conditions and
corner cases are allowed to forward packets out of order rather than having
to delay some packets to maintain the order.
David Lang
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6332 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)
2019-03-14 8:26 [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint) Ingemar Johansson S
@ 2019-03-14 8:43 ` Sebastian Moeller
2019-03-14 19:23 ` Greg White
0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Moeller @ 2019-03-14 8:43 UTC (permalink / raw)
To: Ingemar Johansson S; +Cc: bloat

Hi,

> On Mar 14, 2019, at 09:26, Ingemar Johansson S <ingemar.s.johansson@ericsson.com> wrote:
>
> Hi
>
> In addition to the below, in NR (= New Radio, a part of 5G) the RLC layer
> will no longer ensure in-sequence delivery to higher layers. Packet
> reordering can occur at the MAC layer because several HARQ (Hybrid ARQ)
> processes run simultaneously to transmit packets; when some of these
> processes need to retransmit, you get packet reordering.

Unfortunate...

> The PDCP layer can, however, optionally enforce in-sequence delivery.
> Personally I am skeptical about the benefits of this, as it adds extra HoL
> blocking to solve a problem that RACK can solve.

In the context of the L4S (over-)promises it can not, IMHO, unless the lossy
link is slower than the internet access link. My rationale is that direct
retransmission of packets that did not pass the current "physical"
connection, plus sorting at the remote end of the link before passing the
packets on, is only going to introduce more (sorting) delay than sending out
of order and expecting the receiver to put things in sequence again, IFF the
retransmit process takes on the same order of time as the transfer of the
un-ordered packets over the bottleneck access link. As far as I understand,
L4S is all about reducing application-visible latency, and RACK, in my
layman's understanding, is also not going to help much here, as it basically
introduces another timeout (aka potential delay). I am not trying to pass
judgment on RACK here (as far as I understand, it solves a different
problem: by making TCP more tolerant of mild reordering it will increase
bandwidth utilization and reduce the delays from having to slow down, but it
will do nothing for the perceived latency from re-ordered packets); all I
want is to understand how this is going to work in the context of
"ultra-low latency" (a term from the L4S RFCs that I believe to be a tad
too much)?

> In addition it costs more
> memory in nodes that potentially need to transmit 10s of GByte of data.

Sure, but such is life. This reminds me a bit of the debates about cache
coherency and un-aligned memory accesses: the hardware people seem to argue
life would be simpler/faster if these could be traded in, but that simply
pushes cost and complexity onto the software side ;)

Best Regards
	Sebastian

P.S.: What is the best place to discuss L4S? Certainly not this mailing
list, or is it?

>
> /Ingemar
> ======
>
> [...]
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)
2019-03-14 8:43 ` Sebastian Moeller
@ 2019-03-14 19:23 ` Greg White
2019-03-14 21:43 ` Sebastian Moeller
0 siblings, 1 reply; 13+ messages in thread
From: Greg White @ 2019-03-14 19:23 UTC (permalink / raw)
To: Sebastian Moeller, Ingemar Johansson S; +Cc: bloat

Sebastian,

The latency benefit of eliminating the in-order delivery assumption comes
from the fact that L2 links aren't reordering within each microflow. They
are reordering on the link as a whole. So, (e.g.) packets from microflows
x, y and z can be held up in a resequencing buffer waiting for a packet
from microflow w. So, it is a definite latency benefit for flows x, y & z
if this can be shut off.

Philosophically, since protocols and applications can vary in their need
for in-order delivery, why does it not make sense to rely on the
applications/protocols that need in-order delivery to implement their own
resequencing?

-Greg


On 3/14/19, 2:43 AM, "Bloat on behalf of Sebastian Moeller"
<bloat-bounces@lists.bufferbloat.net on behalf of moeller0@gmx.de> wrote:

    [...]

_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat

^ permalink raw reply [flat|nested] 13+ messages in thread
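A minimal sketch of the resequencing-buffer effect Greg describes, as a
thought experiment (hypothetical Python; the flow names match his example,
the timings are made up):

    # Packet 0 (flow "w") needs a link-level retransmission and only
    # becomes available at t = 20 ms; packets 1-3 (flows x, y, z) arrive
    # on time. Compare link-wide in-order delivery with immediate
    # out-of-order delivery.

    arrivals = [          # (link_seq, flow, time_ms available at link exit)
        (1, "x", 1.0),
        (2, "y", 2.0),
        (3, "z", 3.0),
        (0, "w", 20.0),   # retransmitted; last to arrive, lowest seq
    ]

    def delivered_in_order(arrivals):
        # Nothing may exit before every lower link sequence number has.
        out, blocked_until = {}, 0.0
        for seq, flow, t in sorted(arrivals):
            blocked_until = max(blocked_until, t)
            out[flow] = blocked_until
        return out

    def delivered_immediately(arrivals):
        # Every packet exits as soon as it is available.
        return {flow: t for _seq, flow, t in arrivals}

    print(delivered_in_order(arrivals))
    # {'w': 20.0, 'x': 20.0, 'y': 20.0, 'z': 20.0}  <- x, y, z held up by w
    print(delivered_immediately(arrivals))
    # {'x': 1.0, 'y': 2.0, 'z': 3.0, 'w': 20.0}     <- only w pays for w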
* Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)
2019-03-14 19:23 ` Greg White
@ 2019-03-14 21:43 ` Sebastian Moeller
2019-03-14 22:05 ` David Lang
2019-03-17 10:23 ` Carsten Bormann
0 siblings, 2 replies; 13+ messages in thread
From: Sebastian Moeller @ 2019-03-14 21:43 UTC (permalink / raw)
To: Greg White; +Cc: Ingemar Johansson S, bloat

Hi Greg,

> On Mar 14, 2019, at 20:23, Greg White <g.white@CableLabs.com> wrote:
>
> Sebastian,
>
> The latency benefit of eliminating the in-order delivery assumption comes
> from the fact that L2 links aren't reordering within each microflow. They
> are reordering on the link as a whole. So, (e.g.) packets from microflows
> x, y and z can be held up in a resequencing buffer waiting for a packet
> from microflow w. So, it is a definite latency benefit for flows x, y & z
> if this can be shut off.

I see; that I can understand. It is also sad that nobody thought about
doing the re-ordering per flow, which would reduce one of the cited costs
of re-sorting, namely the increased memory requirement for worst-case
queueing, no?

> Philosophically, since protocols and applications can vary in their need
> for in-order delivery, why does it not make sense to rely on the
> applications/protocols that need in-order delivery to implement their own
> resequencing?

As I tried to convey, if the local ARQ is considerably faster than the
bottleneck link for each flow, local re-ordering comes at a considerably
lower intra-flow latency cost than re-ordering past the bottleneck link.
And one could argue that if a specific link technology is prone to
introduce reordering due to retransmits, it might as well try to clean up
after itself... To my knowledge most traffic is currently TCP, and TCP has
strict ordering requirements; even if clever techniques like RACK can make
it more robust against reordering, the applications will see the full
latency hit incurred by re-sorting re-ordered packets in the receiver's
TCP stack, no?

Anyway, thanks for helping me understand the issues here; as always,
reality trumps theoretical musings...

> -Greg
>
>
> On 3/14/19, 2:43 AM, "Bloat on behalf of Sebastian Moeller"
> <bloat-bounces@lists.bufferbloat.net on behalf of moeller0@gmx.de> wrote:
>
> [...]

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)
2019-03-14 21:43 ` Sebastian Moeller
@ 2019-03-14 22:05 ` David Lang
2019-03-16 22:59 ` Michael Richardson
2019-03-17 10:23 ` Carsten Bormann
1 sibling, 1 reply; 13+ messages in thread
From: David Lang @ 2019-03-14 22:05 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: Greg White, Ingemar Johansson S, bloat

On Thu, 14 Mar 2019, Sebastian Moeller wrote:

> As I tried to convey, if the local ARQ is considerably faster than the
> bottleneck link for each flow, local re-ordering comes at a considerably
> lower intra-flow latency cost than re-ordering past the bottleneck link.
> And one could argue that if a specific link technology is prone to
> introduce reordering due to retransmits, it might as well try to clean up
> after itself...

As soon as you introduce parallelism (either multiple cores working on the
traffic or multiple paths, including different frequencies in the medium),
strict ordering starts becoming expensive.

> To my knowledge most traffic is currently TCP, and TCP has strict ordering
> requirements; even if clever techniques like RACK can make it more robust
> against reordering, the applications will see the full latency hit
> incurred by re-sorting re-ordered packets in the receiver's TCP stack, no?

not necessarily. which is going to take longer, having packets sit in a
buffer somewhere in the network until they get re-ordered, or having them
sit in the buffer on the target machine until they get re-ordered?

if there is no resource contention, they should be equal. In practice,
since the network devices are more likely to run into resource contention
(think locking overhead between cores if nothing else), it can easily be
faster to sort them at the destination.

David Lang

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)
2019-03-14 22:05 ` David Lang
@ 2019-03-16 22:59 ` Michael Richardson
0 siblings, 0 replies; 13+ messages in thread
From: Michael Richardson @ 2019-03-16 22:59 UTC (permalink / raw)
To: David Lang; +Cc: Sebastian Moeller, Ingemar Johansson S, bloat

[-- Attachment #1: Type: text/plain, Size: 1108 bytes --]

David Lang <david@lang.hm> wrote:
    > if there is no resource contention, they should be equal.
    > In practice, since the network devices are more likely to run into
    > resource contention (think locking overhead between cores if nothing
    > else), it can easily be faster to sort them at the destination.

The problem, as I understand it, is that many historic TCP receivers think
that receipt of packet X+n without having seen X means that there is a
loss. This can be solved with appropriate tuning of n, and of how long to
wait, but this has usually required some uber-expert action.

It seems to me that how many packets might be out of order could also be
learnt heuristically, by observing how long it takes to see packet X.
(Perhaps the newer stacks do this... when it comes to the latest TCP
algorithms, I'm strictly in the gawking section)

--
]               Never tell me the odds!                 | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply [flat|nested] 13+ messages in thread
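For reference, the heuristic Michael gestures at is roughly what RACK
standardizes: declaring loss by elapsed time rather than by a duplicate-ACK
count. A simplified, hypothetical Python sketch of the two styles (the real
RACK-TLP algorithm tracks per-segment send times and adapts its reordering
window; the min_rtt/4 starting point below follows its specification, the
rest is illustrative):

    DUPACK_THRESHOLD = 3     # classic heuristic: 3 dupACKs => assume loss

    def lost_by_dupacks(dupacks: int) -> bool:
        return dupacks >= DUPACK_THRESHOLD

    def lost_by_time(now: float, sent_time: float, min_rtt: float) -> bool:
        # RACK-style: a segment is marked lost only once it has been
        # outstanding longer than an RTT plus a reordering window.
        reo_wnd = min_rtt / 4
        return now - sent_time > min_rtt + reo_wnd

    # A segment reordered by a few ms on a 40 ms RTT path trips the
    # dupACK rule (three later segments got ACKed first) but not the
    # time-based rule:
    print(lost_by_dupacks(3))                                    # True
    print(lost_by_time(now=45.0, sent_time=0.0, min_rtt=40.0))   # False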
* Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)
2019-03-14 21:43 ` Sebastian Moeller
2019-03-14 22:05 ` David Lang
@ 2019-03-17 10:23 ` Carsten Bormann
2019-03-17 11:45 ` Sebastian Moeller
1 sibling, 1 reply; 13+ messages in thread
From: Carsten Bormann @ 2019-03-17 10:23 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: Greg White, Ingemar Johansson S, bloat

On Mar 14, 2019, at 22:43, Sebastian Moeller <moeller0@gmx.de> wrote:
>
> if a specific link technology is prone to introduce reordering due to
> retransmits it might as well try to clean up after itself

The end-to-end argument applies: Ultimately, there needs to be resequencing
at the end anyway, so any reordering in the network would be a performance
optimization. It turns out that keeping packets lying around in some buffer
somewhere in the network just to do resequencing before they exit an L2
domain (or a tunnel) is a pessimization, not an optimization.

For three decades now, we have acted as if there is no cost for in-order
delivery from L2 — not because that is true, but because deployed transport
protocol implementations were built and tested with simple links that don’t
reorder. Techniques for ECMP (equal-cost multi-path) have been developed
that appease that illusion, but they actually also are pessimizations, at
least in some cases.

The question at hand is whether we can make the move back to end-to-end
resequencing techniques that work well, at least within some limits that we
still have to find. That probably requires some evolution at the end-to-end
transport implementation layer. We are in a better position to make that
happen than we have been for a long time.

Grüße, Carsten

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)
2019-03-17 10:23 ` Carsten Bormann
@ 2019-03-17 11:45 ` Sebastian Moeller
2019-03-17 14:34 ` Carsten Bormann
0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Moeller @ 2019-03-17 11:45 UTC (permalink / raw)
To: Carsten Bormann; +Cc: Greg White, Ingemar Johansson S, bloat

Hi Carsten,

> On Mar 17, 2019, at 11:23, Carsten Bormann <cabo@tzi.org> wrote:
>
> On Mar 14, 2019, at 22:43, Sebastian Moeller <moeller0@gmx.de> wrote:
>>
>> if a specific link technology is prone to introduce reordering due to
>> retransmits it might as well try to clean up after itself
>
> The end-to-end argument applies: Ultimately, there needs to be
> resequencing at the end anyway, so any reordering in the network would be
> a performance optimization. It turns out that keeping packets lying
> around in some buffer somewhere in the network just to do resequencing
> before they exit an L2 domain (or a tunnel) is a pessimization, not an
> optimization.

I do not buy the end-to-end argument here, because taken to the extreme,
why do ARQ on individual links at all? We could just leave the ARQ to the
end-points, which TCP does anyway. The point is that transport-ARQ allows
the use of link technologies that otherwise would not be acceptable at
all. So doing ARQ on the individual links already indicates that some
things are more efficient when not done purely end-to-end. I just happen
to think that re-ordering falls into the same category, at least for users
stuck behind a slow link, as is typical at the edge of the internet.

To put numbers to my example, assume I am on a 1/1 Mbps link and I get TCP
data at 1 Mbps rate in MTU-1500 packets (I am going to keep the numbers
approximate), and I get a burst of, say, 10 packets containing 10
individual messages for my application, telling the position of, say, an
object in 3D space.

Each packet is going to "hog" the link for:
1000 ms/s * (1500 * 8 b/packet) / (1000 * 1000 b/s) = 12 ms

So I get access to messages/new positions every 12 ms, and I can display
this smoothly. Now if the first packet gets re-ordered to be last, I
either drop that packet and accept a 12 ms gap, or, if that is not an
option, I get to wait 9*12 = 108 ms before positions can be updated; that,
IMHO, shows why re-ordering is terrible even if TCP were more tolerant.
Especially in the context of L4S, something like this seems to be totally
unacceptable if ultra-low latency is supposed to be anything more than
marketing.

> For three decades now, we have acted as if there is no cost for in-order
> delivery from L2 — not because that is true, but because deployed
> transport protocol implementations were built and tested with simple
> links that don’t reorder.

Well, that is similar to the argument for performing non-aligned loads
fast in hardware: yes, this comes with a considerable cost in complexity,
and it is harder to make this go fast than just allowing aligned loads and
fixing up unaligned loads by trapping to software, but from a user
perspective the fast hardware beats the fickle only-aligned-loads-go-fast
approach any old day.

> Techniques for ECMP (equal-cost multi-path) have been developed that
> appease that illusion, but they actually also are pessimizations, at
> least in some cases.

Sure, but if I understand correctly, this is partly due to the fact that
transport people opted not to do the re-sorting on a flow-by-flow basis;
that would solve the blocking issue from the transport perspective. Sure,
the affected flow would still suffer some increased delay, but as I tried
to show above, that might still be smaller than the delay incurred by
doing the re-sorting after the bottleneck link. What is wrong with my
analysis?

> The question at hand is whether we can make the move back to end-to-end
> resequencing techniques that work well,

But we can not; we can make TCP more robust, but what I predict is that if
RACK allows for 100 ms of delay, transports will take this as the new goal
and will keep pushing against that limit, all in the name of bandwidth
over latency.

> at least within some limits that we still have to find.
> That probably requires some evolution at the end-to-end transport
> implementation layer. We are in a better position to make that happen
> than we have been for a long time.

Probably true, but also not very attractive from an end-user perspective…
unless this will allow transport innovations that will allow massively
more bandwidth at a smallish latency cost.

Best Regards
	Sebastian

> Grüße, Carsten

^ permalink raw reply [flat|nested] 13+ messages in thread
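As a quick check on the arithmetic in Sebastian's example (a sketch using
his assumed numbers: a 1 Mbit/s link, 1500-byte packets, a 10-packet
burst):

    LINK_BPS = 1_000_000
    MTU_BYTES = 1500
    BURST = 10

    # Serialization delay of one packet on the bottleneck link.
    serialization_ms = MTU_BYTES * 8 / LINK_BPS * 1000
    print(serialization_ms)                   # 12.0 ms per packet

    # If the first packet of the burst arrives last, an application that
    # must consume the messages in order waits for the other nine first.
    hol_wait_ms = (BURST - 1) * serialization_ms
    print(hol_wait_ms)                        # 108.0 ms instead of 12 ms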
* Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)
2019-03-17 11:45 ` Sebastian Moeller
@ 2019-03-17 14:34 ` Carsten Bormann
2019-03-17 15:56 ` Sebastian Moeller
0 siblings, 1 reply; 13+ messages in thread
From: Carsten Bormann @ 2019-03-17 14:34 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: Greg White, Ingemar Johansson S, bloat

>> The end-to-end argument applies: Ultimately, there needs to be
>> resequencing at the end anyway, so any reordering in the network would
>> be a performance optimization. It turns out that keeping packets lying
>> around in some buffer somewhere in the network just to do resequencing
>> before they exit an L2 domain (or a tunnel) is a pessimization, not an
>> optimization.
>
> I do not buy the end-to-end argument here, because taken to the extreme,
> why do ARQ on individual links at all? We could just leave the ARQ to the
> end-points, which TCP does anyway.

The optimization is that the retransmission on a single link (or within a
path segment, which is what I’m interested in) does not need to span the
entire end-to-end path. That is strictly better than an end-to-end
retransmission. Also, a local segment may allow faster recovery by not
implicating the entire e2e latency, which allows for strictly better
latency. So, yes, there are significant optimizations in doing local
retransmissions, but there are also interesting interactions with
end-to-end retransmission that need to be taken care of. This has been
known for a long time, e.g., see
https://tools.ietf.org/html/rfc3819#section-8 which documents things that
were considered to be well known in the early 2000s.

> The point is that transport-ARQ allows the use of link technologies that
> otherwise would not be acceptable at all. So doing ARQ on the individual
> links already indicates that some things are more efficient when not done
> purely end-to-end.

Obviously.

> I just happen to think that re-ordering falls into the same category, at
> least for users stuck behind a slow link, as is typical at the edge of
> the internet.

Resequencing (which is the term I prefer for putting things back in
sequence again, after they have been reordered) requires storing packets
that are ahead of later packets. This is strictly suboptimal if these
packets could be delivered instead (in contrast, it *is* a good idea to
resequence packets that are in a queue waiting for a transmission
opportunity). So *requiring*(*) local path segments to resequence is
strictly suboptimal.

(*) even if this is not a strict requirement, but just a statement of the
form “the transport will be much more efficient if you deliver in order”.

> To put numbers to my example, assume I am on a 1/1 Mbps link and I get
> TCP data at 1 Mbps rate in MTU-1500 packets (I am going to keep the
> numbers approximate), and I get a burst of, say, 10 packets containing 10
> individual messages for my application, telling the position of, say, an
> object in 3D space.
>
> Each packet is going to "hog" the link for:
> 1000 ms/s * (1500 * 8 b/packet) / (1000 * 1000 b/s) = 12 ms
>
> So I get access to messages/new positions every 12 ms, and I can display
> this smoothly.

That is already broken by design. If you are not accounting for latency
variation (“jitter”), you won’t be able to deal with it. Your example also
makes sure it does not work well by being based on 100 % utilization.

> Now if the first packet gets re-ordered to be last, I either drop that
> packet

…which is another nice function the network could do for you before
expending further resources on useless delivery; see, e.g.,
draft-ietf-6lo-deadline-time for one way to do this.

> and accept a 12 ms gap, or, if that is not an option, I get to wait
> 9*12 = 108 ms before positions can be updated; that, IMHO, shows why
> re-ordering is terrible even if TCP were more tolerant.

You are assuming that the network can magically resequence a packet into
place that it does not have.

Now I do understand that forwarding an out-of-order packet will block the
output port for the time needed to serialize it. So if you get it right
before what would have been an in-order packet, the latter incurs
additional latency. Note that this requires a bottleneck configuration,
i.e., packets to be forwarded arrive faster than they can be serialized
out. Don’t do bottlenecks if you want ultra-low latency. (And don’t do
links where you need to retransmit, either.)

> Especially in the context of L4S, something like this seems to be totally
> unacceptable if ultra-low latency is supposed to be anything more than
> marketing.

Dropping packets that can’t be used anyway is strictly better than
delivering them. But apart from that, forwarding packets that I have is
strictly better for low latency than leaving the output port idle and
waiting for previous-in-order packets to send them out in sequence.

>> For three decades now, we have acted as if there is no cost for in-order
>> delivery from L2 — not because that is true, but because deployed
>> transport protocol implementations were built and tested with simple
>> links that don’t reorder.
>
> Well, that is similar to the argument for performing non-aligned loads
> fast in hardware: yes, this comes with a considerable cost in complexity,
> and it is harder to make this go fast than just allowing aligned loads
> and fixing up unaligned loads by trapping to software, but from a user
> perspective the fast hardware beats the fickle
> only-aligned-loads-go-fast approach any old day.

CPUs have an abundance of transistors you can throw at this problem, so
support for unaligned loads has become standard practice for CPUs with
enough transistors. I’m not sure this argument transfers, because this is
not about transistors (except maybe when we talk about in-queue
resequencing, which would be a nice feature if we had information in the
packets to allow it).

>> Techniques for ECMP (equal-cost multi-path) have been developed that
>> appease that illusion, but they actually also are pessimizations, at
>> least in some cases.
>
> Sure, but if I understand correctly, this is partly due to the fact that
> transport people opted not to do the re-sorting on a flow-by-flow basis;
> that would solve the blocking issue from the transport perspective. Sure,
> the affected flow would still suffer some increased delay, but as I tried
> to show above, that might still be smaller than the delay incurred by
> doing the re-sorting after the bottleneck link. What is wrong with my
> analysis?

Transport people have no control over what is happening in the network, so
maybe I don’t understand the argument.

>> The question at hand is whether we can make the move back to end-to-end
>> resequencing techniques that work well,
>
> But we can not; we can make TCP more robust, but what I predict is that
> if RACK allows for 100 ms of delay, transports will take this as the new
> goal and will keep pushing against that limit, all in the name of
> bandwidth over latency.

Where does this number come from? 100 ms is pretty long as a reordering
maximum for most paths outside of satellite links. Instead, you would do
something based on an RTT estimate.

>> at least within some limits that we still have to find.
>> That probably requires some evolution at the end-to-end transport
>> implementation layer. We are in a better position to make that happen
>> than we have been for a long time.
>
> Probably true, but also not very attractive from an end-user perspective…
> unless this will allow transport innovations that will allow massively
> more bandwidth at a smallish latency cost.

The argument against in-network resequencing is mostly a latency argument
(but, as a second order effect, that reduced latency may also allow more
throughput), so, again, I don’t quite understand.

Grüße, Carsten

^ permalink raw reply [flat|nested] 13+ messages in thread
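The early-discard idea Carsten points to can be sketched in a few lines
(illustrative Python only; draft-ietf-6lo-deadline-time defines an actual
header encoding and forwarding rules, which this does not attempt to
reproduce):

    import time

    def forward_or_drop(deadline_s: float, est_residual_delay_s: float) -> bool:
        """True = forward the packet, False = drop it as already too late."""
        return time.monotonic() + est_residual_delay_s <= deadline_s

    # Example: a position update due in 20 ms, with an estimated 30 ms of
    # path still ahead of it, is dropped instead of being retransmitted
    # and delivered uselessly late.
    now = time.monotonic()
    print(forward_or_drop(deadline_s=now + 0.020, est_residual_delay_s=0.030))
    # False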
* Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)
2019-03-17 14:34 ` Carsten Bormann
@ 2019-03-17 15:56 ` Sebastian Moeller
2019-03-17 17:09 ` Carsten Bormann
0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Moeller @ 2019-03-17 15:56 UTC (permalink / raw)
To: Carsten Bormann; +Cc: Greg White, Ingemar Johansson S, bloat

Hi Carsten,

thanks for your insights.

> On Mar 17, 2019, at 15:34, Carsten Bormann <cabo@tzi.org> wrote:
>
>>> The end-to-end argument applies: Ultimately, there needs to be
>>> resequencing at the end anyway, so any reordering in the network would
>>> be a performance optimization. It turns out that keeping packets lying
>>> around in some buffer somewhere in the network just to do resequencing
>>> before they exit an L2 domain (or a tunnel) is a pessimization, not an
>>> optimization.
>>
>> I do not buy the end-to-end argument here, because taken to the extreme,
>> why do ARQ on individual links at all? We could just leave the ARQ to
>> the end-points, which TCP does anyway.
>
> The optimization is that the retransmission on a single link (or within a
> path segment, which is what I’m interested in) does not need to span the
> entire end-to-end path. That is strictly better than an end-to-end
> retransmission.

I agree, and by the same logic local resequencing is also better, unless
the re-ordering event happened at the bottleneck link.

> Also, a local segment may allow faster recovery by not implicating the
> entire e2e latency, which allows for strictly better latency. So, yes,
> there are significant optimizations in doing local retransmissions, but
> there are also interesting interactions with end-to-end retransmission
> that need to be taken care of. This has been known for a long time,
> e.g., see https://tools.ietf.org/html/rfc3819#section-8 which documents
> things that were considered to be well known in the early 2000s.

Thanks, but my understanding of this is basically that a link should just
drop a packet unless it can be retransmitted with reasonable effort (much
as the G.INP retransmissions on DSL links will give up); sure, we can
argue about what "reasonable effort" is in reality, but I fear that if we
move away from 3 dupACKs to, say, X ms, all transport links will assume
they have leeway to allow re-ordering close to X, and that will certainly
be worse than today. And since I am an end-user and do not operate a
transport network, I know what I prefer here…

>> The point is that transport-ARQ allows the use of link technologies that
>> otherwise would not be acceptable at all. So doing ARQ on the individual
>> links already indicates that some things are more efficient when not
>> done purely end-to-end.
>
> Obviously.
>
>> I just happen to think that re-ordering falls into the same category, at
>> least for users stuck behind a slow link, as is typical at the edge of
>> the internet.
>
> Resequencing (which is the term I prefer for putting things back in
> sequence again, after they have been reordered) requires storing packets
> that are ahead of later packets.

Obviously.

> This is strictly suboptimal if these packets could be delivered instead
> (in contrast, it *is* a good idea to resequence packets that are in a
> queue waiting for a transmission opportunity).

Fair enough, but that basically expects the bottleneck link that actually
accumulates a queue to do the heavy lifting; I am not sure the economic
incentives are properly aligned here.

> So *requiring*(*) local path segments to resequence is strictly
> suboptimal.
>
> (*) even if this is not a strict requirement, but just a statement of the
> form “the transport will be much more efficient if you deliver in order”.

My point is that the transport will be much more useful if it undertakes
(reasonable) effort to deliver in-order; that is slightly different, and I
understand that those responsible for transport networks have a different
viewpoint on this.

>> To put numbers to my example, assume I am on a 1/1 Mbps link and I get
>> TCP data at 1 Mbps rate in MTU-1500 packets (I am going to keep the
>> numbers approximate), and I get a burst of, say, 10 packets containing
>> 10 individual messages for my application, telling the position of,
>> say, an object in 3D space.
>>
>> Each packet is going to "hog" the link for:
>> 1000 ms/s * (1500 * 8 b/packet) / (1000 * 1000 b/s) = 12 ms
>>
>> So I get access to messages/new positions every 12 ms, and I can display
>> this smoothly.
>
> That is already broken by design.

That does not matter much; a well-designed network should also allow one
to do stupid things...

> If you are not accounting for latency variation (“jitter”), you won’t be
> able to deal with it.

Which would just complicate the issue a bit; introducing, say, a 25 ms
de-jitter buffer would not affect the gist of it.

> Your example also makes sure it does not work well by being based on
> 100 % utilization.

Same here: access links certainly run closer to 100% utilization than core
links, so operation at full saturation is not completely unrealistic, but
I really just set it up that way for clarity.

>> Now if the first packet gets re-ordered to be last, I either drop that
>> packet
>
> …which is another nice function the network could do for you before
> expending further resources on useless delivery; see, e.g.,
> draft-ietf-6lo-deadline-time for one way to do this.

Yes, but typically I do not want the network to do this, as I would be
quite interested in knowing how much too late the packet arrived.

>> and accept a 12 ms gap, or, if that is not an option, I get to wait
>> 9*12 = 108 ms before positions can be updated; that, IMHO, shows why
>> re-ordering is terrible even if TCP were more tolerant.
>
> You are assuming that the network can magically resequence a packet into
> place that it does not have.

All I expect is that the network makes a reasonable effort to undo
re-ordering close to where the re-ordering happened.

> Now I do understand that forwarding an out-of-order packet will block the
> output port for the time needed to serialize it. So if you get it right
> before what would have been an in-order packet, the latter incurs
> additional latency. Note that this requires a bottleneck configuration,
> i.e., packets to be forwarded arrive faster than they can be serialized
> out. Don’t do bottlenecks if you want ultra-low latency. (And don’t do
> links where you need to retransmit, either.)

I agree, but that is life with a home internet access link: the bottleneck
is there. This also points out a problem with the L4S argument for
end-users, as the ultra-low latency (their words, not mine) will not
materialize for end-users close to what the project seems to promise.

>> Especially in the context of L4S, something like this seems to be
>> totally unacceptable if ultra-low latency is supposed to be anything
>> more than marketing.
>
> Dropping packets that can’t be used anyway is strictly better than
> delivering them.

Well, not for L4S, as TCP Prague is supposed to fall back to legacy
congestion control behavior upon encountering packet drops...

> But apart from that, forwarding packets that I have is strictly better
> for low latency than leaving the output port idle and waiting for
> previous-in-order packets to send them out in sequence.

It really depends what we mean when we talk about latency here; as shown,
for an end-user that might be quite different...

>>> For three decades now, we have acted as if there is no cost for
>>> in-order delivery from L2 — not because that is true, but because
>>> deployed transport protocol implementations were built and tested with
>>> simple links that don’t reorder.
>>
>> Well, that is similar to the argument for performing non-aligned loads
>> fast in hardware: yes, this comes with a considerable cost in
>> complexity, and it is harder to make this go fast than just allowing
>> aligned loads and fixing up unaligned loads by trapping to software,
>> but from a user perspective the fast hardware beats the fickle
>> only-aligned-loads-go-fast approach any old day.
>
> CPUs have an abundance of transistors you can throw at this problem, so
> support for unaligned loads has become standard practice for CPUs with
> enough transistors.
> I’m not sure this argument transfers, because this is not about
> transistors (except maybe when we talk about in-queue resequencing,
> which would be a nice feature if we had information in the packets to
> allow it).

Like the 5-tuple in TCP and UDP? This example was not meant to be taken
literally, but just to illustrate that, depending on the level of
observation, speeding up one domain can have a noticeable effect on
another one, but might still be worth the effort.

>>> Techniques for ECMP (equal-cost multi-path) have been developed that
>>> appease that illusion, but they actually also are pessimizations, at
>>> least in some cases.
>>
>> Sure, but if I understand correctly, this is partly due to the fact that
>> transport people opted not to do the re-sorting on a flow-by-flow basis;
>> that would solve the blocking issue from the transport perspective.
>> Sure, the affected flow would still suffer some increased delay, but as
>> I tried to show above, that might still be smaller than the delay
>> incurred by doing the re-sorting after the bottleneck link. What is
>> wrong with my analysis?
>
> Transport people have no control over what is happening in the network,
> so maybe I don’t understand the argument.

If the remote end of a potentially re-ordering link would implement fair
queueing (a big if, sure), then it should be easy to only stall the flows
that have outstanding packets, and this could be based solely on the local
retransmit ACKs, so the link would only need to clean up its own
re-orderings and could just faithfully relay a flow that entered the link
already re-ordered. This might in reality not be feasible at all...

>>> The question at hand is whether we can make the move back to end-to-end
>>> resequencing techniques that work well,
>>
>> But we can not; we can make TCP more robust, but what I predict is that
>> if RACK allows for 100 ms of delay, transports will take this as the
>> new goal and will keep pushing against that limit, all in the name of
>> bandwidth over latency.
>
> Where does this number come from? 100 ms is pretty long as a reordering
> maximum for most paths outside of satellite links. Instead, you would do
> something based on an RTT estimate.

I just made that number up, as the exact N does not matter; the argument
is that whatever we set as the new threshold will be approached by
transport characteristics. Then again, having something that inversely
scales with bandwidth is certainly terrible from a transport perspective,
so I can understand the argument for a fixed temporal threshold.

>>> at least within some limits that we still have to find.
>>> That probably requires some evolution at the end-to-end transport
>>> implementation layer. We are in a better position to make that happen
>>> than we have been for a long time.
>>
>> Probably true, but also not very attractive from an end-user
>> perspective… unless this will allow transport innovations that will
>> allow massively more bandwidth at a smallish latency cost.
>
> The argument against in-network resequencing is mostly a latency argument
> (but, as a second order effect, that reduced latency may also allow more
> throughput), so, again, I don’t quite understand.

As I tried to show for TCP the flow with re-ordered packets certainly pays
a latency cost that especially if re-ordering does not happen on the
bottleneck link but at a faster link could be smaller.

Gruss
	Sebastian

> Grüße, Carsten

^ permalink raw reply [flat|nested] 13+ messages in thread
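The per-flow variant Sebastian asks about can be sketched as follows
(hypothetical Python; it assumes the link end keeps its own link-local,
per-flow sequence numbers from its ARQ, since the transport-layer ones may
be encrypted):

    from collections import defaultdict

    class PerFlowResequencer:
        def __init__(self):
            self.next_seq = defaultdict(int)   # next expected seq per flow
            self.pending = defaultdict(dict)   # flow -> {seq: packet}

        def receive(self, flow, seq, packet):
            """Return the packets now deliverable for this flow, in order."""
            self.pending[flow][seq] = packet
            out = []
            while self.next_seq[flow] in self.pending[flow]:
                out.append(self.pending[flow].pop(self.next_seq[flow]))
                self.next_seq[flow] += 1
            return out

    r = PerFlowResequencer()
    print(r.receive("x", 0, "x0"))   # ['x0']       flow x flows freely
    print(r.receive("w", 1, "w1"))   # []           flow w waits for w0 ...
    print(r.receive("x", 1, "x1"))   # ['x1']       ... without blocking x
    print(r.receive("w", 0, "w0"))   # ['w0', 'w1'] gap filled, w released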
* Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint) 2019-03-17 15:56 ` Sebastian Moeller @ 2019-03-17 17:09 ` Carsten Bormann 2019-03-17 19:57 ` Sebastian Moeller 0 siblings, 1 reply; 13+ messages in thread From: Carsten Bormann @ 2019-03-17 17:09 UTC (permalink / raw) To: Sebastian Moeller; +Cc: Greg White, Ingemar Johansson S, bloat >> >>>> The end-to-end argument applies: Ultimately, there needs to be resequencing at the end anyway, so any reordering in the network would be a performance optimization. It turns out that keeping packets lying around in some buffer somewhere in the network just to do resequencing before they exit an L2 domain (or a tunnel) is a pessimization, not an optimization. >>> >>> I do not buy the end to end argument here, because in the extreme why do ARQ on individual links anyway, we can just leave it to the end-points to do the ARQ and TCP does anyway. >> >> The optimization is that the retransmission on a single link (or within a path segment, which is what I’m interested in) does not need to span the entire end-to-end path. That is strictly better than an end-to-end retransmission. > > I agree, and by the same logic local resequencing is also better, Non sequitur. The same logic simply does not apply. A resequenced packet consumes the same transmission resources. (It also consumes more buffer resources. So it is strictly worse when just looking at network resources expended, which is the basis for the kind of logic applied here.) > unless the re-ordering event happened at the bottleneck link. Not sure how this comes in now. >> Also, a local segment may allow faster recovery by not implicating the entire e2e latency, which allows for strictly better latency. >> So, yes, there are significant optimizations in doing local retransmissions, but there are also interesting interactions with end-to-end retransmission that need to be taken care of. This has been known for a long time, e.g., see https://tools.ietf.org/html/rfc3819#section-8 which documents things that were considered to be well known in the early 2000s. > > Thanks, but my understanding of this is basically that a link should just drop a packet unless it can be retransmitted with reasonable effort (like the G.INP retransmissiond on dsl-links will give up); sure we can argue about what "reasonable effort" is in reality, but I fear if we move away from 3 dupACKs to say X ms all transport links will assume they have leewway to allow re-ordering close to X, that will certainly be worse than today. And since I am an end-user and do not operate a transport network, I know what I prefer here… I’m sorry, I grew up as transport layer guy, so “transport” means L4 (transport layer) for me, not “transport network”. You may want to re-read my sentences with that knowledge; they might make more sense. >> Resequencing (which is the term I prefer for putting things back in sequence again, after they have been reordered) requires storing packets that are ahead of later packets. > > Obviously. > >> This is strictly suboptimal if these packets could be delivered instead (in contrast, it *is* a good idea to resequence packets that are in a queue waiting for a transmission opportunity). > > Fair enough, but that basically expects the bottleneck link that actually accumulates a queue to do the heavy lifting, not sure that the economic incentives are properly aligned here. It can actually do so more easily, because the speeds are lower. 
But deployment economy arguments are interesting as well; I was making theoretical arguments first. >> So *requiring*(*) local path segments to resequence is strictly suboptimal. >> >> (*) even if this is not a strict requirement, but just a statement of the form “the transport will be much more efficient if you deliver in order”. > > My point is the transport will much more useful if if undertakes (reasonable) effort to deliver in-order, Please re-read as advised above. > that is slight;y different, and I understand that those responsible for transport networks have a different viewpoint on this. > >> >>> To put numbers to my example, assume I am on a 1/1 Mbps link and I get TCP data at 1 Mbps rate and MTU1500 packets (I am going to keep the numbers approximate) and I get a burst of say 10 packets containing say 10 individual messages for my application telling the position of say an object in 3d space >>> >>> each packet is going to "hog" the link for: 1000 ms/s * (1500 * 8 b/packet ) / (1000 * 1000 b/s) = 12 ms >>> So I get access to messages/new positions every 12 ms and I can display this smoothly >> >> That is already broken by design. > > Does not matter much, a well designed network should also allow to do stupid things… Sure, but it won’t work very well then (and there is no point in optimizing for that — remember: all in-network work is just an optimization under the end-to-end principle). >> If you are not accounting for latency variation (“jitter”), you won’t be able to deal with it. > > Which would just complicate the issue a bit if we would introduce a say 25 ms de-jitter buffer without affecting the gist of it. That buffer increases the total latency but also the (useful) packet delivery rate in the presence of reordering. >> Your example also makes sure it does not work well by being based on 100 % utilization. > > Same here, access links certainly run closer to 100% utilization than core links, so operation at full saturation is not completely unrealistic, but I really just set it up that way for clarity. Please use an example that is more realistic. >>> Now if the first packet gets r-odered to be last, I either drop that packet >> >> …which is another nice function the network could do for you before expending further resources on useless delivery; see e.g. draft-ietf-6lo-deadline-time for one way to do this. > > Yes, but typically I do not want the network to do this, as I would be quite interested in knowing how much too late the packet arrived. I don’t know how to make use of that knowledge, do you? Early discarding of a late packet (e.g., by not retransmitting it in the first place) is so much better. >>> and accept a 12 ms gap or if that is not an option I get to wait 9*12 = 108ms before positions can be updated, that IMHO shows why re-ordering is terrible even if TCP would be more tolerant. >> >> You are assuming that the network can magically resequence a packet into place that it does not have. > > All I expect is that the network makes a reasonable effort to undo re-ordering close to where re-ordering happened. All I’m trying to say is that this is bad engineering, apparently perpetuated by bad transport layer implementations. >> Now I do understand that forwarding an out-of-order packet will block the output port for the time needed to serialize it. So if you get it right before what would have been an in-order packet, the latter incurs additional latency. 
Note that this requires a bottleneck configuration, i.e., packets to be forwarded arrive faster than they can be serialized out. Don’t do bottlenecks if you want ultra-low latency. (And don’t do links where you need to retransmit, either.) > > I agree, but that is live with a home internet access link, the bottleneck is there. This also points out a problem with the L4S argument for end-users, as the ultra-low latency (their words, not mine) will not realize for end-users close to what the project seems to promise. I think reordering is not really a problem for ultra-low latency, or more specifically, once reordering happens, you are no longer in the ultra-low latency domain, >>> Especially in the context of L4S something like this seems to be totally unacceptable if ultra-low latency is supposed to be anything more than marketing. >> >> Dropping packets that can’t be used anyway is strictly better than delivering them. > > Well, not for L4S, as TCP Praque is supposed to fall back to legacy congestion control behavior upon encountering packet drops… L4S is for reliable transport, which is a different scenario than the one that benefits a lot from deadlines for packets. (Well, deadlines might be used to make sure there is no dual retransmission, both local and end-to-end, but again, this is not where you would use L4S.) >> But apart from that, forwarding packets that I have is strictly better for low latency than leaving the output port idle and waiting for previous-in-order packets to send them out in sequence. > > It really depends what we mean when we talk about latency here, as shown for and end-user that might be quite different… Apart from the port blocking effect I talked about (which is mostly relevant for highly scheduled transmission schemes), I really have no idea how the end-to-end latency would benefit from sitting on packets while the port is idle. >>>> For three decades now, we have acted as if there is no cost for in-order delivery from L2 — not because that is true, but because deployed transport protocol implementations were built and tested with simple links that don’t reorder. >>> >>> Well, that is similar to the argument for performing non-aligned loads fast in hardware, yes this comes with a considerable cost in complexity and it is harder to make this go fast than just allowing aligned loads and fixing up unaligned loads by trapping to software, but from a user perspective the fast hardware beats the fickle only make aligned loads go fast approach any old day. >> >> CPUs have an abundance of transistors you can throw at this problem so the support of unaligned loads has become standard practice for CPUs with enough transistors. >> I’m not sure this argument transfers, because this is not about transistors (except maybe when we talk about in-queue resequencing, which would be a nice feature if we had information in the packets to allow it). > > Like the 5-tuple in TCP and UDP? That doesn’t help. I need a sequence number for resequencing, and I can’t use the transport layer one because that is being encrypted. Again, this is mostly theoretical as I don’t see people rushing to do in-queue resequencing any time soon. (Skipping some text that is not relevant to my argument here.) >> Where does this number come from? 100 ms is pretty long as a reordering maximum for most paths outside of satellite links. Instead, you would do something based on an RTT estimate. 
(Skipping some text that is not relevant to my argument here.)

>> Where does this number come from? 100 ms is pretty long as a reordering maximum for most paths outside of satellite links. Instead, you would do something based on an RTT estimate.
>
> I just made that number up, as the exact N does not matter; the argument is that whatever we set as the new threshold will be approached by transport characteristics. Then again, having something that inversely scales with bandwidth is certainly terrible from a transport perspective, so I can understand the argument for a fixed temporal threshold.

I don’t follow at all here.

>>>> at least within some limits that we still have to find.
>>>> That probably requires some evolution at the end-to-end transport implementation layer. We are in a better position to make that happen than we have been for a long time.
>>>
>>> Probably true, but also not very attractive from an end-user perspective… unless this will allow transport innovations that will allow massively more bandwidth at a smallish latency cost.
>>
>> The argument against in-network resequencing is mostly a latency argument (but, as a second order effect, that reduced latency may also allow more throughput), so, again, I don’t quite understand.
>
> As I tried to show for TCP the flow with re-ordered packets certainly pays a latency cost that especially if re-ordering does not happen on the bottleneck link but at a faster link could be smaller.

I can’t parse this sentence, but my main point remains: In-network resequencing increases latency (with a potential impact on throughput, too), unless it happens within a queue. We wouldn’t want to do that, unless forced by a transport protocol that can’t cope. If we can fix the transport protocols to enable (out-of-order) immediate forwarding, then let’s do it; this might also enable doing more in-network recovery, with the attendant performance improvements.

Grüße, Carsten

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)
  2019-03-17 17:09           ` Carsten Bormann
@ 2019-03-17 19:57             ` Sebastian Moeller
  2019-03-18  0:05               ` David Lang
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Moeller @ 2019-03-17 19:57 UTC (permalink / raw)
To: Carsten Bormann; +Cc: Greg White, Ingemar Johansson S, bloat

Dear Carsten,

please excuse my tortured logic; being from outside the field, it seems I routinely misuse the nomenclature and sow confusion.

> On Mar 17, 2019, at 18:09, Carsten Bormann <cabo@tzi.org> wrote:
>
>>>>> The end-to-end argument applies: Ultimately, there needs to be resequencing at the end anyway, so any reordering in the network would be a performance optimization. It turns out that keeping packets lying around in some buffer somewhere in the network just to do resequencing before they exit an L2 domain (or a tunnel) is a pessimization, not an optimization.
>>>>
>>>> I do not buy the end-to-end argument here, because in the extreme why do ARQ on individual links anyway; we can just leave it to the end-points to do the ARQ, and TCP does it anyway.
>>>
>>> The optimization is that the retransmission on a single link (or within a path segment, which is what I’m interested in) does not need to span the entire end-to-end path. That is strictly better than an end-to-end retransmission.
>>
>> I agree, and by the same logic local resequencing is also better,
>
> Non sequitur. The same logic simply does not apply. A resequenced packet consumes the same transmission resources. (It also consumes more buffer resources. So it is strictly worse when just looking at network resources expended, which is the basis for the kind of logic applied here.)

In my tortured example, the latency cost of resequencing at the re-ordering fast link was much smaller than the latency cost of resequencing after traversing the bottleneck link, the typical situation for end users. My focus is on the latency visible at that point; I agree that for the intermediate hop it will be simpler to just forward those packets that traversed the link intact.

>
>> unless the re-ordering event happened at the bottleneck link.
>
> Not sure how this comes in now.

This comes from my vantage point at the edge: I really am sympathetic to what the core network needs to do to maintain the illusion of temporal ordering, but I do really care for end-to-end latency more than for allowing core routers to get away with less buffer memory.

>
>>> Also, a local segment may allow faster recovery by not implicating the entire e2e latency, which allows for strictly better latency.
>>> So, yes, there are significant optimizations in doing local retransmissions, but there are also interesting interactions with end-to-end retransmission that need to be taken care of. This has been known for a long time, e.g., see https://tools.ietf.org/html/rfc3819#section-8 which documents things that were considered to be well known in the early 2000s.
>>
>> Thanks, but my understanding of this is basically that a link should just drop a packet unless it can be retransmitted with reasonable effort (like the G.INP retransmissions on DSL links will give up); sure, we can argue about what "reasonable effort" is in reality, but I fear if we move away from 3 dupACKs to say X ms, all transport links will assume they have leeway to allow re-ordering close to X; that will certainly be worse than today.
>> And since I am an end-user and do not operate a transport network, I know what I prefer here…
>
> I’m sorry, I grew up as transport layer guy, so “transport” means L4 (transport layer) for me, not “transport network”.
> You may want to re-read my sentences with that knowledge; they might make more sense.

Sorry for my misuse of the nomenclature, I will try to stick to “transport network”.

>
>>> Resequencing (which is the term I prefer for putting things back in sequence again, after they have been reordered) requires storing packets that are ahead of later packets.
>>
>> Obviously.
>>
>>> This is strictly suboptimal if these packets could be delivered instead (in contrast, it *is* a good idea to resequence packets that are in a queue waiting for a transmission opportunity).
>>
>> Fair enough, but that basically expects the bottleneck link that actually accumulates a queue to do the heavy lifting; not sure that the economic incentives are properly aligned here.
>
> It can actually do so more easily, because the speeds are lower.

Tell that to the person paying for the CMTS/BNG; the issue is that queueing here seems to happen not directly at the edge but at centralized places that will need to queue traffic for tens of thousands of end-users. This still might be easier than in the core.

> But deployment economy arguments are interesting as well; I was making theoretical arguments first.
>
>>> So *requiring*(*) local path segments to resequence is strictly suboptimal.
>>>
>>> (*) even if this is not a strict requirement, but just a statement of the form “the transport will be much more efficient if you deliver in order”.
>>
>> My point is the transport will be much more useful if it undertakes (reasonable) effort to deliver in-order,
>
> Please re-read as advised above.
>
>> that is slightly different, and I understand that those responsible for transport networks have a different viewpoint on this.
>>
>>>> To put numbers to my example, assume I am on a 1/1 Mbps link and I get TCP data at 1 Mbps rate and MTU-1500 packets (I am going to keep the numbers approximate) and I get a burst of say 10 packets containing say 10 individual messages for my application telling the position of say an object in 3D space.
>>>>
>>>> Each packet is going to "hog" the link for: 1000 ms/s * (1500 * 8 b/packet) / (1000 * 1000 b/s) = 12 ms.
>>>> So I get access to messages/new positions every 12 ms and I can display this smoothly.
>>>
>>> That is already broken by design.
>>
>> Does not matter much, a well designed network should also allow to do stupid things…
>
> Sure, but it won’t work very well then (and there is no point in optimizing for that — remember: all in-network work is just an optimization under the end-to-end principle).
>
>>>> If you are not accounting for latency variation (“jitter”), you won’t be able to deal with it.
>>>
>>> Which would just complicate the issue a bit if we would introduce a, say, 25 ms de-jitter buffer, without affecting the gist of it.
>
> That buffer increases the total latency but also the (useful) packet delivery rate in the presence of reordering.

Yes, but IMHO it does not change the problem in a qualitative way.

>
>>> Your example also makes sure it does not work well by being based on 100 % utilization.
>>
>> Same here, access links certainly run closer to 100% utilization than core links, so operation at full saturation is not completely unrealistic, but I really just set it up that way for clarity.
>
> Please use an example that is more realistic.

Why?
Unless your argument is that an additional 100 ms of latency does not matter, I fail to see how the "realness" of my example is relevant. TCP will only pass data to the application after (internal) resequencing, and if we allow extreme out-of-order delivery, that will show up as additional extreme latency for the TCP-using application. Now one can claim that TCP might be the wrong "transport" (I assume that is the correct use of the term), but that effectively demotes TCP to a bulk transport, while clearly it does a reasonable job even for mild real-time requirements (I note that even "once a day" is a real-time requirement, one that should be easily achievable with TCP).

>
>>>> Now if the first packet gets re-ordered to be last, I either drop that packet
>>>
>>> …which is another nice function the network could do for you before expending further resources on useless delivery; see e.g. draft-ietf-6lo-deadline-time for one way to do this.
>>
>> Yes, but typically I do not want the network to do this, as I would be quite interested in knowing how much too late the packet arrived.
>
> I don’t know how to make use of that knowledge, do you?

Well, in a packet capture, seeing a packet out-of-sequence with a delay has more value than not seeing a packet at all: was it culled due to the deadline, or was it lost for other reasons, and do I really care? In the first case it can be used for diagnosis; in the second it cannot.

> Early discarding of a late packet (e.g., by not retransmitting it in the first place) is so much better.

This seems to be a) pretty cool and b) restricted to a number of quite special use-cases, no? This might be okay for real-time packets (where the consumer needs to react with tight timing constraints), but even just for VoIP (a mild RT application) it might be better to be able to reconstruct intelligible speech with a massive delay than to get something garbled beyond recognition by packets being dropped in a timely fashion. But I am sure there are examples where a deadline drop might be advantageous; it is just that none of my use-cases fall into that category. And I naively see use for it only if the bandwidth/tx-slot advantage gained from dropping instead of transmitting a packet is larger than the loss incurred from not getting the information at all; I would guess real-time control with lots of redundant sensors to be such a case, assuming we talk about rarely dropping a packet.
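For what it is worth, the mechanics of such a deadline drop seem simple enough; this is my naive reading of the idea as a Python toy, where the "deadline" field and its encoding are placeholders for whatever draft-ietf-6lo-deadline-time actually specifies:

import time

def forward_or_drop(packet, now=None):
    # Naive deadline check: spending a transmit slot on a packet whose
    # delivery deadline has already passed is pure waste, so discard it
    # instead of forwarding. "deadline" is a hypothetical per-packet
    # field here; the real encoding is whatever the draft defines.
    now = time.time() if now is None else now
    return packet if now <= packet["deadline"] else None

pkt = {"payload": b"position update", "deadline": time.time() + 0.025}
print(forward_or_drop(pkt) is not None)  # True while the 25 ms budget lasts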
>>>> and accept a 12 ms gap, or, if that is not an option, I get to wait 9*12 = 108 ms before positions can be updated; that IMHO shows why re-ordering is terrible even if TCP would be more tolerant.
>>>
>>> You are assuming that the network can magically resequence a packet into place that it does not have.
>>
>> All I expect is that the network makes a reasonable effort to undo re-ordering close to where re-ordering happened.
>
> All I’m trying to say is that this is bad engineering, apparently perpetuated by bad transport layer implementations.

??? Now I am confused; to me engineering is all about balancing trade-offs, and this is all about how to evaluate the different dimensions. I believe we agree that a network should not re-order packets artificially without some justification, and we also agree that some level of re-ordering might be unavoidable; we basically haggle over how much is acceptable. I also believe, and correct me if I am wrong, that we agree that with TCP, endpoints prefer less over more en-passage re-ordering. So why is it bad for a transport layer to aim for pleasing its users (in my book the transport network only exists to allow end-to-end communication)?

>
>>> Now I do understand that forwarding an out-of-order packet will block the output port for the time needed to serialize it. So if you get it right before what would have been an in-order packet, the latter incurs additional latency. Note that this requires a bottleneck configuration, i.e., packets to be forwarded arrive faster than they can be serialized out. Don’t do bottlenecks if you want ultra-low latency. (And don’t do links where you need to retransmit, either.)
>>
>> I agree, but that is life with a home internet access link, the bottleneck is there. This also points out a problem with the L4S argument for end-users, as the ultra-low latency (their words, not mine) will not materialize for end-users close to what the project seems to promise.
>
> I think reordering is not really a problem for ultra-low latency, or more specifically, once reordering happens, you are no longer in the ultra-low latency domain.

In this thread it has been mentioned that L4S will allow more reordering by mandating participating hosts to implement RACK, so from a transport network's perspective the L4S identifier can be seen as a license to allow more re-ordering. This runs afoul of L4S's claim of ultra-low latency, I agree, but I do not have to square that circle. But in the context of this discussion we have a transport network that re-orders packets simply because this apparently makes the network more efficient and L4S allows it, which, according to your argument, would break L4S's stated goal of low latency. I cannot believe that this is the L4S position in regards to re-ordering (reading https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-06#page-23, A.1.7. "Measuring Reordering Tolerance in Time Units", tells me they fail to actually make the connection between reordering and increased latency for the affected flow).

>
>>>> Especially in the context of L4S something like this seems to be totally unacceptable if ultra-low latency is supposed to be anything more than marketing.
>>>
>>> Dropping packets that can’t be used anyway is strictly better than delivering them.
>>
>> Well, not for L4S, as TCP Prague is supposed to fall back to legacy congestion control behavior upon encountering packet drops…
>
> L4S is for reliable transport, which is a different scenario than the one that benefits a lot from deadlines for packets. (Well, deadlines might be used to make sure there is no dual retransmission, both local and end-to-end, but again, this is not where you would use L4S.)
>
>>> But apart from that, forwarding packets that I have is strictly better for low latency than leaving the output port idle and waiting for previous-in-order packets to send them out in sequence.
>>
>> It really depends on what we mean when we talk about latency here; as shown, for an end-user that might be quite different…
>
> Apart from the port blocking effect I talked about (which is mostly relevant for highly scheduled transmission schemes), I really have no idea how the end-to-end latency would benefit from sitting on packets while the port is idle.

Because, as demonstrated with my toy example above, the decision to send intact packets immediately might incur a visible delay-increase of 108 ms; so from the end-point's perspective, transmitting before re-sequencing can have a noticeable effect.
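To spell that out one more time in executable form, here is the toy example from above as a few lines of Python (same simplifying assumptions: 1 Mbps link, 1500-byte packets, a 10-packet burst with the first packet delivered last):

RATE_BPS = 1_000_000  # 1 Mbps access link
PKT_BITS = 1500 * 8   # one MTU-sized packet

ser_ms = 1000 * PKT_BITS / RATE_BPS  # serialization delay: 12.0 ms per packet

# Packet 1 of a 10-packet burst is re-ordered to arrive last:
arrival_order = [2, 3, 4, 5, 6, 7, 8, 9, 10, 1]
arrival_ms = {p: (i + 1) * ser_ms for i, p in enumerate(arrival_order)}

# TCP releases message n to the application only once packets 1..n have
# all arrived, so every message is gated on the late packet 1:
release_ms = {n: max(arrival_ms[k] for k in range(1, n + 1)) for n in range(1, 11)}

print(ser_ms)                 # 12.0
print(release_ms[1])          # 120.0 instead of 12.0
print(release_ms[1] - ser_ms) # 108.0 ms of added, application-visible delay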
>
>>>>> For three decades now, we have acted as if there is no cost for in-order delivery from L2 — not because that is true, but because deployed transport protocol implementations were built and tested with simple links that don’t reorder.
>>>>
>>>> Well, that is similar to the argument for performing non-aligned loads fast in hardware: yes, this comes with a considerable cost in complexity, and it is harder to make this go fast than just allowing aligned loads and fixing up unaligned loads by trapping to software, but from a user perspective the fast hardware beats the fickle "only make aligned loads go fast" approach any old day.
>>>
>>> CPUs have an abundance of transistors you can throw at this problem, so the support of unaligned loads has become standard practice for CPUs with enough transistors.
>>> I’m not sure this argument transfers, because this is not about transistors (except maybe when we talk about in-queue resequencing, which would be a nice feature if we had information in the packets to allow it).
>>
>> Like the 5-tuple in TCP and UDP?
>
> That doesn’t help. I need a sequence number for resequencing,

If you do ARQ on your link, you will in all likelihood have something equivalent, as you need to identify the packets that need to be retransmitted. As my proposed goal is not generic re-sequencing, but simply not to introduce any additional re-ordering, that should be sufficient, no?

> and I can’t use the transport layer one because that is being encrypted. Again, this is mostly theoretical as I don’t see people rushing to do in-queue resequencing any time soon.

I guess you are right, but then please do not complain that you need to stay idle while waiting for a retransmit, if that is a conscious trade-off you engineered into your system ;)

>
> (Skipping some text that is not relevant to my argument here.)
>
>>> Where does this number come from? 100 ms is pretty long as a reordering maximum for most paths outside of satellite links. Instead, you would do something based on an RTT estimate.
>>
>> I just made that number up, as the exact N does not matter; the argument is that whatever we set as the new threshold will be approached by transport characteristics. Then again, having something that inversely scales with bandwidth is certainly terrible from a transport perspective, so I can understand the argument for a fixed temporal threshold.
>
> I don’t follow at all here.

Well, the RACK draft makes the same point, as I discovered later (https://tools.ietf.org/html/draft-ietf-tcpm-rack-04):

   "From a network or link designer's viewpoint, parallelization (e.g. link bonding) is the easiest way to get a network to go faster. Therefore their main constraint on speed is reordering, and there is pressure to relax that constraint. If RACK becomes widely deployed, the underlying networks may introduce more reordering for higher throughput. But this may result in excessive reordering that hurts end to end performance:

   1. End host packet processing: extreme reordering on high-speed networks would incur high CPU cost by greatly reducing the effectiveness of aggregation mechanisms, such as large receive offload (LRO) and generic receive offload (GRO), and significantly increasing the number of ACKs.

   2. Congestion control: TCP congestion control implicitly assumes the feedback from ACKs are from the same bottleneck. Therefore it cannot handle well scenarios where packets are traversing largely disjoint paths.

   3. Loss recovery: Having an excessively large reordering window to accommodate widely different latencies from different paths would increase the latency of loss recovery."
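If I condense the draft's time-based threshold down to its core, it amounts to something like the following; this is my gross simplification, the actual algorithm adapts the window and tracks considerably more state:

def rack_deems_lost(pkt_xmit_ts, newest_delivered_xmit_ts, min_rtt):
    # Sketch of RACK's time-based reordering tolerance: a still-unacked
    # packet is declared lost once a packet sent sufficiently *later*
    # has already been delivered, where "sufficiently" is the reordering
    # window (the draft starts it at min_RTT/4 and adapts it).
    reo_wnd = min_rtt / 4
    return newest_delivered_xmit_ts - pkt_xmit_ts > reo_wnd

# A link may now skew packets by up to reo_wnd "for free", a budget in
# time units rather than in packets (the old 3-dupACK rule):
print(rack_deems_lost(0.000, 0.020, min_rtt=0.100))  # False: within the window
print(rack_deems_lost(0.000, 0.020, min_rtt=0.020))  # True: window exceeded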
I note that the benefit of reordering is all in the transport network, while the costs are all carried by the endpoints. With this kind of skewed incentives, what outcome do you expect?

>
>>>>> at least within some limits that we still have to find.
>>>>> That probably requires some evolution at the end-to-end transport implementation layer. We are in a better position to make that happen than we have been for a long time.
>>>>
>>>> Probably true, but also not very attractive from an end-user perspective… unless this will allow transport innovations that will allow massively more bandwidth at a smallish latency cost.
>>>
>>> The argument against in-network resequencing is mostly a latency argument (but, as a second order effect, that reduced latency may also allow more throughput), so, again, I don’t quite understand.
>>
>> As I tried to show for TCP the flow with re-ordered packets certainly pays a latency cost that especially if re-ordering does not happen on the bottleneck link but at a faster link could be smaller.
>
> I can’t parse this sentence, but my main point remains:
>
> In-network resequencing increases latency (with a potential impact on throughput, too), unless it happens within a queue.

But this latency is not necessarily end-point-visible latency: if a tree falls in a wood and no one is there to hear it, does it make a sound?

> We wouldn’t want to do that, unless forced by a transport protocol that can’t cope. If we can fix the transport protocols to enable (out-of-order) immediate forwarding, then let’s do it; this might also enable doing more in-network recovery, with the attendant performance improvements.

If all of this does not increase the end-point-visible latency and latency variation (too much), I am all for it; but if it does, I maintain that the network should serve its users, not the other way around (easy to say if one's position is pure end-user, and the complexity of making it happen falls onto others).

Anyway, thanks for your time and arguments and information; that gives me something to think about.

Beste Gruesse
	Sebastian

> Grüße, Carsten

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [Bloat] Packet reordering and RACK (was The "Some Congestion Experienced" ECN codepoint)
  2019-03-17 19:57             ` Sebastian Moeller
@ 2019-03-18  0:05               ` David Lang
  0 siblings, 0 replies; 13+ messages in thread
From: David Lang @ 2019-03-18  0:05 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: Carsten Bormann, Ingemar Johansson S, bloat

[-- Attachment #1: Type: text/plain, Size: 2418 bytes --]

On Sun, 17 Mar 2019, Sebastian Moeller wrote:

>> All I’m trying to say is that this is bad engineering, apparently perpetuated
>> by bad transport layer implementations.
>
> ??? Now I am confused; to me engineering is all about balancing trade-offs,
> and this is all about how to evaluate the different dimensions. I believe we
> agree that a network should not re-order packets artificially without some
> justification and we also agree that some level of re-ordering might be
> unavoidable, we basically haggle over how much is acceptable. I also believe,
> and correct me if I am wrong, that we agree that with TCP, endpoints prefer
> less over more en-passage re-ordering. So why is it bad for a transport layer
> to aim for pleasing its users (in my book the transport network only exists
> to allow end-to-end communication)?

What I am seeing is that you seem to be claiming that the network MUST NOT allow packets to pass each other on the network.

From the network layer, I am hearing that ensuring that packets always arrive in order has a cost: in buffer space, in latency, and in processing overhead. It forces packet processing to be single-threaded at some point, to check whether the packets are in order, rather than just letting the device forward all packets as fast as it can.

In the past, this has not been very significant, but as you get the ability to send (and/or receive) packets over multiple links (be they physical wire links or, more probably, multiple parallel channels over multi-mode fiber or RF), I can easily see it becoming more important to try and avoid these bottlenecks.

> I note that the benefit of reordering is all in the transport network, while
> the costs are all carried by the endpoints. With this kind of skewed
> incentives, what outcome do you expect?

Well, if the transport routers end up slowing down all traffic because they run out of CPU to process packets, that will hurt the endpoint very significantly. See what happens to router performance when you have to get the CPUs involved in packet processing as opposed to letting it happen on the ASICs. Look what happens when you hook a WNDR3800 home router to a 100 Mb link and try to run cake (even before you saturate the network links). The endpoint very much notices.

The bottleneck link is not always the final hop with the lowest bitrate (it frequently is, but not always).
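FWIW, the usual compromise on parallel links is to pin each flow to one member link by hashing its header fields, so no single flow ever sees reordering while the aggregate still uses all links. A toy sketch (real gear does the equivalent in the ASIC, and the exact fields hashed vary):

import zlib

NUM_LINKS = 4

def pick_link(src_ip, dst_ip, proto, sport, dport):
    # Hash the flow's header fields so every packet of one flow always
    # takes the same member link: no intra-flow reordering, while
    # different flows still spread across all links. The cost: a single
    # flow can never use more than one link's worth of capacity.
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) % NUM_LINKS

print(pick_link("10.0.0.1", "10.0.0.2", "tcp", 40000, 443))  # stable per flow
print(pick_link("10.0.0.1", "10.0.0.2", "tcp", 40001, 443))  # may differ per flow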
David Lang

^ permalink raw reply	[flat|nested] 13+ messages in thread