* [Ecn-sane] per-flow scheduling @ 2019-06-19 14:12 Bob Briscoe 2019-06-19 14:20 ` [Ecn-sane] [tsvwg] " Kyle Rose ` (2 more replies) 0 siblings, 3 replies; 49+ messages in thread From: Bob Briscoe @ 2019-06-19 14:12 UTC (permalink / raw) To: Holland, Jake; +Cc: ecn-sane, tsvwg IETF list Jake, all, You may not be aware of my long history of concern about how per-flow scheduling within endpoints and networks will limit the Internet in future. I find per-flow scheduling a violation of the e2e principle in such a profound way - the dynamic choice of the spacing between packets - that most people don't even associate it with the e2e principle. I detected that you were talking about FQ in a way that might have assumed my concern with it was just about implementation complexity. If you (or anyone watching) is not aware of the architectural concerns with per-flow scheduling, I can enumerate them. I originally started working on what became L4S to prove that it was possible to separate out reducing queuing delay from throughput scheduling. When Koen and I started working together on this, we discovered we had identical concerns on this. Bob -- ________________________________________________________________ Bob Briscoe http://bobbriscoe.net/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-19 14:12 [Ecn-sane] per-flow scheduling Bob Briscoe @ 2019-06-19 14:20 ` Kyle Rose 2019-06-21 6:59 ` [Ecn-sane] " Sebastian Moeller 2019-07-17 21:33 ` [Ecn-sane] " Sebastian Moeller 2 siblings, 0 replies; 49+ messages in thread From: Kyle Rose @ 2019-06-19 14:20 UTC (permalink / raw) To: Bob Briscoe; +Cc: Holland, Jake, ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 865 bytes --] On Wed, Jun 19, 2019 at 10:13 AM Bob Briscoe <ietf@bobbriscoe.net> wrote: > Jake, all, > > You may not be aware of my long history of concern about how per-flow > scheduling within endpoints and networks will limit the Internet in > future. I find per-flow scheduling a violation of the e2e principle in > such a profound way - the dynamic choice of the spacing between packets > - that most people don't even associate it with the e2e principle. > > I detected that you were talking about FQ in a way that might have > assumed my concern with it was just about implementation complexity. If > you (or anyone watching) is not aware of the architectural concerns with > per-flow scheduling, I can enumerate them. > I would certainly be interested in reading more about these concerns. Even a reference that I can read out-of-band would be fine. Thanks, Bob. Kyle [-- Attachment #2: Type: text/html, Size: 1277 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] per-flow scheduling 2019-06-19 14:12 [Ecn-sane] per-flow scheduling Bob Briscoe 2019-06-19 14:20 ` [Ecn-sane] [tsvwg] " Kyle Rose @ 2019-06-21 6:59 ` Sebastian Moeller 2019-06-21 9:33 ` Luca Muscariello 2019-07-17 21:33 ` [Ecn-sane] " Sebastian Moeller 2 siblings, 1 reply; 49+ messages in thread From: Sebastian Moeller @ 2019-06-21 6:59 UTC (permalink / raw) To: Bob Briscoe; +Cc: Holland, Jake, ecn-sane, tsvwg IETF list > On Jun 19, 2019, at 16:12, Bob Briscoe <ietf@bobbriscoe.net> wrote: > > Jake, all, > > You may not be aware of my long history of concern about how per-flow scheduling within endpoints and networks will limit the Internet in future. I find per-flow scheduling a violation of the e2e principle in such a profound way - the dynamic choice of the spacing between packets - that most people don't even associate it with the e2e principle. Maybe because it is not a violation of the e2e principle at all? My point is that with shared resources between the endpoints, the endpoints simply should have no expectation that their choice of spacing between packets will be conserved, for the simple reason that it seems generally impossible to guarantee that inter-packet spacing is conserved (think "cross-traffic" at the bottleneck hop along the path and the general bunching up of packets in the queue of a fast-to-slow transition*). I would also claim that the way L4S works (if it works) is to synchronize all active flows at the bottleneck, which in turn means each sender has only a very small time window in which to transmit a packet for it to hit its "slot" in the bottleneck L4S scheduler; otherwise, L4S's low queueing delay guarantees will not work. In other words, the senders have basically no say in the "spacing between packets"; I fail to see how L4S improves upon FQ in that regard. IMHO having per-flow fairness as the default seems quite reasonable; endpoints can still throttle flows to their liking. Now, per-flow fairness can still be "abused", so by itself it might not be sufficient, but neither is L4S, as it has at best stochastic guarantees: as a single-queue AQM (let's ignore the RFC3168 part of the AQM) there is a probability of sending a throttling signal to a low-bandwidth flow (fair enough, it is only a mild throttling signal, but still). But enough about my opinion: what is the ideal fairness measure in your mind, and what is realistically achievable over the internet? Best Regards Sebastian > > I detected that you were talking about FQ in a way that might have assumed my concern with it was just about implementation complexity. If you (or anyone watching) is not aware of the architectural concerns with per-flow scheduling, I can enumerate them. > > I originally started working on what became L4S to prove that it was possible to separate out reducing queuing delay from throughput scheduling. When Koen and I started working together on this, we discovered we had identical concerns on this. > > > > Bob > > > -- > ________________________________________________________________ > Bob Briscoe http://bobbriscoe.net/ > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane ^ permalink raw reply [flat|nested] 49+ messages in thread
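A minimal sketch of the fast-to-slow bunching Sebastian describes, under assumed illustrative numbers (10 Mbit/s egress, 1500-byte packets, an 8-packet cross-traffic burst); it shows that a sender's chosen 1 ms pacing is not conserved through a FIFO bottleneck:

PKT_BITS = 1500 * 8
EGRESS_BPS = 10e6
SERVICE_S = PKT_BITS / EGRESS_BPS  # ~1.2 ms to serialize one packet

def fifo_departures(arrivals):
    """arrivals: (time_s, flow) pairs; work-conserving FIFO egress."""
    out, free_at = [], 0.0
    for t, flow in sorted(arrivals):
        start = max(t, free_at)        # wait while the link is busy
        free_at = start + SERVICE_S
        out.append((flow, free_at))
    return out

paced = [(i * 0.001, "paced") for i in range(5)]  # sender spaces 1 ms apart
burst = [(0.0, "cross")] * 8                      # cross-traffic burst at t=0
for flow, t in fifo_departures(paced + burst):
    if flow == "paced":
        print(f"paced packet departs at {t * 1000:6.2f} ms")

With these numbers the paced packets leave 1.2 ms apart (the egress serialization time) and only after the burst drains; the spacing the endpoint chose is gone either way, with or without per-flow scheduling.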
* Re: [Ecn-sane] per-flow scheduling 2019-06-21 6:59 ` [Ecn-sane] " Sebastian Moeller @ 2019-06-21 9:33 ` Luca Muscariello 2019-06-21 20:37 ` [Ecn-sane] [tsvwg] " Brian E Carpenter 0 siblings, 1 reply; 49+ messages in thread From: Luca Muscariello @ 2019-06-21 9:33 UTC (permalink / raw) To: Sebastian Moeller, David P. Reed; +Cc: Bob Briscoe, ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 4302 bytes --] + David Reed, as I'm not sure he's on the ecn-sane list. To me, it seems like a very religious position against per-flow queueing. BTW, I fail to see how this would violate (in a "profound" way) the e2e principle. When I read it (the e2e principle) Saltzer, J. H., D. P. Reed, and D. D. Clark (1981) "End-to-End Arguments in System Design". In: Proceedings of the Second International Conference on Distributed Computing Systems. Paris, France. April 8–10, 1981. IEEE Computer Society, pp. 509-512. (available online for free). It seems very much like the application of Occam's razor to function placement in communication networks back in the 80s. I see no conflict between what is written in that paper and per-flow queueing today, even after almost 40 years. If that were the case, then all service differentiation techniques would violate the e2e principle in a "profound" way too, and dualQ too. A policer? A shaper? A priority queue? Luca On Fri, Jun 21, 2019 at 9:00 AM Sebastian Moeller <moeller0@gmx.de> wrote: > > > > On Jun 19, 2019, at 16:12, Bob Briscoe <ietf@bobbriscoe.net> wrote: > > > > Jake, all, > > > > You may not be aware of my long history of concern about how per-flow > scheduling within endpoints and networks will limit the Internet in future. > I find per-flow scheduling a violation of the e2e principle in such a > profound way - the dynamic choice of the spacing between packets - that > most people don't even associate it with the e2e principle. > > Maybe because it is not a violation of the e2e principle at all? My point > is that with shared resources between the endpoints, the endpoints simply > should have no expectancy that their choice of spacing between packets will > be conserved. For the simple reason that it seems generally impossible to > guarantee that inter-packet spacing is conserved (think "cross-traffic" at > the bottleneck hop along the path and general bunching up of packets in the > queue of a fast to slow transition*). I also would claim that the way L4S > works (if it works) is to synchronize all active flows at the bottleneck > which in tirn means each sender has only a very small timewindow in which > to transmit a packet for it to hits its "slot" in the bottleneck L4S > scheduler, otherwise, L4S's low queueing delay guarantees will not work. In > other words the senders have basically no say in the "spacing between > packets", I fail to see how L4S improves upon FQ in that regard. > > > IMHO having per-flow fairness as the defaults seems quite reasonable, > endpoints can still throttle flows to their liking. Now per-flow fairness > still can be "abused", so by itself it might not be sufficient, but neither > is L4S as it has at best stochastic guarantees, as a single queue AQM > (let's ignore the RFC3168 part of the AQM) there is the probability to send > a throtteling signal to a low bandwidth flow (fair enough, it is only a > mild throtteling signal, but still). > But enough about my opinion, what is the ideal fairness measure in your > mind, and what is realistically achievable over the internet?
> > > Best Regards > Sebastian > > > > > > > > I detected that you were talking about FQ in a way that might have > assumed my concern with it was just about implementation complexity. If you > (or anyone watching) is not aware of the architectural concerns with > per-flow scheduling, I can enumerate them. > > > > I originally started working on what became L4S to prove that it was > possible to separate out reducing queuing delay from throughput scheduling. > When Koen and I started working together on this, we discovered we had > identical concerns on this. > > > > > > > > Bob > > > > > > -- > > ________________________________________________________________ > > Bob Briscoe http://bobbriscoe.net/ > > > > _______________________________________________ > > Ecn-sane mailing list > > Ecn-sane@lists.bufferbloat.net > > https://lists.bufferbloat.net/listinfo/ecn-sane > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane > [-- Attachment #2: Type: text/html, Size: 5879 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-21 9:33 ` Luca Muscariello @ 2019-06-21 20:37 ` Brian E Carpenter 2019-06-22 19:50 ` David P. Reed 0 siblings, 1 reply; 49+ messages in thread From: Brian E Carpenter @ 2019-06-21 20:37 UTC (permalink / raw) To: Luca Muscariello, Sebastian Moeller, David P. Reed Cc: ecn-sane, tsvwg IETF list Below... On 21-Jun-19 21:33, Luca Muscariello wrote: > + David Reed, as I'm not sure he's on the ecn-sane list. > > To me, it seems like a very religious position against per-flow queueing. > BTW, I fail to see how this would violate (in a "profound" way ) the e2e principle. > > When I read it (the e2e principle) > > Saltzer, J. H., D. P. Reed, and D. D. Clark (1981) "End-to-End Arguments in System Design". > In: Proceedings of the Second International Conference on Distributed Computing Systems. Paris, France. > April 8–10, 1981. IEEE Computer Society, pp. 509-512. > (available on line for free). > > It seems very much like the application of the Occam's razor to function placement in communication networks back in the 80s. > I see no conflict between what is written in that paper and per-flow queueing today, even after almost 40 years. > > If that was the case, then all service differentiation techniques would violate the e2e principle in a "profound" way too, > and dualQ too. A policer? A shaper? A priority queue? > > Luca Quoting RFC2638 (the "two-bit" RFC): >>> Both these >>> proposals seek to define a single common mechanism that is used by >>> interior network routers, pushing most of the complexity and state of >>> differentiated services to the network edges. I can't help thinking that if DDC had felt this was against the E2E principle, he would have kicked up a fuss when it was written. Bob's right, however, that there might be a tussle here. If end-points are attempting to pace their packets to suit their own needs, and the network is policing packets to support both service differentiation and fairness, these may well be competing rather than collaborating behaviours. And there probably isn't anything we can do about it by twiddling with algorithms. Brian > > > > > > > > > On Fri, Jun 21, 2019 at 9:00 AM Sebastian Moeller <moeller0@gmx.de <mailto:moeller0@gmx.de>> wrote: > > > > > On Jun 19, 2019, at 16:12, Bob Briscoe <ietf@bobbriscoe.net <mailto:ietf@bobbriscoe.net>> wrote: > > > > Jake, all, > > > > You may not be aware of my long history of concern about how per-flow scheduling within endpoints and networks will limit the Internet in future. I find per-flow scheduling a violation of the e2e principle in such a profound way - the dynamic choice of the spacing between packets - that most people don't even associate it with the e2e principle. > > Maybe because it is not a violation of the e2e principle at all? My point is that with shared resources between the endpoints, the endpoints simply should have no expectancy that their choice of spacing between packets will be conserved. For the simple reason that it seems generally impossible to guarantee that inter-packet spacing is conserved (think "cross-traffic" at the bottleneck hop along the path and general bunching up of packets in the queue of a fast to slow transition*). 
I also would claim that the way L4S works (if it works) is to synchronize all active flows at the bottleneck which in tirn means each sender has only a very small timewindow in which to transmit a packet for it to hits its "slot" in the bottleneck L4S scheduler, otherwise, L4S's low queueing delay guarantees will not work. In other words the senders have basically no say in the "spacing between packets", I fail to see how L4S improves upon FQ in that regard. > > > IMHO having per-flow fairness as the defaults seems quite reasonable, endpoints can still throttle flows to their liking. Now per-flow fairness still can be "abused", so by itself it might not be sufficient, but neither is L4S as it has at best stochastic guarantees, as a single queue AQM (let's ignore the RFC3168 part of the AQM) there is the probability to send a throtteling signal to a low bandwidth flow (fair enough, it is only a mild throtteling signal, but still). > But enough about my opinion, what is the ideal fairness measure in your mind, and what is realistically achievable over the internet? > > > Best Regards > Sebastian > > > > > > > > I detected that you were talking about FQ in a way that might have assumed my concern with it was just about implementation complexity. If you (or anyone watching) is not aware of the architectural concerns with per-flow scheduling, I can enumerate them. > > > > I originally started working on what became L4S to prove that it was possible to separate out reducing queuing delay from throughput scheduling. When Koen and I started working together on this, we discovered we had identical concerns on this. > > > > > > > > Bob > > > > > > -- > > ________________________________________________________________ > > Bob Briscoe http://bobbriscoe.net/ > > > > _______________________________________________ > > Ecn-sane mailing list > > Ecn-sane@lists.bufferbloat.net <mailto:Ecn-sane@lists.bufferbloat.net> > > https://lists.bufferbloat.net/listinfo/ecn-sane > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net <mailto:Ecn-sane@lists.bufferbloat.net> > https://lists.bufferbloat.net/listinfo/ecn-sane > ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-21 20:37 ` [Ecn-sane] [tsvwg] " Brian E Carpenter @ 2019-06-22 19:50 ` David P. Reed 2019-06-22 20:47 ` Jonathan Morton 2019-06-22 21:10 ` Brian E Carpenter 0 siblings, 2 replies; 49+ messages in thread From: David P. Reed @ 2019-06-22 19:50 UTC (permalink / raw) To: Brian E Carpenter Cc: Luca Muscariello, Sebastian Moeller, ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 12854 bytes --] Two points: - Jerry Saltzer and I were the primary authors of the End-to-end argument paper, and the motivation was based on *my* work on the original TCP and IP protocols. Dave Clark got involved significantly later than all those decisions, which were basically complete by the time he joined. (Jerry was my thesis supervisor, I was his student, and I operated largely independently, taking input from various others at MIT). I mention this because Dave understands the end-to-end arguments, but he understands (as we all did) that it was a design *principle* and not a perfectly strict rule. That said, it's a rule that has a strong foundational argument from modularity and evolvability in a context where the system has to work on a wide range of infrastructures (not all knowable in advance) and support a wide range of usage/application-areas (not all knowable in advance). Treating the paper as if it were "DDC" declaring a law is just wrong. He wasn't Moses and it is not written on tablets. Dave did have some "power" in his role of trying to achieve interoperability across diverse implementations. But his focus was primarily on interoperability, not other things. So ideas in the IP protocol like "TOS", which were largely placeholders for not-completely-worked-out concepts deferred to the future, were left till later. - It is clear (at least to me) that from the point of view of the source of an IP datagram, the "handling" of that datagram within the network of networks can vary, and so that is why there is a TOS field - to specify an interoperable, meaningfully described per-packet indicator of differential handling. In regards to the end-to-end argument, that handling choice is a network function, *to the extent that it can completely be implemented in the network itself*. Congestion management, however, is not achievable entirely and only within the network. That's completely obvious: congestion happens when the source-destination flows exceed the capacity of the network of networks to satisfy all demands. The network can only implement *certain* general kinds of mechanisms that may be used by the endpoints to resolve congestion: 1) admission controls. These are implemented at the interface between the source entity and the network of networks. They tend to be impractical in the Internet context, because there is, by a fundamental and irreversible design choice made by Cerf and Kahn (and the rest of us), no central controller of the entire network of networks. This is to make evolvability and scalability work. 5G (not an Internet system) implies a central controller, as do SNA, LTE, and many other networks. The Internet is an overlay on top of such networks. 2) signalling congestion to the endpoints, which will respond by slowing their transmission rate (or explicitly re-routing transmission, or compressing their content) through the network to match capacity. This response is done *above* the IP layer, and has proven very practical.
The function in the network is reduced to "congestion signalling", in a universally understandable, meaningful mechanism: packet drops, ECN, packet-pair separation in arrival time, ... This limited function is essential within the network, because it is the state of the path(s) that is needed to implement the full function at the end points. So congestion signalling, like ECN, is implemented according to the end-to-end argument by carefully defining the network function to be the minimum necessary mechanism so that endpoints can control their rates. 3) automatic selection of routes for flows. It's perfectly fine to select different routes based on information in the IP header (the part that is intended to be read and understood by the network of networks). Now this is currently *rarely* done, due to the complexity of tracking more detailed routing information at the router level. But we had expected that eventually the Internet would be so well connected that there would be diverse routes with diverse capabilities. For example, the "Interplanetary Internet" works with datagrams, which can be implemented with IP, but not using TCP, which requires very low end-to-end latency. Thus, one would expect that TCP would not want any packets transferred over a path via Mars, or for that matter a geosynchronous satellite, even if the throughput would be higher. So one can imagine that eventually a "TOS" might say "send this packet preferably along a path that has at most 200 ms RTT, *even if that leads to congestion signalling*", while another TOS might say "send this packet over the most 'capacious' set of paths, ignoring RTT entirely" (these are just for illustration, but obviously something like this would work). Note that TOS is really aimed at *route selection* preferences, and not queueing management of individual routers. Queueing management to share a single queue on a path for multiple priorities of traffic is not very compatible with "end-to-end arguments". There are any number of reasons why this doesn't work well. I can go into them. Mainly these reasons are why "diffserv" has never been adopted - it's NOT interoperable because the diversity of traffic between endpoints is hard to specify in a way that translates into the network mechanisms. Of course any queue can be managed in some algorithmic way with parameters, but the endpoints that want to specify an end-to-end goal don't have a way to understand the impact of those parameters on a specific queue that is currently congested. Instead, the history of the Internet (and for that matter *all* networks, even Bell's voice systems) has focused on minimizing queueing delay to near zero throughout the network by whatever means it has at the endpoints or in the design. This is why we have AIMD's MD as a response to detection of congestion. Pragmatic networks (those that operate in the real world) do not choose to operate with shared links in a saturated state. That's known in the phone business as the Mother's Day problem. You want to have enough capacity for the rare near-overload to never result in congestion. Which means that the normal state of the network is very lightly loaded indeed, in order to minimize RTT. Consequently, focusing on somehow trying to optimize the utilization of the network to 100% is just a purely academic exercise. Since "priority" at the packet level within a queue only improves that case, it's just a focus of (bad) Ph.D.
theses focus on actual real problems like getting the queues down to 1 packet or less by signalling the endpoints with information that allows them to do their job). So, in considering what goes in the IP layer, both its header and the mechanics of the network of networks, it is those things that actually have implementable meaning in the network of networks when processing the IP datagram. The rest is "content" because the network of networks doesn't need to see it. Thus, don't put anything in the IP header that belongs in the "content" part, just being a signal between end points. Some information used in the network of networks is also logically carried between endpoints. On Friday, June 21, 2019 4:37pm, "Brian E Carpenter" <brian.e.carpenter@gmail.com> said: > Below... > On 21-Jun-19 21:33, Luca Muscariello wrote: > > + David Reed, as I'm not sure he's on the ecn-sane list. > > > > To me, it seems like a very religious position against per-flow > queueing. > > BTW, I fail to see how this would violate (in a "profound" way ) the e2e > principle. > > > > When I read it (the e2e principle) > > > > Saltzer, J. H., D. P. Reed, and D. D. Clark (1981) "End-to-End Arguments in > System Design". > > In: Proceedings of the Second International Conference on Distributed > Computing Systems. Paris, France. > > April 8–10, 1981. IEEE Computer Society, pp. 509-512. > > (available on line for free). > > > > It seems very much like the application of the Occam's razor to function > placement in communication networks back in the 80s. > > I see no conflict between what is written in that paper and per-flow queueing > today, even after almost 40 years. > > > > If that was the case, then all service differentiation techniques would > violate the e2e principle in a "profound" way too, > > and dualQ too. A policer? A shaper? A priority queue? > > > > Luca > > Quoting RFC2638 (the "two-bit" RFC): > > >>> Both these > >>> proposals seek to define a single common mechanism that is used > by > >>> interior network routers, pushing most of the complexity and state > of > >>> differentiated services to the network edges. > > I can't help thinking that if DDC had felt this was against the E2E principle, > he would have kicked up a fuss when it was written. > > Bob's right, however, that there might be a tussle here. If end-points are > attempting to pace their packets to suit their own needs, and the network is > policing packets to support both service differentiation and fairness, > these may well be competing rather than collaborating behaviours. And there > probably isn't anything we can do about it by twiddling with algorithms. > > Brian > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 9:00 AM Sebastian Moeller <moeller0@gmx.de > <mailto:moeller0@gmx.de>> wrote: > > > > > > > > > On Jun 19, 2019, at 16:12, Bob Briscoe <ietf@bobbriscoe.net > <mailto:ietf@bobbriscoe.net>> wrote: > > > > > > Jake, all, > > > > > > You may not be aware of my long history of concern about how > per-flow scheduling within endpoints and networks will limit the Internet in > future. I find per-flow scheduling a violation of the e2e principle in such a > profound way - the dynamic choice of the spacing between packets - that most > people don't even associate it with the e2e principle. > > > > Maybe because it is not a violation of the e2e principle at all? 
My point > is that with shared resources between the endpoints, the endpoints simply should > have no expectancy that their choice of spacing between packets will be conserved. > For the simple reason that it seems generally impossible to guarantee that > inter-packet spacing is conserved (think "cross-traffic" at the bottleneck hop > along the path and general bunching up of packets in the queue of a fast to slow > transition*). I also would claim that the way L4S works (if it works) is to > synchronize all active flows at the bottleneck which in tirn means each sender has > only a very small timewindow in which to transmit a packet for it to hits its > "slot" in the bottleneck L4S scheduler, otherwise, L4S's low queueing delay > guarantees will not work. In other words the senders have basically no say in the > "spacing between packets", I fail to see how L4S improves upon FQ in that regard. > > > > > > IMHO having per-flow fairness as the defaults seems quite > reasonable, endpoints can still throttle flows to their liking. Now per-flow > fairness still can be "abused", so by itself it might not be sufficient, but > neither is L4S as it has at best stochastic guarantees, as a single queue AQM > (let's ignore the RFC3168 part of the AQM) there is the probability to send a > throtteling signal to a low bandwidth flow (fair enough, it is only a mild > throtteling signal, but still). > > But enough about my opinion, what is the ideal fairness measure in your > mind, and what is realistically achievable over the internet? > > > > > > Best Regards > > Sebastian > > > > > > > > > > > > > > I detected that you were talking about FQ in a way that might have > assumed my concern with it was just about implementation complexity. If you (or > anyone watching) is not aware of the architectural concerns with per-flow > scheduling, I can enumerate them. > > > > > > I originally started working on what became L4S to prove that it was > possible to separate out reducing queuing delay from throughput scheduling. When > Koen and I started working together on this, we discovered we had identical > concerns on this. > > > > > > > > > > > > Bob > > > > > > > > > -- > > > ________________________________________________________________ > > > Bob Briscoe > http://bobbriscoe.net/ > > > > > > _______________________________________________ > > > Ecn-sane mailing list > > > Ecn-sane@lists.bufferbloat.net > <mailto:Ecn-sane@lists.bufferbloat.net> > > > https://lists.bufferbloat.net/listinfo/ecn-sane > > > > _______________________________________________ > > Ecn-sane mailing list > > Ecn-sane@lists.bufferbloat.net > <mailto:Ecn-sane@lists.bufferbloat.net> > > https://lists.bufferbloat.net/listinfo/ecn-sane > > > > [-- Attachment #2: Type: text/html, Size: 17042 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
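A hedged sketch of the division of labour Reed describes in point 2): the network's function shrinks to a one-bit congestion verdict per round trip (a drop or ECN mark when offered load exceeds capacity), while the endpoint owns the actual rate control, shown here as textbook AIMD with illustrative constants:

CAPACITY = 20.0  # packets per RTT the bottleneck can carry (assumed)

cwnd, trace = 1.0, []
for rtt in range(40):
    congested = cwnd > CAPACITY                   # network half: one-bit signal
    cwnd = cwnd / 2 if congested else cwnd + 1.0  # endpoint half: AIMD response
    trace.append(round(cwnd, 1))
print(trace)  # climbs to ~capacity, halves on each signal, climbs again

Everything that decides how to respond lives at the endpoint; the router only reports that capacity was exceeded, which is the minimum-necessary network mechanism the end-to-end argument asks for.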
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-22 19:50 ` David P. Reed @ 2019-06-22 20:47 ` Jonathan Morton 2019-06-22 22:03 ` Luca Muscariello 2019-06-22 22:09 ` David P. Reed 2019-06-22 21:10 ` Brian E Carpenter 1 sibling, 2 replies; 49+ messages in thread From: Jonathan Morton @ 2019-06-22 20:47 UTC (permalink / raw) To: David P. Reed; +Cc: Brian E Carpenter, ecn-sane, tsvwg IETF list > On 22 Jun, 2019, at 10:50 pm, David P. Reed <dpreed@deepplum.com> wrote: > > Pragmatic networks (those that operate in the real world) do not choose to operate with shared links in a saturated state. That's known in the phone business as the Mother's Day problem. You want to have enough capacity for the rare near-overload to never result in congestion. This is most likely true for core networks. However, I know of several real-world networks and individual links which, in practice, are regularly in a saturated and/or congested state. Indeed, the average Internet consumer's ADSL or VDSL last-mile link becomes saturated for a noticeable interval, every time his operating system or game vendor releases an update. In my case, I share a 3G/4G tower's airtime with whatever variable number of subscribers to the same network happen to be in the area on any given day; today, during midsummer weekend, that number is considerably inflated compared to normal, and my available link bandwidth is substantially impacted as a result, indicating congestion. I did not see anything in your argument specifically about per-flow scheduling for the simple purpose of fairly sharing capacity between flows and/or between subscribers, and minimising the impact of elephants on mice. Empirical evidence suggests that it makes the network run more smoothly. Does anyone have a concrete refutation? - Jonathan Morton ^ permalink raw reply [flat|nested] 49+ messages in thread
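To make the elephants-and-mice point concrete, a toy comparison (all packet counts assumed) of FIFO service order against the per-flow round-robin at the heart of DRR-style schedulers:

from collections import deque

def round_robin(queues):
    """Toy DRR with equal-size packets: one packet per non-empty queue per cycle."""
    order = []
    active = [q for q in queues if q]
    while active:
        for q in list(active):
            order.append(q.popleft())
            if not q:
                active.remove(q)
    return order

elephant = deque(f"E{i}" for i in range(10))  # 10-packet bulk flow, queued first
mice = [deque([f"m{i}"]) for i in range(3)]   # three single-packet flows

print("FIFO:", list(elephant) + [q[0] for q in mice])  # mice wait behind all of E
print("DRR :", round_robin([elephant] + mice))         # mice served in cycle one

Under FIFO each mouse waits for the whole elephant backlog; under round-robin it waits at most one packet per competing flow, which is the smoothness Jonathan reports observing.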
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-22 20:47 ` Jonathan Morton @ 2019-06-22 22:03 ` Luca Muscariello 2019-06-22 22:09 ` David P. Reed 1 sibling, 0 replies; 49+ messages in thread From: Luca Muscariello @ 2019-06-22 22:03 UTC (permalink / raw) To: Jonathan Morton Cc: Brian E Carpenter, David P. Reed, ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 2707 bytes --] On Sat 22 Jun 2019 at 22:48, Jonathan Morton <chromatix99@gmail.com> wrote: > > On 22 Jun, 2019, at 10:50 pm, David P. Reed <dpreed@deepplum.com> wrote: > > > > Pragmatic networks (those that operate in the real world) do not choose > to operate with shared links in a saturated state. That's known in the > phone business as the Mother's Day problem. You want to have enough > capacity for the rare near-overload to never result in congestion. > > This is most likely true for core networks. However, I know of several > real-world networks and individual links which, in practice, are regularly > in a saturated and/or congested state. > > Indeed, the average Internet consumer's ADSL or VDSL last-mile link > becomes saturated for a noticeable interval, every time his operating > system or game vendor releases an update. In my case, I share a 3G/4G > tower's airtime with whatever variable number of subscribers to the same > network happen to be in the area on any given day; today, during midsummer > weekend, that number is considerably inflated compared to normal, and my > available link bandwidth is substantially impacted as a result, indicating > congestion. > > I did not see anything in your argument specifically about per-flow > scheduling for the simple purpose of fairly sharing capacity between flows > and/or between subscribers, and minimising the impact of elephants on > mice. Empirical evidence suggests that it makes the network run more > smoothly. Does anyone have a concrete refutation? > > - Jonathan Morton I don't think you would be able to find a refutation. Going back for a second to what David and also Brian have said about diffserv, QoS has proved to be an intractable problem, and I won't blame those who have tried to propose solutions that currently work under very special circumstances. Things have not changed to make that problem simpler; quite the opposite, mostly because the mix of applications is way more diverse today, with less predictable patterns. If I apply the same mindset used in David's paper, i.e. Occam's razor, to get design principles to obtain a solution that is simple and tractable, flow-queuing in your DSL link looks like a perfectly acceptable solution. And I say that w/o religious positions. The fact that flow-isolation generates incentives for sources to behave well is good evidence to me. Also the fact that even in situations that may look like the law of the jungle, flow-isolation brings me performance that is predictable. That brings more evidence that the solution is a good one. In this respect fq_codel (RFC 8290) looks like a simple, useful tool. [-- Attachment #2: Type: text/html, Size: 3394 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
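For readers who have not looked inside RFC 8290: the isolation Luca mentions starts with hashing the 5-tuple into one of a fixed set of queues. A simplified sketch (the real qdisc uses a keyed, perturbable hash and 1024 queues by default; the addresses and ports below are made up):

import hashlib

N_QUEUES = 1024

def flow_queue(src, dst, sport, dport, proto):
    """Map a 5-tuple to a queue index (simplified stand-in for the qdisc's hash)."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % N_QUEUES

# Two flows between the same hosts almost always land in different queues,
# so neither can build a standing queue in front of the other.
print(flow_queue("198.51.100.7", "203.0.113.9", 51512, 443, "tcp"))
print(flow_queue("198.51.100.7", "203.0.113.9", 51513, 443, "tcp"))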
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-22 20:47 ` Jonathan Morton 2019-06-22 22:03 ` Luca Muscariello @ 2019-06-22 22:09 ` David P. Reed 2019-06-22 23:07 ` Jonathan Morton 2019-06-26 12:48 ` Sebastian Moeller 1 sibling, 2 replies; 49+ messages in thread From: David P. Reed @ 2019-06-22 22:09 UTC (permalink / raw) To: Jonathan Morton; +Cc: Brian E Carpenter, ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 5873 bytes --] Good point, but saturation is not congestion, in my view. Optimal state of a single bottleneck link is a queue of length <= 1. This can be maintained even under full load, by endpoint rate control. (Note that the "Little's Law" result in queueing theory is for a special case of queuing due to uncontrolled Poisson random arrivals. The traffic you refer to, downloading an update, is about as far from Poisson-random as possible. It's neither random nor Poisson). But I take your point that there will be bottlenecked links that are special, and thus congestion control comes into play on those links. per-flow scheduling is appropriate on a shared link. However, the end-to-end argument would suggest that the network not try to divine which flows get preferred. And beyond the end-to-end argument, there's a practical problem - since the ideal state of a shared link means that it ought to have no local backlog in the queue, the information needed to schedule "fairly" isn't in the queue backlog itself. If there is only one packet, what's to schedule? In fact, what the ideal queueing discipline would do is send signals to the endpoints that provide information as to what each flow's appropriate share is, and/or how far its current share is from what's fair. A somewhat less optimal discipline to achieve fairness might also drop packets beyond the fair share. Dropping is the best way to signal the endpoint that it is trying to overuse the link. Merely re-ordering the packets on a link is just not very effective at achieving fairness. Now what to do to determine fair share if the average number of packets in a saturated link's queue is <= 1? Well, presumably the flows have definable average rates. So good statistical estimators of the average rate of each flow exist. For example a table with a moving average of the rate of bytes on each flow can be maintained, which provides wonderful information about the recent history of capacities used by each flow. This allows signalling of overuse to the endpoints of each flow, either by dropping packets or by some small number of bits (e.g. ECN marking). Underuse can also be signalled. We know (from pragmatic observation) that flows defined as source-host, dest-host pairs, have relatively stable demand over many packets, and many RTT's. More specific flows (say, defined using TCP and UDP ports, the "extended header" that maybe should have been included in the IP layer) are less stable in some protocol usages we see. However, what's clear is that something like fq_codel can be created that does not need to accumulate long queues on links to have an effect - obviously queues still result from transient overload, but one wants to avoid requiring queueing to build up in order to balance traffic fairly. Ideally, in a situation where some definition of fairness needs to be implemented among flows competing for a limited resource, the queueing should build up at the source endpoint as much as possible, and yet the point where the conflict occurs may be deep within the network of networks. 
However, once we start being able to balance loads across many independent paths (rather than assuming that there is exactly one path per flow), the definition of "fairness" becomes FAR more complicated, because the competing flows may not be between a particular source and destination pair - instead the traffic in each flow can be managed across different sets of multipaths. So the end-to-end approach would suggest moving most of the scheduling back to the endpoints of each flow, with the role of the routers being to extract information about the competing flows that are congesting the network, and forwarding those signals (via drops or marking) to the endpoints. That's because, in the end-to-end argument that applies here - the router cannot do the entire function of managing congestion or priority. The case of the "uplink" from a residential network is a common, but not general case. Once a residence has multiple uplinks (say in a Mesh neighborhood network or just in a complex wireless situation where multiple access points are reachable) the idea of localizing priority or congestion management in one box below the IP layer no longer works well at all. On Saturday, June 22, 2019 4:47pm, "Jonathan Morton" <chromatix99@gmail.com> said: > > On 22 Jun, 2019, at 10:50 pm, David P. Reed <dpreed@deepplum.com> > wrote: > > > > Pragmatic networks (those that operate in the real world) do not choose to > operate with shared links in a saturated state. That's known in the phone business > as the Mother's Day problem. You want to have enough capacity for the rare > near-overload to never result in congestion. > > This is most likely true for core networks. However, I know of several real-world > networks and individual links which, in practice, are regularly in a saturated > and/or congested state. > > Indeed, the average Internet consumer's ADSL or VDSL last-mile link becomes > saturated for a noticeable interval, every time his operating system or game > vendor releases an update. In my case, I share a 3G/4G tower's airtime with > whatever variable number of subscribers to the same network happen to be in the > area on any given day; today, during midsummer weekend, that number is > considerably inflated compared to normal, and my available link bandwidth is > substantially impacted as a result, indicating congestion. > > I did not see anything in your argument specifically about per-flow scheduling for > the simple purpose of fairly sharing capacity between flows and/or between > subscribers, and minimising the impact of elephants on mice. Empirical evidence > suggests that it makes the network run more smoothly. Does anyone have a concrete > refutation? > > - Jonathan Morton [-- Attachment #2: Type: text/html, Size: 8891 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
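A hedged sketch of the moving-average bookkeeping Reed proposes: an EWMA rate estimate per flow, consulted on every packet, so overuse can be signalled (by mark or drop) even when the instantaneous queue is nearly empty. The smoothing factor, fair-share value, and flow names are assumptions for illustration:

ALPHA = 0.1            # EWMA smoothing factor (assumed)
FAIR_SHARE_BPS = 1e6   # per-flow fair share on this link (assumed)

rates = {}             # flow_id -> smoothed rate estimate in bytes/sec

def on_packet(flow_id, pkt_bytes, dt_s):
    """Update the flow's rate estimate; True means 'signal this flow' (mark/drop)."""
    sample = pkt_bytes / max(dt_s, 1e-6)      # instantaneous rate sample
    est = rates.get(flow_id, sample)
    est = (1 - ALPHA) * est + ALPHA * sample  # moving average of the flow's rate
    rates[flow_id] = est
    return est * 8 > FAIR_SHARE_BPS           # overuse relative to fair share

print(on_packet("flowA", 1500, 0.001))  # ~12 Mbit/s sample -> True (signal)
print(on_packet("flowB", 1500, 0.100))  # ~120 kbit/s sample -> False

Note this keeps per-flow state but needs no per-flow queues, which is exactly the distinction Reed is drawing.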
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-22 22:09 ` David P. Reed @ 2019-06-22 23:07 ` Jonathan Morton 2019-06-24 18:57 ` David P. Reed 2019-06-26 12:48 ` Sebastian Moeller 1 sibling, 1 reply; 49+ messages in thread From: Jonathan Morton @ 2019-06-22 23:07 UTC (permalink / raw) To: David P. Reed; +Cc: Brian E Carpenter, ecn-sane, tsvwg IETF list > On 23 Jun, 2019, at 1:09 am, David P. Reed <dpreed@deepplum.com> wrote: > > per-flow scheduling is appropriate on a shared link. However, the end-to-end argument would suggest that the network not try to divine which flows get preferred. > And beyond the end-to-end argument, there's a practical problem - since the ideal state of a shared link means that it ought to have no local backlog in the queue, the information needed to schedule "fairly" isn't in the queue backlog itself. If there is only one packet, what's to schedule? This is a great straw-man. Allow me to deconstruct it. The concept that DRR++ has empirically proved is that flows can be classified into two categories - sparse and saturating - very easily by the heuristic that a saturating flow's arrival rate exceeds its available delivery rate, and the opposite is true for a sparse flow. An excessive arrival rate results in a standing queue; with Reno, the excess arrival rate after capacity is reached is precisely 1 segment per RTT, very small next to modern link capacities. If there is no overall standing queue, then by definition all of the flows passing through are currently sparse. DRR++ (as implemented in fq_codel and Cake) ensures that all sparse traffic is processed with minimum delay and no AQM activity, while saturating traffic is metered out fairly and given appropriate AQM signals. > In fact, what the ideal queueing discipline would do is send signals to the endpoints that provide information as to what each flow's appropriate share is, and/or how far its current share is from what's fair. The definition of which flows are sparse and which are saturating shifts dynamically in response to endpoint behaviour. > Well, presumably the flows have definable average rates. Today's TCP traffic exhibits the classic sawtooth behaviour - which has a different shape and period with CUBIC than Reno, but is fundamentally similar. The sender probes capacity by increasing send rate until a congestion signal is fed back to it, at which point it drops back sharply. With efficient AQM action, a TCP flow will therefore spend most of its time "sparse" and using less than the available path capacity, with occasional excursions into "saturating" territory which are fairly promptly terminated by AQM signals. So TCP does *not* have a definable "average rate". It grows to fill available capacity, just like the number of cars on a motorway network. The recent work on high-fidelity ECN (including SCE) aims to eliminate the sawtooth, so that dropping out of "saturating" mode is done faster and by only a small margin, wasting less capacity and reducing peak delays - very close to ideal control as you describe. But it's still necessary to avoid giving these signals unnecessarily to "sparse" flows, which would cause them to back off and thus waste capacity, but only to "saturating" flows that have just begun to build a queue. And it's also necessary to protect these well-behaved "modern" flows from "legacy" endpoint behaviour, and vice versa. DRR++ does that very naturally. > Merely re-ordering the packets on a link is just not very effective at achieving fairness. 
I'm afraid this assertion is simply false. DRR++ does precisely that, and achieves near-perfect fairness. It is important however to define "flow" correctly relative to the measure of fairness you want to achieve. Traditionally the unique 5-tuple is used to define "flow", but this means applications can game the system by opening multiple flows. For an ISP a better definition might be that each subscriber's traffic is one "flow". Or there is a tweak to DRR++ which allows a two-layer fairness definition, implemented successfully in Cake. > So the end-to-end approach would suggest moving most of the scheduling back to the endpoints of each flow, with the role of the routers being to extract information about the competing flows that are congesting the network, and forwarding those signals (via drops or marking) to the endpoints. That's because, in the end-to-end argument that applies here - the router cannot do the entire function of managing congestion or priority. It must be remembered that congestion signals require one RTT to circulate from the bottleneck, via the receiver, back to the sender, and their effects to then be felt at the bottleneck. That's typically a much longer response time (say 100ms for a general Internet path) than can be achieved by packet scheduling (sub-millisecond for a 20Mbps link), and therefore effects only a looser control (by fundamental control theory). Both mechanisms are useful and complement each other. My personal interpretation of the end-to-end principle is that endpoints generally do not, cannot, and *should not* be aware of the topology of the network between them, nor of any other traffic that might be sharing that network. The network itself takes care of those details, and may send standardised control-feedback signals to the endpoints to inform them about actions they need to take. These currently take the form of ICMP error packets and the ECN field, the latter substituted by packet drops on Not-ECT flows. - Jonathan Morton ^ permalink raw reply [flat|nested] 49+ messages in thread
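A toy model of the sparse/saturating mechanism described above, patterned on the DRR++ scheme of fq_codel and Cake but much simplified (one packet per dequeue, no AQM, made-up flow names; not the RFC 8290 code): newly active flows sit on a "new" list that is served before the "old" list of backlogged flows.

from collections import deque

class ToyDRRPP:
    def __init__(self, quantum=1514):
        self.quantum = quantum
        self.queues = {}                     # flow -> deque of packet sizes
        self.new, self.old = deque(), deque()
        self.deficit = {}

    def enqueue(self, flow, size):
        q = self.queues.setdefault(flow, deque())
        if not q and flow not in self.new and flow not in self.old:
            self.new.append(flow)            # newly active: treated as sparse
            self.deficit[flow] = self.quantum
        q.append(size)

    def dequeue(self):
        while self.new or self.old:
            lst = self.new if self.new else self.old
            flow = lst[0]
            q = self.queues[flow]
            if q and self.deficit[flow] >= q[0]:
                self.deficit[flow] -= q[0]
                return flow, q.popleft()
            lst.popleft()                    # quantum spent or queue drained
            if q:                            # still backlogged: "saturating"
                self.deficit[flow] += self.quantum
                self.old.append(flow)
        return None

s = ToyDRRPP()
for _ in range(5):
    s.enqueue("bulk", 1500)
print(s.dequeue())   # ('bulk', 1500) - bulk spends its quantum
s.enqueue("dns", 100)
print(s.dequeue())   # ('dns', 100) - the sparse newcomer jumps the backlog

The classification is implicit: a flow that keeps a backlog keeps cycling through the "old" list, while one that empties its queue and returns later re-enters "new" and again gets priority.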
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-22 23:07 ` Jonathan Morton @ 2019-06-24 18:57 ` David P. Reed 2019-06-24 19:31 ` Jonathan Morton 0 siblings, 1 reply; 49+ messages in thread From: David P. Reed @ 2019-06-24 18:57 UTC (permalink / raw) To: Jonathan Morton; +Cc: Brian E Carpenter, ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 6500 bytes --] Jonathan - all of the things you say are kind of silly. An HTTP 1.1 protocol running over TCP is not compatible with this description, except in "fantasyland". I think you are obsessed with some idea of "proving me wrong". That's not productive. If you have actual data describing how HTTP 1.1 connections proceed over time that disagrees with my observation, show them. Preferably taken in the wild. I honestly can't imagine that you have actually observed any system other than the constrained single connection between a LAN and a residential ISP. TCP doesn't have a "natural sawtooth" - that is the response of TCP to a particular "queueing discipline" in a particular kind of router - it would respond differently (and does!) if the router were to drop packets randomly on a Poisson basis, for example. No sawtooth at all. So you seem to see routers as part of TCP. That's not the way the Internet is designed. On Saturday, June 22, 2019 7:07pm, "Jonathan Morton" <chromatix99@gmail.com> said: > > On 23 Jun, 2019, at 1:09 am, David P. Reed <dpreed@deepplum.com> > wrote: > > > > per-flow scheduling is appropriate on a shared link. However, the end-to-end > argument would suggest that the network not try to divine which flows get > preferred. > > And beyond the end-to-end argument, there's a practical problem - since the > ideal state of a shared link means that it ought to have no local backlog in the > queue, the information needed to schedule "fairly" isn't in the queue backlog > itself. If there is only one packet, what's to schedule? > > This is a great straw-man. Allow me to deconstruct it. > > The concept that DRR++ has empirically proved is that flows can be classified into > two categories - sparse and saturating - very easily by the heuristic that a > saturating flow's arrival rate exceeds its available delivery rate, and the > opposite is true for a sparse flow. > > An excessive arrival rate results in a standing queue; with Reno, the excess > arrival rate after capacity is reached is precisely 1 segment per RTT, very small > next to modern link capacities. If there is no overall standing queue, then by > definition all of the flows passing through are currently sparse. DRR++ (as > implemented in fq_codel and Cake) ensures that all sparse traffic is processed > with minimum delay and no AQM activity, while saturating traffic is metered out > fairly and given appropriate AQM signals. > > > In fact, what the ideal queueing discipline would do is send signals to the > endpoints that provide information as to what each flow's appropriate share is, > and/or how far its current share is from what's fair. > > The definition of which flows are sparse and which are saturating shifts > dynamically in response to endpoint behaviour. > > > Well, presumably the flows have definable average rates. > > Today's TCP traffic exhibits the classic sawtooth behaviour - which has a > different shape and period with CUBIC than Reno, but is fundamentally similar. > The sender probes capacity by increasing send rate until a congestion signal is > fed back to it, at which point it drops back sharply.
With efficient AQM action, > a TCP flow will therefore spend most of its time "sparse" and using less than the > available path capacity, with occasional excursions into "saturating" territory > which are fairly promptly terminated by AQM signals. > > So TCP does *not* have a definable "average rate". It grows to fill available > capacity, just like the number of cars on a motorway network. > > The recent work on high-fidelity ECN (including SCE) aims to eliminate the > sawtooth, so that dropping out of "saturating" mode is done faster and by only a > small margin, wasting less capacity and reducing peak delays - very close to ideal > control as you describe. But it's still necessary to avoid giving these signals > unnecessarily to "sparse" flows, which would cause them to back off and thus waste > capacity, but only to "saturating" flows that have just begun to build a queue. > And it's also necessary to protect these well-behaved "modern" flows from "legacy" > endpoint behaviour, and vice versa. DRR++ does that very naturally. > > > Merely re-ordering the packets on a link is just not very effective at > achieving fairness. > > I'm afraid this assertion is simply false. DRR++ does precisely that, and > achieves near-perfect fairness. > > It is important however to define "flow" correctly relative to the measure of > fairness you want to achieve. Traditionally the unique 5-tuple is used to define > "flow", but this means applications can game the system by opening multiple flows. > For an ISP a better definition might be that each subscriber's traffic is one > "flow". Or there is a tweak to DRR++ which allows a two-layer fairness > definition, implemented successfully in Cake. > > > So the end-to-end approach would suggest moving most of the scheduling back > to the endpoints of each flow, with the role of the routers being to extract > information about the competing flows that are congesting the network, and > forwarding those signals (via drops or marking) to the endpoints. That's because, > in the end-to-end argument that applies here - the router cannot do the entire > function of managing congestion or priority. > > It must be remembered that congestion signals require one RTT to circulate from > the bottleneck, via the receiver, back to the sender, and their effects to then be > felt at the bottleneck. That's typically a much longer response time (say 100ms > for a general Internet path) than can be achieved by packet scheduling > (sub-millisecond for a 20Mbps link), and therefore effects only a looser control > (by fundamental control theory). Both mechanisms are useful and complement each > other. > > My personal interpretation of the end-to-end principle is that endpoints generally > do not, cannot, and *should not* be aware of the topology of the network between > them, nor of any other traffic that might be sharing that network. The network > itself takes care of those details, and may send standardised control-feedback > signals to the endpoints to inform them about actions they need to take. These > currently take the form of ICMP error packets and the ECN field, the latter > substituted by packet drops on Not-ECT flows. > > - Jonathan Morton [-- Attachment #2: Type: text/html, Size: 8967 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-24 18:57 ` David P. Reed @ 2019-06-24 19:31 ` Jonathan Morton 2019-06-24 19:50 ` David P. Reed 2019-06-24 21:25 ` Luca Muscariello 0 siblings, 2 replies; 49+ messages in thread From: Jonathan Morton @ 2019-06-24 19:31 UTC (permalink / raw) To: David P. Reed; +Cc: Brian E Carpenter, ecn-sane, tsvwg IETF list > On 24 Jun, 2019, at 9:57 pm, David P. Reed <dpreed@deepplum.com> wrote: > > TCP doesn't have a "natural sawtooth" - that is the response of TCP to a particular "queueing discipline" in a particular kind of a router - it would respond differently (and does!) if the router were to drop packets randomly on a Poisson basis, for example. No sawtooth at all. I challenge you to show me a Reno or CUBIC based connection's cwnd evolution that *doesn't* resemble a sawtooth, regardless of the congestion signalling employed. And I will show you that it either has severe underutilisation of the link, or is using SCE signals. The sawtooth is characteristic of the AIMD congestion control algorithm. - Jonathan Morton ^ permalink raw reply [flat|nested] 49+ messages in thread
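The disputed point is checkable with a toy experiment (illustrative constants; a Bernoulli per-RTT drop is used as a discrete stand-in for Reed's Poisson-random dropping). It runs the same AIMD law against both signalling disciplines so readers can judge the resulting shapes themselves:

import random

def aimd(signal, rounds=60, cwnd=1.0):
    """One AIMD sender; signal(cwnd) is the network's per-RTT congestion verdict."""
    trace = []
    for _ in range(rounds):
        cwnd = max(cwnd / 2, 1.0) if signal(cwnd) else cwnd + 1.0
        trace.append(round(cwnd))
    return trace

random.seed(1)
print("threshold:", aimd(lambda c: c > 40))                     # fixed-buffer drop
print("random   :", aimd(lambda c: random.random() < c / 400))  # load-dependent coin flip

The threshold trace is the textbook periodic sawtooth; the random trace still rises additively and halves multiplicatively, but with irregular peaks and period, which is roughly where the two positions in this exchange meet.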
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-24 19:31 ` Jonathan Morton @ 2019-06-24 19:50 ` David P. Reed 2019-06-24 20:14 ` Jonathan Morton 2019-06-24 21:25 ` Luca Muscariello 1 sibling, 1 reply; 49+ messages in thread From: David P. Reed @ 2019-06-24 19:50 UTC (permalink / raw) To: Jonathan Morton; +Cc: Brian E Carpenter, ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 1435 bytes --] Please! On Monday, June 24, 2019 3:31pm, "Jonathan Morton" <chromatix99@gmail.com> said: > > On 24 Jun, 2019, at 9:57 pm, David P. Reed <dpreed@deepplum.com> > wrote: > > > > TCP doesn't have a "natural sawtooth" - that is the response of TCP to a > particular "queueing discipline" in a particular kind of a router - it would > respond differently (and does!) if the router were to drop packets randomly on a > Poisson basis, for example. No sawtooth at all. > > I challenge you to show me a Reno or CUBIC based connection's cwnd evolution that > *doesn't* resemble a sawtooth, regardless of the congestion signalling employed. > And I will show you that it either has severe underutilisation of the link, or is > using SCE signals. The sawtooth is characteristic of the AIMD congestion control > algorithm. Of course AIMD responds with a sawtooth to a router dropping occasional packets as congestion signalling. Are you trying to pretend I'm an idiot? And various kinds of SACK cause non-sawtooth responses. My overall point here is that you seem to live in a world of academic-like purity - all TCP connections are essentially huge file transfers, where there are no delays on production or consumption of packets at the endpoint, there is no multiplexing or scheduling of processes in the endpoint operating systems, etc. TCP sources don't work like that in practice. > > - Jonathan Morton [-- Attachment #2: Type: text/html, Size: 2632 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-24 19:50 ` David P. Reed @ 2019-06-24 20:14 ` Jonathan Morton 2019-06-25 21:05 ` David P. Reed 0 siblings, 1 reply; 49+ messages in thread From: Jonathan Morton @ 2019-06-24 20:14 UTC (permalink / raw) To: David P. Reed; +Cc: Brian E Carpenter, ecn-sane, tsvwg IETF list > On 24 Jun, 2019, at 10:50 pm, David P. Reed <dpreed@deepplum.com> wrote: > > My overall point here is that you seem to live in a world of academic-like purity - all TCP connections are essentially huge file transfers, where there are no delays on production or consumption of packets at the endpoint, there is no multiplexing or scheduling of processes in the endpoint operating systems, etc. On the contrary, I've found that per-flow queuing algorithms like DRR++ cope naturally and very nicely with all sorts of deviations from the ideal, including for example the wild variations in goodput and RTT associated with wifi links. These are exactly the kinds of complication that I imagine - and not merely in the abstract but through observation - that a pure end-to-end approach would have great difficulty in accommodating elegantly. - Jonathan Morton ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-24 20:14 ` Jonathan Morton @ 2019-06-25 21:05 ` David P. Reed 0 siblings, 0 replies; 49+ messages in thread From: David P. Reed @ 2019-06-25 21:05 UTC (permalink / raw) To: Jonathan Morton; +Cc: Brian E Carpenter, ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 1048 bytes --] So, go for it, then. I wish you well. On Monday, June 24, 2019 4:14pm, "Jonathan Morton" <chromatix99@gmail.com> said: > > On 24 Jun, 2019, at 10:50 pm, David P. Reed <dpreed@deepplum.com> > wrote: > > > > My overall point here is that you seem to live in a world of academic-like > purity - all TCP connections are essentially huge file transfers, where there are > no delays on production or consumption of packets at the endpoint, there is no > multiplexing or scheduling of processes in the endpoint operating systems, etc. > > On the contrary, I've found that per-flow queuing algorithms like DRR++ cope > naturally and very nicely with all sorts of deviations from the ideal, including > for example the wild variations in goodput and RTT associated with wifi links. > > These are exactly the kinds of complication that I imagine - and not merely in the > abstract but through observation - that a pure end-to-end approach would have > great difficulty in accommodating elegantly. > > - Jonathan Morton > > [-- Attachment #2: Type: text/html, Size: 1675 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-24 19:31 ` Jonathan Morton 2019-06-24 19:50 ` David P. Reed @ 2019-06-24 21:25 ` Luca Muscariello 1 sibling, 0 replies; 49+ messages in thread From: Luca Muscariello @ 2019-06-24 21:25 UTC (permalink / raw) To: Jonathan Morton Cc: David P. Reed, ecn-sane, Brian E Carpenter, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 1408 bytes --] On Mon, Jun 24, 2019 at 9:32 PM Jonathan Morton <chromatix99@gmail.com> wrote: > > On 24 Jun, 2019, at 9:57 pm, David P. Reed <dpreed@deepplum.com> wrote: > > > > TCP doesn't have a "natural sawtooth" - that is the response of TCP to a > particular "queueing discipline" in a particular kind of a router - it > would respond differently (and does!) if the router were to drop packets > randomly on a Poisson basis, for example. No sawtooth at all. > > I challenge you to show me a Reno or CUBIC based connection's cwnd > evolution that *doesn't* resemble a sawtooth, regardless of the congestion > signalling employed. And I will show you that it either has severe > underutilisation of the link, or is using SCE signals. The sawtooth is > characteristic of the AIMD congestion control algorithm. > Jonathan, even if it is news to nobody, AIMD does not necessarily converge to a limit cycle (e.g. a sawtooth). It depends on some regularity conditions of the AIMD law and congestion feedback too. For instance, AIMD delay based congestion control or AIMD with certain ECN laws may display no limit cycles. Still under certain conditions. Just to recall that the problem is a little more complex than that in general. > > - Jonathan Morton > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane > [-- Attachment #2: Type: text/html, Size: 2314 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
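Muscariello's caveat can be seen numerically. In the following toy comparison (illustrative laws, not any deployed algorithm), the same additive increase paired with a decrease that scales with a smooth congestion measure settles toward a fixed point instead of a limit cycle:

    def aimd_classic(w, capacity=100.0):
        return w * 0.5 if w >= capacity else w + 1.0     # binary signal, halve

    def aimd_smooth(w, capacity=100.0):
        excess = max(0.0, (w - capacity) / capacity)     # smooth feedback
        return w * (1.0 - 0.5 * excess) if excess > 0 else w + 1.0

    for law in (aimd_classic, aimd_smooth):
        w, trace = 50.0, []
        for i in range(600):
            w = law(w)
            if i >= 200:             # skip the initial transient
                trace.append(w)
        print(f"{law.__name__}: cwnd stays in [{min(trace):.1f}, {max(trace):.1f}]")
    # -> aimd_classic cycles between ~50 and ~100; aimd_smooth pins near 100.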
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-22 22:09 ` David P. Reed 2019-06-22 23:07 ` Jonathan Morton @ 2019-06-26 12:48 ` Sebastian Moeller 2019-06-26 16:31 ` David P. Reed 1 sibling, 1 reply; 49+ messages in thread From: Sebastian Moeller @ 2019-06-26 12:48 UTC (permalink / raw) To: David P. Reed Cc: Jonathan Morton, ecn-sane, Brian E Carpenter, tsvwg IETF list > On Jun 23, 2019, at 00:09, David P. Reed <dpreed@deepplum.com> wrote: > > [...] > > per-flow scheduling is appropriate on a shared link. However, the end-to-end argument would suggest that the network not try to divine which flows get preferred. > And beyond the end-to-end argument, there's a practical problem - since the ideal state of a shared link means that it ought to have no local backlog in the queue, the information needed to schedule "fairly" isn't in the queue backlog itself. If there is only one packet, what's to schedule? > [...] Excuse my stupidity, but the "only one single packet" case is the theoretical limiting case, no? Because even on a link not running at capacity this effectively requires a mechanism to "synchronize" all senders (whose packets traverse the hop we are looking at), as no other packet is allowed to reach the hop unless the "current" one has been passed to the PHY; otherwise we transiently queue 2 packets (I note that this rationale should hold for any small N). The more packets per second a hop handles, the less likely it becomes that a newcomer avoids running into already existing packet(s), that is, transiently growing the queue. Not having a CS background, I fail to see how this required synchronized state can exist outside of a few steady state configurations where things change slowly enough that the seemingly required synchronization can actually happen (given that the feedback loop, e.g. through ACKs, seems somewhat jittery). Since packets never know which path they take and which hop is going to be critical, there seems to be no a priori way to synchronize all senders; heck, I fail to see whether it would be possible at all to guarantee synchronized behavior on more than one hop (unless all hops are extremely uniform). I happen to believe that L4S suffers from the same conceptual issue (plus overly generic promises, from the RITE website: "We are so used to the unpredictability of queuing delay, we don’t know how good the Internet would feel without it. The RITE project has developed simple technology to make queuing delay a thing of the past—not just for a select few apps, but for all." This seems to be missing a "conditions apply" statement.) Best Regards Sebastian ^ permalink raw reply [flat|nested] 49+ messages in thread
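Sebastian's intuition here is easy to check numerically. A sketch under simplifying assumptions (Poisson arrivals, fixed-size packets, a single FIFO hop), using Lindley's recursion for the queueing delay each packet experiences:

    import random

    random.seed(42)
    service, load = 1.0, 0.7        # one time unit per packet, 70% utilisation
    wait, n, delayed = 0.0, 200_000, 0
    for _ in range(n):
        gap = random.expovariate(load / service)   # Poisson inter-arrival time
        wait = max(0.0, wait + service - gap)      # Lindley's recursion
        delayed += wait > 0.0
    print(f"{delayed / n:.1%} of packets queued behind at least one other packet")
    # prints roughly 70%, even though the link is idle 30% of the time on average

Without an oracle synchronising the senders, "at most one packet at the hop" is not a stable state even well below capacity.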
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-26 12:48 ` Sebastian Moeller @ 2019-06-26 16:31 ` David P. Reed 2019-06-26 16:53 ` David P. Reed ` (2 more replies) 0 siblings, 3 replies; 49+ messages in thread From: David P. Reed @ 2019-06-26 16:31 UTC (permalink / raw) To: Sebastian Moeller Cc: Jonathan Morton, ecn-sane, Brian E Carpenter, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 9444 bytes --] It's the limiting case, but also the optimal state given "perfect knowledge". Yes, it requires that the source-destination pairs sharing the link in question coordinate their packet admission times so they don't "collide" at the link. Ideally the next packet would arrive during the previous packet's transmission, so it is ready-to-go when that packet's transmission ends. Such exquisite coordination is feasible when future behavior by source and destination at the interface is known, which requires an Oracle. That's the same kind of condition most information theoretic and queueing theoretic optimality requires. But this is worth keeping in mind as the overall joint goal of all users. In particular, "link utilization" isn't a user goal at all. The link is there and is being paid for whether it is used or not (looking from the network structure as a whole). Its capacity exists to move packets out of the way. An ideal link satisfies the requirement that it never creates a queue because of anything other than imperfect coordination of the end-to-end flows mapped onto it. That's why the router should not be measured by "link utilization" any more than a tunnel in a city during commuting hours should be measured by cars moved per hour. Clearly a tunnel can be VERY congested and moving many cars if they are attached to each other bumper to bumper - the latency through the tunnel would then be huge. If the cars were tipped on their ends and stacked, even more throughput would be achieved through the tunnel, and the delay of rotating them and packing them would add even more delay. The idea that "link utilization" of 100% must be achieved is why we got bufferbloat designed into routers. It's a worm's eye perspective. To this day, Arista Networks brags about how its bufferbloated feature design optimizes switch utilization (https://packetpushers.net/aristas-big-buffer-b-s/). And it selects benchmarks to "prove" it. Andy Bechtolsheim apparently is such a big name that he can sell defective gear at a premium price, letting the datacenters who buy it discover that those switches get "clogged up" by TCP traffic when they are the "bottleneck link". Fortunately, they are fast, so they are less frequently the bottleneck in datacenter daily use. In trying to understand what is going on with congestion signalling, any buffering at the entry to the link should be due only to imperfect information being fed back to the endpoints generating traffic. Because a misbehaving endpoint generates Denial of Service for all other users. Priority mechanisms focused on protecting high-paying users from low-paying ones don't help much - they only help at overloaded states of the network. Which isn't to say that priority does nothing - it's just that stable assignment of a sharing level to priority levels isn't easy.
(See Paris Metro Pricing, where there are only two classes, and the problem of deciding how to manage the access to the "first class" section - the idea that 15 classes with different metrics can be handled simply and interoperably between differently managed autonomous systems seems to be an incredibly impractical goal). Even in the priority case, buffering is NOT a desirable end user thing. My personal view is that the manager of a network needs to configure the network so that no link ever gets overloaded, if possible. The response to overload should be to tell the relevant flows to all slow down (not just one, because if there are 100 flows that start up roughly at the same time, causing MD on one does very little). This is an example of something where per-flow stuff in the router actually makes the router helpful in the large scheme of things. Maybe all flows should be equally informed, as flows. Which means the router needs to know how to signal multiple flows, while not just hammering all the packets of a single flow. This case is very real, but not as frequently on the client side as on the "server side" in "load balancers" and such like. My point here is simple: 1) the endpoints tell the routers what flows are going through a link already. That's just the address information. So that information can be used for fairness pretty well, especially if short term memory (a bloom filter, perhaps) can track a sufficiently large number of flows. 2) The per-flow decisions related to congestion control within a flow are necessarily end-to-end in nature - the router can only tell the ends what is going on, but the ends (together - their admissions rates and consumption rates are coupled to the use being made) must be informed and decide. The congestion management must combine information about the source and the destination future behavior (even if it is just taking recent history and projecting it as an estimate of future behavior at source and destination). Which is why it is quite natural to have routers signal the destination, which then signals the source, which changes its behavior. 3) there are definitely other ways to improve latency for IP and protocols built on top of it - routing some flows over different paths under congestion is one; call it per-flow routing. Another is scattering a flow over several paths (but that seems problematic for today's TCP which assumes all packets take the same path). 4) A different, but very coupled view of IP is that any application-relevant buffering should be driven into the endpoints - at the source, buffering is useful to deal with variability in the rate of production of data to be sent. At the destination, buffering is useful to minimize jitter, matching to the consumption behavior of the application. But these buffers should not be pushed into the network where they cause congestion for other flows sharing resources. So buffering in the network should ONLY deal with the uncertainty in resource competition. This tripartite breakdown of buffering is protocol independent. It applies to TCP, NTP, RTP, QUIC/UDP, ... It's what we (that is me) had in mind when we split UDP out of TCP, allowing UDP based protocols to manage source and destination buffering in the application for all the things we thought UDP would be used for - packet speech, computer-computer remote procedure calls (what would be QUIC today), SATNET/interplanetary Internet connections, ...).
Sadly, in the many years since the late 1970's the tendency to think file transfers between infinite speed storage devices over TCP are the only relevant use of the Internet has penetrated the router design community. I can't seem to get anyone to recognize how far we are from that. No one runs benchmarks for such behavior, no one even measures anything other than the "hot rod" maximum throughput cases. And many egos seem to think that working on the hot rod cases is going to make their career or sell product. (e.g. the sad case of Arista). On Wednesday, June 26, 2019 8:48am, "Sebastian Moeller" <moeller0@gmx.de> said: > > > > On Jun 23, 2019, at 00:09, David P. Reed <dpreed@deepplum.com> wrote: > > > > [...] > > > > per-flow scheduling is appropriate on a shared link. However, the end-to-end > argument would suggest that the network not try to divine which flows get > preferred. > > And beyond the end-to-end argument, there's a practical problem - since the > ideal state of a shared link means that it ought to have no local backlog in the > queue, the information needed to schedule "fairly" isn't in the queue backlog > itself. If there is only one packet, what's to schedule? > > > [...] > > Excuse my stupidity, but the "only one single packet" case is the theoretical > limiting case, no? > Because even on a link not running at capacity this effectively requires a > mechanism to "synchronize" all senders (whose packets traverse the hop we are > looking at), as no other packet is allowed to reach the hop unless the "current" > one has been passed to the PHY otherwise we transiently queue 2 packets (I note > that this rationale should hold for any small N). The more packets per second a > hop handles the less likely it will be to avoid for any newcomer to run into an > already existing packet(s), that is to transiently grow the queue. > Not having a CS background, I fail to see how this required synchronized state can > exist outside of a few steady state configurations where things change slowly > enough that the seemingly required synchronization can actually happen (given > that the feedback loop e.g. through ACKs, seems somewhat jittery). Since packets > never know which path they take and which hop is going to be critical there seems > to be no a priori way to synchronize all senders, heck I fail to see whether it > would be possible at all to guarantee synchronized behavior on more than one hop > (unless all hops are extremely uniform). > I happen to believe that L4S suffers from the same conceptual issue (plus overly > generic promises, from the RITE website: > "We are so used to the unpredictability of queuing delay, we don’t know how > good the Internet would feel without it. The RITE project has developed simple > technology to make queuing delay a thing of the past—not just for a select > few apps, but for all." this seems missing a conditions apply statement) > > Best Regards > Sebastian [-- Attachment #2: Type: text/html, Size: 13904 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
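Reed's point 1 can be sketched concretely. The following is illustrative only (a count-min style structure rather than a plain Bloom filter, since fairness needs approximate per-flow byte counts, not just membership); the table sizes and the halving policy are assumptions:

    import hashlib

    ROWS, COLS = 4, 1024   # a few KB of state for many thousands of flows (assumed)

    class FlowMemory:
        """Approximate short-term per-flow byte counts; no connection state."""
        def __init__(self):
            self.counts = [[0] * COLS for _ in range(ROWS)]

        def _cells(self, flow):
            for row in range(ROWS):
                digest = hashlib.blake2b(repr((row, flow)).encode(), digest_size=4)
                yield row, int.from_bytes(digest.digest(), 'big') % COLS

        def note(self, flow, nbytes):          # called when forwarding a packet
            for row, col in self._cells(flow):
                self.counts[row][col] += nbytes

        def estimate(self, flow):              # over-counts only on hash collisions
            return min(self.counts[row][col] for row, col in self._cells(flow))

        def decay(self):                       # periodic "forgetting"
            for row in self.counts:
                for i, v in enumerate(row):
                    row[i] = v >> 1            # halving degrades everyone equally

The halving step matters: forgetting shrinks all estimates proportionally, so losing memory costs accuracy rather than fairness.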
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-26 16:31 ` David P. Reed @ 2019-06-26 16:53 ` David P. Reed 2019-06-27 7:54 ` Sebastian Moeller 0 siblings, 1 reply; 49+ messages in thread From: David P. Reed @ 2019-06-26 16:53 UTC (permalink / raw) To: David P. Reed Cc: Sebastian Moeller, ecn-sane, Brian E Carpenter, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 10895 bytes --] A further minor thought, maybe one that need not be said: Flows aren't "connections". Routers are not involved in connection state management, which is purely part of the end to end protocol. Anything about "connections" that a router might need to know to handle a packet should be packaged into the IP header of each packet in a standard form. Routers can "store" this information associated with the source, destination pair if they want, for a short time, subject to well understood semantics when they run out of storage. This fits into an end-to-end argument as an optimization of a kind, as long as the function of such information is very narrowly and generally defined to benefit all users of IP-based protocols. For example, remembering the last time a packet of a particular flow was received after forwarding it, for a short time, to calculate fairness, that seems like a very useful idea, as long as forgetting the last time of receipt is not unfair. This use of the flow's IP headers to carry info into router queueing and routing decisions is analogous to the "Fate Sharing" principle of protocol design that DDC describes. Instead of having an independent control plane protocol, which has all kinds of problems with synchronization and combinatorial problems of packet loss, "Fate Sharing" of protocol information is very elegant. On Wednesday, June 26, 2019 12:31pm, "David P. Reed" <dpreed@deepplum.com> said: It's the limiting case, but also the optimal state given "perfect knowledge". Yes, it requires that the source-destination pairs sharing the link in question coordinate their packet admission times so they don't "collide" at the link. Ideally the next packet would arrive during the previous packet's transmission, so it is ready-to-go when that packet's transmission ends. Such exquisite coordination is feasible when future behavior by source and destination at the interface is known, which requires an Oracle. That's the same kind of condition most information theoretic and queueing theoretic optimality requires. But this is worth keeping in mind as the overall joint goal of all users. In particular, "link utilization" isn't a user goal at all. The link is there and is being paid for whether it is used or not (looking from the network structure as a whole). Its capacity exists to move packets out of the way. An ideal link satisfies the requirement that it never creates a queue because of anything other than imperfect coordination of the end-to-end flows mapped onto it. That's why the router should not be measured by "link utilization" any more than a tunnel in a city during commuting hours should be measured by cars moved per hour.
The idea that "link utilization" of 100% must be achieved is why we got bufferbloat designed into routers. It's a worm's eye perspective. To this day, Arista Networks brags about how its bufferbloated feature design optimizes switch utilization ([ https://packetpushers.net/aristas-big-buffer-b-s/ ]( https://packetpushers.net/aristas-big-buffer-b-s/ )). And it selects benchmarks to "prove" it. Andy Bechtolsheim apparently is such a big name that he can sell defective gear at a premium price, letting the datacenters who buy it discover that those switches get "clogged up" by TCP traffic when they are the "bottleneck link". Fortunately, they are fast, so they are less frequently the bottleneck in datacenter daily use. In trying to understand what is going on with congestion signalling, any buffering at the entry to the link should be due only to imperfect information being fed back to the endpoints generating traffic. Because a misbehaving endpoint generates Denial of Service for all other users. Priority mechanisms focused on protecting high-paying users from low-paying ones don't help much - they only help at overloaded states of the network. Which isn't to say that priority does nothing - it's just that stable assignment of a sharing level to priority levels isn't easy. (See Paris Metro Pricing, where there are only two classes, and the problem of deciding how to manage the access to the "first class" section - the idea that 15 classes with different metrics can be handled simply and interoperably between differently managed autonomous systems seems to be an incredibly impractical goal). Even in the priority case, buffering is NOT a desirable end user thing. My personal view is that the manager of a network needs to configure the network so that no link ever gets overloaded, if possible. The response to overload should be to tell the relevant flows to all slow down (not just one, because if there are 100 flows that start up roughly at the same time, causing MD on one does very little. This is an example of something where per-flow stuff in the router actually makes the router helpful in the large scheme of things. Maybe all flows should be equally informed, as flows. Which means the router needs to know how to signal multiple flows, while not just hammering all the packets of a single flow. This case is very real, but not as frequently on the client side as on the "server side" in "load balancers" and such like. My point here is simple: 1) the endpoints tell the routers what flows are going through a link already. That's just the address information. So that information can be used for fairness pretty well, especially if short term memory (a bloom filter, perhaps) can track a sufficiently large number of flows. 2) The per-flow decisions related to congestion control within a flow are necessarily end-to-end in nature - the router can only tell the ends what is going on, but the ends (together - their admissions rates and consumption rates are coupled to the use being made) must be informed and decide. The congestion management must combine information about the source and the destination future behavior (even if it is just taking recent history and projecting it as an estimate of future behavior at source and destination). Which is why it is quite natural to have routers signal the destination, which then signals the source, which changes its behavior. 
3) there are definitely other ways to improve latency for IP and protocols built on top of it - routing some flows over different paths under congestion is one. call the per-flow routing. Another is scattering a flow over several paths (but that seems problematic for today's TcP which assumes all packets take the same path). 4) A different, but very coupled view of IP is that any application-relevant buffering shoujld be driven into the endpoints - at the source, buffering is useful to deal with variability in the rate of production of data to be sent. At the destination, buffering is useful to minimize jitter, matching to the consumption behavior of the application. But these buffers should not be pushed into the network where they cause congestion for other flows sharing resources. So buffering in the network should ONLY deal with the uncertainty in resource competition. This tripartite breakdown of buffering is protocol independent. It applies to TCP, NTP, RTP, QUIC/UDP, ... It's what we (that is me) had in mind when we split UDP out of TCP, allowing UDP based protocols to manage source and destination buffering in the application for all the things we thought UDP would be used for - packet speech, computer-computer remote procedure calls (what would be QUIC today), SATNET/interplanetary Internet connections , ...). Sadly, in the many years since the late 1970's the tendency to think file transfers between infinite speed storage devices over TCP are the only relevant use of the Internet has penetrated the router design community. I can't seem to get anyone to recognize how far we are from that. No one runs benchmarks for such behavior, no one even measures anything other than the "hot rod" maximum throughput cases. And many egos seem to think that working on the hot rod cases is going to make their career or sell product. (e.g. the sad case of Arista). On Wednesday, June 26, 2019 8:48am, "Sebastian Moeller" <moeller0@gmx.de> said: > > > > On Jun 23, 2019, at 00:09, David P. Reed <dpreed@deepplum.com> wrote: > > > > [...] > > > > per-flow scheduling is appropriate on a shared link. However, the end-to-end > argument would suggest that the network not try to divine which flows get > preferred. > > And beyond the end-to-end argument, there's a practical problem - since the > ideal state of a shared link means that it ought to have no local backlog in the > queue, the information needed to schedule "fairly" isn't in the queue backlog > itself. If there is only one packet, what's to schedule? > > > [...] > > Excuse my stupidity, but the "only one single packet" case is the theoretical > limiting case, no? > Because even on a link not running at capacity this effectively requires a > mechanism to "synchronize" all senders (whose packets traverse the hop we are > looking at), as no other packet is allowed to reach the hop unless the "current" > one has been passed to the PHY otherwise we transiently queue 2 packets (I note > that this rationale should hold for any small N). The more packets per second a > hop handles the less likely it will be to avoid for any newcomer to run into an > already existing packet(s), that is to transiently grow the queue. > Not having a CS background, I fail to see how this required synchronized state can > exist outside of a few steady state configurations where things change slowly > enough that the seemingly required synchronization can actually happen (given > that the feedback loop e.g. through ACKs, seems somewhat jittery). 
Since packets > never know which path they take and which hop is going to be critical there seems > to be no a priori way to synchronize all senders, heck I fail to see whether it > would be possible at all to guarantee synchronized behavior on more than one hop > (unless all hops are extremely uniform). > I happen to believe that L4S suffers from the same conceptual issue (plus overly > generic promises, from the RITE website: > "We are so used to the unpredictability of queuing delay, we don’t know how > good the Internet would feel without it. The RITE project has developed simple > technology to make queuing delay a thing of the past—not just for a select > few apps, but for all." this seems missing a conditions apply statement) > > Best Regards > Sebastian [-- Attachment #2: Type: text/html, Size: 17127 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
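A sketch of the "remember the last receipt time, then forget" idea (illustrative, not any deployed router feature); the timeout value is an assumption, and the linear cleanup is for brevity only:

    import time

    IDLE_TIMEOUT = 0.5      # seconds of silence before a flow is forgotten (assumed)
    last_seen = {}          # flow id -> time of last forwarded packet

    def on_forward(flow, now=None):
        """Record the packet; return the gap since this flow was last seen."""
        now = time.monotonic() if now is None else now
        for f, t in list(last_seen.items()):       # brute-force expiry, for brevity
            if now - t > IDLE_TIMEOUT:
                del last_seen[f]                   # forgetting costs accuracy, never correctness
        gap = now - last_seen[flow] if flow in last_seen else None
        last_seen[flow] = now
        return gap

A scheduler could, for instance, treat a large or unknown gap as marking a sparse flow that deserves low delay, which is the fate-sharing flavour of soft state Reed describes: the state travels with, and expires with, the packets themselves.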
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-26 16:53 ` David P. Reed @ 2019-06-27 7:54 ` Sebastian Moeller 0 siblings, 0 replies; 49+ messages in thread From: Sebastian Moeller @ 2019-06-27 7:54 UTC (permalink / raw) To: David P. Reed; +Cc: ecn-sane, Brian E Carpenter, tsvwg IETF list Hi David, > On Jun 26, 2019, at 18:53, David P. Reed <dpreed@deepplum.com> wrote: > > A further minor thought, maybe one that need not be said: > > Flows aren't "connections". Routers are not involved in connection state management, which is purely part of the end to end protocol. Anything about "connections" that a router might need to know to handle a packet should be packaged into the IP header of each packet in a standard form. I read this as saying you are not opposed to using IP packet data to convey information to intermediate routers, then? In a way (and please correct me if this is wrong/too simplistic), L4S intends to use the ECT(1) codepoint for endpoints to signal to routers their behavior towards CE congestion signals (reduce window/rate by 50% versus a smaller step down). > Routers can "store" this information associated with the source, destination pair if they want, for a short time, subject to well understood semantics when they run out of storage. This fits into an end-to-end argument as an optimization of a kind, as long as the function of such information is very narrowly and generally defined to benefit all users of IP-based protocols. Okay, that I read as: fq-systems are not in violation of e2e then. Best Regards Sebastian > > For example, remembering the last time a packet of a particular flow was received after forwarding it, for a short time, to calculate fairness, that seems like a very useful idea, as long as forgetting the last time of receipt is not unfair. > > This use of the flow's IP headers to carry info into router queueing and routing decisions is analogous to the "Fate Sharing" principle of protocol design that DDC describes. Instead of having an independent control plane protocol, which has all kinds of problems with synchronization and combinatorial problems of packet loss, "Fate Sharing" of protocol information is very elegant. > On Wednesday, June 26, 2019 12:31pm, "David P. Reed" <dpreed@deepplum.com> said: > > It's the limiting case, but also the optimal state given "perfect knowledge". > > Yes, it requires that the source-destination pairs sharing the link in question coordinate their packet admission times so they don't "collide" at the link. Ideally the next packet would arrive during the previous packet's transmission, so it is ready-to-go when that packet's transmission ends. > > Such exquisite coordination is feasible when future behavior by source and destination at the interface is known, which requires an Oracle. > That's the same kind of condition most information theoretic and queueing theoretic optimality requires. > > But this is worth keeping in mind as the overall joint goal of all users. > > In particular, "link utilization" isn't a user goal at all. The link is there and is being paid for whether it is used or not (looking from the network structure as a whole). Its capacity exists to move packets out of the way. An ideal link satisfies the requirement that it never creates a queue because of anything other than imperfect coordination of the end-to-end flows mapped onto it. That's why the router should not be measured by "link utilization" any more than a tunnel in a city during commuting hours should be measured by cars moved per hour.
Clearly a tunnel can be VERY congested and moving many cars if they are attached to each other bumper to bumper - the latency through the tunnel would then be huge. If the cars were tipped on their ends and stacked, even more throughput would be achieved through the tunnel, and the delay of rotating them and packing them would add even more delay. > > The idea that "link utilization" of 100% must be achieved is why we got bufferbloat designed into routers. It's a worm's eye perspective. To this day, Arista Networks brags about how its bufferbloated feature design optimizes switch utilization (https://packetpushers.net/aristas-big-buffer-b-s/). And it selects benchmarks to "prove" it. Andy Bechtolsheim apparently is such a big name that he can sell defective gear at a premium price, letting the datacenters who buy it discover that those switches get "clogged up" by TCP traffic when they are the "bottleneck link". Fortunately, they are fast, so they are less frequently the bottleneck in datacenter daily use. > > In trying to understand what is going on with congestion signalling, any buffering at the entry to the link should be due only to imperfect information being fed back to the endpoints generating traffic. Because a misbehaving endpoint generates Denial of Service for all other users. > > Priority mechanisms focused on protecting high-paying users from low-paying ones don't help much - they only help at overloaded states of the network. Which isn't to say that priority does nothing - it's just that stable assignment of a sharing level to priority levels isn't easy. (See Paris Metro Pricing, where there are only two classes, and the problem of deciding how to manage the access to the "first class" section - the idea that 15 classes with different metrics can be handled simply and interoperably between differently managed autonomous systems seems to be an incredibly impractical goal). > Even in the priority case, buffering is NOT a desirable end user thing. > > My personal view is that the manager of a network needs to configure the network so that no link ever gets overloaded, if possible. The response to overload should be to tell the relevant flows to all slow down (not just one, because if there are 100 flows that start up roughly at the same time, causing MD on one does very little. This is an example of something where per-flow stuff in the router actually makes the router helpful in the large scheme of things. Maybe all flows should be equally informed, as flows. Which means the router needs to know how to signal multiple flows, while not just hammering all the packets of a single flow. This case is very real, but not as frequently on the client side as on the "server side" in "load balancers" and such like. > > My point here is simple: > > 1) the endpoints tell the routers what flows are going through a link already. That's just the address information. So that information can be used for fairness pretty well, especially if short term memory (a bloom filter, perhaps) can track a sufficiently large number of flows. > > 2) The per-flow decisions related to congestion control within a flow are necessarily end-to-end in nature - the router can only tell the ends what is going on, but the ends (together - their admissions rates and consumption rates are coupled to the use being made) must be informed and decide. 
The congestion management must combine information about the source and the destination future behavior (even if it is just taking recent history and projecting it as an estimate of future behavior at source and destination). Which is why it is quite natural to have routers signal the destination, which then signals the source, which changes its behavior. > > 3) there are definitely other ways to improve latency for IP and protocols built on top of it - routing some flows over different paths under congestion is one. call the per-flow routing. Another is scattering a flow over several paths (but that seems problematic for today's TcP which assumes all packets take the same path). > > 4) A different, but very coupled view of IP is that any application-relevant buffering shoujld be driven into the endpoints - at the source, buffering is useful to deal with variability in the rate of production of data to be sent. At the destination, buffering is useful to minimize jitter, matching to the consumption behavior of the application. But these buffers should not be pushed into the network where they cause congestion for other flows sharing resources. > So buffering in the network should ONLY deal with the uncertainty in resource competition. > > This tripartite breakdown of buffering is protocol independent. It applies to TCP, NTP, RTP, QUIC/UDP, ... It's what we (that is me) had in mind when we split UDP out of TCP, allowing UDP based protocols to manage source and destination buffering in the application for all the things we thought UDP would be used for - packet speech, computer-computer remote procedure calls (what would be QUIC today), SATNET/interplanetary Internet connections , ...). > > Sadly, in the many years since the late 1970's the tendency to think file transfers between infinite speed storage devices over TCP are the only relevant use of the Internet has penetrated the router design community. I can't seem to get anyone to recognize how far we are from that. No one runs benchmarks for such behavior, no one even measures anything other than the "hot rod" maximum throughput cases. > > And many egos seem to think that working on the hot rod cases is going to make their career or sell product. (e.g. the sad case of Arista). > > > On Wednesday, June 26, 2019 8:48am, "Sebastian Moeller" <moeller0@gmx.de> said: > > > > > > > > On Jun 23, 2019, at 00:09, David P. Reed <dpreed@deepplum.com> wrote: > > > > > > [...] > > > > > > per-flow scheduling is appropriate on a shared link. However, the end-to-end > > argument would suggest that the network not try to divine which flows get > > preferred. > > > And beyond the end-to-end argument, there's a practical problem - since the > > ideal state of a shared link means that it ought to have no local backlog in the > > queue, the information needed to schedule "fairly" isn't in the queue backlog > > itself. If there is only one packet, what's to schedule? > > > > > [...] > > > > Excuse my stupidity, but the "only one single packet" case is the theoretical > > limiting case, no? > > Because even on a link not running at capacity this effectively requires a > > mechanism to "synchronize" all senders (whose packets traverse the hop we are > > looking at), as no other packet is allowed to reach the hop unless the "current" > > one has been passed to the PHY otherwise we transiently queue 2 packets (I note > > that this rationale should hold for any small N). 
The more packets per second a > > hop handles the less likely it will be to avoid for any newcomer to run into an > > already existing packet(s), that is to transiently grow the queue. > > Not having a CS background, I fail to see how this required synchronized state can > > exist outside of a few steady state configurations where things change slowly > > enough that the seemingly required synchronization can actually happen (given > > that the feedback loop e.g. through ACKs, seems somewhat jittery). Since packets > > never know which path they take and which hop is going to be critical there seems > > to be no a priori way to synchronize all senders, heck I fail to see whether it > > would be possible at all to guarantee synchronized behavior on more than one hop > > (unless all hops are extremely uniform). > > I happen to believe that L4S suffers from the same conceptual issue (plus overly > > generic promises, from the RITE website: > > "We are so used to the unpredictability of queuing delay, we don’t know how > > good the Internet would feel without it. The RITE project has developed simple > > technology to make queuing delay a thing of the past—not just for a select > > few apps, but for all." this seems missing a conditions apply statement) > > > > Best Regards > > Sebastian ^ permalink raw reply [flat|nested] 49+ messages in thread
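Sebastian's reading of the ECT(1) codepoint above can be made concrete with a classification sketch (much simplified; the actual dual-queue coupled AQM is specified in the L4S drafts and involves far more than classification). The point is simply that the signal rides in ordinary per-packet header state:

    NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11   # two-bit ECN field values

    def classify(ecn_bits):
        # ECT(1) identifies a scalable ("L4S") sender; CE goes to the same
        # queue so already-marked packets keep their low-latency treatment.
        return 'low-latency' if ecn_bits in (ECT1, CE) else 'classic'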
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-26 16:31 ` David P. Reed 2019-06-26 16:53 ` David P. Reed @ 2019-06-27 7:49 ` Sebastian Moeller 2019-06-27 20:33 ` Brian E Carpenter 2019-06-27 7:53 ` Bless, Roland (TM) 2 siblings, 1 reply; 49+ messages in thread From: Sebastian Moeller @ 2019-06-27 7:49 UTC (permalink / raw) To: David P. Reed Cc: Jonathan Morton, ecn-sane, Brian E Carpenter, tsvwg IETF list Hi David, thanks for your response. > On Jun 26, 2019, at 18:31, David P. Reed <dpreed@deepplum.com> wrote: > > It's the limiting case, but also the optimal state given "perfect knowledge". > > Yes, it requires that the source-destination pairs sharing the link in question coordinate their packet admission times so they don't "collide" at the link. Ideally the next packet would arrive during the previous packet's transmission, so it is ready-to-go when that packet's transmission ends. > > Such exquisite coordination is feasible when future behavior by source and destination at the interface is known, which requires an Oracle. > That's the same kind of condition most information theoretic and queueing theoretic optimality requires. Ah, great, I had feared I had missed something. > > But this is worth keeping in mind as the overall joint goal of all users. > > In particular, "link utilization" isn't a user goal at all. The link is there and is being paid for whether it is used or not (looking from the network structure as a whole). Its capacity exists to move packets out of the way. An ideal link satisfies the requirement that it never creates a queue because of anything other than imperfect coordination of the end-to-end flows mapped onto it. That's why the router should not be measured by "link utilization" any more than a tunnel in a city during commuting hours should be measured by cars moved per hour. Clearly a tunnel can be VERY congested and moving many cars if they are attached to each other bumper to bumper - the latency through the tunnel would then be huge. If the cars were tipped on their ends and stacked, even more throughput would be achieved through the tunnel, and the delay of rotating them and packing them would add even more delay. +1; this is the core of the movement under the "bufferbloat" moniker: put latency back into the spotlight where it belongs (at least for common interactive network usage; bulk transfer is a different kettle of fish). Given the relatively low rates of common internet access links, running at capacity, while not a primary goal, still becomes common enough to require special treatment to keep the latency-under-load increase under control. Both FQ solutions and L4S offer remedies for that case. (Being a non-expert home-user myself, this case is also prominent on my radar; my ISP's backbone and peerings/transits being well managed, the access link is the one point where queueing happens, just as you describe). > > The idea that "link utilization" of 100% must be achieved is why we got bufferbloat designed into routers. While I do not subscribe to this view (and actually trade in "top-speed" to keep latency sane), a considerable fraction of home-users seem obsessed with maxing out their access links and comparing achievable rates; whether such behaviour should be encouraged is a different question. > It's a worm's eye perspective. To this day, Arista Networks brags about how its bufferbloated feature design optimizes switch utilization (https://packetpushers.net/aristas-big-buffer-b-s/). And it selects benchmarks to "prove" it.
Andy Bechtolsheim apparently is such a big name that he can sell defective gear at a premium price, letting the datacenters who buy it discover that those switches get "clogged up" by TCP traffic when they are the "bottleneck link". Fortunately, they are fast, so they are less frequently the bottleneck in datacenter daily use. > > In trying to understand what is going on with congestion signalling, any buffering at the entry to the link should be due only to imperfect information being fed back to the endpoints generating traffic. Because a misbehaving endpoint generates Denial of Service for all other users. This is a good point, and one of the reasons why I conceptually like flow queueing, as it gives the tools to isolate bad actors; "trust, but verify" comes to mind as a principle. I also add that the _only_ currently known L4S roll-out target (low latency docsis) actually mandates a mechanism they call "queue protection" which to me looks pretty much like an FQ system that carefully tries to not call itself FQ (it monitors the length of flows and if they exceed something pushes them into the RFC3168 queue, which to this layman means it needs to separately track the packets for each flow in the common queue to be able to re-direct them). > > Priority mechanisms focused on protecting high-paying users from low-paying ones don't help much - they only help at overloaded states of the network. In principle I agree, in practice things get complicated; mixing latency-indifferent capacity-devouring applications like bit-torrent with, say, VoIP packets (fixed rates, but latency sensitive) over too narrow a link will make it clear that giving the VoIP packet precedence/priority over the bulk-transfer packet is a sane policy (that becomes an issue due to the difficulty of running a narrow link below capacity). I am sure you are aware of all of this, I just need to spell it out for my thinking process. > Which isn't to say that priority does nothing - it's just that stable assignment of a sharing level to priority levels isn't easy. (See Paris Metro Pricing, where there are only two classes, and the problem of deciding how to manage the access to the "first class" section - the idea that 15 classes with different metrics can be handled simply and interoperably between differently managed autonomous systems seems to be an incredibly impractical goal). +1; any prioritization scheme should be extremely simple so that an end-user can make predictions about its behavior easily. Also IMHO 3 classes of latency behaviour will go a long way: "normal", "don't care", "important" should be enough (L4S IMHO only offers "important" and normal, so does not offer a way to easily down-grade, say, bulk background transfers like bit-torrent (which is going to be an issue with bit-torrent triggering on ~100 ms of induced latency increase with L4S's RFC3168 queue using a PIE offspring to keep induced latency << 100ms, but I digress).) > Even in the priority case, buffering is NOT a desirable end user thing. +1; IMHO again a reason for fq, misbehaving flows will not spoil the fun for everybody else. > > My personal view is that the manager of a network needs to configure the network so that no link ever gets overloaded, if possible. The response to overload should be to tell the relevant flows to all slow down (not just one, because if there are 100 flows that start up roughly at the same time, causing MD on one does very little).
> This is an example of something where per-flow stuff in the router actually makes the router helpful in the large scheme of things. Maybe all flows should be equally informed, as flows. Which means the router needs to know how to signal multiple flows, while not just hammering all the packets of a single flow. This case is very real, but not as frequently on the client side as on the "server side" in "load balancers" and such like. > > My point here is simple: > > 1) the endpoints tell the routers what flows are going through a link already. That's just the address information. So that information can be used for fairness pretty well, especially if short term memory (a bloom filter, perhaps) can track a sufficiently large number of flows. > > 2) The per-flow decisions related to congestion control within a flow are necessarily end-to-end in nature - the router can only tell the ends what is going on, but the ends (together - their admissions rates and consumption rates are coupled to the use being made) must be informed and decide. The congestion management must combine information about the source and the destination future behavior (even if it is just taking recent history and projecting it as an estimate of future behavior at source and destination). Which is why it is quite natural to have routers signal the destination, which then signals the source, which changes its behavior. In an ideal world the router would also signal the sender, as that would at least halve the time it takes for the congestion information to reach the most relevant party; but as I understand it this is a) not generally possible and b) prone to abuse. > > 3) there are definitely other ways to improve latency for IP and protocols built on top of it - routing some flows over different paths under congestion is one; call it per-flow routing. Another is scattering a flow over several paths (but that seems problematic for today's TCP which assumes all packets take the same path). This is about re-ordering, no? > > 4) A different, but very coupled view of IP is that any application-relevant buffering should be driven into the endpoints - at the source, buffering is useful to deal with variability in the rate of production of data to be sent. At the destination, buffering is useful to minimize jitter, matching to the consumption behavior of the application. But these buffers should not be pushed into the network where they cause congestion for other flows sharing resources. > So buffering in the network should ONLY deal with the uncertainty in resource competition. This, at least in my understanding, is one of the underlying ideas of the L4S approach, so what is your take on how well L4S achieves that goal? > > This tripartite breakdown of buffering is protocol independent. It applies to TCP, NTP, RTP, QUIC/UDP, ... It's what we (that is me) had in mind when we split UDP out of TCP, allowing UDP based protocols to manage source and destination buffering in the application for all the things we thought UDP would be used for - packet speech, computer-computer remote procedure calls (what would be QUIC today), SATNET/interplanetary Internet connections, ...). Like many great insights that look obvious in retrospect, I would guess that might have been controversial at the time? > > Sadly, in the many years since the late 1970's the tendency to think file transfers between infinite speed storage devices over TCP are the only relevant use of the Internet has penetrated the router design community.
I can't seem to get anyone to recognize how far we are from that. No one runs benchmarks for such behavior, no one even measures anything other than the "hot rod" maximum throughput cases. I would guess that this obsession might be market-driven; as long as customers only look at the top-speed numbers, increasing this number will be the priority. Again thanks for your insights. Sebastian > > And many egos seem to think that working on the hot rod cases is going to make their career or sell product. (e.g. the sad case of Arista). > > > On Wednesday, June 26, 2019 8:48am, "Sebastian Moeller" <moeller0@gmx.de> said: > > > > > > > > On Jun 23, 2019, at 00:09, David P. Reed <dpreed@deepplum.com> wrote: > > > > > > [...] > > > > > > per-flow scheduling is appropriate on a shared link. However, the end-to-end > > argument would suggest that the network not try to divine which flows get > > preferred. > > > And beyond the end-to-end argument, there's a practical problem - since the > > ideal state of a shared link means that it ought to have no local backlog in the > > queue, the information needed to schedule "fairly" isn't in the queue backlog > > itself. If there is only one packet, what's to schedule? > > > > > [...] > > > > Excuse my stupidity, but the "only one single packet" case is the theoretical > > limiting case, no? > > Because even on a link not running at capacity this effectively requires a > > mechanism to "synchronize" all senders (whose packets traverse the hop we are > > looking at), as no other packet is allowed to reach the hop unless the "current" > > one has been passed to the PHY otherwise we transiently queue 2 packets (I note > > that this rationale should hold for any small N). The more packets per second a > > hop handles the less likely it will be to avoid for any newcomer to run into an > > already existing packet(s), that is to transiently grow the queue. > > Not having a CS background, I fail to see how this required synchronized state can > > exist outside of a few steady state configurations where things change slowly > > enough that the seemingly required synchronization can actually happen (given > > that the feedback loop e.g. through ACKs, seems somewhat jittery). Since packets > > never know which path they take and which hop is going to be critical there seems > > to be no a priori way to synchronize all senders, heck I fail to see whether it > > would be possible at all to guarantee synchronized behavior on more than one hop > > (unless all hops are extremely uniform). > > I happen to believe that L4S suffers from the same conceptual issue (plus overly > > generic promises, from the RITE website: > > "We are so used to the unpredictability of queuing delay, we don’t know how > > good the Internet would feel without it. The RITE project has developed simple > > technology to make queuing delay a thing of the past—not just for a select > > few apps, but for all." this seems missing a conditions apply statement) > > > > Best Regards > > Sebastian ^ permalink raw reply [flat|nested] 49+ messages in thread
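The "queue protection" mechanism Sebastian describes above can be sketched roughly as follows (illustrative and much simplified relative to the Low Latency DOCSIS specification; the decay constant and threshold are assumptions). Note that it does what he says: it keeps per-flow state in order to redirect heavy contributors:

    import time

    DECAY_PER_SEC = 0.5     # scores halve roughly every second (assumed)
    THRESHOLD = 30_000      # congestion-bytes a flow may contribute (assumed)
    scores = {}             # flow id -> (score, time of last update)

    def queue_for(flow, pkt_bytes, queue_is_congested):
        """Pick a queue for this packet; heavy contributors lose the fast lane."""
        now = time.monotonic()
        score, last = scores.get(flow, (0.0, now))
        score *= DECAY_PER_SEC ** (now - last)      # exponential forgetting
        if queue_is_congested:
            score += pkt_bytes                      # only congesting packets count
        scores[flow] = (score, now)
        return 'classic' if score > THRESHOLD else 'low-latency'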
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-27 7:49 ` Sebastian Moeller @ 2019-06-27 20:33 ` Brian E Carpenter 2019-06-27 21:31 ` David P. Reed 0 siblings, 1 reply; 49+ messages in thread From: Brian E Carpenter @ 2019-06-27 20:33 UTC (permalink / raw) To: Sebastian Moeller, David P. Reed Cc: Jonathan Morton, ecn-sane, tsvwg IETF list On 27-Jun-19 19:49, Sebastian Moeller wrote: ... > a considerable fraction of home-users seem obsessed in maxing out their access links and compare achievable rates; whether such behaviour shoud be encouraged is a different question. I think this is encouraged by, or is even a direct result of, so called "speed tests" for use by consumers (such as https://www.speedtest.net/), and the way connectivity "speed" has been used as a marketing tool. At least where I live, "speed" is the main marketing tool for switching users to fibre instead of copper. No doubt it will be used as the main marketing tool for 5G too. It's almost as if those marketing people don't understand queueing theory. Brian ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-27 20:33 ` Brian E Carpenter @ 2019-06-27 21:31 ` David P. Reed 2019-06-28 7:49 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 49+ messages in thread From: David P. Reed @ 2019-06-27 21:31 UTC (permalink / raw) To: Brian E Carpenter Cc: Sebastian Moeller, Jonathan Morton, ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 2873 bytes --] It's even worse. The FCC got focused on max speeds back in the day as its only way to think about Internet Access service. And I was serving on the FCC Technological Advisory Committee and also in its Spectrum Policy Task Force, then later involved in the rather confused discussions of Network Neutrality, where again "speed" in the "up-to" sense was the sole framing of the discussion. Because it was mostly lawyers and lobbyists (not network engineers), this focus on max speed as the sole measure of quality ended up with a huge distortion of the discussion, strongly encouraged by the lobbyists who love confusion. That said, max speed plays a role at all time scales in minimizing response time, but queuing delay has no constituency, even though its impact is FAR worse in real situations. If the FCC and regulators (or even the DoD communications management layers) ever start talking about queueing delay in shared network services, I will die of shock. But we did have one HUGE temporary success. The speed test at DSL Reports measures lag under load, and calls it bufferbloat, and gives a reasonably scaled score. When I talk to people who are interested in quality of Internet service, I point them at DSL Reports' speed test. That is a big win. However, marketers of Internet access services don't compete to get good scores at DSL Reports. Even "business" providers provide crappy scores. Comcast Business in the South Bay does very poorly on bufferbloat for its high speed business services, for example. This is based on my measurements at my company. I know some very high executives there, and Comcast is the only real game in town for us, so I tried to get the folks in Philadelphia to talk to the local managers. Turns out the local managers just refused to listen to the headquarters execs. They saw no monetary benefit in fixing anything (going from DOCSIS 2 to DOCSIS 3.1 which had already been on the market for several years would have fixed it, probably). On Thursday, June 27, 2019 4:33pm, "Brian E Carpenter" <brian.e.carpenter@gmail.com> said: > On 27-Jun-19 19:49, Sebastian Moeller wrote: > ... > > a considerable fraction of home-users seem obsessed in maxing out their > access links and compare achievable rates; whether such behaviour shoud be > encouraged is a different question. > > I think this is encouraged by, or is even a direct result of, so called "speed > tests" for use by consumers (such as https://www.speedtest.net/), and the way > connectivity "speed" has been used as a marketing tool. At least where I live, > "speed" is the main marketing tool for switching users to fibre instead of copper. > No doubt it will be used as the main marketing tool for 5G too. > > It's almost as if those marketing people don't understand queueing theory. > > Brian > > > [-- Attachment #2: Type: text/html, Size: 4903 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-27 21:31 ` David P. Reed @ 2019-06-28 7:49 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 49+ messages in thread From: Toke Høiland-Jørgensen @ 2019-06-28 7:49 UTC (permalink / raw) To: David P. Reed, Brian E Carpenter; +Cc: ecn-sane, tsvwg IETF list "David P. Reed" <dpreed@deepplum.com> writes: > It's even worse. The FCC got focused on max speeds back in the day as > its only way to think about Internet Access service. And I was serving > on the FCC Technological Advisory Committee and also in its Spectrum > Policy Task Force, then later involved in the rather confused > discussions of Network Neutrality, where again "speed" in the "up-to" > sense was the sole framing of the discussion. > > Because it was mostly lawyers and lobbyists (not network engineers), > this focus on max speed as the sole measure of quality ended up with a > huge distortion of the discussion, strongly encouraged by the > lobbyists who love confusion. > > That said, max speed plays a role at all time scales in minimizing > response time, but queuing delay has no constituency, even though its > impact is FAR worse in real situations. > > If the FCC and regulators (or even the DoD communications management > layers) ever start talking about queueing delay in shared network > services, I will die of shock. > > But we did have one HUGE temporary success. The speed test at DSL > Reports measures lag under load, and calls it bufferbloat, and gives a > reasonably scaled score. The Netflix test at fast.com does as well now (although it's under the "more info" button, so not as visible by default). -Toke ^ permalink raw reply [flat|nested] 49+ messages in thread
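For concreteness, the "lag under load" number these tests report can be approximated with nothing more than ping and a bulk download: sample the RTT while the link is idle, then again while a transfer saturates it; the difference is queueing delay. A rough Python sketch follows, assuming a Unix-style ping; the target host and download URL are placeholders, and real tests such as DSL Reports' or fast.com's sample far more carefully and in both directions:

    import re, subprocess, threading, urllib.request

    PING_TARGET = "192.0.2.1"                    # placeholder: a nearby, stable host
    BULK_URL = "http://example.com/large-file"   # placeholder: any large download

    def avg_rtt_ms(count=10):
        # Parse the avg field of ping's "min/avg/max" summary line.
        out = subprocess.run(["ping", "-c", str(count), PING_TARGET],
                             capture_output=True, text=True).stdout
        return float(re.search(r"= [\d.]+/([\d.]+)/", out).group(1))

    def saturate(stop):
        # Keep the downlink busy until told to stop.
        while not stop.is_set():
            with urllib.request.urlopen(BULK_URL) as resp:
                while not stop.is_set() and resp.read(65536):
                    pass

    idle = avg_rtt_ms()
    stop = threading.Event()
    threading.Thread(target=saturate, args=(stop,), daemon=True).start()
    loaded = avg_rtt_ms()
    stop.set()
    print("idle RTT %.1f ms, RTT under load %.1f ms, bloat ~%.1f ms"
          % (idle, loaded, loaded - idle))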
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-26 16:31 ` David P. Reed 2019-06-26 16:53 ` David P. Reed 2019-06-27 7:49 ` Sebastian Moeller @ 2019-06-27 7:53 ` Bless, Roland (TM) 2 siblings, 0 replies; 49+ messages in thread From: Bless, Roland (TM) @ 2019-06-27 7:53 UTC (permalink / raw) To: David P. Reed, Sebastian Moeller; +Cc: ecn-sane, tsvwg IETF list Hi, On 26.06.19 at 18:31, David P. Reed wrote: > The idea that "link utilization" of 100% must be achieved is why we got > bufferbloat designed into routers. It's a worm's eye perspective. To You are right, but IMHO it is even worse, because it is an artefact of loss-based TCP congestion control as originally designed, together with the "BDP buffer size" rule that was derived from it. Loss-based AIMD congestion control can keep utilization high only because, during its back-off after a packet loss, it drains the standing queue that it has systematically built up, so the delivery rate stays essentially unchanged. However, now that we know how important avoiding queueing delay is, we can design better congestion controls that neither fill the available buffer capacity to exhaustion nor require standing queues to keep up link utilization. However, as you also wrote, the positive and negative effects of existing buffers depend a lot on the particular traffic pattern, and traffic patterns have changed considerably over the last decades. So I think that revising the "buffer size" discussion could be useful... Aside from that I find the SCE proposal very useful, because it provides an additional level of congestion signalling that could be used by various congestion control schemes. Regards, Roland ^ permalink raw reply [flat|nested] 49+ messages in thread
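To put numbers behind the "BDP buffer size" rule and the standing queue Roland describes: with a buffer of exactly one bandwidth-delay product, Reno-style multiplicative decrease halves cwnd from two BDPs down to one, so the pipe never empties - but only at the cost of a standing queue worth up to one extra base RTT of delay. A back-of-envelope sketch in Python, with illustrative link figures that are not taken from the thread:

    # Illustrative link figures, not taken from the thread.
    link_rate = 100e6 / 8        # 100 Mbit/s bottleneck, in bytes per second
    base_rtt = 0.050             # 50 ms propagation RTT

    bdp = link_rate * base_rtt   # bytes in flight needed to fill the pipe
    buf = bdp                    # the classic rule: buffer = one BDP

    # At the moment of loss, a Reno-style sender has cwnd = pipe + full queue.
    cwnd_peak = bdp + buf
    cwnd_after_md = cwnd_peak / 2   # multiplicative decrease halves cwnd

    print("BDP: %.2f MB" % (bdp / 1e6))
    print("cwnd after MD: %.2f MB (exactly one BDP)" % (cwnd_after_md / 1e6))
    print("peak queueing delay: %.0f ms" % (buf / link_rate * 1000))
    # cwnd never drops below the BDP, so the link stays fully utilized --
    # but only by carrying a standing queue that adds up to one extra
    # base RTT of delay at the top of every sawtooth.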
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-22 19:50 ` David P. Reed 2019-06-22 20:47 ` Jonathan Morton @ 2019-06-22 21:10 ` Brian E Carpenter 2019-06-22 22:25 ` David P. Reed 1 sibling, 1 reply; 49+ messages in thread From: Brian E Carpenter @ 2019-06-22 21:10 UTC (permalink / raw) To: David P. Reed Cc: Luca Muscariello, Sebastian Moeller, ecn-sane, tsvwg IETF list Just three or four small comments: On 23-Jun-19 07:50, David P. Reed wrote: > Two points: > > > > - Jerry Saltzer and I were the primary authors of the End-to-end argument paper, and the motivation was based on *my* work on the original TCP and IP protocols. Dave Clark got involved significantly later than all those decisions, which were basically complete when he got involved. (Jerry was my thesis supervisor, I was his student, and I operated largely independently, taking input from various others at MIT). I mention this because Dave understands the end-to-end arguments, but he understands (as we all did) that it was a design *principle* and not a perfectly strict rule. That said, it's a rule that has a strong foundational argument from modularity and evolvability in a context where the system has to work on a wide range of infrastructures (not all knowable in advance) and support a wide range of usage/application-areas (not all knowable in advance). Treating the paper as if it were "DDC" declaring a law is just wrong. He wasn't Moses and it is not written on tablets. Dave > did have some "power" in his role of trying to achieve interoperability across diverse implementations. But his focus was primarily on interoperability, not other things. So ideas in the IP protocol like "TOS" which were largely placeholders for not-completely-worked-out concepts deferred to the future were left till later. Yes, well understood, but he was in fact the link between the e2e paper and the differentiated services work. Although not a nominal author of the "two-bit" RFC, he was heavily involved in it, which is why I mentioned him. And he was very active in the IETF diffserv WG. > - It is clear (at least to me) that from the point of view of the source of an IP datagram, the "handling" of that datagram within the network of networks can vary, and so that is why there is a TOS field - to specify an interoperable, meaningfully described per-packet indicator of differential handling. In regards to the end-to-end argument, that handling choice is a network function, *to the extent that it can completely be implemented in the network itself*. > > Congestion management, however, is not achievable entirely and only within the network. That's completely obvious: congestion happens when the source-destination flows exceed the capacity of the network of networks to satisfy all demands. > > The network can only implement *certain* general kinds of mechanisms that may be used by the endpoints to resolve congestion: > > 1) admission controls. These are implemented at the interface between the source entity and the network of networks. They tend to be impractical in the Internet context, because there is, by a fundamental and irreversible design choice made by Cerf and Kahn (and the rest of us), no central controller of the entire network of networks. This is to make evolvability and scalability work. 5G (not an Internet system) implies a central controller, as does SNA, LTE, and many other networks. The Internet is an overlay on top of such networks. 
> > 2) signalling congestion to the endpoints, which will respond by slowing their transmission rate (or explicitly re-routing transmission, or compressing their content) through the network to match capacity. This response is done *above* the IP layer, and has proven very practical. The function in the network is reduced to "congestion signalling", in a universally understandable meaningful mechanism: packet drops, ECN, packet-pair separation in arrival time, ... This limited function is essential within the network, because it is the state of the path(s) that is needed to implement the full function at the end points. So congestion signalling, like ECN, is implemented according to the end-to-end argument by carefully defining the network function to be the minimum necessary mechanism so that endpoints can control their rates. > > 3) automatic selection of routes for flows. It's perfectly fine to select different routes based on information in the IP header (the part that is intended to be read and understood by the network of networks). Now this is currently *rarely* done, due to the complexity of tracking more detailed routing information at the router level. But we had expected that eventually the Internet would be so well connected that there would be diverse routes with diverse capabilities. For example, the "Interplanetary Internet" works with datagrams, that can be implemented with IP, but not using TCP, which requires very low end-to-end latency. Thus, one would expect that TCP would not want any packets transferred over a path via Mars, or for that matter a geosynchronous satellite, even if the throughput would be higher. > > So one can imagine that eventually a "TOS" might say - send this packet preferably along a path that has at most 200 ms. RTT, *even if that leads to congestion signalling*, while another TOS might say "send this packet over the most "capacious" set of paths, ignoring RTT entirely. (these are just for illustration, but obviously something like this would work). > > Note that TOS is really aimed at *route selection* preferences, and not queueing management of individual routers. That may well have been the original intention, but it was hardly mentioned at all in the diffserv WG (which I co-chaired), and "QOS-based routing" was in very bad odour at that time. > > Queueing management to share a single queue on a path for multiple priorities of traffic is not very compatible with "end-to-end arguments". There are any number of reasons why this doesn't work well. I can go into them. Mainly these reasons are why "diffserv" has never been adopted - Oh, but it has, in lots of local deployments of voice over IP for example. It's what I've taken to calling a limited domain protocol. What has not happened is Internet-wide deployment, because... > it's NOT interoperable because the diversity of traffic between endpoints is hard to specify in a way that translates into the network mechanisms. Of course any queue can be managed in some algorithmic way with parameters, but the endpoints that want to specify an end-to-end goal don't have a way to understand the impact of those parameters on a specific queue that is currently congested. Yes. And thanks for your insights. Brian > > > > Instead, the history of the Internet (and for that matter *all* networks, even Bell's voice systems) has focused on minimizing queueing delay to near zero throughout the network by whatever means it has at the endpoints or in the design. 
This is why we have AIMD's MD as a response to detection of congestion. > > > > Pragmatic networks (those that operate in the real world) do not choose to operate with shared links in a saturated state. That's known in the phone business as the Mother's Day problem. You want to have enough capacity for the rare near-overload to never result in congestion. Which means that the normal state of the network is very lightly loaded indeed, in order to minimize RTT. Consequently, focusing on somehow trying to optimize the utilization of the network to 100% is just a purely academic exercise. Since "priority" at the packet level within a queue only improves that case, it's just a focus of (bad) Ph.D. theses. (Good Ph.D. theses focus on actual real problems like getting the queues down to 1 packet or less by signalling the endpoints with information that allows them to do their job). > > > > So, in considering what goes in the IP layer, both its header and the mechanics of the network of networks, it is those things that actually have implementable meaning in the network of networks when processing the IP datagram. The rest is "content" because the network of networks doesn't need to see it. > > > > Thus, don't put anything in the IP header that belongs in the "content" part, just being a signal between end points. Some information used in the network of networks is also logically carried between endpoints. > > > > > > On Friday, June 21, 2019 4:37pm, "Brian E Carpenter" <brian.e.carpenter@gmail.com> said: > >> Below... >> On 21-Jun-19 21:33, Luca Muscariello wrote: >> > + David Reed, as I'm not sure he's on the ecn-sane list. >> > >> > To me, it seems like a very religious position against per-flow >> queueing. >> > BTW, I fail to see how this would violate (in a "profound" way ) the e2e >> principle. >> > >> > When I read it (the e2e principle) >> > >> > Saltzer, J. H., D. P. Reed, and D. D. Clark (1981) "End-to-End Arguments in >> System Design". >> > In: Proceedings of the Second International Conference on Distributed >> Computing Systems. Paris, France. >> > April 8–10, 1981. IEEE Computer Society, pp. 509-512. >> > (available on line for free). >> > >> > It seems very much like the application of Occam's razor to function >> placement in communication networks back in the 80s. >> > I see no conflict between what is written in that paper and per-flow queueing >> today, even after almost 40 years. >> > >> > If that was the case, then all service differentiation techniques would >> violate the e2e principle in a "profound" way too, >> > and dualQ too. A policer? A shaper? A priority queue? >> > >> > Luca >> >> Quoting RFC2638 (the "two-bit" RFC): >> >> >>> Both these >> >>> proposals seek to define a single common mechanism that is used >> by >> >>> interior network routers, pushing most of the complexity and state >> of >> >>> differentiated services to the network edges. >> >> I can't help thinking that if DDC had felt this was against the E2E principle, >> he would have kicked up a fuss when it was written. >> >> Bob's right, however, that there might be a tussle here. If end-points are >> attempting to pace their packets to suit their own needs, and the network is >> policing packets to support both service differentiation and fairness, >> these may well be competing rather than collaborating behaviours. And there >> probably isn't anything we can do about it by twiddling with algorithms. 
>> >> Brian >> >> >> >> >> >> >> >> > >> > >> > >> > >> > >> > >> > >> > >> > On Fri, Jun 21, 2019 at 9:00 AM Sebastian Moeller <moeller0@gmx.de >> <mailto:moeller0@gmx.de>> wrote: >> > >> > >> > >> > > On Jun 19, 2019, at 16:12, Bob Briscoe <ietf@bobbriscoe.net >> <mailto:ietf@bobbriscoe.net>> wrote: >> > > >> > > Jake, all, >> > > >> > > You may not be aware of my long history of concern about how >> per-flow scheduling within endpoints and networks will limit the Internet in >> future. I find per-flow scheduling a violation of the e2e principle in such a >> profound way - the dynamic choice of the spacing between packets - that most >> people don't even associate it with the e2e principle. >> > >> > Maybe because it is not a violation of the e2e principle at all? My point >> is that with shared resources between the endpoints, the endpoints simply should >> have no expectancy that their choice of spacing between packets will be conserved. >> For the simple reason that it seems generally impossible to guarantee that >> inter-packet spacing is conserved (think "cross-traffic" at the bottleneck hop >> along the path and general bunching up of packets in the queue of a fast to slow >> transition*). I also would claim that the way L4S works (if it works) is to >> synchronize all active flows at the bottleneck which in tirn means each sender has >> only a very small timewindow in which to transmit a packet for it to hits its >> "slot" in the bottleneck L4S scheduler, otherwise, L4S's low queueing delay >> guarantees will not work. In other words the senders have basically no say in the >> "spacing between packets", I fail to see how L4S improves upon FQ in that regard. >> > >> > >> > IMHO having per-flow fairness as the defaults seems quite >> reasonable, endpoints can still throttle flows to their liking. Now per-flow >> fairness still can be "abused", so by itself it might not be sufficient, but >> neither is L4S as it has at best stochastic guarantees, as a single queue AQM >> (let's ignore the RFC3168 part of the AQM) there is the probability to send a >> throtteling signal to a low bandwidth flow (fair enough, it is only a mild >> throtteling signal, but still). >> > But enough about my opinion, what is the ideal fairness measure in your >> mind, and what is realistically achievable over the internet? >> > >> > >> > Best Regards >> > Sebastian >> > >> > >> > >> > >> > > >> > > I detected that you were talking about FQ in a way that might have >> assumed my concern with it was just about implementation complexity. If you (or >> anyone watching) is not aware of the architectural concerns with per-flow >> scheduling, I can enumerate them. >> > > >> > > I originally started working on what became L4S to prove that it was >> possible to separate out reducing queuing delay from throughput scheduling. When >> Koen and I started working together on this, we discovered we had identical >> concerns on this. 
>> > > >> > > >> > > >> > > Bob >> > > >> > > >> > > -- >> > > ________________________________________________________________ >> > > Bob Briscoe >> http://bobbriscoe.net/ >> > > >> > > _______________________________________________ >> > > Ecn-sane mailing list >> > > Ecn-sane@lists.bufferbloat.net >> <mailto:Ecn-sane@lists.bufferbloat.net> >> > > https://lists.bufferbloat.net/listinfo/ecn-sane >> > >> > _______________________________________________ >> > Ecn-sane mailing list >> > Ecn-sane@lists.bufferbloat.net >> <mailto:Ecn-sane@lists.bufferbloat.net> >> > https://lists.bufferbloat.net/listinfo/ecn-sane >> > >> >> > ^ permalink raw reply [flat|nested] 49+ messages in thread
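David's point 2) above - that the network's role can be reduced to a minimal, universally understood congestion signal - is small enough to sketch. A toy illustration in Python, in the spirit of RFC 3168 marking; the queue model and thresholds are invented for the example:

    from collections import deque

    QUEUE_LIMIT = 100     # packets; invented for the example
    MARK_THRESHOLD = 5    # start signalling well before the queue fills

    queue = deque()

    def router_enqueue(pkt):
        # The network's entire role: drop when full, mark early if ECN-capable.
        if len(queue) >= QUEUE_LIMIT:
            return False                     # drop: the original congestion signal
        if len(queue) >= MARK_THRESHOLD and pkt.get("ect"):
            pkt["ce"] = True                 # ECN mark: same signal, without loss
        queue.append(pkt)
        return True

    # Everything else -- how much to slow down on seeing CE echoed back, and
    # when to speed up again -- is left to the endpoints, which is precisely
    # the division of labour the end-to-end argument calls for.

    ok = router_enqueue({"ect": True, "payload": b"x"})
    print(ok, queue[0].get("ce", False))   # True False (queue still short)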
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-22 21:10 ` Brian E Carpenter @ 2019-06-22 22:25 ` David P. Reed 2019-06-22 22:30 ` Luca Muscariello 0 siblings, 1 reply; 49+ messages in thread From: David P. Reed @ 2019-06-22 22:25 UTC (permalink / raw) To: Brian E Carpenter Cc: Luca Muscariello, Sebastian Moeller, ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 16286 bytes --] Given the complexity of my broader comments, let me be clear that I have no problem with the broad concept of diffserv being compatible with the end-to-end arguments. I was trying to lay out what I think is a useful way to think about these kinds of issues within the Internet context. Similarly, per-flow scheduling as an end-to-end concept (different flows defined by address pairs being jointly managed as entities) makes great sense, but it's really important to be clear that queue prioritization within a single queue at entry to a bottleneck link is a special-case mechanism, and not a general end-to-end concept at the IP datagram level, given the generality of IP as a network packet transport protocol. It's really tied closely to routing, which isn't specified in any way by IP, other than "best efforts", a term that has become much better defined over the years (including the notions of dropping rather than storing packets, the idea that successive IP datagrams should traverse roughly the same path in order to have stable congestion detection, ...). Per-flow scheduling seems to work quite well in the cases where it applies, transparently below the IP datagram layer (that is, underneath the hourglass neck). IP effectively defines "flows", and it is reasonable to me that "best efforts" as a concept could include some notion of network-wide fairness among flows. Link-level "fairness" isn't a necessary precondition to network-level fairness. On Saturday, June 22, 2019 5:10pm, "Brian E Carpenter" <brian.e.carpenter@gmail.com> said: > Just three or four small comments: > > [...]
[-- Attachment #2: Type: text/html, Size: 21131 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
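What "IP effectively defines flows" means mechanically for a per-flow scheduler is simple: hash a flow key taken from the packet headers into one of N queues, much as FQ implementations do. Whether the key is the address pair alone, as David suggests above, or the full 5-tuple is exactly the policy choice under discussion. A minimal sketch in Python; the packet fields and table size are illustrative:

    import zlib

    N_QUEUES = 1024   # flow-table size; illustrative (fq_codel uses a similar trick)

    def flow_key(pkt, include_ports=True):
        # Flow identity from the IP envelope, optionally extended with
        # transport ports -- the policy choice discussed above.
        key = (pkt["src"], pkt["dst"])
        if include_ports:
            key += (pkt.get("proto"), pkt.get("sport"), pkt.get("dport"))
        return repr(key).encode()

    def queue_index(pkt, include_ports=True):
        return zlib.crc32(flow_key(pkt, include_ports)) % N_QUEUES

    pkt = {"src": "198.51.100.7", "dst": "203.0.113.9",
           "proto": 6, "sport": 443, "dport": 51812}
    print(queue_index(pkt))                       # one queue per 5-tuple flow
    print(queue_index(pkt, include_ports=False))  # one queue per host pair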
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-06-22 22:25 ` David P. Reed @ 2019-06-22 22:30 ` Luca Muscariello 0 siblings, 0 replies; 49+ messages in thread From: Luca Muscariello @ 2019-06-22 22:30 UTC (permalink / raw) To: David P. Reed Cc: Brian E Carpenter, Sebastian Moeller, ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 17377 bytes --] Thanks for the insights. On Sun 23 Jun 2019 at 00:25, David P. Reed <dpreed@deepplum.com> wrote: > Given the complexity of my broader comments, let me be clear that I have > no problem with the broad concept of diffserv being compatible with the > end-to-end arguments. > > [...]
[-- Attachment #2: Type: text/html, Size: 21720 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] per-flow scheduling 2019-06-19 14:12 [Ecn-sane] per-flow scheduling Bob Briscoe 2019-06-19 14:20 ` [Ecn-sane] [tsvwg] " Kyle Rose 2019-06-21 6:59 ` [Ecn-sane] " Sebastian Moeller @ 2019-07-17 21:33 ` Sebastian Moeller 2019-07-17 22:18 ` David P. Reed 2 siblings, 1 reply; 49+ messages in thread From: Sebastian Moeller @ 2019-07-17 21:33 UTC (permalink / raw) To: Bob Briscoe; +Cc: Holland, Jake, ecn-sane, tsvwg IETF list Dear Bob, dear IETF team, > On Jun 19, 2019, at 16:12, Bob Briscoe <ietf@bobbriscoe.net> wrote: > > Jake, all, > > You may not be aware of my long history of concern about how per-flow scheduling within endpoints and networks will limit the Internet in future. I find per-flow scheduling a violation of the e2e principle in such a profound way - the dynamic choice of the spacing between packets - that most people don't even associate it with the e2e principle. This does not rhyme well with L4S's stated advantage of allowing packet reordering (due to mandating RACK for all L4S TCP endpoints). Because surely changing the order of packets messes up "the dynamic choice of the spacing between packets" in a significant way. IMHO, either L4S is great because it gives intermediate hops more leeway to re-order packets, or a sender's packet spacing is sacred; please make up your mind which it is. > > I detected that you were talking about FQ in a way that might have assumed my concern with it was just about implementation complexity. If you (or anyone watching) is not aware of the architectural concerns with per-flow scheduling, I can enumerate them. Please do not hesitate to do so after your deserved holiday, and please state a superior alternative. Best Regards Sebastian > > I originally started working on what became L4S to prove that it was possible to separate out reducing queuing delay from throughput scheduling. When Koen and I started working together on this, we discovered we had identical concerns on this. > > > > Bob > > > -- > ________________________________________________________________ > Bob Briscoe http://bobbriscoe.net/ > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] per-flow scheduling 2019-07-17 21:33 ` [Ecn-sane] " Sebastian Moeller @ 2019-07-17 22:18 ` David P. Reed 2019-07-17 22:34 ` David P. Reed ` (2 more replies) 0 siblings, 3 replies; 49+ messages in thread From: David P. Reed @ 2019-07-17 22:18 UTC (permalink / raw) To: Sebastian Moeller; +Cc: Bob Briscoe, ecn-sane, tsvwg IETF list I do want to toss in my personal observations about the "end-to-end argument" related to per-flow scheduling. (Such arguments are, of course, a class of arguments to which my name is attached. Not that I am a judge/jury of such questions...) A core principle of the Internet design is to move function out of the network, including routers and middleboxes, if those functions a) can be properly accomplished by the endpoints, and b) are not relevant to all uses of the Internet transport fabric being used by the ends. The rationale here has always seemed obvious to me. Like Bob Briscoe suggests, we were very wary of throwing features into the network that would preclude unanticipated future interoperability needs, new applications, and new technology in the infrastructure of the Internet as a whole. So what are we talking about here (ignoring the fine points of SCE, some of which I think are debatable - especially the focus on TCP alone, since much traffic will likely move away from TCP in the near future)? A second technical requirement (necessary invariant) of the Internet's transport is that the entire Internet depends on rigorously stopping queueing delay from building up anywhere except at the endpoints, where the ends can manage it. This is absolutely critical, though it is peculiar in that many engineers, especially those who work at the IP layer and below, have a mental model of routing as essentially being about building up queueing delay (in order to manage priority in some trivial way by building up the queue on purpose, apparently). This second technical requirement cannot be resolved merely by the endpoints. The reason is that the endpoints cannot know accurately what host-host paths share common queues. This lack of a way to "cooperate" among independent users of a queue cannot be solved by a purely end-to-end solution. (well, I suppose some genius might invent a way, but I have not seen one in my 36 years closely watching the Internet in operation since it went live in 1983.) So, what the end-to-end argument would tend to do here, in my opinion, is to provide the most minimal mechanism in the devices that are capable of building up a queue in order to allow all the ends sharing that queue to do their job - which is to stop filling up the queue! Only the endpoints can prevent filling up queues. And depending on the protocol, they may need to make very different, yet compatible choices. This is a question of design at the architectural level. And the future matters. So there is an end-to-end argument to be made here, but it is a subtle one. The basic mechanism for controlling queue depth has been, and remains, quite simple: dropping packets. This has two impacts: 1) immediately reducing queueing delay, and 2) signalling to endpoints that are paying attention that they have contributed to an overfull queue. The optimum queueing delay in a steady state would always be one packet or less. Kleinrock has shown this in the last few years. Of course there aren't steady states. But we don't want a mechanism that can't converge to that steady state *quickly*, for all queues in the network. 
Another issue is that endpoints are not aware of the fact that packets can take multiple paths to any destination. In the future, alternate path choices can be made by routers (when we get smarter routing algorithms based on traffic engineering). So again, some minimal kind of information must be exposed to endpoints that will continue to communicate. Again, the routers must be able to help a wide variety of endpoints with different use cases to decide how to move queue buildup out of the network itself. Now the decision made by the endpoints must be made in the context of information about fairness. Maybe this is what is not obvious. The most obvious notion of fairness is equal shares among source host, dest host pairs. There are drawbacks to that, but the benefit of it is that it affects the IP layer alone, and deals with lots of boundary cases like the case where a single host opens a zillion TCP connections or uses lots of UDP source ports or destinations to somehow "cheat" by appearing to have "lots of flows". Another way to deal with dividing up flows is to ignore higher-level protocol information entirely, and put the flow identification in the IP layer. A 32-bit or 64-bit random number could be added as an "option" to IP to somehow extend the flow space. But that is not the most important thing today. I write this to say: 1) some kind of per-flow queueing, during the transient state where a queue is overloaded before packets are dropped, would provide much-needed information to the ends of every flow sharing a common queue. 2) per-flow queueing, minimized to a very low level, using IP envelope address information (plus maybe UDP and TCP addresses for those protocols in an extended address-based flow definition) is totally compatible with end-to-end arguments, but ONLY if the decisions made are certain to drive queueing delay out of the router to the endpoints. (A rough sketch of such address-based flow classification follows this message.) On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" <moeller0@gmx.de> said: > Dear Bob, dear IETF team, > > >> On Jun 19, 2019, at 16:12, Bob Briscoe <ietf@bobbriscoe.net> wrote: >> >> Jake, all, >> >> You may not be aware of my long history of concern about how per-flow scheduling >> within endpoints and networks will limit the Internet in future. I find per-flow >> scheduling a violation of the e2e principle in such a profound way - the dynamic >> choice of the spacing between packets - that most people don't even associate it >> with the e2e principle. > > This does not rhyme well with the L4S stated advantage of allowing packet > reordering (due to mandating RACK for all L4S tcp endpoints). Because surely > changing the order of packets messes up the "the dynamic choice of the spacing > between packets" in a significant way. IMHO it is either L4S is great because it > will give intermediate hops more leeway to re-order packets, or "a sender's > packet spacing" is sacred, please make up your mind which it is. > >> >> I detected that you were talking about FQ in a way that might have assumed my >> concern with it was just about implementation complexity. If you (or anyone >> watching) is not aware of the architectural concerns with per-flow scheduling, I >> can enumerate them. > > Please do not hesitate to do so after your deserved holiday, and please state a > superior alternative. > > Best Regards > Sebastian > > >> >> I originally started working on what became L4S to prove that it was possible to >> separate out reducing queuing delay from throughput scheduling. 
When Koen and I >> started working together on this, we discovered we had identical concerns on >> this. >> >> >> >> Bob >> >> >> -- >> ________________________________________________________________ >> Bob Briscoe http://bobbriscoe.net/ >> >> _______________________________________________ >> Ecn-sane mailing list >> Ecn-sane@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/ecn-sane > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane > ^ permalink raw reply [flat|nested] 49+ messages in thread
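A rough sketch of the address-based flow classification in point 2 above (C with invented names; real FQ implementations hash more fields and handle collisions): using only the IP source/destination pair means a host opening a zillion TCP connections still gets one queue, i.e. one share:

#include <stdint.h>
#include <stdio.h>

#define NQUEUES 64

/* Any decent integer mixing function would do; this one is for show. */
static uint32_t mix(uint32_t x)
{
    x ^= x >> 16; x *= 0x7feb352dU;
    x ^= x >> 15; x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* No ports, no protocol: fairness is per source/dest host pair. */
static unsigned classify(uint32_t src_ip, uint32_t dst_ip)
{
    return mix(src_ip ^ mix(dst_ip)) % NQUEUES;
}

int main(void)
{
    uint32_t a = 0x0a000001, b = 0x0a000002;   /* 10.0.0.1, 10.0.0.2 */
    /* Every "connection" between the same two hosts lands together: */
    printf("queue = %u\n", classify(a, b));
    printf("queue = %u\n", classify(a, b));
    return 0;
}

The hypothetical 32- or 64-bit IP-layer flow ID mentioned above would simply be mixed in alongside the addresses.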
* Re: [Ecn-sane] per-flow scheduling 2019-07-17 22:18 ` David P. Reed @ 2019-07-17 22:34 ` David P. Reed 2019-07-17 23:23 ` Dave Taht 2019-07-18 4:31 ` Jonathan Morton 2019-07-18 5:24 ` [Ecn-sane] " Jonathan Morton 2 siblings, 1 reply; 49+ messages in thread From: David P. Reed @ 2019-07-17 22:34 UTC (permalink / raw) To: David P. Reed; +Cc: Sebastian Moeller, ecn-sane, Bob Briscoe, tsvwg IETF list A follow-up point that I think needs to be made is one more end-to-end argument: It is NOT the job of the IP transport layer to provide free storage for low priority packets. The end-to-end argument here says: the ends can and must hold packets until they are either delivered or not relevant (in RTP, they become irrelevant when they get older than their desired delivery time, if you want an example of the latter), SO, the network should not provide the function of storage beyond the minimum needed to deal with transients. (A small sketch of such endpoint-side deadline dropping follows this message.) That means, unfortunately, that the dream of some kind of "background" path that stores "low priority" packets in the network fails the end-to-end argument test. If you think about this, it even applies to some imaginary interplanetary IP layer network. Queueing delay is not a feature of any end-to-end requirement. What may be desired at the router/link level in an interplanetary IP layer is holding packets because a link is actually down, or using link-level error correction coding or retransmission to bring the error rate down to an acceptable level before declaring it down. But that's quite different - it's the link-level protocol, which aims to deliver minimum queueing delay under tough conditions, without buffering more than needed for that (the number of bits that fit in the light-speed transmission at the transmission rate). So, the main reason I'm saying this is because again, there are those who want to implement the TCP function of reliable delivery of each packet in the links. That's a very bad idea. On Wednesday, July 17, 2019 6:18pm, "David P. Reed" <dpreed@deepplum.com> said: > I do want to toss in my personal observations about the "end-to-end argument" > related to per-flow-scheduling. (Such arguments are, of course, a class of > arguments to which my name is attached. Not that I am a judge/jury of such > questions...) > > A core principle of the Internet design is to move function out of the network, > including routers and middleboxes, if those functions > > a) can be properly accomplished by the endpoints, and > b) are not relevant to all uses of the Internet transport fabric being used by the > ends. > > The rationale here has always seemed obvious to me. Like Bob Briscoe suggests, we > were very wary of throwing features into the network that would preclude > unanticipated future interoperability needs, new applications, and new technology > in the infrastructure of the Internet as a whole. > > So what are we talking about here (ignoring the fine points of SCE, some of which > I think are debatable - especially the focus on TCP alone, since much traffic will > likely move away from TCP in the near future. 
> > A second technical requirement (necessary invariant) of the Internet's transport > is that the entire Internet depends on rigorously stopping queueing delay from > building up anywhere except at the endpoints, where the ends can manage it.This is > absolutely critical, though it is peculiar in that many engineers, especially > those who work at the IP layer and below, have a mental model of routing as > essentially being about building up queueing delay (in order to manage priority in > some trivial way by building up the queue on purpose, apparently). > > This second technical requirement cannot be resolved merely by the endpoints. > The reason is that the endpoints cannot know accurately what host-host paths share > common queues. > > This lack of a way to "cooperate" among independent users of a queue cannot be > solved by a purely end-to-end solution. (well, I suppose some genius might invent > a way, but I have not seen one in my 36 years closely watching the Internet in > operation since it went live in 1983.) > > So, what the end-to-end argument would tend to do here, in my opinion, is to > provide the most minimal mechanism in the devices that are capable of building up > a queue in order to allow all the ends sharing that queue to do their job - which > is to stop filling up the queue! > > Only the endpoints can prevent filling up queues. And depending on the protocol, > they may need to make very different, yet compatible choices. > > This is a question of design at the architectural level. And the future matters. > > So there is an end-to-end argument to be made here, but it is a subtle one. > > The basic mechanism for controlling queue depth has been, and remains, quite > simple: dropping packets. This has two impacts: 1) immediately reducing queueing > delay, and 2) signalling to endpoints that are paying attention that they have > contributed to an overfull queue. > > The optimum queueing delay in a steady state would always be one packet or less. > Kleinrock has shown this in the last few years. Of course there aren't steady > states. But we don't want a mechanism that can't converge to that steady state > *quickly*, for all queues in the network. > > Another issue is that endpoints are not aware of the fact that packets can take > multiple paths to any destination. In the future, alternate path choices can be > made by routers (when we get smarter routing algorithms based on traffic > engineering). > > So again, some minimal kind of information must be exposed to endpoints that will > continue to communicate. Again, the routers must be able to help a wide variety of > endpoints with different use cases to decide how to move queue buildup out of the > network itself. > > Now the decision made by the endpoints must be made in the context of information > about fairness. Maybe this is what is not obvious. > > The most obvious notion of fairness is equal shares among source host, dest host > pairs. There are drawbacks to that, but the benefit of it is that it affects the > IP layer alone, and deals with lots of boundary cases like the case where a single > host opens a zillion TCP connections or uses lots of UDP source ports or > destinations to somehow "cheat" by appearing to have "lots of flows". > > Another way to deal with dividing up flows is to ignore higher level protocol > information entirely, and put the flow idenfitication in the IP layer. A 32-bit or > 64-bit random number could be added as an "option" to IP to somehow extend the > flow space. 
> > But that is not the most important thing today. > > I write this to say: > 1) some kind of per-flow queueing, during the transient state where a queue is > overloaded before packets are dropped would provide much needed information to the > ends of every flow sharing a common queue. > 2) per-flow queueing, minimized to a very low level, using IP envelope address > information (plus maybe UDP and TCP addresses for those protocols in an extended > address-based flow definition) is totally compatible with end-to-end arguments, > but ONLY if the decisions made are certain to drive queueing delay out of the > router to the endpoints. > > > > > On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" <moeller0@gmx.de> said: > >> Dear Bob, dear IETF team, >> >> >>> On Jun 19, 2019, at 16:12, Bob Briscoe <ietf@bobbriscoe.net> wrote: >>> >>> Jake, all, >>> >>> You may not be aware of my long history of concern about how per-flow scheduling >>> within endpoints and networks will limit the Internet in future. I find per-flow >>> scheduling a violation of the e2e principle in such a profound way - the dynamic >>> choice of the spacing between packets - that most people don't even associate it >>> with the e2e principle. >> >> This does not rhyme well with the L4S stated advantage of allowing packet >> reordering (due to mandating RACK for all L4S tcp endpoints). Because surely >> changing the order of packets messes up the "the dynamic choice of the spacing >> between packets" in a significant way. IMHO it is either L4S is great because it >> will give intermediate hops more leeway to re-order packets, or "a sender's >> packet spacing" is sacred, please make up your mind which it is. >> >>> >>> I detected that you were talking about FQ in a way that might have assumed my >>> concern with it was just about implementation complexity. If you (or anyone >>> watching) is not aware of the architectural concerns with per-flow scheduling, I >>> can enumerate them. >> >> Please do not hesitate to do so after your deserved holiday, and please state a >> superior alternative. >> >> Best Regards >> Sebastian >> >> >>> >>> I originally started working on what became L4S to prove that it was possible to >>> separate out reducing queuing delay from throughput scheduling. When Koen and I >>> started working together on this, we discovered we had identical concerns on >>> this. >>> >>> >>> >>> Bob >>> >>> >>> -- >>> ________________________________________________________________ >>> Bob Briscoe http://bobbriscoe.net/ >>> >>> _______________________________________________ >>> Ecn-sane mailing list >>> Ecn-sane@lists.bufferbloat.net >>> https://lists.bufferbloat.net/listinfo/ecn-sane >> >> _______________________________________________ >> Ecn-sane mailing list >> Ecn-sane@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/ecn-sane >> > > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane > ^ permalink raw reply [flat|nested] 49+ messages in thread
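The RTP example in the message above can be sketched in a few lines of C (invented names; real RTP stacks also track jitter buffers and clock skew): the endpoint, not the network, stores the packet, and the endpoint discards it once it is older than its playout deadline:

#include <stdio.h>

struct rtp_pkt {
    int    seq;
    double deadline;          /* latest useful delivery time, seconds */
};

/* The end holds the packet until delivered or no longer relevant. */
static int still_relevant(const struct rtp_pkt *p, double now)
{
    return now <= p->deadline;
}

int main(void)
{
    struct rtp_pkt p = { 42, 1.020 };   /* useful until t = 1.020s */
    printf("at t=1.000s: %s\n", still_relevant(&p, 1.000) ? "keep" : "discard");
    printf("at t=1.050s: %s\n", still_relevant(&p, 1.050) ? "keep" : "discard");
    return 0;
}

Storage, and the decision to give up, both live at the end; the network only ever holds the transient minimum.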
* Re: [Ecn-sane] per-flow scheduling 2019-07-17 22:34 ` David P. Reed @ 2019-07-17 23:23 ` Dave Taht 2019-07-18 0:20 ` Dave Taht 2019-07-18 15:02 ` David P. Reed 0 siblings, 2 replies; 49+ messages in thread From: Dave Taht @ 2019-07-17 23:23 UTC (permalink / raw) To: David P. Reed; +Cc: ecn-sane, Bob Briscoe, tsvwg IETF list On Wed, Jul 17, 2019 at 3:34 PM David P. Reed <dpreed@deepplum.com> wrote: > > A follow up point that I think needs to be made is one more end-to-end argument: > > It is NOT the job of the IP transport layer to provide free storage for low priority packets. The end-to-end argument here says: the ends can and must hold packets until they are either delivered or not relevant (in RTP, they become irrelevant when they get older than their desired delivery time, if you want an example of the latter), SO, the network should not provide the function of storage beyond the minimum needed to deal with transients. > > That means, unfortunately, that the dream of some kind of "background" path that stores "low priority" packets in the network fails the end-to-end argument test. I do not mind reserving a tiny portion of the network for "background" traffic. This is different (I think?) from storing low-priority packets in the network. A background traffic "queue" of 1 packet would be fine.... > If you think about this, it even applies to some imaginary interplanetary IP layer network. Queueing delay is not a feature of any end-to-end requirement. > > What may be desired at the router/link level in an interplanetary IP layer is holding packets because a link is actually down, or using link-level error correction coding or retransmission to bring the error rate down to an acceptable level before declaring it down. But that's quite different - it's the link level protocol, which aims to deliver minimum queueing delay under tough conditions, without buffering more than needed for that (the number of bits that fit in the light-speed transmission at the transmission rate. As I outlined in my MIT wifi talk - one layer of retry at the wifi mac layer made it work, in 1998, and that seemed a very acceptable compromise at the time. Present-day retries at that layer, not congestion controlled, are totally out of hand. In thinking about starlink's mac, and mobility, I gradually came to the conclusion that one retry from satellites 550km up (3.6ms rtt) was needed, as much as I disliked the idea. I still dislike retries at layer 2, even for nearby sats - it really complicates things. So for all I know I'll be advocating ripping 'em out in starlink, if they are indeed in there, next week. > So, the main reason I'm saying this is because again, there are those who want to implement the TCP function of reliable delivery of each packet in the links. That's a very bad idea. It was tried in the Arpanet, and didn't work well there. There's a good story about many of the flaws of the Arpanet's design, including that problem, in the latter half of Kleinrock's second book on queueing theory, at least the first edition... Wifi (and 3/4/5G) re-introduced the same problem with retransmits and block acks at layer 2. And after dissecting my ecn battlemesh data and observing what the retries at the mac layer STILL do on wifi with the current default wifi codel target (20ms, AFTER two txops are in the hardware - we currently achieve 50ms, which is 10x worse than what we could do, and still better performance under load than any other shipping physical layer we have with fifos)... 
and after thinking hard about Nagle's thought that "every application has a right to one packet in the network", and this very long thread reworking the end-to-end argument in a similar, but not quite identical, direction, I'm coming to a couple of conclusions I'd possibly not quite expressed well before. 1) transports should treat an RFC3168 CE coupled with loss (drop and mark) as an even stronger signal of congestion than either alone, and this bit of the codel algorithm, when ecn is in use, is wrong, and has always been wrong: https://github.com/dtaht/fq_codel_fast/blob/master/codel_impl.h#L178 (we added this arbitrarily to codel on the 5th day of development in 2012. Using FQ masked its effects on light traffic) What it should do instead is peek the queue and drop until it hits a markable packet, at the very least. (An illustrative sketch of the branch in question follows this message.) Pie has an arbitrary drop-at-10% figure, which does lighten the load some... cake used to have drop and mark also until a year or two back... 2) At low rates and high contention, we really need pacing and fractional cwnd. (while I would very much like to see a dynamic reduction of MSS tried, that too has a bottom limit) even then, drop as per bullet 1. 3) In the end, I could see a world with SCE marks, and CE being obsoleted in favor of drop, or CE only being exerted on really light loads similar to (or less than!) what the arbitrary 10% figure for pie uses. 4) In all cases, I vastly prefer somehow ultimately shifting greedy transports to RTT rather than drop or CE as their primary congestion control indicator. FQ makes that feasible today. With enough FQ deployed for enough congestive scenarios and hardware, and RTT becoming the core indicator for more transports, single-queued designs become possible in the distant future. > > On Wednesday, July 17, 2019 6:18pm, "David P. Reed" <dpreed@deepplum.com> said: > > > I do want to toss in my personal observations about the "end-to-end argument" > > related to per-flow-scheduling. (Such arguments are, of course, a class of > > arguments to which my name is attached. Not that I am a judge/jury of such > > questions...) > > > > A core principle of the Internet design is to move function out of the network, > > including routers and middleboxes, if those functions > > > > a) can be properly accomplished by the endpoints, and > > b) are not relevant to all uses of the Internet transport fabric being used by the > > ends. > > > > The rationale here has always seemed obvious to me. Like Bob Briscoe suggests, we > > were very wary of throwing features into the network that would preclude > > unanticipated future interoperability needs, new applications, and new technology > > in the infrastructure of the Internet as a whole. > > > > So what are we talking about here (ignoring the fine points of SCE, some of which > > I think are debatable - especially the focus on TCP alone, since much traffic will > > likely move away from TCP in the near future. > > > > A second technical requirement (necessary invariant) of the Internet's transport > > is that the entire Internet depends on rigorously stopping queueing delay from > > building up anywhere except at the endpoints, where the ends can manage it.This is > > absolutely critical, though it is peculiar in that many engineers, especially > > those who work at the IP layer and below, have a mental model of routing as > > essentially being about building up queueing delay (in order to manage priority in > > some trivial way by building up the queue on purpose, apparently). 
> > > > This second technical requirement cannot be resolved merely by the endpoints. > > The reason is that the endpoints cannot know accurately what host-host paths share > > common queues. > > > > This lack of a way to "cooperate" among independent users of a queue cannot be > > solved by a purely end-to-end solution. (well, I suppose some genius might invent > > a way, but I have not seen one in my 36 years closely watching the Internet in > > operation since it went live in 1983.) > > > > So, what the end-to-end argument would tend to do here, in my opinion, is to > > provide the most minimal mechanism in the devices that are capable of building up > > a queue in order to allow all the ends sharing that queue to do their job - which > > is to stop filling up the queue! > > > > Only the endpoints can prevent filling up queues. And depending on the protocol, > > they may need to make very different, yet compatible choices. > > > > This is a question of design at the architectural level. And the future matters. > > > > So there is an end-to-end argument to be made here, but it is a subtle one. > > > > The basic mechanism for controlling queue depth has been, and remains, quite > > simple: dropping packets. This has two impacts: 1) immediately reducing queueing > > delay, and 2) signalling to endpoints that are paying attention that they have > > contributed to an overfull queue. > > > > The optimum queueing delay in a steady state would always be one packet or less. > > Kleinrock has shown this in the last few years. Of course there aren't steady > > states. But we don't want a mechanism that can't converge to that steady state > > *quickly*, for all queues in the network. > > > > Another issue is that endpoints are not aware of the fact that packets can take > > multiple paths to any destination. In the future, alternate path choices can be > > made by routers (when we get smarter routing algorithms based on traffic > > engineering). > > > > So again, some minimal kind of information must be exposed to endpoints that will > > continue to communicate. Again, the routers must be able to help a wide variety of > > endpoints with different use cases to decide how to move queue buildup out of the > > network itself. > > > > Now the decision made by the endpoints must be made in the context of information > > about fairness. Maybe this is what is not obvious. > > > > The most obvious notion of fairness is equal shares among source host, dest host > > pairs. There are drawbacks to that, but the benefit of it is that it affects the > > IP layer alone, and deals with lots of boundary cases like the case where a single > > host opens a zillion TCP connections or uses lots of UDP source ports or > > destinations to somehow "cheat" by appearing to have "lots of flows". > > > > Another way to deal with dividing up flows is to ignore higher level protocol > > information entirely, and put the flow idenfitication in the IP layer. A 32-bit or > > 64-bit random number could be added as an "option" to IP to somehow extend the > > flow space. > > > > But that is not the most important thing today. > > > > I write this to say: > > 1) some kind of per-flow queueing, during the transient state where a queue is > > overloaded before packets are dropped would provide much needed information to the > > ends of every flow sharing a common queue. 
> > 2) per-flow queueing, minimized to a very low level, using IP envelope address > > information (plus maybe UDP and TCP addresses for those protocols in an extended > > address-based flow definition) is totally compatible with end-to-end arguments, > > but ONLY if the decisions made are certain to drive queueing delay out of the > > router to the endpoints. > > > > > > > > > > On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" <moeller0@gmx.de> said: > > > >> Dear Bob, dear IETF team, > >> > >> > >>> On Jun 19, 2019, at 16:12, Bob Briscoe <ietf@bobbriscoe.net> wrote: > >>> > >>> Jake, all, > >>> > >>> You may not be aware of my long history of concern about how per-flow scheduling > >>> within endpoints and networks will limit the Internet in future. I find per-flow > >>> scheduling a violation of the e2e principle in such a profound way - the dynamic > >>> choice of the spacing between packets - that most people don't even associate it > >>> with the e2e principle. > >> > >> This does not rhyme well with the L4S stated advantage of allowing packet > >> reordering (due to mandating RACK for all L4S tcp endpoints). Because surely > >> changing the order of packets messes up the "the dynamic choice of the spacing > >> between packets" in a significant way. IMHO it is either L4S is great because it > >> will give intermediate hops more leeway to re-order packets, or "a sender's > >> packet spacing" is sacred, please make up your mind which it is. > >> > >>> > >>> I detected that you were talking about FQ in a way that might have assumed my > >>> concern with it was just about implementation complexity. If you (or anyone > >>> watching) is not aware of the architectural concerns with per-flow scheduling, I > >>> can enumerate them. > >> > >> Please do not hesitate to do so after your deserved holiday, and please state a > >> superior alternative. > >> > >> Best Regards > >> Sebastian > >> > >> > >>> > >>> I originally started working on what became L4S to prove that it was possible to > >>> separate out reducing queuing delay from throughput scheduling. When Koen and I > >>> started working together on this, we discovered we had identical concerns on > >>> this. > >>> > >>> > >>> > >>> Bob > >>> > >>> > >>> -- > >>> ________________________________________________________________ > >>> Bob Briscoe http://bobbriscoe.net/ > >>> > >>> _______________________________________________ > >>> Ecn-sane mailing list > >>> Ecn-sane@lists.bufferbloat.net > >>> https://lists.bufferbloat.net/listinfo/ecn-sane > >> > >> _______________________________________________ > >> Ecn-sane mailing list > >> Ecn-sane@lists.bufferbloat.net > >> https://lists.bufferbloat.net/listinfo/ecn-sane > >> > > > > > > _______________________________________________ > > Ecn-sane mailing list > > Ecn-sane@lists.bufferbloat.net > > https://lists.bufferbloat.net/listinfo/ecn-sane > > > > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane -- Dave Täht CTO, TekLibre, LLC http://www.teklibre.com Tel: 1-831-205-9740 ^ permalink raw reply [flat|nested] 49+ messages in thread
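For readers without the source at hand, the disputed branch in bullet 1 of the message above has roughly this shape (an illustrative reconstruction in C, NOT the actual codel_impl.h code): when the CoDel control law fires on an ECT packet, the packet is CE-marked and still delivered, so an ECN flow sheds none of its queue "mass":

#include <stdio.h>
#include <stdlib.h>

struct pkt { int id; int ect; int ce; };

/* Stand-in for CoDel's sojourn-time control law. */
static int control_law_says_drop(void) { return rand() % 2; }

/* Returns the packet to deliver, or NULL if it was dropped. */
static struct pkt *dequeue_sketch(struct pkt *p)
{
    if (control_law_says_drop()) {
        if (p->ect) {
            p->ce = 1;      /* mark instead of drop: bytes stay queued */
            return p;
        }
        return NULL;        /* non-ECT traffic actually sheds load */
    }
    return p;
}

int main(void)
{
    struct pkt p = { 1, 1, 0 };
    struct pkt *out = dequeue_sketch(&p);
    if (out)
        printf("delivered, ce=%d\n", out->ce);
    else
        printf("dropped\n");
    return 0;
}

The complaint is precisely that the marked packet keeps occupying link time that a drop would have freed.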
* Re: [Ecn-sane] per-flow scheduling 2019-07-17 23:23 ` Dave Taht @ 2019-07-18 0:20 ` Dave Taht 2019-07-18 5:30 ` Jonathan Morton 2019-07-18 15:02 ` David P. Reed 1 sibling, 1 reply; 49+ messages in thread From: Dave Taht @ 2019-07-18 0:20 UTC (permalink / raw) To: David P. Reed; +Cc: ecn-sane, Bob Briscoe, tsvwg IETF list On Wed, Jul 17, 2019 at 4:23 PM Dave Taht <dave.taht@gmail.com> wrote: > > On Wed, Jul 17, 2019 at 3:34 PM David P. Reed <dpreed@deepplum.com> wrote: > > > > A follow up point that I think needs to be made is one more end-to-end argument: > > > > It is NOT the job of the IP transport layer to provide free storage for low priority packets. The end-to-end argument here says: the ends can and must hold packets until they are either delivered or not relevant (in RTP, they become irrelevant when they get older than their desired delivery time, if you want an example of the latter), SO, the network should not provide the function of storage beyond the minimum needed to deal with transients. > > > > That means, unfortunately, that the dream of some kind of "background" path that stores "low priority" packets in the network fails the end-to-end argument test. > > I do not mind reserving a tiny portion of the network for "background" > traffic. This > is different (I think?) than storing low priority packets in the > network. A background > traffic "queue" of 1 packet would be fine.... > > > If you think about this, it even applies to some imaginary interplanetary IP layer network. Queueing delay is not a feature of any end-to-end requirement. > > > > What may be desired at the router/link level in an interplanetary IP layer is holding packets because a link is actually down, or using link-level error correction coding or retransmission to bring the error rate down to an acceptable level before declaring it down. But that's quite different - it's the link level protocol, which aims to deliver minimum queueing delay under tough conditions, without buffering more than needed for that (the number of bits that fit in the light-speed transmission at the transmission rate. > > As I outlined in my mit wifi talk - 1 layer of retry of at the wifi > mac layer made it > work, in 1998, and that seemed a very acceptable compromise at the > time. Present day > retries at the layer, not congestion controlled, is totally out of hand. > > In thinking about starlink's mac, and mobility, I gradulally came to > the conclusion that > 1 retry from satellites 550km up (3.6ms rtt) was needed, as much as I > disliked the idea. > > I still dislike retries at layer 2, even for nearby sats. really > complicates things. so for all I know I'll be advocating ripping 'em > out in starlink, if they are indeed, in there, next week. > > > So, the main reason I'm saying this is because again, there are those who want to implement the TCP function of reliable delivery of each packet in the links. That's a very bad idea. > > It was tried in the arpanet, and didn't work well there. There's a > good story about many > of the flaws of the Arpanet's design, including that problem, in the > latter half of Kleinrock's second book on queue theory, at least the > first edition... > > Wifi (and 345g) re-introduced the same problem with retransmits and > block acks at layer 2. 
> > and after dissecting my ecn battlemesh data and observing what the > retries at the mac layer STILL do on wifi with the current default > wifi codel target (20ms AFTER two txops are in the hardware) currently > achieve (50ms, which is 10x worse than what we could do and still > better performance under load than any other shipping physical layer > we have with fifos)... and after thinking hard about nagle's thought > that "every application has a right to one packet in the network", and > this very long thread reworking the end to end argument in a similar, > but not quite identical direction, I'm coming to a couple conclusions > I'd possibly not quite expressed well before. > > 1) transports should treat an RFC3168 CE coupled with loss (drop and > mark) as an even stronger signal of congestion than either, and that > this bit of the codel algorithm, > when ecn is in use, is wrong, and has always been wrong: > > https://github.com/dtaht/fq_codel_fast/blob/master/codel_impl.h#L178 > > (we added this arbitrarily to codel in the 5th day of development in > 2012. Using FQ masked it's effects on light traffic) > > What it should do instead is peek the queue and drop until it hits a > markable packet, at the very least. I didn't say this well. It should drop otherwise markable packets until it exits the loop, and then mark the one it delivers from that flow, if it delivers one from that flow. That gets rid of all the extra mass ecn creates... but I should go code it up again and see what happens on wifi. Worst case, I prove yet again that reasoning about the behavior of queues is futile. > > Pie has an arbitrary drop at 10% figure, which does lighten the load > some... cake used to have drop and mark also until a year or two > back... > > 2) At low rates and high contention, we really need pacing and fractional cwnd. > > (while I would very much like to see a dynamic reduction of MSS tried, > that too has a bottom limit) > > even then, drop as per bullet 1. > > 3) In the end, I could see a world with SCE marks, and CE being > obsoleted in favor of drop, or CE only being exerted on really light > loads similar to (or less than!) what the arbitrary 10% figure for pie > uses > > 4) in all cases, I vastly prefer somehow ultimately shifting greedy > transports to RTT rather than drop or CE as their primary congestion > control indicator. FQ makes that feasible today. With enough FQ > deployed for enough congestive scenarios and hardware, and RTT > becoming the core indicator for more transports, single queued designs > become possible in the distant future. > > > > > > On Wednesday, July 17, 2019 6:18pm, "David P. Reed" <dpreed@deepplum.com> said: > > > > > I do want to toss in my personal observations about the "end-to-end argument" > > > related to per-flow-scheduling. (Such arguments are, of course, a class of > > > arguments to which my name is attached. Not that I am a judge/jury of such > > > questions...) > > > > > > A core principle of the Internet design is to move function out of the network, > > > including routers and middleboxes, if those functions > > > > > > a) can be properly accomplished by the endpoints, and > > > b) are not relevant to all uses of the Internet transport fabric being used by the > > > ends. > > > > > > The rationale here has always seemed obvious to me. 
Like Bob Briscoe suggests, we > > > were very wary of throwing features into the network that would preclude > > > unanticipated future interoperability needs, new applications, and new technology > > > in the infrastructure of the Internet as a whole. > > > > > > So what are we talking about here (ignoring the fine points of SCE, some of which > > > I think are debatable - especially the focus on TCP alone, since much traffic will > > > likely move away from TCP in the near future. > > > > > > A second technical requirement (necessary invariant) of the Internet's transport > > > is that the entire Internet depends on rigorously stopping queueing delay from > > > building up anywhere except at the endpoints, where the ends can manage it.This is > > > absolutely critical, though it is peculiar in that many engineers, especially > > > those who work at the IP layer and below, have a mental model of routing as > > > essentially being about building up queueing delay (in order to manage priority in > > > some trivial way by building up the queue on purpose, apparently). > > > > > > This second technical requirement cannot be resolved merely by the endpoints. > > > The reason is that the endpoints cannot know accurately what host-host paths share > > > common queues. > > > > > > This lack of a way to "cooperate" among independent users of a queue cannot be > > > solved by a purely end-to-end solution. (well, I suppose some genius might invent > > > a way, but I have not seen one in my 36 years closely watching the Internet in > > > operation since it went live in 1983.) > > > > > > So, what the end-to-end argument would tend to do here, in my opinion, is to > > > provide the most minimal mechanism in the devices that are capable of building up > > > a queue in order to allow all the ends sharing that queue to do their job - which > > > is to stop filling up the queue! > > > > > > Only the endpoints can prevent filling up queues. And depending on the protocol, > > > they may need to make very different, yet compatible choices. > > > > > > This is a question of design at the architectural level. And the future matters. > > > > > > So there is an end-to-end argument to be made here, but it is a subtle one. > > > > > > The basic mechanism for controlling queue depth has been, and remains, quite > > > simple: dropping packets. This has two impacts: 1) immediately reducing queueing > > > delay, and 2) signalling to endpoints that are paying attention that they have > > > contributed to an overfull queue. > > > > > > The optimum queueing delay in a steady state would always be one packet or less. > > > Kleinrock has shown this in the last few years. Of course there aren't steady > > > states. But we don't want a mechanism that can't converge to that steady state > > > *quickly*, for all queues in the network. > > > > > > Another issue is that endpoints are not aware of the fact that packets can take > > > multiple paths to any destination. In the future, alternate path choices can be > > > made by routers (when we get smarter routing algorithms based on traffic > > > engineering). > > > > > > So again, some minimal kind of information must be exposed to endpoints that will > > > continue to communicate. Again, the routers must be able to help a wide variety of > > > endpoints with different use cases to decide how to move queue buildup out of the > > > network itself. > > > > > > Now the decision made by the endpoints must be made in the context of information > > > about fairness. 
Maybe this is what is not obvious. > > > > > > The most obvious notion of fairness is equal shares among source host, dest host > > > pairs. There are drawbacks to that, but the benefit of it is that it affects the > > > IP layer alone, and deals with lots of boundary cases like the case where a single > > > host opens a zillion TCP connections or uses lots of UDP source ports or > > > destinations to somehow "cheat" by appearing to have "lots of flows". > > > > > > Another way to deal with dividing up flows is to ignore higher level protocol > > > information entirely, and put the flow idenfitication in the IP layer. A 32-bit or > > > 64-bit random number could be added as an "option" to IP to somehow extend the > > > flow space. > > > > > > But that is not the most important thing today. > > > > > > I write this to say: > > > 1) some kind of per-flow queueing, during the transient state where a queue is > > > overloaded before packets are dropped would provide much needed information to the > > > ends of every flow sharing a common queue. > > > 2) per-flow queueing, minimized to a very low level, using IP envelope address > > > information (plus maybe UDP and TCP addresses for those protocols in an extended > > > address-based flow definition) is totally compatible with end-to-end arguments, > > > but ONLY if the decisions made are certain to drive queueing delay out of the > > > router to the endpoints. > > > > > > > > > > > > > > > On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" <moeller0@gmx.de> said: > > > > > >> Dear Bob, dear IETF team, > > >> > > >> > > >>> On Jun 19, 2019, at 16:12, Bob Briscoe <ietf@bobbriscoe.net> wrote: > > >>> > > >>> Jake, all, > > >>> > > >>> You may not be aware of my long history of concern about how per-flow scheduling > > >>> within endpoints and networks will limit the Internet in future. I find per-flow > > >>> scheduling a violation of the e2e principle in such a profound way - the dynamic > > >>> choice of the spacing between packets - that most people don't even associate it > > >>> with the e2e principle. > > >> > > >> This does not rhyme well with the L4S stated advantage of allowing packet > > >> reordering (due to mandating RACK for all L4S tcp endpoints). Because surely > > >> changing the order of packets messes up the "the dynamic choice of the spacing > > >> between packets" in a significant way. IMHO it is either L4S is great because it > > >> will give intermediate hops more leeway to re-order packets, or "a sender's > > >> packet spacing" is sacred, please make up your mind which it is. > > >> > > >>> > > >>> I detected that you were talking about FQ in a way that might have assumed my > > >>> concern with it was just about implementation complexity. If you (or anyone > > >>> watching) is not aware of the architectural concerns with per-flow scheduling, I > > >>> can enumerate them. > > >> > > >> Please do not hesitate to do so after your deserved holiday, and please state a > > >> superior alternative. > > >> > > >> Best Regards > > >> Sebastian > > >> > > >> > > >>> > > >>> I originally started working on what became L4S to prove that it was possible to > > >>> separate out reducing queuing delay from throughput scheduling. When Koen and I > > >>> started working together on this, we discovered we had identical concerns on > > >>> this. 
> > >>> > > >>> > > >>> > > >>> Bob > > >>> > > >>> > > >>> -- > > >>> ________________________________________________________________ > > >>> Bob Briscoe http://bobbriscoe.net/ > > >>> > > >>> _______________________________________________ > > >>> Ecn-sane mailing list > > >>> Ecn-sane@lists.bufferbloat.net > > >>> https://lists.bufferbloat.net/listinfo/ecn-sane > > >> > > >> _______________________________________________ > > >> Ecn-sane mailing list > > >> Ecn-sane@lists.bufferbloat.net > > >> https://lists.bufferbloat.net/listinfo/ecn-sane > > >> > > > > > > > > > _______________________________________________ > > > Ecn-sane mailing list > > > Ecn-sane@lists.bufferbloat.net > > > https://lists.bufferbloat.net/listinfo/ecn-sane > > > > > > > > > _______________________________________________ > > Ecn-sane mailing list > > Ecn-sane@lists.bufferbloat.net > > https://lists.bufferbloat.net/listinfo/ecn-sane > > > > -- > > Dave Täht > CTO, TekLibre, LLC > http://www.teklibre.com > Tel: 1-831-205-9740 -- Dave Täht CTO, TekLibre, LLC http://www.teklibre.com Tel: 1-831-205-9740 ^ permalink raw reply [flat|nested] 49+ messages in thread
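Bullet 2 in the message above ("pacing and fractional cwnd") comes down to simple arithmetic, sketched here in C with invented names (no real TCP stack exposes exactly this interface): once cwnd may go below one MSS, the sender just stretches the inter-packet gap past one RTT instead of stopping at one packet per RTT:

#include <stdio.h>

/* Gap between full-size packets so the average rate equals cwnd/srtt,
 * even when cwnd is a fraction of one MSS. */
static double pacing_gap_s(double cwnd_bytes, double mss_bytes, double srtt_s)
{
    return srtt_s * (mss_bytes / cwnd_bytes);
}

int main(void)
{
    double mss = 1448.0, srtt = 0.100;          /* 100ms path */
    /* A quarter-MSS cwnd paces one packet every 4 RTTs = 400ms: */
    printf("gap = %.0f ms\n", 1e3 * pacing_gap_s(0.25 * mss, mss, srtt));
    return 0;
}

At high contention this would let dozens of flows share a slow link without any of them being forced up to a full packet per RTT.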
* Re: [Ecn-sane] per-flow scheduling 2019-07-18 0:20 ` Dave Taht @ 2019-07-18 5:30 ` Jonathan Morton 0 siblings, 0 replies; 49+ messages in thread From: Jonathan Morton @ 2019-07-18 5:30 UTC (permalink / raw) To: Dave Taht; +Cc: David P. Reed, Bob Briscoe, ecn-sane, tsvwg IETF list > On 18 Jul, 2019, at 3:20 am, Dave Taht <dave.taht@gmail.com> wrote: > >> What it should do instead is peek the queue and drop until it hits a >> markable packet, at the very least. > > I didn't say this well. It should drop otherwise markable packets until it > exits the loop, and then mark the one it delivers from that flow, if it delivers > one from that flow. That gets rid of all the extra mass ecn creates... You know, I think I finally understand what you're talking about here. You want to treat cases where the marking rate exceeds the flow's packet delivery rate as an overload condition justifying a shift to packet drops. This actually seems like a sane idea. There is, I think, one caveat: selecting the Codel 'interval' parameter will now have an increased penalty for getting it wrong, especially on the too-small side. - Jonathan Morton ^ permalink raw reply [flat|nested] 49+ messages in thread
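Putting Dave's correction and this reading together, the proposed dequeue looks roughly like the following (a sketch with invented names, not a tested patch): drop while the control law keeps firing, then CE-mark only the packet finally delivered, so the extra mass is shed but the signal still goes out:

#include <stdio.h>

struct pkt { int id; int ect; int ce; };

/* A toy four-packet backlog standing in for one flow's queue. */
static struct pkt queue[4] = { {0,1,0}, {1,1,0}, {2,1,0}, {3,1,0} };
static int head = 0, tail = 4;

static struct pkt *dequeue_head(void)
{
    return head < tail ? &queue[head++] : NULL;
}

/* Stand-in for CoDel's sojourn-time test: fire twice, then relent. */
static int fires_left = 2;
static int control_law_says_drop(void) { return fires_left-- > 0; }

static struct pkt *dequeue_corrected(void)
{
    struct pkt *p;
    int dropped = 0;

    while ((p = dequeue_head()) && control_law_says_drop())
        dropped++;                 /* shed the extra mass */
    if (p && dropped && p->ect)
        p->ce = 1;                 /* one CE mark on the survivor */
    return p;                      /* NULL if the flow's queue emptied */
}

int main(void)
{
    struct pkt *p = dequeue_corrected();
    if (p)
        printf("delivered pkt %d, ce=%d\n", p->id, p->ce);
    return 0;
}

If the marking rate exceeds the flow's delivery rate, this degenerates toward plain dropping - exactly the overload behavior described above - and a too-small 'interval' makes that degeneration happen sooner.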
* Re: [Ecn-sane] per-flow scheduling 2019-07-17 23:23 ` Dave Taht 2019-07-18 0:20 ` Dave Taht @ 2019-07-18 15:02 ` David P. Reed 2019-07-18 16:06 ` Dave Taht 1 sibling, 1 reply; 49+ messages in thread From: David P. Reed @ 2019-07-18 15:02 UTC (permalink / raw) To: Dave Taht; +Cc: ecn-sane, Bob Briscoe, tsvwg IETF list Dave - The context of my remarks was about the end-to-end arguments for placing function in the Internet. To that end, that "you do not mind putting storage for low priority packets in the routers" doesn't matter, for two important reasons: 1) the idea that one should "throw in a feature" because people "don't mind" is exactly what leads to feature creep of the worst kind - features that serve absolutely no real purpose. That's what we rigorously objected to in the late 1970's. No, we would NOT throw in features as they were "requested" because we didn't mind. 2) you have made no argument that the function cannot be done properly at the ends, and no argument that putting it in the network is necessary for the ends to achieve storage. On Wednesday, July 17, 2019 7:23pm, "Dave Taht" <dave.taht@gmail.com> said: > On Wed, Jul 17, 2019 at 3:34 PM David P. Reed <dpreed@deepplum.com> wrote: >> >> A follow up point that I think needs to be made is one more end-to-end argument: >> >> It is NOT the job of the IP transport layer to provide free storage for low >> priority packets. The end-to-end argument here says: the ends can and must hold >> packets until they are either delivered or not relevant (in RTP, they become >> irrelevant when they get older than their desired delivery time, if you want an >> example of the latter), SO, the network should not provide the function of >> storage beyond the minimum needed to deal with transients. >> >> That means, unfortunately, that the dream of some kind of "background" path that >> stores "low priority" packets in the network fails the end-to-end argument test. > > I do not mind reserving a tiny portion of the network for "background" > traffic. This > is different (I think?) than storing low priority packets in the > network. A background > traffic "queue" of 1 packet would be fine.... > >> If you think about this, it even applies to some imaginary interplanetary IP >> layer network. Queueing delay is not a feature of any end-to-end requirement. >> >> What may be desired at the router/link level in an interplanetary IP layer is >> holding packets because a link is actually down, or using link-level error >> correction coding or retransmission to bring the error rate down to an acceptable >> level before declaring it down. But that's quite different - it's the link level >> protocol, which aims to deliver minimum queueing delay under tough conditions, >> without buffering more than needed for that (the number of bits that fit in the >> light-speed transmission at the transmission rate. > > As I outlined in my mit wifi talk - 1 layer of retry of at the wifi > mac layer made it > work, in 1998, and that seemed a very acceptable compromise at the > time. Present day > retries at the layer, not congestion controlled, is totally out of hand. > > In thinking about starlink's mac, and mobility, I gradulally came to > the conclusion that > 1 retry from satellites 550km up (3.6ms rtt) was needed, as much as I > disliked the idea. > > I still dislike retries at layer 2, even for nearby sats. really > complicates things. so for all I know I'll be advocating ripping 'em > out in starlink, if they are indeed, in there, next week. 
> >> So, the main reason I'm saying this is because again, there are those who want to >> implement the TCP function of reliable delivery of each packet in the links. >> That's a very bad idea. > > It was tried in the arpanet, and didn't work well there. There's a > good story about many > of the flaws of the Arpanet's design, including that problem, in the > latter half of Kleinrock's second book on queue theory, at least the > first edition... > > Wifi (and 345g) re-introduced the same problem with retransmits and > block acks at layer 2. > > and after dissecting my ecn battlemesh data and observing what the > retries at the mac layer STILL do on wifi with the current default > wifi codel target (20ms AFTER two txops are in the hardware) currently > achieve (50ms, which is 10x worse than what we could do and still > better performance under load than any other shipping physical layer > we have with fifos)... and after thinking hard about nagle's thought > that "every application has a right to one packet in the network", and > this very long thread reworking the end to end argument in a similar, > but not quite identical direction, I'm coming to a couple conclusions > I'd possibly not quite expressed well before. > > 1) transports should treat an RFC3168 CE coupled with loss (drop and > mark) as an even stronger signal of congestion than either, and that > this bit of the codel algorithm, > when ecn is in use, is wrong, and has always been wrong: > > https://github.com/dtaht/fq_codel_fast/blob/master/codel_impl.h#L178 > > (we added this arbitrarily to codel in the 5th day of development in > 2012. Using FQ masked it's effects on light traffic) > > What it should do instead is peek the queue and drop until it hits a > markable packet, at the very least. > > Pie has an arbitrary drop at 10% figure, which does lighten the load > some... cake used to have drop and mark also until a year or two > back... > > 2) At low rates and high contention, we really need pacing and fractional cwnd. > > (while I would very much like to see a dynamic reduction of MSS tried, > that too has a bottom limit) > > even then, drop as per bullet 1. > > 3) In the end, I could see a world with SCE marks, and CE being > obsoleted in favor of drop, or CE only being exerted on really light > loads similar to (or less than!) what the arbitrary 10% figure for pie > uses > > 4) in all cases, I vastly prefer somehow ultimately shifting greedy > transports to RTT rather than drop or CE as their primary congestion > control indicator. FQ makes that feasible today. With enough FQ > deployed for enough congestive scenarios and hardware, and RTT > becoming the core indicator for more transports, single queued designs > become possible in the distant future. > > >> >> On Wednesday, July 17, 2019 6:18pm, "David P. Reed" <dpreed@deepplum.com> said: >> >> > I do want to toss in my personal observations about the "end-to-end argument" >> > related to per-flow-scheduling. (Such arguments are, of course, a class of >> > arguments to which my name is attached. Not that I am a judge/jury of such >> > questions...) >> > >> > A core principle of the Internet design is to move function out of the >> network, >> > including routers and middleboxes, if those functions >> > >> > a) can be properly accomplished by the endpoints, and >> > b) are not relevant to all uses of the Internet transport fabric being used by >> the >> > ends. >> > >> > The rationale here has always seemed obvious to me. 
Like Bob Briscoe suggests, >> we >> > were very wary of throwing features into the network that would preclude >> > unanticipated future interoperability needs, new applications, and new >> technology >> > in the infrastructure of the Internet as a whole. >> > >> > So what are we talking about here (ignoring the fine points of SCE, some of >> which >> > I think are debatable - especially the focus on TCP alone, since much traffic >> will >> > likely move away from TCP in the near future. >> > >> > A second technical requirement (necessary invariant) of the Internet's >> transport >> > is that the entire Internet depends on rigorously stopping queueing delay from >> > building up anywhere except at the endpoints, where the ends can manage it.This >> is >> > absolutely critical, though it is peculiar in that many engineers, especially >> > those who work at the IP layer and below, have a mental model of routing as >> > essentially being about building up queueing delay (in order to manage priority >> in >> > some trivial way by building up the queue on purpose, apparently). >> > >> > This second technical requirement cannot be resolved merely by the endpoints. >> > The reason is that the endpoints cannot know accurately what host-host paths >> share >> > common queues. >> > >> > This lack of a way to "cooperate" among independent users of a queue cannot be >> > solved by a purely end-to-end solution. (well, I suppose some genius might >> invent >> > a way, but I have not seen one in my 36 years closely watching the Internet in >> > operation since it went live in 1983.) >> > >> > So, what the end-to-end argument would tend to do here, in my opinion, is to >> > provide the most minimal mechanism in the devices that are capable of building >> up >> > a queue in order to allow all the ends sharing that queue to do their job - >> which >> > is to stop filling up the queue! >> > >> > Only the endpoints can prevent filling up queues. And depending on the >> protocol, >> > they may need to make very different, yet compatible choices. >> > >> > This is a question of design at the architectural level. And the future >> matters. >> > >> > So there is an end-to-end argument to be made here, but it is a subtle one. >> > >> > The basic mechanism for controlling queue depth has been, and remains, quite >> > simple: dropping packets. This has two impacts: 1) immediately reducing >> queueing >> > delay, and 2) signalling to endpoints that are paying attention that they have >> > contributed to an overfull queue. >> > >> > The optimum queueing delay in a steady state would always be one packet or >> less. >> > Kleinrock has shown this in the last few years. Of course there aren't steady >> > states. But we don't want a mechanism that can't converge to that steady state >> > *quickly*, for all queues in the network. >> > >> > Another issue is that endpoints are not aware of the fact that packets can >> take >> > multiple paths to any destination. In the future, alternate path choices can >> be >> > made by routers (when we get smarter routing algorithms based on traffic >> > engineering). >> > >> > So again, some minimal kind of information must be exposed to endpoints that >> will >> > continue to communicate. Again, the routers must be able to help a wide variety >> of >> > endpoints with different use cases to decide how to move queue buildup out of >> the >> > network itself. >> > >> > Now the decision made by the endpoints must be made in the context of >> information >> > about fairness. 
Maybe this is what is not obvious. >> > >> > The most obvious notion of fairness is equal shares among source host, dest >> host >> > pairs. There are drawbacks to that, but the benefit of it is that it affects >> the >> > IP layer alone, and deals with lots of boundary cases like the case where a >> single >> > host opens a zillion TCP connections or uses lots of UDP source ports or >> > destinations to somehow "cheat" by appearing to have "lots of flows". >> > >> > Another way to deal with dividing up flows is to ignore higher level protocol >> > information entirely, and put the flow idenfitication in the IP layer. A 32-bit >> or >> > 64-bit random number could be added as an "option" to IP to somehow extend the >> > flow space. >> > >> > But that is not the most important thing today. >> > >> > I write this to say: >> > 1) some kind of per-flow queueing, during the transient state where a queue is >> > overloaded before packets are dropped would provide much needed information to >> the >> > ends of every flow sharing a common queue. >> > 2) per-flow queueing, minimized to a very low level, using IP envelope address >> > information (plus maybe UDP and TCP addresses for those protocols in an >> extended >> > address-based flow definition) is totally compatible with end-to-end >> arguments, >> > but ONLY if the decisions made are certain to drive queueing delay out of the >> > router to the endpoints. >> > >> > >> > >> > >> > On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" <moeller0@gmx.de> >> said: >> > >> >> Dear Bob, dear IETF team, >> >> >> >> >> >>> On Jun 19, 2019, at 16:12, Bob Briscoe <ietf@bobbriscoe.net> wrote: >> >>> >> >>> Jake, all, >> >>> >> >>> You may not be aware of my long history of concern about how per-flow >> scheduling >> >>> within endpoints and networks will limit the Internet in future. I find >> per-flow >> >>> scheduling a violation of the e2e principle in such a profound way - the >> dynamic >> >>> choice of the spacing between packets - that most people don't even associate >> it >> >>> with the e2e principle. >> >> >> >> This does not rhyme well with the L4S stated advantage of allowing >> packet >> >> reordering (due to mandating RACK for all L4S tcp endpoints). Because surely >> >> changing the order of packets messes up the "the dynamic choice of the >> spacing >> >> between packets" in a significant way. IMHO it is either L4S is great because >> it >> >> will give intermediate hops more leeway to re-order packets, or "a sender's >> >> packet spacing" is sacred, please make up your mind which it is. >> >> >> >>> >> >>> I detected that you were talking about FQ in a way that might have assumed >> my >> >>> concern with it was just about implementation complexity. If you (or anyone >> >>> watching) is not aware of the architectural concerns with per-flow >> scheduling, I >> >>> can enumerate them. >> >> >> >> Please do not hesitate to do so after your deserved holiday, and please >> state a >> >> superior alternative. >> >> >> >> Best Regards >> >> Sebastian >> >> >> >> >> >>> >> >>> I originally started working on what became L4S to prove that it was possible >> to >> >>> separate out reducing queuing delay from throughput scheduling. When Koen and >> I >> >>> started working together on this, we discovered we had identical concerns on >> >>> this. 
>> >>> >> >>> >> >>> >> >>> Bob >> >>> >> >>> >> >>> -- >> >>> ________________________________________________________________ >> >>> Bob Briscoe http://bobbriscoe.net/ >> >>> >> >>> _______________________________________________ >> >>> Ecn-sane mailing list >> >>> Ecn-sane@lists.bufferbloat.net >> >>> https://lists.bufferbloat.net/listinfo/ecn-sane >> >> >> >> _______________________________________________ >> >> Ecn-sane mailing list >> >> Ecn-sane@lists.bufferbloat.net >> >> https://lists.bufferbloat.net/listinfo/ecn-sane >> >> >> > >> > >> > _______________________________________________ >> > Ecn-sane mailing list >> > Ecn-sane@lists.bufferbloat.net >> > https://lists.bufferbloat.net/listinfo/ecn-sane >> > >> >> >> _______________________________________________ >> Ecn-sane mailing list >> Ecn-sane@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/ecn-sane > > > > -- > > Dave Täht > CTO, TekLibre, LLC > http://www.teklibre.com > Tel: 1-831-205-9740 > ^ permalink raw reply [flat|nested] 49+ messages in thread
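To make the host-pair fairness notion quoted above concrete, here is a minimal illustrative sketch in C. It is not from any real qdisc; all names are invented for illustration. Bucketing flows on the IP addresses alone means a host that opens a zillion TCP connections, or sprays UDP source ports, still lands in a single bucket and gains nothing, whereas a conventional 5-tuple hash spreads the same traffic across many buckets.

#include <stdint.h>
#include <stdio.h>

#define NBUCKETS 1024

/* Arbitrary 32-bit mixing function; any decent hash would do. */
static uint32_t mix(uint32_t a, uint32_t b)
{
    uint32_t h = a * 0x9e3779b1u ^ b * 0x85ebca6bu;
    h ^= h >> 16;
    h *= 0x7feb352du;
    h ^= h >> 15;
    return h;
}

/* Host-pair fairness: IP addresses only, ports deliberately ignored. */
static unsigned bucket_hostpair(uint32_t saddr, uint32_t daddr)
{
    return mix(saddr, daddr) % NBUCKETS;
}

/* Conventional 5-tuple flow fairness, for contrast. */
static unsigned bucket_5tuple(uint32_t saddr, uint32_t daddr,
                              uint16_t sport, uint16_t dport, uint8_t proto)
{
    return mix(mix(saddr, daddr),
               ((uint32_t)sport << 16 | dport) ^ proto) % NBUCKETS;
}

int main(void)
{
    uint32_t src = 0x0a000001, dst = 0x0a000002;  /* 10.0.0.1 -> 10.0.0.2 */

    /* Many connections between one pair of hosts share a single
     * host-pair bucket, but scatter across 5-tuple buckets. */
    for (uint16_t sport = 1000; sport < 1004; sport++)
        printf("sport %u: host-pair bucket %u, 5-tuple bucket %u\n",
               sport, bucket_hostpair(src, dst),
               bucket_5tuple(src, dst, sport, 80, 6));
    return 0;
}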
* Re: [Ecn-sane] per-flow scheduling 2019-07-18 15:02 ` David P. Reed @ 2019-07-18 16:06 ` Dave Taht 0 siblings, 0 replies; 49+ messages in thread From: Dave Taht @ 2019-07-18 16:06 UTC (permalink / raw) To: David P. Reed; +Cc: ecn-sane, Bob Briscoe, tsvwg IETF list On Thu, Jul 18, 2019 at 8:02 AM David P. Reed <dpreed@deepplum.com> wrote: > > Dave - > The context of my remarks was about the end-to-end arguments for placing function in the Internet. > > To that end, that "you do not mind putting storage for low priority packets in the routers" doesn't matter, for two important reasons: > > 1) the idea that one should "throw in a feature" because people "don't mind" is exactly what leads to feature creep of the worst kind - features that serve absolutely no real purpose. That's what we rigorously objected to in the late 1970's. No, we would NOT throw in features as they were "requested" because we didn't mind. I dig it. :) If only the 5G folk had had your approach..... > > 2) you have made no argument that the function cannot be done properly at the ends, and no argument that putting it in the network is necessary for the ends to achieve storage. You are correct. > On Wednesday, July 17, 2019 7:23pm, "Dave Taht" <dave.taht@gmail.com> said: > > > On Wed, Jul 17, 2019 at 3:34 PM David P. Reed <dpreed@deepplum.com> wrote: > >> > >> A follow up point that I think needs to be made is one more end-to-end argument: > >> > >> It is NOT the job of the IP transport layer to provide free storage for low > >> priority packets. The end-to-end argument here says: the ends can and must hold > >> packets until they are either delivered or not relevant (in RTP, they become > >> irrelevant when they get older than their desired delivery time, if you want an > >> example of the latter), SO, the network should not provide the function of > >> storage beyond the minimum needed to deal with transients. > >> > >> That means, unfortunately, that the dream of some kind of "background" path that > >> stores "low priority" packets in the network fails the end-to-end argument test. > > > > I do not mind reserving a tiny portion of the network for "background" > > traffic. This > > is different (I think?) than storing low priority packets in the > > network. A background > > traffic "queue" of 1 packet would be fine.... > > > >> If you think about this, it even applies to some imaginary interplanetary IP > >> layer network. Queueing delay is not a feature of any end-to-end requirement. > >> > >> What may be desired at the router/link level in an interplanetary IP layer is > >> holding packets because a link is actually down, or using link-level error > >> correction coding or retransmission to bring the error rate down to an acceptable > >> level before declaring it down. But that's quite different - it's the link level > >> protocol, which aims to deliver minimum queueing delay under tough conditions, > >> without buffering more than needed for that (the number of bits that fit in the > >> light-speed transmission at the transmission rate). > > > > As I outlined in my mit wifi talk - 1 layer of retry at the wifi > > mac layer made it > > work, in 1998, and that seemed a very acceptable compromise at the > > time. Present day > > retries at that layer, not congestion controlled, are totally out of hand. > > > > In thinking about starlink's mac, and mobility, I gradually came to > > the conclusion that > > 1 retry from satellites 550km up (3.6ms rtt) was needed, as much as I > > disliked the idea. 
> > > > I still dislike retries at layer 2, even for nearby sats. really > > complicates things. so for all I know I'll be advocating ripping 'em > > out in starlink, if they are indeed, in there, next week. > > > >> So, the main reason I'm saying this is because again, there are those who want to > >> implement the TCP function of reliable delivery of each packet in the links. > >> That's a very bad idea. > > > > It was tried in the arpanet, and didn't work well there. There's a > > good story about many > > of the flaws of the Arpanet's design, including that problem, in the > > latter half of Kleinrock's second book on queueing theory, at least the > > first edition... > > > > Wifi (and 345g) re-introduced the same problem with retransmits and > > block acks at layer 2. > > > > and after dissecting my ecn battlemesh data and observing what the > > retries at the mac layer STILL do on wifi with the current default > > wifi codel target (20ms AFTER two txops are in the hardware) currently > > achieve (50ms, which is 10x worse than what we could do and still > > better performance under load than any other shipping physical layer > > we have with fifos)... and after thinking hard about Nagle's thought > > that "every application has a right to one packet in the network", and > > this very long thread reworking the end to end argument in a similar, > > but not quite identical direction, I'm coming to a couple conclusions > > I'd possibly not quite expressed well before. > > > > 1) transports should treat an RFC3168 CE coupled with loss (drop and > > mark) as an even stronger signal of congestion than either, and that > > this bit of the codel algorithm, > > when ecn is in use, is wrong, and has always been wrong: > > > > https://github.com/dtaht/fq_codel_fast/blob/master/codel_impl.h#L178 > > > > (we added this arbitrarily to codel in the 5th day of development in > > 2012. Using FQ masked its effects on light traffic) > > > > What it should do instead is peek the queue and drop until it hits a > > markable packet, at the very least. > > > > Pie has an arbitrary drop at 10% figure, which does lighten the load > > some... cake used to have drop and mark also until a year or two > > back... > > > > 2) At low rates and high contention, we really need pacing and fractional cwnd. > > > > (while I would very much like to see a dynamic reduction of MSS tried, > > that too has a bottom limit) > > > > even then, drop as per bullet 1. > > > > 3) In the end, I could see a world with SCE marks, and CE being > > obsoleted in favor of drop, or CE only being exerted on really light > > loads similar to (or less than!) what the arbitrary 10% figure for pie > > uses > > > > 4) in all cases, I vastly prefer somehow ultimately shifting greedy > > transports to RTT rather than drop or CE as their primary congestion > > control indicator. FQ makes that feasible today. With enough FQ > > deployed for enough congestive scenarios and hardware, and RTT > > becoming the core indicator for more transports, single queued designs > > become possible in the distant future. > > > > > >> > >> On Wednesday, July 17, 2019 6:18pm, "David P. Reed" <dpreed@deepplum.com> said: > >> > >> > I do want to toss in my personal observations about the "end-to-end argument" > >> > related to per-flow-scheduling. (Such arguments are, of course, a class of > >> > arguments to which my name is attached. Not that I am a judge/jury of such > >> > questions...) 
> >> > > >> > A core principle of the Internet design is to move function out of the > >> network, > >> > including routers and middleboxes, if those functions > >> > > >> > a) can be properly accomplished by the endpoints, and > >> > b) are not relevant to all uses of the Internet transport fabric being used by > >> the > >> > ends. > >> > > >> > The rationale here has always seemed obvious to me. Like Bob Briscoe suggests, > >> we > >> > were very wary of throwing features into the network that would preclude > >> > unanticipated future interoperability needs, new applications, and new > >> technology > >> > in the infrastructure of the Internet as a whole. > >> > > >> > So what are we talking about here (ignoring the fine points of SCE, some of > >> which > >> > I think are debatable - especially the focus on TCP alone, since much traffic > >> will > >> > likely move away from TCP in the near future). > >> > > >> > A second technical requirement (necessary invariant) of the Internet's > >> transport > >> > is that the entire Internet depends on rigorously stopping queueing delay from > >> > building up anywhere except at the endpoints, where the ends can manage it. This > >> is > >> > absolutely critical, though it is peculiar in that many engineers, especially > >> > those who work at the IP layer and below, have a mental model of routing as > >> > essentially being about building up queueing delay (in order to manage priority > >> in > >> > some trivial way by building up the queue on purpose, apparently). > >> > > >> > This second technical requirement cannot be resolved merely by the endpoints. > >> > The reason is that the endpoints cannot know accurately what host-host paths > >> share > >> > common queues. > >> > > >> > This lack of a way to "cooperate" among independent users of a queue cannot be > >> > solved by a purely end-to-end solution. (well, I suppose some genius might > >> invent > >> > a way, but I have not seen one in my 36 years closely watching the Internet in > >> > operation since it went live in 1983.) > >> > > >> > So, what the end-to-end argument would tend to do here, in my opinion, is to > >> > provide the most minimal mechanism in the devices that are capable of building > >> up > >> > a queue in order to allow all the ends sharing that queue to do their job - > >> which > >> > is to stop filling up the queue! > >> > > >> > Only the endpoints can prevent filling up queues. And depending on the > >> protocol, > >> > they may need to make very different, yet compatible choices. > >> > > >> > This is a question of design at the architectural level. And the future > >> matters. > >> > > >> > So there is an end-to-end argument to be made here, but it is a subtle one. > >> > > >> > The basic mechanism for controlling queue depth has been, and remains, quite > >> > simple: dropping packets. This has two impacts: 1) immediately reducing > >> queueing > >> > delay, and 2) signalling to endpoints that are paying attention that they have > >> > contributed to an overfull queue. > >> > > >> > The optimum queueing delay in a steady state would always be one packet or > >> less. > >> > Kleinrock has shown this in the last few years. Of course there aren't steady > >> > states. But we don't want a mechanism that can't converge to that steady state > >> > *quickly*, for all queues in the network. > >> > > >> > Another issue is that endpoints are not aware of the fact that packets can > >> take > >> > multiple paths to any destination. 
In the future, alternate path choices can > >> be > >> > made by routers (when we get smarter routing algorithms based on traffic > >> > engineering). > >> > > >> > So again, some minimal kind of information must be exposed to endpoints that > >> will > >> > continue to communicate. Again, the routers must be able to help a wide variety > >> of > >> > endpoints with different use cases to decide how to move queue buildup out of > >> the > >> > network itself. > >> > > >> > Now the decision made by the endpoints must be made in the context of > >> information > >> > about fairness. Maybe this is what is not obvious. > >> > > >> > The most obvious notion of fairness is equal shares among source host, dest > >> host > >> > pairs. There are drawbacks to that, but the benefit of it is that it affects > >> the > >> > IP layer alone, and deals with lots of boundary cases like the case where a > >> single > >> > host opens a zillion TCP connections or uses lots of UDP source ports or > >> > destinations to somehow "cheat" by appearing to have "lots of flows". > >> > > >> > Another way to deal with dividing up flows is to ignore higher level protocol > >> > information entirely, and put the flow identification in the IP layer. A 32-bit > >> or > >> > 64-bit random number could be added as an "option" to IP to somehow extend the > >> > flow space. > >> > > >> > But that is not the most important thing today. > >> > > >> > I write this to say: > >> > 1) some kind of per-flow queueing, during the transient state where a queue is > >> > overloaded before packets are dropped would provide much needed information to > >> the > >> > ends of every flow sharing a common queue. > >> > 2) per-flow queueing, minimized to a very low level, using IP envelope address > >> > information (plus maybe UDP and TCP addresses for those protocols in an > >> extended > >> > address-based flow definition) is totally compatible with end-to-end > >> arguments, > >> > but ONLY if the decisions made are certain to drive queueing delay out of the > >> > router to the endpoints. > >> > > >> > > >> > > >> > > >> > On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" <moeller0@gmx.de> > >> said: > >> > > >> >> Dear Bob, dear IETF team, > >> >> > >> >> > >> >>> On Jun 19, 2019, at 16:12, Bob Briscoe <ietf@bobbriscoe.net> wrote: > >> >>> > >> >>> Jake, all, > >> >>> > >> >>> You may not be aware of my long history of concern about how per-flow > >> scheduling > >> >>> within endpoints and networks will limit the Internet in future. I find > >> per-flow > >> >>> scheduling a violation of the e2e principle in such a profound way - the > >> dynamic > >> >>> choice of the spacing between packets - that most people don't even associate > >> it > >> >>> with the e2e principle. > >> >> > >> >> This does not rhyme well with the L4S stated advantage of allowing > >> packet > >> >> reordering (due to mandating RACK for all L4S tcp endpoints). Because surely > >> >> changing the order of packets messes up the "the dynamic choice of the > >> spacing > >> >> between packets" in a significant way. IMHO it is either L4S is great because > >> it > >> >> will give intermediate hops more leeway to re-order packets, or "a sender's > >> >> packet spacing" is sacred, please make up your mind which it is. > >> >> > >> >>> > >> >>> I detected that you were talking about FQ in a way that might have assumed > >> my > >> >>> concern with it was just about implementation complexity. 
If you (or anyone > >> >>> watching) is not aware of the architectural concerns with per-flow > >> scheduling, I > >> >>> can enumerate them. > >> >> > >> >> Please do not hesitate to do so after your deserved holiday, and please > >> state a > >> >> superior alternative. > >> >> > >> >> Best Regards > >> >> Sebastian > >> >> > >> >> > >> >>> > >> >>> I originally started working on what became L4S to prove that it was possible > >> to > >> >>> separate out reducing queuing delay from throughput scheduling. When Koen and > >> I > >> >>> started working together on this, we discovered we had identical concerns on > >> >>> this. > >> >>> > >> >>> > >> >>> > >> >>> Bob > >> >>> > >> >>> > >> >>> -- > >> >>> ________________________________________________________________ > >> >>> Bob Briscoe http://bobbriscoe.net/ > >> >>> > >> >>> _______________________________________________ > >> >>> Ecn-sane mailing list > >> >>> Ecn-sane@lists.bufferbloat.net > >> >>> https://lists.bufferbloat.net/listinfo/ecn-sane > >> >> > >> >> _______________________________________________ > >> >> Ecn-sane mailing list > >> >> Ecn-sane@lists.bufferbloat.net > >> >> https://lists.bufferbloat.net/listinfo/ecn-sane > >> >> > >> > > >> > > >> > _______________________________________________ > >> > Ecn-sane mailing list > >> > Ecn-sane@lists.bufferbloat.net > >> > https://lists.bufferbloat.net/listinfo/ecn-sane > >> > > >> > >> > >> _______________________________________________ > >> Ecn-sane mailing list > >> Ecn-sane@lists.bufferbloat.net > >> https://lists.bufferbloat.net/listinfo/ecn-sane > > > > > > > > -- > > > > Dave Täht > > CTO, TekLibre, LLC > > http://www.teklibre.com > > Tel: 1-831-205-9740 > > > > -- Dave Täht CTO, TekLibre, LLC http://www.teklibre.com Tel: 1-831-205-9740 ^ permalink raw reply [flat|nested] 49+ messages in thread
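A minimal sketch of the codel change Dave Taht describes above: when the control law fires, drop from the head of the queue until a markable (ECT) packet is found and CE-mark that one, rather than unconditionally marking whatever is at the head when ECN is negotiated. This is illustrative C, not the actual fq_codel_fast code; the packet and queue types are invented for the example.

#include <stdio.h>
#include <stdlib.h>

struct pkt {
    struct pkt *next;
    int ect;   /* 1 if the packet is ECN-capable (ECT codepoint) */
    int ce;    /* set to 1 once CE-marked */
};

struct queue {
    struct pkt *head, *tail;
};

static struct pkt *queue_pop(struct queue *q)
{
    struct pkt *p = q->head;
    if (p) {
        q->head = p->next;
        if (!q->head)
            q->tail = NULL;
    }
    return p;
}

/* Called when the control law decides this dequeue deserves a
 * congestion signal. Drop non-markable packets until an ECT one
 * appears, so the signal also sheds load rather than only marking.
 * Returns the packet to forward, or NULL if the queue drained. */
static struct pkt *codel_signal(struct queue *q, int *dropped)
{
    struct pkt *p;

    while ((p = queue_pop(q)) != NULL) {
        if (p->ect) {
            p->ce = 1;        /* first markable packet carries the CE */
            return p;
        }
        free(p);              /* non-ECT: drop, reducing the backlog */
        (*dropped)++;
    }
    return NULL;
}

int main(void)
{
    struct queue q = { 0, 0 };
    struct pkt *p;
    int dropped = 0;

    /* Build a queue of one non-ECT packet followed by one ECT packet. */
    for (int i = 0; i < 2; i++) {
        p = calloc(1, sizeof(*p));
        p->ect = i;
        if (q.tail) q.tail->next = p; else q.head = p;
        q.tail = p;
    }

    p = codel_signal(&q, &dropped);
    printf("dropped %d, forwarded packet ce=%d\n", dropped, p ? p->ce : -1);
    free(p);
    return 0;
}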
* Re: [Ecn-sane] per-flow scheduling 2019-07-17 22:18 ` David P. Reed 2019-07-17 22:34 ` David P. Reed @ 2019-07-18 4:31 ` Jonathan Morton 2019-07-18 15:52 ` David P. Reed 2019-07-18 5:24 ` [Ecn-sane] " Jonathan Morton 2 siblings, 1 reply; 49+ messages in thread From: Jonathan Morton @ 2019-07-18 4:31 UTC (permalink / raw) To: David P. Reed; +Cc: Sebastian Moeller, ecn-sane, Bob Briscoe, tsvwg IETF list > On 18 Jul, 2019, at 1:18 am, David P. Reed <dpreed@deepplum.com> wrote: > > So what are we talking about here (ignoring the fine points of SCE, some of which I think are debatable - especially the focus on TCP alone, since much traffic will likely move away from TCP in the near future). As a point of order, SCE is not specific to TCP. TCP is merely the most convenient congestion-aware protocol to experiment with, and therefore the one we have adapted first. Other protocols which already are (or aspire to be) TCP friendly, especially QUIC, should also be straightforward to adapt to SCE. I should also note that TCP is the de-facto gold standard, by which all other congestion control is measured, for better or worse. SCE is included in this, insofar as competing reasonably with standard TCP flows under all reasonable network conditions is necessary to introduce a new congestion control paradigm. This, I think, is also part of the end-to-end principle. - Jonathan Morton ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] per-flow scheduling 2019-07-18 4:31 ` Jonathan Morton @ 2019-07-18 15:52 ` David P. Reed 2019-07-18 18:12 ` [Ecn-sane] [tsvwg] " Dave Taht 0 siblings, 1 reply; 49+ messages in thread From: David P. Reed @ 2019-07-18 15:52 UTC (permalink / raw) To: Jonathan Morton; +Cc: Sebastian Moeller, ecn-sane, Bob Briscoe, tsvwg IETF list On Thursday, July 18, 2019 12:31am, "Jonathan Morton" <chromatix99@gmail.com> said: >> On 18 Jul, 2019, at 1:18 am, David P. Reed <dpreed@deepplum.com> wrote: >> >> So what are we talking about here (ignoring the fine points of SCE, some of which >> I think are debatable - especially the focus on TCP alone, since much traffic >> will likely move away from TCP in the near future). > > As a point of order, SCE is not specific to TCP. TCP is merely the most > convenient congestion-aware protocol to experiment with, and therefore the one we > have adapted first. Other protocols which already are (or aspire to be) TCP > friendly, especially QUIC, should also be straightforward to adapt to SCE. > I agree that it is important to show that SCE in the IP layer can be interpreted in each congestion management system, and TCP is a major one. Ideally there would be a general theory that is protocol and use case agnostic, so that the functions in IP are helpful both in particular protocols and also in the very important case of interactions between different coexisting protocols. I believe that SCE can be structured so that it serves that purpose as more and more protocols that are based on UDP generate more and more traffic - when we designed UDP we decided NOT to try to put congestion management in UDP deliberately, for two reasons: 1) we didn't yet have a good congestion management approach for TCP, and 2) major use cases that argued for UDP (packet speech, in particular, but lots of computer-computer interactions on LANs, such as Remote Procedure Calls, etc.) were known to require different approaches to congestion management beyond the basic packet-drop (such as rate management via compression variation). UDP was part of the design choice to allow end-to-end agreement about congestion management implementation. We now have a very complex protocol due to the WWW, that works imperfectly on TCP. Thus, a new UDP based protocol framework is proposed, and will be quite heavily used in access networks that need congestion management, both at the server end and the client end. And we have heavy use of media streaming (though it matches TCP adequately well, being file-transfer-like due to buffering at the receiving end). Google and others are working hard to transition entirely away from HTTP/TCP to HTTP/QUIC/UDP. This transition will be concurrent, if not prior, to SCE integration into IP. I would hope that QUIC could use SCE to great advantage, especially in helping the co-existence of two competing uses for the same bottleneck paths without queueing delay. That's the case that matters to me, along with RTP and other uses. From browser-level monitoring, we already see many landing web pages open up HTTP requests to 100's of different server hosts concurrently. Yes, that is hundreds for one, count them, one click. This is not a bad thing. The designers of the Internet should not be allowed to say that it is wrong. Because it isn't wrong - it's exactly what the Internet is supposed to be able to do! However, the browser or its host must have the information to avoid queue overflow in this new protocol. 
That means a useful means like SCE is needed. It also, I believe, means that arbitration based on "flows" matters. So per-flow interactions matter. I don't know, but I believe that when lots of browsers end up sharing a bottleneck link/queue, per-flow scheduling may help a reasonable amount, primarily by preventing starvation of resources. (In scheduling of parallel computing, we call that "prevention of livelock". And when you have a hundred processors on a computer - which is what my day job supports - you get livelock ALL the time if you don't guarantee that all contenders on a resource get a chance.) What does NOT matter is some complex (intserv/diffserv) differentiation at the router level, or at least not much. > I should also note that TCP is the de-facto gold standard, by which all other > congestion control is measured, for better or worse. SCE is included in this, > insofar as competing reasonably with standard TCP flows under all reasonable > network conditions is necessary to introduce a new congestion control paradigm. > This, I think, is also part of the end-to-end principle. > > - Jonathan Morton ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] [tsvwg] per-flow scheduling 2019-07-18 15:52 ` David P. Reed @ 2019-07-18 18:12 ` Dave Taht 0 siblings, 0 replies; 49+ messages in thread From: Dave Taht @ 2019-07-18 18:12 UTC (permalink / raw) To: David P. Reed Cc: Jonathan Morton, ecn-sane, Sebastian Moeller, tsvwg IETF list "David P. Reed" <dpreed@deepplum.com> writes: > On Thursday, July 18, 2019 12:31am, "Jonathan Morton" <chromatix99@gmail.com> said: > >>> On 18 Jul, 2019, at 1:18 am, David P. Reed <dpreed@deepplum.com> wrote: >>> >>> So what are we talking about here (ignoring the fine points of SCE, some of which >>> I think are debatable - especially the focus on TCP alone, since much traffic >>> will likely move away from TCP in the near future). >> >> As a point of order, SCE is not specific to TCP. TCP is merely the most >> convenient congestion-aware protocol to experiment with, and therefore the one we >> have adapted first. Other protocols which already are (or aspire to be) TCP >> friendly, especially QUIC, should also be straightforward to adapt to SCE. >> > I agree that it is important to show that SCE in the IP layer can be > interpreted in each congestion management system, and TCP is a major > one. Ideally there would be a general theory that is protocol and use > case agnostic, so that the functions in IP are helpful both in > particular protocols and also in the very important case of > interactions between different coexisting protocols. I believe that > SCE can be structured so that it serves that purpose as more and more > protocols that are based on UDP generate more and more traffic - > when we designed UDP we decided NOT to try to put congestion management in UDP deliberately, for two reasons: > 1) we didn't yet have a good congestion management approach for TCP, and > 2) major use cases that argued for UDP (packet speech, in particular, > but lots of computer-computer interactions on LANs, such as Remote > Procedure Calls, etc.) were known to require different approaches to > congestion management beyond the basic packet-drop (such as rate > management via compression variation). > UDP was part of the design choice to allow end-to-end agreement about congestion management implementation. > We now have a very complex protocol due to the WWW, that works imperfectly on TCP. Thus, a new UDP based protocol framework is proposed, > and will be quite heavily used in access networks that need congestion > management, both at the server end and the client end. > And we have heavy use of media streaming (though it matches TCP > adequately well, being file-transfer-like due to buffering at the > receiving end). > > Google and others are working hard to transition entirely away from > HTTP/TCP to HTTP/QUIC/UDP. This transition will be concurrent, if not > prior, to SCE integration into IP. I would hope that QUIC could use > SCE to great advantage, especially in helping the co-existence of two > competing uses for the same bottleneck paths without queueing delay. > > That's the case that matters to me, along with RTP and other > uses. These are my own primary foci as well. I think SCE would be great on videoconferencing apps. I'd *really* like to be working on QUIC for a variety of reasons and in a variety of situations and daemons and libraries, but lacking funding for *anything* at present best I can do is urge the work continue. I sat down to try and add SCE to gcc ("google congestion control") a few weeks back and got befuddled by just how hard it is to build a browser from scratch nowadays. 
> From browser-level monitoring, we already see many landing web > pages open up HTTP requests to 100's of different server hosts > concurrently. Yes, that is hundreds for one, count them, one click. I kind of regard that number as high, but what counts for me is that these are *persistent* mobile capable connections that can be "nailed up" just how tcp used to be. nailing up tcp connections to anywhere has become increasingly difficult. prior work on mosh and wireguard is also pointing the way towards less tcp in the internet. I was trying to find the epic reddit discussion of this piece of Avery's regarding ipv6 adoption, can't find it, pointing here: https://apenwarr.ca/log/20170810 > > This is not a bad thing. The designers of the Internet should not be > allowed to say that it is wrong. Because it isn't wrong - it's exactly > what the Internet is supposed to be able to do! However, the browser > or its host must have the information to avoid queue overflow in this > new protocol. That means a useful means like SCE is needed. > > It also, I believe, means that arbitration based on "flows" matters. So > per-flow interactions matter. I don't know, but I believe that when > lots of browsers end up sharing a bottleneck link/queue, per-flow > scheduling may help a reasonable amount, primarily by preventing > starvation of resources. (In scheduling of parallel computing, we call > that "prevention of livelock". And when you have a hundred processors > on a computer - which is what my day job supports - you get livelock > ALL the time if you don't guarantee that all contenders on a resource > get a chance.) What does NOT matter is some complex > (intserv/diffserv) differentiation at the router level, or at least > not much. yep. > >> I should also note that TCP is the de-facto gold standard, by which all other >> congestion control is measured, for better or worse. SCE is included in this, >> insofar as competing reasonably with standard TCP flows under all reasonable >> network conditions is necessary to introduce a new congestion control paradigm. >> This, I think, is also part of the end-to-end principle. >> >> - Jonathan Morton ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] per-flow scheduling 2019-07-17 22:18 ` David P. Reed 2019-07-17 22:34 ` David P. Reed 2019-07-18 4:31 ` Jonathan Morton @ 2019-07-18 5:24 ` Jonathan Morton 2019-07-22 13:44 ` Bob Briscoe 2 siblings, 1 reply; 49+ messages in thread From: Jonathan Morton @ 2019-07-18 5:24 UTC (permalink / raw) To: David P. Reed; +Cc: Sebastian Moeller, ecn-sane, Bob Briscoe, tsvwg IETF list Quoting selectively: > On 18 Jul, 2019, at 1:18 am, David P. Reed <dpreed@deepplum.com> wrote: > > This lack of a way to "cooperate" among independent users of a queue cannot be solved by a purely end-to-end solution. (well, I suppose some genius might invent a way, but I have not seen one in my 36 years closely watching the Internet in operation since it went live in 1983.) > > So, what the end-to-end argument would tend to do here, in my opinion, is to provide the most minimal mechanism in the devices that are capable of building up a queue in order to allow all the ends sharing that queue to do their job - which is to stop filling up the queue! > The optimum queueing delay in a steady state would always be one packet or less. Kleinrock has shown this in the last few years. Of course there aren't steady states. But we don't want a mechanism that can't converge to that steady state *quickly*, for all queues in the network. > The most obvious notion of fairness is equal shares among source host, dest host pairs. There are drawbacks to that, but the benefit of it is that it affects the IP layer alone, and deals with lots of boundary cases like the case where a single host opens a zillion TCP connections or uses lots of UDP source ports or destinations to somehow "cheat" by appearing to have "lots of flows". > I write this to say: > 1) some kind of per-flow queueing, during the transient state where a queue is overloaded before packets are dropped would provide much needed information to the ends of every flow sharing a common queue. > 2) per-flow queueing, minimized to a very low level, using IP envelope address information (plus maybe UDP and TCP addresses for those protocols in an extended address-based flow definition) is totally compatible with end-to-end arguments, but ONLY if the decisions made are certain to drive queueing delay out of the router to the endpoints. These are points that I can agree with quite easily, and which are reflected in my work to date. Although I don't usually quote Kleinrock by name, the principle of always permitting one MTU per flow in the queue is fairly obvious. One of the things that per-flow queuing provides to endpoints is differential treatment in AQM. This is even true of LFQ, although the mechanism may not be obvious to all since there is only one set of AQM state; at minimum, AQM signals are suppressed for sparse flows and thus only provided to saturating flows. SCE marking would be based on individual packets' sojourn times, which are logically independent from their physical position in the queue. Careful implementation of a Codel-type AQM would also suppress CE marks (or drops) from well-behaved flows whose sojourn times are below the target, even if unresponsive flows are also present in the same queue, without losing accumulated state about the latter; this is I think already a property of COBALT (as implemented in Cake). Other AQMs which convert a sojourn time more-or-less directly into a marking rate would also be a good fit. 
I would only quibble that providing per-L4-flow fairness *within* a per-host or per-subscriber fairness structure is also feasible, in at least some contexts; Cake implements that, for example. I hope to be able to amplify the LFQ draft to show how to provide that in a more lightweight manner than Cake manages, on my way to Montreal. I may also publish CNQ (Cheap Nasty Queuing) as a straw-man draft at the same time, depending on my mood. It should be good for light relief if nothing else. It's even lighter-weight than LFQ - but, unlike LFQ, achieves this at the expense of performance. It maintains only enough state to prioritise sparse flows, with a rather strict definition of "sparse". - Jonathan Morton ^ permalink raw reply [flat|nested] 49+ messages in thread
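As a rough illustration of the AQM style mentioned above, which converts a packet's sojourn time more-or-less directly into a marking rate, here is a minimal sketch. The 1 ms and 5 ms thresholds are assumptions for illustration, not values from any draft: below the floor a packet is never marked, above the ceiling always, with the probability ramping linearly in between, so a well-behaved flow whose packets stay under the floor never sees a mark even when sharing the queue with heavier traffic.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative thresholds only. */
static const long floor_us = 1000;   /* below 1 ms: never mark */
static const long ceil_us  = 5000;   /* above 5 ms: always mark */

/* Marking probability ramps linearly from 0 at the floor to 1 at
 * the ceiling, as a direct function of this packet's sojourn time. */
static bool ramp_mark(long sojourn_us)
{
    if (sojourn_us <= floor_us)
        return false;
    if (sojourn_us >= ceil_us)
        return true;
    long span = ceil_us - floor_us;
    return (rand() % span) < (sojourn_us - floor_us);
}

int main(void)
{
    /* Rough empirical check of the marking rate at a 3 ms sojourn. */
    int marks = 0;
    for (int i = 0; i < 100000; i++)
        marks += ramp_mark(3000);
    printf("marked %.1f%% (expect ~50%%)\n", marks / 1000.0);
    return 0;
}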
* Re: [Ecn-sane] per-flow scheduling 2019-07-18 5:24 ` [Ecn-sane] " Jonathan Morton @ 2019-07-22 13:44 ` Bob Briscoe 2019-07-23 5:00 ` Jonathan Morton 2019-07-23 15:12 ` [Ecn-sane] [tsvwg] " Kyle Rose 0 siblings, 2 replies; 49+ messages in thread From: Bob Briscoe @ 2019-07-22 13:44 UTC (permalink / raw) To: Jonathan Morton, David P. Reed; +Cc: ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 5215 bytes --] Folks, As promised, I've pulled together and uploaded the main architectural arguments about per-flow scheduling that cause concern: Per-Flow Scheduling and the End-to-End Argument <http://bobbriscoe.net/projects/latency/per-flow_tr.pdf> It runs to 6 pages of reading. But I tried to make the time readers will have to spend worth it. I have to get on with other stuff for a while. I've caught up with /reading/ this thread (after returning from my break), but refrained from responding to some individual points. Will try to do that soon. Finally, I want to emphasize that the purpose is for advocates of per-flow scheduling to understand that there is a coherent world view that agrees with the arguments in this paper. If you have a different set of assumptions and perspectives that leads you to advocate per-flow scheduling and disagree with some of these arguments, that's to be expected. The purpose is to explain why some people don't want FQ, and therefore why it's important to leave the choice open between FQ and DualQ. That has always been the intent of L4S, which supports both. Bob On 18/07/2019 06:24, Jonathan Morton wrote: > Quoting selectively: > >> On 18 Jul, 2019, at 1:18 am, David P. Reed <dpreed@deepplum.com> wrote: >> >> This lack of a way to "cooperate" among independent users of a queue cannot be solved by a purely end-to-end solution. (well, I suppose some genius might invent a way, but I have not seen one in my 36 years closely watching the Internet in operation since it went live in 1983.) >> >> So, what the end-to-end argument would tend to do here, in my opinion, is to provide the most minimal mechanism in the devices that are capable of building up a queue in order to allow all the ends sharing that queue to do their job - which is to stop filling up the queue! >> The optimum queueing delay in a steady state would always be one packet or less. Kleinrock has shown this in the last few years. Of course there aren't steady states. But we don't want a mechanism that can't converge to that steady state *quickly*, for all queues in the network. >> The most obvious notion of fairness is equal shares among source host, dest host pairs. There are drawbacks to that, but the benefit of it is that it affects the IP layer alone, and deals with lots of boundary cases like the case where a single host opens a zillion TCP connections or uses lots of UDP source ports or destinations to somehow "cheat" by appearing to have "lots of flows". >> I write this to say: >> 1) some kind of per-flow queueing, during the transient state where a queue is overloaded before packets are dropped would provide much needed information to the ends of every flow sharing a common queue. >> 2) per-flow queueing, minimized to a very low level, using IP envelope address information (plus maybe UDP and TCP addresses for those protocols in an extended address-based flow definition) is totally compatible with end-to-end arguments, but ONLY if the decisions made are certain to drive queueing delay out of the router to the endpoints. 
> These are points that I can agree with quite easily, and which are reflected in my work to date. Although I don't usually quote Kleinrock by name, the principle of always permitting one MTU per flow in the queue is fairly obvious. > > One of the things that per-flow queuing provides to endpoints is differential treatment in AQM. This is even true of LFQ, although the mechanism may not be obvious to all since there is only one set of AQM state; at minimum, AQM signals are suppressed for sparse flows and thus only provided to saturating flows. SCE marking would be based on individual packets' sojourn times, which are logically independent from their physical position in the queue. Careful implementation of a Codel-type AQM would also suppress CE marks (or drops) from well-behaved flows whose sojourn times are below the target, even if unresponsive flows are also present in the same queue, without losing accumulated state about the latter; this is I think already a property of COBALT (as implemented in Cake). Other AQMs which convert a sojourn time more-or-less directly into a marking rate would also be a good fit. > > I would only quibble that providing per-L4-flow fairness *within* a per-host or per-subscriber fairness structure is also feasible, in at least some contexts; Cake implements that, for example. I hope to be able to amplify the LFQ draft to show how to provide that in a more lightweight manner than Cake manages, on my way to Montreal. > > I may also publish CNQ (Cheap Nasty Queuing) as a straw-man draft at the same time, depending on my mood. It should be good for light relief if nothing else. It's even lighter-weight than LFQ - but, unlike LFQ, achieves this at the expense of performance. It maintains only enough state to prioritise sparse flows, with a rather strict definition of "sparse". > > - Jonathan Morton > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane -- ________________________________________________________________ Bob Briscoe http://bobbriscoe.net/ [-- Attachment #2: Type: text/html, Size: 6792 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] per-flow scheduling 2019-07-22 13:44 ` Bob Briscoe @ 2019-07-23 5:00 ` Jonathan Morton 2019-07-23 11:35 ` [Ecn-sane] CNQ cheap-nasty-queuing (was per-flow queuing) Luca Muscariello 2019-07-23 20:14 ` [Ecn-sane] per-flow scheduling Bob Briscoe 2019-07-23 15:12 ` [Ecn-sane] [tsvwg] " Kyle Rose 1 sibling, 2 replies; 49+ messages in thread From: Jonathan Morton @ 2019-07-23 5:00 UTC (permalink / raw) To: Bob Briscoe; +Cc: David P. Reed, ecn-sane, tsvwg IETF list > On 22 Jul, 2019, at 9:44 am, Bob Briscoe <ietf@bobbriscoe.net> wrote: > > As promised, I've pulled together and uploaded the main architectural arguments about per-flow scheduling that cause concern: > Per-Flow Scheduling and the End-to-End Argument > > It runs to 6 pages of reading. But I tried to make the time readers will have to spend worth it. Thanks for posting this. Upon reading, there is much to disagree with, but at least some points of view can be better understood in their context. However, there is one thing I must take issue with up front - your repeated assertion that SCE cannot work without FQ. As some of the plots we showed you this weekend (and maybe even earlier) demonstrate, we *do* have SCE working in a single-queue environment, even with non-SCE traffic mixed in; this is a development since your initial view of SCE several months ago. SCE will also work just fine (as it always did) with plain RFC-3168 compliant single-queue AQMs, as might reasonably be deployed to improve the short-term overload performance of core or access networks, or as a low-cost bufferbloat mitigation mechanism in switching, head-end or CPE devices. In that case SCE's behaviour is RFC-3168 compliant and TCP-friendly, and specifically does not require special treatment in the network. I would therefore appreciate a revision of your paper which removes that false assertion from the several places it is made. > Finally, I want to emphasize that the purpose is for advocates of per-flow scheduling to understand that there is a coherent world view that agrees with the arguments in this paper. If you have a different set of assumptions and perspectives that leads you to advocate per-flow scheduling and disagree with some of these arguments, that's to be expected. > > The purpose is to explain why some people don't want FQ, and therefore why it's important to leave the choice open between FQ and DualQ. That has always been the intent of L4S, which supports both. A central theme of your paper is a distinction of needs between times of "plenty" and "famine" in terms of network capacity. However, the dividing line between these states is not defined. This is a crucial deficiency when the argument made from the start is that FQ can be helpful during "famine" but is detrimental during "plenty". One reasonable definition of such a dividing line is that "plenty" exists whenever the link is not fully utilised. But in that case, any modern FQ algorithm will deliver packets in order of arrival, allowing all flows to use as much capacity as they desire. Only when the packet arrival rate begins to exceed that of draining, when a "famine" state could be said to exist, does FQ apply any management at all. Even then, if capacity is left over after metering out each flow's fair share, traffic is free to make use of it. Thus under this definition, FQ is strictly harmless. 
Reading between the lines, I suspect that what you mean is that if, after subtracting the capacity used by inelastic flows (such as the VR streams mentioned in one scenario), there is "sufficient" capacity left for elastic flows to experience good performance, then you would say there is still a state of "plenty". But this is a highly subjective definition and liable to change over time; would 1.5Mbps leftover capacity, equivalent to a T1 line of old, be acceptable today? I think most modern users would classify it as "famine", but it was considered quite a luxury 20 years ago. Regardless, my assertion is not that FQ is required for ultra-low latency, but that flows requiring ultra-low latency must be isolated from general traffic. FQ is a convenient, generic way to do that, and DualQ is another way; other ways exist, such as through Diffserv. If both traffic types are fed through a single queue, they will also share fates with respect to latency performance, such that every little misbehaviour of the general traffic will be reflected in the latency seen by the more sensitive flows. It is true that SCE doesn't inherently carry a label distinguishing its traffic from the general set, and thus DualQ cannot be directly applied to it. But there is a straightforward way to perform this labelling if required, right next door in the Diffserv field. The recently proposed NQB DSCP would likely be suitable. I don't think that the majority of potential SCE users would need or even want this distinction (the primary benefit of SCE being better link utilisation by eliminating the traditional deep sawtooth), but the mechanism exists, orthogonally to SCE itself. I have also drawn up, as a straw-man proposal, CNQ - Cheap Nasty Queuing: https://tools.ietf.org/html/draft-morton-tsvwg-cheap-nasty-queueing-00 This is a single-queue AQM, plus a side channel for prioritising sparse flows, although the definition of "sparse" is much stricter than for a true FQ implementation (even LFQ). In essence, CNQ treats a flow as sparse if its inter-packet gap is greater than the sojourn time of the main queue, and does not attempt to enforce throughput fairness. This is probably adequate to assist some common latency-sensitive protocols, such as ARP, SSH, NTP and DNS, as well as the initial handshake of longer-lived bulk flows. You will also notice that there is support for SCE in the application of AQM, though the AQM algorithm itself is only generically specified. In common with a plain single-queue AQM, CNQ will converge to approximate fairness between TCP-friendly flows, while keeping typical latencies under reasonable control. Aggressive or meek flows will also behave as expected for a single queue, up to a limit where an extremely meek flow might fall into the sparse queue and thus limit its ability to give way. This limit will relatively depend on the latency maintained in the main queue, and will probably be several times less than the fair share. I hope this serves to illustrate that I'm not against single-queue AQMs in an appropriate context, but that their performance limitations need to be borne in mind. In particular, I consider a single-queue AQM (or CNQ) to be a marked improvement over a dumb FIFO at any bottleneck. - Jonathan Morton ^ permalink raw reply [flat|nested] 49+ messages in thread
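A minimal sketch of the CNQ sparseness test as described above; the draft is authoritative and this only illustrates the stated heuristic. A flow counts as sparse when its inter-packet gap exceeds the main queue's current sojourn time, so the per-flow state needed is just a last-enqueue timestamp (a real implementation would keep these in a small table keyed by flow hash, omitted here).

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct flow_state {
    uint64_t last_enqueue_ns;
};

/* Enqueue-time decision: true sends the packet to the prioritised
 * sparse side channel, false to the main AQM-managed queue. */
static bool cnq_is_sparse(struct flow_state *f, uint64_t now_ns,
                          uint64_t main_sojourn_ns)
{
    uint64_t gap = now_ns - f->last_enqueue_ns;

    f->last_enqueue_ns = now_ns;
    return gap > main_sojourn_ns;
}

int main(void)
{
    struct flow_state dns = { 0 }, bulk = { 0 };
    uint64_t sojourn = 5 * 1000 * 1000;   /* 5 ms main-queue sojourn */

    /* A DNS-like flow arriving after 200 ms is sparse; a bulk flow
     * arriving after 1 ms is not. */
    printf("dns:  %d\n", cnq_is_sparse(&dns, 200 * 1000 * 1000, sojourn));
    printf("bulk: %d\n", cnq_is_sparse(&bulk, 1 * 1000 * 1000, sojourn));
    return 0;
}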
* Re: [Ecn-sane] CNQ cheap-nasty-queuing (was per-flow queuing) 2019-07-23 5:00 ` Jonathan Morton @ 2019-07-23 11:35 ` Luca Muscariello 0 siblings, 0 replies; 49+ messages in thread From: Luca Muscariello @ 2019-07-23 11:35 UTC (permalink / raw) To: ecn-sane, tsvwg IETF list [-- Attachment #1: Type: text/plain, Size: 3059 bytes --] Starting a new thread as it looks like this new proposal can be integrated in a modular way with several other loosely coupled components by just using two hardware queues, which are already available in current HW. Modifications would just be in the ingress enqueue time “self-classifier”. It also looks possible to append different optional packet droppers, such as AFD, to the backlogged queue to provide some level of flow protection for backlogged flows as well. On Tue, Jul 23, 2019 at 7:01 AM Jonathan Morton <chromatix99@gmail.com> wrote: > > It is true that SCE doesn't inherently carry a label distinguishing its > traffic from the general set, and thus DualQ cannot be directly applied to > it. But there is a straightforward way to perform this labelling if > required, right next door in the Diffserv field. The recently proposed NQB > DSCP would likely be suitable. I don't think that the majority of > potential SCE users would need or even want this distinction (the primary > benefit of SCE being better link utilisation by eliminating the traditional > deep sawtooth), but the mechanism exists, orthogonally to SCE itself. > > I have also drawn up, as a straw-man proposal, CNQ - Cheap Nasty Queuing: > > > https://tools.ietf.org/html/draft-morton-tsvwg-cheap-nasty-queueing-00 > > This is a single-queue AQM, plus a side channel for prioritising sparse > flows, although the definition of "sparse" is much stricter than for a true > FQ implementation (even LFQ). In essence, CNQ treats a flow as sparse if > its inter-packet gap is greater than the sojourn time of the main queue, > and does not attempt to enforce throughput fairness. This is probably > adequate to assist some common latency-sensitive protocols, such as ARP, > SSH, NTP and DNS, as well as the initial handshake of longer-lived bulk > flows. You will also notice that there is support for SCE in the > application of AQM, though the AQM algorithm itself is only generically > specified. > > In common with a plain single-queue AQM, CNQ will converge to approximate > fairness between TCP-friendly flows, while keeping typical latencies under > reasonable control. Aggressive or meek flows will also behave as expected > for a single queue, up to a limit where an extremely meek flow might fall > into the sparse queue and thus limit its ability to give way. This limit > will relatively depend on the latency maintained in the main queue, and > will probably be several times less than the fair share. > > I hope this serves to illustrate that I'm not against single-queue AQMs in > an appropriate context, but that their performance limitations need to be > borne in mind. In particular, I consider a single-queue AQM (or CNQ) to be > a marked improvement over a dumb FIFO at any bottleneck. > > - Jonathan Morton > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane > [-- Attachment #2: Type: text/html, Size: 3752 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] per-flow scheduling 2019-07-23 5:00 ` Jonathan Morton 2019-07-23 11:35 ` [Ecn-sane] CNQ cheap-nasty-queuing (was per-flow queuing) Luca Muscariello @ 2019-07-23 20:14 ` Bob Briscoe 2019-07-23 22:24 ` Jonathan Morton 1 sibling, 1 reply; 49+ messages in thread From: Bob Briscoe @ 2019-07-23 20:14 UTC (permalink / raw) To: Jonathan Morton; +Cc: ecn-sane, tsvwg IETF list Jonathan On 23/07/2019 01:00, Jonathan Morton wrote: >> On 22 Jul, 2019, at 9:44 am, Bob Briscoe <ietf@bobbriscoe.net> wrote: >> >> As promised, I've pulled together and uploaded the main architectural arguments about per-flow scheduling that cause concern: >> Per-Flow Scheduling and the End-to-End Argument >> >> It runs to 6 pages of reading. But I tried to make the time readers will have to spend worth it. > Thanks for posting this. Upon reading, there is much to disagree with, but at least some points of view can be better understood in their context. > > However, there is one thing I must take issue with up front - your repeated assertion that SCE cannot work without FQ. As some of the plots we showed you this weekend (and maybe even earlier) demonstrate, we *do* have SCE working in a single-queue environment, even with non-SCE traffic mixed in; this is a development since your initial view of SCE several months ago. > > SCE will also work just fine (as it always did) with plain RFC-3168 compliant single-queue AQMs, as might reasonably be deployed to improve the short-term overload performance of core or access networks, or as a low-cost bufferbloat mitigation mechanism in switching, head-end or CPE devices. In that case SCE's behaviour is RFC-3168 compliant and TCP-friendly, and specifically does not require special treatment in the network. > > I would therefore appreciate a revision of your paper which removes that false assertion from the several places it is made. [BB] I was careful not to say "does not work", because I know your definition of "works" is not the same as the community working on ultra-low latency. I said it needs FQ "in order to provide benefit". I'll explain why SCE doesn't provide benefit in a response to your posting on LFQ, to keep both this thread and that one on topic. >> Finally, I want to emphasize that the purpose is for advocates of per-flow scheduling to understand that there is a coherent world view that agrees with the arguments in this paper. If you have a different set of assumptions and perspectives that leads you to advocate per-flow scheduling and disagree with some of these arguments, that's to be expected. >> >> The purpose is to explain why some people don't want FQ, and therefore why it's important to leave the choice open between FQ and DualQ. That has always been the intent of L4S, which supports both. > A central theme of your paper is a distinction of needs between times of "plenty" and "famine" in terms of network capacity. However, the dividing line between these states is not defined. This is a crucial deficiency when the argument made from the start is that FQ can be helpful during "famine" but is detrimental during "plenty". > > One reasonable definition of such a dividing line is that "plenty" exists whenever the link is not fully utilised. But in that case, any modern FQ algorithm will deliver packets in order of arrival, allowing all flows to use as much capacity as they desire. Only when the packet arrival rate begins to exceed that of draining, when a "famine" state could be said to exist, does FQ apply any management at all. 
Even then, if capacity is left over after metering out each flow's fair share, traffic is free to make use of it. Thus under this definition, FQ is strictly harmless. > > Reading between the lines, I suspect that what you mean is that if, after subtracting the capacity used by inelastic flows (such as the VR streams mentioned in one scenario), there is "sufficient" capacity left for elastic flows to experience good performance, then you would say there is still a state of "plenty". But this is a highly subjective definition and liable to change over time; would 1.5Mbps leftover capacity, equivalent to a T1 line of old, be acceptable today? I think most modern users would classify it as "famine", but it was considered quite a luxury 20 years ago. [BB] The objective measure of famine/plenty that all congestion controls use is a combination of the loss level, ECN level, queue delay, etc. In the paper, I referred to BBRv2's distinction between famine and plenty (which is at 1% loss) and in a footnote I expressed a preference for a more gradual transition. Many real-time congestion controls (e.g. the algos used in production video chat and conferencing products etc) include such a gradual transition from fast responsiveness at high congestion levels (loss in single digit percentage area) to being virtually unresponsive at much lower levels of congestion. There's lots of diversity, and this ecosystem is healthy and nowhere near threatening Internet collapse. So I think introducing a "standard" transition would just create a chilling effect on innovation. The use of the two words famine and plenty wasn't intended to imply only two states. It's a continuum (like the spectrum between famine and plenty). > Regardless, my assertion is not that FQ is required for ultra-low latency, but that flows requiring ultra-low latency must be isolated from general traffic. FQ is a convenient, generic way to do that, and DualQ is another way; other ways exist, such as through Diffserv. If both traffic types are fed through a single queue, they will also share fates with respect to latency performance, such that every little misbehaviour of the general traffic will be reflected in the latency seen by the more sensitive flows. > > It is true that SCE doesn't inherently carry a label distinguishing its traffic from the general set, and thus DualQ cannot be directly applied to it. But there is a straightforward way to perform this labelling if required, right next door in the Diffserv field. The recently proposed NQB DSCP would likely be suitable. I don't think that the majority of potential SCE users would need or even want this distinction (the primary benefit of SCE being better link utilisation by eliminating the traditional deep sawtooth), but the mechanism exists, orthogonally to SCE itself. To enable SCE and RFC3168 in two queues rather than per-flow queues, if you required SCE packets to be identified by a DSCP, then if the DSCP got wiped (which it often does), your SCE traffic would mix with 3168 traffic and starve itself. > > I have also drawn up, as a straw-man proposal, CNQ - Cheap Nasty Queuing: > > https://tools.ietf.org/html/draft-morton-tsvwg-cheap-nasty-queueing-00 > > This is a single-queue AQM, plus a side channel for prioritising sparse flows, although the definition of "sparse" is much stricter than for a true FQ implementation (even LFQ). 
In essence, CNQ treats a flow as sparse if > its inter-packet gap is greater than the sojourn time of the main queue, > and does not attempt to enforce throughput fairness. This is probably > adequate to assist some common latency-sensitive protocols, such as ARP, > SSH, NTP and DNS, as well as the initial handshake of longer-lived bulk > flows. You will also notice that there is support for SCE in the > application of AQM, though the AQM algorithm itself is only generically > specified. > > In common with a plain single-queue AQM, CNQ will converge to approximate > fairness between TCP-friendly flows, while keeping typical latencies under > reasonable control. Aggressive or meek flows will also behave as expected > for a single queue, up to a limit where an extremely meek flow might fall > into the sparse queue and thus limit its ability to give way. This limit > will relatively depend on the latency maintained in the main queue, and > will probably be several times less than the fair share. > > I hope this serves to illustrate that I'm not against single-queue AQMs in > an appropriate context, but that their performance limitations need to be > borne in mind. In particular, I consider a single-queue AQM (or CNQ) to be > a marked improvement over a dumb FIFO at any bottleneck. OK. Can you say whether you've tested this exhaustively? Need to know before we all spend time reading it in too much depth. Bob > > - Jonathan Morton > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane -- ________________________________________________________________ Bob Briscoe http://bobbriscoe.net/ ^ permalink raw reply [flat|nested] 49+ messages in thread
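To illustrate the kind of gradual famine-to-plenty transition Bob describes above for real-time congestion controls, here is a minimal sketch. The 1% "famine" point echoes the BBRv2 loss threshold he cites; the 0.1% "plenty" point is purely an assumption for illustration. Above famine the flow backs off fully; below plenty it barely reacts; in between the response blends linearly.

#include <stdio.h>

static double response_weight(double loss_fraction)
{
    const double plenty = 0.001;   /* assumed */
    const double famine = 0.01;    /* cf. the cited BBRv2 figure */

    if (loss_fraction <= plenty)
        return 0.0;                /* virtually unresponsive */
    if (loss_fraction >= famine)
        return 1.0;                /* classic full backoff */
    return (loss_fraction - plenty) / (famine - plenty);
}

int main(void)
{
    const double losses[] = { 0.0005, 0.002, 0.005, 0.02 };

    for (int i = 0; i < 4; i++) {
        double w = response_weight(losses[i]);
        /* e.g. a halving response scaled by the weight: */
        printf("loss %.2f%% -> rate *= %.2f\n",
               losses[i] * 100, 1.0 - 0.5 * w);
    }
    return 0;
}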
* Re: [Ecn-sane] per-flow scheduling
  2019-07-23 20:14 ` [Ecn-sane] per-flow scheduling Bob Briscoe
@ 2019-07-23 22:24 ` Jonathan Morton
  0 siblings, 0 replies; 49+ messages in thread
From: Jonathan Morton @ 2019-07-23 22:24 UTC (permalink / raw)
To: Bob Briscoe; +Cc: ecn-sane, tsvwg IETF list

> [BB] The objective measure of famine/plenty that all congestion controls use is a combination of the loss level, ECN level, queue delay, etc. In the paper, I referred to BBRv2's distinction between famine and plenty (which is at 1% loss) and in a footnote I expressed a preference for a more gradual transition.

Coincidentally, 1% loss corresponds to about 1.5Mbps goodput at an Internet-scale 80ms RTT, assuming a Reno transport. Obviously at different RTTs it corresponds to different goodputs, and you might argue that shorter RTTs are also common. But now we have a common reference point.

Oh, and under similar conditions, 1% marking corresponds to about 30Mbps with L4S (i.e. cwnd=200). That's a 20:1 ratio versus Reno, which you might want to think about carefully when it comes to fair competition.

> The use of the two words famine and plenty wasn't intended to imply only two states. It's a continuum (like the spectrum between famine and plenty).

Okay. I still happen to disagree with the argument, but single-queue AQMs are still a valid improvement over single dumb FIFOs. They improve reliability by reducing losses and timeouts, and help to reduce lag in online games. That's the practical problem facing most Internet users today, and that's where my solutions are focused.

>> Regardless, my assertion is not that FQ is required for ultra-low latency, but that flows requiring ultra-low latency must be isolated from general traffic…
>>
>> It is true that SCE doesn't inherently carry a label distinguishing its traffic from the general set, and thus DualQ cannot be directly applied to it. But there is a straightforward way to perform this labelling if required, right next door in the Diffserv field. The recently proposed NQB DSCP would likely be suitable. I don't think that the majority of potential SCE users would need or even want this distinction (the primary benefit of SCE being better link utilisation by eliminating the traditional deep sawtooth), but the mechanism exists, orthogonally to SCE itself.

> To enable SCE and RFC3168 in two queues rather than per-flow queues, you would have to require SCE packets to be identified by a DSCP; if that DSCP got wiped (which it often does), your SCE traffic would mix with 3168 traffic and starve itself.

Under certain simplifying assumptions, yes. But those assumptions would include that the 3168 queue was also providing SCE marking in the FQ style, which might not be appropriate for what is effectively a single queue carrying mixed traffic. It would be as if your DualQ were providing L4S-style signalling in its Classic queue, which I'm sure you would not advocate.

As a de-facto representative of the cable industry, I hope you are aware of the irony that it is chiefly cable ISPs who are bleaching Diffserv information out of consumer traffic?

The starvation problem can be eliminated entirely by providing SCE marking only on the queue intended for SCE traffic (so only CE marks on the 3168 queue). Mis-marked traffic would then revert to purely 3168-compliant behaviour, which SCE does naturally when given only CE marks.
This option is an important advantage of having a clear distinction between the two signals; there is no ambiguity at the receiver about what type of signal it's receiving, and thus which response is demanded.

As a reminder, we *also* have a solution specifically for single-queue AQMs implementing SCE. It's not a knobs-free solution as the FQ version is, but it exists and it seems to work. I expect we will need to explore its dynamic characteristics more thoroughly in the near future.

>> I have also drawn up, as a straw-man proposal, CNQ - Cheap Nasty Queueing:
>>
>> https://tools.ietf.org/html/draft-morton-tsvwg-cheap-nasty-queueing-00

> OK. Can you say whether you've tested this exhaustively? Need to know before we all spend time reading it in too much depth.

To quote Knuth: "Beware of bugs in the above code; I have only proved it correct, not tried it." But we may get time to quickly implement CNQ and/or LFQ, in between preparations for Friday. (By which I mean implementing them in Linux.) This stuff is not central to our work on SCE, since we already have Cake for running experiments.

I think Pete Heist wants to try putting CNQ in his offline simulator as well, which already has LFQ. So that should provide an early sanity check.

 - Jonathan Morton

^ permalink raw reply [flat|nested] 49+ messages in thread
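The 1.5 Mbps and 30 Mbps figures quoted above follow from the standard steady-state models for Reno and DCTCP-style responses; a quick check in Python, assuming an MSS of 1448 bytes (the constants, and hence the exact ratio, depend on which variant of the models is used, so this sketch lands near 16:1 rather than 20:1):

    # Goodput at a 1% signal rate for Reno (responding to loss) and an
    # L4S/DCTCP-style response (responding to marking), at 80 ms RTT.
    MSS_BITS = 1448 * 8
    RTT = 0.080  # seconds
    p = 0.01     # loss or marking probability

    reno_cwnd = (1.5 / p) ** 0.5  # Reno model: cwnd ~ sqrt(3/(2p)) packets
    l4s_cwnd = 2 / p              # DCTCP-style model: cwnd ~ 2/p packets

    for name, cwnd in [("Reno", reno_cwnd), ("L4S", l4s_cwnd)]:
        print(f"{name}: cwnd ~ {cwnd:.0f} pkts, "
              f"~ {cwnd * MSS_BITS / RTT / 1e6:.1f} Mbit/s")
    # Reno: cwnd ~ 12 pkts, ~ 1.8 Mbit/s
    # L4S: cwnd ~ 200 pkts, ~ 29.0 Mbit/s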
* Re: [Ecn-sane] [tsvwg] per-flow scheduling
  2019-07-22 13:44 ` Bob Briscoe
  2019-07-23  5:00 ` Jonathan Morton
@ 2019-07-23 15:12 ` Kyle Rose
  2019-07-25 19:25 ` Holland, Jake
  1 sibling, 1 reply; 49+ messages in thread
From: Kyle Rose @ 2019-07-23 15:12 UTC (permalink / raw)
To: Bob Briscoe; +Cc: Jonathan Morton, David P. Reed, ecn-sane, tsvwg IETF list

[-- Attachment #1: Type: text/plain, Size: 4147 bytes --]

On Mon, Jul 22, 2019 at 9:44 AM Bob Briscoe <ietf@bobbriscoe.net> wrote:

> Folks,
>
> As promised, I've pulled together and uploaded the main architectural arguments about per-flow scheduling that cause concern:
>
> Per-Flow Scheduling and the End-to-End Argument
> <http://bobbriscoe.net/projects/latency/per-flow_tr.pdf>
>
> It runs to 6 pages of reading. But I tried to make the time readers will have to spend worth it.

Before reading the other responses (poisoning my own thinking), I wanted to offer my own reaction. In the discussion of figure 1, you seem to imply that there's some obvious choice of bin packing for the flows involved, but that can't be right. What if the dark green flow has deadlines? Why should that be the one that gets only leftover bandwidth? I'll return to this point in a bit.

The tl;dr summary of the paper seems to be that the L4S approach leaves the allocation of limited bandwidth up to the endpoints, while FQ arbitrarily enforces equality in the presence of limited bandwidth; but in reality the bottleneck device needs to make *some* choice when there's a shortage and flows don't respond. That requires some choice of policy.

In FQ, the chosen policy is to make sure every flow has the ability to get low latency for itself, but in the absence of some other kind of trusted signaling allocates an equal proportion of the available bandwidth to each flow. ISTM this is the best you can do in an adversarial environment, because anything else can be gamed to get a more than equal share (and depending on how "flow" is defined, even this can be gamed by opening up more flows; but this is not a problem unique to FQ).

In L4S, the policy is to assume one queue is well-behaved and one not, and to use the ECT(1) codepoint as a classifier to get into one or the other. But policy choice doesn't end there: in an uncooperative or adversarial environment, you can easily get into a situation in which the bottleneck has to apply policy to several unresponsive flows in the supposedly well-behaved queue. Note that this doesn't even have to involve bad actors misclassifying on purpose: it could be two uncooperative 200 Mbps VR flows competing for 300 Mbps of bandwidth. In this case, L4S falls back to classic, which with DualQ means every flow, not just the uncooperative ones, suffers. As a user, I don't want my small, responsive flows to suffer when uncooperative actors decide to exceed the BBW.

Getting back to figure 1, how do you choose the right allocation? With the proposed use of ECT(1) as classifier, you have exactly one bit available to decide which queue, and therefore which policy, applies to a flow. Should all the classic flows get assigned whatever is left after the L4S flows are allocated bandwidth? That hardly seems fair to classic flows. But let's say this policy is implemented. It then escapes me how this is any different from the trust problems facing end-to-end DSCP/QoS: why wouldn't everyone just classify their classic flows as L4S, forcing everything to be treated as classic and getting access to a (greater) share of the overall BBW?
Then we're left both with a spent ECT(1) codepoint and a need for FQ or some other queuing policy to arbitrate between flows, without any bits with which to implement the high-fidelity congestion signal required to achieve low latency without getting squeezed out.

The bottom line is that I see no way to escape the necessity of something FQ-like at bottlenecks outside of the sender's trust domain. If FQ can't be done in backbone-grade hardware, then the only real answer is pipes in the core big enough to force the bottleneck to live somewhere closer to the edge, where FQ does scale.

Note that, in a perfect world, FQ wouldn't trigger at all because there would always be enough bandwidth for everything users wanted to do, but in the real world it seems like the best you can possibly do in the absence of trusted information about how to prioritize traffic. IMO, it's better to think of FQ as a last-ditch measure indicating to the operator that they're gonna need a bigger pipe than as a steady-state bandwidth allocator.

Kyle

[-- Attachment #2: Type: text/html, Size: 4898 bytes --]

^ permalink raw reply [flat|nested] 49+ messages in thread
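The equal-proportion policy described above is, more precisely, max-min fairness: flows demanding less than an equal share keep their demand, and the remainder is split evenly among the rest. A small sketch, applied to the two-VR-flows example (capacities and demands in Mbps are illustrative; the third flow in the second case is added to show the redistribution):

    def max_min_share(capacity, demands):
        """Compute the max-min fair allocation an FQ scheduler
        converges to in steady state."""
        alloc = [0.0] * len(demands)
        active = list(range(len(demands)))
        remaining = capacity
        while active:
            share = remaining / len(active)
            bounded = [i for i in active if demands[i] <= share]
            if not bounded:
                for i in active:  # everyone left is capped at the fair share
                    alloc[i] = share
                break
            for i in bounded:  # satisfied flows keep their full demand
                alloc[i] = float(demands[i])
                remaining -= demands[i]
                active.remove(i)
        return alloc

    print(max_min_share(300, [200, 200]))      # [150.0, 150.0]
    print(max_min_share(300, [200, 200, 20]))  # [140.0, 140.0, 20.0]

On this policy the two unresponsive 200 Mbps flows settle at 150 Mbps each, and a small responsive flow alongside them keeps its full demand rather than being squeezed out.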
* Re: [Ecn-sane] [tsvwg] per-flow scheduling
  2019-07-23 15:12 ` [Ecn-sane] [tsvwg] " Kyle Rose
@ 2019-07-25 19:25 ` Holland, Jake
  2019-07-27 15:35 ` Kyle Rose
  0 siblings, 1 reply; 49+ messages in thread
From: Holland, Jake @ 2019-07-25 19:25 UTC (permalink / raw)
To: Kyle Rose, Bob Briscoe; +Cc: ecn-sane, tsvwg IETF list, David P. Reed

[-- Attachment #1: Type: text/plain, Size: 6057 bytes --]

Hi Kyle,

I almost agree, except that the concern is not about classic flows.

I agree (with caveats) with what Bob and Greg have said before: ordinary classic flows don’t have an incentive to mis-mark if they’ll be responding normally to CE, because a classic flow will back off too aggressively and starve itself if it’s getting CE marks from the LL queue.

That said, I had a message where I tried to express something similar to the concerns I think you just raised, with regard to a different category of flow:
https://mailarchive.ietf.org/arch/msg/tsvwg/bUu7pLmQo6BhR1mE2suJPPluW3Q

So I agree with the concerns you’ve raised here, and I want to +1 that aspect of it, while also correcting that I don’t think these apply for ordinary classic flows, but rather for flows that use application-level quality metrics to change bit-rates instead of responding at the transport level.

For those flows (which seem to include some of today’s video conferencing traffic), I expect they really would see an advantage by mis-marking themselves, and will require policing that imposes a policy decision. Given that, I agree that I don’t see a simple alternative to FQ for flows originating outside the policer’s trust domain when the network is fully utilized.

I hope that makes at least a little sense.

Best regards,
Jake

From: Kyle Rose <krose@krose.org>
Date: 2019-07-23 at 11:13
To: Bob Briscoe <ietf@bobbriscoe.net>
Cc: "ecn-sane@lists.bufferbloat.net" <ecn-sane@lists.bufferbloat.net>, tsvwg IETF list <tsvwg@ietf.org>, "David P. Reed" <dpreed@deepplum.com>
Subject: Re: [tsvwg] [Ecn-sane] per-flow scheduling

On Mon, Jul 22, 2019 at 9:44 AM Bob Briscoe <ietf@bobbriscoe.net> wrote:

Folks,

As promised, I've pulled together and uploaded the main architectural arguments about per-flow scheduling that cause concern:

Per-Flow Scheduling and the End-to-End Argument
<http://bobbriscoe.net/projects/latency/per-flow_tr.pdf>

It runs to 6 pages of reading. But I tried to make the time readers will have to spend worth it.

Before reading the other responses (poisoning my own thinking), I wanted to offer my own reaction. In the discussion of figure 1, you seem to imply that there's some obvious choice of bin packing for the flows involved, but that can't be right. What if the dark green flow has deadlines? Why should that be the one that gets only leftover bandwidth? I'll return to this point in a bit.
The tl;dr summary of the paper seems to be that the L4S approach leaves the allocation of limited bandwidth up to the endpoints, while FQ arbitrarily enforces equality in the presence of limited bandwidth; but in reality the bottleneck device needs to make *some* choice when there's a shortage and flows don't respond. That requires some choice of policy.

In FQ, the chosen policy is to make sure every flow has the ability to get low latency for itself, but in the absence of some other kind of trusted signaling allocates an equal proportion of the available bandwidth to each flow. ISTM this is the best you can do in an adversarial environment, because anything else can be gamed to get a more than equal share (and depending on how "flow" is defined, even this can be gamed by opening up more flows; but this is not a problem unique to FQ).

In L4S, the policy is to assume one queue is well-behaved and one not, and to use the ECT(1) codepoint as a classifier to get into one or the other. But policy choice doesn't end there: in an uncooperative or adversarial environment, you can easily get into a situation in which the bottleneck has to apply policy to several unresponsive flows in the supposedly well-behaved queue. Note that this doesn't even have to involve bad actors misclassifying on purpose: it could be two uncooperative 200 Mbps VR flows competing for 300 Mbps of bandwidth. In this case, L4S falls back to classic, which with DualQ means every flow, not just the uncooperative ones, suffers. As a user, I don't want my small, responsive flows to suffer when uncooperative actors decide to exceed the BBW.

Getting back to figure 1, how do you choose the right allocation? With the proposed use of ECT(1) as classifier, you have exactly one bit available to decide which queue, and therefore which policy, applies to a flow. Should all the classic flows get assigned whatever is left after the L4S flows are allocated bandwidth? That hardly seems fair to classic flows. But let's say this policy is implemented. It then escapes me how this is any different from the trust problems facing end-to-end DSCP/QoS: why wouldn't everyone just classify their classic flows as L4S, forcing everything to be treated as classic and getting access to a (greater) share of the overall BBW? Then we're left both with a spent ECT(1) codepoint and a need for FQ or some other queuing policy to arbitrate between flows, without any bits with which to implement the high-fidelity congestion signal required to achieve low latency without getting squeezed out.

The bottom line is that I see no way to escape the necessity of something FQ-like at bottlenecks outside of the sender's trust domain. If FQ can't be done in backbone-grade hardware, then the only real answer is pipes in the core big enough to force the bottleneck to live somewhere closer to the edge, where FQ does scale.

Note that, in a perfect world, FQ wouldn't trigger at all because there would always be enough bandwidth for everything users wanted to do, but in the real world it seems like the best you can possibly do in the absence of trusted information about how to prioritize traffic. IMO, it's better to think of FQ as a last-ditch measure indicating to the operator that they're gonna need a bigger pipe than as a steady-state bandwidth allocator.

Kyle

[-- Attachment #2: Type: text/html, Size: 11297 bytes --]

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] [tsvwg] per-flow scheduling
  2019-07-25 19:25 ` Holland, Jake
@ 2019-07-27 15:35 ` Kyle Rose
  2019-07-27 19:42 ` Jonathan Morton
  0 siblings, 1 reply; 49+ messages in thread
From: Kyle Rose @ 2019-07-27 15:35 UTC (permalink / raw)
To: Holland, Jake; +Cc: Bob Briscoe, ecn-sane, tsvwg IETF list, David P. Reed

[-- Attachment #1: Type: text/plain, Size: 6820 bytes --]

Right, I understand that under 3168 behavior the sender would react differently to ECE markings than L4S flows would, but I guess I don't understand why a sender willing to misclassify traffic with ECT(1) wouldn't also choose to react non-normatively to ECE markings.

On the rest, I think we agree.

Kyle

On Thu, Jul 25, 2019 at 3:26 PM Holland, Jake <jholland@akamai.com> wrote:

> Hi Kyle,
>
> I almost agree, except that the concern is not about classic flows.
>
> I agree (with caveats) with what Bob and Greg have said before: ordinary classic flows don’t have an incentive to mis-mark if they’ll be responding normally to CE, because a classic flow will back off too aggressively and starve itself if it’s getting CE marks from the LL queue.
>
> That said, I had a message where I tried to express something similar to the concerns I think you just raised, with regard to a different category of flow:
> https://mailarchive.ietf.org/arch/msg/tsvwg/bUu7pLmQo6BhR1mE2suJPPluW3Q
>
> So I agree with the concerns you’ve raised here, and I want to +1 that aspect of it, while also correcting that I don’t think these apply for ordinary classic flows, but rather for flows that use application-level quality metrics to change bit-rates instead of responding at the transport level.
>
> For those flows (which seem to include some of today’s video conferencing traffic), I expect they really would see an advantage by mis-marking themselves, and will require policing that imposes a policy decision. Given that, I agree that I don’t see a simple alternative to FQ for flows originating outside the policer’s trust domain when the network is fully utilized.
>
> I hope that makes at least a little sense.
>
> Best regards,
> Jake
>
> *From: *Kyle Rose <krose@krose.org>
> *Date: *2019-07-23 at 11:13
> *To: *Bob Briscoe <ietf@bobbriscoe.net>
> *Cc: *"ecn-sane@lists.bufferbloat.net" <ecn-sane@lists.bufferbloat.net>, tsvwg IETF list <tsvwg@ietf.org>, "David P. Reed" <dpreed@deepplum.com>
> *Subject: *Re: [tsvwg] [Ecn-sane] per-flow scheduling
>
> On Mon, Jul 22, 2019 at 9:44 AM Bob Briscoe <ietf@bobbriscoe.net> wrote:
>
> Folks,
>
> As promised, I've pulled together and uploaded the main architectural arguments about per-flow scheduling that cause concern:
>
> Per-Flow Scheduling and the End-to-End Argument
> <http://bobbriscoe.net/projects/latency/per-flow_tr.pdf>
>
> It runs to 6 pages of reading. But I tried to make the time readers will have to spend worth it.
>
> Before reading the other responses (poisoning my own thinking), I wanted to offer my own reaction. In the discussion of figure 1, you seem to imply that there's some obvious choice of bin packing for the flows involved, but that can't be right. What if the dark green flow has deadlines? Why should that be the one that gets only leftover bandwidth? I'll return to this point in a bit.
>
> The tl;dr summary of the paper seems to be that the L4S approach leaves the allocation of limited bandwidth up to the endpoints, while FQ arbitrarily enforces equality in the presence of limited bandwidth; but in reality the bottleneck device needs to make *some* choice when there's a shortage and flows don't respond. That requires some choice of policy.
>
> In FQ, the chosen policy is to make sure every flow has the ability to get low latency for itself, but in the absence of some other kind of trusted signaling allocates an equal proportion of the available bandwidth to each flow. ISTM this is the best you can do in an adversarial environment, because anything else can be gamed to get a more than equal share (and depending on how "flow" is defined, even this can be gamed by opening up more flows; but this is not a problem unique to FQ).
>
> In L4S, the policy is to assume one queue is well-behaved and one not, and to use the ECT(1) codepoint as a classifier to get into one or the other. But policy choice doesn't end there: in an uncooperative or adversarial environment, you can easily get into a situation in which the bottleneck has to apply policy to several unresponsive flows in the supposedly well-behaved queue. Note that this doesn't even have to involve bad actors misclassifying on purpose: it could be two uncooperative 200 Mbps VR flows competing for 300 Mbps of bandwidth. In this case, L4S falls back to classic, which with DualQ means every flow, not just the uncooperative ones, suffers. As a user, I don't want my small, responsive flows to suffer when uncooperative actors decide to exceed the BBW.
>
> Getting back to figure 1, how do you choose the right allocation? With the proposed use of ECT(1) as classifier, you have exactly one bit available to decide which queue, and therefore which policy, applies to a flow. Should all the classic flows get assigned whatever is left after the L4S flows are allocated bandwidth? That hardly seems fair to classic flows. But let's say this policy is implemented. It then escapes me how this is any different from the trust problems facing end-to-end DSCP/QoS: why wouldn't everyone just classify their classic flows as L4S, forcing everything to be treated as classic and getting access to a (greater) share of the overall BBW? Then we're left both with a spent ECT(1) codepoint and a need for FQ or some other queuing policy to arbitrate between flows, without any bits with which to implement the high-fidelity congestion signal required to achieve low latency without getting squeezed out.
>
> The bottom line is that I see no way to escape the necessity of something FQ-like at bottlenecks outside of the sender's trust domain. If FQ can't be done in backbone-grade hardware, then the only real answer is pipes in the core big enough to force the bottleneck to live somewhere closer to the edge, where FQ does scale.
>
> Note that, in a perfect world, FQ wouldn't trigger at all because there would always be enough bandwidth for everything users wanted to do, but in the real world it seems like the best you can possibly do in the absence of trusted information about how to prioritize traffic. IMO, it's better to think of FQ as a last-ditch measure indicating to the operator that they're gonna need a bigger pipe than as a steady-state bandwidth allocator.
> Kyle

[-- Attachment #2: Type: text/html, Size: 10731 bytes --]

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Ecn-sane] [tsvwg] per-flow scheduling
  2019-07-27 15:35 ` Kyle Rose
@ 2019-07-27 19:42 ` Jonathan Morton
  0 siblings, 0 replies; 49+ messages in thread
From: Jonathan Morton @ 2019-07-27 19:42 UTC (permalink / raw)
To: Kyle Rose; +Cc: Holland, Jake, tsvwg IETF list, Bob Briscoe, ecn-sane

[-- Attachment #1: Type: text/plain, Size: 398 bytes --]

RFC-compliant traffic finding its way into the L4S queue is one thing - it obviously results in a performance degradation to one or the other, depending on other factors - but I'm more concerned about L4S traffic entering the classic queue, which would have a severe impact on general traffic throughput. By default this always occurs when a single-queue AQM that is not L4S aware is encountered.

[-- Attachment #2: Type: text/html, Size: 421 bytes --]

^ permalink raw reply [flat|nested] 49+ messages in thread
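The concern above can be made concrete with the same steady-state models used earlier in the thread. A rough sketch, solving for the shared CE-marking probability at which one Reno flow and one L4S-style flow together fill a single non-L4S-aware queue (the link rate, MSS and RTT are assumptions chosen for the example):

    # Solve for the shared marking probability p at which one Reno flow
    # and one L4S/DCTCP-style flow together fill a 50 Mbit/s link.
    MSS_BITS = 1448 * 8
    RTT = 0.080
    LINK = 50e6  # bit/s

    def reno_rate(p):
        return (1.5 / p) ** 0.5 * MSS_BITS / RTT  # cwnd ~ sqrt(3/(2p))

    def l4s_rate(p):
        return (2 / p) * MSS_BITS / RTT           # cwnd ~ 2/p

    lo, hi = 1e-6, 0.5
    for _ in range(60):  # bisect; total rate decreases as p rises
        p = (lo + hi) / 2
        if reno_rate(p) + l4s_rate(p) > LINK:
            lo = p
        else:
            hi = p
    print(f"p ~ {p:.4f}: Reno ~ {reno_rate(p) / 1e6:.1f} Mbit/s, "
          f"L4S ~ {l4s_rate(p) / 1e6:.1f} Mbit/s")
    # p ~ 0.0061: Reno ~ 2.3 Mbit/s, L4S ~ 47.7 Mbit/s

At the equilibrium marking rate the gentler L4S response holds nearly all of the capacity, which is the squeeze on general traffic described above.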
end of thread, other threads: [~2019-07-27 19:42 UTC | newest]

Thread overview: 49+ messages
2019-06-19 14:12 [Ecn-sane] per-flow scheduling Bob Briscoe
2019-06-19 14:20 ` [Ecn-sane] [tsvwg] " Kyle Rose
2019-06-21  6:59 ` [Ecn-sane] " Sebastian Moeller
2019-06-21  9:33 ` Luca Muscariello
2019-06-21 20:37 ` [Ecn-sane] [tsvwg] " Brian E Carpenter
2019-06-22 19:50 ` David P. Reed
2019-06-22 20:47 ` Jonathan Morton
2019-06-22 22:03 ` Luca Muscariello
2019-06-22 22:09 ` David P. Reed
2019-06-22 23:07 ` Jonathan Morton
2019-06-24 18:57 ` David P. Reed
2019-06-24 19:31 ` Jonathan Morton
2019-06-24 19:50 ` David P. Reed
2019-06-24 20:14 ` Jonathan Morton
2019-06-25 21:05 ` David P. Reed
2019-06-24 21:25 ` Luca Muscariello
2019-06-26 12:48 ` Sebastian Moeller
2019-06-26 16:31 ` David P. Reed
2019-06-26 16:53 ` David P. Reed
2019-06-27  7:54 ` Sebastian Moeller
2019-06-27  7:49 ` Sebastian Moeller
2019-06-27 20:33 ` Brian E Carpenter
2019-06-27 21:31 ` David P. Reed
2019-06-28  7:49 ` Toke Høiland-Jørgensen
2019-06-27  7:53 ` Bless, Roland (TM)
2019-06-22 21:10 ` Brian E Carpenter
2019-06-22 22:25 ` David P. Reed
2019-06-22 22:30 ` Luca Muscariello
2019-07-17 21:33 ` [Ecn-sane] " Sebastian Moeller
2019-07-17 22:18 ` David P. Reed
2019-07-17 22:34 ` David P. Reed
2019-07-17 23:23 ` Dave Taht
2019-07-18  0:20 ` Dave Taht
2019-07-18  5:30 ` Jonathan Morton
2019-07-18 15:02 ` David P. Reed
2019-07-18 16:06 ` Dave Taht
2019-07-18  4:31 ` Jonathan Morton
2019-07-18 15:52 ` David P. Reed
2019-07-18 18:12 ` [Ecn-sane] [tsvwg] " Dave Taht
2019-07-18  5:24 ` [Ecn-sane] " Jonathan Morton
2019-07-22 13:44 ` Bob Briscoe
2019-07-23  5:00 ` Jonathan Morton
2019-07-23 11:35 ` [Ecn-sane] CNQ cheap-nasty-queuing (was per-flow queuing) Luca Muscariello
2019-07-23 20:14 ` [Ecn-sane] per-flow scheduling Bob Briscoe
2019-07-23 22:24 ` Jonathan Morton
2019-07-23 15:12 ` [Ecn-sane] [tsvwg] " Kyle Rose
2019-07-25 19:25 ` Holland, Jake
2019-07-27 15:35 ` Kyle Rose
2019-07-27 19:42 ` Jonathan Morton