[Ecn-sane] [tsvwg] per-flow scheduling

David P. Reed dpreed at deepplum.com
Sat Jun 22 15:50:09 EDT 2019


Two points:
 
- Jerry Saltzer and I were the primary authors of the End-to-end argument paper, and the motivation was based on *my* work on the original TCP and IP protocols. Dave Clark got involved significantly later; those decisions were basically complete by the time he joined. (Jerry was my thesis supervisor, I was his student, and I operated largely independently, taking input from various others at MIT.) I mention this because Dave understands the end-to-end arguments, but he understood (as we all did) that it was a design *principle* and not a perfectly strict rule. That said, it's a rule that has a strong foundational argument from modularity and evolvability, in a context where the system has to work on a wide range of infrastructures (not all knowable in advance) and support a wide range of usage/application areas (not all knowable in advance). Treating the paper as if it were "DDC" declaring a law is just wrong. He wasn't Moses and it is not written on tablets. Dave did have some "power" in his role of trying to achieve interoperability across diverse implementations. But his focus was primarily on interoperability, not other things. So ideas in the IP protocol like "TOS", which were largely placeholders for not-completely-worked-out concepts, were deferred to the future.
 
- It is clear (at least to me) that from the point of view of the source of an IP datagram, the "handling" of that datagram within the network of networks can vary, and that is why there is a TOS field: to specify an interoperable, meaningfully described per-packet indicator of differential handling. With regard to the end-to-end argument, that handling choice is a network function, *to the extent that it can be completely implemented in the network itself*.
Congestion management, however, cannot be achieved entirely, or only, within the network. That's completely obvious: congestion happens when the source-destination flows exceed the capacity of the network of networks to satisfy all demands.
The network can only implement *certain* general kinds of mechanisms that may be used by the endpoints to resolve congestion:
1) admission controls. These are implemented at the interface between the source entity and the network of networks. They tend to be impractical in the Internet context, because there is, by a fundamental and irreversible design choice made by Cerf and Kahn (and the rest of us), no central controller of the entire network of networks. That choice is what makes evolvability and scalability work. 5G (not an Internet system) implies a central controller, as do SNA, LTE, and many other networks. The Internet is an overlay on top of such networks.
2) signalling congestion to the endpoints, which respond by slowing their transmission rate (or explicitly re-routing, or compressing their content) to match the capacity of the network. This response is done *above* the IP layer, and has proven very practical. The function in the network is reduced to "congestion signalling", via a universally understood, meaningful mechanism: packet drops, ECN, packet-pair separation in arrival time, ...  This limited function is essential within the network, because it is the state of the path(s) that is needed to implement the full function at the endpoints. So congestion signalling, like ECN, is implemented according to the end-to-end argument by carefully defining the network function to be the minimum mechanism necessary for the endpoints to control their rates.
3) automatic selection of routes for flows. It's perfectly fine to select different routes based on information in the IP header (the part that is intended to be read and understood by the network of networks). This is currently *rarely* done, due to the complexity of tracking more detailed routing information at the router level. But we had expected that eventually the Internet would be so well connected that there would be diverse routes with diverse capabilities. For example, the "Interplanetary Internet" works with datagrams, which can be carried over IP, but not with TCP, which requires very low end-to-end latency. Thus, one would expect that TCP would not want any packets transferred over a path via Mars, or for that matter a geosynchronous satellite, even if the throughput would be higher.
So one can imagine that eventually a "TOS" might say "send this packet preferably along a path that has at most 200 ms RTT, *even if that leads to congestion signalling*", while another TOS might say "send this packet over the most capacious set of paths, ignoring RTT entirely". (These are just for illustration, but obviously something like this would work.)
Note that TOS is really aimed at *route selection* preferences, not at queue management within individual routers.
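To make that concrete, here is a minimal sketch (Python, assuming a Linux-style sockets API that exposes IP_TOS) of how an endpoint can express such a per-packet handling preference today. The field that was "TOS" in the original header is nowadays interpreted as DSCP plus ECN bits; the two codepoints below are standard DSCP values chosen purely for illustration, not a recommendation:

    import socket

    # The old "TOS" byte now carries DSCP (upper 6 bits) + ECN (lower 2 bits).
    DSCP_EF = 46      # "expedited forwarding": a low-delay handling preference
    DSCP_AF11 = 10    # an "assured forwarding" class: a throughput-oriented preference

    def open_marked_socket(dscp):
        # Create a UDP socket whose outgoing datagrams carry the given DSCP.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
        return s

    low_delay = open_marked_socket(DSCP_EF)    # "prefer a short-RTT path"
    bulk = open_marked_socket(DSCP_AF11)       # "prefer a capacious path"

Whether any network along the way honours the marking as a route-selection preference is, of course, exactly the interoperability question discussed here.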
 
Queue management that shares a single queue on a path among multiple priorities of traffic is not very compatible with the end-to-end arguments. There are any number of reasons why this doesn't work well. I can go into them. Mainly these reasons are why "diffserv" has never been adopted - it's NOT interoperable, because the diversity of traffic between endpoints is hard to specify in a way that translates into the network mechanisms. Of course any queue can be managed in some algorithmic way with parameters, but the endpoints that want to specify an end-to-end goal don't have a way to understand the impact of those parameters on a specific queue that is currently congested.
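Just to be clear about what is and isn't hard here: the in-network mechanism itself is trivial. Here is an illustrative (not normative) strict-priority scheduler in a few lines of Python; the difficulty is that a sender has no way to know how parameters like these, at a congested queue it cannot see, map onto its end-to-end goal:

    from collections import deque

    class PriorityScheduler:
        # Strict priority over N traffic classes sharing one link:
        # class 0 always preempts class 1, which preempts class 2, etc.
        def __init__(self, num_classes=3):
            self.queues = [deque() for _ in range(num_classes)]

        def enqueue(self, pkt, cls):
            self.queues[cls].append(pkt)

        def dequeue(self):
            for q in self.queues:
                if q:
                    return q.popleft()
            return None   # link idle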
 
Instead, the history of the Internet (and for that matter *all* networks, even Bell's voice systems) has focused on minimizing queueing delay to near zero throughout the network, by whatever means are available at the endpoints or in the design. This is why we have AIMD's MD - multiplicative decrease - as the response to detection of congestion.
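For anyone who wants the division of labour spelled out: the network's whole job is the congestion signal (a drop or an ECN CE mark), and the endpoint's whole job is the rate response. A textbook AIMD sketch, with the conventional constants rather than any particular stack's values:

    # AIMD congestion window update, applied once per RTT.
    # "congested" stands for whatever minimal signal the network provides
    # (a packet drop, an ECN CE mark, ...).
    def aimd_update(cwnd, congested, a=1.0, b=0.5):
        if congested:
            return max(1.0, cwnd * b)   # MD: back off quickly to drain queues
        return cwnd + a                 # AI: probe slowly for more capacity

    cwnd = 10.0
    for congested in (False, False, True, False):
        cwnd = aimd_update(cwnd, congested)
    # cwnd goes 11.0 -> 12.0 -> 6.0 -> 7.0

The multiplicative decrease is precisely what drives queues back toward empty once the network has said "too much".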
 
Pragmatic networks (those that operate in the real world) do not choose to operate with shared links in a saturated state. That's known in the phone business as the Mother's Day problem: you want enough capacity that the rare near-overload never results in congestion, which means that the normal state of the network is very lightly loaded indeed, in order to minimize RTT. Consequently, focusing on somehow optimizing the utilization of the network to 100% is a purely academic exercise. Since "priority" at the packet level within a queue only improves that case, it's just a focus of (bad) Ph.D. theses. (Good Ph.D. theses focus on real problems, like getting queues down to one packet or less by signalling the endpoints with the information that lets them do their job.)
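A back-of-the-envelope illustration of why light loading is the right target (this assumes Poisson arrivals, i.e. an M/M/1 queue; real traffic is burstier, which only makes the numbers worse):

    # Mean queueing delay vs. utilization for an M/M/1 queue on a 1 Gb/s
    # link carrying 1500-byte packets (service time ~12 microseconds).
    # Waiting time = service_time * rho / (1 - rho): it blows up near saturation.
    service_us = 1500 * 8 / 1e9 * 1e6
    for rho in (0.1, 0.5, 0.9, 0.99):
        wait_us = service_us * rho / (1 - rho)
        print(f"utilization {rho:4.2f}: mean queueing delay ~ {wait_us:7.1f} us")
    # 0.10 -> ~1.3 us, 0.50 -> ~12 us, 0.90 -> ~108 us, 0.99 -> ~1188 us

Running the shared link at 99% instead of 50% buys a factor of two in throughput and costs two orders of magnitude in delay, which is why real operators provision for the Mother's Day peak instead of chasing utilization.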
 
So, in considering what goes in the IP layer - both its header and the mechanics of the network of networks - the test is whether something actually has implementable meaning in the network of networks when it processes the IP datagram. The rest is "content", because the network of networks doesn't need to see it.
 
Thus, don't put anything in the IP header that belongs in the "content" part - anything that is merely a signal between endpoints. (Some information used by the network of networks is, of course, also logically carried between the endpoints.)
 
 
On Friday, June 21, 2019 4:37pm, "Brian E Carpenter" <brian.e.carpenter at gmail.com> said:



> Below...
> On 21-Jun-19 21:33, Luca Muscariello wrote:
> > + David Reed, as I'm not sure he's on the ecn-sane list.
> >
> > To me, it seems like a very religious position against per-flow
> queueing. 
> > BTW, I fail to see how this would violate (in a "profound" way ) the e2e
> principle.
> >
> > When I read it (the e2e principle)
> >
> > Saltzer, J. H., D. P. Reed, and D. D. Clark (1981) "End-to-End Arguments in
> System Design". 
> > In: Proceedings of the Second International Conference on Distributed
> Computing Systems. Paris, France. 
> > April 8–10, 1981. IEEE Computer Society, pp. 509-512.
> > (available on line for free).
> >
> > It seems very much like the application of the Occam's razor to function
> placement in communication networks back in the 80s.
> > I see no conflict between what is written in that paper and per-flow queueing
> today, even after almost 40 years.
> >
> > If that was the case, then all service differentiation techniques would
> violate the e2e principle in a "profound" way too,
> > and dualQ too. A policer? A shaper? A priority queue?
> >
> > Luca
> 
> Quoting RFC2638 (the "two-bit" RFC):
> 
> >>> Both these
> >>> proposals seek to define a single common mechanism that is used by
> >>> interior network routers, pushing most of the complexity and state of
> >>> differentiated services to the network edges.
> 
> I can't help thinking that if DDC had felt this was against the E2E principle,
> he would have kicked up a fuss when it was written.
> 
> Bob's right, however, that there might be a tussle here. If end-points are
> attempting to pace their packets to suit their own needs, and the network is
> policing packets to support both service differentiation and fairness,
> these may well be competing rather than collaborating behaviours. And there
> probably isn't anything we can do about it by twiddling with algorithms.
> 
> Brian
> 
> >
> > On Fri, Jun 21, 2019 at 9:00 AM Sebastian Moeller <moeller0 at gmx.de
> <mailto:moeller0 at gmx.de>> wrote:
> >
> >
> >
> > > On Jun 19, 2019, at 16:12, Bob Briscoe <ietf at bobbriscoe.net
> <mailto:ietf at bobbriscoe.net>> wrote:
> > >
> > > Jake, all,
> > >
> > > You may not be aware of my long history of concern about how
> per-flow scheduling within endpoints and networks will limit the Internet in
> future. I find per-flow scheduling a violation of the e2e principle in such a
> profound way - the dynamic choice of the spacing between packets - that most
> people don't even associate it with the e2e principle.
> >
> > Maybe because it is not a violation of the e2e principle at all? My point
> is that with shared resources between the endpoints, the endpoints simply should
> have no expectancy that their choice of spacing between packets will be conserved.
> For the simple reason that it seems generally impossible to guarantee that
> inter-packet spacing is conserved (think "cross-traffic" at the bottleneck hop
> along the path and general bunching up of packets in the queue of a fast to slow
> transition*). I also would claim that the way L4S works (if it works) is to
> synchronize all active flows at the bottleneck, which in turn means each sender has
> only a very small time window in which to transmit a packet for it to hit its
> "slot" in the bottleneck L4S scheduler; otherwise, L4S's low queueing delay
> guarantees will not work. In other words, the senders have basically no say in the
> "spacing between packets", so I fail to see how L4S improves upon FQ in that regard.
> >
> >
> >  IMHO having per-flow fairness as the default seems quite
> reasonable, endpoints can still throttle flows to their liking. Now per-flow
> fairness still can be "abused", so by itself it might not be sufficient, but
> neither is L4S as it has at best stochastic guarantees, as a single queue AQM
> (let's ignore the RFC3168 part of the AQM) there is a probability of sending a
> throttling signal to a low-bandwidth flow (fair enough, it is only a mild
> throttling signal, but still).
> > But enough about my opinion, what is the ideal fairness measure in your
> mind, and what is realistically achievable over the internet?
> >
> >
> > Best Regards
> >         Sebastian
> >
> > >
> > > I detected that you were talking about FQ in a way that might have
> assumed my concern with it was just about implementation complexity. If you (or
> anyone watching) is not aware of the architectural concerns with per-flow
> scheduling, I can enumerate them.
> > >
> > > I originally started working on what became L4S to prove that it was
> possible to separate out reducing queuing delay from throughput scheduling. When
> Koen and I started working together on this, we discovered we had identical
> concerns on this.
> > >
> > >
> > >
> > > Bob
> > >
> > >
> > > --
> > > ________________________________________________________________
> > > Bob Briscoe               
>                http://bobbriscoe.net/
> > >