[Ecn-sane] [tsvwg] per-flow scheduling

David P. Reed dpreed at deepplum.com
Sat Jun 22 18:25:13 EDT 2019


Given the complexity of my broader comments, let me be clear that I have no problem with the broad concept of diffserv being compatible with the end-to-end arguments. I was trying to lay out what I think is a useful way to think about these kinds of issues within the Internet context.
 
Similarly, per-flow scheduling as an end-to-end concept (different flows defined by address pairs being jointly managed as entities) makes great sense, but it's really important to be clear that queue prioritization within a single queue at entry to a bottleneck link is a special-case mechanism, and not a general end-to-end concept at the IP datagram level, given the generality of IP as a network packet transport protocol. It's really tied closely to routing, which isn't specified in any way by IP, other than "best efforts", a term that has become much better defined over the years (including the notions of dropping rather than storing packets, the idea that successive IP datagrams should traverse roughly the same path in order to have stable congestion detection, ...).
 
Per-flow scheduling seems to work quite well in the cases where it applies, transparently below the IP datagram layer (that is, underneath the hourglass neck). IP effectively defines "flows", and it is reasonable to me that "best efforts" as a concept could include some notion of network-wide fairness among flows. Link-level "fairness" isn't a necessary precondition to network level fairness.
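To make the mechanism under discussion concrete, here is a minimal sketch of deficit round-robin (DRR), a classic per-flow scheduler of the kind deployed below the IP layer. The flow key (a source/destination address pair) and the quantum value are illustrative choices, not taken from any specific implementation.

```python
from collections import deque

class DRRScheduler:
    """Minimal deficit round-robin per-flow scheduler (illustrative sketch).

    Each flow (keyed here by an address pair) gets its own queue; a
    per-round byte quantum approximates equal shares of the link.
    """
    def __init__(self, quantum=1500):
        self.quantum = quantum
        self.queues = {}          # flow_key -> deque of packet lengths
        self.deficit = {}         # flow_key -> accumulated byte credit
        self.active = deque()     # round-robin order of backlogged flows

    def enqueue(self, flow_key, packet_len):
        if flow_key not in self.queues:
            self.queues[flow_key] = deque()
            self.deficit[flow_key] = 0
        if not self.queues[flow_key]:
            self.active.append(flow_key)
        self.queues[flow_key].append(packet_len)

    def dequeue(self):
        """Serve one flow's round; returns the (flow, length) pairs sent."""
        while self.active:
            flow = self.active.popleft()
            self.deficit[flow] += self.quantum
            q, sent = self.queues[flow], []
            while q and q[0] <= self.deficit[flow]:
                pkt = q.popleft()
                self.deficit[flow] -= pkt
                sent.append((flow, pkt))
            if q:
                self.active.append(flow)   # still backlogged: rejoin the round
            else:
                self.deficit[flow] = 0     # idle flows carry no credit
            if sent:
                return sent
        return []
```

Run against two backlogged flows with very different packet sizes, the scheduler hands each roughly equal bytes per round, which is the "fairness among flows" property at issue in this thread.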
 
On Saturday, June 22, 2019 5:10pm, "Brian E Carpenter" <brian.e.carpenter at gmail.com> said:



> Just three or four small comments:
> 
> On 23-Jun-19 07:50, David P. Reed wrote:
> > Two points:
> >
> >  
> >
> > - Jerry Saltzer and I were the primary authors of the End-to-end argument
> paper, and the motivation was based *my* work on the original TCP and IP
> protocols. Dave Clark got involved significantly later than all those decisions,
> which were basically complete when he got involved. (Jerry was my thesis
> supervisor, I was his student, and I operated largely independently, taking input
> from various others at MIT). I mention this because Dave understands the
> end-to-end arguments, but he understands (as we all did) that it was a design
> *principle* and not a perfectly strict rule. That said, it's a rule that has a
> strong foundational argument from modularity and evolvability in a context where
> the system has to work on a wide range of infrastructures (not all knowable in
> advance) and support a wide range of usage/application-areas (not all knowable in
> advance). Treating the paper as if it were "DDC" declaring a law is just wrong. He
> wasn't Moses and it is not written on tablets. Dave
> > did have some "power" in his role of trying to achieve interoperability
> across diverse implementations. But his focus was primarily on interoperability,
> not other things. So ideas in the IP protocol like "TOS" which were largely
> placeholders for not-completely-worked-out concepts deferred to the future were
> left till later.
> 
> Yes, well understood, but he was in fact the link between the e2e paper and the
> differentiated services work. Although not a nominal author of the "two-bit" RFC,
> he was heavily involved in it, which is why I mentioned him. And he was very
> active in the IETF diffserv WG.
> > - It is clear (at least to me) that from the point of view of the source of
> an IP datagram, the "handling" of that datagram within the network of networks can
> vary, and so that is why there is a TOS field - to specify an interoperable,
> meaningfully described per-packet indicator of differential handling. In regards
> to the end-to-end argument, that handling choice is a network function, *to the
> extent that it can completely be implemented in the network itself*.
> >
> > Congestion management, however, is not achievable entirely and only within
> the network. That's completely obvious: congestion happens when the
> source-destination flows exceed the capacity of the network of networks to satisfy
> all demands.
> >
> > The network can only implement *certain* general kinds of mechanisms that may
> be used by the endpoints to resolve congestion:
> >
> > 1) admission controls. These are implemented at the interface between the
> source entity and the network of networks. They tend to be impractical in the
> Internet context, because there is, by a fundamental and irreversible design
> choice made by Cerf and Kahn (and the rest of us), no central controller of the
> entire network of networks. This is to make evolvability and scalability work. 5G
> (not an Internet system) implies a central controller, as does SNA, LTE, and many
> other networks. The Internet is an overlay on top of such networks.
> >
> > 2) signalling congestion to the endpoints, which will respond by slowing
> their transmission rate (or explicitly re-routing transmission, or compressing
> their content) through the network to match capacity. This response is done
> *above* the IP layer, and has proven very practical. The function in the network
> is reduced to "congestion signalling", in a universally understandable meaningful
> mechanism: packet drops, ECN, packet-pair separation in arrival time, ... 
> This limited function is essential within the network, because it is the state of
> the path(s) that is needed to implement the full function at the end points. So
> congestion signalling, like ECN, is implemented according to the end-to-end
> argument by carefully defining the network function to be the minimum necessary
> mechanism so that endpoints can control their rates.
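The division of labour described in point 2 can be sketched in a few lines: the network's role shrinks to a one-bit signal per packet (a drop or an ECN mark), and all rate-control state and policy live at the endpoint, which reacts with additive increase / multiplicative decrease. The constants are the conventional textbook values, chosen here for illustration.

```python
class AIMDSender:
    """Endpoint-side congestion response (illustrative sketch).

    The network only signals congestion (drop or ECN mark); the
    endpoint holds all the rate-control state, per the e2e argument.
    """
    def __init__(self, cwnd=10.0, additive_step=1.0, multiplicative_factor=0.5):
        self.cwnd = cwnd     # congestion window, in packets
        self.a = additive_step
        self.b = multiplicative_factor

    def on_ack(self, congestion_signalled):
        if congestion_signalled:
            # Multiplicative decrease: the "MD" response to congestion.
            self.cwnd = max(1.0, self.cwnd * self.b)
        else:
            # Additive increase: roughly +a packets per round trip.
            self.cwnd += self.a / self.cwnd
        return self.cwnd
```

The asymmetry (gentle probing up, sharp backing off) is what drains queues quickly once the network's minimal signal arrives.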
> >
> > 3) automatic selection of routes for flows. It's perfectly fine to select
> different routes based on information in the IP header (the part that is intended
> to be read and understood by the network of networks). Now this is currently
> *rarely* done, due to the complexity of tracking more detailed routing information
> at the router level. But we had expected that eventually the Internet would be so
> well connected that there would be diverse routes with diverse capabilities. For
> example, the "Interplanetary Internet" works with datagrams, that can be
> implemented with IP, but not using TCP, which requires very low end-to-end
> latency. Thus, one would expect that TCP would not want any packets transferred
> over a path via Mars, or for that matter a geosynchronous satellite, even if the
> throughput would be higher.
> >
> > So one can imagine that eventually a "TOS" might say - send this packet
> preferably along a path that has at most 200 ms. RTT, *even if that leads to
> congestion signalling*, while another TOS might say "send this path over the most
> "capacious" set of paths, ignoring RTT entirely. (These are just for illustration,
> but obviously something like this would work.)

> >
> > Note that TOS is really aimed at *route selection* preferences, and not
> queueing management of individual routers.
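The hypothetical TOS-as-route-preference idea above can be sketched as a small selection function. Everything here is invented for illustration (the class names, the 200 ms bound, the path attributes); nothing is taken from an actual routing implementation.

```python
# Hypothetical sketch of TOS as a *route-selection* preference: one
# class asks for any path under an RTT bound (even if that invites
# congestion signalling), another for maximum capacity regardless of RTT.

def select_path(paths, tos):
    """paths: list of dicts with 'name', 'rtt_ms', 'capacity_mbps'."""
    if tos == "LOW_DELAY":
        # At most 200 ms RTT; pick the most capacious qualifying path.
        eligible = [p for p in paths if p["rtt_ms"] <= 200]
        candidates = eligible or paths      # fall back if none qualifies
        return max(candidates, key=lambda p: p["capacity_mbps"])
    if tos == "HIGH_THROUGHPUT":
        # Most capacious path, RTT ignored entirely.
        return max(paths, key=lambda p: p["capacity_mbps"])
    # Default: shortest-RTT path.
    return min(paths, key=lambda p: p["rtt_ms"])

paths = [
    {"name": "terrestrial",   "rtt_ms": 40,  "capacity_mbps": 100},
    {"name": "geo_satellite", "rtt_ms": 600, "capacity_mbps": 500},
]
```

Under these invented numbers, a delay-sensitive class avoids the geosynchronous hop even though its raw throughput is higher, which is exactly the TCP-over-Mars point made above.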
> 
> That may well have been the original intention, but it was hardly mentioned at all
> in the diffserv WG (which I co-chaired), and "QOS-based routing" was in very bad
> odour at that time.
>  
> >
> > Queueing management to share a single queue on a path for multiple priorities
> of traffic is not very compatible with "end-to-end arguments". There are any
> number of reasons why this doesn't work well. I can go into them. Mainly these
> reasons are why "diffserv" has never been adopted -
> 
> Oh, but it has, in lots of local deployments of voice over IP for example. It's
> what I've taken to calling a limited domain protocol. What has not happened is
> Internet-wide deployment, because...
> 
> > it's NOT interoperable because the diversity of traffic between endpoints is
> hard to specify in a way that translates into the network mechanisms. Of course
> any queue can be managed in some algorithmic way with parameters, but the
> endpoints that want to specify an end-to-end goal don't have a way to understand
> the impact of those parameters on a specific queue that is currently congested.
> 
> Yes. And thanks for your insights.
> 
> Brian
> 
> >
> >  
> >
> > Instead, the history of the Internet (and for that matter *all* networks,
> even Bell's voice systems) has focused on minimizing queueing delay to near zero
> throughout the network by whatever means it has at the endpoints or in the design.
> This is why we have AIMD's MD as a response to detection of congestion.
> >
> >  
> >
> > Pragmatic networks (those that operate in the real world) do not choose to
> operate with shared links in a saturated state. That's known in the phone business
> as the Mother's Day problem. You want to have enough capacity for the rare
> near-overload to never result in congestion.  Which means that the normal
> state of the network is very lightly loaded indeed, in order to minimize RTT.
> Consequently, focusing on somehow trying to optimize the utilization of the
> network to 100% is just a purely academic exercise. Since "priority" at the packet
> level within a queue only improves that case, it's just a focus of (bad) Ph.D.
> theses. (Good Ph.D. theses focus on actual real problems like getting the queues
> down to 1 packet or less by signalling the endpoints with information that allows
> them to do their job).
> >
> >  
> >
> > So, in considering what goes in the IP layer, both its header and the
> mechanics of the network of networks, it is those things that actually have
> implementable meaning in the network of networks when processing the IP datagram.
> The rest is "content" because the network of networks doesn't need to see it.
> >
> >  
> >
> > Thus, don't put anything in the IP header that belongs in the "content" part,
> just being a signal between end points. Some information used in the network of
> networks is also logically carried between endpoints.
> >
> >  
> >
> >  
> >
> > On Friday, June 21, 2019 4:37pm, "Brian E Carpenter"
> <brian.e.carpenter at gmail.com> said:
> >
> >> Below...
> >> On 21-Jun-19 21:33, Luca Muscariello wrote:
> >> > + David Reed, as I'm not sure he's on the ecn-sane list.
> >> >
> >> > To me, it seems like a very religious position against per-flow
> >> queueing. 
> >> > BTW, I fail to see how this would violate (in a "profound" way ) the
> e2e
> >> principle.
> >> >
> >> > When I read it (the e2e principle)
> >> >
> >> > Saltzer, J. H., D. P. Reed, and D. D. Clark (1981) "End-to-End
> Arguments in
> >> System Design". 
> >> > In: Proceedings of the Second International Conference on
> Distributed
> >> Computing Systems. Paris, France. 
> >> > April 8–10, 1981. IEEE Computer Society, pp. 509-512.
> >> > (available on line for free).
> >> >
> >> > It seems very much like the application of the Occam's razor to
> function
> >> placement in communication networks back in the 80s.
> >> > I see no conflict between what is written in that paper and per-flow
> queueing
> >> today, even after almost 40 years.
> >> >
> >> > If that was the case, then all service differentiation techniques
> would
> >> violate the e2e principle in a "profound" way too,
> >> > and dualQ too. A policer? A shaper? A priority queue?
> >> >
> >> > Luca
> >>
> >> Quoting RFC2638 (the "two-bit" RFC):
> >>
> >> >>> Both these
> >> >>> proposals seek to define a single common mechanism that is used by
> >> >>> interior network routers, pushing most of the complexity and state of
> >> >>> differentiated services to the network edges.
> >>
> >> I can't help thinking that if DDC had felt this was against the E2E
> principle,
> >> he would have kicked up a fuss when it was written.
> >>
> >> Bob's right, however, that there might be a tussle here. If end-points
> are
> >> attempting to pace their packets to suit their own needs, and the network
> is
> >> policing packets to support both service differentiation and fairness,
> >> these may well be competing rather than collaborating behaviours. And
> there
> >> probably isn't anything we can do about it by twiddling with algorithms.
> >>
> >> Brian
> >>
> >> >
> >> > On Fri, Jun 21, 2019 at 9:00 AM Sebastian Moeller
> <moeller0 at gmx.de
> >> <mailto:moeller0 at gmx.de>> wrote:
> >> >
> >> >
> >> >
> >> > > On Jun 19, 2019, at 16:12, Bob Briscoe <ietf at bobbriscoe.net
> >> <mailto:ietf at bobbriscoe.net>> wrote:
> >> > >
> >> > > Jake, all,
> >> > >
> >> > > You may not be aware of my long history of concern about how
> >> per-flow scheduling within endpoints and networks will limit the Internet
> in
> >> future. I find per-flow scheduling a violation of the e2e principle in
> such a
> >> profound way - the dynamic choice of the spacing between packets - that
> most
> >> people don't even associate it with the e2e principle.
> >> >
> >> > Maybe because it is not a violation of the e2e principle at all? My
> point
> >> is that with shared resources between the endpoints, the endpoints simply
> should
> >> have no expectancy that their choice of spacing between packets will be
> conserved.
> >> For the simple reason that it seems generally impossible to guarantee
> that
> >> inter-packet spacing is conserved (think "cross-traffic" at the
> bottleneck hop
> >> along the path and general bunching up of packets in the queue of a fast
> to slow
> >> transition*). I also would claim that the way L4S works (if it works) is to
> >> synchronize all active flows at the bottleneck, which in turn means each
> >> sender has only a very small time window in which to transmit a packet for
> >> it to hit its "slot" in the bottleneck L4S scheduler; otherwise, L4S's low
> >> queueing delay guarantees will not work. In other words the senders have
> >> basically no say in the "spacing between packets"; I fail to see how L4S
> >> improves upon FQ in that regard.
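The fast-to-slow-transition point above is easy to demonstrate with a toy FIFO queue: whatever spacing the sender chooses, departures from a saturated bottleneck are paced by the bottleneck's own service time. The numbers below are arbitrary and model no specific link.

```python
# Toy illustration: a sender's chosen inter-packet spacing is not
# conserved across a fast-to-slow transition.

def departure_times(arrivals, service_time):
    """FIFO queue: each packet departs when the link is next free."""
    deps = []
    free_at = 0.0
    for t in arrivals:
        start = max(t, free_at)      # wait for the link if it's busy
        free_at = start + service_time
        deps.append(free_at)
    return deps

# Sender paces packets 1 ms apart; the bottleneck needs 3 ms per packet.
arrivals = [0.0, 1.0, 2.0, 3.0]
deps = departure_times(arrivals, 3.0)
spacings = [b - a for a, b in zip(deps, deps[1:])]
# Departure spacing collapses to the bottleneck's 3 ms service time,
# regardless of the 1 ms spacing the sender chose.
```

Any scheduler at such a bottleneck, single-queue or per-flow, rewrites the spacing; the only question is by what policy.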
> >> >
> >> >
> >> > IMHO having per-flow fairness as the default seems quite reasonable;
> >> endpoints can still throttle flows to their liking. Now per-flow fairness
> >> can still be "abused", so by itself it might not be sufficient, but
> >> neither is L4S: it has at best stochastic guarantees, since as a single
> >> queue AQM (let's ignore the RFC3168 part of the AQM) there is a
> >> probability of sending a throttling signal to a low-bandwidth flow (fair
> >> enough, it is only a mild throttling signal, but still).
> >> > But enough about my opinion: what is the ideal fairness measure in your
> >> mind, and what is realistically achievable over the internet?
> >> >
> >> >
> >> > Best Regards
> >> >         Sebastian
> >> >
> >> >
> >> >
> >> >
> >> > >
> >> > > I detected that you were talking about FQ in a way that might
> have
> >> assumed my concern with it was just about implementation complexity. If
> you (or
> >> anyone watching) is not aware of the architectural concerns with
> per-flow
> >> scheduling, I can enumerate them.
> >> > >
> >> > > I originally started working on what became L4S to prove that
> it was
> >> possible to separate out reducing queuing delay from throughput
> scheduling. When
> >> Koen and I started working together on this, we discovered we had
> identical
> >> concerns on this.
> >> > >
> >> > >
> >> > >
> >> > > Bob
> >> > >
> >> > >
> >> > > --
> >> > > ________________________________________________________________
> >> > > Bob Briscoe                               http://bobbriscoe.net/
> >> > >
> >> > > _______________________________________________
> >> > > Ecn-sane mailing list
> >> > > Ecn-sane at lists.bufferbloat.net
> >> <mailto:Ecn-sane at lists.bufferbloat.net>
> >> > > https://lists.bufferbloat.net/listinfo/ecn-sane
> >> >
> >> >
> >>
> >>
> >
> 
> 