[Ecn-sane] [tsvwg] per-flow scheduling

Luca Muscariello luca.muscariello at gmail.com
Sat Jun 22 18:30:18 EDT 2019


Thanks for the insights.

On Sun 23 Jun 2019 at 00:25, David P. Reed <dpreed at deepplum.com> wrote:

> Given the complexity of my broader comments, let me be clear that I have
> no problem with the broad concept of diffserv being compatible with the
> end-to-end arguments. I was trying to lay out what I think is a useful way
> to think about these kinds of issues within the Internet context.
>
>
>
> Similarly, per-flow scheduling as an end-to-end concept (different flows
> defined by address pairs being jointly managed as entities) makes great
> sense, but it's really important to be clear that queue prioritization
> within a single queue at entry to a bottleneck link is a special case
> mechanism, and not a general end-to-end concept at the IP datagram level,
> given the generality of IP as a network packet transport protocol. It's
> really tied closely to routing, which isn't specified in any way by IP,
> other than "best efforts", a term that has become much more well defined
> over the years (including the notions of dropping rather than storing
> packets, the idea that successive IP datagrams should traverse roughly the
> same path in order to have stable congestion detection, ...).
>
>
>
> Per-flow scheduling seems to work quite well in the cases where it
> applies, transparently below the IP datagram layer (that is, underneath the
> hourglass neck). IP effectively defines "flows", and it is reasonable to me
> that "best efforts" as a concept could include some notion of network-wide
> fairness among flows. Link-level "fairness" isn't a necessary precondition
> to network level fairness.
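Reed's description of per-flow scheduling, flows defined by address pairs and served fairly below the IP layer, can be illustrated with a toy sketch (hypothetical illustration, not code from this thread): a scheduler that buckets packets by (source, destination) pair and serves one packet per flow per round, so no single flow monopolizes the bottleneck regardless of what the endpoints do.

```python
from collections import OrderedDict, deque

class FlowScheduler:
    """Toy round-robin per-flow scheduler. Packets are grouped by a
    (src, dst) address-pair key and served one per flow per round,
    so link-level sharing is fair without endpoint cooperation."""

    def __init__(self):
        self.flows = OrderedDict()  # flow key -> deque of queued packets

    def enqueue(self, src, dst, packet):
        self.flows.setdefault((src, dst), deque()).append(packet)

    def dequeue(self):
        # Serve the flow at the head of the round, then rotate it to
        # the back; empty flows drop out of the rotation entirely.
        if not self.flows:
            return None
        key, queue = next(iter(self.flows.items()))
        packet = queue.popleft()
        del self.flows[key]
        if queue:
            self.flows[key] = queue  # re-insert at the end of the round
        return packet
```

With three packets from one flow queued ahead of one packet from another, the second flow's packet is served in the second slot rather than waiting behind the burst.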
>
>
>
> On Saturday, June 22, 2019 5:10pm, "Brian E Carpenter" <
> brian.e.carpenter at gmail.com> said:
>
> > Just three or four small comments:
> >
> > On 23-Jun-19 07:50, David P. Reed wrote:
> > > Two points:
> > >
> > >
> > >
> > > - Jerry Saltzer and I were the primary authors of the End-to-end
> argument
> > paper, and the motivation was based on *my* work on the original TCP and IP

> > protocols. Dave Clark got involved significantly later than all those
> decisions,
> > which were basically complete when he got involved. (Jerry was my thesis
> > supervisor, I was his student, and I operated largely independently,
> taking input
> > from various others at MIT). I mention this because Dave understands the
> > end-to-end arguments, but he understands (as we all did) that it was a
> design
> > *principle* and not a perfectly strict rule. That said, it's a rule that
> has a
> > strong foundational argument from modularity and evolvability in a
> context where
> > the system has to work on a wide range of infrastructures (not all
> knowable in
> > advance) and support a wide range of usage/application-areas (not all
> knowable in
> > advance). Treating the paper as if it were "DDC" declaring a law is just
> wrong. He
> > wasn't Moses and it is not written on tablets. Dave
> > > did have some "power" in his role of trying to achieve interoperability
> > across diverse implementations. But his focus was primarily on
> interoperability,
> > not other things. So ideas in the IP protocol like "TOS" which were
> largely
> > placeholders for not-completely-worked-out concepts deferred to the
> future were
> > left till later.
> >
> > Yes, well understood, but he was in fact the link between the e2e paper
> and the
> > differentiated services work. Although not a nominal author of the
> "two-bit" RFC,
> > he was heavily involved in it, which is why I mentioned him. And he was
> very
> > active in the IETF diffserv WG.
> > > - It is clear (at least to me) that from the point of view of the
> source of
> > an IP datagram, the "handling" of that datagram within the network of
> networks can
> > vary, and so that is why there is a TOS field - to specify an
> interoperable,
> > meaningfully described per-packet indicator of differential handling. In
> regards
> > to the end-to-end argument, that handling choice is a network function,
> *to the
> > extent that it can completely be implemented in the network itself*.
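As a concrete aside on the TOS field: an endpoint can request differential handling by writing the TOS/DS byte through the standard IP_TOS socket option. A minimal Python sketch (the DSCP value 46, Expedited Forwarding, is just an illustrative choice; whether any network along the path honours it is entirely up to the operators):

```python
import socket

# DSCP occupies the upper six bits of the TOS/DS byte; the low two
# bits are reserved for ECN. EF (Expedited Forwarding) is DSCP 46.
DSCP_EF = 46
tos = DSCP_EF << 2

# Mark all outgoing datagrams on this UDP socket with the chosen DSCP.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)

# Read the option back to confirm the kernel accepted it.
assert sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS) == tos
sock.close()
```

The endpoint only expresses a preference here; the marking is an interoperable signal, and the handling decision remains a network function.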
> > >
> > > Congestion management, however, is not achievable entirely and only
> within
> > the network. That's completely obvious: congestion happens when the
> > source-destination flows exceed the capacity of the network of networks
> to satisfy
> > all demands.
> > >
> > > The network can only implement *certain* general kinds of mechanisms
> that may
> > be used by the endpoints to resolve congestion:
> > >
> > > 1) admission controls. These are implemented at the interface between
> the
> > source entity and the network of networks. They tend to be impractical
> in the
> > Internet context, because there is, by a fundamental and irreversible
> design
> > choice made by Cerf and Kahn (and the rest of us), no central controller
> of the
> > entire network of networks. This is to make evolvability and scalability
> work. 5G
> (not an Internet system) implies a central controller, as do SNA, LTE,
> and many
> > other networks. The Internet is an overlay on top of such networks.
> > >
> > > 2) signalling congestion to the endpoints, which will respond by
> slowing
> > their transmission rate (or explicitly re-routing transmission, or
> compressing
> > their content) through the network to match capacity. This response is
> done
> > *above* the IP layer, and has proven very practical. The function in the
> network
> > is reduced to "congestion signalling", in a universally understandable
> meaningful
> > mechanism: packet drops, ECN, packet-pair separation in arrival time,
> ...
> > This limited function is essential within the network, because it is the
> state of
> > the path(s) that is needed to implement the full function at the end
> points. So
> > congestion signalling, like ECN, is implemented according to the
> end-to-end
> > argument by carefully defining the network function to be the minimum
> necessary
> > mechanism so that endpoints can control their rates.
> > >
> > > 3) automatic selection of routes for flows. It's perfectly fine to
> select
> > different routes based on information in the IP header (the part that is
> intended
> > to be read and understood by the network of networks). Now this is
> currently
> > *rarely* done, due to the complexity of tracking more detailed routing
> information
> > at the router level. But we had expected that eventually the Internet
> would be so
> > well connected that there would be diverse routes with diverse
> capabilities. For
> > example, the "Interplanetary Internet" works with datagrams, that can be
> > implemented with IP, but not using TCP, which requires very low
> end-to-end
> > latency. Thus, one would expect that TCP would not want any packets
> transferred
> > over a path via Mars, or for that matter a geosynchronous satellite,
> even if the
> > throughput would be higher.
> > >
> > > So one can imagine that eventually a "TOS" might say - send this packet
> > preferably along a path that has at most 200 ms RTT, *even if that
> leads to
> > congestion signalling*, while another TOS might say "send this packet over
> the most
> > "capacious" set of paths, ignoring RTT entirely. (these are just for
> illustration,
> > but obviously something like this would work).
> > >
> > > Note that TOS is really aimed at *route selection* preferences, and not
> > queueing management of individual routers.
> >
> > That may well have been the original intention, but it was hardly
> mentioned at all
> > in the diffserv WG (which I co-chaired), and "QOS-based routing" was in
> very bad
> > odour at that time.
> >
> > >
> > > Queueing management to share a single queue on a path for multiple
> priorities
> > of traffic is not very compatible with "end-to-end arguments". There are
> any
> > number of reasons why this doesn't work well. I can go into them. Mainly
> these
> > reasons are why "diffserv" has never been adopted -
> >
> > Oh, but it has, in lots of local deployments of voice over IP for
> example. It's
> > what I've taken to calling a limited domain protocol. What has not
> happened is
> > Internet-wide deployment, because...
> >
> > > it's NOT interoperable because the diversity of traffic between
> endpoints is
> > hard to specify in a way that translates into the network mechanisms. Of
> course
> > any queue can be managed in some algorithmic way with parameters, but the
> > endpoints that want to specify an end-to-end goal don't have a way to
> understand
> > the impact of those parameters on a specific queue that is currently
> congested.
> >
> > Yes. And thanks for your insights.
> >
> > Brian
> >
> > >
> > >
> > >
> > > Instead, the history of the Internet (and for that matter *all*
> networks,
> > even Bell's voice systems) has focused on minimizing queueing delay to
> near zero
> > throughout the network by whatever means it has at the endpoints or in
> the design.
> > This is why we have AIMD's MD as a response to detection of congestion.
> > >
> > >
> > >
> > > Pragmatic networks (those that operate in the real world) do not
> choose to
> > operate with shared links in a saturated state. That's known in the
> phone business
> > as the Mother's Day problem. You want to have enough capacity for the
> rare
> near-overload to never result in congestion, which means that the normal
> > state of the network is very lightly loaded indeed, in order to minimize
> RTT.
> > Consequently, focusing on somehow trying to optimize the utilization of
> the
> > network to 100% is just a purely academic exercise. Since "priority" at
> the packet
> > level within a queue only improves that case, it's just a focus of (bad)
> Ph.D.
> > theses. (Good Ph.D. theses focus on actual real problems like getting
> the queues
> > down to 1 packet or less by signalling the endpoints with information
> that allows
> > them to do their job).
> > >
> > >
> > >
> > > So, in considering what goes in the IP layer, both its header and the
> > mechanics of the network of networks, it is those things that actually
> have
> > implementable meaning in the network of networks when processing the IP
> datagram.
> > The rest is "content" because the network of networks doesn't need to
> see it.
> > >
> > >
> > >
> > > Thus, don't put anything in the IP header that belongs in the
> "content" part,
> > just being a signal between end points. Some information used in the
> network of
> > networks is also logically carried between endpoints.
> > >
> > >
> > >
> > >
> > >
> > > On Friday, June 21, 2019 4:37pm, "Brian E Carpenter"
> > <brian.e.carpenter at gmail.com> said:
> > >
> > >> Below...
> > >> On 21-Jun-19 21:33, Luca Muscariello wrote:
> > >> > + David Reed, as I'm not sure he's on the ecn-sane list.
> > >> >
> > >> > To me, it seems like a very religious position against per-flow
> > >> queueing.
> > >> > BTW, I fail to see how this would violate (in a "profound" way ) the
> > e2e
> > >> principle.
> > >> >
> > >> > When I read it (the e2e principle)
> > >> >
> > >> > Saltzer, J. H., D. P. Reed, and D. D. Clark (1981) "End-to-End
> > Arguments in
> > >> System Design".
> > >> > In: Proceedings of the Second International Conference on
> > Distributed
> > >> Computing Systems. Paris, France.
> > >> > April 8–10, 1981. IEEE Computer Society, pp. 509-512.
> > >> > (available on line for free).
> > >> >
> > >> > It seems very much like the application of the Occam's razor to
> > function
> > >> placement in communication networks back in the 80s.
> > >> > I see no conflict between what is written in that paper and per-flow
> > queueing
> > >> today, even after almost 40 years.
> > >> >
> > >> > If that was the case, then all service differentiation techniques
> > would
> > >> violate the e2e principle in a "profound" way too,
> > >> > and dualQ too. A policer? A shaper? A priority queue?
> > >> >
> > >> > Luca
> > >>
> > >> Quoting RFC2638 (the "two-bit" RFC):
> > >>
> > >> >>> Both these
> > >> >>> proposals seek to define a single common mechanism that is
> > used
> > >> by
> > >> >>> interior network routers, pushing most of the complexity and
> > state
> > >> of
> > >> >>> differentiated services to the network edges.
> > >>
> > >> I can't help thinking that if DDC had felt this was against the E2E
> > principle,
> > >> he would have kicked up a fuss when it was written.
> > >>
> > >> Bob's right, however, that there might be a tussle here. If end-points
> > are
> > >> attempting to pace their packets to suit their own needs, and the
> network
> > is
> > >> policing packets to support both service differentiation and fairness,
> > >> these may well be competing rather than collaborating behaviours. And
> > there
> > >> probably isn't anything we can do about it by twiddling with
> algorithms.
> > >>
> > >> Brian
> > >>
> > >> >
> > >> > On Fri, Jun 21, 2019 at 9:00 AM Sebastian Moeller
> > <moeller0 at gmx.de
> > >> <mailto:moeller0 at gmx.de>> wrote:
> > >> >
> > >> >
> > >> >
> > >> > > On Jun 19, 2019, at 16:12, Bob Briscoe <ietf at bobbriscoe.net
> > >> <mailto:ietf at bobbriscoe.net>> wrote:
> > >> > >
> > >> > > Jake, all,
> > >> > >
> > >> > > You may not be aware of my long history of concern about how
> > >> per-flow scheduling within endpoints and networks will limit the
> Internet
> > in
> > >> future. I find per-flow scheduling a violation of the e2e principle in
> > such a
> > >> profound way - the dynamic choice of the spacing between packets -
> that
> > most
> > >> people don't even associate it with the e2e principle.
> > >> >
> > >> > Maybe because it is not a violation of the e2e principle at all? My
> > point
> > >> is that with shared resources between the endpoints, the endpoints
> simply
> > should
> > >> have no expectation that their choice of spacing between packets will
> be
> > conserved.
> > >> For the simple reason that it seems generally impossible to guarantee
> > that
> > >> inter-packet spacing is conserved (think "cross-traffic" at the
> > bottleneck hop
> > >> along the path and general bunching up of packets in the queue of a
> fast
> > to slow
> > >> transition*). I also would claim that the way L4S works (if it works)
> is
> > to
> > >> synchronize all active flows at the bottleneck, which in turn means
> each
> > sender has
> > >> only a very small time window in which to transmit a packet for it to
> hit
> > its
> > >> "slot" in the bottleneck L4S scheduler, otherwise, L4S's low queueing
> > delay
> > >> guarantees will not work. In other words, the senders have basically no
> > say in the
> > >> "spacing between packets", I fail to see how L4S improves upon FQ in
> that
> > regard.
> > >> >
> > >> >
> > >> >  IMHO having per-flow fairness as the default seems quite
> > >> reasonable, endpoints can still throttle flows to their liking. Now
> > per-flow
> > >> fairness still can be "abused", so by itself it might not be
> sufficient,
> > but
> > >> neither is L4S as it has at best stochastic guarantees, as a single
> queue
> > AQM
> > >> (let's ignore the RFC3168 part of the AQM) there is the probability to
> > send a
> > >> throttling signal to a low bandwidth flow (fair enough, it is only a
> > mild
> > >> throttling signal, but still).
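Sebastian's point about a single shared queue can be made concrete with a back-of-the-envelope calculation (hypothetical numbers, purely illustrative): if the AQM marks every packet with the same probability regardless of which flow it belongs to, a low-rate flow still has a nonzero chance of receiving a throttle signal even though it contributes little to the queue.

```python
def expected_marks(packets_sent, mark_prob):
    """In a single shared queue every packet faces the same marking
    probability, so the congestion signals a flow receives are simply
    proportional to how many packets it sends, not to its share of
    the queue's occupancy."""
    return packets_sent * mark_prob

def prob_at_least_one_mark(packets_sent, mark_prob):
    """Probability that a flow is signalled at least once in a round."""
    return 1 - (1 - mark_prob) ** packets_sent

# At a 5% mark rate, a 100-packet bulk flow collects ~5 marks on
# average, while a 2-packet low-rate flow still runs a ~10% risk of
# being signalled despite not being the cause of the congestion.
bulk = prob_at_least_one_mark(100, 0.05)
small = prob_at_least_one_mark(2, 0.05)
```

A per-flow scheduler can instead direct signals at the flows actually occupying the queue, which is the contrast being drawn in the message above.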
> > >> > But enough about my opinion, what is the ideal fairness measure in
> > your
> > >> mind, and what is realistically achievable over the internet?
> > >> >
> > >> >
> > >> > Best Regards
> > >> >         Sebastian
> > >> >
> > >> > >
> > >> > > I detected that you were talking about FQ in a way that might
> > have
> > >> assumed my concern with it was just about implementation complexity.
> If
> > you (or
> > >> anyone watching) is not aware of the architectural concerns with
> > per-flow
> > >> scheduling, I can enumerate them.
> > >> > >
> > >> > > I originally started working on what became L4S to prove that
> > it was
> > >> possible to separate out reducing queuing delay from throughput
> > scheduling. When
> > >> Koen and I started working together on this, we discovered we had
> > identical
> > >> concerns on this.
> > >> > >
> > >> > >
> > >> > >
> > >> > > Bob
> > >> > >
> > >> > >
> > >> > > --
> > >> > > ________________________________________________________________
> > >> > > Bob Briscoe                           http://bobbriscoe.net/
> > >> > >
> > >> > > _______________________________________________
> > >> > > Ecn-sane mailing list
> > >> > > Ecn-sane at lists.bufferbloat.net
> > >> <mailto:Ecn-sane at lists.bufferbloat.net>
> > >> > > https://lists.bufferbloat.net/listinfo/ecn-sane
> > >> >
> > >> > _______________________________________________
> > >> > Ecn-sane mailing list
> > >> > Ecn-sane at lists.bufferbloat.net
> > >> <mailto:Ecn-sane at lists.bufferbloat.net>
> > >> > https://lists.bufferbloat.net/listinfo/ecn-sane
> > >> >
> > >>
> > >>
> > >
> >
> >
>