[Bloat] [Cerowrt-devel] DC behaviors today

Neil Davies neil.davies at pnsol.com
Wed Dec 13 14:55:29 EST 2017


Please - my email was not intended to troll; I wanted to establish a dialogue. I am sorry if I’ve offended.
> On 13 Dec 2017, at 18:08, dpreed at reed.com wrote:
> 
> Just to be clear, I have built and operated a whole range of network platforms, as well as diagnosing problems and planning deployments of systems that include digital packet delivery in real contexts where cost and performance matter, for nearly 40 years now. So this isn't only some kind of radical opinion, but hard-won knowledge across my entire career. I also have a very strong theoretical background in queueing theory and control theory -- enough to teach a graduate seminar, anyway.

I accept that - if we are laying out bona fides, I have acted as thesis advisor to people working in this area over 20 years, and I continue to work with network operators, system designers and research organisations (mainly in the EU) in this area.

> That said, there are lots of folks out there who have opinions different than mine. But far too many (such as those who think big buffers are "good", who brought us bufferbloat) are not aware of how networks are really used or the practical effects of their poor models of usage.
> 
> If it comforts you to think that I am just stating an "opinion", which must be wrong because it is not the "conventional wisdom" in the circles where you travel, fine. You are entitled to dismiss any ideas you don't like. But I would suggest you get data about your assumptions.
> 
> I don't know if I'm being trolled, but a couple of comments on the recent comments:
> 
> 1. Statistical multiplexing viewed as an averaging/smoothing as an idea is, in my personal opinion and experience measuring real network behavior, a description of a theoretical phenomenon that is not real (e.g. "consider a spherical cow") that is amenable to theoretical analysis. Such theoretical analysis can make some gross estimates, but it breaks down quickly. The same thing is true of common economic theory that models practical markets by linear models (linear systems of differential equations are common) and gaussian probability distributions (gaussians are easily analyzed, but wrong. You can read the popular books by Nassim Taleb for an entertaining and enlightening deeper understanding of the economic problems with such modeling).

I would fully accept that seeing statistical (or perhaps, better named, stochastic) multiplexing as an averaging process is a vast oversimplification of the complexity. However, I see the underlying mathematics as capturing a much richer description - for example, of the transient behaviour. Queueing theory (in its usual undergraduate formulation) tends to gloss over the edge / extreme conditions, as well as over non-stationary arrival phenomena (such as can occur in the presence of adaptive protocols).

For example - one approach to solving the underlying Markov chain systems (as the operational-semantic representation of a queueing system) is to represent them as transition matrices and then “solve” those matrices for the steady state [as you probably know - think of that as backstory for the interested reader].

We’ve used such transition matrices to examine the “relaxation times” of queueing / scheduling algorithms - i.e. given that a buffer has filled, how quickly will the system relax back towards “steady state”? There are assumptions behind this, of course, but viewing the buffer state as a probability distribution, and seeing how that distribution evolves after, say, an impulse change in load, helps a lot in generating new approaches.
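The transition-matrix view can be sketched numerically in a few lines. A minimal illustration (the parameter values - arrival rate 0.8, service rate 1.0, buffer K=20, and the uniformisation step - are my own assumptions, purely for demonstration): build the discrete-time transition matrix of an M/M/1/K queue, solve for the steady state, then start from a “buffer just filled” impulse and count the steps until the state distribution relaxes back.

```python
import numpy as np

def mm1k_transition_matrix(lam, mu, K, dt=0.05):
    """Uniformized discrete-time transition matrix for an M/M/1/K queue.
    State i = number of packets in the system (0..K)."""
    P = np.zeros((K + 1, K + 1))
    for i in range(K + 1):
        up = lam * dt if i < K else 0.0    # arrival (blocked when buffer full)
        down = mu * dt if i > 0 else 0.0   # service completion
        P[i, i] = 1.0 - up - down
        if i < K:
            P[i, i + 1] = up
        if i > 0:
            P[i, i - 1] = down
    return P

def steady_state(P):
    """Left eigenvector of P for eigenvalue 1, normalised to a distribution."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()

def relaxation(P, pi, start, tol=1e-3, max_steps=100_000):
    """Steps until total-variation distance from steady state falls below tol."""
    p = start.copy()
    for n in range(max_steps):
        if 0.5 * np.abs(p - pi).sum() < tol:
            return n
        p = p @ P
    return max_steps

K = 20
P = mm1k_transition_matrix(lam=0.8, mu=1.0, K=K)
pi = steady_state(P)
full = np.zeros(K + 1)
full[K] = 1.0                        # impulse: the buffer has just filled
print("steps to relax:", relaxation(P, pi, full))
```

The same machinery extends to richer schedulers: replace the birth-death matrix with the chain induced by the scheduling discipline of interest and watch how quickly the impulse decays.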

Cards on the table - I don’t see networks as (purely) natural phenomena (as, say, chemistry or physics) but as mathematical ones. Queueing systems are (relatively) simple automata being pushed through their states by arrivals and departures - the arrivals non-stationary but broadly characterisable in stochastic terms, the departures less stochastically varied, since they are tied to the actual packet sizes. There are rules to that mathematical game imposed by real-world physics, but there are other ways of constructing (and configuring) the actions of those automata to create “better” solutions (for various types of “better”).

> 
> One of the features well observed in real measurements of real systems is that packet flows are "fractal", which means that there is a self-similarity of rate variability all time scales from micro to macro. As you look at smaller and smaller time scales, or larger and larger time scales, the packet request density per unit time never smooths out due to "averaging over sources". That is, there's no practical "statistical multiplexing" effect. There's also significant correlation among many packet arrivals - assuming they are statistically independent (which is required for the "law of large numbers" to apply) is often far from the real situation - flows that are assumed to be independent are usually strongly coupled.

I remember this debate and its evolution - Hurst parameters and all that. I also understand that a collection of on/off Poisson sources looks fractal. I found the “the universe is fractal - live with it” ethos of limited practical use (except to help people say the problem was not solvable). When I saw those results, the question I asked myself (because I do not see these as “natural” phenomena) was: “what is the right way to interact with the traffic patterns to regain acceptable levels of mathematical understanding?” - i.e. what is the right intervention?
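That observation is easy to reproduce: aggregate a handful of on/off sources with heavy-tailed on-periods, and the rate variability refuses to smooth out as the observation window grows. All parameters below (Pareto shape, mean burst length, 50 sources) are illustrative assumptions, not measurements:

```python
import numpy as np

rng = np.random.default_rng(42)

def onoff_trace(n_slots, mean_on=5.0, alpha=1.5):
    """One on/off source: Pareto-tailed on-periods (infinite variance for
    alpha < 2), exponential off-periods. Returns a 0/1 activity trace."""
    trace = np.zeros(n_slots)
    t = 0
    while t < n_slots:
        on = int(rng.pareto(alpha) * mean_on) + 1   # heavy-tailed burst
        off = int(rng.exponential(mean_on)) + 1
        trace[t:t + on] = 1.0
        t += on + off
    return trace

def burstiness(agg, window):
    """Coefficient of variation of the rate, averaged over `window` slots."""
    n = (len(agg) // window) * window
    rates = agg[:n].reshape(-1, window).mean(axis=1)
    return rates.std() / rates.mean()

# 50 multiplexed sources - "statistical multiplexing" at work, in theory
agg = sum(onoff_trace(200_000) for _ in range(50))
for w in (1, 10, 100, 1000):
    print(f"window {w:5d}: CoV = {burstiness(agg, w):.3f}")
```

With exponential on-periods the coefficient of variation collapses quickly as the window grows; with the heavy tail it decays far more slowly - the self-similar signature the Hurst-parameter literature was describing.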

I agree that flows become coupled - every time two flows share a common path/resource they have that potential. The strength of that coupling, and how to decouple the flows, is what is useful to understand. It does not take much “randomness” (i.e. perturbation of the streams’ arrival patterns) to radically reduce that coupling - thankfully such randomness tends to occur naturally due to differences in path length (hence delay).
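A toy illustration of how little randomness is needed (the period, slot width and jitter values are arbitrary assumptions): two phase-locked periodic flows sharing a link collide every cycle, while a modest per-packet jitter breaks most of the coupling.

```python
import random

random.seed(3)

def collision_rate(jitter, n=10_000, period=10.0, slot=1.0):
    """Two flows each send one packet per `period`; packets landing within
    `slot` of each other are counted as a collision (one queues behind the
    other). Per-packet jitter is uniform in [0, jitter]."""
    hits = 0
    for i in range(n):
        a = i * period + random.uniform(0, jitter)
        b = i * period + random.uniform(0, jitter)  # second flow, same phase
        if abs(a - b) < slot:
            hits += 1
    return hits / n

print(collision_rate(0.0))   # perfectly phase-locked: every cycle collides
print(collision_rate(5.0))   # modest jitter: most of the coupling is gone
```

With zero jitter the flows collide on every cycle; with jitter of half the period the collision probability drops to roughly a third, without any explicit coordination.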

I must admit I like randomness (in limited amounts) - it is very useful; CDMA is just one example.

> 
> The one exception where flows average out at a constant rate is when there is a "bottleneck". Then, there being no more capacity, the constant rate is forced, not by statistical averaging but by a very different process. One that is almost never desirable.
> 
> This is just what is observed in case after case.  Designers may imagine that their networks have "smooth averaging" properties. There's a strong thread in networking literature that makes this pretty-much-always-false assumption the basis of protocol designs, thinking about "Quality of Service" and other sorts of things. You can teach graduate students about a reality that does not exist, and get papers accepted in conferences where the reviewers have been trained in the same tradition of unreal assumptions.

Agreed - there is a massive disconnect between a lot of the literature (and the people who make their living generating it - to those people, please don’t take offence: queueing theory is really useful, it is just that the real world is a lot more non-stationary than your models assume) and reality.

> 
> 2. I work every day with "datacenter" networking and distributed systems on 10 GigE and faster Ethernet fabrics with switches and trunking. I see the packet flows driven by distributed computing in real systems. Whenever the sustained peak load on a switch path reaches 100%, that's not "good", that's not "efficient" resource usage. That is a situation where computing is experiencing huge wasted capacity due to network congestion that is dramatically slowing down the desired workload.

Imagine that there were two classes of flow - one that required low latency (e.g. a real-time response, as part of a large distributed computation) and other flows that could still make useful progress even if they suffered the delay (and, to some extent, the loss) induced by the other traffic.

If the operational scenario you are working in is “mono-service” (as you describe above) then there is no room for any differential service - but I would contend that (as important as data-centre-style systems are) they are not a universal phenomenon.

It is my understanding that Google uses this two-tier notion to get high utilisation from its network interconnects while still preserving the performance of its services. I see large-scale networks (i.e. the public Internet) not as a mono-service but as a “poly-service” - there are multiple demands for timeliness etc. out there among “real services”.
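The two-tier idea can be sketched with a toy simulation (I am not claiming this is how Google schedules anything - the strict-priority discipline, the 95% load, the 10% latency-sensitive mix and unit packet sizes below are all assumptions for illustration): even near saturation, the latency-sensitive class sees almost no queueing while the bulk class absorbs the delay.

```python
import random
from collections import deque

random.seed(7)

def strict_priority_sim(n=50_000, load=0.95, hi_frac=0.1):
    """Single server, unit service time, Poisson arrivals at total rate
    `load`. A fraction `hi_frac` of packets is latency-sensitive ('hi');
    the scheduler always drains the 'hi' queue before the 'lo' queue.
    Returns mean waiting time per class."""
    t, arrivals = 0.0, []
    for _ in range(n):
        t += random.expovariate(load)
        arrivals.append((t, "hi" if random.random() < hi_frac else "lo"))

    q = {"hi": deque(), "lo": deque()}
    total = {"hi": 0.0, "lo": 0.0}
    count = {"hi": 0, "lo": 0}
    clock, i = 0.0, 0
    while i < len(arrivals) or q["hi"] or q["lo"]:
        # admit everything that has arrived by `clock`
        while i < len(arrivals) and arrivals[i][0] <= clock:
            q[arrivals[i][1]].append(arrivals[i][0])
            i += 1
        if q["hi"]:
            cls = "hi"
        elif q["lo"]:
            cls = "lo"
        else:
            clock = arrivals[i][0]   # server idle: jump to next arrival
            continue
        total[cls] += clock - q[cls].popleft()
        count[cls] += 1
        clock += 1.0                 # unit service time
    return {c: total[c] / max(count[c], 1) for c in ("hi", "lo")}

waits = strict_priority_sim()
print(f"mean wait  hi: {waits['hi']:.2f}  lo: {waits['lo']:.2f}")
```

The point of the sketch is the asymmetry: the scheduler has not created capacity, it has redistributed the queueing delay away from the class that cannot tolerate it.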

> 
> Again this is because *real workloads* in distributed computation don't have smooth or averagable rates over interconnects. Latency is everything in that application too!

Yep - I understand that - I designed and built large-scale message-passing supercomputers in the ’80s and ’90s, and even wrote a book on how to construct, measure and analyse their interconnects. I still have 70+ Inmos transputers (and the crossbar switching infrastructure) in the garage.

> 
> Yes, because one buys switches from vendors who don't know how to build or operate a server or a database at all, you see vendors trying to demonstrate their amazing throughput, but the people who build these systems (me, for example) are not looking at throughput or statistical multiplexing at all! We use "throughput" as a proxy for "latency under load". (and it is a poor proxy! Because vendors throw in big buffers, causing bufferbloat. See Arista Networks' attempts to justify their huge buffers as a "good thing" -- when it is just a case of something you have to design around by clocking the packets so they never accumulate in a buffer).

Again - we are in violent agreement - this is the (misguided) belief of product managers that “more is better”, so they put more and more buffering into their systems.

> 
> So, yes, the peak transfer rate matters, of course. And sometimes it is utilized for very good reason (when the latency of a file transfer as a whole is the latency that matters). But to be clear, just because as a user I want to download a Linux distro update as quickly as possible when it happens does NOT imply that the average load at any time scale is "statistically averaged" for residential networking. Quite the opposite! I buy Gigabit service to my house because I cannot predict when I will need it, but I almost never need it. My average rate (except once a month or so) is miniscule. This is true even though my house is a heavy user of Netflix.

Again - violent agreement - what matters is “the outcome”. Bulk data transport is just one case (and, unfortunately, the one that appears most frequently in the papers mentioned above); what the Netflix user is interested in is the “probability of a buffering event per watched hour” or the “time to first frame being displayed”.

Take heart - you are really not alone here; there are plenty of people in the telecoms industry (in engineering, if not in marketing or senior management) who understand this. What has happened is that people have been sold “top speed”, and others (like the Googles and Netflixes of this world) are _extremely_ worried that if the transport quality of their data suffers, their business models disappear.

Capacity planning is difficult - unpacking the behavioural dynamics of (application-level) demand is what is needed. This is a large weakness in the planning of today’s digital supply chains.

> 
> The way that Gigabit residential service affects my "quality of service" is almost entirely that I get good "response time" to unpredictable demands. How quickly a Netflix stream can fill its play buffer is the measure. The data rate of any Netflix stream is, on average much, much less than a Gigabit. Buffers in the network would ruin my Netflix experience, because the buffering is best done at the "edge" as the End-to-End argument usually suggests. It's certainly NOT because of statistical multiplexing.

Not quite such violent agreement here - Netflix (once streaming) is not that sensitive to delay: a burst of 100ms-500ms for a second or so does not put its key outcome (ensuring that the playout buffer does not empty) at much risk.

We’ve worked with people who have created risks for Netflix delivery (accidentally, I might add - they thought they were doing “the right thing”) by upgrading their network infrastructure to 100G delivery everywhere. That change (combined with others made by CDN people - TCP offload engines) created so much non-stationarity in the load as to cause delay and loss spikes that *did* cause VoD playout buffers to empty. This is an example of “more capacity” producing worse outcomes.

This is still a pretty young industry - plenty of room for new, original research out there (but for those paper creators reading this: step away from the TCP bulk streams; they are not what is really interesting - the dynamic behavioural aspects are much richer to mine for new papers).

> 
> So when you are tempted to talk about "statistical multiplexing" smoothing out traffic flow take a pause and think about whether that really makes sense as a description of reality.

I see “trad” statistical multiplexing as the way the industry has conned itself into creating (probably) unsustainable delivery models - it has put itself on a “keep building bigger” treadmill just to stand still, all because it doesn’t face up to managing “delay and loss” coherently: the two inherent degrees of freedom, and the fact that this attenuation is conserved.
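One concrete (toy) instance of that conservation, under the simplifying assumption of equal-size packets and a work-conserving server: the set of service-start times is fixed by the arrival pattern alone, so the *total* delay is conserved across scheduling disciplines - a scheduler can only redistribute it.

```python
import random

random.seed(5)

def busy_period_delays(arrivals, order):
    """Serve unit-time packets work-conservingly; `order` ('fifo' or 'lifo')
    picks which waiting packet goes next. Returns per-packet delays."""
    waiting, delays, clock, i = [], [], 0.0, 0
    n = len(arrivals)
    while i < n or waiting:
        while i < n and arrivals[i] <= clock:
            waiting.append(arrivals[i])
            i += 1
        if not waiting:
            clock = arrivals[i]      # idle: jump to next arrival
            continue
        waiting.sort()
        a = waiting.pop(0 if order == "fifo" else -1)
        delays.append(clock - a)
        clock += 1.0                 # unit service time
    return delays

# 100 unit-time packets offered over 80 time units: a sustained overload
arrs = sorted(random.uniform(0, 80) for _ in range(100))
fifo = busy_period_delays(arrs, "fifo")
lifo = busy_period_delays(arrs, "lifo")
print(f"total delay  fifo: {sum(fifo):.1f}  lifo: {sum(lifo):.1f}")  # equal
print(f"max delay    fifo: {max(fifo):.1f}  lifo: {max(lifo):.1f}")
```

The totals match to floating-point precision while the per-packet distributions differ sharply - exactly the “scheduling distributes, but cannot destroy, delay” point.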

> 
> fq_codel is a good thing because it handles the awkward behavior at "peak load". It smooths out the impact of running out of resources. But that impact is still undesirable - if many Netflix flows are adding up to peak load, a new Netflix flow can't start very quickly. That results in terrible QoS from a Netflix user's point of view.

I would suggest that there are other ways of dealing with the impact of “peak” (i.e. where instantaneous demand exceeds supply over a long enough timescale to start affecting the most delay/loss-sensitive application in the multiplexed collection of streams). I would also agree that if all the streams have the same “bound on delay and loss” requirements (i.e. *all* Netflix), and the offered load exceeds 100% over the appropriate timescale (which for streaming Netflix VoD is about 20s to 30s), then end-user disappointment is the only thing that can occur.

Again, this is not intended to troll - I think we are agreeing that the current approaches (as per most of the literature / received wisdom) have just about run their course. My assertion is that the mathematics needed is out there (it is _not_ traditional queueing theory, but it does spring from similar roots).

Cheers

Neil

> 
> 
> 
> 
> On Wednesday, December 13, 2017 11:41am, "Jonathan Morton" <chromatix99 at gmail.com> said:
> 
> > Have you considered what this means for the economics of the operation of networks? What other industry that “moves things around” (i.e logistical or similar) system creates a solution in which they have 10x as much infrastructure than their peak requirement?
> Ten times peak demand?  No.
> Ten times average demand estimated at time of deployment, and struggling badly with peak demand a decade later, yes.  And this is the transportation industry, where a decade is a *short* time - like less than a year in telecoms.
> - Jonathan Morton
> 
> On 13 Dec 2017 17:27, "Neil Davies" <neil.davies at pnsol.com <mailto:neil.davies at pnsol.com>> wrote:
> 
> On 12 Dec 2017, at 22:53, dpreed at reed.com <mailto:dpreed at reed.com> wrote:
> 
> Luca's point tends to be correct - variable latency destroys the stability of flow control loops, which destroys throughput, even when there is sufficient capacity to handle the load.
> 
> This is an indirect result of Little's Lemma (which is strictly true only for Poisson arrival, but almost any arrival process will have a similar interaction between latency and throughput).
> Actually it is true for general arrival patterns (I can’t lay my hands on the reference for the moment, but it was shown a while back) - what this points to is an underlying conservation law: that “delay and loss” are conserved in a scheduling process. This comes out of the M/M/1/K/K queueing system and associated analysis.
> There is a conservation law (and Kleinrock refers to this - at least in terms of delay - in 1965 - http://onlinelibrary.wiley.com/doi/10.1002/nav.3800120206/abstract <http://onlinelibrary.wiley.com/doi/10.1002/nav.3800120206/abstract>) at work here.
> All scheduling systems can do is “distribute” the resulting “delay and loss” differentially amongst the (instantaneous set of) competing streams.
> Let me just repeat that - The “delay and loss” are a conserved quantity - scheduling can’t “destroy” it (they can influence higher level protocol behaviour) but not reduce the total amount of “delay and loss” that is being induced into the collective set of streams...
> 
> 
> However, the other reason I say what I say so strongly is this:
> 
> Rant on.
> 
> Peak/avg. load ratio always exceeds a factor of 10 or more, IRL. Only "benchmark setups" (or hot-rod races done for academic reasons or marketing reasons to claim some sort of "title") operate at peak supportable load any significant part of the time.
> Have you considered what this means for the economics of the operation of networks? What other industry that “moves things around” (i.e logistical or similar) system creates a solution in which they have 10x as much infrastructure than their peak requirement?
> 
> 
> The reason for this is not just "fat pipes are better", but because bitrate of the underlying medium is an insignificant fraction of systems operational and capital expense.
> Agree that (if you are the incumbent that ‘owns’ the low-level transmission medium) this is true (though the costs of lighting a new lambda are not trivial) - but that is not the experience of anyone else in the digital supply chain.
> 
> 
> SLA's are specified in "uptime" not "bits transported", and a clogged pipe is defined as down when latency exceeds a small number.
> Do you have any evidence you can reference for an SLA that treats a few ms as “down”? Most of the SLAs I’ve had dealings with use averages over fairly long time periods (e.g. a month) - and there is no quality in averages.
> 
> 
> Typical operating points of corporate networks where the users are happy are single-digit percentage of max load.
> Or less - they also detest the costs that they have to pay the network providers to try and de-risk their applications. There is also the issue that, because they measure averages (over 5 to 15 minutes), they completely fail to capture (for example) the 15 seconds when delay and jitter were high and the CEO’s video conference broke up.
> 
> 
> This is also true of computer buses and memory controllers and storage interfaces IRL. Again, latency is the primary measure, and the system never focuses on operating points anywhere near max throughput.
> Agreed - but wouldn’t it be nice if they could? I’ve worked on h/w systems where we designed to run near the limits (the set-top box market is pretty cut-throat; the closer to saturation you can run and still deliver an acceptable outcome, the cheaper the box and the greater the profit margin for the set-top box provider).
> 
> 
> Rant off.
> 
> Cheers
> Neil
> 
> On Tuesday, December 12, 2017 1:36pm, "Dave Taht" <dave at taht.net <mailto:dave at taht.net>> said:
> 
> >
> > Luca Muscariello <luca.muscariello at gmail.com <mailto:luca.muscariello at gmail.com>> writes:
> >
> > > I think everything is about response time, even throughput.
> > >
> > > If we compare the time to transmit a single packet from A to B, including
> > > propagation delay, transmission delay and queuing delay,
> > > to the time to move a much larger amount of data from A to B we use
> > throughput
> > > in this second case because it is a normalized
> > > quantity w.r.t. response time (bytes over delivery time). For a single
> > > transmission we tend to use latency.
> > > But in the end response time is what matters.
> > >
> > > Also, even instantaneous throughput is well defined only for a time scale
> > which
> > > has to be much larger than the min RTT (propagation + transmission delays)
> > > Agree also that looking at video, latency and latency budgets are better
> > > quantities than throughput. At least more accurate.
> > >
> > > On Fri, Dec 8, 2017 at 8:05 AM, Mikael Abrahamsson <swmike at swm.pp.se <mailto:swmike at swm.pp.se>>
> > wrote:
> > >
> > > On Mon, 4 Dec 2017, dpreed at reed.com <mailto:dpreed at reed.com> wrote:
> > >
> > > I suggest we stop talking about throughput, which has been the
> > mistaken
> > > idea about networking for 30-40 years.
> > >
> > >
> > > We need to talk both about latency and speed. Yes, speed is talked about
> > too
> > > much (relative to RTT), but it's not irrelevant.
> > >
> > > Speed of light in fiber means RTT is approx 1ms per 100km, so from
> > Stockholm
> > > to SFO my RTT is never going to be significantly below 85ms (8625km
> > great
> > > circle). It's currently twice that.
> > >
> > > So we just have to accept that some services will never be deliverable
> > > across the wider Internet, but have to be deployed closer to the
> > customer
> > > (as per your examples, some need 1ms RTT to work well), and we need
> > lower
> > > access latency and lower queuing delay. So yes, agreed.
> > >
> > > However, I am not going to concede that speed is "mistaken idea about
> > > networking". No amount of smarter queuing is going to fix the problem if
> > I
> > > don't have enough throughput available to me that I need for my
> > application.
> >
> > In terms of the bellcurve here, throughput has increased much more
> > rapidly than latency has decreased, for most, and in an increasing
> > majority of human-interactive cases (like video streaming), we often
> > have enough throughput.
> >
> > And the age old argument regarding "just have overcapacity, always"
> > tends to work in these cases.
> >
> > I tend not to care as much about how long it takes for things that do
> > not need R/T deadlines as humans and as steering wheels do.
> >
> > Propagation delay, while ultimately bound by the speed of light, is also
> > affected by the wires wrapping indirectly around the earth - much slower
> > than would be possible if we worked at it:
> >
> > https://arxiv.org/pdf/1505.03449.pdf <https://arxiv.org/pdf/1505.03449.pdf>
> >
> > Then there's inside the boxes themselves:
> >
> > A lot of my struggles of late have been to get latencies and adequate
> > sampling techniques down below 3ms (my previous value for starting to
> > reject things due to having too much noise) - and despite trying fairly
> > hard, well... a process can't even sleep accurately much below 1ms, on
> > bare metal linux. A dream of mine has been 8 channel high quality audio,
> > with a video delay of not much more than 2.7ms for AR applications.
> >
> > For comparison, an idle quad core aarch64 and dual core x86_64:
> >
> > root at nanopineo2:~# irtt sleep
> >
> > Testing sleep accuracy...
> >
> > Sleep Duration   Mean Error   % Error
> > 1ns              13.353µs     1335336.9
> > 10ns             14.34µs      143409.5
> > 100ns            13.343µs     13343.9
> > 1µs              12.791µs     1279.2
> > 10µs             148.661µs    1486.6
> > 100µs            150.907µs    150.9
> > 1ms              168.001µs    16.8
> > 10ms             131.235µs    1.3
> > 100ms            145.611µs    0.1
> > 200ms            162.917µs    0.1
> > 500ms            169.885µs    0.0
> >
> >
> > d at nemesis:~$ irtt sleep
> >
> > Testing sleep accuracy...
> >
> >
> > Sleep Duration   Mean Error   % Error
> > 1ns              668ns        66831.9
> > 10ns             672ns        6723.7
> > 100ns            557ns        557.6
> > 1µs              57.749µs     5774.9
> > 10µs             63.063µs     630.6
> > 100µs            67.737µs     67.7
> > 1ms              153.978µs    15.4
> > 10ms             169.709µs    1.7
> > 100ms            186.685µs    0.2
> > 200ms            176.859µs    0.1
> > 500ms            177.271µs    0.0
> >
> > >
> > > --
> > > Mikael Abrahamsson email: swmike at swm.pp.se <mailto:swmike at swm.pp.se>
> > > _______________________________________________
> > >
> > >
> > > Bloat mailing list
> > > Bloat at lists.bufferbloat.net <mailto:Bloat at lists.bufferbloat.net>
> > > https://lists.bufferbloat.net/listinfo/bloat <https://lists.bufferbloat.net/listinfo/bloat>
> > >
> > >
> > >
> >
> 
> 
> 
> 
> 


