[Cerowrt-devel] [Bloat] DC behaviors today

dpreed at reed.com
Wed Dec 13 13:08:14 EST 2017


Just to be clear, I have built and operated a whole range of network platforms, and have diagnosed problems and planned deployments of systems that include digital packet delivery in real contexts where cost and performance matter, for nearly 40 years now. So this isn't just some kind of radical opinion, but hard-won knowledge from across my entire career. I also have a very strong theoretical background in queueing theory and control theory -- enough to teach a graduate seminar, anyway.
That said, there are lots of folks out there whose opinions differ from mine. But far too many of them (such as those who think big buffers are "good", and who brought us bufferbloat) are not aware of how networks are really used, or of the practical effects of their poor models of usage.
 
If it comforts you to think that I am just stating an "opinion", which must be wrong because it is not the "conventional wisdom" in the circles where you travel, fine. You are entitled to dismiss any ideas you don't like. But I would suggest you get data about your assumptions.
 
I don't know if I'm being trolled, but a couple of comments on the recent comments:
 
1. Statistical multiplexing, viewed as averaging or smoothing, is in my personal opinion and experience measuring real network behavior a description of a theoretical phenomenon that is amenable to theoretical analysis but is not real (e.g. "consider a spherical cow"). Such theoretical analysis can make some gross estimates, but it breaks down quickly. The same is true of common economic theory that models practical markets with linear models (linear systems of differential equations are common) and Gaussian probability distributions (Gaussians are easily analyzed, but wrong; the popular books by Nassim Taleb give an entertaining and enlightening deeper understanding of the economic problems with such modeling).
 
One of the features well observed in real measurements of real systems is that packet flows are "fractal", which means that there is a self-similarity of rate variability at all time scales from micro to macro. As you look at smaller and smaller time scales, or larger and larger time scales, the packet request density per unit time never smooths out due to "averaging over sources". That is, there's no practical "statistical multiplexing" effect. There's also significant correlation among many packet arrivals - assuming they are statistically independent (which is required for the "law of large numbers" to apply) is often far from the real situation - flows that are assumed to be independent are usually strongly coupled.
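
A rough way to see why correlation alone defeats the averaging argument: for n sources of equal variance with a common pairwise correlation rho, the variance of their average is sigma^2 * (1 + (n - 1) * rho) / n. The Python sketch below computes how much variability survives aggregation, relative to a single source. The function name and the example value rho = 0.25 are illustrative assumptions, not measurements, and the model covers only correlation, not the self-similarity described above.

```python
import math


def residual_burstiness(n_sources: int, rho: float) -> float:
    """Std. dev. of the average of n equally variable sources, relative to
    the std. dev. of one source, for a common pairwise correlation rho."""
    if n_sources < 1:
        raise ValueError("need at least one source")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1] for this sketch")
    return math.sqrt((1.0 + (n_sources - 1) * rho) / n_sources)


# Independent sources (rho = 0) versus mildly coupled ones (rho = 0.25).
for n in (10, 100, 10_000):
    independent = residual_burstiness(n, 0.0)
    coupled = residual_burstiness(n, 0.25)
    print(f"n={n:6d}  independent={independent:.4f}  coupled={coupled:.4f}")
```

With rho = 0 the result falls as 1/sqrt(n), which is the textbook smoothing. With any rho > 0 it never drops below sqrt(rho), no matter how many sources are aggregated.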
 
The one exception where flows average out at a constant rate is when there is a "bottleneck". Then, there being no more capacity, the constant rate is forced, not by statistical averaging but by a very different process. One that is almost never desirable.
 
This is just what is observed in case after case.  Designers may imagine that their networks have "smooth averaging" properties. There's a strong thread in networking literature that makes this pretty-much-always-false assumption the basis of protocol designs, thinking about "Quality of Service" and other sorts of things. You can teach graduate students about a reality that does not exist, and get papers accepted in conferences where the reviewers have been trained in the same tradition of unreal assumptions.
 
2. I work every day with "datacenter" networking and distributed systems on 10 GigE and faster Ethernet fabrics with switches and trunking. I see the packet flows driven by distributed computing in real systems. Whenever the sustained peak load on a switch path reaches 100%, that's not "good", that's not "efficient" resource usage. That is a situation where computing is experiencing huge wasted capacity due to network congestion that is dramatically slowing down the desired workload.
 
Again this is because *real workloads* in distributed computation don't have smooth or averagable rates over interconnects. Latency is everything in that application too!
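
Even the most forgiving textbook model shows why running a path near 100% is a latency disaster. The Python sketch below uses the standard M/M/1 mean time in system, 1 / (mu * (1 - rho)). The Poisson-arrival assumption behind M/M/1 is exactly the kind of idealization criticized above, so treat the numbers as an optimistic illustration; real, bursty traffic is generally expected to fare worse. The 10 Gbit/s link rate and 1500-byte packet size are illustrative assumptions.

```python
def mm1_mean_delay_s(service_rate_pps: float, utilization: float) -> float:
    """Mean time in system (queueing + transmission) for an M/M/1 queue."""
    if service_rate_pps <= 0:
        raise ValueError("service rate must be positive")
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1); delay is unbounded at 1")
    return 1.0 / (service_rate_pps * (1.0 - utilization))


LINK_BPS = 10e9          # assumed 10 GigE link
PACKET_BITS = 1500 * 8   # assumed full-size Ethernet payloads
service_rate = LINK_BPS / PACKET_BITS  # packets per second

for rho in (0.1, 0.5, 0.9, 0.99, 0.999):
    delay_us = mm1_mean_delay_s(service_rate, rho) * 1e6
    print(f"utilization {rho:5.3f}  mean delay {delay_us:8.1f} us")
```

In this model, going from 50% to 99% utilization multiplies the mean delay by 50, and the delay has no finite value at 100%.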
 
Yes, because one buys switches from vendors who don't know how to build or operate a server or a database at all, you see vendors trying to demonstrate their amazing throughput. But the people who build these systems (me, for example) are not looking at throughput or statistical multiplexing at all! We use "throughput" as a proxy for "latency under load", and it is a poor proxy, because vendors throw in big buffers, causing bufferbloat. (See Arista Networks' attempts to justify their huge buffers as a "good thing", when they are just something you have to design around by clocking the packets so they never accumulate in a buffer.)
 
So, yes, the peak transfer rate matters, of course. And sometimes it is utilized for very good reason (when the latency of a file transfer as a whole is the latency that matters). But to be clear, just because as a user I want to download a Linux distro update as quickly as possible when it happens does NOT imply that the average load at any time scale is "statistically averaged" for residential networking. Quite the opposite! I buy Gigabit service to my house because I cannot predict when I will need it, but I almost never need it. My average rate (except once a month or so) is minuscule. This is true even though my house is a heavy user of Netflix.
 
The way that Gigabit residential service affects my "quality of service" is almost entirely that I get good "response time" to unpredictable demands. How quickly a Netflix stream can fill its play buffer is the measure. The data rate of any Netflix stream is, on average, much, much less than a Gigabit. Buffers in the network would ruin my Netflix experience, because the buffering is best done at the "edge", as the End-to-End argument usually suggests. It's certainly NOT because of statistical multiplexing.
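
To put rough numbers on the play-buffer point, here is a small Python sketch of the time needed to fill a play buffer at different access rates. The 5 Mbit/s stream rate and 30-second buffer are assumed values for illustration only, and the calculation ignores transport ramp-up and any server-side rate limits.

```python
def buffer_fill_time_s(buffer_s: float, stream_bps: float, link_bps: float) -> float:
    """Seconds needed to download buffer_s seconds of video encoded at
    stream_bps over an otherwise idle link of link_bps."""
    if min(buffer_s, stream_bps, link_bps) <= 0:
        raise ValueError("all arguments must be positive")
    return buffer_s * stream_bps / link_bps


STREAM_BPS = 5e6   # assumed average video bitrate
BUFFER_S = 30.0    # assumed play-buffer depth, in seconds of video

for link_bps in (25e6, 100e6, 1e9):
    fill = buffer_fill_time_s(BUFFER_S, STREAM_BPS, link_bps)
    share = STREAM_BPS / link_bps
    print(f"{link_bps / 1e6:6.0f} Mbit/s link: fill in {fill:5.2f} s, "
          f"steady-state load {share:.1%}")
```

Under these assumptions the Gigabit link fills the buffer in 0.15 s while the stream's steady-state load is 0.5% of capacity: the capacity is bought for response time, not for average utilization.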
 
So when you are tempted to talk about "statistical multiplexing" smoothing out traffic flow, take a pause and think about whether that really makes sense as a description of reality.
 
fq_codel is a good thing because it handles the awkward behavior at "peak load". It smooths out the impact of running out of resources. But that impact is still undesirable - if many Netflix flows are adding up to peak load, a new Netflix flow can't start very quickly. That results in terrible QoS from a Netflix user's point of view.
 
 


On Wednesday, December 13, 2017 11:41am, "Jonathan Morton" <chromatix99 at gmail.com> said:



> Have you considered what this means for the economics of the operation of networks? What other industry that “moves things around” (i.e. logistics or similar) creates a solution in which it has 10x as much infrastructure as its peak requirement?
Ten times peak demand?  No.
Ten times average demand estimated at time of deployment, and struggling badly with peak demand a decade later, yes.  And this is the transportation industry, where a decade is a *short* time - like less than a year in telecoms.
- Jonathan Morton


On 13 Dec 2017 17:27, "Neil Davies" <neil.davies at pnsol.com> wrote:




On 12 Dec 2017, at 22:53, dpreed at reed.com wrote:

Luca's point tends to be correct - variable latency destroys the stability of flow control loops, which destroys throughput, even when there is sufficient capacity to handle the load.
 
This is an indirect result of Little's Lemma (which is strictly true only for Poisson arrival, but almost any arrival process will have a similar interaction between latency and throughput).
Actually it is true for general arrival patterns (can’t lay my hands on the reference for the moment - but it was shown a while back) - what this points to is an underlying conservation law - that “delay and loss” are conserved in a scheduling process. This comes out of the M/M/1/K/K queueing system and associated analysis.
There is a conservation law (and Kleinrock refers to this, at least in terms of delay, in 1965: http://onlinelibrary.wiley.com/doi/10.1002/nav.3800120206/abstract) at work here.
All scheduling systems can do is “distribute” the resulting “delay and loss” differentially amongst the (instantaneous set of) competing streams. 
Let me just repeat that: “delay and loss” is a conserved quantity. Scheduling can’t “destroy” it; it can influence higher-level protocol behaviour, but it cannot reduce the total amount of “delay and loss” that is being induced in the collective set of streams...
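
The claim that Little's result does not depend on Poisson arrivals can be checked on any trace. The Python sketch below feeds a deliberately bursty, hand-made arrival pattern through a single FIFO server, then measures the time-average number in the system (L), the arrival rate (lambda) and the mean time in system (W) separately. The trace values, horizon and function names are illustrative assumptions; the only requirement is that the system is empty at the start and end of the observation window.

```python
def fifo_departures(arrivals, services):
    """Departure times for a single FIFO server, given arrival and service times."""
    departures = []
    free_at = 0.0
    for arrived, service in zip(arrivals, services):
        start = max(arrived, free_at)   # wait if the server is still busy
        free_at = start + service
        departures.append(free_at)
    return departures


def time_average_in_system(arrivals, departures, horizon):
    """Integrate the number in system over [0, horizon] and divide by horizon."""
    events = sorted([(t, +1) for t in arrivals] + [(t, -1) for t in departures])
    area, in_system, last_t = 0.0, 0, 0.0
    for t, delta in events:
        area += in_system * (t - last_t)
        in_system += delta
        last_t = t
    return area / horizon


# A bursty, clearly non-Poisson trace (times in seconds).
arrivals = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 9.0]
services = [1.0, 0.5, 0.5, 1.0, 0.2, 2.0, 0.5]
HORIZON = 10.0

departures = fifo_departures(arrivals, services)
assert departures[-1] <= HORIZON  # system is empty again before the window closes

L = time_average_in_system(arrivals, departures, HORIZON)
lam = len(arrivals) / HORIZON
W = sum(d - a for a, d in zip(arrivals, departures)) / len(arrivals)
print(f"L = {L:.4f}, lambda * W = {lam * W:.4f}")
```

The two printed values agree for this trace, and will for any other trace that leaves the system empty at both ends of the window, because both sides are the same area under the number-in-system curve counted two different ways.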


 
However, the other reason I say what I say so strongly is this:
 
Rant on.
 
Peak/avg. load ratio always exceeds a factor of 10 or more, IRL. Only "benchmark setups" (or hot-rod races done for academic reasons or marketing reasons to claim some sort of "title") operate at peak supportable load any significant part of the time.
Have you considered what this means for the economics of the operation of networks? What other industry that “moves things around” (i.e. logistics or similar) creates a solution in which it has 10x as much infrastructure as its peak requirement?


 
The reason for this is not just "fat pipes are better", but that the bitrate of the underlying medium is an insignificant fraction of a system's operational and capital expense.
Agreed that this is true if you are the incumbent that ‘owns’ the low-level transmission medium (though the costs of lighting a new lambda are not trivial) - but that is not the experience of anyone else in the digital supply chain


 
SLAs are specified in "uptime" not "bits transported", and a clogged pipe is defined as down when latency exceeds a small number.
Do you have any evidence you can reference for an SLA that treats a few ms as “down”? Most of the SLAs I’ve had dealings with use averages over fairly long time periods (e.g. a month) - and there is no quality in averages.


 
Typical operating points of corporate networks where the users are happy are single-digit percentage of max load.
Or less - they also detest the costs that they have to pay the network providers to try and de-risk their applications. There is also the issue that, because they measure averages (over 5 to 15 minutes), they completely fail to capture (for example) the 15 seconds when delay and jitter were high and the CEO’s video conference broke up.


 
This is also true of computer buses and memory controllers and storage interfaces IRL. Again, latency is the primary measure, and the system never focuses on operating points anywhere near max throughput.
Agreed - but wouldn’t it be nice if they could? I’ve worked on h/w systems that we designed to run near their limits (the set-top box market is pretty cut-throat: the closer to saturation you can run and still deliver an acceptable outcome, the cheaper the box and the greater the profit margin for the set-top box provider).


 
Rant off.


Cheers
Neil


On Tuesday, December 12, 2017 1:36pm, "Dave Taht" <dave at taht.net> said:



> 
> Luca Muscariello <luca.muscariello at gmail.com> writes:
> 
> > I think everything is about response time, even throughput.
> >
> > If we compare the time to transmit a single packet from A to B, including
> > propagation delay, transmission delay and queuing delay,
> > to the time to move a much larger amount of data from A to B we use
> > throughput in this second case because it is a normalized
> > quantity w.r.t. response time (bytes over delivery time). For a single
> > transmission we tend to use latency.
> > But in the end response time is what matters.
> >
> > Also, even instantaneous throughput is well defined only for a time scale
> > which has to be much larger than the min RTT (propagation + transmission delays).
> > Agree also that looking at video, latency and latency budgets are better
> > quantities than throughput. At least more accurate.
> >
> > On Fri, Dec 8, 2017 at 8:05 AM, Mikael Abrahamsson <swmike at swm.pp.se> wrote:
> >
> > On Mon, 4 Dec 2017, dpreed at reed.com wrote:
> >
> > I suggest we stop talking about throughput, which has been the mistaken
> > idea about networking for 30-40 years.
> >
> >
> > We need to talk both about latency and speed. Yes, speed is talked about too
> > much (relative to RTT), but it's not irrelevant.
> >
> > Speed of light in fiber means RTT is approx 1ms per 100km, so from Stockholm
> > to SFO my RTT is never going to be significantly below 85ms (8625km great
> > circle). It's currently twice that.
> >
> > So we just have to accept that some services will never be deliverable
> > across the wider Internet, but have to be deployed closer to the customer
> > (as per your examples, some need 1ms RTT to work well), and we need lower
> > access latency and lower queuing delay. So yes, agreed.
> >
> > However, I am not going to concede that speed is "mistaken idea about
> > networking". No amount of smarter queuing is going to fix the problem if I
> > don't have enough throughput available to me that I need for my
> > application.
> 
> In terms of the bellcurve here, throughput has increased much more
> rapidly than latency has decreased, for most, and in an increasing
> majority of human-interactive cases (like video streaming), we often
> have enough throughput.
> 
> And the age old argument regarding "just have overcapacity, always"
> tends to work in these cases.
> 
> I tend not to care as much about how long it takes for things that do
> not need R/T deadlines as humans and as steering wheels do.
> 
> Propagation delay, while ultimately bound by the speed of light, is also
> affected by the wires wrapping indirectly around the earth - much slower
> than would be possible if we worked at it:
> 
> https://arxiv.org/pdf/1505.03449.pdf
> 
> Then there's inside the boxes themselves:
> 
> A lot of my struggles of late have been to get latencies and adequate
> sampling techniques down below 3ms (my previous value for starting to
> reject things due to having too much noise) - and despite trying fairly
> hard, well... a process can't even sleep accurately much below 1ms, on
> bare metal linux. A dream of mine has been 8 channel high quality audio,
> with a video delay of not much more than 2.7ms for AR applications.
> 
> For comparison, an idle quad core aarch64 and dual core x86_64:
> 
> root at nanopineo2:~# irtt sleep
> 
> Testing sleep accuracy...
> 
> Sleep Duration    Mean Error      % Error
>            1ns      13.353µs    1335336.9
>           10ns       14.34µs     143409.5
>          100ns      13.343µs      13343.9
>            1µs      12.791µs       1279.2
>           10µs     148.661µs       1486.6
>          100µs     150.907µs        150.9
>            1ms     168.001µs         16.8
>           10ms     131.235µs          1.3
>          100ms     145.611µs          0.1
>          200ms     162.917µs          0.1
>          500ms     169.885µs          0.0
> 
> 
> d at nemesis:~$ irtt sleep
> 
> Testing sleep accuracy...
> 
> Sleep Duration    Mean Error      % Error
>            1ns         668ns      66831.9
>           10ns         672ns       6723.7
>          100ns         557ns        557.6
>            1µs      57.749µs       5774.9
>           10µs      63.063µs        630.6
>          100µs      67.737µs         67.7
>            1ms     153.978µs         15.4
>           10ms     169.709µs          1.7
>          100ms     186.685µs          0.2
>          200ms     176.859µs          0.1
>          500ms     177.271µs          0.0
> 
> >
> > --
> > Mikael Abrahamsson email: swmike at swm.pp.se
> > _______________________________________________
> >
> >
> > Bloat mailing list
> > [ Bloat at lists.bufferbloat.net ]( mailto:Bloat at lists.bufferbloat.net )
> > [ https://lists.bufferbloat.net/listinfo/bloat ]( https://lists.bufferbloat.net/listinfo/bloat )
> >
> >
> >
>


