[Bloat] Little's Law mea culpa, but not invalidating my main point

Fri Jul 9 19:01:52 EDT 2021

David,

No question that non-stationarity and instability are what we often see in networks.  And, non-stationarity and instability are both topics that lead to very complex analytical problems in queueing theory.  You can find some results on the transient analysis in the queueing theory literature (including the second volume of my Queueing Systems book), but they are limited and hard. Nevertheless, the literature does contain some works on transient analysis of queueing systems as applied to network congestion control - again limited. On the other hand, as you said, control theory addresses stability head on and does offer some tools as well, but again, it is hairy. 

Averages are only averages, but they can provide valuable information. For sure, latency can and does confound behavior.  But, as you point out, it is the proliferation of control protocols, in some cases deployed willy-nilly in networks without proper evaluation of their behavior, that can lead to the nasty cycle of large transient latency, frantic repeating of web requests, protocols sending multiple copies, lack of awareness of true capacity or queue size or throughput, etc. All of these, which you articulate so well, create the chaos and frustration in the network.  Analyzing that is really difficult, and if we don’t measure and sense, we have no hope of understanding, controlling, or ameliorating such situations.

Len

> On Jul 9, 2021, at 12:31 PM, David P. Reed <dpreed at deepplum.com> wrote:
> 
> Len - I admit I made a mistake in challenging Little's Law as being based on Poisson processes. It is more general. But it tells you an "average" in its base form, and latency averages are not useful for end user applications.
>  
> However, Little's Law does assume something that is not actually valid about the kind of distributions seen in the network, and in fact, it is NOT true that networks converge on Poisson arrival times.
>  
> The key issue is well-described in the standard analysis of the M/M/1 queue (e.g. https://en.wikipedia.org/wiki/M/M/1_queue), which is done only for Poisson processes, and is also limited to "stable" systems. But networks are never stable when fully loaded. They get unstable and those instabilities persist for a long time in the network. Instability is at core the underlying *requirement* of the Internet's usage.
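
For reference, the stable-regime M/M/1 result mentioned above fits in a few lines. This is a minimal sketch of the textbook steady-state formulas only. The function names are illustrative, and the formulas assume Poisson arrivals at rate `lam`, exponential service at rate `mu`, and `lam < mu`, which are the very assumptions being questioned in this thread.

```python
def mm1_mean_delay(lam: float, mu: float) -> float:
    """Mean time in system (waiting plus service) for a stable M/M/1 queue."""
    if lam <= 0 or mu <= 0:
        raise ValueError("rates must be positive")
    if lam >= mu:
        # At or beyond full load there is no steady state to average over.
        raise ValueError("no steady state: arrival rate must be below service rate")
    return 1.0 / (mu - lam)


def mm1_mean_in_system(lam: float, mu: float) -> float:
    """Mean number in system for a stable M/M/1 queue: rho / (1 - rho)."""
    if lam <= 0 or mu <= 0:
        raise ValueError("rates must be positive")
    rho = lam / mu
    if rho >= 1.0:
        raise ValueError("no steady state: utilization must be below 1")
    return rho / (1.0 - rho)


if __name__ == "__main__":
    mu = 1.0  # service rate, in packets per unit time
    for lam in (0.5, 0.9, 0.99, 0.999):
        print(lam, mm1_mean_delay(lam, mu), mm1_mean_in_system(lam, mu))
```

The mean delay grows without bound as `lam` approaches `mu`, and the formulas say nothing at all once the link is fully loaded.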
>  
> So specifically: real networks, even large ones, and certainly the Internet today, are not asymptotic limits of sums of stationary stochastic arrival processes. Each external terminal of any real network has a real user there, running a real application, and the network is a complex graph. This makes it completely unlike a single queue. Even the links within a network carry a relatively small number of application flows. There's no ability to apply the Law of Large Numbers to the distributions, because any particular path contains only a small number of serialized flows with highly variable rates.
>  
> Here's an example of what really happens in a real network (I've observed this in 5 different cities on ATT's cellular network, back when it was running Alcatel Lucent HSPA+ gear in those cities).
> But you can see this on any network where transient overload occurs, creating instability.
>  
>  
> At 7 AM, the data transmission of the network is roughly stable. That's because no links are overloaded within the network. Little's Law can tell you by observing the delay and throughput on any path that the average delay in the network is X.
>  
> Continue sampling delay in the network as the day wears on. At about 10 AM, ping delay starts to soar into the multiple second range. No packets are lost. The peak ping time is about 4000 milliseconds - 4 seconds in most of the networks. This is in downtown, no radio errors are reported, no link errors.
> So it is all queueing delay. 
>  
> Now Little's law doesn't tell you much about average delay, because clearly *some* subpiece of the network is fully saturated. But what is interesting here is what is happening and where. You can't tell what is saturated, and in fact the entire network is quite unstable, because the peak is constantly varying and you don't know where the throughput is. All the packets are now arriving 4 seconds or so later.
>  
> Why is the situation not worse than 4 seconds? Well, there are multiple things going on:
> 
> 1) TCP may or may not be doing a lot of retransmissions. If it is, they are not Poisson at all, and not random either: the arrival process is entirely deterministic in each source, based on the retransmission timeout.
> 
> 2) Users are pissed off, because they clicked on a web page, and got nothing back. They retry on their screen, or they try another site. Meanwhile, the underlying TCP connection remains there, pumping the network full of more packets on that old path, which is still backed up with packets that haven't been delivered and are sitting in queues. The real arrival process is not Poisson at all; it's a deterministic, repeated retransmission plus a new attempt to connect to a new site.
>  
> 3) When the users get a web page back eventually, it is filled with names of other pieces needed to display that web page, which causes some number (often as many as 100) new pages to be fetched, ALL at the same time. Certainly not a stochastic process that will just obey the law of large numbers.
>  
> All of these things are the result of initial instability, causing queues to build up.
>  
> So what is the state of the system? is it stable? is it stochastic? Is it the sum of enough stochastic stable flows to average out to Poisson?
>  
> The answer is clearly NO. Control theory (not queuing theory) suggests that this system is completely uncontrolled and unstable.
>  
> So if the system is in this state, what does Little's Lemma tell us? What is the meaning of that highly variable 4 second delay on ping packets, in terms of average utilization of the network?
>  
> We don't even know what all the users really might need, if the system hadn't become unstable, because some users have given up, and others are trying even harder, and new users are arriving.
>  
> What we do know, because ATT (at my suggestion) reconfigured their system after blaming Apple Computer company for "bugs" in the original iPhone in public, is that simply *dropping* packets sitting in queues more than a couple milliseconds MADE THE USERS HAPPY. Apparently the required capacity was there all along! 
>  
> So I conclude that the 4 second delay was the largest delay users could barely tolerate before deciding the network was DOWN and going away. And that the backup was the accumulation of useless packets sitting in queues because none of the end systems were receiving congestion signals (which for the Internet stack begins with packet dropping).
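
The effect of discarding stale packets can be illustrated with a toy single-link model. This is only a sketch under loud assumptions: it is not AT&T's actual configuration, the names `run_link` and `max_wait` are made up for illustration, and the senders here never react to drops, whereas real TCP senders would slow down once they saw loss. The link is offered 20% more traffic than it can carry, first with an unbounded FIFO and then with a rule that discards any packet that has already waited longer than `max_wait`.

```python
def run_link(arrivals, service_time, max_wait=None):
    """Serve packets FIFO on one link.

    arrivals: sorted arrival times. If max_wait is set, a packet that has
    already waited longer than max_wait when it reaches the head of the
    queue is discarded instead of transmitted.
    Returns (delays of delivered packets, number of dropped packets).
    """
    now = 0.0  # time at which the link next becomes free
    delays, dropped = [], 0
    for arrived in arrivals:
        start = max(now, arrived)
        if max_wait is not None and start - arrived > max_wait:
            dropped += 1  # stale packet: drop it, the link stays free
            continue
        now = start + service_time
        delays.append(now - arrived)
    return delays, dropped


if __name__ == "__main__":
    service_time = 1.0  # hypothetical: 1 ms to transmit one packet
    offered = [i / 1.2 for i in range(6000)]  # 20% overload, in ms

    plain, _ = run_link(offered, service_time)
    bounded, lost = run_link(offered, service_time, max_wait=5.0)

    print("unbounded FIFO: worst delay (ms)", max(plain))
    print("drop if waited > 5 ms: worst delay (ms)", max(bounded), "dropped", lost)
```

In the unbounded case the delay of each delivered packet keeps growing for as long as the overload lasts, while in the bounded case every delivered packet sees at most `max_wait` plus one transmission time and the link still stays busy.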
>  
> I should say that most operators, and especially ATT in this case, do not measure end-to-end latency. Instead they use Little's Lemma to query routers for their current throughput in bits per second, and calculate latency as if Little's Lemma applied. This results in reports to management that literally say:
>  
>   The network is not dropping packets, utilization is near 100% on many of our switches and routers.
>  
> And management responds, Hooray! Because utilization of 100% of their hardware is their investors' metric of maximizing profits. The hardware they are operating is fully utilized. No waste! And users are happy because no packets have been dropped!
>  
> Hmm... what's wrong with this picture? I can see why Donovan, CTO, would accuse Apple of lousy software that was ruining iPhone user experience!  His network was operating without ANY problems.
> So it must be Apple!
>  
> Well, no. The entire problem, as we saw when ATT just changed to shorten egress queues and drop packets when the egress queues overflowed, was that ATT's network was amplifying instability, not at the link level, but at the network level.
>  
> And queueing theory can help with that, but *intro queueing theory* cannot.
>  
> And a big part of that problem is the pervasive belief that, at the network boundary, *Poisson arrival* is a reasonable model for use in all cases.
>  
> On Friday, July 9, 2021 6:05am, "Luca Muscariello" <muscariello at ieee.org> said:
> 
> For those who might be interested in Little's law
> there is a nice paper by John Little on the occasion 
> of the 50th anniversary  of the result.
> https://www.informs.org/Blogs/Operations-Research-Forum/Little-s-Law-as-Viewed-on-its-50th-Anniversary
> https://www.informs.org/content/download/255808/2414681/file/little_paper.pdf
>  
> Nice read. 
> Luca 
>  
> P.S. 
> Who doesn't have a copy of L. Kleinrock's books? I do, and I am not ready to lend them!
> 
> On Fri, Jul 9, 2021 at 11:01 AM Leonard Kleinrock <lk at cs.ucla.edu> wrote:
> David,
> I totally appreciate your attention to when analytical modeling works and when it does not. Let me clarify a few things from your note.
> First, Little's law (also known as Little’s lemma or, as I use in my book, Little’s result) does not assume Poisson arrivals -  it is good for any arrival process and any service process and is an equality between time averages.  It states that the time average of the number in a system (for a sample path w) is equal to the average arrival rate to the system multiplied by the time-averaged time in the system for that sample path.  This is often written as $N_{TimeAvg} = \lambda \cdot T_{TimeAvg}$.  Moreover, if the system is also ergodic, then the time average equals the ensemble average and we often write it as $\bar{N} = \lambda \bar{T}$.  In any case, this requires neither Poisson arrivals nor exponential service times.
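
The sample-path statement above can be checked numerically. What follows is a minimal sketch, assuming a single FIFO server fed by deliberately non-Poisson, bursty arrivals; all function names and parameter values are invented for illustration. It integrates the number in system N(t) over a window that starts and ends with the system empty, and compares that time average with the arrival rate multiplied by the mean time in system.

```python
import random


def simulate_fifo(arrivals, services):
    """Return departure times for a single FIFO server."""
    departures, free_at = [], 0.0
    for arrived, service in zip(arrivals, services):
        start = max(arrived, free_at)
        free_at = start + service
        departures.append(free_at)
    return departures


def time_avg_in_system(arrivals, departures, horizon):
    """Integrate the step function N(t) over [0, horizon], divide by horizon."""
    events = sorted([(t, +1) for t in arrivals] + [(t, -1) for t in departures])
    area, in_system, last = 0.0, 0, 0.0
    for t, delta in events:
        area += in_system * (t - last)
        in_system += delta
        last = t
    area += in_system * (horizon - last)
    return area / horizon


def little_check(seed=1, bursts=200, burst_size=5):
    """Return (time-avg number in system, arrival rate, mean time in system)."""
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    for _ in range(bursts):
        t += rng.uniform(0.5, 1.5) * burst_size  # irregular gap between bursts
        arrivals.extend([t] * burst_size)        # whole burst arrives at once
    services = [rng.uniform(0.2, 1.2) for _ in arrivals]
    departures = simulate_fifo(arrivals, services)
    horizon = departures[-1]  # the system is empty again at this instant
    n_avg = time_avg_in_system(arrivals, departures, horizon)
    lam = len(arrivals) / horizon
    t_avg = sum(d - a for a, d in zip(arrivals, departures)) / len(arrivals)
    return n_avg, lam, t_avg


if __name__ == "__main__":
    n_avg, lam, t_avg = little_check()
    print("time-average N:", n_avg)
    print("lambda * T:    ", lam * t_avg)
```

Because the window ends with the system empty, the two printed values agree up to floating-point rounding even though the arrivals are bursty. The check says nothing about how far an individual packet's delay sits from that average.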
>  
> Queueing theorists often do study the case of Poisson arrivals.  True, it makes the analysis easier, yet there is a better reason it is often used, and that is because the sum of a large number of independent stationary renewal processes approaches a Poisson process.  So nature often gives us Poisson arrivals.  
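
The superposition claim can be explored with a small simulation. This is a sketch under stated assumptions: each source is an independent renewal process with uniform(0, 2) gaps (mean 1, clearly not exponential), the sources are merged, and the coefficient of variation (CV) of the merged inter-arrival gaps is reported. An exponential gap distribution has CV equal to 1. The function names are illustrative, and the check covers only the gap distribution, not the independence of successive gaps.

```python
import random
import statistics


def renewal_arrivals(rng, horizon):
    """Arrival times of one renewal source with uniform(0, 2) gaps."""
    t = rng.uniform(0.0, 2.0)  # random initial phase
    out = []
    while t < horizon:
        out.append(t)
        t += rng.uniform(0.0, 2.0)
    return out


def merged_gap_cv(num_sources, horizon, seed=0):
    """CV of the inter-arrival gaps after merging num_sources sources."""
    rng = random.Random(seed)
    merged = sorted(
        t for _ in range(num_sources) for t in renewal_arrivals(rng, horizon)
    )
    gaps = [b - a for a, b in zip(merged, merged[1:])]
    return statistics.pstdev(gaps) / (sum(gaps) / len(gaps))


if __name__ == "__main__":
    print("1 source:   ", merged_gap_cv(1, 5000.0))
    print("10 sources: ", merged_gap_cv(10, 1000.0))
    print("100 sources:", merged_gap_cv(100, 200.0))
```

A single uniform(0, 2) source has CV 1/sqrt(3), roughly 0.58, and the merged CV moves toward 1 as sources are added. The result depends on the sources being independent and stationary, which is the condition disputed elsewhere in this thread for flows coupled by TCP and by user behavior.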
> Best,
> Len
> 
> On Jul 8, 2021, at 12:38 PM, David P. Reed <dpreed at deepplum.com> wrote:
> 
> I will tell you flat out that the arrival time distribution assumption made by Little's Lemma that allows "estimation of queue depth" is totally unreasonable on ANY Internet in practice.
>  
> The assumption is a Poisson Arrival Process. In reality, traffic arrivals in real internet applications are extremely far from Poisson, and, of course, using TCP windowing, become highly intercorrelated with crossing traffic that shares the same queue.
>  
> So, as I've tried to tell many, many net-heads (people who ignore applications layer behavior, like the people that think latency doesn't matter to end users, only throughput), end-to-end packet arrival times on a practical network are incredibly far from Poisson - and they are more like fractal probability distributions, very irregular at all scales of time.
>  
> So, the idea that iperf can estimate queue depth by Little's Lemma by just measuring saturation of capacity of a path is bogus. The less Poisson, the worse the estimate gets, by a huge factor.
>  
>  
> Where does the Poisson assumption come from?  Well, like many theorems, it is the simplest tractable closed form solution - it creates a simplified view, by being a "single-parameter" distribution (the parameter is called lambda for a Poisson distribution).  And the analysis of a simple queue with Poisson arrival distribution and a static, fixed service time is the first interesting Queueing Theory example in most textbooks. It is suggestive of an interesting phenomenon, but it does NOT characterize any real system.
> 
> It's the queueing theory equivalent of "First, we assume a spherical cow..." in doing an example in a freshman physics class.
> 
> Unfortunately, most networking engineers understand neither queuing theory nor application networking usage in interactive applications. Which makes them arrogant. They assume all distributions are Poisson!
>  
>  
> On Tuesday, July 6, 2021 9:46am, "Ben Greear" <greearb at candelatech.com> said:
> 
> > Hello,
> > 
> > I am interested to hear wish lists for network testing features. We make test equipment, supporting lots of wifi stations and a distributed architecture, with built-in udp, tcp, ipv6, http, ... protocols, and are open to creating/improving some of our automated tests.
> > 
> > I know Dave has some test scripts already, so I'm not necessarily looking to reimplement that, but more fishing for other/new ideas.
> > 
> > Thanks,
> > Ben
> > 
> > On 7/2/21 4:28 PM, Bob McMahon wrote:
> > > I think we need the language of math here. It seems like the network power metric, introduced by Kleinrock and Jaffe in the late 70s, is something useful.
> > > Effective end/end queue depths per Little's law also seem useful. Both are available in iperf 2 from a test perspective. Repurposing test techniques to actual traffic could be useful. Hence the question around what exact telemetry is useful to apps making socket write() and read() calls.
> > >
> > > Bob
> > >
> > > On Fri, Jul 2, 2021 at 10:07 AM Dave Taht <dave.taht at gmail.com> wrote:
> > >
> > > In terms of trying to find "Quality" I have tried to encourage folk to
> > > both read "zen and the art of motorcycle maintenance"[0], and Deming's
> > > work on "total quality management".
> > >
> > > My own slice at this network, computer and lifestyle "issue" is aiming
> > > for "imperceptible latency" in all things. [1]. There's a lot of
> > > fallout from that in terms of not just addressing queuing delay, but
> > > caching, prefetching, and learning more about what a user really needs
> > > (as opposed to wants) to know via intelligent agents.
> > >
> > > [0] If you want to get depressed, read Pirsig's successor to "zen...",
> > > lila, which is in part about what happens when an engineer hits an
> > > insoluble problem.
> > > > [1] https://www.internetsociety.org/events/latency2013/
> > >
> > >
> > >
> > > > On Thu, Jul 1, 2021 at 6:16 PM David P. Reed <dpreed at deepplum.com> wrote:
> > > >
> > > > > Well, nice that the folks doing the conference are willing to consider that quality of user experience has little to do with signalling rate at the physical layer or throughput of FTP transfers.
> > > > >
> > > > > But honestly, the fact that they call the problem "network quality" suggests that they REALLY, REALLY don't understand the Internet isn't the hardware or the routers or even the routing algorithms *to its users*.
> > > > >
> > > > > By ignoring the diversity of applications now and in the future, and the fact that we DON'T KNOW what will be coming up, this conference will likely fall into the usual trap that net-heads fall into - optimizing for some imaginary reality that doesn't exist, and in fact will probably never be what users actually will do given the chance.
> > > > >
> > > > > I saw this issue in 1976 in the group developing the original Internet protocols - a desire to put *into the network* special tricks to optimize ASR33 logins to remote computers from terminal concentrators (aka remote login), bulk file transfers between file systems on different time-sharing systems, and "sessions" (virtual circuits) that required logins. And then trying to exploit underlying "multicast" by building it into the IP layer, because someone thought that TV broadcast would be the dominant application.
> > > > >
> > > > > Frankly, to think of "quality" as something that can be "provided" by "the network" misses the entire point of "end-to-end argument in system design". Quality is not a property defined or created by The Network. If you want to talk about Quality, you need to talk about users - all the users at all times, now and into the future, and that's something you can't do if you don't bother to include current and future users talking about what they might expect to experience that they don't experience.
> > > > >
> > > > > There was much fighting back in 1976 that basically involved "network experts" saying that the network was the place to "solve" such issues as quality, so applications could avoid having to solve such issues.
> > > > >
> > > > > What some of us managed to do was to argue that you can't "solve" such issues. All you can do is provide a framework that enables different uses to *cooperate* in some way.
> > > > >
> > > > > Which is why the Internet drops packets rather than queueing them, and why diffserv cannot work.
> > > > >
> > > > > (I know the latter is controversial, but at the moment, ALL of diffserv attempts to talk about end-to-end application-specific metrics, but never, ever explains what the diffserv control points actually do w.r.t. what the IP layer can actually control. So it is meaningless - another violation of the so-called end-to-end principle).
> > > > >
> > > > > Networks are about getting packets from here to there, multiplexing the underlying resources. That's it. Quality is a whole different thing. Quality can be improved by end-to-end approaches, if the underlying network provides some kind of thing that actually creates a way for end-to-end applications to affect queueing and routing decisions, and more importantly getting "telemetry" from the network regarding what is actually going on with the other end-to-end users sharing the infrastructure.
> > > > >
> > > > > This conference won't talk about it this way. So don't waste your time.
> > > > >
> > > > > On Wednesday, June 30, 2021 8:12pm, "Dave Taht" <dave.taht at gmail.com> said:
> > > > >
> > > > > > The program committee members are *amazing*. Perhaps, finally, we can move the bar for the internet's quality metrics past endless, blind repetitions of speedtest.
> > > > > >
> > > > > > For complete details, please see:
> > > > > > https://www.iab.org/activities/workshops/network-quality/
> > > > > >
> > > > > > Submissions Due: Monday 2nd August 2021, midnight AOE (Anywhere On Earth)
> > > > > > Invitations Issued by: Monday 16th August 2021
> > > > > >
> > > > > > Workshop Date: This will be a virtual workshop, spread over three days:
> > > > > >
> > > > > > 1400-1800 UTC Tue 14th September 2021
> > > > > > 1400-1800 UTC Wed 15th September 2021
> > > > > > 1400-1800 UTC Thu 16th September 2021
> > > > > >
> > > > > > Workshop co-chairs: Wes Hardaker, Evgeny Khorov, Omer Shapira
> > > > > >
> > > > > > The Program Committee members:
> > > > > >
> > > > > > Jari Arkko, Olivier Bonaventure, Vint Cerf, Stuart Cheshire, Sam Crowford, Nick Feamster, Jim Gettys, Toke Hoiland-Jorgensen, Geoff Huston, Cullen Jennings, Katarzyna Kosek-Szott, Mirja Kuehlewind, Jason Livingood, Matt Mathias, Randall Meyer, Kathleen Nichols, Christoph Paasch, Tommy Pauly, Greg White, Keith Winstein.
> > > > > >
> > > > > > Send Submissions to: network-quality-workshop-pc at iab.org.
> > > > > >
> > > > > > Position papers from academia, industry, the open source community and others that focus on measurements, experiences, observations and advice for the future are welcome. Papers that reflect experience based on deployed services are especially welcome. The organizers understand that specific actions taken by operators are unlikely to be discussed in detail, so papers discussing general categories of actions and issues without naming specific technologies, products, or other players in the ecosystem are expected. Papers should not focus on specific protocol solutions.
> > > > > >
> > > > > > The workshop will be by invitation only. Those wishing to attend should submit a position paper to the address above; it may take the form of an Internet-Draft.
> > > > > >
> > > > > > All inputs submitted and considered relevant will be published on the workshop website. The organisers will decide whom to invite based on the submissions received. Sessions will be organized according to content, and not every accepted submission or invited attendee will have an opportunity to present as the intent is to foster discussion and not simply to have a sequence of presentations.
> > > > > >
> > > > > > Position papers from those not planning to attend the virtual sessions themselves are also encouraged. A workshop report will be published afterwards.
> > > > > >
> > > > > > Overview:
> > > > > >
> > > > > > "We believe that one of the major factors behind this lack of progress is the popular perception that throughput is often the sole measure of the quality of Internet connectivity. With such a narrow focus, people don’t consider questions such as:
> > > > > >
> > > > > > What is the latency under typical working conditions?
> > > > > > How reliable is the connectivity across longer time periods?
> > > > > > Does the network allow the use of a broad range of protocols?
> > > > > > What services can be run by clients of the network?
> > > > > > What kind of IPv4, NAT or IPv6 connectivity is offered, and are there firewalls?
> > > > > > What security mechanisms are available for local services, such as DNS?
> > > > > > To what degree are the privacy, confidentiality, integrity and authenticity of user communications guarded?
> > > > > >
> > > > > > Improving these aspects of network quality will likely depend on measurement and exposing metrics to all involved parties, including to end users in a meaningful way. Such measurements and exposure of the right metrics will allow service providers and network operators to focus on the aspects that impact the users’ experience most and at the same time empower users to choose the Internet service that will give them the best experience."
> > > > >
> > > > >
> > > > > --
> > > > > Latest Podcast:
> > > > >
> > https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/ <https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/>
> > <https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/ <https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/>>
> > > > >
> > > > > Dave Täht CTO, TekLibre, LLC
> > > > > _______________________________________________
> > > > > Cerowrt-devel mailing list
> > > > > Cerowrt-devel at lists.bufferbloat.net <mailto:Cerowrt-devel at lists.bufferbloat.net>
> > <mailto:Cerowrt-devel at lists.bufferbloat.net <mailto:Cerowrt-devel at lists.bufferbloat.net>>
> > > > > https://lists.bufferbloat.net/listinfo/cerowrt-devel <https://lists.bufferbloat.net/listinfo/cerowrt-devel>
> > <https://lists.bufferbloat.net/listinfo/cerowrt-devel <https://lists.bufferbloat.net/listinfo/cerowrt-devel>>
> > > > >
> > >
> > >
> > >
> > > --
> > > Latest Podcast:
> > > https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/ <https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/>
> > <https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/ <https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/>>
> > >
> > > Dave Täht CTO, TekLibre, LLC
> > > _______________________________________________
> > > Make-wifi-fast mailing list
> > > Make-wifi-fast at lists.bufferbloat.net <mailto:Make-wifi-fast at lists.bufferbloat.net>
> > <mailto:Make-wifi-fast at lists.bufferbloat.net <mailto:Make-wifi-fast at lists.bufferbloat.net>>
> > > https://lists.bufferbloat.net/listinfo/make-wifi-fast <https://lists.bufferbloat.net/listinfo/make-wifi-fast>
> > <https://lists.bufferbloat.net/listinfo/make-wifi-fast <https://lists.bufferbloat.net/listinfo/make-wifi-fast>>
> > >
> > >
> > >
> > > _______________________________________________
> > > Starlink mailing list
> > > Starlink at lists.bufferbloat.net
> > > https://lists.bufferbloat.net/listinfo/starlink
> > >
> > 
> > 
> > --
> > Ben Greear <greearb at candelatech.com>
> > Candela Technologies Inc http://www.candelatech.com
> >