[Bloat] summarizing the bitag latency report?

Jonathan Morton chromatix99 at gmail.com
Mon Nov 14 12:05:35 EST 2022


> On 12 Nov, 2022, at 1:16 am, Dave Taht via Bloat <bloat at lists.bufferbloat.net> wrote:
> 
> If you were to try to summarize this *in a paragraph*, what would you say?
> 
> https://www.bitag.org/documents/BITAG_latency_explained.pdf

I can get it down to *three* paragraphs while conveying the essentials:

The quality of an Internet path is measured by three factors:  throughput, latency, and packet loss.  Of these three measures, throughput is typically the least important for application performance, so long as a modest threshold is met - for example the US "broadband" definition of 25Mbps.  Packet loss is interpreted by computers as indicating congestion, which causes them to slow down network transfers unnecessarily; it also causes objectionable glitches in video and audio streams, and should thus be minimised.  Latency is the primary driver of perceived Internet quality for most applications in most circumstances.

Latency can be divided into "inherent" and "induced" components.  Inherent latency is simply the time it takes for a packet to traverse all the links in the path, outward and return.  Induced latency is the additional time spent deciding which of several links to direct the packet to, waiting for a shared medium, and/or stuck in a queue full of other packets going the same way.  Most applications are able to adapt to reasonable levels of inherent latency, but induced latency is much more difficult to manage due to its variability.  There are several ways to reduce induced latency without sacrificing throughput or increasing packet loss, chiefly Active Queue Management (AQM) and Fair Queuing, which can fruitfully be combined as in Smart Queue Management (SQM).  SQM is widely, but not yet universally, deployed on the Internet, and works very well.
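The queuing part of induced latency follows from simple arithmetic: a packet arriving at a first-in, first-out buffer must wait for everything already queued ahead of it to be transmitted. The sketch below assumes a hypothetical path with a 20 ms inherent round trip and a 10 Mbps bottleneck, and models only the queue; routing decisions and medium access are left out. The function names are illustrative, not from any library.

```python
def queuing_delay_s(queued_bytes: int, link_rate_bps: float) -> float:
    """Delay a newly arriving packet spends waiting behind data already queued
    ahead of it in a single first-in, first-out (FIFO) buffer."""
    return queued_bytes * 8 / link_rate_bps


def round_trip_latency_s(inherent_rtt_s: float, queued_bytes: int, link_rate_bps: float) -> float:
    """Inherent round-trip time plus the induced delay of one bottleneck queue."""
    return inherent_rtt_s + queuing_delay_s(queued_bytes, link_rate_bps)


# Hypothetical path: 20 ms inherent round trip, 10 Mbps bottleneck link.
INHERENT_RTT_S = 0.020
LINK_RATE_BPS = 10_000_000

# The same path with an empty queue, and with 1 MB of other traffic queued ahead.
print(round(round_trip_latency_s(INHERENT_RTT_S, 0, LINK_RATE_BPS), 3))          # 0.02
print(round(round_trip_latency_s(INHERENT_RTT_S, 1_000_000, LINK_RATE_BPS), 3))  # 0.82
```

The inherent 20 ms is the same in both cases; it is the 800 ms of induced delay, appearing and vanishing as the queue fills and drains, that applications find hard to accommodate.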

AQM is the practice of observing how big queues get, and signalling congestion in a deliberate way based on those observations.  ECN (Explicit Congestion Notification) can be used to perform that signalling without any packet loss.  For traffic that doesn't support ECN, the AQM must instead drop packets deliberately, in a controlled way.  These congestion signals cause applications to reduce their load on the network to match available capacity, and thereby reduce queuing.  Fair Queuing works orthogonally to this by treating each flow of traffic individually, so that one flow inducing heavy delays in its queue doesn't affect another flow which is lighter.  This makes it easy for very different applications to coexist on the same path, which often happens when there are several users in the same household or office.  SQM uses Fair Queuing, and also applies a separate AQM to each flow, so that congestion signals are directed solely to heavy flows.
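To make the combination concrete, here is a minimal sketch of a fair-queuing scheduler whose congestion check looks only at each packet's time spent queued. It is a deliberately simplified stand-in, not any real implementation: flows are served in plain round-robin order, and the AQM is a single fixed sojourn-time threshold (an assumed 5 ms) rather than an adaptive control law. Real deployments commonly use fq_codel (RFC 8290), which uses a deficit round-robin scheduler and runs CoDel on each flow's queue. The class name `ToyFqAqm`, the `Packet` fields, and the threshold are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    flow: str                 # flow identifier (hypothetical; e.g. a hash of addresses and ports)
    size: int                 # bytes (unused by plain round-robin, kept for realism)
    ecn_capable: bool         # whether the sender negotiated ECN
    ce_marked: bool = False   # set when the AQM signals congestion via ECN
    enqueued_at: float = 0.0  # arrival time, used to measure sojourn time


class ToyFqAqm:
    """Per-flow queues served round-robin, each packet checked against a sojourn target.

    A simplified illustration only: no deficit round robin, no CoDel control law.
    """

    def __init__(self, target_s=0.005):
        self.target_s = target_s  # hypothetical 5 ms queuing-delay target
        self.queues = {}          # flow id -> deque of waiting packets
        self.active = deque()     # flows that currently have packets, in service order
        self.dropped = []         # packets discarded by the AQM

    def enqueue(self, pkt, now):
        pkt.enqueued_at = now
        q = self.queues.setdefault(pkt.flow, deque())
        if not q:
            self.active.append(pkt.flow)  # an idle flow joins the back of the rotation
        q.append(pkt)

    def dequeue(self, now):
        while self.active:
            flow = self.active.popleft()
            q = self.queues[flow]
            pkt = q.popleft()
            if q:
                self.active.append(flow)  # still backlogged: wait for its next turn
            # AQM check on this packet's own waiting time; a light flow, which
            # round-robin serves promptly, rarely crosses the target.
            if now - pkt.enqueued_at > self.target_s:
                if pkt.ecn_capable:
                    pkt.ce_marked = True  # signal congestion without losing the packet
                else:
                    self.dropped.append(pkt)  # controlled drop for non-ECN traffic
                    continue
            return pkt
        return None  # nothing queued


sched = ToyFqAqm()
for _ in range(10):
    sched.enqueue(Packet("bulk", 1500, ecn_capable=True), now=0.0)
sched.enqueue(Packet("voip", 200, ecn_capable=False), now=0.0)

first = sched.dequeue(now=0.001)
second = sched.dequeue(now=0.002)
third = sched.dequeue(now=0.010)  # this bulk packet has now waited 10 ms
print(first.flow, second.flow, third.flow, third.ce_marked)  # bulk voip bulk True
```

In this run the light "voip" flow is sent second rather than waiting behind all ten "bulk" packets, and once a bulk packet has waited longer than the target it is ECN-marked rather than dropped; a non-ECN packet in the same situation would be discarded and recorded in `dropped`.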

If you really need it to be only *one* paragraph, the middle one might be the most essential.

> Also QoS, vs QoE. Try to imagine explaining the need to a CFO, or
> congresscritter. Feel free to take more than a paragraph.

QoS is Quality of Service.  QoE is Quality of Experience.  The two are very different concepts.

To illustrate this, consider a railway manager tasked with modernising his line by replacing steam trains with diesel ones.  He's a modern businessman keen to apply modern thinking to this task, so he delegates some underlings to gather data about the expected traffic flows on the line, as well as the types of train that are available for hire.

In the answers that come back, he focuses on two key figures:  the line carries 1000 passengers per day, and each carriage can seat 50 passengers.  Simple arithmetic shows that this demand can be met by running 20 carriages per day, but the manager rounds this up to 24 carriages to allow some margin for error.  After all, with the tremendous efficiency of diesel traction (compared to steam traction), he can afford to be a little generous.

One of the trains on offer is a 2000hp locomotive hauling 12 carriages - a very impressive sight, to be sure.  "Splendid," he thinks, "we can run that one twice a day, and that will meet demand with some margin to spare."  So that's what he does; once in the morning, and once in the evening.  The timetables are very easy to publish, too.

In the first month of operation, all of these trains turn up on time and with the correct number of carriages, and there are no breakdowns or accidents.  The specified capacity is therefore supplied.  This is an excellent "Quality of Service".

Yet the complaints start rolling in almost immediately.  Passengers who want to travel at any time other than the two departures find themselves facing an exceptionally long wait.  Local police even report an increase in vagrancy complaints, as passengers who miss the evening train are forced to sleep in the waiting rooms overnight.  This represents a very poor "Quality of Experience".

Learning from this misadventure, the manager goes back to his data and notes that one-carriage "railcars" are also available for hire.  For the next month's timetable, instead of the two 12-carriage trains each day, he will run one of these railcars every hour.  These will provide exactly the same seating capacity over the course of the day, but the waiting time will now be limited to a much more palatable duration.  (In Internet terms, he's optimised squarely for latency.)
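The arithmetic behind the first two timetables can be checked directly. This sketch uses the 50 seats per carriage from the manager's data, but assumes departure times the story does not give (08:00 and 18:00 for the two long trains, and a railcar on every hour), and it ignores the uneven demand that comes up next. The function names are illustrative only.

```python
SEATS_PER_CARRIAGE = 50  # from the manager's data


def daily_seats(departures, carriages_per_train):
    """Total seats offered per day by a timetable of identical trains."""
    return len(departures) * carriages_per_train * SEATS_PER_CARRIAGE


def worst_wait_hours(departures):
    """Longest gap between consecutive departures, wrapping around midnight."""
    times = sorted(departures)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    gaps.append(times[0] + 24 - times[-1])  # overnight gap back to the first train
    return max(gaps)


two_long_trains = [8, 18]          # hypothetical morning and evening departure hours
hourly_railcars = list(range(24))  # one railcar on every hour

print(daily_seats(two_long_trains, 12), worst_wait_hours(two_long_trains))  # 1200 14
print(daily_seats(hourly_railcars, 1), worst_wait_hours(hourly_railcars))   # 1200 1
```

Both timetables offer the same 1200 seats against a demand of 1000, so on paper they deliver identical capacity; only the worst-case wait, the railway's equivalent of latency, differs.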

Still the complaints come in - but now from different sources.  No longer are passengers waiting for hours and sleeping overnight in stations.  Instead, rush-hour commuters who had previously found the 12-carriage trains convenient are finding the railcars too crowded.  Even with over a hundred passengers crammed in like sardines, many more are left on the platforms and arrive at work late - or worse, come home to a cold dinner and an annoyed wife.  Simply put, demand is not evenly distributed through the day, but concentrated on particular times; at other times, the railcars are sufficient for the relatively small number of passengers, or even run almost empty.

So again, even though the "Quality of Service" is provided just as specified, the "Quality of Experience" for the passengers is very poor.  Indeed, the overcrowding delays some railcars, because it is difficult to get everyone in and out through the doors, and the conductors struggle to check tickets, which noticeably reduces fare revenue.

Things improve markedly when the manager brings in 6-carriage express trains for the morning, lunchtime, and evening commuters, and continues to run the railcars at hourly intervals in between them, except for the small hours when some trains are removed due to minimal demand.  Now there are enough carriages in the rush-hour trains to satisfy commuters, and there are still trains running at other times so that nobody needs to wait particularly long for one.

In fact, demand increases substantially due to the good "Quality of Experience" that this new timetable provides, such that by the end of the first year, many of the railcars are upgraded to 3-carriage trains, and the commuter expresses are lengthened to 8 carriages.  Fare revenue is more than doubled.  The modernisation effort is a success.

The lesson here is that QoS is merely the means by which you may attempt to achieve high QoE.  Meeting QoS does not guarantee QoE.  Only if the QoS is designed around the factors that genuinely influence QoE will you succeed.  Unfortunately, many QoS schemes are inadequate for the needs of actual Internet users; this is because their designers have not kept up with the appropriate QoE factors.

 - Jonathan Morton
