[Bloat] Open Source Speed Test (was fast.com - Netflix's speed monitoring)

Sat Aug 27 08:16:08 EDT 2016

> On 27 Aug, 2016, at 04:06, David Lang <david at lang.hm> wrote:
> 
>>> so you can call it large queues instead of large buffers, but the result
>>> is that packets end up being 'in transit' for a long time.
>> 
>> No, a large queue is a bunch of packets waiting in a queue (which is contained in a buffer). A large buffer with zero or a small number of packets in it is not going to result in packets being in transit for a long time.
> 
> Is a large buffer that is never used really a large buffer? or does whatever prevents it from being used really turn it into a small buffer?

> I don't understand what you are trying to call out by trying to change the terminology.

If we’re talking terminology, I think we have to make better distinctions to avoid confusion.  There is a qualitative difference between a managed queue, a large unmanaged queue, and a small unmanaged queue; all three behave differently from one another.

A managed queue tries to keep itself empty, but does so by means of congestion signalling (marking or dropping a relatively small proportion of traffic), rather than placing a hard limit on queue length.  It *can* therefore fill up in various circumstances, including where the traffic simply ignores those signals, so its *peak* induced delay can be large; this is true of both flow-isolating and flow-blind queues.  However, the managed queue can achieve lower *average* induced delay than the large unmanaged queue, and lower packet loss and higher link utilisation than the small unmanaged queue, when given the buffer space of the large queue to work with.
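
As a rough illustration of that comparison, here is a toy simulation in Python.  Everything in it is an assumption made for the sketch: a single responsive sender that adds one packet per tick to its rate and halves it on any congestion signal, instantaneous feedback, and a crude threshold marker (the hypothetical `target` parameter) standing in for a real AQM.  It is not a model of any particular qdisc.

```python
from dataclasses import dataclass

@dataclass
class Result:
    avg_backlog: float   # mean standing backlog, in packets
    dropped: int         # packets lost to buffer overflow
    utilisation: float   # fraction of link capacity actually used

def simulate(capacity, target=None, ticks=2000, link=10):
    """One responsive sender (add 1 per tick, halve on signal) into one queue.

    target=None -> unmanaged tail-drop queue; the only signal is overflow loss.
    target=N    -> toy managed queue; marks (no loss) when standing backlog > N.
    """
    backlog = dropped = delivered = backlog_sum = 0
    rate = 1
    for _ in range(ticks):
        signal = False
        for _ in range(rate):              # this tick's arrivals
            if backlog >= capacity:
                dropped += 1               # buffer full: tail drop
                signal = True
            else:
                backlog += 1
        out = min(backlog, link)           # link drains up to `link` packets
        backlog -= out
        delivered += out
        if target is not None and backlog > target:
            signal = True                  # congestion mark, no packet lost
        backlog_sum += backlog
        rate = max(1, rate // 2) if signal else rate + 1
    return Result(backlog_sum / ticks, dropped, delivered / (ticks * link))

large = simulate(capacity=1000)               # large unmanaged
small = simulate(capacity=15)                 # small unmanaged
managed = simulate(capacity=1000, target=5)   # managed, same buffer as `large`

for name, r in (("large", large), ("small", small), ("managed", managed)):
    print(f"{name:8s} avg backlog {r.avg_backlog:7.1f}  "
          f"dropped {r.dropped:5d}  utilisation {r.utilisation:.3f}")
```

In this sketch the large unmanaged queue sits nearly full, the small one keeps delay low at the cost of loss and some idle link time, and the managed one keeps a short backlog without dropping anything.  The exact figures are artefacts of the toy model; only the qualitative ordering is the point.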

Flow-isolation is an orthogonal property here; both managed and unmanaged implementations exist.  A flow-isolating queue aims to keep the induced delay for sparse flows, which use less than their fair share of the link, lower than that for bulk flows; it also aims to minimise the impact of unresponsive flows on responsive traffic.  The peak induced latency to any given bulk flow thus becomes less important than the peak induced latency to sparse flows.  With a flow-blind queue, all traffic suffers the same induced delay at any given moment, so the overall peak induced latency remains important.
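
The difference between flow-blind and flow-isolating service can be sketched in a few lines of Python.  This assumes a static snapshot (all packets already queued), equal-sized packets, and plain per-packet round robin between subqueues; real flow-isolating schedulers are more elaborate, and the function names here are made up for the example.

```python
from collections import deque

def fifo_order(arrivals):
    """Flow-blind: packets leave in arrival order."""
    return list(arrivals)

def round_robin_order(arrivals):
    """Flow-isolating: one subqueue per flow, one packet served per turn."""
    subqueues = {}
    for flow, seq in arrivals:
        subqueues.setdefault(flow, deque()).append((flow, seq))
    order = []
    while subqueues:
        for flow in list(subqueues):       # copy keys: we delete while looping
            order.append(subqueues[flow].popleft())
            if not subqueues[flow]:
                del subqueues[flow]
    return order

# 100 bulk packets are already queued when a single sparse packet arrives.
arrivals = [("bulk", i) for i in range(100)] + [("sparse", 0)]

# Number of packets transmitted before the sparse packet gets its turn.
fifo_wait = fifo_order(arrivals).index(("sparse", 0))
rr_wait = round_robin_order(arrivals).index(("sparse", 0))
print(fifo_wait, rr_wait)   # prints: 100 1
```

The total backlog is identical in both cases; what changes is which flow pays for it.  The sparse packet waits behind one packet instead of a hundred, while the last bulk packet is delayed by only one extra transmission.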

I think the confusion here arises between the *capacity* of the queue, which is static and often large for a managed queue, and the dynamic *backlog* of that queue, which is what a managed queue attempts to actively control.  In the case of a flow-isolating queue, there is also a distinction between the overall backlog of the queue, and the backlog of an individual subqueue.
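
To make the capacity/backlog distinction concrete, here is a minimal Python sketch.  The class and field names are invented for illustration, and it assumes equal-sized packets and a fixed drain rate.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Queue:
    capacity_pkts: int      # static: how many packets the buffer can hold
    link_rate_pps: float    # drain rate of the link, in packets per second
    packets: deque = field(default_factory=deque)

    @property
    def backlog(self) -> int:
        """Dynamic: packets waiting right now."""
        return len(self.packets)

    def enqueue(self, pkt) -> bool:
        if self.backlog >= self.capacity_pkts:
            return False    # buffer full: tail drop
        self.packets.append(pkt)
        return True

    def induced_delay_s(self) -> float:
        """Delay a newly arriving packet would see; depends on backlog only."""
        return self.backlog / self.link_rate_pps

big = Queue(capacity_pkts=10_000, link_rate_pps=1000.0)
tiny = Queue(capacity_pkts=50, link_rate_pps=1000.0)
for i in range(5):
    big.enqueue(i)
    tiny.enqueue(i)

# Same backlog, same induced delay, despite a 200x difference in capacity.
print(big.induced_delay_s(), tiny.induced_delay_s())
```

Capacity only bounds how bad the delay can get; the delay actually experienced at any moment is set by the backlog.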

I have noticed that some bufferbloat tests employ an unresponsive traffic burst as the latency measurement stream - particularly Netalyzr’s.  This does capture the difference between a large and small capacity queue, but is incapable of detecting AQM or flow-isolation, which are the preferred solutions to bufferbloat, unless the AQM is extremely aggressive when faced with unresponsive traffic.  A response aggressive enough to satisfy such a test would, however, reduce goodput and increase packet loss on normal traffic, and would thus be counterproductive.
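
A toy Python model shows why an unresponsive probe sees only capacity.  The assumptions are mine, not a description of how Netalyzr or any real AQM works: the source offers twice the link rate and never backs off, and the "AQM" simply drops every Nth arrival once the backlog exceeds a target (the hypothetical `drop_every` and `target` parameters).

```python
def peak_backlog(capacity, drop_every=None, target=0,
                 offered=2, link=1, ticks=5000):
    """Peak backlog (packets) when an unresponsive source floods the queue.

    drop_every=N models a toy AQM: once the backlog exceeds `target`,
    every Nth arriving packet is dropped as a congestion signal.
    """
    backlog = peak = over_target = 0
    for _ in range(ticks):
        for _ in range(offered):
            if backlog >= capacity:
                continue                   # tail drop: buffer is full
            if drop_every and backlog > target:
                over_target += 1
                if over_target % drop_every == 0:
                    continue               # AQM signal; the sender ignores it
            backlog += 1
        peak = max(peak, backlog)
        backlog -= min(backlog, link)      # link drains `link` packets per tick
    return peak

print(peak_backlog(1000))                            # large unmanaged
print(peak_backlog(20))                              # small unmanaged
print(peak_backlog(1000, drop_every=10, target=5))   # managed, 10% signalling
print(peak_backlog(1000, drop_every=2, target=5))    # managed, 50% signalling
```

With 10% signalling the managed queue fills to the same peak as the large unmanaged one, so the probe cannot tell them apart.  Only when the toy AQM discards half of the arrivals does the peak stay small, and that is a loss rate no normal traffic should be subjected to.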

 - Jonathan Morton