[Bloat] Detecting bufferbloat from outside a node

Mon May 4 08:17:47 EDT 2015

> On 4 May, 2015, at 14:39, Neil Davies <neil.davies at pnsol.com> wrote:
> 
>>> Noting that, delay and loss is, of course, a natural consequence of having a shared medium
>> 
>> Not so.  Delay and loss are inherent to link oversubscription, not to contention.  Without ECN, delay is traded off against loss by the size of the buffer; a higher loss rate keeps the queue shorter and thus the induced delay lower.
> 
> Sorry Jonathan - that’s not what we’ve observed. We’ve measured “excessive” delay on links that are averagely loaded << 0.1% (as measured over a 15 min period) - I can supply pointers to the graphs for that. 

Presumably those would involve oversubscription on short timescales, and a lot of link idle time between those episodes.

One ISP I know of charges by data volume per month, currently in units of 75GB (minimum 2 per month, so 150GB).  This is on ADSL lines where the link rate might reasonably be 15Mbps or so in the relevant direction.  At that speed, it would take 100,000 seconds to exhaust the first two units - which is not much more than 24 hours.  There is therefore roughly a 26-fold mismatch between the peak rate available to the user and the average rate he must maintain to stay within the data allowance.

(I am ignoring small niceties in the calculations here, in favour of revealing the big picture without too much heavy maths.)

By your measure, that would mean that the link could only ever be 3.85% utilised (1/26th) on month-long timescales, and is therefore undersubscribed.  But I can assure you that, during the small percentage of time that the link is in active use, it will spend some time at 100% utilisation on RTT timescales, with TCP/IP straining to achieve more than that.  That is link oversubscription, which results in high induced delay.
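
For anyone who wants to check the arithmetic, here is a rough sketch in Python.  The 30-day month and the decimal gigabyte are assumptions of the sketch, and the function names are made up for illustration.  Note that dividing 150GB by a nominal 15Mbps gives 80,000 seconds; the ratio below instead takes the rounder 100,000-second figure from the text as given, in keeping with ignoring the small niceties.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def seconds_to_exhaust(allowance_bytes, rate_bps):
    """Time to use up a data allowance when sending flat out at rate_bps."""
    return allowance_bytes * 8 / rate_bps

def mismatch(exhaust_seconds, days_in_month=30):
    """Ratio of peak rate to the average rate that just fits the allowance."""
    return days_in_month * SECONDS_PER_DAY / exhaust_seconds

# Assumption: 1 GB = 10**9 bytes, nominal link rate with no overheads.
raw_seconds = seconds_to_exhaust(150 * 10**9, 15 * 10**6)

# The figure used in the text, taken as given.
ratio = mismatch(100_000)
long_term_utilisation = 1 / ratio

print(f"raw time to exhaust allowance: {raw_seconds:.0f} s")
print(f"peak-to-average mismatch: {ratio:.1f}-fold")
print(f"month-long utilisation: {long_term_utilisation:.2%}")
```

Either starting figure leaves the month-long utilisation at a few percent, which is the point: the average says nothing about what happens on RTT timescales.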

More precisely, instantaneous link oversubscription results in either *increasing* induced delay (as the buffer fills) or lost packets (which *will* happen if the buffer becomes completely full), while instantaneous link undersubscription results in either *decreasing* induced delay (as the buffer drains) or link idle periods.  Long-timescale measures of link utilisation are simply averages of these instantaneous measures.
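
The four cases above can be illustrated with a toy model.  This is only a sketch under simplifying assumptions: a single tail-drop FIFO, time divided into fixed intervals, and traffic counted in whole packets.  The function name and all the numbers are invented for the illustration and are not measurements of any real link.

```python
def simulate(arrivals, capacity, buffer_limit):
    """Step a tail-drop FIFO through per-interval arrivals (units: packets).

    capacity is how many packets the link can send per interval;
    buffer_limit is how many packets may remain queued between intervals.
    """
    queue = 0
    trace = []
    for offered in arrivals:
        backlog = queue + offered
        sent = min(backlog, capacity)        # link sends what it can
        remaining = backlog - sent
        dropped = max(0, remaining - buffer_limit)  # buffer full: loss
        queue = remaining - dropped
        trace.append({
            "sent": sent,
            "queue": queue,
            "dropped": dropped,
            "idle": capacity - sent,         # unused link capacity
            "delay": queue / capacity,       # intervals needed to drain the queue
        })
    return trace

# Six intervals at 150% of link rate, then fourteen intervals of silence.
arrivals = [15] * 6 + [0] * 14
trace = simulate(arrivals, capacity=10, buffer_limit=20)

queues = [step["queue"] for step in trace]
total_dropped = sum(step["dropped"] for step in trace)
utilisation = sum(step["sent"] for step in trace) / (10 * len(trace))

print("queue per interval:", queues)
print("packets dropped:", total_dropped)
print(f"average utilisation: {utilisation:.0%}")
```

In this run the queue grows while the link is oversubscribed, packets are lost once the buffer is full, the queue drains when the load stops, and the link then sits idle.  The average utilisation over the whole run is well under 100% even though the induced delay reached its maximum during the burst.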

> A single flow can contend the medium just as much as multiple ones

I think here, again, we are using wildly different terminology.

There is no contention for the medium on the dedicated full-duplex link I described, only for queue space - and given a single flow, it cannot contend with itself.

The same goes for a full-duplex shared-access medium (such as DOCSIS cable) with only one host active.  There is no contention for the medium, because it is always available when that single host requests it, which it will as soon as it has at least one packet in its queue.  There is a more-or-less fixed latency for medium access, which becomes part of what you call the structural delay.  The rest is down to over- or under-subscription on short timescales, as above.

On a half-duplex medium, such as obsolete bus Ethernet or not-so-obsolete wifi, there can be some contention for the medium between forward data and reverse ack packets.  But I was *not* talking about half-duplex.  Full-duplex is an important enough subset of the problem - covering at least ADSL, cable, VDSL, satellite, fibre - on which most of the important effects can be observed, including the ones we’re talking about.

 - Jonathan Morton