[Bloat] Detecting bufferbloat from outside a node

Neil Davies neil.davies at pnsol.com
Mon May 4 08:35:42 EDT 2015


Jonathan

We see the problem as the difference between average and instantaneous behaviour.

A network medium is never used “on average”: at any instant it is either “in use” or “idle”. What we were seeing (and it was not an ISP but the core of a public service network here in the UK) was that delay can be “high” even when the loading is “low”: in the particular 5-minute period the actual offered traffic was <0.01% of the capacity. The path under examination happened to be the constraining factor for a bulk transfer, and the induced delay was high enough to put other real-time applications (as defined by the public service network’s users) at risk.

The reasoning that you seem to be applying below assumes a time-homogeneity that doesn’t correspond to the network traffic patterns we have seen in the engagements we’ve done over the last 15 years. The graph I was referring to is the one example that we can publicly discuss (all the rest are under NDA!).

What you are describing - if I’m understanding it properly - is the “busy period”. I would accept that network providers (ISPs, telcos etc.) have a problem in that they are relying on the system becoming idle frequently (that is, on the busy periods not accreting into longer and longer periods of non-idleness). However, that is a pattern-dependent as well as a load-dependent phenomenon.
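
To make the average-versus-instantaneous point concrete, here is a minimal sketch of a fluid FIFO queue in Python. The link rate, the burst size and the helper name simulate_link are all made up for illustration; they are not taken from the measurements mentioned above.

```python
def simulate_link(arrivals_bits, capacity_bps, ticks_per_second=1000):
    """Fluid FIFO queue stepped in fixed ticks (1 ms by default).

    Returns (average utilisation over the window, peak queueing delay in s).
    """
    bits_per_tick = capacity_bps // ticks_per_second
    backlog = 0      # bits waiting in the buffer
    sent = 0         # bits actually transmitted
    peak_delay = 0.0
    for arrived in arrivals_bits:
        backlog += arrived
        served = min(backlog, bits_per_tick)
        backlog -= served
        sent += served
        # Time a bit arriving now would wait behind the current backlog.
        peak_delay = max(peak_delay, backlog / capacity_bps)
    utilisation = sent / (bits_per_tick * len(arrivals_bits))
    return utilisation, peak_delay


# Hypothetical scenario: 100 Mbit/s link, 5-minute window,
# a single 1 MB burst arriving in the first millisecond.
arrivals = [0] * (300 * 1000)
arrivals[0] = 8_000_000
utilisation, peak_delay = simulate_link(arrivals, 100_000_000)
print(f"average utilisation: {utilisation:.4%}")
print(f"peak queueing delay: {peak_delay * 1000:.0f} ms")
```

With these invented numbers the link is loaded to roughly 0.03% over the 5-minute window, yet anything arriving just behind the burst waits about 79 ms. The single busy period is short, but the delay inside it is what a real-time application experiences.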

Neil

On 4 May 2015, at 13:17, Jonathan Morton <chromatix99 at gmail.com> wrote:

> 
>> On 4 May, 2015, at 14:39, Neil Davies <neil.davies at pnsol.com> wrote:
>> 
>>>> Noting that delay and loss are, of course, a natural consequence of having a shared medium
>>> 
>>> Not so.  Delay and loss are inherent to link oversubscription, not to contention.  Without ECN, delay is traded off against loss by the size of the buffer; a higher loss rate keeps the queue shorter and thus the induced delay lower.
>> 
>> Sorry Jonathan - that’s not what we’ve observed. We’ve measured “excessive” delay on links whose average load is << 0.1% (as measured over a 15 min period) - I can supply pointers to the graphs for that. 
> 
> Presumably those would involve oversubscription on short timescales, and a lot of link idle time between those episodes.
> 
> One ISP I know of charges by data volume per month, currently in units of 75GB (minimum 2 per month, so 150GB).  This is on ADSL lines where the link rate might reasonably be 15Mbps or so in the relevant direction.  At that speed, it would take 100,000 seconds to exhaust the first two units - which is not much more than 24 hours.  There is therefore roughly a 26-fold mismatch between the peak rate available to the user and the average rate he must maintain to stay within the data allowance.
> 
> (I am ignoring small niceties in the calculations here, in favour of revealing the big picture without too much heavy maths.)
> 
> By your measure, that would mean that the link could only ever be 3.85% utilised (1/26th) on month-long timescales, and is therefore undersubscribed.  But I can assure you that, during the small percentage of time that the link is in active use, it will spend some time at 100% utilisation on RTT timescales, with TCP/IP straining to achieve more than that.  That is link oversubscription which results in high induced delay.
> 
> More precisely, instantaneous link oversubscription results in either *increasing* induced delay (as the buffer fills) or lost packets (which *will* happen if the buffer becomes completely full), while instantaneous link undersubscription results in either *decreasing* induced delay (as the buffer drains) or link idle periods.  Long-timescale measures of link utilisation are simply averages of these instantaneous measures.
> 
>> A single flow can contend for the medium just as much as multiple ones
> 
> I think here, again, we are using wildly different terminology.
> 
> There is no contention for the medium on the dedicated full-duplex link I described, only for queue space - and given a single flow, it cannot contend with itself.
> 
> The same goes for a full-duplex shared-access medium (such as DOCSIS cable) with only one host active.  There is no contention for the medium, because it is always available when that single host requests it, which it will as soon as it has at least one packet in its queue.  There is a more-or-less fixed latency for medium access, which becomes part of what you call the structural delay.  The rest is down to over- or under-subscription on short timescales, as above.
> 
> On a half-duplex medium, such as obsolete bus Ethernet or not-so-obsolete wifi, there can be some contention for the medium between forward data and reverse ack packets.  But I was *not* talking about half-duplex.  Full-duplex is an important enough subset of the problem - covering at least ADSL, cable, VDSL, satellite, fibre - on which most of the important effects can be observed, including the ones we’re talking about.
> 
> - Jonathan Morton
> 
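
The data-allowance arithmetic in the quoted message can be checked with a few lines of Python. This is only a sketch: it assumes decimal gigabytes and a 30-day month, neither of which is stated in the thread, and the function names are made up for illustration. Raw payload arithmetic gives about 80,000 seconds; the quoted 100,000 seconds is a rounder figure, in keeping with the stated intention to ignore small niceties.

```python
SECONDS_PER_DAY = 86_400
MONTH_SECONDS = 30 * SECONDS_PER_DAY   # assumption: 30-day month


def seconds_to_exhaust(allowance_bytes, link_rate_bps):
    """Time to transfer the whole allowance at the full link rate."""
    return allowance_bytes * 8 / link_rate_bps


def peak_to_average_mismatch(exhaust_seconds):
    """How many times longer the month is than the time to use the allowance."""
    return MONTH_SECONDS / exhaust_seconds


allowance_bytes = 150 * 10**9          # 2 x 75 GB, assuming decimal GB
link_rate_bps = 15 * 10**6             # 15 Mbps

raw_seconds = seconds_to_exhaust(allowance_bytes, link_rate_bps)
for label, secs in (("raw", raw_seconds), ("quoted", 100_000)):
    mismatch = peak_to_average_mismatch(secs)
    print(f"{label}: {secs:,.0f} s, mismatch {mismatch:.1f}x, "
          f"month-long utilisation {1 / mismatch:.2%}")
```

Either way the month-long utilisation comes out at a few percent, which is the point being made: a link can look heavily undersubscribed on long timescales while still running at 100% on RTT timescales whenever it is in use.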



