[Bloat] Detecting bufferbloat from outside a node

Mon Apr 27 16:39:10 EDT 2015

On Mon, 27 Apr 2015, Paolo Valente wrote:

> On 27 Apr 2015, at 12:23, Toke Høiland-Jørgensen <toke at toke.dk> wrote:
>
>> Paolo Valente <paolo.valente at unimore.it> writes:
>>
>>> I am sorry, but I realized that what I said was incomplete. The main
>>> cause of my concern is that, from outside the node, we do not know
>>> whether a VoIP packet departs at a given time because the application
>>> wants it to be sent at that time or because it has waited in the
>>> buffer for a lot of time. Similarly, we do not know how long the VoIP
>>> application will wait before getting its incoming packets delivered.
>>
>> No, not unless the application tells you (by, for instance,
>> timestamping; depending on where in the network stack the timestamp is
>> applied, you can measure different instances of bloat).
>
> That’s exactly what I was thinking about. Actually it seems the only solution to me.
>
> What apparently makes things more difficult is that I am not allowed either to choose the applications to run or to interfere in any way with the flows (e.g., by injecting some extra packet).
>
> Any pointer to previous/current work on this topic?
>
>> Or if you know
>> that an application is supposed to answer you immediately (as is the
>> case with a regular 'ping'), you can measure if it does so even when
>> otherwise loaded.
>>
>
> A ping was one of the first simple actions I suggested, but the answer was, as above: no, you cannot 'touch' the network!
>
>> Of course, you also might not measure anything, if the bottleneck is
>> elsewhere. But if you can control the conditions well enough, you can
>> probably avoid this; just be aware of it. In Linux, combating
>> bufferbloat has been quite the game of whack-a-mole over the last
>> several years :)
>>
>
> Then I guess that now I am trying to build a good mallet according to the 
> rules of the game for this company :)
>
> In any case, the target networks should be observable at such a level that, 
> yes, all relevant conditions should be under control (if one does not make 
> mistakes). My problem is, as I wrote above, to find out what information I can 
> and have to look at.

What is it that you do have available?

Bufferbloat usually isn't a huge problem on the leaf node where the applications 
are running. They usually have a fast local LAN link.

Bufferbloat causes most of its problems when it's on a middlebox where the 
available bandwidth changes so that one link becomes congested.

If you can monitor packets going in and out of such links, you should be able to 
exactly measure the latency you get going through the device.
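
A minimal sketch of that measurement, assuming you already have two capture 
logs (one on each side of the device) reduced to (timestamp, packet_id) pairs. 
The packet_id is hypothetical: it stands for whatever invariant key you can 
derive from the packet (for example a hash of bytes the device does not 
rewrite), and both captures are assumed to share a clock.

```python
def transit_delays(ingress, egress):
    """Match packets seen entering and leaving a device.

    ingress, egress: iterables of (timestamp, packet_id) tuples.
    Returns a list of (packet_id, delay) for packets seen on both sides.
    The tuple layout and packet_id are assumptions of this sketch.
    """
    # Remember the first time each packet was seen going in.
    first_seen = {}
    for ts, pkt_id in ingress:
        first_seen.setdefault(pkt_id, ts)

    delays = []
    for ts, pkt_id in egress:
        t_in = first_seen.pop(pkt_id, None)
        # Skip packets never seen on ingress, and obvious clock problems.
        if t_in is not None and ts >= t_in:
            delays.append((pkt_id, ts - t_in))
    return delays
```

Packets that appear on ingress but never on egress are simply left unmatched 
here; counting them separately would give a drop estimate.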

If you are trying to probe the network from the outside, without being able to 
even generate ping packets, then you have a problem.

If you can monitor ping packets going into the network, you can figure out how 
long they take to get back out.
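
As a rough illustration, here is a sketch that pairs echo requests with their 
replies from a passive capture, without sending anything. It assumes the 
capture has already been parsed into tuples of 
(timestamp, kind, src, dst, ident, seq); that record layout is made up for 
this example and is not the output format of any particular tool.

```python
def echo_rtts(records):
    """Pair observed ICMP echo requests with replies.

    records: iterable of (timestamp, kind, src, dst, ident, seq),
             where kind is "request" or "reply" (hypothetical layout).
    Returns (rtts, unanswered): rtts is a list of (key, rtt) with
    key = (src, dst, ident, seq) of the request; unanswered is the
    number of requests for which no reply was observed.
    """
    pending = {}
    rtts = []
    for ts, kind, src, dst, ident, seq in sorted(records, key=lambda r: r[0]):
        if kind == "request":
            pending[(src, dst, ident, seq)] = ts
        elif kind == "reply":
            # A reply travels in the opposite direction of its request.
            key = (dst, src, ident, seq)
            t_req = pending.pop(key, None)
            if t_req is not None:
                rtts.append((key, ts - t_req))
    return rtts, len(pending)
```

The same pairing idea applies to any request/response protocol that carries 
an identifier in both directions.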

Look for other protocols that should have a very fast response time. DNS and NTP 
are probably pretty good options. HTTP requests for small static pages aren't 
always reliable, but can be useful (especially ones that check for cache 
expiration, HTTP HEAD requests for example).

If you can look at such traffic over a shortish, but not tiny, timeframe, you 
should be able to find the minimum response time for such traffic, and that can 
give you a pretty good idea of the amount of minimum latency involved.
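
A small sketch of that idea, assuming you have (timestamp, response_time) 
samples from passively observed request/response pairs. The window length is 
something you would have to pick yourself. Treating the difference between a 
window's minimum and the overall minimum as extra delay is my own 
simplification here, not an established method.

```python
def windowed_minimum(samples, window):
    """Minimum response time per time window.

    samples: iterable of (timestamp, response_time), same time unit.
    window:  window length in that unit (an assumed tuning knob).
    Returns {window_start: minimum response time in that window}.
    """
    minima = {}
    for ts, rtt in samples:
        start = (ts // window) * window
        if start not in minima or rtt < minima[start]:
            minima[start] = rtt
    return minima


def baseline_and_excess(samples, window):
    """Overall minimum (baseline) and per-window excess over it."""
    minima = windowed_minimum(samples, window)
    if not minima:
        return None, {}
    baseline = min(minima.values())
    excess = {start: m - baseline for start, m in minima.items()}
    return baseline, excess
```

A window whose minimum stays well above the baseline suggests that even the 
luckiest packet in that period had to wait somewhere.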

David Lang