[Bloat] Measuring latency-under-load consistently
Jonathan Morton
chromatix99 at gmail.com
Fri Mar 11 22:52:00 EST 2011
On 12 Mar, 2011, at 5:19 am, richard wrote:
>> 3) Flow smoothness, measured as the maximum time between sequential received data for any continuous flow, also expressed in Hz. This is an important metric for video and radio streaming, and one which CUBIC will probably do extremely badly at if there are large buffers in the path (without AQM or Blackpool).
>>
>
> Am I correct that your "Flow smoothness" is the inverse of jitter? We
> should probably keep to a standard nomenclature. What should we call
> this - should we call it something else, or invert the concept and
> call it what we already do: jitter?
I'm not certain that it's the same as what you call jitter, but it could be. Because I'm going to be measuring at the application level, I don't necessarily get to see when every single packet arrives, particularly if they arrive out of order. So what I'm measuring is the "lumpiness" of the application data-flow progress, but inverted to "smoothness" (i.e. measured in Hz rather than ms) so that bigger numbers are better.
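Roughly the sort of bookkeeping I have in mind - just a sketch in Python, with invented names and a naive read loop, not anything from the actual test tool:

# Sketch only: track the largest gap between successive application-level
# reads on one flow, then report its inverse ("smoothness") in Hz.
import socket
import time

def measure_smoothness(sock, duration=60.0, warmup=1.0):
    # warmup: seconds to ignore at the start, e.g. the slow-start ramp.
    start = time.monotonic()
    last = None
    max_gap = 0.0
    sock.settimeout(1.0)
    while time.monotonic() - start < duration:
        try:
            data = sock.recv(65536)
        except socket.timeout:
            continue                      # no data yet; keep waiting
        if not data:                      # sender closed the connection
            break
        now = time.monotonic()
        if now - start < warmup:
            last = now                    # still warming up; don't count this gap
            continue
        if last is not None:
            max_gap = max(max_gap, now - last)
        last = now
    return 1.0 / max_gap if max_gap > 0 else float('inf')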
Using my big-easy-numbers example, suppose you have a 30-second unmanaged drop-tail queue, and nothing to stop it filling up. For a while, packets will arrive in order, so the inter-arrival delay seen by the application is at most the RTT (as during the very beginning of the slow-start, which I think I will exclude from the measurement) and usually less as a continuous stream builds up.
But then the queue fills up and a packet is dropped. At this point, progress as seen by the application will stop *dead* as soon as that missing packet's position reaches the head of the queue.
The sending TCP will now retransmit that packet. But the queue is still approximately full because the congestion feedback didn't happen until now, so it will take another 30 seconds for the data to reach the application. At this point the progress is instantaneously very large, and hopefully will continue more smoothly.
But the maximum inter-arrival delay after that episode is now 30 seconds (a smoothness of just 0.033 Hz), even though packets were arriving correctly throughout that time. That's what I'm measuring here.
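To put numbers on that, here is the same arithmetic on an invented arrival trace (the timestamps are made up to match the example, not measured):

# Invented arrival times (seconds) with one 30-second stall in the middle.
arrivals = [0.0, 0.1, 0.2, 0.3, 30.3, 30.4, 30.5]
max_gap = max(b - a for a, b in zip(arrivals, arrivals[1:]))
print(max_gap, 1.0 / max_gap)   # -> roughly 30.0 and 0.033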
Most links are much less severe than that, of course, but it's this kind of thing that stops radio and video streaming from working properly.
On the much less severe end of the scale, this will also measure the burstiness of flows when there's more than one at once: usually you will get a bunch of packets from one flow, then a bunch from another, and so on, though SFQ tends to fix that for you if you have it. It will probably also pick up similar effects from 802.11n aggregation and other link-level congestion-avoidance techniques.
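If we want the same measurement per flow when several are running at once, the bookkeeping only needs a flow key - again just a sketch, with the flow identifier being whatever the test harness assigns:

# Sketch: the same max-gap bookkeeping, kept per flow, so interleaving
# (or the lack of it without SFQ) shows up stream by stream.
from collections import defaultdict
import time

last_seen = {}                   # flow id -> time of previous data
worst_gap = defaultdict(float)   # flow id -> largest gap seen so far

def note_data(flow_id):
    now = time.monotonic()
    if flow_id in last_seen:
        worst_gap[flow_id] = max(worst_gap[flow_id], now - last_seen[flow_id])
    last_seen[flow_id] = now

def smoothness(flow_id):
    gap = worst_gap[flow_id]
    return 1.0 / gap if gap > 0 else float('inf')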
- Jonathan