[Bloat] Measuring latency-under-load consistently

richard richard at pacdat.net
Fri Mar 11 23:04:11 EST 2011


OK - you make a good case for a new measure. My understanding of jitter
is that it is latency-related and typically measured at the link level
(UDP) rather than at the application level.

I infer, then, that this will affect things like CPU load and disk
load, and might for example introduce "ringing" or harmonics into such
subsystems if/when applications end up "in sync" due to being "less
smooth" in their data output to the lower IP layers.

It will also be affected by session drops due to timeouts, as well as
by the need to "fill the pipe" on a reconnect in applications such as
streaming video (my area), so that a key frame can arrive and restart
the interrupted playback.

In my case I'm particularly allergic to such "restarts" in the midst of
a streaming program that is going out to many (thousands of) receiving
systems.
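
For concreteness, here's a rough sketch in Python of how I'd measure
that maximum inter-arrival gap at the application level in one of our
receivers, per your description below (the host/port, read size, and
two-second warm-up are my own assumptions, not part of your proposal):

    # Sketch: application-level "flow smoothness" as the inverse of the
    # worst inter-arrival gap seen on a TCP stream.
    import socket, time

    def measure_smoothness(host, port, duration=60.0, warmup=2.0):
        """Read a stream; report 1 / (max inter-arrival gap) in Hz."""
        s = socket.create_connection((host, port))
        start = time.time()
        last = start
        max_gap = 0.0
        while time.time() - start < duration:
            data = s.recv(65536)      # blocks until data arrives
            if not data:
                break                 # sender closed the stream
            now = time.time()
            if now - start > warmup:  # skip slow-start, as you suggest
                max_gap = max(max_gap, now - last)
            last = now
        s.close()
        return 1.0 / max_gap if max_gap > 0 else float('inf')

The reported figure is the worst case over the run, so a single long
stall dominates the result - which is exactly the property you want for
catching the kind of restart I'm worried about.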

richard


On Sat, 2011-03-12 at 05:52 +0200, Jonathan Morton wrote:
> On 12 Mar, 2011, at 5:19 am, richard wrote:
> 
> >> 3) Flow smoothness, measured as the maximum time between sequential received data for any continuous flow, also expressed in Hz.  This is an important metric for video and radio streaming, and one which CUBIC will probably do extremely badly at if there are large buffers in the path (without AQM or ECN).
> >> 
> > 
> > Am I correct that your "flow smoothness" is the inverse of jitter? We
> > should probably keep to a standard nomenclature. Should we call this
> > something new, or invert the concept and call it what we already do:
> > jitter?
> 
> I'm not certain that it's the same as what you call jitter, but it could be.  Because I'm going to be measuring at the application level, I don't necessarily get to see when every single packet arrives, particularly if they arrive out of order.  So what I'm measuring is the "lumpiness" of the application data-flow progress, but inverted to "smoothness" (i.e. measured in Hz rather than ms) so that bigger numbers are better.
> 
> Using my big-easy-numbers example, suppose you have a 30-second unmanaged drop-tail queue, and nothing to stop it filling up.  For a while, packets will arrive in order, so the inter-arrival delay seen by the application is at most the RTT (as during the very beginning of the slow-start, which I think I will exclude from the measurement) and usually less as a continuous stream builds up.
> 
> But then the queue fills up and a packet is dropped.  At this point, progress as seen by the application will stop *dead* as soon as that missing packet's position reaches the head of the queue.
> 
> The sending TCP will now retransmit that packet.  But the queue is still approximately full because the congestion feedback didn't happen until now, so it will take another 30 seconds for the data to reach the application.  At this point the progress is instantaneously very large, and hopefully will continue more smoothly.
> 
> But the maximum inter-arrival delay after that episode is now 30 seconds (or 0.033 Hz), even though packets were arriving correctly throughout that time.  That's what I'm measuring here.
> 
> Most links are much less severe than that, of course, but it's this kind of thing that stops radio and video streaming from working properly.
> 
> On the much less severe end of the scale, this will also measure the burstiness of flows in the case when there's more than one at once - usually you will get a bunch of packets from one flow, then a bunch from another, and so on, but SFQ tends to fix that for you if you have it.  It will probably also pick up some similar effects from 802.11n aggregation and other link-level congestion-avoidance techniques.
> 
>  - Jonathan
> 
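
To sanity-check the arithmetic in your 30-second example (the
timestamps here are invented for illustration):

    # Toy check: steady arrivals every 50 ms, then one 30-second stall
    # while the retransmitted packet drains through the full queue.
    gaps = [0.050] * 100 + [30.0] + [0.050] * 100
    print(1.0 / max(gaps))   # ~0.033 Hz - the single stall dominates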
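
And for the multi-flow burstiness case, the same idea extends to
tracking the worst gap per flow - a sketch, where the flow key is
whatever identifies a flow at the application layer:

    # Per-flow variant: record the worst inter-arrival gap per flow.
    from collections import defaultdict

    last_seen = {}                    # flow_key -> last arrival time
    max_gap = defaultdict(float)      # flow_key -> worst gap so far

    def on_data(flow_key, now):
        """Call whenever application data arrives for a given flow."""
        if flow_key in last_seen:
            max_gap[flow_key] = max(max_gap[flow_key],
                                    now - last_seen[flow_key])
        last_seen[flow_key] = now
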
-- 
Richard C. Pitt                 Pacific Data Capture
rcpitt at pacdat.net               604-644-9265
http://digital-rag.com          www.pacdat.net
PGP Fingerprint: FCEF 167D 151B 64C4 3333  57F0 4F18 AF98 9F59 DD73



