[Bloat] Measuring latency-under-load consistently

Jonathan Morton chromatix99 at gmail.com
Sat Mar 12 16:57:22 EST 2011


On 12 Mar, 2011, at 6:04 am, richard wrote:

> OK - you make a good case for a new measure as my understanding of
> jitter is latency related and typically measured at the link level (udp)
> rather than at the application level.
> 
> I infer then that this will do things like impact the CPU load and disk
> load, and might for example introduce "ringing" or harmonics into such
> sub systems if/when applications end up "in sync" due to being "less
> smooth" in their data output to the lower level IP levels.

I'm not sure how significant those effects would be, compared to simple data starvation at the client.  Most Web servers operate with all the frequently-accessed data in RAM (via disk cache) and serve many clients at once or in quick succession, whose network paths don't have the same bottlenecks in general.

It was my understanding that UDP-based protocols tended to tolerate packet loss through redundancy and graceful degradation rather than retransmission, though there are always exceptions to the rule.  So a video streaming server would be transmitting smoothly, with the client giving feedback on how much data had been received and how much packet loss it was experiencing.  Even if that status information is considerably delayed, I don't see why load spikes at the server should occur.
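(To make the feedback idea concrete, here is a minimal Python sketch in the spirit of an RTCP-style receiver report.  The field names and numbers are mine, purely illustrative, not from any particular player or server.)

    # Minimal sketch of the receiver feedback described above: the client
    # periodically reports how much it has received and its inferred loss
    # rate, so the server can degrade gracefully instead of retransmitting.
    from dataclasses import dataclass

    @dataclass
    class ReceiverReport:
        highest_seq_seen: int    # highest sequence number received so far
        packets_received: int    # packets actually delivered to the client
        bytes_received: int      # total payload bytes received

        def loss_fraction(self) -> float:
            """Fraction of packets lost, inferred from sequence-number gaps."""
            expected = self.highest_seq_seen + 1
            if expected <= 0:
                return 0.0
            return max(0.0, 1.0 - self.packets_received / expected)

    # Example: sequence numbers up to 999 were sent, but only 950 packets
    # arrived, so roughly 5% were lost.
    report = ReceiverReport(highest_seq_seen=999, packets_received=950,
                            bytes_received=950 * 1200)
    print(f"loss: {report.loss_fraction():.1%}")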

A fileserver, on the other hand, would not care very much.  Even if the TCP window has grown to a megabyte, it takes longer to seek disk heads than to read that much off the platter, so these lumps would be absorbed by the normal readahead and elevator algorithms anyway.  However, large TCP windows do consume RAM in both server and client, and with a sufficient number of simultaneous clients, that could theoretically cause trouble.  Constraining TCP windows to near the actual bandwidth-delay product (BDP) is more efficient all around.
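To put rough numbers on the RAM question (the link speed, RTT and client count below are assumptions, chosen only for illustration):

    # Sketch: memory cost of oversized TCP windows vs. BDP-sized windows.
    link_mbps = 20          # assumed per-client bottleneck bandwidth
    rtt_ms = 100            # assumed round-trip time
    clients = 10_000        # assumed simultaneous clients on the server

    bdp_bytes = (link_mbps * 1e6 / 8) * (rtt_ms / 1e3)   # bandwidth-delay product
    oversized_window = 1 * 1024 * 1024                    # the 1 MB window mentioned above

    print(f"BDP per connection:      {bdp_bytes / 1024:.0f} KiB")
    print(f"RAM with BDP windows:    {clients * bdp_bytes / 2**30:.1f} GiB")
    print(f"RAM with 1 MiB windows:  {clients * oversized_window / 2**30:.1f} GiB")

With those assumed figures, a BDP-sized window is roughly a quarter of the 1 MB window, and the aggregate RAM difference across thousands of clients is several gigabytes.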

> It will be affected by session drops due to timeouts as well as the need
> to "fill the pipe" on a reconnect in such applications as streaming
> video (my area) so that a key frame can come into the system and restart
> the interrupted video play.

In the event of a TCP session drop, I think I will consider it a test failure and give zero scores across the board.  Sufficient delay or packet loss to cause that indicates a pretty badly broken network.

With that said, I can think of a case where it is likely to happen.  Remember that I was seeing 30 seconds of buffering on a 500 kbps 3G link...  now what happens if the link drops to GPRS speeds?  There would be over a megabyte of data queued up behind a 50 kbps link (at best).  No wonder stuff basically didn't work when that happened.
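For completeness, the arithmetic behind that, using the figures above:

    # Sketch of the 3G-to-GPRS scenario: how much data is queued and how
    # long it takes to drain once the link slows down.
    buffered_seconds = 30           # observed buffering on the 3G link
    fast_kbps = 500                 # 3G link rate
    slow_kbps = 50                  # GPRS rate, at best

    queued_bytes = buffered_seconds * fast_kbps * 1000 / 8
    drain_seconds = queued_bytes * 8 / (slow_kbps * 1000)

    print(f"Queued behind the link:       {queued_bytes / 1e6:.1f} MB")
    print(f"Time to drain at GPRS speed:  {drain_seconds:.0f} s")

That works out to roughly 1.9 MB of backlog, which takes about five minutes to drain at GPRS speed.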

 - Jonathan



