[Codel] RFC: Realtime Response Under Load (rrul) test specification

David Collier-Brown davec-b at rogers.com
Tue Nov 6 15:52:27 EST 2012


Dave Taht wrote:
> I have been working on developing a specification for testing networks
> more effectively for various side effects of bufferbloat, notably
> gaming and voip performance, and especially web performance.... as
> well as a few other things that concerned me, such as IPv6 behaviour,
> and the effects of packet classification.
>
> A key goal is to be able to measure the quality of the user experience
> while a network is otherwise busy, with complex stuff going on in the
> background, but with a simple presentation of the results in the end,
> in under 60 seconds.


Rick Jones <rick.jones2 at hp.com> replied:
| Would you like fries with that?
|
| Snark aside, I think that being able to capture the state of the user
| experience in only 60 seconds is daunting at best.  Especially if
| this testing is going to run over the Big Bad Internet (tm) rather
| than in a controlled test lab.


> This portion of the test will take your favourite website as a target
> and show you how much it will slow down, under load.

| Under load on the website itself, or under load on one's link.  I
| ass-u-me the latter, but that should be made clear.  And while the
| chances of the additional load on a web site via this testing is
| likely epsilon, there is still the matter of its "optics" if you will
| - how it looks.  Particularly if there is going to be something
| distributed with a default website coded into it.


This, counterintuitive as it might sound, is what will make the exercise
work: an indication, as a ratio (a non-dimensional measure), of how much
the response time of a known site is degraded by the network going into
queue delay.

We're assuming a queueing centre, the website, that is running at a
steady speed and load throughout the short test, and is NOT the
bottleneck.  When we increase the load on the network, the network
becomes the bottleneck, a queue builds up, and the degradation in the
site's response time is directly proportional to the delay the
network adds.
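
A minimal sketch of that measurement, assuming Python and a placeholder
URL (the real test would use whatever site the user picks, and the
background load would come from the rrul bulk flows, not this script):

    #!/usr/bin/env python3
    # Sketch: estimate the response-time degradation ratio for a known
    # site.  "www.example.com" is a placeholder, and the background
    # load is assumed to be generated elsewhere while the second
    # measurement runs.
    import time
    import urllib.request

    URL = "http://www.example.com/"   # hypothetical target site

    def fetch_seconds(url, tries=5):
        """Median wall-clock time to fetch the URL once."""
        times = []
        for _ in range(tries):
            start = time.monotonic()
            urllib.request.urlopen(url).read()
            times.append(time.monotonic() - start)
        times.sort()
        return times[len(times) // 2]

    idle = fetch_seconds(URL)        # quiet link
    input("Start the background load, then press Enter...")
    loaded = fetch_seconds(URL)      # same site, link now busy

    # The non-dimensional answer: how many times slower the site looks.
    print(f"degradation ratio: {loaded / idle:.2f}")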

A traditional measure in capacity planning is quite similar to what you
describe: the "stretch factor" is the ratio of the sitting-in-a-queue
delay to the normal service time of the network. When it's above 1,
you're spending as much time twiddling your thumbs as you are doing
work, and each additional bit of load will increase the delay and the
ratio dramatically.
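
As a rough illustration (my assumption here: model the bottleneck as a
single M/M/1 queue, which the real network only approximates), the
stretch factor works out to rho/(1 - rho) at utilization rho:

    # Sketch: stretch factor for an M/M/1 queue standing in for the
    # bottleneck link.  Queueing delay Wq = rho/(mu - lambda), service
    # time S = 1/mu, so stretch = Wq/S = rho/(1 - rho).
    def stretch(rho):
        """Queue-delay-to-service-time ratio at utilization rho, 0 <= rho < 1."""
        return rho / (1.0 - rho)

    for rho in (0.1, 0.3, 0.5, 0.7, 0.9, 0.95):
        print(f"utilization {rho:4.2f}  stretch {stretch(rho):6.2f}")

At rho = 0.5 the stretch crosses 1 (waiting as long as working), and
past 0.9 it climbs toward infinity: the hockey stick below.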

I don't know if this will reproduce, but drawn as a curve against
load, the ratio you describe will look like a hockey stick:

............................./
3.........................../
.........................../
........................../
2......................../
......................../
......................./
1....................-
._________----------

0....5....10....15....20....25

Ratio is the Y-axis, load is the X, and the periods are supposed to be
blank spaces (;-))

At loads of 1-18 or so, the ratio is < 1 and grows quite slowly.
Above 20, the ratio is >> 1 and grows very rapidly, without bound.

The results will look like this, and the graphic-equalizer display will
tell the reader where the big components of the slowness are coming
from.  Pretty classic capacity planning, of the sort done by folks like
Neil Gunther.

Of course, if the web site you're measuring gets DDOSed in the middle of
the test, Your Mileage May Vary!

--dave
-- 
David Collier-Brown,         | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb at spamcop.net           |                      -- Mark Twain
(416) 223-8968


