* [Codel] RFC: Realtime Response Under Load (rrul) test specification
       [not found] <mailman.3.1352232001.18990.codel@lists.bufferbloat.net>
@ 2012-11-06 20:52 ` David Collier-Brown
  2012-11-09 10:21   ` Dave Taht
  0 siblings, 1 reply; 6+ messages in thread

From: David Collier-Brown @ 2012-11-06 20:52 UTC (permalink / raw)
  To: codel

Dave Taht wrote:
> I have been working on developing a specification for testing networks
> more effectively for various side effects of bufferbloat, notably
> gaming and voip performance, and especially web performance.... as
> well as a few other things that concerned me, such as IPv6 behavior,
> and the effects of packet classification.
>
> A key goal is to be able to measure the quality of the user experience
> while a network is otherwise busy, with complex stuff going on in the
> background, but with a simple presentation of the results in the end,
> in under 60 seconds.

Rick Jones <rick.jones2@hp.com> replied:
| Would you like fries with that?
|
| Snark aside, I think that being able to capture the state of the user
| experience in only 60 seconds is daunting at best. Especially if
| this testing is going to run over the Big Bad Internet (tm) rather
| than in a controlled test lab.

> This portion of the test will take your favorite website as a target
> and show you how much it will slow down, under load.

| Under load on the website itself, or under load on one's link? I
| ass-u-me the latter, but that should be made clear. And while the
| chances of the additional load on a web site via this testing are
| likely epsilon, there is still the matter of its "optics" if you will
| - how it looks. Particularly if there is going to be something
| distributed with a default website coded into it.

This, counterintuitive as it might sound, is what will make the exercise
work: an indication, as a ratio (a non-dimensional measure), of how much
the response time of a known site is degraded by the network going into
queue delay.

We're assuming a queuing centre, the website, that is running at a
steady speed and load throughout the short test, and is NOT the
bottleneck. When we increase the load on the network, it becomes the
bottleneck, a queue builds up, and the degradation is directly
proportional to the delay the network adds.

A traditional measure in capacity planning is quite similar to what you
describe: the "stretch factor" is the ratio of the sitting-in-a-queue
delay to the normal service time of the network. When it's above 1,
you're spending as much time twiddling your thumbs as you are doing
work, and each additional bit of load will increase the delay and the
ratio dramatically.

I don't know if this will reproduce, but drawn as a curve against load,
the ratio you describe will look like a hockey stick:

............................./
3.........................../
.........................../
........................../
2......................../
......................../
......................./
1....................-
._________----------
0....5....10....15....20....25

Ratio is the Y-axis, load is the X, and the periods are supposed to be
blank spaces (;-))

At loads 1-18 or so, the ratio is < 1 and grows quite slowly. Above 20,
the ratio is >> 1 and grows very rapidly, and without bound.

The results will look like this, and the graphic-equalizer display will
tell the reader where the big components of the slowness are coming
from. Pretty classic capacity planning, by folks like Gunther.
Of course, if the web site you're measuring gets DDoSed in the middle of
the test, Your Mileage May Vary!

--dave
--
David Collier-Brown,         | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           |                      -- Mark Twain
(416) 223-8968

^ permalink raw reply	[flat|nested] 6+ messages in thread
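(To make the hockey stick above concrete, here is a minimal Python
sketch of the stretch factor under the textbook M/M/1 queueing model.
The model choice and the capacity figure are illustrative assumptions -
David's curve is hand-drawn, not derived from this formula.)

    # Stretch factor under an M/M/1 queue: time spent waiting divided by
    # time spent being served, Wq/S = rho / (1 - rho) at utilization rho.
    # Small at light load; grows without bound as rho approaches 1.

    def stretch_factor(load, capacity):
        rho = load / capacity
        if rho >= 1.0:
            return float("inf")   # saturated: the queue grows without bound
        return rho / (1.0 - rho)

    if __name__ == "__main__":
        capacity = 36.0   # chosen so the knee lands near load 20, as sketched
        for load in range(0, 30, 3):
            print(f"load {load:2d}: stretch {stretch_factor(load, capacity):6.2f}")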
* Re: [Codel] RFC: Realtime Response Under Load (rrul) test specification
  2012-11-06 20:52 ` [Codel] RFC: Realtime Response Under Load (rrul) test specification David Collier-Brown
@ 2012-11-09 10:21   ` Dave Taht
  0 siblings, 0 replies; 6+ messages in thread

From: Dave Taht @ 2012-11-09 10:21 UTC (permalink / raw)
  To: davecb; +Cc: codel, cerowrt-devel, bloat

On Tue, Nov 6, 2012 at 9:52 PM, David Collier-Brown <davec-b@rogers.com> wrote:
> Dave Taht wrote:
>> I have been working on developing a specification for testing networks
>> more effectively for various side effects of bufferbloat, notably
>> gaming and voip performance, and especially web performance.... as
>> well as a few other things that concerned me, such as IPv6 behavior,
>> and the effects of packet classification.
>>
>> A key goal is to be able to measure the quality of the user experience
>> while a network is otherwise busy, with complex stuff going on in the
>> background, but with a simple presentation of the results in the end,
>> in under 60 seconds.
>
> Rick Jones <rick.jones2@hp.com> replied:
> | Would you like fries with that?
> |
> | Snark aside, I think that being able to capture the state of the user
> | experience in only 60 seconds is daunting at best. Especially if
> | this testing is going to run over the Big Bad Internet (tm) rather
> | than in a controlled test lab.
>
>> This portion of the test will take your favorite website as a target
>> and show you how much it will slow down, under load.
>
> | Under load on the website itself, or under load on one's link? I
> | ass-u-me the latter, but that should be made clear. And while the
> | chances of the additional load on a web site via this testing are
> | likely epsilon, there is still the matter of its "optics" if you will
> | - how it looks. Particularly if there is going to be something
> | distributed with a default website coded into it.
>
> This, counterintuitive as it might sound, is what will make the exercise
> work: an indication, as a ratio (a non-dimensional measure), of how much
> the response time of a known site is degraded by the network going into
> queue delay.

Exactly! The core comparison of this test is unloaded vs. loaded
behavior of a network, which is to a large extent independent of the
underlying raw bandwidth. I should work harder at bringing this out in
the document.

I note that the central component of the benchmark really is web
performance without and then with load, as exemplified by the short
video here:

http://gettys.wordpress.com/2012/02/01/bufferbloat-demonstration-videos/

with the dozens of DNS lookups and short TCP streams that entails.
Regrettably, emulating that behavior is hard, so being able to A/B a
random website while under the kinds of loads generated by rrul is a
key intent (see the sketch after this message).

While there are interesting factoids to be gained from the behavior of
the elephantine TCP flows in relation to each other, it's the behavior
of the thinner flows that matters the most.

> We're assuming a queuing centre, the website, that is running at a
> steady speed and load throughout the short test, and is NOT the
> bottleneck. When we increase the load on the network, it becomes the
> bottleneck, a queue builds up, and the degradation is directly
> proportional to the delay the network adds.
>
> A traditional measure in capacity planning is quite similar to what you
> describe: the "stretch factor" is the ratio of the sitting-in-a-queue
> delay to the normal service time of the network. When it's above 1,
> you're spending as much time twiddling your thumbs as you are doing
> work, and each additional bit of load will increase the delay and the
> ratio dramatically.

I like the stretch factor concept, a lot.

> I don't know if this will reproduce, but drawn as a curve against load,
> the ratio you describe will look like a hockey stick:
>
> ............................./
> 3.........................../
> .........................../
> ........................../
> 2......................../
> ......................../
> ......................./
> 1....................-
> ._________----------
> 0....5....10....15....20....25
>
> Ratio is the Y-axis, load is the X, and the periods are supposed to be
> blank spaces (;-))
>
> At loads 1-18 or so, the ratio is < 1 and grows quite slowly. Above 20,
> the ratio is >> 1 and grows very rapidly, and without bound.
>
> The results will look like this, and the graphic-equalizer display will
> tell the reader where the big components of the slowness are coming
> from. Pretty classic capacity planning, by folks like Gunther.
>
> Of course, if the web site you're measuring gets DDoSed in the middle of
> the test, Your Mileage May Vary!
>
> --dave
> --
> David Collier-Brown,         | Always do right. This will gratify
> System Programmer and Author | some people and astonish the rest
> davecb@spamcop.net           |                      -- Mark Twain
> (416) 223-8968
> _______________________________________________
> Codel mailing list
> Codel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/codel

--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

^ permalink raw reply	[flat|nested] 6+ messages in thread
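(A toy illustration of the unloaded-vs-loaded A/B comparison described
above: time a page fetch on an idle link, then again while background
load runs, and report the dimensionless ratio. The URL and the single
urllib fetch are stand-ins - a real page load involves the dozens of
DNS lookups and short TCP streams mentioned in the reply.)

    # Time one page fetch idle, one under load; report loaded/unloaded.
    import time
    import urllib.request

    def fetch_seconds(url):
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=30) as resp:
            resp.read()
        return time.monotonic() - start

    if __name__ == "__main__":
        url = "http://example.com/"   # stand-in for "your favorite website"
        unloaded = fetch_seconds(url)
        input("Start a background load (e.g. rrul), then press Enter... ")
        loaded = fetch_seconds(url)
        print(f"unloaded {unloaded:.3f}s, loaded {loaded:.3f}s, "
              f"ratio {loaded / unloaded:.2f}")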
* [Codel] RFC: Realtime Response Under Load (rrul) test specification
@ 2012-11-06 12:42 Dave Taht
  2012-11-06 18:14 ` Rick Jones
  0 siblings, 1 reply; 6+ messages in thread

From: Dave Taht @ 2012-11-06 12:42 UTC (permalink / raw)
  To: bloat, codel, cerowrt-devel

I have been working on developing a specification for testing networks
more effectively for various side effects of bufferbloat, notably
gaming and voip performance, and especially web performance.... as
well as a few other things that concerned me, such as IPv6 behavior,
and the effects of packet classification.

A key goal is to be able to measure the quality of the user experience
while a network is otherwise busy, with complex stuff going on in the
background, but with a simple presentation of the results in the end,
in under 60 seconds.

While it's not done yet, it escaped into the wild today, and I might as
well solicit wider opinions on it, sooo... get the spec at:

https://github.com/dtaht/deBloat/blob/master/spec/rrule.doc?raw=true

Portions of the test are being prototyped in the netperf-wrapper repo
on github. The initial results of the rrul test on several hotel
networks I've tried it on are "interesting". Example:

http://www.teklibre.com/~d/rrul2_conference.pdf

A major sticking point at the moment is coming up with an equivalent of
the Chrome benchmarks for measuring relative web page performance with
and without a network load, or merely incorporating some automated form
of that benchmark into the overall test load.

The end goal is to have a complex, comprehensive benchmark of some core
networking issues that produces simple results, whether via a Java tool
like ICSI's, via Flash on the web, or at the command line via something
like netperf.

Related resources:

netperf 2.6 or later running on a fairly nearby server
https://github.com/tohojo/netperf-wrapper
python-matplotlib

I look forward to your comments.

--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

^ permalink raw reply	[flat|nested] 6+ messages in thread
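(For the flavor of the core mechanic before reading the spec: sample
round-trip latency while bulk flows occupy the link, and compare
against the idle RTT. A crude sketch - it shells out to the system ping
and pulls repeatedly from an assumed URL, where the real rrul test
drives netperf streams in both directions plus classified flows.)

    # Sample RTT via the system ping while bulk downloads load the link.
    import subprocess
    import threading
    import urllib.request

    def bulk_download(url, stop):
        while not stop.is_set():
            try:
                with urllib.request.urlopen(url, timeout=30) as resp:
                    while resp.read(65536) and not stop.is_set():
                        pass
            except OSError:
                pass   # transient fetch errors just restart the flow

    def ping_ms(host):
        out = subprocess.run(["ping", "-c", "1", host],
                             capture_output=True, text=True).stdout
        for token in out.split():          # find "time=12.3" in the output
            if token.startswith("time="):
                return float(token[5:])
        return None                        # output format varies by platform

    if __name__ == "__main__":
        host, url = "example.com", "http://example.com/"   # assumed targets
        print("idle RTT:", ping_ms(host), "ms")
        stop = threading.Event()
        flows = [threading.Thread(target=bulk_download, args=(url, stop))
                 for _ in range(4)]        # rrul uses 4 flows per direction
        for f in flows:
            f.start()
        for _ in range(10):
            print("loaded RTT:", ping_ms(host), "ms")
        stop.set()
        for f in flows:
            f.join()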
* Re: [Codel] RFC: Realtime Response Under Load (rrul) test specification
  2012-11-06 12:42 Dave Taht
@ 2012-11-06 18:14 ` Rick Jones
  2012-11-09 10:34   ` Dave Taht
  0 siblings, 1 reply; 6+ messages in thread

From: Rick Jones @ 2012-11-06 18:14 UTC (permalink / raw)
  To: Dave Taht; +Cc: codel, cerowrt-devel, bloat

On 11/06/2012 04:42 AM, Dave Taht wrote:
> I have been working on developing a specification for testing networks
> more effectively for various side effects of bufferbloat, notably
> gaming and voip performance, and especially web performance.... as
> well as a few other things that concerned me, such as IPv6 behavior,
> and the effects of packet classification.
>
> A key goal is to be able to measure the quality of the user experience
> while a network is otherwise busy, with complex stuff going on in the
> background, but with a simple presentation of the results in the end,
> in under 60 seconds.

Would you like fries with that?

Snark aside, I think that being able to capture the state of the user
experience in only 60 seconds is daunting at best. Especially if this
testing is going to run over the Big Bad Internet (tm) rather than in a
controlled test lab.

> While it's not done yet, it escaped into the wild today, and I might as
> well solicit wider opinions on it, sooo... get the spec at:
>
> https://github.com/dtaht/deBloat/blob/master/spec/rrule.doc?raw=true

Github is serving that up as a plain text file, which then has Firefox
looking to use gedit to look at the file, and gedit does not seem at all
happy with it. It was necessary to download the file and open it
"manually" in LibreOffice.

> MUST run long enough to defeat bursty bandwidth optimizations such as
> PowerBoost and discard data from that interval.

I'll willingly display my ignorance, but for how long do PowerBoost and
its cousins boost bandwidth?

I wasn't looking for PowerBoost, and given the thing being examined I
wasn't seeing it, but recently, when I was evaluating the network
performance of something "out there" in the cloud (not my home cloud,
as it were), I noticed performance spikes repeating at intervals which
would require > 60 seconds to "defeat".

> MUST track and sum bi-directional throughput, using estimates for ACK
> sizes of ipv4, ipv6, and encapsulated ipv6 packets, udp and tcp_rr
> packets, etc.

Estimating the bandwidth consumed by ACKs and/or protocol headers,
using code operating in user space, is going to be guessing -
particularly portable user-space code. While those things may indeed
affect the user's experience, the user doesn't particularly care about
ACKs or header sizes. She cares how well the page loads or the call
sounds.

> MUST have the test server(s) within 80ms of the testing client

Why? Perhaps there is something stating that some number of nines'
worth of things being accessed are within 80ms of the user. If there
is, that should be given in support of the requirement.

> This portion of the test will take your favorite website as a target
> and show you how much it will slow down, under load.

Under load on the website itself, or under load on one's link? I
ass-u-me the latter, but that should be made clear. And while the
chances of the additional load on a web site via this testing are
likely epsilon, there is still the matter of its "optics" if you will
- how it looks. Particularly if there is going to be something
distributed with a default website coded into it.

Further, websites are not going to remain static, so there will be the
matter of being able to compare results over time.
Perhaps that can be finessed with the "unloaded" (again, I assume
relative to the link of interest/test) measurement.

rick jones

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [Codel] RFC: Realtime Response Under Load (rrul) test specification
  2012-11-06 18:14 ` Rick Jones
@ 2012-11-09 10:34   ` Dave Taht
  2012-11-09 17:57     ` Rick Jones
  0 siblings, 1 reply; 6+ messages in thread

From: Dave Taht @ 2012-11-09 10:34 UTC (permalink / raw)
  To: Rick Jones; +Cc: codel, cerowrt-devel, bloat

On Tue, Nov 6, 2012 at 7:14 PM, Rick Jones <rick.jones2@hp.com> wrote:
> On 11/06/2012 04:42 AM, Dave Taht wrote:
>>
>> I have been working on developing a specification for testing networks
>> more effectively for various side effects of bufferbloat, notably
>> gaming and voip performance, and especially web performance.... as
>> well as a few other things that concerned me, such as IPv6 behavior,
>> and the effects of packet classification.
>>
>> A key goal is to be able to measure the quality of the user experience
>> while a network is otherwise busy, with complex stuff going on in the
>> background, but with a simple presentation of the results in the end,
>> in under 60 seconds.
>
> Would you like fries with that?

and a shake!

> Snark aside, I think that being able to capture the state of the user
> experience in only 60 seconds is daunting at best.

Concur.

> Especially if this testing is going to run over the Big Bad Internet
> (tm) rather than in a controlled test lab.

In my testing of this scheme - on networks ranging in size and quality
from a 4-hop mesh network, to the internet, to random hotels throughout
the US and EU at baseline RTTs up to 200ms, to lab testing at multiple
other locations - I was generally able to generate a load that had
"interesting" side effects inside of 40 seconds, and usually sooner.

My suggestion, as always, is for you (and others) to simply try out the
prototypes that are in the netperf-wrapper git repo, see what you can
see, learn what you can learn, and feed back what you can.

In the longer term I would certainly like the simplest test for latency
under load added directly to netperf. The VOIP test is also nifty.

>> While it's not done yet, it escaped into the wild today, and I might as
>> well solicit wider opinions on it, sooo... get the spec at:
>>
>> https://github.com/dtaht/deBloat/blob/master/spec/rrule.doc?raw=true
>
> Github is serving that up as a plain text file, which then has Firefox
> looking to use gedit to look at the file, and gedit does not seem at all
> happy with it. It was necessary to download the file and open it
> "manually" in LibreOffice.

Sorry. The original was in emacs org mode. Shall I put that up instead?

>> MUST run long enough to defeat bursty bandwidth optimizations such as
>> PowerBoost and discard data from that interval.
>
> I'll willingly display my ignorance, but for how long do PowerBoost and
> its cousins boost bandwidth?
>
> I wasn't looking for PowerBoost, and given the thing being examined I
> wasn't seeing it, but recently, when I was evaluating the network
> performance of something "out there" in the cloud (not my home cloud,
> as it were), I noticed performance spikes repeating at intervals which
> would require > 60 seconds to "defeat".

I too have seen oddball spikes - for example, older forms of sfq do a
permutation every 10 seconds and totally wipe out many tcp connections
by doing so.

I regard your problem detailed above as an edge case, compared to the
much grosser effects this benchmark generates. Certainly being able to
run the tests for longer intervals and capture traffic would be useful
for network engineers.
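(One way to honor the "discard data from that interval" requirement
quoted above: drop the first seconds of a per-second throughput trace
before averaging. The 30-second boost in the toy trace and the
40-second warmup are guesses - as the spikes Rick describes show, some
burst mechanisms may need well over 60 seconds to defeat.)

    # Average throughput after discarding a warmup window, so a burst
    # mechanism like PowerBoost doesn't inflate the steady-state figure.

    def steady_state_mbps(samples, warmup_s):
        """samples: list of (seconds_since_start, mbps) pairs."""
        tail = [mbps for t, mbps in samples if t >= warmup_s]
        if not tail:
            raise ValueError("test ran shorter than the warmup window")
        return sum(tail) / len(tail)

    if __name__ == "__main__":
        # Toy trace: 30s at a boosted 20 Mbit/s, then the true 5 Mbit/s.
        trace = [(t, 20.0 if t < 30 else 5.0) for t in range(60)]
        print("naive mean:      ", sum(m for _, m in trace) / len(trace))
        print("after 40s warmup:", steady_state_mbps(trace, warmup_s=40))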
>> MUST track and sum bi-directional throughput, using estimates for ACK
>> sizes of ipv4, ipv6, and encapsulated ipv6 packets, udp and tcp_rr
>> packets, etc.
>
> Estimating the bandwidth consumed by ACKs and/or protocol headers,
> using code operating in user space, is going to be guessing -
> particularly portable user-space code. While those things may indeed
> affect the user's experience, the user doesn't particularly care about
> ACKs or header sizes. She cares how well the page loads or the call
> sounds.

I feel an "optimum" ACK overhead should be calculated, vs. the actual
(which is impossible to know).

>> MUST have the test server(s) within 80ms of the testing client
>
> Why? Perhaps there is something stating that some number of nines'
> worth of things being accessed are within 80ms of the user. If there
> is, that should be given in support of the requirement.

Continental-US distance. Despite me pushing the test to 200ms, I have a
great deal more confidence it will work consistently at 80ms.

Can make this a "SHOULD" if you like.

>> This portion of the test will take your favorite website as a target
>> and show you how much it will slow down, under load.
>
> Under load on the website itself, or under load on one's link? I
> ass-u-me the latter, but that should be made clear. And while the
> chances of the additional load on a web site via this testing are
> likely epsilon, there is still the matter of its "optics" if you will
> - how it looks. Particularly if there is going to be something
> distributed with a default website coded into it.
>
> Further, websites are not going to remain static, so there will be the
> matter of being able to compare results over time. Perhaps that can be
> finessed with the "unloaded" (again, I assume relative to the link of
> interest/test) measurement.

A core portion of the test really is comparing unloaded vs. loaded
performance of the same place, in the same test, over the course of
about a minute.

And as these two baseline figures are kept, they can be compared for
any given website from any given location, over history, and across
changes in the underlying network.

> rick jones

--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

^ permalink raw reply	[flat|nested] 6+ messages in thread
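(A sketch of the "optimum" ACK overhead suggested above: one delayed
ACK per two MSS-sized segments, each a bare header. The MSS and header
constants assume Ethernet framing with TCP timestamps and no other
options - and, as Rick notes in his reply below, piggybacking and GRO
make the actual count diverge from any such estimate.)

    # "Optimum" reverse-path ACK bytes for a one-way bulk transfer:
    # one delayed ACK per two full segments, each ACK an IP header plus
    # a TCP header carrying a 12-byte timestamp option.

    ACK_BYTES = {"ipv4": 20 + 20 + 12, "ipv6": 40 + 20 + 12}

    def optimum_ack_bytes(payload_bytes, mss=1448, family="ipv4"):
        segments = -(-payload_bytes // mss)   # ceiling division
        acks = -(-segments // 2)              # one ACK per two segments
        return acks * ACK_BYTES[family]

    if __name__ == "__main__":
        transferred = 10 * 1024 * 1024        # a 10 MB bulk transfer
        for fam in ("ipv4", "ipv6"):
            overhead = optimum_ack_bytes(transferred, family=fam)
            print(f"{fam}: ~{overhead} ACK bytes "
                  f"({100 * overhead / transferred:.2f}% of payload)")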
* Re: [Codel] RFC: Realtime Response Under Load (rrul) test specification
  2012-11-09 10:34 ` Dave Taht
@ 2012-11-09 17:57   ` Rick Jones
  0 siblings, 0 replies; 6+ messages in thread

From: Rick Jones @ 2012-11-09 17:57 UTC (permalink / raw)
  To: Dave Taht; +Cc: codel, cerowrt-devel, bloat

>> Github is serving that up as a plain text file, which then has Firefox
>> looking to use gedit to look at the file, and gedit does not seem at all
>> happy with it. It was necessary to download the file and open it
>> "manually" in LibreOffice.
>
> Sorry. The original was in emacs org mode. Shall I put that up instead?

Just make sure the file has the correct MIME type associated with it
and I think it will be fine.

>> Estimating the bandwidth consumed by ACKs and/or protocol headers,
>> using code operating in user space, is going to be guessing -
>> particularly portable user-space code. While those things may indeed
>> affect the user's experience, the user doesn't particularly care about
>> ACKs or header sizes. She cares how well the page loads or the call
>> sounds.
>
> I feel an "optimum" ACK overhead should be calculated, vs. the actual
> (which is impossible to know).

Well, keep in mind that there will be cases where the two will be
rather divergent.

Consider a request/response sort of exchange. For excessive simplicity,
assume a netperf TCP_RR test. Presumably, for the single-byte case,
there will be no standalone ACKs - they will all be piggy-backed on the
segments carrying the requests and responses. But now suppose there is
a little think time in there - say, to do a disc I/O or a back-end
query or whatnot. That may or may not make the response to the request,
or the next request after a response, come after the stack's standalone
ACK interval, which is a value we will not know up in user space.

Now make the responses longer, crossing the MSS threshold - say
something like 8KB. We might ass-u-me an ACK every two MSS, and we can
get the MSS from user space (at least under *nix), but we will not know
from user space whether GRO is present, enabled, or even effective. And
if GRO is working, rather than sending something like 5 or 6 ACKs for
that 8KB, the stack will have sent just one.

>>> MUST have the test server(s) within 80ms of the testing client
>>
>> Why? Perhaps there is something stating that some number of nines'
>> worth of things being accessed are within 80ms of the user. If there
>> is, that should be given in support of the requirement.
>
> Continental-US distance. Despite me pushing the test to 200ms, I have a
> great deal more confidence it will work consistently at 80ms.
>
> Can make this a "SHOULD" if you like.

MUST or SHOULD, either way you should... include the reason for the
requirement/request.

>>> This portion of the test will take your favorite website as a target
>>> and show you how much it will slow down, under load.
>>
>> Under load on the website itself, or under load on one's link? I
>> ass-u-me the latter, but that should be made clear. And while the
>> chances of the additional load on a web site via this testing are
>> likely epsilon, there is still the matter of its "optics" if you will
>> - how it looks. Particularly if there is going to be something
>> distributed with a default website coded into it.
>>
>> Further, websites are not going to remain static, so there will be the
>> matter of being able to compare results over time. Perhaps that can be
>> finessed with the "unloaded" (again, I assume relative to the link of
>> interest/test) measurement.
> A core portion of the test really is comparing unloaded vs. loaded
> performance of the same place, in the same test, over the course of
> about a minute.
>
> And as these two baseline figures are kept, they can be compared for
> any given website from any given location, over history, and across
> changes in the underlying network.

Adding further clarity on specifically *what* is presumed to be
unloaded/loaded, and calling out the assumption that the web server
being accessed will itself have uniform loading for the duration of
the test, would be goodness.

David Collier-Brown mentioned the "stretch factor" - the ratio of the
unloaded vs. loaded delay (assuming I've interpreted what he wrote
correctly). Comparing stretch factors (as one is tweaking things) still
calls for a rather consistent-over-time baseline, doesn't it? (What
David referred to as the "normal service time".)

If I target webserver foo.com on Monday, and on Monday I see an
unloaded-network latency to it of 100ms and a loaded latency of 200ms,
that would be a stretch factor of 2, yes? If I then look again on
Tuesday, having made some change to my network under test that causes
it to add only 75ms, and unloaded access to webserver foo.com is for
some reason 50ms, I will have a stretch factor of 2.5. That is
something which will need to be kept in mind.

rick

^ permalink raw reply	[flat|nested] 6+ messages in thread
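(Rick's Monday/Tuesday caveat, in executable form: with a delay ratio,
a moving unloaded baseline can make a network that adds less absolute
delay score worse. A tiny sketch using his figures:)

    # Stretch factor taken as loaded/unloaded delay: the same metric can
    # worsen even when the absolute added delay improves, if the
    # unloaded baseline shifts between runs.

    def stretch(unloaded_ms, added_ms):
        return (unloaded_ms + added_ms) / unloaded_ms

    monday = stretch(100, 100)   # 100ms baseline, +100ms under load -> 2.0
    tuesday = stretch(50, 75)    # 50ms baseline, +75ms under load -> 2.5
    print(f"Monday stretch: {monday:.1f}, Tuesday stretch: {tuesday:.1f}")
    print("Tuesday added less delay (75ms vs 100ms) yet scores worse, so")
    print("the unloaded baseline must be reported alongside the ratio.")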
end of thread, other threads:[~2012-11-09 17:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <mailman.3.1352232001.18990.codel@lists.bufferbloat.net>
2012-11-06 20:52 ` [Codel] RFC: Realtime Response Under Load (rrul) test specification David Collier-Brown
2012-11-09 10:21   ` Dave Taht
2012-11-06 12:42 Dave Taht
2012-11-06 18:14 ` Rick Jones
2012-11-09 10:34   ` Dave Taht
2012-11-09 17:57     ` Rick Jones