[Bloat] Network tests as discussed in Washington, DC

Sun Nov 11 03:35:25 EST 2012

Hi everybody,

I totally love the idea of testing browsing performance. Thanks for
that ;-)
Nevertheless, I have another critical question about this 40s network test
idea:
Did anyone consider the robustness of the results? That is, did somebody
check for statistical significance?
As far as I can see, there are two steps:
First, the test under light load, which shows (I guess) low jitter/variance.
Second, the test with busy queues.
This second "phase" is probably where jitter/variance will inflate a
lot, right?
Then the mean (and most other statistical summary measures) won't
be stable either.
Thus, I doubt that we can rely on this to compute an aggregate "score"
in all cases.

Obviously the best solution would be to run the test long enough that
the confidence intervals become small and similar for both steps.
It is probably not feasible to extend the test to unusually long durations,
but at least computing a 95% confidence interval would give me a better
sense of the results.
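
To make this concrete, here is a minimal sketch of what I mean, in Python.
It is only an illustration under assumptions: the function names and the
RTT values are made up, it uses a normal approximation (a t-distribution
would give wider intervals for few samples), and it treats the samples as
independent, which successive latency measurements under load usually are
not, so the real interval is likely wider.

```python
import math
import statistics


def mean_confidence_interval(samples, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    # Standard error of the mean, using the sample standard deviation.
    stderr = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95%.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean, z * stderr


def samples_needed(stdev, target_half_width, confidence=0.95):
    """Rough number of independent samples for a given interval half-width."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * stdev / target_half_width) ** 2)


if __name__ == "__main__":
    # Hypothetical RTT samples in milliseconds, NOT measured data.
    idle_rtts = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 20.3, 19.7]
    loaded_rtts = [85.0, 240.0, 130.0, 410.0, 95.0, 310.0, 180.0, 520.0]

    for name, rtts in (("idle", idle_rtts), ("loaded", loaded_rtts)):
        mean, half = mean_confidence_interval(rtts)
        print(f"{name}: mean {mean:.1f} ms, 95% CI +/- {half:.1f} ms")
```

With the same number of samples, the interval for the loaded phase comes
out much wider than for the idle phase, which is exactly why I would not
trust a single aggregate number without it. samples_needed() gives a rough
feeling for how long the loaded phase would have to run to match the
precision of the idle phase.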

Doing this might also be a means to account for a broad variety of
testing/real-world environments and still get reliable results.

Anyone else with this thought?

Cheers,
Daniel