From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail.comsound.de (mail.comsound.de [176.9.164.96]) by huchra.bufferbloat.net (Postfix) with ESMTP id 73BA221F0DD; Sun, 11 Nov 2012 00:35:25 -0800 (PST)
Received: from [192.168.1.57] (201-247.196-178.cust.bluewin.ch [178.196.247.201]) by mail.comsound.de (Postfix) with ESMTPSA id 446E96301; Sun, 11 Nov 2012 09:43:40 +0100 (CET)
Message-ID: <509F6474.5050203@student.ethz.ch>
Date: Sun, 11 Nov 2012 09:40:20 +0100
From: Daniel Berger
Organization: ETH Zurich
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120910 Thunderbird/15.0.1
MIME-Version: 1.0
To: bloat
References:
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Mailman-Approved-At: Sun, 11 Nov 2012 02:15:49 -0800
Cc: bloat-devel , cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] [Bloat] Network tests as discussed in Washington, DC
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 11 Nov 2012 08:35:25 -0000

Hi everybody,

I really like the idea of testing for browsing performance. Thanks for that ;-)

Nevertheless, I have a critical question about this 40s network test idea: has anyone considered the robustness of the results? That is, has anybody checked for statistical significance? As I understand it, there are two steps: first, a test under low load, which (I guess) shows low jitter/variance; second, a test with busy queues. This second "phase" is probably when jitter/variance will inflate a lot, right? Then the mean (and most other statistical summary measures) won't be stable either. Thus, I doubt we can rely on this in all cases to compute an aggregate "score".
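To make that concrete, here is a rough sketch of the kind of check I have in mind (the latency numbers are made up, and the interval uses a simple normal approximation; a heavy-tailed loaded phase would need something more careful, e.g. a bootstrap):

```python
import math
import random
import statistics

def mean_ci95(samples):
    """Approximate 95% confidence interval for the mean
    (normal approximation: mean +/- 1.96 * stderr)."""
    n = len(samples)
    m = statistics.mean(samples)
    s = statistics.stdev(samples)
    half = 1.96 * s / math.sqrt(n)
    return m - half, m + half

# Hypothetical latency samples in ms: a quiet phase with low variance,
# and a loaded phase where jitter has inflated a lot.
random.seed(42)
quiet = [random.gauss(20, 2) for _ in range(200)]
loaded = [random.gauss(180, 60) for _ in range(200)]

for name, data in (("quiet", quiet), ("loaded", loaded)):
    lo, hi = mean_ci95(data)
    print(f"{name}: mean={statistics.mean(data):.1f} ms, "
          f"95% CI=({lo:.1f}, {hi:.1f})")
```

With numbers like these the loaded-phase interval comes out far wider than the quiet-phase one, which is exactly the instability I am worried about when the two phases get folded into one score.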
Obviously the best solution would be to run the test long enough that the confidence intervals become small and similar for both steps. Extending the test to unusually long intervals is probably not feasible, but at least computing a 95% confidence interval would give me a better sense of the results. Doing this might also be a way to account for a broad variety of testing/real-world environments and still get reliable results. Anyone else with this thought?

Cheers,
Daniel