From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail.comsound.de (mail.comsound.de [176.9.164.96]) by huchra.bufferbloat.net (Postfix) with ESMTP id 73BA221F0DD; Sun, 11 Nov 2012 00:35:25 -0800 (PST)
Received: from [192.168.1.57] (201-247.196-178.cust.bluewin.ch [178.196.247.201]) by mail.comsound.de (Postfix) with ESMTPSA id 446E96301; Sun, 11 Nov 2012 09:43:40 +0100 (CET)
Message-ID: <509F6474.5050203@student.ethz.ch>
Date: Sun, 11 Nov 2012 09:40:20 +0100
From: Daniel Berger
Organization: ETH Zurich
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120910 Thunderbird/15.0.1
MIME-Version: 1.0
To: bloat
References:
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Mailman-Approved-At: Sun, 11 Nov 2012 02:15:49 -0800
Cc: bloat-devel , cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] [Bloat] Network tests as discussed in Washington, DC
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 11 Nov 2012 08:35:25 -0000

Hi everybody,

I really like the idea of testing for browsing performance. Thanks for that ;-)

Nevertheless, I have a critical question about this 40s network test idea: has anyone considered the robustness of the results? That is, has anybody checked for statistical significance? As I understand it, there are two steps: first, a test under low load, which (I guess) shows low jitter/variance; second, a test with busy queues. This second "phase" is probably when jitter/variance will inflate a lot, right? Then the mean (and most other statistical summary measures) won't be stable either. Thus, I doubt we can rely on this in all cases to compute an aggregate "score".
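To make that concrete, here is a rough sketch of the kind of check I have in mind (the latency numbers are made up, and the interval uses a simple normal approximation; a heavy-tailed loaded phase would need something more careful, e.g. a bootstrap):

```python
import math
import random
import statistics

def mean_ci95(samples):
    """Approximate 95% confidence interval for the mean
    (normal approximation: mean +/- 1.96 * stderr)."""
    n = len(samples)
    m = statistics.mean(samples)
    s = statistics.stdev(samples)
    half = 1.96 * s / math.sqrt(n)
    return m - half, m + half

# Hypothetical latency samples in ms: a quiet phase with low variance,
# and a loaded phase where jitter has inflated a lot.
random.seed(42)
quiet = [random.gauss(20, 2) for _ in range(200)]
loaded = [random.gauss(180, 60) for _ in range(200)]

for name, data in (("quiet", quiet), ("loaded", loaded)):
    lo, hi = mean_ci95(data)
    print(f"{name}: mean={statistics.mean(data):.1f} ms, "
          f"95% CI=({lo:.1f}, {hi:.1f})")
```

With numbers like these the loaded-phase interval comes out far wider than the quiet-phase one, which is exactly the instability I am worried about when the two phases get folded into one score.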
Obviously the best solution would be to run the test long enough that the confidence intervals become small and similar for both steps. Extending the test to unusually long intervals is probably not feasible, but at least computing a 95% confidence interval would give me a better sense of the results. Doing this might also be a way to account for a broad variety of testing/real-world environments and still get reliable results. Anyone else with this thought?

Cheers,
Daniel