Subject: Re: [Cerowrt-devel] [Bloat] Network tests as discussed in Washington, DC
From: Dave Taht
To: Daniel Berger, frank.rowand@am.sony.com
Cc: bloat-devel, cerowrt-devel@lists.bufferbloat.net, bloat
Date: Sun, 11 Nov 2012 14:39:05 +0100

On Sun, Nov 11, 2012 at 9:40 AM, Daniel Berger wrote:
> Hi everybody,
>
> I totally love the idea of testing for browsing performance. Thanks
> for that ;-)

Jim's demos of the effect of network load on the performance of web
sites are quite revealing:

http://gettys.wordpress.com/2012/02/01/bufferbloat-demonstration-videos/

They use the Chrome web page benchmark available here:

https://chrome.google.com/webstore/detail/page-benchmarker/channimfdomahekjcahlbpccbgaopjll

You can fairly easily replicate his results on your own hardware, both
locally and over the internet. Go for it!

However, in working toward a general-purpose test, the simplicity of
his demo (which used a very short path to MIT) didn't hold up, so I
came up with the methods described in the rRul document. They seem to
scale fairly well up past 60ms RTT. More testers would be nice!

One of the things that really bugs me about today's overbuffered
networks is doing things like a file upload via scp, which nearly
completes, then stalls and retransmits, over and over again, like Jon
Corbet's example of what happened to him at a conference hotel last
year:

http://lwn.net/Articles/496509/

> Nevertheless, I have another critical question on this 40s network
> test idea: Did someone consider the robustness of the results? That
> is, did somebody check for statistical significance?

Presently the effects on multiple sorts of networks are interesting.
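For anyone who wants to play along at home, the core
latency-under-load idea reduces to something like the following
sketch. And it IS just a sketch, not the real rrul test: no diffserv
marking, only one stream per direction, nothing measured but ping, and
no error handling. It assumes netperf is installed, a netserver is
running on the far end, and the host name below is a placeholder:

#!/usr/bin/env python
# Sketch of latency-under-load: ping an idle path, then ping it again
# while netperf fills the queues in both directions. Placeholder host.
import re
import subprocess

HOST = "netperf.example.org"   # hypothetical server running netserver
SECS = 20                      # duration of each phase

def ping_rtts(host, count, interval=0.2):
    """Collect RTT samples (ms) by parsing standard ping output."""
    out = subprocess.run(
        ["ping", "-c", str(count), "-i", str(interval), host],
        capture_output=True, text=True).stdout
    return [float(m) for m in re.findall(r"time=([\d.]+)", out)]

def main():
    # Phase 1: latency on the unloaded path.
    idle = ping_rtts(HOST, int(SECS / 0.2))

    # Phase 2: latency while TCP_STREAM (upload) and TCP_MAERTS
    # (download) saturate the link in both directions at once.
    up = subprocess.Popen(["netperf", "-H", HOST, "-l", str(SECS),
                           "-t", "TCP_STREAM"], stdout=subprocess.DEVNULL)
    down = subprocess.Popen(["netperf", "-H", HOST, "-l", str(SECS),
                             "-t", "TCP_MAERTS"], stdout=subprocess.DEVNULL)
    loaded = ping_rtts(HOST, int(SECS / 0.2))
    up.wait(); down.wait()

    print("idle   RTT: min %.1f / median %.1f ms"
          % (min(idle), sorted(idle)[len(idle) // 2]))
    print("loaded RTT: min %.1f / median %.1f ms"
          % (min(loaded), sorted(loaded)[len(loaded) // 2]))

if __name__ == "__main__":
    main()

On a bloated path the loaded median is the number that explodes; the
real test samples a good deal more than that, but this is the shape of
it.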
As one example, here is a run of one rrul prototype on the wired and
wifi setups Toke put together:

http://www.teklibre.com/~d/bloat/rrul-denmark-germany-wired-pfifo-fast.pdf

vs

http://www.teklibre.com/~d/bloat/rrul-denmark-germany-wlan2.pdf

I LOVE the first graph (configured for pfifo_fast on the gateways) as
it clearly shows classic drop-tail "TCP global synchronization" on the
egress gateway, and the resulting loss of utilization. It's nice to
have been able to get it on a 50+ms *real-world* path. It also shows
how poorly traffic classification of TCP works across the internet, as
the TCP flows, classified in different ways, evolve and change places.

The second (taken on a good wifi) shows how noisy the data is:

http://www.teklibre.com/~d/bloat/rrul-denmark-germany-wlan2.pdf

(I note that using a TCP "ping" is a bad idea, except for showing why
TCP encapsulated inside TCP is a bad idea, which gets progressively
worse at longer RTTs. Anyone have a decent RTP test we can replace
this with?)

A graph taken against a loaded wifi network is pretty horrifying...

http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf

(don't look. Halloween is over)

I have a ton of interesting statistics gathered at IETF and LinuxCon
this past week... but finding good ways to present them remains a
problem, and I note that most of the stuff above is intended as a
BACKGROUND process, while loading web pages and doing useful work like
making phone calls is the real intended result of the benchmark.

So, no, the only statistical significance calculated so far is that
tests like this can cause a network to have one to three orders of
magnitude of latency inserted into it. Compared to that, I'm not
terribly concerned with a few percentage points here or there, at this
time, but I'd welcome analysis.

The biggest unknown in the test is the optimal TCP ack count, and
TCP's response to packet loss (retransmits), which could account for a
great deal of the actual data transmitted vs the amount of useful data
transmitted. "Useful data transmitted under no load and under load"
would be a tremendously useful statistic.

It is my hope that the volume of web and DNS traffic projected to be
in the test will be fairly minimal compared to the rest of it, but I'm
not counting on it. That needs to be measured too. It's a pretty big
project to do this up right, in other words!

> I currently see that there are two steps:
> First, the test with light load, which shows (I guess) low
> jitter/variance. Second, busy queues.
> This second "phase" is probably when jitter/variance will inflate a
> lot, right?
> Then, also the mean (and most other statistical summary measures)
> won't be stable.

Correct.

> Thus, I doubt that in order to compute an aggregate "score" we can
> rely on this, in all cases.

The "score" as a ratio of various measured parameters from unloaded to
loaded seems viable; see the sketch below.

> Obviously the best solution would be to run the test long enough so
> that confidence intervals appear to be small and similar for both
> steps.

There is nothing stopping a network engineer, device driver writer,
device maker, mathematician, network queue theorist, sysadmin,
manager, or concerned citizen... from running the test continuously,
going from unloaded, to loaded, to unloaded, to loaded, and tweaking
various underlying variables in the network stack and path. I do this
all the time! It is my hope, certainly, that those that should do so,
will do so.
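To make the ratio idea concrete, here is a sketch of what such a
score, plus the confidence intervals Daniel asks about, might look
like, computed over RTT samples from the two phases. The choice of
medians and the normal-approximation CI are my assumptions for
illustration, not a settled metric:

# A sketch, not a settled metric: a "score" as the ratio of unloaded
# to loaded median RTT, plus a 95% confidence interval on each phase's
# mean (normal approximation). With the heavy-tailed latencies seen
# under load, a bootstrap CI on the median would arguably be more
# honest.
from math import sqrt
from statistics import mean, median, stdev

def latency_score(idle_ms, loaded_ms):
    """1.0 means load added no latency; 0.01 means load added two
    orders of magnitude of it."""
    return median(idle_ms) / median(loaded_ms)

def ci95(samples):
    """(mean, half-width) of a 95% CI, normal approximation."""
    return mean(samples), 1.96 * stdev(samples) / sqrt(len(samples))

# Made-up numbers: a 20 ms idle path that balloons to ~2 s under load.
idle = [19.0, 20.0, 22.0, 18.5, 21.0]
loaded = [1900.0, 2100.0, 1750.0, 2300.0, 2050.0]

print("score: %.3f" % latency_score(idle, loaded))  # ~0.01
print("idle:   %.1f +/- %.1f ms" % ci95(idle))
print("loaded: %.0f +/- %.0f ms" % ci95(loaded))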
A core component IS the "mtr" tool, which will point at the issues on
the bottleneck link, which might be anything from the local OS, or
device, to the wireless AP, to the CPE, to somewhere else on the path.
Giving the end user data with (occasionally) something other than
their ISP to blame would be a goodness, and having tools available to
find and fix it, even better.

However, the average citizen is not going to sit still for 60 seconds
on a regular basis, which is the purpose of trying to come up with a
score and a panel of useful results that can be presented briefly and
clearly.

I also have hope that a test as robust and stressful as this can be
run on edge gateways automatically, in the background, on selected
routers throughout the world, much as BISmark already does. See
examples at:

http://networkdashboard.org/

> It's probably not feasible to expand the test into unusually long
> intervals, but at least computing a 95% confidence interval would
> give me a better sense of the results.

Go for it! "Bufferbloat.net: Making network research fun since 2011!"

I note that the rRul work being done right now is the spare-time
project of myself and one grad student, leveraging the hard work that
has been put into the Linux OS over the last year by so many, and the
multitude of useful enhancements (classification, priority, and
congestion control algorithm selection) that Rick Jones has put into
netperf over the past year, also in his spare time.

No funding for this work has yet arrived. Various proposals for grants
have been ignored, but we're not grant-writing experts. Cerowrt is
getting some lovely support from interested users, but the size of the
task to get code written, analyzed, and tests deployed is
intimidating.

There are a wealth of other tests that can be performed while under a
RRUL-like load. For example, this December I'll be at the CoNEXT
conference in Nice, with some early results from the lincs.fr lab
regarding the interactions of AQM and LEDBAT. I hope to be doing some
follow-up work on that paper also in December, against codel and
fq_codel, and more realistic representations of uTP.

A RRUL-like test would be useful for analyzing and creating
comparative results for any congestion control algorithm, alone or in
combination, such as TCP-LP, or DC-TCP, or (as one potentially very
interesting example) the latest work done at MIT on their TCP, whose
name I forget right now.

I am very interested in how video sharding technologies work. What
often happens there is an HTTP GET of one shard of roughly 10 seconds
of video, available at various rates. The client measures the delivery
time of that 10-second shard and increases or decreases the rate of
the next GET to suit. This generally pushes TCP into slow start,
repeatedly, and slams the downstream portion of the network,
repeatedly. (A sketch of that loop follows below.)

Then there's videoconferencing. Which I care about a lot. I like it
when people's lips match up with what they are saying, being partially
deaf myself.

And gaming. I'd like very much to have a better picture (packet
captures!) of how various online games such as Quake, StarCraft, and
World of Warcraft interact with the network.

(I think this last item would be rather fun for a team of grad
students to take on. Heck, I'd enjoy "working" on this portion of the
problem, too. :) )
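Going back to the video sharding behavior above, the client loop
amounts to something like this sketch. This is my paraphrase, not any
particular player's algorithm, and the URL scheme and rate ladder are
made up for illustration:

# A rough sketch of the video "shard" rate-adaptation loop described
# above -- a paraphrase, not any particular player's algorithm.
import time
import urllib.request

RATES_KBPS = [250, 500, 1000, 2000, 4000]  # hypothetical encodings
SHARD_SECS = 10                            # seconds of video per GET

def fetch_shard(base_url, index, rate):
    """One HTTP GET of a ~10s shard at the chosen rate; returns the
    wall-clock delivery time. Each GET opens a fresh connection, so
    TCP restarts from slow start and slams the downstream queue."""
    url = "%s/shard-%d-%dk.ts" % (base_url, index, rate)  # hypothetical
    start = time.time()
    urllib.request.urlopen(url).read()
    return time.time() - start

def play(base_url, shards):
    rate_idx = 0
    for i in range(shards):
        took = fetch_shard(base_url, i, RATES_KBPS[rate_idx])
        # Shard delivered comfortably faster than real time? Step the
        # rate up; slower than real time? Step it down. Then idle
        # until the next shard is due.
        if took < SHARD_SECS * 0.8 and rate_idx < len(RATES_KBPS) - 1:
            rate_idx += 1
        elif took > SHARD_SECS and rate_idx > 0:
            rate_idx -= 1
        time.sleep(max(0, SHARD_SECS - took))

Packet captures of this burst-idle-burst pattern under a RRUL-like
load would tell us a lot about how it shares a bloated link.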
> Doing this might also be a means to account for a broad variety of
> testing/real-world environments and still get reliable results.

I would argue that settling on a clear definition of the tests,
writing the code, and collecting a large set of data would be
"interesting". As for being able to draw general conclusions from it,
I generally propose that we prototype tests, and iterate, going deeply
into packet captures, until we get things that make sense in the lab
and in the field... and rapidly bug-report everything that is found. A
great number of the pathological behaviors we've discovered so far
have turned out to be bugs at various levels in various stacks. It's
generally been rather difficult to get to a "paper-writing stage", the
way my life seems to work.

> Anyone else with this thought?

An example of how you can fool yourself with network statistics,
misapplied:

https://lists.bufferbloat.net/pipermail/bloat/2011-November/000715.html

Frank Rowand gave a very good (heretical!) presentation on core
analysis and presentation ideas at last week's LinuxCon, particularly
when it comes to analyzing the real-time performance of anything. I
don't know if it's up yet.

I have generally found that mountain and CDF plots are the best ways
to deal with the extremely noisy data collected from wifi and over the
open internet, and that having packet captures and TCP instrumentation
is useful also.

--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html