* Network tests as discussed in Washington, DC
@ 2012-11-10 10:54 Dave Taht
  2012-11-10 14:18 ` [Cerowrt-devel] " dpreed
  ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread

From: Dave Taht @ 2012-11-10 10:54 UTC (permalink / raw)
To: bloat, cerowrt-devel, bloat-devel

The RRUL test idea presented earlier this week would do best with many underlying servers, instrumented with TCP stats collection, and a large backend for analysis, and thus I got interested in the current state of affairs in the internet as to how to pull that together in conjunction with larger labs and universities.

So, I started sorting through the debates at the FCC about network testing. A wide range of opinion is presented, AND represented - I randomly clicked on the names I recognised (like isc, verizon, karl auerbach, measurement labs, new horizon foundation, and many, many others. I'm pretty sure to get a balanced view I need to click on the names I don't recognise!)

See position papers here:

http://apps.fcc.gov/ecfs/proceeding/view.action?name=04-36

After reading a dozen, I sighed, and went back to coding. I'm just this guy, trying to fix bufferbloat, by all technical means necessary. Additionally we're working on fixing ipv6, home routers, home gateways, naming, etc - and I rarely pay attention to politics! If anyone can provide a summary here of the debate as it stands now, and who the players are, it would be enlightening.

In other news: I was very pleased to hear that Srikanth Sundaresan of the Bismark project ( http://projectbismark.net/ ) won the "Applied Networking Research Prize" at this past week's IETF ( http://irtf.org/anrp ), for:

"Broadband Internet Performance: A View From the Gateway"
http://conferences.sigcomm.org/sigcomm/2011/papers/sigcomm/p134.pdf

An early version of cerowrt was used in some versions of bismark (they later went to openwrt stable).
I continue to build their repos for possible use in cerowrt (or vice versa), and I'm contemplating extending their open-sourced backend database schema to include data from the rrul tests....

https://github.com/dtaht/dashboard-db

--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [Cerowrt-devel] Network tests as discussed in Washington, DC
  2012-11-10 10:54 Network tests as discussed in Washington, DC Dave Taht
@ 2012-11-10 14:18 ` dpreed
  2012-11-11 8:35 ` [Bloat] " Daniel Berger
  2012-11-11 23:59 ` [Cerowrt-devel] " Michael Richardson
  2 siblings, 0 replies; 7+ messages in thread

From: dpreed @ 2012-11-10 14:18 UTC (permalink / raw)
To: Dave Taht; +Cc: bloat-devel, cerowrt-devel, bloat

I follow this pretty closely. The person you want to talk to is Sascha Meinrath, about M-Lab - and if you have a good proposal, he has money to fund a measurement project.

M-Lab has been a partial failure politically. The reason is this: the FCC does not want to do anything that would annoy incumbents running current Internet access systems. Direct measurements of latency are too dangerous. If you speak to the only Commissioner who ever tried to get unbiased measurements (Copps), you will find that his attempts were sabotaged on every side.

The same thing happened in the CRTC inquiry regarding Net Neutrality, where I testified as an "expert witness" on the argument that prioritization was necessary to reduce congestion. My testimony pointed out that congestion was best measured by direct latency measurements. The incumbent operators *refused* to provide latency data, claiming a) they never measured it, and b) latency can be calculated from average traffic volume measurements by "Little's Lemma", so it was unnecessary. (This last is categorically false, but it was "one expert vs. another".)

I wish you godspeed in getting direct latency measurements publicized. The dataset collected by Netalyzr has been attacked by the incumbent operators pretty strongly as "biased". (Of course the measurements the FCC commissioned from a contractor recommended by a Congressional staffer are "unbiased".)
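[Editor's sketch. The "Little's Lemma" claim dpreed refutes above is easy to make concrete: Little's Law (L = λW) relates only *averages*, so average traffic volume can never substitute for direct latency measurement. A toy illustration, with invented numbers:]

```python
# Hypothetical illustration (numbers invented, not from any testimony):
# Little's Law, L = lambda * W, relates only *averages*. Two links with the
# same mean delay -- and hence the same average occupancy -- can have wildly
# different latency tails, which is exactly what bufferbloat is about.
arrival_rate = 100.0  # packets/sec (lambda)

# Link A: a steady 50 ms delay for every packet.
delays_a = [0.050] * 1000

# Link B: usually 5 ms, with occasional bloated-queue episodes near half a
# second, tuned so the *average* matches Link A exactly.
delays_b = [0.005] * 900 + [0.455] * 100

mean_a = sum(delays_a) / len(delays_a)
mean_b = sum(delays_b) / len(delays_b)

# Little's Law recovers the same average occupancy L for both links...
print(f"L_a = {arrival_rate * mean_a:.1f}  L_b = {arrival_rate * mean_b:.1f}")

# ...but the 99th-percentile delay tells a very different story.
p99_a = sorted(delays_a)[int(0.99 * len(delays_a))]
p99_b = sorted(delays_b)[int(0.99 * len(delays_b))]
print(f"p99_a = {p99_a * 1000:.0f} ms  p99_b = {p99_b * 1000:.0f} ms")
```

Both links report L = 5.0 by Little's Law, yet one has a 99th-percentile delay roughly nine times the other's - the tail the averages hide.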
-----Original Message-----
From: "Dave Taht" <dave.taht@gmail.com>
Sent: Saturday, November 10, 2012 5:54am
To: "bloat" <bloat@lists.bufferbloat.net>, cerowrt-devel@lists.bufferbloat.net, "bloat-devel" <bloat-devel@lists.bufferbloat.net>
Subject: [Cerowrt-devel] Network tests as discussed in Washington, DC
* Re: [Bloat] Network tests as discussed in Washington, DC
  2012-11-10 10:54 Network tests as discussed in Washington, DC Dave Taht
  2012-11-10 14:18 ` [Cerowrt-devel] " dpreed
@ 2012-11-11 8:35 ` Daniel Berger
  2012-11-11 13:39 ` [Cerowrt-devel] " Dave Taht
  2012-11-11 23:59 ` [Cerowrt-devel] " Michael Richardson
  2 siblings, 1 reply; 7+ messages in thread

From: Daniel Berger @ 2012-11-11 8:35 UTC (permalink / raw)
To: bloat; +Cc: bloat-devel, cerowrt-devel

Hi everybody,

I totally love the idea of testing for browsing performance. Thanks for that ;-)

Nevertheless, I have another critical question on this 40s network test idea: did anyone consider the robustness of the results? That is, did anybody check for statistical significance?

I currently see that there are two steps: first, the test with little load, which shows (I guess) low jitter/variance; second, busy queues. This second "phase" is probably when jitter/variance will inflate a lot, right? Then the mean (and most other statistical summary measures) won't be stable either. Thus, I doubt that we can rely on this, in all cases, to compute an aggregate "score".

Obviously the best solution would be to run the test long enough that the confidence intervals become small and similar for both steps. It is probably not feasible to extend the test to unusually long intervals, but at least computing a 95% confidence interval would give me a better sense of the results. Doing this might also be a means to account for a broad variety of testing/real-world environments and still get reliable results.

Anyone else with this thought?

Cheers,
Daniel
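[Editor's sketch of Daniel's suggestion. The sample latencies and names below are invented; the point is only to show the normal-approximation 95% interval for each test phase:]

```python
# A minimal sketch (sample data hypothetical) of the 95% confidence interval
# Daniel asks for, computed separately for each phase of the test.
import math

def mean_ci95(samples):
    """Return (mean, half-width) of a normal-approximation 95% CI."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    half = 1.96 * math.sqrt(var / n)  # 1.96 = z-score for 95% coverage
    return mean, half

unloaded = [21.0, 20.5, 22.1, 19.8, 21.3, 20.9]       # ms, phase 1: idle link
loaded = [180.0, 420.0, 95.0, 610.0, 250.0, 330.0]    # ms, phase 2: busy queues

for name, samples in (("unloaded", unloaded), ("loaded", loaded)):
    m, h = mean_ci95(samples)
    print(f"{name:8s} {m:6.1f} +/- {h:5.1f} ms")
# The loaded phase's interval comes out far wider -- exactly the instability
# Daniel is worried about when collapsing the run to a single score.
```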
* Re: [Cerowrt-devel] [Bloat] Network tests as discussed in Washington, DC
  2012-11-11 8:35 ` [Bloat] " Daniel Berger
@ 2012-11-11 13:39 ` Dave Taht
  0 siblings, 0 replies; 7+ messages in thread

From: Dave Taht @ 2012-11-11 13:39 UTC (permalink / raw)
To: Daniel Berger, frank.rowand; +Cc: bloat-devel, cerowrt-devel, bloat

On Sun, Nov 11, 2012 at 9:40 AM, Daniel Berger <dberger@student.ethz.ch> wrote:
> Hi everybody,
>
> I totally love the idea to test for browsing performance. Thanks for
> that ;-)

Jim's demos of the effect of network load on the performance of web sites are quite revealing:

http://gettys.wordpress.com/2012/02/01/bufferbloat-demonstration-videos/

using the chrome web page benchmark available here:

https://chrome.google.com/webstore/detail/page-benchmarker/channimfdomahekjcahlbpccbgaopjll

You can fairly easily replicate his results on your own hardware, both locally and over the internet. Go for it!

However, in attempting to get to a general purpose test, the simplicity of his demo (which used a very short path to MIT) didn't work well; thus, I came up with the methods described in the RRUL document. They seem to scale fairly well up past 60ms RTT. More testers would be nice!

One of the things that really bugs me about today's overbuffered networks is doing things like a file upload via scp, which nearly completes, then stops, and retransmits, over and over again, like Jon Corbet's example of what happened to him at a conference hotel last year:

http://lwn.net/Articles/496509/

> Nevertheless, I have another critical question on this 40s network test
> idea:
> Did someone consider the robustness of the results? That is, did sb
> check for statistical significance?

Presently the effects on multiple sorts of networks are interesting.
As one example, here is a run of one rrul prototype on wired and wifi that Toke put together:

http://www.teklibre.com/~d/bloat/rrul-denmark-germany-wired-pfifo-fast.pdf

vs

http://www.teklibre.com/~d/bloat/rrul-denmark-germany-wlan2.pdf

I LOVE the first graph (configured for pfifo_fast on the gateways) as it clearly shows classic drop-tail "TCP global synchronization" on the egress gateway, and the resulting loss of utilization. It's nice to have been able to get it on a 50+ms *real-world* path. It also shows how traffic classification of TCP doesn't work very well across the internet, as the TCP flows, classified different ways, evolve and change places.

The second (taken on a good wifi) shows how noisy the data is:

http://www.teklibre.com/~d/bloat/rrul-denmark-germany-wlan2.pdf

(I note that using a TCP "ping" is a bad idea, except for showing why TCP encapsulated inside TCP is a bad idea, which gets progressively worse at longer RTTs. Anyone have a decent RTP test we can replace this with?)

A graph taken against a loaded wifi network is pretty horrifying...

http://www.teklibre.com/~d/bloat/Not_every_packet_is_sacred-Battling_Bufferbloat_on_wifi.pdf

(don't look. Halloween is over)

I have a ton of interesting statistics gathered at the IETF and linuxcon this past week... but finding good ways to present them remains a problem. And I note that most of the stuff above is intended as a BACKGROUND process, while loading web pages and doing useful work like making phone calls is the real intended result of the benchmark.

So, no, the only statistical significance so far calculated is that tests like this can cause a network to have one to three orders of magnitude of latency inserted into it. Compared to that, I'm not terribly concerned with a few percentage points here or there, at this time, but I'd welcome analysis.
The biggest unknown in the test is the optimal TCP ack count, and TCP's response to packet loss (retransmits), which could account for a great deal of the actual data transmitted vs. the amount of useful data transmitted. "Useful data transmitted under no load and under load" would be a tremendously useful statistic.

It is my hope that the volume of web and dns traffic projected to be in the test is going to be fairly minimal compared to the rest of it, but I'm not counting on it. That needs to be measured too. It's a pretty big project to do this up right, in other words!

> I currently see that there are two steps:
> First, the test with few load, which shows (I guess) low jitter/variance.
> Second, busy queues.
> This second "phase" is probably when jitter/variance will inflate a
> lot, right?
> Then, also the mean (and most other statistical summary-measures) won't
> be stable.

Correct.

> Thus, I doubt that in order to compute an aggregate "score" we can
> rely on this, in all cases.

The "score" as a ratio of various measured parameters from unloaded to loaded seems viable.

> Obviously the best solution would be to run the test long enough so that
> confidence intervals appear to be small and similar for both steps.

There is nothing stopping a network engineer, device driver writer, device maker, mathematician, network queue theorist, sysadmin, manager, or concerned citizen... from running the test continuously, going from unloaded, to loaded, to unloaded, to loaded, and tweaking various underlying variables in the network stack and path. I do this all the time! It is my hope, certainly, that those that should do so, will do so.

A core component IS the "mtr" tool, which will point at the issues on the bottleneck link, which might be anything from the local OS, or device, to wireless AP, to CPE, to somewhere else on the path.
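[Editor's sketch of the ratio-based "score" Dave mentions above. The metric names and the geometric-mean combination are hypothetical choices for illustration, not part of any RRUL specification:]

```python
# Hypothetical sketch: score each run by comparing measured parameters under
# load against their unloaded baselines. A perfectly debloated link scores
# 1.0; a bloated one scores near 0. Metric names and weighting are invented.
def rrul_score(unloaded, loaded):
    """Geometric mean of per-metric degradation ratios (1.0 = no degradation)."""
    ratios = []
    for metric in unloaded:
        if metric == "latency_ms":
            ratios.append(unloaded[metric] / loaded[metric])  # lower is better
        else:
            ratios.append(loaded[metric] / unloaded[metric])  # higher is better
    product = 1.0
    for r in ratios:
        product *= r
    return product ** (1.0 / len(ratios))

baseline = {"latency_ms": 20.0, "down_mbps": 48.0, "up_mbps": 9.5}
under_load = {"latency_ms": 800.0, "down_mbps": 45.0, "up_mbps": 8.0}

# Throughput barely moves, but the 40x latency inflation drags the score down.
print(f"score: {rrul_score(baseline, under_load):.3f}")
```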
Giving the end user data with (occasionally) something other than their ISP to blame would be a goodness, and having tools available to find and fix it, even better.

However, the average citizen is not going to sit still for 60 seconds on a regular basis, which is the purpose of trying to come up with a score and a panel of useful results that can be presented briefly and clearly.

I also have hope that a test as robust and stressful as this can be run on edge gateways automatically, in the background, on selected routers throughout the world, much as bismark already does. See examples at:

http://networkdashboard.org/

> Probably is not feasible to expand the test into unusual long intervals
> but at least computing a 95% confidence interval would give me a better
> sense of results.

Go for it! "Bufferbloat.net: Making network research fun since 2011!"

I note that the RRUL work being done right now is the spare-time project of myself and one grad student, leveraging the hard work that has been put into the Linux OS over the last year by so many, and the multitude of useful enhancements (like classification, priority, and congestion control algorithm selection) that Rick Jones has put into netperf over the past year, also in his spare time.

No funding for this work has yet arrived. Various proposals for grants have been ignored, but we're not grant-writing experts. Cerowrt is getting some lovely support from interested users, but the size of the task to get code written, analyzed, and tests deployed is intimidating.

There is a wealth of other tests that can be performed while under a RRUL-like load. For example, this December I'll be at the CoNEXT conference in Nice, with some early results from the lincs.fr lab regarding the interactions of AQM and LEDBAT. I hope to be doing some follow-up work on that paper also in December, against codel and fq_codel, and more realistic representations of uTP.
A rrul-like test would be useful for analyzing and comparing the results from any congestion control algorithm, alone or in combination, such as TCP-LP, or DC-TCP, or (as one potentially very interesting example) the latest work done at MIT on their TCP, whose name I forget right now.

I am very interested in how video sharding technologies work - what often happens there is that there is an HTTP GET of 10 seconds of video at one of various rates. The client measures the delivery time of that 10-second shard and increases or decreases the next GET to suit. This generally pushes TCP into slow start, repeatedly, and slams the downstream portion of the network, repeatedly.

Then there's videoconferencing. Which I care about a lot. I like it when people's lips match up with what they are saying, being partially deaf myself.

And gaming. I'd like very much to have a better picture (packet captures!) of how various online games such as quake, starcraft, and world of warcraft interact with the network. (I think this last item would be rather fun for a team of grad students to take on. Heck, I'd enjoy "working" on this portion of the problem, too. :) )

> Doing this might also be a means to account for a broad variety of
> testing/real-world environment and still get reliable results.

I would argue that settling on a clear definition of the tests, writing the code, and collecting a large set of data would be "interesting". As for being able to draw general conclusions from it, I generally propose that we prototype tests, and iterate, going deeply into packet captures, until we get things that make sense in the lab and in the field... and rapidly bug-report everything that is found. A great number of the pathological behaviors we've discovered so far have turned out to be bugs at various levels in various stacks. It's generally been rather difficult to get to a "paper-writing stage"; the way my life seems to work looks like this:

> Anyone else with this thought?
An example of how you can fool yourself with network statistics, misapplied:

https://lists.bufferbloat.net/pipermail/bloat/2011-November/000715.html

Frank Rowand gave a very good (heretical!) presentation on core analysis and presentation ideas at last week's linuxcon - particularly when it comes to analyzing real-time performance of anything. I don't know if it's up yet.

I have generally found that mountain and CDF plots are the best ways to deal with the extremely noisy data collected from wifi and over the open internet, and that having packet captures and TCP instrumentation is useful also.

--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
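[Editor's sketch of the CDF approach Dave favors for noisy latency data. Sample values are invented; with matplotlib available, the same pairs feed a step plot directly:]

```python
# Minimal empirical CDF (sample latencies hypothetical): sort the samples and
# pair each value with the fraction of samples at or below it.
def empirical_cdf(samples):
    """Return (value, fraction of samples <= value) pairs, ready to plot."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

latency_ms = [22, 25, 24, 180, 23, 400, 26, 21, 950, 24]  # noisy wifi run
for value, frac in empirical_cdf(latency_ms):
    print(f"{value:4d} ms  {frac:4.0%}")
# Unlike a mean, the CDF makes the long tail visible at a glance: most
# samples cluster near 25 ms while the worst sample is nearly a second.
```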
* Re: [Cerowrt-devel] Network tests as discussed in Washington, DC
  2012-11-10 10:54 Network tests as discussed in Washington, DC Dave Taht
  2012-11-10 14:18 ` [Cerowrt-devel] " dpreed
  2012-11-11 8:35 ` [Bloat] " Daniel Berger
@ 2012-11-11 23:59 ` Michael Richardson
  2012-11-13 2:11 ` dpreed
  2 siblings, 1 reply; 7+ messages in thread

From: Michael Richardson @ 2012-11-11 23:59 UTC (permalink / raw)
To: Dave Taht; +Cc: bloat-devel, cerowrt-devel, bloat

Re: FCC. The IETF IAB plenary last Wednesday had a presenter from the FCC, who claimed to care about latency in response to my question.

I don't claim it's a good use of your time to listen to these, but probably having the links in the archive is useful.

http://www.ietf.org/proceedings/85/slides/slides-85-iesg-opsandtech-6
http://www.ietf.org/proceedings/85/slides/slides-85-iesg-opsandtech-7
http://www.ietf.org/proceedings/85/slides/slides-85-iesg-opsandtech-8
http://ietf85.conf.meetecho.com/index.php/Recorded_Sessions
http://recordings.conf.meetecho.com/Recordings/watch.jsp?recording=IETF85_PLENARY&chapter=part_11

5. Technical Topic: Measurement Issues in the Internet
Moderator: Alissa Cooper
Speakers: Sam Crawford and Henning Schulzrinne

Although network performance measurement has been a topic of research, standardization, and development for decades, recent efforts to create national, regional, and global access network measurement testbeds and frameworks are drawing renewed interest. These efforts seek to provide consistent measurements of fine-grained performance metrics such as packet loss, delay, and throughput, as well as higher order tests to capture quality of experience with respect to specific applications and services.
Some standardized metrics and tests exist, including those developed in the IPPM WG, but a unified framework for observing and reporting the quality metrics that define users' experiences across different networks has yet to be developed. This plenary will examine the implications of large-scale network performance measurement for the Internet by exploring the approaches, results, and challenges involved with measurement efforts from around the world.

--
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
* Re: [Cerowrt-devel] Network tests as discussed in Washington, DC
  2012-11-11 23:59 ` [Cerowrt-devel] " Michael Richardson
@ 2012-11-13 2:11 ` dpreed
  2012-12-08 6:53 ` Dave's "Water Videos" posted to YouTube Richard Brown
  0 siblings, 1 reply; 7+ messages in thread

From: dpreed @ 2012-11-13 2:11 UTC (permalink / raw)
To: Michael Richardson; +Cc: bloat-devel, cerowrt-devel, bloat

I personally am happy that Henning Schulzrinne is pushing for this. However, it should be remembered that the Commissioners rule the day at the FCC.

I hope he doesn't suffer what the Spectrum Policy Task Force at the FCC suffered when I was one of the many people who was involved. Let's just say a lot of staffers at the FCC were forced to leave because they dared to focus on technology "third rails".

-----Original Message-----
From: "Michael Richardson" <mcr+ietf@sandelman.ca>
Sent: Sunday, November 11, 2012 6:58pm
To: "Dave Taht" <dave.taht@gmail.com>
Cc: "bloat-devel" <bloat-devel@lists.bufferbloat.net>, cerowrt-devel@lists.bufferbloat.net, "bloat" <bloat@lists.bufferbloat.net>
Subject: Re: [Cerowrt-devel] Network tests as discussed in Washington, DC
* Dave's "Water Videos" posted to YouTube
  2012-11-13 2:11 ` dpreed
@ 2012-12-08 6:53 ` Richard Brown
  0 siblings, 0 replies; 7+ messages in thread

From: Richard Brown @ 2012-12-08 6:53 UTC (permalink / raw)
Cc: bloat-devel, cerowrt-devel, bloat

Folks,

I have pushed all the videos from teklibre to YouTube. (The raw .mp4's are still available on teklibre.) There's a little bit of description on each of the pages to tell what each segment is about. They're available from the CeroWrt wiki page:

http://www.bufferbloat.net/projects/cerowrt/wiki/Bloat-videos

Dave: Did you have a chance to post the slides for your talk?

Best regards,

Rich Brown
Hanover, NH USA
end of thread, other threads: [~2012-12-08 6:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-10 10:54 Network tests as discussed in Washington, DC Dave Taht
2012-11-10 14:18 ` [Cerowrt-devel] " dpreed
2012-11-11 8:35 ` [Bloat] " Daniel Berger
2012-11-11 13:39 ` [Cerowrt-devel] " Dave Taht
2012-11-11 23:59 ` [Cerowrt-devel] " Michael Richardson
2012-11-13 2:11 ` dpreed
2012-12-08 6:53 ` Dave's "Water Videos" posted to YouTube Richard Brown