[Cerowrt-devel] [aqm] chrome web page benchmarker fixed

Fri Apr 18 14:15:27 EDT 2014

Dave, 

We used the 25k object size for a short time back in 2012 until we had
resources to build a more advanced model (appendix A).  I did a bunch of
captures of real web pages back in 2011 and compared the object size
statistics to models that I'd seen published.  Lognormal didn't seem to be
*exactly* right, but it wasn't a bad fit to what I saw.  I've attached a
CDF.
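
For anyone who wants to repeat that comparison on their own captures, here
is a minimal sketch. It is not the analysis behind the attached CDF. The
function names are hypothetical, and the fitting method (taking the mean
and standard deviation of the log sizes) is an assumption.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def fit_lognormal(sizes):
    """Estimate (mu, sigma) of a lognormal from positive object sizes."""
    logs = [math.log(s) for s in sizes]
    return fmean(logs), stdev(logs)


def lognormal_cdf(x, mu, sigma):
    """P(size <= x) under a lognormal with parameters mu and sigma."""
    return NormalDist(mu, sigma).cdf(math.log(x))


def max_cdf_gap(sizes, mu, sigma):
    """Largest distance between the empirical CDF and the model CDF."""
    ordered = sorted(sizes)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered, start=1):
        model = lognormal_cdf(x, mu, sigma)
        # The empirical CDF steps from (i - 1) / n to i / n at x.
        gap = max(gap, abs(i / n - model), abs((i - 1) / n - model))
    return gap


if __name__ == "__main__":
    # Synthetic stand-in for captured object sizes in bytes (made-up data).
    rng = random.Random(1)
    sizes = [rng.lognormvariate(9.0, 1.5) for _ in range(5000)]
    mu, sigma = fit_lognormal(sizes)
    print(f"mu={mu:.2f} sigma={sigma:.2f} gap={max_cdf_gap(sizes, mu, sigma):.3f}")
```

A small gap only says the fit is "not bad" in the sense above; it does not
show that the sizes really are lognormal.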

The choice of 4 servers was based somewhat on logistics, and also on a
finding that across our data set, the average web page retrieved 81% of
its resources from the top 4 servers.  Increasing to 5 servers only
increased that percentage to 84%.
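
A rough sketch of how a "top N servers" share like that can be computed
from a capture. It assumes each page is reduced to a list with one server
hostname per retrieved object; the function names and sample hostnames are
made up for illustration and are not from our data set.

```python
from collections import Counter


def top_k_share(hosts, k):
    """Fraction of one page's objects that came from its k busiest servers."""
    counts = Counter(hosts)
    from_top = sum(n for _, n in counts.most_common(k))
    return from_top / len(hosts)


def mean_top_k_share(pages, k):
    """Average of top_k_share over all non-empty pages."""
    shares = [top_k_share(hosts, k) for hosts in pages if hosts]
    return sum(shares) / len(shares)


if __name__ == "__main__":
    # Hypothetical pages: one hostname per retrieved object.
    pages = [
        ["a.example"] * 5 + ["b.example"] * 3 + ["c.example", "d.example"],
        ["a.example"] * 2 + ["e.example"] * 2,
    ]
    for k in (1, 2, 4):
        print(k, mean_top_k_share(pages, k))
```

Comparing the result for k=4 against k=5 is the same kind of check that
led to stopping at 4 servers.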

The choice of RTTs also came from the web traffic captures. I saw
RTTmin=16ms, RTTmean=53.8ms, RTTmax=134ms.

Much of this can be found in
https://tools.ietf.org/html/draft-white-httpbis-spdy-analysis-00

In many of the cases that we've simulated, the packet drop probability is
less than 1% for DNS packets.  In our web model, there are a total of 4
servers, so 4 DNS lookups assuming none of the addresses are cached. If
PLR = 1%, there would be a 3.9% chance (1 - 0.99^4) of losing one or more
DNS packets (with a resulting ~5 second additional delay on load time).
I've probably oversimplified this, but Kathie N. and I made the call that
it would be significantly easier to just do this math than to build a DNS
implementation in ns2.  We've open sourced the web model (it's on Kathie's
web page and will be part of ns2.36) with an encouragement to the
community to improve on it.  If you'd like to port it to ns3 and add a DNS
model, that would be fantastic.
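
The arithmetic above can be sketched as follows. This is a simplified
illustration, not code from the ns2 web model. It assumes independent
losses and one packet per lookup, and it charges a single ~5 second
timeout however many lookups are lost. The function names and the
`timeout_s` parameter are hypothetical.

```python
import random


def p_any_loss(plr, lookups):
    """Probability that at least one of `lookups` DNS packets is lost."""
    return 1.0 - (1.0 - plr) ** lookups


def sample_dns_delay(rng, plr, lookups, timeout_s=5.0):
    """Extra load time for one page: timeout_s if any lookup is lost, else 0.

    Assumption: a single timeout is charged regardless of how many of the
    lookups are lost, matching the simplified math in the text.
    """
    return timeout_s if rng.random() < p_any_loss(plr, lookups) else 0.0


if __name__ == "__main__":
    print(f"P(any loss) = {p_any_loss(0.01, 4):.4f}")
    rng = random.Random(0)
    delays = [sample_dns_delay(rng, 0.01, 4) for _ in range(100000)]
    print(f"mean added delay = {sum(delays) / len(delays):.3f} s")
```

With PLR = 1% and 4 lookups, `p_any_loss` evaluates to about 0.0394, which
is the 3.9% figure.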

-Greg

On 4/17/14, 3:07 PM, "Dave Taht" <dave.taht at gmail.com> wrote:

>On Thu, Apr 17, 2014 at 12:01 PM, William Chan (陈智昌)
><willchan at chromium.org> wrote:
>> Speaking as the primary Chromium developer in charge of this relevant
>> code, I would like to caution against putting too much trust in the
>> numbers generated. Any statistical claims about the numbers are
>> probably unreasonable to make.
>
>Sigh. Other benchmarks, such as the apache ("ab") benchmark, are
>primarily designed as stress testers for web servers, not as generators
>of realistic traffic. Modern web traffic has such a high level of
>dynamism in it that static web page loads along any distribution seem
>insufficient, passive analysis of aggregated traffic "feels" incorrect
>relative to the sorts of home and small business traffic I've seen, and
>so on.
>
>Famous papers, such as this one:
>
>http://ccr.sigcomm.org/archive/1995/jan95/ccr-9501-leland.pdf
>
>seem possibly irrelevant to draw conclusions from, given the kind
>of data they analysed, and proceeding from an incorrect model or
>gut feel for how the network behaves today seems foolish.
>
>Even the most basic of tools, such as httping, had three basic bugs
>that I found in a few minutes of trying to come up with some basic
>behaviors yesterday:
>
>https://lists.bufferbloat.net/pipermail/bloat/2014-April/001890.html
>
>Those are going to be a lot easier to fix than diving into the chromium
>codebase!
>
>There are very few tools worth trusting, and I am always dubious
>of papers that publish results with unavailable tools and data. The only
>tools I have any faith in for network analysis are netperf,
>netperf-wrapper, tcpdump and xplot.org, and to a large extent wireshark.
>Toke and I have been tearing apart d-itg and I hope to one day be able
>to add that to my trustable list... but better tools are needed!
>
>Tools that I don't have a lot of faith in include d-itg (for now), iperf,
>anything written in java or other high level languages, speedtest.net,
>and things like shaperprobe.
>
>Have very little faith in ns2, slightly more in ns3, and I've been meaning
>to look over the mininet and other simulators whenever I got some spare
>time; the mininet results stanford gets seem pretty reasonable and I
>adore their reproducing results effort. Haven't explored ndt, keep meaning
>to...
>
>> Reasons:
>> * We don't actively maintain this code. It's behind the command line
>> flags. They are broken. The fact that it still results in numbers on
>> the benchmark extension is an example of where unmaintained code
>> doesn't have the UI disabled, even though the internal workings of the
>> code fail to guarantee correct operation. We haven't disabled it
>> because, well, it's unmaintained.
>
>As I mentioned I was gearing up for a hacking run...
>
>The vast majority of results I look at are actually obtained via
>looking at packet captures. I mostly use benchmarks as abstractions
>to see if they make some sense relative to the captures and tend
>to compare different benchmarks against each other.
>
>I realize others don't go into that level of detail, so you have given
>fair warning! In our case we used the web page benchmarker as
>a means to try and rapidly acquire some typical distributions of
>get and tcp stream requests from things like the alexa top 1000,
>and as a way to A/B different aqm/packet scheduling setups.
>
>... but the only easily publishable results were from the benchmark
>itself, and we (reluctantly) only published one graph from all the work
>that went into it 2+ years back and used it as a test driver for the
>famous ietf video, comparing two identical boxes running it at the same
>time under different network conditions:
>
>https://www.bufferbloat.net/projects/cerowrt/wiki/Bloat-videos#IETF-demo-side-by-side-of-a-normal-cable-modem-vs-fq_codel
>
>from what I fiddled with today, it is at least still useful for that?
>
>moving on...
>
>The web model in the cablelabs work doesn't look much like my captures,
>in addition to not modeling dns at all and using a smaller IW than
>google. It looks like this:
>
>>> Model single user web page download as follows:
>
>>> - Web page modeled as single HTML page + 100 objects spread evenly
>>> across 4 servers. Web object sizes are currently fixed at 25 kB each,
>>> whereas the initial HTML page is 100 kB. Appendix A provides an
>>> alternative page model that may be explored in future work.
>
>Where what I see is a huge amount of stuff that fits into a single
>iw10 slow start episode and some level of pipelining on larger stuff, so
>that a large number of object sizes of less than 7k, with a lightly
>tailed distribution outside of that, makes more sense.
>
>(I'm not staring at appendix A right now, I'm under the impression
> it was better)
>
>I certainly would like more suggestions for models and types
>of web traffic, as well as simulation of https + pfs traffic,
>spdy, quic, etc....
>
>>> - Server RTTs set as follows (20 ms, 30 ms, 50 ms, 100 ms).
>
>Server RTTs from my own web history tend to be lower than 50ms.
>
>>> - Initial HTTP GET to retrieve a moderately sized object (100 kB HTML
>>> page) from server 1.
>
>An initial GET to google fits into iw10 - it's about 7k.
>
>>> - Once initial HTTP GET completes, initiate 24 simultaneous HTTP GETs
>>> (via separate TCP connections), 6 connections each to 4 different
>>> server nodes
>
>I usually don't see more than 15, and certainly not 25k sized objects.
>
>>> - Once each individual HTTP GET completes, initiate a subsequent GET
>>> to the same server, until 25 objects have been retrieved from each
>>> server.
>
>
>> * We don't make sure to flush all the network state in between runs, so
>> if you're using that option, don't trust it to work.
>
>The typical scenario we used was a run against dozens or hundreds of urls,
>capturing traffic, while varying network conditions.
>
>Regarded the first run as the most interesting.
>
>Can exit the browser and restart after a run like that.
>
>At the moment, I merely plan to use the tool primarily to survey various
>web sites and load times while doing packet captures. The hope was
>to get valid data from the network portion of the load, tho...
>
>> * If you have an advanced Chromium setup, this definitely does not
>> work. I advise using the benchmark extension only with a separate
>> Chromium profile for testing purposes. Our flushing of sockets, caches,
>> etc does not actually work correctly when you use the Chromium
>> multiprofile feature and also fails to flush lots of our other network
>> caches.
>
>noted.
>
>
>> * No one on Chromium really believes the time to paint numbers that we
>> output :) It's complicated. Our graphics stack is complicated. The time
>> from
>
>I actually care only about time-to-full layout as that's a core network
>effect...
>
>> when Blink thinks it painted to when the GPU actually blits to the
>> screen cannot currently be corroborated with any high degree of
>> accuracy from within our code.
>
>> * It has not been maintained since 2010. It is quite likely there are
>> many other subtle inaccuracies here.
>
>Grok.
>
>> In short, while you can expect it to give you a very high level
>> understanding of performance issues, I advise against placing
>> non-trivial confidence in the accuracy of the numbers generated by the
>> benchmark extension. The fact that numbers are produced by the
>> extension should not be treated as evidence that the extension
>> actually functions correctly.
>
>OK, noted. Still delighted to be able to have a simple load generator
>that exercises the browsers and generates some results, however
>dubious.
>
>>
>> Cheers.
>>
>>
>> On Thu, Apr 17, 2014 at 10:49 AM, Dave Taht <dave.taht at gmail.com> wrote:
>>>
>>> Getting a grip on real web page load time behavior in an age of
>>> sharded websites, dozens of dns lookups, javascript, and fairly random
>>> behavior in ad services and cdns, against how a modern browser
>>> behaves, is very, very hard.
>>>
>>> it turns out if you run
>>>
>>> google-chrome --enable-benchmarking --enable-net-benchmarking
>>>
>>> (Mac users have to embed these options in their startup script - see
>>>  http://www.chromium.org/developers/how-tos/run-chromium-with-flags )
>>>
>>> enable developer options and install and run the chrome web page
>>> benchmarker,
>>> (
>>> https://chrome.google.com/webstore/detail/page-benchmarker/channimfdomahekjcahlbpccbgaopjll?hl=en
>>> )
>>>
>>> that it works (at least for me, on a brief test of the latest chrome,
>>> on linux. Can someone try windows and mac?)
>>>
>>> You can then feed in a list of urls to test against, and post process
>>> the resulting .csv file to your heart's content. We used to use this
>>> benchmark a lot while trying to characterise typical web behaviors
>>> under aqm and packet scheduling systems under load. Running
>>> it simultaneously with a rrul test or one of the simpler tcp upload or
>>> download tests in the rrul suite was often quite interesting.
>>>
>>> It turned out the doc has been wrong a while as to the name of the
>>> second command line option. I was gearing up mentally for having to
>>> look at the source....
>>>
>>> http://code.google.com/p/chromium/issues/detail?id=338705
>>>
>>> /me happy
>>>
>>> --
>>> Dave Täht
>>>
>>> Heartbleed POC on wifi campus networks with EAP auth:
>>> http://www.eduroam.edu.au/advisory.html
>>>
>>> _______________________________________________
>>> aqm mailing list
>>> aqm at ietf.org
>>> https://www.ietf.org/mailman/listinfo/aqm
>>
>>
>
>
>
>-- 
>Dave Täht
>
>NSFW: 
>https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: content_length_cdf.eps
Type: application/octet-stream
Size: 17824 bytes
Desc: content_length_cdf.eps
URL: <https://lists.bufferbloat.net/pipermail/cerowrt-devel/attachments/20140418/2cf1d1c1/attachment-0002.obj>