[Cerowrt-devel] [aqm] chrome web page benchmarker fixed

Thu Apr 17 17:07:27 EDT 2014

On Thu, Apr 17, 2014 at 12:01 PM, William Chan (陈智昌)
<willchan at chromium.org> wrote:
> Speaking as the primary Chromium developer in charge of this relevant code,
> I would like to caution putting too much trust in the numbers generated. Any
> statistical claims about the numbers are probably unreasonable to make.

Sigh. Other benchmarks, such as the apache ("ab") benchmark, are
primarily designed as stress testers for web servers, not as generators
of realistic traffic. Modern web traffic has such a high level of
dynamism in it that static web page loads, along any distribution, seem
insufficient; passive analysis of aggregated traffic "feels" incorrect
relative to the sorts of home and small business traffic I've seen;
and so on.

Famous papers, such as this one:

http://ccr.sigcomm.org/archive/1995/jan95/ccr-9501-leland.pdf

seem possibly irrelevant to draw conclusions from, given the kind
of data they analysed, and proceeding from an incorrect model or
gut feel for how the network behaves today seems foolish.

Even the most basic of tools, such as httping, had three basic bugs
that I found in a few minutes of trying to come up with some basic
behaviors yesterday:

https://lists.bufferbloat.net/pipermail/bloat/2014-April/001890.html

Those are going to be a lot easier to fix than diving into the chromium
codebase!

There are very few tools worth trusting, and I am always dubious
of papers that publish results with unavailable tools and data. The only
tools I have any faith in for network analysis are netperf, netperf-wrapper,
tcpdump and xplot.org, and to a large extent wireshark. Toke and I have
been tearing apart d-itg and I hope to one day be able to add that to
my trustable list... but better tools are needed!

Tools that I don't have a lot of faith in include d-itg (so far), iperf,
anything written in java or other high level languages, speedtest.net,
and things like shaperprobe.

I have very little faith in ns2, slightly more in ns3, and I've been meaning
to look over mininet and other simulators whenever I get some spare
time; the mininet results stanford gets seem pretty reasonable and I
adore their reproducing-results effort. I haven't explored ndt, keep meaning
to...

> Reasons:
> * We don't actively maintain this code. It's behind the command line flags.
> They are broken. The fact that it still results in numbers on the benchmark
> extension is an example of where unmaintained code doesn't have the UI
> disabled, even though the internal workings of the code fail to guarantee
> correct operation. We haven't disabled it because, well, it's unmaintained.

As I mentioned I was gearing up for a hacking run...

The vast majority of results I look at are actually obtained via
looking at packet captures. I mostly use benchmarks as abstractions
to see if they make some sense relative to the captures and tend
to compare different benchmarks against each other.

I realize others don't go into that level of detail, so you have given
fair warning! In our case we used the web page benchmarker as
a means to try and rapidly acquire some typical distributions of
get and tcp stream requests from things like the alexa top 1000,
and as a way to A/B different aqm/packet scheduling setups.

... but the only easily publishable results were from the benchmark itself,
and we (reluctantly) only published one graph from all the work that
went into it 2+ years back and used it as a test driver for the famous
ietf video, comparing two identical boxes running it at the same time
under different network conditions:

https://www.bufferbloat.net/projects/cerowrt/wiki/Bloat-videos#IETF-demo-side-by-side-of-a-normal-cable-modem-vs-fq_codel

from what I fiddled with today, it is at least still useful for that?

moving on...

The web model in the cablelabs work doesn't look much like my captures.
It doesn't model dns at all, and it uses a smaller IW than google does.
It looks like this:

>> Model single user web page download as follows:

>> - Web page modeled as single HTML page + 100 objects spread evenly
>> across 4 servers. Web object sizes are currently fixed at 25 kB each,
>> whereas the initial HTML page is 100 kB. Appendix A provides an
>> alternative page model that may be explored in future work.

Where what I see is a huge amount of stuff that fits into a single
iw10 slow start episode, plus some level of pipelining on larger stuff, so
a large number of objects smaller than 7k, with a lightly tailed
distribution outside of that, makes more sense.
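
To make the iw10 point concrete, here is a rough sketch of how many
round trips of data delivery an object needs under idealized slow start.
The 1460-byte MSS, the window doubling every RTT, and the comparison
initial window of 3 segments are all assumptions of this sketch (none
come from the cablelabs document), and loss, delayed ACKs and receive
window limits are ignored.

```python
import math

MSS_BYTES = 1460  # assumed: typical Ethernet MSS, not stated in the thread


def slow_start_rounds(object_bytes, initial_window=10, mss=MSS_BYTES):
    """Round trips of data delivery under idealized slow start.

    Assumes the congestion window doubles every RTT, with no loss,
    no delayed-ACK effects and no receive-window limit.
    """
    segments_left = math.ceil(object_bytes / mss)
    cwnd = initial_window
    rounds = 0
    while segments_left > 0:
        segments_left -= cwnd  # send one full window this round trip
        cwnd *= 2              # idealized exponential growth
        rounds += 1
    return rounds


if __name__ == "__main__":
    # 7k: typical small object; 25 kB and 100 kB: sizes from the quoted model
    for size in (7_000, 25_000, 100_000):
        print(size, slow_start_rounds(size, 10), slow_start_rounds(size, 3))
```

Under these assumptions a 7k object completes in a single round at iw10,
while the model's 25 kB objects need two, which is part of why the choice
of object size distribution changes the results so much.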

(I'm not staring at appendix A right now, I'm under the impression
 it was better)

I certainly would like more suggestions for models and types
of web traffic, as well as simulation of https + pfs traffic,
spdy, quic, etc....

>> - Server RTTs set as follows (20 ms, 30 ms, 50 ms, 100 ms).

Server RTTs from my own web history tend to be lower than 50ms.

>> - Initial HTTP GET to retrieve a moderately sized object (100 kB HTML
>> page) from server 1.

An initial GET to google fits into iw10 - it's about 7k.

>> - Once initial HTTP GET completes, initiate 24 simultaneous HTTP GETs
>> (via separate TCP connections), 6 connections each to 4 different
>> server nodes

I usually don't see more than 15, and certainly not 25k-sized objects.

>> - Once each individual HTTP GET completes, initiate a subsequent GET
>> to the same server, until 25 objects have been retrieved from each
>> server.
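
For reference, the arithmetic implied by the quoted model can be sketched
as below. The parameter values are the ones quoted above; reading "kB" as
1000 bytes and the names used here are assumptions of this sketch.

```python
# Parameters quoted from the model description; kB is taken as 1000 bytes
# (an assumption, the quoted text does not say).
HTML_BYTES = 100_000
OBJECT_BYTES = 25_000
OBJECTS_PER_SERVER = 25
CONNS_PER_SERVER = 6
SERVER_RTTS_MS = (20, 30, 50, 100)


def page_summary():
    """Totals implied by the quoted single-user page model."""
    servers = len(SERVER_RTTS_MS)
    total_objects = servers * OBJECTS_PER_SERVER
    return {
        "total_objects": total_objects,
        "total_bytes": HTML_BYTES + total_objects * OBJECT_BYTES,
        "parallel_gets": servers * CONNS_PER_SERVER,
        # ceil(25 / 6): GETs issued back to back on the busiest connection
        "gets_per_connection": -(-OBJECTS_PER_SERVER // CONNS_PER_SERVER),
    }


if __name__ == "__main__":
    for key, value in page_summary().items():
        print(key, value)
```

That works out to a 2.6 MB page fetched over 24 parallel connections, with
up to 5 sequential GETs on a connection, which is the shape I'm comparing
my captures against.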

> * We don't make sure to flush all the network state in between runs, so if
> you're using that option, don't trust it to work.

The typical scenario we used was a run against dozens or hundreds of urls,
capturing traffic, while varying network conditions.

We regarded the first run as the most interesting.

We can exit the browser and restart after a run like that.

At the moment, I merely plan to use the tool primarily to survey various
web sites and load times while doing packet captures. The hope was
to get valid data from the network portion of the load, tho...

> * If you have an advanced Chromium setup, this definitely does not work. I
> advise using the benchmark extension only with a separate Chromium profile
> for testing purposes. Our flushing of sockets, caches, etc does not actually
> work correctly when you use the Chromium multiprofile feature and also fails
> to flush lots of our other network caches.

noted.

> * No one on Chromium really believes the time to paint numbers that we
> output :) It's complicated. Our graphics stack is complicated. The time from

I actually care only about time-to-full layout as that's a core network
effect...

> when Blink thinks it painted to when the GPU actually blits to the screen
> cannot currently be corroborated with any high degree of accuracy from
> within our code.

> * It has not been maintained since 2010. It is quite likely there are many
> other subtle inaccuracies here.

Grok.

> In short, while you can expect it to give you a very high level
> understanding of performance issues, I advise against placing non-trivial
> confidence in the accuracy of the numbers generated by the benchmark
> extension. The fact that numbers are produced by the extension should not be
> treated as evidence that the extension actually functions correctly.

OK, noted. Still delighted to be able to have a simple load generator
that exercises the browsers and generates some results, however
dubious.

>
> Cheers.
>
>
> On Thu, Apr 17, 2014 at 10:49 AM, Dave Taht <dave.taht at gmail.com> wrote:
>>
>> Getting a grip on real web page load time behavior in an age of
>> sharded websites,
>> dozens of dns lookups, javascript, and fairly random behavior in ad
>> services
>> and cdns against how a modern browser behaves is very, very hard.
>>
>> it turns out if you run
>>
>> google-chrome --enable-benchmarking --enable-net-benchmarking
>>
>> (Mac users have to embed these options in their startup script - see
>>  http://www.chromium.org/developers/how-tos/run-chromium-with-flags )
>>
>> enable developer options and install and run the chrome web page
>> benchmarker,
>> (
>> https://chrome.google.com/webstore/detail/page-benchmarker/channimfdomahekjcahlbpccbgaopjll?hl=en
>> )
>>
>> that it works (at least for me, on a brief test of the latest chrome, on
>> linux.
>> Can someone try windows and mac?)
>>
>> You can then feed in a list of urls to test against, and post process
>> the resulting .csv file to your heart's content. We used to use this
>> benchmark a lot while trying to characterise typical web behaviors
>> under aqm and packet scheduling systems under load. Running
>> it simultaneously with a rrul test or one of the simpler tcp upload or
>> download
>> tests in the rrul suite was often quite interesting.
>>
>> It turned out the doc has been wrong a while as to the name of the second
>> command line option. I was gearing up mentally for having to look at
>> the source....
>>
>> http://code.google.com/p/chromium/issues/detail?id=338705
>>
>> /me happy
>>
>> --
>> Dave Täht
>>
>> Heartbleed POC on wifi campus networks with EAP auth:
>> http://www.eduroam.edu.au/advisory.html
>>
>> _______________________________________________
>> aqm mailing list
>> aqm at ietf.org
>> https://www.ietf.org/mailman/listinfo/aqm
>
>

-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article