[Bloat] real behaviors of http (was: aqm evaluation guidelines)

Tue Apr 15 20:54:29 PDT 2014

I think that more people getting a "feel" for how web page accesses
work would be a good thing. The simplest tool I know of for a basic GET
request is the httping tool:

http://www.vanheusden.com/httping/

It also has many advanced features.

By default it returns output much like ping does.

d@nuc:~/public_html/gw11$ ping www.bufferbloat.net
PING shipka.bufferbloat.net (149.20.54.81) 56(84) bytes of data.
64 bytes from shipka.bufferbloat.net (149.20.54.81): icmp_seq=1 ttl=54 time=13.7 ms

This httping command line does a DNS lookup on every request. I have
artificially moved my DNS server to my upstream resolver (75.75.75.75),
which as near as I can tell is about 10ms RTT away.

d@nuc:~/public_html/gw11$ httping -c 10 -i .02 -G http://www.bufferbloat.net
PING www.bufferbloat.net:80 (/):
connected to 149.20.54.81:80 (466 bytes), seq=0 time=161.38 ms
connected to 149.20.54.81:80 (466 bytes), seq=1 time=151.38 ms
connected to 149.20.54.81:80 (466 bytes), seq=2 time=170.45 ms
connected to 149.20.54.81:80 (466 bytes), seq=3 time=172.09 ms
connected to 149.20.54.81:80 (467 bytes), seq=4 time=369.22 ms
connected to 149.20.54.81:80 (467 bytes), seq=5 time=157.84 ms
connected to 149.20.54.81:80 (466 bytes), seq=6 time=188.63 ms
connected to 149.20.54.81:80 (466 bytes), seq=7 time=149.76 ms
connected to 149.20.54.81:80 (466 bytes), seq=8 time=162.28 ms
connected to 149.20.54.81:80 (466 bytes), seq=9 time=206.10 ms
10 connects, 10 ok, 0.00% failed, time 2091ms
round-trip min/avg/max = 149.8/188.9/369.2 ms

If you tell httping to resolve the hostname only on the first query
(the -r option), it can do considerably better:

d@nuc:~/public_html/gw11$ httping -r -c 10 -i .02 -G http://www.bufferbloat.net

connected to 149.20.54.81:80 (466 bytes), seq=0 time=138.46 ms
connected to 149.20.54.81:80 (466 bytes), seq=1 time=125.69 ms
connected to 149.20.54.81:80 (467 bytes), seq=2 time=333.78 ms
connected to 149.20.54.81:80 (466 bytes), seq=3 time=137.17 ms
connected to 149.20.54.81:80 (466 bytes), seq=4 time=133.13 ms
connected to 149.20.54.81:80 (466 bytes), seq=5 time=126.99 ms
connected to 149.20.54.81:80 (467 bytes), seq=6 time=139.29 ms
connected to 149.20.54.81:80 (466 bytes), seq=7 time=129.99 ms
connected to 149.20.54.81:80 (466 bytes), seq=8 time=127.58 ms
connected to 149.20.54.81:80 (466 bytes), seq=9 time=121.05 ms
--- http://www.bufferbloat.net/ ping statistics ---
10 connects, 10 ok, 0.00% failed, time 1731ms
round-trip min/avg/max = 121.1/151.3/333.8 ms
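
Each of the two runs above has one outlier above 300 ms, so the medians
are probably a fairer comparison than the averages. A minimal sketch,
with the times copied from the output above; the variable names are
mine, and reading the difference as "the cost of the DNS lookup"
assumes nothing else changed between the two runs:

```python
from statistics import mean, median

# Times (ms) copied from the two httping runs above.
with_dns_each_time = [161.38, 151.38, 170.45, 172.09, 369.22,
                      157.84, 188.63, 149.76, 162.28, 206.10]
resolve_once = [138.46, 125.69, 333.78, 137.17, 133.13,
                126.99, 139.29, 129.99, 127.58, 121.05]

def summarize(times):
    """Return (min, mean, median, max) for a list of times in ms."""
    return min(times), mean(times), median(times), max(times)

for label, times in (("dns every request", with_dns_each_time),
                     ("resolve once (-r)", resolve_once)):
    lo, avg, med, hi = summarize(times)
    print(f"{label}: min={lo:.1f} avg={avg:.1f} median={med:.1f} max={hi:.1f}")

# Rough per-request saving from skipping the lookup, with outliers damped.
median_saving_ms = median(with_dns_each_time) - median(resolve_once)
print(f"median saving: {median_saving_ms:.1f} ms")
```

Note that even with -r the first request still pays for one lookup, so
the second list is not entirely DNS-free.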

Now this measures the time taken by the round trips needed to negotiate
a connection and get a web page, so the three core variables you can
manipulate are the RTT, the size of the returned data, and the amount
of DNS caching (if any) in your setup. You can (after doing some
measurements) find and deduct the overhead of the application http
server as a relative constant.
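
To make that deduction concrete, here is a rough sketch of the
arithmetic. The model is my simplification, not something httping
reports: one RTT for the TCP handshake, one RTT for the request and a
response small enough to fit in the initial window, plus DNS time and a
constant server overhead. The function names are made up for the
example.

```python
def estimate_fetch_ms(rtt_ms, dns_ms=0.0, server_ms=0.0, persistent=False):
    """Rough time for one small GET under a simplified model (an assumption).

    One RTT for the TCP handshake (skipped on a persistent connection),
    one RTT for the request and a response that fits in the initial window,
    plus DNS lookup time and a constant server overhead.
    """
    handshake_rtts = 0 if persistent else 1
    return dns_ms + (handshake_rtts + 1) * rtt_ms + server_ms


def implied_server_ms(measured_ms, rtt_ms, dns_ms=0.0, persistent=False):
    """Invert the model: what is left after deducting DNS and round trips."""
    return measured_ms - estimate_fetch_ms(rtt_ms, dns_ms, 0.0, persistent)


# Numbers from this mail: ~13.7 ms ping RTT, and 121.05 ms as the best
# time from the run that resolved the name only once.
print(round(implied_server_ms(121.05, 13.7), 2))
```

Whatever is left over after deducting the round trips is what the model
lumps together as "server overhead"; it will also absorb anything the
model ignores, such as slow start on larger responses.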

There is support for persistent (pipelined) requests, which eliminate the TCP
3-way handshake for subsequent requests but are not particularly well used by
browsers and servers in the real world. There is also support for TCP fast open.

-Q enables persistent connections.

d@nuc:~/public_html/gw11$ httping -r -c 10 -i .02 -G -Q http://www.bufferbloat.net
PING www.bufferbloat.net:80 (/):
pinged host 149.20.54.81:80 (504 bytes), seq=0 time=151.25 ms

# I am thinking there is a bug here with the -G option

pinged host 149.20.54.81:80 (7251 bytes), seq=1 time=112.57 ms
pinged host 149.20.54.81:80 (7254 bytes), seq=2 time=110.79 ms
pinged host 149.20.54.81:80 (7257 bytes), seq=3 time=108.73 ms
pinged host 149.20.54.81:80 (7260 bytes), seq=4 time=109.77 ms
pinged host 149.20.54.81:80 (7263 bytes), seq=5 time=108.79 ms
pinged host 149.20.54.81:80 (7266 bytes), seq=6 time=109.74 ms
pinged host 149.20.54.81:80 (7270 bytes), seq=7 time=333.45 ms
pinged host 149.20.54.81:80 (7274 bytes), seq=8 time=118.83 ms
pinged host 149.20.54.81:80 (7278 bytes), seq=9 time=120.77 ms
--- http://www.bufferbloat.net/ ping statistics ---
10 connects, 10 ok, 0.00% failed, time 1602ms

httping appears to be broken for https as I dink with the latest
version. Off the top of my head I seem to recall that https about
tripled the time to first data on this path, mostly due to added RTTs.

httping can do JSON output, which is optional but easy to parse in
other tools like netperf-wrapper or nagios.

The -M option outputs JSON suitable for incorporating into other tools:

d@nuc:~/public_html/gw11$ httping -r -c 3 -i .02 -M -G http://snapon.lab.bufferbloat.net
[
{ "status" : "1", "seq" : "1", "start_ts" : "1397642077.318260",
"resolve_ms" : "1.330376e-01", "connect_ms" : "1.441813e+01",
"request_ms" : "1.000881e+00", "total_ms" : "3.091097e+01",
"http_code" : "200", "msg" : "200 OK", "header_size" : "281",
"data_size" : "0", "bps" : "0.000000", "host" : "149.20.63.30",
"ssl_fingerprint" : "", "time_offset" : "3591659.008980",
"tfo_success" : "false", "write" : "1.535892e+01", "close" :
"2.002716e-02", "cookies" : "0", "to" : "3.591659e+06",
"tcp_rtt_stats" : "2.500000e+01", "re_tx" : "0", "pmtu" : "1500",
"tos" : "02", },
{ "status" : "1", "seq" : "2", "start_ts" : "1397642077.369331",
"connect_ms" : "1.342821e+01", "request_ms" : "6.830692e-01",
"total_ms" : "3.041911e+01", "http_code" : "200", "msg" : "200 OK",
"header_size" : "281", "data_size" : "0", "bps" : "0.000000", "host" :
"149.20.63.30", "ssl_fingerprint" : "", "time_offset" :
"3591608.745575", "tfo_success" : "false", "write" : "1.630783e+01",
"close" : "2.002716e-02", "cookies" : "0", "to" : "3.591609e+06",
"tcp_rtt_stats" : "2.500000e+01", "re_tx" : "0", "pmtu" : "1500",
"tos" : "02", },
{ "status" : "1", "seq" : "3", "start_ts" : "1397642077.419918",
"connect_ms" : "1.799989e+01", "request_ms" : "5.819798e-01",
"total_ms" : "4.074192e+01", "http_code" : "200", "msg" : "200 OK",
"header_size" : "281", "data_size" : "0", "bps" : "0.000000", "host" :
"149.20.63.30", "ssl_fingerprint" : "", "time_offset" :
"3591550.711155", "tfo_success" : "false", "write" : "2.216005e+01",
"close" : "2.813339e-02", "cookies" : "0", "to" : "3.591551e+06",
"tcp_rtt_stats" : "2.600000e+01", "re_tx" : "0", "pmtu" : "1500",
"tos" : "02", }
]
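
One thing to watch when feeding this to other tools: in the output
above each object ends with `"tos" : "02", }`, and a strict parser such
as Python's json.loads rejects that trailing comma. Every value is also
quoted as a string. A minimal sketch of a tolerant reader, using two
records trimmed from the output above; parse_httping_json is a name I
made up, and the regex is a blunt workaround rather than a real fix:

```python
import json
import re

# Two records trimmed from the output above (fields dropped for brevity).
SAMPLE = """[
{ "status" : "1", "seq" : "1", "connect_ms" : "1.441813e+01",
"total_ms" : "3.091097e+01", "http_code" : "200", "tos" : "02", },
{ "status" : "1", "seq" : "2", "connect_ms" : "1.342821e+01",
"total_ms" : "3.041911e+01", "http_code" : "200", "tos" : "02", }
]"""

def parse_httping_json(text):
    """Parse httping -M output, tolerating a trailing comma before } or ].

    The regex would also alter a string value that happened to contain
    ", }", which the output shown here does not.
    """
    cleaned = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(cleaned)

records = parse_httping_json(SAMPLE)
# Every value is emitted as a string, so convert before doing arithmetic.
total_ms = [float(r["total_ms"]) for r in records]
connect_ms = [float(r["connect_ms"]) for r in records]
print(total_ms, connect_ms)
```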

So getting a basic feel for how remote websites perform, just for the
first query, is straightforward: go through the alexa top 1000
from your location or, better, pull your most common websites from
your browser's cache as namebench does.
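
Looping over a list of sites could look roughly like this. It uses only
the httping flags shown earlier in this mail (-c, -i, -G, -r, -M); the
helper names and the two-entry SITES list are placeholders of mine, and
run_httping needs httping installed and on your PATH:

```python
import subprocess

def httping_cmd(url, count=10, interval=0.02, resolve_once=True, json_out=False):
    """Build an httping argument list using only flags shown in this mail."""
    cmd = ["httping", "-c", str(count), "-i", str(interval), "-G"]
    if resolve_once:
        cmd.append("-r")
    if json_out:
        cmd.append("-M")
    cmd.append(url)
    return cmd

def run_httping(url, **kwargs):
    """Run httping and return its stdout; requires httping on PATH."""
    result = subprocess.run(httping_cmd(url, **kwargs),
                            capture_output=True, text=True, check=False)
    return result.stdout

# Placeholder list: substitute hosts from your own browsing history.
SITES = ["http://www.bufferbloat.net", "http://www.google.com"]
for site in SITES:
    # Only print the command here; call run_httping(site) to execute it.
    print(" ".join(httping_cmd(site)))
```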

So you can easily do the 700KB test by setting up a single flow via
httping and call it a day, and yes, you can learn something that way.
But real web traffic uses far more simultaneous connections, and has a
more complex relationship to the DNS and to features like pipelining
and caching, and most of all, most of the flows are short and run in
slow start inside of those RTTs. So I'd be reluctant to draw any
conclusions about the behavior of an aqm for web page load time without
an accurate model of all the RTTs and caching really involved. Many
other web benchmarking tools exist, but few actually benchmark
real-looking traffic, being mostly designed to stress test the OS or
server.
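
To illustrate the slow start point, here is a back-of-the-envelope count
of how many RTTs it takes to deliver an object of a given size. It is an
idealized sketch: no loss, the window doubles every RTT, and the initial
window of 10 segments and MSS of 1460 bytes are assumptions of mine, not
measurements from this path.

```python
def slow_start_rounds(size_bytes, initial_window=10, mss=1460):
    """Count RTTs needed to deliver size_bytes in idealized slow start.

    Assumes no loss, a window that doubles every RTT, and no receive
    window or delayed-ACK effects. initial_window=10 segments and
    mss=1460 bytes are assumed defaults, not measured values.
    """
    segments_left = -(-size_bytes // mss)  # ceiling division
    window = initial_window
    rounds = 0
    while segments_left > 0:
        segments_left -= window
        window *= 2
        rounds += 1
    return rounds

for size in (7251, 70_000, 700_000):
    print(size, slow_start_rounds(size))
```

Under these assumptions a page of about 7KB is delivered in a single
round, while a 700KB object needs six, so a single large flow spends its
time in a very different regime than the many short flows of a real page
load.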

Lastly I note that as your RTTs shrink the relative overhead of http
grows, and many websites I httpinged today were well below 30ms RTT
from me. I keep seeing people suggest that simulation values in the
range 20 50 100 200 ms are "good", when I think in the future numbers
in the range of 2 8 16 32 64 ms will be the most common, with a long
tail for those in places like New Zealand or connected via satellite
links.

d@snapon:~/git/httping-2.3.4$ ping www.google.com
PING www.google.com (74.125.239.48) 56(84) bytes of data.
64 bytes from nuq04s19-in-f16.1e100.net (74.125.239.48): icmp_req=1 ttl=59 time=1.13 ms
64 bytes from nuq04s19-in-f16.1e100.net (74.125.239.48): icmp_req=2 ttl=59 time=1.10 ms

--- www.google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.109/1.119/1.130/0.035 ms
d@snapon:~/git/httping-2.3.4$ httping -c 10 -i .02 -G http://www.google.com
PING www.google.com:80 (/):
connected to 74.125.239.51:80 (769 bytes), seq=0 time= 55.27 ms
connected to 74.125.239.49:80 (788 bytes), seq=1 time= 54.79 ms
connected to 74.125.239.52:80 (257 bytes), seq=2 time= 57.35 ms
connected to 74.125.239.50:80 (257 bytes), seq=3 time= 56.97 ms
connected to 74.125.239.48:80 (257 bytes), seq=4 time= 56.61 ms
connected to 74.125.239.51:80 (257 bytes), seq=5 time= 59.43 ms
connected to 74.125.239.49:80 (257 bytes), seq=6 time= 55.17 ms
connected to 74.125.239.52:80 (257 bytes), seq=7 time= 58.22 ms
connected to 74.125.239.50:80 (257 bytes), seq=8 time= 57.60 ms
connected to 74.125.239.48:80 (257 bytes), seq=9 time= 56.61 ms
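
To put a number on how little of that time is network RTT, a quick
sketch with the figures copied from the ping and httping output above.
It assumes a fresh connection costs about two RTTs on the wire, and that
all of the addresses httping hit are about as close as the one ping
measured:

```python
from statistics import mean

# Numbers copied from the ping and httping output above.
ping_rtt_ms = 1.119
httping_ms = [55.27, 54.79, 57.35, 56.97, 56.61,
              59.43, 55.17, 58.22, 57.60, 56.61]

# Assumption: a fresh connection costs about two RTTs on the wire
# (handshake, then request/response); the rest is DNS and server time.
network_ms = 2 * ping_rtt_ms
avg_ms = mean(httping_ms)
network_share = network_ms / avg_ms

print(f"avg httping time: {avg_ms:.1f} ms")
print(f"share explained by RTT: {network_share:.1%}")
```

This run did not use -r, so the remainder includes a DNS lookup on every
request as well as the server's own time.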

And all tools have bugs and need to be carefully checked to make
sure they are actually doing what they say they are doing.

I found three, I think, while composing this email.

And I haven't got around to looking at the packet captures for other oddities.

-- 
Dave Täht

Deal with heartbleed:
https://www.eff.org/deeplinks/2014/04/bleeding-hearts-club-heartbleed-recovery-system-administrators