From: Sebastian Moeller
Date: Sun, 14 Sep 2014 18:55:54 +0200
To: Neil Davies
Cc: Hal Murray, bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Measuring Latency

Hi Neil,

On Sep 14, 2014, at 16:31, Neil Davies wrote:

> Gents,
>
> This is not actually true - you can measure one-way delays without completely accurately synchronised clocks (they have to be reasonably precise, not accurate) - see CERN thesis at http://goo.gl/ss6EBq

	I might have misunderstood the thesis, but the requirement of 1000s of samples does not sit well with the use case we have been discussing in this thread: improving speed tests so that they include latency-under-load measurements. Also, looking at the thesis, I am a bit unsure about the one-way delay measurement method; it relies on using the minimum one-way delay times. My own observations of RTTs for ATM quantization show that even with 1000 samples per packet size the minimum is a much worse estimator than the median, so for their method to work over the open Internet we are talking about gigantic numbers of samples… (Now, admittedly, I might have screwed up royally in my own analysis; I only do this as a hobby.) Cool thesis, nice and impressive work, but not the best fit for the quest for a better speed test, I guess…

Best Regards
	Sebastian
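P.S.: A rough sketch of how one could check the stability of minimum- versus median-based delay estimates on simulated data (Python; the exponential queueing-delay model, the base delay and the sample counts are assumptions purely for illustration, not taken from the thesis or from the ATM measurements mentioned above):

import random
import statistics

def one_way_delays(n, base_ms=10.0, queue_scale_ms=5.0):
    # fixed base delay plus random queueing delay; the exponential
    # model is an assumption for illustration only
    return [base_ms + random.expovariate(1.0 / queue_scale_ms) for _ in range(n)]

def run_to_run_spread(estimator, n_samples, runs=200):
    # how much a given estimator (min, median, ...) varies between
    # repeated measurement runs of n_samples probes each
    estimates = [estimator(one_way_delays(n_samples)) for _ in range(runs)]
    return statistics.pstdev(estimates)

for n in (100, 1000):
    print("n=%5d   min spread %.3f ms   median spread %.3f ms"
          % (n, run_to_run_spread(min, n), run_to_run_spread(statistics.median, n)))

On real traces the queueing noise is rarely this well behaved, which is exactly where the worry about the minimum comes from.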
>
> It is possible, with appropriate measurements, to construct arguments that make marketeers salivate (or the appropriate metaphor) - you can compare the relative effects of technology, location and instantaneous congestion. See slideshare at http://goo.gl/6vytmD
>
> Neil
>
> On 14 Sep 2014, at 00:32, Jonathan Morton wrote:
>
>>>>> When reading it, it strikes me that you don't directly tell them what to
>>>>> do; e.g. add a latency test during upload and download. ...
>>>>
>>>> Does round trip latency have enough info, or do you need to know how much is
>>>> contributed by each direction?
>>>
>>> RTT is fine, uni-directional transfer time would be too good to be true ;).
>>
>> To expand on this: to measure one-way delay, you would need finely synchronised clocks (to within a couple of ms) at both ends. The average consumer doesn't have that sort of setup - not even if they happen to use NTP. So it's not a goal worth pursuing at this level - save it for scientific investigations, where the necessary effort and equipment can be made available.
>>
>>>> If I gave you a large collection of latency data from a test run, how do you
>>>> reduce it to something simple that a marketer could compare with the results
>>>> from another test run?
>>>
>>> I believe the added latency under load would be a marketable number, but we had a discussion in the past where it was argued that marketing wants a number which increases with goodness, so larger = better, something the raw difference is not going to deliver…
>>
>> The obvious solution is to report the inverse value in Hz, a figure of merit that gamers are familiar with (if not exactly in this context).
>>
>> For example, I occasionally play World of Tanks, which has a latency meter (in ms) in the top corner, and I can roughly invert that in my head - if I set my shaper too optimistically, I get something like 500 ms when something is updating in the background, but this goes down immediately to 100 ms once I adjust the setting to a more conservative value - it's a 3G connection, so it's not going to get much better than that. The corresponding inverted readings would be 2 Hz (where I miss most of my shots) and 10 Hz (where I can carry the team). It's probably worth reporting to one decimal place.
>>
>> WoT isn't exactly the "twitchiest" of online games, either - have you any idea how long it takes to aim a tank gun? Even so, when some tanks can move at over 30 km/h, a half-second difference in position is a whole tank length, so with the slower response I no longer know *where* or *when* to fire, unless the target is a sitting duck. Even though my framerate is at 30 Hz or more and appears smooth, my performance as a player is dependent on the Internet's response frequency, because that is lower.
>>
>>
>> So here's the outline of a step-by-step methodology:
>>
>> - Prepare space for a list of latency measurements. Each measurement needs to be tagged with information about what else was going on at the same time, i.e. idle/upload/download/both. Some latency probes may be lost, and this fact should also be recorded on a per-tag basis.
>>
>> - Start taking latency measurements, tagging them as idle to begin with. Keep on doing so continuously, changing the tag as required, until several seconds after the bandwidth measurements are complete.
>>
>> - Latency measurements should be taken sufficiently frequently (several times a second is okay) that there will be at least a hundred samples with each tag, and the frequency of sampling should not change during the test. Each probe must be tagged with a unique ID, so that losses or re-ordering of packets can be detected and don't confuse the measurement.
>>
>> - The latency probes should use UDP, not ICMP. They should also use the same Diffserv/TOS tag as the bandwidth measurement traffic; the default "best-effort" tag is fine.
>>
>> - To satisfy the above requirements, the latency tester must *not* wait for a previous reply to return before sending the next probe. It should send at regular intervals based on wall-clock time, but don't send so many probes that they themselves clog the link.
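As a minimal sketch of the sender side described in the bullets above (Python; the reflector host, port, probe rate and payload layout are just assumptions for illustration - the proposal does not specify them - and a separate receiver would match the echoed sequence numbers to detect loss and reordering):

import socket
import struct
import time

REFLECTOR = ("probe.example.net", 4242)   # hypothetical UDP echo service (assumption)
INTERVAL = 0.1                            # 10 probes per second - well below link capacity
TAGS = {"idle": 0, "download": 1, "upload": 2, "both": 3}

def run_probe_phase(sock, tag, duration_s, seq=0):
    """Send one small UDP probe every INTERVAL seconds of wall-clock time,
    never waiting for replies.  Each probe carries a unique sequence number,
    the current phase tag and a send timestamp."""
    next_send = time.monotonic()
    end = next_send + duration_s
    while next_send < end:
        delay = next_send - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        payload = struct.pack("!IIQ", seq, TAGS[tag], int(time.monotonic() * 1e9))
        sock.sendto(payload, REFLECTOR)   # plain best-effort UDP, no special TOS bits
        seq += 1
        next_send += INTERVAL             # schedule from the clock, not from replies
    return seq

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
seq = run_probe_phase(sock, "idle", 5.0)  # a few seconds of idle samples first
# ... then switch the tag to "download", "upload", "both" per the steps that follow.

Scheduling each send from the previous target time, rather than from "now", keeps the probe rate constant even if an individual send is slightly delayed.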
>>
>> - Once several seconds of "idle" samples are recorded, start the download test. Change the tag to "download" at this point.
>>
>> - The download test is complete when all the data sent by the server has reached the client. Change the tag back to "idle" at this moment.
>>
>> - Repeat the previous two steps for the upload measurement, using the "upload" tag.
>>
>> - Repeat again, but perform upload and download tests at the same time (predicting, if necessary, that the bandwidth in each direction should be similar to that previously measured), and use the "both" tag. Uploads and downloads tend to interfere with each other when the loaded response frequency is poor, so don't simply assume that the results will be the same as in the individual tests - *measure* it.
>>
>> - Once a few seconds of trailing "idle" samples have been taken, stop measuring and start the analysis.
>>
>> - Separate the list of latency samples by tag, and sort the four resulting lists in ascending order.
>>
>> - In each list, find the sample nearest 98% of the way through the list. This is the "98th percentile", a good way of finding the "highest" value while discarding irrelevant outliers. The highest latency is the one that users will notice. Typically poor results: idle 50 ms, download 250 ms, upload 500 ms, both 800 ms.
>>
>> - Correct the 98th-percentile latencies for packet loss by multiplying each one by the number of probes *sent* with the appropriate tag, and then dividing by the number of probes *received* with that tag. It is not necessary to report packet loss in any other way, *except* for the "idle" tag.
>>
>> - Convert the corrected 98th-percentile latencies into "response frequencies" by dividing one second by them. The typical figures above would become: idle 20.0 Hz, download 4.0 Hz, upload 2.0 Hz, both 1.25 Hz - assuming there was no packet loss. These figures are comparable in meaning and importance to "frames per second" figures in games.
>>
>> - Report these response frequencies, to a precision of at least one decimal place, alongside and with equal visual importance to the bandwidth figures. For example:
>>
>>   IDLE:      Response 20.0 Hz   Packet loss  0.00 %
>>   DOWNLOAD:  Response  4.0 Hz   Bandwidth   20.00 Mbps
>>   UPLOAD:    Response  2.0 Hz   Bandwidth    2.56 Mbps
>>   BIDIRECT:  Response  1.3 Hz   Bandwidth   15.23 / 2.35 Mbps
>>
>> - Improving the response figures in the loaded condition will probably also improve the *bidirectional* bandwidth figures as a side-effect, while having a minimal effect on the *unidirectional* figures. A connection with such qualities can legitimately be described as supporting multiple activities at the same time. A connection with the example figures shown above can *not*.
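And a matching sketch of the analysis stage described in the bullets above (again Python, again only an illustration: the helper name and the made-up sample data are not part of the proposal):

def summarise(samples_by_tag, sent_by_tag):
    """samples_by_tag: {tag: [latency in seconds for each probe received]}
    sent_by_tag:       {tag: number of probes sent with that tag}
    Returns {tag: (response frequency in Hz, loss fraction)}."""
    out = {}
    for tag, samples in samples_by_tag.items():
        ordered = sorted(samples)
        received, sent = len(ordered), sent_by_tag[tag]
        # the sample nearest 98% of the way through the sorted list
        p98 = ordered[round(0.98 * (received - 1))]
        # loss correction: multiply by probes sent, divide by probes received
        corrected = p98 * sent / received
        out[tag] = (1.0 / corrected, 1.0 - received / sent)
    return out

# made-up figures matching the "typically poor" example above
latencies = {"idle": [0.050] * 100, "download": [0.250] * 100,
             "upload": [0.500] * 100, "both": [0.800] * 100}
sent = {tag: 100 for tag in latencies}
for tag, (hz, loss) in summarise(latencies, sent).items():
    print("%-10s Response %6.2f Hz   Loss %.2f %%" % (tag.upper() + ":", hz, 100 * loss))

The inversion is what turns "lower is better" latency into a "higher is better" figure, which is the property marketing was said to want above.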
>>
>> The next trick is getting across the importance of acknowledging that more than one person in the average household wants to use the Internet at the same time these days, and they often want to do different things from each other. In this case, the simplest valid argument probably has a lot going for it.
>>
>> An illustration might help to get the concept across. A household with four users in different rooms: father in the study downloading a video, mother in the kitchen on the (VoIP) phone, son in his bedroom playing a game, daughter in her bedroom uploading photos. All via a single black-box modem and connection. Captions should emphasise that mother and son both need low latency (high response frequency), while father and daughter need high bandwidth (in opposite directions!), and that they're doing all these things at the same time.
>>
>> - Jonathan Morton
>>
>> _______________________________________________
>> Bloat mailing list
>> Bloat@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat