From: Neil Davies
Date: Sun, 14 Sep 2014 18:26:57 +0100
To: Sebastian Moeller
Cc: Hal Murray, bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Measuring Latency

Ok,

if you take a look at §B2 in http://goo.gl/ss6EBq it shows how to perform the correction for the clock difference, given two sets of measurements of delay (or, more specifically, delay and loss - the quality attenuation, or ∆Q). Sections 4.2 to 4.4 give a semi-formal explanation (http://goo.gl/ss6EBq does this in an even more informal way) of how to extract the basis set for the way in which quality attenuation (and hence delay) accrues in networks.

The point is that there are three fundamental components of how this delay "accrues" - G, S and V. It is the V that congestion has an effect on - G and S are the bits that are "fixed" for a given endpoint. The way G and S are calculated means that you can get pretty accurate estimators for them in a few (5 or so) samples, noting that the minimum times being used are *per packet size* (or, in the case of things like ATM, per quantisation size). The rate at which the estimator tracks the true G and S (and hence, when those effects are subtracted from the observed values, the magnitude and stability of V) is very high, and the theoretical discussion makes very limited assumptions about the distribution of V (and hence about the underlying causes of the delay).

The "speed" of a network connection is really the point at which V starts to increase rapidly with the offered load. If you look at slides 19 and 20 of http://goo.gl/ss6EBq you can see the measured effects of the total offered load exceeding the capacity of a multiplexing point on the path - when the "speed" (offered load) was too high. It is the measurement of the trend of V that gives you the turning point, and that is "the speed".
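To make the G/S estimator described above concrete, here is a minimal sketch (in Python; the function and variable names are mine, not the paper's, and this is only an illustration of the idea): take the minimum observed delay per packet size, fit a straight line through those minima, and read G off as the intercept (size-independent delay) and S as the slope (per-byte serialisation delay). V is then the residual once both are subtracted.

```python
from collections import defaultdict

def estimate_g_s(samples):
    """Estimate the fixed delay components from (packet_size, delay) samples.

    Keeps the minimum observed delay *per packet size*, then fits a line
    through those minima by least squares: the intercept approximates G
    (size-independent delay) and the slope approximates S (delay per byte).
    """
    minima = defaultdict(lambda: float("inf"))
    for size, delay in samples:
        minima[size] = min(minima[size], delay)
    xs = sorted(minima)
    ys = [minima[s] for s in xs]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope  # (G estimate in s, S estimate in s/byte)

def v_component(size, delay, g, s):
    """Residual variable delay V once G and S are subtracted out."""
    return delay - (g + s * size)
```

With a handful of sizes and a few samples per size (the "5 or so" above), the minima already pin down the line, which is why the estimator converges so quickly; only V remains as the congestion-sensitive part.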
Neil

On 14 Sep 2014, at 17:55, Sebastian Moeller wrote:

> Hi Neil,
>
> On Sep 14, 2014, at 16:31, Neil Davies wrote:
>
>> Gents,
>>
>> This is not actually true - you can measure one-way delays without completely accurately synchronised clocks (they have to be reasonably precise, not accurate) - see the CERN thesis at http://goo.gl/ss6EBq
>
> 	I might have misunderstood the thesis, but the requirement of 1000s of samples somehow does not work well with the use case we have been discussing in this thread: improvement of speed tests so that they include latency-under-load measurements. Also, looking at the thesis I am a bit unsure about the one-way delay measurement method; it relies on using the minimum one-way delay times. My own observations of RTTs for ATM quantization show that even for 1000 samples per packet size the min is a much worse estimator than the median, so for their method to work over the open internet we are talking about gigantic numbers of samples… (now, admittedly, I might have screwed up royally with my own analysis; I do this for a hobby only). Cool thesis, nice and impressive work, but not the best fit for the quest for a better speed test I guess…
>
> Best Regards
> 	Sebastian
>
>>
>> It is possible, with appropriate measurements, to construct arguments that make marketeers salivate (or the appropriate metaphor) - you can compare the relative effects of technology, location and instantaneous congestion. See the slideshare at http://goo.gl/ss6EBq
>>
>> Neil
>>
>> On 14 Sep 2014, at 00:32, Jonathan Morton wrote:
>>
>>>>>> When reading it, it strikes me that you don't directly tell them what to
>>>>>> do; e.g. add a latency test during upload and download. ...
>>>>>
>>>>> Does round trip latency have enough info, or do you need to know how much is
>>>>> contributed by each direction?
>>>>
>>>> RTT is fine, uni-directional transfer time would be too good to be true ;).
>>>
>>> To expand on this: to measure one-way delay, you would need finely synchronised clocks (to within a couple of ms) at both ends. The average consumer doesn't have that sort of setup - not even if they happen to use NTP. So it's not a goal worth pursuing at this level - save it for scientific investigations, where the necessary effort and equipment can be made available.
>>>
>>>>> If I gave you a large collection of latency data from a test run, how do you
>>>>> reduce it to something simple that a marketer could compare with the results
>>>>> from another test run?
>>>>
>>>> I believe the added latency under load would be a marketable number, but we had a discussion in the past where it was argued that marketing wants a number which increases with goodness, so larger = better, something the raw difference is not going to deliver…
>>>
>>> The obvious solution is to report the inverse value in Hz, a figure of merit that gamers are familiar with (if not exactly in this context).
>>>
>>> For example, I occasionally play World of Tanks, which has a latency meter (in ms) in the top corner, and I can roughly invert that in my head - if I set my shaper too optimistically, I get something like 500ms if something is updating in the background, but this goes down immediately to 100ms once I adjust the setting to a more conservative value - it's a 3G connection, so it's not going to get much better than that. The corresponding inverted readings would be 2Hz (where I miss most of my shots) and 10Hz (where I can carry the team). It's probably worth reporting to one decimal place.
>>>
>>> WoT isn't exactly the "twitchiest" of online games, either - have you any idea how long it takes to aim a tank gun?
>>> Even so, when some tanks can move at over 30km/h, a half-second difference in position is a whole tank length, so with the slower response I no longer know *where* or *when* to fire, unless the target is a sitting duck. Even though my framerate is at 30Hz or more and appears smooth, my performance as a player is dependent on the Internet's response frequency, because that is lower.
>>>
>>>
>>> So here's the outline of a step-by-step methodology:
>>>
>>> - Prepare space for a list of latency measurements. Each measurement needs to be tagged with information about what else was going on at the same time, ie. idle/upload/download/both. Some latency probes may be lost, and this fact should also be recorded on a per-tag basis.
>>>
>>> - Start taking latency measurements, tagging as idle to begin with. Keep on doing so continuously, changing the tag as required, until several seconds after the bandwidth measurements are complete.
>>>
>>> - Latency measurements should be taken sufficiently frequently (several times a second is okay) that there will be at least a hundred samples with each tag, and the frequency of sampling should not change during the test. Each probe must be tagged with a unique ID, so that losses or re-ordering of packets can be detected and don't confuse the measurement.
>>>
>>> - The latency probes should use UDP, not ICMP. They should also use the same Diffserv/TOS tag as the bandwidth measurement traffic; the default "best-effort" tag is fine.
>>>
>>> - To satisfy the above requirements, the latency tester must *not* wait for a previous reply to return before sending the next one. It should send at regular intervals based on wall-clock time. But don't send so many probes that they themselves clog the link.
>>>
>>> - Once several seconds of "idle" samples are recorded, start the download test. Change the tag to "download" at this point.
>>>
>>> - The download test is complete when all the data sent by the server has reached the client. Change the tag back to "idle" at this moment.
>>>
>>> - Repeat the previous two steps for the upload measurement, using the "upload" tag.
>>>
>>> - Repeat again, but perform upload and download tests at the same time (predicting, if necessary, that the bandwidth in each direction should be similar to that previously measured), and use the "both" tag. Uploads and downloads tend to interfere with each other when the loaded response frequency is poor, so don't simply assume that the results will be the same as in the individual tests - *measure* it.
>>>
>>> - Once a few seconds of "idle" samples have been taken, stop measuring and start analysis.
>>>
>>> - Separate the list of latency samples by tag, and sort the four resulting lists in ascending order.
>>>
>>> - In each list, find the sample nearest 98% of the way through the list. This is the "98th percentile", a good way of finding the "highest" value while discarding irrelevant outliers. The highest latency is the one that users will notice. Typically poor results: idle 50ms, download 250ms, upload 500ms, both 800ms.
>>>
>>> - Correct each 98th-percentile latency for packet loss by multiplying it by the number of probes *sent* with the appropriate tag, and then dividing it by the number of probes *received* with that tag. It is not necessary to report packet loss in any other way, *except* for the "idle" tag.
>>>
>>> - Convert the corrected 98th-percentile latencies into "response frequencies" by dividing one second by them. The typical figures above would become: idle 20.0 Hz, download 4.0 Hz, upload 2.0 Hz, both 1.25 Hz - assuming there was no packet loss. These figures are comparable in meaning and importance to "frames per second" figures in games.
>>>
>>> - Report these response frequencies, to a precision of at least one decimal place, alongside and with equal visual importance to, the bandwidth figures. For example:
>>>
>>> IDLE:     Response 20.0 Hz   Packet loss 0.00 %
>>> DOWNLOAD: Response  4.0 Hz   Bandwidth 20.00 Mbps
>>> UPLOAD:   Response  2.0 Hz   Bandwidth  2.56 Mbps
>>> BIDIRECT: Response  1.3 Hz   Bandwidth 15.23 / 2.35 Mbps
>>>
>>> - Improving the response figures in the loaded condition will probably also improve the *bidirectional* bandwidth figures as a side-effect, while having a minimal effect on the *unidirectional* figures. A connection with such qualities can legitimately be described as supporting multiple activities at the same time. A connection with the example figures shown above can *not*.
>>>
>>>
>>> The next trick is getting across the importance of acknowledging that more than one person in the average household wants to use the Internet at the same time these days, and they often want to do different things from each other. In this case, the simplest valid argument probably has a lot going for it.
>>>
>>> An illustration might help to get the concept across. A household with four users in different rooms: father in the study downloading a video, mother in the kitchen on the (VoIP) phone, son in his bedroom playing a game, daughter in her bedroom uploading photos. All via a single black-box modem and connection. Captions should emphasise that mother and son both need low latency (high response frequency), while father and daughter need high bandwidth (in opposite directions!), and that they're doing all these things at the same time.
>>>
>>> - Jonathan Morton
>>>
>>> _______________________________________________
>>> Bloat mailing list
>>> Bloat@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/bloat
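The analysis stage of the methodology Jonathan outlines (sort per tag, take the 98th percentile, correct for loss, invert to Hz) can be sketched in a few lines. This is a minimal illustration under my own naming, not anyone's published test code:

```python
def response_frequency(latencies, sent, received):
    """Loss-corrected response frequency (Hz) for one tag's samples.

    latencies: latency samples in seconds for this tag (received probes)
    sent/received: probe counts for this tag

    Sort the samples, take the one nearest 98% of the way through the
    list, scale it by sent/received to penalise packet loss, then divide
    one second by the result to get a frequency.
    """
    ordered = sorted(latencies)
    idx = min(len(ordered) - 1, round(0.98 * (len(ordered) - 1)))
    corrected = ordered[idx] * sent / received
    return 1.0 / corrected
```

Run once per tag (idle/download/upload/both); with the typical poor figures above (idle 50ms, download 250ms, upload 500ms, both 800ms) and no loss, it reproduces the 20.0 / 4.0 / 2.0 / 1.25 Hz numbers.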