[Bloat] Measuring Latency
Jonathan Morton
chromatix99 at gmail.com
Sat Sep 13 19:32:23 EDT 2014
>>> When reading it, it strikes me, that you don't directly tell them what to
>>> do; e.g. add a latency test during upload and download. ...
>>
>> Does round trip latency have enough info, or do you need to know how much is
>> contributed by each direction?
>
> RTT is fine, uni-directional transfer time would be too good to be true ;).
To expand on this: to measure one-way delay, you would need finely synchronised clocks (to within a couple of ms) at both ends. The average consumer doesn't have that sort of setup - not even if they happen to use NTP. So it's not a goal worth pursuing at this level - save it for scientific investigations, where the necessary effort and equipment can be made available.
>> If I gave you a large collection of latency data from a test run, how do you
>> reduce it to something simple that a marketer could compare with the results
>> from another test run?
>
> I believe the added latency under load would be a marketable number, but we had a discussion in the past where it was argued that marketing wants a number which increases with goodness, so larger = better, something the raw difference is not going to deliver…
The obvious solution is to report the inverse value in Hz, a figure of merit that gamers are familiar with (if not exactly in this context).
For example, I occasionally play World Of Tanks, which has a latency meter (in ms) in the top corner, and I can roughly invert that in my head. If I set my shaper too optimistically, I see something like 500ms whenever something is updating in the background; once I adjust the setting to a more conservative value, it drops straight back to 100ms - it's a 3G connection, so it's not going to get much better than that. The corresponding inverted readings would be 2Hz (where I miss most of my shots) and 10Hz (where I can carry the team). It's probably worth reporting the figure to one decimal place.
WoT isn't exactly the "twitchiest" of online games, either - have you any idea how long it takes to aim a tank gun? Even so, when some tanks can move at over 30km/h, a half-second difference in position is a whole tank length, so with the slower response I no longer know *where* or *when* to fire, unless the target is a sitting duck. Even though my framerate is at 30Hz or more and appears smooth, my performance as a player is dependent on the Internet's response frequency, because that is lower.
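The inversion itself is just a single division. A minimal sketch in Python of the figures above (the function name is mine, purely for illustration):

    def response_hz(latency_ms):
        # Invert a latency in milliseconds into a response frequency in Hz.
        return 1000.0 / latency_ms

    print("%.1f Hz" % response_hz(500))   # 2.0 Hz - shaper set too optimistically
    print("%.1f Hz" % response_hz(100))   # 10.0 Hz - shaper set conservatively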
So here's the outline of a step-by-step methodology:
- Prepare space for a list of latency measurements. Each measurement needs to be tagged with information about what else was going on at the same time, i.e. idle/upload/download/both. Some latency probes may be lost, and this fact should also be recorded on a per-tag basis.
- Start taking latency measurements, tagging as idle to begin with. Keep on doing so continuously, changing the tag as required, until several seconds after the bandwidth measurements are complete.
- Latency measurements should be taken sufficiently frequently (several times a second is okay) that there will be at least a hundred samples with each tag, and the frequency of sampling should not change during the test. Each probe must be tagged with a unique ID, so that losses or re-ordering of packets can be detected and don't confuse the measurement.
- The latency probes should use UDP, not ICMP. They should also use the same Diffserv/TOS tag as the bandwidth measurement traffic; the default "best-effort" tag is fine.
- To satisfy the above requirements, the latency tester must *not* wait for a previous reply to return before sending the next probe. It should send at regular intervals based on wall-clock time. But don't send so many probes that they themselves clog the link. (A rough sketch of such a probe loop follows this list.)
- Once several seconds of "idle" samples are recorded, start the download test. Change the tag to "download" at this point.
- The download test is complete when all the data sent by the server has reached the client. Change the tag back to "idle" at this moment.
- Repeat the previous two steps for the upload measurement, using the "upload" tag.
- Repeat again, but perform upload and download tests at the same time (assuming, if necessary, that the bandwidth in each direction will be similar to that previously measured), and use the "both" tag. Uploads and downloads tend to interfere with each other when the loaded response frequency is poor, so don't simply assume that the results will be the same as in the individual tests - *measure* it.
- Once a few seconds of "idle" samples have been taken, stop measuring and start analysis.
- Separate the list of latency samples by tag, and sort the four resulting lists in ascending order.
- In each list, find the sample nearest 98% of the way through the list. This is the "98th percentile", a good way of finding the "highest" value while discarding irrelevant outliers. The highest latency is the one that users will notice. Typical poor results: idle 50ms, download 250ms, upload 500ms, both 800ms.
- Correct the 98-percentile latencies for packet loss, by multiplying each one by the number of probes *sent* with the appropriate tag and dividing it by the number of probes *received* with that tag. It is not necessary to report packet loss in any other way, *except* for the "idle" tag. (See the analysis sketch after this list.)
- Convert the corrected 98-percentile latencies into "response frequencies" by dividing one second by them. The typical figures above would become: idle 20.0 Hz, download 4.0 Hz, upload 2.0 Hz, both 1.25 Hz - assuming there was no packet loss. These figures are comparable in meaning and importance to "frames per second" figures in games.
- Report these response frequencies, to a precision of at least one decimal place, alongside the bandwidth figures and with equal visual prominence. For example:
IDLE: Response 20.0 Hz Packet loss 0.00 %
DOWNLOAD: Response 4.0 Hz Bandwidth 20.00 Mbps
UPLOAD: Response 2.0 Hz Bandwidth 2.56 Mbps
BIDIRECT: Response 1.3 Hz Bandwidth 15.23 / 2.35 Mbps
- Improving the response figures in the loaded condition will probably also improve the *bidirectional* bandwidth figures as a side-effect, while having a minimal effect on the *unidirectional* figures. A connection with such qualities can legitimately be described as supporting multiple activities at the same time. A connection with the example figures shown above can *not*.
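To make the probing side concrete, here is a rough Python sketch of the probe loop described above. The server address and port are placeholders, and this is only an illustration of the principle - unique sequence numbers, UDP, a fixed sending rate, and never waiting for a reply - not a finished tool.

    import socket
    import struct
    import time

    PROBE_INTERVAL = 0.1                      # ten probes per second - frequent, but nowhere near enough to clog the link
    SERVER = ("responder.example.net", 9000)  # placeholder UDP echo responder

    sent = {}             # seq -> (send timestamp, tag)
    rtts = {}             # seq -> measured round-trip time in seconds
    current_tag = "idle"  # changed to "download"/"upload"/"both" and back by the test driver

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)

    def run_probes(duration):
        """Send one probe per interval, keyed by sequence number, never waiting for replies."""
        seq = 0
        start = time.monotonic()
        while time.monotonic() - start < duration:
            tick = time.monotonic()
            sent[seq] = (tick, current_tag)
            sock.sendto(struct.pack("!Q", seq), SERVER)
            seq += 1
            # Drain any replies that have already arrived; lost probes simply leave gaps in rtts.
            try:
                while True:
                    data, _ = sock.recvfrom(64)
                    if len(data) >= 8:
                        (echo_seq,) = struct.unpack("!Q", data[:8])
                        if echo_seq in sent and echo_seq not in rtts:
                            rtts[echo_seq] = time.monotonic() - sent[echo_seq][0]
            except BlockingIOError:
                pass
            # Pace on wall-clock time so the sending rate stays constant for the whole test.
            time.sleep(max(0.0, PROBE_INTERVAL - (time.monotonic() - tick)))

Setting the same Diffserv/TOS field as the bandwidth traffic (via setsockopt with IP_TOS, where the platform allows it) is omitted for brevity.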
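And the matching analysis step, again only a sketch, assuming the "sent" and "rtts" dictionaries from the previous snippet: sort the samples per tag, take the 98th percentile, correct for loss, and invert into Hz. The bandwidth figures in the example report come from the separate bandwidth measurement, so they are not computed here.

    def analyse(sent, rtts):
        """Reduce the raw samples to a loss-corrected 98th-percentile response frequency per tag."""
        for tag in ("idle", "download", "upload", "both"):
            n_sent = sum(1 for (_, t) in sent.values() if t == tag)
            samples = sorted(rtt for seq, rtt in rtts.items() if sent[seq][1] == tag)
            if n_sent == 0 or not samples:
                continue
            # The sample nearest 98% of the way through the sorted list.
            p98 = samples[min(len(samples) - 1, int(round(0.98 * (len(samples) - 1))))]
            # Loss correction: scale by probes sent over probes received for this tag.
            corrected = p98 * n_sent / len(samples)
            hz = 1.0 / corrected
            if tag == "idle":
                loss = 100.0 * (1.0 - len(samples) / n_sent)
                print("IDLE:      Response %.1f Hz   Packet loss %.2f %%" % (hz, loss))
            else:
                print("%-10s Response %.1f Hz" % (tag.upper() + ":", hz))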
The next trick is getting across that, these days, more than one person in the average household wants to use the Internet at the same time, and they often want to do different things from each other. Here, the simplest valid argument probably has a lot going for it.
An illustration might help to get the concept across. A household with four users in different rooms: father in the study downloading a video, mother in the kitchen on the (VoIP) phone, son in his bedroom playing a game, daughter in her bedroom uploading photos. All via a single black-box modem and connection. Captions should emphasise that mother and son both need low latency (high response frequency), while father and daughter need high bandwidth (in opposite directions!), and that they're doing all these things at the same time.
- Jonathan Morton