[Bloat] Latency Measurements in Speed Test suites (was: DOCSIS 3+ recommendation?)

Tue Mar 31 00:14:31 EDT 2015

> Currently the 'saturation' meter is pinging away at an unrelated server (dslreports.com)
> Probably it should ping away, and with higher frequency, at one of the servers streaming data in, because then there are more likely to be filled buffers en route?
> Or are the bloated buffers mainly at the customer end of the experience?

Mostly at the customer end (see below).  The core is generally built with sufficient capacity and small buffers (compared to the link capacity). I recommend pinging a topologically-nearby server rather than a central one.

> I can make the saturation meter display RTT directly, and continue during the test as a preference. I don't really want to have it pinging away during the test because it probably slows the result down. Actually I'll have to check that. Definitely on slow lines it would (like GPRS and 3G).

A simple ping, without artificially added payload, is about 64 bytes.  A small UDP packet (whose payload is just a unique cookie) can be used for the same purpose, and is less likely to experience artificial prioritisation.  Four of those a second makes a quarter of a kilobyte per second each way.  That’ll be noticeable on GPRS and analogue modems, but not to anyone else.  I say that as someone who regularly uses 3G.
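As a rough sketch of the arithmetic above, and of what a cookie-only UDP probe payload might look like: the payload layout here (a 4-byte sequence number plus an 8-byte random cookie) and the function names are illustrative assumptions, not an existing protocol, and 56000 bit/s is just the nominal rate of an analogue modem.

```python
import os
import struct

# Hypothetical payload layout: 4-byte sequence number + 8-byte random cookie,
# in network byte order. Nothing standard, just enough to match replies.
PROBE_FORMAT = "!I8s"


def make_probe(seq: int) -> bytes:
    """Build a small probe payload carrying a sequence number and a unique cookie."""
    return struct.pack(PROBE_FORMAT, seq, os.urandom(8))


def parse_probe(data: bytes):
    """Recover (sequence number, cookie) from a probe payload."""
    return struct.unpack(PROBE_FORMAT, data)


def overhead_bytes_per_second(packet_bytes: int, probes_per_second: float) -> float:
    """Traffic generated in one direction by a steady stream of probes."""
    return packet_bytes * probes_per_second


def link_share(packet_bytes: int, probes_per_second: float, link_bits_per_second: float) -> float:
    """Fraction of a link's capacity consumed by the probes."""
    return overhead_bytes_per_second(packet_bytes, probes_per_second) * 8 / link_bits_per_second


if __name__ == "__main__":
    # Four 64-byte packets a second is 256 bytes/s each way.
    print(overhead_bytes_per_second(64, 4))
    # Share of a nominal 56 kbit/s analogue modem versus a 1 Mbit/s line.
    print(f"{link_share(64, 4, 56_000):.1%}")
    print(f"{link_share(64, 4, 1_000_000):.1%}")
```

On those assumptions the probes take a few percent of a dial-up link and a fraction of a percent of anything faster.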

A concept I’d like to introduce you to is “network responsiveness”, which is measured in Hz rather than ms, and thus goes down when latency goes up.  A responsiveness of 10.0Hz corresponds to a 100ms latency, and that’s a useful, rule-of-thumb baseline for acceptable VoIP and gaming performance.  It can be compared fairly directly to the framerate of a video or a graphics card.
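The conversion is just a reciprocal. A minimal sketch, with function names of my own choosing:

```python
def responsiveness_hz(latency_ms: float) -> float:
    """Convert a round-trip latency in milliseconds to responsiveness in Hz."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def latency_ms(responsiveness: float) -> float:
    """Inverse conversion: responsiveness in Hz back to latency in milliseconds."""
    if responsiveness <= 0:
        raise ValueError("responsiveness must be positive")
    return 1000.0 / responsiveness


if __name__ == "__main__":
    print(responsiveness_hz(100))   # the 100ms baseline -> 10.0
    print(responsiveness_hz(1000))  # a full second of latency -> 1.0
```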

> tcptrace on the server side of one stream would immediately reveal average and peak RTT and more. I wonder if that is the goal to be shooting for rather than these more indirect measurements.
> 
> What is the buffer bloat opinion on the ESNet page?
> 
> »fasterdata.es.net/network-tuning ··· r-bloat/
> 
> they say more not less buffers are needed for 10gig, and it’s only a problem with residential.

Datacentres and the public Internet are very different places.  You can’t generalise from one to the other.  The RTTs are very different, for a start - LAN vs WAN scales.

At 10Gbps, a megabyte of buffer will drain in about a millisecond.  What’s more, a megabyte might be enough, because chances are such a fat link is being used by lots of TCP sessions in parallel, so you only need to worry about one or two of those bursting at a given instant.  Since buffers (after the first couple of packets) are used to absorb bursts, that’s all you might need.

Frankly, one of our present problems is getting consumer-grade router hardware to work reliably at 100Mbps or so, a speed which is just starting to become widely available.  There’s only so much you can do with a wimpy, single-core, cost-optimised MIPS CPU, even if it’s attached to lots of GigE and 802.11ac hardware; I’m using an ancient Pentium-MMX as a surprisingly accurate model for these things.  Sufficient buffering isn’t the problem here - it just can’t turn packets around fast enough.

On a more typical rural consumer connection, at 1Mbps, a megabyte of buffer will take about 8 seconds to drain, and is therefore obviously oversized.  Even at 10Mbps, it’ll take most of a second to drain, which is painful.  The AQM systems we’re working on are an answer to that problem - they will automatically act to keep the buffers at a more sensible fill level.  They also isolate flows from each other, so that one bursting or otherwise misbehaving flow won’t interfere (adding that draining latency) with a sparse, latency-sensitive one like VoIP or gaming.
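The drain times are simple division: buffer size in bits over link rate. A small sketch, assuming a decimal megabyte (1,000,000 bytes) and a buffer that starts full with nothing else arriving; the function name is mine:

```python
MEGABYTE = 1_000_000  # assumption: decimal megabyte, in bytes


def drain_time_seconds(buffer_bytes: int, link_bits_per_second: float) -> float:
    """Time for a full buffer to empty onto a link of the given rate."""
    return buffer_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    for label, rate in [("10Gbps", 10e9), ("10Mbps", 10e6), ("1Mbps", 1e6)]:
        print(f"{label}: {drain_time_seconds(MEGABYTE, rate):g} s")
```

That gives 0.8ms at 10Gbps, 0.8s at 10Mbps and 8s at 1Mbps, which is the worst-case latency a full buffer of that size adds to every packet queued behind it.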

It is that last scenario, the one the great majority of consumers experience in practice, that we’d like you to address by measuring latency under load.
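One possible way to summarise such a measurement, as a sketch only: take RTT samples while the line is idle and again while the test is saturating it, and report the difference.  Using the median, and the sample values below, are my own illustrative choices, not measurements from any real line.

```python
from statistics import median


def induced_latency_ms(idle_rtts_ms, loaded_rtts_ms):
    """Extra latency under load: median loaded RTT minus median idle RTT."""
    return median(loaded_rtts_ms) - median(idle_rtts_ms)


if __name__ == "__main__":
    # Made-up sample values, purely to show the calculation.
    idle = [20, 22, 21, 23]
    loaded = [180, 250, 220, 300]
    print(induced_latency_ms(idle, loaded))
```

A result near zero suggests the bottleneck queue is being kept short; a result of hundreds of milliseconds is the bloat we are talking about.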

 - Jonathan Morton