An important factor when designing the test is the difference between intra-flow and inter-flow induced latencies, as well as the baseline latency.
In general, AQM by itself controls intra-flow induced latency, while flow isolation (commonly FQ) controls inter-flow induced latency. I consider the latter to be more important to measure.
Baseline latency is a function of the underlying network topology, and is the type of latency most often measured. It should be measured in the no-load condition, but the choice of remote endpoint is critical. Large ISPs could gain an unfair advantage if they can provide a qualifying endpoint within their network, closer to the last-mile links than most realistic Internet services. Conversely, ISPs are unlikely to endorse a measurement scheme which places the endpoints too far away from them.
One reasonable possibility is to use DNS lookups of randomly selected names under gTLDs as the benchmark. There are gTLD DNS servers well-placed in essentially all regions of interest, and effective DNS caching is a legitimate means for an ISP to improve their customers' Internet performance. Random lookups (especially of domains which are known not to exist) should defeat the effects of such caching.
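As a rough sketch of that idea, and not a finished tool, something like the following Python could time lookups of random names that almost certainly do not exist. The gTLD list, the 20-letter label length, and the function names are all my own placeholders. It also goes through the system resolver via `socket.getaddrinfo`, so it measures the whole resolution path rather than querying a gTLD server directly.

```python
import random
import socket
import statistics
import string
import time

# Assumed set of gTLDs to probe; substitute whatever list the test standardises on.
GTLDS = ["com", "net", "org", "info"]


def random_probe_name(rng=random):
    """Build a random 20-letter label under a randomly chosen gTLD."""
    label = "".join(rng.choice(string.ascii_lowercase) for _ in range(20))
    return f"{label}.{rng.choice(GTLDS)}"


def time_lookup(name, resolve=socket.getaddrinfo):
    """Return the seconds taken to resolve `name`, whether or not it exists."""
    start = time.monotonic()
    try:
        resolve(name, None)
    except socket.gaierror:
        pass  # a failed lookup is the expected outcome for a random name
    return time.monotonic() - start


def baseline_latency(samples=10):
    """Median lookup time over several random names, taken with no load applied."""
    return statistics.median(
        time_lookup(random_probe_name()) for _ in range(samples)
    )
```

Taking the median rather than the mean is just one way to keep a single slow lookup from skewing the baseline.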
Induced latency can then be measured by applying a load and comparing the new latency measurement to the baseline. This load can simultaneously be used to measure available throughput. The tests on dslreports offer a decent example of how to do this, but it would be necessary to standardise the load.
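The arithmetic is simple enough to sketch. Assuming latency samples in milliseconds taken before and during the load, and a byte count for the load itself, it might look like this (the function names and the use of the median are my own choices, not anything dslreports is known to do):

```python
import statistics


def induced_latency(baseline_samples, loaded_samples):
    """Extra latency under load: median loaded latency minus median baseline."""
    return statistics.median(loaded_samples) - statistics.median(baseline_samples)


def throughput_bits_per_second(bytes_transferred, seconds):
    """Average throughput achieved by the load over the measurement interval."""
    return bytes_transferred * 8 / seconds
```

For example, baseline samples of 20, 22 and 21 ms against loaded samples of 120, 130 and 125 ms give an induced latency of 104 ms.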
- Jonathan Morton