[Bloat] Progress with latency-under-load tool
Jonathan Morton
chromatix99 at gmail.com
Sun Mar 20 14:52:18 PDT 2011
On 20 Mar, 2011, at 10:33 pm, grenville armitage wrote:
>> Here are some numbers I got from a test over a switched 100base-TX LAN:
>>
>> Upload Capacity: 1018 KiB/s
>> Download Capacity: 2181 KiB/s
>> Link Responsiveness: 2 Hz
>> Flow Smoothness: 1 Hz
>>
>> Horrible, isn't it? I deliberately left these machines with standard configurations in order to show that.
>
> Perhaps a tangential 2 cents from me, but I'm unclear how helpful Hertz is as
> a unit of measurement for the challenge of raising awareness of bufferbloat.
Customers won't be asking for "more Hertz" - or at least, none that have any sense. They'll be asking for "more smoothness" for their video streams, or "more responsiveness" for their online games. They already ask for "more bandwidth", not "more megabits per second".
Hertz makes sense as a unit because, when talking about latency or transmission delays, shorter times are better, but people understand "bigger is better" much more easily. Hard drives used to measure their seek times in milliseconds too, but now they are measured in IOPS instead (a trend mostly associated with SSDs, which have IOPS numbers several orders of magnitude better than mechanical drives).
Let me manually translate that for you, though. That Responsiveness rating of 2 Hz means that the practical RTT went above 334 ms - and this on a switched Fast Ethernet being driven by good-quality 1996-vintage hardware on one side and a cheap nettop on the other. It actually reflects not pure latency as 'ping' would have measured it, but a packet loss and the time required to recover from it.
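For anyone who wants to repeat that translation, here is a minimal Python sketch (an illustration, not the tool's own code). It assumes the summary ratings are rounded down to a whole number of Hz, which is an inference from the figures quoted in this message rather than documented behaviour; the function name is made up for the example.

```python
def delay_bounds_ms(rating_hz):
    """Bounds, in milliseconds, on the delay behind a whole-number Hz rating.

    Assumption (not confirmed by the tool): a rating of n means the true
    frequency f satisfies n <= f < n + 1, so the delay 1/f lies between
    1/(n + 1) (exclusive) and 1/n (inclusive).
    """
    if rating_hz < 0:
        raise ValueError("rating must be non-negative")
    lower = 1000.0 / (rating_hz + 1)
    upper = 1000.0 / rating_hz if rating_hz > 0 else float("inf")
    return lower, upper


# The two summary ratings from the LAN test quoted above.
for name, hz in (("Link Responsiveness", 2), ("Flow Smoothness", 1)):
    lower, upper = delay_bounds_ms(hz)
    print(f"{name}: {hz} Hz -> more than {lower:.0f} ms, at most {upper:.0f} ms")
```

Under that assumption, 2 Hz corresponds to a delay somewhere between roughly 333 ms and 500 ms, and 1 Hz to somewhere between 500 ms and a full second.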
And the "smoothness" rating actually contrived to be *worse*, at somewhere north of 500 ms. At Fast Ethernet speeds, that implies megabytes of buffering, on what really is a very old computer.
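The "megabytes of buffering" estimate is just rate multiplied by delay. A rough Python sketch of the arithmetic follows; the 0.5 s figure is the lower bound on the delay mentioned above, and applying the measured 2181 KiB/s download figure to the same delay is an assumption made purely for illustration.

```python
FAST_ETHERNET_BITS_PER_S = 100_000_000  # nominal 100base-TX line rate


def queued_bytes(rate_bits_per_s, delay_s):
    """Bytes that must be sitting in queues to add delay_s at the given rate."""
    return rate_bits_per_s / 8 * delay_s


# At the nominal line rate, half a second of queueing:
at_line_rate = queued_bytes(FAST_ETHERNET_BITS_PER_S, 0.5)
print(f"line rate: {at_line_rate / 1e6:.2f} MB")

# At the measured download throughput (2181 KiB/s), same delay:
measured_bits_per_s = 2181 * 1024 * 8
at_measured = queued_bytes(measured_bits_per_s, 0.5)
print(f"measured:  {at_measured / 1e6:.2f} MB")
```

Either way the answer is over a megabyte: about 6.25 MB at line rate, and about 1.1 MB even at the much lower measured throughput.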
It's a clear sign of something very broken in the network stack, especially as I get broadly similar results (with much higher throughput numbers) when I run the same test on GigE hardware with much more powerful computers (which can actually saturate GigE).
You really don't want to see what I got on my 3G test runs. I got 0.09 Hz from a single flow (that is, a delay of around 11 seconds), and these tests run all the way up to 32 flows. I think the modem switched down into GPRS mode for a few minutes as well, even though there was no obvious change in propagation conditions.
> And resolution past 3 significant digits from there seems
> possible with posix timers.
If the latency were staying in the single-digit milliseconds as it should on a LAN, you'd have three significant figures with just integers. I do print the smoothness numbers out to 2 decimal places for the individual scenarios, though - these are the numbers meant for investigating problems:
Scenario 1: 0 uploads, 1 downloads... 1343 KiB/s down, 31.52 Hz smoothness
Scenario 2: 1 uploads, 0 downloads... 3670 KiB/s up, 22.24 Hz smoothness
Scenario 3: 0 uploads, 2 downloads... 2077 KiB/s down, 19.70 Hz smoothness
Scenario 4: 1 uploads, 1 downloads... 2855 KiB/s up, 322 KiB/s down, 6.44 Hz smoothness
That's from a different test, where you can see the effect of a WLAN and having Vegas on one side of the connection.
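To read those per-scenario figures as delays, the Hz values can simply be inverted. The sketch below parses the four lines exactly as printed above; the regular expression and function name are illustrative assumptions, not part of the tool.

```python
import re

REPORT = """\
Scenario 1: 0 uploads, 1 downloads... 1343 KiB/s down, 31.52 Hz smoothness
Scenario 2: 1 uploads, 0 downloads... 3670 KiB/s up, 22.24 Hz smoothness
Scenario 3: 0 uploads, 2 downloads... 2077 KiB/s down, 19.70 Hz smoothness
Scenario 4: 1 uploads, 1 downloads... 2855 KiB/s up, 322 KiB/s down, 6.44 Hz smoothness
"""

# Capture the scenario number and the smoothness figure from each line.
LINE_RE = re.compile(r"Scenario (\d+):.*?(\d+(?:\.\d+)?) Hz smoothness")


def smoothness_ms(report):
    """Map scenario number -> smoothness expressed as a delay in milliseconds."""
    result = {}
    for match in LINE_RE.finditer(report):
        hz = float(match.group(2))
        if hz > 0:  # a zero rating would mean an unbounded delay; skip it
            result[int(match.group(1))] = 1000.0 / hz
    return result


for scenario, ms in smoothness_ms(REPORT).items():
    print(f"Scenario {scenario}: {ms:.1f} ms")
```

That puts the single-download case at about 32 ms and the mixed upload/download case at about 155 ms.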
With that said, since the single-digit results are so ubiquitous, perhaps some extra precision is warranted after all. Perhaps I can take the opportunity to squash some more minor bugs, and add an interpretation of the goodput in terms of gigabytes per month.
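One way such a gigabytes-per-month interpretation could be computed is sketched below in Python. It is only a guess at the idea: it assumes the goodput is sustained around the clock, a 30-day month, and decimal gigabytes (10^9 bytes); none of those choices comes from the tool itself.

```python
SECONDS_PER_DAY = 86_400


def gigabytes_per_month(goodput_kib_per_s, days=30):
    """Data moved in a month at a sustained goodput given in KiB/s.

    Assumptions: the rate holds continuously, the month has `days` days,
    and a gigabyte is 10**9 bytes.
    """
    bytes_per_s = goodput_kib_per_s * 1024
    return bytes_per_s * days * SECONDS_PER_DAY / 1e9


# Upload and download capacities from the LAN test quoted earlier.
for label, kib in (("upload", 1018), ("download", 2181)):
    print(f"{label}: {kib} KiB/s ~ {gigabytes_per_month(kib):.0f} GB/month")
```

On those assumptions, the 1018 KiB/s upload figure works out to roughly 2700 GB per month.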
> How'd they do debloated?
I'm still investigating that, partly as the older hardware wasn't yet set up with kernels capable of running advanced qdiscs. It takes a while to compile a kernel on a Pentium-MMX. I'm also really not sure where to get debloated drivers for a VIA Rhine or a Sun GEM. ;-) Mind you, such a thing may not be needed, if the device drivers are already slim so that cutting txqueuelen is sufficient.
I can't run debloated 3G tests - I don't have access to the buffer controls on the base tower. :-(
- Jonathan