[Bloat] [aqm] ping loss "considered harmful"

Mon Mar 2 05:54:45 EST 2015

> On 2 Mar, 2015, at 12:17, Mikael Abrahamsson <swmike at swm.pp.se> wrote:
> 
> On Mon, 2 Mar 2015, Brian Trammell wrote:
> 
>> Gaming protocols do this right - latency measurement is built into the protocol.
> 
> I believe this is the only way to do it properly, and the most likely easiest way to get this deployed would be to use the TCP stack.
> 
> We need to give users an easy-to-understand metric on how well their Internet traffic is working. So the problem here is that the users can't tell how well it's working without resorting to ICMP PING to try to figure out what's going on.
> 
> For instance, if their web browser had insight into what the TCP stack was doing then it could present information a lot better to the user. Instead of telling the user "time to first byte" (which is L4 information), it could tell the less novice user about packet loss, PDV, reordering, RTT, how well concurrent connections to the same IP address are doing, tell more about *why* some connections are slow instead of just saying "it took 5.3 seconds to load this webpage and here are the connections and how long each took". For the novice user there should be some kind of expert system that collects data that you can send to the ISP that also has an expert system to say "it seems your local connection delays packets, please connect to a wired connection and try again". It would know if the problem was excessive delay, excessive delay that varied a lot, packet loss, reordering, or whatever.
> 
> We have a huge amount of information in our TCP stacks that is locked in there and not used properly to help users figure out what's going on, and there is basically zero information flow between the applications using TCP and the TCP stack itself. Each just tries to do its best on its own layer.

This actually seems like a good idea.  Several of those statistics, at least, could be exposed to userspace without incurring any additional overhead in the stack (except for the queries themselves), which is important for high-performance server users.  TCP stacks already track RTT, and sometimes MinRTT - the difference between these two values is a reasonable lower-bound estimate of induced (queueing) latency.
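
As a rough illustration of the RTT-minus-MinRTT idea, here is a minimal Python sketch.  It assumes a Linux host, where getsockopt(TCP_INFO) returns struct tcp_info, and it decodes only the first 104 bytes of that structure (8 one-byte fields followed by 24 unsigned 32-bit fields, per linux/tcp.h).  The names parse_tcp_info, read_tcp_info and InducedLatencyEstimator are made up for this example.  Since not every stack reports MinRTT, the minimum is tracked in the application here, which is an assumption of the sketch rather than anything a stack guarantees.

```python
import socket
import struct

# Assumed layout: the leading 104 bytes of Linux's struct tcp_info,
# i.e. 8 one-byte fields followed by 24 unsigned 32-bit fields.
_TCP_INFO_FMT = "=8B24I"
_TCP_INFO_LEN = struct.calcsize(_TCP_INFO_FMT)  # 104 bytes


def parse_tcp_info(raw):
    """Decode a few fields from a raw struct tcp_info buffer."""
    fields = struct.unpack(_TCP_INFO_FMT, raw[:_TCP_INFO_LEN])
    u32 = fields[8:]  # skip the 8 one-byte fields
    return {
        "rtt_us": u32[15],         # tcpi_rtt: smoothed RTT, microseconds
        "rttvar_us": u32[16],      # tcpi_rttvar: RTT variance, microseconds
        "total_retrans": u32[23],  # tcpi_total_retrans
    }


def read_tcp_info(sock):
    """Query a connected TCP socket (Linux only)."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, _TCP_INFO_LEN)
    return parse_tcp_info(raw)


class InducedLatencyEstimator:
    """Track the lowest RTT seen and report how far each sample is above it."""

    def __init__(self):
        self.min_rtt_us = None

    def update(self, rtt_us):
        if self.min_rtt_us is None or rtt_us < self.min_rtt_us:
            self.min_rtt_us = rtt_us
        # Lower bound on induced latency: current RTT minus best RTT seen.
        return rtt_us - self.min_rtt_us
```

An application would call read_tcp_info() on a connected socket every so often and feed the "rtt_us" value to update(); a result that stays well above zero suggests queueing delay somewhere on the path.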

For stacks which don’t already track all the desirable data, a socket option could be used to turn collection on, allocating the extra space needed to do so.  To maximise portability, therefore, it might be necessary to require that applications set that option before any statistics request is valid, even on stacks which collect it all anyway.

Recent versions of Windows, even, have a semi-magic system which gives a little indicator of whether your connection has functioning Internet connectivity or not.  This could be extended, if Microsoft saw fit, to interpret these statistics and notify the user that their connection was behaving badly in the ways we now find interesting.  Whether Microsoft will do such a thing (which would undoubtedly piss off every major ISP on the planet) is another matter, but it’s a concept that can be used by Linux desktops as well, and with less political fallout.

Now, who’s going to knuckle down and implement it?

 - Jonathan Morton