"Eggert, Lars" writes: > that would be a great addition. But I think it will require some > fundamental change to the wrapper (actually , probably to netperf.) Or > at least a solution a complete solution would. Yeah, I was thinking of putting the functionality into netperf. > I'd really like to see some measurement tool (netperf, flowgrind, > etc.) grow support for measuring latencies based on the actual > load-generating data flow. Ideally and assuming fully sync'ed clocks, > I'd like to timestamp each byte of a TCP stream when an app does > write(), and I'd like to timestamp it again when the receiving app > read()s it. The difference between the two timestamps is the latency > that byte saw end-to-end. Well, what the LINCS people have done (the link in my previous mail) is basically this: Sniff TCP packets that have timestamps on them (i.e. with the TCP timestamp option enabled), and compute the delta between the timestamps as a latency measure. Now this only gives an absolute latency measure if the clocks are synchronised; however, if we're interested in measuring queueing latency, i.e. induced *extra* latency, this can be calculated as (latency - min-latency) where min-latency is the minimum observed latency throughout the lifetime of the connection (this is the same mechanism LEDBAT uses, btw). In this case the unknown clock discrepancy cancels out (assuming no clock drift over the course of the measurement, although there's presumably a way to compensate for that, but I haven't been able to get hold of the actual paper even though it's references in several others...). The LINCS paper indicates that the estimates of queueing latency from this method can be fairly accurate. So I guess my question is firstly whether this way of measuring OWD would be worthwhile, and secondly if anyone has any idea whether it will be possible to implement (it would require access to the raw timestamp values of the TCP data packets). Putting timestamps into the TCP stream and reading them out at the other end might work; but is there a way to force each timestamp to be in a separate packet? > That measurement would include the stack/driver latencies which you > don't currently capture with a parallel ping. For datacenter scenarios > with very low RTTs, these sources of latency begin to matter. Yeah, I'm aware of that issue and fixing it was one of the reasons I wanted to do this... :) > I think that Stas' thrulay tool did measure latencies in this way, but > it has accumulated some serious bitrot. Do you know how that worked more specifically and/or do you have a link to the source code? -Toke