"Eggert, Lars" writes: > we tried this too. The TCP timestamps are too coarse-grained for > datacenter latency measurements, I think under at least Linux and > FreeBSD they get rounded up to 1ms or something. (Midori, do you > remember the exact value?) Right. Well now that you mention it, I do seem to recall having read that Linux uses the clock ticks (related to the kernel hz value; i.e. between 250 and 1000 hz depending on configuration) as timestamp units. I suppose FreeBSD is similar. > No, but the sender and receiver can agree to embed them every X bytes > in the stream. Yeah, sometimes that timestamp may be transmitted in > two segments, but I guess that should be OK? Right, so a protocol might be something like this (I'm still envisioning this in the context of the netperf TCP_STREAM / TCP_MAERTS tests): 1. Insert a sufficiently accurate timestamp into the TCP bandwidth measurement stream every X bytes (or maybe every X milliseconds?). 2. On the receiver side, look for these timestamps and each time one is received, calculate the delay (also in a sufficiently accurate, i.e. sub-millisecond, unit). Echo this calculated delay back to the sender, probably with a fresh timestamp attached. 3. The sender receives the delay measurements and either just outputs it straight away, or holds on to them until the end of the test and normalises them to be deltas against the minimum observed delay. Now, some possible issues with this: - Are we measuring the right thing? This will measure the time it takes a message to get from the application level on one side to the application level on another. There are a lot of things that could impact this apart from queueing latency; the most obvious one is packet loss and retransmissions which will give some spurious results I suppose (?). Doing the measurement with UDP packets would alleviate this, but then we're back to not being in-stream... - As for point 3, not normalising the result and just outputting the computed delay as-is means that the numbers will be meaningless without very accurately synchronised clocks. On the other hand, not processing the numbers before outputting them will allow people who *do* have synchronised clocks to do something useful with them. Perhaps a --assume-sync-clocks parameter? - Echoing back the delay measurements causes traffic which may or may not be significant; I'm thinking mostly in terms of running bidirectional measurements. Is that significant? A solution could be for the receiver to hold on to all the measurements until the end of the test and then send them back on the control connection. - Is clock drift something to worry about over the timescales of these tests? https://www.usenix.org/legacy/events/iptps10/tech/slides/cohen.pdf seems to suggest it shouldn't be, as long as the tests only run for at most a few minutes. > http://e2epi.internet2.edu/thrulay/ is the original. There are several > variants, but I think they also have been abandoned: Thanks. From what I can tell, the measurement here basically works by something akin to the above: for TCP, the timestamp is just echoed back by the receiver, so roundtrip time is measured. For UDP, the receiver calculates the delay, so presumably clock synchronisation is a prerequisite. So anyway, thoughts? Is the above something worth pursuing? -Toke