Preliminary results of using GPS to look for clock skew
esr at thyrsus.com
Wed Sep 21 22:11:38 EDT 2011
Dave Taht <dave.taht at gmail.com>:
> It is comforting to know that ntp is working well in your case, and, using
> GPS, we have a verifiable means with decent error bars of checking against
> ntp's algos independently!
Yup. It'll get better as I refine my profiling and gain more insight
into the numbers. My next task is to compute a lower bound for RS-232
transmission time and subtract that from E-S so we know how much of the
dominant component in fix latency is processing time.
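
As a back-of-the-envelope sketch of that lower bound: an asynchronous serial line sends each byte as one start bit, the data bits, an optional parity bit, and the stop bits, so the minimum wire time is just bit count over baud rate. The 4800 baud 8N1 settings and the 74-byte sentence length below are assumptions for illustration (4800 8N1 is the usual NMEA 0183 default), and serial_tx_seconds is a made-up helper name, not anything in GPSD.

```python
def serial_tx_seconds(n_bytes, baud=4800, data_bits=8, parity_bits=0, stop_bits=1):
    """Minimum time in seconds to clock n_bytes over an async serial line."""
    # Every character is framed by one start bit plus parity and stop bits.
    bits_per_char = 1 + data_bits + parity_bits + stop_bits
    return n_bytes * bits_per_char / baud


if __name__ == "__main__":
    # Hypothetical sentence: 72 characters plus CR/LF, at the assumed 4800 8N1.
    sentence_len = 72 + 2
    print(f"lower bound: {serial_tx_seconds(sentence_len) * 1000:.1f} ms")
```

Subtracting this figure from E-S would leave an upper bound on processing time, since the real line can only be slower than the ideal framing arithmetic.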
Er, for other bloat-dev members: I should have said up front that I've
volunteered to be the bufferbloat project's go-to guy on reliable time
sources for network performance profiling. This is a completely
natural extension of the work I've been doing on GPSD since 2005. GPS
gives us atomic-clock time with $40 hardware (provided we're below 60
degrees N or S latitude and can string an antenna somewhere with a
decent skyview). I know almost everything there is to know about
extracting data from these sensors, and what I don't know my two senior
lieutenants on the GPSD project *do* know.
> Two ideas here:
> 1) Run the router WITHOUT ntp enabled at all
> (and/or testing against CLOCK_REALTIME)
> It would be good to know how much the base clock drift is, without
One of the things I don't know, and need to understand, is what the
relationships are among the different realtime clocks. The clock_gettime(3)
manual page is not hugely helpful. It says:
   CLOCK_REALTIME
          System-wide real-time clock. Setting this clock requires
          appropriate privileges.
   CLOCK_MONOTONIC
          Clock that cannot be set and represents monotonic time since
          some unspecified starting point.
   CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)
          Similar to CLOCK_MONOTONIC, but provides access to a raw
          hardware-based time that is not subject to NTP adjustments.
   CLOCK_PROCESS_CPUTIME_ID
          High-resolution per-process timer from the CPU.
   CLOCK_THREAD_CPUTIME_ID
          Thread-specific CPU-time clock.
Er, so what exactly is the relationship between the CLOCK_REALTIME clock
and the time(2) clock? Are they the same? If they're different, how
are they different?
It says the CLOCK_MONOTONIC clock isn't settable, but the CLOCK_MONOTONIC_RAW
text implies that the former may get NTP adjustments. And it doesn't specify
whether the per-process timers are NTP-corrected...I'd guess not, but who's
to know from the above.
Can anyone point me to better documentation on these facilities?
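
Short of better documentation, one rough way to probe this is empirically. The sketch below is an experiment, not an authoritative answer: it brackets a time.time() call between two CLOCK_REALTIME reads, and measures how far CLOCK_MONOTONIC gains on CLOCK_MONOTONIC_RAW over a short interval. It assumes Linux and a Python whose time module exposes clock_gettime() and these clock IDs; the function names are made up for illustration.

```python
import time


def realtime_vs_time():
    """Read time.time() bracketed by two CLOCK_REALTIME reads."""
    before = time.clock_gettime(time.CLOCK_REALTIME)
    middle = time.time()
    after = time.clock_gettime(time.CLOCK_REALTIME)
    return before, middle, after


def monotonic_minus_raw(interval=0.2):
    """Seconds that CLOCK_MONOTONIC gained on CLOCK_MONOTONIC_RAW over interval."""
    m0 = time.clock_gettime(time.CLOCK_MONOTONIC)
    r0 = time.clock_gettime(time.CLOCK_MONOTONIC_RAW)
    time.sleep(interval)
    m1 = time.clock_gettime(time.CLOCK_MONOTONIC)
    r1 = time.clock_gettime(time.CLOCK_MONOTONIC_RAW)
    # The reads are not simultaneous, so a few microseconds of noise is expected.
    return (m1 - m0) - (r1 - r0)


if __name__ == "__main__":
    before, middle, after = realtime_vs_time()
    print(f"time.time() - CLOCK_REALTIME: {middle - before:+.9f} s")
    print(f"MONOTONIC gained on RAW:      {monotonic_minus_raw():+.9f} s")
```

If the first difference stays within call overhead, the two clocks are at least tracking each other closely. If the second is consistently nonzero while ntpd is running, that would fit the reading that CLOCK_MONOTONIC is being slewed, though read-to-read jitter has to be ruled out over many samples first.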
> 2) ReRun all tests under load (example: netperf -l 3600 -H the_router)
I'll do this, for completeness, but I predict it's not going to make
any measurable difference. The indications so far are that neither of
the means of time delivery I have available to check is compute-bound
or disk-I/O-bound at any point in its delivery chain.
So I think they're just going to shrug off any load short of
machine-thrashing-its-guts-out. But part of the point of what I'm
doing is that soon we'll have the test tools to know for *sure* that's
> The followon test to this is to actually start collecting and parsing
> ntp rawstats statistics, which can be easily turned on and collected
> on the router. It's getting-those-stats-somewhere coupled with the
> need to periodically delete these statistic files that's a problem at
> the moment, and really only the former...
> The bufferbloat signal (if it exists) is in the noise that ntp is
> currently (successfully in your case) rejecting.
> There are a couple rawstats parsers floating about, I have part of
> one, hal has another. I committed a major overdesign
> sin in mine by wanting to put it all into a postgres db, ran into
> major data representation problems (time on postgres is different than
> time inside of ntp), and put the work aside (it's on github in the
> same pieces I left it in)
> To enable rawstats collection on the router, modify /etc/ntp.conf to contain:
> statsdir /tmp
> statistics rawstats
> filegen rawstats file rawstats type day enable
> and restart ntp
> on a system protected by apparmor - like ubuntu - it's mildly trickier
> as you need to add
> whereverthelogdiris/rawstats* rwl
> to the /etc/apparmor.d/usr.sbin.ntpd
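
For reference, a minimal sketch of what parsing one rawstats line could look like. It assumes each line starts with the MJD, seconds past midnight, and two addresses, followed by the four NTP timestamps in origin, receive, transmit, destination order, as the ntpd monitoring documentation describes; newer ntpd versions append further fields, which this ignores. The function name and the sample line are made up for illustration. Decimal is used because NTP-era seconds with nine fractional digits do not fit in a float without losing sub-microsecond precision.

```python
from decimal import Decimal


def parse_rawstats_line(line):
    """Parse one rawstats line into addresses plus computed offset and delay."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError("rawstats line has fewer than 8 fields")
    # Assumed field order: origin (t1), receive (t2), transmit (t3), destination (t4).
    t1, t2, t3, t4 = (Decimal(f) for f in fields[4:8])
    return {
        "mjd": int(fields[0]),
        "second": Decimal(fields[1]),
        "addr_a": fields[2],
        "addr_b": fields[3],
        # Standard NTP on-wire calculations.
        "offset": ((t2 - t1) + (t3 - t4)) / 2,
        "delay": (t4 - t1) - (t3 - t2),
    }


if __name__ == "__main__":
    # Fabricated sample line using documentation-range addresses.
    sample = ("55825 7898.123 192.0.2.1 192.0.2.2 "
              "3525646298.100000000 3525646298.170000000 "
              "3525646298.171000000 3525646298.221000000")
    rec = parse_rawstats_line(sample)
    print(rec["offset"], rec["delay"])
```

The per-sample delay is the interesting number for bufferbloat purposes, since it is exactly what ntpd's filters are discarding as noise.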
> The final bit of the cbbd is to actually collect port numbers - so
> stuff on ephemeral ports is known to be from
> natted devices and stuff on 123
> but I'm getting way ahead of myself here.
Agreed. I also think you're complicating life unnecessarily.
If we need rawstats in a form for real-time monitoring, why not modify
NTP to optionally multicast them and avoid all this going to disk? I
have good relations with the NTP guys, and they wouldn't be likely to
resist a feature request with a network-health-monitoring use case
even if we didn't. Let's *use* that zorch for something, rather than
fielding a fragile pile of hacks.
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>