From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from snark.thyrsus.com (static-71-162-243-5.phlapa.fios.verizon.net [71.162.243.5])
	by huchra.bufferbloat.net (Postfix) with ESMTP id 9C07C201146
	for ; Wed, 21 Sep 2011 19:11:38 -0700 (PDT)
Received: by snark.thyrsus.com (Postfix, from userid 23)
	id 1D68A20C341; Wed, 21 Sep 2011 22:11:38 -0400 (EDT)
Date: Wed, 21 Sep 2011 22:11:38 -0400
From: Eric Raymond
To: Dave Taht
Subject: Re: Preliminary results of using GPS to look for clock skew
Message-ID: <20110922021137.GB21302@thyrsus.com>
References: <20110921230205.2275820C2E5@snark.thyrsus.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Organization: Eric Conspiracy Secret Labs
X-Eric-Conspiracy: There is no conspiracy
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: Eric Raymond, Hal Murray, bloat-devel@lists.bufferbloat.net
X-BeenThere: bloat-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
Reply-To: esr@thyrsus.com
List-Id: "Developers working on AQM, device drivers, and networking stacks"
X-List-Received-Date: Thu, 22 Sep 2011 02:11:38 -0000

Dave Taht:
> It is comforting to know that ntp is working well in your case, and, using
> GPS, we have a verifiable means with decent error bars of checking against
> ntp's algos independently!

Yup. It'll get better as I refine my profiling and gain more insight
into the numbers. My next task is to compute a lower bound for RS-232
transmission time and subtract that from E-S so we know how much of the
dominant component in fix latency is processing time.

Er, for other bloat-dev members: I should have said up front that I've
volunteered to be the bufferbloat project's go-to guy on reliable time
sources for network performance profiling. This is a completely natural
extension of the work I've been doing on GPSD since 2005.
GPS gives us atomic-clock time with $40 hardware (provided we're below
60 degrees N or S latitude and can string an antenna somewhere with a
decent skyview). I know almost everything there is to know about
extracting data from these sensors, and what I don't know my two senior
lieutenants on the GPSD project *do* know.

> Two ideas here:
>
> 1) Run the router WITHOUT ntp enabled at all
> (and/or testing against CLOCK_REALTIME)
> It would be good to know how much the base clock drift is, without
> correction.

One of the things I don't know, and need to understand, is what the
relationships are among the different realtime clocks. The
clock_gettime(3) manual page is not hugely helpful. It says:

       CLOCK_REALTIME
              System-wide real-time clock.  Setting this clock requires
              appropriate privileges.

       CLOCK_MONOTONIC
              Clock that cannot be set and represents monotonic time since
              some unspecified starting point.

       CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)
              Similar to CLOCK_MONOTONIC, but provides access to a raw
              hardware-based time that is not subject to NTP adjustments.

       CLOCK_PROCESS_CPUTIME_ID
              High-resolution per-process timer from the CPU.

       CLOCK_THREAD_CPUTIME_ID
              Thread-specific CPU-time clock.

Er, so what exactly is the relationship between the CLOCK_REALTIME clock
and the time(2) clock? Are they the same? If they're different, how
are they different? It says the CLOCK_MONOTONIC clock isn't settable,
but the CLOCK_MONOTONIC_RAW text implies that the former may get NTP
adjustments. And it doesn't specify whether the per-process timers are
NTP-corrected...I'd guess not, but who's to know from the above?

Can anyone point me to better documentation on these facilities?

> 2) ReRun all tests under load (example: netperf -l 3600 -H the_router)

I'll do this, for completeness, but I predict it's not going to make any
measurable difference.
The indications so far are that neither of the means of time delivery I
have available to check is compute-bound or disk-I/O bound at any point
in its delivery chain. So I think they're just going to shrug off any
load short of machine-thrashing-its-guts-out.

But part of the point of what I'm doing is that soon we'll have the test
tools to know for *sure* that's true.

> The followon test to this is to actually start collecting and parsing
> ntp rawstats statistics, which can be easily turned on and collected
> on the router. It's getting-those-stats-somewhere coupled with the
> need to periodically delete these statistic files that's a problem at
> the moment, and really only the former...
>
> The bufferbloat signal (if it exists) is in the noise that ntp is
> currently (successfully in your case) rejecting.
>
> There are a couple rawstats parsers floating about, I have part of
> one, hal has another. I committed a major overdesign
> sin in mine by wanting to put it all into a postgres db, ran into
> major data representation problems (time on postgres is different than
> time inside of ntp), and put the work aside (it's on github in the
> same pieces I left it in)
>
> To enable rawstats collection on the router, modify /etc/ntp.conf to contain:
>
> statsdir /tmp
> statistics rawstats
> filegen rawstats file rawstats type day enable
>
> and restart ntp
>
> on a system protected by apparmor - like ubuntu - it's mildly trickier
> as you need to add
> a
>
> whereverthelogdiris/rawstats* rwl
>
> to the /etc/apparmor.d/usr.sbin.ntpd
>
> The final bit of the cbbd is to actually collect port numbers - so
> stuff on ephemeral ports is known to be from
> natted devices and stuff on 123
>
> but I'm getting way ahead of myself here.

Agreed. I also think you're complicating life unnecessarily. If we
need rawstats in a form for real-time monitoring, why not modify NTP to
optionally multicast them and avoid all this going to disk?
I have good relations with the NTP guys, and they wouldn't be likely to
resist a feature request with a network-health-monitoring use case even
if we didn't. Let's *use* that zorch for something, rather than
fielding a fragile pile of hacks.
-- 
		Eric S. Raymond
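
For reference, the RS-232 lower bound mentioned earlier in this message
comes down to simple arithmetic, and a minimal sketch of it follows. It
is not taken from GPSD or from any measurement in this thread. The
function names are made up for illustration, 8N1 framing (10 bits on the
wire per byte) is an assumed default, and the 82-byte sentence length
and the bit rates are illustrative inputs only. The fix_latency argument
stands in for the E-S figure, whose exact definition is in the earlier
message rather than this one.

```python
def rs232_min_transmit_time(nbytes, baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Lower bound, in seconds, on the time to clock nbytes over a serial line.

    Each byte costs one start bit plus its data, parity, and stop bits.
    A real link can only be slower (inter-character gaps, buffering).
    The 8N1 defaults are an assumption, not a property of any given receiver.
    """
    if baud <= 0:
        raise ValueError("baud must be positive")
    bits_per_byte = 1 + data_bits + parity_bits + stop_bits
    return nbytes * bits_per_byte / baud


def processing_time(fix_latency, nbytes, baud):
    """Latency left over once the wire-time lower bound is subtracted.

    fix_latency is a hypothetical stand-in for the measured latency,
    in seconds; the result is an upper bound on processing time.
    """
    return fix_latency - rs232_min_transmit_time(nbytes, baud)


if __name__ == "__main__":
    # Illustrative inputs only: 82 bytes is the NMEA 0183 maximum
    # sentence length, and 4800 bps is that standard's usual rate.
    for baud in (4800, 9600, 38400):
        ms = rs232_min_transmit_time(82, baud) * 1000
        print(f"{baud:>6} bps: at least {ms:.1f} ms for 82 bytes")
```

Since the wire time is only a floor, whatever remains after subtracting
it is an upper bound on processing time, not an exact figure.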