From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <esr@thyrsus.com>
Received: from snark.thyrsus.com (static-71-162-243-5.phlapa.fios.verizon.net
	[71.162.243.5])
	by huchra.bufferbloat.net (Postfix) with ESMTP id 9C07C201146
	for <bloat-devel@lists.bufferbloat.net>;
	Wed, 21 Sep 2011 19:11:38 -0700 (PDT)
Received: by snark.thyrsus.com (Postfix, from userid 23)
	id 1D68A20C341; Wed, 21 Sep 2011 22:11:38 -0400 (EDT)
Date: Wed, 21 Sep 2011 22:11:38 -0400
From: Eric Raymond <esr@thyrsus.com>
To: Dave Taht <dave.taht@gmail.com>
Subject: Re: Preliminary results of using GPS to look for clock skew
Message-ID: <20110922021137.GB21302@thyrsus.com>
References: <20110921230205.2275820C2E5@snark.thyrsus.com>
	<CAA93jw6cdO9ou8JpnRtQw51jHtcuBC5J41Xg3iu6bRPs3MsVdA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <CAA93jw6cdO9ou8JpnRtQw51jHtcuBC5J41Xg3iu6bRPs3MsVdA@mail.gmail.com>
Organization: Eric Conspiracy Secret Labs
X-Eric-Conspiracy: There is no conspiracy
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: Eric Raymond <esr@snark.thyrsus.com>, Hal Murray <hmurray@megapathdsl.net>,
	bloat-devel@lists.bufferbloat.net
X-BeenThere: bloat-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
Reply-To: esr@thyrsus.com
List-Id: "Developers working on AQM, device drivers,
	and networking stacks" <bloat-devel.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/bloat-devel>,
	<mailto:bloat-devel-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/bloat-devel>
List-Post: <mailto:bloat-devel@lists.bufferbloat.net>
List-Help: <mailto:bloat-devel-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/bloat-devel>,
	<mailto:bloat-devel-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Thu, 22 Sep 2011 02:11:38 -0000

Dave Taht <dave.taht@gmail.com>:
> It is comforting to know that ntp is working well in your case, and, using
> GPS, we have a verifiable means with decent error bars of checking against
> ntp's algos independently!

Yup.  It'll get better as I refine my profiling and gain more insight
into the numbers.  My next task is to compute a lower bound for RS-232
transmission time and subtract that from E-S so we know how much of the
dominant component in fix latency is processing time.
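
A minimal sketch of that lower bound, assuming 8N1 framing (one start
bit, eight data bits, one stop bit, so 10 bits on the wire per byte).
The function name min_serial_time is made up for illustration.  The
example figures, 4800 bps and an 82-byte sentence (the NMEA 0183
maximum), are illustrative rather than measurements from this setup.

```python
def min_serial_time(n_bytes, baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Lower bound, in seconds, to clock n_bytes over an async serial line.

    Ignores inter-character gaps, USB adapter buffering, and driver
    latency, all of which can only add to the real figure.
    """
    bits_per_byte = 1 + data_bits + parity_bits + stop_bits  # 1 start bit
    return n_bytes * bits_per_byte / baud


if __name__ == "__main__":
    # Assumed example: one maximum-length NMEA sentence at 4800 bps, 8N1.
    print(f"{min_serial_time(82, 4800) * 1000:.1f} ms")
```

Subtracting this figure from E-S would leave an upper bound on the
processing-time share, since everything the sketch ignores is still
lumped in with it.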

Er, for other bloat-dev members: I should have said up front that I've
volunteered to be the bufferbloat project's go-to guy on reliable time
sources for network performance profiling.  This is a completely
natural extension of the work I've been doing on GPSD since 2005.  GPS
gives us atomic-clock time with $40 hardware (provided we're below 60
degrees N or S latitude and can string an antenna somewhere with a
decent skyview).  I know almost everything there is to know about
extracting data from these sensors, and what I don't know my two senior
lieutenants on the GPSD project *do* know.

> Two ideas here:
> 
> 1) Run the router WITHOUT ntp enabled at all
>    (and/or testing against CLOCK_REALTIME)
>    It would be good to know how much the base clock drift is, without
> correction.

One of the things I don't know, and need to understand, is what the
relationships are among the different realtime clocks. The clock_gettime(3)
manual page is not hugely helpful.  It says:

       CLOCK_REALTIME
              System-wide real-time clock.  Setting this clock requires appro‐
              priate privileges.

       CLOCK_MONOTONIC
              Clock  that  cannot  be  set and represents monotonic time since
              some unspecified starting point.

       CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)
              Similar to CLOCK_MONOTONIC, but provides access to a  raw  hard‐
              ware-based time that is not subject to NTP adjustments.

       CLOCK_PROCESS_CPUTIME_ID
              High-resolution per-process timer from the CPU.

       CLOCK_THREAD_CPUTIME_ID
              Thread-specific CPU-time clock.

Er, so what exactly is the relationship between the CLOCK_REALTIME clock
and the time(2) clock?  Are they the same?  If they're different, how 
are they different?

It says the CLOCK_MONOTONIC clock isn't settable, but the CLOCK_MONOTONIC_RAW
text implies that the former may get NTP adjustments.  And it doesn't specify
whether the per-process timers are NTP-corrected...I'd guess not, but who's
to know from the above?

Can anyone point me to better documentation on these facilities?
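
Short of better documentation, the relationships can at least be probed
empirically.  A minimal sketch in Python, whose time module wraps
clock_gettime(3) on Unix: the function names sample_clocks and
rate_difference_ppm are made up for illustration.  The readings are
taken one after another rather than simultaneously, so sub-millisecond
differences between them mean nothing.

```python
import time


def sample_clocks():
    """Read time() and each available clock_gettime() clock, back to back."""
    readings = {"time()": time.time()}
    for name in ("CLOCK_REALTIME", "CLOCK_MONOTONIC", "CLOCK_MONOTONIC_RAW"):
        clk = getattr(time, name, None)  # not every platform defines all three
        if clk is not None:
            readings[name] = time.clock_gettime(clk)
    return readings


def rate_difference_ppm(interval=1.0):
    """Apparent rate of CLOCK_MONOTONIC relative to CLOCK_MONOTONIC_RAW, in ppm.

    Returns None where CLOCK_MONOTONIC_RAW is unavailable.  A nonzero
    result suggests CLOCK_MONOTONIC is being rate-adjusted; short
    intervals are dominated by measurement jitter.
    """
    raw = getattr(time, "CLOCK_MONOTONIC_RAW", None)
    if raw is None:
        return None
    m0 = time.clock_gettime(time.CLOCK_MONOTONIC)
    r0 = time.clock_gettime(raw)
    time.sleep(interval)
    m1 = time.clock_gettime(time.CLOCK_MONOTONIC)
    r1 = time.clock_gettime(raw)
    return ((m1 - m0) - (r1 - r0)) / (r1 - r0) * 1e6


if __name__ == "__main__":
    for name, value in sample_clocks().items():
        print(f"{name:20s} {value:.6f}")
    print("MONOTONIC vs RAW rate difference (ppm):", rate_difference_ppm(10.0))
```

If time() and CLOCK_REALTIME print essentially the same value, that is
evidence they read the same underlying clock on this machine, though
not proof of what any standard guarantees.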

> 2) ReRun all tests under load (example: netperf -l 3600 -H the_router)

I'll do this, for completeness, but I predict it's not going to make
any measurable difference.  The indications so far are that neither of
the means of time delivery I have available to check is compute-bound
or disk-I/O bound at any point in its delivery chain.

So I think they're just going to shrug off any load short of
machine-thrashing-its-guts-out.  But part of the point of what I'm
doing is that soon we'll have the test tools to know for *sure* that's
true.

> The followon test to this is to actually start collecting and parsing
> ntp rawstats statistics, which can be easily turned on and collected
> on the router. It's getting-those-stats-somewhere coupled with the
> need to periodically delete these statistic files that's a problem at
> the moment, and really only the former...
> 
> The bufferbloat signal (if it exists) is in the noise that ntp is
> currently (successfully in your case) rejecting.
> 
> There are a couple rawstats parsers floating about, I have part of
> one, hal has another. I committed a major overdesign
> sin in mine by wanting to put it all into a postgres db, ran into
> major data representation problems (time on postgres is different than
> time inside of ntp), and put the work aside (it's on github in the
> same pieces I left it in)
> 
> To enable rawstats collection on the router, modify /etc/ntp.conf to contain:
> 
> statsdir /tmp
> statistics rawstats
> filegen rawstats file rawstats type day enable
> 
> and restart ntp
> 
> on a system protected by apparmor - like ubuntu - it's mildly trickier
> as you need to add
> a
> 
> whereverthelogdiris/rawstats* rwl
> 
> to the /etc/apparmor.d/usr.sbin.ntpd
> 
> The final bit of the cbbd is to actually collect port numbers - so
> stuff on ephemeral ports is known to be from
> natted devices and stuff on 123
> 
> but I'm getting way ahead of myself here.

Agreed.  I also think you're complicating life unnecessarily. 

If we need rawstats in a form for real-time monitoring, why not modify
NTP to optionally multicast them and avoid all this going to disk?  I
have good relations with the NTP guys, and they wouldn't be likely to
resist a feature request with a network-health-monitoring use case
even if we didn't. Let's *use* that zorch for something, rather than
fielding a fragile pile of hacks.


-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>