* [Bloat] Graph of bloat
@ 2015-07-08 10:23 Hal Murray
2015-07-08 15:55 ` Dave Taht
0 siblings, 1 reply; 12+ messages in thread
From: Hal Murray @ 2015-07-08 10:23 UTC (permalink / raw)
To: bloat; +Cc: Hal Murray
I was monitoring Google's time servers over the recent leap second. That
graph happened to include some good examples of bloat.
http://www.megapathdsl.net/~hmurray/bloat/google-off-smear-bloat.png
I have a slow DSL line with almost 4 seconds of buffering.
The blobs at -20 and -15 seconds are typical of a single large download. The
column at -6 seconds is typical of several active connections.
--
These are my opinions. I hate spam.
* Re: [Bloat] Graph of bloat
2015-07-08 10:23 [Bloat] Graph of bloat Hal Murray
@ 2015-07-08 15:55 ` Dave Taht
2015-07-08 16:11 ` Jan Ceuleers
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Dave Taht @ 2015-07-08 15:55 UTC (permalink / raw)
To: Hal Murray; +Cc: bloat
On Wed, Jul 8, 2015 at 3:23 AM, Hal Murray <hmurray@megapathdsl.net> wrote:
> I was monitoring Google's time servers over the recent leap second. That
> graph happened to include some good examples of bloat.
>
> http://www.megapathdsl.net/~hmurray/bloat/google-off-smear-bloat.png
>
> I have a slow DSL line with almost 4 seconds of buffering.
>
> The blobs at -20 and -15 seconds are typical of a single large download. The
> column at -6 seconds is typical of several active connections.
That is a very interesting graph! Does ntp adjust system time backward
based on getting nearly all its samples with well over a 1/2 second
of induced delay?
Interestingly (or disturbingly) - dnsmasq just found and fixed a crash
bug that happened when time ran backwards. I can imagine a few other
core utilities/protocols/system services that should be checked for
bad behavior in this case.
http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2015q3/009701.html
>
> --
> These are my opinions. I hate spam.
>
>
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
--
Dave Täht
worldwide bufferbloat report:
http://www.dslreports.com/speedtest/results/bufferbloat
And:
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast
* Re: [Bloat] Graph of bloat
2015-07-08 15:55 ` Dave Taht
@ 2015-07-08 16:11 ` Jan Ceuleers
2015-07-08 16:29 ` Jan Ceuleers
2015-07-08 16:32 ` Dave Taht
2015-07-08 17:53 ` Rich Brown
2015-07-09 10:07 ` Hal Murray
2 siblings, 2 replies; 12+ messages in thread
From: Jan Ceuleers @ 2015-07-08 16:11 UTC (permalink / raw)
To: bloat
On 08/07/15 17:55, Dave Taht wrote:
> That is a very interesting graph! Does ntp adjust system time backward
> based on getting nearly all its samples with well over a 1/2 second
> of induced delay?
If there is a consistent asymmetrical delay then yes.
If the delay asymmetry is not persistent (but only occurs during up or
downloads) then the so-called huff-n-puff filter can be used to factor
it out.
https://www.eecis.udel.edu/~mills/ntp/html/huffpuff.html
Jan
* Re: [Bloat] Graph of bloat
2015-07-08 16:11 ` Jan Ceuleers
@ 2015-07-08 16:29 ` Jan Ceuleers
2015-07-08 19:09 ` Alan Jenkins
2015-07-08 16:32 ` Dave Taht
1 sibling, 1 reply; 12+ messages in thread
From: Jan Ceuleers @ 2015-07-08 16:29 UTC (permalink / raw)
To: bloat
On 08/07/15 18:11, Jan Ceuleers wrote:
> On 08/07/15 17:55, Dave Taht wrote:
>> That is a very interesting graph! Does ntp adjust system time backward
>> based on getting nearly all its samples with well over a 1/2 second
>> of induced delay?
>
> If there is a consistent asymmetrical delay then yes.
Let me qualify that "yes".
Normally ntpd will ensure that the system time as observed by the kernel
and applications always increases monotonically. The exception is where
the system time differs too much from what ntpd considers to be the
correct time and where ntpd is given permission to step the time (e.g.
using the -g command-line switch). In this case ntpd can step backwards.
What I meant in my previous message is that ntpd's idea of true time is
arrived at based on the assumption that the network delay is the same in
both directions to its servers. So if there is a systematically
different delay in one direction relative to the other then this
assumption falls down and ntpd's assessment of true time will be skewed.
The huff-n-puff filter helps in cases where the asymmetry in the delay
is not systematic, e.g. where the upstream channel does not suffer from
bufferbloat.
Jan
* Re: [Bloat] Graph of bloat
2015-07-08 16:11 ` Jan Ceuleers
2015-07-08 16:29 ` Jan Ceuleers
@ 2015-07-08 16:32 ` Dave Taht
2015-07-09 10:08 ` Jan Ceuleers
1 sibling, 1 reply; 12+ messages in thread
From: Dave Taht @ 2015-07-08 16:32 UTC (permalink / raw)
To: Jan Ceuleers; +Cc: bloat
On Wed, Jul 8, 2015 at 9:11 AM, Jan Ceuleers <jan.ceuleers@gmail.com> wrote:
> On 08/07/15 17:55, Dave Taht wrote:
>> That is a very interesting graph! Does ntp adjust system time backward
>> based on getting nearly all its samples with well over a 1/2 second
>> of induced delay?
>
> If there is a consistent asymmetrical delay then yes.
>
> If the delay asymmetry is not persistent (but only occurs during up or
> downloads) then the so-called huff-n-puff filter can be used to factor
> it out.
>
> https://www.eecis.udel.edu/~mills/ntp/html/huffpuff.html
Judging from that graphic... I don't think huff and puff was designed
for the bufferbloated era! So the question remains: in Hal's tests,
did ntp adjust the clock backwards?
> Jan
* Re: [Bloat] Graph of bloat
2015-07-08 15:55 ` Dave Taht
2015-07-08 16:11 ` Jan Ceuleers
@ 2015-07-08 17:53 ` Rich Brown
2015-07-09 10:07 ` Hal Murray
2 siblings, 0 replies; 12+ messages in thread
From: Rich Brown @ 2015-07-08 17:53 UTC (permalink / raw)
To: Dave Taht; +Cc: Hal Murray, bloat
On Jul 8, 2015, at 11:55 AM, Dave Taht <dave.taht@gmail.com> wrote:
> On Wed, Jul 8, 2015 at 3:23 AM, Hal Murray <hmurray@megapathdsl.net> wrote:
>> I was monitoring Google's time servers over the recent leap second. That
>> graph happened to include some good examples of bloat.
>>
>> http://www.megapathdsl.net/~hmurray/bloat/google-off-smear-bloat.png
>>
>> I have a slow DSL line with almost 4 seconds of buffering.
>>
>> The blobs at -20 and -15 seconds are typical of a single large download. The
>> column at -6 seconds is typical of several active connections.
>
> That is a very interesting graph! Does ntp adjust system time backward
> based on getting nearly all its samples with well over a 1/2 second
> of induced delay?
There was a good discussion of this on the NANOG list about a week or so ago, lamenting the leap second. Lots of learned (and other) people chimed in on the pros and cons. See http://mailman.nanog.org/pipermail/nanog/2015-June/076540.html if you're terminally interested.
I think that chart is the clock offset of the Google time servers compared to the Menlo Park NTP server (which is "true time"). At 10 hours before midnight (and the arrival of the leap second), the Google servers report the seconds to be a tiny bit longer than a true second, so that by midnight, they're a full 500 msec (half-second) "ahead". When the leap-second drops in, they're a half-second behind "true" time, and it continues the ramp for the next 10 hours to be back "in sync".
(I may have the sense/direction of this wrong, but you get the idea...)
Rich
* Re: [Bloat] Graph of bloat
2015-07-08 16:29 ` Jan Ceuleers
@ 2015-07-08 19:09 ` Alan Jenkins
0 siblings, 0 replies; 12+ messages in thread
From: Alan Jenkins @ 2015-07-08 19:09 UTC (permalink / raw)
To: Jan Ceuleers, bloat
On 08/07/15 17:29, Jan Ceuleers wrote:
> On 08/07/15 18:11, Jan Ceuleers wrote:
>> On 08/07/15 17:55, Dave Taht wrote:
>>> That is a very interesting graph! Does ntp adjust system time backward
>>> based on getting nearly all its samples with well over a 1/2 second
>>> of induced delay?
>> If there is a consistent asymmetrical delay then yes.
> Let me qualify that "yes".
>
> Normally ntpd will ensure that the system time as observed by the kernel
> and applications always increases monotonically. The exception is where
> the system time differs too much from what ntpd considers to be the
> correct time and where ntpd is given permission to step the time (e.g.
> using the -g command-line switch). In this case ntpd can step backwards.
<googles>. Ouch, I see Dave's point, I had hoped it was unfounded.
You don't mean -g; that applies to startup (the "panic threshold" offset
of 1000 seconds). Note that startup also waits 900 seconds before allowing a
step, explicitly in case of transient bufferbloat.
You mean "ntpd is given permission to step the time, by *not* passing -x".
Normally, the time is slewed if the offset is less than the step
threshold, which is 128 ms by default, and stepped if above the
threshold.
(ow ow ow)
[The -x] option sets the threshold to 600 s, which is well within
the accuracy window to set the clock manually. Note: Since the slew
rate of typical Unix kernels is limited to 0.5 ms/s, each second of
adjustment requires an amortization interval of 2000 s. Thus, an
adjustment as much as 600 s will take almost 14 days to complete.
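The arithmetic in that quoted passage can be checked with a minimal Python sketch. It assumes only the two figures quoted above (a 128 ms default step threshold and a 0.5 ms/s maximum slew rate); the function names are hypothetical, and the sketch ignores everything else ntpd does before stepping, such as waiting out the stepout interval.

```python
# Figures taken from the quoted ntpd documentation.
STEP_THRESHOLD = 0.128      # seconds; default step threshold
MAX_SLEW_RATE = 0.0005      # seconds of correction per second (0.5 ms/s)


def correction_mode(offset, threshold=STEP_THRESHOLD):
    """Return 'slew' for small offsets and 'step' for offsets above the threshold."""
    return "step" if abs(offset) > threshold else "slew"


def slew_duration(offset, slew_rate=MAX_SLEW_RATE):
    """Seconds needed to remove `offset` seconds of error by slewing alone."""
    return abs(offset) / slew_rate


if __name__ == "__main__":
    for offset in (0.05, 1.0, 600.0):
        days = slew_duration(offset) / 86400
        print(f"{offset:7.2f} s: {correction_mode(offset)}, "
              f"slewing alone would take {days:.2f} days")
```

With these numbers each second of error costs 2000 s of slewing, which is where the "almost 14 days" for 600 s comes from.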
I also looked at phk's blog for the ntimed project again. Apparently
everyone filters ntp samples, which suggests massive delays would need
to affect more than one sample/poll interval. But I don't know how many
more than one, or what patterns it would filter in general. (Would it
miss bufferbloat less than some X but greater than the 128 ms
threshold?) Apparently the filters are the hard part - makes sense but
sounds like hard work to analyze.
http://phk.freebsd.dk/time/20141024.html
> What I meant in my previous message is that ntpd's idea of true time is
> arrived at based on the assumption that the network delay is the same in
> both directions to its servers. So if there is a systematically
> different delay in one direction relative to the other then this
> assumption falls down and ntpd's assessment of true time will be skewed.
>
> The huff-n-puff filter helps in cases where the asymmetry in the delay
> is not systematic, e.g. where the upstream channel does not suffer from
> bufferbloat.
I see that's a tinker option which is not enabled by default, requires
manual tuning, and is documented as experimental and as designed for
dialup :-P. IOW in a way it's less about bloat awareness and more the
general problem with NTP (like other projects) being under-resourced
global infrastructure...
PHK's post definitely suggests he's aware of asymmetry in delay events
and exploiting it in the Ntimed filters. OTOH the graphs there don't
seem to represent much bufferbloat. Maybe he could benefit from seeing
Hal's awesome graph (and for context, the equally awesome graphs at
http://www.dslreports.com/speedtest/results/bufferbloat).
Alan
* Re: [Bloat] Graph of bloat
2015-07-08 15:55 ` Dave Taht
2015-07-08 16:11 ` Jan Ceuleers
2015-07-08 17:53 ` Rich Brown
@ 2015-07-09 10:07 ` Hal Murray
2015-07-09 10:55 ` Sebastian Moeller
2015-07-09 15:08 ` Dave Taht
2 siblings, 2 replies; 12+ messages in thread
From: Hal Murray @ 2015-07-09 10:07 UTC (permalink / raw)
To: Dave Taht; +Cc: Hal Murray, bloat
There are several parts to this discussion.
Leap seconds are ugly. The basic problem is that POSIX pretends they don't
exist. That's a carryover from the early days when computer time keeping
didn't have to worry about them. They weren't introduced until 1972. There
should be a second labeled 23:59:60 but most systems just set the clock back
a second and repeat 23:59:59, and all sorts of systems get in trouble when
time goes backwards.
They don't impact daily life like leap years do, so we don't teach kids about them when they learn about leap years. Most people don't even know they exist, and that includes most programmers. An additional complication is that they are unpredictable so you can't wire simple conversions into a chunk of code that gets copied around.
Google decided that it was simpler to "smear" their clocks rather than chase down and fix the bugs in all their code.
Time, technology and leaping seconds
http://googleblog.blogspot.com/2011/09/time-technology-and-leaping-seconds.html
The downside is that all their clocks are off by up to 1/2 second. If you don't need accurate time for legal reasons like stock market trading, their approach is probably a good one. Their internal clocks will all agree with each other, but they won't agree with outside systems that aren't playing the smearing game.
The blog above describes the smear using cosine - no sharp corners. The graph shows a linear smear.
> Does ntp adjust system time backward based on getting nearly all its
> samples with well over a 1/2 second of induced delay?
The idea with smearing is to avoid having to set the clock back. The reference time on that graph is UTC. If your server was using only Google's NTP servers, it would follow that ramp, inserting the leap second over 20 hours rather than all at once by setting the clock back. That's the whole point of the smear. You lie to all your NTP clients and they all follow the same lie.
All that has nothing to do with bloat. It's just background for why I was making the graph.
--------
Now for NTP...
After the typical NTP client-server exchange, the client has 4 time stamps, send and receive for packets going in both directions. If you look at things in the right way, you have N equations and N+1 unknowns. You need one more equation to sort things out.
If you assume that the clocks on both ends are accurate, you can compute the network transit times in both directions.
NTP makes the assumption that the network delays are symmetric. Without bloat, that's generally reasonable. It does screw up on long links with asymmetric routing. If you watch NTP servers over a long distance, you can see steps when the routing changes. On the scale of bloat, those errors are minor. If you had a fast link rather than my slow DSL link they would be significant.
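For reference, here is a minimal Python sketch of the standard offset and delay calculation from the four time stamps, with the symmetric-delay assumption supplying the extra equation. The function name and the example numbers are made up for illustration.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)

    Returns (offset, delay): the server clock minus the client clock,
    assuming equal delay in both directions, and the round-trip network
    delay with the server's processing time removed.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    # Hypothetical exchange: server clock 100 ms ahead of the client,
    # 20 ms each way on the wire, 1 ms of processing on the server.
    offset, delay = ntp_offset_and_delay(100.000, 100.120, 100.121, 100.041)
    print(f"offset {offset:.3f} s, delay {delay:.3f} s")
```

If the two directions are not equal, the offset is wrong by half the difference between them, which is why a few seconds of queueing in one direction matters so much.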
ntpd remembers the last 8 samples to each server. It only uses the one with the lowest round trip time, assuming that the others hit some sort of queueing delay. That filters out occasional bursts of interference or even bloat. It doesn't work for sustained bloat.
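A simplified Python sketch of that idea follows. It is not ntpd's actual clock filter, which does more than this; the class and method names are hypothetical. It only shows why picking the lowest-delay sample out of the last 8 rejects a short burst of queueing but not sustained bloat.

```python
from collections import deque


class MinDelayFilter:
    """Keep the last `size` (offset, delay) samples and prefer the lowest delay."""

    def __init__(self, size=8):
        self.samples = deque(maxlen=size)   # oldest samples fall off automatically

    def add(self, offset, delay):
        self.samples.append((offset, delay))

    def best(self):
        """Return the (offset, delay) sample with the smallest round-trip delay."""
        if not self.samples:
            return None
        return min(self.samples, key=lambda s: s[1])


if __name__ == "__main__":
    f = MinDelayFilter()
    # Seven bloated samples (about 2 s of bogus offset each) and one clean one.
    for _ in range(7):
        f.add(offset=-2.0, delay=4.05)
    f.add(offset=0.001, delay=0.05)
    print("occasional bloat:", f.best())    # the clean sample wins

    # Sustained bloat: eight more bad samples push the clean one out.
    for _ in range(8):
        f.add(offset=-2.0, delay=4.05)
    print("sustained bloat: ", f.best())    # only bloated samples remain
```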
The huff-n-puff filter can be used for sustained bloat - better to coast than get confused. But there needs to be some limit on how long to wait before assuming the current timings are valid because the network has been reconfigured. If your bloat lasts long enough, ntpd will get confused.
In addition to getting the time correct, ntpd is also trying to calibrate the clock frequency so the future time will be more accurate (if the current time is good). That's the "drift". Without that correction, the clock will drift farther from the true time the longer you wait.
Ballpark numbers for the errors in crystals are 10s of PPM (parts per million). One PPM is roughly a second over 2 weeks, so an uncorrected clock is likely to drift seconds per day. I have one system that's off by 138 PPM. (The drift can also correct for minor errors in software.)
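Those ballpark numbers are easy to reproduce. A minimal Python sketch, with a hypothetical helper name, using only the figures mentioned above:

```python
SECONDS_PER_DAY = 86400


def drift_seconds(ppm, elapsed_seconds):
    """Accumulated clock error for a frequency error of `ppm` parts per million."""
    return ppm * 1e-6 * elapsed_seconds


if __name__ == "__main__":
    two_weeks = 14 * SECONDS_PER_DAY
    print(f"1 PPM over 2 weeks: {drift_seconds(1, two_weeks):.2f} s")
    for ppm in (10, 50, 138):
        print(f"{ppm:3d} PPM over 1 day:   {drift_seconds(ppm, SECONDS_PER_DAY):.2f} s")
```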
Normally, ntpd is just making minor corrections. It does that by slewing the clock, that is by fudging the clock frequency so the clock will "drift" in the desired direction. That takes a long time to make large corrections. ntpd will normally step the clock if the correction is over 128 ms.
But stepping the clock backwards is what causes most of the problems. ntpd has command line switches to don't-do-that, and another to allow one step at startup time... There are no simple answers.
--------
> Judging from that graphic... I don't think huff and puff was designed for
> the bufferbloated era! So the question remains: in Hal's tests, did ntp
> adjust the clock backwards?
No. The system that collected that data was getting time from a good local GPS clock. It helps to have a place to stand if you want to collect time data.
Here is a typical pattern from a system using the pool without any huff-n-puff while I did a big download.
8 Jul 22:02:17 ntpd[26705]: 0.0.0.0 061c 0c clock_step -0.259747 s
8 Jul 23:06:24 ntpd[26705]: 0.0.0.0 061c 0c clock_step +0.274448 s
--
These are my opinions. I hate spam.
* Re: [Bloat] Graph of bloat
2015-07-08 16:32 ` Dave Taht
@ 2015-07-09 10:08 ` Jan Ceuleers
0 siblings, 0 replies; 12+ messages in thread
From: Jan Ceuleers @ 2015-07-09 10:08 UTC (permalink / raw)
To: Dave Taht; +Cc: bloat
On 08/07/15 18:32, Dave Taht wrote:
> Judging from that graphic... I don't think huff and puff was designed
> for the bufferbloated era! So the question remains: in Hal's tests,
> did ntp adjust the clock backwards?
Dave,
No, ntpd did not adjust the time backwards.
What's going on in that graph is that it shows the different approach
that Google has taken to leap second insertion relative to ntpd's
implementation (which aligns with the standard).
Ntpd inserts a leap second by having the last UTC minute of (in this case)
June 30th last for 61 seconds rather than 60. The
extra second is number 60, after seconds 0 through 59. This is plotted
on the x axis.
The time reported by the Google server is compared to this on the y
axis. 10 hours prior to UTC midnight they slow down the clock by
1/72000, such that at the end of those 72000 seconds (20 hours) one more
"real" second will have passed than is reflected in the time reported by
the clock.
So obviously this means that during the 10 hours prior to UTC midnight
an offset is gradually built up between the Google servers and the rest
of the world, reaching half a second just prior to midnight. After the
leap second insertion on the standard server the sign of the offset is
inverted and the offset between the two begins to decline. 10 hours
after UTC midnight the Google servers return the rate of their clocks to
the normal "1 second per second" rate.
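The shape of that offset curve can be sketched in a few lines of Python. This is only a model of the linear smear as described above, not Google's implementation. The sign convention (smeared clock minus standard clock, negative while the smeared clock is falling behind) and the choice of t = 0 as the moment the standard clock has finished inserting its leap second are assumptions made for the illustration.

```python
SMEAR_WINDOW = 72_000        # seconds (20 hours), per the description above
HALF_WINDOW = SMEAR_WINDOW // 2


def smear_offset(t):
    """Smeared clock minus standard clock, in seconds.

    t is real elapsed time in seconds relative to the leap second
    insertion on the standard clock (negative before, positive after).
    """
    if t < -HALF_WINDOW or t > HALF_WINDOW:
        return 0.0                                   # outside the smear: clocks agree
    if t < 0:
        return -(t + HALF_WINDOW) / SMEAR_WINDOW     # falling behind, down to -0.5 s
    return 0.5 - t / SMEAR_WINDOW                    # sign flips, then decays to 0


if __name__ == "__main__":
    for hours in (-12, -10, -5, -0.001, 0, 5, 10, 12):
        print(f"{hours:8.3f} h: {smear_offset(hours * 3600):+.4f} s")
```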
Anyway: Hal shared this because of the artifacts below the main graph
and he posits (quite reasonably) that this was due to bufferbloat.
Jan
* Re: [Bloat] Graph of bloat
2015-07-09 10:07 ` Hal Murray
@ 2015-07-09 10:55 ` Sebastian Moeller
2015-07-09 18:27 ` Hal Murray
2015-07-09 15:08 ` Dave Taht
1 sibling, 1 reply; 12+ messages in thread
From: Sebastian Moeller @ 2015-07-09 10:55 UTC (permalink / raw)
To: Hal Murray; +Cc: bloat
Hi Hal,
On Jul 9, 2015, at 12:07 , Hal Murray <hmurray@megapathdsl.net> wrote:
> […]
> NTP makes the assumption that the network delays are symmetric. Without bloat, that's generally reasonable. It does screw up on long links with asymmetric routing. If you watch NTP servers over a long distance, you can see steps when the routing changes. On the scale of bloat, those errors are minor. If you had a fast link rather than my slow DSL link they would be significant.
> [...]
What about the inherent bandwidth and delay asymmetry of DSL links? The bandwidth imbalance alone can easily reach 10:1 or more (faster for ingress). And as far as I know, classical Reed-Solomon forward error correction is most often combined with interleaving (to help against error bursts), and that interleaving is often asymmetric as well (though this asymmetry can go in either direction, so ingress might see more interleaving delay than egress). How much asymmetry can NTP cope with, and does NTP try to assess the one-way delay for both legs of the path to a server? Without much thought I can fool myself into believing there is a viable way around the issue: start with the symmetric-connection fiction, sync the clocks, use the synced clocks to assess the link delay asymmetry, and then re-sync the clocks taking the just-measured asymmetry into account. This seems simple, so most likely it must be wrong ;)
Best Regards
Sebastian
* Re: [Bloat] Graph of bloat
2015-07-09 10:07 ` Hal Murray
2015-07-09 10:55 ` Sebastian Moeller
@ 2015-07-09 15:08 ` Dave Taht
1 sibling, 0 replies; 12+ messages in thread
From: Dave Taht @ 2015-07-09 15:08 UTC (permalink / raw)
To: Hal Murray; +Cc: bloat
>> Judging from that graphic... I don't think huff and puff was designed for
>> the bufferbloated era! So the question remains: in Hal's tests, did ntp
>> adjust the clock backwards?
>
> No. The system that collected that data was getting time from a good local GPS clock. It helps to have a place to stand if you want to collect time data.
>
> Here is a typical pattern from a system using the pool without any huff-n-puff while I did a big download.
> 8 Jul 22:02:17 ntpd[26705]: 0.0.0.0 061c 0c clock_step -0.259747 s
> 8 Jul 23:06:24 ntpd[26705]: 0.0.0.0 061c 0c clock_step +0.274448 s
OK, so, out there, on the billions of machines that dnsmasq runs on,
some are crashing when this happens.
I don't think I am satisfied with the solutions.
http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2015q3/009701.html
* Re: [Bloat] Graph of bloat
2015-07-09 10:55 ` Sebastian Moeller
@ 2015-07-09 18:27 ` Hal Murray
0 siblings, 0 replies; 12+ messages in thread
From: Hal Murray @ 2015-07-09 18:27 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: Hal Murray, bloat
moeller0@gmx.de said:
> What about the inherent bandwidth and delay asymmetry of DSL links?
The clock will be off by the calculated amount, but it will be stable. It
will be consistently off by the same amount.
Suppose server A gets its time from the internet over an asymmetrical link.
Now, suppose that server B out on the internet asks server A for the time.
The errors cancel out and it will get the correct answer.
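A small Python sketch of that argument, using made-up delays (30 ms up, 10 ms down) and a hypothetical helper that simulates one NTP exchange. It assumes the asymmetry is constant and that server B's request and reply cross the same link in the opposite order.

```python
def measured_offset(true_offset, delay_out, delay_back):
    """Offset a client computes for a server whose clock is `true_offset` ahead.

    delay_out is the one-way delay of the request, delay_back of the reply.
    The server is assumed to reply instantly.
    """
    t1 = 0.0                                # client sends (client clock)
    t2 = t1 + delay_out + true_offset       # server receives (server clock)
    t3 = t2                                 # server sends (server clock)
    t4 = t3 - true_offset + delay_back      # client receives (client clock)
    return ((t2 - t1) + (t3 - t4)) / 2.0


UP, DOWN = 0.030, 0.010   # hypothetical one-way delays on A's asymmetric link

# A syncs to a perfect server across the link. With both clocks equal, A
# still measures a nonzero offset and steers its clock by that amount.
a_clock_error = measured_offset(0.0, UP, DOWN)

# B, out on the internet with a correct clock, now asks A for the time.
# B's request travels down A's link and the reply travels up.
seen_by_b = measured_offset(a_clock_error, DOWN, UP)

if __name__ == "__main__":
    print(f"A's clock is off by {a_clock_error * 1000:+.1f} ms")
    print(f"B measures A as off by {seen_by_b * 1000:+.1f} ms")
```

The bias is half the difference between the two one-way delays, and it has the opposite sign when the exchange is started from the other end of the link, so the two cancel.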
--
These are my opinions. I hate spam.