[Rpm] [Starlink] Researchers Seeking Probe Volunteers in USA

Thu Jan 12 15:39:21 EST 2023

Hi Sebastian (et al.),

[I'll comment up here instead of inline.]  

Let me start by saying that I have not been intimately involved with the
IEEE 1588 effort (PTP); however, I was involved in the 802.11 efforts along a
similar vein, just adding the wireless first-hop component and its effects
on PTP.  

What was apparent from the outset was that there was a lack of understanding
of what the terms "to synchronize" or "to be synchronized" actually mean.  It's
not trivial ... because we live in an (approximately, that's another story!)
4-D space-time continuum where the Lorentz metric plays a critical role.
Therein, simultaneity (aka "things happening at the same time") means the
"distance" between two such events is zero and that distance is given by
sqrt(x^2 + y^2 + z^2 - (ct)^2) and the "thing happening" can be the tick of
a clock somewhere. Now since everything is relative (time with respect to
what? / location with respect to where?) it's pretty easy to see that "if
you don't know where you are, you can't know what time it is!" (English
sailors of the 18th century knew this well!) Add to this the fact that if
everything were stationary, nothing would happen (as Einstein said, "Nothing
happens until something moves!"), so special relativity also plays a role.
Clocks on GPS satellites run approx. 7 usec/day slower than those on earth
due to their "speed" (8700 mph roughly)! Then add the consequence that
without mass we wouldn't exist (in these forms at least:-)), and
gravitational effects (aka General Relativity) come into play. Those turn
out to make clocks on GPS satellites run 45usec/day faster than those on
earth!  The net effect is that GPS clocks run about 38usec/day faster than
clocks on earth.  So what does it mean to "synchronize to GPS"?  Point is:
it's a non-trivial question with a very complicated answer.  The reason it
is important to get all this right is that "what ties time and space
together" is the speed of light, and that turns out to be about a
"foot-per-nanosecond" in a vacuum (roughly 300 m/usec).  This means if I am
uncertain about my location by, say, 300 meters, then I am also unsure what
time it is to within a usec, AND vice-versa! 
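
As a rough check on the figures above, the two relativistic rate offsets can be estimated from first-order formulas. This is only a sketch, not a GPS-grade model: it assumes a circular orbit, ignores the ground clock's own rotational velocity and the earth's oblateness, and the constants (GM, mean earth radius, a GPS orbital radius of about 26,560 km) are standard textbook values supplied here, not numbers taken from this thread.

```python
import math

C = 299_792_458.0        # speed of light in vacuum, m/s
GM = 3.986004418e14      # earth's gravitational parameter, m^3/s^2
R_EARTH = 6.371e6        # mean earth radius, m (assumed ground clock location)
R_GPS = 2.656e7          # GPS orbital radius, m (assumed circular orbit)
SECONDS_PER_DAY = 86_400.0


def gps_clock_offsets_us_per_day():
    """Return (special, general, net) rate offsets in usec/day.

    Negative means the satellite clock runs slower than a ground clock.
    """
    v = math.sqrt(GM / R_GPS)                        # circular orbital speed
    sr = -(v * v) / (2.0 * C * C)                    # velocity time dilation
    gr = (GM / (C * C)) * (1.0 / R_EARTH - 1.0 / R_GPS)  # gravitational shift
    to_us_per_day = SECONDS_PER_DAY * 1e6
    return sr * to_us_per_day, gr * to_us_per_day, (sr + gr) * to_us_per_day


def time_uncertainty_us(position_uncertainty_m):
    """Time uncertainty (usec) implied by a position uncertainty (m)."""
    return position_uncertainty_m / C * 1e6


if __name__ == "__main__":
    sr, gr, net = gps_clock_offsets_us_per_day()
    print(f"special relativity: {sr:+.1f} usec/day")
    print(f"general relativity: {gr:+.1f} usec/day")
    print(f"net:                {net:+.1f} usec/day")
    print(f"300 m of position error ~ {time_uncertainty_us(300.0):.2f} usec")
```

With these assumed constants the results land close to the approximate 7, 45 and 38 usec/day quoted above, and 300 m of position uncertainty corresponds to roughly 1 usec.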

All that said, the simplest explanation of synchronization is probably: Two
clocks are synchronized if, when they are brought (slowly) into physical
proximity ("sat next to each other") in the same (quasi-)inertial frame and
the same gravitational potential (not so obvious BTW ... see the FYI below!),
an observer of both would say "they are keeping time identically". Since
this experiment is rarely possible, one can never be "sure" that his clock
is synchronized to any other clock elsewhere. And what does it mean to say
they "were synchronized" when brought together, but now they are not because
they are now in different gravitational potentials? (FYI, there are land
mine detectors being developed on this very principle! I know someone who
actually worked on such a project!) 

This all gets even more complicated when dealing with large networks of
networks in which the "speed of information transmission" can vary depending
on the medium (cf. coaxial cables versus fiber versus microwave links!) In
fact, the atmosphere is one of those media and variations therein result in
the need for "GPS corrections" (cf. RTCM GPS correction messages, RTK, etc.)
in order to get to sub-nsec/cm accuracy.  Point is, if you have a set of
nodes distributed across the country all with GPS and all "synchronized to
GPS time", and a second identical set of nodes (with no GPS) instead
connected with a network of cables and fiber links, all of different lengths
and composition using different carrier frequencies (dielectric constants
vary with frequency!) and "synchronized" to some clock somewhere (using NTP or
PTP), the synchronization of the two sets will be different unless a common
reference clock is used AND all the above effects are taken into account,
and good luck with that! :-) 
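
To put rough numbers on the medium dependence, here is a small sketch that computes one-way propagation delay over the same path length in different media. The refractive indices and the coax velocity factor are typical illustrative values that I am assuming, not measurements; real links also vary with frequency and temperature, which this ignores.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

# Assumed, illustrative effective refractive indices (n = c / v).
MEDIA_INDEX = {
    "vacuum": 1.0,
    "air (microwave link)": 1.0003,
    "optical fiber": 1.47,
    "coax (velocity factor 0.66)": 1.0 / 0.66,
}


def propagation_delay_us(length_m, refractive_index):
    """One-way propagation delay in usec over length_m of a medium."""
    return length_m * refractive_index / C * 1e6


if __name__ == "__main__":
    path_m = 1_000_000.0  # 1000 km, hypothetical path length
    for name, n in MEDIA_INDEX.items():
        print(f"{name:30s} {propagation_delay_us(path_m, n):9.1f} usec")
```

Over a hypothetical 1000 km path, the assumed fiber index gives a delay more than a millisecond longer than the vacuum figure, which is far larger than the sub-usec accuracy being discussed.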

In conclusion, if anyone tells you that clock synchronization in
communication networks is simple ("Just use GPS!"), you should feel free to
chuckle (under your breath if necessary:-)) 

Cheers,

RR

-----Original Message-----
From: Sebastian Moeller [mailto:moeller0 at gmx.de] 
Sent: Thursday, January 12, 2023 12:23 AM
To: Dick Roy
Cc: Rodney W. Grimes; mike.reynolds at netforecast.com; libreqos; David P.
Reed; Rpm; rjmcmahon; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA

Hi RR,

> On Jan 11, 2023, at 22:46, Dick Roy <dickroy at alum.mit.edu> wrote:

> 

>  

>  

> -----Original Message-----

> From: Starlink [mailto:starlink-bounces at lists.bufferbloat.net] On Behalf
Of Sebastian Moeller via Starlink

> Sent: Wednesday, January 11, 2023 12:01 PM

> To: Rodney W. Grimes

> Cc: Dave Taht via Starlink; mike.reynolds at netforecast.com; libreqos; David
P. Reed; Rpm; rjmcmahon; bloat

> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA

>  

> Hi Rodney,

>  

>  

>  

>  

> > On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink at gndrsh.dnsmgr.net>
wrote:

> > 

> > Hello,

> > 

> >     Yall can call me crazy if you want.. but... see below [RWG]

> >> Hi Bob,

> >> 

> >> 

> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
<starlink at lists.bufferbloat.net> wrote:

> >>> 

> >>> My biggest barrier is the lack of clock sync by the devices, i.e. very
limited support for PTP in data centers and in end devices. This limits the
ability to measure one way delays (OWD) and most assume that OWD is 1/2 of
RTT, which typically is a mistake. We know this intuitively with airplane
flight times or even car commute times where the one way time is not 1/2 a
round trip time. Google maps & directions provide a time estimate for the
one way link. It doesn't compute a round trip and divide by two.

> >>> 

> >>> For those that can get clock sync working, the iperf 2 --trip-times
option is useful.

> >> 

> >>    [SM] +1; and yet even with unsynchronized clocks one can try to
measure how latency changes under load and that can be done per direction.
Sure this is far inferior to real reliably measured OWDs, but if life/the
internet deals you lemons....
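
A minimal sketch of the per-direction idea above: with unsynchronized clocks, the raw difference (receive time minus send time) is the true one-way delay plus an unknown clock offset. If that offset is assumed constant over the test (no drift, which is an assumption), subtracting the smallest observed difference leaves the delay increase relative to the best case. The function name and the sample numbers are hypothetical.

```python
def one_way_delay_increase(send_times, recv_times):
    """Per-packet delay increase over the minimum observed raw delay.

    send_times are stamped by the sender's clock, recv_times by the
    receiver's clock. The clocks need not be synchronized, but their
    offset is assumed constant for the duration of the measurement.
    """
    if not send_times or len(send_times) != len(recv_times):
        raise ValueError("need equal-length, non-empty timestamp lists")
    raw = [r - s for s, r in zip(send_times, recv_times)]
    baseline = min(raw)  # best case: true minimum delay plus clock offset
    return [d - baseline for d in raw]


if __name__ == "__main__":
    # Hypothetical data: receiver clock is 1000 s ahead of the sender;
    # true delays are 20 ms, 20 ms, then 70 ms once the link is loaded.
    send = [0.0, 1.0, 2.0]
    recv = [1000.020, 1001.020, 1002.070]
    print([round(x, 3) for x in one_way_delay_increase(send, recv)])
```

This recovers only the change in delay, not the absolute OWD, which is exactly the limitation noted above.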

> > 

> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back
and forth, for that matter any rate test, why not abuse some of that data
and add the fundamental NTP clock sync data and bidirectionally pass each
other's concept of "current time".  IIRC (it's been 25 years since I worked on
NTP at this level) you *should* be able to get a fairly accurate clock delta
between each end, and then use that info and time stamps in the data stream
to compute OWD's.  You need to put 4 time stamps in the packet, and with
that you can compute "offset".

> [RR] For this to work at a reasonable level of accuracy, the timestamping
circuits on both ends need to be deterministic and repeatable as I recall.
Any uncertainty in that process adds to synchronization
errors/uncertainties.
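
For reference, the four-timestamp calculation mentioned above can be sketched as follows. The formulas are the standard NTP offset and round-trip delay estimates; the simulation helper, its parameter names and the numbers are hypothetical. Note the built-in assumption that the forward and reverse paths have equal delay: any asymmetry shows up directly as offset error, on top of the timestamping uncertainty RR describes.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP estimates from four timestamps.

    t1: client send (client clock)    t2: server receive (server clock)
    t3: server send (server clock)    t4: client receive (client clock)
    Returns (offset of server clock relative to client, round-trip delay).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, fwd_delay, rev_delay,
                      t1=100.0, server_processing=0.001):
    """Hypothetical helper: build the four timestamps for one exchange."""
    t2 = t1 + fwd_delay + true_offset      # arrival, read on server clock
    t3 = t2 + server_processing            # reply leaves the server
    t4 = t3 - true_offset + rev_delay      # arrival, read on client clock
    return t1, t2, t3, t4


if __name__ == "__main__":
    # Symmetric path: the estimate matches the true 5 ms offset.
    print(ntp_offset_and_delay(*simulate_exchange(0.005, 0.020, 0.020)))
    # Asymmetric path (30 ms out, 10 ms back): the estimate is wrong
    # by (fwd - rev) / 2 = 10 ms even with perfect timestamps.
    print(ntp_offset_and_delay(*simulate_exchange(0.005, 0.030, 0.010)))
```

In other words, the computed "offset" is only as good as the symmetry assumption, so OWDs derived from it inherit half of any path asymmetry as error.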

>  

>       [SM] Nice idea. I would guess that all timeslot based access
technologies (so starlink, docsis, GPON, LTE?) all distribute "high quality
time" carefully to the "modems", so maybe all that would be needed is to
expose that high quality time to the LAN side of those modems, dressed up as
NTP server?

> [RR] It's not that simple!  Distributing "high-quality time", i.e.
"synchronizing all clocks" does not solve the communication problem in
synchronous slotted MAC/PHYs!

      [SM] I happily believe you, but the same idea of "time slot" needs to
be shared by all nodes, no? So the clocks need to run at a reasonably similar
rate, aka be synchronized (see below).

>  All the technologies you mentioned above are essentially P2P, not
intended for broadcast.  Point is, there is a point controller (aka PoC)
often called a base station (eNodeB, gNodeB, ...) that actually "controls
everything that is necessary to control" at the UE including time, frequency
and sampling time offsets, and these are critical to get right if you want
to communicate, and they are ALL subject to the laws of physics (cf. the
speed of light)! Turns out that what is necessary for the system to function
anywhere near capacity, is for all the clocks governing transmissions from
the UEs to be "unsynchronized" such that all the UE transmissions arrive at
the PoC at the same (prescribed) time!
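
The arrival-time alignment described above can be illustrated with a toy model: each UE must start transmitting earlier by its own propagation delay so that all transmissions reach the PoC at the prescribed instant. This is a deliberately simplified sketch assuming free-space propagation and known distances; in real systems the base station measures and commands the adjustment, and none of that signalling is modeled here. All names and distances are hypothetical.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def transmit_times(distances_m, target_arrival_s):
    """Transmit instants (PoC time base) so all signals arrive together."""
    return [target_arrival_s - d / C for d in distances_m]


def arrival_times(distances_m, tx_times_s):
    """Arrival instants at the PoC for the given transmit instants."""
    return [t + d / C for d, t in zip(distances_m, tx_times_s)]


if __name__ == "__main__":
    distances = [100.0, 5_000.0, 30_000.0]   # hypothetical UE ranges, m
    tx = transmit_times(distances, target_arrival_s=1.0)
    for d, t in zip(distances, tx):
        print(f"UE at {d:8.0f} m transmits {(1.0 - t) * 1e6:7.2f} usec early")
    print(arrival_times(distances, tx))
```

The UE clocks that trigger transmission are therefore deliberately offset from one another, by an amount set by geometry and the speed of light.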

      [SM] Fair enough. I would call clocks that are "in sync", albeit with
individual offsets, synchronized, but I am a layman and that might sound
offensively wrong to experts in the field. But even without the naming, my
point is that all systems that depend on some idea of a shared time-base are
halfway to exposing that time to end users, by "translating" it into an
NTP time source at the modem.

> For some technologies, in particular 5G!, these considerations are
ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you don't
believe me! :-)

      [SM] Far be it from me not to believe you, so thanks for the pointers.
Yet, I still think that unless different nodes of a shared segment move at
significantly different speeds, there should be a common
"tick-duration" for all clocks even if each clock runs at an offset... (I
naively would try to implement something like that by trying to fully
synchronize clocks and maintain a local offset value to convert from
"absolute" time to "network" time, but likely because coming from the
outside I am blissfully unaware of the detail challenges that need to be
solved).

Regards & Thanks

      Sebastian

>  

>  

> > 

> >> 

> >> 

> >>> 

> >>> --trip-times

> >>> enable the measurement of end to end write to read latencies (client
and server clocks must be synchronized)

> > [RWG] --clock-skew

> >     enable the measurement of the wall clock difference between sender
and receiver

> > 

> >> 

> >>    [SM] Sweet!

> >> 

> >> Regards

> >>    Sebastian

> >> 

> >>> 

> >>> Bob

> >>>> I have many kvetches about the new latency under load tests being

> >>>> designed and distributed over the past year. I am delighted! that
they

> >>>> are happening, but most really need third party evaluation, and

> >>>> calibration, and a solid explanation of what network pathologies they

> >>>> do and don't cover. Also a RED team attitude towards them, as well as

> >>>> thinking hard about what you are not measuring (operations research).

> >>>> I actually rather love the new cloudflare speedtest, because it tests

> >>>> a single TCP connection, rather than dozens, and at the same time
folk

> >>>> are complaining that it doesn't find the actual "speed!". yet... the

> >>>> test itself more closely emulates a user experience than
speedtest.net

> >>>> does. I am personally pretty convinced that the fewer numbers of
flows

> >>>> that a web page opens improves the likelihood of a good user

> >>>> experience, but lack data on it.

> >>>> To try to tackle the evaluation and calibration part, I've reached
out

> >>>> to all the new test designers in the hope that we could get together

> >>>> and produce a report of what each new test is actually doing. I've

> >>>> tweeted, linked in, emailed, and spammed every measurement list I
know

> >>>> of, and only to some response, please reach out to other test
designer

> >>>> folks and have them join the rpm email list?

> >>>> My principal kvetches in the new tests so far are:

> >>>> 0) None of the tests last long enough.

> >>>> Ideally there should be a mode where they at least run to "time of

> >>>> first loss", or periodically, just run longer than the

> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons

> >>>> there! It's really bad science to optimize the internet for 20

> >>>> seconds. It's like optimizing a car, to handle well, for just 20

> >>>> seconds.

> >>>> 1) Not testing up + down + ping at the same time

> >>>> None of the new tests actually test the same thing that the infamous

> >>>> rrul test does - all the others still test up, then down, and ping.
It

> >>>> was/remains my hope that the simpler parts of the flent test suite -

> >>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair

> >>>> tests would provide calibration to the test designers.

> >>>> we've got zillions of flent results in the archive published here:

> >>>> https://blog.cerowrt.org/post/found_in_flent/

> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.

> >>> 

> >>>> The new tests have all added up + ping and down + ping, but not up +

> >>>> down + ping. Why??

> >>>> The behaviors of what happens in that case are really non-intuitive,
I

> >>>> know, but... it's just one more phase to add to any one of those new

> >>>> tests. I'd be deliriously happy if someone(s) new to the field

> >>>> started doing that, even optionally, and boggled at how it defeated

> >>>> their assumptions.

> >>>> Among other things that would show...

> >>>> It's the home router industry's dirty secret that darn few "gigabit"

> >>>> home routers can actually forward in both directions at a gigabit.
I'd

> >>>> like to smash that perception thoroughly, but given our starting
point

> >>>> is a gigabit router was a "gigabit switch" - and historically been

> >>>> something that couldn't even forward at 200Mbit - we have a long way

> >>>> to go there.

> >>>> Only in the past year have non-x86 home routers appeared that could

> >>>> actually do a gbit in both directions.

> >>>> 2) Few are actually testing within-stream latency

> >>>> Apple's rpm project is making a stab in that direction. It looks

> >>>> highly likely, that with a little more work, crusader and

> >>>> go-responsiveness can finally start sampling the tcp RTT, loss and

> >>>> markings, more directly. As for the rest... sampling TCP_INFO on

> >>>> windows, and Linux, at least, always appeared simple to me, but I'm

> >>>> discovering how hard it is by delving deep into the rust behind

> >>>> crusader.

> >>>> the goresponsiveness thing is also IMHO running WAY too many streams

> >>>> at the same time, I guess motivated by an attempt to have the test

> >>>> complete quickly?

> >>>> B) To try and tackle the validation problem:ps. Misinformation about
iperf 2 impacts my ability to do this.

> >>> 

> >>>> In the libreqos.io project we've established a testbed where tests
can

> >>>> be plunked through various ISP plan network emulations. It's here:

> >>>> https://payne.taht.net (run bandwidth test for what's currently
hooked

> >>>> up)

> >>>> We could rather use an AS number and at least a ipv4/24 and ipv6/48
to

> >>>> leverage with that, so I don't have to nat the various emulations.

> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,

> >>>> to see more test designers setup a testbed like this to calibrate

> >>>> their own stuff.

> >>>> Presently we're able to test:

> >>>> flent

> >>>> netperf

> >>>> iperf2

> >>>> iperf3

> >>>> speedtest-cli

> >>>> crusader

> >>>> the broadband forum udp based test:

> >>>> https://github.com/BroadbandForum/obudpst

> >>>> trexx

> >>>> There's also a virtual machine setup that we can remotely drive a web

> >>>> browser from (but I didn't want to nat the results to the world) to

> >>>> test other web services.

> >>>> _______________________________________________

> >>>> Rpm mailing list

> >>>> Rpm at lists.bufferbloat.net

> >>>> https://lists.bufferbloat.net/listinfo/rpm

> >>> _______________________________________________

> >>> Starlink mailing list

> >>> Starlink at lists.bufferbloat.net

> >>> https://lists.bufferbloat.net/listinfo/starlink

> >> 

