-----Original Message-----
From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of Sebastian
Moeller via Starlink
Sent: Wednesday, January 11, 2023 12:01 PM
To: Rodney W. Grimes
Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos;
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in
Hi Rodney,
> On Jan 11, 2023, at 19:32, Rodney W. Grimes
<starlink@gndrsh.dnsmgr.net> wrote:
>
> Hello,
>
> Y'all can call me crazy if you want... but see below. [RWG]
>> Hi Bob,
>>
>>
>>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net>
wrote:
>>>
>>> My biggest barrier is the lack of clock sync by the devices, i.e. very
>>> limited support for PTP in data centers and in end devices. This limits
>>> the ability to measure one-way delays (OWD), and most assume that OWD is
>>> 1/2 the RTT, which typically is a mistake. We know this intuitively from
>>> airplane flight times or car commute times, where the one-way time is not
>>> half the round-trip time. Google Maps directions provide a time estimate
>>> for the one-way trip; they don't compute a round trip and divide by two.
>>>
>>> For those that can get clock sync working, the iperf 2
>>> --trip-times option is useful.
>>
>> [SM] +1; and yet even with unsynchronized clocks one can try to measure
>> how latency changes under load, and that can be done per direction. Sure,
>> this is far inferior to real, reliably measured OWDs, but if life/the
>> internet deals you lemons...
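[A minimal sketch of that idea, with made-up timestamp pairs rather than any
particular tool's output: assuming stable (non-drifting) clocks, the apparent
OWD is the true OWD plus an unknown clock offset, and that offset cancels when
comparing loaded against idle samples, per direction.]

    # Sample data is invented for illustration: the receiver clock runs
    # 2 s ahead of the sender clock.
    idle_samples   = [(0.000, 2.020), (1.000, 3.021)]  # (send_ts, recv_ts)
    loaded_samples = [(5.000, 7.095), (6.000, 8.102)]

    def apparent_owd(send_ts, recv_ts):
        # Receiver clock minus sender clock; includes the unknown offset.
        return recv_ts - send_ts

    idle   = [apparent_owd(s, r) for s, r in idle_samples]
    loaded = [apparent_owd(s, r) for s, r in loaded_samples]

    # The unknown offset cancels in the difference:
    delta_ms = (sum(loaded) / len(loaded) - sum(idle) / len(idle)) * 1000
    print(f"latency increase under load: {delta_ms:.1f} ms")  # ~78 ms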
>
> [RWG] iperf2/iperf3, etc. are already moving large amounts of data back and
> forth; for that matter, so is any rate test. Why not abuse some of that data
> and add the fundamental NTP clock sync data, bidirectionally passing each
> other's concept of "current time"? IIRC (it's been 25 years since I worked
> on NTP at this level) you *should* be able to get a fairly accurate clock
> delta between each end, and then use that info and timestamps in the data
> stream to compute OWDs. You need to put 4 timestamps in the packet, and
> with that you can compute the "offset".
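[For reference, the classic four-timestamp computation RWG describes, where
t1 = client send, t2 = server receive, t3 = server send, t4 = client receive.
It assumes a symmetric path; path asymmetry shows up directly as offset error.]

    def ntp_offset_delay(t1, t2, t3, t4):
        # t1 = client send, t2 = server receive,
        # t3 = server send,  t4 = client receive.
        offset = ((t2 - t1) + (t3 - t4)) / 2  # server clock minus client clock
        delay  = (t4 - t1) - (t3 - t2)        # RTT with server hold time removed
        return offset, delay

    # Example: server clock 0.5 s ahead, 15 ms path delay each way,
    # 10 ms server hold time.
    offset, delay = ntp_offset_delay(10.000, 10.515, 10.525, 10.040)
    print(f"offset {offset:.3f} s, path delay {delay:.3f} s")  # 0.500, 0.030

    # With the offset in hand, data-stream timestamps give per-packet OWDs:
    # owd = recv_ts - send_ts - offset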
[RR] For this to work at a reasonable level of accuracy, the timestamping
circuits on both ends need to be deterministic and repeatable, as I recall.
Any uncertainty in that process adds to synchronization errors/uncertainties.
[SM] Nice idea. I would guess that all timeslot-based access technologies (so
Starlink, DOCSIS, GPON, LTE?) distribute "high-quality time" carefully to the
"modems", so maybe all that would be needed is to expose that high-quality
time to the LAN side of those modems, dressed up as an NTP server?
[RR] It’s not that simple! Distributing “high-quality time”, i.e.
“synchronizing all clocks”, does not solve the communication problem in
synchronous slotted MAC/PHYs! All the technologies you mentioned above are
essentially P2P, not intended for broadcast. The point is, there is a point
controller (aka PoC), often called a base station (eNodeB, gNodeB, …), that
actually “controls everything that is necessary to control” at the UE,
including time, frequency, and sampling time offsets. These are critical to
get right if you want to communicate, and they are ALL subject to the laws of
physics (cf. the speed of light)! It turns out that what is necessary for the
system to function anywhere near capacity is for all the clocks governing
transmissions from the UEs to be “unsynchronized”, such that all the UE
transmissions arrive at the PoC at the same (prescribed) time! For some
technologies, in particular 5G, these considerations are ESSENTIAL. Feel free
to scour the 3GPP LTE and 5G RLC and PHY specs if you don’t believe me! :-)
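[A back-of-envelope illustration of RR's point, using the standard LTE
constants (basic time unit Ts = 1/30.72 MHz, timing-advance step = 16*Ts);
the 10 km distance is just an example value.]

    C = 299_792_458.0  # speed of light, m/s

    def timing_advance(distance_m):
        # A UE must *advance* its transmit clock by the round-trip
        # propagation time so its slot lands on time at the controller.
        return 2 * distance_m / C

    print(f"{timing_advance(10_000) * 1e6:.1f} us")  # UE 10 km out: ~66.7 us

    # LTE granularity: one TA step = 16 * Ts = 16 / 30.72 MHz ~ 0.52 us,
    # i.e. roughly 78 m of one-way distance per step.
    ts = 1 / 30.72e6
    print(f"{16 * ts * 1e6:.2f} us per step ~ {16 * ts * C / 2:.0f} m")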
>
>>
>>
>>>
>>> --trip-times
>>> enable the measurement of end-to-end write-to-read
>>> latencies (client and server clocks must be synchronized)
> [RWG] --clock-skew
> enable the measurement of the wall clock
> difference between sender and receiver
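[For concreteness, a --trip-times run might look something like the following;
this assumes iperf 2.0.14 or later, with server and client clocks already
synchronized (e.g. via PTP), and <server> is a placeholder:

    iperf -s -e                                # server, enhanced output
    iperf -c <server> -e --trip-times -t 60    # client; reports write-to-read latency

The exact flag set varies by iperf 2 version; --clock-skew is as proposed
above, not necessarily a shipped option.]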
>
>>
>> [SM] Sweet!
>>
>> Regards
>> Sebastian
>>
>>>
>>> Bob
>>>> I have many kvetches about the new latency under load tests being
>>>> designed and distributed over the past year. I am delighted! that they
>>>> are happening, but most really need third-party evaluation, and
>>>> calibration, and a solid explanation of what network pathologies they
>>>> do and don't cover. Also a RED team attitude towards them, as well as
>>>> thinking hard about what you are not measuring (operations research).
>>>> I actually rather love the new cloudflare speedtest, because it tests
>>>> a single TCP connection, rather than dozens, and at the same time folks
>>>> are complaining that it doesn't find the actual "speed!". Yet... the
>>>> test itself more closely emulates a user experience than speedtest.net
>>>> does. I am personally pretty convinced that the fewer flows a web page
>>>> opens, the better the likelihood of a good user experience, but I lack
>>>> data on it.
>>>> To try to tackle the evaluation and calibration part, I've reached out
>>>> to all the new test designers in the hope that we could get together
>>>> and produce a report of what each new test is actually doing. I've
>>>> tweeted, linked in, emailed, and spammed every measurement list I know
>>>> of, with only some response. Please reach out to other test designer
>>>> folks and have them join the rpm email list?
>>>> My principal kvetches in the new tests so far are:
>>>> 0) None of the tests last long enough.
>>>> Ideally there should be a mode where they at least run to "time of
>>>> first loss", or periodically, just run longer than the
>>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>>>> there! It's really bad science to optimize the internet for 20
>>>> seconds. It's like optimizing a car to handle well for just 20
>>>> seconds.
>>>> 1) Not testing up + down + ping at the same time
>>>> None of the new tests actually test the same thing that the infamous
>>>> rrul test does - all the others still test up, then down, and ping. It
>>>> was/remains my hope that the simpler parts of the flent test suite -
>>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>>>> tests - would provide calibration to the test designers.
>>>> We've got zillions of flent results in the archive published here:
>>>> https://blog.cerowrt.org/post/found_in_flent/
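[For anyone wanting to try that calibration, a typical rrul run against a
netperf server looks something like this; the host name is a placeholder and
the plot/length options are just one reasonable choice:

    flent rrul -H <netperf-server> -l 60 -p all_scaled -o rrul.png

That gives simultaneous upload, download, and latency in one plot.]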
>>>> P.S. Misinformation about iperf 2 impacts my ability to do this.
>>>
>>>> The new tests have all added up + ping and down + ping, but not up +
>>>> down + ping. Why??
>>>> The behavior in that case is really non-intuitive, I know, but... it's
>>>> just one more phase to add to any one of those new tests. I'd be
>>>> deliriously happy if someone(s) new to the field started doing that,
>>>> even optionally, and boggled at how it defeated their assumptions.
>>>> Among other things that would show...
>>>> It's the home router industry's dirty secret that darn few "gigabit"
>>>> home routers can actually forward in both directions at a gigabit. I'd
>>>> like to smash that perception thoroughly, but given that our starting
>>>> point was a "gigabit router" that was really a "gigabit switch" - and
>>>> historically something that couldn't even forward at 200Mbit - we have
>>>> a long way to go there.
>>>> Only in the past year have non-x86 home routers appeared that could
>>>> actually do a gbit in both directions.
>>>> 2) Few are actually testing within-stream latency
>>>> Apple's rpm project is making a stab in that direction. It looks
>>>> highly likely that, with a little more work, crusader and
>>>> go-responsiveness can finally start sampling the TCP RTT, loss, and
>>>> markings more directly. As for the rest... sampling TCP_INFO on
>>>> Windows and Linux, at least, always appeared simple to me, but I'm
>>>> discovering how hard it is by delving deep into the Rust behind
>>>> crusader.
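[On Linux, at least, sampling TCP_INFO really is only a getsockopt() away; a
minimal sketch follows. It is Linux-only, and the unpacking assumes the
long-stable leading fields of struct tcp_info in <linux/tcp.h>; the layout is
ultimately kernel-defined.]

    import socket, struct

    def tcp_rtt_us(sock):
        # First 104 bytes of struct tcp_info: 8 one-byte fields
        # (state, ca_state, retransmits, ...) followed by 32-bit words;
        # tcpi_rtt (smoothed RTT, microseconds) is the 16th word.
        buf = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
        words = struct.unpack("8B24I", buf)
        return words[8 + 15]

    s = socket.create_connection(("example.com", 80))
    print(tcp_rtt_us(s), "us")  # handshake RTT until data starts flowing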
>>>> The goresponsiveness test is also IMHO running WAY too many streams
>>>> at the same time, I guess motivated by an attempt to have the test
>>>> complete quickly?
>>>> B) To try and tackle the validation problem:
>>>
>>>> In the libreqos.io project we've established a testbed where tests can
>>>> be plunked through various ISP plan network emulations. It's here:
>>>> https://payne.taht.net (run the bandwidth test for what's currently
>>>> hooked up)
>>>> We could rather use an AS number and at least an IPv4 /24 and an IPv6
>>>> /48 to leverage with that, so I don't have to NAT the various
>>>> emulations. (And funding, anyone got funding?) Or, as the code is
>>>> GPLv2 licensed, we'd love to see more test designers set up a testbed
>>>> like this to calibrate their own stuff.
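[To be concrete about what an "ISP plan network emulation" boils down to, the
basic ingredients are rate shaping plus added path delay. A standalone sketch
with tc, not LibreQoS's actual configuration; the interface, rate, and delay
are placeholders for a hypothetical 25 Mbit / 20 ms plan:

    tc qdisc add dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:10 htb rate 25mbit ceil 25mbit
    tc qdisc add dev eth0 parent 1:10 handle 10: netem delay 20ms

Real plans also need the reverse direction shaped, e.g. via an ifb device.]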
>>>> Presently we're able to test:
>>>> flent
>>>> netperf
>>>> iperf2
>>>> iperf3
>>>> speedtest-cli
>>>> crusader
>>>> the Broadband Forum UDP-based test:
>>>> https://github.com/BroadbandForum/obudpst
>>>> trexx
>>>> There's also a virtual machine setup that we can remotely drive a web
>>>> browser from (but I didn't want to NAT the results to the world) to
>>>> test other web services.
_______________________________________________
Starlink mailing list
Starlink@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/starlink