* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
[not found] <202301111832.30BIWevV030127@gndrsh.dnsmgr.net>
@ 2023-01-11 20:01 ` Sebastian Moeller
2023-01-11 21:46 ` Dick Roy
2023-01-11 20:09 ` rjmcmahon
1 sibling, 1 reply; 19+ messages in thread
From: Sebastian Moeller @ 2023-01-11 20:01 UTC (permalink / raw)
To: Rodney W. Grimes
Cc: rjmcmahon, Rpm, mike.reynolds, David P. Reed, libreqos,
Dave Taht via Starlink, bloat
Hi Rodney,
> On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink@gndrsh.dnsmgr.net> wrote:
>
> Hello,
>
> Yall can call me crazy if you want.. but... see below [RWG]
>> Hi Bob,
>>
>>
>>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net> wrote:
>>>
>>> My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one way delays (OWD) and most assume that OWD is 1/2 an RTT, which typically is a mistake. We know this intuitively with airplane flight times or even car commute times where the one way time is not 1/2 a round trip time. Google maps & directions provide a time estimate for the one way link. It doesn't compute a round trip and divide by two.
>>>
>>> For those that can get clock sync working, the iperf 2 --trip-times option is useful.
>>
>> [SM] +1; and yet even with unsynchronized clocks one can try to measure how latency changes under load and that can be done per direction. Sure this is far inferior to real reliably measured OWDs, but if life/the internet deals you lemons....
>
> [RWG] iperf2/iperf3, etc are already moving large amounts of data back and forth, for that matter any rate test, why not abuse some of that data and add the fundamental NTP clock sync data and bidirectionally pass each other's concept of "current time". IIRC (it's been 25 years since I worked on NTP at this level) you *should* be able to get a fairly accurate clock delta between each end, and then use that info and time stamps in the data stream to compute OWDs. You need to put 4 time stamps in the packet, and with that you can compute "offset".
[SM] Nice idea. I would guess that all timeslot-based access technologies (Starlink, DOCSIS, GPON, LTE?) distribute "high quality time" carefully to the "modems", so maybe all that would be needed is to expose that high quality time to the LAN side of those modems, dressed up as an NTP server?
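For reference, the four-timestamp computation described above is the classic NTP offset/delay estimate. A minimal Python sketch (names are illustrative, not iperf's or ntpd's actual API):

# t1 = client send, t2 = server receive, t3 = server send, t4 = client receive
def ntp_offset_and_delay(t1, t2, t3, t4):
    # offset > 0 means the server clock is ahead of the client clock.
    # The estimate assumes symmetric forward/return paths; any path
    # asymmetry goes directly into the offset error.
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

With the offset known, a one-way delay for any later data packet is just receive_time - send_time after mapping both timestamps onto one clock.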
>
>>
>>
>>>
>>> --trip-times
>>> enable the measurement of end to end write to read latencies (client and server clocks must be synchronized)
> [RWG] --clock-skew
> enable the measurement of the wall clock difference between sender and receiver
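A minimal sketch of what such a write-to-read latency measurement implies on the wire (this is not iperf 2's actual implementation): the sender stamps each payload with its wall clock and the receiver subtracts, which is only meaningful if both clocks are synchronized.

import socket
import struct
import time

def send_stamped(sock: socket.socket, payload: bytes) -> None:
    # Prefix the payload with the sender's wall-clock time (network order).
    sock.sendall(struct.pack("!d", time.time()) + payload)

def read_write_to_read_latency(sock: socket.socket, size: int) -> float:
    # Read one stamped record and return write-to-read latency in seconds.
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    (sent,) = struct.unpack("!d", buf[:8])
    return time.time() - sent  # includes any residual clock skew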
>
>>
>> [SM] Sweet!
>>
>> Regards
>> Sebastian
>>
>>>
>>> Bob
>>>> I have many kvetches about the new latency under load tests being
>>>> designed and distributed over the past year. I am delighted! that they
>>>> are happening, but most really need third party evaluation, and
>>>> calibration, and a solid explanation of what network pathologies they
>>>> do and don't cover. Also a RED team attitude towards them, as well as
>>>> thinking hard about what you are not measuring (operations research).
>>>> I actually rather love the new cloudflare speedtest, because it tests
>>>> a single TCP connection, rather than dozens, and at the same time folk
>>>> are complaining that it doesn't find the actual "speed!". yet... the
>>>> test itself more closely emulates a user experience than speedtest.net
>>>> does. I am personally pretty convinced that fewer flows opened by a web
>>>> page improve the likelihood of a good user experience, but lack data
>>>> on it.
>>>> To try to tackle the evaluation and calibration part, I've reached out
>>>> to all the new test designers in the hope that we could get together
>>>> and produce a report of what each new test is actually doing. I've
>>>> tweeted, linked in, emailed, and spammed every measurement list I know
>>>> of, with only some response; please reach out to other test-designer
>>>> folks and have them join the rpm email list?
>>>> My principal kvetches in the new tests so far are:
>>>> 0) None of the tests last long enough.
>>>> Ideally there should be a mode where they at least run to "time of
>>>> first loss", or periodically, just run longer than the
>>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>>>> there! It's really bad science to optimize the internet for 20
>>>> seconds. It's like optimizing a car, to handle well, for just 20
>>>> seconds.
>>>> 1) Not testing up + down + ping at the same time
>>>> None of the new tests actually test the same thing that the infamous
>>>> rrul test does - all the others still test up, then down, and ping. It
>>>> was/remains my hope that the simpler parts of the flent test suite -
>>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>>>> tests would provide calibration to the test designers.
>>>> we've got zillions of flent results in the archive published here:
>>>> https://blog.cerowrt.org/post/found_in_flent/
>>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>
>>>> The new tests have all added up + ping and down + ping, but not up +
>>>> down + ping. Why??
>>>> The behaviors of what happens in that case are really non-intuitive, I
>>>> know, but... it's just one more phase to add to any one of those new
>>>> tests. I'd be deliriously happy if someone(s) new to the field
>>>> started doing that, even optionally, and boggled at how it defeated
>>>> their assumptions.
>>>> Among other things that would show...
>>>> It's the home router industry's dirty secret that darn few "gigabit"
>>>> home routers can actually forward in both directions at a gigabit. I'd
>>>> like to smash that perception thoroughly, but given that our starting
>>>> point for a "gigabit router" was a "gigabit switch" - one that
>>>> historically couldn't even forward at 200Mbit - we have a long way
>>>> to go there.
>>>> Only in the past year have non-x86 home routers appeared that could
>>>> actually do a gbit in both directions.
>>>> 2) Few are actually testing within-stream latency
>>>> Apple's rpm project is making a stab in that direction. It looks
>>>> highly likely, that with a little more work, crusader and
>>>> go-responsiveness can finally start sampling the tcp RTT, loss and
>>>> markings, more directly. As for the rest... sampling TCP_INFO on
>>>> Windows, and Linux, at least, always appeared simple to me, but I'm
>>>> discovering how hard it is by delving deep into the rust behind
>>>> crusader.
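For what it's worth, a Linux-only sketch of such TCP_INFO sampling (the byte offsets assume the stable struct tcp_info layout from linux/tcp.h, tcpi_rtt at offset 68 and tcpi_rttvar at 72, both in microseconds; Windows has an analogous SIO_TCP_INFO ioctl):

import socket
import struct

def sample_tcp_rtt(sock: socket.socket):
    # Works on a connected TCP socket on Linux only.
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    srtt_us, rttvar_us = struct.unpack_from("II", info, 68)
    return srtt_us / 1000.0, rttvar_us / 1000.0  # milliseconds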
>>>> the goresponsiveness thing is also IMHO running WAY too many streams
>>>> at the same time, I guess motivated by an attempt to have the test
>>>> complete quickly?
>>>> B) To try and tackle the validation problem:
>>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>
>>>> In the libreqos.io project we've established a testbed where tests can
>>>> be plunked through various ISP plan network emulations. It's here:
>>>> https://payne.taht.net (run bandwidth test for what's currently hooked
>>>> up)
>>>> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>>>> leverage with that, so I don't have to nat the various emulations.
>>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>>>> to see more test designers setup a testbed like this to calibrate
>>>> their own stuff.
>>>> Presently we're able to test:
>>>> flent
>>>> netperf
>>>> iperf2
>>>> iperf3
>>>> speedtest-cli
>>>> crusader
>>>> the broadband forum udp based test:
>>>> https://github.com/BroadbandForum/obudpst
>>>> trexx
>>>> There's also a virtual machine setup that we can remotely drive a web
>>>> browser from (but I didn't want to nat the results to the world) to
>>>> test other web services.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
[not found] <202301111832.30BIWevV030127@gndrsh.dnsmgr.net>
2023-01-11 20:01 ` [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA Sebastian Moeller
@ 2023-01-11 20:09 ` rjmcmahon
2023-01-12 8:14 ` Sebastian Moeller
1 sibling, 1 reply; 19+ messages in thread
From: rjmcmahon @ 2023-01-11 20:09 UTC (permalink / raw)
To: Rodney W. Grimes
Cc: Sebastian Moeller, Rpm, mike.reynolds, David P. Reed, libreqos,
Dave Taht via Starlink, bloat
Iperf 2 is designed to measure network i/o. Note: It doesn't have to
move large amounts of data. It can support data profiles that don't
drive TCP's CCA as an example.
Two things I've been asked for and avoided:
1) Integrate clock sync into iperf's test traffic
2) Measure and output CPU usages
I think both of these are outside the scope of a tool designed to test
network i/o over sockets; rather, these should be developed & validated
independently of a network i/o tool.
Clock error really isn't about amount/frequency of traffic but rather
getting a periodic high-quality reference. I tend to use GPS pulse per
second to lock the local system oscillator to. As David says, most every
modern handheld computer has the GPS chips to do this already. So to me
it seems more of a policy choice between data center operators and
device mfgs and less of a technical issue.
Bob
> Hello,
>
> Yall can call me crazy if you want.. but... see below [RWG]
>> Hi Bob,
>>
>>
>> > On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net> wrote:
>> >
>> > My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one way delays (OWD) and most assume that OWD is 1/2 an RTT, which typically is a mistake. We know this intuitively with airplane flight times or even car commute times where the one way time is not 1/2 a round trip time. Google maps & directions provide a time estimate for the one way link. It doesn't compute a round trip and divide by two.
>> >
>> > For those that can get clock sync working, the iperf 2 --trip-times option is useful.
>>
>> [SM] +1; and yet even with unsynchronized clocks one can try to
>> measure how latency changes under load and that can be done per
>> direction. Sure this is far inferior to real reliably measured OWDs,
>> but if life/the internet deals you lemons....
>
> [RWG] iperf2/iperf3, etc are already moving large amounts of data
> back and forth, for that matter any rate test, why not abuse some of
> that data and add the fundamental NTP clock sync data and
> bidirectionally pass each other's concept of "current time". IIRC (it's
> been 25 years since I worked on NTP at this level) you *should* be
> able to get a fairly accurate clock delta between each end, and then
> use that info and time stamps in the data stream to compute OWDs.
> You need to put 4 time stamps in the packet, and with that you can
> compute "offset".
>
>>
>>
>> >
>> > --trip-times
>> > enable the measurement of end to end write to read latencies (client and server clocks must be synchronized)
> [RWG] --clock-skew
> enable the measurement of the wall clock difference between sender and
> receiver
>
>>
>> [SM] Sweet!
>>
>> Regards
>> Sebastian
>>
>> >
>> > Bob
>> >> I have many kvetches about the new latency under load tests being
>> >> designed and distributed over the past year. I am delighted! that they
>> >> are happening, but most really need third party evaluation, and
>> >> calibration, and a solid explanation of what network pathologies they
>> >> do and don't cover. Also a RED team attitude towards them, as well as
>> >> thinking hard about what you are not measuring (operations research).
>> >> I actually rather love the new cloudflare speedtest, because it tests
>> >> a single TCP connection, rather than dozens, and at the same time folk
>> >> are complaining that it doesn't find the actual "speed!". yet... the
>> >> test itself more closely emulates a user experience than speedtest.net
>> >> does. I am personally pretty convinced that fewer flows opened by a web
>> >> page improve the likelihood of a good user experience, but lack data
>> >> on it.
>> >> To try to tackle the evaluation and calibration part, I've reached out
>> >> to all the new test designers in the hope that we could get together
>> >> and produce a report of what each new test is actually doing. I've
>> >> tweeted, linked in, emailed, and spammed every measurement list I know
>> >> of, with only some response; please reach out to other test-designer
>> >> folks and have them join the rpm email list?
>> >> My principal kvetches in the new tests so far are:
>> >> 0) None of the tests last long enough.
>> >> Ideally there should be a mode where they at least run to "time of
>> >> first loss", or periodically, just run longer than the
>> >> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>> >> there! It's really bad science to optimize the internet for 20
>> >> seconds. It's like optimizing a car, to handle well, for just 20
>> >> seconds.
>> >> 1) Not testing up + down + ping at the same time
>> >> None of the new tests actually test the same thing that the infamous
>> >> rrul test does - all the others still test up, then down, and ping. It
>> >> was/remains my hope that the simpler parts of the flent test suite -
>> >> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>> >> tests would provide calibration to the test designers.
>> >> we've got zillions of flent results in the archive published here:
>> >> https://blog.cerowrt.org/post/found_in_flent/
>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >
>> >> The new tests have all added up + ping and down + ping, but not up +
>> >> down + ping. Why??
>> >> The behaviors of what happens in that case are really non-intuitive, I
>> >> know, but... it's just one more phase to add to any one of those new
>> >> tests. I'd be deliriously happy if someone(s) new to the field
>> >> started doing that, even optionally, and boggled at how it defeated
>> >> their assumptions.
>> >> Among other things that would show...
>> >> It's the home router industry's dirty secret that darn few "gigabit"
>> >> home routers can actually forward in both directions at a gigabit. I'd
>> >> like to smash that perception thoroughly, but given that our starting
>> >> point for a "gigabit router" was a "gigabit switch" - one that
>> >> historically couldn't even forward at 200Mbit - we have a long way
>> >> to go there.
>> >> Only in the past year have non-x86 home routers appeared that could
>> >> actually do a gbit in both directions.
>> >> 2) Few are actually testing within-stream latency
>> >> Apple's rpm project is making a stab in that direction. It looks
>> >> highly likely, that with a little more work, crusader and
>> >> go-responsiveness can finally start sampling the tcp RTT, loss and
>> >> markings, more directly. As for the rest... sampling TCP_INFO on
>> >> Windows, and Linux, at least, always appeared simple to me, but I'm
>> >> discovering how hard it is by delving deep into the rust behind
>> >> crusader.
>> >> the goresponsiveness thing is also IMHO running WAY too many streams
>> >> at the same time, I guess motivated by an attempt to have the test
>> >> complete quickly?
>> >> B) To try and tackle the validation problem:
>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >
>> >> In the libreqos.io project we've established a testbed where tests can
>> >> be plunked through various ISP plan network emulations. It's here:
>> >> https://payne.taht.net (run bandwidth test for what's currently hooked
>> >> up)
>> >> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>> >> leverage with that, so I don't have to nat the various emulations.
>> >> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>> >> to see more test designers setup a testbed like this to calibrate
>> >> their own stuff.
>> >> Presently we're able to test:
>> >> flent
>> >> netperf
>> >> iperf2
>> >> iperf3
>> >> speedtest-cli
>> >> crusader
>> >> the broadband forum udp based test:
>> >> https://github.com/BroadbandForum/obudpst
>> >> trexx
>> >> There's also a virtual machine setup that we can remotely drive a web
>> >> browser from (but I didn't want to nat the results to the world) to
>> >> test other web services.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-11 20:01 ` [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA Sebastian Moeller
@ 2023-01-11 21:46 ` Dick Roy
2023-01-12 8:22 ` Sebastian Moeller
0 siblings, 1 reply; 19+ messages in thread
From: Dick Roy @ 2023-01-11 21:46 UTC (permalink / raw)
To: 'Sebastian Moeller', 'Rodney W. Grimes'
Cc: mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'rjmcmahon', 'bloat'
-----Original Message-----
From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of
Sebastian Moeller via Starlink
Sent: Wednesday, January 11, 2023 12:01 PM
To: Rodney W. Grimes
Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David
P. Reed; Rpm; rjmcmahon; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
Hi Rodney,
> On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink@gndrsh.dnsmgr.net>
wrote:
>
> Hello,
>
> Yall can call me crazy if you want.. but... see below [RWG]
>> Hi Bob,
>>
>>
>>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
<starlink@lists.bufferbloat.net> wrote:
>>>
>>> My biggest barrier is the lack of clock sync by the devices, i.e. very
limited support for PTP in data centers and in end devices. This limits the
ability to measure one way delays (OWD) and most assume that OWD is 1/2 an
RTT, which typically is a mistake. We know this intuitively with airplane
flight times or even car commute times where the one way time is not 1/2 a
round trip time. Google maps & directions provide a time estimate for the
one way link. It doesn't compute a round trip and divide by two.
>>>
>>> For those that can get clock sync working, the iperf 2 --trip-times
option is useful.
>>
>> [SM] +1; and yet even with unsynchronized clocks one can try to
measure how latency changes under load and that can be done per direction.
Sure this is far inferior to real reliably measured OWDs, but if life/the
internet deals you lemons....
>
> [RWG] iperf2/iperf3, etc are already moving large amounts of data back and
forth, for that matter any rate test, why not abuse some of that data and
add the fundamental NTP clock sync data and bidirectionally pass each other's
concept of "current time". IIRC (it's been 25 years since I worked on NTP at
this level) you *should* be able to get a fairly accurate clock delta
between each end, and then use that info and time stamps in the data stream
to compute OWDs. You need to put 4 time stamps in the packet, and with
that you can compute "offset".
[RR] For this to work at a reasonable level of accuracy, the timestamping
circuits on both ends need to be deterministic and repeatable as I recall.
Any uncertainty in that process adds to synchronization
errors/uncertainties.
[SM] Nice idea. I would guess that all timeslot-based access
technologies (Starlink, DOCSIS, GPON, LTE?) distribute "high quality
time" carefully to the "modems", so maybe all that would be needed is to
expose that high quality time to the LAN side of those modems, dressed up
as an NTP server?
[RR] It's not that simple! Distributing "high-quality time", i.e.
"synchronizing all clocks" does not solve the communication problem in
synchronous slotted MAC/PHYs! All the technologies you mentioned above are
essentially P2P, not intended for broadcast. Point is, there is a point
controller (aka PoC) often called a base station (eNodeB, gNodeB, ...) that
actually "controls everything that is necessary to control" at the UE
including time, frequency and sampling time offsets, and these are critical
to get right if you want to communicate, and they are ALL subject to the
laws of physics (cf. the speed of light)! Turns out that what is necessary
for the system to function anywhere near capacity, is for all the clocks
governing transmissions from the UEs to be "unsynchronized" such that all
the UE transmissions arrive at the PoC at the same (prescribed) time! For
some technologies, in particular 5G!, these considerations are ESSENTIAL.
Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you don't believe
me! :-)
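A back-of-the-envelope illustration of the point, in Python (the numbers are LTE-ish assumptions: a normal cyclic prefix of roughly 4.7 us and a timing-advance step of 16*Ts, about 0.52 us):

C = 299_792_458.0  # speed of light, m/s

def required_advance_us(distance_m: float) -> float:
    # Round-trip propagation time the UE must pre-compensate, in microseconds.
    return 2 * distance_m / C * 1e6

for d in (100, 1_000, 10_000):  # metres from UE to base station
    adv = required_advance_us(d)
    print(f"{d:>6} m -> advance by {adv:6.2f} us (~{adv / 0.52:5.1f} TA steps)")

# At 10 km the required advance (~67 us) dwarfs the ~4.7 us cyclic prefix,
# which is why per-UE timing control - not just a shared wall clock - is
# essential for slotted uplinks.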
>
>>
>>
>>>
>>> --trip-times
>>> enable the measurement of end to end write to read latencies (client and
server clocks must be synchronized)
> [RWG] --clock-skew
> enable the measurement of the wall clock difference between sender and
receiver
>
>>
>> [SM] Sweet!
>>
>> Regards
>> Sebastian
>>
>>>
>>> Bob
>>>> I have many kvetches about the new latency under load tests being
>>>> designed and distributed over the past year. I am delighted! that they
>>>> are happening, but most really need third party evaluation, and
>>>> calibration, and a solid explanation of what network pathologies they
>>>> do and don't cover. Also a RED team attitude towards them, as well as
>>>> thinking hard about what you are not measuring (operations research).
>>>> I actually rather love the new cloudflare speedtest, because it tests
>>>> a single TCP connection, rather than dozens, and at the same time folk
>>>> are complaining that it doesn't find the actual "speed!". yet... the
>>>> test itself more closely emulates a user experience than speedtest.net
>>>> does. I am personally pretty convinced that fewer flows opened by a web
>>>> page improve the likelihood of a good user experience, but lack data
>>>> on it.
>>>> To try to tackle the evaluation and calibration part, I've reached out
>>>> to all the new test designers in the hope that we could get together
>>>> and produce a report of what each new test is actually doing. I've
>>>> tweeted, linked in, emailed, and spammed every measurement list I know
>>>> of, with only some response; please reach out to other test-designer
>>>> folks and have them join the rpm email list?
>>>> My principal kvetches in the new tests so far are:
>>>> 0) None of the tests last long enough.
>>>> Ideally there should be a mode where they at least run to "time of
>>>> first loss", or periodically, just run longer than the
>>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>>>> there! It's really bad science to optimize the internet for 20
>>>> seconds. It's like optimizing a car, to handle well, for just 20
>>>> seconds.
>>>> 1) Not testing up + down + ping at the same time
>>>> None of the new tests actually test the same thing that the infamous
>>>> rrul test does - all the others still test up, then down, and ping. It
>>>> was/remains my hope that the simpler parts of the flent test suite -
>>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>>>> tests would provide calibration to the test designers.
>>>> we've got zillions of flent results in the archive published here:
>>>> https://blog.cerowrt.org/post/found_in_flent/
>>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>
>>>> The new tests have all added up + ping and down + ping, but not up +
>>>> down + ping. Why??
>>>> The behaviors of what happens in that case are really non-intuitive, I
>>>> know, but... it's just one more phase to add to any one of those new
>>>> tests. I'd be deliriously happy if someone(s) new to the field
>>>> started doing that, even optionally, and boggled at how it defeated
>>>> their assumptions.
>>>> Among other things that would show...
>>>> It's the home router industry's dirty secret that darn few "gigabit"
>>>> home routers can actually forward in both directions at a gigabit. I'd
>>>> like to smash that perception thoroughly, but given that our starting
>>>> point for a "gigabit router" was a "gigabit switch" - one that
>>>> historically couldn't even forward at 200Mbit - we have a long way
>>>> to go there.
>>>> Only in the past year have non-x86 home routers appeared that could
>>>> actually do a gbit in both directions.
>>>> 2) Few are actually testing within-stream latency
>>>> Apple's rpm project is making a stab in that direction. It looks
>>>> highly likely, that with a little more work, crusader and
>>>> go-responsiveness can finally start sampling the tcp RTT, loss and
>>>> markings, more directly. As for the rest... sampling TCP_INFO on
>>>> Windows, and Linux, at least, always appeared simple to me, but I'm
>>>> discovering how hard it is by delving deep into the rust behind
>>>> crusader.
>>>> the goresponsiveness thing is also IMHO running WAY too many streams
>>>> at the same time, I guess motivated by an attempt to have the test
>>>> complete quickly?
>>>> B) To try and tackle the validation problem:
>>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>
>>>> In the libreqos.io project we've established a testbed where tests can
>>>> be plunked through various ISP plan network emulations. It's here:
>>>> https://payne.taht.net (run bandwidth test for what's currently hooked
>>>> up)
>>>> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>>>> leverage with that, so I don't have to nat the various emulations.
>>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>>>> to see more test designers setup a testbed like this to calibrate
>>>> their own stuff.
>>>> Presently we're able to test:
>>>> flent
>>>> netperf
>>>> iperf2
>>>> iperf3
>>>> speedtest-cli
>>>> crusader
>>>> the broadband forum udp based test:
>>>> https://github.com/BroadbandForum/obudpst
>>>> trexx
>>>> There's also a virtual machine setup that we can remotely drive a web
>>>> browser from (but I didn't want to nat the results to the world) to
>>>> test other web services.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-11 20:09 ` rjmcmahon
@ 2023-01-12 8:14 ` Sebastian Moeller
2023-01-12 17:49 ` Robert McMahon
0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Moeller @ 2023-01-12 8:14 UTC (permalink / raw)
To: rjmcmahon
Cc: Rodney W. Grimes, Rpm, mike.reynolds, David P. Reed, libreqos,
Dave Taht via Starlink, bloat
Hi Bob,
> On Jan 11, 2023, at 21:09, rjmcmahon <rjmcmahon@rjmcmahon.com> wrote:
>
> Iperf 2 is designed to measure network i/o. Note: It doesn't have to move large amounts of data. It can support data profiles that don't drive TCP's CCA as an example.
>
> Two things I've been asked for and avoided:
>
> 1) Integrate clock sync into iperf's test traffic
[SM] This I understand; measurement conditions can be unsuited for tight time synchronization...
> 2) Measure and output CPU usages
[SM] This one puzzles me; as far as I understand, the only way to properly diagnose network issues is to rule out other things, like CPU overload, that can have symptoms similar to network issues. As an example, if CPU cycles become tight the cake qdisc will first increase its internal queueing and jitter (not consciously; it is just an observation that once cake does not get access to the CPU as timely as it wants, queueing latency and variability increase) and later also show reduced throughput - similar to things that can happen along an e2e network path for completely different reasons, e.g. lower-level retransmissions or a variable-rate link. So I would think that checking the CPU load at least coarsely would be within the scope of network testing tools, no?
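As a sketch of such a coarse check (Linux-only, reading /proc/stat before and after a test interval; the threshold is an arbitrary illustration value, and this is not iperf 2 code):

def cpu_times():
    # Return (busy, total) jiffies aggregated over all CPUs (Linux).
    with open("/proc/stat") as f:
        fields = [int(x) for x in f.readline().split()[1:]]
    total = sum(fields)
    idle = fields[3] + fields[4]  # idle + iowait
    return total - idle, total

def cpu_utilization(busy0, total0, busy1, total1):
    return (busy1 - busy0) / max(1, total1 - total0)

# b0, t0 = cpu_times(); ...run the throughput test...; b1, t1 = cpu_times()
# if cpu_utilization(b0, t0, b1, t1) > 0.95: warn("CPU-limited; i/o numbers suspect")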
Regards
Sebastian
> I think both of these are outside the scope of a tool designed to test network i/o over sockets; rather, these should be developed & validated independently of a network i/o tool.
>
> Clock error really isn't about amount/frequency of traffic but rather getting a periodic high-quality reference. I tend to use GPS pulse per second to lock the local system oscillator to. As David says, most every modern handheld computer has the GPS chips to do this already. So to me it seems more of a policy choice between data center operators and device mfgs and less of a technical issue.
>
> Bob
>> Hello,
>> Yall can call me crazy if you want.. but... see below [RWG]
>>> Hi Bob,
>>> > On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net> wrote:
>>> >
>>> > My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one way delays (OWD) and most assume that OWD is 1/2 an RTT, which typically is a mistake. We know this intuitively with airplane flight times or even car commute times where the one way time is not 1/2 a round trip time. Google maps & directions provide a time estimate for the one way link. It doesn't compute a round trip and divide by two.
>>> >
>>> > For those that can get clock sync working, the iperf 2 --trip-times option is useful.
>>> [SM] +1; and yet even with unsynchronized clocks one can try to measure how latency changes under load and that can be done per direction. Sure this is far inferior to real reliably measured OWDs, but if life/the internet deals you lemons....
>> [RWG] iperf2/iperf3, etc are already moving large amounts of data
>> back and forth, for that matter any rate test, why not abuse some of
>> that data and add the fundamental NTP clock sync data and
>> bidirectionally pass each other's concept of "current time". IIRC (it's
>> been 25 years since I worked on NTP at this level) you *should* be
>> able to get a fairly accurate clock delta between each end, and then
>> use that info and time stamps in the data stream to compute OWDs.
>> You need to put 4 time stamps in the packet, and with that you can
>> compute "offset".
>>> >
>>> > --trip-times
>>> > enable the measurement of end to end write to read latencies (client and server clocks must be synchronized)
>> [RWG] --clock-skew
>> enable the measurement of the wall clock difference between sender and receiver
>>> [SM] Sweet!
>>> Regards
>>> Sebastian
>>> >
>>> > Bob
>>> >> I have many kvetches about the new latency under load tests being
>>> >> designed and distributed over the past year. I am delighted! that they
>>> >> are happening, but most really need third party evaluation, and
>>> >> calibration, and a solid explanation of what network pathologies they
>>> >> do and don't cover. Also a RED team attitude towards them, as well as
>>> >> thinking hard about what you are not measuring (operations research).
>>> >> I actually rather love the new cloudflare speedtest, because it tests
>>> >> a single TCP connection, rather than dozens, and at the same time folk
>>> >> are complaining that it doesn't find the actual "speed!". yet... the
>>> >> test itself more closely emulates a user experience than speedtest.net
>>> >> does. I am personally pretty convinced that fewer flows opened by a web
>>> >> page improve the likelihood of a good user experience, but lack data
>>> >> on it.
>>> >> To try to tackle the evaluation and calibration part, I've reached out
>>> >> to all the new test designers in the hope that we could get together
>>> >> and produce a report of what each new test is actually doing. I've
>>> >> tweeted, linked in, emailed, and spammed every measurement list I know
>>> >> of, with only some response; please reach out to other test-designer
>>> >> folks and have them join the rpm email list?
>>> >> My principal kvetches in the new tests so far are:
>>> >> 0) None of the tests last long enough.
>>> >> Ideally there should be a mode where they at least run to "time of
>>> >> first loss", or periodically, just run longer than the
>>> >> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>>> >> there! It's really bad science to optimize the internet for 20
>>> >> seconds. It's like optimizing a car, to handle well, for just 20
>>> >> seconds.
>>> >> 1) Not testing up + down + ping at the same time
>>> >> None of the new tests actually test the same thing that the infamous
>>> >> rrul test does - all the others still test up, then down, and ping. It
>>> >> was/remains my hope that the simpler parts of the flent test suite -
>>> >> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>>> >> tests would provide calibration to the test designers.
>>> >> we've got zillions of flent results in the archive published here:
>>> >> https://blog.cerowrt.org/post/found_in_flent/
>>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>>> >
>>> >> The new tests have all added up + ping and down + ping, but not up +
>>> >> down + ping. Why??
>>> >> The behaviors of what happens in that case are really non-intuitive, I
>>> >> know, but... it's just one more phase to add to any one of those new
>>> >> tests. I'd be deliriously happy if someone(s) new to the field
>>> >> started doing that, even optionally, and boggled at how it defeated
>>> >> their assumptions.
>>> >> Among other things that would show...
>>> >> It's the home router industry's dirty secret that darn few "gigabit"
>>> >> home routers can actually forward in both directions at a gigabit. I'd
>>> >> like to smash that perception thoroughly, but given that our starting
>>> >> point for a "gigabit router" was a "gigabit switch" - one that
>>> >> historically couldn't even forward at 200Mbit - we have a long way
>>> >> to go there.
>>> >> Only in the past year have non-x86 home routers appeared that could
>>> >> actually do a gbit in both directions.
>>> >> 2) Few are actually testing within-stream latency
>>> >> Apple's rpm project is making a stab in that direction. It looks
>>> >> highly likely, that with a little more work, crusader and
>>> >> go-responsiveness can finally start sampling the tcp RTT, loss and
>>> >> markings, more directly. As for the rest... sampling TCP_INFO on
>>> >> Windows, and Linux, at least, always appeared simple to me, but I'm
>>> >> discovering how hard it is by delving deep into the rust behind
>>> >> crusader.
>>> >> the goresponsiveness thing is also IMHO running WAY too many streams
>>> >> at the same time, I guess motivated by an attempt to have the test
>>> >> complete quickly?
>>> >> B) To try and tackle the validation problem:
>>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>>> >
>>> >> In the libreqos.io project we've established a testbed where tests can
>>> >> be plunked through various ISP plan network emulations. It's here:
>>> >> https://payne.taht.net (run bandwidth test for what's currently hooked
>>> >> up)
>>> >> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>>> >> leverage with that, so I don't have to nat the various emulations.
>>> >> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>>> >> to see more test designers setup a testbed like this to calibrate
>>> >> their own stuff.
>>> >> Presently we're able to test:
>>> >> flent
>>> >> netperf
>>> >> iperf2
>>> >> iperf3
>>> >> speedtest-cli
>>> >> crusader
>>> >> the broadband forum udp based test:
>>> >> https://github.com/BroadbandForum/obudpst
>>> >> trexx
>>> >> There's also a virtual machine setup that we can remotely drive a web
>>> >> browser from (but I didn't want to nat the results to the world) to
>>> >> test other web services.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-11 21:46 ` Dick Roy
@ 2023-01-12 8:22 ` Sebastian Moeller
2023-01-12 18:02 ` rjmcmahon
2023-01-12 20:39 ` Dick Roy
0 siblings, 2 replies; 19+ messages in thread
From: Sebastian Moeller @ 2023-01-12 8:22 UTC (permalink / raw)
To: Dick Roy
Cc: Rodney W. Grimes, mike.reynolds, libreqos, David P. Reed, Rpm,
rjmcmahon, bloat
Hi RR,
> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>
>
>
> -----Original Message-----
> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of Sebastian Moeller via Starlink
> Sent: Wednesday, January 11, 2023 12:01 PM
> To: Rodney W. Grimes
> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David P. Reed; Rpm; rjmcmahon; bloat
> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
>
> Hi Rodney,
>
>
>
>
> > On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink@gndrsh.dnsmgr.net> wrote:
> >
> > Hello,
> >
> > Yall can call me crazy if you want.. but... see below [RWG]
> >> Hi Bob,
> >>
> >>
> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net> wrote:
> >>>
> >>> My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one way delays (OWD) and most assume that OWD is 1/2 an RTT, which typically is a mistake. We know this intuitively with airplane flight times or even car commute times where the one way time is not 1/2 a round trip time. Google maps & directions provide a time estimate for the one way link. It doesn't compute a round trip and divide by two.
> >>>
> >>> For those that can get clock sync working, the iperf 2 --trip-times option is useful.
> >>
> >> [SM] +1; and yet even with unsynchronized clocks one can try to measure how latency changes under load and that can be done per direction. Sure this is far inferior to real reliably measured OWDs, but if life/the internet deals you lemons....
> >
> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back and forth, for that matter any rate test, why not abuse some of that data and add the fundamental NTP clock sync data and bidirectionally pass each other's concept of "current time". IIRC (it's been 25 years since I worked on NTP at this level) you *should* be able to get a fairly accurate clock delta between each end, and then use that info and time stamps in the data stream to compute OWDs. You need to put 4 time stamps in the packet, and with that you can compute "offset".
> [RR] For this to work at a reasonable level of accuracy, the timestamping circuits on both ends need to be deterministic and repeatable as I recall. Any uncertainty in that process adds to synchronization errors/uncertainties.
>
> [SM] Nice idea. I would guess that all timeslot-based access technologies (Starlink, DOCSIS, GPON, LTE?) distribute "high quality time" carefully to the "modems", so maybe all that would be needed is to expose that high quality time to the LAN side of those modems, dressed up as an NTP server?
> [RR] It’s not that simple! Distributing “high-quality time”, i.e. “synchronizing all clocks” does not solve the communication problem in synchronous slotted MAC/PHYs!
[SM] I happily believe you, but the same idea of "time slot" needs to be shared by all nodes, no? So the clocks need to run at reasonably similar rates, aka synchronized (see below).
> All the technologies you mentioned above are essentially P2P, not intended for broadcast. Point is, there is a point controller (aka PoC) often called a base station (eNodeB, gNodeB, …) that actually “controls everything that is necessary to control” at the UE including time, frequency and sampling time offsets, and these are critical to get right if you want to communicate, and they are ALL subject to the laws of physics (cf. the speed of light)! Turns out that what is necessary for the system to function anywhere near capacity, is for all the clocks governing transmissions from the UEs to be “unsynchronized” such that all the UE transmissions arrive at the PoC at the same (prescribed) time!
[SM] Fair enough. I would call clocks that are "in sync" albeit with individual offsets synchronized, but I am a layman and that might sound offensively wrong to experts in the field. But even without the naming, my point is that all systems that depend on some idea of a shared time-base are halfway toward exposing that time to end users, by translating it into an NTP time source at the modem.
> For some technologies, in particular 5G!, these considerations are ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you don't believe me! :-)
[SM] Far be it from me not to believe you, so thanks for the pointers. Yet, I still think that unless different nodes of a shared segment move at significantly different speeds, there should be a common "tick-duration" for all clocks even if each clock runs at an offset... (I naively would try to implement something like that by fully synchronizing the clocks and maintaining a local offset value to convert from "absolute" time to "network" time, but coming from the outside I am likely blissfully unaware of the detailed challenges that need to be solved).
Regards & Thanks
Sebastian
>
>
> >
> >>
> >>
> >>>
> >>> --trip-times
> >>> enable the measurement of end to end write to read latencies (client and server clocks must be synchronized)
> > [RWG] --clock-skew
> > enable the measurement of the wall clock difference between sender and receiver
> >
> >>
> >> [SM] Sweet!
> >>
> >> Regards
> >> Sebastian
> >>
> >>>
> >>> Bob
> >>>> I have many kvetches about the new latency under load tests being
> >>>> designed and distributed over the past year. I am delighted! that they
> >>>> are happening, but most really need third party evaluation, and
> >>>> calibration, and a solid explanation of what network pathologies they
> >>>> do and don't cover. Also a RED team attitude towards them, as well as
> >>>> thinking hard about what you are not measuring (operations research).
> >>>> I actually rather love the new cloudflare speedtest, because it tests
> >>>> a single TCP connection, rather than dozens, and at the same time folk
> >>>> are complaining that it doesn't find the actual "speed!". yet... the
> >>>> test itself more closely emulates a user experience than speedtest.net
> >>>> does. I am personally pretty convinced that fewer flows opened by a web
> >>>> page improve the likelihood of a good user experience, but lack data
> >>>> on it.
> >>>> To try to tackle the evaluation and calibration part, I've reached out
> >>>> to all the new test designers in the hope that we could get together
> >>>> and produce a report of what each new test is actually doing. I've
> >>>> tweeted, linked in, emailed, and spammed every measurement list I know
> >>>> of, with only some response; please reach out to other test-designer
> >>>> folks and have them join the rpm email list?
> >>>> My principal kvetches in the new tests so far are:
> >>>> 0) None of the tests last long enough.
> >>>> Ideally there should be a mode where they at least run to "time of
> >>>> first loss", or periodically, just run longer than the
> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
> >>>> there! It's really bad science to optimize the internet for 20
> >>>> seconds. It's like optimizing a car, to handle well, for just 20
> >>>> seconds.
> >>>> 1) Not testing up + down + ping at the same time
> >>>> None of the new tests actually test the same thing that the infamous
> >>>> rrul test does - all the others still test up, then down, and ping. It
> >>>> was/remains my hope that the simpler parts of the flent test suite -
> >>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
> >>>> tests would provide calibration to the test designers.
> >>>> we've got zillions of flent results in the archive published here:
> >>>> https://blog.cerowrt.org/post/found_in_flent/
> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
> >>>
> >>>> The new tests have all added up + ping and down + ping, but not up +
> >>>> down + ping. Why??
> >>>> The behaviors of what happens in that case are really non-intuitive, I
> >>>> know, but... it's just one more phase to add to any one of those new
> >>>> tests. I'd be deliriously happy if someone(s) new to the field
> >>>> started doing that, even optionally, and boggled at how it defeated
> >>>> their assumptions.
> >>>> Among other things that would show...
> >>>> It's the home router industry's dirty secret that darn few "gigabit"
> >>>> home routers can actually forward in both directions at a gigabit. I'd
> >>>> like to smash that perception thoroughly, but given that our starting
> >>>> point for a "gigabit router" was a "gigabit switch" - one that
> >>>> historically couldn't even forward at 200Mbit - we have a long way
> >>>> to go there.
> >>>> Only in the past year have non-x86 home routers appeared that could
> >>>> actually do a gbit in both directions.
> >>>> 2) Few are actually testing within-stream latency
> >>>> Apple's rpm project is making a stab in that direction. It looks
> >>>> highly likely, that with a little more work, crusader and
> >>>> go-responsiveness can finally start sampling the tcp RTT, loss and
> >>>> markings, more directly. As for the rest... sampling TCP_INFO on
> >>>> Windows, and Linux, at least, always appeared simple to me, but I'm
> >>>> discovering how hard it is by delving deep into the rust behind
> >>>> crusader.
> >>>> the goresponsiveness thing is also IMHO running WAY too many streams
> >>>> at the same time, I guess motivated by an attempt to have the test
> >>>> complete quickly?
> >>>> B) To try and tackle the validation problem:
> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
> >>>
> >>>> In the libreqos.io project we've established a testbed where tests can
> >>>> be plunked through various ISP plan network emulations. It's here:
> >>>> https://payne.taht.net (run bandwidth test for what's currently hooked
> >>>> up)
> >>>> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
> >>>> leverage with that, so I don't have to nat the various emulations.
> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
> >>>> to see more test designers setup a testbed like this to calibrate
> >>>> their own stuff.
> >>>> Presently we're able to test:
> >>>> flent
> >>>> netperf
> >>>> iperf2
> >>>> iperf3
> >>>> speedtest-cli
> >>>> crusader
> >>>> the broadband forum udp based test:
> >>>> https://github.com/BroadbandForum/obudpst
> >>>> trexx
> >>>> There's also a virtual machine setup that we can remotely drive a web
> >>>> browser from (but I didn't want to nat the results to the world) to
> >>>> test other web services.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 8:14 ` Sebastian Moeller
@ 2023-01-12 17:49 ` Robert McMahon
2023-01-12 21:57 ` Dick Roy
0 siblings, 1 reply; 19+ messages in thread
From: Robert McMahon @ 2023-01-12 17:49 UTC (permalink / raw)
To: Sebastian Moeller
Cc: Rodney W. Grimes, Rpm, mike.reynolds, David P. Reed, libreqos,
Dave Taht via Starlink, bloat
Hi Sebastian,
You make a good point. What I did was issue a warning if the tool found it was being CPU limited vs i/o limited. This indicates the i/o test is likely inaccurate from an i/o perspective, and the results are suspect. It does this crudely by comparing the CPU thread doing stats against the traffic threads doing i/o - checking which thread is waiting on the others. There is no attempt to assess the CPU load itself. So it's designed with the singular purpose of making sure i/o threads only block on the write and read syscalls.
I probably should revisit this both in design and implementation. Thanks for bringing it up and all input is truly appreciated.
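Concretely, a crude sketch of that kind of check (hypothetical Python, not iperf 2's actual mechanism): time how long the traffic thread spends blocked inside write() versus running; if writes almost never block, the bottleneck is likely the CPU or the application rather than the network path.

import socket
import time

def classify_sender(sock: socket.socket, payload: bytes,
                    duration_s: float = 5.0) -> str:
    blocked = 0.0
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        t0 = time.monotonic()
        sock.sendall(payload)  # blocks when the socket send buffer is full
        blocked += time.monotonic() - t0
    frac = blocked / (time.monotonic() - start)
    # Threshold is an arbitrary illustration value.
    return "i/o-limited" if frac > 0.5 else "possibly CPU-limited"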
Bob
On Jan 12, 2023, at 12:14 AM, Sebastian Moeller <moeller0@gmx.de> wrote:
>Hi Bob,
>
>
>> On Jan 11, 2023, at 21:09, rjmcmahon <rjmcmahon@rjmcmahon.com> wrote:
>>
>> Iperf 2 is designed to measure network i/o. Note: It doesn't have to
>move large amounts of data. It can support data profiles that don't
>drive TCP's CCA as an example.
>>
>> Two things I've been asked for and avoided:
>>
>> 1) Integrate clock sync into iperf's test traffic
>
> [SM] This I understand; measurement conditions can be unsuited for
>tight time synchronization...
>
>
>> 2) Measure and output CPU usages
>
> [SM] This one puzzles me; as far as I understand, the only way to
>properly diagnose network issues is to rule out other things, like CPU
>overload, that can have symptoms similar to network issues. As an
>example, if CPU cycles become tight the cake qdisc will first increase
>its internal queueing and jitter (not consciously; it is just an
>observation that once cake does not get access to the CPU as timely as
>it wants, queueing latency and variability increase) and later also
>show reduced throughput - similar to things that can happen along
>an e2e network path for completely different reasons, e.g. lower-level
>retransmissions or a variable-rate link. So I would think that checking
>the CPU load at least coarsely would be within the scope of network
>testing tools, no?
>
>Regards
> Sebastian
>
>
>
>
>> I think both of these are outside the scope of a tool designed to
>test network i/o over sockets; rather, these should be developed &
>validated independently of a network i/o tool.
>>
>> Clock error really isn't about amount/frequency of traffic but rather
>getting a periodic high-quality reference. I tend to use GPS pulse per
>second to lock the local system oscillator to. As David says, most
>every modern handheld computer has the GPS chips to do this already. So
>to me it seems more of a policy choice between data center operators
>and device mfgs and less of a technical issue.
>>
>> Bob
>>> Hello,
>>> Yall can call me crazy if you want.. but... see below [RWG]
>>>> Hi Bob,
>>>> > On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
><starlink@lists.bufferbloat.net> wrote:
>>>> >
>>>> > My biggest barrier is the lack of clock sync by the devices, i.e.
>very limited support for PTP in data centers and in end devices. This
>limits the ability to measure one way delays (OWD) and most assume that
>OWD is 1/2 an RTT, which typically is a mistake. We know this
>intuitively with airplane flight times or even car commute times where
>the one way time is not 1/2 a round trip time. Google maps & directions
>provide a time estimate for the one way link. It doesn't compute a
>round trip and divide by two.
>>>> >
>>>> > For those that can get clock sync working, the iperf 2
>--trip-times option is useful.
>>>> [SM] +1; and yet even with unsynchronized clocks one can try to
>measure how latency changes under load and that can be done per
>direction. Sure this is far inferior to real reliably measured OWDs,
>but if life/the internet deals you lemons....
>>> [RWG] iperf2/iperf3, etc are already moving large amounts of data
>>> back and forth, for that matter any rate test, why not abuse some of
>>> that data and add the fundamental NTP clock sync data and
>>> bidirectionally pass each other's concept of "current time". IIRC (it's
>>> been 25 years since I worked on NTP at this level) you *should* be
>>> able to get a fairly accurate clock delta between each end, and then
>>> use that info and time stamps in the data stream to compute OWDs.
>>> You need to put 4 time stamps in the packet, and with that you can
>>> compute "offset".
>>>> >
>>>> > --trip-times
>>>> > enable the measurement of end to end write to read latencies
>(client and server clocks must be synchronized)
>>> [RWG] --clock-skew
>>> enable the measurement of the wall clock difference between sender
>and receiver
>>>> [SM] Sweet!
>>>> Regards
>>>> Sebastian
>>>> >
>>>> > Bob
>>>> >> I have many kvetches about the new latency under load tests
>being
>>>> >> designed and distributed over the past year. I am delighted!
>that they
>>>> >> are happening, but most really need third party evaluation, and
>>>> >> calibration, and a solid explanation of what network pathologies
>they
>>>> >> do and don't cover. Also a RED team attitude towards them, as
>well as
>>>> >> thinking hard about what you are not measuring (operations
>research).
>>>> >> I actually rather love the new cloudflare speedtest, because it
>tests
>>>> >> a single TCP connection, rather than dozens, and at the same
>time folk
>>>> >> are complaining that it doesn't find the actual "speed!". yet...
>the
>>>> >> test itself more closely emulates a user experience than
>speedtest.net
>>>> >> does. I am personally pretty convinced that fewer flows opened by a
>>>> >> web page improve the likelihood of a good user experience, but lack
>>>> >> data on it.
>>>> >> To try to tackle the evaluation and calibration part, I've
>reached out
>>>> >> to all the new test designers in the hope that we could get
>together
>>>> >> and produce a report of what each new test is actually doing.
>I've
>>>> >> tweeted, linked in, emailed, and spammed every measurement list I know
>>>> >> of, with only some response; please reach out to other test-designer
>>>> >> folks and have them join the rpm email list?
>>>> >> My principal kvetches in the new tests so far are:
>>>> >> 0) None of the tests last long enough.
>>>> >> Ideally there should be a mode where they at least run to "time
>of
>>>> >> first loss", or periodically, just run longer than the
>>>> >> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>>>> >> there! It's really bad science to optimize the internet for 20
>>>> >> seconds. It's like optimizing a car, to handle well, for just 20
>>>> >> seconds.
>>>> >> 1) Not testing up + down + ping at the same time
>>>> >> None of the new tests actually test the same thing that the infamous
>>>> >> rrul test does - all the others still test up, then down, and ping. It
>>>> >> was/remains my hope that the simpler parts of the flent test suite -
>>>> >> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>>>> >> tests would provide calibration to the test designers.
>>>> >> we've got zillions of flent results in the archive published here:
>>>> >> https://blog.cerowrt.org/post/found_in_flent/
>>>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>> >
>>>> >> The new tests have all added up + ping and down + ping, but not up +
>>>> >> down + ping. Why??
>>>> >> The behaviors of what happens in that case are really non-intuitive, I
>>>> >> know, but... it's just one more phase to add to any one of those new
>>>> >> tests. I'd be deliriously happy if someone(s) new to the field
>>>> >> started doing that, even optionally, and boggled at how it defeated
>>>> >> their assumptions.
>>>> >> Among other things that would show...
>>>> >> It's the home router industry's dirty secret that darn few "gigabit"
>>>> >> home routers can actually forward in both directions at a gigabit. I'd
>>>> >> like to smash that perception thoroughly, but given our starting point
>>>> >> is a gigabit router was a "gigabit switch" - and historically been
>>>> >> something that couldn't even forward at 200Mbit - we have a long way
>>>> >> to go there.
>>>> >> Only in the past year have non-x86 home routers appeared that could
>>>> >> actually do a gbit in both directions.
>>>> >> 2) Few are actually testing within-stream latency
>>>> >> Apple's rpm project is making a stab in that direction. It looks
>>>> >> highly likely, that with a little more work, crusader and
>>>> >> go-responsiveness can finally start sampling the tcp RTT, loss and
>>>> >> markings, more directly. As for the rest... sampling TCP_INFO on
>>>> >> windows, and Linux, at least, always appeared simple to me, but I'm
>>>> >> discovering how hard it is by delving deep into the rust behind
>>>> >> crusader.
>>>> >> the goresponsiveness thing is also IMHO running WAY too many streams
>>>> >> at the same time, I guess motivated by an attempt to have the test
>>>> >> complete quickly?
>>>> >> B) To try and tackle the validation problem:
>>>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>> >
>>>> >> In the libreqos.io project we've established a testbed where tests can
>>>> >> be plunked through various ISP plan network emulations. It's here:
>>>> >> https://payne.taht.net (run bandwidth test for what's currently hooked
>>>> >> up)
>>>> >> We could rather use an AS number and at least a ipv4/24 and ipv6/48 to
>>>> >> leverage with that, so I don't have to nat the various emulations.
>>>> >> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>>>> >> to see more test designers setup a testbed like this to calibrate
>>>> >> their own stuff.
>>>> >> Presently we're able to test:
>>>> >> flent
>>>> >> netperf
>>>> >> iperf2
>>>> >> iperf3
>>>> >> speedtest-cli
>>>> >> crusader
>>>> >> the broadband forum udp based test:
>>>> >> https://github.com/BroadbandForum/obudpst
>>>> >> trexx
>>>> >> There's also a virtual machine setup that we can remotely drive a web
>>>> >> browser from (but I didn't want to nat the results to the world) to
>>>> >> test other web services.
>>>> >> _______________________________________________
>>>> >> Rpm mailing list
>>>> >> Rpm@lists.bufferbloat.net
>>>> >> https://lists.bufferbloat.net/listinfo/rpm
>>>> > _______________________________________________
>>>> > Starlink mailing list
>>>> > Starlink@lists.bufferbloat.net
>>>> > https://lists.bufferbloat.net/listinfo/starlink
>>>> _______________________________________________
>>>> Starlink mailing list
>>>> Starlink@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/starlink
[-- Attachment #2: Type: text/html, Size: 12665 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 8:22 ` Sebastian Moeller
@ 2023-01-12 18:02 ` rjmcmahon
2023-01-12 21:34 ` Dick Roy
2023-01-12 20:39 ` Dick Roy
1 sibling, 1 reply; 19+ messages in thread
From: rjmcmahon @ 2023-01-12 18:02 UTC (permalink / raw)
To: Sebastian Moeller
Cc: Dick Roy, Rodney W. Grimes, mike.reynolds, libreqos,
David P. Reed, Rpm, bloat
For WiFi there is the TSF
https://en.wikipedia.org/wiki/Timing_synchronization_function
We in test & measurement use that in our internal telemetry. The TSF of
a WiFi device only needs frequency-sync for some things, typically
related to access to the medium; a phase-locked loop does it. A device
that decides to go to sleep, as an example, will also stop its TSF,
creating a non-linearity. It's difficult to synchronize it to the system
clock or to the GPS atomic clock - though we do this for internal
testing reasons, so it can be done.
What's mostly missing for T&M with WiFi is the GPS atomic clock, as
that's a convenient time domain to use as the canonical domain.
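(A toy illustration of mapping a free-running TSF onto system time: given
two (tsf_us, sys_s) samples captured by some driver-specific means
(hypothetical here), fit a rate and offset, and treat any large residual
as a TSF discontinuity such as the sleep case above:

    # each sample pairs a 64-bit TSF reading (microseconds) with system time
    def fit_tsf(sample0, sample1):
        (tsf0, sys0), (tsf1, sys1) = sample0, sample1
        rate = (sys1 - sys0) / ((tsf1 - tsf0) * 1e-6)  # ~1.0 if TSF ticks true us
        offset = sys0 - tsf0 * 1e-6 * rate
        return rate, offset

    def tsf_to_sys(tsf_us, rate, offset):
        return tsf_us * 1e-6 * rate + offset

    rate, off = fit_tsf((1_000_000, 10.0), (2_000_000, 11.000002))
    print(tsf_to_sys(1_500_000, rate, off))   # ~10.5

A sleep, or a reassociation that resets the TSF, invalidates the fit, so
any such mapping has to be re-estimated per continuous TSF segment.)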
Bob
> Hi RR,
>
>
>> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>>
>>
>>
>> -----Original Message-----
>> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On
>> Behalf Of Sebastian Moeller via Starlink
>> Sent: Wednesday, January 11, 2023 12:01 PM
>> To: Rodney W. Grimes
>> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos;
>> David P. Reed; Rpm; rjmcmahon; bloat
>> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in
>> USA
>>
>> Hi Rodney,
>>
>>
>>
>>
>> > On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink@gndrsh.dnsmgr.net> wrote:
>> >
>> > Hello,
>> >
>> > Yall can call me crazy if you want.. but... see below [RWG]
>> >> Hi Bob,
>> >>
>> >>
>> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net> wrote:
>> >>>
>> >>> My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one way delays (OWD) and most assume that OWD is 1/2 and RTT which typically is a mistake. We know this intuitively with airplane flight times or even car commute times where the one way time is not 1/2 a round trip time. Google maps & directions provide a time estimate for the one way link. It doesn't compute a round trip and divide by two.
>> >>>
>> >>> For those that can get clock sync working, the iperf 2 --trip-times options is useful.
>> >>
>> >> [SM] +1; and yet even with unsynchronized clocks one can try to measure how latency changes under load and that can be done per direction. Sure this is far inferior to real reliably measured OWDs, but if life/the internet deals you lemons....
>> >
>> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back and forth, for that matter any rate test, why not abuse some of that data and add the fundemental NTP clock sync data and bidirectionally pass each others concept of "current time". IIRC (its been 25 years since I worked on NTP at this level) you *should* be able to get a fairly accurate clock delta between each end, and then use that info and time stamps in the data stream to compute OWD's. You need to put 4 time stamps in the packet, and with that you can compute "offset".
>> [RR] For this to work at a reasonable level of accuracy, the
>> timestamping circuits on both ends need to be deterministic and
>> repeatable as I recall. Any uncertainty in that process adds to
>> synchronization errors/uncertainties.
>>
>> [SM] Nice idea. I would guess that all timeslot based access
>> technologies (so starlink, docsis, GPON, LTE?) all distribute "high
>> quality time" carefully to the "modems", so maybe all that would be
>> needed is to expose that high quality time to the LAN side of those
>> modems, dressed up as NTP server?
>> [RR] It’s not that simple! Distributing “high-quality time”, i.e.
>> “synchronizing all clocks” does not solve the communication problem in
>> synchronous slotted MAC/PHYs!
>
> [SM] I happily believe you, but the same idea of "time slot" needs to
> be shared by all nodes, no? So the clocks need to run at a reasonably
> similar rate, aka synchronized (see below).
>
>
>> All the technologies you mentioned above are essentially P2P, not
>> intended for broadcast. Point is, there is a point controller (aka
>> PoC) often called a base station (eNodeB, gNodeB, …) that actually
>> “controls everything that is necessary to control” at the UE including
>> time, frequency and sampling time offsets, and these are critical to
>> get right if you want to communicate, and they are ALL subject to the
>> laws of physics (cf. the speed of light)! Turns out that what is
>> necessary for the system to function anywhere near capacity, is for
>> all the clocks governing transmissions from the UEs to be
>> “unsynchronized” such that all the UE transmissions arrive at the PoC
>> at the same (prescribed) time!
>
> [SM] Fair enough. I would call clocks that are "in sync" albeit with
> individual offsets as synchronized, but I am a layman and that might
> sound offensively wrong to experts in the field. But even without the
> naming, my point is that all systems that depend on some idea of a shared
> time-base are halfway there to exposing that time to end users, by
> "translating" it into an NTP time source at the modem.
>
>
>> For some technologies, in particular 5G!, these considerations are
>> ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you
>> don't believe me! :-)
>
> [SM] Far be it from me not to believe you, so thanks for the pointers.
> Yet, I still think that unless different nodes of a shared segment
> move at significantly different speeds, that there should be a common
> "tick-duration" for all clocks even if each clock runs at an offset...
> (I naively would try to implement something like that by trying to
> fully synchronize clocks and maintain a local offset value to convert
> from "absolute" time to "network" time, but likely because coming from
> the outside I am blissfully unaware of the detail challenges that need
> to be solved).
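(A toy model of what SM sketches here: nodes agree on the tick rate, each
keeps only a local offset; names hypothetical:

    class NodeClock:
        def __init__(self, offset_s):
            self.offset_s = offset_s          # learned once, per node
        def to_network(self, absolute_s):     # "absolute" -> shared slot time
            return absolute_s - self.offset_s
        def to_absolute(self, network_s):
            return network_s + self.offset_s

    # two nodes with different offsets agree on network time:
    a, b = NodeClock(0.250), NodeClock(-0.125)
    print(a.to_network(100.250), b.to_network(99.875))   # both 100.0

As RR points out, the hard part in a slotted MAC is that the per-node
offsets are deliberately unequal, chosen so transmissions arrive aligned
at the point controller.)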
>
> Regards & Thanks
> Sebastian
>
>
>>
>>
>> >
>> >>
>> >>
>> >>>
>> >>> --trip-times
>> >>> enable the measurement of end to end write to read latencies (client and server clocks must be synchronized)
>> > [RWG] --clock-skew
>> > enable the measurement of the wall clock difference between sender and receiver
>> >
>> >>
>> >> [SM] Sweet!
>> >>
>> >> Regards
>> >> Sebastian
>> >>
>> >>>
>> >>> Bob
>> >>>> I have many kvetches about the new latency under load tests being
>> >>>> designed and distributed over the past year. I am delighted! that they
>> >>>> are happening, but most really need third party evaluation, and
>> >>>> calibration, and a solid explanation of what network pathologies they
>> >>>> do and don't cover. Also a RED team attitude towards them, as well as
>> >>>> thinking hard about what you are not measuring (operations research).
>> >>>> I actually rather love the new cloudflare speedtest, because it tests
>> >>>> a single TCP connection, rather than dozens, and at the same time folk
>> >>>> are complaining that it doesn't find the actual "speed!". yet... the
>> >>>> test itself more closely emulates a user experience than speedtest.net
>> >>>> does. I am personally pretty convinced that the fewer numbers of flows
>> >>>> that a web page opens improves the likelihood of a good user
>> >>>> experience, but lack data on it.
>> >>>> To try to tackle the evaluation and calibration part, I've reached out
>> >>>> to all the new test designers in the hope that we could get together
>> >>>> and produce a report of what each new test is actually doing. I've
>> >>>> tweeted, linked in, emailed, and spammed every measurement list I know
>> >>>> of, and only to some response, please reach out to other test designer
>> >>>> folks and have them join the rpm email list?
>> >>>> My principal kvetches in the new tests so far are:
>> >>>> 0) None of the tests last long enough.
>> >>>> Ideally there should be a mode where they at least run to "time of
>> >>>> first loss", or periodically, just run longer than the
>> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>> >>>> there! It's really bad science to optimize the internet for 20
>> >>>> seconds. It's like optimizing a car, to handle well, for just 20
>> >>>> seconds.
>> >>>> 1) Not testing up + down + ping at the same time
>> >>>> None of the new tests actually test the same thing that the infamous
>> >>>> rrul test does - all the others still test up, then down, and ping. It
>> >>>> was/remains my hope that the simpler parts of the flent test suite -
>> >>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>> >>>> tests would provide calibration to the test designers.
>> >>>> we've got zillions of flent results in the archive published here:
>> >>>> https://blog.cerowrt.org/post/found_in_flent/
>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>>
>> >>>> The new tests have all added up + ping and down + ping, but not up +
>> >>>> down + ping. Why??
>> >>>> The behaviors of what happens in that case are really non-intuitive, I
>> >>>> know, but... it's just one more phase to add to any one of those new
>> >>>> tests. I'd be deliriously happy if someone(s) new to the field
>> >>>> started doing that, even optionally, and boggled at how it defeated
>> >>>> their assumptions.
>> >>>> Among other things that would show...
>> >>>> It's the home router industry's dirty secret that darn few "gigabit"
>> >>>> home routers can actually forward in both directions at a gigabit. I'd
>> >>>> like to smash that perception thoroughly, but given our starting point
>> >>>> is a gigabit router was a "gigabit switch" - and historically been
>> >>>> something that couldn't even forward at 200Mbit - we have a long way
>> >>>> to go there.
>> >>>> Only in the past year have non-x86 home routers appeared that could
>> >>>> actually do a gbit in both directions.
>> >>>> 2) Few are actually testing within-stream latency
>> >>>> Apple's rpm project is making a stab in that direction. It looks
>> >>>> highly likely, that with a little more work, crusader and
>> >>>> go-responsiveness can finally start sampling the tcp RTT, loss and
>> >>>> markings, more directly. As for the rest... sampling TCP_INFO on
>> >>>> windows, and Linux, at least, always appeared simple to me, but I'm
>> >>>> discovering how hard it is by delving deep into the rust behind
>> >>>> crusader.
>> >>>> the goresponsiveness thing is also IMHO running WAY too many streams
>> >>>> at the same time, I guess motivated by an attempt to have the test
>> >>>> complete quickly?
>> >>>> B) To try and tackle the validation problem:
>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>>
>> >>>> In the libreqos.io project we've established a testbed where tests can
>> >>>> be plunked through various ISP plan network emulations. It's here:
>> >>>> https://payne.taht.net (run bandwidth test for what's currently hooked
>> >>>> up)
>> >>>> We could rather use an AS number and at least a ipv4/24 and ipv6/48 to
>> >>>> leverage with that, so I don't have to nat the various emulations.
>> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>> >>>> to see more test designers setup a testbed like this to calibrate
>> >>>> their own stuff.
>> >>>> Presently we're able to test:
>> >>>> flent
>> >>>> netperf
>> >>>> iperf2
>> >>>> iperf3
>> >>>> speedtest-cli
>> >>>> crusader
>> >>>> the broadband forum udp based test:
>> >>>> https://github.com/BroadbandForum/obudpst
>> >>>> trexx
>> >>>> There's also a virtual machine setup that we can remotely drive a web
>> >>>> browser from (but I didn't want to nat the results to the world) to
>> >>>> test other web services.
>> >>>> _______________________________________________
>> >>>> Rpm mailing list
>> >>>> Rpm@lists.bufferbloat.net
>> >>>> https://lists.bufferbloat.net/listinfo/rpm
>> >>> _______________________________________________
>> >>> Starlink mailing list
>> >>> Starlink@lists.bufferbloat.net
>> >>> https://lists.bufferbloat.net/listinfo/starlink
>> >>
>> >> _______________________________________________
>> >> Starlink mailing list
>> >> Starlink@lists.bufferbloat.net
>> >> https://lists.bufferbloat.net/listinfo/starlink
>>
>> _______________________________________________
>> Starlink mailing list
>> Starlink@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/starlink
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 8:22 ` Sebastian Moeller
2023-01-12 18:02 ` rjmcmahon
@ 2023-01-12 20:39 ` Dick Roy
2023-01-13 7:33 ` Sebastian Moeller
2023-01-13 7:40 ` rjmcmahon
1 sibling, 2 replies; 19+ messages in thread
From: Dick Roy @ 2023-01-12 20:39 UTC (permalink / raw)
To: 'Sebastian Moeller'
Cc: 'Rodney W. Grimes', mike.reynolds, 'libreqos',
'David P. Reed', 'Rpm', 'rjmcmahon',
'bloat'
[-- Attachment #1: Type: text/plain, Size: 16054 bytes --]
Hi Sebastian (et al.),
[I'll comment up here instead of inline.]
Let me start by saying that I have not been intimately involved with the
IEEE 1588 effort (PTP), however I was involved in the 802.11 efforts along a
similar vein, just adding the wireless first hop component and its effects
on PTP.
What was apparent from the outset was that there was a lack of understanding
of what the terms "to synchronize" or "to be synchronized" actually mean. It's
not trivial ... because we live in a (approximately, that's another story!)
4-D space-time continuum where the Lorentz metric plays a critical role.
Therein, simultaneity (aka "things happening at the same time") means the
"distance" between two such events is zero and that distance is given by
sqrt(x^2 + y^2 + z^2 - (ct)^2) and the "thing happening" can be the tick of
a clock somewhere. Now since everything is relative (time with respect to
what? / location with respect to where?) it's pretty easy to see that "if
you don't know where you are, you can't know what time it is!" (English
sailors of the 18th century knew this well!) Add to this the fact that if
everything were stationary, nothing would happen (as Einstein said "Nothing
happens until something moves!"), special relativity also plays a role.
Clocks on GPS satellites run approx. 7usecs/day slower than those on earth
due to their "speed" (8700 mph roughly)! Then add the consequence that
without mass we wouldn't exist (in these forms at least:-)), and
gravitational effects (aka General Relativity) come into play. Those turn
out to make clocks on GPS satellites run 45usec/day faster than those on
earth! The net effect is that GPS clocks run about 38usec/day faster than
clocks on earth. So what does it mean to "synchronize to GPS"? Point is:
it's a non-trivial question with a very complicated answer. The reason it
is important to get all this right is that the "thing that ties time and
space together" is the speed of light and that turns out to be a
"foot-per-nanosecond" in a vacuum (roughly 300m/usec). This means if I am
uncertain about my location to, say, 300 meters, then I also am not sure what
time it is to a usec AND vice-versa!
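(The arithmetic above, as a back-of-envelope check; the two relativistic
terms and the commonly cited factory offset for the GPS 10.23 MHz clock:

    sr = -7.0    # usec/day slower, special relativity (orbital speed)
    gr = +45.0   # usec/day faster, general relativity (weaker gravity)
    net = sr + gr                      # ~ +38 usec/day, as above
    frac = net * 1e-6 / 86400.0        # ~ 4.4e-10 fractional rate offset
    print(net, frac, 10.23e6 * (1 - frac))
    # GPS satellite clocks are in fact factory-set slightly low, near the
    # commonly cited 10.22999999543 MHz, so they tick at 10.23 MHz as seen
    # from the ground.

A fractional error of 4.4e-10 sounds tiny, but uncorrected it accumulates
to roughly 11 km/day of ranging error at a foot per nanosecond.)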
All that said, the simplest explanation of synchronization is probably: Two
clocks are synchronized if, when they are brought (slowly) into physical
proximity ("sat next to each other") in the same (quasi-)inertial frame and
the same gravitational potential (not so obvious BTW ... see the FYI below!),
an observer of both would say "they are keeping time identically". Since
this experiment is rarely possible, one can never be "sure" that his clock
is synchronized to any other clock elsewhere. And what does it mean to say
they "were synchronized" when brought together, but now they are not because
they are now in different gravitational potentials! (FYI, there are land
mine detectors being developed on this very principle! I know someone who
actually worked on such a project!)
This all gets even more complicated when dealing with large networks of
networks in which the "speed of information transmission" can vary depending
on the medium (cf. coaxial cables versus fiber versus microwave links!) In
fact, the atmosphere is one of those media and variations therein result in
the need for "GPS corrections" (cf. RTCM GPS correction messages, RTK, etc.)
in order to get to sub-nsec/cm accuracy. Point is if you have a set of
nodes distributed across the country all with GPS and all "synchronized to
GPS time", and a second identical set of nodes (with no GPS) instead
connected with a network of cables and fiber links, all of different lengths
and composition using different carrier frequencies (dielectric constants
vary with frequency!) "synchronized" to some clock somewhere (using NTP or
PTP), the synchronization of the two sets will be different unless a common
reference clock is used AND all the above effects are taken into account,
and good luck with that! :-)
In conclusion, if anyone tells you that clock synchronization in
communication networks is simple ("Just use GPS!"), you should feel free to
chuckle (under your breath if necessary :-))
Cheers,
RR
-----Original Message-----
From: Sebastian Moeller [mailto:moeller0@gmx.de]
Sent: Thursday, January 12, 2023 12:23 AM
To: Dick Roy
Cc: Rodney W. Grimes; mike.reynolds@netforecast.com; libreqos; David P.
Reed; Rpm; rjmcmahon; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
Hi RR,
> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>
>
>
> -----Original Message-----
> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf
Of Sebastian Moeller via Starlink
> Sent: Wednesday, January 11, 2023 12:01 PM
> To: Rodney W. Grimes
> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David
P. Reed; Rpm; rjmcmahon; bloat
> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
>
> Hi Rodney,
>
>
>
>
> > On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink@gndrsh.dnsmgr.net>
wrote:
> >
> > Hello,
> >
> > Yall can call me crazy if you want.. but... see below [RWG]
> >> Hi Bob,
> >>
> >>
> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
<starlink@lists.bufferbloat.net> wrote:
> >>>
> >>> My biggest barrier is the lack of clock sync by the devices, i.e. very
limited support for PTP in data centers and in end devices. This limits the
ability to measure one way delays (OWD) and most assume that OWD is 1/2 and
RTT which typically is a mistake. We know this intuitively with airplane
flight times or even car commute times where the one way time is not 1/2 a
round trip time. Google maps & directions provide a time estimate for the
one way link. It doesn't compute a round trip and divide by two.
> >>>
> >>> For those that can get clock sync working, the iperf 2 --trip-times
options is useful.
> >>
> >> [SM] +1; and yet even with unsynchronized clocks one can try to
measure how latency changes under load and that can be done per direction.
Sure this is far inferior to real reliably measured OWDs, but if life/the
internet deals you lemons....
> >
> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back
and forth, for that matter any rate test, why not abuse some of that data
and add the fundamental NTP clock sync data and bidirectionally pass each
other's concept of "current time". IIRC (it's been 25 years since I worked on
NTP at this level) you *should* be able to get a fairly accurate clock delta
between each end, and then use that info and time stamps in the data stream
to compute OWD's. You need to put 4 time stamps in the packet, and with
that you can compute "offset".
> [RR] For this to work at a reasonable level of accuracy, the timestamping
circuits on both ends need to be deterministic and repeatable as I recall.
Any uncertainty in that process adds to synchronization
errors/uncertainties.
>
> [SM] Nice idea. I would guess that all timeslot based access
technologies (so starlink, docsis, GPON, LTE?) all distribute "high quality
time" carefully to the "modems", so maybe all that would be needed is to
expose that high quality time to the LAN side of those modems, dressed up as
NTP server?
> [RR] It's not that simple! Distributing "high-quality time", i.e.
"synchronizing all clocks" does not solve the communication problem in
synchronous slotted MAC/PHYs!
[SM] I happily believe you, but the same idea of "time slot" needs to
be shared by all nodes, no? So the clocks need to run at a reasonably similar
rate, aka synchronized (see below).
> All the technologies you mentioned above are essentially P2P, not
intended for broadcast. Point is, there is a point controller (aka PoC)
often called a base station (eNodeB, gNodeB, ...) that actually "controls
everything that is necessary to control" at the UE including time, frequency
and sampling time offsets, and these are critical to get right if you want
to communicate, and they are ALL subject to the laws of physics (cf. the
speed of light)! Turns out that what is necessary for the system to function
anywhere near capacity, is for all the clocks governing transmissions from
the UEs to be "unsynchronized" such that all the UE transmissions arrive at
the PoC at the same (prescribed) time!
[SM] Fair enough. I would call clocks that are "in sync" albeit with
individual offsets as synchronized, but I am a layman and that might sound
offensively wrong to experts in the field. But even without the naming, my
point is that all systems that depend on some idea of a shared time-base are
halfway there to exposing that time to end users, by "translating" it into an
NTP time source at the modem.
> For some technologies, in particular 5G!, these considerations are
ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you don't
believe me! :-)
[SM] Far be it from me not to believe you, so thanks for the pointers.
Yet, I still think that unless different nodes of a shared segment move at
significantly different speeds, that there should be a common
"tick-duration" for all clocks even if each clock runs at an offset... (I
naively would try to implement something like that by trying to fully
synchronize clocks and maintain a local offset value to convert from
"absolute" time to "network" time, but likely because coming from the
outside I am blissfully unaware of the detail challenges that need to be
solved).
Regards & Thanks
Sebastian
>
>
> >
> >>
> >>
> >>>
> >>> --trip-times
> >>> enable the measurement of end to end write to read latencies (client
and server clocks must be synchronized)
> > [RWG] --clock-skew
> > enable the measurement of the wall clock difference between sender
and receiver
> >
> >>
> >> [SM] Sweet!
> >>
> >> Regards
> >> Sebastian
> >>
> >>>
> >>> Bob
> >>>> I have many kvetches about the new latency under load tests being
> >>>> designed and distributed over the past year. I am delighted! that
they
> >>>> are happening, but most really need third party evaluation, and
> >>>> calibration, and a solid explanation of what network pathologies they
> >>>> do and don't cover. Also a RED team attitude towards them, as well as
> >>>> thinking hard about what you are not measuring (operations research).
> >>>> I actually rather love the new cloudflare speedtest, because it tests
> >>>> a single TCP connection, rather than dozens, and at the same time
folk
> >>>> are complaining that it doesn't find the actual "speed!". yet... the
> >>>> test itself more closely emulates a user experience than
speedtest.net
> >>>> does. I am personally pretty convinced that the fewer numbers of
flows
> >>>> that a web page opens improves the likelihood of a good user
> >>>> experience, but lack data on it.
> >>>> To try to tackle the evaluation and calibration part, I've reached
out
> >>>> to all the new test designers in the hope that we could get together
> >>>> and produce a report of what each new test is actually doing. I've
> >>>> tweeted, linked in, emailed, and spammed every measurement list I
know
> >>>> of, and only to some response, please reach out to other test
designer
> >>>> folks and have them join the rpm email list?
> >>>> My principal kvetches in the new tests so far are:
> >>>> 0) None of the tests last long enough.
> >>>> Ideally there should be a mode where they at least run to "time of
> >>>> first loss", or periodically, just run longer than the
> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
> >>>> there! It's really bad science to optimize the internet for 20
> >>>> seconds. It's like optimizing a car, to handle well, for just 20
> >>>> seconds.
> >>>> 1) Not testing up + down + ping at the same time
> >>>> None of the new tests actually test the same thing that the infamous
> >>>> rrul test does - all the others still test up, then down, and ping.
It
> >>>> was/remains my hope that the simpler parts of the flent test suite -
> >>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
> >>>> tests would provide calibration to the test designers.
> >>>> we've got zillions of flent results in the archive published here:
> >>>> https://blog.cerowrt.org/post/found_in_flent/
> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
> >>>
> >>>> The new tests have all added up + ping and down + ping, but not up +
> >>>> down + ping. Why??
> >>>> The behaviors of what happens in that case are really non-intuitive,
I
> >>>> know, but... it's just one more phase to add to any one of those new
> >>>> tests. I'd be deliriously happy if someone(s) new to the field
> >>>> started doing that, even optionally, and boggled at how it defeated
> >>>> their assumptions.
> >>>> Among other things that would show...
> >>>> It's the home router industry's dirty secret that darn few "gigabit"
> >>>> home routers can actually forward in both directions at a gigabit.
I'd
> >>>> like to smash that perception thoroughly, but given our starting
point
> >>>> is a gigabit router was a "gigabit switch" - and historically been
> >>>> something that couldn't even forward at 200Mbit - we have a long way
> >>>> to go there.
> >>>> Only in the past year have non-x86 home routers appeared that could
> >>>> actually do a gbit in both directions.
> >>>> 2) Few are actually testing within-stream latency
> >>>> Apple's rpm project is making a stab in that direction. It looks
> >>>> highly likely, that with a little more work, crusader and
> >>>> go-responsiveness can finally start sampling the tcp RTT, loss and
> >>>> markings, more directly. As for the rest... sampling TCP_INFO on
> >>>> windows, and Linux, at least, always appeared simple to me, but I'm
> >>>> discovering how hard it is by delving deep into the rust behind
> >>>> crusader.
> >>>> the goresponsiveness thing is also IMHO running WAY too many streams
> >>>> at the same time, I guess motivated by an attempt to have the test
> >>>> complete quickly?
> >>>> B) To try and tackle the validation problem:
> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
> >>>
> >>>> In the libreqos.io project we've established a testbed where tests
can
> >>>> be plunked through various ISP plan network emulations. It's here:
> >>>> https://payne.taht.net (run bandwidth test for what's currently
hooked
> >>>> up)
> >>>> We could rather use an AS number and at least a ipv4/24 and ipv6/48
to
> >>>> leverage with that, so I don't have to nat the various emulations.
> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
> >>>> to see more test designers setup a testbed like this to calibrate
> >>>> their own stuff.
> >>>> Presently we're able to test:
> >>>> flent
> >>>> netperf
> >>>> iperf2
> >>>> iperf3
> >>>> speedtest-cli
> >>>> crusader
> >>>> the broadband forum udp based test:
> >>>> https://github.com/BroadbandForum/obudpst
> >>>> trexx
> >>>> There's also a virtual machine setup that we can remotely drive a web
> >>>> browser from (but I didn't want to nat the results to the world) to
> >>>> test other web services.
> >>>> _______________________________________________
> >>>> Rpm mailing list
> >>>> Rpm@lists.bufferbloat.net
> >>>> https://lists.bufferbloat.net/listinfo/rpm
> >>> _______________________________________________
> >>> Starlink mailing list
> >>> Starlink@lists.bufferbloat.net
> >>> https://lists.bufferbloat.net/listinfo/starlink
> >>
> >> _______________________________________________
> >> Starlink mailing list
> >> Starlink@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/starlink
>
> _______________________________________________
> Starlink mailing list
> Starlink@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink
[-- Attachment #2: Type: text/html, Size: 45666 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 18:02 ` rjmcmahon
@ 2023-01-12 21:34 ` Dick Roy
0 siblings, 0 replies; 19+ messages in thread
From: Dick Roy @ 2023-01-12 21:34 UTC (permalink / raw)
To: 'rjmcmahon', 'Sebastian Moeller'
Cc: 'Rodney W. Grimes', mike.reynolds, 'libreqos',
'David P. Reed', 'Rpm', 'bloat'
[-- Attachment #1: Type: text/plain, Size: 13155 bytes --]
-----Original Message-----
From: rjmcmahon [mailto:rjmcmahon@rjmcmahon.com]
Sent: Thursday, January 12, 2023 10:03 AM
To: Sebastian Moeller
Cc: Dick Roy; Rodney W. Grimes; mike.reynolds@netforecast.com; libreqos;
David P. Reed; Rpm; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
For WiFi there is the TSF
https://en.wikipedia.org/wiki/Timing_synchronization_function
[RR] There is also a TimingAdvertisement function which can be used to
synchronize STAs to UTC time (or other specified time references ... see the
802.11 standard for details, or ask me offline). It was added in the
802.11p amendment along with OCB operation, if you care to know :-)
We in test & measurement use that in our internal telemetry. The TSF of
a WiFi device only needs frequency-sync for some things, typically
related to access to the medium; a phase-locked loop does it. A device
that decides to go to sleep, as an example, will also stop its TSF,
creating a non-linearity. It's difficult to synchronize it to the system
clock or to the GPS atomic clock - though we do this for internal
testing reasons, so it can be done.
What's mostly missing for T&M with WiFi is the GPS atomic clock, as
that's a convenient time domain to use as the canonical domain.
Bob
> Hi RR,
>
>
>> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>>
>>
>>
>> -----Original Message-----
>> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On
>> Behalf Of Sebastian Moeller via Starlink
>> Sent: Wednesday, January 11, 2023 12:01 PM
>> To: Rodney W. Grimes
>> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos;
>> David P. Reed; Rpm; rjmcmahon; bloat
>> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in
>> USA
>>
>> Hi Rodney,
>>
>>
>>
>>
>> > On Jan 11, 2023, at 19:32, Rodney W. Grimes
<starlink@gndrsh.dnsmgr.net> wrote:
>> >
>> > Hello,
>> >
>> > Yall can call me crazy if you want.. but... see below [RWG]
>> >> Hi Bob,
>> >>
>> >>
>> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
<starlink@lists.bufferbloat.net> wrote:
>> >>>
>> >>> My biggest barrier is the lack of clock sync by the devices, i.e.
very limited support for PTP in data centers and in end devices. This limits
the ability to measure one way delays (OWD) and most assume that OWD is 1/2
and RTT which typically is a mistake. We know this intuitively with airplane
flight times or even car commute times where the one way time is not 1/2 a
round trip time. Google maps & directions provide a time estimate for the
one way link. It doesn't compute a round trip and divide by two.
>> >>>
>> >>> For those that can get clock sync working, the iperf 2 --trip-times
options is useful.
>> >>
>> >> [SM] +1; and yet even with unsynchronized clocks one can try to
measure how latency changes under load and that can be done per direction.
Sure this is far inferior to real reliably measured OWDs, but if life/the
internet deals you lemons....
>> >
>> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back
and forth, for that matter any rate test, why not abuse some of that data
and add the fundamental NTP clock sync data and bidirectionally pass each
other's concept of "current time". IIRC (it's been 25 years since I worked on
NTP at this level) you *should* be able to get a fairly accurate clock delta
between each end, and then use that info and time stamps in the data stream
to compute OWD's. You need to put 4 time stamps in the packet, and with
that you can compute "offset".
>> [RR] For this to work at a reasonable level of accuracy, the
>> timestamping circuits on both ends need to be deterministic and
>> repeatable as I recall. Any uncertainty in that process adds to
>> synchronization errors/uncertainties.
>>
>> [SM] Nice idea. I would guess that all timeslot based access
>> technologies (so starlink, docsis, GPON, LTE?) all distribute "high
>> quality time" carefully to the "modems", so maybe all that would be
>> needed is to expose that high quality time to the LAN side of those
>> modems, dressed up as NTP server?
>> [RR] It's not that simple! Distributing "high-quality time", i.e.
>> "synchronizing all clocks" does not solve the communication problem in
>> synchronous slotted MAC/PHYs!
>
> [SM] I happily believe you, but the same idea of "time slot" needs to
> be shared by all nodes, no? So the clocks need to run at a reasonably
> similar rate, aka synchronized (see below).
>
>
>> All the technologies you mentioned above are essentially P2P, not
>> intended for broadcast. Point is, there is a point controller (aka
>> PoC) often called a base station (eNodeB, gNodeB, ...) that actually
>> "controls everything that is necessary to control" at the UE including
>> time, frequency and sampling time offsets, and these are critical to
>> get right if you want to communicate, and they are ALL subject to the
>> laws of physics (cf. the speed of light)! Turns out that what is
>> necessary for the system to function anywhere near capacity, is for
>> all the clocks governing transmissions from the UEs to be
>> "unsynchronized" such that all the UE transmissions arrive at the PoC
>> at the same (prescribed) time!
>
> [SM] Fair enough. I would call clocks that are "in sync" albeit with
> individual offsets as synchronized, but I am a layman and that might
> sound offensively wrong to experts in the field. But even without the
> naming, my point is that all systems that depend on some idea of a shared
> time-base are halfway there to exposing that time to end users, by
> "translating" it into an NTP time source at the modem.
>
>
>> For some technologies, in particular 5G!, these considerations are
>> ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you
>> don't believe me! :-)
>
> [SM] Far be it from me not to believe you, so thanks for the pointers.
> Yet, I still think that unless different nodes of a shared segment
> move at significantly different speeds, that there should be a common
> "tick-duration" for all clocks even if each clock runs at an offset...
> (I naively would try to implement something like that by trying to
> fully synchronize clocks and maintain a local offset value to convert
> from "absolute" time to "network" time, but likely because coming from
> the outside I am blissfully unaware of the detail challenges that need
> to be solved).
>
> Regards & Thanks
> Sebastian
>
>
>>
>>
>> >
>> >>
>> >>
>> >>>
>> >>> --trip-times
>> >>> enable the measurement of end to end write to read latencies (client
and server clocks must be synchronized)
>> > [RWG] --clock-skew
>> > enable the measurement of the wall clock difference between sender
and receiver
>> >
>> >>
>> >> [SM] Sweet!
>> >>
>> >> Regards
>> >> Sebastian
>> >>
>> >>>
>> >>> Bob
>> >>>> I have many kvetches about the new latency under load tests being
>> >>>> designed and distributed over the past year. I am delighted! that
they
>> >>>> are happening, but most really need third party evaluation, and
>> >>>> calibration, and a solid explanation of what network pathologies
they
>> >>>> do and don't cover. Also a RED team attitude towards them, as well
as
>> >>>> thinking hard about what you are not measuring (operations
research).
>> >>>> I actually rather love the new cloudflare speedtest, because it
tests
>> >>>> a single TCP connection, rather than dozens, and at the same time
folk
>> >>>> are complaining that it doesn't find the actual "speed!". yet... the
>> >>>> test itself more closely emulates a user experience than
speedtest.net
>> >>>> does. I am personally pretty convinced that the fewer numbers of
flows
>> >>>> that a web page opens improves the likelihood of a good user
>> >>>> experience, but lack data on it.
>> >>>> To try to tackle the evaluation and calibration part, I've reached
out
>> >>>> to all the new test designers in the hope that we could get together
>> >>>> and produce a report of what each new test is actually doing. I've
>> >>>> tweeted, linked in, emailed, and spammed every measurement list I
know
>> >>>> of, and only to some response, please reach out to other test
designer
>> >>>> folks and have them join the rpm email list?
>> >>>> My principal kvetches in the new tests so far are:
>> >>>> 0) None of the tests last long enough.
>> >>>> Ideally there should be a mode where they at least run to "time of
>> >>>> first loss", or periodically, just run longer than the
>> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>> >>>> there! It's really bad science to optimize the internet for 20
>> >>>> seconds. It's like optimizing a car, to handle well, for just 20
>> >>>> seconds.
>> >>>> 1) Not testing up + down + ping at the same time
>> >>>> None of the new tests actually test the same thing that the infamous
>> >>>> rrul test does - all the others still test up, then down, and ping.
It
>> >>>> was/remains my hope that the simpler parts of the flent test suite -
>> >>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>> >>>> tests would provide calibration to the test designers.
>> >>>> we've got zillions of flent results in the archive published here:
>> >>>> https://blog.cerowrt.org/post/found_in_flent/
>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>>
>> >>>> The new tests have all added up + ping and down + ping, but not up +
>> >>>> down + ping. Why??
>> >>>> The behaviors of what happens in that case are really non-intuitive,
I
>> >>>> know, but... it's just one more phase to add to any one of those new
>> >>>> tests. I'd be deliriously happy if someone(s) new to the field
>> >>>> started doing that, even optionally, and boggled at how it defeated
>> >>>> their assumptions.
>> >>>> Among other things that would show...
>> >>>> It's the home router industry's dirty secret that darn few "gigabit"
>> >>>> home routers can actually forward in both directions at a gigabit.
I'd
>> >>>> like to smash that perception thoroughly, but given our starting
point
>> >>>> is a gigabit router was a "gigabit switch" - and historically been
>> >>>> something that couldn't even forward at 200Mbit - we have a long way
>> >>>> to go there.
>> >>>> Only in the past year have non-x86 home routers appeared that could
>> >>>> actually do a gbit in both directions.
>> >>>> 2) Few are actually testing within-stream latency
>> >>>> Apple's rpm project is making a stab in that direction. It looks
>> >>>> highly likely, that with a little more work, crusader and
>> >>>> go-responsiveness can finally start sampling the tcp RTT, loss and
>> >>>> markings, more directly. As for the rest... sampling TCP_INFO on
>> >>>> windows, and Linux, at least, always appeared simple to me, but I'm
>> >>>> discovering how hard it is by delving deep into the rust behind
>> >>>> crusader.
>> >>>> the goresponsiveness thing is also IMHO running WAY too many streams
>> >>>> at the same time, I guess motivated by an attempt to have the test
>> >>>> complete quickly?
>> >>>> B) To try and tackle the validation problem:
>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>>
>> >>>> In the libreqos.io project we've established a testbed where tests
can
>> >>>> be plunked through various ISP plan network emulations. It's here:
>> >>>> https://payne.taht.net (run bandwidth test for what's currently
hooked
>> >>>> up)
>> >>>> We could rather use an AS number and at least a ipv4/24 and ipv6/48
to
>> >>>> leverage with that, so I don't have to nat the various emulations.
>> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2
licensed,
>> >>>> to see more test designers setup a testbed like this to calibrate
>> >>>> their own stuff.
>> >>>> Presently we're able to test:
>> >>>> flent
>> >>>> netperf
>> >>>> iperf2
>> >>>> iperf3
>> >>>> speedtest-cli
>> >>>> crusader
>> >>>> the broadband forum udp based test:
>> >>>> https://github.com/BroadbandForum/obudpst
>> >>>> trexx
>> >>>> There's also a virtual machine setup that we can remotely drive a
web
>> >>>> browser from (but I didn't want to nat the results to the world) to
>> >>>> test other web services.
>> >>>> _______________________________________________
>> >>>> Rpm mailing list
>> >>>> Rpm@lists.bufferbloat.net
>> >>>> https://lists.bufferbloat.net/listinfo/rpm
>> >>> _______________________________________________
>> >>> Starlink mailing list
>> >>> Starlink@lists.bufferbloat.net
>> >>> https://lists.bufferbloat.net/listinfo/starlink
>> >>
>> >> _______________________________________________
>> >> Starlink mailing list
>> >> Starlink@lists.bufferbloat.net
>> >> https://lists.bufferbloat.net/listinfo/starlink
>>
>> _______________________________________________
>> Starlink mailing list
>> Starlink@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/starlink
[-- Attachment #2: Type: text/html, Size: 47179 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 17:49 ` Robert McMahon
@ 2023-01-12 21:57 ` Dick Roy
2023-01-13 7:44 ` Sebastian Moeller
0 siblings, 1 reply; 19+ messages in thread
From: Dick Roy @ 2023-01-12 21:57 UTC (permalink / raw)
To: 'Robert McMahon', 'Sebastian Moeller'
Cc: mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
[-- Attachment #1: Type: text/plain, Size: 10313 bytes --]
FYI ...
https://www.fiercewireless.com/tech/cbrs-based-fwa-beats-starlink-performance-madden
Nothing earth-shaking :-)
RR
_____
From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of
Robert McMahon via Starlink
Sent: Thursday, January 12, 2023 9:50 AM
To: Sebastian Moeller
Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David
P. Reed; Rpm; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
Hi Sebastian,
You make a good point. What I did was issue a warning if the tool found it
was being CPU limited vs i/o limited. This indicates the i/o test is likely
inaccurate from an i/o perspective, and the results are suspect. It does
this crudely by comparing the cpu thread doing stats against the traffic
threads doing i/o, i.e. which thread is waiting on the others. There is no
attempt to assess the cpu load itself. So it's designed with the singular
purpose of making sure i/o threads only block on the read and write
syscalls. I probably should revisit this, both in design and implementation.
Thanks for bringing it up; all input is truly appreciated.
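(A crude standalone illustration of that idea, not iperf 2's actual
mechanism: over a sample window, compare a thread's CPU time to wall time;
a ratio near 1 means the loop never blocked in read()/write(), i.e. the
test is CPU limited rather than i/o limited:

    import time

    def cpu_limited(work, window_s=1.0, threshold=0.9):
        c0, w0 = time.thread_time(), time.monotonic()
        while time.monotonic() - w0 < window_s:
            work()                       # stand-in for one i/o loop iteration
        ratio = (time.thread_time() - c0) / (time.monotonic() - w0)
        return ratio >= threshold, ratio

    limited, r = cpu_limited(lambda: sum(range(1000)))  # pure busy work
    print(limited, round(r, 2))          # True, ~1.0 -> warn: results suspect

A loop that mostly sleeps in a blocking read would instead show a ratio
near 0.)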
Bob
On Jan 12, 2023, at 12:14 AM, Sebastian Moeller <moeller0@gmx.de> wrote:
Hi Bob,
On Jan 11, 2023, at 21:09, rjmcmahon <rjmcmahon@rjmcmahon.com> wrote:
Iperf 2 is designed to measure network i/o. Note: It doesn't have to move
large amounts of data. It can support data profiles that don't drive TCP's
CCA as an example.
Two things I've been asked for and avoided:
1) Integrate clock sync into iperf's test traffic
[SM] This I understand, measurement conditions can be unsuited for tight
time synchronization...
2) Measure and output CPU usages
[SM] This one puzzles me: as far as I understand, the only way to properly
diagnose network issues is to rule out other things, like CPU overload, that
can have symptoms similar to network issues. As an example, if CPU cycles
become tight the cake qdisc will first increase its internal queueing and
jitter (not consciously; it is just an observation that once cake does not
get access to the CPU as timely as it wants, queueing latency and variability
increase) and later also show reduced throughput - symptoms similar to things
that can happen along an e2e network path for completely different reasons,
e.g. lower-level retransmissions or a variable-rate link. So I would think
that checking the CPU load, at least coarsely, would be within the scope of
network testing tools, no?
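(Along those lines, a coarse Linux-only check one could wrap around a test
run, reading /proc/stat before and after; the field layout is user nice
system idle iowait irq softirq ...:

    import time

    def cpu_snapshot():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        return fields[3] + fields[4], sum(fields)    # idle+iowait, total

    def cpu_busy_fraction(interval_s=1.0):
        i0, t0 = cpu_snapshot()
        time.sleep(interval_s)                       # the test would run here
        i1, t1 = cpu_snapshot()
        return 1.0 - (i1 - i0) / (t1 - t0)

    print(f"cpu busy: {cpu_busy_fraction():.0%}")    # flag results if near 100%

This says nothing about which thread is starved - cake vs the load
generator - but it is enough to flag "host too busy, latency numbers
suspect".)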
Regards
Sebastian
I think both of these are outside the scope of a tool designed to test
network i/o over sockets, rather these should be developed & validated
independently of a network i/o tool.
Clock error really isn't about amount/frequency of traffic but rather
getting a periodic high-quality reference. I tend to use GPS pulse per
second to lock the local system oscillator to. As David says, most every
modern handheld computer has the GPS chips to do this already. So to me it
seems more of a policy choice between data center operators and device mfgs
and less of a technical issue.
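(For illustration, the drift estimate a PPS reference gives you is just the
deviation of locally timestamped PPS edges from exactly one second; the
capture method is platform-specific and hypothetical here:

    # local receive timestamps of successive GPS PPS edges, in seconds
    def drift_ppm(pps_local_ts):
        n = len(pps_local_ts) - 1
        elapsed = pps_local_ts[-1] - pps_local_ts[0]
        return (elapsed / n - 1.0) * 1e6     # ppm vs true seconds

    print(drift_ppm([0.0, 1.000012, 2.000023, 3.000036]))  # ~ +12 ppm fast

A disciplined oscillator steers itself so this number converges toward
zero.)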
Bob
Hello,
Yall can call me crazy if you want.. but... see below [RWG]
Hi Bob,
On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
<starlink@lists.bufferbloat.net> wrote:
My biggest barrier is the lack of clock sync by the devices, i.e. very
limited support for PTP in data centers and in end devices. This limits the
ability to measure one way delays (OWD) and most assume that OWD is 1/2 and
RTT which typically is a mistake. We know this intuitively with airplane
flight times or even car commute times where the one way time is not 1/2 a
round trip time. Google maps & directions provide a time estimate for the
one way link. It doesn't compute a round trip and divide by two.
For those that can get clock sync working, the iperf 2 --trip-times options
is useful.
[SM] +1; and yet even with unsynchronized clocks one can try to measure
how latency changes under load and that can be done per direction. Sure this
is far inferior to real reliably measured OWDs, but if life/the internet
deals you lemons....
[RWG] iperf2/iperf3, etc are already moving large amounts of data
back and forth, for that matter any rate test, why not abuse some of
that data and add the fundamental NTP clock sync data and
bidirectionally pass each other's concept of "current time". IIRC (it's
been 25 years since I worked on NTP at this level) you *should* be
able to get a fairly accurate clock delta between each end, and then
use that info and time stamps in the data stream to compute OWD's.
You need to put 4 time stamps in the packet, and with that you can
compute "offset".
--trip-times
enable the measurement of end to end write to read latencies (client and
server clocks must be synchronized)
[RWG] --clock-skew
enable the measurement of the wall clock difference between sender and
receiver
[SM] Sweet!
Regards
Sebastian
Bob
I have many kvetches about the new latency under load tests being
designed and distributed over the past year. I am delighted! that they
are happening, but most really need third party evaluation, and
calibration, and a solid explanation of what network pathologies they
do and don't cover. Also a RED team attitude towards them, as well as
thinking hard about what you are not measuring (operations research).
I actually rather love the new cloudflare speedtest, because it tests
a single TCP connection, rather than dozens, and at the same time folk
are complaining that it doesn't find the actual "speed!". yet... the
test itself more closely emulates a user experience than speedtest.net
does. I am personally pretty convinced that the fewer numbers of flows
that a web page opens improves the likelihood of a good user
experience, but lack data on it.
To try to tackle the evaluation and calibration part, I've reached out
to all the new test designers in the hope that we could get together
and produce a report of what each new test is actually doing. I've
tweeted, linked in, emailed, and spammed every measurement list I know
of, and only to some response, please reach out to other test designer
folks and have them join the rpm email list?
My principal kvetches in the new tests so far are:
0) None of the tests last long enough.
Ideally there should be a mode where they at least run to "time of
first loss", or periodically, just run longer than the
industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
there! It's really bad science to optimize the internet for 20
seconds. It's like optimizing a car, to handle well, for just 20
seconds.
1) Not testing up + down + ping at the same time
None of the new tests actually test the same thing that the infamous
rrul test does - all the others still test up, then down, and ping. It
was/remains my hope that the simpler parts of the flent test suite -
such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
tests would provide calibration to the test designers.
we've got zillions of flent results in the archive published here:
https://blog.cerowrt.org/post/found_in_flent/
ps. Misinformation about iperf 2 impacts my ability to do this.
The new tests have all added up + ping and down + ping, but not up +
down + ping. Why??
The behaviors of what happens in that case are really non-intuitive, I
know, but... it's just one more phase to add to any one of those new
tests. I'd be deliriously happy if someone(s) new to the field
started doing that, even optionally, and boggled at how it defeated
their assumptions.
Among other things that would show...
It's the home router industry's dirty secret that darn few "gigabit"
home routers can actually forward in both directions at a gigabit. I'd
like to smash that perception thoroughly, but given our starting point
is a gigabit router was a "gigabit switch" - and historically been
something that couldn't even forward at 200Mbit - we have a long way
to go there.
Only in the past year have non-x86 home routers appeared that could
actually do a gbit in both directions.
2) Few are actually testing within-stream latency
Apple's rpm project is making a stab in that direction. It looks
highly likely, that with a little more work, crusader and
go-responsiveness can finally start sampling the tcp RTT, loss and
markings, more directly. As for the rest... sampling TCP_INFO on
windows, and Linux, at least, always appeared simple to me, but I'm
discovering how hard it is by delving deep into the rust behind
crusader.
the goresponsiveness thing is also IMHO running WAY too many streams
at the same time, I guess motivated by an attempt to have the test
complete quickly?
B) To try and tackle the validation problem:
ps. Misinformation about iperf 2 impacts my ability to do this.
In the libreqos.io project we've established a testbed where tests can
be plunked through various ISP plan network emulations. It's here:
https://payne.taht.net (run bandwidth test for what's currently hooked
up)
We could rather use an AS number and at least a ipv4/24 and ipv6/48 to
leverage with that, so I don't have to nat the various emulations.
(and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
to see more test designers setup a testbed like this to calibrate
their own stuff.
Presently we're able to test:
flent
netperf
iperf2
iperf3
speedtest-cli
crusader
the broadband forum udp based test:
https://github.com/BroadbandForum/obudpst
trexx
There's also a virtual machine setup that we can remotely drive a web
browser from (but I didn't want to nat the results to the world) to
test other web services.
_____
Rpm mailing list
Rpm@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/rpm
_____
Starlink mailing list
Starlink@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/starlink
_____
Starlink mailing list
Starlink@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/starlink
[-- Attachment #2: Type: text/html, Size: 20462 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 20:39 ` Dick Roy
@ 2023-01-13 7:33 ` Sebastian Moeller
2023-01-13 8:26 ` Dick Roy
2023-01-13 7:40 ` rjmcmahon
1 sibling, 1 reply; 19+ messages in thread
From: Sebastian Moeller @ 2023-01-13 7:33 UTC (permalink / raw)
To: dickroy, Dick Roy
Cc: 'Rodney W. Grimes', mike.reynolds, 'libreqos',
'David P. Reed', 'Rpm', 'rjmcmahon',
'bloat'
[-- Attachment #1: Type: text/plain, Size: 18467 bytes --]
Hi RR,
Thanks for the detailed response below; since my point is somewhat orthogonal, I opted for top-posting.
Let me take a step back here and rephrase: synchronising clocks to within a range acceptable for usefulness is neither rocket science nor witchcraft. For measuring internet traffic, millisecond range seems acceptable; local networks can probably profit from finer time resolution. So I am not after e.g. clock synchronisation good enough to participate in SDH/SONET. Heck, in the toy project I am active in we operate on load-dependent delay deltas, so we even ignore different time offsets and are tolerant to (mildly) different tick rates and clock skew. But it would certainly be nice to have some acceptable measure of UTC from endpoints, to be able to interpret timestamps as 'absolute'. Mind you, I am fine with them not being veridically absolute, just good enough for my measurement purpose, and I guess that should be within the range of the achievable. Heck, if all the servers we query timestamps from were NTP-'synchronized' and followed the RFC recommendation to report timestamps in milliseconds past midnight UTC, I would be happy.
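To illustrate the delta idea (a toy sketch, not our project's actual code;
it assumes microsecond timestamps and negligible drift over a run): with
unsynchronized clocks every one-way sample contains an unknown constant
offset, which cancels when you subtract the idle baseline.

/* Load-dependent one-way delay delta with unsynchronized clocks.
 * Each raw sample, receiver clock minus sender clock, contains the
 * true OWD plus an unknown constant offset; subtracting the running
 * minimum cancels the offset and leaves the queueing-induced delta. */
#include <stdint.h>

static int64_t baseline_us = INT64_MAX;   /* smallest raw sample so far */

int64_t owd_delta_us(int64_t t_send_us, int64_t t_recv_us)
{
    int64_t raw = t_recv_us - t_send_us;  /* true OWD + clock offset */
    if (raw < baseline_us)
        baseline_us = raw;                /* track the idle baseline */
    return raw - baseline_us;             /* extra delay under load */
}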
Regards
Sebastian
On 12 January 2023 21:39:21 CET, Dick Roy <dickroy@alum.mit.edu> wrote:
>Hi Sebastian (et. al.),
>
>[I'll comment up here instead of inline.]
>
>Let me start by saying that I have not been intimately involved with the
>IEEE 1588 effort (PTP), however I was involved in the 802.11 efforts along a
>similar vein, just adding the wireless first hop component and its effects
>on PTP.
>
>What was apparent from the outset was that there was a lack of understanding
>of what the terms "to synchronize" or "to be synchronized" actually mean. It's
>not trivial ... because we live in an (approximately, that's another story!)
>4-D space-time continuum where the Lorentz metric plays a critical role.
>Therein, simultaneity (aka "things happening at the same time") means the
>"distance" between two such events is zero, and that distance is given by
>sqrt(x^2 + y^2 + z^2 - (ct)^2), where the "thing happening" can be the tick of
>a clock somewhere. Now since everything is relative (time with respect to
>what? / location with respect to where?) it's pretty easy to see that "if
>you don't know where you are, you can't know what time it is!" (English
>sailors of the 18th century knew this well!) Add to this the fact that if
>everything were stationary, nothing would happen (as Einstein said "Nothing
>happens until something moves!"), and special relativity also plays a role.
>Clocks on GPS satellites run approx. 7 usecs/day slower than those on earth
>due to their "speed" (8700 mph roughly)! Then add the consequence that
>without mass we wouldn't exist (in these forms at least :-)), and
>gravitational effects (aka General Relativity) come into play. Those turn
>out to make clocks on GPS satellites run 45 usec/day faster than those on
>earth! The net effect is that GPS clocks run about 38 usec/day faster than
>clocks on earth. So what does it mean to "synchronize to GPS"? Point is:
>it's a non-trivial question with a very complicated answer. The reason it
>is important to get all this right is that the thing that ties time and
>space together is the speed of light, and that turns out to be about a
>"foot-per-nanosecond" in a vacuum (roughly 300 m/usec). This means if I am
>uncertain about my location to, say, 300 meters, then I also am not sure
>what time it is to a usec, AND vice-versa!
>
>All that said, the simplest explanation of synchronization is probably: Two
>clocks are synchronized if, when they are brought (slowly) into physical
>proximity ("sat next to each other") in the same (quasi-)inertial frame and
>the same gravitational potential (not so obvious BTW ... see the FYI below!),
>an observer of both would say "they are keeping time identically". Since
>this experiment is rarely possible, one can never be "sure" that his clock
>is synchronized to any other clock elsewhere. And what does it mean to say
>they "were synchronized" when brought together, but now they are not, because
>they are now in different gravitational potentials! (FYI, there are land
>mine detectors being developed on this very principle! I know someone who
>actually worked on such a project!)
>
>This all gets even more complicated when dealing with large networks of
>networks in which the "speed of information transmission" can vary depending
>on the medium (cf. coaxial cables versus fiber versus microwave links!) In
>fact, the atmosphere is one of those media, and variations therein result in
>the need for "GPS corrections" (cf. RTCM GPS correction messages, RTK, etc.)
>in order to get to sub-nsec/cm accuracy. Point is, if you have a set of
>nodes distributed across the country, all with GPS and all "synchronized to
>GPS time", and a second identical set of nodes (with no GPS) instead
>connected with a network of cables and fiber links, all of different lengths
>and composition, using different carrier frequencies (dielectric constants
>vary with frequency!), "synchronized" to some clock somewhere using NTP or
>PTP, the synchronization of the two sets will be different unless a common
>reference clock is used AND all the above effects are taken into account,
>and good luck with that! :-)
>
>In conclusion, if anyone tells you that clock synchronization in
>communication networks is simple ("Just use GPS!"), you should feel free to
>chuckle (under your breath if necessary :-))
>
>Cheers,
>
>RR
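A quick sanity check on those numbers, using RR's own figures and the
foot-per-nanosecond rule (illustrative arithmetic only):

\[
\Delta t \approx (45 - 7)\,\mu\text{s/day} = 38\,\mu\text{s/day},
\qquad
c\,\Delta t \approx 3\times10^{8}\,\text{m/s} \times 38\times10^{-6}\,\text{s} \approx 11\,\text{km}
\]

i.e. an uncorrected constellation would accumulate kilometers of apparent
range error per day, which is why the relativistic rate offsets are designed
into the satellite clocks.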
>
>-----Original Message-----
>From: Sebastian Moeller [mailto:moeller0@gmx.de]
>Sent: Thursday, January 12, 2023 12:23 AM
>To: Dick Roy
>Cc: Rodney W. Grimes; mike.reynolds@netforecast.com; libreqos; David P. Reed; Rpm; rjmcmahon; bloat
>Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
>
>Hi RR,
>
>> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>>
>> [...]
>>
>> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back
>> and forth, for that matter any rate test, why not abuse some of that data
>> and add the fundamental NTP clock sync data and bidirectionally pass each
>> other's concept of "current time". IIRC (it's been 25 years since I worked
>> on NTP at this level) you *should* be able to get a fairly accurate clock
>> delta between each end, and then use that info and time stamps in the data
>> stream to compute OWDs. You need to put 4 time stamps in the packet, and
>> with that you can compute "offset".
>>
>> [RR] For this to work at a reasonable level of accuracy, the timestamping
>> circuits on both ends need to be deterministic and repeatable as I recall.
>> Any uncertainty in that process adds to synchronization
>> errors/uncertainties.
>>
>> [SM] Nice idea. I would guess that all timeslot-based access
>> technologies (so starlink, docsis, GPON, LTE?) distribute "high quality
>> time" carefully to the "modems", so maybe all that would be needed is to
>> expose that high quality time to the LAN side of those modems, dressed up
>> as an NTP server?
>>
>> [RR] It's not that simple! Distributing "high-quality time", i.e.
>> "synchronizing all clocks", does not solve the communication problem in
>> synchronous slotted MAC/PHYs!
>
> [SM] I happily believe you, but the same idea of "time slot" needs to
>be shared by all nodes, no? So the clocks need to run at reasonably
>similar rates, aka be synchronized (see below).
>
>> All the technologies you mentioned above are essentially P2P, not
>> intended for broadcast. Point is, there is a point controller (aka PoC),
>> often called a base station (eNodeB, gNodeB, ...), that actually "controls
>> everything that is necessary to control" at the UE, including time,
>> frequency and sampling time offsets, and these are critical to get right
>> if you want to communicate, and they are ALL subject to the laws of
>> physics (cf. the speed of light)! Turns out that what is necessary for the
>> system to function anywhere near capacity is for all the clocks governing
>> transmissions from the UEs to be "unsynchronized" such that all the UE
>> transmissions arrive at the PoC at the same (prescribed) time!
>
> [SM] Fair enough. I would call clocks that are "in sync", albeit with
>individual offsets, synchronized, but I am a layman and that might sound
>offensively wrong to experts in the field. But even without the naming, my
>point is that all systems that depend on some idea of a shared time-base
>are halfway toward exposing that time to end users, by translating it into
>an NTP time source at the modem.
>
>> For some technologies, in particular 5G!, these considerations are
>> ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you
>> don't believe me! :-)
>
> [SM] Far be it from me not to believe you, so thanks for the pointers.
>Yet, I still think that unless different nodes of a shared segment move at
>significantly different speeds, there should be a common "tick-duration"
>for all clocks, even if each clock runs at an offset... (I naively would
>try to implement something like that by trying to fully synchronize clocks
>and maintain a local offset value to convert from "absolute" time to
>"network" time, but likely, coming from the outside, I am blissfully
>unaware of the detail challenges that need to be solved.)
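For concreteness, the four-timestamp computation RWG describes above is the
classic NTP on-wire calculation. A minimal sketch, with t1..t4 as
microsecond timestamps (illustrative only, not iperf 2 code):

/* NTP-style on-wire calculation from four timestamps:
 *   t1 = client send, t2 = server receive,
 *   t3 = server send, t4 = client receive. */
#include <stdint.h>

typedef struct {
    int64_t offset_us;  /* estimated server-minus-client clock offset */
    int64_t delay_us;   /* round-trip delay minus server hold time */
} clock_est;

clock_est ntp_onwire(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
{
    clock_est e;
    e.offset_us = ((t2 - t1) + (t3 - t4)) / 2;
    e.delay_us  = (t4 - t1) - (t3 - t2);
    /* One-way delays then follow as (t2 - t1) - offset (client->server)
     * and (t4 - t3) + offset (server->client), exact only if the path
     * is symmetric - the very assumption the OWD discussion above
     * warns against. */
    return e;
}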
>
>Regards & Thanks
>	Sebastian
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 20:39 ` Dick Roy
2023-01-13 7:33 ` Sebastian Moeller
@ 2023-01-13 7:40 ` rjmcmahon
2023-01-13 8:10 ` Dick Roy
1 sibling, 1 reply; 19+ messages in thread
From: rjmcmahon @ 2023-01-13 7:40 UTC (permalink / raw)
To: dickroy
Cc: 'Sebastian Moeller', 'Rodney W. Grimes',
mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
Hi RR,
I believe quality GPS chips compensate for relativity in their pulse-per-second
output, which is needed to get position accuracy.
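On Linux, e.g., the PPS edge can be read with the RFC 2783 API; a rough
sketch (the device path is a placeholder, and error handling is trimmed):

/* Timestamp GPS pulse-per-second edges via the RFC 2783 PPS API.
 * The fractional second of each assert timestamp shows how far the
 * system clock sits from the top of the GPS second. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/timepps.h>

int main(void)
{
    int fd = open("/dev/pps0", O_RDONLY);   /* device path: assumption */
    pps_handle_t h;
    pps_params_t params;
    pps_info_t info;
    struct timespec timeout = { 3, 0 };

    if (fd < 0 || time_pps_create(fd, &h) < 0)
        return 1;
    time_pps_getparams(h, &params);
    params.mode |= PPS_CAPTUREASSERT;        /* capture rising edges */
    time_pps_setparams(h, &params);

    for (;;) {
        if (time_pps_fetch(h, PPS_TSFMT_TSPEC, &info, &timeout) < 0)
            break;
        printf("assert #%lu at %ld.%09ld\n",
               (unsigned long)info.assert_sequence,
               (long)info.assert_timestamp.tv_sec,
               info.assert_timestamp.tv_nsec);
    }
    time_pps_destroy(h);
    return 0;
}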
Bob
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 21:57 ` Dick Roy
@ 2023-01-13 7:44 ` Sebastian Moeller
2023-01-13 8:01 ` Dick Roy
0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Moeller @ 2023-01-13 7:44 UTC (permalink / raw)
To: dickroy, Dick Roy, 'Robert McMahon'
Cc: mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
Hi RR
On 12 January 2023 22:57:32 CET, Dick Roy <dickroy@alum.mit.edu> wrote:
>FYI ...
>
>https://www.fiercewireless.com/tech/cbrs-based-fwa-beats-starlink-performance-madden
>
[SM] He is so close:
'Speed tests don’t tell us much about the capacity of the network, or the reliability of the network, or the true latency with larger packet sizes. Packet loss testing can help to fill in key missing information to give the end customer the smooth experience they’re looking for.'
and
'Packets received over 250 ms latency are considered too late to be useful for video conferencing.'
He actually reports both loss numbers and delay > 250 ms, so in spite of arguing that loss is the relevant metric he already dips his toes into the latency issue... I wonder whether his view will refine over time, now that he apparently moved from a link with 8% packet loss to one with a more sane 0.1% loss rate (no idea how he measured loss rate, though, or latency). I guess this shows that there is no single solution for all links; it really matters where one starts, and which of throughput, delay, and loss is the most painful and hence the dimension in need of a fix first.
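(His 250 ms rule turns straightforwardly into an "effective loss" metric;
a minimal sketch, with the deadline taken from the article as an
assumption:)

/* Effective loss for real-time media: packets that arrive past the
 * playout deadline are as good as lost for interactive use. */
#include <stddef.h>

#define LATE_DEADLINE_US 250000   /* 250 ms, per the article */

typedef struct { int lost; long latency_us; } pkt_sample;

double effective_loss(const pkt_sample *p, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i].lost || p[i].latency_us > LATE_DEADLINE_US)
            bad++;
    return n ? (double)bad / (double)n : 0.0;
}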
Regards
Sebastian
>
>Nothing earth-shaking :-)
>
>RR
>
> _____
>
>From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of Robert McMahon via Starlink
>Sent: Thursday, January 12, 2023 9:50 AM
>To: Sebastian Moeller
>Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David P. Reed; Rpm; bloat
>Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
>
>Hi Sebastien,
>
>You make a good point. What I did was issue a warning if the tool found it
>was being CPU limited vs i/o limited. This indicates the i/o test is likely
>inaccurate from an i/o perspective, and the results are suspect. It does
>this crudely by comparing the cpu thread doing stats against the traffic
>threads doing i/o, to see which thread is waiting on the others. There is
>no attempt to assess the cpu load itself. So it's designed with the
>singular purpose of making sure i/o threads only block on syscalls of
>write and read.
>
>I probably should revisit this both in design and implementation. Thanks
>for bringing it up; all input is truly appreciated.
>
>Bob
>
>On Jan 12, 2023, at 12:14 AM, Sebastian Moeller <moeller0@gmx.de> wrote:
>
>Hi Bob,
>
>> On Jan 11, 2023, at 21:09, rjmcmahon <rjmcmahon@rjmcmahon.com> wrote:
>>
>> Iperf 2 is designed to measure network i/o. Note: It doesn't have to
>> move large amounts of data. It can support data profiles that don't
>> drive TCP's CCA as an example.
>>
>> Two things I've been asked for and avoided:
>>
>> 1) Integrate clock sync into iperf's test traffic
>
> [SM] This I understand, measurement conditions can be unsuited for tight
>time synchronization...
>
>> 2) Measure and output CPU usages
>
> [SM] This one puzzles me. As far as I understand, the only way to
>properly diagnose network issues is to rule out other things, like CPU
>overload, that can have symptoms similar to network issues. As an example,
>the cake qdisc will, if CPU cycles become tight, first increase its
>internal queueing and jitter (not consciously; it is just an observation
>that once cake does not get access to the CPU as timely as it wants,
>queueing latency and variability increase), and then later also show
>reduced throughput - similar to things that can happen along an e2e
>network path for completely different reasons, e.g. lower-level
>retransmissions or a variable-rate link. So I would think that checking
>the CPU load, at least coarsely, would be within the scope of network
>testing tools, no?
>
>Regards
>	Sebastian
>
>> I think both of these are outside the scope of a tool designed to test
>> network i/o over sockets; rather, these should be developed & validated
>> independently of a network i/o tool.
>>
>> Clock error really isn't about the amount/frequency of traffic but
>> rather getting a periodic high-quality reference. I tend to use GPS
>> pulse per second to lock the local system oscillator to. As David says,
>> most every modern handheld computer has the GPS chips to do this
>> already. So to me it seems more of a policy choice between data center
>> operators and device mfgs and less of a technical issue.
>>
>> Bob
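To make the "coarse CPU check" from the quoted exchange concrete, a toy
sketch using getrusage(); the 90% threshold is an arbitrary assumption,
and this checks the whole process rather than individual i/o threads:

/* Warn when a network test consumed nearly a full CPU core per
 * wall-clock second, i.e. its results may be CPU- rather than
 * network-limited. */
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

static double timeval_to_sec(struct timeval tv)
{
    return tv.tv_sec + tv.tv_usec / 1e6;
}

void warn_if_cpu_limited(double wall_seconds)
{
    struct rusage ru;
    double cpu;

    if (wall_seconds <= 0 || getrusage(RUSAGE_SELF, &ru) < 0)
        return;
    cpu = timeval_to_sec(ru.ru_utime) + timeval_to_sec(ru.ru_stime);
    if (cpu / wall_seconds > 0.9)  /* >90% of one core: suspect */
        fprintf(stderr, "warning: test appears CPU limited "
                "(%.2f CPU-s per wall-clock s)\n", cpu / wall_seconds);
}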
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-13 7:44 ` Sebastian Moeller
@ 2023-01-13 8:01 ` Dick Roy
0 siblings, 0 replies; 19+ messages in thread
From: Dick Roy @ 2023-01-13 8:01 UTC (permalink / raw)
To: 'Sebastian Moeller', 'Robert McMahon'
Cc: mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
-----Original Message-----
From: Sebastian Moeller [mailto:moeller0@gmx.de]
Sent: Thursday, January 12, 2023 11:45 PM
To: dickroy@alum.mit.edu; Dick Roy; 'Robert McMahon'
Cc: mike.reynolds@netforecast.com; 'libreqos'; 'David P. Reed'; 'Rpm'; 'bloat'
Subject: RE: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA

Hi RR

On 12 January 2023 22:57:32 CET, Dick Roy <dickroy@alum.mit.edu> wrote:
>FYI ...
>
>https://www.fiercewireless.com/tech/cbrs-based-fwa-beats-starlink-performance-madden
>

[SM] He is so close:

[RR] Which is why I posted the link :-) I knew you'd latch on to his
thread!

[...]
Regards
Sebastian
>
>
>Nothing earth-shaking :-)
>
>
>RR
>
>
>
> _____
>
>From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of
>Robert McMahon via Starlink
>Sent: Thursday, January 12, 2023 9:50 AM
>To: Sebastian Moeller
>Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David
>P. Reed; Rpm; bloat
>Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
>
>
>
>Hi Sebastien,
>
>You make a good point. What I did was issue a warning if the tool found it
>was being CPU limited vs i/o limited. This indicates the i/o test likely is
>inaccurate from an i/o perspective, and the results are suspect. It does
>this crudely by comparing the cpu thread doing stats against the traffic
>threads doing i/o, which thread is waiting on the others. There is no
>attempt to assess the cpu load itself. So it's designed with a singular
>purpose of making sure i/o threads only block on syscalls of write and
read.
>
>I probably should revisit this both in design and implementation. Thanks
for
>bringing it up and all input is truly appreciated.
>
>Bob
>
>On Jan 12, 2023, at 12:14 AM, Sebastian Moeller <moeller0@gmx.de> wrote:
>
>Hi Bob,
>
>
>
>
>
>
> On Jan 11, 2023, at 21:09, rjmcmahon <rjmcmahon@rjmcmahon.com> wrote:
>
>
>
>
>
> Iperf 2 is designed to measure network i/o. Note: It doesn't have to move
>large amounts of data. It can support data profiles that don't drive TCP's
>CCA as an example.
>
>
>
>
>
> Two things I've been asked for and avoided:
>
>
>
>
>
> 1) Integrate clock sync into iperf's test traffic
>
>
>
> [SM] This I understand, measurement conditions can be unsuited for tight
>time synchronization...
>
>
>
>
>
>
> 2) Measure and output CPU usages
>
>
>
> [SM] This one puzzles me, as far as I understand the only way to properly
>diagnose network issues is to rule out other things like CPU overload that
>can have symptoms similar to network issues. As an example, the cake qdisc
>will if CPU cycles become tight first increases its internal queueing and
>jitter (not consciously, it is just an observation that once cake does not
>get access to the CPU as timely as it wants, queuing latency and
variability
>increases) and then later also shows reduced throughput, so similar things
>that can happen along an e2e network path for completely different reasons,
>e.g. lower level retransmissions or a variable rate link. So i would think
>that checking the CPU load at least coarse would be within the scope of
>network testing tools, no?
>
>
>
>
>
>Regards
>
>
> Sebastian
>
>
>
>
>
>
>
>
>
>
>
>
> I think both of these are outside the scope of a tool designed to test
>network i/o over sockets, rather these should be developed & validated
>independently of a network i/o tool.
>
>
>
>
>
> Clock error really isn't about amount/frequency of traffic but rather
>getting a periodic high-quality reference. I tend to use GPS pulse per
>second to lock the local system oscillator to. As David says, most every
>modern handheld computer has the GPS chips to do this already. So to me it
>seems more of a policy choice between data center operators and device mfgs
>and less of a technical issue.
>
>
>
>
>
> Bob
> Hello,
>
>
> Yall can call me crazy if you want.. but... see below [RWG]
> Hi Bib,
> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
><starlink@lists.bufferbloat.net> wrote:
>
>
>
>
>
> My biggest barrier is the lack of clock sync by the devices, i.e. very
>limited support for PTP in data centers and in end devices. This limits the
>ability to measure one way delays (OWD) and most assume that OWD is 1/2 and
>RTT which typically is a mistake. We know this intuitively with airplane
>flight times or even car commute times where the one way time is not 1/2 a
>round trip time. Google maps & directions provide a time estimate for the
>one way link. It doesn't compute a round trip and divide by two.
>
>
>
>
>
> For those that can get clock sync working, the iperf 2 --trip-times
options
>is useful.
> [SM] +1; and yet even with unsynchronized clocks one can try to measure
>how latency changes under load and that can be done per direction. Sure
this
>is far inferior to real reliably measured OWDs, but if life/the internet
>deals you lemons....
> [RWG] iperf2/iperf3, etc are already moving large amounts of data
>
>
> back and forth, for that matter any rate test, why not abuse some of
>
>
> that data and add the fundemental NTP clock sync data and
>
>
> bidirectionally pass each others concept of "current time". IIRC (its
>
>
> been 25 years since I worked on NTP at this level) you *should* be
>
>
> able to get a fairly accurate clock delta between each end, and then
>
>
> use that info and time stamps in the data stream to compute OWD's.
>
>
> You need to put 4 time stamps in the packet, and with that you can
>
>
> compute "offset".
>
>
>
>
> --trip-times
>
>
> enable the measurement of end to end write to read latencies (client and
>server clocks must be synchronized)
>
> [RWG] --clock-skew
>
>
> enable the measurement of the wall clock difference between sender and
>receiver
> [SM] Sweet!
>
>
> Regards
>
>
> Sebastian
>
>
> Bob
> I have many kvetches about the new latency under load tests being
> designed and distributed over the past year. I am delighted! that they
> are happening, but most really need third party evaluation, and
> calibration, and a solid explanation of what network pathologies they
> do and don't cover. Also a RED team attitude towards them, as well as
> thinking hard about what you are not measuring (operations research).
>
> I actually rather love the new cloudflare speedtest, because it tests
> a single TCP connection, rather than dozens, and at the same time folk
> are complaining that it doesn't find the actual "speed!". yet... the
> test itself more closely emulates a user experience than speedtest.net
> does. I am personally pretty convinced that the fewer flows a web page
> opens, the better the likelihood of a good user experience, but I lack
> data on it.
>
> To try to tackle the evaluation and calibration part, I've reached out
> to all the new test designers in the hope that we could get together
> and produce a report of what each new test is actually doing. I've
> tweeted, linked in, emailed, and spammed every measurement list I know
> of, but with only some response; please reach out to other test
> designer folks and have them join the rpm email list?
>
> My principal kvetches in the new tests so far are:
>
> 0) None of the tests last long enough.
> Ideally there should be a mode where they at least run to "time of
> first loss", or periodically, just run longer than the
> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
> there! It's really bad science to optimize the internet for 20
> seconds. It's like optimizing a car, to handle well, for just 20
> seconds.
>
> 1) Not testing up + down + ping at the same time
> None of the new tests actually test the same thing that the infamous
> rrul test does - all the others still test up, then down, and ping. It
> was/remains my hope that the simpler parts of the flent test suite -
> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
> tests - would provide calibration to the test designers.
> We've got zillions of flent results in the archive published here:
> https://blog.cerowrt.org/post/found_in_flent/
>
> ps. Misinformation about iperf 2 impacts my ability to do this.
>
> The new tests have all added up + ping and down + ping, but not up +
> down + ping. Why??
> The behaviors of what happens in that case are really non-intuitive, I
> know, but... it's just one more phase to add to any one of those new
> tests. I'd be deliriously happy if someone(s) new to the field
> started doing that, even optionally, and boggled at how it defeated
> their assumptions.
>
> Among other things that would show...
> It's the home router industry's dirty secret that darn few "gigabit"
> home routers can actually forward in both directions at a gigabit. I'd
> like to smash that perception thoroughly, but given that our starting
> point was a "gigabit router" that was really a "gigabit switch" - and
> historically something that couldn't even forward at 200Mbit - we have
> a long way to go there.
> Only in the past year have non-x86 home routers appeared that could
> actually do a gbit in both directions.
>
> 2) Few are actually testing within-stream latency
> Apple's rpm project is making a stab in that direction. It looks
> highly likely that, with a little more work, crusader and
> go-responsiveness can finally start sampling the tcp RTT, loss and
> markings more directly. As for the rest... sampling TCP_INFO on
> windows, and Linux, at least, always appeared simple to me, but I'm
> discovering how hard it is by delving deep into the rust behind
> crusader.
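(Aside: a sketch of the TCP_INFO sampling mentioned above, on Linux; the
field offsets follow the common struct tcp_info layout but should be
verified against your kernel headers before relying on them:)

    import socket
    import struct

    TCP_INFO = getattr(socket, "TCP_INFO", 11)  # 11 on Linux

    def sample_tcp_info(sock):
        # First 104 bytes of struct tcp_info: 8 one-byte fields, then
        # 24 u32 fields (rto, ato, snd_mss, rcv_mss, unacked, sacked,
        # lost, retrans, fackets, last_data_sent, ..., rtt, rttvar, ...).
        raw = sock.getsockopt(socket.IPPROTO_TCP, TCP_INFO, 104)
        fields = struct.unpack("8B24I", raw)
        lost = fields[8 + 6]      # tcpi_lost
        retrans = fields[8 + 7]   # tcpi_retrans
        rtt_us = fields[8 + 15]   # tcpi_rtt, smoothed RTT in microseconds
        return rtt_us, lost, retrans

Polling this periodically on a connected socket is the "within-stream"
sampling the paragraph above has in mind.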
>
> the goresponsiveness thing is also IMHO running WAY too many streams
> at the same time, I guess motivated by an attempt to have the test
> complete quickly?
>
> B) To try and tackle the validation problem:
> ps. Misinformation about iperf 2 impacts my ability to do this.
>
> In the libreqos.io project we've established a testbed where tests can
> be plunked through various ISP plan network emulations. It's here:
> https://payne.taht.net (run the bandwidth test for what's currently
> hooked up)
> We could rather use an AS number and at least an ipv4/24 and ipv6/48
> to leverage with that, so I don't have to nat the various emulations.
> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
> to see more test designers set up a testbed like this to calibrate
> their own stuff.
>
> Presently we're able to test:
> flent
> netperf
> iperf2
> iperf3
> speedtest-cli
> crusader
> the broadband forum udp based test:
> https://github.com/BroadbandForum/obudpst
> trexx
>
> There's also a virtual machine setup that we can remotely drive a web
> browser from (but I didn't want to nat the results to the world) to
> test other web services.
>
> _______________________________________________
> Rpm mailing list
> Rpm@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/rpm
>
> _______________________________________________
> Starlink mailing list
> Starlink@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink
>
> _______________________________________________
> Starlink mailing list
> Starlink@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink
>
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
[-- Attachment #2: Type: text/html, Size: 85508 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-13 7:40 ` rjmcmahon
@ 2023-01-13 8:10 ` Dick Roy
2023-01-15 23:09 ` rjmcmahon
0 siblings, 1 reply; 19+ messages in thread
From: Dick Roy @ 2023-01-13 8:10 UTC (permalink / raw)
To: 'rjmcmahon'
Cc: 'Sebastian Moeller', 'Rodney W. Grimes',
mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
[-- Attachment #1: Type: text/plain, Size: 18900 bytes --]
-----Original Message-----
From: rjmcmahon [mailto:rjmcmahon@rjmcmahon.com]
Sent: Thursday, January 12, 2023 11:40 PM
To: dickroy@alum.mit.edu
Cc: 'Sebastian Moeller'; 'Rodney W. Grimes'; mike.reynolds@netforecast.com;
'libreqos'; 'David P. Reed'; 'Rpm'; 'bloat'
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
Hi RR,
I believe quality GPS chips compensate for relativity in pulse per
second, which is needed to get position accuracy.
[RR] Of course they do. That 38usec/day really matters! They assume they
know what the gravitational potential is where they are, and they can
estimate the potential at the satellites so they can compensate, and they
do. Point is, a GPS unit at Lake Tahoe (6250') runs faster than the one in
San Francisco (sea level). How do you think these two "should be
synchronized"! How do you define "synchronization" in this case? You
synchronize those two clocks, then what about all the other clocks at Lake
Tahoe (or SF or anywhere in between for that matter :-))??? These are not
trivial questions. However if all one cares about is seconds or
milliseconds, then you can argue that we (earthlings on planet earth) can
"sweep such facts under the proverbial rug" for the purposes of latency in
communication networks and that's certainly doable. Don't tell that to the
guys whose protocols require "synchronization of all units to nanoseconds"
though! They will be very, very unhappy :-) :-) And you know who you are
:-) :-)
Bob
> Hi Sebastian (et. al.),
>
> [I'll comment up here instead of inline.]
>
> Let me start by saying that I have not been intimately involved with
> the IEEE 1588 effort (PTP), however I was involved in the 802.11
> efforts along a similar vein, just adding the wireless first hop
> component and it's effects on PTP.
>
> What was apparent from the outset was that there was a lack of
> understanding what the terms "to synchronize" or "to be synchronized"
> actually mean. It's not trivial ... because we live in an
> (approximately, that's another story!) 4-D space-time continuum where
> the Lorentz metric plays a critical role. Therein, simultaneity (aka
> "things happening at the same time") means the "distance" between two
> such events is zero and that distance is given by sqrt(x^2 + y^2 + z^2
> - (ct)^2) and the "thing happening" can be the tick of a clock
> somewhere. Now since everything is relative (time with respect to
> what? / location with respect to where?) it's pretty easy to see that
> "if you don't know where you are, you can't know what time it is!"
> (English sailors of the 18th century knew this well!) Add to this the
> fact that if everything were stationary, nothing would happen (as
> Einstein said "Nothing happens until something moves!"), special
> relativity also plays a role. Clocks on GPS satellites run approx.
> 7usecs/day slower than those on earth due to their "speed" (8700 mph
> roughly)! Then add the consequence that without mass we wouldn't exist
> (in these forms at least :-)), and gravitational effects (aka General
> Relativity) come into play. Those turn out to make clocks on GPS
> satellites run 45usec/day faster than those on earth! The net effect
> is that GPS clocks run about 38usec/day faster than clocks on earth.
> So what does it mean to "synchronize to GPS"? Point is: it's a
> non-trivial question with a very complicated answer. The reason it is
> important to get all this right is that the "what that ties time and
> space together" is the speed of light and that turns out to be a
> "foot-per-nanosecond" in a vacuum (roughly 300m/usec). This means if
> I am uncertain about my location to say 300 meters, then I also am not
> sure what time it is to a usec AND vice-versa!
>
> All that said, the simplest explanation of synchronization is
> probably: Two clocks are synchronized if, when they are brought
> (slowly) into physical proximity ("sat next to each other") in the
> same (quasi-)inertial frame and the same gravitational potential (not
> so obvious BTW ... see the FYI below!), an observer of both would say
> "they are keeping time identically". Since this experiment is rarely
> possible, one can never be "sure" that his clock is synchronized to
> any other clock elsewhere. And what does it mean to say they "were
> synchronized" when brought together, but now they are not because they
> are now in different gravitational potentials! (FYI, there are land
> mine detectors being developed on this very principle! I know someone
> who actually worked on such a project!)
>
> This all gets even more complicated when dealing with large networks
> of networks in which the "speed of information transmission" can vary
> depending on the medium (cf. coaxial cables versus fiber versus
> microwave links!) In fact, the atmosphere is one of those media and
> variations therein result in the need for "GPS corrections" (cf. RTCM
> GPS correction messages, RTK, etc.) in order to get to sub-nsec/cm
> accuracy. Point is if you have a set of nodes distributed across the
> country all with GPS and all "synchronized to GPS time", and a second
> identical set of nodes (with no GPS) instead connected with a network
> of cables and fiber links, all of different lengths and composition
> using different carrier frequencies (dielectric constants vary with
> frequency!) "synchronized" to some clock somewhere using NTP or PTP),
> the synchronization of the two sets will be different unless a common
> reference clock is used AND all the above effects are taken into
> account, and good luck with that! :-)
>
> In conclusion, if anyone tells you that clock synchronization in
> communication networks is simple ("Just use GPS!"), you should feel
> free to chuckle (under your breath if necessary :-))
>
> Cheers,
>
> RR
>
> -----Original Message-----
> From: Sebastian Moeller [mailto:moeller0@gmx.de]
> Sent: Thursday, January 12, 2023 12:23 AM
> To: Dick Roy
> Cc: Rodney W. Grimes; mike.reynolds@netforecast.com; libreqos; David
> P. Reed; Rpm; rjmcmahon; bloat
> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in
> USA
>
> Hi RR,
>
>> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>>
>> -----Original Message-----
>> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On
>> Behalf Of Sebastian Moeller via Starlink
>> Sent: Wednesday, January 11, 2023 12:01 PM
>> To: Rodney W. Grimes
>> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos;
>> David P. Reed; Rpm; rjmcmahon; bloat
>> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers
>> in USA
>>
>> Hi Rodney,
>>
>> > On Jan 11, 2023, at 19:32, Rodney W. Grimes
>> <starlink@gndrsh.dnsmgr.net> wrote:
>> >
>> > Hello,
>> >
>> > Yall can call me crazy if you want.. but... see below [RWG]
>> >> Hi Bib,
>> >>
>> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
>> <starlink@lists.bufferbloat.net> wrote:
>> >>>
>> >>> My biggest barrier is the lack of clock sync by the devices,
>> i.e. very limited support for PTP in data centers and in end devices.
>> This limits the ability to measure one way delays (OWD) and most
>> assume that OWD is 1/2 an RTT, which typically is a mistake. We know
>> this intuitively with airplane flight times or even car commute times
>> where the one way time is not 1/2 a round trip time. Google maps &
>> directions provide a time estimate for the one way link. It doesn't
>> compute a round trip and divide by two.
>> >>>
>> >>> For those that can get clock sync working, the iperf 2
>> --trip-times options is useful.
>> >>
>> >> [SM] +1; and yet even with unsynchronized clocks one can try
>> to measure how latency changes under load and that can be done per
>> direction. Sure this is far inferior to real reliably measured OWDs,
>> but if life/the internet deals you lemons....
>> >
>> > [RWG] iperf2/iperf3, etc are already moving large amounts of data
>> back and forth, for that matter any rate test, why not abuse some of
>> that data and add the fundamental NTP clock sync data and
>> bidirectionally pass each other's concept of "current time". IIRC
>> (it's been 25 years since I worked on NTP at this level) you *should*
>> be able to get a fairly accurate clock delta between each end, and
>> then use that info and time stamps in the data stream to compute
>> OWDs. You need to put 4 time stamps in the packet, and with that you
>> can compute "offset".
>> [RR] For this to work at a reasonable level of accuracy, the
>> timestamping circuits on both ends need to be deterministic and
>> repeatable as I recall. Any uncertainty in that process adds to
>> synchronization errors/uncertainties.
>>
>> [SM] Nice idea. I would guess that all timeslot based access
>> technologies (so starlink, docsis, GPON, LTE?) all distribute "high
>> quality time" carefully to the "modems", so maybe all that would be
>> needed is to expose that high quality time to the LAN side of those
>> modems, dressed up as NTP server?
>> [RR] It's not that simple! Distributing "high-quality time", i.e.
>> "synchronizing all clocks" does not solve the communication problem
>> in synchronous slotted MAC/PHYs!
>
> [SM] I happily believe you, but the same idea of "time slot"
> needs to be shared by all nodes, no? So the clocks need to be of
> reasonably similar rate, aka synchronized (see below).
>
>> All the technologies you mentioned above are essentially P2P, not
>> intended for broadcast. Point is, there is a point controller (aka
>> PoC) often called a base station (eNodeB, gNodeB, ...) that actually
>> "controls everything that is necessary to control" at the UE
>> including time, frequency and sampling time offsets, and these are
>> critical to get right if you want to communicate, and they are ALL
>> subject to the laws of physics (cf. the speed of light)! Turns out
>> that what is necessary for the system to function anywhere near
>> capacity is for all the clocks governing transmissions from the UEs
>> to be "unsynchronized" such that all the UE transmissions arrive at
>> the PoC at the same (prescribed) time!
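(Aside: one concrete instance of the point above - my example, not RR's -
is the LTE "timing advance": the base station measures each UE's round
trip and tells it to transmit early by that amount, so all uplink arrivals
align on the slot grid. Figures are illustrative only:)

    C = 299_792_458.0  # speed of light, m/s

    def timing_advance_s(distance_m):
        # The UE advances its uplink transmissions by the round-trip
        # propagation time to its base station, so its frames arrive
        # aligned with those of nearer and farther UEs.
        return 2.0 * distance_m / C

    # e.g. a UE 15 km out must transmit ~100 microseconds early:
    # timing_advance_s(15_000) ~= 1.0e-4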
>
> [SM] Fair enough. I would call clocks that are "in sync" albeit
> with individual offsets as synchronized, but I am a layman and that
> might sound offensively wrong to experts in the field. But even
> without the naming my point is that all systems that depend on some
> idea of shared time-base are halfway there to exposing that time to
> end users, by translating it into an NTP time source at the modem.
>
>> For some technologies, in particular 5G!, these considerations are
>> ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if
>> you don't believe me! :-)
>
> [SM] Far be it from me not to believe you, so thanks for the
> pointers. Yet, I still think that unless different nodes of a shared
> segment move at significantly different speeds, there should be a
> common "tick-duration" for all clocks even if each clock runs at an
> offset... (I naively would try to implement something like that by
> trying to fully synchronize clocks and maintain a local offset value
> to convert from "absolute" time to "network" time, but likely because
> coming from the outside I am blissfully unaware of the detail
> challenges that need to be solved).
>
> Regards & Thanks
>
> Sebastian
>
>> >>>
>> >>> --trip-times
>> >>> enable the measurement of end to end write to read latencies
>> >>> (client and server clocks must be synchronized)
>> > [RWG] --clock-skew
>> > enable the measurement of the wall clock difference between
>> > sender and receiver
>> >
>> >> [SM] Sweet!
>> >>
>> >> Regards
>> >> Sebastian
>> >>
>> >>> Bob
>> >>>> I have many kvetches about the new latency under load tests
>> >>>> being designed and distributed over the past year. I am
>> >>>> delighted! that they are happening, but most really need third
>> >>>> party evaluation, and calibration, and a solid explanation of
>> >>>> what network pathologies they do and don't cover. Also a RED
>> >>>> team attitude towards them, as well as thinking hard about what
>> >>>> you are not measuring (operations research).
>> >>>> I actually rather love the new cloudflare speedtest, because it
>> >>>> tests a single TCP connection, rather than dozens, and at the
>> >>>> same time folk are complaining that it doesn't find the actual
>> >>>> "speed!". yet... the test itself more closely emulates a user
>> >>>> experience than speedtest.net does. I am personally pretty
>> >>>> convinced that the fewer flows a web page opens, the better the
>> >>>> likelihood of a good user experience, but I lack data on it.
>> >>>> To try to tackle the evaluation and calibration part, I've
>> >>>> reached out to all the new test designers in the hope that we
>> >>>> could get together and produce a report of what each new test
>> >>>> is actually doing. I've tweeted, linked in, emailed, and
>> >>>> spammed every measurement list I know of, but with only some
>> >>>> response; please reach out to other test designer folks and
>> >>>> have them join the rpm email list?
>> >>>> My principal kvetches in the new tests so far are:
>> >>>> 0) None of the tests last long enough.
>> >>>> Ideally there should be a mode where they at least run to "time
>> >>>> of first loss", or periodically, just run longer than the
>> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be
>> >>>> dragons there! It's really bad science to optimize the internet
>> >>>> for 20 seconds. It's like optimizing a car, to handle well, for
>> >>>> just 20 seconds.
>> >>>> 1) Not testing up + down + ping at the same time
>> >>>> None of the new tests actually test the same thing that the
>> >>>> infamous rrul test does - all the others still test up, then
>> >>>> down, and ping. It was/remains my hope that the simpler parts
>> >>>> of the flent test suite - such as the tcp_up_squarewave tests,
>> >>>> the rrul test, and the rtt_fair tests - would provide
>> >>>> calibration to the test designers.
>> >>>> we've got zillions of flent results in the archive published
>> >>>> here:
>> >>>> https://blog.cerowrt.org/post/found_in_flent/
>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>>> The new tests have all added up + ping and down + ping, but not
>> >>>> up + down + ping. Why??
>> >>>> The behaviors of what happens in that case are really
>> >>>> non-intuitive, I know, but... it's just one more phase to add
>> >>>> to any one of those new tests. I'd be deliriously happy if
>> >>>> someone(s) new to the field started doing that, even
>> >>>> optionally, and boggled at how it defeated their assumptions.
>> >>>> Among other things that would show...
>> >>>> It's the home router industry's dirty secret that darn few
>> >>>> "gigabit" home routers can actually forward in both directions
>> >>>> at a gigabit. I'd like to smash that perception thoroughly, but
>> >>>> given that our starting point was a "gigabit router" that was
>> >>>> really a "gigabit switch" - and historically something that
>> >>>> couldn't even forward at 200Mbit - we have a long way to go
>> >>>> there.
>> >>>> Only in the past year have non-x86 home routers appeared that
>> >>>> could actually do a gbit in both directions.
>> >>>> 2) Few are actually testing within-stream latency
>> >>>> Apple's rpm project is making a stab in that direction. It
>> >>>> looks highly likely, that with a little more work, crusader and
>> >>>> go-responsiveness can finally start sampling the tcp RTT, loss
>> >>>> and markings, more directly. As for the rest... sampling
>> >>>> TCP_INFO on windows, and Linux, at least, always appeared
>> >>>> simple to me, but I'm discovering how hard it is by delving
>> >>>> deep into the rust behind crusader.
>> >>>> the goresponsiveness thing is also IMHO running WAY too many
>> >>>> streams at the same time, I guess motivated by an attempt to
>> >>>> have the test complete quickly?
>> >>>> B) To try and tackle the validation problem:
>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>>> In the libreqos.io project we've established a testbed where
>> >>>> tests can be plunked through various ISP plan network
>> >>>> emulations. It's here:
>> >>>> https://payne.taht.net (run the bandwidth test for what's
>> >>>> currently hooked up)
>> >>>> We could rather use an AS number and at least an ipv4/24 and
>> >>>> ipv6/48 to leverage with that, so I don't have to nat the
>> >>>> various emulations.
>> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2
>> >>>> licensed, to see more test designers set up a testbed like this
>> >>>> to calibrate their own stuff.
>> >>>> Presently we're able to test:
>> >>>> flent
>> >>>> netperf
>> >>>> iperf2
>> >>>> iperf3
>> >>>> speedtest-cli
>> >>>> crusader
>> >>>> the broadband forum udp based test:
>> >>>> https://github.com/BroadbandForum/obudpst
>> >>>> trexx
>> >>>> There's also a virtual machine setup that we can remotely drive
>> >>>> a web browser from (but I didn't want to nat the results to the
>> >>>> world) to test other web services.
>> >>>> _______________________________________________
>> >>>> Rpm mailing list
>> >>>> Rpm@lists.bufferbloat.net
>> >>>> https://lists.bufferbloat.net/listinfo/rpm
>> >>> _______________________________________________
>> >>> Starlink mailing list
>> >>> Starlink@lists.bufferbloat.net
>> >>> https://lists.bufferbloat.net/listinfo/starlink
>> >>
>> >> _______________________________________________
>> >> Starlink mailing list
>> >> Starlink@lists.bufferbloat.net
>> >> https://lists.bufferbloat.net/listinfo/starlink
>>
>> _______________________________________________
>> Starlink mailing list
>> Starlink@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/starlink
[-- Attachment #2: Type: text/html, Size: 91806 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-13 7:33 ` Sebastian Moeller
@ 2023-01-13 8:26 ` Dick Roy
0 siblings, 0 replies; 19+ messages in thread
From: Dick Roy @ 2023-01-13 8:26 UTC (permalink / raw)
To: 'Sebastian Moeller'
Cc: 'Rodney W. Grimes', mike.reynolds, 'libreqos',
'David P. Reed', 'Rpm', 'rjmcmahon',
'bloat'
[-- Attachment #1: Type: text/plain, Size: 17792 bytes --]
_____
From: Sebastian Moeller [mailto:moeller0@gmx.de]
Sent: Thursday, January 12, 2023 11:33 PM
To: dickroy@alum.mit.edu; Dick Roy
Cc: 'Rodney W. Grimes'; mike.reynolds@netforecast.com; 'libreqos'; 'David P.
Reed'; 'Rpm'; 'rjmcmahon'; 'bloat'
Subject: RE: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
Hi RR,
Thanks for the detailed response below, since my point is somewhat
orthogonal I opted for top-posting.
Let me take a step back here and rephrase, synchronising clocks within an
acceptable range to be useful is not rocket science nor witchcraft. For
measuring internet traffic 'millisecond' range seems acceptable, local
networks can probably profit from finer time resolution. So I am not after
e.g. clock synchronisation to participate in SDH/SONET. Heck in the toy
project I am active in, we operate on load dependent delay deltas so we even
ignore different time offsets and are tolerant to (mildly) different
tickrates and clock skew, but it would certainly be nice to have some
acceptable measure of UTC from endpoints to be able to interpret timestamps
as 'absolute'. Mind you I am fine with them not being veridical absolute,
but just good enough for my measurement purpose and I guess that should be
within the range of the achievable. Heck, if all servers we query timestamps
of would be NTP-'synchronized' and would follow the RFC recommendation to
report timestamps in milliseconds past midnight UTC I would be happy.
[RR] Yup! All true. Hence my post that obviously passed this one in the
ether! :-) :-)
Regards
Sebastian
On 12 January 2023 21:39:21 CET, Dick Roy <dickroy@alum.mit.edu> wrote:
Hi Sebastian (et. al.),
[I'll comment up here instead of inline.]
Let me start by saying that I have not been intimately involved with the
IEEE 1588 effort (PTP), however I was involved in the 802.11 efforts along a
similar vein, just adding the wireless first hop component and it's effects
on PTP.
What was apparent from the outset was that there was a lack of understanding
what the terms "to synchronize" or "to be synchronized" actually mean. It's
not trivial ... because we live in an (approximately, that's another story!)
4-D space-time continuum where the Lorentz metric plays a critical role.
Therein, simultaneity (aka "things happening at the same time") means the
"distance" between two such events is zero and that distance is given by
sqrt(x^2 + y^2 + z^2 - (ct)^2) and the "thing happening" can be the tick of
a clock somewhere. Now since everything is relative (time with respect to
what? / location with respect to where?) it's pretty easy to see that "if
you don't know where you are, you can't know what time it is!" (English
sailors of the 18th century knew this well!) Add to this the fact that if
everything were stationary, nothing would happen (as Einstein said "Nothing
happens until something moves!"), special relativity also plays a role.
Clocks on GPS satellites run approx. 7usecs/day slower than those on earth
due to their "speed" (8700 mph roughly)! Then add the consequence that
without mass we wouldn't exist (in these forms at least:-)), and
gravitational effects (aka General Relativity) come into play. Those turn
out to make clocks on GPS satellites run 45usec/day faster than those on
earth! The net effect is that GPS clocks run about 38usec/day faster than
clocks on earth. So what does it mean to "synchronize to GPS"? Point is:
it's a non-trivial question with a very complicated answer. The reason it
is important to get all this right is that the "what that ties time and
space together" is the speed of light and that turns out to be a
"foot-per-nanosecond" in a vacuum (roughly 300m/usec). This means if I am
uncertain about my location to say 300 meters, then I also am not sure what
time it is to a usec AND vice-versa!
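(The arithmetic above, spelled out; the conversion to ranging error is a
back-of-the-envelope addition, not RR's:)

    # Net relativistic drift of a GPS satellite clock, per the figures above.
    special = -7e-6    # s/day slower, velocity time dilation
    general = +45e-6   # s/day faster, gravitational blueshift
    net = special + general      # ~ +38e-6 s/day, the satellite clock runs fast
    c = 299_792_458.0            # m/s
    print(net * c)               # ~11.4 km/day of ranging error if uncorrected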
All that said, the simplest explanation of synchronization is probably: Two
clocks are synchronized if, when they are brought (slowly) into physical
proximity ("sat next to each other") in the same (quasi-)inertial frame and
the same gravitational potential (not so obvious BTW ... see the FYI below!),
an observer of both would say "they are keeping time identically". Since
this experiment is rarely possible, one can never be "sure" that his clock
is synchronized to any other clock elsewhere. And what does it mean to say
they "were synchronized" when brought together, but now they are not because
they are now in different gravitational potentials! (FYI, there are land
mine detectors being developed on this very principle! I know someone who
actually worked on such a project!)
This all gets even more complicated when dealing with large networks of
networks in which the "speed of information transmission" can vary depending
on the medium (cf. coaxial cables versus fiber versus microwave links!) In
fact, the atmosphere is one of those media and variations therein result in
the need for "GPS corrections" (cf. RTCM GPS correction messages, RTK, etc.)
in order to get to sub-nsec/cm accuracy. Point is if you have a set of
nodes distributed across the country all with GPS and all "synchronized to
GPS time", and a second identical set of nodes (with no GPS) instead
connected with a network of cables and fiber links, all of different lengths
and composition using different carrier frequencies (dielectric constants
vary with frequency!) "synchronized" to some clock somewhere using NTP or
PTP), the synchronization of the two sets will be different unless a common
reference clock is used AND all the above effects are taken into account,
and good luck with that! :-)
In conclusion, if anyone tells you that clock synchronization in
communication networks is simple ("Just use GPS!"), you should feel free to
chuckle (under your breath if necessary:-))
Cheers,
RR
-----Original Message-----
From: Sebastian Moeller [mailto:moeller0@gmx.de]
Sent: Thursday, January 12, 2023 12:23 AM
To: Dick Roy
Cc: Rodney W. Grimes; mike.reynolds@netforecast.com; libreqos; David P.
Reed; Rpm; rjmcmahon; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
Hi RR,
> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>
>
>
> -----Original Message-----
> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf
Of Sebastian Moeller via Starlink
> Sent: Wednesday, January 11, 2023 12:01 PM
> To: Rodney W. Grimes
> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David
P. Reed; Rpm; rjmcmahon; bloat
> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
>
> Hi Rodney,
>
>
>
>
> > On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink@gndrsh.dnsmgr.net>
wrote:
> >
> > Hello,
> >
> > Yall can call me crazy if you want.. but... see below [RWG]
> >> Hi Bib,
> >>
> >>
> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
<starlink@lists.bufferbloat.net> wrote:
> >>>
> >>> My biggest barrier is the lack of clock sync by the devices, i.e. very
limited support for PTP in data centers and in end devices. This limits the
ability to measure one way delays (OWD) and most assume that OWD is 1/2 and
RTT which typically is a mistake. We know this intuitively with airplane
flight times or even car commute times where the one way time is not 1/2 a
round trip time. Google maps & directions provide a time estimate for the
one way link. It doesn't compute a round trip and divide by two.
> >>>
> >>> For those that can get clock sync working, the iperf 2 --trip-times
options is useful.
> >>
> >> [SM] +1; and yet even with unsynchronized clocks one can try to
measure how latency changes under load and that can be done per direction.
Sure this is far inferior to real reliably measured OWDs, but if life/the
internet deals you lemons....
> >
> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back
and forth, for that matter any rate test, why not abuse some of that data
and add the fundamental NTP clock sync data and bidirectionally pass each
other's concept of "current time". IIRC (it's been 25 years since I worked on
NTP at this level) you *should* be able to get a fairly accurate clock delta
between each end, and then use that info and time stamps in the data stream
to compute OWDs. You need to put 4 time stamps in the packet, and with
that you can compute "offset".
> [RR] For this to work at a reasonable level of accuracy, the timestamping
circuits on both ends need to be deterministic and repeatable as I recall.
Any uncertainty in that process adds to synchronization
errors/uncertainties.
>
> [SM] Nice idea. I would guess that all timeslot based access
technologies (so starlink, docsis, GPON, LTE?) all distribute "high quality
time" carefully to the "modems", so maybe all that would be needed is to
expose that high quality time to the LAN side of those modems, dressed up as
NTP server?
> [RR] It's not that simple! Distributing "high-quality time", i.e.
"synchronizing all clocks" does not solve the communication problem in
synchronous slotted MAC/PHYs!
[SM] I happily believe you, but the same idea of "time slot" needs to
be shared by all nodes, no? So the clocks need to be reasonably similar
rate, aka synchronized (see below).
> All the technologies you mentioned above are essentially P2P, not
intended for broadcast. Point is, there is a point controller (aka PoC)
often called a base station (eNodeB, gNodeB, .) that actually "controls
everything that is necessary to control" at the UE including time, frequency
and sampling time offsets, and these are critical to get right if you want
to communicate, and they are ALL subject to the laws of physics (cf. the
speed of light)! Turns out that what is necessary for the system to function
anywhere near capacity, is for all the clocks governing transmissions from
the UEs to be "unsynchronized" such that all the UE transmissions arrive at
the PoC at the same (prescribed) time!
[SM] Fair enough. I would call clocks that are "in sync" albeit with
individual offsets as synchronized, but I am a layman and that might sound
offensively wrong to experts in the field. But even without the naming my
point is that all systems that depend on some idea of shared time-base are
halfway there to exposing that time to end users, by translating it into an
NTP time source at the modem.
> For some technologies, in particular 5G!, these considerations are
ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you don't
believe me! :-)
[SM] Far be it from me not to believe you, so thanks for the pointers.
Yet, I still think that unless different nodes of a shared segment move at
significantly different speeds, that there should be a common
"tick-duration" for all clocks even if each clock runs at an offset... (I
naively would try to implement something like that by trying to fully
synchronize clocks and maintain a local offset value to convert from
"absolute" time to "network" time, but likely because coming from the
outside I am blissfully unaware of the detail challenges that need to be
solved).
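(A trivial sketch of that local-offset bookkeeping, purely illustrative:)

    class OffsetClock:
        """Convert between 'absolute' (UTC-ish) and 'network' time for one
        clock domain, given an estimated offset for that domain."""
        def __init__(self, offset_s=0.0):
            self.offset = offset_s        # network_time - absolute_time
        def to_network(self, t_abs):
            return t_abs + self.offset
        def to_absolute(self, t_net):
            return t_net - self.offset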
Regards & Thanks
Sebastian
>
>
> >
> >>
> >>
> >>>
> >>> --trip-times
> >>> enable the measurement of end to end write to read latencies (client
and server clocks must be synchronized)
> > [RWG] --clock-skew
> > enable the measurement of the wall clock difference between sender
and receiver
> >
> >>
> >> [SM] Sweet!
> >>
> >> Regards
> >> Sebastian
> >>
> >>>
> >>> Bob
> >>>> I have many kvetches about the new latency under load tests being
> >>>> designed and distributed over the past year. I am delighted! that
they
> >>>> are happening, but most really need third party evaluation, and
> >>>> calibration, and a solid explanation of what network pathologies they
> >>>> do and don't cover. Also a RED team attitude towards them, as well as
> >>>> thinking hard about what you are not measuring (operations research).
> >>>> I actually rather love the new cloudflare speedtest, because it tests
> >>>> a single TCP connection, rather than dozens, and at the same time
folk
> >>>> are complaining that it doesn't find the actual "speed!". yet... the
> >>>> test itself more closely emulates a user experience than
speedtest.net
> >>>> does. I am personally pretty convinced that the fewer numbers of
flows
> >>>> that a web page opens improves the likelihood of a good user
> >>>> experience, but lack data on it.
> >>>> To try to tackle the evaluation and calibration part, I've reached
out
> >>>> to all the new test designers in the hope that we could get together
> >>>> and produce a report of what each new test is actually doing. I've
> >>>> tweeted, linked in, emailed, and spammed every measurement list I
know
> >>>> of, and only to some response, please reach out to other test
designer
> >>>> folks and have them join the rpm email list?
> >>>> My principal kvetches in the new tests so far are:
> >>>> 0) None of the tests last long enough.
> >>>> Ideally there should be a mode where they at least run to "time of
> >>>> first loss", or periodically, just run longer than the
> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
> >>>> there! It's really bad science to optimize the internet for 20
> >>>> seconds. It's like optimizing a car, to handle well, for just 20
> >>>> seconds.
> >>>> 1) Not testing up + down + ping at the same time
> >>>> None of the new tests actually test the same thing that the infamous
> >>>> rrul test does - all the others still test up, then down, and ping.
It
> >>>> was/remains my hope that the simpler parts of the flent test suite -
> >>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
> >>>> tests would provide calibration to the test designers.
> >>>> we've got zillions of flent results in the archive published here:
> >>>> https://blog.cerowrt.org/post/found_in_flent/
> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
> >>>
> >>>> The new tests have all added up + ping and down + ping, but not up +
> >>>> down + ping. Why??
> >>>> The behaviors of what happens in that case are really non-intuitive,
I
> >>>> know, but... it's just one more phase to add to any one of those new
> >>>> tests. I'd be deliriously happy if someone(s) new to the field
> >>>> started doing that, even optionally, and boggled at how it defeated
> >>>> their assumptions.
> >>>> Among other things that would show...
> >>>> It's the home router industry's dirty secret that darn few "gigabit"
> >>>> home routers can actually forward in both directions at a gigabit.
I'd
> >>>> like to smash that perception thoroughly, but given our starting
point
> >>>> is a gigabit router was a "gigabit switch" - and historically been
> >>>> something that couldn't even forward at 200Mbit - we have a long way
> >>>> to go there.
> >>>> Only in the past year have non-x86 home routers appeared that could
> >>>> actually do a gbit in both directions.
> >>>> 2) Few are actually testing within-stream latency
> >>>> Apple's rpm project is making a stab in that direction. It looks
> >>>> highly likely, that with a little more work, crusader and
> >>>> go-responsiveness can finally start sampling the tcp RTT, loss and
> >>>> markings, more directly. As for the rest... sampling TCP_INFO on
> >>>> windows, and Linux, at least, always appeared simple to me, but I'm
> >>>> discovering how hard it is by delving deep into the rust behind
> >>>> crusader.
> >>>> the goresponsiveness thing is also IMHO running WAY too many streams
> >>>> at the same time, I guess motivated by an attempt to have the test
> >>>> complete quickly?
> >>>> B) To try and tackle the validation problem:
> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
> >>>
> >>>> In the libreqos.io project we've established a testbed where tests
can
> >>>> be plunked through various ISP plan network emulations. It's here:
> >>>> https://payne.taht.net (run bandwidth test for what's currently
hooked
> >>>> up)
> >>>> We could rather use an AS number and at least a ipv4/24 and ipv6/48
to
> >>>> leverage with that, so I don't have to nat the various emulations.
> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
> >>>> to see more test designers setup a testbed like this to calibrate
> >>>> their own stuff.
> >>>> Presently we're able to test:
> >>>> flent
> >>>> netperf
> >>>> iperf2
> >>>> iperf3
> >>>> speedtest-cli
> >>>> crusader
> >>>> the broadband forum udp based test:
> >>>> https://github.com/BroadbandForum/obudpst
> >>>> trexx
> >>>> There's also a virtual machine setup that we can remotely drive a web
> >>>> browser from (but I didn't want to nat the results to the world) to
> >>>> test other web services.
> >>>> _______________________________________________
> >>>> Rpm mailing list
> >>>> Rpm@lists.bufferbloat.net
> >>>> https://lists.bufferbloat.net/listinfo/rpm
> >>> _______________________________________________
> >>> Starlink mailing list
> >>> Starlink@lists.bufferbloat.net
> >>> https://lists.bufferbloat.net/listinfo/starlink
> >>
> >> _______________________________________________
> >> Starlink mailing list
> >> Starlink@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/starlink
>
> _______________________________________________
> Starlink mailing list
> Starlink@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
[-- Attachment #2: Type: text/html, Size: 50849 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-13 8:10 ` Dick Roy
@ 2023-01-15 23:09 ` rjmcmahon
0 siblings, 0 replies; 19+ messages in thread
From: rjmcmahon @ 2023-01-15 23:09 UTC (permalink / raw)
To: dickroy
Cc: 'Sebastian Moeller', 'Rodney W. Grimes',
mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
hmm, interesting. I'm thinking that GPS PPS is sufficient from an iperf 2 &
classical mechanics perspective.
Have you looked at white rabbit per CERN?
https://kt.cern/article/white-rabbit-cern-born-open-source-technology-sets-new-global-standard-empowering-world
This discussion does make me question if there is a better metric than
one way delay, i.e. "speed of causality as limited by network i/o" taken
per each end of the e2e path? My expertise is quite limited w/respect to
relativity so I don't know if the below makes any sense or not. I also
think a core issue is the simultaneity of the start, which isn't obvious
how to discern.
Does comparing the write blocking times (or frequency) histograms to the
read blocking times (or frequency) histograms, which are coupled by tcp's
control loop, do anything useful? The blocking occurs because of a
coupling & awaiting per the remote. Then compare those against a write to
read thread on the same chip (which I think should be the same in each
reference frame and the fastest i/o possible for an end). The frequency
differences might be due to what you call "interruptions" & one way
delays (& error), assuming all else equal??
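(A rough sketch of that histogram comparison, purely illustrative; real
analysis would need much more care about causality and sampling:)

    import statistics

    def histogram(samples_us, bin_us=100):
        # Bin blocking times (microseconds) so the two shapes can be compared.
        hist = {}
        for s in samples_us:
            hist[s // bin_us] = hist.get(s // bin_us, 0) + 1
        return hist

    def crude_shift_us(write_blocking_us, read_blocking_us):
        # A crude proxy for the coupling described above: the difference
        # of medians between the sender's write-blocking distribution and
        # the receiver's read-blocking distribution.
        return (statistics.median(read_blocking_us)
                - statistics.median(write_blocking_us))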
Thanks in advance for any thoughts on this.
Bob
> -----Original Message-----
> From: rjmcmahon [mailto:rjmcmahon@rjmcmahon.com]
> Sent: Thursday, January 12, 2023 11:40 PM
> To: dickroy@alum.mit.edu
> Cc: 'Sebastian Moeller'; 'Rodney W. Grimes';
> mike.reynolds@netforecast.com; 'libreqos'; 'David P. Reed'; 'Rpm';
> 'bloat'
> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in
> USA
>
> Hi RR,
>
> I believe quality GPS chips compensate for relativity in pulse per
> second, which is needed to get position accuracy.
>
> [RR] Of course they do. That 38usec/day really matters! They assume
> they know what the gravitational potential is where they are, and they
> can estimate the potential at the satellites so they can compensate,
> and they do. Point is, a GPS unit at Lake Tahoe (6250') runs faster
> than the one in San Francisco (sea level). How do you think these two
> "should be synchronized"! How do you define "synchronization" in
> this case? You synchronize those two clocks, then what about all the
> other clocks at Lake Tahoe (or SF or anywhere in between for that
> matter :-))??? These are not trivial questions. However if all one
> cares about is seconds or milliseconds, then you can argue that we
> (earthlings on planet earth) can "sweep such facts under the
> proverbial rug" for the purposes of latency in communication networks
> and that's certainly doable. Don't tell that to the guys whose
> protocols require "synchronization of all units to nanoseconds"
> though! They will be very, very unhappy :-) :-) And you know who you
> are :-) :-)
>
> Bob
>
>> Hi Sebastian (et. al.),
>>
>> [I'll comment up here instead of inline.]
>>
>> Let me start by saying that I have not been intimately involved with
>> the IEEE 1588 effort (PTP), however I was involved in the 802.11
>> efforts along a similar vein, just adding the wireless first hop
>> component and its effects on PTP.
>>
>> What was apparent from the outset was that there was a lack of
>> understanding what the terms "to synchronize" or "to be synchronized"
>> actually mean. It's not trivial ... because we live in an
>> (approximately, that's another story!) 4-D space-time continuum where
>> the Lorentz metric plays a critical role. Therein, simultaneity (aka
>> "things happening at the same time") means the "distance" between two
>> such events is zero and that distance is given by sqrt(x^2 + y^2 + z^2
>> - (ct)^2) and the "thing happening" can be the tick of a clock
>> somewhere. Now since everything is relative (time with respect to
>> what? / location with respect to where?) it's pretty easy to see that
>> "if you don't know where you are, you can't know what time it is!"
>> (English sailors of the 18th century knew this well!) Add to this the
>> fact that if everything were stationary, nothing would happen (as
>> Einstein said "Nothing happens until something moves!"), and special
>> relativity also plays a role. Clocks on GPS satellites run approx.
>> 7usecs/day slower than those on earth due to their "speed" (8700 mph
>> roughly)! Then add the consequence that without mass we wouldn't
>> exist (in these forms at least :-)), and gravitational effects (aka
>> General Relativity) come into play. Those turn out to make clocks on
>> GPS satellites run 45usec/day faster than those on earth! The net
>> effect is that GPS clocks run about 38usec/day faster than clocks on
>> earth. So what does it mean to "synchronize to GPS"? Point is: it's a
>> non-trivial question with a very complicated answer. The reason it is
>> important to get all this right is that the "what that ties time and
>> space together" is the speed of light and that turns out to be a
>> "foot-per-nanosecond" in a vacuum (roughly 300m/usec). This means if
>> I am uncertain about my location to say 300 meters, then I also am
>> not sure what time it is to a usec AND vice-versa!
>>
>> All that said, the simplest explanation of synchronization is
>> probably: Two clocks are synchronized if, when they are brought
>> (slowly) into physical proximity ("sat next to each other") in the
>> same (quasi-)inertial frame and the same gravitational potential (not
>> so obvious BTW ... see the FYI below!), an observer of both would say
>> "they are keeping time identically". Since this experiment is rarely
>> possible, one can never be "sure" that his clock is synchronized to
>> any other clock elsewhere. And what does it mean to say they "were
>> synchronized" when brought together, but now they are not because
>> they are now in different gravitational potentials! (FYI, there are
>> land mine detectors being developed on this very principle! I know
>> someone who actually worked on such a project!)
>>
>> This all gets even more complicated when dealing with large networks
>> of networks in which the "speed of information transmission" can vary
>> depending on the medium (cf. coaxial cables versus fiber versus
>> microwave links!) In fact, the atmosphere is one of those media and
>> variations therein result in the need for "GPS corrections" (cf. RTCM
>> GPS correction messages, RTK, etc.) in order to get to sub-nsec/cm
>> accuracy. Point is if you have a set of nodes distributed across the
>> country all with GPS and all "synchronized to GPS time", and a second
>> identical set of nodes (with no GPS) instead connected with a network
>> of cables and fiber links, all of different lengths and composition
>> using different carrier frequencies (dielectric constants vary with
>> frequency!) "synchronized" to some clock somewhere using NTP or PTP),
>> the synchronization of the two sets will be different unless a common
>> reference clock is used AND all the above effects are taken into
>> account, and good luck with that! :-)
>>
>> In conclusion, if anyone tells you that clock synchronization in
>> communication networks is simple ("Just use GPS!"), you should feel
>> free to chuckle (under your breath if necessary :-))
>>
>> Cheers,
>>
>> RR
>>
>> -----Original Message-----
>> From: Sebastian Moeller [mailto:moeller0@gmx.de]
>> Sent: Thursday, January 12, 2023 12:23 AM
>> To: Dick Roy
>> Cc: Rodney W. Grimes; mike.reynolds@netforecast.com; libreqos; David
>> P. Reed; Rpm; rjmcmahon; bloat
>> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in
>> USA
>>
>> Hi RR,
>>
>>> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>>>
>>> -----Original Message-----
>>> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On
>>> Behalf Of Sebastian Moeller via Starlink
>>> Sent: Wednesday, January 11, 2023 12:01 PM
>>> To: Rodney W. Grimes
>>> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos;
>>> David P. Reed; Rpm; rjmcmahon; bloat
>>> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers
>>> in USA
>>>
>>> Hi Rodney,
>>>
>>> > On Jan 11, 2023, at 19:32, Rodney W. Grimes
>>> <starlink@gndrsh.dnsmgr.net> wrote:
>>> >
>>> > Hello,
>>> >
>>> > Yall can call me crazy if you want.. but... see below [RWG]
>>> >> Hi Bib,
>>> >>
>>> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
>>> <starlink@lists.bufferbloat.net> wrote:
>>> >>>
>>> >>> My biggest barrier is the lack of clock sync by the devices,
>>> i.e. very limited support for PTP in data centers and in end
>>> devices. This limits the ability to measure one way delays (OWD) and
>>> most assume that OWD is 1/2 an RTT, which typically is a mistake. We
>>> know this intuitively with airplane flight times or even car commute
>>> times where the one way time is not 1/2 a round trip time. Google
>>> maps & directions provide a time estimate for the one way link. It
>>> doesn't compute a round trip and divide by two.
>>> >>>
>>> >>> For those that can get clock sync working, the iperf 2
>>> --trip-times options is useful.
>>> >>
>>> >> [SM] +1; and yet even with unsynchronized clocks one can try
>>> to measure how latency changes under load and that can be done per
>>> direction. Sure this is far inferior to real reliably measured OWDs,
>>> but if life/the internet deals you lemons....
>>> >
>>> > [RWG] iperf2/iperf3, etc are already moving large amounts of data
>>> back and forth, for that matter any rate test, why not abuse some of
>>> that data and add the fundamental NTP clock sync data and
>>> bidirectionally pass each other's concept of "current time". IIRC
>>> (it's been 25 years since I worked on NTP at this level) you
>>> *should* be able to get a fairly accurate clock delta between each
>>> end, and then use that info and time stamps in the data stream to
>>> compute OWDs. You need to put 4 time stamps in the packet, and with
>>> that you can compute "offset".
>>>
>>> [RR] For this to work at a reasonable level of accuracy, the
>>> timestamping circuits on both ends need to be deterministic and
>>> repeatable as I recall. Any uncertainty in that process adds to
>>> synchronization errors/uncertainties.
>>>
>>> [SM] Nice idea. I would guess that all timeslot based access
>>> technologies (so starlink, docsis, GPON, LTE?) all distribute "high
>>> quality time" carefully to the "modems", so maybe all that would be
>>> needed is to expose that high quality time to the LAN side of those
>>> modems, dressed up as NTP server?
>>>
>>> [RR] It's not that simple! Distributing "high-quality time", i.e.
>>> "synchronizing all clocks" does not solve the communication problem
>>> in synchronous slotted MAC/PHYs!
>>
>> [SM] I happily believe you, but the same idea of "time slot"
>> needs to be shared by all nodes, no? So the clocks need to be of
>> reasonably similar rate, aka synchronized (see below).
>>
>>> All the technologies you mentioned above are essentially P2P, not
>>> intended for broadcast. Point is, there is a point controller (aka
>>> PoC) often called a base station (eNodeB, gNodeB, ...) that actually
>>> "controls everything that is necessary to control" at the UE
>>> including time, frequency and sampling time offsets, and these are
>>> critical to get right if you want to communicate, and they are ALL
>>> subject to the laws of physics (cf. the speed of light)! Turns out
>>> that what is necessary for the system to function anywhere near
>>> capacity is for all the clocks governing transmissions from the UEs
>>> to be "unsynchronized" such that all the UE transmissions arrive at
>>> the PoC at the same (prescribed) time!
>>
>> [SM] Fair enough. I would call clocks that are "in sync" albeit
>> with individual offsets as synchronized, but I am a layman and that
>> might sound offensively wrong to experts in the field. But even
>> without the naming my point is that all systems that depend on some
>> idea of shared time-base are halfway there to exposing that time to
>> end users, by translating it into an NTP time source at the modem.
>>
>>> For some technologies, in particular 5G!, these considerations are
>>> ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if
>>> you don't believe me! :-)
>>
>> [SM] Far be it from me not to believe you, so thanks for the
>> pointers. Yet, I still think that unless different nodes of a shared
>> segment move at significantly different speeds, there should be a
>> common "tick-duration" for all clocks even if each clock runs at an
>> offset... (I naively would try to implement something like that by
>> trying to fully synchronize clocks and maintain a local offset value
>> to convert from "absolute" time to "network" time, but likely because
>> coming from the outside I am blissfully unaware of the detail
>> challenges that need to be solved).
>>
>> Regards & Thanks
>>
>> Sebastian
>>
>>> >>>
>>> >>> --trip-times
>>> >>> enable the measurement of end to end write to read latencies
>>> >>> (client and server clocks must be synchronized)
>>> > [RWG] --clock-skew
>>> > enable the measurement of the wall clock difference between
>>> > sender and receiver
>>> >
>>> >> [SM] Sweet!
>>> >>
>>> >> Regards
>>> >> Sebastian
>>> >>
>>> >>> Bob
>>> >>>> I have many kvetches about the new latency under load tests
>>> >>>> being designed and distributed over the past year. I am
>>> >>>> delighted! that they are happening, but most really need third
>>> >>>> party evaluation, and calibration, and a solid explanation of
>>> >>>> what network pathologies they do and don't cover. Also a RED
>>> >>>> team attitude towards them, as well as thinking hard about what
>>> >>>> you are not measuring (operations research).
>>> >>>> I actually rather love the new cloudflare speedtest, because it
>>> >>>> tests a single TCP connection, rather than dozens, and at the
>>> >>>> same time folk are complaining that it doesn't find the actual
>>> >>>> "speed!". yet... the test itself more closely emulates a user
>>> >>>> experience than speedtest.net does. I am personally pretty
>>> >>>> convinced that the fewer flows a web page opens, the better the
>>> >>>> likelihood of a good user experience, but I lack data on it.
>>> >>>> To try to tackle the evaluation and calibration part, I've
>>> >>>> reached out to all the new test designers in the hope that we
>>> >>>> could get together and produce a report of what each new test
>>> >>>> is actually doing. I've tweeted, linked in, emailed, and
>>> >>>> spammed every measurement list I know of, but with only some
>>> >>>> response; please reach out to other test designer folks and
>>> >>>> have them join the rpm email list?
>>> >>>> My principal kvetches in the new tests so far are:
>>> >>>> 0) None of the tests last long enough.
>>> >>>> Ideally there should be a mode where they at least run to "time
>>> >>>> of first loss", or periodically, just run longer than the
>>> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be
>>> >>>> dragons there! It's really bad science to optimize the internet
>>> >>>> for 20 seconds. It's like optimizing a car, to handle well, for
>>> >>>> just 20 seconds.
>>> >>>> 1) Not testing up + down + ping at the same time
>>> >>>> None of the new tests actually test the same thing that the
>>> >>>> infamous rrul test does - all the others still test up, then
>>> >>>> down, and ping. It was/remains my hope that the simpler parts
>>> >>>> of the flent test suite - such as the tcp_up_squarewave tests,
>>> >>>> the rrul test, and the rtt_fair tests - would provide
>>> >>>> calibration to the test designers.
>>> >>>> we've got zillions of flent results in the archive published
>>> >>>> here:
>>> >>>> https://blog.cerowrt.org/post/found_in_flent/
>>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>>> >>>> The new tests have all added up + ping and down + ping, but not
>>> >>>> up + down + ping. Why??
>>> >>>> The behaviors of what happens in that case are really
>>> >>>> non-intuitive, I know, but... it's just one more phase to add
>>> >>>> to any one of those new tests. I'd be deliriously happy if
>>> >>>> someone(s) new to the field started doing that, even
>>> >>>> optionally, and boggled at how it defeated their assumptions.
>>> >>>> Among other things that would show...
>>> >>>> It's the home router industry's dirty secret that darn few
>>> >>>> "gigabit" home routers can actually forward in both directions
>>> >>>> at a gigabit. I'd like to smash that perception thoroughly, but
>>> >>>> given that our starting point was a "gigabit router" that was
>>> >>>> really a "gigabit switch" - and historically something that
>>> >>>> couldn't even forward at 200Mbit - we have a long way to go
>>> >>>> there.
>>> >>>> Only in the past year have non-x86 home routers appeared that
>>> >>>> could actually do a gbit in both directions.
>>> >>>> 2) Few are actually testing within-stream latency
>>> >>>> Apple's rpm project is making a stab in that direction. It
>>> >>>> looks highly likely, that with a little more work, crusader and
>>> >>>> go-responsiveness can finally start sampling the tcp RTT, loss
>>> >>>> and markings, more directly. As for the rest... sampling
>>> >>>> TCP_INFO on windows, and Linux, at least, always appeared
>>> >>>> simple to me, but I'm discovering how hard it is by delving
>>> >>>> deep into the rust behind crusader.
>>> >>>> the goresponsiveness thing is also IMHO running WAY too many
>>> >>>> streams at the same time, I guess motivated by an attempt to
>>> >>>> have the test complete quickly?
>>> >>>> B) To try and tackle the validation problem:
>>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>>> >>>> In the libreqos.io project we've established a testbed where
>>> >>>> tests can be plunked through various ISP plan network
>>> >>>> emulations. It's here:
>>> >>>> https://payne.taht.net (run the bandwidth test for what's
>>> >>>> currently hooked up)
>>> >>>> We could rather use an AS number and at least an ipv4/24 and
>>> >>>> ipv6/48 to leverage with that, so I don't have to nat the
>>> >>>> various emulations.
>>> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2
>>> >>>> licensed, to see more test designers set up a testbed like this
>>> >>>> to calibrate their own stuff.
>>> >>>> Presently we're able to test:
>>> >>>> flent
>>> >>>> netperf
>>> >>>> iperf2
>>> >>>> iperf3
>>> >>>> speedtest-cli
>>> >>>> crusader
>>> >>>> the broadband forum udp based test:
>>> >>>> https://github.com/BroadbandForum/obudpst
>>> >>>> trexx
>>> >>>> There's also a virtual machine setup that we can remotely
> drive
>
>> a web
>
>>
>
>>> >>>> browser from (but I didn't want to nat the results to the
>
>> world) to
>
>> awhile
>
>>> >>>> test other web services.
>
>>
>
>>> >>>> _______________________________________________
>
>>
>
>>> >>>> Rpm mailing list
>
>>
>
>>> >>>> Rpm@lists.bufferbloat.net
>
>>
>
>>> >>>> https://lists.bufferbloat.net/listinfo/rpm
>
>>
>
>>> >>> _______________________________________________
>
>>
>
>>> >>> Starlink mailing list
>
>>
>
>>> >>> Starlink@lists.bufferbloat.net
>
>>
>
>>> >>> https://lists.bufferbloat.net/listinfo/starlink
>
>>
>
>>> >>
>
>>
>
>>> >> _______________________________________________
>
>>
>
>>> >> Starlink mailing list
>
>>
>
>>> >> Starlink@lists.bufferbloat.net
>
>>
>
>>> >> https://lists.bufferbloat.net/listinfo/starlink
>
>>
>
>>>
>
>>
>
>>> _______________________________________________
>
>>
>
>>> Starlink mailing list
>
>>
>
>>> Starlink@lists.bufferbloat.net
>
>>
>
>>> https://lists.bufferbloat.net/listinfo/starlink
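As an aside, the four-timestamp bookkeeping alluded to in this subthread
reduces to a few lines. Here is a minimal Python sketch of the textbook
computation (my own illustration, not iperf 2's or ntpd's actual code;
all names are mine):

# t1 = client send, t2 = server receive, t3 = server send, t4 = client
# receive, each read from that side's own, unsynchronized clock (seconds).
def offset_and_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2.0  # estimated server-minus-client offset
    delay = (t4 - t1) - (t3 - t2)           # RTT with the server's hold time removed
    return offset, delay

# With the offset estimated, one-way delays no longer have to assume
# OWD = RTT/2:
#   owd_up   = (t2 - t1) - offset
#   owd_down = (t4 - t3) + offset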
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-09 20:46 ` rjmcmahon
@ 2023-01-09 21:02 ` Dick Roy
0 siblings, 0 replies; 19+ messages in thread
From: Dick Roy @ 2023-01-09 21:02 UTC (permalink / raw)
To: 'rjmcmahon', 'Dave Taht'
Cc: mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
-----Original Message-----
From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of
rjmcmahon via Starlink
Sent: Monday, January 9, 2023 12:47 PM
To: Dave Taht
Cc: starlink@lists.bufferbloat.net; mike.reynolds@netforecast.com; libreqos;
David P. Reed; Rpm; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
The write to read latencies (OWD) are on the server side in CLT form.
Use --histograms on the server side to enable them.
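For concreteness, a hypothetical pairing of the flags discussed in this
thread (a sketch only; <server> is a placeholder and the histogram output
format varies by version):

    # server side: enhanced output plus write-to-read latency histograms
    iperf -s -e --histograms
    # client side: send trip timestamps; clocks must be synchronized
    # (e.g. via PTP) for the one-way delays to be meaningful
    iperf -c <server> -e --trip-times -i 1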
Your client side sampled TCP RTT is 6 ms with less than 1 ms of
variation (the sqrt of the variance, since variance is in squared units)
[RR] or standard deviation (std for short) :-)
No retries suggests the network isn't dropping packets.
All the newer bounceback code is only in master and requires a compile
from source. It will be released in 2.1.9 after testing cycles;
hopefully in early March 2023.
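A hypothetical invocation of that master-branch feature (an assumption on
my part; the flag spelling could still change before the 2.1.9 release):

    iperf -c <server> --bounceback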
Bob
https://sourceforge.net/projects/iperf2/
> The DC that so graciously loaned us 3 machines for the testbed (thx
> equinix!) does support ptp, but we have not configured it yet. In ntp
> tests between these hosts we seem to be within 500us, and certainly
> 50us would be great, in the future.
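(A quick way to eyeball that residual offset on such hosts, assuming ntpd
is the daemon in use: run ntpq -p and read the offset column, which is
reported in milliseconds.)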
>
> I note that in all my kvetching about the new tests' needing
> validation today... I kind of elided that I'm pretty happy with
> iperf2's new tests that landed last august, and are now appearing in
> linux package managers around the world. I hope more folk use them.
> (sorry robert, it's been a long time since last august!)
>
> Our new testbed has multiple setups. In one setup - basically the
> machine name is equal to a given ISP plan, and a key testing point is
> looking at the differences between the FCC 25-3 and 100/20 plans in
> the real world. However, at our scale (25gbit) it turned out that
> emulating the delay realistically was problematic.
>
> Anyway, here's a 25/3 result for iperf (other results and iperf test
> type requests gladly accepted)
>
> root@lqos:~# iperf -6 --trip-times -c c25-3 -e -i 1
> ------------------------------------------------------------
> Client connecting to c25-3, TCP port 5001 with pid 2146556 (1 flows)
> Write buffer size: 131072 Byte
> TOS set to 0x0 (Nagle on)
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> [ 1] local fd77::3%bond0.4 port 59396 connected with fd77::1:2 port
> 5001 (trip-times) (sock=3) (icwnd/mss/irtt=13/1428/948) (ct=1.10 ms)
> on 2023-01-09 20:13:37 (UTC)
> [ ID] Interval Transfer Bandwidth Write/Err Rtry
> Cwnd/RTT(var) NetPwr
> [ 1] 0.0000-1.0000 sec 3.25 MBytes 27.3 Mbits/sec 26/0 0
> 19K/6066(262) us 562
> [ 1] 1.0000-2.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 15K/4671(207) us 673
> [ 1] 2.0000-3.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 13K/5538(280) us 568
> [ 1] 3.0000-4.0000 sec 3.12 MBytes 26.2 Mbits/sec 25/0 0
> 16K/6244(355) us 525
> [ 1] 4.0000-5.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 19K/6152(216) us 511
> [ 1] 5.0000-6.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 22K/6764(529) us 465
> [ 1] 6.0000-7.0000 sec 3.12 MBytes 26.2 Mbits/sec 25/0 0
> 15K/5918(605) us 554
> [ 1] 7.0000-8.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 18K/5178(327) us 608
> [ 1] 8.0000-9.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 19K/5758(473) us 546
> [ 1] 9.0000-10.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 16K/6141(280) us 512
> [ 1] 0.0000-10.0952 sec 30.6 MBytes 25.4 Mbits/sec 245/0
> 0 19K/5924(491) us 537
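An aside on the NetPwr column above: the printed values are consistent
with throughput divided by RTT, scaled down by 1e6 (my inference from the
numbers, not a statement of iperf 2's documented formula). A small Python
check against the first two intervals:

# Assumption: NetPwr = (bytes/sec) / RTT(sec) / 1e6, MBytes = 2^20 bytes.
def netpwr(mbytes, interval_s, rtt_us):
    bytes_per_sec = mbytes * 1024 * 1024 / interval_s
    return bytes_per_sec / (rtt_us * 1e-6) / 1e6

print(round(netpwr(3.25, 1.0, 6066)))  # -> 562, matching the 0.0-1.0 sec line
print(round(netpwr(3.00, 1.0, 4671)))  # -> 673, matching the 1.0-2.0 sec line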
>
>
> On Mon, Jan 9, 2023 at 11:13 AM rjmcmahon <rjmcmahon@rjmcmahon.com>
> wrote:
>>
>> My biggest barrier is the lack of clock sync by the devices, i.e. very
>> limited support for PTP in data centers and in end devices. This
>> limits
>> the ability to measure one way delays (OWD), and most assume that OWD
>> is 1/2 of RTT, which typically is a mistake. We know this intuitively
>> with
>> airplane flight times or even car commute times where the one way time
>> is not 1/2 a round trip time. Google maps & directions provide a time
>> estimate for the one way link. It doesn't compute a round trip and
>> divide by two.
>>
>> For those that can get clock sync working, the iperf 2 --trip-times
>> options is useful.
>>
>> --trip-times
>> enable the measurement of end to end write to read latencies
>> (client
>> and server clocks must be synchronized)
>>
>> Bob
>> > I have many kvetches about the new latency under load tests being
>> > designed and distributed over the past year. I am delighted! that they
>> > are happening, but most really need third party evaluation, and
>> > calibration, and a solid explanation of what network pathologies they
>> > do and don't cover. Also a RED team attitude towards them, as well as
>> > thinking hard about what you are not measuring (operations research).
>> >
>> > I actually rather love the new cloudflare speedtest, because it tests
>> > a single TCP connection, rather than dozens, and at the same time folk
>> > are complaining that it doesn't find the actual "speed!". yet... the
>> > test itself more closely emulates a user experience than speedtest.net
>> > does. I am personally pretty convinced that the fewer numbers of flows
>> > that a web page opens improves the likelihood of a good user
>> > experience, but lack data on it.
>> >
>> > To try to tackle the evaluation and calibration part, I've reached out
>> > to all the new test designers in the hope that we could get together
>> > and produce a report of what each new test is actually doing. I've
>> > tweeted, linked in, emailed, and spammed every measurement list I know
>> > of, and gotten only some response; please reach out to other test designer
>> > folks and have them join the rpm email list?
>> >
>> > My principal kvetches in the new tests so far are:
>> >
>> > 0) None of the tests last long enough.
>> >
>> > Ideally there should be a mode where they at least run to "time of
>> > first loss", or periodically, just run longer than the
>> > industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>> > there! It's really bad science to optimize the internet for 20
>> > seconds. It's like optimizing a car, to handle well, for just 20
>> > seconds.
>> >
>> > 1) Not testing up + down + ping at the same time
>> >
>> > None of the new tests actually test the same thing that the infamous
>> > rrul test does - all the others still test up, then down, and ping. It
>> > was/remains my hope that the simpler parts of the flent test suite -
>> > such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>> > tests would provide calibration to the test designers.
>> >
>> > we've got zillions of flent results in the archive published here:
>> > https://blog.cerowrt.org/post/found_in_flent/
>> > ps. Misinformation about iperf 2 impacts my ability to do this.
>>
>> > The new tests have all added up + ping and down + ping, but not up +
>> > down + ping. Why??
>> >
>> > The behaviors of what happens in that case are really non-intuitive, I
>> > know, but... it's just one more phase to add to any one of those new
>> > tests. I'd be deliriously happy if someone(s) new to the field
>> > started doing that, even optionally, and boggled at how it defeated
>> > their assumptions.
>> >
>> > Among other things that would show...
>> >
>> > It's the home router industry's dirty secret that darn few "gigabit"
>> > home routers can actually forward in both directions at a gigabit. I'd
>> > like to smash that perception thoroughly, but given our starting point
>> > is that a gigabit router was a "gigabit switch" - and historically was
>> > something that couldn't even forward at 200Mbit - we have a long way
>> > to go there.
>> >
>> > Only in the past year have non-x86 home routers appeared that could
>> > actually do a gbit in both directions.
>> >
>> > 2) Few are actually testing within-stream latency
>> >
>> > Apple's rpm project is making a stab in that direction. It looks
>> > highly likely that, with a little more work, crusader and
>> > go-responsiveness can finally start sampling the tcp RTT, loss and
>> > markings, more directly. As for the rest... sampling TCP_INFO on
>> > windows, and Linux, at least, always appeared simple to me, but I'm
>> > discovering how hard it is by delving deep into the rust behind
>> > crusader.
>> >
>> > the goresponsiveness thing is also IMHO running WAY too many streams
>> > at the same time, I guess motivated by an attempt to have the test
>> > complete quickly?
>> >
>> > B) To try and tackle the validation problem:ps. Misinformation about
>> > iperf 2 impacts my ability to do this.
>>
>> >
>> > In the libreqos.io project we've established a testbed where tests can
>> > be plunked through various ISP plan network emulations. It's here:
>> > https://payne.taht.net (run bandwidth test for what's currently hooked
>> > up)
>> >
>> > We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>> > leverage with that, so I don't have to nat the various emulations.
>> > (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>> > to see more test designers set up a testbed like this to calibrate
>> > their own stuff.
>> >
>> > Presently we're able to test:
>> > flent
>> > netperf
>> > iperf2
>> > iperf3
>> > speedtest-cli
>> > crusader
>> > the broadband forum udp based test:
>> > https://github.com/BroadbandForum/obudpst
>> > trexx
>> >
>> > There's also a virtual machine setup that we can remotely drive a web
>> > browser from (but I didn't want to nat the results to the world) to
>> > test other web services.
>> > _______________________________________________
>> > Rpm mailing list
>> > Rpm@lists.bufferbloat.net
>> > https://lists.bufferbloat.net/listinfo/rpm
_______________________________________________
Starlink mailing list
Starlink@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/starlink
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-09 19:13 ` [Bloat] [Rpm] " rjmcmahon
@ 2023-01-09 19:47 ` Sebastian Moeller
2023-01-09 20:20 ` [Bloat] [Rpm] [Starlink] " Dave Taht
1 sibling, 0 replies; 19+ messages in thread
From: Sebastian Moeller @ 2023-01-09 19:47 UTC (permalink / raw)
To: rjmcmahon
Cc: Dave Täht, Dave Taht via Starlink, mike.reynolds, libreqos,
David P. Reed, Rpm, bloat
Hi Bob,
> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net> wrote:
>
> My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one way delays (OWD), and most assume that OWD is 1/2 of RTT, which typically is a mistake. We know this intuitively with airplane flight times or even car commute times where the one way time is not 1/2 a round trip time. Google maps & directions provide a time estimate for the one way link. It doesn't compute a round trip and divide by two.
>
> For those that can get clock sync working, the iperf 2 --trip-times options is useful.
[SM] +1; and yet even with unsynchronized clocks one can try to measure how latency changes under load and that can be done per direction. Sure this is far inferior to real reliably measured OWDs, but if life/the internet deals you lemons....
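A minimal sketch of that lemons-to-lemonade approach (hypothetical
numbers; the only assumption is that each clock is stable over the test,
so the unknown offset cancels when subtracting an idle baseline):

# Per-direction "delay" samples computed as recv_ts - send_ts across
# unsynchronized clocks: each contains the same unknown constant offset.
def latency_increase_ms(idle_samples, loaded_samples):
    baseline = min(idle_samples)  # emptiest-queue estimate, offset included
    return [s - baseline for s in loaded_samples]

idle = [12.3, 12.1, 12.4]    # made-up idle samples for one direction (ms)
loaded = [45.0, 80.2, 61.7]  # made-up samples under load, same direction (ms)
print(latency_increase_ms(idle, loaded))  # queueing delay added under load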
>
> --trip-times
> enable the measurement of end to end write to read latencies (client and server clocks must be synchronized)
[SM] Sweet!
Regards
Sebastian
>
> Bob
>> I have many kvetches about the new latency under load tests being
>> designed and distributed over the past year. I am delighted! that they
>> are happening, but most really need third party evaluation, and
>> calibration, and a solid explanation of what network pathologies they
>> do and don't cover. Also a RED team attitude towards them, as well as
>> thinking hard about what you are not measuring (operations research).
>> I actually rather love the new cloudflare speedtest, because it tests
>> a single TCP connection, rather than dozens, and at the same time folk
>> are complaining that it doesn't find the actual "speed!". yet... the
>> test itself more closely emulates a user experience than speedtest.net
>> does. I am personally pretty convinced that the fewer numbers of flows
>> that a web page opens improves the likelihood of a good user
>> experience, but lack data on it.
>> To try to tackle the evaluation and calibration part, I've reached out
>> to all the new test designers in the hope that we could get together
>> and produce a report of what each new test is actually doing. I've
>> tweeted, linked in, emailed, and spammed every measurement list I know
>> of, and gotten only some response; please reach out to other test designer
>> folks and have them join the rpm email list?
>> My principal kvetches in the new tests so far are:
>> 0) None of the tests last long enough.
>> Ideally there should be a mode where they at least run to "time of
>> first loss", or periodically, just run longer than the
>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>> there! It's really bad science to optimize the internet for 20
>> seconds. It's like optimizing a car, to handle well, for just 20
>> seconds.
>> 1) Not testing up + down + ping at the same time
>> None of the new tests actually test the same thing that the infamous
>> rrul test does - all the others still test up, then down, and ping. It
>> was/remains my hope that the simpler parts of the flent test suite -
>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>> tests would provide calibration to the test designers.
>> we've got zillions of flent results in the archive published here:
>> https://blog.cerowrt.org/post/found_in_flent/
>> ps. Misinformation about iperf 2 impacts my ability to do this.
>
>> The new tests have all added up + ping and down + ping, but not up +
>> down + ping. Why??
>> The behaviors of what happens in that case are really non-intuitive, I
>> know, but... it's just one more phase to add to any one of those new
>> tests. I'd be deliriously happy if someone(s) new to the field
>> started doing that, even optionally, and boggled at how it defeated
>> their assumptions.
>> Among other things that would show...
>> It's the home router industry's dirty secret that darn few "gigabit"
>> home routers can actually forward in both directions at a gigabit. I'd
>> like to smash that perception thoroughly, but given our starting point
>> is that a gigabit router was a "gigabit switch" - and historically was
>> something that couldn't even forward at 200Mbit - we have a long way
>> to go there.
>> Only in the past year have non-x86 home routers appeared that could
>> actually do a gbit in both directions.
>> 2) Few are actually testing within-stream latency
>> Apple's rpm project is making a stab in that direction. It looks
>> highly likely that, with a little more work, crusader and
>> go-responsiveness can finally start sampling the tcp RTT, loss and
>> markings, more directly. As for the rest... sampling TCP_INFO on
>> windows, and Linux, at least, always appeared simple to me, but I'm
>> discovering how hard it is by delving deep into the rust behind
>> crusader.
>> the goresponsiveness thing is also IMHO running WAY too many streams
>> at the same time, I guess motivated by an attempt to have the test
>> complete quickly?
>> B) To try and tackle the validation problem:ps. Misinformation about iperf 2 impacts my ability to do this.
>
>> In the libreqos.io project we've established a testbed where tests can
>> be plunked through various ISP plan network emulations. It's here:
>> https://payne.taht.net (run bandwidth test for what's currently hooked
>> up)
>> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>> leverage with that, so I don't have to nat the various emulations.
>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>> to see more test designers set up a testbed like this to calibrate
>> their own stuff.
>> Presently we're able to test:
>> flent
>> netperf
>> iperf2
>> iperf3
>> speedtest-cli
>> crusader
>> the broadband forum udp based test:
>> https://github.com/BroadbandForum/obudpst
>> trexx
>> There's also a virtual machine setup that we can remotely drive a web
>> browser from (but I didn't want to nat the results to the world) to
>> test other web services.
>> _______________________________________________
>> Rpm mailing list
>> Rpm@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/rpm
> _______________________________________________
> Starlink mailing list
> Starlink@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink
end of thread, other threads: [~2023-01-15 23:09 UTC | newest]
Thread overview: 19+ messages
[not found] <202301111832.30BIWevV030127@gndrsh.dnsmgr.net>
2023-01-11 20:01 ` [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA Sebastian Moeller
2023-01-11 21:46 ` Dick Roy
2023-01-12 8:22 ` Sebastian Moeller
2023-01-12 18:02 ` rjmcmahon
2023-01-12 21:34 ` Dick Roy
2023-01-12 20:39 ` Dick Roy
2023-01-13 7:33 ` Sebastian Moeller
2023-01-13 8:26 ` Dick Roy
2023-01-13 7:40 ` rjmcmahon
2023-01-13 8:10 ` Dick Roy
2023-01-15 23:09 ` rjmcmahon
2023-01-11 20:09 ` rjmcmahon
2023-01-12 8:14 ` Sebastian Moeller
2023-01-12 17:49 ` Robert McMahon
2023-01-12 21:57 ` Dick Roy
2023-01-13 7:44 ` Sebastian Moeller
2023-01-13 8:01 ` Dick Roy
[not found] <mailman.2651.1672779463.1281.starlink@lists.bufferbloat.net>
[not found] ` <1672786712.106922180@apps.rackspace.com>
[not found] ` <F4CA66DA-516C-438A-8D8A-5F172E5DFA75@cable.comcast.com>
2023-01-09 15:26 ` [Bloat] [Starlink] " Dave Taht
2023-01-09 19:13 ` [Bloat] [Rpm] " rjmcmahon
2023-01-09 19:47 ` [Bloat] [Starlink] [Rpm] " Sebastian Moeller
2023-01-09 20:20 ` [Bloat] [Rpm] [Starlink] " Dave Taht
2023-01-09 20:46 ` rjmcmahon
2023-01-09 21:02 ` [Bloat] [Starlink] [Rpm] " Dick Roy