* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
[not found] <202301111832.30BIWevV030127@gndrsh.dnsmgr.net>
@ 2023-01-11 20:01 ` Sebastian Moeller
2023-01-11 21:46 ` Dick Roy
2023-01-11 20:09 ` rjmcmahon
1 sibling, 1 reply; 19+ messages in thread
From: Sebastian Moeller @ 2023-01-11 20:01 UTC (permalink / raw)
To: Rodney W. Grimes
Cc: rjmcmahon, Rpm, mike.reynolds, David P. Reed, libreqos,
Dave Taht via Starlink, bloat
Hi Rodney,
> On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink@gndrsh.dnsmgr.net> wrote:
>
> Hello,
>
> Yall can call me crazy if you want.. but... see below [RWG]
>> Hi Bob,
>>
>>
>>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net> wrote:
>>>
>>> My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one way delays (OWD) and most assume that OWD is 1/2 an RTT, which typically is a mistake. We know this intuitively with airplane flight times or even car commute times where the one way time is not 1/2 a round trip time. Google maps & directions provide a time estimate for the one way link. It doesn't compute a round trip and divide by two.
>>>
>>> For those that can get clock sync working, the iperf 2 --trip-times option is useful.
>>
>> [SM] +1; and yet even with unsynchronized clocks one can try to measure how latency changes under load and that can be done per direction. Sure this is far inferior to real reliably measured OWDs, but if life/the internet deals you lemons....
>
> [RWG] iperf2/iperf3, etc are already moving large amounts of data back and forth, for that matter any rate test, why not abuse some of that data and add the fundamental NTP clock sync data and bidirectionally pass each other's concept of "current time". IIRC (it's been 25 years since I worked on NTP at this level) you *should* be able to get a fairly accurate clock delta between each end, and then use that info and time stamps in the data stream to compute OWDs. You need to put 4 time stamps in the packet, and with that you can compute "offset".
[SM] Nice idea. I would guess that all timeslot-based access technologies (Starlink, DOCSIS, GPON, LTE?) distribute "high quality time" carefully to the "modems", so maybe all that would be needed is to expose that high quality time to the LAN side of those modems, dressed up as an NTP server?
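For reference, the four-timestamp computation described above is the classic NTP offset/delay estimate. A minimal Python sketch (names are illustrative, not iperf's or ntpd's actual API):

# t1 = client send, t2 = server receive, t3 = server send, t4 = client receive
def ntp_offset_and_delay(t1, t2, t3, t4):
    # offset > 0 means the server clock is ahead of the client clock.
    # The estimate assumes symmetric forward/return paths; any path
    # asymmetry goes directly into the offset error.
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

With the offset known, a one-way delay for any later data packet is just receive_time - send_time after mapping both timestamps onto one clock.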
>
>>
>>
>>>
>>> --trip-times
>>> enable the measurement of end to end write to read latencies (client and server clocks must be synchronized)
> [RWG] --clock-skew
> enable the measurement of the wall clock difference between sender and receiver
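A minimal sketch of what such a write-to-read latency measurement implies on the wire (this is not iperf 2's actual implementation): the sender stamps each payload with its wall clock and the receiver subtracts, which is only meaningful if both clocks are synchronized.

import socket
import struct
import time

def send_stamped(sock: socket.socket, payload: bytes) -> None:
    # Prefix the payload with the sender's wall-clock time (network order).
    sock.sendall(struct.pack("!d", time.time()) + payload)

def read_write_to_read_latency(sock: socket.socket, size: int) -> float:
    # Read one stamped record and return write-to-read latency in seconds.
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    (sent,) = struct.unpack("!d", buf[:8])
    return time.time() - sent  # includes any residual clock skew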
>
>>
>> [SM] Sweet!
>>
>> Regards
>> Sebastian
>>
>>>
>>> Bob
>>>> I have many kvetches about the new latency under load tests being
>>>> designed and distributed over the past year. I am delighted! that they
>>>> are happening, but most really need third party evaluation, and
>>>> calibration, and a solid explanation of what network pathologies they
>>>> do and don't cover. Also a RED team attitude towards them, as well as
>>>> thinking hard about what you are not measuring (operations research).
>>>> I actually rather love the new cloudflare speedtest, because it tests
>>>> a single TCP connection, rather than dozens, and at the same time folk
>>>> are complaining that it doesn't find the actual "speed!". yet... the
>>>> test itself more closely emulates a user experience than speedtest.net
>>>> does. I am personally pretty convinced that fewer flows opened by a web
>>>> page improve the likelihood of a good user experience, but lack data
>>>> on it.
>>>> To try to tackle the evaluation and calibration part, I've reached out
>>>> to all the new test designers in the hope that we could get together
>>>> and produce a report of what each new test is actually doing. I've
>>>> tweeted, linked in, emailed, and spammed every measurement list I know
>>>> of, with only some response; please reach out to other test-designer
>>>> folks and have them join the rpm email list?
>>>> My principal kvetches in the new tests so far are:
>>>> 0) None of the tests last long enough.
>>>> Ideally there should be a mode where they at least run to "time of
>>>> first loss", or periodically, just run longer than the
>>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>>>> there! It's really bad science to optimize the internet for 20
>>>> seconds. It's like optimizing a car, to handle well, for just 20
>>>> seconds.
>>>> 1) Not testing up + down + ping at the same time
>>>> None of the new tests actually test the same thing that the infamous
>>>> rrul test does - all the others still test up, then down, and ping. It
>>>> was/remains my hope that the simpler parts of the flent test suite -
>>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>>>> tests would provide calibration to the test designers.
>>>> we've got zillions of flent results in the archive published here:
>>>> https://blog.cerowrt.org/post/found_in_flent/
>>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>
>>>> The new tests have all added up + ping and down + ping, but not up +
>>>> down + ping. Why??
>>>> The behaviors of what happens in that case are really non-intuitive, I
>>>> know, but... it's just one more phase to add to any one of those new
>>>> tests. I'd be deliriously happy if someone(s) new to the field
>>>> started doing that, even optionally, and boggled at how it defeated
>>>> their assumptions.
>>>> Among other things that would show...
>>>> It's the home router industry's dirty secret that darn few "gigabit"
>>>> home routers can actually forward in both directions at a gigabit. I'd
>>>> like to smash that perception thoroughly, but given that our starting
>>>> point for a "gigabit router" was a "gigabit switch" - one that
>>>> historically couldn't even forward at 200Mbit - we have a long way
>>>> to go there.
>>>> Only in the past year have non-x86 home routers appeared that could
>>>> actually do a gbit in both directions.
>>>> 2) Few are actually testing within-stream latency
>>>> Apple's rpm project is making a stab in that direction. It looks
>>>> highly likely, that with a little more work, crusader and
>>>> go-responsiveness can finally start sampling the tcp RTT, loss and
>>>> markings, more directly. As for the rest... sampling TCP_INFO on
>>>> Windows, and Linux, at least, always appeared simple to me, but I'm
>>>> discovering how hard it is by delving deep into the rust behind
>>>> crusader.
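For what it's worth, a Linux-only sketch of such TCP_INFO sampling (the byte offsets assume the stable struct tcp_info layout from linux/tcp.h, tcpi_rtt at offset 68 and tcpi_rttvar at 72, both in microseconds; Windows has an analogous SIO_TCP_INFO ioctl):

import socket
import struct

def sample_tcp_rtt(sock: socket.socket):
    # Works on a connected TCP socket on Linux only.
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    srtt_us, rttvar_us = struct.unpack_from("II", info, 68)
    return srtt_us / 1000.0, rttvar_us / 1000.0  # milliseconds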
>>>> the goresponsiveness thing is also IMHO running WAY too many streams
>>>> at the same time, I guess motivated by an attempt to have the test
>>>> complete quickly?
>>>> B) To try and tackle the validation problem:
>>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>
>>>> In the libreqos.io project we've established a testbed where tests can
>>>> be plunked through various ISP plan network emulations. It's here:
>>>> https://payne.taht.net (run bandwidth test for what's currently hooked
>>>> up)
>>>> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>>>> leverage with that, so I don't have to nat the various emulations.
>>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>>>> to see more test designers setup a testbed like this to calibrate
>>>> their own stuff.
>>>> Presently we're able to test:
>>>> flent
>>>> netperf
>>>> iperf2
>>>> iperf3
>>>> speedtest-cli
>>>> crusader
>>>> the broadband forum udp based test:
>>>> https://github.com/BroadbandForum/obudpst
>>>> trexx
>>>> There's also a virtual machine setup that we can remotely drive a web
>>>> browser from (but I didn't want to nat the results to the world) to
>>>> test other web services.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
[not found] <202301111832.30BIWevV030127@gndrsh.dnsmgr.net>
2023-01-11 20:01 ` [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA Sebastian Moeller
@ 2023-01-11 20:09 ` rjmcmahon
2023-01-12 8:14 ` Sebastian Moeller
1 sibling, 1 reply; 19+ messages in thread
From: rjmcmahon @ 2023-01-11 20:09 UTC (permalink / raw)
To: Rodney W. Grimes
Cc: Sebastian Moeller, Rpm, mike.reynolds, David P. Reed, libreqos,
Dave Taht via Starlink, bloat
Iperf 2 is designed to measure network i/o. Note: It doesn't have to
move large amounts of data. It can support data profiles that don't
drive TCP's CCA as an example.
Two things I've been asked for and avoided:
1) Integrate clock sync into iperf's test traffic
2) Measure and output CPU usages
I think both of these are outside the scope of a tool designed to test
network i/o over sockets; rather, these should be developed & validated
independently of a network i/o tool.
Clock error really isn't about amount/frequency of traffic but rather
getting a periodic high-quality reference. I tend to use GPS pulse per
second to lock the local system oscillator to. As David says, most every
modern handheld computer has the GPS chips to do this already. So to me
it seems more of a policy choice between data center operators and
device mfgs and less of a technical issue.
Bob
> Hello,
>
> Yall can call me crazy if you want.. but... see below [RWG]
>> Hi Bob,
>>
>>
>> > On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net> wrote:
>> >
>> > My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one way delays (OWD) and most assume that OWD is 1/2 an RTT, which typically is a mistake. We know this intuitively with airplane flight times or even car commute times where the one way time is not 1/2 a round trip time. Google maps & directions provide a time estimate for the one way link. It doesn't compute a round trip and divide by two.
>> >
>> > For those that can get clock sync working, the iperf 2 --trip-times option is useful.
>>
>> [SM] +1; and yet even with unsynchronized clocks one can try to
>> measure how latency changes under load and that can be done per
>> direction. Sure this is far inferior to real reliably measured OWDs,
>> but if life/the internet deals you lemons....
>
> [RWG] iperf2/iperf3, etc are already moving large amounts of data
> back and forth, for that matter any rate test, why not abuse some of
> that data and add the fundamental NTP clock sync data and
> bidirectionally pass each other's concept of "current time". IIRC (it's
> been 25 years since I worked on NTP at this level) you *should* be
> able to get a fairly accurate clock delta between each end, and then
> use that info and time stamps in the data stream to compute OWDs.
> You need to put 4 time stamps in the packet, and with that you can
> compute "offset".
>
>>
>>
>> >
>> > --trip-times
>> > enable the measurement of end to end write to read latencies (client and server clocks must be synchronized)
> [RWG] --clock-skew
> enable the measurement of the wall clock difference between sender and
> receiver
>
>>
>> [SM] Sweet!
>>
>> Regards
>> Sebastian
>>
>> >
>> > Bob
>> >> I have many kvetches about the new latency under load tests being
>> >> designed and distributed over the past year. I am delighted! that they
>> >> are happening, but most really need third party evaluation, and
>> >> calibration, and a solid explanation of what network pathologies they
>> >> do and don't cover. Also a RED team attitude towards them, as well as
>> >> thinking hard about what you are not measuring (operations research).
>> >> I actually rather love the new cloudflare speedtest, because it tests
>> >> a single TCP connection, rather than dozens, and at the same time folk
>> >> are complaining that it doesn't find the actual "speed!". yet... the
>> >> test itself more closely emulates a user experience than speedtest.net
>> >> does. I am personally pretty convinced that fewer flows opened by a web
>> >> page improve the likelihood of a good user experience, but lack data
>> >> on it.
>> >> To try to tackle the evaluation and calibration part, I've reached out
>> >> to all the new test designers in the hope that we could get together
>> >> and produce a report of what each new test is actually doing. I've
>> >> tweeted, linked in, emailed, and spammed every measurement list I know
>> >> of, with only some response; please reach out to other test-designer
>> >> folks and have them join the rpm email list?
>> >> My principal kvetches in the new tests so far are:
>> >> 0) None of the tests last long enough.
>> >> Ideally there should be a mode where they at least run to "time of
>> >> first loss", or periodically, just run longer than the
>> >> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>> >> there! It's really bad science to optimize the internet for 20
>> >> seconds. It's like optimizing a car, to handle well, for just 20
>> >> seconds.
>> >> 1) Not testing up + down + ping at the same time
>> >> None of the new tests actually test the same thing that the infamous
>> >> rrul test does - all the others still test up, then down, and ping. It
>> >> was/remains my hope that the simpler parts of the flent test suite -
>> >> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>> >> tests would provide calibration to the test designers.
>> >> we've got zillions of flent results in the archive published here:
>> >> https://blog.cerowrt.org/post/found_in_flent/
>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >
>> >> The new tests have all added up + ping and down + ping, but not up +
>> >> down + ping. Why??
>> >> The behaviors of what happens in that case are really non-intuitive, I
>> >> know, but... it's just one more phase to add to any one of those new
>> >> tests. I'd be deliriously happy if someone(s) new to the field
>> >> started doing that, even optionally, and boggled at how it defeated
>> >> their assumptions.
>> >> Among other things that would show...
>> >> It's the home router industry's dirty secret that darn few "gigabit"
>> >> home routers can actually forward in both directions at a gigabit. I'd
>> >> like to smash that perception thoroughly, but given that our starting
>> >> point for a "gigabit router" was a "gigabit switch" - one that
>> >> historically couldn't even forward at 200Mbit - we have a long way
>> >> to go there.
>> >> Only in the past year have non-x86 home routers appeared that could
>> >> actually do a gbit in both directions.
>> >> 2) Few are actually testing within-stream latency
>> >> Apple's rpm project is making a stab in that direction. It looks
>> >> highly likely, that with a little more work, crusader and
>> >> go-responsiveness can finally start sampling the tcp RTT, loss and
>> >> markings, more directly. As for the rest... sampling TCP_INFO on
>> >> Windows, and Linux, at least, always appeared simple to me, but I'm
>> >> discovering how hard it is by delving deep into the rust behind
>> >> crusader.
>> >> the goresponsiveness thing is also IMHO running WAY too many streams
>> >> at the same time, I guess motivated by an attempt to have the test
>> >> complete quickly?
>> >> B) To try and tackle the validation problem:
>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >
>> >> In the libreqos.io project we've established a testbed where tests can
>> >> be plunked through various ISP plan network emulations. It's here:
>> >> https://payne.taht.net (run bandwidth test for what's currently hooked
>> >> up)
>> >> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>> >> leverage with that, so I don't have to nat the various emulations.
>> >> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>> >> to see more test designers setup a testbed like this to calibrate
>> >> their own stuff.
>> >> Presently we're able to test:
>> >> flent
>> >> netperf
>> >> iperf2
>> >> iperf3
>> >> speedtest-cli
>> >> crusader
>> >> the broadband forum udp based test:
>> >> https://github.com/BroadbandForum/obudpst
>> >> trexx
>> >> There's also a virtual machine setup that we can remotely drive a web
>> >> browser from (but I didn't want to nat the results to the world) to
>> >> test other web services.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-11 20:01 ` [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA Sebastian Moeller
@ 2023-01-11 21:46 ` Dick Roy
2023-01-12 8:22 ` Sebastian Moeller
0 siblings, 1 reply; 19+ messages in thread
From: Dick Roy @ 2023-01-11 21:46 UTC (permalink / raw)
To: 'Sebastian Moeller', 'Rodney W. Grimes'
Cc: mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'rjmcmahon', 'bloat'
-----Original Message-----
From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of
Sebastian Moeller via Starlink
Sent: Wednesday, January 11, 2023 12:01 PM
To: Rodney W. Grimes
Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David
P. Reed; Rpm; rjmcmahon; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
Hi Rodney,
> On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink@gndrsh.dnsmgr.net>
wrote:
>
> Hello,
>
> Yall can call me crazy if you want.. but... see below [RWG]
>> Hi Bob,
>>
>>
>>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
<starlink@lists.bufferbloat.net> wrote:
>>>
>>> My biggest barrier is the lack of clock sync by the devices, i.e. very
limited support for PTP in data centers and in end devices. This limits the
ability to measure one way delays (OWD) and most assume that OWD is 1/2 an
RTT, which typically is a mistake. We know this intuitively with airplane
flight times or even car commute times where the one way time is not 1/2 a
round trip time. Google maps & directions provide a time estimate for the
one way link. It doesn't compute a round trip and divide by two.
>>>
>>> For those that can get clock sync working, the iperf 2 --trip-times
option is useful.
>>
>> [SM] +1; and yet even with unsynchronized clocks one can try to
measure how latency changes under load and that can be done per direction.
Sure this is far inferior to real reliably measured OWDs, but if life/the
internet deals you lemons....
>
> [RWG] iperf2/iperf3, etc are already moving large amounts of data back and
forth, for that matter any rate test, why not abuse some of that data and
add the fundamental NTP clock sync data and bidirectionally pass each other's
concept of "current time". IIRC (it's been 25 years since I worked on NTP at
this level) you *should* be able to get a fairly accurate clock delta
between each end, and then use that info and time stamps in the data stream
to compute OWDs. You need to put 4 time stamps in the packet, and with
that you can compute "offset".
[RR] For this to work at a reasonable level of accuracy, the timestamping
circuits on both ends need to be deterministic and repeatable as I recall.
Any uncertainty in that process adds to synchronization
errors/uncertainties.
[SM] Nice idea. I would guess that all timeslot-based access
technologies (Starlink, DOCSIS, GPON, LTE?) distribute "high quality
time" carefully to the "modems", so maybe all that would be needed is to
expose that high quality time to the LAN side of those modems, dressed up
as an NTP server?
[RR] It's not that simple! Distributing "high-quality time", i.e.
"synchronizing all clocks" does not solve the communication problem in
synchronous slotted MAC/PHYs! All the technologies you mentioned above are
essentially P2P, not intended for broadcast. Point is, there is a point
controller (aka PoC) often called a base station (eNodeB, gNodeB, ...) that
actually "controls everything that is necessary to control" at the UE
including time, frequency and sampling time offsets, and these are critical
to get right if you want to communicate, and they are ALL subject to the
laws of physics (cf. the speed of light)! Turns out that what is necessary
for the system to function anywhere near capacity, is for all the clocks
governing transmissions from the UEs to be "unsynchronized" such that all
the UE transmissions arrive at the PoC at the same (prescribed) time! For
some technologies, in particular 5G!, these considerations are ESSENTIAL.
Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you don't believe
me! :-)
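A back-of-the-envelope illustration of the point, in Python (the numbers are LTE-ish assumptions: a normal cyclic prefix of roughly 4.7 us and a timing-advance step of 16*Ts, about 0.52 us):

C = 299_792_458.0  # speed of light, m/s

def required_advance_us(distance_m: float) -> float:
    # Round-trip propagation time the UE must pre-compensate, in microseconds.
    return 2 * distance_m / C * 1e6

for d in (100, 1_000, 10_000):  # metres from UE to base station
    adv = required_advance_us(d)
    print(f"{d:>6} m -> advance by {adv:6.2f} us (~{adv / 0.52:5.1f} TA steps)")

# At 10 km the required advance (~67 us) dwarfs the ~4.7 us cyclic prefix,
# which is why per-UE timing control - not just a shared wall clock - is
# essential for slotted uplinks.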
>
>>
>>
>>>
>>> --trip-times
>>> enable the measurement of end to end write to read latencies (client and
server clocks must be synchronized)
> [RWG] --clock-skew
> enable the measurement of the wall clock difference between sender and
receiver
>
>>
>> [SM] Sweet!
>>
>> Regards
>> Sebastian
>>
>>>
>>> Bob
>>>> I have many kvetches about the new latency under load tests being
>>>> designed and distributed over the past year. I am delighted! that they
>>>> are happening, but most really need third party evaluation, and
>>>> calibration, and a solid explanation of what network pathologies they
>>>> do and don't cover. Also a RED team attitude towards them, as well as
>>>> thinking hard about what you are not measuring (operations research).
>>>> I actually rather love the new cloudflare speedtest, because it tests
>>>> a single TCP connection, rather than dozens, and at the same time folk
>>>> are complaining that it doesn't find the actual "speed!". yet... the
>>>> test itself more closely emulates a user experience than speedtest.net
>>>> does. I am personally pretty convinced that fewer flows opened by a web
>>>> page improve the likelihood of a good user experience, but lack data
>>>> on it.
>>>> To try to tackle the evaluation and calibration part, I've reached out
>>>> to all the new test designers in the hope that we could get together
>>>> and produce a report of what each new test is actually doing. I've
>>>> tweeted, linked in, emailed, and spammed every measurement list I know
>>>> of, with only some response; please reach out to other test-designer
>>>> folks and have them join the rpm email list?
>>>> My principal kvetches in the new tests so far are:
>>>> 0) None of the tests last long enough.
>>>> Ideally there should be a mode where they at least run to "time of
>>>> first loss", or periodically, just run longer than the
>>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>>>> there! It's really bad science to optimize the internet for 20
>>>> seconds. It's like optimizing a car, to handle well, for just 20
>>>> seconds.
>>>> 1) Not testing up + down + ping at the same time
>>>> None of the new tests actually test the same thing that the infamous
>>>> rrul test does - all the others still test up, then down, and ping. It
>>>> was/remains my hope that the simpler parts of the flent test suite -
>>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>>>> tests would provide calibration to the test designers.
>>>> we've got zillions of flent results in the archive published here:
>>>> https://blog.cerowrt.org/post/found_in_flent/
>>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>
>>>> The new tests have all added up + ping and down + ping, but not up +
>>>> down + ping. Why??
>>>> The behaviors of what happens in that case are really non-intuitive, I
>>>> know, but... it's just one more phase to add to any one of those new
>>>> tests. I'd be deliriously happy if someone(s) new to the field
>>>> started doing that, even optionally, and boggled at how it defeated
>>>> their assumptions.
>>>> Among other things that would show...
>>>> It's the home router industry's dirty secret that darn few "gigabit"
>>>> home routers can actually forward in both directions at a gigabit. I'd
>>>> like to smash that perception thoroughly, but given that our starting
>>>> point for a "gigabit router" was a "gigabit switch" - one that
>>>> historically couldn't even forward at 200Mbit - we have a long way
>>>> to go there.
>>>> Only in the past year have non-x86 home routers appeared that could
>>>> actually do a gbit in both directions.
>>>> 2) Few are actually testing within-stream latency
>>>> Apple's rpm project is making a stab in that direction. It looks
>>>> highly likely, that with a little more work, crusader and
>>>> go-responsiveness can finally start sampling the tcp RTT, loss and
>>>> markings, more directly. As for the rest... sampling TCP_INFO on
>>>> Windows, and Linux, at least, always appeared simple to me, but I'm
>>>> discovering how hard it is by delving deep into the rust behind
>>>> crusader.
>>>> the goresponsiveness thing is also IMHO running WAY too many streams
>>>> at the same time, I guess motivated by an attempt to have the test
>>>> complete quickly?
>>>> B) To try and tackle the validation problem:
>>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>
>>>> In the libreqos.io project we've established a testbed where tests can
>>>> be plunked through various ISP plan network emulations. It's here:
>>>> https://payne.taht.net (run bandwidth test for what's currently hooked
>>>> up)
>>>> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>>>> leverage with that, so I don't have to nat the various emulations.
>>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>>>> to see more test designers setup a testbed like this to calibrate
>>>> their own stuff.
>>>> Presently we're able to test:
>>>> flent
>>>> netperf
>>>> iperf2
>>>> iperf3
>>>> speedtest-cli
>>>> crusader
>>>> the broadband forum udp based test:
>>>> https://github.com/BroadbandForum/obudpst
>>>> trexx
>>>> There's also a virtual machine setup that we can remotely drive a web
>>>> browser from (but I didn't want to nat the results to the world) to
>>>> test other web services.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-11 20:09 ` rjmcmahon
@ 2023-01-12 8:14 ` Sebastian Moeller
2023-01-12 17:49 ` Robert McMahon
0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Moeller @ 2023-01-12 8:14 UTC (permalink / raw)
To: rjmcmahon
Cc: Rodney W. Grimes, Rpm, mike.reynolds, David P. Reed, libreqos,
Dave Taht via Starlink, bloat
Hi Bob,
> On Jan 11, 2023, at 21:09, rjmcmahon <rjmcmahon@rjmcmahon.com> wrote:
>
> Iperf 2 is designed to measure network i/o. Note: It doesn't have to move large amounts of data. It can support data profiles that don't drive TCP's CCA as an example.
>
> Two things I've been asked for and avoided:
>
> 1) Integrate clock sync into iperf's test traffic
[SM] This I understand; measurement conditions can be unsuited for tight time synchronization...
> 2) Measure and output CPU usages
[SM] This one puzzles me; as far as I understand, the only way to properly diagnose network issues is to rule out other things, like CPU overload, that can have symptoms similar to network issues. As an example, if CPU cycles become tight the cake qdisc will first increase its internal queueing and jitter (not consciously; it is just an observation that once cake does not get access to the CPU as timely as it wants, queueing latency and variability increase) and later also show reduced throughput - similar to things that can happen along an e2e network path for completely different reasons, e.g. lower-level retransmissions or a variable-rate link. So I would think that checking the CPU load at least coarsely would be within the scope of network testing tools, no?
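As a sketch of such a coarse check (Linux-only, reading /proc/stat before and after a test interval; the threshold is an arbitrary illustration value, and this is not iperf 2 code):

def cpu_times():
    # Return (busy, total) jiffies aggregated over all CPUs (Linux).
    with open("/proc/stat") as f:
        fields = [int(x) for x in f.readline().split()[1:]]
    total = sum(fields)
    idle = fields[3] + fields[4]  # idle + iowait
    return total - idle, total

def cpu_utilization(busy0, total0, busy1, total1):
    return (busy1 - busy0) / max(1, total1 - total0)

# b0, t0 = cpu_times(); ...run the throughput test...; b1, t1 = cpu_times()
# if cpu_utilization(b0, t0, b1, t1) > 0.95: warn("CPU-limited; i/o numbers suspect")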
Regards
Sebastian
> I think both of these are outside the scope of a tool designed to test network i/o over sockets; rather, these should be developed & validated independently of a network i/o tool.
>
> Clock error really isn't about amount/frequency of traffic but rather getting a periodic high-quality reference. I tend to use GPS pulse per second to lock the local system oscillator to. As David says, most every modern handheld computer has the GPS chips to do this already. So to me it seems more of a policy choice between data center operators and device mfgs and less of a technical issue.
>
> Bob
>> Hello,
>> Yall can call me crazy if you want.. but... see below [RWG]
>>> Hi Bob,
>>> > On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net> wrote:
>>> >
>>> > My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one way delays (OWD) and most assume that OWD is 1/2 an RTT, which typically is a mistake. We know this intuitively with airplane flight times or even car commute times where the one way time is not 1/2 a round trip time. Google maps & directions provide a time estimate for the one way link. It doesn't compute a round trip and divide by two.
>>> >
>>> > For those that can get clock sync working, the iperf 2 --trip-times option is useful.
>>> [SM] +1; and yet even with unsynchronized clocks one can try to measure how latency changes under load and that can be done per direction. Sure this is far inferior to real reliably measured OWDs, but if life/the internet deals you lemons....
>> [RWG] iperf2/iperf3, etc are already moving large amounts of data
>> back and forth, for that matter any rate test, why not abuse some of
>> that data and add the fundamental NTP clock sync data and
>> bidirectionally pass each other's concept of "current time". IIRC (it's
>> been 25 years since I worked on NTP at this level) you *should* be
>> able to get a fairly accurate clock delta between each end, and then
>> use that info and time stamps in the data stream to compute OWDs.
>> You need to put 4 time stamps in the packet, and with that you can
>> compute "offset".
>>> >
>>> > --trip-times
>>> > enable the measurement of end to end write to read latencies (client and server clocks must be synchronized)
>> [RWG] --clock-skew
>> enable the measurement of the wall clock difference between sender and receiver
>>> [SM] Sweet!
>>> Regards
>>> Sebastian
>>> >
>>> > Bob
>>> >> I have many kvetches about the new latency under load tests being
>>> >> designed and distributed over the past year. I am delighted! that they
>>> >> are happening, but most really need third party evaluation, and
>>> >> calibration, and a solid explanation of what network pathologies they
>>> >> do and don't cover. Also a RED team attitude towards them, as well as
>>> >> thinking hard about what you are not measuring (operations research).
>>> >> I actually rather love the new cloudflare speedtest, because it tests
>>> >> a single TCP connection, rather than dozens, and at the same time folk
>>> >> are complaining that it doesn't find the actual "speed!". yet... the
>>> >> test itself more closely emulates a user experience than speedtest.net
>>> >> does. I am personally pretty convinced that fewer flows opened by a web
>>> >> page improve the likelihood of a good user experience, but lack data
>>> >> on it.
>>> >> To try to tackle the evaluation and calibration part, I've reached out
>>> >> to all the new test designers in the hope that we could get together
>>> >> and produce a report of what each new test is actually doing. I've
>>> >> tweeted, linked in, emailed, and spammed every measurement list I know
>>> >> of, with only some response; please reach out to other test-designer
>>> >> folks and have them join the rpm email list?
>>> >> My principal kvetches in the new tests so far are:
>>> >> 0) None of the tests last long enough.
>>> >> Ideally there should be a mode where they at least run to "time of
>>> >> first loss", or periodically, just run longer than the
>>> >> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>>> >> there! It's really bad science to optimize the internet for 20
>>> >> seconds. It's like optimizing a car, to handle well, for just 20
>>> >> seconds.
>>> >> 1) Not testing up + down + ping at the same time
>>> >> None of the new tests actually test the same thing that the infamous
>>> >> rrul test does - all the others still test up, then down, and ping. It
>>> >> was/remains my hope that the simpler parts of the flent test suite -
>>> >> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>>> >> tests would provide calibration to the test designers.
>>> >> we've got zillions of flent results in the archive published here:
>>> >> https://blog.cerowrt.org/post/found_in_flent/
>>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>>> >
>>> >> The new tests have all added up + ping and down + ping, but not up +
>>> >> down + ping. Why??
>>> >> The behaviors of what happens in that case are really non-intuitive, I
>>> >> know, but... it's just one more phase to add to any one of those new
>>> >> tests. I'd be deliriously happy if someone(s) new to the field
>>> >> started doing that, even optionally, and boggled at how it defeated
>>> >> their assumptions.
>>> >> Among other things that would show...
>>> >> It's the home router industry's dirty secret that darn few "gigabit"
>>> >> home routers can actually forward in both directions at a gigabit. I'd
>>> >> like to smash that perception thoroughly, but given that our starting
>>> >> point for a "gigabit router" was a "gigabit switch" - one that
>>> >> historically couldn't even forward at 200Mbit - we have a long way
>>> >> to go there.
>>> >> Only in the past year have non-x86 home routers appeared that could
>>> >> actually do a gbit in both directions.
>>> >> 2) Few are actually testing within-stream latency
>>> >> Apple's rpm project is making a stab in that direction. It looks
>>> >> highly likely, that with a little more work, crusader and
>>> >> go-responsiveness can finally start sampling the tcp RTT, loss and
>>> >> markings, more directly. As for the rest... sampling TCP_INFO on
>>> >> Windows, and Linux, at least, always appeared simple to me, but I'm
>>> >> discovering how hard it is by delving deep into the rust behind
>>> >> crusader.
>>> >> the goresponsiveness thing is also IMHO running WAY too many streams
>>> >> at the same time, I guess motivated by an attempt to have the test
>>> >> complete quickly?
>>> >> B) To try and tackle the validation problem:
>>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>>> >
>>> >> In the libreqos.io project we've established a testbed where tests can
>>> >> be plunked through various ISP plan network emulations. It's here:
>>> >> https://payne.taht.net (run bandwidth test for what's currently hooked
>>> >> up)
>>> >> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>>> >> leverage with that, so I don't have to nat the various emulations.
>>> >> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>>> >> to see more test designers setup a testbed like this to calibrate
>>> >> their own stuff.
>>> >> Presently we're able to test:
>>> >> flent
>>> >> netperf
>>> >> iperf2
>>> >> iperf3
>>> >> speedtest-cli
>>> >> crusader
>>> >> the broadband forum udp based test:
>>> >> https://github.com/BroadbandForum/obudpst
>>> >> trexx
>>> >> There's also a virtual machine setup that we can remotely drive a web
>>> >> browser from (but I didn't want to nat the results to the world) to
>>> >> test other web services.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-11 21:46 ` Dick Roy
@ 2023-01-12 8:22 ` Sebastian Moeller
2023-01-12 18:02 ` rjmcmahon
2023-01-12 20:39 ` Dick Roy
0 siblings, 2 replies; 19+ messages in thread
From: Sebastian Moeller @ 2023-01-12 8:22 UTC (permalink / raw)
To: Dick Roy
Cc: Rodney W. Grimes, mike.reynolds, libreqos, David P. Reed, Rpm,
rjmcmahon, bloat
Hi RR,
> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>
>
>
> -----Original Message-----
> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of Sebastian Moeller via Starlink
> Sent: Wednesday, January 11, 2023 12:01 PM
> To: Rodney W. Grimes
> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David P. Reed; Rpm; rjmcmahon; bloat
> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
>
> Hi Rodney,
>
>
>
>
> > On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink@gndrsh.dnsmgr.net> wrote:
> >
> > Hello,
> >
> > Yall can call me crazy if you want.. but... see below [RWG]
> >> Hi Bob,
> >>
> >>
> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net> wrote:
> >>>
> >>> My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one way delays (OWD) and most assume that OWD is 1/2 an RTT, which typically is a mistake. We know this intuitively with airplane flight times or even car commute times where the one way time is not 1/2 a round trip time. Google maps & directions provide a time estimate for the one way link. It doesn't compute a round trip and divide by two.
> >>>
> >>> For those that can get clock sync working, the iperf 2 --trip-times option is useful.
> >>
> >> [SM] +1; and yet even with unsynchronized clocks one can try to measure how latency changes under load and that can be done per direction. Sure this is far inferior to real reliably measured OWDs, but if life/the internet deals you lemons....
> >
> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back and forth, for that matter any rate test, why not abuse some of that data and add the fundamental NTP clock sync data and bidirectionally pass each other's concept of "current time". IIRC (it's been 25 years since I worked on NTP at this level) you *should* be able to get a fairly accurate clock delta between each end, and then use that info and time stamps in the data stream to compute OWDs. You need to put 4 time stamps in the packet, and with that you can compute "offset".
> [RR] For this to work at a reasonable level of accuracy, the timestamping circuits on both ends need to be deterministic and repeatable as I recall. Any uncertainty in that process adds to synchronization errors/uncertainties.
>
> [SM] Nice idea. I would guess that all timeslot-based access technologies (Starlink, DOCSIS, GPON, LTE?) distribute "high quality time" carefully to the "modems", so maybe all that would be needed is to expose that high quality time to the LAN side of those modems, dressed up as an NTP server?
> [RR] It’s not that simple! Distributing “high-quality time”, i.e. “synchronizing all clocks” does not solve the communication problem in synchronous slotted MAC/PHYs!
[SM] I happily believe you, but the same idea of "time slot" needs to be shared by all nodes, no? So the clocks need to run at reasonably similar rates, aka synchronized (see below).
> All the technologies you mentioned above are essentially P2P, not intended for broadcast. Point is, there is a point controller (aka PoC) often called a base station (eNodeB, gNodeB, …) that actually “controls everything that is necessary to control” at the UE including time, frequency and sampling time offsets, and these are critical to get right if you want to communicate, and they are ALL subject to the laws of physics (cf. the speed of light)! Turns out that what is necessary for the system to function anywhere near capacity, is for all the clocks governing transmissions from the UEs to be “unsynchronized” such that all the UE transmissions arrive at the PoC at the same (prescribed) time!
[SM] Fair enough. I would call clocks that are "in sync" albeit with individual offsets synchronized, but I am a layman and that might sound offensively wrong to experts in the field. But even without the naming, my point is that all systems that depend on some idea of a shared time-base are halfway toward exposing that time to end users, by translating it into an NTP time source at the modem.
> For some technologies, in particular 5G!, these considerations are ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you don't believe me! :-)
[SM] Far be it from me not to believe you, so thanks for the pointers. Yet, I still think that unless different nodes of a shared segment move at significantly different speeds, there should be a common "tick-duration" for all clocks even if each clock runs at an offset... (I naively would try to implement something like that by fully synchronizing the clocks and maintaining a local offset value to convert from "absolute" time to "network" time, but coming from the outside I am likely blissfully unaware of the detailed challenges that need to be solved).
Regards & Thanks
Sebastian
>
>
> >
> >>
> >>
> >>>
> >>> --trip-times
> >>> enable the measurement of end to end write to read latencies (client and server clocks must be synchronized)
> > [RWG] --clock-skew
> > enable the measurement of the wall clock difference between sender and receiver
> >
> >>
> >> [SM] Sweet!
> >>
> >> Regards
> >> Sebastian
> >>
> >>>
> >>> Bob
> >>>> I have many kvetches about the new latency under load tests being
> >>>> designed and distributed over the past year. I am delighted! that they
> >>>> are happening, but most really need third party evaluation, and
> >>>> calibration, and a solid explanation of what network pathologies they
> >>>> do and don't cover. Also a RED team attitude towards them, as well as
> >>>> thinking hard about what you are not measuring (operations research).
> >>>> I actually rather love the new cloudflare speedtest, because it tests
> >>>> a single TCP connection, rather than dozens, and at the same time folk
> >>>> are complaining that it doesn't find the actual "speed!". yet... the
> >>>> test itself more closely emulates a user experience than speedtest.net
> >>>> does. I am personally pretty convinced that fewer flows opened by a web
> >>>> page improve the likelihood of a good user experience, but lack data
> >>>> on it.
> >>>> To try to tackle the evaluation and calibration part, I've reached out
> >>>> to all the new test designers in the hope that we could get together
> >>>> and produce a report of what each new test is actually doing. I've
> >>>> tweeted, linked in, emailed, and spammed every measurement list I know
> >>>> of, with only some response; please reach out to other test-designer
> >>>> folks and have them join the rpm email list?
> >>>> My principal kvetches in the new tests so far are:
> >>>> 0) None of the tests last long enough.
> >>>> Ideally there should be a mode where they at least run to "time of
> >>>> first loss", or periodically, just run longer than the
> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
> >>>> there! It's really bad science to optimize the internet for 20
> >>>> seconds. It's like optimizing a car, to handle well, for just 20
> >>>> seconds.
> >>>> 1) Not testing up + down + ping at the same time
> >>>> None of the new tests actually test the same thing that the infamous
> >>>> rrul test does - all the others still test up, then down, and ping. It
> >>>> was/remains my hope that the simpler parts of the flent test suite -
> >>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
> >>>> tests would provide calibration to the test designers.
> >>>> we've got zillions of flent results in the archive published here:
> >>>> https://blog.cerowrt.org/post/found_in_flent/
> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
> >>>
> >>>> The new tests have all added up + ping and down + ping, but not up +
> >>>> down + ping. Why??
> >>>> The behaviors of what happens in that case are really non-intuitive, I
> >>>> know, but... it's just one more phase to add to any one of those new
> >>>> tests. I'd be deliriously happy if someone(s) new to the field
> >>>> started doing that, even optionally, and boggled at how it defeated
> >>>> their assumptions.
> >>>> Among other things that would show...
> >>>> It's the home router industry's dirty secret that darn few "gigabit"
> >>>> home routers can actually forward in both directions at a gigabit. I'd
> >>>> like to smash that perception thoroughly, but given that our starting
> >>>> point for a "gigabit router" was a "gigabit switch" - one that
> >>>> historically couldn't even forward at 200Mbit - we have a long way
> >>>> to go there.
> >>>> Only in the past year have non-x86 home routers appeared that could
> >>>> actually do a gbit in both directions.
> >>>> 2) Few are actually testing within-stream latency
> >>>> Apple's rpm project is making a stab in that direction. It looks
> >>>> highly likely, that with a little more work, crusader and
> >>>> go-responsiveness can finally start sampling the tcp RTT, loss and
> >>>> markings, more directly. As for the rest... sampling TCP_INFO on
> >>>> Windows, and Linux, at least, always appeared simple to me, but I'm
> >>>> discovering how hard it is by delving deep into the rust behind
> >>>> crusader.
> >>>> the goresponsiveness thing is also IMHO running WAY too many streams
> >>>> at the same time, I guess motivated by an attempt to have the test
> >>>> complete quickly?
> >>>> B) To try and tackle the validation problem:
> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
> >>>
> >>>> In the libreqos.io project we've established a testbed where tests can
> >>>> be plunked through various ISP plan network emulations. It's here:
> >>>> https://payne.taht.net (run bandwidth test for what's currently hooked
> >>>> up)
> >>>> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
> >>>> leverage with that, so I don't have to nat the various emulations.
> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
> >>>> to see more test designers setup a testbed like this to calibrate
> >>>> their own stuff.
> >>>> Presently we're able to test:
> >>>> flent
> >>>> netperf
> >>>> iperf2
> >>>> iperf3
> >>>> speedtest-cli
> >>>> crusader
> >>>> the broadband forum udp based test:
> >>>> https://github.com/BroadbandForum/obudpst
> >>>> trexx
> >>>> There's also a virtual machine setup that we can remotely drive a web
> >>>> browser from (but I didn't want to nat the results to the world) to
> >>>> test other web services.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 8:14 ` Sebastian Moeller
@ 2023-01-12 17:49 ` Robert McMahon
2023-01-12 21:57 ` Dick Roy
0 siblings, 1 reply; 19+ messages in thread
From: Robert McMahon @ 2023-01-12 17:49 UTC (permalink / raw)
To: Sebastian Moeller
Cc: Rodney W. Grimes, Rpm, mike.reynolds, David P. Reed, libreqos,
Dave Taht via Starlink, bloat
Hi Sebastian,
You make a good point. What I did was issue a warning if the tool found it was being CPU limited vs i/o limited. This indicates the i/o test is likely inaccurate from an i/o perspective, and the results are suspect. It does this crudely by comparing the CPU thread doing stats against the traffic threads doing i/o - checking which thread is waiting on the others. There is no attempt to assess the CPU load itself. So it's designed with the singular purpose of making sure i/o threads only block on the write and read syscalls.
I probably should revisit this both in design and implementation. Thanks for bringing it up and all input is truly appreciated.
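Concretely, a crude sketch of that kind of check (hypothetical Python, not iperf 2's actual mechanism): time how long the traffic thread spends blocked inside write() versus running; if writes almost never block, the bottleneck is likely the CPU or the application rather than the network path.

import socket
import time

def classify_sender(sock: socket.socket, payload: bytes,
                    duration_s: float = 5.0) -> str:
    blocked = 0.0
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        t0 = time.monotonic()
        sock.sendall(payload)  # blocks when the socket send buffer is full
        blocked += time.monotonic() - t0
    frac = blocked / (time.monotonic() - start)
    # Threshold is an arbitrary illustration value.
    return "i/o-limited" if frac > 0.5 else "possibly CPU-limited"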
Bob
On Jan 12, 2023, at 12:14 AM, Sebastian Moeller <moeller0@gmx.de> wrote:
>Hi Bob,
>
>
>> On Jan 11, 2023, at 21:09, rjmcmahon <rjmcmahon@rjmcmahon.com> wrote:
>>
>> Iperf 2 is designed to measure network i/o. Note: It doesn't have to
>move large amounts of data. It can support data profiles that don't
>drive TCP's CCA as an example.
>>
>> Two things I've been asked for and avoided:
>>
>> 1) Integrate clock sync into iperf's test traffic
>
> [SM] This I understand; measurement conditions can be unsuited for
>tight time synchronization...
>
>
>> 2) Measure and output CPU usages
>
> [SM] This one puzzles me; as far as I understand, the only way to
>properly diagnose network issues is to rule out other things, like CPU
>overload, that can have symptoms similar to network issues. As an
>example, if CPU cycles become tight the cake qdisc will first increase
>its internal queueing and jitter (not consciously; it is just an
>observation that once cake does not get access to the CPU as timely as
>it wants, queueing latency and variability increase) and later also
>show reduced throughput - similar to things that can happen along
>an e2e network path for completely different reasons, e.g. lower-level
>retransmissions or a variable-rate link. So I would think that checking
>the CPU load at least coarsely would be within the scope of network
>testing tools, no?
>
>Regards
> Sebastian
>
>
>
>
>> I think both of these are outside the scope of a tool designed to
>test network i/o over sockets; rather, these should be developed &
>validated independently of a network i/o tool.
>>
>> Clock error really isn't about amount/frequency of traffic but rather
>getting a periodic high-quality reference. I tend to use GPS pulse per
>second to lock the local system oscillator to. As David says, most
>every modern handheld computer has the GPS chips to do this already. So
>to me it seems more of a policy choice between data center operators
>and device mfgs and less of a technical issue.
>>
>> Bob
>>> Hello,
>>> Yall can call me crazy if you want.. but... see below [RWG]
>>>> Hi Bob,
>>>> > On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
><starlink@lists.bufferbloat.net> wrote:
>>>> >
>>>> > My biggest barrier is the lack of clock sync by the devices, i.e.
>very limited support for PTP in data centers and in end devices. This
>limits the ability to measure one way delays (OWD) and most assume that
>OWD is 1/2 an RTT, which typically is a mistake. We know this
>intuitively with airplane flight times or even car commute times where
>the one way time is not 1/2 a round trip time. Google maps & directions
>provide a time estimate for the one way link. It doesn't compute a
>round trip and divide by two.
>>>> >
>>>> > For those that can get clock sync working, the iperf 2
>--trip-times option is useful.
>>>> [SM] +1; and yet even with unsynchronized clocks one can try to
>measure how latency changes under load and that can be done per
>direction. Sure this is far inferior to real reliably measured OWDs,
>but if life/the internet deals you lemons....
>>> [RWG] iperf2/iperf3, etc are already moving large amounts of data
>>> back and forth, for that matter any rate test, why not abuse some of
>>> that data and add the fundamental NTP clock sync data and
>>> bidirectionally pass each other's concept of "current time". IIRC (it's
>>> been 25 years since I worked on NTP at this level) you *should* be
>>> able to get a fairly accurate clock delta between each end, and then
>>> use that info and time stamps in the data stream to compute OWDs.
>>> You need to put 4 time stamps in the packet, and with that you can
>>> compute "offset".
>>>> >
>>>> > --trip-times
>>>> > enable the measurement of end to end write to read latencies
>(client and server clocks must be synchronized)
>>> [RWG] --clock-skew
>>> enable the measurement of the wall clock difference between sender
>and receiver
>>>> [SM] Sweet!
>>>> Regards
>>>> Sebastian
>>>> >
>>>> > Bob
>>>> >> I have many kvetches about the new latency under load tests
>being
>>>> >> designed and distributed over the past year. I am delighted!
>that they
>>>> >> are happening, but most really need third party evaluation, and
>>>> >> calibration, and a solid explanation of what network pathologies
>they
>>>> >> do and don't cover. Also a RED team attitude towards them, as
>well as
>>>> >> thinking hard about what you are not measuring (operations
>research).
>>>> >> I actually rather love the new cloudflare speedtest, because it
>tests
>>>> >> a single TCP connection, rather than dozens, and at the same
>time folk
>>>> >> are complaining that it doesn't find the actual "speed!". yet...
>the
>>>> >> test itself more closely emulates a user experience than
>speedtest.net
>>>> >> does. I am personally pretty convinced that fewer flows opened by a
>>>> >> web page improve the likelihood of a good user experience, but lack
>>>> >> data on it.
>>>> >> To try to tackle the evaluation and calibration part, I've
>reached out
>>>> >> to all the new test designers in the hope that we could get
>together
>>>> >> and produce a report of what each new test is actually doing.
>I've
>>>> >> tweeted, linked in, emailed, and spammed every measurement list I know
>>>> >> of, with only some response; please reach out to other test-designer
>>>> >> folks and have them join the rpm email list?
>>>> >> My principal kvetches in the new tests so far are:
>>>> >> 0) None of the tests last long enough.
>>>> >> Ideally there should be a mode where they at least run to "time
>of
>>>> >> first loss", or periodically, just run longer than the
>>>> >> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>>>> >> there! It's really bad science to optimize the internet for 20
>>>> >> seconds. It's like optimizing a car, to handle well, for just 20
>>>> >> seconds.
>>>> >> 1) Not testing up + down + ping at the same time
>>>> >> None of the new tests actually test the same thing that the infamous
>>>> >> rrul test does - all the others still test up, then down, and ping. It
>>>> >> was/remains my hope that the simpler parts of the flent test suite -
>>>> >> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>>>> >> tests would provide calibration to the test designers.
>>>> >> we've got zillions of flent results in the archive published here:
>>>> >> https://blog.cerowrt.org/post/found_in_flent/
>>>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>> >
>>>> >> The new tests have all added up + ping and down + ping, but not up +
>>>> >> down + ping. Why??
>>>> >> The behaviors of what happens in that case are really non-intuitive, I
>>>> >> know, but... it's just one more phase to add to any one of those new
>>>> >> tests. I'd be deliriously happy if someone(s) new to the field
>>>> >> started doing that, even optionally, and boggled at how it defeated
>>>> >> their assumptions.
>>>> >> Among other things that would show...
>>>> >> It's the home router industry's dirty secret that darn few "gigabit"
>>>> >> home routers can actually forward in both directions at a gigabit. I'd
>>>> >> like to smash that perception thoroughly, but given our starting point
>>>> >> is a gigabit router was a "gigabit switch" - and historically been
>>>> >> something that couldn't even forward at 200Mbit - we have a long way
>>>> >> to go there.
>>>> >> Only in the past year have non-x86 home routers appeared that could
>>>> >> actually do a gbit in both directions.
>>>> >> 2) Few are actually testing within-stream latency
>>>> >> Apple's rpm project is making a stab in that direction. It looks
>>>> >> highly likely, that with a little more work, crusader and
>>>> >> go-responsiveness can finally start sampling the tcp RTT, loss and
>>>> >> markings, more directly. As for the rest... sampling TCP_INFO on
>>>> >> windows, and Linux, at least, always appeared simple to me, but I'm
>>>> >> discovering how hard it is by delving deep into the rust behind
>>>> >> crusader.
>>>> >> the goresponsiveness thing is also IMHO running WAY too many streams
>>>> >> at the same time, I guess motivated by an attempt to have the test
>>>> >> complete quickly?
>>>> >> B) To try and tackle the validation problem:
>>>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>> >
>>>> >> In the libreqos.io project we've established a testbed where tests can
>>>> >> be plunked through various ISP plan network emulations. It's here:
>>>> >> https://payne.taht.net (run bandwidth test for what's currently hooked
>>>> >> up)
>>>> >> We could rather use an AS number and at least a ipv4/24 and ipv6/48 to
>>>> >> leverage with that, so I don't have to nat the various emulations.
>>>> >> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>>>> >> to see more test designers setup a testbed like this to calibrate
>>>> >> their own stuff.
>>>> >> Presently we're able to test:
>>>> >> flent
>>>> >> netperf
>>>> >> iperf2
>>>> >> iperf3
>>>> >> speedtest-cli
>>>> >> crusader
>>>> >> the broadband forum udp based test:
>>>> >> https://github.com/BroadbandForum/obudpst
>>>> >> trexx
>>>> >> There's also a virtual machine setup that we can remotely drive a web
>>>> >> browser from (but I didn't want to nat the results to the world) to
>>>> >> test other web services.
>>>> >> _______________________________________________
>>>> >> Rpm mailing list
>>>> >> Rpm@lists.bufferbloat.net
>>>> >> https://lists.bufferbloat.net/listinfo/rpm
>>>> > _______________________________________________
>>>> > Starlink mailing list
>>>> > Starlink@lists.bufferbloat.net
>>>> > https://lists.bufferbloat.net/listinfo/starlink
>>>> _______________________________________________
>>>> Starlink mailing list
>>>> Starlink@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/starlink
[-- Attachment #2: Type: text/html, Size: 12665 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 8:22 ` Sebastian Moeller
@ 2023-01-12 18:02 ` rjmcmahon
2023-01-12 21:34 ` Dick Roy
2023-01-12 20:39 ` Dick Roy
1 sibling, 1 reply; 19+ messages in thread
From: rjmcmahon @ 2023-01-12 18:02 UTC (permalink / raw)
To: Sebastian Moeller
Cc: Dick Roy, Rodney W. Grimes, mike.reynolds, libreqos,
David P. Reed, Rpm, bloat
For WiFi there is the TSF
https://en.wikipedia.org/wiki/Timing_synchronization_function
We in test & measurement use that in our internal telemetry. The TSF of
a WiFi device only needs frequency-sync for some things, typically
related to access to the medium; a phase-locked loop does it. A device
that decides to go to sleep, as an example, will also stop its TSF,
creating a non-linearity. It's difficult to synchronize it to the system
clock or to the GPS atomic clock - though we do this for internal
testing reasons, so it can be done.
What's mostly missing for T&M with WiFi is the GPS atomic clock, as
that's a convenient time domain to use as the canonical domain.
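(A toy illustration of mapping a free-running TSF onto system time: given
two (tsf_us, sys_s) samples captured by some driver-specific means
(hypothetical here), fit a rate and offset, and treat any large residual
as a TSF discontinuity such as the sleep case above:

    # each sample pairs a 64-bit TSF reading (microseconds) with system time
    def fit_tsf(sample0, sample1):
        (tsf0, sys0), (tsf1, sys1) = sample0, sample1
        rate = (sys1 - sys0) / ((tsf1 - tsf0) * 1e-6)  # ~1.0 if TSF ticks true us
        offset = sys0 - tsf0 * 1e-6 * rate
        return rate, offset

    def tsf_to_sys(tsf_us, rate, offset):
        return tsf_us * 1e-6 * rate + offset

    rate, off = fit_tsf((1_000_000, 10.0), (2_000_000, 11.000002))
    print(tsf_to_sys(1_500_000, rate, off))   # ~10.5

A sleep, or a reassociation that resets the TSF, invalidates the fit, so
any such mapping has to be re-estimated per continuous TSF segment.)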
Bob
> Hi RR,
>
>
>> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>>
>>
>>
>> -----Original Message-----
>> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On
>> Behalf Of Sebastian Moeller via Starlink
>> Sent: Wednesday, January 11, 2023 12:01 PM
>> To: Rodney W. Grimes
>> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos;
>> David P. Reed; Rpm; rjmcmahon; bloat
>> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in
>> USA
>>
>> Hi Rodney,
>>
>>
>>
>>
>> > On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink@gndrsh.dnsmgr.net> wrote:
>> >
>> > Hello,
>> >
>> > Yall can call me crazy if you want.. but... see below [RWG]
>> >> Hi Bob,
>> >>
>> >>
>> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net> wrote:
>> >>>
>> >>> My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one way delays (OWD) and most assume that OWD is 1/2 and RTT which typically is a mistake. We know this intuitively with airplane flight times or even car commute times where the one way time is not 1/2 a round trip time. Google maps & directions provide a time estimate for the one way link. It doesn't compute a round trip and divide by two.
>> >>>
>> >>> For those that can get clock sync working, the iperf 2 --trip-times options is useful.
>> >>
>> >> [SM] +1; and yet even with unsynchronized clocks one can try to measure how latency changes under load and that can be done per direction. Sure this is far inferior to real reliably measured OWDs, but if life/the internet deals you lemons....
>> >
>> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back and forth, for that matter any rate test, why not abuse some of that data and add the fundemental NTP clock sync data and bidirectionally pass each others concept of "current time". IIRC (its been 25 years since I worked on NTP at this level) you *should* be able to get a fairly accurate clock delta between each end, and then use that info and time stamps in the data stream to compute OWD's. You need to put 4 time stamps in the packet, and with that you can compute "offset".
>> [RR] For this to work at a reasonable level of accuracy, the
>> timestamping circuits on both ends need to be deterministic and
>> repeatable as I recall. Any uncertainty in that process adds to
>> synchronization errors/uncertainties.
>>
>> [SM] Nice idea. I would guess that all timeslot based access
>> technologies (so starlink, docsis, GPON, LTE?) all distribute "high
>> quality time" carefully to the "modems", so maybe all that would be
>> needed is to expose that high quality time to the LAN side of those
>> modems, dressed up as NTP server?
>> [RR] It’s not that simple! Distributing “high-quality time”, i.e.
>> “synchronizing all clocks” does not solve the communication problem in
>> synchronous slotted MAC/PHYs!
>
> [SM] I happily believe you, but the same idea of "time slot" needs to
> be shared by all nodes, no? So the clocks need to run at a reasonably
> similar rate, aka synchronized (see below).
>
>
>> All the technologies you mentioned above are essentially P2P, not
>> intended for broadcast. Point is, there is a point controller (aka
>> PoC) often called a base station (eNodeB, gNodeB, …) that actually
>> “controls everything that is necessary to control” at the UE including
>> time, frequency and sampling time offsets, and these are critical to
>> get right if you want to communicate, and they are ALL subject to the
>> laws of physics (cf. the speed of light)! Turns out that what is
>> necessary for the system to function anywhere near capacity, is for
>> all the clocks governing transmissions from the UEs to be
>> “unsynchronized” such that all the UE transmissions arrive at the PoC
>> at the same (prescribed) time!
>
> [SM] Fair enough. I would call clocks that are "in sync" albeit with
> individual offsets as synchronized, but I am a layman and that might
> sound offensively wrong to experts in the field. But even without the
> naming, my point is that all systems that depend on some idea of a shared
> time-base are halfway there to exposing that time to end users, by
> "translating" it into an NTP time source at the modem.
>
>
>> For some technologies, in particular 5G!, these considerations are
>> ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you
>> don't believe me! :-)
>
> [SM] Far be it from me not to believe you, so thanks for the pointers.
> Yet, I still think that unless different nodes of a shared segment
> move at significantly different speeds, that there should be a common
> "tick-duration" for all clocks even if each clock runs at an offset...
> (I naively would try to implement something like that by trying to
> fully synchronize clocks and maintain a local offset value to convert
> from "absolute" time to "network" time, but likely because coming from
> the outside I am blissfully unaware of the detail challenges that need
> to be solved).
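(A toy model of what SM sketches here: nodes agree on the tick rate, each
keeps only a local offset; names hypothetical:

    class NodeClock:
        def __init__(self, offset_s):
            self.offset_s = offset_s          # learned once, per node
        def to_network(self, absolute_s):     # "absolute" -> shared slot time
            return absolute_s - self.offset_s
        def to_absolute(self, network_s):
            return network_s + self.offset_s

    # two nodes with different offsets agree on network time:
    a, b = NodeClock(0.250), NodeClock(-0.125)
    print(a.to_network(100.250), b.to_network(99.875))   # both 100.0

As RR points out, the hard part in a slotted MAC is that the per-node
offsets are deliberately unequal, chosen so transmissions arrive aligned
at the point controller.)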
>
> Regards & Thanks
> Sebastian
>
>
>>
>>
>> >
>> >>
>> >>
>> >>>
>> >>> --trip-times
>> >>> enable the measurement of end to end write to read latencies (client and server clocks must be synchronized)
>> > [RWG] --clock-skew
>> > enable the measurement of the wall clock difference between sender and receiver
>> >
>> >>
>> >> [SM] Sweet!
>> >>
>> >> Regards
>> >> Sebastian
>> >>
>> >>>
>> >>> Bob
>> >>>> I have many kvetches about the new latency under load tests being
>> >>>> designed and distributed over the past year. I am delighted! that they
>> >>>> are happening, but most really need third party evaluation, and
>> >>>> calibration, and a solid explanation of what network pathologies they
>> >>>> do and don't cover. Also a RED team attitude towards them, as well as
>> >>>> thinking hard about what you are not measuring (operations research).
>> >>>> I actually rather love the new cloudflare speedtest, because it tests
>> >>>> a single TCP connection, rather than dozens, and at the same time folk
>> >>>> are complaining that it doesn't find the actual "speed!". yet... the
>> >>>> test itself more closely emulates a user experience than speedtest.net
>> >>>> does. I am personally pretty convinced that the fewer numbers of flows
>> >>>> that a web page opens improves the likelihood of a good user
>> >>>> experience, but lack data on it.
>> >>>> To try to tackle the evaluation and calibration part, I've reached out
>> >>>> to all the new test designers in the hope that we could get together
>> >>>> and produce a report of what each new test is actually doing. I've
>> >>>> tweeted, linked in, emailed, and spammed every measurement list I know
>> >>>> of, and only to some response, please reach out to other test designer
>> >>>> folks and have them join the rpm email list?
>> >>>> My principal kvetches in the new tests so far are:
>> >>>> 0) None of the tests last long enough.
>> >>>> Ideally there should be a mode where they at least run to "time of
>> >>>> first loss", or periodically, just run longer than the
>> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>> >>>> there! It's really bad science to optimize the internet for 20
>> >>>> seconds. It's like optimizing a car, to handle well, for just 20
>> >>>> seconds.
>> >>>> 1) Not testing up + down + ping at the same time
>> >>>> None of the new tests actually test the same thing that the infamous
>> >>>> rrul test does - all the others still test up, then down, and ping. It
>> >>>> was/remains my hope that the simpler parts of the flent test suite -
>> >>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>> >>>> tests would provide calibration to the test designers.
>> >>>> we've got zillions of flent results in the archive published here:
>> >>>> https://blog.cerowrt.org/post/found_in_flent/
>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>>
>> >>>> The new tests have all added up + ping and down + ping, but not up +
>> >>>> down + ping. Why??
>> >>>> The behaviors of what happens in that case are really non-intuitive, I
>> >>>> know, but... it's just one more phase to add to any one of those new
>> >>>> tests. I'd be deliriously happy if someone(s) new to the field
>> >>>> started doing that, even optionally, and boggled at how it defeated
>> >>>> their assumptions.
>> >>>> Among other things that would show...
>> >>>> It's the home router industry's dirty secret that darn few "gigabit"
>> >>>> home routers can actually forward in both directions at a gigabit. I'd
>> >>>> like to smash that perception thoroughly, but given our starting point
>> >>>> is a gigabit router was a "gigabit switch" - and historically been
>> >>>> something that couldn't even forward at 200Mbit - we have a long way
>> >>>> to go there.
>> >>>> Only in the past year have non-x86 home routers appeared that could
>> >>>> actually do a gbit in both directions.
>> >>>> 2) Few are actually testing within-stream latency
>> >>>> Apple's rpm project is making a stab in that direction. It looks
>> >>>> highly likely, that with a little more work, crusader and
>> >>>> go-responsiveness can finally start sampling the tcp RTT, loss and
>> >>>> markings, more directly. As for the rest... sampling TCP_INFO on
>> >>>> windows, and Linux, at least, always appeared simple to me, but I'm
>> >>>> discovering how hard it is by delving deep into the rust behind
>> >>>> crusader.
>> >>>> the goresponsiveness thing is also IMHO running WAY too many streams
>> >>>> at the same time, I guess motivated by an attempt to have the test
>> >>>> complete quickly?
>> >>>> B) To try and tackle the validation problem:
>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>>
>> >>>> In the libreqos.io project we've established a testbed where tests can
>> >>>> be plunked through various ISP plan network emulations. It's here:
>> >>>> https://payne.taht.net (run bandwidth test for what's currently hooked
>> >>>> up)
>> >>>> We could rather use an AS number and at least a ipv4/24 and ipv6/48 to
>> >>>> leverage with that, so I don't have to nat the various emulations.
>> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>> >>>> to see more test designers setup a testbed like this to calibrate
>> >>>> their own stuff.
>> >>>> Presently we're able to test:
>> >>>> flent
>> >>>> netperf
>> >>>> iperf2
>> >>>> iperf3
>> >>>> speedtest-cli
>> >>>> crusader
>> >>>> the broadband forum udp based test:
>> >>>> https://github.com/BroadbandForum/obudpst
>> >>>> trexx
>> >>>> There's also a virtual machine setup that we can remotely drive a web
>> >>>> browser from (but I didn't want to nat the results to the world) to
>> >>>> test other web services.
>> >>>> _______________________________________________
>> >>>> Rpm mailing list
>> >>>> Rpm@lists.bufferbloat.net
>> >>>> https://lists.bufferbloat.net/listinfo/rpm
>> >>> _______________________________________________
>> >>> Starlink mailing list
>> >>> Starlink@lists.bufferbloat.net
>> >>> https://lists.bufferbloat.net/listinfo/starlink
>> >>
>> >> _______________________________________________
>> >> Starlink mailing list
>> >> Starlink@lists.bufferbloat.net
>> >> https://lists.bufferbloat.net/listinfo/starlink
>>
>> _______________________________________________
>> Starlink mailing list
>> Starlink@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/starlink
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 8:22 ` Sebastian Moeller
2023-01-12 18:02 ` rjmcmahon
@ 2023-01-12 20:39 ` Dick Roy
2023-01-13 7:33 ` Sebastian Moeller
2023-01-13 7:40 ` rjmcmahon
1 sibling, 2 replies; 19+ messages in thread
From: Dick Roy @ 2023-01-12 20:39 UTC (permalink / raw)
To: 'Sebastian Moeller'
Cc: 'Rodney W. Grimes', mike.reynolds, 'libreqos',
'David P. Reed', 'Rpm', 'rjmcmahon',
'bloat'
[-- Attachment #1: Type: text/plain, Size: 16054 bytes --]
Hi Sebastian (et al.),
[I'll comment up here instead of inline.]
Let me start by saying that I have not been intimately involved with the
IEEE 1588 effort (PTP), however I was involved in the 802.11 efforts along a
similar vein, just adding the wireless first hop component and its effects
on PTP.
What was apparent from the outset was that there was a lack of understanding
of what the terms "to synchronize" or "to be synchronized" actually mean. It's
not trivial ... because we live in a (approximately, that's another story!)
4-D space-time continuum where the Lorentz metric plays a critical role.
Therein, simultaneity (aka "things happening at the same time") means the
"distance" between two such events is zero and that distance is given by
sqrt(x^2 + y^2 + z^2 - (ct)^2) and the "thing happening" can be the tick of
a clock somewhere. Now since everything is relative (time with respect to
what? / location with respect to where?) it's pretty easy to see that "if
you don't know where you are, you can't know what time it is!" (English
sailors of the 18th century knew this well!) Add to this the fact that if
everything were stationary, nothing would happen (as Einstein said "Nothing
happens until something moves!"), special relativity also plays a role.
Clocks on GPS satellites run approx. 7usecs/day slower than those on earth
due to their "speed" (8700 mph roughly)! Then add the consequence that
without mass we wouldn't exist (in these forms at least:-)), and
gravitational effects (aka General Relativity) come into play. Those turn
out to make clocks on GPS satellites run 45usec/day faster than those on
earth! The net effect is that GPS clocks run about 38usec/day faster than
clocks on earth. So what does it mean to "synchronize to GPS"? Point is:
it's a non-trivial question with a very complicated answer. The reason it
is important to get all this right is that the "thing that ties time and
space together" is the speed of light and that turns out to be a
"foot-per-nanosecond" in a vacuum (roughly 300m/usec). This means if I am
uncertain about my location to, say, 300 meters, then I also am not sure what
time it is to a usec AND vice-versa!
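(The arithmetic above, as a back-of-envelope check; the two relativistic
terms and the commonly cited factory offset for the GPS 10.23 MHz clock:

    sr = -7.0    # usec/day slower, special relativity (orbital speed)
    gr = +45.0   # usec/day faster, general relativity (weaker gravity)
    net = sr + gr                      # ~ +38 usec/day, as above
    frac = net * 1e-6 / 86400.0        # ~ 4.4e-10 fractional rate offset
    print(net, frac, 10.23e6 * (1 - frac))
    # GPS satellite clocks are in fact factory-set slightly low, near the
    # commonly cited 10.22999999543 MHz, so they tick at 10.23 MHz as seen
    # from the ground.

A fractional error of 4.4e-10 sounds tiny, but uncorrected it accumulates
to roughly 11 km/day of ranging error at a foot per nanosecond.)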
All that said, the simplest explanation of synchronization is probably: Two
clocks are synchronized if, when they are brought (slowly) into physical
proximity ("sat next to each other") in the same (quasi-)inertial frame and
the same gravitational potential (not so obvious BTW ... see the FYI below!),
an observer of both would say "they are keeping time identically". Since
this experiment is rarely possible, one can never be "sure" that his clock
is synchronized to any other clock elsewhere. And what does it mean to say
they "were synchronized" when brought together, but now they are not because
they are now in different gravitational potentials! (FYI, there are land
mine detectors being developed on this very principle! I know someone who
actually worked on such a project!)
This all gets even more complicated when dealing with large networks of
networks in which the "speed of information transmission" can vary depending
on the medium (cf. coaxial cables versus fiber versus microwave links!) In
fact, the atmosphere is one of those media and variations therein result in
the need for "GPS corrections" (cf. RTCM GPS correction messages, RTK, etc.)
in order to get to sub-nsec/cm accuracy. Point is if you have a set of
nodes distributed across the country all with GPS and all "synchronized to
GPS time", and a second identical set of nodes (with no GPS) instead
connected with a network of cables and fiber links, all of different lengths
and composition using different carrier frequencies (dielectric constants
vary with frequency!) "synchronized" to some clock somewhere (using NTP or
PTP), the synchronization of the two sets will be different unless a common
reference clock is used AND all the above effects are taken into account,
and good luck with that! :-)
In conclusion, if anyone tells you that clock synchronization in
communication networks is simple ("Just use GPS!"), you should feel free to
chuckle (under your breath if necessary :-))
Cheers,
RR
-----Original Message-----
From: Sebastian Moeller [mailto:moeller0@gmx.de]
Sent: Thursday, January 12, 2023 12:23 AM
To: Dick Roy
Cc: Rodney W. Grimes; mike.reynolds@netforecast.com; libreqos; David P.
Reed; Rpm; rjmcmahon; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
Hi RR,
> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>
>
>
> -----Original Message-----
> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf
Of Sebastian Moeller via Starlink
> Sent: Wednesday, January 11, 2023 12:01 PM
> To: Rodney W. Grimes
> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David
P. Reed; Rpm; rjmcmahon; bloat
> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
>
> Hi Rodney,
>
>
>
>
> > On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink@gndrsh.dnsmgr.net>
wrote:
> >
> > Hello,
> >
> > Yall can call me crazy if you want.. but... see below [RWG]
> >> Hi Bob,
> >>
> >>
> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
<starlink@lists.bufferbloat.net> wrote:
> >>>
> >>> My biggest barrier is the lack of clock sync by the devices, i.e. very
limited support for PTP in data centers and in end devices. This limits the
ability to measure one way delays (OWD) and most assume that OWD is 1/2 and
RTT which typically is a mistake. We know this intuitively with airplane
flight times or even car commute times where the one way time is not 1/2 a
round trip time. Google maps & directions provide a time estimate for the
one way link. It doesn't compute a round trip and divide by two.
> >>>
> >>> For those that can get clock sync working, the iperf 2 --trip-times
options is useful.
> >>
> >> [SM] +1; and yet even with unsynchronized clocks one can try to
measure how latency changes under load and that can be done per direction.
Sure this is far inferior to real reliably measured OWDs, but if life/the
internet deals you lemons....
> >
> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back
and forth, for that matter any rate test, why not abuse some of that data
and add the fundamental NTP clock sync data and bidirectionally pass each
other's concept of "current time". IIRC (it's been 25 years since I worked on
NTP at this level) you *should* be able to get a fairly accurate clock delta
between each end, and then use that info and time stamps in the data stream
to compute OWD's. You need to put 4 time stamps in the packet, and with
that you can compute "offset".
> [RR] For this to work at a reasonable level of accuracy, the timestamping
circuits on both ends need to be deterministic and repeatable as I recall.
Any uncertainty in that process adds to synchronization
errors/uncertainties.
>
> [SM] Nice idea. I would guess that all timeslot based access
technologies (so starlink, docsis, GPON, LTE?) all distribute "high quality
time" carefully to the "modems", so maybe all that would be needed is to
expose that high quality time to the LAN side of those modems, dressed up as
NTP server?
> [RR] It's not that simple! Distributing "high-quality time", i.e.
"synchronizing all clocks" does not solve the communication problem in
synchronous slotted MAC/PHYs!
[SM] I happily believe you, but the same idea of "time slot" needs to
be shared by all nodes, no? So the clocks need to run at a reasonably similar
rate, aka synchronized (see below).
> All the technologies you mentioned above are essentially P2P, not
intended for broadcast. Point is, there is a point controller (aka PoC)
often called a base station (eNodeB, gNodeB, ...) that actually "controls
everything that is necessary to control" at the UE including time, frequency
and sampling time offsets, and these are critical to get right if you want
to communicate, and they are ALL subject to the laws of physics (cf. the
speed of light)! Turns out that what is necessary for the system to function
anywhere near capacity, is for all the clocks governing transmissions from
the UEs to be "unsynchronized" such that all the UE transmissions arrive at
the PoC at the same (prescribed) time!
[SM] Fair enough. I would call clocks that are "in sync" albeit with
individual offsets as synchronized, but I am a layman and that might sound
offensively wrong to experts in the field. But even without the naming, my
point is that all systems that depend on some idea of a shared time-base are
halfway there to exposing that time to end users, by "translating" it into an
NTP time source at the modem.
> For some technologies, in particular 5G!, these considerations are
ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you don't
believe me! :-)
[SM] Far be it from me not to believe you, so thanks for the pointers.
Yet, I still think that unless different nodes of a shared segment move at
significantly different speeds, that there should be a common
"tick-duration" for all clocks even if each clock runs at an offset... (I
naively would try to implement something like that by trying to fully
synchronize clocks and maintain a local offset value to convert from
"absolute" time to "network" time, but likely because coming from the
outside I am blissfully unaware of the detail challenges that need to be
solved).
Regards & Thanks
Sebastian
>
>
> >
> >>
> >>
> >>>
> >>> --trip-times
> >>> enable the measurement of end to end write to read latencies (client
and server clocks must be synchronized)
> > [RWG] --clock-skew
> > enable the measurement of the wall clock difference between sender
and receiver
> >
> >>
> >> [SM] Sweet!
> >>
> >> Regards
> >> Sebastian
> >>
> >>>
> >>> Bob
> >>>> I have many kvetches about the new latency under load tests being
> >>>> designed and distributed over the past year. I am delighted! that
they
> >>>> are happening, but most really need third party evaluation, and
> >>>> calibration, and a solid explanation of what network pathologies they
> >>>> do and don't cover. Also a RED team attitude towards them, as well as
> >>>> thinking hard about what you are not measuring (operations research).
> >>>> I actually rather love the new cloudflare speedtest, because it tests
> >>>> a single TCP connection, rather than dozens, and at the same time
folk
> >>>> are complaining that it doesn't find the actual "speed!". yet... the
> >>>> test itself more closely emulates a user experience than
speedtest.net
> >>>> does. I am personally pretty convinced that the fewer numbers of
flows
> >>>> that a web page opens improves the likelihood of a good user
> >>>> experience, but lack data on it.
> >>>> To try to tackle the evaluation and calibration part, I've reached
out
> >>>> to all the new test designers in the hope that we could get together
> >>>> and produce a report of what each new test is actually doing. I've
> >>>> tweeted, linked in, emailed, and spammed every measurement list I
know
> >>>> of, and only to some response, please reach out to other test
designer
> >>>> folks and have them join the rpm email list?
> >>>> My principal kvetches in the new tests so far are:
> >>>> 0) None of the tests last long enough.
> >>>> Ideally there should be a mode where they at least run to "time of
> >>>> first loss", or periodically, just run longer than the
> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
> >>>> there! It's really bad science to optimize the internet for 20
> >>>> seconds. It's like optimizing a car, to handle well, for just 20
> >>>> seconds.
> >>>> 1) Not testing up + down + ping at the same time
> >>>> None of the new tests actually test the same thing that the infamous
> >>>> rrul test does - all the others still test up, then down, and ping.
It
> >>>> was/remains my hope that the simpler parts of the flent test suite -
> >>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
> >>>> tests would provide calibration to the test designers.
> >>>> we've got zillions of flent results in the archive published here:
> >>>> https://blog.cerowrt.org/post/found_in_flent/
> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
> >>>
> >>>> The new tests have all added up + ping and down + ping, but not up +
> >>>> down + ping. Why??
> >>>> The behaviors of what happens in that case are really non-intuitive,
I
> >>>> know, but... it's just one more phase to add to any one of those new
> >>>> tests. I'd be deliriously happy if someone(s) new to the field
> >>>> started doing that, even optionally, and boggled at how it defeated
> >>>> their assumptions.
> >>>> Among other things that would show...
> >>>> It's the home router industry's dirty secret that darn few "gigabit"
> >>>> home routers can actually forward in both directions at a gigabit.
I'd
> >>>> like to smash that perception thoroughly, but given our starting
point
> >>>> is a gigabit router was a "gigabit switch" - and historically been
> >>>> something that couldn't even forward at 200Mbit - we have a long way
> >>>> to go there.
> >>>> Only in the past year have non-x86 home routers appeared that could
> >>>> actually do a gbit in both directions.
> >>>> 2) Few are actually testing within-stream latency
> >>>> Apple's rpm project is making a stab in that direction. It looks
> >>>> highly likely, that with a little more work, crusader and
> >>>> go-responsiveness can finally start sampling the tcp RTT, loss and
> >>>> markings, more directly. As for the rest... sampling TCP_INFO on
> >>>> windows, and Linux, at least, always appeared simple to me, but I'm
> >>>> discovering how hard it is by delving deep into the rust behind
> >>>> crusader.
> >>>> the goresponsiveness thing is also IMHO running WAY too many streams
> >>>> at the same time, I guess motivated by an attempt to have the test
> >>>> complete quickly?
> >>>> B) To try and tackle the validation problem:
> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
> >>>
> >>>> In the libreqos.io project we've established a testbed where tests
can
> >>>> be plunked through various ISP plan network emulations. It's here:
> >>>> https://payne.taht.net (run bandwidth test for what's currently
hooked
> >>>> up)
> >>>> We could rather use an AS number and at least a ipv4/24 and ipv6/48
to
> >>>> leverage with that, so I don't have to nat the various emulations.
> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
> >>>> to see more test designers setup a testbed like this to calibrate
> >>>> their own stuff.
> >>>> Presently we're able to test:
> >>>> flent
> >>>> netperf
> >>>> iperf2
> >>>> iperf3
> >>>> speedtest-cli
> >>>> crusader
> >>>> the broadband forum udp based test:
> >>>> https://github.com/BroadbandForum/obudpst
> >>>> trexx
> >>>> There's also a virtual machine setup that we can remotely drive a web
> >>>> browser from (but I didn't want to nat the results to the world) to
> >>>> test other web services.
> >>>> _______________________________________________
> >>>> Rpm mailing list
> >>>> Rpm@lists.bufferbloat.net
> >>>> https://lists.bufferbloat.net/listinfo/rpm
> >>> _______________________________________________
> >>> Starlink mailing list
> >>> Starlink@lists.bufferbloat.net
> >>> https://lists.bufferbloat.net/listinfo/starlink
> >>
> >> _______________________________________________
> >> Starlink mailing list
> >> Starlink@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/starlink
>
> _______________________________________________
> Starlink mailing list
> Starlink@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink
[-- Attachment #2: Type: text/html, Size: 45666 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 18:02 ` rjmcmahon
@ 2023-01-12 21:34 ` Dick Roy
0 siblings, 0 replies; 19+ messages in thread
From: Dick Roy @ 2023-01-12 21:34 UTC (permalink / raw)
To: 'rjmcmahon', 'Sebastian Moeller'
Cc: 'Rodney W. Grimes', mike.reynolds, 'libreqos',
'David P. Reed', 'Rpm', 'bloat'
[-- Attachment #1: Type: text/plain, Size: 13155 bytes --]
-----Original Message-----
From: rjmcmahon [mailto:rjmcmahon@rjmcmahon.com]
Sent: Thursday, January 12, 2023 10:03 AM
To: Sebastian Moeller
Cc: Dick Roy; Rodney W. Grimes; mike.reynolds@netforecast.com; libreqos;
David P. Reed; Rpm; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
For WiFi there is the TSF
https://en.wikipedia.org/wiki/Timing_synchronization_function
[RR] There is also a TimingAdvertisement function which can be used to
synchronize STAs to UTC time (or other specified time references ... see the
802.11 standard for details, or ask me offline). It was added in the
802.11p amendment along with OCB operation, if you care to know :-)
We in test & measurement use that in our internal telemetry. The TSF of
a WiFi device only needs frequency-sync for some things, typically
related to access to the medium; a phase-locked loop does it. A device
that decides to go to sleep, as an example, will also stop its TSF,
creating a non-linearity. It's difficult to synchronize it to the system
clock or to the GPS atomic clock - though we do this for internal
testing reasons, so it can be done.
What's mostly missing for T&M with WiFi is the GPS atomic clock, as
that's a convenient time domain to use as the canonical domain.
Bob
> Hi RR,
>
>
>> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>>
>>
>>
>> -----Original Message-----
>> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On
>> Behalf Of Sebastian Moeller via Starlink
>> Sent: Wednesday, January 11, 2023 12:01 PM
>> To: Rodney W. Grimes
>> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos;
>> David P. Reed; Rpm; rjmcmahon; bloat
>> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in
>> USA
>>
>> Hi Rodney,
>>
>>
>>
>>
>> > On Jan 11, 2023, at 19:32, Rodney W. Grimes
<starlink@gndrsh.dnsmgr.net> wrote:
>> >
>> > Hello,
>> >
>> > Yall can call me crazy if you want.. but... see below [RWG]
>> >> Hi Bob,
>> >>
>> >>
>> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
<starlink@lists.bufferbloat.net> wrote:
>> >>>
>> >>> My biggest barrier is the lack of clock sync by the devices, i.e.
very limited support for PTP in data centers and in end devices. This limits
the ability to measure one way delays (OWD) and most assume that OWD is 1/2
and RTT which typically is a mistake. We know this intuitively with airplane
flight times or even car commute times where the one way time is not 1/2 a
round trip time. Google maps & directions provide a time estimate for the
one way link. It doesn't compute a round trip and divide by two.
>> >>>
>> >>> For those that can get clock sync working, the iperf 2 --trip-times
options is useful.
>> >>
>> >> [SM] +1; and yet even with unsynchronized clocks one can try to
measure how latency changes under load and that can be done per direction.
Sure this is far inferior to real reliably measured OWDs, but if life/the
internet deals you lemons....
>> >
>> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back
and forth, for that matter any rate test, why not abuse some of that data
and add the fundamental NTP clock sync data and bidirectionally pass each
other's concept of "current time". IIRC (it's been 25 years since I worked on
NTP at this level) you *should* be able to get a fairly accurate clock delta
between each end, and then use that info and time stamps in the data stream
to compute OWD's. You need to put 4 time stamps in the packet, and with
that you can compute "offset".
>> [RR] For this to work at a reasonable level of accuracy, the
>> timestamping circuits on both ends need to be deterministic and
>> repeatable as I recall. Any uncertainty in that process adds to
>> synchronization errors/uncertainties.
>>
>> [SM] Nice idea. I would guess that all timeslot based access
>> technologies (so starlink, docsis, GPON, LTE?) all distribute "high
>> quality time" carefully to the "modems", so maybe all that would be
>> needed is to expose that high quality time to the LAN side of those
>> modems, dressed up as NTP server?
>> [RR] It's not that simple! Distributing "high-quality time", i.e.
>> "synchronizing all clocks" does not solve the communication problem in
>> synchronous slotted MAC/PHYs!
>
> [SM] I happily believe you, but the same idea of "time slot" needs to
> be shared by all nodes, no? So the clocks need to run at a reasonably
> similar rate, aka synchronized (see below).
>
>
>> All the technologies you mentioned above are essentially P2P, not
>> intended for broadcast. Point is, there is a point controller (aka
>> PoC) often called a base station (eNodeB, gNodeB, ...) that actually
>> "controls everything that is necessary to control" at the UE including
>> time, frequency and sampling time offsets, and these are critical to
>> get right if you want to communicate, and they are ALL subject to the
>> laws of physics (cf. the speed of light)! Turns out that what is
>> necessary for the system to function anywhere near capacity, is for
>> all the clocks governing transmissions from the UEs to be
>> "unsynchronized" such that all the UE transmissions arrive at the PoC
>> at the same (prescribed) time!
>
> [SM] Fair enough. I would call clocks that are "in sync" albeit with
> individual offsets as synchronized, but I am a layman and that might
> sound offensively wrong to experts in the field. But even without the
> naming, my point is that all systems that depend on some idea of a shared
> time-base are halfway there to exposing that time to end users, by
> "translating" it into an NTP time source at the modem.
>
>
>> For some technologies, in particular 5G!, these considerations are
>> ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you
>> don't believe me! :-)
>
> [SM] Far be it from me not to believe you, so thanks for the pointers.
> Yet, I still think that unless different nodes of a shared segment
> move at significantly different speeds, that there should be a common
> "tick-duration" for all clocks even if each clock runs at an offset...
> (I naively would try to implement something like that by trying to
> fully synchronize clocks and maintain a local offset value to convert
> from "absolute" time to "network" time, but likely because coming from
> the outside I am blissfully unaware of the detail challenges that need
> to be solved).
>
> Regards & Thanks
> Sebastian
>
>
>>
>>
>> >
>> >>
>> >>
>> >>>
>> >>> --trip-times
>> >>> enable the measurement of end to end write to read latencies (client
and server clocks must be synchronized)
>> > [RWG] --clock-skew
>> > enable the measurement of the wall clock difference between sender
and receiver
>> >
>> >>
>> >> [SM] Sweet!
>> >>
>> >> Regards
>> >> Sebastian
>> >>
>> >>>
>> >>> Bob
>> >>>> I have many kvetches about the new latency under load tests being
>> >>>> designed and distributed over the past year. I am delighted! that
they
>> >>>> are happening, but most really need third party evaluation, and
>> >>>> calibration, and a solid explanation of what network pathologies
they
>> >>>> do and don't cover. Also a RED team attitude towards them, as well
as
>> >>>> thinking hard about what you are not measuring (operations
research).
>> >>>> I actually rather love the new cloudflare speedtest, because it
tests
>> >>>> a single TCP connection, rather than dozens, and at the same time
folk
>> >>>> are complaining that it doesn't find the actual "speed!". yet... the
>> >>>> test itself more closely emulates a user experience than
speedtest.net
>> >>>> does. I am personally pretty convinced that the fewer numbers of
flows
>> >>>> that a web page opens improves the likelihood of a good user
>> >>>> experience, but lack data on it.
>> >>>> To try to tackle the evaluation and calibration part, I've reached
out
>> >>>> to all the new test designers in the hope that we could get together
>> >>>> and produce a report of what each new test is actually doing. I've
>> >>>> tweeted, linked in, emailed, and spammed every measurement list I
know
>> >>>> of, and only to some response, please reach out to other test
designer
>> >>>> folks and have them join the rpm email list?
>> >>>> My principal kvetches in the new tests so far are:
>> >>>> 0) None of the tests last long enough.
>> >>>> Ideally there should be a mode where they at least run to "time of
>> >>>> first loss", or periodically, just run longer than the
>> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>> >>>> there! It's really bad science to optimize the internet for 20
>> >>>> seconds. It's like optimizing a car, to handle well, for just 20
>> >>>> seconds.
>> >>>> 1) Not testing up + down + ping at the same time
>> >>>> None of the new tests actually test the same thing that the infamous
>> >>>> rrul test does - all the others still test up, then down, and ping.
It
>> >>>> was/remains my hope that the simpler parts of the flent test suite -
>> >>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>> >>>> tests would provide calibration to the test designers.
>> >>>> we've got zillions of flent results in the archive published here:
>> >>>> https://blog.cerowrt.org/post/found_in_flent/
>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>>
>> >>>> The new tests have all added up + ping and down + ping, but not up +
>> >>>> down + ping. Why??
>> >>>> The behaviors of what happens in that case are really non-intuitive,
I
>> >>>> know, but... it's just one more phase to add to any one of those new
>> >>>> tests. I'd be deliriously happy if someone(s) new to the field
>> >>>> started doing that, even optionally, and boggled at how it defeated
>> >>>> their assumptions.
>> >>>> Among other things that would show...
>> >>>> It's the home router industry's dirty secret that darn few "gigabit"
>> >>>> home routers can actually forward in both directions at a gigabit.
I'd
>> >>>> like to smash that perception thoroughly, but given our starting
point
>> >>>> is a gigabit router was a "gigabit switch" - and historically been
>> >>>> something that couldn't even forward at 200Mbit - we have a long way
>> >>>> to go there.
>> >>>> Only in the past year have non-x86 home routers appeared that could
>> >>>> actually do a gbit in both directions.
>> >>>> 2) Few are actually testing within-stream latency
>> >>>> Apple's rpm project is making a stab in that direction. It looks
>> >>>> highly likely, that with a little more work, crusader and
>> >>>> go-responsiveness can finally start sampling the tcp RTT, loss and
>> >>>> markings, more directly. As for the rest... sampling TCP_INFO on
>> >>>> windows, and Linux, at least, always appeared simple to me, but I'm
>> >>>> discovering how hard it is by delving deep into the rust behind
>> >>>> crusader.
>> >>>> the goresponsiveness thing is also IMHO running WAY too many streams
>> >>>> at the same time, I guess motivated by an attempt to have the test
>> >>>> complete quickly?
>> >>>> B) To try and tackle the validation problem:
>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>>
>> >>>> In the libreqos.io project we've established a testbed where tests
can
>> >>>> be plunked through various ISP plan network emulations. It's here:
>> >>>> https://payne.taht.net (run bandwidth test for what's currently
hooked
>> >>>> up)
>> >>>> We could rather use an AS number and at least a ipv4/24 and ipv6/48
to
>> >>>> leverage with that, so I don't have to nat the various emulations.
>> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2
licensed,
>> >>>> to see more test designers setup a testbed like this to calibrate
>> >>>> their own stuff.
>> >>>> Presently we're able to test:
>> >>>> flent
>> >>>> netperf
>> >>>> iperf2
>> >>>> iperf3
>> >>>> speedtest-cli
>> >>>> crusader
>> >>>> the broadband forum udp based test:
>> >>>> https://github.com/BroadbandForum/obudpst
>> >>>> trexx
>> >>>> There's also a virtual machine setup that we can remotely drive a
web
>> >>>> browser from (but I didn't want to nat the results to the world) to
>> >>>> test other web services.
>> >>>> _______________________________________________
>> >>>> Rpm mailing list
>> >>>> Rpm@lists.bufferbloat.net
>> >>>> https://lists.bufferbloat.net/listinfo/rpm
>> >>> _______________________________________________
>> >>> Starlink mailing list
>> >>> Starlink@lists.bufferbloat.net
>> >>> https://lists.bufferbloat.net/listinfo/starlink
>> >>
>> >> _______________________________________________
>> >> Starlink mailing list
>> >> Starlink@lists.bufferbloat.net
>> >> https://lists.bufferbloat.net/listinfo/starlink
>>
>> _______________________________________________
>> Starlink mailing list
>> Starlink@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/starlink
[-- Attachment #2: Type: text/html, Size: 47179 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 17:49 ` Robert McMahon
@ 2023-01-12 21:57 ` Dick Roy
2023-01-13 7:44 ` Sebastian Moeller
0 siblings, 1 reply; 19+ messages in thread
From: Dick Roy @ 2023-01-12 21:57 UTC (permalink / raw)
To: 'Robert McMahon', 'Sebastian Moeller'
Cc: mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
[-- Attachment #1: Type: text/plain, Size: 10313 bytes --]
FYI ...
https://www.fiercewireless.com/tech/cbrs-based-fwa-beats-starlink-performance-madden
Nothing earth-shaking :-)
RR
_____
From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of
Robert McMahon via Starlink
Sent: Thursday, January 12, 2023 9:50 AM
To: Sebastian Moeller
Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David
P. Reed; Rpm; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
Hi Sebastian,
You make a good point. What I did was issue a warning if the tool found it
was being CPU limited vs i/o limited. This indicates the i/o test is likely
inaccurate from an i/o perspective, and the results are suspect. It does
this crudely by comparing the cpu thread doing stats against the traffic
threads doing i/o, i.e. which thread is waiting on the others. There is no
attempt to assess the cpu load itself. So it's designed with the singular
purpose of making sure i/o threads only block on the read and write
syscalls. I probably should revisit this, both in design and implementation.
Thanks for bringing it up; all input is truly appreciated.
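(A crude standalone illustration of that idea, not iperf 2's actual
mechanism: over a sample window, compare a thread's CPU time to wall time;
a ratio near 1 means the loop never blocked in read()/write(), i.e. the
test is CPU limited rather than i/o limited:

    import time

    def cpu_limited(work, window_s=1.0, threshold=0.9):
        c0, w0 = time.thread_time(), time.monotonic()
        while time.monotonic() - w0 < window_s:
            work()                       # stand-in for one i/o loop iteration
        ratio = (time.thread_time() - c0) / (time.monotonic() - w0)
        return ratio >= threshold, ratio

    limited, r = cpu_limited(lambda: sum(range(1000)))  # pure busy work
    print(limited, round(r, 2))          # True, ~1.0 -> warn: results suspect

A loop that mostly sleeps in a blocking read would instead show a ratio
near 0.)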
Bob
On Jan 12, 2023, at 12:14 AM, Sebastian Moeller <moeller0@gmx.de> wrote:
Hi Bob,
On Jan 11, 2023, at 21:09, rjmcmahon <rjmcmahon@rjmcmahon.com> wrote:
Iperf 2 is designed to measure network i/o. Note: It doesn't have to move
large amounts of data. It can support data profiles that don't drive TCP's
CCA as an example.
Two things I've been asked for and avoided:
1) Integrate clock sync into iperf's test traffic
[SM] This I understand, measurement conditions can be unsuited for tight
time synchronization...
2) Measure and output CPU usages
[SM] This one puzzles me: as far as I understand, the only way to properly
diagnose network issues is to rule out other things, like CPU overload, that
can have symptoms similar to network issues. As an example, if CPU cycles
become tight the cake qdisc will first increase its internal queueing and
jitter (not consciously; it is just an observation that once cake does not
get access to the CPU as timely as it wants, queueing latency and variability
increase) and later also show reduced throughput - symptoms similar to things
that can happen along an e2e network path for completely different reasons,
e.g. lower-level retransmissions or a variable-rate link. So I would think
that checking the CPU load, at least coarsely, would be within the scope of
network testing tools, no?
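(Along those lines, a coarse Linux-only check one could wrap around a test
run, reading /proc/stat before and after; the field layout is user nice
system idle iowait irq softirq ...:

    import time

    def cpu_snapshot():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        return fields[3] + fields[4], sum(fields)    # idle+iowait, total

    def cpu_busy_fraction(interval_s=1.0):
        i0, t0 = cpu_snapshot()
        time.sleep(interval_s)                       # the test would run here
        i1, t1 = cpu_snapshot()
        return 1.0 - (i1 - i0) / (t1 - t0)

    print(f"cpu busy: {cpu_busy_fraction():.0%}")    # flag results if near 100%

This says nothing about which thread is starved - cake vs the load
generator - but it is enough to flag "host too busy, latency numbers
suspect".)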
Regards
Sebastian
I think both of these are outside the scope of a tool designed to test
network i/o over sockets, rather these should be developed & validated
independently of a network i/o tool.
Clock error really isn't about amount/frequency of traffic but rather
getting a periodic high-quality reference. I tend to use GPS pulse per
second to lock the local system oscillator to. As David says, most every
modern handheld computer has the GPS chips to do this already. So to me it
seems more of a policy choice between data center operators and device mfgs
and less of a technical issue.
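(For illustration, the drift estimate a PPS reference gives you is just the
deviation of locally timestamped PPS edges from exactly one second; the
capture method is platform-specific and hypothetical here:

    # local receive timestamps of successive GPS PPS edges, in seconds
    def drift_ppm(pps_local_ts):
        n = len(pps_local_ts) - 1
        elapsed = pps_local_ts[-1] - pps_local_ts[0]
        return (elapsed / n - 1.0) * 1e6     # ppm vs true seconds

    print(drift_ppm([0.0, 1.000012, 2.000023, 3.000036]))  # ~ +12 ppm fast

A disciplined oscillator steers itself so this number converges toward
zero.)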
Bob
Hello,
Yall can call me crazy if you want.. but... see below [RWG]
Hi Bob,
On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
<starlink@lists.bufferbloat.net> wrote:
My biggest barrier is the lack of clock sync by the devices, i.e. very
limited support for PTP in data centers and in end devices. This limits the
ability to measure one way delays (OWD) and most assume that OWD is 1/2 and
RTT which typically is a mistake. We know this intuitively with airplane
flight times or even car commute times where the one way time is not 1/2 a
round trip time. Google maps & directions provide a time estimate for the
one way link. It doesn't compute a round trip and divide by two.
For those that can get clock sync working, the iperf 2 --trip-times options
is useful.
[SM] +1; and yet even with unsynchronized clocks one can try to measure
how latency changes under load and that can be done per direction. Sure this
is far inferior to real reliably measured OWDs, but if life/the internet
deals you lemons....
[RWG] iperf2/iperf3, etc are already moving large amounts of data
back and forth, for that matter any rate test, why not abuse some of
that data and add the fundamental NTP clock sync data and
bidirectionally pass each other's concept of "current time". IIRC (it's
been 25 years since I worked on NTP at this level) you *should* be
able to get a fairly accurate clock delta between each end, and then
use that info and time stamps in the data stream to compute OWD's.
You need to put 4 time stamps in the packet, and with that you can
compute "offset".
--trip-times
enable the measurement of end to end write to read latencies (client and
server clocks must be synchronized)
[RWG] --clock-skew
enable the measurement of the wall clock difference between sender and
receiver
[SM] Sweet!
Regards
Sebastian
Bob
I have many kvetches about the new latency under load tests being
designed and distributed over the past year. I am delighted! that they
are happening, but most really need third party evaluation, and
calibration, and a solid explanation of what network pathologies they
do and don't cover. Also a RED team attitude towards them, as well as
thinking hard about what you are not measuring (operations research).
I actually rather love the new cloudflare speedtest, because it tests
a single TCP connection, rather than dozens, and at the same time folk
are complaining that it doesn't find the actual "speed!". yet... the
test itself more closely emulates a user experience than speedtest.net
does. I am personally pretty convinced that the fewer numbers of flows
that a web page opens improves the likelihood of a good user
experience, but lack data on it.
To try to tackle the evaluation and calibration part, I've reached out
to all the new test designers in the hope that we could get together
and produce a report of what each new test is actually doing. I've
tweeted, linked in, emailed, and spammed every measurement list I know
of, and only to some response, please reach out to other test designer
folks and have them join the rpm email list?
My principal kvetches in the new tests so far are:
0) None of the tests last long enough.
Ideally there should be a mode where they at least run to "time of
first loss", or periodically, just run longer than the
industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
there! It's really bad science to optimize the internet for 20
seconds. It's like optimizing a car, to handle well, for just 20
seconds.
1) Not testing up + down + ping at the same time
None of the new tests actually test the same thing that the infamous
rrul test does - all the others still test up, then down, and ping. It
was/remains my hope that the simpler parts of the flent test suite -
such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
tests would provide calibration to the test designers.
we've got zillions of flent results in the archive published here:
https://blog.cerowrt.org/post/found_in_flent/
ps. Misinformation about iperf 2 impacts my ability to do this.
The new tests have all added up + ping and down + ping, but not up +
down + ping. Why??
The behaviors of what happens in that case are really non-intuitive, I
know, but... it's just one more phase to add to any one of those new
tests. I'd be deliriously happy if someone(s) new to the field
started doing that, even optionally, and boggled at how it defeated
their assumptions.
Among other things that would show...
It's the home router industry's dirty secret that darn few "gigabit"
home routers can actually forward in both directions at a gigabit. I'd
like to smash that perception thoroughly, but given our starting point
is a gigabit router was a "gigabit switch" - and historically been
something that couldn't even forward at 200Mbit - we have a long way
to go there.
Only in the past year have non-x86 home routers appeared that could
actually do a gbit in both directions.
2) Few are actually testing within-stream latency
Apple's rpm project is making a stab in that direction. It looks
highly likely, that with a little more work, crusader and
go-responsiveness can finally start sampling the tcp RTT, loss and
markings, more directly. As for the rest... sampling TCP_INFO on
windows, and Linux, at least, always appeared simple to me, but I'm
discovering how hard it is by delving deep into the rust behind
crusader.
the goresponsiveness thing is also IMHO running WAY too many streams
at the same time, I guess motivated by an attempt to have the test
complete quickly?
B) To try and tackle the validation problem:
ps. Misinformation about iperf 2 impacts my ability to do this.
In the libreqos.io project we've established a testbed where tests can
be plunked through various ISP plan network emulations. It's here:
https://payne.taht.net (run bandwidth test for what's currently hooked
up)
We could rather use an AS number and at least a ipv4/24 and ipv6/48 to
leverage with that, so I don't have to nat the various emulations.
(and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
to see more test designers setup a testbed like this to calibrate
their own stuff.
Presently we're able to test:
flent
netperf
iperf2
iperf3
speedtest-cli
crusader
the broadband forum udp based test:
https://github.com/BroadbandForum/obudpst
trexx
There's also a virtual machine setup that we can remotely drive a web
browser from (but I didn't want to nat the results to the world) to
test other web services.
_____
Rpm mailing list
Rpm@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/rpm
_____
Starlink mailing list
Starlink@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/starlink
_____
Starlink mailing list
Starlink@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/starlink
[-- Attachment #2: Type: text/html, Size: 20462 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 20:39 ` Dick Roy
@ 2023-01-13 7:33 ` Sebastian Moeller
2023-01-13 8:26 ` Dick Roy
2023-01-13 7:40 ` rjmcmahon
1 sibling, 1 reply; 19+ messages in thread
From: Sebastian Moeller @ 2023-01-13 7:33 UTC (permalink / raw)
To: dickroy, Dick Roy
Cc: 'Rodney W. Grimes', mike.reynolds, 'libreqos',
'David P. Reed', 'Rpm', 'rjmcmahon',
'bloat'
[-- Attachment #1: Type: text/plain, Size: 18467 bytes --]
Hi RR,
Thanks for the detailed response below; since my point is somewhat orthogonal, I opted for top-posting.
Let me take a step back here and rephrase: synchronising clocks to within a range acceptable for usefulness is neither rocket science nor witchcraft. For measuring internet traffic, millisecond range seems acceptable; local networks can probably profit from finer time resolution. So I am not after e.g. clock synchronisation good enough to participate in SDH/SONET. Heck, in the toy project I am active in we operate on load-dependent delay deltas, so we even ignore different time offsets and are tolerant to (mildly) different tick rates and clock skew. But it would certainly be nice to have some acceptable measure of UTC from endpoints, to be able to interpret timestamps as 'absolute'. Mind you, I am fine with them not being veridically absolute, just good enough for my measurement purpose, and I guess that should be within the range of the achievable. Heck, if all the servers we query timestamps from were NTP-'synchronized' and followed the RFC recommendation to report timestamps in milliseconds past midnight UTC, I would be happy.
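To illustrate the delta idea (a toy sketch, not our project's actual code;
it assumes microsecond timestamps and negligible drift over a run): with
unsynchronized clocks every one-way sample contains an unknown constant
offset, which cancels when you subtract the idle baseline.

/* Load-dependent one-way delay delta with unsynchronized clocks.
 * Each raw sample, receiver clock minus sender clock, contains the
 * true OWD plus an unknown constant offset; subtracting the running
 * minimum cancels the offset and leaves the queueing-induced delta. */
#include <stdint.h>

static int64_t baseline_us = INT64_MAX;   /* smallest raw sample so far */

int64_t owd_delta_us(int64_t t_send_us, int64_t t_recv_us)
{
    int64_t raw = t_recv_us - t_send_us;  /* true OWD + clock offset */
    if (raw < baseline_us)
        baseline_us = raw;                /* track the idle baseline */
    return raw - baseline_us;             /* extra delay under load */
}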
Regards
Sebastian
On 12 January 2023 21:39:21 CET, Dick Roy <dickroy@alum.mit.edu> wrote:
>Hi Sebastian (et. al.),
>
>[I'll comment up here instead of inline.]
>
>Let me start by saying that I have not been intimately involved with the
>IEEE 1588 effort (PTP), however I was involved in the 802.11 efforts along a
>similar vein, just adding the wireless first hop component and its effects
>on PTP.
>
>What was apparent from the outset was that there was a lack of understanding
>of what the terms "to synchronize" or "to be synchronized" actually mean. It's
>not trivial ... because we live in an (approximately, that's another story!)
>4-D space-time continuum where the Lorentz metric plays a critical role.
>Therein, simultaneity (aka "things happening at the same time") means the
>"distance" between two such events is zero, and that distance is given by
>sqrt(x^2 + y^2 + z^2 - (ct)^2), where the "thing happening" can be the tick of
>a clock somewhere. Now since everything is relative (time with respect to
>what? / location with respect to where?) it's pretty easy to see that "if
>you don't know where you are, you can't know what time it is!" (English
>sailors of the 18th century knew this well!) Add to this the fact that if
>everything were stationary, nothing would happen (as Einstein said "Nothing
>happens until something moves!"), and special relativity also plays a role.
>Clocks on GPS satellites run approx. 7 usecs/day slower than those on earth
>due to their "speed" (8700 mph roughly)! Then add the consequence that
>without mass we wouldn't exist (in these forms at least :-)), and
>gravitational effects (aka General Relativity) come into play. Those turn
>out to make clocks on GPS satellites run 45 usec/day faster than those on
>earth! The net effect is that GPS clocks run about 38 usec/day faster than
>clocks on earth. So what does it mean to "synchronize to GPS"? Point is:
>it's a non-trivial question with a very complicated answer. The reason it
>is important to get all this right is that the thing that ties time and
>space together is the speed of light, and that turns out to be about a
>"foot-per-nanosecond" in a vacuum (roughly 300 m/usec). This means if I am
>uncertain about my location to, say, 300 meters, then I also am not sure
>what time it is to a usec, AND vice-versa!
>
>All that said, the simplest explanation of synchronization is probably: Two
>clocks are synchronized if, when they are brought (slowly) into physical
>proximity ("sat next to each other") in the same (quasi-)inertial frame and
>the same gravitational potential (not so obvious BTW ... see the FYI below!),
>an observer of both would say "they are keeping time identically". Since
>this experiment is rarely possible, one can never be "sure" that his clock
>is synchronized to any other clock elsewhere. And what does it mean to say
>they "were synchronized" when brought together, but now they are not, because
>they are now in different gravitational potentials! (FYI, there are land
>mine detectors being developed on this very principle! I know someone who
>actually worked on such a project!)
>
>This all gets even more complicated when dealing with large networks of
>networks in which the "speed of information transmission" can vary depending
>on the medium (cf. coaxial cables versus fiber versus microwave links!) In
>fact, the atmosphere is one of those media, and variations therein result in
>the need for "GPS corrections" (cf. RTCM GPS correction messages, RTK, etc.)
>in order to get to sub-nsec/cm accuracy. Point is, if you have a set of
>nodes distributed across the country, all with GPS and all "synchronized to
>GPS time", and a second identical set of nodes (with no GPS) instead
>connected with a network of cables and fiber links, all of different lengths
>and composition, using different carrier frequencies (dielectric constants
>vary with frequency!), "synchronized" to some clock somewhere using NTP or
>PTP, the synchronization of the two sets will be different unless a common
>reference clock is used AND all the above effects are taken into account,
>and good luck with that! :-)
>
>In conclusion, if anyone tells you that clock synchronization in
>communication networks is simple ("Just use GPS!"), you should feel free to
>chuckle (under your breath if necessary :-))
>
>Cheers,
>
>RR
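A quick sanity check on those numbers, using RR's own figures and the
foot-per-nanosecond rule (illustrative arithmetic only):

\[
\Delta t \approx (45 - 7)\,\mu\text{s/day} = 38\,\mu\text{s/day},
\qquad
c\,\Delta t \approx 3\times10^{8}\,\text{m/s} \times 38\times10^{-6}\,\text{s} \approx 11\,\text{km}
\]

i.e. an uncorrected constellation would accumulate kilometers of apparent
range error per day, which is why the relativistic rate offsets are designed
into the satellite clocks.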
>
>-----Original Message-----
>From: Sebastian Moeller [mailto:moeller0@gmx.de]
>Sent: Thursday, January 12, 2023 12:23 AM
>To: Dick Roy
>Cc: Rodney W. Grimes; mike.reynolds@netforecast.com; libreqos; David P. Reed; Rpm; rjmcmahon; bloat
>Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
>
>Hi RR,
>
>> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>>
>> [...]
>>
>> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back
>> and forth, for that matter any rate test, why not abuse some of that data
>> and add the fundamental NTP clock sync data and bidirectionally pass each
>> other's concept of "current time". IIRC (it's been 25 years since I worked
>> on NTP at this level) you *should* be able to get a fairly accurate clock
>> delta between each end, and then use that info and time stamps in the data
>> stream to compute OWDs. You need to put 4 time stamps in the packet, and
>> with that you can compute "offset".
>>
>> [RR] For this to work at a reasonable level of accuracy, the timestamping
>> circuits on both ends need to be deterministic and repeatable as I recall.
>> Any uncertainty in that process adds to synchronization
>> errors/uncertainties.
>>
>> [SM] Nice idea. I would guess that all timeslot-based access
>> technologies (so starlink, docsis, GPON, LTE?) distribute "high quality
>> time" carefully to the "modems", so maybe all that would be needed is to
>> expose that high quality time to the LAN side of those modems, dressed up
>> as an NTP server?
>>
>> [RR] It's not that simple! Distributing "high-quality time", i.e.
>> "synchronizing all clocks", does not solve the communication problem in
>> synchronous slotted MAC/PHYs!
>
> [SM] I happily believe you, but the same idea of "time slot" needs to
>be shared by all nodes, no? So the clocks need to run at reasonably
>similar rates, aka be synchronized (see below).
>
>> All the technologies you mentioned above are essentially P2P, not
>> intended for broadcast. Point is, there is a point controller (aka PoC),
>> often called a base station (eNodeB, gNodeB, ...), that actually "controls
>> everything that is necessary to control" at the UE, including time,
>> frequency and sampling time offsets, and these are critical to get right
>> if you want to communicate, and they are ALL subject to the laws of
>> physics (cf. the speed of light)! Turns out that what is necessary for the
>> system to function anywhere near capacity is for all the clocks governing
>> transmissions from the UEs to be "unsynchronized" such that all the UE
>> transmissions arrive at the PoC at the same (prescribed) time!
>
> [SM] Fair enough. I would call clocks that are "in sync", albeit with
>individual offsets, synchronized, but I am a layman and that might sound
>offensively wrong to experts in the field. But even without the naming, my
>point is that all systems that depend on some idea of a shared time-base
>are halfway toward exposing that time to end users, by translating it into
>an NTP time source at the modem.
>
>> For some technologies, in particular 5G!, these considerations are
>> ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you
>> don't believe me! :-)
>
> [SM] Far be it from me not to believe you, so thanks for the pointers.
>Yet, I still think that unless different nodes of a shared segment move at
>significantly different speeds, there should be a common "tick-duration"
>for all clocks, even if each clock runs at an offset... (I naively would
>try to implement something like that by trying to fully synchronize clocks
>and maintain a local offset value to convert from "absolute" time to
>"network" time, but likely, coming from the outside, I am blissfully
>unaware of the detail challenges that need to be solved.)
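For concreteness, the four-timestamp computation RWG describes above is the
classic NTP on-wire calculation. A minimal sketch, with t1..t4 as
microsecond timestamps (illustrative only, not iperf 2 code):

/* NTP-style on-wire calculation from four timestamps:
 *   t1 = client send, t2 = server receive,
 *   t3 = server send, t4 = client receive. */
#include <stdint.h>

typedef struct {
    int64_t offset_us;  /* estimated server-minus-client clock offset */
    int64_t delay_us;   /* round-trip delay minus server hold time */
} clock_est;

clock_est ntp_onwire(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
{
    clock_est e;
    e.offset_us = ((t2 - t1) + (t3 - t4)) / 2;
    e.delay_us  = (t4 - t1) - (t3 - t2);
    /* One-way delays then follow as (t2 - t1) - offset (client->server)
     * and (t4 - t3) + offset (server->client), exact only if the path
     * is symmetric - the very assumption the OWD discussion above
     * warns against. */
    return e;
}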
>
>Regards & Thanks
>	Sebastian
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 20:39 ` Dick Roy
2023-01-13 7:33 ` Sebastian Moeller
@ 2023-01-13 7:40 ` rjmcmahon
2023-01-13 8:10 ` Dick Roy
1 sibling, 1 reply; 19+ messages in thread
From: rjmcmahon @ 2023-01-13 7:40 UTC (permalink / raw)
To: dickroy
Cc: 'Sebastian Moeller', 'Rodney W. Grimes',
mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
Hi RR,
I believe quality GPS chips compensate for relativity in their pulse-per-second
output, which is needed to get position accuracy.
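On Linux, e.g., the PPS edge can be read with the RFC 2783 API; a rough
sketch (the device path is a placeholder, and error handling is trimmed):

/* Timestamp GPS pulse-per-second edges via the RFC 2783 PPS API.
 * The fractional second of each assert timestamp shows how far the
 * system clock sits from the top of the GPS second. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/timepps.h>

int main(void)
{
    int fd = open("/dev/pps0", O_RDONLY);   /* device path: assumption */
    pps_handle_t h;
    pps_params_t params;
    pps_info_t info;
    struct timespec timeout = { 3, 0 };

    if (fd < 0 || time_pps_create(fd, &h) < 0)
        return 1;
    time_pps_getparams(h, &params);
    params.mode |= PPS_CAPTUREASSERT;        /* capture rising edges */
    time_pps_setparams(h, &params);

    for (;;) {
        if (time_pps_fetch(h, PPS_TSFMT_TSPEC, &info, &timeout) < 0)
            break;
        printf("assert #%lu at %ld.%09ld\n",
               (unsigned long)info.assert_sequence,
               (long)info.assert_timestamp.tv_sec,
               info.assert_timestamp.tv_nsec);
    }
    time_pps_destroy(h);
    return 0;
}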
Bob
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-12 21:57 ` Dick Roy
@ 2023-01-13 7:44 ` Sebastian Moeller
2023-01-13 8:01 ` Dick Roy
0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Moeller @ 2023-01-13 7:44 UTC (permalink / raw)
To: dickroy, Dick Roy, 'Robert McMahon'
Cc: mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
Hi RR
On 12 January 2023 22:57:32 CET, Dick Roy <dickroy@alum.mit.edu> wrote:
>FYI ...
>
>https://www.fiercewireless.com/tech/cbrs-based-fwa-beats-starlink-performance-madden
>
[SM] He is so close:
'Speed tests don’t tell us much about the capacity of the network, or the reliability of the network, or the true latency with larger packet sizes. Packet loss testing can help to fill in key missing information to give the end customer the smooth experience they’re looking for.'
and
'Packets received over 250 ms latency are considered too late to be useful for video conferencing.'
He actually reports both loss numbers and delay > 250 ms, so in spite of arguing that loss is the relevant metric he already dips his toes into the latency issue... I wonder whether his view will refine over time, now that he apparently moved from a link with 8% packet loss to one with a more sane 0.1% loss rate (no idea how he measured loss rate, though, or latency). I guess this shows that there is no single solution for all links; it really matters where one starts, and which of throughput, delay, and loss is the most painful and hence the dimension in need of a fix first.
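(His 250 ms rule turns straightforwardly into an "effective loss" metric;
a minimal sketch, with the deadline taken from the article as an
assumption:)

/* Effective loss for real-time media: packets that arrive past the
 * playout deadline are as good as lost for interactive use. */
#include <stddef.h>

#define LATE_DEADLINE_US 250000   /* 250 ms, per the article */

typedef struct { int lost; long latency_us; } pkt_sample;

double effective_loss(const pkt_sample *p, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i].lost || p[i].latency_us > LATE_DEADLINE_US)
            bad++;
    return n ? (double)bad / (double)n : 0.0;
}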
Regards
Sebastian
>
>Nothing earth-shaking :-)
>
>RR
>
> _____
>
>From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of Robert McMahon via Starlink
>Sent: Thursday, January 12, 2023 9:50 AM
>To: Sebastian Moeller
>Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David P. Reed; Rpm; bloat
>Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
>
>Hi Sebastien,
>
>You make a good point. What I did was issue a warning if the tool found it
>was being CPU limited vs i/o limited. This indicates the i/o test is likely
>inaccurate from an i/o perspective, and the results are suspect. It does
>this crudely by comparing the cpu thread doing stats against the traffic
>threads doing i/o, to see which thread is waiting on the others. There is
>no attempt to assess the cpu load itself. So it's designed with the
>singular purpose of making sure i/o threads only block on syscalls of
>write and read.
>
>I probably should revisit this both in design and implementation. Thanks
>for bringing it up; all input is truly appreciated.
>
>Bob
>
>On Jan 12, 2023, at 12:14 AM, Sebastian Moeller <moeller0@gmx.de> wrote:
>
>Hi Bob,
>
>> On Jan 11, 2023, at 21:09, rjmcmahon <rjmcmahon@rjmcmahon.com> wrote:
>>
>> Iperf 2 is designed to measure network i/o. Note: It doesn't have to
>> move large amounts of data. It can support data profiles that don't
>> drive TCP's CCA as an example.
>>
>> Two things I've been asked for and avoided:
>>
>> 1) Integrate clock sync into iperf's test traffic
>
> [SM] This I understand, measurement conditions can be unsuited for tight
>time synchronization...
>
>> 2) Measure and output CPU usages
>
> [SM] This one puzzles me. As far as I understand, the only way to
>properly diagnose network issues is to rule out other things, like CPU
>overload, that can have symptoms similar to network issues. As an example,
>the cake qdisc will, if CPU cycles become tight, first increase its
>internal queueing and jitter (not consciously; it is just an observation
>that once cake does not get access to the CPU as timely as it wants,
>queueing latency and variability increase), and then later also show
>reduced throughput - similar to things that can happen along an e2e
>network path for completely different reasons, e.g. lower-level
>retransmissions or a variable-rate link. So I would think that checking
>the CPU load, at least coarsely, would be within the scope of network
>testing tools, no?
>
>Regards
>	Sebastian
>
>> I think both of these are outside the scope of a tool designed to test
>> network i/o over sockets; rather, these should be developed & validated
>> independently of a network i/o tool.
>>
>> Clock error really isn't about the amount/frequency of traffic but
>> rather getting a periodic high-quality reference. I tend to use GPS
>> pulse per second to lock the local system oscillator to. As David says,
>> most every modern handheld computer has the GPS chips to do this
>> already. So to me it seems more of a policy choice between data center
>> operators and device mfgs and less of a technical issue.
>>
>> Bob
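To make the "coarse CPU check" from the quoted exchange concrete, a toy
sketch using getrusage(); the 90% threshold is an arbitrary assumption,
and this checks the whole process rather than individual i/o threads:

/* Warn when a network test consumed nearly a full CPU core per
 * wall-clock second, i.e. its results may be CPU- rather than
 * network-limited. */
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

static double timeval_to_sec(struct timeval tv)
{
    return tv.tv_sec + tv.tv_usec / 1e6;
}

void warn_if_cpu_limited(double wall_seconds)
{
    struct rusage ru;
    double cpu;

    if (wall_seconds <= 0 || getrusage(RUSAGE_SELF, &ru) < 0)
        return;
    cpu = timeval_to_sec(ru.ru_utime) + timeval_to_sec(ru.ru_stime);
    if (cpu / wall_seconds > 0.9)  /* >90% of one core: suspect */
        fprintf(stderr, "warning: test appears CPU limited "
                "(%.2f CPU-s per wall-clock s)\n", cpu / wall_seconds);
}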
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-13 7:44 ` Sebastian Moeller
@ 2023-01-13 8:01 ` Dick Roy
0 siblings, 0 replies; 19+ messages in thread
From: Dick Roy @ 2023-01-13 8:01 UTC (permalink / raw)
To: 'Sebastian Moeller', 'Robert McMahon'
Cc: mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
-----Original Message-----
From: Sebastian Moeller [mailto:moeller0@gmx.de]
Sent: Thursday, January 12, 2023 11:45 PM
To: dickroy@alum.mit.edu; Dick Roy; 'Robert McMahon'
Cc: mike.reynolds@netforecast.com; 'libreqos'; 'David P. Reed'; 'Rpm'; 'bloat'
Subject: RE: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA

Hi RR

On 12 January 2023 22:57:32 CET, Dick Roy <dickroy@alum.mit.edu> wrote:
>FYI ...
>
>https://www.fiercewireless.com/tech/cbrs-based-fwa-beats-starlink-performance-madden
>

[SM] He is so close:

[RR] Which is why I posted the link :-) I knew you'd latch on to his
thread!

[...]
Regards
Sebastian
>
>
>Nothing earth-shaking :-)
>
>
>RR
>
>
>
> _____
>
>From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of
>Robert McMahon via Starlink
>Sent: Thursday, January 12, 2023 9:50 AM
>To: Sebastian Moeller
>Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David
>P. Reed; Rpm; bloat
>Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
>
>
>
>Hi Sebastien,
>
>You make a good point. What I did was issue a warning if the tool found it
>was being CPU limited vs i/o limited. This indicates the i/o test likely is
>inaccurate from an i/o perspective, and the results are suspect. It does
>this crudely by comparing the cpu thread doing stats against the traffic
>threads doing i/o, which thread is waiting on the others. There is no
>attempt to assess the cpu load itself. So it's designed with a singular
>purpose of making sure i/o threads only block on syscalls of write and
read.
>
>I probably should revisit this both in design and implementation. Thanks
for
>bringing it up and all input is truly appreciated.
>
>Bob
>
>On Jan 12, 2023, at 12:14 AM, Sebastian Moeller <moeller0@gmx.de> wrote:
>
>Hi Bob,
>
>
>
>
>
>
> On Jan 11, 2023, at 21:09, rjmcmahon <rjmcmahon@rjmcmahon.com> wrote:
>
>
>
>
>
> Iperf 2 is designed to measure network i/o. Note: It doesn't have to move
>large amounts of data. It can support data profiles that don't drive TCP's
>CCA as an example.
>
>
>
>
>
> Two things I've been asked for and avoided:
>
>
>
>
>
> 1) Integrate clock sync into iperf's test traffic
>
>
>
> [SM] This I understand, measurement conditions can be unsuited for tight
>time synchronization...
>
>
>
>
>
>
> 2) Measure and output CPU usages
>
>
>
> [SM] This one puzzles me, as far as I understand the only way to properly
>diagnose network issues is to rule out other things like CPU overload that
>can have symptoms similar to network issues. As an example, the cake qdisc
>will if CPU cycles become tight first increases its internal queueing and
>jitter (not consciously, it is just an observation that once cake does not
>get access to the CPU as timely as it wants, queuing latency and
variability
>increases) and then later also shows reduced throughput, so similar things
>that can happen along an e2e network path for completely different reasons,
>e.g. lower level retransmissions or a variable rate link. So i would think
>that checking the CPU load at least coarse would be within the scope of
>network testing tools, no?
>
>
>
>
>
>Regards
>
>
> Sebastian
>
>
>
>
>
>
>
>
>
>
>
>
> I think both of these are outside the scope of a tool designed to test
>network i/o over sockets, rather these should be developed & validated
>independently of a network i/o tool.
>
>
>
>
>
> Clock error really isn't about amount/frequency of traffic but rather
>getting a periodic high-quality reference. I tend to use GPS pulse per
>second to lock the local system oscillator to. As David says, most every
>modern handheld computer has the GPS chips to do this already. So to me it
>seems more of a policy choice between data center operators and device mfgs
>and less of a technical issue.
>
>
>
>
>
> Bob
> Hello,
>
>
> Yall can call me crazy if you want.. but... see below [RWG]
> Hi Bib,
> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
><starlink@lists.bufferbloat.net> wrote:
>
>
>
>
>
> My biggest barrier is the lack of clock sync by the devices, i.e. very
>limited support for PTP in data centers and in end devices. This limits the
>ability to measure one way delays (OWD) and most assume that OWD is 1/2 and
>RTT which typically is a mistake. We know this intuitively with airplane
>flight times or even car commute times where the one way time is not 1/2 a
>round trip time. Google maps & directions provide a time estimate for the
>one way link. It doesn't compute a round trip and divide by two.
>
>
>
>
>
> For those that can get clock sync working, the iperf 2 --trip-times
options
>is useful.
> [SM] +1; and yet even with unsynchronized clocks one can try to measure
>how latency changes under load and that can be done per direction. Sure
this
>is far inferior to real reliably measured OWDs, but if life/the internet
>deals you lemons....
> [RWG] iperf2/iperf3, etc are already moving large amounts of data
>
>
> back and forth, for that matter any rate test, why not abuse some of
>
>
> that data and add the fundemental NTP clock sync data and
>
>
> bidirectionally pass each others concept of "current time". IIRC (its
>
>
> been 25 years since I worked on NTP at this level) you *should* be
>
>
> able to get a fairly accurate clock delta between each end, and then
>
>
> use that info and time stamps in the data stream to compute OWD's.
>
>
> You need to put 4 time stamps in the packet, and with that you can
>
>
> compute "offset".
>
>
>
>
> --trip-times
>
>
> enable the measurement of end to end write to read latencies (client and
>server clocks must be synchronized)
>
> [RWG] --clock-skew
>
>
> enable the measurement of the wall clock difference between sender and
>receiver
> [SM] Sweet!
>
>
> Regards
>
>
> Sebastian
>
>
> Bob
> I have many kvetches about the new latency under load tests being
> designed and distributed over the past year. I am delighted! that they
> are happening, but most really need third party evaluation, and
> calibration, and a solid explanation of what network pathologies they
> do and don't cover. Also a RED team attitude towards them, as well as
> thinking hard about what you are not measuring (operations research).
>
> I actually rather love the new cloudflare speedtest, because it tests
> a single TCP connection, rather than dozens, and at the same time folk
> are complaining that it doesn't find the actual "speed!". yet... the
> test itself more closely emulates a user experience than speedtest.net
> does. I am personally pretty convinced that the fewer flows a web page
> opens, the better the likelihood of a good user experience, but I lack
> data on it.
>
> To try to tackle the evaluation and calibration part, I've reached out
> to all the new test designers in the hope that we could get together
> and produce a report of what each new test is actually doing. I've
> tweeted, linked in, emailed, and spammed every measurement list I know
> of, but with only some response; please reach out to other test
> designer folks and have them join the rpm email list?
>
> My principal kvetches in the new tests so far are:
>
> 0) None of the tests last long enough.
> Ideally there should be a mode where they at least run to "time of
> first loss", or periodically, just run longer than the
> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
> there! It's really bad science to optimize the internet for 20
> seconds. It's like optimizing a car, to handle well, for just 20
> seconds.
>
> 1) Not testing up + down + ping at the same time
> None of the new tests actually test the same thing that the infamous
> rrul test does - all the others still test up, then down, and ping. It
> was/remains my hope that the simpler parts of the flent test suite -
> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
> tests - would provide calibration to the test designers.
> We've got zillions of flent results in the archive published here:
> https://blog.cerowrt.org/post/found_in_flent/
>
> ps. Misinformation about iperf 2 impacts my ability to do this.
>
> The new tests have all added up + ping and down + ping, but not up +
> down + ping. Why??
> The behaviors of what happens in that case are really non-intuitive, I
> know, but... it's just one more phase to add to any one of those new
> tests. I'd be deliriously happy if someone(s) new to the field
> started doing that, even optionally, and boggled at how it defeated
> their assumptions.
>
> Among other things that would show...
> It's the home router industry's dirty secret that darn few "gigabit"
> home routers can actually forward in both directions at a gigabit. I'd
> like to smash that perception thoroughly, but given that our starting
> point was a "gigabit router" that was really a "gigabit switch" - and
> historically something that couldn't even forward at 200Mbit - we have
> a long way to go there.
> Only in the past year have non-x86 home routers appeared that could
> actually do a gbit in both directions.
>
> 2) Few are actually testing within-stream latency
> Apple's rpm project is making a stab in that direction. It looks
> highly likely that, with a little more work, crusader and
> go-responsiveness can finally start sampling the tcp RTT, loss and
> markings more directly. As for the rest... sampling TCP_INFO on
> windows, and Linux, at least, always appeared simple to me, but I'm
> discovering how hard it is by delving deep into the rust behind
> crusader.
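(Aside: a sketch of the TCP_INFO sampling mentioned above, on Linux; the
field offsets follow the common struct tcp_info layout but should be
verified against your kernel headers before relying on them:)

    import socket
    import struct

    TCP_INFO = getattr(socket, "TCP_INFO", 11)  # 11 on Linux

    def sample_tcp_info(sock):
        # First 104 bytes of struct tcp_info: 8 one-byte fields, then
        # 24 u32 fields (rto, ato, snd_mss, rcv_mss, unacked, sacked,
        # lost, retrans, fackets, last_data_sent, ..., rtt, rttvar, ...).
        raw = sock.getsockopt(socket.IPPROTO_TCP, TCP_INFO, 104)
        fields = struct.unpack("8B24I", raw)
        lost = fields[8 + 6]      # tcpi_lost
        retrans = fields[8 + 7]   # tcpi_retrans
        rtt_us = fields[8 + 15]   # tcpi_rtt, smoothed RTT in microseconds
        return rtt_us, lost, retrans

Polling this periodically on a connected socket is the "within-stream"
sampling the paragraph above has in mind.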
>
> the goresponsiveness thing is also IMHO running WAY too many streams
> at the same time, I guess motivated by an attempt to have the test
> complete quickly?
>
> B) To try and tackle the validation problem:
> ps. Misinformation about iperf 2 impacts my ability to do this.
>
> In the libreqos.io project we've established a testbed where tests can
> be plunked through various ISP plan network emulations. It's here:
> https://payne.taht.net (run the bandwidth test for what's currently
> hooked up)
> We could rather use an AS number and at least an ipv4/24 and ipv6/48
> to leverage with that, so I don't have to nat the various emulations.
> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
> to see more test designers set up a testbed like this to calibrate
> their own stuff.
>
> Presently we're able to test:
> flent
> netperf
> iperf2
> iperf3
> speedtest-cli
> crusader
> the broadband forum udp based test:
> https://github.com/BroadbandForum/obudpst
> trexx
>
> There's also a virtual machine setup that we can remotely drive a web
> browser from (but I didn't want to nat the results to the world) to
> test other web services.
>
> _______________________________________________
> Rpm mailing list
> Rpm@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/rpm
>
> _______________________________________________
> Starlink mailing list
> Starlink@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink
>
> _______________________________________________
> Starlink mailing list
> Starlink@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink
>
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
[-- Attachment #2: Type: text/html, Size: 85508 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-13 7:40 ` rjmcmahon
@ 2023-01-13 8:10 ` Dick Roy
2023-01-15 23:09 ` rjmcmahon
0 siblings, 1 reply; 19+ messages in thread
From: Dick Roy @ 2023-01-13 8:10 UTC (permalink / raw)
To: 'rjmcmahon'
Cc: 'Sebastian Moeller', 'Rodney W. Grimes',
mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
[-- Attachment #1: Type: text/plain, Size: 18900 bytes --]
-----Original Message-----
From: rjmcmahon [mailto:rjmcmahon@rjmcmahon.com]
Sent: Thursday, January 12, 2023 11:40 PM
To: dickroy@alum.mit.edu
Cc: 'Sebastian Moeller'; 'Rodney W. Grimes'; mike.reynolds@netforecast.com;
'libreqos'; 'David P. Reed'; 'Rpm'; 'bloat'
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
Hi RR,
I believe quality GPS chips compensate for relativity in pulse per
second, which is needed to get position accuracy.
[RR] Of course they do. That 38usec/day really matters! They assume they
know what the gravitational potential is where they are, and they can
estimate the potential at the satellites so they can compensate, and they
do. Point is, a GPS unit at Lake Tahoe (6250') runs faster than the one in
San Francisco (sea level). How do you think these two "should be
synchronized"! How do you define "synchronization" in this case? You
synchronize those two clocks, then what about all the other clocks at Lake
Tahoe (or SF or anywhere in between for that matter :-))??? These are not
trivial questions. However if all one cares about is seconds or
milliseconds, then you can argue that we (earthlings on planet earth) can
"sweep such facts under the proverbial rug" for the purposes of latency in
communication networks and that's certainly doable. Don't tell that to the
guys whose protocols require "synchronization of all units to nanoseconds"
though! They will be very, very unhappy :-) :-) And you know who you are
:-) :-)
Bob
> Hi Sebastian (et. al.),
>
> [I'll comment up here instead of inline.]
>
> Let me start by saying that I have not been intimately involved with
> the IEEE 1588 effort (PTP), however I was involved in the 802.11
> efforts along a similar vein, just adding the wireless first hop
> component and it's effects on PTP.
>
> What was apparent from the outset was that there was a lack of
> understanding what the terms "to synchronize" or "to be synchronized"
> actually mean. It's not trivial ... because we live in an
> (approximately, that's another story!) 4-D space-time continuum where
> the Lorentz metric plays a critical role. Therein, simultaneity (aka
> "things happening at the same time") means the "distance" between two
> such events is zero and that distance is given by sqrt(x^2 + y^2 + z^2
> - (ct)^2) and the "thing happening" can be the tick of a clock
> somewhere. Now since everything is relative (time with respect to
> what? / location with respect to where?) it's pretty easy to see that
> "if you don't know where you are, you can't know what time it is!"
> (English sailors of the 18th century knew this well!) Add to this the
> fact that if everything were stationary, nothing would happen (as
> Einstein said "Nothing happens until something moves!"), special
> relativity also plays a role. Clocks on GPS satellites run approx.
> 7usecs/day slower than those on earth due to their "speed" (8700 mph
> roughly)! Then add the consequence that without mass we wouldn't exist
> (in these forms at least :-)), and gravitational effects (aka General
> Relativity) come into play. Those turn out to make clocks on GPS
> satellites run 45usec/day faster than those on earth! The net effect
> is that GPS clocks run about 38usec/day faster than clocks on earth.
> So what does it mean to "synchronize to GPS"? Point is: it's a
> non-trivial question with a very complicated answer. The reason it is
> important to get all this right is that the "what that ties time and
> space together" is the speed of light and that turns out to be a
> "foot-per-nanosecond" in a vacuum (roughly 300m/usec). This means if
> I am uncertain about my location to say 300 meters, then I also am not
> sure what time it is to a usec AND vice-versa!
>
> All that said, the simplest explanation of synchronization is
> probably: Two clocks are synchronized if, when they are brought
> (slowly) into physical proximity ("sat next to each other") in the
> same (quasi-)inertial frame and the same gravitational potential (not
> so obvious BTW ... see the FYI below!), an observer of both would say
> "they are keeping time identically". Since this experiment is rarely
> possible, one can never be "sure" that his clock is synchronized to
> any other clock elsewhere. And what does it mean to say they "were
> synchronized" when brought together, but now they are not because they
> are now in different gravitational potentials! (FYI, there are land
> mine detectors being developed on this very principle! I know someone
> who actually worked on such a project!)
>
> This all gets even more complicated when dealing with large networks
> of networks in which the "speed of information transmission" can vary
> depending on the medium (cf. coaxial cables versus fiber versus
> microwave links!) In fact, the atmosphere is one of those media and
> variations therein result in the need for "GPS corrections" (cf. RTCM
> GPS correction messages, RTK, etc.) in order to get to sub-nsec/cm
> accuracy. Point is if you have a set of nodes distributed across the
> country all with GPS and all "synchronized to GPS time", and a second
> identical set of nodes (with no GPS) instead connected with a network
> of cables and fiber links, all of different lengths and composition
> using different carrier frequencies (dielectric constants vary with
> frequency!) "synchronized" to some clock somewhere using NTP or PTP),
> the synchronization of the two sets will be different unless a common
> reference clock is used AND all the above effects are taken into
> account, and good luck with that! :-)
>
> In conclusion, if anyone tells you that clock synchronization in
> communication networks is simple ("Just use GPS!"), you should feel
> free to chuckle (under your breath if necessary :-))
>
> Cheers,
>
> RR
>
> -----Original Message-----
> From: Sebastian Moeller [mailto:moeller0@gmx.de]
> Sent: Thursday, January 12, 2023 12:23 AM
> To: Dick Roy
> Cc: Rodney W. Grimes; mike.reynolds@netforecast.com; libreqos; David
> P. Reed; Rpm; rjmcmahon; bloat
> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in
> USA
>
> Hi RR,
>
>> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>>
>> -----Original Message-----
>> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On
>> Behalf Of Sebastian Moeller via Starlink
>> Sent: Wednesday, January 11, 2023 12:01 PM
>> To: Rodney W. Grimes
>> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos;
>> David P. Reed; Rpm; rjmcmahon; bloat
>> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers
>> in USA
>>
>> Hi Rodney,
>>
>> > On Jan 11, 2023, at 19:32, Rodney W. Grimes
>> <starlink@gndrsh.dnsmgr.net> wrote:
>> >
>> > Hello,
>> >
>> > Yall can call me crazy if you want.. but... see below [RWG]
>> >> Hi Bib,
>> >>
>> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
>> <starlink@lists.bufferbloat.net> wrote:
>> >>>
>> >>> My biggest barrier is the lack of clock sync by the devices,
>> i.e. very limited support for PTP in data centers and in end devices.
>> This limits the ability to measure one way delays (OWD) and most
>> assume that OWD is 1/2 an RTT, which typically is a mistake. We know
>> this intuitively with airplane flight times or even car commute times
>> where the one way time is not 1/2 a round trip time. Google maps &
>> directions provide a time estimate for the one way link. It doesn't
>> compute a round trip and divide by two.
>> >>>
>> >>> For those that can get clock sync working, the iperf 2
>> --trip-times options is useful.
>> >>
>> >> [SM] +1; and yet even with unsynchronized clocks one can try
>> to measure how latency changes under load and that can be done per
>> direction. Sure this is far inferior to real reliably measured OWDs,
>> but if life/the internet deals you lemons....
>> >
>> > [RWG] iperf2/iperf3, etc are already moving large amounts of data
>> back and forth, for that matter any rate test, why not abuse some of
>> that data and add the fundamental NTP clock sync data and
>> bidirectionally pass each other's concept of "current time". IIRC
>> (it's been 25 years since I worked on NTP at this level) you *should*
>> be able to get a fairly accurate clock delta between each end, and
>> then use that info and time stamps in the data stream to compute
>> OWDs. You need to put 4 time stamps in the packet, and with that you
>> can compute "offset".
>> [RR] For this to work at a reasonable level of accuracy, the
>> timestamping circuits on both ends need to be deterministic and
>> repeatable as I recall. Any uncertainty in that process adds to
>> synchronization errors/uncertainties.
>>
>> [SM] Nice idea. I would guess that all timeslot based access
>> technologies (so starlink, docsis, GPON, LTE?) all distribute "high
>> quality time" carefully to the "modems", so maybe all that would be
>> needed is to expose that high quality time to the LAN side of those
>> modems, dressed up as NTP server?
>> [RR] It's not that simple! Distributing "high-quality time", i.e.
>> "synchronizing all clocks" does not solve the communication problem
>> in synchronous slotted MAC/PHYs!
>
> [SM] I happily believe you, but the same idea of "time slot"
> needs to be shared by all nodes, no? So the clocks need to be of
> reasonably similar rate, aka synchronized (see below).
>
>> All the technologies you mentioned above are essentially P2P, not
>> intended for broadcast. Point is, there is a point controller (aka
>> PoC) often called a base station (eNodeB, gNodeB, ...) that actually
>> "controls everything that is necessary to control" at the UE
>> including time, frequency and sampling time offsets, and these are
>> critical to get right if you want to communicate, and they are ALL
>> subject to the laws of physics (cf. the speed of light)! Turns out
>> that what is necessary for the system to function anywhere near
>> capacity is for all the clocks governing transmissions from the UEs
>> to be "unsynchronized" such that all the UE transmissions arrive at
>> the PoC at the same (prescribed) time!
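(Aside: one concrete instance of the point above - my example, not RR's -
is the LTE "timing advance": the base station measures each UE's round
trip and tells it to transmit early by that amount, so all uplink arrivals
align on the slot grid. Figures are illustrative only:)

    C = 299_792_458.0  # speed of light, m/s

    def timing_advance_s(distance_m):
        # The UE advances its uplink transmissions by the round-trip
        # propagation time to its base station, so its frames arrive
        # aligned with those of nearer and farther UEs.
        return 2.0 * distance_m / C

    # e.g. a UE 15 km out must transmit ~100 microseconds early:
    # timing_advance_s(15_000) ~= 1.0e-4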
>
> [SM] Fair enough. I would call clocks that are "in sync" albeit
> with individual offsets as synchronized, but I am a layman and that
> might sound offensively wrong to experts in the field. But even
> without the naming my point is that all systems that depend on some
> idea of shared time-base are halfway there to exposing that time to
> end users, by translating it into an NTP time source at the modem.
>
>> For some technologies, in particular 5G!, these considerations are
>> ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if
>> you don't believe me! :-)
>
> [SM] Far be it from me not to believe you, so thanks for the
> pointers. Yet, I still think that unless different nodes of a shared
> segment move at significantly different speeds, there should be a
> common "tick-duration" for all clocks even if each clock runs at an
> offset... (I naively would try to implement something like that by
> trying to fully synchronize clocks and maintain a local offset value
> to convert from "absolute" time to "network" time, but likely because
> coming from the outside I am blissfully unaware of the detail
> challenges that need to be solved).
>
> Regards & Thanks
>
> Sebastian
>
>> >>>
>> >>> --trip-times
>> >>> enable the measurement of end to end write to read latencies
>> >>> (client and server clocks must be synchronized)
>> > [RWG] --clock-skew
>> > enable the measurement of the wall clock difference between
>> > sender and receiver
>> >
>> >> [SM] Sweet!
>> >>
>> >> Regards
>> >> Sebastian
>> >>
>> >>> Bob
>> >>>> I have many kvetches about the new latency under load tests
>> >>>> being designed and distributed over the past year. I am
>> >>>> delighted! that they are happening, but most really need third
>> >>>> party evaluation, and calibration, and a solid explanation of
>> >>>> what network pathologies they do and don't cover. Also a RED
>> >>>> team attitude towards them, as well as thinking hard about what
>> >>>> you are not measuring (operations research).
>> >>>> I actually rather love the new cloudflare speedtest, because it
>> >>>> tests a single TCP connection, rather than dozens, and at the
>> >>>> same time folk are complaining that it doesn't find the actual
>> >>>> "speed!". yet... the test itself more closely emulates a user
>> >>>> experience than speedtest.net does. I am personally pretty
>> >>>> convinced that the fewer flows a web page opens, the better the
>> >>>> likelihood of a good user experience, but I lack data on it.
>> >>>> To try to tackle the evaluation and calibration part, I've
>> >>>> reached out to all the new test designers in the hope that we
>> >>>> could get together and produce a report of what each new test
>> >>>> is actually doing. I've tweeted, linked in, emailed, and
>> >>>> spammed every measurement list I know of, but with only some
>> >>>> response; please reach out to other test designer folks and
>> >>>> have them join the rpm email list?
>> >>>> My principal kvetches in the new tests so far are:
>> >>>> 0) None of the tests last long enough.
>> >>>> Ideally there should be a mode where they at least run to "time
>> >>>> of first loss", or periodically, just run longer than the
>> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be
>> >>>> dragons there! It's really bad science to optimize the internet
>> >>>> for 20 seconds. It's like optimizing a car, to handle well, for
>> >>>> just 20 seconds.
>> >>>> 1) Not testing up + down + ping at the same time
>> >>>> None of the new tests actually test the same thing that the
>> >>>> infamous rrul test does - all the others still test up, then
>> >>>> down, and ping. It was/remains my hope that the simpler parts
>> >>>> of the flent test suite - such as the tcp_up_squarewave tests,
>> >>>> the rrul test, and the rtt_fair tests - would provide
>> >>>> calibration to the test designers.
>> >>>> we've got zillions of flent results in the archive published
>> >>>> here:
>> >>>> https://blog.cerowrt.org/post/found_in_flent/
>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>>> The new tests have all added up + ping and down + ping, but not
>> >>>> up + down + ping. Why??
>> >>>> The behaviors of what happens in that case are really
>> >>>> non-intuitive, I know, but... it's just one more phase to add
>> >>>> to any one of those new tests. I'd be deliriously happy if
>> >>>> someone(s) new to the field started doing that, even
>> >>>> optionally, and boggled at how it defeated their assumptions.
>> >>>> Among other things that would show...
>> >>>> It's the home router industry's dirty secret that darn few
>> >>>> "gigabit" home routers can actually forward in both directions
>> >>>> at a gigabit. I'd like to smash that perception thoroughly, but
>> >>>> given that our starting point was a "gigabit router" that was
>> >>>> really a "gigabit switch" - and historically something that
>> >>>> couldn't even forward at 200Mbit - we have a long way to go
>> >>>> there.
>> >>>> Only in the past year have non-x86 home routers appeared that
>> >>>> could actually do a gbit in both directions.
>> >>>> 2) Few are actually testing within-stream latency
>> >>>> Apple's rpm project is making a stab in that direction. It
>> >>>> looks highly likely, that with a little more work, crusader and
>> >>>> go-responsiveness can finally start sampling the tcp RTT, loss
>> >>>> and markings, more directly. As for the rest... sampling
>> >>>> TCP_INFO on windows, and Linux, at least, always appeared
>> >>>> simple to me, but I'm discovering how hard it is by delving
>> >>>> deep into the rust behind crusader.
>> >>>> the goresponsiveness thing is also IMHO running WAY too many
>> >>>> streams at the same time, I guess motivated by an attempt to
>> >>>> have the test complete quickly?
>> >>>> B) To try and tackle the validation problem:
>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>>> In the libreqos.io project we've established a testbed where
>> >>>> tests can be plunked through various ISP plan network
>> >>>> emulations. It's here:
>> >>>> https://payne.taht.net (run the bandwidth test for what's
>> >>>> currently hooked up)
>> >>>> We could rather use an AS number and at least an ipv4/24 and
>> >>>> ipv6/48 to leverage with that, so I don't have to nat the
>> >>>> various emulations.
>> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2
>> >>>> licensed, to see more test designers set up a testbed like this
>> >>>> to calibrate their own stuff.
>> >>>> Presently we're able to test:
>> >>>> flent
>> >>>> netperf
>> >>>> iperf2
>> >>>> iperf3
>> >>>> speedtest-cli
>> >>>> crusader
>> >>>> the broadband forum udp based test:
>> >>>> https://github.com/BroadbandForum/obudpst
>> >>>> trexx
>> >>>> There's also a virtual machine setup that we can remotely drive
>> >>>> a web browser from (but I didn't want to nat the results to the
>> >>>> world) to test other web services.
>> >>>> _______________________________________________
>> >>>> Rpm mailing list
>> >>>> Rpm@lists.bufferbloat.net
>> >>>> https://lists.bufferbloat.net/listinfo/rpm
>> >>> _______________________________________________
>> >>> Starlink mailing list
>> >>> Starlink@lists.bufferbloat.net
>> >>> https://lists.bufferbloat.net/listinfo/starlink
>> >>
>> >> _______________________________________________
>> >> Starlink mailing list
>> >> Starlink@lists.bufferbloat.net
>> >> https://lists.bufferbloat.net/listinfo/starlink
>>
>> _______________________________________________
>> Starlink mailing list
>> Starlink@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/starlink
[-- Attachment #2: Type: text/html, Size: 91806 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-13 7:33 ` Sebastian Moeller
@ 2023-01-13 8:26 ` Dick Roy
0 siblings, 0 replies; 19+ messages in thread
From: Dick Roy @ 2023-01-13 8:26 UTC (permalink / raw)
To: 'Sebastian Moeller'
Cc: 'Rodney W. Grimes', mike.reynolds, 'libreqos',
'David P. Reed', 'Rpm', 'rjmcmahon',
'bloat'
[-- Attachment #1: Type: text/plain, Size: 17792 bytes --]
_____
From: Sebastian Moeller [mailto:moeller0@gmx.de]
Sent: Thursday, January 12, 2023 11:33 PM
To: dickroy@alum.mit.edu; Dick Roy
Cc: 'Rodney W. Grimes'; mike.reynolds@netforecast.com; 'libreqos'; 'David P.
Reed'; 'Rpm'; 'rjmcmahon'; 'bloat'
Subject: RE: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
Hi RR,
Thanks for the detailed response below, since my point is somewhat
orthogonal I opted for top-posting.
Let me take a step back here and rephrase, synchronising clocks within an
acceptable range to be useful is not rocket science nor witchcraft. For
measuring internet traffic 'millisecond' range seems acceptable, local
networks can probably profit from finer time resolution. So I am not after
e.g. clock synchronisation to participate in SDH/SONET. Heck in the toy
project I am active in, we operate on load dependent delay deltas so we even
ignore different time offsets and are tolerant to (mildly) different
tickrates and clock skew, but it would certainly be nice to have some
acceptable measure of UTC from endpoints to be able to interpret timestamps
as 'absolute'. Mind you I am fine with them not being veridical absolute,
but just good enough for my measurement purpose and I guess that should be
within the range of the achievable. Heck, if all servers we query timestamps
of would be NTP-'synchronized' and would follow the RFC recommendation to
report timestamps in milliseconds past midnight UTC I would be happy.
[RR] Yup! All true. Hence my post that obviously passed this one in the
ether! :-) :-)
Regards
Sebastian
On 12 January 2023 21:39:21 CET, Dick Roy <dickroy@alum.mit.edu> wrote:
Hi Sebastian (et. al.),
[I'll comment up here instead of inline.]
Let me start by saying that I have not been intimately involved with the
IEEE 1588 effort (PTP), however I was involved in the 802.11 efforts along a
similar vein, just adding the wireless first hop component and it's effects
on PTP.
What was apparent from the outset was that there was a lack of understanding
what the terms "to synchronize" or "to be synchronized" actually mean. It's
not trivial ... because we live in an (approximately, that's another story!)
4-D space-time continuum where the Lorentz metric plays a critical role.
Therein, simultaneity (aka "things happening at the same time") means the
"distance" between two such events is zero and that distance is given by
sqrt(x^2 + y^2 + z^2 - (ct)^2) and the "thing happening" can be the tick of
a clock somewhere. Now since everything is relative (time with respect to
what? / location with respect to where?) it's pretty easy to see that "if
you don't know where you are, you can't know what time it is!" (English
sailors of the 18th century knew this well!) Add to this the fact that if
everything were stationary, nothing would happen (as Einstein said "Nothing
happens until something moves!"), special relativity also plays a role.
Clocks on GPS satellites run approx. 7usecs/day slower than those on earth
due to their "speed" (8700 mph roughly)! Then add the consequence that
without mass we wouldn't exist (in these forms at least:-)), and
gravitational effects (aka General Relativity) come into play. Those turn
out to make clocks on GPS satellites run 45usec/day faster than those on
earth! The net effect is that GPS clocks run about 38usec/day faster than
clocks on earth. So what does it mean to "synchronize to GPS"? Point is:
it's a non-trivial question with a very complicated answer. The reason it
is important to get all this right is that the "what that ties time and
space together" is the speed of light and that turns out to be a
"foot-per-nanosecond" in a vacuum (roughly 300m/usec). This means if I am
uncertain about my location to say 300 meters, then I also am not sure what
time it is to a usec AND vice-versa!
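(The arithmetic above, spelled out; the conversion to ranging error is a
back-of-the-envelope addition, not RR's:)

    # Net relativistic drift of a GPS satellite clock, per the figures above.
    special = -7e-6    # s/day slower, velocity time dilation
    general = +45e-6   # s/day faster, gravitational blueshift
    net = special + general      # ~ +38e-6 s/day, the satellite clock runs fast
    c = 299_792_458.0            # m/s
    print(net * c)               # ~11.4 km/day of ranging error if uncorrected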
All that said, the simplest explanation of synchronization is probably: Two
clocks are synchronized if, when they are brought (slowly) into physical
proximity ("sat next to each other") in the same (quasi-)inertial frame and
the same gravitational potential (not so obvious BTW ... see the FYI below!),
an observer of both would say "they are keeping time identically". Since
this experiment is rarely possible, one can never be "sure" that his clock
is synchronized to any other clock elsewhere. And what does it mean to say
they "were synchronized" when brought together, but now they are not because
they are now in different gravitational potentials! (FYI, there are land
mine detectors being developed on this very principle! I know someone who
actually worked on such a project!)
This all gets even more complicated when dealing with large networks of
networks in which the "speed of information transmission" can vary depending
on the medium (cf. coaxial cables versus fiber versus microwave links!) In
fact, the atmosphere is one of those media and variations therein result in
the need for "GPS corrections" (cf. RTCM GPS correction messages, RTK, etc.)
in order to get to sub-nsec/cm accuracy. Point is if you have a set of
nodes distributed across the country all with GPS and all "synchronized to
GPS time", and a second identical set of nodes (with no GPS) instead
connected with a network of cables and fiber links, all of different lengths
and composition using different carrier frequencies (dielectric constants
vary with frequency!) "synchronized" to some clock somewhere using NTP or
PTP), the synchronization of the two sets will be different unless a common
reference clock is used AND all the above effects are taken into account,
and good luck with that! :-)
In conclusion, if anyone tells you that clock synchronization in
communication networks is simple ("Just use GPS!"), you should feel free to
chuckle (under your breath if necessary:-))
Cheers,
RR
-----Original Message-----
From: Sebastian Moeller [mailto:moeller0@gmx.de]
Sent: Thursday, January 12, 2023 12:23 AM
To: Dick Roy
Cc: Rodney W. Grimes; mike.reynolds@netforecast.com; libreqos; David P.
Reed; Rpm; rjmcmahon; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
Hi RR,
> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>
>
>
> -----Original Message-----
> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf
Of Sebastian Moeller via Starlink
> Sent: Wednesday, January 11, 2023 12:01 PM
> To: Rodney W. Grimes
> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos; David
P. Reed; Rpm; rjmcmahon; bloat
> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
>
> Hi Rodney,
>
>
>
>
> > On Jan 11, 2023, at 19:32, Rodney W. Grimes <starlink@gndrsh.dnsmgr.net>
wrote:
> >
> > Hello,
> >
> > Yall can call me crazy if you want.. but... see below [RWG]
> >> Hi Bib,
> >>
> >>
> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
<starlink@lists.bufferbloat.net> wrote:
> >>>
> >>> My biggest barrier is the lack of clock sync by the devices, i.e. very
limited support for PTP in data centers and in end devices. This limits the
ability to measure one way delays (OWD) and most assume that OWD is 1/2 and
RTT which typically is a mistake. We know this intuitively with airplane
flight times or even car commute times where the one way time is not 1/2 a
round trip time. Google maps & directions provide a time estimate for the
one way link. It doesn't compute a round trip and divide by two.
> >>>
> >>> For those that can get clock sync working, the iperf 2 --trip-times
options is useful.
> >>
> >> [SM] +1; and yet even with unsynchronized clocks one can try to
measure how latency changes under load and that can be done per direction.
Sure this is far inferior to real reliably measured OWDs, but if life/the
internet deals you lemons....
> >
> > [RWG] iperf2/iperf3, etc are already moving large amounts of data back
and forth, for that matter any rate test, why not abuse some of that data
and add the fundamental NTP clock sync data and bidirectionally pass each
other's concept of "current time". IIRC (it's been 25 years since I worked on
NTP at this level) you *should* be able to get a fairly accurate clock delta
between each end, and then use that info and time stamps in the data stream
to compute OWDs. You need to put 4 time stamps in the packet, and with
that you can compute "offset".
> [RR] For this to work at a reasonable level of accuracy, the timestamping
circuits on both ends need to be deterministic and repeatable as I recall.
Any uncertainty in that process adds to synchronization
errors/uncertainties.
>
> [SM] Nice idea. I would guess that all timeslot based access
technologies (so starlink, docsis, GPON, LTE?) all distribute "high quality
time" carefully to the "modems", so maybe all that would be needed is to
expose that high quality time to the LAN side of those modems, dressed up as
NTP server?
> [RR] It's not that simple! Distributing "high-quality time", i.e.
"synchronizing all clocks" does not solve the communication problem in
synchronous slotted MAC/PHYs!
[SM] I happily believe you, but the same idea of "time slot" needs to
be shared by all nodes, no? So the clocks need to be reasonably similar
rate, aka synchronized (see below).
> All the technologies you mentioned above are essentially P2P, not
intended for broadcast. Point is, there is a point controller (aka PoC)
often called a base station (eNodeB, gNodeB, .) that actually "controls
everything that is necessary to control" at the UE including time, frequency
and sampling time offsets, and these are critical to get right if you want
to communicate, and they are ALL subject to the laws of physics (cf. the
speed of light)! Turns out that what is necessary for the system to function
anywhere near capacity, is for all the clocks governing transmissions from
the UEs to be "unsynchronized" such that all the UE transmissions arrive at
the PoC at the same (prescribed) time!
[SM] Fair enough. I would call clocks that are "in sync" albeit with
individual offsets as synchronized, but I am a layman and that might sound
offensively wrong to experts in the field. But even without the naming my
point is that all systems that depend on some idea of shared time-base are
halfway there to exposing that time to end users, by translating it into an
NTP time source at the modem.
> For some technologies, in particular 5G!, these considerations are
ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if you don't
believe me! :-)
[SM] Far be it from me not to believe you, so thanks for the pointers.
Yet, I still think that unless different nodes of a shared segment move at
significantly different speeds, that there should be a common
"tick-duration" for all clocks even if each clock runs at an offset... (I
naively would try to implement something like that by trying to fully
synchronize clocks and maintain a local offset value to convert from
"absolute" time to "network" time, but likely because coming from the
outside I am blissfully unaware of the detail challenges that need to be
solved).
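(A trivial sketch of that local-offset bookkeeping, purely illustrative:)

    class OffsetClock:
        """Convert between 'absolute' (UTC-ish) and 'network' time for one
        clock domain, given an estimated offset for that domain."""
        def __init__(self, offset_s=0.0):
            self.offset = offset_s        # network_time - absolute_time
        def to_network(self, t_abs):
            return t_abs + self.offset
        def to_absolute(self, t_net):
            return t_net - self.offset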
Regards & Thanks
Sebastian
>
>
> >
> >>
> >>
> >>>
> >>> --trip-times
> >>> enable the measurement of end to end write to read latencies (client
and server clocks must be synchronized)
> > [RWG] --clock-skew
> > enable the measurement of the wall clock difference between sender
and receiver
> >
> >>
> >> [SM] Sweet!
> >>
> >> Regards
> >> Sebastian
> >>
> >>>
> >>> Bob
> >>>> I have many kvetches about the new latency under load tests being
> >>>> designed and distributed over the past year. I am delighted! that
they
> >>>> are happening, but most really need third party evaluation, and
> >>>> calibration, and a solid explanation of what network pathologies they
> >>>> do and don't cover. Also a RED team attitude towards them, as well as
> >>>> thinking hard about what you are not measuring (operations research).
> >>>> I actually rather love the new cloudflare speedtest, because it tests
> >>>> a single TCP connection, rather than dozens, and at the same time
folk
> >>>> are complaining that it doesn't find the actual "speed!". yet... the
> >>>> test itself more closely emulates a user experience than
speedtest.net
> >>>> does. I am personally pretty convinced that the fewer numbers of
flows
> >>>> that a web page opens improves the likelihood of a good user
> >>>> experience, but lack data on it.
> >>>> To try to tackle the evaluation and calibration part, I've reached
out
> >>>> to all the new test designers in the hope that we could get together
> >>>> and produce a report of what each new test is actually doing. I've
> >>>> tweeted, linked in, emailed, and spammed every measurement list I
know
> >>>> of, and only to some response, please reach out to other test
designer
> >>>> folks and have them join the rpm email list?
> >>>> My principal kvetches in the new tests so far are:
> >>>> 0) None of the tests last long enough.
> >>>> Ideally there should be a mode where they at least run to "time of
> >>>> first loss", or periodically, just run longer than the
> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
> >>>> there! It's really bad science to optimize the internet for 20
> >>>> seconds. It's like optimizing a car, to handle well, for just 20
> >>>> seconds.
> >>>> 1) Not testing up + down + ping at the same time
> >>>> None of the new tests actually test the same thing that the infamous
> >>>> rrul test does - all the others still test up, then down, and ping.
It
> >>>> was/remains my hope that the simpler parts of the flent test suite -
> >>>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
> >>>> tests would provide calibration to the test designers.
> >>>> we've got zillions of flent results in the archive published here:
> >>>> https://blog.cerowrt.org/post/found_in_flent/
> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
> >>>
> >>>> The new tests have all added up + ping and down + ping, but not up +
> >>>> down + ping. Why??
> >>>> The behaviors of what happens in that case are really non-intuitive,
I
> >>>> know, but... it's just one more phase to add to any one of those new
> >>>> tests. I'd be deliriously happy if someone(s) new to the field
> >>>> started doing that, even optionally, and boggled at how it defeated
> >>>> their assumptions.
> >>>> Among other things that would show...
> >>>> It's the home router industry's dirty secret that darn few "gigabit"
> >>>> home routers can actually forward in both directions at a gigabit.
I'd
> >>>> like to smash that perception thoroughly, but given our starting
point
> >>>> is a gigabit router was a "gigabit switch" - and historically been
> >>>> something that couldn't even forward at 200Mbit - we have a long way
> >>>> to go there.
> >>>> Only in the past year have non-x86 home routers appeared that could
> >>>> actually do a gbit in both directions.
> >>>> 2) Few are actually testing within-stream latency
> >>>> Apple's rpm project is making a stab in that direction. It looks
> >>>> highly likely, that with a little more work, crusader and
> >>>> go-responsiveness can finally start sampling the tcp RTT, loss and
> >>>> markings, more directly. As for the rest... sampling TCP_INFO on
> >>>> windows, and Linux, at least, always appeared simple to me, but I'm
> >>>> discovering how hard it is by delving deep into the rust behind
> >>>> crusader.
> >>>> the goresponsiveness thing is also IMHO running WAY too many streams
> >>>> at the same time, I guess motivated by an attempt to have the test
> >>>> complete quickly?
> >>>> B) To try and tackle the validation problem:
> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
> >>>
> >>>> In the libreqos.io project we've established a testbed where tests
can
> >>>> be plunked through various ISP plan network emulations. It's here:
> >>>> https://payne.taht.net (run bandwidth test for what's currently
hooked
> >>>> up)
> >>>> We could rather use an AS number and at least a ipv4/24 and ipv6/48
to
> >>>> leverage with that, so I don't have to nat the various emulations.
> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
> >>>> to see more test designers setup a testbed like this to calibrate
> >>>> their own stuff.
> >>>> Presently we're able to test:
> >>>> flent
> >>>> netperf
> >>>> iperf2
> >>>> iperf3
> >>>> speedtest-cli
> >>>> crusader
> >>>> the broadband forum udp based test:
> >>>> https://github.com/BroadbandForum/obudpst
> >>>> trexx
> >>>> There's also a virtual machine setup that we can remotely drive a web
> >>>> browser from (but I didn't want to nat the results to the world) to
> >>>> test other web services.
> >>>> _______________________________________________
> >>>> Rpm mailing list
> >>>> Rpm@lists.bufferbloat.net
> >>>> https://lists.bufferbloat.net/listinfo/rpm
> >>> _______________________________________________
> >>> Starlink mailing list
> >>> Starlink@lists.bufferbloat.net
> >>> https://lists.bufferbloat.net/listinfo/starlink
> >>
> >> _______________________________________________
> >> Starlink mailing list
> >> Starlink@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/starlink
>
> _______________________________________________
> Starlink mailing list
> Starlink@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
[-- Attachment #2: Type: text/html, Size: 50849 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-13 8:10 ` Dick Roy
@ 2023-01-15 23:09 ` rjmcmahon
0 siblings, 0 replies; 19+ messages in thread
From: rjmcmahon @ 2023-01-15 23:09 UTC (permalink / raw)
To: dickroy
Cc: 'Sebastian Moeller', 'Rodney W. Grimes',
mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
hmm, interesting. I'm thinking that GPS PPS is sufficient from an iperf 2 &
classical mechanics perspective.
Have you looked at white rabbit per CERN?
https://kt.cern/article/white-rabbit-cern-born-open-source-technology-sets-new-global-standard-empowering-world
This discussion does make me question if there is a better metric than
one way delay, i.e. "speed of causality as limited by network i/o" taken
per each end of the e2e path? My expertise is quite limited w/respect to
relativity so I don't know if the below makes any sense or not. I also
think a core issue is the simultaneity of the start, which isn't obvious
how to discern.
Does comparing the write blocking times (or frequency) histograms to the
read blocking times (or frequency) histograms, which are coupled by tcp's
control loop, do anything useful? The blocking occurs because of a
coupling & awaiting per the remote. Then compare those against a write to
read thread on the same chip (which I think should be the same in each
reference frame and the fastest i/o possible for an end). The frequency
differences might be due to what you call "interruptions" & one way
delays (& error), assuming all else equal??
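(A rough sketch of that histogram comparison, purely illustrative; real
analysis would need much more care about causality and sampling:)

    import statistics

    def histogram(samples_us, bin_us=100):
        # Bin blocking times (microseconds) so the two shapes can be compared.
        hist = {}
        for s in samples_us:
            hist[s // bin_us] = hist.get(s // bin_us, 0) + 1
        return hist

    def crude_shift_us(write_blocking_us, read_blocking_us):
        # A crude proxy for the coupling described above: the difference
        # of medians between the sender's write-blocking distribution and
        # the receiver's read-blocking distribution.
        return (statistics.median(read_blocking_us)
                - statistics.median(write_blocking_us))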
Thanks in advance for any thoughts on this.
Bob
> -----Original Message-----
> From: rjmcmahon [mailto:rjmcmahon@rjmcmahon.com]
> Sent: Thursday, January 12, 2023 11:40 PM
> To: dickroy@alum.mit.edu
> Cc: 'Sebastian Moeller'; 'Rodney W. Grimes';
> mike.reynolds@netforecast.com; 'libreqos'; 'David P. Reed'; 'Rpm';
> 'bloat'
> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in
> USA
>
> Hi RR,
>
> I believe quality GPS chips compensate for relativity in pulse per
> second, which is needed to get position accuracy.
>
> [RR] Of course they do. That 38usec/day really matters! They assume
> they know what the gravitational potential is where they are, and they
> can estimate the potential at the satellites so they can compensate,
> and they do. Point is, a GPS unit at Lake Tahoe (6250') runs faster
> than the one in San Francisco (sea level). How do you think these two
> "should be synchronized"! How do you define "synchronization" in
> this case? You synchronize those two clocks, then what about all the
> other clocks at Lake Tahoe (or SF or anywhere in between for that
> matter :-))??? These are not trivial questions. However if all one
> cares about is seconds or milliseconds, then you can argue that we
> (earthlings on planet earth) can "sweep such facts under the
> proverbial rug" for the purposes of latency in communication networks
> and that's certainly doable. Don't tell that to the guys whose
> protocols require "synchronization of all units to nanoseconds"
> though! They will be very, very unhappy :-) :-) And you know who you
> are :-) :-)
>
> Bob
>
>> Hi Sebastian (et. al.),
>>
>> [I'll comment up here instead of inline.]
>>
>> Let me start by saying that I have not been intimately involved with
>> the IEEE 1588 effort (PTP), however I was involved in the 802.11
>> efforts along a similar vein, just adding the wireless first hop
>> component and its effects on PTP.
>>
>> What was apparent from the outset was that there was a lack of
>> understanding what the terms "to synchronize" or "to be synchronized"
>> actually mean. It's not trivial ... because we live in an
>> (approximately, that's another story!) 4-D space-time continuum where
>> the Lorentz metric plays a critical role. Therein, simultaneity (aka
>> "things happening at the same time") means the "distance" between two
>> such events is zero and that distance is given by sqrt(x^2 + y^2 + z^2
>> - (ct)^2) and the "thing happening" can be the tick of a clock
>> somewhere. Now since everything is relative (time with respect to
>> what? / location with respect to where?) it's pretty easy to see that
>> "if you don't know where you are, you can't know what time it is!"
>> (English sailors of the 18th century knew this well!) Add to this the
>> fact that if everything were stationary, nothing would happen (as
>> Einstein said "Nothing happens until something moves!"), and special
>> relativity also plays a role. Clocks on GPS satellites run approx.
>> 7usecs/day slower than those on earth due to their "speed" (8700 mph
>> roughly)! Then add the consequence that without mass we wouldn't
>> exist (in these forms at least :-)), and gravitational effects (aka
>> General Relativity) come into play. Those turn out to make clocks on
>> GPS satellites run 45usec/day faster than those on earth! The net
>> effect is that GPS clocks run about 38usec/day faster than clocks on
>> earth. So what does it mean to "synchronize to GPS"? Point is: it's a
>> non-trivial question with a very complicated answer. The reason it is
>> important to get all this right is that the "what that ties time and
>> space together" is the speed of light and that turns out to be a
>> "foot-per-nanosecond" in a vacuum (roughly 300m/usec). This means if
>> I am uncertain about my location to say 300 meters, then I also am
>> not sure what time it is to a usec AND vice-versa!
>>
>> All that said, the simplest explanation of synchronization is
>> probably: Two clocks are synchronized if, when they are brought
>> (slowly) into physical proximity ("sat next to each other") in the
>> same (quasi-)inertial frame and the same gravitational potential (not
>> so obvious BTW ... see the FYI below!), an observer of both would say
>> "they are keeping time identically". Since this experiment is rarely
>> possible, one can never be "sure" that his clock is synchronized to
>> any other clock elsewhere. And what does it mean to say they "were
>> synchronized" when brought together, but now they are not because
>> they are now in different gravitational potentials! (FYI, there are
>> land mine detectors being developed on this very principle! I know
>> someone who actually worked on such a project!)
>>
>> This all gets even more complicated when dealing with large networks
>> of networks in which the "speed of information transmission" can vary
>> depending on the medium (cf. coaxial cables versus fiber versus
>> microwave links!) In fact, the atmosphere is one of those media and
>> variations therein result in the need for "GPS corrections" (cf. RTCM
>> GPS correction messages, RTK, etc.) in order to get to sub-nsec/cm
>> accuracy. Point is if you have a set of nodes distributed across the
>> country all with GPS and all "synchronized to GPS time", and a second
>> identical set of nodes (with no GPS) instead connected with a network
>> of cables and fiber links, all of different lengths and composition
>> using different carrier frequencies (dielectric constants vary with
>> frequency!) "synchronized" to some clock somewhere using NTP or PTP),
>> the synchronization of the two sets will be different unless a common
>> reference clock is used AND all the above effects are taken into
>> account, and good luck with that! :-)
>>
>> In conclusion, if anyone tells you that clock synchronization in
>> communication networks is simple ("Just use GPS!"), you should feel
>> free to chuckle (under your breath if necessary :-))
>>
>> Cheers,
>>
>> RR
>>
>> -----Original Message-----
>> From: Sebastian Moeller [mailto:moeller0@gmx.de]
>> Sent: Thursday, January 12, 2023 12:23 AM
>> To: Dick Roy
>> Cc: Rodney W. Grimes; mike.reynolds@netforecast.com; libreqos; David
>> P. Reed; Rpm; rjmcmahon; bloat
>> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in
>> USA
>>
>> Hi RR,
>>
>>> On Jan 11, 2023, at 22:46, Dick Roy <dickroy@alum.mit.edu> wrote:
>>>
>>> -----Original Message-----
>>> From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On
>>> Behalf Of Sebastian Moeller via Starlink
>>> Sent: Wednesday, January 11, 2023 12:01 PM
>>> To: Rodney W. Grimes
>>> Cc: Dave Taht via Starlink; mike.reynolds@netforecast.com; libreqos;
>>> David P. Reed; Rpm; rjmcmahon; bloat
>>> Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers
>>> in USA
>>>
>>> Hi Rodney,
>>>
>>> > On Jan 11, 2023, at 19:32, Rodney W. Grimes
>>> <starlink@gndrsh.dnsmgr.net> wrote:
>>> >
>>> > Hello,
>>> >
>>> > Yall can call me crazy if you want.. but... see below [RWG]
>>> >> Hi Bib,
>>> >>
>>> >>> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
>>> <starlink@lists.bufferbloat.net> wrote:
>>> >>>
>>> >>> My biggest barrier is the lack of clock sync by the devices,
>>> i.e. very limited support for PTP in data centers and in end
>>> devices. This limits the ability to measure one way delays (OWD) and
>>> most assume that OWD is 1/2 an RTT, which typically is a mistake. We
>>> know this intuitively with airplane flight times or even car commute
>>> times where the one way time is not 1/2 a round trip time. Google
>>> maps & directions provide a time estimate for the one way link. It
>>> doesn't compute a round trip and divide by two.
>>> >>>
>>> >>> For those that can get clock sync working, the iperf 2
>>> --trip-times options is useful.
>>> >>
>>> >> [SM] +1; and yet even with unsynchronized clocks one can try
>>> to measure how latency changes under load and that can be done per
>>> direction. Sure this is far inferior to real reliably measured OWDs,
>>> but if life/the internet deals you lemons....
>>> >
>>> > [RWG] iperf2/iperf3, etc are already moving large amounts of data
>>> back and forth, for that matter any rate test, why not abuse some of
>>> that data and add the fundamental NTP clock sync data and
>>> bidirectionally pass each other's concept of "current time". IIRC
>>> (it's been 25 years since I worked on NTP at this level) you
>>> *should* be able to get a fairly accurate clock delta between each
>>> end, and then use that info and time stamps in the data stream to
>>> compute OWDs. You need to put 4 time stamps in the packet, and with
>>> that you can compute "offset".
>>>
>>> [RR] For this to work at a reasonable level of accuracy, the
>>> timestamping circuits on both ends need to be deterministic and
>>> repeatable as I recall. Any uncertainty in that process adds to
>>> synchronization errors/uncertainties.
>>>
>>> [SM] Nice idea. I would guess that all timeslot based access
>>> technologies (so starlink, docsis, GPON, LTE?) all distribute "high
>>> quality time" carefully to the "modems", so maybe all that would be
>>> needed is to expose that high quality time to the LAN side of those
>>> modems, dressed up as NTP server?
>>>
>>> [RR] It's not that simple! Distributing "high-quality time", i.e.
>>> "synchronizing all clocks" does not solve the communication problem
>>> in synchronous slotted MAC/PHYs!
>>
>> [SM] I happily believe you, but the same idea of "time slot"
>> needs to be shared by all nodes, no? So the clocks need to be of
>> reasonably similar rate, aka synchronized (see below).
>>
>>> All the technologies you mentioned above are essentially P2P, not
>>> intended for broadcast. Point is, there is a point controller (aka
>>> PoC) often called a base station (eNodeB, gNodeB, ...) that actually
>>> "controls everything that is necessary to control" at the UE
>>> including time, frequency and sampling time offsets, and these are
>>> critical to get right if you want to communicate, and they are ALL
>>> subject to the laws of physics (cf. the speed of light)! Turns out
>>> that what is necessary for the system to function anywhere near
>>> capacity is for all the clocks governing transmissions from the UEs
>>> to be "unsynchronized" such that all the UE transmissions arrive at
>>> the PoC at the same (prescribed) time!
>>
>> [SM] Fair enough. I would call clocks that are "in sync" albeit
>> with individual offsets as synchronized, but I am a layman and that
>> might sound offensively wrong to experts in the field. But even
>> without the naming my point is that all systems that depend on some
>> idea of shared time-base are halfway there to exposing that time to
>> end users, by translating it into an NTP time source at the modem.
>>
>>> For some technologies, in particular 5G!, these considerations are
>>> ESSENTIAL. Feel free to scour the 3GPP LTE 5G RLC and PHY specs if
>>> you don't believe me! :-)
>>
>> [SM] Far be it from me not to believe you, so thanks for the
>> pointers. Yet, I still think that unless different nodes of a shared
>> segment move at significantly different speeds, there should be a
>> common "tick-duration" for all clocks even if each clock runs at an
>> offset... (I naively would try to implement something like that by
>> trying to fully synchronize clocks and maintain a local offset value
>> to convert from "absolute" time to "network" time, but likely because
>> coming from the outside I am blissfully unaware of the detail
>> challenges that need to be solved).
>>
>> Regards & Thanks
>>
>> Sebastian
>>
>>> >>>
>>> >>> --trip-times
>>> >>> enable the measurement of end to end write to read latencies
>>> >>> (client and server clocks must be synchronized)
>>> > [RWG] --clock-skew
>>> > enable the measurement of the wall clock difference between
>>> > sender and receiver
>>> >
>>> >> [SM] Sweet!
>>> >>
>>> >> Regards
>>> >> Sebastian
>>> >>
>>> >>> Bob
>>> >>>> I have many kvetches about the new latency under load tests
>>> >>>> being designed and distributed over the past year. I am
>>> >>>> delighted! that they are happening, but most really need third
>>> >>>> party evaluation, and calibration, and a solid explanation of
>>> >>>> what network pathologies they do and don't cover. Also a RED
>>> >>>> team attitude towards them, as well as thinking hard about what
>>> >>>> you are not measuring (operations research).
>>> >>>> I actually rather love the new cloudflare speedtest, because it
>>> >>>> tests a single TCP connection, rather than dozens, and at the
>>> >>>> same time folk are complaining that it doesn't find the actual
>>> >>>> "speed!". yet... the test itself more closely emulates a user
>>> >>>> experience than speedtest.net does. I am personally pretty
>>> >>>> convinced that the fewer flows a web page opens, the better the
>>> >>>> likelihood of a good user experience, but I lack data on it.
>>> >>>> To try to tackle the evaluation and calibration part, I've
>>> >>>> reached out to all the new test designers in the hope that we
>>> >>>> could get together and produce a report of what each new test
>>> >>>> is actually doing. I've tweeted, linked in, emailed, and
>>> >>>> spammed every measurement list I know of, but with only some
>>> >>>> response; please reach out to other test designer folks and
>>> >>>> have them join the rpm email list?
>>> >>>> My principal kvetches in the new tests so far are:
>>> >>>> 0) None of the tests last long enough.
>>> >>>> Ideally there should be a mode where they at least run to "time
>>> >>>> of first loss", or periodically, just run longer than the
>>> >>>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be
>>> >>>> dragons there! It's really bad science to optimize the internet
>>> >>>> for 20 seconds. It's like optimizing a car, to handle well, for
>>> >>>> just 20 seconds.
>>> >>>> 1) Not testing up + down + ping at the same time
>>> >>>> None of the new tests actually test the same thing that the
>>> >>>> infamous rrul test does - all the others still test up, then
>>> >>>> down, and ping. It was/remains my hope that the simpler parts
>>> >>>> of the flent test suite - such as the tcp_up_squarewave tests,
>>> >>>> the rrul test, and the rtt_fair tests - would provide
>>> >>>> calibration to the test designers.
>>> >>>> we've got zillions of flent results in the archive published
>>> >>>> here:
>>> >>>> https://blog.cerowrt.org/post/found_in_flent/
>>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>>> >>>> The new tests have all added up + ping and down + ping, but not
>>> >>>> up + down + ping. Why??
>>> >>>> The behaviors of what happens in that case are really
>>> >>>> non-intuitive, I know, but... it's just one more phase to add
>>> >>>> to any one of those new tests. I'd be deliriously happy if
>>> >>>> someone(s) new to the field started doing that, even
>>> >>>> optionally, and boggled at how it defeated their assumptions.
>>> >>>> Among other things that would show...
>>> >>>> It's the home router industry's dirty secret that darn few
>>> >>>> "gigabit" home routers can actually forward in both directions
>>> >>>> at a gigabit. I'd like to smash that perception thoroughly, but
>>> >>>> given that our starting point was a "gigabit router" that was
>>> >>>> really a "gigabit switch" - and historically something that
>>> >>>> couldn't even forward at 200Mbit - we have a long way to go
>>> >>>> there.
>>> >>>> Only in the past year have non-x86 home routers appeared that
>>> >>>> could actually do a gbit in both directions.
>>> >>>> 2) Few are actually testing within-stream latency
>>> >>>> Apple's rpm project is making a stab in that direction. It
>>> >>>> looks highly likely, that with a little more work, crusader and
>>> >>>> go-responsiveness can finally start sampling the tcp RTT, loss
>>> >>>> and markings, more directly. As for the rest... sampling
>>> >>>> TCP_INFO on windows, and Linux, at least, always appeared
>>> >>>> simple to me, but I'm discovering how hard it is by delving
>>> >>>> deep into the rust behind crusader.
>>> >>>> the goresponsiveness thing is also IMHO running WAY too many
>>> >>>> streams at the same time, I guess motivated by an attempt to
>>> >>>> have the test complete quickly?
>>> >>>> B) To try and tackle the validation problem:
>>> >>>> ps. Misinformation about iperf 2 impacts my ability to do this.
>>> >>>> In the libreqos.io project we've established a testbed where
>>> >>>> tests can be plunked through various ISP plan network
>>> >>>> emulations. It's here:
>>> >>>> https://payne.taht.net (run the bandwidth test for what's
>>> >>>> currently hooked up)
>>> >>>> We could rather use an AS number and at least an ipv4/24 and
>>> >>>> ipv6/48 to leverage with that, so I don't have to nat the
>>> >>>> various emulations.
>>> >>>> (and funding, anyone got funding?) Or, as the code is GPLv2
>>> >>>> licensed, to see more test designers set up a testbed like this
>>> >>>> to calibrate their own stuff.
>>> >>>> Presently we're able to test:
>>> >>>> flent
>>> >>>> netperf
>>> >>>> iperf2
>>> >>>> iperf3
>>> >>>> speedtest-cli
>>> >>>> crusader
>>> >>>> the broadband forum udp based test:
>>> >>>> https://github.com/BroadbandForum/obudpst
>>> >>>> trexx
>>> >>>> There's also a virtual machine setup that we can remotely
> drive
>
>> a web
>
>>
>
>>> >>>> browser from (but I didn't want to nat the results to the
>
>> world) to
>
>> awhile
>
>>> >>>> test other web services.
>
>>
>
>>> >>>> _______________________________________________
>
>>
>
>>> >>>> Rpm mailing list
>
>>
>
>>> >>>> Rpm@lists.bufferbloat.net
>
>>
>
>>> >>>> https://lists.bufferbloat.net/listinfo/rpm
>
>>
>
>>> >>> _______________________________________________
>
>>
>
>>> >>> Starlink mailing list
>
>>
>
>>> >>> Starlink@lists.bufferbloat.net
>
>>
>
>>> >>> https://lists.bufferbloat.net/listinfo/starlink
>
>>
>
>>> >>
>
>>
>
>>> >> _______________________________________________
>
>>
>
>>> >> Starlink mailing list
>
>>
>
>>> >> Starlink@lists.bufferbloat.net
>
>>
>
>>> >> https://lists.bufferbloat.net/listinfo/starlink
>
>>
>
>>>
>
>>
>
>>> _______________________________________________
>
>>
>
>>> Starlink mailing list
>
>>
>
>>> Starlink@lists.bufferbloat.net
>
>>
>
>>> https://lists.bufferbloat.net/listinfo/starlink
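As an aside, the four-timestamp bookkeeping alluded to in this subthread
reduces to a few lines. Here is a minimal Python sketch of the textbook
computation (my own illustration, not iperf 2's or ntpd's actual code;
all names are mine):

# t1 = client send, t2 = server receive, t3 = server send, t4 = client
# receive, each read from that side's own, unsynchronized clock (seconds).
def offset_and_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2.0  # estimated server-minus-client offset
    delay = (t4 - t1) - (t3 - t2)           # RTT with the server's hold time removed
    return offset, delay

# With the offset estimated, one-way delays no longer have to assume
# OWD = RTT/2:
#   owd_up   = (t2 - t1) - offset
#   owd_down = (t4 - t3) + offset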
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-09 20:46 ` rjmcmahon
@ 2023-01-09 21:02 ` Dick Roy
0 siblings, 0 replies; 19+ messages in thread
From: Dick Roy @ 2023-01-09 21:02 UTC (permalink / raw)
To: 'rjmcmahon', 'Dave Taht'
Cc: mike.reynolds, 'libreqos', 'David P. Reed',
'Rpm', 'bloat'
-----Original Message-----
From: Starlink [mailto:starlink-bounces@lists.bufferbloat.net] On Behalf Of
rjmcmahon via Starlink
Sent: Monday, January 9, 2023 12:47 PM
To: Dave Taht
Cc: starlink@lists.bufferbloat.net; mike.reynolds@netforecast.com; libreqos;
David P. Reed; Rpm; bloat
Subject: Re: [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
The write to read latencies (OWD) are on the server side in CLT form.
Use --histograms on the server side to enable them.
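For concreteness, a hypothetical pairing of the flags discussed in this
thread (a sketch only; <server> is a placeholder and the histogram output
format varies by version):

    # server side: enhanced output plus write-to-read latency histograms
    iperf -s -e --histograms
    # client side: send trip timestamps; clocks must be synchronized
    # (e.g. via PTP) for the one-way delays to be meaningful
    iperf -c <server> -e --trip-times -i 1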
Your client side sampled TCP RTT is 6 ms with less than 1 ms of
variation (the sqrt of the variance, since variance is in squared units)
[RR] or standard deviation (std for short) :-)
No retries suggests the network isn't dropping packets.
All the newer bounceback code is only in master and requires a compile
from source. It will be released in 2.1.9 after testing cycles;
hopefully in early March 2023.
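A hypothetical invocation of that master-branch feature (an assumption on
my part; the flag spelling could still change before the 2.1.9 release):

    iperf -c <server> --bounceback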
Bob
https://sourceforge.net/projects/iperf2/
> The DC that so graciously loaned us 3 machines for the testbed (thx
> equinix!) does support ptp, but we have not configured it yet. In ntp
> tests between these hosts we seem to be within 500us, and certainly
> 50us would be great, in the future.
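(A quick way to eyeball that residual offset on such hosts, assuming ntpd
is the daemon in use: run ntpq -p and read the offset column, which is
reported in milliseconds.)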
>
> I note that in all my kvetching about the new tests' needing
> validation today... I kind of elided that I'm pretty happy with
> iperf2's new tests that landed last august, and are now appearing in
> linux package managers around the world. I hope more folk use them.
> (sorry robert, it's been a long time since last august!)
>
> Our new testbed has multiple setups. In one setup - basically the
> machine name is equal to a given ISP plan, and a key testing point is
> looking at the differences between the FCC 25-3 and 100/20 plans in
> the real world. However, at our scale (25gbit) it turned out that
> emulating the delay realistically was problematic.
>
> Anyway, here's a 25/3 result for iperf (other results and iperf test
> type requests gladly accepted)
>
> root@lqos:~# iperf -6 --trip-times -c c25-3 -e -i 1
> ------------------------------------------------------------
> Client connecting to c25-3, TCP port 5001 with pid 2146556 (1 flows)
> Write buffer size: 131072 Byte
> TOS set to 0x0 (Nagle on)
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> [ 1] local fd77::3%bond0.4 port 59396 connected with fd77::1:2 port
> 5001 (trip-times) (sock=3) (icwnd/mss/irtt=13/1428/948) (ct=1.10 ms)
> on 2023-01-09 20:13:37 (UTC)
> [ ID] Interval Transfer Bandwidth Write/Err Rtry
> Cwnd/RTT(var) NetPwr
> [ 1] 0.0000-1.0000 sec 3.25 MBytes 27.3 Mbits/sec 26/0 0
> 19K/6066(262) us 562
> [ 1] 1.0000-2.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 15K/4671(207) us 673
> [ 1] 2.0000-3.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 13K/5538(280) us 568
> [ 1] 3.0000-4.0000 sec 3.12 MBytes 26.2 Mbits/sec 25/0 0
> 16K/6244(355) us 525
> [ 1] 4.0000-5.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 19K/6152(216) us 511
> [ 1] 5.0000-6.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 22K/6764(529) us 465
> [ 1] 6.0000-7.0000 sec 3.12 MBytes 26.2 Mbits/sec 25/0 0
> 15K/5918(605) us 554
> [ 1] 7.0000-8.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 18K/5178(327) us 608
> [ 1] 8.0000-9.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 19K/5758(473) us 546
> [ 1] 9.0000-10.0000 sec 3.00 MBytes 25.2 Mbits/sec 24/0 0
> 16K/6141(280) us 512
> [ 1] 0.0000-10.0952 sec 30.6 MBytes 25.4 Mbits/sec 245/0
> 0 19K/5924(491) us 537
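An aside on the NetPwr column above: the printed values are consistent
with throughput divided by RTT, scaled down by 1e6 (my inference from the
numbers, not a statement of iperf 2's documented formula). A small Python
check against the first two intervals:

# Assumption: NetPwr = (bytes/sec) / RTT(sec) / 1e6, MBytes = 2^20 bytes.
def netpwr(mbytes, interval_s, rtt_us):
    bytes_per_sec = mbytes * 1024 * 1024 / interval_s
    return bytes_per_sec / (rtt_us * 1e-6) / 1e6

print(round(netpwr(3.25, 1.0, 6066)))  # -> 562, matching the 0.0-1.0 sec line
print(round(netpwr(3.00, 1.0, 4671)))  # -> 673, matching the 1.0-2.0 sec line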
>
>
> On Mon, Jan 9, 2023 at 11:13 AM rjmcmahon <rjmcmahon@rjmcmahon.com>
> wrote:
>>
>> My biggest barrier is the lack of clock sync by the devices, i.e. very
>> limited support for PTP in data centers and in end devices. This
>> limits
>> the ability to measure one way delays (OWD), and most assume that OWD
>> is 1/2 of RTT, which typically is a mistake. We know this intuitively
>> with
>> airplane flight times or even car commute times where the one way time
>> is not 1/2 a round trip time. Google maps & directions provide a time
>> estimate for the one way link. It doesn't compute a round trip and
>> divide by two.
>>
>> For those that can get clock sync working, the iperf 2 --trip-times
>> options is useful.
>>
>> --trip-times
>> enable the measurement of end to end write to read latencies
>> (client
>> and server clocks must be synchronized)
>>
>> Bob
>> > I have many kvetches about the new latency under load tests being
>> > designed and distributed over the past year. I am delighted! that they
>> > are happening, but most really need third party evaluation, and
>> > calibration, and a solid explanation of what network pathologies they
>> > do and don't cover. Also a RED team attitude towards them, as well as
>> > thinking hard about what you are not measuring (operations research).
>> >
>> > I actually rather love the new cloudflare speedtest, because it tests
>> > a single TCP connection, rather than dozens, and at the same time folk
>> > are complaining that it doesn't find the actual "speed!". yet... the
>> > test itself more closely emulates a user experience than speedtest.net
>> > does. I am personally pretty convinced that the fewer numbers of flows
>> > that a web page opens improves the likelihood of a good user
>> > experience, but lack data on it.
>> >
>> > To try to tackle the evaluation and calibration part, I've reached out
>> > to all the new test designers in the hope that we could get together
>> > and produce a report of what each new test is actually doing. I've
>> > tweeted, linked in, emailed, and spammed every measurement list I know
>> > of, and gotten only some response; please reach out to other test designer
>> > folks and have them join the rpm email list?
>> >
>> > My principal kvetches in the new tests so far are:
>> >
>> > 0) None of the tests last long enough.
>> >
>> > Ideally there should be a mode where they at least run to "time of
>> > first loss", or periodically, just run longer than the
>> > industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>> > there! It's really bad science to optimize the internet for 20
>> > seconds. It's like optimizing a car, to handle well, for just 20
>> > seconds.
>> >
>> > 1) Not testing up + down + ping at the same time
>> >
>> > None of the new tests actually test the same thing that the infamous
>> > rrul test does - all the others still test up, then down, and ping. It
>> > was/remains my hope that the simpler parts of the flent test suite -
>> > such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>> > tests would provide calibration to the test designers.
>> >
>> > we've got zillions of flent results in the archive published here:
>> > https://blog.cerowrt.org/post/found_in_flent/
>> > ps. Misinformation about iperf 2 impacts my ability to do this.
>>
>> > The new tests have all added up + ping and down + ping, but not up +
>> > down + ping. Why??
>> >
>> > The behaviors of what happens in that case are really non-intuitive, I
>> > know, but... it's just one more phase to add to any one of those new
>> > tests. I'd be deliriously happy if someone(s) new to the field
>> > started doing that, even optionally, and boggled at how it defeated
>> > their assumptions.
>> >
>> > Among other things that would show...
>> >
>> > It's the home router industry's dirty secret that darn few "gigabit"
>> > home routers can actually forward in both directions at a gigabit. I'd
>> > like to smash that perception thoroughly, but given our starting point
>> > is that a gigabit router was a "gigabit switch" - and historically was
>> > something that couldn't even forward at 200Mbit - we have a long way
>> > to go there.
>> >
>> > Only in the past year have non-x86 home routers appeared that could
>> > actually do a gbit in both directions.
>> >
>> > 2) Few are actually testing within-stream latency
>> >
>> > Apple's rpm project is making a stab in that direction. It looks
>> > highly likely that, with a little more work, crusader and
>> > go-responsiveness can finally start sampling the tcp RTT, loss and
>> > markings, more directly. As for the rest... sampling TCP_INFO on
>> > windows, and Linux, at least, always appeared simple to me, but I'm
>> > discovering how hard it is by delving deep into the rust behind
>> > crusader.
>> >
>> > the goresponsiveness thing is also IMHO running WAY too many streams
>> > at the same time, I guess motivated by an attempt to have the test
>> > complete quickly?
>> >
>> > B) To try and tackle the validation problem:ps. Misinformation about
>> > iperf 2 impacts my ability to do this.
>>
>> >
>> > In the libreqos.io project we've established a testbed where tests can
>> > be plunked through various ISP plan network emulations. It's here:
>> > https://payne.taht.net (run bandwidth test for what's currently hooked
>> > up)
>> >
>> > We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>> > leverage with that, so I don't have to nat the various emulations.
>> > (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>> > to see more test designers set up a testbed like this to calibrate
>> > their own stuff.
>> >
>> > Presently we're able to test:
>> > flent
>> > netperf
>> > iperf2
>> > iperf3
>> > speedtest-cli
>> > crusader
>> > the broadband forum udp based test:
>> > https://github.com/BroadbandForum/obudpst
>> > trexx
>> >
>> > There's also a virtual machine setup that we can remotely drive a web
>> > browser from (but I didn't want to nat the results to the world) to
>> > test other web services.
>> > _______________________________________________
>> > Rpm mailing list
>> > Rpm@lists.bufferbloat.net
>> > https://lists.bufferbloat.net/listinfo/rpm
_______________________________________________
Starlink mailing list
Starlink@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/starlink
* Re: [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA
2023-01-09 19:13 ` [Bloat] [Rpm] " rjmcmahon
@ 2023-01-09 19:47 ` Sebastian Moeller
2023-01-09 20:20 ` [Bloat] [Rpm] [Starlink] " Dave Taht
1 sibling, 0 replies; 19+ messages in thread
From: Sebastian Moeller @ 2023-01-09 19:47 UTC (permalink / raw)
To: rjmcmahon
Cc: Dave Täht, Dave Taht via Starlink, mike.reynolds, libreqos,
David P. Reed, Rpm, bloat
Hi Bob,
> On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starlink@lists.bufferbloat.net> wrote:
>
> My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one way delays (OWD), and most assume that OWD is 1/2 of RTT, which typically is a mistake. We know this intuitively with airplane flight times or even car commute times where the one way time is not 1/2 a round trip time. Google maps & directions provide a time estimate for the one way link. It doesn't compute a round trip and divide by two.
>
> For those that can get clock sync working, the iperf 2 --trip-times options is useful.
[SM] +1; and yet even with unsynchronized clocks one can try to measure how latency changes under load and that can be done per direction. Sure this is far inferior to real reliably measured OWDs, but if life/the internet deals you lemons....
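A minimal sketch of that lemons-to-lemonade approach (hypothetical
numbers; the only assumption is that each clock is stable over the test,
so the unknown offset cancels when subtracting an idle baseline):

# Per-direction "delay" samples computed as recv_ts - send_ts across
# unsynchronized clocks: each contains the same unknown constant offset.
def latency_increase_ms(idle_samples, loaded_samples):
    baseline = min(idle_samples)  # emptiest-queue estimate, offset included
    return [s - baseline for s in loaded_samples]

idle = [12.3, 12.1, 12.4]    # made-up idle samples for one direction (ms)
loaded = [45.0, 80.2, 61.7]  # made-up samples under load, same direction (ms)
print(latency_increase_ms(idle, loaded))  # queueing delay added under load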
>
> --trip-times
> enable the measurement of end to end write to read latencies (client and server clocks must be synchronized)
[SM] Sweet!
Regards
Sebastian
>
> Bob
>> I have many kvetches about the new latency under load tests being
>> designed and distributed over the past year. I am delighted! that they
>> are happening, but most really need third party evaluation, and
>> calibration, and a solid explanation of what network pathologies they
>> do and don't cover. Also a RED team attitude towards them, as well as
>> thinking hard about what you are not measuring (operations research).
>> I actually rather love the new cloudflare speedtest, because it tests
>> a single TCP connection, rather than dozens, and at the same time folk
>> are complaining that it doesn't find the actual "speed!". yet... the
>> test itself more closely emulates a user experience than speedtest.net
>> does. I am personally pretty convinced that the fewer numbers of flows
>> that a web page opens improves the likelihood of a good user
>> experience, but lack data on it.
>> To try to tackle the evaluation and calibration part, I've reached out
>> to all the new test designers in the hope that we could get together
>> and produce a report of what each new test is actually doing. I've
>> tweeted, linked in, emailed, and spammed every measurement list I know
>> of, and gotten only some response; please reach out to other test designer
>> folks and have them join the rpm email list?
>> My principal kvetches in the new tests so far are:
>> 0) None of the tests last long enough.
>> Ideally there should be a mode where they at least run to "time of
>> first loss", or periodically, just run longer than the
>> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>> there! It's really bad science to optimize the internet for 20
>> seconds. It's like optimizing a car, to handle well, for just 20
>> seconds.
>> 1) Not testing up + down + ping at the same time
>> None of the new tests actually test the same thing that the infamous
>> rrul test does - all the others still test up, then down, and ping. It
>> was/remains my hope that the simpler parts of the flent test suite -
>> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>> tests would provide calibration to the test designers.
>> we've got zillions of flent results in the archive published here:
>> https://blog.cerowrt.org/post/found_in_flent/
>> ps. Misinformation about iperf 2 impacts my ability to do this.
>
>> The new tests have all added up + ping and down + ping, but not up +
>> down + ping. Why??
>> The behaviors of what happens in that case are really non-intuitive, I
>> know, but... it's just one more phase to add to any one of those new
>> tests. I'd be deliriously happy if someone(s) new to the field
>> started doing that, even optionally, and boggled at how it defeated
>> their assumptions.
>> Among other things that would show...
>> It's the home router industry's dirty secret that darn few "gigabit"
>> home routers can actually forward in both directions at a gigabit. I'd
>> like to smash that perception thoroughly, but given our starting point
>> is that a gigabit router was a "gigabit switch" - and historically was
>> something that couldn't even forward at 200Mbit - we have a long way
>> to go there.
>> Only in the past year have non-x86 home routers appeared that could
>> actually do a gbit in both directions.
>> 2) Few are actually testing within-stream latency
>> Apple's rpm project is making a stab in that direction. It looks
>> highly likely that, with a little more work, crusader and
>> go-responsiveness can finally start sampling the tcp RTT, loss and
>> markings, more directly. As for the rest... sampling TCP_INFO on
>> windows, and Linux, at least, always appeared simple to me, but I'm
>> discovering how hard it is by delving deep into the rust behind
>> crusader.
>> the goresponsiveness thing is also IMHO running WAY too many streams
>> at the same time, I guess motivated by an attempt to have the test
>> complete quickly?
>> B) To try and tackle the validation problem:ps. Misinformation about iperf 2 impacts my ability to do this.
>
>> In the libreqos.io project we've established a testbed where tests can
>> be plunked through various ISP plan network emulations. It's here:
>> https://payne.taht.net (run bandwidth test for what's currently hooked
>> up)
>> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>> leverage with that, so I don't have to nat the various emulations.
>> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>> to see more test designers set up a testbed like this to calibrate
>> their own stuff.
>> Presently we're able to test:
>> flent
>> netperf
>> iperf2
>> iperf3
>> speedtest-cli
>> crusader
>> the broadband forum udp based test:
>> https://github.com/BroadbandForum/obudpst
>> trexx
>> There's also a virtual machine setup that we can remotely drive a web
>> browser from (but I didn't want to nat the results to the world) to
>> test other web services.
>> _______________________________________________
>> Rpm mailing list
>> Rpm@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/rpm
> _______________________________________________
> Starlink mailing list
> Starlink@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink
end of thread, other threads: [~2023-01-15 23:09 UTC | newest]
Thread overview: 19+ messages
[not found] <202301111832.30BIWevV030127@gndrsh.dnsmgr.net>
2023-01-11 20:01 ` [Bloat] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA Sebastian Moeller
2023-01-11 21:46 ` Dick Roy
2023-01-12 8:22 ` Sebastian Moeller
2023-01-12 18:02 ` rjmcmahon
2023-01-12 21:34 ` Dick Roy
2023-01-12 20:39 ` Dick Roy
2023-01-13 7:33 ` Sebastian Moeller
2023-01-13 8:26 ` Dick Roy
2023-01-13 7:40 ` rjmcmahon
2023-01-13 8:10 ` Dick Roy
2023-01-15 23:09 ` rjmcmahon
2023-01-11 20:09 ` rjmcmahon
2023-01-12 8:14 ` Sebastian Moeller
2023-01-12 17:49 ` Robert McMahon
2023-01-12 21:57 ` Dick Roy
2023-01-13 7:44 ` Sebastian Moeller
2023-01-13 8:01 ` Dick Roy
[not found] <mailman.2651.1672779463.1281.starlink@lists.bufferbloat.net>
[not found] ` <1672786712.106922180@apps.rackspace.com>
[not found] ` <F4CA66DA-516C-438A-8D8A-5F172E5DFA75@cable.comcast.com>
2023-01-09 15:26 ` [Bloat] [Starlink] " Dave Taht
2023-01-09 19:13 ` [Bloat] [Rpm] " rjmcmahon
2023-01-09 19:47 ` [Bloat] [Starlink] [Rpm] " Sebastian Moeller
2023-01-09 20:20 ` [Bloat] [Rpm] [Starlink] " Dave Taht
2023-01-09 20:46 ` rjmcmahon
2023-01-09 21:02 ` [Bloat] [Starlink] [Rpm] " Dick Roy