[Starlink] Researchers Seeking Probe Volunteers in USA

Mon Jan 9 10:26:35 EST 2023

I have many kvetches about the new latency under load tests being
designed and distributed over the past year. I am delighted that they
are happening, but most really need third-party evaluation and
calibration, and a solid explanation of what network pathologies they
do and don't cover. They also need a red-team attitude towards them,
as well as hard thinking about what they are not measuring
(operations research).

I actually rather love the new Cloudflare speedtest, because it tests
a single TCP connection rather than dozens, even as folks complain
that it doesn't find the actual "speed!". Yet the test itself more
closely emulates a user experience than speedtest.net does. I am
personally pretty convinced that the fewer flows a web page opens, the
better the likelihood of a good user experience, but I lack data on
it.

To try to tackle the evaluation and calibration part, I've reached out
to all the new test designers in the hope that we could get together
and produce a report of what each new test is actually doing. I've
tweeted, linked in, emailed, and spammed every measurement list I know
of, with only some response. Please reach out to other test designer
folks and have them join the rpm email list.

My principal kvetches with the new tests so far are:

0) None of the tests last long enough.

Ideally there should be a mode where they at least run to "time of
first loss", or where they periodically just run longer than the
industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
there! It's really bad science to optimize the internet for 20
seconds. It's like optimizing a car to handle well for just 20
seconds.
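
A "run to time of first loss" mode could look roughly like the sketch
below. It is only an illustration: the function name, the parameters,
and the sample_loss callback (anything that returns a cumulative loss
count for the flow under test) are all assumptions of mine, not taken
from any existing test.

```python
import time


def run_until_first_loss(sample_loss, min_s=20.0, max_s=600.0, interval_s=0.5,
                         clock=time.monotonic, sleep=time.sleep):
    """Keep a load test going until the first loss is observed.

    sample_loss is a hypothetical callback returning the cumulative number
    of lost packets so far. The run never ends before min_s and never
    lasts longer than max_s. Returns (elapsed_s, first_loss_at_s or None).
    """
    start = clock()
    first_loss_at = None
    while True:
        elapsed = clock() - start
        if first_loss_at is None and sample_loss() > 0:
            first_loss_at = elapsed  # remember when the first loss showed up
        if elapsed >= max_s:
            break  # hard cap so the test always terminates
        if first_loss_at is not None and elapsed >= min_s:
            break  # saw a loss and ran at least the minimum time
        sleep(interval_s)
    return elapsed, first_loss_at
```

The clock and sleep parameters exist only so the loop can be exercised
with a fake clock; a real test would leave them at their defaults.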

1) Not testing up + down + ping at the same time

None of the new tests actually test the same thing that the infamous
rrul test does - all the others still test up, then down, and ping. It
was/remains my hope that the simpler parts of the flent test suite -
such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
tests would provide calibration to the test designers.

We've got zillions of flent results in the archive published here:
https://blog.cerowrt.org/post/found_in_flent/
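
For anyone wanting to produce a comparable rrul result, a run that
lasts longer than the usual short test can be scripted along these
lines. This is a minimal sketch: the host name and title are
placeholders, the 300 second length is an arbitrary choice, and the
target host is assumed to be running a netperf server that flent can
talk to.

```python
import shlex


def flent_rrul_command(host, length_s=300, title="long-rrul"):
    """Build a flent command line for an rrul run of length_s seconds.

    Uses flent's -l (length), -H (host) and -t (title) options. The
    default values here are illustrative assumptions only.
    """
    return ["flent", "rrul", "-l", str(length_s), "-H", host, "-t", title]


cmd = flent_rrul_command("netperf.example.net")  # placeholder host name
print(shlex.join(cmd))
# To actually run it (needs flent installed locally):
#   import subprocess; subprocess.run(cmd, check=True)
```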

The new tests have all added up + ping and down + ping, but not up +
down + ping. Why??

What happens in that case is really non-intuitive, I
know, but... it's just one more phase to add to any one of those new
tests. I'd be deliriously happy if someone(s) new to the field
started doing that, even optionally, and boggled at how it defeated
their assumptions.
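
Structurally, the extra phase amounts to starting all three workloads
at the same instant instead of one after another. A minimal sketch of
that scheduling, with stand-in phase functions (fake_phase below is a
placeholder that moves no traffic; a real test would plug in its own
upload, download, and ping workers):

```python
import threading
import time


def run_concurrently(phases, duration_s):
    """Start every phase together and stop them all after duration_s.

    phases maps a name to a callable taking a threading.Event that is
    set when the phase should stop. Returns {name: {"started", "value"}}.
    """
    barrier = threading.Barrier(len(phases))
    stop = threading.Event()
    results = {}
    lock = threading.Lock()

    def worker(name, fn):
        barrier.wait()  # line up all phases before any of them begins
        started = time.monotonic()
        value = fn(stop)
        with lock:
            results[name] = {"started": started, "value": value}

    threads = [threading.Thread(target=worker, args=item)
               for item in phases.items()]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
    return results


def fake_phase(stop):
    """Placeholder workload: counts ticks until told to stop."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        time.sleep(0.01)
    return ticks


results = run_concurrently(
    {"upload": fake_phase, "download": fake_phase, "ping": fake_phase},
    duration_s=0.2,
)
print(sorted(results))
```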

Among other things that would show...

It's the home router industry's dirty secret that darn few "gigabit"
home routers can actually forward in both directions at a gigabit. I'd
like to smash that perception thoroughly, but given that our starting
point is that a "gigabit router" was a "gigabit switch" - and has
historically been something that couldn't even forward at 200Mbit - we
have a long way to go there.

Only in the past year have non-x86 home routers appeared that could
actually do a gbit in both directions.

2) Few are actually testing within-stream latency

Apple's rpm project is making a stab in that direction. It looks
highly likely that, with a little more work, crusader and
goresponsiveness can finally start sampling the TCP RTT, loss, and
markings more directly. As for the rest... sampling TCP_INFO on
Windows and Linux, at least, always appeared simple to me, but I'm
discovering how hard it is by delving deep into the Rust behind
crusader.

The goresponsiveness test is also IMHO running WAY too many streams
at the same time, I guess motivated by an attempt to have the test
complete quickly?

B) To try and tackle the validation problem:

In the libreqos.io project we've established a testbed where tests can
be plunked through various ISP plan network emulations. It's here:
https://payne.taht.net (run bandwidth test for what's currently hooked
up)

We could rather use an AS number and at least an IPv4 /24 and IPv6 /48
to leverage with that, so I don't have to NAT the various emulations
(and funding, anyone got funding?). Or, as the code is GPLv2 licensed,
I'd like to see more test designers set up a testbed like this to
calibrate their own stuff.

Presently we're able to test:
flent
netperf
iperf2
iperf3
speedtest-cli
crusader
the Broadband Forum UDP-based test:
https://github.com/BroadbandForum/obudpst
trexx

There's also a virtual machine setup that we can remotely drive a web
browser from (but I didn't want to NAT the results to the world) to
test other web services.