[Rpm] [Starlink] Researchers Seeking Probe Volunteers in USA

Mon Jan 9 12:00:31 EST 2023

Hi Dave,

just a data point, apples networkQuality on Monterey (12.6.2, x86) defaults to bi-directionally saturating traffic. Your argument about the duration still holds though the test is really short. While I understand the motivation behind that, I think it would to the internet much better if all such tests would randomly offer users extended test duration of, say a minute. Users need to opt-in, but that would at least collect some longer duration data. Now, I have no idea whether apple actually keeps results on their server side (Ookla sure does, but given Apples applaudable privacy stance they might not do so) if not it would do little good to run extended tests, but for "players" like Ookla that do keep some logs interspersing longer running tests would offer a great way to test ISPs outside the "magic 20 seconds".

> On Jan 9, 2023, at 16:26, Dave Taht via Starlink <starlink at lists.bufferbloat.net> wrote:
> 
> I have many kvetches about the new latency under load tests being
> designed and distributed over the past year. I am delighted! that they
> are happening, but most really need third party evaluation, and
> calibration, and a solid explanation of what network pathologies they
> do and don't cover. Also a RED team attitude towards them, as well as
> thinking hard about what you are not measuring (operations research).

	[SM] RED as in RED/BLUE team or as in random early detection? ;)

> 
> I actually rather love the new cloudflare speedtest, because it tests
> a single TCP connection, rather than dozens, and at the same time folk
> are complaining that it doesn't find the actual "speed!".

	[SM] Ookla's on-line test can be toggled between multi and single flow mode (which is good, the default is multi) but e.g. the official macos client application from Ookla does not offer this toggle and defaults to multi-flow (which is less good). Fast.com ca be configured for single flow tests, but defaults to multi-flow.

> yet... the
> test itself more closely emulates a user experience than speedtest.net
> does.

	[SM] I like the separate reporting for transfer rates for objects of different sizes. I would argue that both single and multi-flow tests have merit, but I agree with you that if only one test is performed a single-flow test seems somewhat better.

> I am personally pretty convinced that the fewer numbers of flows
> that a web page opens improves the likelihood of a good user
> experience, but lack data on it.
> 
> To try to tackle the evaluation and calibration part, I've reached out
> to all the new test designers in the hope that we could get together
> and produce a report of what each new test is actually doing.

	[SM] +1; and probably part of your questionaire already, what measures are actually reported back to the user.

> I've
> tweeted, linked in, emailed, and spammed every measurement list I know
> of, and only to some response, please reach out to other test designer
> folks and have them join the rpm email list?
> 
> My principal kvetches in the new tests so far are:
> 
> 0) None of the tests last long enough.
> 
> Ideally there should be a mode where they at least run to "time of
> first loss", or periodically, just run longer than the
> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
> there! It's really bad science to optimize the internet for 20
> seconds. It's like optimizing a car, to handle well, for just 20
> seconds.

	[SM] ++1

> 1) Not testing up + down + ping at the same time
> 
> None of the new tests actually test the same thing that the infamous
> rrul test does - all the others still test up, then down, and ping. It
> was/remains my hope that the simpler parts of the flent test suite -
> such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
> tests would provide calibration to the test designers.
> 
> we've got zillions of flent results in the archive published here:
> https://blog.cerowrt.org/post/found_in_flent/
> 
> The new tests have all added up + ping and down + ping, but not up +
> down + ping. Why??

	[SM] I think at least on Monterey Apple's networkQuality does bidirectional tests (I just confirmed that via packet-capture, but it is already visible in iftop (but hobbled by iftop's relative high default hysteresis)). You actually need to manually intervene to get a sequential test:

laptop:~ user$ networkQuality -h
USAGE: networkQuality [-C <configuration_url>] [-c] [-h] [-I <interfaceName>] [-s] [-v]
    -C: override Configuration URL
    -c: Produce computer-readable output
    -h: Show help (this message)
    -I: Bind test to interface (e.g., en0, pdp_ip0,...)
    -s: Run tests sequentially instead of parallel upload/download
    -v: Verbose output

laptop:~ user $ networkQuality -v
==== SUMMARY ====                                                                                         
Upload capacity: 194.988 Mbps
Download capacity: 894.162 Mbps
Upload flows: 16
Download flows: 12
Responsiveness: High (2782 RPM)
Base RTT: 8
Start: 1/9/23, 17:45:57
End: 1/9/23, 17:46:12
OS Version: Version 12.6.2 (Build 21G320)

laptop:~ user $ networkQuality -v -s
==== SUMMARY ====                                                                                         
Upload capacity: 641.206 Mbps
Download capacity: 883.787 Mbps
Upload flows: 16
Download flows: 12
Upload Responsiveness: High (3529 RPM)
Download Responsiveness: High (1939 RPM)
Base RTT: 8
Start: 1/9/23, 17:46:17
End: 1/9/23, 17:46:41
OS Version: Version 12.6.2 (Build 21G320)

(this is alas not my home link...)

> 
> The behaviors of what happens in that case are really non-intuitive, I
> know, but... it's just one more phase to add to any one of those new
> tests. I'd be deliriously happy if someone(s) new to the field
> started doing that, even optionally, and boggled at how it defeated
> their assumptions.

	[SM] Someone at Apple apparently listened ;)

> 
> Among other things that would show...
> 
> It's the home router industry's dirty secret than darn few "gigabit"
> home routers can actually forward in both directions at a gigabit.

	[SM] That is going to be remedied in the near future, the first batch of nominal Gigabit links were mostly asymmetric, e.g. often something like 1000/50 over DOCSIS or 1000/500 over GPON (reflecting the asymmetric nature of the these media in the field). But with symmetric XGS-PON being deployed by more and more (still a low absolute number) ISPs symmetric performance is going to move into the spot-light. However my guess is that the first few generations of home routers for these speedgrades will rely heavily on accelerator engines.

> I'd
> like to smash that perception thoroughly, but given our starting point
> is a gigabit router was a "gigabit switch" - and historically been
> something that couldn't even forward at 200Mbit - we have a long way
> to go there.
> 
> Only in the past year have non-x86 home routers appeared that could
> actually do a gbit in both directions.
> 
> 2) Few are actually testing within-stream latency
> 
> Apple's rpm project is making a stab in that direction. It looks
> highly likely, that with a little more work, crusader and
> go-responsiveness can finally start sampling the tcp RTT, loss and
> markings, more directly. As for the rest... sampling TCP_INFO on
> windows, and Linux, at least, always appeared simple to me, but I'm
> discovering how hard it is by delving deep into the rust behind
> crusader.

	[SM] I think go-responsiveness looks at TCP_INFO already (on request) but will report an aggregate info block over all flows, which can get interesting as in my testing I often see a mix of IPv4 and IPv6 flows within individual tests, with noticeably different numbers for e.g. MSS. (Yes, MSS is not what you are asking for here, but I think flent does it right by diligently reporting all such measures flow-by-flow, but that will explode pretty quickly if say a test uses 32/32 flows by direction).

> 
> the goresponsiveness thing is also IMHO running WAY too many streams
> at the same time, I guess motivated by an attempt to have the test
> complete quickly?

	[SM] I can only guess, but that goal is to saturate the link persistently (and getting to that sate fast) and for that goal parallel flows seem to be OK, especially as that will reduce the server load for each of these flows a bit, no?

> 
> B) To try and tackle the validation problem:
> 
> In the libreqos.io project we've established a testbed where tests can
> be plunked through various ISP plan network emulations. It's here:
> https://payne.taht.net (run bandwidth test for what's currently hooked
> up)
> 
> We could rather use an AS number and at least a ipv4/24 and ipv6/48 to
> leverage with that, so I don't have to nat the various emulations.
> (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
> to see more test designers setup a testbed like this to calibrate
> their own stuff.
> 
> Presently we're able to test:
> flent
> netperf
> iperf2
> iperf3
> speedtest-cli
> crusader
> the broadband forum udp based test:
> https://github.com/BroadbandForum/obudpst
> trexx
> 
> There's also a virtual machine setup that we can remotely drive a web
> browser from (but I didn't want to nat the results to the world) to
> test other web services.
> _______________________________________________
> Starlink mailing list
> Starlink at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink