[LibreQoS] [Starlink] [Rpm] Researchers Seeking Probe Volunteers in USA

Robert McMahon rjmcmahon at rjmcmahon.com
Thu Jan 12 12:49:59 EST 2023


Hi Sebastien,

You make a good point. What I did was have the tool issue a warning when it finds it is CPU limited rather than i/o limited, which indicates the test is likely inaccurate from an i/o perspective and the results are suspect. It does this crudely, by comparing the CPU thread doing stats against the traffic threads doing i/o, to see which thread is waiting on the others. There is no attempt to assess the CPU load itself; the singular purpose is to make sure the i/o threads only block on the write and read syscalls.
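To illustrate the idea only (a Python sketch with invented names, not iperf 2's actual C code): if a traffic thread's CPU time tracks wall time during the transfer, the thread was compute bound; if most wall time passed while blocked in write(), it was i/o bound.

    import socket
    import time

    def classify_sender(sock: socket.socket, payload: bytes, secs: float):
        """Blast payload for `secs` seconds, then guess whether the sender
        was CPU limited or i/o limited by comparing this thread's CPU
        time against wall time."""
        wall0 = time.monotonic()
        cpu0 = time.thread_time()     # CPU seconds used by this thread
        while time.monotonic() - wall0 < secs:
            sock.sendall(payload)     # blocks when the socket buffer fills
        wall = time.monotonic() - wall0
        cpu = time.thread_time() - cpu0
        if cpu / wall > 0.9:          # threshold is arbitrary here
            print(f"warning: CPU limited ({cpu:.1f}s cpu / {wall:.1f}s wall)")
        else:
            print(f"i/o limited ({cpu:.1f}s cpu / {wall:.1f}s wall)")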

I should probably revisit this, both in design and in implementation. Thanks for bringing it up; all input is truly appreciated.

Bob

On Jan 12, 2023, at 12:14 AM, Sebastian Moeller <moeller0 at gmx.de> wrote:
>Hi Bob,
>
>
>> On Jan 11, 2023, at 21:09, rjmcmahon <rjmcmahon at rjmcmahon.com> wrote:
>> 
>> Iperf 2 is designed to measure network i/o. Note: it doesn't have to
>> move large amounts of data. It can support data profiles that don't
>> drive TCP's CCA, for example.
>> 
>> Two things I've been asked for and avoided:
>> 
>> 1) Integrate clock sync into iperf's test traffic
>
>	[SM] This I understand; measurement conditions can be unsuited for
>tight time synchronization...
>
>
>> 2) Measure and output CPU usages
>
>	[SM] This one puzzles me. As far as I understand, the only way to
>properly diagnose network issues is to rule out other causes, like CPU
>overload, that can have symptoms similar to network issues. As an
>example, when CPU cycles become tight the cake qdisc first shows
>increased internal queueing and jitter (not by design; it is just an
>observation that once cake does not get access to the CPU as timely as
>it wants, queueing latency and variability increase) and later also
>shows reduced throughput. These are the same symptoms that can appear
>along an e2e network path for completely different reasons, e.g. lower
>level retransmissions or a variable rate link. So I would think that at
>least coarsely checking the CPU load would be within the scope of
>network testing tools, no?
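>	A minimal sketch of what "at least coarsely" could mean here (my
>code, Linux-only, reading /proc/stat around the test; the sleep stands
>in for the actual test run):
>
>    import time
>
>    def cpu_snapshot():
>        # first line of /proc/stat: aggregate jiffies per CPU state
>        with open("/proc/stat") as f:
>            fields = [int(x) for x in f.readline().split()[1:]]
>        return fields[3] + fields[4], sum(fields)  # idle+iowait, total
>
>    idle0, total0 = cpu_snapshot()
>    time.sleep(5)  # ... run the network test here instead ...
>    idle1, total1 = cpu_snapshot()
>    busy = 1.0 - (idle1 - idle0) / (total1 - total0)
>    print(f"CPU busy during test: {busy:.0%}")  # flag results if high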
>
>Regards
>	Sebastian
>
>
>
>
>> I think both of these are outside the scope of a tool designed to
>> test network i/o over sockets; rather, they should be developed &
>> validated independently of a network i/o tool.
>> 
>> Clock error really isn't about the amount/frequency of traffic but
>> rather about getting a periodic high-quality reference. I tend to lock
>> the local system oscillator to a GPS pulse-per-second (PPS) signal. As
>> David says, most every modern handheld computer already has the GPS
>> chips to do this. So to me it seems more a policy choice for data
>> center operators and device mfgs and less a technical issue.
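>>
>> For reference, a minimal chrony configuration for disciplining the
>> system clock this way might look like the following (the /dev/pps0
>> device and the gpsd-fed NMEA SHM segment are assumptions about the
>> local setup):
>>
>>     # coarse time-of-day from GPS NMEA sentences via gpsd's SHM
>>     refclock SHM 0 offset 0.2 delay 0.1 refid NMEA noselect
>>     # precise edges from the kernel PPS device, paired with NMEA time
>>     refclock PPS /dev/pps0 lock NMEA refid GPS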
>> 
>> Bob
>>> Hello,
>>> 	Y'all can call me crazy if you want... but see below [RWG]
>>>> Hi Bob,
>>>> > On Jan 9, 2023, at 20:13, rjmcmahon via Starlink
><starlink at lists.bufferbloat.net> wrote:
>>>> >
>>>> > My biggest barrier is the lack of clock sync by the devices, i.e.
>>>> > very limited support for PTP in data centers and in end devices.
>>>> > This limits the ability to measure one way delays (OWD), and most
>>>> > assume that OWD is 1/2 of RTT, which typically is a mistake. We
>>>> > know this intuitively from airplane flight times or even car
>>>> > commute times, where the one way time is not 1/2 of a round trip
>>>> > time. Google Maps & directions provide a time estimate for the one
>>>> > way trip; they don't compute a round trip and divide by two.
>>>> >
>>>> > For those that can get clock sync working, the iperf 2
>>>> > --trip-times option is useful.
>>>> 	[SM] +1; and yet even with unsynchronized clocks one can try to
>>>> measure how latency changes under load, and that can be done per
>>>> direction. Sure, this is far inferior to real, reliably measured
>>>> OWDs, but if life/the internet deals you lemons...
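>>>> For example, something like the following (flag spellings per
>>>> iperf 2.1; --reverse needs a recent version) runs each direction
>>>> separately, so one can watch the reported latencies shift under
>>>> load even if their absolute values carry an unknown clock offset:
>>>>
>>>>     # upstream (client -> server), one-second interval reports
>>>>     iperf -c server.example.com -e -i 1 -t 60 --trip-times
>>>>     # downstream (server -> client)
>>>>     iperf -c server.example.com -e -i 1 -t 60 --trip-times --reverse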
>>> [RWG] iperf2/iperf3, etc. are already moving large amounts of data
>>> back and forth; for that matter, so is any rate test. Why not abuse
>>> some of that data and add the fundamental NTP clock sync data,
>>> bidirectionally passing each other's concept of "current time"? IIRC
>>> (it's been 25 years since I worked on NTP at this level) you *should*
>>> be able to get a fairly accurate clock delta between each end, and
>>> then use that info and time stamps in the data stream to compute
>>> OWDs. You need to put 4 time stamps in the packet, and with that you
>>> can compute "offset".
>>>> >
>>>> > --trip-times
>>>> >  enable the measurement of end to end write to read latencies
>(client and server clocks must be synchronized)
>>> [RWG] --clock-skew
>>> 	enable the measurement of the wall clock difference between sender
>and receiver
>>>> 	[SM] Sweet!
>>>> Regards
>>>> 	Sebastian
>>>> >
>>>> > Bob
>>>> >> I have many kvetches about the new latency under load tests
>>>> >> being designed and distributed over the past year. I am delighted
>>>> >> that they are happening, but most really need third-party
>>>> >> evaluation and calibration, and a solid explanation of what
>>>> >> network pathologies they do and don't cover. They also need a RED
>>>> >> team attitude towards them, as well as hard thinking about what
>>>> >> they are not measuring (operations research).
>>>> >> I actually rather love the new cloudflare speedtest, because it
>>>> >> tests a single TCP connection rather than dozens. At the same
>>>> >> time folks are complaining that it doesn't find the actual
>>>> >> "speed!", yet the test itself more closely emulates a user
>>>> >> experience than speedtest.net does. I am personally pretty
>>>> >> convinced that the fewer flows a web page opens, the better the
>>>> >> likelihood of a good user experience, but I lack data on it.
>>>> >> To try to tackle the evaluation and calibration part, I've
>reached out
>>>> >> to all the new test designers in the hope that we could get
>together
>>>> >> and produce a report of what each new test is actually doing.
>I've
>>>> >> tweeted, linked in, emailed, and spammed every measurement list
>>>> >> I know of, with only some response. Please reach out to other
>>>> >> test designer folks and have them join the rpm email list.
>>>> >> My principal kvetches in the new tests so far are:
>>>> >> 0) None of the tests last long enough.
>>>> >> Ideally there should be a mode where they at least run to "time
>of
>>>> >> first loss", or periodically, just run longer than the
>>>> >> industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>>>> >> there! It's really bad science to optimize the internet for 20
>>>> >> seconds. It's like optimizing a car to handle well for just 20
>>>> >> seconds.
>>>> >> 1) Not testing up + down + ping at the same time
>>>> >> None of the new tests actually test the same thing that the
>infamous
>>>> >> rrul test does - all the others still test up, then down, and
>ping. It
>>>> >> was/remains my hope that the simpler parts of the flent test
>suite -
>>>> >> such as the tcp_up_squarewave tests, the rrul test, and the
>rtt_fair
>>>> >> tests would provide calibration to the test designers.
>>>> >> we've got zillions of flent results in the archive published
>here:
>>>> >> https://blog.cerowrt.org/post/found_in_flent/
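>>>> >> (For anyone wanting to try that calibration themselves, a
>>>> >> typical invocation against a netperf server of your own, here
>>>> >> the made-up netperf.example.com, is:
>>>> >>     flent rrul -p all_scaled -l 60 -H netperf.example.com -o rrul.png
>>>> >> which runs up + down + ping simultaneously for 60 seconds.)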
>>>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>> >
>>>> >> The new tests have all added up + ping and down + ping, but not
>up +
>>>> >> down + ping. Why??
>>>> >> The behaviors in that case are really non-intuitive, I know,
>>>> >> but... it's just one more phase to add to any one of those new
>>>> >> tests. I'd be deliriously happy if someone(s) new to the field
>>>> >> started doing that, even optionally, and boggled at how it
>defeated
>>>> >> their assumptions.
>>>> >> Among other things that would show...
>>>> >> It's the home router industry's dirty secret that darn few
>>>> >> "gigabit" home routers can actually forward in both directions at
>>>> >> a gigabit. I'd like to smash that perception thoroughly, but given
>>>> >> that our starting point, the "gigabit" router, was historically a
>>>> >> "gigabit switch" that couldn't even forward at 200Mbit, we have a
>>>> >> long way to go there.
>>>> >> Only in the past year have non-x86 home routers appeared that
>could
>>>> >> actually do a gbit in both directions.
>>>> >> 2) Few are actually testing within-stream latency
>>>> >> Apple's rpm project is making a stab in that direction. It looks
>>>> >> highly likely that, with a little more work, crusader and
>>>> >> go-responsiveness can finally start sampling the TCP RTT, loss,
>>>> >> and markings more directly. As for the rest... sampling TCP_INFO
>>>> >> on Windows and Linux, at least, always appeared simple to me, but
>>>> >> I'm discovering how hard it is by delving deep into the rust
>>>> >> behind crusader.
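>>>> >> On Linux the core of that sampling is one getsockopt away. A
>>>> >> sketch, assuming a Python build that exposes socket.TCP_INFO,
>>>> >> with struct offsets hard-coded for the classic tcp_info layout
>>>> >> (verify against your kernel's linux/tcp.h before trusting them):
>>>> >>
>>>> >>     import socket
>>>> >>     import struct
>>>> >>
>>>> >>     def tcp_rtt_us(sock: socket.socket) -> int:
>>>> >>         # 8 one-byte fields, then 32-bit counters; tcpi_rtt is
>>>> >>         # the 16th u32 (smoothed RTT in microseconds)
>>>> >>         raw = sock.getsockopt(socket.IPPROTO_TCP,
>>>> >>                               socket.TCP_INFO, 104)
>>>> >>         return struct.unpack("8B24I", raw)[8 + 15]
>>>> >>
>>>> >>     # sample it periodically while a transfer is running:
>>>> >>     #   print(tcp_rtt_us(sock), "us")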
>>>> >> The goresponsiveness test is also IMHO running WAY too many
>>>> >> streams at the same time, I guess motivated by an attempt to have
>>>> >> the test complete quickly?
>>>> >> B) To try and tackle the validation problem:
>>>> >> In the libreqos.io project we've established a testbed where
>tests can
>>>> >> be plunked through various ISP plan network emulations. It's
>here:
>>>> >> https://payne.taht.net (run bandwidth test for what's currently
>hooked
>>>> >> up)
>>>> >> We could rather use an AS number and at least an ipv4 /24 and an
>>>> >> ipv6 /48 to leverage with that, so I don't have to NAT the
>>>> >> various emulations. (and funding, anyone got funding?) Or, as the
>>>> >> code is GPLv2 licensed, we'd love to see more test designers set
>>>> >> up a testbed like this to calibrate their own stuff.
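>>>> >> (The basic shape of one of those plan emulations, if you want to
>>>> >> approximate one yourself on Linux, is an htb rate limiter with
>>>> >> netem delay behind it; the 100mbit/20ms numbers here are an
>>>> >> invented example plan:
>>>> >>     tc qdisc add dev eth0 root handle 1: htb default 10
>>>> >>     tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit
>>>> >>     tc qdisc add dev eth0 parent 1:10 handle 10: netem delay 20ms
>>>> >> )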
>>>> >> Presently we're able to test:
>>>> >> flent
>>>> >> netperf
>>>> >> iperf2
>>>> >> iperf3
>>>> >> speedtest-cli
>>>> >> crusader
>>>> >> the broadband forum udp based test:
>>>> >> https://github.com/BroadbandForum/obudpst
>>>> >> trexx
>>>> >> There's also a virtual machine setup that we can remotely drive
>a web
>>>> >> browser from (but I didn't want to nat the results to the world)
>to
>>>> >> test other web services.
>>>> >> _______________________________________________
>>>> >> Rpm mailing list
>>>> >> Rpm at lists.bufferbloat.net
>>>> >> https://lists.bufferbloat.net/listinfo/rpm
>>>> > _______________________________________________
>>>> > Starlink mailing list
>>>> > Starlink at lists.bufferbloat.net
>>>> > https://lists.bufferbloat.net/listinfo/starlink