[LibreQoS] [Rpm] [Starlink] Researchers Seeking Probe Volunteers in USA

Mon Jan 9 16:06:42 EST 2023

A peer likes gnuplot and sed. There are many, many visualization tools. 
An excerpt below:

My quick hack one-line parser was based on just a single line from the 
iperf output, not the entire log:

[  1] 0.00-1.00 sec T8-PDF: 
bin(w=1ms):cnt(849)=1:583,2:112,3:9,4:8,5:11,6:10,7:7,8:8,9:7,10:2,11:3,12:2,13:2,14:2,15:2,16:3,17:2,18:3,19:1,21:2,22:2,23:3,24:2,26:3,27:2,28:3,29:2,30:2,31:3,32:2,33:2,34:2,35:5,37:1,39:1,40:3,41:5,42:2,43:3,44:3,45:3,46:3,47:3,48:1,49:2,50:3,51:2,52:1,53:1 
(50.00/99.7/99.80/%=1/51/52,Outliers=0,obl/obu=0/0)

Your log contains 30 such histograms.  A very crude approach would be to 
filter only the lines that have T8-PDF:

plot "< sed -n '/T8-PDF/{s/.*)=//;s/ (.*//;s/,/\\n/g;s/:/ /g;p}' lat.txt" with lp

or

plot "< sed -n '/T8(f)-PDF/{s/.*)=//;s/ (.*//;s/,/\\n/g;s/:/ /g;p}' lat.txt" with lp
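
For anyone who would rather not shell out to sed, here is a minimal Python sketch of the same idea. It parses one T8-PDF line into (bin, count) pairs and reads percentiles off the cumulative counts. The function names are made up for illustration, and the percentile rule (first bin whose cumulative count reaches the threshold) is an assumption; it happens to reproduce the 1/51/52 figures printed at the end of the sample line.

```python
import re

# The single T8-PDF line quoted above, rejoined into one string.
SAMPLE = (
    "[  1] 0.00-1.00 sec T8-PDF: bin(w=1ms):cnt(849)="
    "1:583,2:112,3:9,4:8,5:11,6:10,7:7,8:8,9:7,10:2,11:3,12:2,13:2,14:2,"
    "15:2,16:3,17:2,18:3,19:1,21:2,22:2,23:3,24:2,26:3,27:2,28:3,29:2,"
    "30:2,31:3,32:2,33:2,34:2,35:5,37:1,39:1,40:3,41:5,42:2,43:3,44:3,"
    "45:3,46:3,47:3,48:1,49:2,50:3,51:2,52:1,53:1 "
    "(50.00/99.7/99.80/%=1/51/52,Outliers=0,obl/obu=0/0)"
)


def parse_pdf_line(line):
    """Return (declared_count, [(bin, count), ...]), or None if the line
    carries no histogram."""
    match = re.search(r"cnt\((\d+)\)=([\d:,]+)", line)
    if match is None:
        return None
    pairs = []
    for item in match.group(2).split(","):
        bin_index, count = item.split(":")
        pairs.append((int(bin_index), int(count)))
    return int(match.group(1)), pairs


def percentile_bin(pairs, pct):
    """First bin whose cumulative count reaches pct percent of all samples
    (assumed rule, see the note above)."""
    total = sum(count for _, count in pairs)
    threshold = pct / 100.0 * total
    running = 0
    for bin_index, count in pairs:
        running += count
        if running >= threshold:
            return bin_index
    return pairs[-1][0]


declared, pairs = parse_pdf_line(SAMPLE)
print(declared, sum(count for _, count in pairs))
print([percentile_bin(pairs, p) for p in (50.0, 99.7, 99.8)])
```

To handle a whole log, apply parse_pdf_line to every line containing T8-PDF and skip the None results.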

http://www.gnuplot.info/

Bob

> On Mon, Jan 9, 2023 at 12:46 PM rjmcmahon <rjmcmahon at rjmcmahon.com> 
> wrote:
>> 
>> The write to read latencies (OWD) are on the server side in CLT form.
>> Use --histograms on the server side to enable them.
> 
> Thx. It is far more difficult to instrument things on the server side
> of the testbed but we will tackle it.
> 
>> Your client side sampled TCP RTT is 6ms with less than 1 ms of
>> variance (or sqrt of variance, as variance is typically squared). No
>> retries suggest the network isn't dropping packets.
> 
> Thank you for analyzing that result. The cake AQM, set for a 5ms
> target, with RFC3168-style ECN, is enabled on this path, on this
> setup, at the moment. So the result is correct.
> 
> A second test with ecn off showed the expected retries.
> 
> I have emulations also of fifos, pie, fq-pie, fq-codel, red, blue,
> sfq, with various realworld delays, and so on... but this is a bit
> distracting at the moment from our focus, which was in optimizing the
> XDP + ebpf based bridge and epping based sampling tools to crack
> 25Gbit.
> 
> I think iperf2 will be great for us after that settles down.
> 
>> All the newer bounceback code is only in master and requires a compile
>> from source. It will be released in 2.1.9 after testing cycles,
>> hopefully in early March 2023.
> 
> I would like to somehow parse and present those histograms.
>> 
>> Bob
>> 
>> https://sourceforge.net/projects/iperf2/
>> 
>> > The DC that so graciously loaned us 3 machines for the testbed (thx
>> > equinix!), does support ptp, but we have not configured it yet. In ntp
>> > tests between these hosts we seem to be within 500us, and certainly
>> > 50us would be great, in the future.
>> >
>> > I note that in all my kvetching about the new tests' needing
>> > validation today... I kind of elided that I'm pretty happy with
>> > iperf2's new tests that landed last august, and are now appearing in
>> > linux package managers around the world. I hope more folk use them.
>> > (sorry robert, it's been a long time since last august!)
>> >
>> > Our new testbed has multiple setups. In one setup - basically the
>> > machine name is equal to a given ISP plan, and a key testing point is
>> > looking at the differences between the FCC 25-3 and 100/20 plans in
>> > the real world. However at our scale (25gbit) it turned out that
>> > emulating the delay realistically has been problematic.
>> >
>> > Anyway, here's a 25/3 result for iperf (other results and iperf test
>> > type requests gladly accepted)
>> >
>> > root at lqos:~# iperf -6 --trip-times -c c25-3 -e -i 1
>> > ------------------------------------------------------------
>> > Client connecting to c25-3, TCP port 5001 with pid 2146556 (1 flows)
>> > Write buffer size: 131072 Byte
>> > TOS set to 0x0 (Nagle on)
>> > TCP window size: 85.3 KByte (default)
>> > ------------------------------------------------------------
>> > [  1] local fd77::3%bond0.4 port 59396 connected with fd77::1:2 port 5001 (trip-times) (sock=3) (icwnd/mss/irtt=13/1428/948) (ct=1.10 ms) on 2023-01-09 20:13:37 (UTC)
>> > [ ID] Interval            Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT(var)        NetPwr
>> > [  1] 0.0000-1.0000 sec  3.25 MBytes  27.3 Mbits/sec  26/0          0      19K/6066(262) us  562
>> > [  1] 1.0000-2.0000 sec  3.00 MBytes  25.2 Mbits/sec  24/0          0      15K/4671(207) us  673
>> > [  1] 2.0000-3.0000 sec  3.00 MBytes  25.2 Mbits/sec  24/0          0      13K/5538(280) us  568
>> > [  1] 3.0000-4.0000 sec  3.12 MBytes  26.2 Mbits/sec  25/0          0      16K/6244(355) us  525
>> > [  1] 4.0000-5.0000 sec  3.00 MBytes  25.2 Mbits/sec  24/0          0      19K/6152(216) us  511
>> > [  1] 5.0000-6.0000 sec  3.00 MBytes  25.2 Mbits/sec  24/0          0      22K/6764(529) us  465
>> > [  1] 6.0000-7.0000 sec  3.12 MBytes  26.2 Mbits/sec  25/0          0      15K/5918(605) us  554
>> > [  1] 7.0000-8.0000 sec  3.00 MBytes  25.2 Mbits/sec  24/0          0      18K/5178(327) us  608
>> > [  1] 8.0000-9.0000 sec  3.00 MBytes  25.2 Mbits/sec  24/0          0      19K/5758(473) us  546
>> > [  1] 9.0000-10.0000 sec  3.00 MBytes  25.2 Mbits/sec  24/0          0      16K/6141(280) us  512
>> > [  1] 0.0000-10.0952 sec  30.6 MBytes  25.4 Mbits/sec  245/0          0      19K/5924(491) us  537
>> >
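
A quick arithmetic check on the iperf table above, as a sketch. The per-interval write counts, RTTs and NetPwr values are copied from that output, and 131072 bytes is the reported write buffer size. The NetPwr formula used here (bytes per second divided by RTT in seconds, scaled by 1e-6) is inferred from these numbers and should be treated as an assumption, not as iperf's documented definition.

```python
WRITE_BYTES = 131072  # "Write buffer size" reported by the client

# (writes, rtt_us, rttvar_us, reported_netpwr) for the ten 1-second intervals
INTERVALS = [
    (26, 6066, 262, 562),
    (24, 4671, 207, 673),
    (24, 5538, 280, 568),
    (25, 6244, 355, 525),
    (24, 6152, 216, 511),
    (24, 6764, 529, 465),
    (25, 5918, 605, 554),
    (24, 5178, 327, 608),
    (24, 5758, 473, 546),
    (24, 6141, 280, 512),
]


def net_power(writes, rtt_us, interval_s=1.0):
    """Assumed definition: (bytes/sec) / (RTT in seconds), scaled by 1e-6."""
    bytes_per_sec = writes * WRITE_BYTES / interval_s
    return bytes_per_sec / (rtt_us * 1e-6) * 1e-6


mean_rtt_us = sum(rtt for _, rtt, _, _ in INTERVALS) / len(INTERVALS)
max_rttvar_us = max(var for _, _, var, _ in INTERVALS)
computed = [round(net_power(writes, rtt)) for writes, rtt, _, _ in INTERVALS]
reported = [netpwr for _, _, _, netpwr in INTERVALS]

print(mean_rtt_us, max_rttvar_us)
print(computed == reported)
```

The mean RTT comes out a little under 6 ms and the largest RTT(var) value is 605 us, in line with the reading of a roughly 6 ms RTT with under 1 ms of variation.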
>> >
>> > On Mon, Jan 9, 2023 at 11:13 AM rjmcmahon <rjmcmahon at rjmcmahon.com>
>> > wrote:
>> >>
>> >> My biggest barrier is the lack of clock sync by the devices, i.e. very
>> >> limited support for PTP in data centers and in end devices. This
>> >> limits the ability to measure one way delays (OWD), and most assume
>> >> that OWD is 1/2 of RTT, which typically is a mistake. We know this
>> >> intuitively with airplane flight times or even car commute times,
>> >> where the one way time is not 1/2 a round trip time. Google maps &
>> >> directions provide a time estimate for the one way link. It doesn't
>> >> compute a round trip and divide by two.
>> >>
>> >> For those that can get clock sync working, the iperf 2 --trip-times
>> >> option is useful.
>> >>
>> >> --trip-times
>> >>    enable the measurement of end to end write to read latencies
>> >>    (client and server clocks must be synchronized)
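
A small worked example of why both points matter, as a sketch only. The 4500/1500 us path split is hypothetical, and the 500 us clock offset is just the NTP agreement figure mentioned elsewhere in this thread; the function names are made up for illustration.

```python
def rtt_half_estimate_us(forward_us, reverse_us):
    """OWD estimate obtained by halving the round trip time."""
    return (forward_us + reverse_us) / 2


def measured_owd_us(true_owd_us, receiver_clock_offset_us):
    """Write-to-read latency computed from a sender timestamp and a receiver
    timestamp when the receiver's clock is ahead by the given offset."""
    return true_owd_us + receiver_clock_offset_us


def measured_rtt_us(forward_us, reverse_us, receiver_clock_offset_us):
    """Both RTT timestamps come from the sender's clock, so the offset that
    inflates the forward leg deflates the reverse leg and cancels out."""
    forward_seen = forward_us + receiver_clock_offset_us
    reverse_seen = reverse_us - receiver_clock_offset_us
    return forward_seen + reverse_seen


FORWARD_US, REVERSE_US = 4500, 1500  # hypothetical asymmetric path, 6 ms RTT
OFFSET_US = 500                      # illustrative clock error

print(rtt_half_estimate_us(FORWARD_US, REVERSE_US))        # RTT/2 guess
print(measured_owd_us(FORWARD_US, OFFSET_US))              # OWD with clock error
print(measured_rtt_us(FORWARD_US, REVERSE_US, OFFSET_US))  # RTT, offset-free
```

On this made-up path RTT/2 understates the forward delay by 1.5 ms, and the unsynchronized clock adds its full offset to the measured OWD while leaving the RTT untouched.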
>> >>
>> >> Bob
>> >> > I have many kvetches about the new latency under load tests being
>> >> > designed and distributed over the past year. I am delighted! that they
>> >> > are happening, but most really need third party evaluation, and
>> >> > calibration, and a solid explanation of what network pathologies they
>> >> > do and don't cover. Also a RED team attitude towards them, as well as
>> >> > thinking hard about what you are not measuring (operations research).
>> >> >
>> >> > I actually rather love the new cloudflare speedtest, because it tests
>> >> > a single TCP connection, rather than dozens, and at the same time folk
>> >> > are complaining that it doesn't find the actual "speed!". yet... the
>> >> > test itself more closely emulates a user experience than speedtest.net
>> >> > does. I am personally pretty convinced that the fewer flows a web
>> >> > page opens, the better the likelihood of a good user experience,
>> >> > but I lack data on it.
>> >> >
>> >> > To try to tackle the evaluation and calibration part, I've reached out
>> >> > to all the new test designers in the hope that we could get together
>> >> > and produce a report of what each new test is actually doing. I've
>> >> > tweeted, linked in, emailed, and spammed every measurement list I know
>> >> > of, with only some response. Could you please reach out to other test
>> >> > designer folks and have them join the rpm email list?
>> >> >
>> >> > My principal kvetches in the new tests so far are:
>> >> >
>> >> > 0) None of the tests last long enough.
>> >> >
>> >> > Ideally there should be a mode where they at least run to "time of
>> >> > first loss", or periodically, just run longer than the
>> >> > industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>> >> > there! It's really bad science to optimize the internet for 20
>> >> > seconds. It's like optimizing a car, to handle well, for just 20
>> >> > seconds.
>> >> >
>> >> > 1) Not testing up + down + ping at the same time
>> >> >
>> >> > None of the new tests actually test the same thing that the infamous
>> >> > rrul test does - all the others still test up, then down, and ping. It
>> >> > was/remains my hope that the simpler parts of the flent test suite -
>> >> > such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair
>> >> > tests would provide calibration to the test designers.
>> >> >
>> >> > we've got zillions of flent results in the archive published here:
>> >> > https://blog.cerowrt.org/post/found_in_flent/
>> >> > ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>
>> >> > The new tests have all added up + ping and down + ping, but not up +
>> >> > down + ping. Why??
>> >> >
>> >> > The behaviors of what happens in that case are really non-intuitive, I
>> >> > know, but... it's just one more phase to add to any one of those new
>> >> > tests. I'd be deliriously happy if someone(s) new to the field
>> >> > started doing that, even optionally, and boggled at how it defeated
>> >> > their assumptions.
>> >> >
>> >> > Among other things that would show...
>> >> >
>> >> > It's the home router industry's dirty secret that darn few "gigabit"
>> >> > home routers can actually forward in both directions at a gigabit. I'd
>> >> > like to smash that perception thoroughly, but given that our starting
>> >> > point is that a gigabit router was a "gigabit switch" - and has
>> >> > historically been something that couldn't even forward at 200Mbit -
>> >> > we have a long way to go there.
>> >> >
>> >> > Only in the past year have non-x86 home routers appeared that could
>> >> > actually do a gbit in both directions.
>> >> >
>> >> > 2) Few are actually testing within-stream latency
>> >> >
>> >> > Apple's rpm project is making a stab in that direction. It looks
>> >> > highly likely, that with a little more work, crusader and
>> >> > go-responsiveness can finally start sampling the tcp RTT, loss and
>> >> > markings, more directly. As for the rest... sampling TCP_INFO on
>> >> > windows, and Linux, at least, always appeared simple to me, but I'm
>> >> > discovering how hard it is by delving deep into the rust behind
>> >> > crusader.
>> >> >
>> >> > the goresponsiveness thing is also IMHO running WAY too many streams
>> >> > at the same time, I guess motivated by an attempt to have the test
>> >> > complete quickly?
>> >> >
>> >> > B) To try and tackle the validation problem:
>> >> >
>> >> > ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>
>> >> >
>> >> > In the libreqos.io project we've established a testbed where tests can
>> >> > be plunked through various ISP plan network emulations. It's here:
>> >> > https://payne.taht.net (run bandwidth test for what's currently hooked
>> >> > up)
>> >> >
>> >> > We could rather use an AS number and at least an ipv4/24 and ipv6/48 to
>> >> > leverage with that, so I don't have to nat the various emulations.
>> >> > (and funding, anyone got funding?) Or, as the code is GPLv2 licensed,
>> >> > I'd like to see more test designers set up a testbed like this to
>> >> > calibrate their own stuff.
>> >> >
>> >> > Presently we're able to test:
>> >> > flent
>> >> > netperf
>> >> > iperf2
>> >> > iperf3
>> >> > speedtest-cli
>> >> > crusader
>> >> > the broadband forum udp based test:
>> >> > https://github.com/BroadbandForum/obudpst
>> >> > trexx
>> >> >
>> >> > There's also a virtual machine setup that we can remotely drive a web
>> >> > browser from (but I didn't want to nat the results to the world) to
>> >> > test other web services.
>> >> > _______________________________________________
>> >> > Rpm mailing list
>> >> > Rpm at lists.bufferbloat.net
>> >> > https://lists.bufferbloat.net/listinfo/rpm