Date: Mon, 09 Jan 2023 13:06:42 -0800
From: rjmcmahon
To: Dave Taht
Cc: "Livingood, Jason", Rpm, mike.reynolds@netforecast.com, libreqos,
 "David P. Reed", starlink@lists.bufferbloat.net, bloat
Subject: Re: [Rpm] [Starlink] Researchers Seeking Probe Volunteers in USA

A peer likes gnuplot and sed. There are many, many visualization tools.
An excerpt below:

My quick hack one-line parser was based on just a single line from the
iperf output, not the entire log:

[ 1] 0.00-1.00 sec T8-PDF: bin(w=1ms):cnt(849)=1:583,2:112,3:9,4:8,5:11,6:10,7:7,8:8,9:7,10:2,11:3,12:2,13:2,14:2,15:2,16:3,17:2,18:3,19:1,21:2,22:2,23:3,24:2,26:3,27:2,28:3,29:2,30:2,31:3,32:2,33:2,34:2,35:5,37:1,39:1,40:3,41:5,42:2,43:3,44:3,45:3,46:3,47:3,48:1,49:2,50:3,51:2,52:1,53:1 (50.00/99.7/99.80/%=1/51/52,Outliers=0,obl/obu=0/0)

Your log contains 30 such histograms. A very crude approach would be to
filter only the lines that have T8-PDF:

plot "< sed -n '/T8-PDF/{s/.*)=//;s/ (.*//;s/,/\\n/g;s/:/ /g;p}' lat.txt" with lp

or

plot "< sed -n '/T8(f)-PDF/{s/.*)=//;s/ (.*//;s/,/\\n/g;s/:/ /g;p}' lat.txt" with lp

http://www.gnuplot.info/

Bob

> On Mon, Jan 9, 2023 at 12:46 PM rjmcmahon wrote:
>>
>> The write to read latencies (OWD) are on the server side in CLT form.
>> Use --histograms on the server side to enable them.
>
> Thx. It is far more difficult to instrument things on the server side
> of the testbed, but we will tackle it.
>
>> Your client side sampled TCP RTT is 6 ms with less than 1 ms of
>> variation (strictly, the square root of the variance, since variance
>> itself is in squared units). No retries suggests the network isn't
>> dropping packets.
>
> Thank you for analyzing that result. The cake aqm, set for a 5 ms
> target, with RFC3168-style ECN, is enabled on this path, on this
> setup, at the moment. So the result is correct.
>
> A second test with ecn off showed the expected retries.
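
Coming back to the parsing question at the top of this mail: if the
inline sed feels too cryptic, here is a rough standalone awk sketch
that does the same extraction - untested, and it assumes exactly the
T8-PDF log layout shown above, with lat.txt and hist.dat as placeholder
file names:

awk '/T8-PDF/ {
  s = $0
  sub(/.*\)=/, "", s)    # drop everything through "cnt(...)="
  sub(/ \(.*/, "", s)    # drop the trailing "(...)" summary block
  n = split(s, bins, ",")
  for (i = 1; i <= n; i++) {
    split(bins[i], kv, ":")
    print kv[1], kv[2]   # latency bin (ms) and its count
  }
}' lat.txt > hist.dat

gnuplot -p -e "plot 'hist.dat' with lp"

Same idea as the sed one-liner, just easier to extend, e.g. to divide
each count by the cnt() total and plot a proper PDF.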
>
> I have emulations also of fifos, pie, fq-pie, fq-codel, red, blue,
> sfq, with various real-world delays, and so on... but this is a bit
> of a distraction at the moment from our focus, which is optimizing
> the XDP + ebpf based bridge and epping based sampling tools to crack
> 25Gbit.
>
> I think iperf2 will be great for us after that settles down.
>
>> All the newer bounceback code is only in master and requires a
>> compile from source. It will be released in 2.1.9 after testing
>> cycles, hopefully in early March 2023.
>
> I would like to somehow parse and present those histograms.
>
>> Bob
>>
>> https://sourceforge.net/projects/iperf2/
>>
>> > The DC that so graciously loaned us 3 machines for the testbed
>> > (thx Equinix!) does support ptp, but we have not configured it
>> > yet. In ntp tests between these hosts we seem to be within 500us,
>> > and certainly 50us would be great, in the future.
>> >
>> > I note that in all my kvetching about the new tests' needing
>> > validation today... I kind of elided that I'm pretty happy with
>> > iperf2's new tests that landed last August and are now appearing
>> > in Linux package managers around the world. I hope more folk use
>> > them. (sorry Robert, it's been a long time since last August!)
>> >
>> > Our new testbed has multiple setups. In one setup the machine name
>> > corresponds to a given ISP plan, and a key testing point is
>> > looking at the differences between the FCC 25/3 and 100/20 plans
>> > in the real world. However, at our scale (25Gbit) it turned out
>> > that emulating the delay realistically is problematic.
>> >
>> > Anyway, here's a 25/3 result for iperf (other results and iperf
>> > test type requests gladly accepted):
>> >
>> > root@lqos:~# iperf -6 --trip-times -c c25-3 -e -i 1
>> > ------------------------------------------------------------
>> > Client connecting to c25-3, TCP port 5001 with pid 2146556 (1 flows)
>> > Write buffer size: 131072 Byte
>> > TOS set to 0x0 (Nagle on)
>> > TCP window size: 85.3 KByte (default)
>> > ------------------------------------------------------------
>> > [ 1] local fd77::3%bond0.4 port 59396 connected with fd77::1:2 port 5001 (trip-times) (sock=3) (icwnd/mss/irtt=13/1428/948) (ct=1.10 ms) on 2023-01-09 20:13:37 (UTC)
>> > [ ID] Interval            Transfer     Bandwidth       Write/Err  Rtry  Cwnd/RTT(var)     NetPwr
>> > [ 1] 0.0000-1.0000  sec   3.25 MBytes  27.3 Mbits/sec  26/0       0     19K/6066(262) us  562
>> > [ 1] 1.0000-2.0000  sec   3.00 MBytes  25.2 Mbits/sec  24/0       0     15K/4671(207) us  673
>> > [ 1] 2.0000-3.0000  sec   3.00 MBytes  25.2 Mbits/sec  24/0       0     13K/5538(280) us  568
>> > [ 1] 3.0000-4.0000  sec   3.12 MBytes  26.2 Mbits/sec  25/0       0     16K/6244(355) us  525
>> > [ 1] 4.0000-5.0000  sec   3.00 MBytes  25.2 Mbits/sec  24/0       0     19K/6152(216) us  511
>> > [ 1] 5.0000-6.0000  sec   3.00 MBytes  25.2 Mbits/sec  24/0       0     22K/6764(529) us  465
>> > [ 1] 6.0000-7.0000  sec   3.12 MBytes  26.2 Mbits/sec  25/0       0     15K/5918(605) us  554
>> > [ 1] 7.0000-8.0000  sec   3.00 MBytes  25.2 Mbits/sec  24/0       0     18K/5178(327) us  608
>> > [ 1] 8.0000-9.0000  sec   3.00 MBytes  25.2 Mbits/sec  24/0       0     19K/5758(473) us  546
>> > [ 1] 9.0000-10.0000 sec   3.00 MBytes  25.2 Mbits/sec  24/0       0     16K/6141(280) us  512
>> > [ 1] 0.0000-10.0952 sec   30.6 MBytes  25.4 Mbits/sec  245/0      0     19K/5924(491) us  537
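
One note for readers on that last column, since it trips people up:
NetPwr is network power, defined in iperf 2 as throughput divided by
delay, here using the sampled RTT. If I have the scaling right, the
first interval above checks out as:

NetPwr = throughput / RTT
       = 3.25 MBytes/sec / 6066 us
       = 3407872 bytes/sec / 0.006066 sec
       ~= 562e6, printed as 562 after 1e-6 scaling

So a falling NetPwr means less throughput per unit of delay - a
one-number way to spot bloat in these runs.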
>> > On Mon, Jan 9, 2023 at 11:13 AM rjmcmahon wrote:
>> >>
>> >> My biggest barrier is the lack of clock sync by the devices, i.e.
>> >> very limited support for PTP in data centers and in end devices.
>> >> This limits the ability to measure one way delays (OWD), and most
>> >> assume that OWD is 1/2 of RTT, which typically is a mistake. We
>> >> know this intuitively from airplane flight times, or even car
>> >> commute times, where the one way time is not 1/2 the round trip
>> >> time. Google maps & directions provide a time estimate for the
>> >> one way link; they don't compute a round trip and divide by two.
>> >>
>> >> For those that can get clock sync working, the iperf 2
>> >> --trip-times option is useful.
>> >>
>> >> --trip-times
>> >>   enable the measurement of end to end write to read latencies
>> >>   (client and server clocks must be synchronized)
>> >>
>> >> Bob
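
To make that concrete for anyone trying it at home, the minimal
pairing looks something like the below (the server host name is a
placeholder, and as noted above the newest bounceback features need a
build from master):

# server side: enable the write-to-read latency histograms
iperf -s -e -i 1 --histograms

# client side: stamp each write so the server can compute OWD
iperf -c <server> --trip-times -e -i 1

Without synchronized clocks the OWD numbers are meaningless, so check
PTP/NTP state first.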
>> >> > I have many kvetches about the new latency under load tests
>> >> > being designed and distributed over the past year. I am
>> >> > delighted! that they are happening, but most really need third
>> >> > party evaluation, and calibration, and a solid explanation of
>> >> > what network pathologies they do and don't cover. Also a RED
>> >> > team attitude towards them, as well as thinking hard about what
>> >> > you are not measuring (operations research).
>> >> >
>> >> > I actually rather love the new cloudflare speedtest, because it
>> >> > tests a single TCP connection, rather than dozens, and at the
>> >> > same time folk are complaining that it doesn't find the actual
>> >> > "speed!". Yet... the test itself more closely emulates a user
>> >> > experience than speedtest.net does. I am personally pretty
>> >> > convinced that the fewer flows a web page opens, the better the
>> >> > chance of a good user experience, but I lack data on it.
>> >> >
>> >> > To try to tackle the evaluation and calibration part, I've
>> >> > reached out to all the new test designers in the hope that we
>> >> > could get together and produce a report of what each new test
>> >> > is actually doing. I've tweeted, linked in, emailed, and
>> >> > spammed every measurement list I know of, with only some
>> >> > response so far. Please reach out to other test designer folks
>> >> > and have them join the rpm email list!
>> >> >
>> >> > My principal kvetches in the new tests so far are:
>> >> >
>> >> > 0) None of the tests last long enough.
>> >> >
>> >> > Ideally there should be a mode where they at least run to "time
>> >> > of first loss", or periodically, just run longer than the
>> >> > industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons
>> >> > there! It's really bad science to optimize the internet for 20
>> >> > seconds. It's like optimizing a car, to handle well, for just
>> >> > 20 seconds.
>> >> >
>> >> > 1) Not testing up + down + ping at the same time.
>> >> >
>> >> > None of the new tests actually test the same thing that the
>> >> > infamous rrul test does - all the others still test up, then
>> >> > down, and ping. It was/remains my hope that the simpler parts
>> >> > of the flent test suite - such as the tcp_up_squarewave tests,
>> >> > the rrul test, and the rtt_fair tests - would provide
>> >> > calibration to the test designers.
>> >> >
>> >> > We've got zillions of flent results in the archive published
>> >> > here: https://blog.cerowrt.org/post/found_in_flent/
>> >>
>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>> >>
>> >> > The new tests have all added up + ping and down + ping, but not
>> >> > up + down + ping. Why??
>> >> >
>> >> > The behaviors of what happens in that case are really
>> >> > non-intuitive, I know, but... it's just one more phase to add
>> >> > to any one of those new tests. I'd be deliriously happy if
>> >> > someone(s) new to the field started doing that, even
>> >> > optionally, and boggled at how it defeated their assumptions.
>> >> >
>> >> > Among other things that would show...
>> >> >
>> >> > It's the home router industry's dirty secret that darn few
>> >> > "gigabit" home routers can actually forward in both directions
>> >> > at a gigabit. I'd like to smash that perception thoroughly, but
>> >> > given that our starting point was a "gigabit" router that was
>> >> > really a "gigabit switch" - historically something that
>> >> > couldn't even forward at 200Mbit - we have a long way to go
>> >> > there.
>> >> >
>> >> > Only in the past year have non-x86 home routers appeared that
>> >> > could actually do a gbit in both directions.
>> >> >
>> >> > 2) Few are actually testing within-stream latency.
>> >> >
>> >> > Apple's rpm project is making a stab in that direction. It
>> >> > looks highly likely that, with a little more work, crusader and
>> >> > go-responsiveness can finally start sampling the tcp RTT, loss
>> >> > and markings more directly. As for the rest... sampling
>> >> > TCP_INFO on Windows, and Linux, at least, always appeared
>> >> > simple to me, but I'm discovering how hard it is by delving
>> >> > deep into the rust behind crusader.
>> >> >
>> >> > The goresponsiveness thing is also IMHO running WAY too many
>> >> > streams at the same time, I guess motivated by an attempt to
>> >> > have the test complete quickly?
>> >> >
>> >> > B) To try and tackle the validation problem:
>> >> >
>> >> > In the libreqos.io project we've established a testbed where
>> >> > tests can be plunked through various ISP plan network
>> >> > emulations. It's here: https://payne.taht.net (run bandwidth
>> >> > test for what's currently hooked up)
>> >> >
>> >> > We could rather use an AS number and at least an ipv4 /24 and
>> >> > an ipv6 /48 to leverage with that, so I don't have to nat the
>> >> > various emulations. (and funding, anyone got funding?) Or, as
>> >> > the code is GPLv2 licensed, to see more test designers set up a
>> >> > testbed like this to calibrate their own stuff.
>> >> >
>> >> > Presently we're able to test:
>> >> > flent
>> >> > netperf
>> >> > iperf2
>> >> > iperf3
>> >> > speedtest-cli
>> >> > crusader
>> >> > the broadband forum udp based test:
>> >> > https://github.com/BroadbandForum/obudpst
>> >> > trexx
>> >> >
>> >> > There's also a virtual machine setup that we can remotely drive
>> >> > a web browser from (but I didn't want to nat the results to the
>> >> > world) to test other web services.
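
One last aside on the TCP_INFO point above: on Linux, a zero-code way
to sample a flow's kernel TCP state from the side is ss from iproute2.
A rough sketch, with the destination address as a placeholder (the
output field layout varies a bit by kernel):

watch -n 0.5 "ss -ti dst 10.0.0.1 | grep -o ' rtt:[^ ]*'"

That polls the kernel's srtt/rttvar for matching flows twice a second.
It isn't within-stream instrumentation, but it's handy for sanity
checking what tools like crusader report.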