From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from vsmx002.dclux.xion.oxcs.net (vsmx002.dclux.xion.oxcs.net [185.74.65.108]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.bufferbloat.net (Postfix) with ESMTPS id 909F13B2A4 for ; Fri, 11 Jun 2021 18:34:45 -0400 (EDT)
Received: from proxy-2.proxy.oxio.ns.xion.oxcs.net (proxy-2.proxy.oxio.ns.xion.oxcs.net [31.4.191.24]) by mx-out.dclux.xion.oxcs.net (Postfix) with SMTP id 3F5F28C0D8F; Fri, 11 Jun 2021 22:34:42 +0000 (UTC)
Date: Sat, 12 Jun 2021 00:34:31 +0200
From: Mike Puchol
To: starlink@lists.bufferbloat.net, davecb@spamcop.net
Message-ID:
In-Reply-To: <01a7bed2-6f49-3d7d-eb5a-209031ee8070@gmail.com>
References: <950B8EAF-90B9-41A6-951D-91821F591D41@teklibre.net> <01a7bed2-6f49-3d7d-eb5a-209031ee8070@gmail.com>
X-Readdle-Message-ID: a1b4d233-6c4d-47b2-b43d-e72acc84c23f@Spark
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="60c3e500_519b500d_3067"
X-VadeSecure-Status: LEGIT
X-VADE-STATUS: LEGIT
Subject: Re: [Starlink] Fwd: Microstate Accounting and the Nyquist problem
X-BeenThere: starlink@lists.bufferbloat.net
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: "Starlink has bufferbloat. Bad."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 11 Jun 2021 22:34:45 -0000

--60c3e500_519b500d_3067
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

We know that Starlink recalculates topology every 15 seconds (this guy, who obviously has way too much spare time, came up with an indirect observation of this interval: https://blog.beerriot.com/2021/02/14/starlink-raster-scan/ )

If we could align with this, we could at least know when potential changes in path delays happen, and try to observe other changes that happen at a similar cadence.

Other thoughts: try to pull more details out of the gRPC data, set up GPS-synced probes with a device at the exit PoP, and measure differences between time-sync probes to an array of endpoints.

Has nobody attacked the JTAG connector on a Dishy yet?

Best,

Mike
On Jun 12, 2021, 00:14 +0200, David Collier-Brown , wrote:
> OK, Oh Smarter Colleagues, the challenge to you is to say if there is a "natural" place to capture state changes to get the data we want, and if so, is it common or similar enough between drivers to be worthy of attention?
> --dave
> On 2021-06-09 9:15 a.m., Dave Taht wrote:
> >
> >
> > > Begin forwarded message:
> > >
> > > From: David Collier-Brown
> > > Subject: Microstate Accounting and the Nyquist problem
> > > Date: June 9, 2021 at 4:44:14 AM PDT
> > > To: Dave Taht
> > > Cc: Dave Collier-Brown
> > > Reply-To: davecb@spamcop.net
> > >
> > > A million years ago (roughly around Solaris 9), Sun was suffering from the same problems in measuring their dispatcher as you are with "sloshing".
> > > A CPU would be 100% busy in one microsecond, 10% busy in the next gazillion, and the average CPU utilization for our sample period would be maybe 10.1, if the sampler happened to sample right when the spike was happening.
> > > This was utterly useless for things like the fair-share scheduler, so it got fixed in Solaris 10, by having the dispatcher record the time a process (well, kernel thread) had spent in a state when the state changed.
> > > Initially "microstate accounting" could be toggled on and off, but the branch-around cost more time than always doing the calculation (as discovered by my mad friend Fred) and the kernel folks left it on. It's on to this day.
> > > In Simon Sundberg's talk, the opportunity to measure occurs every 1,000 packets, when a suitable timestamp is provided. While the eBPF program can look at every packet and do after-the-fact book-keeping in a map, that's only good if the phenomenon you're measuring is persistent enough that it's around for ~2,000 packets.
> > > I'm going to suggest that the right place to record the information you want is right where the event happens. Preferably in C code, as performance is easy to mess up, but perhaps with an eBPF mechanism to export it.
> > > In previous Solaris work, I reliably found that exporting kstats was a darn sight harder than collecting them, and in Eric's blog post[1] he notes that converting time is expensive and best done long after collecting, when someone wanted to read the data.
> > > There was an effort to do kstats in Linux[2], but it had supposedly poor performance, and actual trouble when the clock frequency changed.
> > > Is there, in your opinion, a "natural" place to capture state changes to get the data you want, and if so, is it common or similar enough between drivers to be worthy of attention?
> > > --dave
> > >
> > > References:
> > >
> > > 1. Solaris: http://dtrace.org/blogs/eschrock/2004/10/13/microstate-accounting-in-solaris-10/
> > > 2. A failing Linux effort: https://lwn.net/Articles/127296/, https://sourceforge.net/projects/microstate/
> > >
> > > -- 
> > > David Collier-Brown,         | Always do right. This will gratify
> > > System Programmer and Author | some people and astonish the rest
> > > davecb@spamcop.net           |                      -- Mark Twain
> >
> _______________________________________________
> Starlink mailing list
> Starlink@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/starlink

--60c3e500_519b500d_3067
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

We know that Starlink recalculates topology every 15 seconds (this guy, who obviously has way too much spare time, came up with an indirect observation of this interval: https://blog.beerriot.com/2021/02/14/starlink-raster-scan/ )

If we could align with this, we could at least know when potential changes in path delays happen, and try to observe other changes that happen at a similar cadence.
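As a rough illustration of what "aligning" could look like, the sketch below buckets latency samples into 15-second slots so that step changes between slots stand out. The function names, the `phase` parameter (the unknown offset of the slot boundary relative to the probe clock), and the sample values are assumptions made up for the example, not anything Starlink exposes.

```python
from collections import defaultdict
from statistics import median

PERIOD_S = 15.0  # reconfiguration interval quoted in this thread


def slot_of(t, phase=0.0, period=PERIOD_S):
    """Return (slot index, seconds into the slot) for timestamp t.

    `phase` is a hypothetical offset of the slot boundary; it would have
    to be estimated from observations.
    """
    slot, offset = divmod(t - phase, period)
    return int(slot), offset


def median_rtt_per_slot(samples, phase=0.0, period=PERIOD_S):
    """Group (timestamp_s, rtt_ms) samples by slot and take each slot's median."""
    buckets = defaultdict(list)
    for t, rtt in samples:
        buckets[slot_of(t, phase, period)[0]].append(rtt)
    return {slot: median(rtts) for slot, rtts in sorted(buckets.items())}


# Invented samples: RTT steps up when the second slot starts.
samples = [(0.5, 40.0), (7.0, 42.0), (14.9, 41.0),
           (15.1, 55.0), (22.0, 57.0), (29.0, 56.0)]
per_slot = median_rtt_per_slot(samples)
print(per_slot)
```

Sweeping `phase` over the 15-second range and picking the value that maximizes the difference between adjacent slots would be one way to estimate where the boundary actually falls.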

Other thoughts: try to pull more details out of the gRPC data, set up GPS-synced probes with a device at the exit PoP, and measure differences between time-sync probes to an array of endpoints.

Has nobody attacked the JTAG connector on a Dishy yet?

Best,

Mike
On Jun 12, 2021, 00:14 +0200, David Collier-Brown <davecb.42@gmail.com>, wrote:

OK, Oh Smarter Colleagues, the challenge to you is to say if there is a "natural" place to capture state changes to get the data we want, and if so, is it common or similar enough between drivers to be worthy of attention?

--dave

On 2021-06-09 9:15 a.m., Dave Taht wrote:


Begin forwarded message:

From: David Collier-Brown <davecb.42@gmail.com>
Subject: Microstate Accounting and the Nyquist problem
Date: June 9, 2021 at 4:44:14 AM PDT
To: Dave Taht <davet@teklibre.net>
Cc: Dave Collier-Brown <dave.collier-brown@indexexchange.com>

A million years ago (roughly around Solaris 9), Sun was suffering from the same problems in measuring their dispatcher as you are with "sloshing".

A CPU would be 100% busy in one microsecond, 10% busy in the next gazillion, and the average CPU utilization for our sample period would be maybe 10.1, if the sampler happened to sample right when the spike was happening.

This was utterly useless for things like the fair-share scheduler, so it got fixed in Solaris 10, by having the dispatcher record the time a process (well, kernel thread) had spent in a state when the state changed.
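To make the idea concrete, here is a minimal sketch of accounting at the state change rather than by sampling. It is not the Solaris implementation; the class name, the state names, and the injected fake clock are all assumptions chosen so the example runs deterministically.

```python
import time


class MicrostateAccount:
    """Accumulate time per state, charged at the moment of each transition."""

    def __init__(self, initial_state, clock=time.monotonic_ns):
        self._clock = clock          # injectable so the example can use a fake clock
        self._state = initial_state
        self._since = clock()        # when the current state was entered
        self.ns_in_state = {}        # state name -> accumulated nanoseconds

    def transition(self, new_state):
        now = self._clock()
        # Charge the elapsed interval to the state being left.
        elapsed = now - self._since
        self.ns_in_state[self._state] = self.ns_in_state.get(self._state, 0) + elapsed
        self._state = new_state
        self._since = now

    def utilization(self, state):
        """Fraction of all accounted time spent in `state`."""
        total = sum(self.ns_in_state.values())
        return self.ns_in_state.get(state, 0) / total if total else 0.0


# Fake clock: a 1 microsecond busy spike inside a 10 ms window.
ticks = iter([0, 5_000_000, 5_001_000, 10_000_000])
acct = MicrostateAccount("idle", clock=lambda: next(ticks))
acct.transition("busy")   # idle from 0 to 5,000,000 ns
acct.transition("idle")   # busy for 1,000 ns
acct.transition("busy")   # idle again until 10,000,000 ns
print(acct.ns_in_state, acct.utilization("busy"))
```

The spike is recorded exactly, however short it is, because the bookkeeping happens on the transition itself; a periodic sampler would either miss it or overweight it.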

Initially "microstate accounting" could be toggled on and off, but the branch-around cost more time than always doing the calculation (as discovered by my mad friend Fred) and the kernel folks left it on. It's on to this day.

In Simon Sundberg's talk, the opportunity to measure occurs every 1,000 packets, when a suitable timestamp is provided. While the eBPF program can look at every packet and do after-the-fact book-keeping in a map, that's only good if the phenomenon you're measuring is persistent enough that it's around for ~2,000 packets.

I'm going to suggest that the right place to record the information you want is right where the event happens. Preferably in C code, as performance is easy to mess up, but perhaps with an eBPF mechanism to export it.

In previous Solaris work, I reliably found that exporting kstats was a darn sight harder than collecting them, and in Eric's blog post[1] he notes that converting time is expensive and best done long after collecting, when someone wanted to read the data.
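The "convert late" point can be sketched as follows: the hot path only adds raw integer ticks, and conversion to seconds happens when somebody reads the counters. The class and counter names are hypothetical, and the fixed `ticks_per_second` is a simplifying assumption for the example.

```python
class RawTickCounter:
    """Keep counters in raw clock ticks; convert to seconds only on read."""

    def __init__(self, ticks_per_second):
        # Assumed constant here; a varying clock frequency would need more care.
        self.ticks_per_second = ticks_per_second
        self.raw = {}  # counter name -> accumulated raw ticks

    def add(self, name, ticks):
        # Hot path: a single integer addition, no unit conversion.
        self.raw[name] = self.raw.get(name, 0) + ticks

    def read_seconds(self):
        # Cold path: pay for the division only when the data is wanted.
        return {name: t / self.ticks_per_second for name, t in self.raw.items()}


counter = RawTickCounter(ticks_per_second=1_000_000_000)
counter.add("queued", 1_500_000)
counter.add("queued", 500_000)
counter.add("sent", 250_000_000)
print(counter.read_seconds())
```

Keeping the stored values in raw ticks also leaves the choice of output unit to the reader of the data rather than to the collector.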

There was an effort to do kstats in Linux[2], but it had supposedly poor performance, and actual trouble when the clock frequency changed.

Is there, in your opinion, a "natural" place to capture state changes to get the data you want, and if so, is it common or similar enough between drivers to be worthy of attention?

--dave


References:

  1. Solaris: http://dtrace.org/blogs/eschrock/2004/10/13/microstate-accounting-in-solaris-10/
  2. A failing Linux effort: https://lwn.net/Articles/127296/, https://sourceforge.net/projects/microstate/
-- 
David Collier-Brown,         | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           |                      -- Mark Twain

_______________________________________________
Starlink mailing list
Starlink@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/starlink
--60c3e500_519b500d_3067--