From: Dave Taht
Date: Wed, 9 Jun 2021 06:15:24 -0700
To: starlink@lists.bufferbloat.net
Cc: davecb.42@gmail.com
Subject: [Starlink] Fwd: Microstate Accounting and the Nyquist problem
Message-Id: <950B8EAF-90B9-41A6-951D-91821F591D41@teklibre.net>
List-Id: "Starlink has bufferbloat. Bad."

Begin forwarded message:

From: David Collier-Brown <davecb.42@gmail.com>
Subject: Microstate Accounting and the Nyquist problem
Date: June 9, 2021 at 4:44:14 AM PDT
To: Dave Taht <davet@teklibre.net>
Cc: Dave Collier-Brown <dave.collier-brown@indexexchange.com>
Reply-To: davecb@spamcop.net


A million years ago (roughly around Solaris 9), Sun was suffering from the same problems in measuring their dispatcher as you are with "sloshing".

A CPU would be 100% busy in one microsecond, 10% busy in the next gazillion, and the average CPU utilization for our sample period would be maybe 10.1%, if the sampler happened to sample right when the spike was happening.

This was utterly useless for things like the fair-share scheduler, so it got fixed in Solaris 10, by having the dispatcher record the time a process (well, kernel thread) had spent in a state when the state changed.
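As an illustration of the state-change bookkeeping described above, here is a minimal user-space sketch in Python. It is not the Solaris dispatcher code: the class name `MicrostateClock`, the state names, and the explicit tick timestamps are all hypothetical, chosen so the example is deterministic.

```python
from collections import defaultdict


class MicrostateClock:
    """Accumulate the exact time spent in each state (hypothetical sketch)."""

    def __init__(self, initial_state, now):
        self.state = initial_state
        self.entered_at = now            # timestamp of the last transition
        self.time_in = defaultdict(int)  # state -> accumulated ticks

    def change_state(self, new_state, now):
        # Charge the elapsed interval to the state we are leaving.
        self.time_in[self.state] += now - self.entered_at
        self.state = new_state
        self.entered_at = now

    def utilization(self, state, now):
        # Include the still-open interval of the current state.
        total = dict(self.time_in)
        total[self.state] = total.get(self.state, 0) + (now - self.entered_at)
        elapsed = sum(total.values())
        return total.get(state, 0) / elapsed if elapsed else 0.0


# A 1-tick burst of "run" inside a 1000-tick window (arbitrary tick units).
clock = MicrostateClock("idle", now=0)
clock.change_state("run", now=500)
clock.change_state("idle", now=501)
busy = clock.utilization("run", now=1000)

# A sampler that looks only at ticks 0, 100, 200, ... happens to land on
# the burst and charges a whole sample interval to it.
sampled_busy = sum(1 for t in range(0, 1000, 100) if 500 <= t < 501) / 10
```

In this toy run the accounting reports 0.1% busy, while the sampler reports 10%. Had the burst started at tick 505 instead, the same sampler would have reported 0%.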

Initially "microstate accounting" could be toggled on and off, but the branch-around cost more time than always doing the calculation (as discovered by my mad friend Fred) and the kernel folks left it on. It's on to this day.

In Simon Sundberg's talk, the opportunity to measure occurs every 1,000 packets, when a suitable timestamp is provided. While the eBPF program can look at every packet and do after-the-fact book-keeping in a map, that's only good if the phenomenon you're measuring is persistent enough that it's around for ~2,000 packets.
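To make the persistence argument concrete, the following Python sketch counts how many measurement opportunities fall inside an event of a given length when a measurement is possible only on every 1,000th packet. The function names `samples_seen` and `worst_case` and the packet numbering are hypothetical, and the model ignores everything about the real eBPF program except the sampling interval.

```python
def samples_seen(event_start, event_len, interval=1000):
    """Count sample points (packets 0, interval, 2*interval, ...) that fall
    inside the half-open packet range [event_start, event_start + event_len)."""
    end = event_start + event_len
    first = -(-event_start // interval)   # ceil(event_start / interval)
    last = (end - 1) // interval          # index of the last sample before `end`
    return max(0, last - first + 1)


def worst_case(event_len, interval=1000):
    # Try every phase of the event relative to the sampling grid.
    return min(samples_seen(s, event_len, interval) for s in range(interval))


short = worst_case(500)    # 0: the event can be missed entirely
one = worst_case(1000)     # 1: always seen, but only once
two = worst_case(2000)     # 2: always seen at least twice
```

Under this simplified model, an event has to last about two sampling intervals before it is guaranteed to show up in two consecutive measurements, which is one way to read the ~2,000-packet figure.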

I'm going to suggest that the right place to record the information you want is right where the event happens. Preferably in C code, as performance is easy to mess up, but perhaps with an eBPF mechanism to export it.

In previous Solaris work, I reliably found that exporting kstats was a darn sight harder than collecting them, and in Eric's blog post[1] he notes that converting time is expensive and best done long after collecting, when someone wants to read the data.
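The point about deferring time conversion can be sketched as follows: the hot path stores only raw counter ticks, and the scaling to nanoseconds happens in the reader. This is a Python illustration of the idea, not the kstat interface; `RawStat`, `record`, `read_ns`, and the 2.5 GHz tick rate are hypothetical names and values.

```python
class RawStat:
    """Keep raw tick counts on the hot path; convert only when read."""

    def __init__(self, ticks_per_second):
        self.ticks_per_second = ticks_per_second
        self.ticks = 0  # accumulated raw ticks, no unit conversion

    def record(self, start_tick, end_tick):
        # Hot path: one subtraction and one addition, nothing else.
        self.ticks += end_tick - start_tick

    def read_ns(self):
        # Cold path: integer scaling to nanoseconds, done once per read.
        return self.ticks * 1_000_000_000 // self.ticks_per_second


stat = RawStat(ticks_per_second=2_500_000_000)  # assumed constant 2.5 GHz counter
stat.record(1_000, 3_500)
stat.record(10_000, 12_500)
elapsed_ns = stat.read_ns()
```

The sketch assumes the counter runs at a constant rate; a counter whose frequency changes while intervals are being accumulated would need more care than a single scale factor.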

There was an effort to do kstats in Linux[2], but it had supposedly poor performance, and actual trouble when the clock frequency changed.

Is there, in your opinion, a "natural" place to capture state changes to get the data you want, and if so, is it common or similar enough between drivers to be worthy of attention?

--dave


References:

  1. Solaris: http://dtrace.org/blogs/eschrock/2004/10/13/microstate-accounting-in-solaris-10/
  2. A failing Linux effort: https://lwn.net/Articles/127296/, https://sourceforge.net/projects/microstate/
-- 
David Collier-Brown,         | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           |                      -- Mark Twain
