From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dave.taht@gmail.com>
MIME-Version: 1.0
References: <CAA93jw6XoFWcRS6FyNyncXykfr=V1Pmy2tn20B7LHbHJZoXa4w@mail.gmail.com>
 <BBC12E1D-0661-4112-9F67-9E60BBF587C2@apple.com>
 <CALQXh-OzB7wzD2imA=_gXOKmOXRfrTEopVfbNaUwa441DZDAJw@mail.gmail.com>
In-Reply-To: <CALQXh-OzB7wzD2imA=_gXOKmOXRfrTEopVfbNaUwa441DZDAJw@mail.gmail.com>
From: Dave Taht <dave.taht@gmail.com>
Date: Tue, 11 Jan 2022 11:44:29 -0800
Message-ID: <CAA93jw5Q5J-rK5s7v--Yx81rieyPyp7QCFeco_p=i7Dxxt+Q4A@mail.gmail.com>
To: Aaron Wood <woody77@gmail.com>
Cc: Christoph Paasch <cpaasch@apple.com>, Rpm <rpm@lists.bufferbloat.net>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Rpm] apm metric - annoyance per minute
X-BeenThere: rpm@lists.bufferbloat.net
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: revolutions per minute - a new metric for measuring responsiveness
 <rpm.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/rpm>,
 <mailto:rpm-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/rpm>
List-Post: <mailto:rpm@lists.bufferbloat.net>
List-Help: <mailto:rpm-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/rpm>,
 <mailto:rpm-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Tue, 11 Jan 2022 19:44:43 -0000

On Tue, Jan 11, 2022 at 9:51 AM Aaron Wood <woody77@gmail.com> wrote:
>
> I read it as the number of events per minute (not say the number of
> frames longer than 20ms, but the number of events that took at least 20ms
> longer than the base RTT, which I think is what Dave meant by "latency
> excursion").

Yes, thanks for reading my tea leaves. The big thing to me was counting
"annoyances" or "glitches" per minute, somewhat in line with rpm's concept.

Hey, this network does 2500RPM but with .5APM:
https://blog.cerowrt.org/post/disabling_channel_scans/

On my holiday trip, mostly staying in cheap hotels, not *one* hotel
out of 6 could sustain a quality videoconference: 10-20 GPM, even though
web pages and Netflix loaded fine.

>
> E.g. if you're doing DNS queries, and they usually return in 50ms, and 3
> of them take 83, 94, and 106 ms respectively, in a given minute, then that
> would be an APM of 3?  (or maybe an APM rate of 30% if you were doing
> 10/minute)

Yes. You count the excursions from the (semi-smoothed) baseline, not
the size of the excursion. A "glitch" happened.
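
To make the counting rule concrete, here is a minimal sketch in Python.
It is only an illustration, not a spec: the function names, the EWMA
used as the "semi-smoothed" baseline, the alpha value, and the choice to
leave excursions out of the baseline are all assumptions of mine. The
20ms threshold is the one from the start of this thread, and the sample
data is the DNS example quoted above.

```python
def count_glitches(rtts_ms, threshold_ms=20.0, alpha=0.1):
    """Count latency excursions above a semi-smoothed baseline.

    Assumption: the baseline is an EWMA seeded with the first sample and
    updated only by samples that are not excursions, so a glitch does
    not drag the baseline upward.
    """
    baseline = None
    glitches = 0
    for rtt in rtts_ms:
        if baseline is None:
            baseline = rtt  # first sample seeds the baseline
            continue
        if rtt - baseline > threshold_ms:
            glitches += 1  # an excursion happened; its size is ignored
        else:
            baseline += alpha * (rtt - baseline)
    return glitches


def glitches_per_minute(glitches, duration_s):
    """Normalize a glitch count to a per-minute figure."""
    return glitches / (duration_s / 60.0)


# Ten DNS lookups in one minute: seven at 50 ms, three slow ones.
samples = [50, 50, 83, 50, 50, 94, 50, 50, 106, 50]
glitches = count_glitches(samples)
print(glitches, glitches_per_minute(glitches, 60.0))  # 3 3.0
```

Dividing the count by the number of samples instead (3 of 10) gives the
30% rate mentioned in the quoted question.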

> I've found the tricky thing for metrics is sorting out the event-count
> vs. events-rate differences.  A lot of tests that are isochronous give a
> constant event-rate to base on (e.g. ping's default of once per second),
> but other tests, like the UDP and ICMP pings in flent (at least in the
> past), have a rate that's based on the RTT, so as RTT goes up, the rate of
> events goes down, which means that it oversamples the "fast" events, and
> undersamples "slow" events.

The original rrul spec called for an isochronous, VoIP-like flow here
rather than a ping RTT test, which has the annoying flaws you describe
above. We now have isochronous measurement in the irtt tool, but most of
our tests still use ping. At the time (when we were shooting for
reductions of latency from seconds to 10s of ms) using "ping" wasn't as
much of a problem as it is today. We need an rrul_v2, plus tcp_nup and
tcp_ndown tests, that just do isochronous flows at a fixed (and ideally
high) frequency. irtt works well to about 3ms, and the opus codec can do
2.7ms.
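
A toy simulation of that sampling bias, as a rough sketch only. The path
model (50 ms RTT with a 10-second stretch at 500 ms), the 50 ms probe
interval, and all the function names are made up for illustration; none
of this is flent or irtt code.

```python
def rtt_ms_at(t_ms):
    """Hypothetical path: 50 ms RTT, except 500 ms from t=20 s to t=30 s."""
    return 500 if 20_000 <= t_ms < 30_000 else 50


def isochronous_samples(duration_ms, interval_ms):
    """Probe on a fixed clock, regardless of how long replies take."""
    return [rtt_ms_at(t) for t in range(0, duration_ms, interval_ms)]


def rtt_paced_samples(duration_ms):
    """Probe only after the previous reply has arrived."""
    samples, t = [], 0
    while t < duration_ms:
        rtt = rtt_ms_at(t)
        samples.append(rtt)
        t += rtt  # the next probe waits for this reply
    return samples


def slow_fraction(samples, cutoff_ms=100):
    """Fraction of samples above the cutoff."""
    return sum(1 for s in samples if s > cutoff_ms) / len(samples)


iso = isochronous_samples(60_000, 50)
paced = rtt_paced_samples(60_000)
print(len(iso), round(slow_fraction(iso), 3))
print(len(paced), round(slow_fraction(paced), 3))
```

In this model the path is slow for one sixth of the minute. The
isochronous probe reports one sixth of its samples as slow, while the
RTT-paced probe collects only 20 slow samples out of 1020, so the bad
stretch nearly vanishes from any per-sample statistic.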

> Further, retries for failed events muddy the waters, as the events aren't
> independent measurements.  If a momentary drop in connectivity causes
> retries to happen, and each failed retry is counted, is that N failures?
> Or just 1 failure?  I've split those out as separate metrics in some
> systems I've built, so that I can tease them apart.  I've also done things
> like the distribution (histogram) of "attempts before success" or
> "attempts before operation failed".  Usually those are dominated by "1
> attempt before success", and "N attempts before operation failed" where N
> is the number of total attempts before just giving up.

Histograms are great. I kind of wanted to separate the concepts: that a
"glitch" happened, the glitch duration (so X retries turns into a
duration rather than a count), and (sigh) whether the glitch mattered or
not. It doesn't matter to a web page if you have a 250ms RTO on one flow
when the page takes 3sec to load anyway.
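
One way to turn a run of retries into a single glitch with a duration,
sketched in Python. This is just one possible reading: the function
name, the (timestamp, RTT) sample format with None for a lost probe, and
the fixed baseline are assumptions for illustration.

```python
def glitch_events(samples, baseline_ms, threshold_ms=20.0):
    """Merge consecutive bad samples into (start_s, duration_s) events.

    samples: list of (t_s, rtt_ms) tuples, with rtt_ms=None for a lost
    or failed attempt. A sample is "bad" if it was lost or exceeded the
    baseline by more than threshold_ms. A glitch lasts from its first
    bad sample until the next good one.
    """
    events = []
    start = None
    last_bad = None
    for t_s, rtt_ms in samples:
        bad = rtt_ms is None or rtt_ms - baseline_ms > threshold_ms
        if bad:
            if start is None:
                start = t_s  # a new glitch begins
            last_bad = t_s
        elif start is not None:
            events.append((start, t_s - start))  # glitch ended here
            start = None
    if start is not None:
        # Still glitching when the data ran out: close at the last bad sample.
        events.append((start, last_bad - start))
    return events


samples = [(0.0, 50), (1.0, 50), (2.0, None), (3.0, None), (4.0, None),
           (5.0, 52), (6.0, 95), (7.0, 50)]
print(glitch_events(samples, baseline_ms=50))  # [(2.0, 3.0), (6.0, 1.0)]
```

Here three failed attempts in a row count as one glitch lasting 3
seconds, not three failures. Whether the glitch mattered is a separate,
application-level question that this sketch does not try to answer.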

Glitches matter more for videoconferencing and gaming. I don't know if
there is any human factors research on this, but once I find my flow
in an application, a 20ms "glitch" is roughly as annoying as a
3-second-long one.

>
> On Tue, Jan 11, 2022 at 9:34 AM Christoph Paasch via Rpm <rpm@lists.bufferbloat.net> wrote:
>>
>> Hi Dave!
>>
>> > On Jan 9, 2022, at 6:57 PM, Dave Taht via Rpm <rpm@lists.bufferbloat.net> wrote:
>> >
>> > or gpm - glitch per minute
>> >
>> > defined as a latency excursion of more than 20ms.
>>
>> I kinda find that interesting :) Can you give an example? Would it count
>> the number of times we miss a "20ms-deadline"? So, if the RTT is 100ms,
>> GPM would be 5 ?
>>
>>
>> Christoph
>>
>> >
>> > ?
>> >
>> >
>> > --
>> > I tried to build a better future, a few times:
>> > https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org
>> >
>> > Dave Täht CEO, TekLibre, LLC
>> > _______________________________________________
>> > Rpm mailing list
>> > Rpm@lists.bufferbloat.net
>> > https://lists.bufferbloat.net/listinfo/rpm
>>


-- 
I tried to build a better future, a few times:
https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org

Dave Täht CEO, TekLibre, LLC