revolutions per minute - a new metric for measuring responsiveness
* [Rpm] apm metric - annoyance per minute
@ 2022-01-10  2:57 Dave Taht
  2022-01-11 17:34 ` Christoph Paasch
  2022-01-11 21:22 ` Simon Leinen
  0 siblings, 2 replies; 6+ messages in thread
From: Dave Taht @ 2022-01-10  2:57 UTC (permalink / raw)
  To: Rpm

or gpm - glitch per minute

defined as a latency excursion of more than 20ms.

?


-- 
I tried to build a better future, a few times:
https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org

Dave Täht CEO, TekLibre, LLC


* Re: [Rpm] apm metric - annoyance per minute
  2022-01-10  2:57 [Rpm] apm metric - annoyance per minute Dave Taht
@ 2022-01-11 17:34 ` Christoph Paasch
  2022-01-11 17:50   ` Aaron Wood
  2022-01-11 21:22 ` Simon Leinen
  1 sibling, 1 reply; 6+ messages in thread
From: Christoph Paasch @ 2022-01-11 17:34 UTC (permalink / raw)
  To: Dave Taht; +Cc: Rpm

Hi Dave!

> On Jan 9, 2022, at 6:57 PM, Dave Taht via Rpm <rpm@lists.bufferbloat.net> wrote:
> 
> or gpm - glitch per minute
> 
> defined as a latency excursion of more than 20ms.

I kinda find that interesting :) Can you give an example? Would it count the number of times we miss a "20ms-deadline"? So, if the RTT is 100ms, GPM would be 5?


Christoph




* Re: [Rpm] apm metric - annoyance per minute
  2022-01-11 17:34 ` Christoph Paasch
@ 2022-01-11 17:50   ` Aaron Wood
  2022-01-11 19:44     ` Dave Taht
  0 siblings, 1 reply; 6+ messages in thread
From: Aaron Wood @ 2022-01-11 17:50 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: Dave Taht, Rpm


I read it as the number of events per minute (not, say, the number of frames
longer than 20ms, but the number of events that took at least 20ms longer
than the base RTT, which I think is what Dave meant by "latency excursion").

E.g. if you're doing DNS queries, and they usually return in 50ms, and 3 of
them take 83, 94, and 106 ms respectively in a given minute, then that
would be an APM of 3? (Or maybe an APM rate of 30% if you were doing
10/minute.)
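A minimal sketch of the count-vs-rate reading of the DNS example, assuming a fixed, known baseline of 50ms and treating a glitch as any sample more than 20ms above it. The seven "normal" query times are hypothetical filler; only 83, 94, and 106 come from the example above, and `count_glitches`/`glitch_rate` are illustrative names.

```python
# Assumption: baseline is fixed and known; a glitch is any sample that is
# more than GLITCH_THRESHOLD_MS above it.
GLITCH_THRESHOLD_MS = 20.0


def count_glitches(samples_ms, baseline_ms, threshold_ms=GLITCH_THRESHOLD_MS):
    """Return how many samples exceed baseline + threshold (the APM count)."""
    return sum(1 for s in samples_ms if s - baseline_ms > threshold_ms)


def glitch_rate(samples_ms, baseline_ms, threshold_ms=GLITCH_THRESHOLD_MS):
    """Return the fraction of samples that were glitches (the APM rate)."""
    if not samples_ms:
        return 0.0
    return count_glitches(samples_ms, baseline_ms, threshold_ms) / len(samples_ms)


# One minute of 10 DNS queries: 7 hypothetical normal ones plus the 3 slow ones.
minute = [50, 51, 49, 50, 52, 48, 50, 83, 94, 106]
print(count_glitches(minute, baseline_ms=50))  # 3
print(glitch_rate(minute, baseline_ms=50))     # 0.3
```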

I've found the tricky thing for metrics is sorting out the event-count vs.
events-rate differences.  A lot of tests that are isochronous give a
constant event-rate to base on (e.g. ping's default of once per second),
but other tests, like the UDP and ICMP pings in flent (at least in the
past), have a rate that's based on the RTT, so as RTT goes up, the rate of
events goes down, which means that it oversamples the "fast" events, and
undersamples "slow" events.
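To make the sampling bias concrete, here is a sketch under a hypothetical scenario (not from the thread): 50 s at 10ms RTT followed by 10 s at 500ms RTT. An RTT-paced prober is assumed to send its next probe the instant a reply arrives; an isochronous one sends once per second regardless.

```python
# Hypothetical scenario: (duration_ms, rtt_ms) phases.
PHASES = [(50_000, 10), (10_000, 500)]


def rtt_paced_samples(phases):
    """Next probe goes out as soon as the previous reply arrives."""
    samples = []
    for duration_ms, rtt_ms in phases:
        samples.extend([rtt_ms] * (duration_ms // rtt_ms))
    return samples


def isochronous_samples(phases, interval_ms=1000):
    """One probe every interval_ms, independent of the RTT."""
    samples = []
    for duration_ms, rtt_ms in phases:
        samples.extend([rtt_ms] * (duration_ms // interval_ms))
    return samples


def slow_fraction(samples, slow_ms=100):
    """Fraction of samples that landed in the slow phase."""
    return sum(1 for s in samples if s >= slow_ms) / len(samples)


paced = rtt_paced_samples(PHASES)
iso = isochronous_samples(PHASES)
print(len(paced), slow_fraction(paced))  # 5020 samples, only ~0.4% slow
print(len(iso), slow_fraction(iso))      # 60 samples, ~16.7% slow
```

The slow phase is one sixth of the wall-clock time, which the isochronous prober reflects and the RTT-paced one hides.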

Further, retries for failed events muddy the waters, as the events aren't
independent measurements.  If a momentary drop in connectivity causes
retries to happen, and each failed retry is counted, is that N failures?
Or just 1 failure?  I've split those out as separate metrics in some
systems I've built, so that I can tease them apart.  I've also done things
like the distribution (histogram) of "attempts before success" or "attempts
before operation failed".  Usually those are dominated by "1 attempt before
success", and "N attempts before operation failed" where N is the number of
total attempts before just giving up.
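A sketch of splitting those metrics apart, assuming each operation is recorded as a list of attempt outcomes (True = success) and that an operation which never succeeds gave up after its last listed attempt. The data and the `summarize` helper are hypothetical.

```python
from collections import Counter


def summarize(operations):
    # Counting every failed attempt turns one outage into N failures.
    failed_attempts = sum(attempts.count(False) for attempts in operations)
    # Counting per operation reports that same outage as 1 failure.
    failed_operations = sum(1 for attempts in operations if not any(attempts))
    # Histogram of "attempts before success".
    before_success = Counter(
        attempts.index(True) + 1 for attempts in operations if any(attempts)
    )
    # Histogram of "attempts before operation failed".
    before_failure = Counter(
        len(attempts) for attempts in operations if not any(attempts)
    )
    return failed_attempts, failed_operations, before_success, before_failure


ops = [
    [True],                 # succeeded on the first try
    [True],
    [False, True],          # one retry
    [False, False, False],  # momentary outage: gave up after 3 attempts
]
fa, fo, ok_hist, fail_hist = summarize(ops)
print(fa, fo)           # 4 1
print(dict(ok_hist))    # {1: 2, 2: 1}
print(dict(fail_hist))  # {3: 1}
```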




* Re: [Rpm] apm metric - annoyance per minute
  2022-01-11 17:50   ` Aaron Wood
@ 2022-01-11 19:44     ` Dave Taht
  2022-01-11 22:03       ` Dave Taht
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Taht @ 2022-01-11 19:44 UTC (permalink / raw)
  To: Aaron Wood; +Cc: Christoph Paasch, Rpm

On Tue, Jan 11, 2022 at 9:51 AM Aaron Wood <woody77@gmail.com> wrote:
>
> I read it as the number of events per minute (not, say, the number of frames longer than 20ms, but the number of events that took at least 20ms longer than the base RTT, which I think is what Dave meant by "latency excursion").

Yes, thx for reading my tea leaves. The big thing to me was
"annoyance" or "glitch" per minute, somewhat in line with RPM's concept.

Hey, this network does 2500 RPM but with 0.5 APM:
https://blog.cerowrt.org/post/disabling_channel_scans/
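For scale, a sketch of the unit conversion, assuming RPM is read as round trips per minute (so RPM = 60000 / RTT in ms). The helper names are illustrative.

```python
# Assumption: RPM = round trips per minute = 60000 / RTT_ms.
def rpm_to_rtt_ms(rpm):
    return 60_000 / rpm


def rtt_ms_to_rpm(rtt_ms):
    return 60_000 / rtt_ms


print(rpm_to_rtt_ms(2500))  # 24.0 ms per round trip
print(rtt_ms_to_rpm(24))    # 2500.0
```

So a 2500 RPM network has a working RTT around 24ms, yet can still glitch every couple of minutes at 0.5 APM.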

On my holiday trip, mostly staying in cheap hotels, not *one* hotel
out of 6 could sustain a quality videoconference. They ran 10-20 GPM,
but web pages and Netflix loaded fine.

>
> E.g. if you're doing DNS queries, and they usually return in 50ms, and 3 of them take 83, 94, and 106 ms respectively in a given minute, then that would be an APM of 3? (or maybe an APM rate of 30% if you were doing 10/minute)

Yes. You count the excursions from the (semi-smoothed) baseline, not
the size of the excursion. A "glitch" happened.
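A sketch of counting excursions from a semi-smoothed baseline. The thread does not specify the smoothing, so this assumes an EWMA that is updated only by non-glitch samples (so glitches don't drag the baseline up), and that a run of consecutive late samples counts as one glitch. `count_glitch_onsets` is a hypothetical name.

```python
def count_glitch_onsets(samples_ms, threshold_ms=20.0, alpha=0.1):
    """Count glitch onsets; the size of each excursion is ignored."""
    if not samples_ms:
        return 0
    baseline = samples_ms[0]  # assumption: the first sample is not a glitch
    glitches = 0
    in_glitch = False
    for s in samples_ms[1:]:
        if s - baseline > threshold_ms:
            if not in_glitch:
                glitches += 1  # a new excursion started
            in_glitch = True
        else:
            in_glitch = False
            baseline += alpha * (s - baseline)  # follow slow drift only
    return glitches


trace = [30, 31, 29, 30, 80, 95, 30, 31, 250, 30, 32]
print(count_glitch_onsets(trace))  # 2: the 80/95 run and the 250 spike
```

A 250ms spike and an 80ms bump count the same: "a glitch happened."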

> I've found the tricky thing for metrics is sorting out the event-count vs. events-rate differences.  A lot of tests that are isochronous give a constant event-rate to base on (e.g. ping's default of once per second), but other tests, like the UDP and ICMP pings in flent (at least in the past), have a rate that's based on the RTT, so as RTT goes up, the rate of events goes down, which means that it oversamples the "fast" events, and undersamples "slow" events.

The original rrul spec called for an isochronous VoIP-like flow here,
not a ping RTT test, which has the annoying flaws you describe above.
We now have such a flow in the irtt tool, but most of our tests still
use ping. At the time (when we were shooting for reductions of latency
from seconds to 10s of ms), using "ping" wasn't as much of a problem
as it is today. We need an rrul_v2, plus tcp_nup and tcp_ndown tests,
that just do isochronous flows at a fixed (and ideally high) frequency:
irtt works well down to about 3ms, and the Opus codec can do 2.7ms.
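A sketch of what "isochronous at a fixed frequency" means for the send schedule: send times are fixed up front as start + k * interval, so a slow reply never delays or thins out later probes. The 3ms interval echoes the irtt figure above; `probe_schedule` is a hypothetical helper, not irtt's API.

```python
def probe_schedule(start_s, interval_s, duration_s):
    """Return fixed send times for an isochronous probe flow."""
    n = round(duration_s / interval_s)
    # Compute each time from k instead of accumulating interval_s,
    # so floating-point error does not drift over many probes.
    return [start_s + k * interval_s for k in range(n)]


sched = probe_schedule(0.0, 0.003, 60.0)
print(len(sched))  # 20000 probes per minute at a 3 ms interval
```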

> Further, retries for failed events muddy the waters, as the events aren't independent measurements.  If a momentary drop in connectivity causes retries to happen, and each failed retry is counted, is that N failures?  Or just 1 failure?  I've split those out as separate metrics in some systems I've built, so that I can tease them apart.  I've also done things like the distribution (histogram) of "attempts before success" or "attempts before operation failed".  Usually those are dominated by "1 attempt before success", and "N attempts before operation failed" where N is the number of total attempts before just giving up.

Histograms are great. I kind of wanted to separate out the fact that a
"glitch" happened, the glitch duration (so X retries turn into a
duration rather than a count), and (sigh) whether the glitch mattered
or not. It doesn't matter to a web page if you have a 250ms RTO on one
flow but it takes 3 seconds to load anyway.

Glitches matter more for videoconferencing and gaming. I don't know if
there is any human factors research on this, but once I find my flow
in an application, a 20ms 'glitch' is roughly as annoying as a 3-second
one.
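A sketch of reporting glitches as events with durations, so that N late samples or retries collapse into one glitch of measurable length; the event count stays available separately for the "any glitch is annoying" view. Assumptions: samples are (time_s, rtt_ms) pairs at a fixed interval, the baseline is fixed, and a glitch lasts from its first late sample until the next on-time one. All names and data are hypothetical.

```python
def glitch_durations(samples, baseline_ms, threshold_ms=20.0):
    """Return a list of glitch durations in seconds."""
    durations = []
    start = None
    for t, rtt in samples:
        late = rtt - baseline_ms > threshold_ms
        if late and start is None:
            start = t                    # glitch begins
        elif not late and start is not None:
            durations.append(t - start)  # glitch ends at first good sample
            start = None
    if start is not None:                # still glitching at end of trace
        durations.append(samples[-1][0] - start)
    return durations


trace = [(0.00, 30), (0.02, 31), (0.04, 90), (0.06, 85), (0.08, 30),
         (0.10, 30), (0.12, 400), (0.14, 30)]
d = glitch_durations(trace, baseline_ms=30)
print(len(d))  # 2 glitches
print(d)       # about 0.04 s and 0.02 s (with floating-point noise)
```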


* Re: [Rpm] apm metric - annoyance per minute
  2022-01-10  2:57 [Rpm] apm metric - annoyance per minute Dave Taht
  2022-01-11 17:34 ` Christoph Paasch
@ 2022-01-11 21:22 ` Simon Leinen
  1 sibling, 0 replies; 6+ messages in thread
From: Simon Leinen @ 2022-01-11 21:22 UTC (permalink / raw)
  To: Dave Taht via Rpm

Dave Taht via Rpm writes:
> or gpm - glitch per minute
> defined as a latency excursion of more than 20ms.

Nice, but shouldn't it be something like "mean time between glitches",
according to the mantra that "higher should be better"?
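A sketch of MTBG as a higher-is-better counterpart to GPM, assuming it is simply the observation window divided by the number of glitches seen in it (unbounded when there are none). The helper names are illustrative.

```python
import math


def mtbg_seconds(glitch_count, window_s):
    """Mean time between glitches over an observation window."""
    if glitch_count == 0:
        return math.inf
    return window_s / glitch_count


def gpm_to_mtbg_seconds(gpm):
    """Convert glitches per minute into mean seconds between glitches."""
    return mtbg_seconds(gpm, 60.0)


print(gpm_to_mtbg_seconds(10))   # 6.0
print(gpm_to_mtbg_seconds(0.5))  # 120.0
print(mtbg_seconds(0, 600))      # inf
```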
-- 
Simon.


* Re: [Rpm] apm metric - annoyance per minute
  2022-01-11 19:44     ` Dave Taht
@ 2022-01-11 22:03       ` Dave Taht
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Taht @ 2022-01-11 22:03 UTC (permalink / raw)
  To: Aaron Wood; +Cc: Christoph Paasch, Rpm

On Tue, Jan 11, 2022 at 11:44 AM Dave Taht <dave.taht@gmail.com> wrote:
> glitches matter more for videoconferencing and gaming. I don't know if
> there is any human factors research on this, but once I find my flow
> in an application, a 20ms 'glitch' is roughly as annoying as a 3second
> long one.

Glitches matter even more for music and video playback. Anyone who's
ever experienced a Pink Floyd record as a whole, rather than ripped
into tracks, knows this: very few players get the transitions between
songs right (I ripped all my Floyd records as a single song).
Coincidentally, I was trying to work out a Rube Goldberg machine
for a web page load and st ubmedl ed

https://www.youtube.com/watch?v=WyOSqjIABe0

...

stumbled across this wonderful OK Go video about what 90% and 99%
reliability rates mean for the success of 130 events in a row. And
while I was typing this, my LTE connection glitched
and I ended up typing "st ubmedl ed" while it caught up. I don't know
how many people suffer from glitches as badly as I do (as one example,
I turn spellchecking off when working with a real editor, and there's
no way to do that with Gmail, I think), but I can't find and keep my
flow for very long when my services "glitch" like this.
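The arithmetic behind the 130-event point, as a sketch that assumes each step succeeds independently with probability p, so the whole chain succeeds with probability p ** 130. `chain_success` is an illustrative name.

```python
STEPS = 130


def chain_success(p, steps=STEPS):
    """Probability that all steps succeed, assuming independence."""
    return p ** steps


for p in (0.90, 0.99):
    print(f"p = {p:.2f}: all {STEPS} succeed with probability {chain_success(p):.3g}")
# roughly 1.13e-06 for p = 0.90 and 0.271 for p = 0.99
```

Even at 99% per step, the chain completes only about a quarter of the time.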


Thread overview: 6+ messages
2022-01-10  2:57 [Rpm] apm metric - annoyance per minute Dave Taht
2022-01-11 17:34 ` Christoph Paasch
2022-01-11 17:50   ` Aaron Wood
2022-01-11 19:44     ` Dave Taht
2022-01-11 22:03       ` Dave Taht
2022-01-11 21:22 ` Simon Leinen
