* [Rpm] RPM open meeting tuesdays 9:30-10:30
2021-10-04 23:23 [Rpm] Outch! I found a problem with responsiveness Matt Mathis
@ 2021-10-04 23:36 ` Dave Taht
2021-10-05 15:47 ` Matt Mathis
2021-10-11 20:52 ` Christoph Paasch
2021-10-05 16:18 ` [Rpm] Outch! I found a problem with responsiveness Christoph Paasch
2021-10-05 17:26 ` Stuart Cheshire
2 siblings, 2 replies; 10+ messages in thread
From: Dave Taht @ 2021-10-04 23:36 UTC (permalink / raw)
To: Matt Mathis, Karl Auerbach; +Cc: Rpm
for those of you new to this list, we have a regular, open meeting,
tuesdays 9:30-10:30 AM PDT,
presently held at: https://starwrt.v.taht.net:8443/group/bufferbloat
(use any login, no password, chrome is best)
--
Fixing Starlink's Latencies: https://www.youtube.com/watch?v=c9gLo6Xrwgw
Dave Täht CEO, TekLibre, LLC
* Re: [Rpm] RPM open meeting tuesdays 9:30-10:30
2021-10-04 23:36 ` [Rpm] RPM open meeting tuesdays 9:30-10:30 Dave Taht
@ 2021-10-05 15:47 ` Matt Mathis
2021-10-11 20:52 ` Christoph Paasch
1 sibling, 0 replies; 10+ messages in thread
From: Matt Mathis @ 2021-10-05 15:47 UTC (permalink / raw)
To: Dave Taht; +Cc: Karl Auerbach, Rpm
I will try to be on but anticipate my preceding event running way over.
Thanks,
--MM--
The best way to predict the future is to create it. - Alan Kay
We must not tolerate intolerance;
however our response must be carefully measured:
too strong would be hypocritical and risks spiraling out of control;
too weak risks being mistaken for tacit approval.
On Mon, Oct 4, 2021 at 4:36 PM Dave Taht <dave.taht@gmail.com> wrote:
> for those of you new to this list, we have a regular, open meeting,
> tuesdays 9:30-10:30 AM PDT,
> presently held at: https://starwrt.v.taht.net:8443/group/bufferbloat
>
> (use any login, no password, chrome is best)
>
> --
> Fixing Starlink's Latencies: https://www.youtube.com/watch?v=c9gLo6Xrwgw
>
> Dave Täht CEO, TekLibre, LLC
>
* Re: [Rpm] RPM open meeting tuesdays 9:30-10:30
2021-10-04 23:36 ` [Rpm] RPM open meeting tuesdays 9:30-10:30 Dave Taht
2021-10-05 15:47 ` Matt Mathis
@ 2021-10-11 20:52 ` Christoph Paasch
1 sibling, 0 replies; 10+ messages in thread
From: Christoph Paasch @ 2021-10-11 20:52 UTC (permalink / raw)
To: Dave Taht; +Cc: Rpm
For tomorrow's meeting I'm a "maybe" as I may have a conflicting meeting.
Christoph
On 10/04/21 - 16:36, Dave Taht via Rpm wrote:
> for those of you new to this list, we have a regular, open meeting,
> tuesdays 9:30-10:30 AM PDT,
> presently held at: https://starwrt.v.taht.net:8443/group/bufferbloat
>
> (use any login, no password, chrome is best)
>
> --
> Fixing Starlink's Latencies: https://www.youtube.com/watch?v=c9gLo6Xrwgw
>
> Dave Täht CEO, TekLibre, LLC
* Re: [Rpm] Outch! I found a problem with responsiveness
2021-10-04 23:23 [Rpm] Outch! I found a problem with responsiveness Matt Mathis
2021-10-04 23:36 ` [Rpm] RPM open meeting tuesdays 9:30-10:30 Dave Taht
@ 2021-10-05 16:18 ` Christoph Paasch
2021-10-05 21:43 ` Simon Leinen
2021-10-05 17:26 ` Stuart Cheshire
2 siblings, 1 reply; 10+ messages in thread
From: Christoph Paasch @ 2021-10-05 16:18 UTC (permalink / raw)
To: Matt Mathis; +Cc: Rpm
Hello Matt,
On 10/04/21 - 16:23, Matt Mathis via Rpm wrote:
> It has a super Heisenberg problem, to the point where it is unlikely to
> have much predictive value under conditions that are different from the
> measurement itself. The problem comes from the unbound specification for
> "under load" and the impact of the varying drop/mark rate changing the
> number of rounds needed to complete a transaction, such as a page load.
This is absolutely right. This is why it is not just "Responsiveness", but
"Responsiveness under working conditions", and it is important to specify
the "working conditions" properly. The working conditions need to use a
"realistic" workload while at the same time exploring the boundaries. This
is why we chose a set of HTTP/2 bulk data-transfers, using standard
congestion controls.
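For anyone who wants to play with the idea, below is a minimal, simplified
sketch of that measurement loop: keep the path busy with a few parallel bulk
downloads and, while they run, time small probe requests and convert the
probe latency into round trips per minute. It uses plain HTTP/1.1 from the
Python standard library rather than the HTTP/2 flows the real methodology
uses, and the two URLs are placeholders you would point at your own server.

    import statistics
    import threading
    import time
    import urllib.request

    BULK_URL = "https://example.net/large-object"    # placeholder: any multi-MB object
    PROBE_URL = "https://example.net/small-object"   # placeholder: tiny object, same server

    def bulk_download(stop):
        # Keep the path busy by repeatedly pulling the large object.
        while not stop.is_set():
            with urllib.request.urlopen(BULK_URL) as resp:
                while resp.read(65536) and not stop.is_set():
                    pass

    def measure_rpm(duration=10, parallel=4):
        stop = threading.Event()
        loaders = [threading.Thread(target=bulk_download, args=(stop,), daemon=True)
                   for _ in range(parallel)]
        for t in loaders:
            t.start()
        samples = []
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            t0 = time.monotonic()
            urllib.request.urlopen(PROBE_URL).read()   # small request "under load"
            samples.append(time.monotonic() - t0)
            time.sleep(0.1)
        stop.set()
        # Responsiveness: how many probe round trips fit into one minute.
        return 60.0 / statistics.median(samples)

    if __name__ == "__main__":
        print("~%.0f RPM under (simplified) working conditions" % measure_rpm())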
> For modern TCP on an otherwise unloaded link with any minimally correct
> queue management (including drop tail), the page load time is insensitive
> to the details of the queue management. There will be a little bit of
> link idle in the first few RTT (early slowstart), and then under a huge
> range of conditions for both the web page and the AQM, TCP will maintain at
> least a short queue at the bottleneck with zero idle, up until the last
> segment is delivered, TCP will also avoid sending any duplicate data, so
> the total data sent will be determined by the total number of bytes in the
> page, and the total elapsed time, by the page size and link rate (plus the
> idle from startup).
>
> If AQM is used to increase the responsiveness, the losses or ECN marks will
> cause the browser to take additional RTTs to load the page. If there is no
> cross traffic, these two effects (more rounds at higher RPM) will exactly
> counterbalance each other.
>
> This is perhaps why there are BB deniers: for many simple tasks it has zero
> impact.
That's right. BB is a transient problem that is extremely short-lived.
Having tried for the past year to reliably demo the user-visible
impact of bufferbloat, I have learned two things:
1. When it happens, it is bad - really bad.
2. However, it is very difficult to trigger it "on-demand".
> A concrete definition for "under load" should help to compare metrics
> between implementations, but may not help predicting application
> performance.
"Responsiveness under working conditions" is a metric similar to throughput
measured by tools like speedtest. Sure, speedtest may measure close to 1Gbps
on my home-network, but that does not mean that I am able to actually send
my emails at 1Gbps.
The same is true for responsiveness. It pushes the network to its limit and
explores the capabilities at that point.
Talk to you soon!
Cheers,
Christoph
* Re: [Rpm] Outch! I found a problem with responsiveness
2021-10-05 16:18 ` [Rpm] Outch! I found a problem with responsiveness Christoph Paasch
@ 2021-10-05 21:43 ` Simon Leinen
2021-10-11 21:01 ` Christoph Paasch
0 siblings, 1 reply; 10+ messages in thread
From: Simon Leinen @ 2021-10-05 21:43 UTC (permalink / raw)
To: Christoph Paasch via Rpm
Hallo Christoph,
> That's right. BB is a transient problem that is extremely short-lived.
> Having tried for the past year to reliably demo the user-visible
> impact of bufferbloat, I have learned two things:
> 1. When it happens, it is bad - really bad.
> 2. However, it is very difficult to trigger it "on-demand".
I seem to be able to trigger it quite reliably by using mobile data
while traveling on the train and doing normal remote work. Here in
Switzerland I often see RTTs in excess of 10 seconds. In France I have
seen more than two MINUTES.
Maybe I should start setting up systematic measurements. For example,
if I just sent pings both from my laptop to a well-connected fixed host,
and vice-versa, while capturing all ICMP packets on both ends, I should
be able to learn about bufferbloat in both directions.
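Concretely, something like the rough sketch below, using timestamped UDP
probes instead of ICMP so no packet capture or root privileges are needed:
the well-connected host echoes each probe back together with its receive
time, so the laptop can split the RTT into rough outbound and return
components. The split is only as good as the clock synchronization between
the two machines (NTP gets you to a few milliseconds, which is plenty when
the bloat is measured in seconds); the port is a placeholder.

    import socket
    import struct
    import sys
    import time

    PORT = 9900  # placeholder

    def reflector():
        # On the well-connected host: echo every probe back with our receive time.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("0.0.0.0", PORT))
        while True:
            data, addr = s.recvfrom(64)
            s.sendto(data + struct.pack("!d", time.time()), addr)

    def sender(host, count=100):
        # On the laptop: send numbered, timestamped probes and split the RTT.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.settimeout(30)  # bloated queues can delay replies by many seconds
        for seq in range(count):
            s.sendto(struct.pack("!Id", seq, time.time()), (host, PORT))
            try:
                data, _ = s.recvfrom(64)
            except socket.timeout:
                print("%d: lost or >30 s late" % seq)
                continue
            t_back = time.time()
            _, t_out, t_reflect = struct.unpack("!Idd", data)
            # One-way estimates are only as good as the clock sync between hosts.
            print("%d: rtt=%.3fs out=%.3fs back=%.3fs"
                  % (seq, t_back - t_out, t_reflect - t_out, t_back - t_reflect))
            time.sleep(0.2)

    if __name__ == "__main__":
        sender(sys.argv[2]) if sys.argv[1] == "send" else reflector()

(Run it with "reflect" on the fixed host and "send <host>" on the laptop.)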
It would be even better to have this in a mobile (web) app that could
record/send location data from the mobile node, to spot the regions
(presumably around tunnels and other connectivity-challenged areas)
where the problem tends to occur most often. Alternatively, correlate
the probe timestamps with real-time location data provided by the
railway company.
Cheers,
--
Simon.
* Re: [Rpm] Outch! I found a problem with responsiveness
2021-10-05 21:43 ` Simon Leinen
@ 2021-10-11 21:01 ` Christoph Paasch
2021-10-12 7:11 ` Sebastian Moeller
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Paasch @ 2021-10-11 21:01 UTC (permalink / raw)
To: Simon Leinen; +Cc: Christoph Paasch via Rpm
Hello Simon,
On 10/05/21 - 23:43, Simon Leinen via Rpm wrote:
> Hallo Christoph,
>
> > That's right. BB is a transient problem that is extremely short-lived.
>
> > Having tried for the past year to reliably demo the user-visible
> > impact of bufferbloat, I have learned two things:
>
> > 1. When it happens, it is bad - really bad.
> > 2. However, it is very difficult to trigger it "on-demand".
>
> I seem to be able to trigger it quite reliably by using mobile data
> while traveling on the train and doing normal remote work. Here in
> Switzerland I often see RTTs in excess of 10 seconds. In France I have
> seen more than two MINUTES.
Wow! Were you able to track it down? (like, on which device it happened)
> Maybe I should start setting up systematic measurements. For example,
> if I just sent pings both from my laptop to a well-connected fixed host,
> and vice-versa, while capturing all ICMP packets on both ends, I should
> be able to learn about bufferbloat in both directions.
Having tried to debug some bufferbloat problems in a complex
enterprise network, I can say it is extremely hard to pinpoint where the
bufferbloat happens.
Especially on that kind of train network, where a VPN or GRE tunnel is
probably involved to get the data out onto the Internet...
If you have macOS Monterey, you could run the networkQuality tool to see how
much bufferbloat there is.
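For repeated runs (say, every few minutes during a train ride) a tiny
wrapper like the one below might be handy: it just shells out to the tool
and logs the responsiveness line with a timestamp. I'm assuming here that
the summary output contains a line starting with "Responsiveness" - worth
double-checking against what your Monterey build actually prints.

    import subprocess
    import time

    def log_responsiveness(interval=300):
        # Run Apple's networkQuality tool every few minutes and log its RPM line.
        while True:
            out = subprocess.run(["networkQuality"],
                                 capture_output=True, text=True).stdout
            rpm = [l.strip() for l in out.splitlines()
                   if l.lstrip().startswith("Responsiveness")]
            print(time.strftime("%H:%M:%S"),
                  rpm[0] if rpm else "no Responsiveness line found")
            time.sleep(interval)

    log_responsiveness()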
Cheers,
Christoph
> It would be even better to have this in a mobile (web) app that could
> record/send location data from the mobile node, to spot the regions
> (presumably around tunnels and other connectivity-challenged areas)
> where the problem tends to occur most often. Alternatively, correlate
> the probe timestamps with real-time location data provided by the
> railway company.
>
> Cheers,
> --
> Simon.
* Re: [Rpm] Outch! I found a problem with responsiveness
2021-10-11 21:01 ` Christoph Paasch
@ 2021-10-12 7:11 ` Sebastian Moeller
0 siblings, 0 replies; 10+ messages in thread
From: Sebastian Moeller @ 2021-10-12 7:11 UTC (permalink / raw)
To: Christoph Paasch, Christoph Paasch via Rpm, Simon Leinen
Hi Christoph,
On 11 October 2021 23:01:20 CEST, Christoph Paasch via Rpm <rpm@lists.bufferbloat.net> wrote:
> Hello Simon,
>
> On 10/05/21 - 23:43, Simon Leinen via Rpm wrote:
> > Hallo Christoph,
>
> > > That's right. BB is a transient problem that is extremely short-lived.
>
> > > Having tried for the past year to reliably demo the user-visible
> > > impact of bufferbloat, I have learned two things:
>
> > > 1. When it happens, it is bad - really bad.
> > > 2. However, it is very difficult to trigger it "on-demand".
>
> > I seem to be able to trigger it quite reliably by using mobile data
> > while traveling on the train and doing normal remote work. Here in
> > Switzerland I often see RTTs in excess of 10 seconds. In France I have
> > seen more than two MINUTES.
>
> Wow! Were you able to track it down? (like, on which device it happened)
>
> > Maybe I should start setting up systematic measurements. For example,
> > if I just sent pings both from my laptop to a well-connected fixed host,
> > and vice-versa, while capturing all ICMP packets on both ends, I should
> > be able to learn about bufferbloat in both directions.
>
> Having tried to debug some bufferbloat problems in a complex
> enterprise network, I can say it is extremely hard to pinpoint where the
> bufferbloat happens.
>
> Especially on that kind of train network, where a VPN or GRE tunnel is
> probably involved to get the data out onto the Internet...
[SM] Mmmh, to really localize points of congestion, I assume one needs
continuous traceroutes (like mtr or pingplotter) from both directions and
ideally one-way delay measurements. Tunnels, where the congested node might
be invisible, add to the challenge. But getting information for the reverse
path is really important, especially for Internet targets, as paths are very
likely to be asymmetrical, with handover between different ASes at wildly
different locations in each direction.
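A very crude way to get the "continuous traceroute" half of that, sketched
below: run mtr in report mode in a loop from each end toward the other and
keep the timestamped reports, so that when latency spikes one can see at
which hop the queueing first shows up. Only mtr's long-standing options
(--report, -n, -c) are used; the target host is a placeholder, and of
course this still shows nothing about hops hidden inside a tunnel.

    import pathlib
    import subprocess
    import time

    TARGET = "well-connected-host.example.net"  # placeholder: the other measurement end

    def continuous_mtr(logdir="mtr-logs", cycles=10, pause=60):
        # Append one timestamped mtr report per iteration; run the same loop
        # on the remote host toward this one to cover the reverse path.
        out = pathlib.Path(logdir)
        out.mkdir(exist_ok=True)
        logfile = out / time.strftime("mtr-%Y%m%d.log")
        while True:
            report = subprocess.run(
                ["mtr", "--report", "-n", "-c", str(cycles), TARGET],
                capture_output=True, text=True).stdout
            with open(logfile, "a") as f:
                f.write(time.strftime("=== %Y-%m-%d %H:%M:%S ===\n") + report + "\n")
            time.sleep(pause)

    continuous_mtr()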
> If you have macOS Monterey, you could run the networkQuality tool to see how
> much bufferbloat there is.
[SM] Is that officially out or still in late beta/RC?
> Cheers,
> Christoph
>
> > It would be even better to have this in a mobile (web) app that could
> > record/send location data from the mobile node, to spot the regions
> > (presumably around tunnels and other connectivity-challenged areas)
> > where the problem tends to occur most often. Alternatively, correlate
> > the probe timestamps with real-time location data provided by the
> > railway company.
>
> > Cheers,
> > --
> > Simon.
* Re: [Rpm] Outch! I found a problem with responsiveness
2021-10-04 23:23 [Rpm] Outch! I found a problem with responsiveness Matt Mathis
2021-10-04 23:36 ` [Rpm] RPM open meeting tuesdays 9:30-10:30 Dave Taht
2021-10-05 16:18 ` [Rpm] Outch! I found a problem with responsiveness Christoph Paasch
@ 2021-10-05 17:26 ` Stuart Cheshire
2021-10-05 22:01 ` Matt Mathis
2 siblings, 1 reply; 10+ messages in thread
From: Stuart Cheshire @ 2021-10-05 17:26 UTC (permalink / raw)
To: Matt Mathis; +Cc: Rpm
On 4 Oct 2021, at 16:23, Matt Mathis via Rpm <rpm@lists.bufferbloat.net> wrote:
> It has a super Heisenberg problem, to the point where it is unlikely to have much predictive value under conditions that are different from the measurement itself. The problem comes from the unbound specification for "under load" and the impact of the varying drop/mark rate changing the number of rounds needed to complete a transaction, such as a page load.
>
> For modern TCP on an otherwise unloaded link with any minimally correct queue management (including drop tail), the page load time is insensitive to the details of the queue management. There will be a little bit of link idle in the first few RTT (early slowstart), and then under a huge range of conditions for both the web page and the AQM, TCP will maintain at least a short queue at the bottleneck
Surely you mean: TCP will maintain an EVER GROWING queue at the bottleneck? (Of course, the congestion control algorithm in use affects the precise nature of queue growth here. For simplicity here I’m assuming Reno or CUBIC.)
> TCP will also avoid sending any duplicate data, so the total data sent will be determined by the total number of bytes in the page, and the total elapsed time, by the page size and link rate (plus the idle from startup).
You are focusing on time-to-completion for a flow. For clicking “send” on an email, this is a useful metric. For watching a two-hour movie, served as a single large HTTP GET for the entire media file, and playing it as it arrives, time-to-completion is not very interesting. What matters is consistent smooth delivery of the bytes within that flow, so the video can be played as it arrives. And if I get bored of that video and click another, the amount of (now unwanted) stale packets sitting in the bottleneck queue is what limits how quickly I get to see the new video start playing.
> If AQM is used to increase the responsiveness, the losses or ECN marks will cause the browser to take additional RTTs to load the page. If there is no cross traffic, these two effects (more rounds at higher RPM) will exactly counterbalance each other.
Right: Improving responsiveness has *no* downside on time-to-completion for a flow. Throughput -- in bytes per second -- is unchanged. What improving responsiveness does is improve what happens throughout the lifetime of the transfer, without affecting the end time either for better or for worse.
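To put rough numbers on both points, here is a deliberately over-simplified
model (a single fixed-rate bottleneck that never goes idle once slow start
is over): the completion time of the bulk flow is set by the page size and
link rate regardless of how deep the standing queue is, while the time
before a newly clicked video delivers its first byte is dominated by that
standing queue draining. The numbers are made up purely for illustration.

    def completion_time_s(page_bytes, link_bps, base_rtt_s, startup_rounds=3):
        # Once the pipe is full, total time is startup idle plus serialization,
        # independent of how big the standing queue is.
        return startup_rounds * base_rtt_s + page_bytes * 8 / link_bps

    def new_flow_first_byte_s(queue_bytes, link_bps, base_rtt_s):
        # A new request has to wait behind whatever is already in the queue.
        return base_rtt_s + queue_bytes * 8 / link_bps

    link = 20e6        # 20 Mbit/s bottleneck
    rtt = 0.03         # 30 ms base RTT
    page = 2 * 2**20   # 2 MiB page or media chunk

    for queue in (16_000, 1_250_000):   # ~6 ms vs ~500 ms of standing queue
        print("queue %4.0f ms: bulk completion %.2f s, new flow first byte %4.0f ms"
              % (queue * 8 / link * 1000,
                 completion_time_s(page, link, rtt),
                 new_flow_first_byte_s(queue, link, rtt) * 1000))

Both queue depths give the bulk flow the same ~0.93 s completion time, but
the new flow's first byte arrives after ~36 ms in one case and ~530 ms in
the other.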
> This is perhaps why there are BB deniers: for many simple tasks it has zero impact.
Of course. In the development of any technology we solve the most obvious problems first, and the less obvious ones later.
If there was a bug that occasionally resulted in a corrupted file system and loss of data, would people argue that we shouldn’t fix it on the grounds that sometimes it *doesn’t* corrupt the file system?
If your car’s brakes didn’t work, would people argue that it doesn’t matter, because -- statistically speaking -- the brake pedal is depressed for only a tiny percentage of the overall time you spend driving?
Stuart Cheshire
* Re: [Rpm] Outch! I found a problem with responsiveness
2021-10-05 17:26 ` Stuart Cheshire
@ 2021-10-05 22:01 ` Matt Mathis
0 siblings, 0 replies; 10+ messages in thread
From: Matt Mathis @ 2021-10-05 22:01 UTC (permalink / raw)
To: Stuart Cheshire; +Cc: Rpm
What you say is correct for effectively infinite bulk transfers. I was
talking about transactional data such as web pages. These days most
video (including many VC systems) is delivered as paced transactions.
Thanks,
--MM--
The best way to predict the future is to create it. - Alan Kay
We must not tolerate intolerance;
however our response must be carefully measured:
too strong would be hypocritical and risks spiraling out of control;
too weak risks being mistaken for tacit approval.
On Tue, Oct 5, 2021 at 10:26 AM Stuart Cheshire <cheshire@apple.com> wrote:
> On 4 Oct 2021, at 16:23, Matt Mathis via Rpm <rpm@lists.bufferbloat.net>
> wrote:
>
> > It has a super Heisenberg problem, to the point where it is unlikely to
> have much predictive value under conditions that are different from the
> measurement itself. The problem comes from the unbound specification for
> "under load" and the impact of the varying drop/mark rate changing the
> number of rounds needed to complete a transaction, such as a page load.
> >
> > For modern TCP on an otherwise unloaded link with any minimally correct
> queue management (including drop tail), the page load time is insensitive
> to the details of the queue management. There will be a little bit of
> link idle in the first few RTT (early slowstart), and then under a huge
> range of conditions for both the web page and the AQM, TCP will maintain at
> least a short queue at the bottleneck
>
> Surely you mean: TCP will maintain an EVER GROWING queue at the
> bottleneck? (Of course, the congestion control algorithm in use affects the
> precise nature of queue growth here. For simplicity here I’m assuming Reno
> or CUBIC.)
>
> > TCP will also avoid sending any duplicate data, so the total data sent
> will be determined by the total number of bytes in the page, and the total
> elapsed time, by the page size and link rate (plus the idle from startup).
>
> You are focusing on time-to-completion for a flow. For clicking “send” on
> an email, this is a useful metric. For watching a two-hour movie, served as
> a single large HTTP GET for the entire media file, and playing it as it
> arrives, time-to-completion is not very interesting. What matters is
> consistent smooth delivery of the bytes within that flow, so the video can
> be played as it arrives. And if I get bored of that video and click
> another, the the amount of (now unwanted) stale packets sitting in the
> bottleneck queue is what limits how quickly I get to see the new video
> start playing.
>
> > If AQM is used to increase the responsiveness, the losses or ECN marks
> will cause the browser to take additional RTTs to load the page. If there
> is no cross traffic, these two effects (more rounds at higher RPM) will
> exactly counterbalance each other.
>
> Right: Improving responsiveness has *no* downside on time-to-completion
> for a flow. Throughput -- in bytes per second -- is unchanged. What
> improving responsiveness does is improve what happens throughout the
> lifetime of the transfer, without affecting the end time either for better
> or for worse.
>
> > This is perhaps why there are BB deniers: for many simple tasks it has
> zero impact.
>
> Of course. In the development of any technology we solve the most obvious
> problems first, and the less obvious ones later.
>
> If there was a bug that occasionally resulted in a corrupted file system
> and loss of data, would people argue that we shouldn’t fix it on the
> grounds that sometimes it *doesn’t* corrupt the file system?
>
> If your car’s brakes didn’t work, would people argue that it doesn’t matter,
> because -- statistically speaking -- the brake pedal is depressed for only
> a tiny percentage of the overall time you spend driving?
>
> Stuart Cheshire
>
>