[Bloat] [ippm] Fwd: New Version Notification for draft-cpaasch-ippm-responsiveness-00.txt

Christoph Paasch cpaasch at apple.com
Fri Oct 22 19:19:28 EDT 2021

Hello Toerless,

thanks for your feedback! Please see inline:

> On Sep 21, 2021, at 1:50 PM, Toerless Eckert <tte at cs.fau.de> wrote:
> Dear authors
> Thanks for the draft
> a) Can you please update the naming of the draft so people remembering RPM will find the draft?
>   Something like:
>   draft-cpaasch-ippm-rpm-bufferbloat-metric-00
>   Round-trips Per Minute (RPM) under load - a Metric for bufferbloat.

That's a good point! How does draft-cpaasch-ippm-responsiveness-rpm-00 sound?

I prefer not to have bufferbloat in the name, as it is a loaded term (there are different interpretations of it) and easily misunderstood. Some consider bufferbloat to only be a problem on the routers/switches, while others consider it to be an end-to-end problem. Our test methodology measures end-to-end responsiveness, and "end-to-end" here goes all the way up to the HTTP/2 implementation.

> b) The draft does not mention, or at least does not have a
>   separate section to discuss, where the server is against which the test is run.
>   It should have such a section. I can think of at least two key options:
>   - the server used for the service in question (e.g. where the content comes from),
>   - a server at a well-defined location in the access network provider.

I see your point. I don't think the draft needs to "mandate" where one should put those servers, but rather discuss where one may want to place the server depending on what one is measuring.

A WiFi AP vendor may want to deploy the server locally in its testbed. A content provider will want to host the server on its content servers. And an ISP will probably want to host it at the border of its domain.

> c) I fear that b) leads to the biggest current issue with the metric:
>   The longer the path is, such as the full path to a server, the more useful the
>   metric is for the user. But the user will effectively get a per-service metric.
>   To make this more fun for the authors: Imagine the AppleTV server nodes have a worse
>   path to a particular user than the Netflix servers. Or vice versa.
>   If we just use a path to some fixed point in the access provider,
>   then we take away the user's ability to beat up their OTT services to
>   improve their paths.
>   If we use only a path toward the service, it will be harder to
>   hit on the service provider, if the service provider is bad.
>   So, obviously, I would like to have all three RPMs: to Netflix, AppleTV
>   and a well-defined server in Comcast. Then I can triangulate where
>   my bufferbloat problem is.

Yes, the location of the server has a huge impact on the resulting number. The goal with responsiveness is to measure user experience. If the user uses Netflix, AppleTV and Comcast streaming, that's what the user is interested in. If the user also uses some remote streaming service on the other side of the ocean, then that is likewise the responsiveness number the user would be interested in.

> d) Worse yet, without having seen more example numbers (a reference pointing
>   to a good collection of RPM numbers would be excellent), my
>   concern is that instead of fixing bufferbloat on paths, we would simply
>   encourage OTTs to co-locate servers with the access provider's own measurement
>   point, aka as close to the subscriber as possible.

Deploying close to the subscriber does not by itself fix the bufferbloat problem. If the last mile or the HTTP implementation has bufferbloat issues, they are still going to produce a bad RPM number. Sure, once we have eradicated bufferbloat from the Internet, proximity to the subscriber will become more of a driving factor. And that would be a good problem to have :-)

> e) To solve d), maybe two ideas:
>   - Relevant to improving bufferbloat is only (lRPM - iRPM), where
>     lRPM would be your current RPM, i.e. under (l)oaded conditions,
>     and iRPM is the idle RPM. This still does not take away from the
>     fact that a path with more queuing hops or higher queue loads
>     will fare worse than a path with shorter physical propagation latency,
>     but it does make the metric focus significantly on queueing,
>     and should help a lot when we compare services that might not
>     have servers in the user's metro area.

The difficulty here is that it is near impossible to really measure iRPM, because one cannot know whether the network truly is idle or not.

>   - lRPM/m - RPM under load per mile (roughly).
>     - Measure the idle RTT in units of msec (iRTT).
>     - Measure the loaded RTT in units of msec (lRTT).
>     - Just take iRTT as a measure of the path length.
>       Normalizing it to an absolute distance is not of
>       first-order importance; we are primarily interested in a
>       relative number, and this keeps the example calculation simple.
>     - The RTT increase because of queueing is (lRTT - iRTT).
>     - (lRTT - iRTT) / iRTT is therefore something like queuing RTT
>       per path stretch. I think this is the relative number we want.
>     - RPM = iRTT / (lRTT - iRTT) * 1000 turns this into a
>       number that increases with desired non-bufferbloat performance,
>       with enough significant digits in the non-fractional part.
>     - Examples:
>        idle RTT:  5 msec, loaded RTT: 20 msec =>  333 RPM
>        idle RTT: 10 msec, loaded RTT: 20 msec => 1000 RPM
>        idle RTT: 15 msec, loaded RTT: 20 msec => 3000 RPM
>        This nicely shows how the RPM will go up when the physical
>        path itself gets longer but the relevant loaded RTT stays
>        the same.
>        idle RTT:  5 msec, loaded RTT: 20 msec =>  333 RPM
>        idle RTT: 10 msec, loaded RTT: 40 msec =>  333 RPM
>        idle RTT: 15 msec, loaded RTT: 60 msec =>  333 RPM
>        This nicely shows that we can have servers at different
>        physical distances and get the same RPM number when the
>        bufferbloat is the same, i.e. 15 msec worth of bufferbloat
>        for every 5 msec propagation-latency segment.
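For concreteness, the proposed calculation can be sketched in a few lines (the function name is illustrative, not from the draft); it reproduces the example numbers above:

```python
def normalized_rpm(idle_rtt_ms: float, loaded_rtt_ms: float) -> int:
    """Proposed bufferbloat-focused metric: idle RTT divided by the
    queueing-induced RTT increase, scaled by 1000."""
    queueing_delay = loaded_rtt_ms - idle_rtt_ms  # RTT increase due to queueing
    return round(idle_rtt_ms / queueing_delay * 1000)

print(normalized_rpm(5, 20))    # 333
print(normalized_rpm(10, 20))   # 1000
print(normalized_rpm(15, 20))   # 3000
print(normalized_rpm(10, 40))   # 333
print(normalized_rpm(15, 60))   # 333
```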
> f) I can see how you do NOT want the type of metric I am
>   proposing, because it only focuses on the bufferbloat
>   factor, and you may want to stick to the full experience of
>   the user, where unmistakably the propagation latency can
>   not be ignored. But to repeat from above:
>   If we do not use a metric that treats paths of different
>   propagation latencies fairly, as performing the same, I am
>   quite persuaded we will continue to just see the big services
>   win out, because they can more easily afford to get closer
>   to the user with their (rented/time-shared/owned) servers.
>   Aka: Right now RPM is a metric that will specifically
>   make it easier for one of the big providers of streaming,
>   such as that of the authors, to position themselves better
>   against smaller services streaming from further away.

The goal of the responsiveness measurement is not for it to be a diagnostic tool for pinpointing the exact location of bufferbloat. It is also not a tool to make some streaming service providers look better than others and/or make one "win" against the other.

The goal really is to have an accurate representation of responsiveness under working conditions. If the server endpoint is at a very remote location, then indeed the RPM number will be lower. And the responsiveness measurement should reflect that, because it tries to assess the user experience.
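For example, assuming the draft's RPM is simply the number of serial round trips that fit in one minute (i.e. 60000 divided by the working-condition RTT in milliseconds), a quick sketch shows how server distance drives the number down:

```python
def rpm(loaded_rtt_ms: float) -> int:
    """Round-trips Per Minute at the measured working-condition latency.
    Assumes RPM = 60000 / RTT-under-load in ms, per our reading of the draft."""
    return round(60_000 / loaded_rtt_ms)

print(rpm(20))   # 3000 round trips/min for a nearby server (20 ms under load)
print(rpm(200))  # 300 round trips/min for a remote server (200 ms under load)
```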

Our hope is that, with the standardization of the methodology, content providers (small or big) will adopt it so that they can properly assess the network's quality for their users and their services.
