[Rpm] [ippm] draft-ietf-ippm-responsiveness

Tue Jan 16 14:01:08 EST 2024

Hello Sebastian,

thanks for the feedback, please see inline!

> On Dec 3, 2023, at 10:13 AM, Sebastian Moeller <moeller0 at gmx.de> wrote:
> 
> Dear IPPM members,
> 
> On re-reading the current responsiveness draft I stumbled over the following section:
> 
> 
> Parallel vs Sequential Uplink and Downlink
> 
> Poor responsiveness can be caused by queues in either (or both) the upstream and the downstream direction. Furthermore, both paths may differ significantly due to access link conditions (e.g., 5G downstream and LTE upstream) or routing changes within the ISPs. To measure responsiveness under working conditions, the algorithm must explore both directions.
> 
> One approach could be to measure responsiveness in the uplink and downlink in parallel. It would allow for a shorter test run-time.
> 
> However, a number of caveats come with measuring in parallel:
> 
> 	• Half-duplex links may not permit simultaneous uplink and downlink traffic. This restriction means the test might not reach the path's capacity in both directions at once and thus not expose all the potential sources of low responsiveness.
> 	• Debuggability of the results becomes harder: During parallel measurement it is impossible to differentiate whether the observed latency happens in the uplink or the downlink direction.
> Thus, we recommend testing uplink and downlink sequentially. Parallel testing is considered a future extension.
> 
> 
> I argue that this is not the correct diagnosis and hence not the correct decision.
> For half-duplex links the given argument is not incorrect, but it is incomplete: when forced to multiplex more bidirectional traffic, we will quite likely see different "potential sources of low responsiveness", so ignoring either of the two modes seems ill-advised. (All TCP testing is bidirectional, so we are only arguing about the amount of reverse traffic, not whether it exists; even if we switched to QUIC/UDP we would still need a feedback channel.)

Are you saying that parallel bidirectional traffic exposes different sources of responsiveness issues than unidirectional traffic (up and down)? What kinds of different sources would that expose? Can you give some examples, and maybe suggest how to word things?

> Debuggability is not "rocket science" either: all one needs is a three-value timestamp format (similar to what NTP uses), and one can establish baseline OWDs even without synchronized clocks. Then, under bidirectional load, one can see which of these unloaded OWDs actually increases. So I argue that "it is impossible to differentiate whether the observed latency happens in the uplink or the downlink direction" is simply an incorrect assertion (and we are already doing this successfully in the existing Internet as part of the cake-autorate project [h++ps://github.com/lynxthecat/cake-autorate/tree/master], based on ICMP timestamps). The relevant observation here is that we are not necessarily interested in veridical OWDs under idle conditions; we want to see which OWD(s) increase under working conditions, and that works with desynchronized clocks and is also robust against slow clock drift.
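The baseline-OWD idea in the paragraph above can be sketched as follows, assuming each probe yields four timestamps: the client's send time T1 and receive time T4 (client clock), and the peer's receive time T2 and transmit time T3 (peer clock), as an ICMP timestamp reply or an NTP-style exchange would provide. The unknown clock offset adds to the raw uplink delay and subtracts from the raw downlink delay, so subtracting each direction's idle minimum cancels it. All names here are hypothetical; this is not the cake-autorate implementation, and it omits the periodic baseline refresh one would presumably need to track slow drift.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Probe:
    t1: int  # client send time, client clock (ms)
    t2: int  # peer receive time, peer clock (ms)
    t3: int  # peer transmit time, peer clock (ms)
    t4: int  # client receive time, client clock (ms)


def raw_owds(p: Probe) -> Tuple[int, int]:
    """Raw one-way delays; each is skewed by the unknown clock offset."""
    up = p.t2 - p.t1    # true uplink OWD + offset
    down = p.t4 - p.t3  # true downlink OWD - offset
    return up, down


def baseline(idle: Iterable[Probe]) -> Tuple[int, int]:
    """Minimum raw OWD per direction over probes taken while the path is idle."""
    ups, downs = zip(*(raw_owds(p) for p in idle))
    return min(ups), min(downs)


def owd_increase(p: Probe, base: Tuple[int, int]) -> Tuple[int, int]:
    """Increase of each OWD over its idle baseline; the clock offset cancels out."""
    up, down = raw_owds(p)
    return up - base[0], down - base[1]


def which_increased(inc: Tuple[int, int], threshold: int) -> str:
    """Report which direction(s) grew by more than `threshold` ms."""
    up, down = inc[0] > threshold, inc[1] > threshold
    return {(True, True): "both", (True, False): "uplink",
            (False, True): "downlink", (False, False): "neither"}[(up, down)]


if __name__ == "__main__":
    # Peer clock runs 1,000,000 ms ahead of the client clock.
    idle = [Probe(0, 1_000_010, 1_000_011, 21)]     # 10 ms each way
    loaded = Probe(100, 1_000_110, 1_000_111, 171)  # 10 ms up, 60 ms down
    inc = owd_increase(loaded, baseline(idle))
    print(inc)                                      # (0, 50)
    print(which_increased(inc, threshold=5))        # downlink
```

Even though the two clocks disagree by a thousand seconds, the sketch attributes the 50 ms of added delay to the downlink alone.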

Unfortunately, this would require the server to add timestamps to the HTTP response, right?

We opted against this because the “power” of the responsiveness methodology is that it is extremely lightweight on the server side. By lightweight I mean not only from an implementation/CPU perspective but also from a deployment perspective. All one needs to do on the server to provide a responsiveness-measurement endpoint is to host two files (a very large one and a very small one) and provide an endpoint to “POST” data to. These are all standard capabilities of every web server and can easily be configured. And we have seen a rise in endpoints showing up thanks to how simple they are to deploy.
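To illustrate how little the server side needs, here is a minimal sketch using only Python's standard library: a tiny download, a large download, and an upload sink that discards the POSTed body. The paths `/small`, `/large`, `/upload` and the sizes are hypothetical. Python's `http.server` speaks only HTTP/1.x and only handles uploads that carry a `Content-Length`, so this is for illustration; a real endpoint would normally be an existing web server configured to serve static files and accept POSTs.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

LARGE_SIZE = 4 * 1024 * 1024  # hypothetical; a real endpoint would serve a far larger file
CHUNK = b"\0" * 65536


class ResponsivenessHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/small":
            body = b"x"  # tiny object, fetched to probe latency
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        elif self.path == "/large":
            # Large object, streamed in chunks to load the downlink.
            self.send_response(200)
            self.send_header("Content-Length", str(LARGE_SIZE))
            self.end_headers()
            remaining = LARGE_SIZE
            while remaining > 0:
                n = min(remaining, len(CHUNK))
                self.wfile.write(CHUNK[:n])
                remaining -= n
        else:
            self.send_error(404)

    def do_POST(self):
        if self.path != "/upload":
            self.send_error(404)
            return
        # Read and discard the body to load the uplink.
        remaining = int(self.headers.get("Content-Length", 0))
        while remaining > 0:
            data = self.rfile.read(min(remaining, 65536))
            if not data:
                break
            remaining -= len(data)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    srv = ThreadingHTTPServer(("127.0.0.1", port), ResponsivenessHandler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv


if __name__ == "__main__":
    srv = start_server()
    base = f"http://127.0.0.1:{srv.server_address[1]}"
    print(len(urllib.request.urlopen(base + "/small").read()))
    print(len(urllib.request.urlopen(base + "/large").read()))
    srv.shutdown()
```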

So, IMO, it is a balance between “deployability” and “debuggability”. The responsiveness test is clearly aimed at being deployable and accessible, so I think we would prefer to keep things simple on the server side.

Thoughts?

That being said, I’m not entirely opposed to recommending the parallel mode as well. The interesting bit about the parallel mode is not so much the responsiveness measurement but rather the capacity measurement, because surprisingly many modems/… that are supposedly able (according to their spec sheets) to handle 1 Gbps full-duplex suddenly show their weakness and can no longer handle line rate. So, IMO, it is more about capacity than responsiveness.
However, as a frequent user of the networkQuality tool, I notice that whenever I want to test my network, I end up using a sequential test rather than the parallel one.

Christoph

> 
> Given these observations, I ask that we change this design parameter to require both measurement modes, defaulting to parallel testing (or randomly selecting between the two modes, but reporting which one was chosen).
> 
> Best Regards
> 	Sebastian