From: Sebastian Moeller <moeller0@gmx.de>
Date: Fri, 19 Jan 2024 14:14:09 +0100
To: Christoph Paasch
Cc: IETF IPPM WG, Rpm
Subject: Re: [Rpm] [ippm] draft-ietf-ippm-responsiveness

Hi Christoph,

> On 16. Jan 2024, at 20:01, Christoph Paasch wrote:
>
> Hello Sebastian,
>
> thanks for the feedback, please see inline!
>
>> On Dec 3, 2023, at 10:13 AM, Sebastian Moeller wrote:
>>
>> Dear IPPM members,
>>
>> On re-reading the current responsiveness draft I stumbled over the following section:
>>
>> Parallel vs Sequential Uplink and Downlink
>>
>> Poor responsiveness can be caused by queues in either (or both) the upstream and the downstream direction. Furthermore, both paths may differ significantly due to access link conditions (e.g., 5G downstream and LTE upstream) or routing changes within the ISPs. To measure responsiveness under working conditions, the algorithm must explore both directions.
>>
>> One approach could be to measure responsiveness in the uplink and downlink in parallel. It would allow for a shorter test run-time.
>>
>> However, a number of caveats come with measuring in parallel:
>>
>> • Half-duplex links may not permit simultaneous uplink and downlink traffic. This restriction means the test might not reach the path's capacity in both directions at once and thus not expose all the potential sources of low responsiveness.
>> • Debuggability of the results becomes harder: During parallel measurement it is impossible to differentiate whether the observed latency happens in the uplink or the downlink direction.
>>
>> Thus, we recommend testing uplink and downlink sequentially. Parallel testing is considered a future extension.
>>
>> I argue that this is not the correct diagnosis and hence not the correct decision.
>> For half-duplex links the given argument is not incorrect, but incomplete: it is quite likely that when forced to multiplex more bi-directional traffic (all TCP testing is bi-directional, so we only argue about the amount of reverse traffic, not whether it exists, and even if we switched to QUIC/UDP we would still need a feedback channel) we will see different "potential sources of low responsiveness", so ignoring either of the two seems ill-advised.
>
> You are saying that parallel bi-directional traffic exposes different sources of responsiveness issues than uni-directional traffic (up and down)? What kind of different sources would that expose? Can you give some examples and maybe a suggestion on how to word things?

[SM] If the bottleneck is a WiFi link, we occasionally see that some OSes are more aggressive than others in acquiring airtime, which easily results in differential throughput for the two directions and often in higher queueing delay for the direction that is 'slowed down'. In theory that should not really happen, but in practice it does, e.g. when the ISP unhelpfully passes undesired DSCP marks into a home network where they are then acted upon by WiFi WMM. To elaborate: Comcast for a long time had an issue where a large fraction (IIRC up to 25%) of packets were inadvertently marked CS1, which in default WMM maps to AC_BK; if the client sends its upload traffic via the default AC_BE, this differential AC usage can result in queueing delays that differ from what one would see when looking at upload and download individually. (If all traffic of a channel uses AC_BK instead of AC_BE, this should not affect latency much.)

Side note: Comcast, after being alerted, took notice of the issue and fixed it, but I think this kind of issue can happen to other ISPs as well.

>
>> Debuggability is not "rocket science" either: all one needs is a three-value timestamp format (similar to what NTP uses) and one can, even without synchronized clocks,
establish baseline OWDs and then, under bi-directional load, see which of these unloaded OWDs actually increases. So I argue that "it is impossible to differentiate whether the observed latency happens in the uplink or the downlink direction" is simply an incorrect assertion... (and we are already doing this successfully in the existing internet as part of the cake-autorate project [h++ps://github.com/lynxthecat/cake-autorate/tree/master], based on ICMP timestamps). The relevant observation here is that we are not necessarily interested in veridical OWDs under idle conditions; we want to see which OWD(s) increase during working conditions, and that works with desynchronized clocks and is also robust against slow clock drift.

>
> Unfortunately, this would require the server to add timestamps to the HTTP response, right?

[SM] Yes, in a sense... but that could be a small process that simply updates the content of that file every couple of milliseconds, so it would not strictly need to be the server process...

> We opted against this because the "power" of the responsiveness methodology is that it is extremely lightweight on the server side. And with lightweight I mean not only from an implementation/CPU perspective but also from a deployment perspective. All one needs to do on the server in order to provide a responsiveness measurement endpoint is to host two files (one very large, one very small) and provide an endpoint to "POST" data to. All of these are standard capabilities in every webserver and can easily be configured. And we have seen a rise of endpoints showing up thanks to the simplicity of deploying it.
>
> So, it is IMO a balance between "deployability" and "debuggability". The responsiveness test is clearly aiming towards being deployable and accessible. Thus I think we would prefer keeping things on the server side simple.
>
> Thoughts?

[SM] I really, really would like some way to get OWDs, if only optionally, but even more than that I think RPM should get as wide a deployment as possible; ubiquity has its own inherent value for measurement platforms, so if this makes deployment harder it would be a no-go.

Now, I get that this is a long shot, but I fear that if the draft does not mention this at all, the chance will be gone forever...

Could we maybe add a description of an optional 'time' payload, so that clients could expect a single standardised format for it, if a server optionally supports it?

> That being said, I'm not entirely opposed to recommending the parallel mode as well. The interesting bit about the parallel mode is not so much the responsiveness measurement but rather the capacity measurement. Because surprisingly many modems/… that are supposedly (according to their spec sheet) able to handle 1 Gbps full-duplex suddenly show their weakness and are no longer able to handle line rate. So it is more about capacity than responsiveness, IMO.

[SM] True, yet such overload also occasionally affects queueing delay and jitter (sure, RPM does not report jitter, but it likely affects the ability of a test to reach the required stability criteria).
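To make the three-value timestamp idea above a bit more concrete, here is a minimal sketch of the client-side bookkeeping (names and numbers are made up for illustration; nothing here is specified in the draft or taken from cake-autorate). The key point is that each per-direction delay sample contains an unknown clock offset, but that offset cancels once the sample is compared against the same direction's idle baseline:

# Minimal sketch (illustrative only): attribute queueing-delay growth to
# uplink or downlink using NTP-style timestamps, without synchronized clocks.
#   t1 = client send time    (client clock)
#   t2 = server receive time (server clock)
#   t3 = server send time    (server clock)
#   t4 = client receive time (client clock)
# t2 - t1 and t4 - t3 each contain an unknown clock offset, but that offset
# cancels out when each direction is compared against its own idle baseline.

class OwdTracker:
    def __init__(self):
        self.base_up = None    # smallest uplink sample seen while idle
        self.base_down = None  # smallest downlink sample seen while idle

    def baseline(self, t1, t2, t3, t4):
        """Feed probes taken under idle conditions."""
        up, down = t2 - t1, t4 - t3
        self.base_up = up if self.base_up is None else min(self.base_up, up)
        self.base_down = down if self.base_down is None else min(self.base_down, down)

    def deltas(self, t1, t2, t3, t4):
        """Per-direction queueing-delay increase relative to the idle baseline."""
        return (t2 - t1) - self.base_up, (t4 - t3) - self.base_down


# Toy numbers: the server clock is ahead by ~1000 units, yet the 30-unit
# uplink queueing delay under load is still attributed to the right direction.
tracker = OwdTracker()
tracker.baseline(t1=0, t2=1010, t3=1011, t4=21)           # idle probe
print(tracker.deltas(t1=100, t2=1140, t3=1141, t4=151))   # -> (30, 0)

The same bookkeeping remains usable under slow clock drift, as long as the baselines are refreshed from time to time.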
> However, as a frequent user of the networkQuality tool, I realize that whenever I want to test my network I end up using the sequential test rather than the parallel test.

[SM] I agree that a full complement of upload, then download, then combined upload & download is a great tool for understanding network behaviour. I also want to applaud Apple's networkQuality as an excellent implementation of the ideas behind this draft, offering a great and well-selected set of options:

USAGE: networkQuality [-C <configuration URL>] [-c] [-d] [-f <protocols>] [-h] [-I <interface>] [-k] [-p] [-r host] [-S <port>] [-s] [-u] [-v]
    -C: Override Configuration URL or path (with scheme file://)
    -c: Produce computer-readable output
    -d: Do not run a download test (implies -s)
    -f: Enforce Protocol selections. Available options:
        h1: Force-enable HTTP/1.1
        h2: Force-enable HTTP/2
        h3: Force-enable HTTP/3 (QUIC)
        L4S: Force-enable L4S
        noL4S: Force-disable L4S
    -h: Show help (this message)
    -I: Bind test to interface (e.g., en0, pdp_ip0, ...)
    -k: Disable certificate validation
    -p: Use iCloud Private Relay
    -r: Connect to host or IP, overriding DNS for initial config request
    -S: Start and run server on specified port. Other specified options ignored
    -s: Run tests sequentially instead of parallel upload/download
    -u: Do not run an upload test (implies -s)
    -v: Verbose output

That covers a lot of cases with a relatively small set of control parameters.

>
> Christoph
>
>> Given these observations, I ask that we change this design parameter to require both measurement modes and to default to parallel testing (or to select randomly between the two modes, but report which one was chosen).
>>
>> Best Regards
>> Sebastian
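As a footnote to the optional 'time' payload idea and the "small process that simply updates the content of that file" mentioned above, here is a minimal sketch of such a side process; the file name, JSON field, and document-root path are assumptions for illustration only and are not part of the draft. A stock web server would keep serving the small file unchanged while this process refreshes its contents:

# Hypothetical sketch of the "small process" idea: keep rewriting a tiny
# static file with the server's current timestamp, so a stock web server
# can serve it without any code changes. File name, JSON field, and
# document root are invented for illustration only.
import json
import os
import tempfile
import time

DOC_ROOT = "/var/www/html"                      # assumption: the web server's document root
PAYLOAD = os.path.join(DOC_ROOT, "small.json")  # assumption: served as the small download resource

def write_timestamp(path: str) -> None:
    # Write to a temp file and rename it into place, so the web server
    # never serves a half-written payload.
    body = json.dumps({"server_send_ns": time.time_ns()})
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(body)
    os.replace(tmp, path)

if __name__ == "__main__":
    while True:
        write_timestamp(PAYLOAD)
        time.sleep(0.002)  # "every couple of milliseconds", as suggested above

A client that records its own send and receive times around fetching such a payload would then have, in effect, the three timestamps needed for the per-direction bookkeeping sketched earlier.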