From: Sebastian Moeller
Date: Thu, 3 Nov 2022 09:57:09 +0100
To: Ruediger.Geib@telekom.de
Cc: rjmcmahon, Rpm, IETF IPPM WG, Andrew Somerville
Subject: Re: [Rpm] [ippm] lightweight active sensing of bandwidth and buffering
List-Id: revolutions per minute - a new metric for measuring responsiveness

Hi Ruediger,

> On Nov 3, 2022, at 09:20, Ruediger.Geib@telekom.de wrote:
> 
> Hi Sebastian,
> 
> [SM] Pragmatically we work with delay increase over baseline, which seems to work well enough to be useful, while it is unlikely to be perfect.
> 
> RG: I agree. And I appreciate "well enough" - many factors may impact delays along a measurement path, causing e.g. temporal noise or permanent change.

[SM] Exactly!
We do try to account for some of these factors to some degree:

a) We use a set of diverse reflectors by default together with a voting mechanism, and declare "congested_state" only if W out of the last X delay samples (from Y reflectors) were above threshold Z (all of W, X, Y, and Z are configurable to tune the code for specific links, but the defaults seem to already improve perceived latency-under-load noticeably for many/most users that reported back to us).

b) We keep individual "baseline" values for each reflector that intend to model the path-dependent delay component, and we adjust the baseline with two rules:

   A) If a sample was larger than the baseline, we feed it into an EWMA to update/increase the baseline slightly (which will allow the baseline to slowly grow to larger values if e.g. a path changed and now is longer). The assumption here is that underestimating the "true" baseline will make the controller more "trigger-happy", which will keep latency low at the expense of potential throughput; we prefer that transient loss in potential throughput over a similarly transient decrease in responsiveness.

   B) If a sample is smaller than the baseline, we immediately update the baseline to that value. (One rationale for that is that during prolonged congestion epochs the EWMA method in A) will have artificially increased our baseline estimate, so we want to be quick to correct it again if the congestion relaxes even for a short duration; the other is to quickly adapt in case a path change results in a shorter path delay. Again the principle is that we value responsiveness over throughput.)

And yes, the goal is not to model congestion veridically, but really to offer something that is "good enough" to help while not being excessively complicated (though admittedly, as often happens with heuristics, we added more and more code once testing showed our previous approach to be too simplistic).
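For concreteness, the voting rule a) and the asymmetric baseline rules A)/B) above can be condensed into a short sketch. Cake-autorate itself is written in bash; this Python version is purely illustrative, and the class name, the EWMA gain, and the W/X/threshold defaults are invented for the example, not the project's actual values.

```python
from collections import deque

# Illustrative sketch only: cake-autorate is a bash script; ReflectorBaseline,
# ALPHA_UP, and the W/X/threshold defaults below are invented for this example.

ALPHA_UP = 0.05  # assumed EWMA gain for slowly drifting the baseline upward


class ReflectorBaseline:
    """Per-reflector baseline: slow EWMA upward (rule A), instant snap down (rule B)."""

    def __init__(self, first_sample_ms: float):
        self.baseline_ms = first_sample_ms

    def update(self, sample_ms: float) -> float:
        if sample_ms < self.baseline_ms:
            # Rule B: a lower sample immediately becomes the new baseline.
            self.baseline_ms = sample_ms
        else:
            # Rule A: higher samples only nudge the baseline up via an EWMA.
            self.baseline_ms += ALPHA_UP * (sample_ms - self.baseline_ms)
        return sample_ms - self.baseline_ms  # delay increase over baseline


def congested(deltas_ms, threshold_ms: float = 15.0, w: int = 3, x: int = 6) -> bool:
    """Rule a): declare congestion if W of the last X deltas exceed threshold Z."""
    recent = deque(deltas_ms, maxlen=x)  # keep only the last X samples
    return sum(d > threshold_ms for d in recent) >= w
```

The asymmetry (slow EWMA up, instant snap down) is the point: underestimating the baseline only costs some throughput, while overestimating it costs responsiveness.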
Also, this effort over on the OpenWrt list is actually a set of multiple parallel efforts, with cake-autorate* only one of ~4 alternatives (implemented in perl, lua, awk, and bash) that cross-pollinated each other; IMHO all approaches improved from open discussion and friendly competition.

*) And cake-autorate is neither the first nor the most innovative of the bunch (that would IMHO be the perl implementation, which existed first and single-handedly demonstrated the suitability of ICMP type 13/14 timestamps for directional "congestion" measurements), but it is the one with the most dedicated and open-minded main developer** (CCd), integrating all ideas (and coming up with new ones), without whom I think none of the approaches would have become public or even happened. It is also the approach that appears to have the most active development momentum and testing community.

**) A developer that also uses this code on a day-to-day basis, so "dog-fooding" at its finest; that really helps, I think, to stay pragmatic ;)

> 
> Regards,
> 
> Ruediger
> 
> 
> 
> -----Original Message-----
> From: Sebastian Moeller
> Sent: Wednesday, 2 November 2022 22:41
> To: Geib, Rüdiger
> Cc: rjmcmahon; Rpm; ippm@ietf.org
> Subject: Re: [ippm] [Rpm] lightweight active sensing of bandwidth and buffering
> 
> Dear Ruediger,
> 
> thank you very much for your helpful information. I will chew over this and see how/if I can exploit these "development of congestion observations" somehow.
> The goal of these plots is not primarily to detect congestion* (that would be the core of autorate's functionality: detect increases in delay and respond by reducing the shaper rate to counteract them), but more to show how well this works (the current rationale is that, compared to a situation without traffic shaping, the difference between high- and low-load CDFs should be noticeably** smaller).
> 
> *) autorate will be in control of an artificial bottleneck and we do measure the achieved throughput per direction, so we can reason about "congestion" based on throughput and delay; the loading is organic in that we simply measure the traffic volume per time of what travels over the relevant interfaces. The delay measurements however are active, which has its pros and cons...
> **) Maybe even run a few statistical tests, like the Mann-Whitney U/Wilcoxon rank-sum test, and then claim "significantly smaller". I feel a parametric t-test might not be in order here, with delay PDFs decidedly non-normal in shape (then again they likely are unimodal, so a t-test would still work okayish in spite of its core assumption being violated).
> 
> 
>> On Nov 2, 2022, at 10:41, Ruediger.Geib@telekom.de wrote:
>> 
>> Bob, Sebastian,
>> 
>> not being active on your topic, just to add what I observed on congestion:
> 
> [SM] I will try to explain how/if we could exploit your observations for our controller.
> 
>> - starts with an increase of jitter, but measured minimum delays still remain constant. Technically, a queue builds up some of the time, but it isn't present permanently.
> 
> [SM] So in that phase we would expect CDFs to have different slopes; higher variance should result in a shallower slope? As for using this insight for the actual controller, I am not sure how that would work; maybe maintain a "jitter" baseline per reflector and test whether each new sample deviates significantly from that baseline? That is similar to the approach we are currently taking with delay/RTT.
> 
>> - buffer fill reaches a "steady state", called bufferbloat on access I think
> 
> [SM] I would call it bufferbloat if that steady state results in too high delay increases (which to a degree is a subjective judgement).
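As an aside on the statistical-test idea in footnote **) above: a stdlib-only sketch of such a rank-sum comparison could look like the following. In practice scipy.stats.mannwhitneyu would do this properly; the U statistic and the normal-approximation p-value below are hand-rolled for illustration, and the delay samples are made up, not real autorate measurements.

```python
import math

# Hand-rolled Mann-Whitney U for illustration (use scipy.stats.mannwhitneyu in
# practice). U counts pairs where a sample from `a` exceeds one from `b`;
# tied pairs contribute 0.5 each.

def mann_whitney_u(a, b):
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

def two_sided_p(a, b):
    """Two-sided p-value via the large-sample normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    u = mann_whitney_u(a, b)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))

# Made-up delay samples (ms) standing in for low-load vs high-load CDF data:
low_load = [10.2, 11.0, 10.7, 10.4, 11.3, 10.9, 10.5, 11.1]
high_load = [28.4, 31.0, 29.7, 30.2, 32.5, 29.1, 30.8, 31.6]
```

With every high-load sample above every low-load sample, U is maximal (n1*n2 = 64) and the approximate two-sided p-value comes out well below 0.01, so "significantly smaller" would be defensible without assuming normality.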
> Although, in accordance with the Nichols/Jacobson analogy of buffers/queues as shock absorbers, a queue with acceptable steady-state induced delay might not work too well to even out occasional bursts?
> 
>> ; technically, OWD increases also for the minimum delays, jitter now decreases (what you've described as "the delay magnitude" decreasing or a "minimum CDF shift", respectively, if I'm correct).
> 
> [SM] That is somewhat unfortunate, as it is harder to detect quickly than something that simply increases and stays high (like RTT).
> 
>> I'd expect packet loss to occur once the buffer fill is in steady state, but loss might be randomly distributed and could be of a low percentage.
> 
> [SM] Loss is mostly invisible to our controller (it would need to affect our relatively thin active delay measurement traffic; we have no insight into the rest of the traffic), but more than that, the controller's goal is to avoid this situation, so hopefully it will be rare and transient.
> 
>> - a sudden rather long load burst may cause a jump-start to "steady-state" buffer fill.
> 
> [SM] As would a rather steep drop in available capacity with traffic in-flight sized to the previous larger capacity. This is e.g. what can be observed over shared media like docsis/cable and GSM successors.
> 
> 
>> The above holds for a slow but steady load increase (where the measurement frequency determines the timescale qualifying "slow").
>> - in the end, max-min delay or delay distribution/jitter likely isn't an easy-to-handle single metric to identify congestion.
> 
> [SM] Pragmatically we work with delay increase over baseline, which seems to work well enough to be useful, while it is unlikely to be perfect. The CDFs I plotted are really just for making sense post hoc out of the logged data...
> (cake-autorate is currently designed to maintain a "flight-recorder" log buffer that can be extracted after noticeable events, and I am trying to come up with how to slice and dice the data to help explain "noticeable events" from the limited log data we have).
> 
> Many Thanks & Kind Regards
> 	Sebastian
> 
> 
>> 
>> Regards,
>> 
>> Ruediger
>> 
>> 
>>> On Nov 2, 2022, at 00:39, rjmcmahon via Rpm wrote:
>>> 
>>> Bufferbloat shifts the minimum of the latency or OWD CDF.
>> 
>> [SM] Thank you for spelling this out explicitly; I had only worked on a vague implicit assumption along those lines. However, what I want to avoid is using delay magnitude itself as the classifier between high- and low-load conditions, as it seems statistically uncouth to then show that the delay differs between the two classes ;).
>> Yet your comment convinced me that my current load threshold (at least for the high-load condition) probably is too small, exactly because the "base" of the high-load CDFs coincides with the base of the low-load CDFs, implying that the high-load class contains too many samples with decent delay (which after all is one of the goals of the whole autorate endeavor).
>> 
>> 
>>> A suggestion is to disable x-axis auto-scaling and start from zero.
>> 
>> [SM] Will reconsider. I started at zero, and then switched to an x-range that starts at the delay corresponding to 0.01% for the reflector/condition with the lowest such value and stops at 97.5% for the reflector/condition with the highest delay value.
>> My rationale is that the base delay/path delay of each reflector is not all that informative* (and it can still be learned from reading the x-axis); the long tail > 50%, however, is where I expect most differences, so I want to emphasize this; and finally I wanted to avoid the actual "curvy" part getting compressed so much that all lines more or less coincide. As I said, I will reconsider this.
>> 
>> 
>> *) We also maintain individual baselines per reflector, so I could just plot the differences from baseline, but that would essentially equalize all reflectors, and I think having a plot that easily shows reflectors with outlying base delay can be informative when selecting reflector candidates. However, once we actually switch to OWDs, baseline correction might be required anyway, as due to clock differences ICMP type 13/14 data can have massive offsets that are mostly indicative of unsynced clocks**.
>> 
>> **) This is why I would prefer to use NTP servers as reflectors with NTP requests; my expectation is that all of these should be reasonably synced by default, so that offsets should be in the sane range....
>> 
>> 
>>> 
>>> Bob
>>>> For about 2 years now the cake w-adaptive bandwidth project has been exploring techniques to lightweightly sense bandwidth and buffering problems. One of my favorites was their discovery that ICMP type 13 got them working OWD from millions of ipv4 devices!
>>>> They've also explored leveraging ntp and multiple other methods, and have scripts available that do a good job of compensating for 5g and starlink's misbehaviors.
>>>> They've also pioneered a whole bunch of new graphing techniques, which I do wish were used more than single-number summaries, especially in analyzing the behaviors of new metrics like rpm, samknows, ookla, and RFC9097 - to see what is being missed.
>>>> There are thousands of posts about this research topic; a new post on OWD just went by here:
>>>> https://forum.openwrt.org/t/cake-w-adaptive-bandwidth/135379/793
>>>> and of course, I love flent's enormous graphing toolset for simulating and analyzing complex network behaviors.
>>> _______________________________________________
>>> Rpm mailing list
>>> Rpm@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/rpm
>> 
>> _______________________________________________
>> ippm mailing list
>> ippm@ietf.org
>> https://www.ietf.org/mailman/listinfo/ippm
> 