[Rpm] [ippm] lightweight active sensing of bandwidth and buffering

Wed Nov 2 17:41:09 EDT 2022

Dear Ruediger,

Thank you very much for your helpful information. I will chew on this and see whether and how I can exploit these "development of congestion" observations.
The goal of these plots is not primarily to detect congestion* (that is the core of autorate's functionality: detect increases in delay and respond by reducing the shaper rate to counteract them), but rather to show how well this works (the current rationale is that, compared to a situation without traffic shaping, the difference between the high-load and low-load CDFs should be noticeably** smaller).

*) autorate will be in control of an artificial bottleneck and we do measure the achieved throughput per direction, so we can reason about "congestion" based on throughput and delay. The loading is organic in that we simply measure the traffic volume per time of what travels over the relevant interfaces; the delay measurements, however, are active, which has its pros and cons...
**) Maybe even run a few statistical tests, like the Mann-Whitney U/Wilcoxon rank-sum test, and then claim "significantly smaller". I feel a parametric t-test might not be in order here, with delay PDFs decidedly non-normal in shape (then again, they likely are unimodal, so a t-test would still work okay-ish in spite of its core assumption being violated).
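
A minimal sketch of such a rank-sum comparison, using only the Python standard library. The function name `mann_whitney_u` and the delay values are made up for illustration and are not part of cake-autorate. The p-value uses the normal approximation with tie correction and no continuity correction, so it is only reasonable for larger samples.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at positions i..j-1 share the average of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0.0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_two_sided

# Made-up delay samples in milliseconds, purely for illustration.
low_load_ms = [20.1, 20.4, 20.2, 21.0, 20.3, 20.8, 20.5, 20.2]
high_load_ms = [20.9, 23.5, 22.1, 27.4, 21.8, 25.0, 22.6, 24.3]
u, p = mann_whitney_u(low_load_ms, high_load_ms)
print(f"U = {u}, two-sided p = {p:.4f}")
```

This only compares two samples; showing that the high/low difference shrinks with shaping would need the same comparison run once with and once without the shaper.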

> On Nov 2, 2022, at 10:41, <Ruediger.Geib at telekom.de> wrote:
> 
> Bob, Sebastian,
> 
> not being active on your topic, just to add what I observed on congestion:

	[SM] I will try to explain whether and how we could exploit your observations for our controller.

> - starts with an increase of jitter, but measured minimum delays still remain constant. Technically, a queue builds up some of the time, but it isn't present permanently.

	[SM] So in that phase we would expect CDFs to have different slopes; higher variance should result in a shallower slope? As for using this insight for the actual controller, I am not sure how that would work; maybe maintaining a "jitter" baseline per reflector and testing whether each new sample deviates significantly from that baseline? That is similar to the approach we are currently taking with delay/RTT.
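
One way such a per-reflector jitter baseline could look, as a rough sketch only. The class `JitterBaseline`, its parameters (`alpha`, `k`, `floor_ms`, `warmup`) and their values are hypothetical and not taken from cake-autorate. It keeps exponentially weighted averages of the delay and of its absolute deviation, and flags a sample whose deviation exceeds a multiple of the jitter baseline.

```python
class JitterBaseline:
    """Tracks mean delay and mean absolute deviation ("jitter") for one reflector.

    All parameters are illustrative assumptions, not tuned values.
    """

    def __init__(self, alpha=0.1, k=4.0, floor_ms=0.5, warmup=10):
        self.alpha = alpha        # EWMA weight of each new sample
        self.k = k                # deviation threshold, in multiples of jitter
        self.floor_ms = floor_ms  # minimum threshold, avoids alarms at zero jitter
        self.warmup = warmup      # samples to collect before flagging anything
        self.mean_ms = None
        self.jitter_ms = 0.0
        self.count = 0

    def update(self, delay_ms):
        """Feed one delay sample; return True if it deviates from the baseline."""
        if self.mean_ms is None:
            self.mean_ms = delay_ms
            self.count = 1
            return False
        deviation = abs(delay_ms - self.mean_ms)
        threshold = max(self.k * self.jitter_ms, self.floor_ms)
        deviates = self.count >= self.warmup and deviation > threshold
        if not deviates:
            # Only unremarkable samples are allowed to move the baseline.
            self.mean_ms += self.alpha * (delay_ms - self.mean_ms)
            self.jitter_ms += self.alpha * (deviation - self.jitter_ms)
        self.count += 1
        return deviates

# One baseline per reflector, keyed by a (made-up) reflector address.
baselines = {"192.0.2.1": JitterBaseline()}
for sample_ms in [20.0, 21.0] * 10:
    baselines["192.0.2.1"].update(sample_ms)
print(baselines["192.0.2.1"].update(40.0))  # far outside the learned jitter
```

Freezing the baseline while a sample is flagged keeps a queue build-up from being absorbed into the baseline, at the cost of never adapting to a genuine path change; a real controller would need some escape hatch for that.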

> - buffer fill reaches a "steady state", called bufferbloat on access I think

	[SM] I would call it bufferbloat if that steady state results in too large a delay increase (which to a degree is a subjective judgement). Although, in accordance with the Nichols/Jacobson analogy of buffers/queues as shock absorbers, a queue with an acceptable steady-state induced delay might not work too well to even out occasional bursts?

> ; technically, OWD increases also for the minimum delays, jitter now decreases (what you've described as "the delay magnitude" decreases or "minimum CDF shift" respectively, if I'm correct).

	[SM] That is somewhat unfortunate as it is harder to detect quickly than something that simply increases and stays high (like RTT).

> I'd expect packet loss to occur, once the buffer fill is on steady state, but loss might be randomly distributed and could be of a low percentage.

	[SM] Loss is mostly invisible to our controller (it would need to affect our relatively thin active delay measurement traffic; we have no insight into the rest of the traffic), but more than that, the controller's goal is to avoid this situation, so hopefully it will be rare and transient.

> - a sudden rather long load burst may cause a jump-start to "steady-state" buffer fill.

	[SM] As would a rather steep drop in available capacity while the traffic in flight is still sized to the previous, larger capacity. This is, e.g., what can be observed over shared media like DOCSIS/cable and GSM successors.
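
As a back-of-the-envelope illustration of that capacity-drop case: if one bandwidth-delay product at the old rate is in flight and the senders have not yet reacted, the in-flight data has to fit into the new rate times (base RTT + queueing delay). The helper `induced_queue_delay_ms` and the numbers below are made up for illustration; this is a simplified fixed-window model, not a measurement.

```python
def induced_queue_delay_ms(old_rate_mbps, new_rate_mbps, base_rtt_ms):
    """Standing queue delay after a capacity drop, for a fixed in-flight window.

    Assumes exactly one bandwidth-delay product at the old rate is in flight
    and that senders keep that window constant (no reaction yet).
    """
    if new_rate_mbps <= 0 or new_rate_mbps > old_rate_mbps:
        raise ValueError("expected 0 < new_rate_mbps <= old_rate_mbps")
    # in_flight = old_rate * base_rtt = new_rate * (base_rtt + queue_delay)
    return base_rtt_ms * old_rate_mbps / new_rate_mbps - base_rtt_ms

# Example: link drops from 100 to 20 Mbit/s with a 30 ms base RTT.
print(induced_queue_delay_ms(100, 20, 30))  # 120.0 ms of extra delay
```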

> The above holds for a slow but steady load increase (where the measurement frequency determines the timescale qualifying "slow").
> - in the end, max-min delay or delay distribution/jitter likely isn't an easy to handle single metric to identify congestion.

	[SM] Pragmatically we work with delay increase over baseline, which seems to work well enough to be useful, while it is unlikely to be perfect. The CDFs I plotted are really just for making sense post hoc out of the logged data... (cake-autorate is currently designed to maintain a "flight-recorder" log buffer that can be extracted after noticeable events, and I am trying to come up with how to slice and dice the data to help explain "noticeable events" from the limited log data we have).
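
To make the "flight-recorder" idea and the post hoc slicing a bit more concrete, here is a rough sketch. The class `FlightRecorder`, the function `split_by_load`, the field names and the load thresholds are all hypothetical; the actual cake-autorate log format and thresholds differ.

```python
from collections import deque

class FlightRecorder:
    """Fixed-size log buffer: the oldest records fall out as new ones arrive."""

    def __init__(self, capacity=10000):
        self.records = deque(maxlen=capacity)

    def log(self, t_s, reflector, rtt_ms, baseline_ms, load):
        # 'load' is the achieved rate as a fraction of the shaper rate (0..1).
        self.records.append({
            "t_s": t_s,
            "reflector": reflector,
            "delta_ms": rtt_ms - baseline_ms,  # delay increase over baseline
            "load": load,
        })

    def dump(self):
        """Return a snapshot of the buffer, e.g. after a noticeable event."""
        return list(self.records)

def split_by_load(records, low_max=0.2, high_min=0.8):
    """Sorted delay deltas for the low-load and high-load classes.

    Samples are classified by load only, never by delay, so that comparing
    the delay of the two classes afterwards is not circular.
    """
    low = sorted(r["delta_ms"] for r in records if r["load"] <= low_max)
    high = sorted(r["delta_ms"] for r in records if r["load"] >= high_min)
    return low, high

recorder = FlightRecorder(capacity=4)
samples = [(0, 20.5, 0.05), (1, 21.0, 0.10), (2, 29.0, 0.90),
           (3, 35.5, 0.95), (4, 20.8, 0.50), (5, 31.0, 0.85)]
for t_s, rtt_ms, load in samples:
    recorder.log(t_s, "192.0.2.1", rtt_ms, baseline_ms=20.0, load=load)
low, high = split_by_load(recorder.dump())
print(len(recorder.dump()), low, high)
```

The sorted lists are exactly what is needed to draw the per-class CDFs: the i-th value of a list with n entries sits at cumulative probability (i + 1) / n.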

Many Thanks & Kind Regards
	Sebastian

> 
> Regards,
> 
> Ruediger
> 
> 
>> On Nov 2, 2022, at 00:39, rjmcmahon via Rpm <rpm at lists.bufferbloat.net> wrote:
>> 
>> Bufferbloat shifts the minimum of the latency or OWD CDF.
> 
> 	[SM] Thank you for spelling this out explicitly; I only worked on a vague implicit assumption along those lines. However, what I want to avoid is using delay magnitude itself as the classifier between the high-load and low-load conditions, as it seems statistically uncouth to then show that the delay differs between the two classes ;).
> 	Yet, your comment convinced me that my current load threshold (at least for the high load condition) probably is too small, exactly because the "base" of the high-load CDFs coincides with the base of the low-load CDFs implying that the high-load class contains too many samples with decent delay (which after all is one of the goals of the whole autorate endeavor).
> 
> 
>> A suggestion is to disable x-axis auto-scaling and start from zero.
> 
> 	[SM] Will reconsider. I started with an x-axis starting at zero, and then switched to an x-range that starts at the delay corresponding to the 0.01% quantile for the reflector/condition with the lowest such value and stops at the 97.5% quantile for the reflector/condition with the highest delay value. My rationale is that the base delay/path delay of each reflector is not all that informative* (and it can still be learned from reading the x-axis); the long tail > 50%, however, is where I expect most differences, so I want to emphasize it; and finally I wanted to avoid the actual "curvy" part getting compressed so much that all lines more or less coincide. As I said, I will reconsider this.
> 
> 
> *) We also maintain individual baselines per reflector, so I could just plot the differences from baseline, but that would essentially equalize all reflectors, and I think having a plot that easily shows reflectors with outlying base delay can be informative when selecting reflector candidates. However, once we actually switch to OWDs, baseline correction might be required anyway, as due to clock differences ICMP type 13/14 data can have massive offsets that are mostly indicative of unsynchronized clocks**.
> 
> **) This is why I would prefer to use NTP servers as reflectors with NTP requests; my expectation is that all of these should be reasonably synced by default, so that offsets should be in a sane range...
> 
> 
>> 
>> Bob
>>> For about 2 years now the cake w-adaptive bandwidth project has been 
>>> exploring techniques to lightweightedly sense bandwidth and
>>> buffering problems. One of my favorites was their discovery that ICMP
>>> type 13 got them working OWD from millions of IPv4 devices!
>>> They've also explored leveraging NTP and multiple other methods, and
>>> have scripts available that do a good job of compensating for 5G and
>>> Starlink's misbehaviors.
>>> They've also pioneered a whole bunch of new graphing techniques, 
>>> which I do wish were used more than single number summaries 
>>> especially in analyzing the behaviors of new metrics like rpm, 
>>> samknows, ookla, and
>>> RFC9097 - to see what is being missed.
>>> There are thousands of posts about this research topic, a new post on 
>>> OWD just went by here.
>>> https://forum.openwrt.org/t/cake-w-adaptive-bandwidth/135379/793
>>> and of course, I love flent's enormous graphing toolset for 
>>> simulating and analyzing complex network behaviors.