[Rpm] [ippm] lightweight active sensing of bandwidth and buffering

Sebastian Moeller moeller0 at gmx.de
Wed Nov 2 17:13:10 EDT 2022



> On Nov 2, 2022, at 20:44, Dave Taht via Rpm <rpm at lists.bufferbloat.net> wrote:
> 
> On Wed, Nov 2, 2022 at 12:29 PM rjmcmahon via Rpm
> <rpm at lists.bufferbloat.net> wrote:
>> 
>> Most tools measuring bloat ignore the queue build-up phase and instead
>> start taking measurements after the bottleneck queue is in a standing state.
> 
> +10. It's the slow start transient that is holding things back.

	[SM] From my naive perspective slow start has a few facets:
a) it is absolutely the right approach conceptually: with no reliable prior knowledge of the capacity, the best we can do is to probe it by increasing the sending rate over time (and the current exponential growth is already plenty aggressive*)
b) since this needs feedback from the remote endpoint, at best we can figure out 1 RTT later whether we sent at an acceptable rate
c) we want to go as fast as reasonable
d) but not any faster ;)
e) the trickiest part is really deciding when to leave slow start's aggressive per-RTT rate-increase regime, so that the >= 1 RTT "blind" spot does not lead to too big a transient queue spike.
f) The slow start phase can be relatively short, so the less averaging we need to do to decide when to drop out of slow start the better, as averaging costs time (see the back-of-the-envelope sketch below).
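
To put rough numbers on e) and f), a back-of-the-envelope sketch (in Python; the initial window, growth factor, MSS, bottleneck rate and RTT below are invented illustration values, not a model of any particular stack):

MSS = 1500 * 8            # bits per full-size segment (assumed)
BOTTLENECK = 50e6         # hypothetical bottleneck capacity, bit/s
RTT = 0.020               # hypothetical round-trip time, s
cwnd = 10                 # segments, example initial window

rtts = 0
while cwnd * MSS / RTT < BOTTLENECK:   # sending rate still below capacity
    cwnd *= 2                          # slow start: double per RTT
    rtts += 1

# We only learn that we crossed the bottleneck >= 1 RTT later, by which
# time the window has doubled once more:
overshoot = (2 * cwnd * MSS / RTT) / BOTTLENECK
print(f"crossed the bottleneck after {rtts} RTTs at cwnd = {cwnd} segments")
print(f"when feedback arrives we already transmit at ~{overshoot:.1f}x capacity")

Even in this toy setting the sender can be several times over capacity before the first usable feedback arrives, which is what makes the exit decision so time-critical.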

IMHO the best way forward would be to switch from bit-banging the queue state over multiple packets (as in L4S' design) to sending a multi-bit queue occupancy signal per packet. The sender should then be able to figure out the rate of queueing change as a function of sending-rate change and predict a reasonable time to switch to congestion avoidance without having to wait for a drop (assuming the remote end reflects these queue occupancy signals back to the sender in a timely fashion)...
(This is really just transferring to slow start the ideas from Arslan, Serhat, and Nick McKeown, "Switches Know the Exact Amount of Congestion", in Proceedings of the 2019 Workshop on Buffer Sizing, 1–6, 2019.)
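
As an illustration of the kind of sender-side logic I have in mind (a hypothetical sketch only; this is neither L4S nor the scheme from the paper, and all names and thresholds are invented):

QUEUE_BUDGET_BYTES = 50_000   # arbitrary example budget for the standing queue

def should_exit_slow_start(samples, next_rate):
    """samples: list of (send_rate_bps, queue_bytes) echoed on recent ACKs."""
    if len(samples) < 2:
        return False
    # least-squares slope of queue occupancy versus sending rate
    n = len(samples)
    mean_r = sum(r for r, _ in samples) / n
    mean_q = sum(q for _, q in samples) / n
    var = sum((r - mean_r) ** 2 for r, _ in samples)
    if var == 0:
        return False
    slope = sum((r - mean_r) * (q - mean_q) for r, q in samples) / var
    predicted_queue = mean_q + slope * (next_rate - mean_r)
    return predicted_queue > QUEUE_BUDGET_BYTES

# e.g. if the last round's ACKs reported the queue ramping from 0 to 30 kB
# while the rate grew from 24 to 48 Mbit/s, predict the next doubling:
print(should_exit_slow_start([(24e6, 0.0), (36e6, 12_000.0), (48e6, 30_000.0)],
                             next_rate=96e6))   # -> True, leave slow start

The point is simply that a multi-bit occupancy signal lets the sender extrapolate ahead instead of waiting for a binary "too late" event.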


*) Two factors to play with are the size of the starting batch (aka the initial window) and the actual factor of increase per RTT; people have started playing with the first, but so far seem reasonable enough not to touch the second ;)


> If we
> could, for example,
> open up the 110+ objects and flows web pages require all at once, and
> let 'em rip, instead of 15 at a time, without destroying the network,
> web PLT would get much better.
> 
>> In my opinion, the best units for bloat are packets for UDP or bytes for
>> TCP. Min delay is a proxy measurement.
> 
> bytes, period. bytes = time. Sure, most udp today is small packets, but
> quic and videoconferencing change that.
> 
>> 
>> Little's law allows one to compute this, though it does assume the network
>> is in a stable state over the measurement interval. In the real world,
>> this probably is rarely true. So we, in test & measurement engineering,
>> force the standing state with some sort of measurement co-traffic and
>> call it "working conditions" or equivalent. ;)
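
	[SM] For the record, Little's law here is L = lambda * W (average occupancy = average arrival rate * average sojourn time), and it only holds over an interval in which the system is stationary. A made-up example: a bottleneck kept saturated at 50 Mbit/s with an average extra sojourn time of 40 ms implies a standing queue of roughly 50e6 bit/s * 0.040 s = 2e6 bits, i.e. about 250 KB. Forcing "working conditions" with measurement co-traffic is exactly an attempt to create that stationary state.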
> 
> There was an extremely long, nuanced debate about Little's law and
> where it applies, last year, here:
> 
> https://lists.bufferbloat.net/pipermail/cake/2021-July/005540.html
> 
> I don't want to go into it, again.
> 
>> 
>> Bob
>>> Bob, Sebastian,
>>> 
>>> not being active on your topic, just to add what I observed on
>>> congestion:
>>> - starts with an increase of jitter, but measured minimum delays still
>>> remain constant. Technically, a queue builds up some of the time, but
>>> it isn't present permanently.
>>> - buffer fill reaches a "steady state", called bufferbloat on access I
>>> think; technically, OWD increases also for the minimum delays, and jitter
>>> now decreases (what you've described as "the delay magnitude" decreasing
>>> or a "minimum CDF shift", respectively, if I'm correct). I'd expect
>>> packet loss to occur once the buffer fill is in steady state, but loss
>>> might be randomly distributed and could be of a low percentage.
>>> - a sudden, rather long load burst may cause a jump-start to
>>> "steady-state" buffer fill. The above holds for a slow but steady load
>>> increase (where the measurement frequency determines the timescale
>>> qualifying "slow").
>>> - in the end, max-min delay or the delay distribution/jitter likely isn't
>>> an easy-to-handle single metric to identify congestion.
>>> 
>>> Regards,
>>> 
>>> Ruediger
>>> 
>>> 
>>>> On Nov 2, 2022, at 00:39, rjmcmahon via Rpm
>>>> <rpm at lists.bufferbloat.net> wrote:
>>>> 
>>>> Bufferbloat shifts the minimum of the latency or OWD CDF.
>>> 
>>>      [SM] Thank you for spelling this out explicitly, I had only worked on a
>>> vague implicit assumption along those lines. However, what I want to
>>> avoid is using delay magnitude itself as a classifier between high- and
>>> low-load conditions, as it seems statistically uncouth to then show
>>> that the delay differs between the two classes ;).
>>>      Yet, your comment convinced me that my current load threshold (at
>>> least for the high-load condition) probably is too small, exactly
>>> because the "base" of the high-load CDFs coincides with the base of
>>> the low-load CDFs, implying that the high-load class contains too many
>>> samples with decent delay (which, after all, is one of the goals of the
>>> whole autorate endeavor).
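
	[SM] For concreteness, the kind of classification I have in mind is roughly this (illustrative Python; thresholds and names are made up, not the actual analysis code):

HIGH_LOAD_FRAC = 0.90   # "high load" if achieved load >= 90% of shaper rate
LOW_LOAD_FRAC = 0.25    # "low load"  if achieved load <= 25% of shaper rate

def split_by_load(samples):
    """samples: iterable of (delay_ms, load_fraction) tuples."""
    high = [d for d, f in samples if f >= HIGH_LOAD_FRAC]
    low = [d for d, f in samples if f <= LOW_LOAD_FRAC]
    return low, high     # then compare the delay CDFs of the two classes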
>>> 
>>> 
>>>> A suggestion is to disable x-axis auto-scaling and start from zero.
>>> 
>>>      [SM] Will reconsider. I started with the x-axis starting at zero, and
>>> then switched to an x-range that starts at the delay corresponding to the
>>> 0.01% quantile for the reflector/condition with the lowest such value and
>>> stops at the 97.5% quantile for the reflector/condition with the highest
>>> delay value. My rationale: the base delay/path delay of each reflector is
>>> not all that informative* (and it can still be learned from reading the
>>> x-axis); the long tail > 50% however is where I expect most differences,
>>> so I want to emphasize this; and finally I wanted to avoid the actual
>>> "curvy" part getting compressed so much that all lines more or less
>>> coincide. As I said, I will reconsider this.
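
	[SM] Roughly like this (illustrative Python/numpy, not the actual plotting code; all names are made up):

import numpy as np

# delays_by_reflector: dict of reflector -> 1-D array of delay samples (ms)
def cdf_x_range(delays_by_reflector, lo_q=0.0001, hi_q=0.975):
    lo = min(np.quantile(d, lo_q) for d in delays_by_reflector.values())
    hi = max(np.quantile(d, hi_q) for d in delays_by_reflector.values())
    return lo, hi          # shared x-limits for all per-reflector CDF panels

# e.g.: ax.set_xlim(*cdf_x_range(delays_by_reflector))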
>>> 
>>> 
>>> *) We also maintain individual baselines per reflector, so I could
>>> just plot the differences from baseline, but that would essentially
>>> equalize all reflectors, and I think having a plot that easily shows
>>> reflectors with outlying base delay can be informative when selecting
>>> reflector candidates. However, once we actually switch to OWDs, baseline
>>> correction might be required anyway, as due to clock differences ICMP
>>> type 13/14 data can have massive offsets that are mostly indicative of
>>> unsynced clocks**.
>>> 
>>> **) This is why I would prefer to use NTP servers as reflectors with
>>> NTP requests; my expectation is that all of these should be reasonably
>>> synced by default, so that offsets should be in a sane range....
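
	[SM] To make the ** footnote concrete, a tiny sketch (Python; the numbers and the helper name are invented, not taken from the actual scripts):

# ICMP timestamp (type 13/14) gives ms-since-midnight-UTC values:
# t1 = our originate time, t2 = reflector receive, t3 = reflector transmit,
# t4 = our receive time. A clock offset theta between the hosts shifts the
# raw one-way numbers by +/- theta; only their sum (the RTT) is offset-free,
# which is why per-reflector baselines (or well-synced reflectors) are
# needed before raw OWDs mean much.
def raw_owds_ms(t1, t2, t3, t4):
    fwd = t2 - t1      # true forward OWD + theta
    ret = t4 - t3      # true return OWD - theta
    return fwd, ret, fwd + ret   # theta cancels in the sum

# example: reflector clock running 2500 ms ahead of ours
print(raw_owds_ms(t1=1000.0, t2=3520.0, t3=3521.0, t4=1045.0))
# -> (2520.0, -2476.0, 44.0): huge/negative raw "OWDs", sane RTT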
>>> 
>>> 
>>>> 
>>>> Bob
>>>>> For about 2 years now the cake w-adaptive bandwidth project has been
>>>>> exploring techniques to lightweightedly sense bandwidth and
>>>>> buffering problems. One of my favorites was their discovery that ICMP
>>>>> type 13 got them working OWD from millions of ipv4 devices!
>>>>> They've also explored leveraging ntp and multiple other methods, and
>>>>> have scripts available that do a good job of compensating for 5g and
>>>>> starlink's misbehaviors.
>>>>> They've also pioneered a whole bunch of new graphing techniques,
>>>>> which I do wish were used more than single number summaries
>>>>> especially in analyzing the behaviors of new metrics like rpm,
>>>>> samknows, ookla, and
>>>>> RFC9097 - to see what is being missed.
>>>>> There are thousands of posts about this research topic, a new post on
>>>>> OWD just went by here.
>>>>> https://forum.openwrt.org/t/cake-w-adaptive-bandwidth/135379/793
>>>>> and of course, I love flent's enormous graphing toolset for
>>>>> simulating and analyzing complex network behaviors.
> 
> 
> 
> -- 
> This song goes out to all the folk that thought Stadia would work:
> https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
> Dave Täht CEO, TekLibre, LLC


