[Bloat] Abandoning Window-based CC Considered Harmful (was Re: Bechtolschiem)
Bless, Roland (TM)
roland.bless at kit.edu
Thu Jul 8 10:28:19 EDT 2021
Hi Neal,
On 08.07.21 at 15:29 Neal Cardwell wrote:
> On Thu, Jul 8, 2021 at 7:25 AM Bless, Roland (TM)
> <roland.bless at kit.edu> wrote:
>
> It seems that in BBRv2 there are many more mechanisms present
> that try to control the amount of inflight data more tightly,
> and the new "cap" is at 1.25 BDP.
>
> To clarify, the BBRv2 cwnd cap is not 1.25*BDP. If there is no packet
> loss or ECN, the BBRv2 cwnd cap is the same as in BBRv1. But if
> there has been packet loss, then conceptually the cwnd cap is the
> maximum amount
> of data delivered in a single round trip since the last packet loss
> (with a floor to ensure that the cwnd does not decrease by more than
> 30% per round trip with packet loss, similar to CUBIC's 30% reduction
> in a round trip with packet loss). (And upon RTO the BBR (v1 or v2)
> cwnd is reset to 1, and slow-starts upward from there.)
Thanks for the clarification. I'm patiently waiting to see the BBRv2
mechanisms coherently written up in the new BBR Internet-Draft
version ;-) Piecing this together from the "diffs" on the IETF slides
or from the source code is somewhat tedious, so I'll be very grateful
to have that single write-up.
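In the meantime, my mental model of the cap you describe, as a rough
Python sketch (all names are mine; this is only my reading of your
description, not the actual BBRv2 code):

    BETA = 0.7  # keep at least 70% of cwnd per lossy round trip

    def bbr2_cwnd_cap(bbr1_cap, loss_or_ecn_seen,
                      max_delivered_per_round_since_loss, prior_cwnd):
        # Without loss or ECN, the cap is the same as in BBRv1.
        if not loss_or_ecn_seen:
            return bbr1_cap
        # Otherwise: the maximum amount of data delivered in a single
        # round trip since the last packet loss, floored so that cwnd
        # decreases by at most 30% per round trip with packet loss
        # (similar to CUBIC's multiplicative decrease).
        return max(max_delivered_per_round_since_loss,
                   int(BETA * prior_cwnd))
    # (Upon RTO, cwnd is reset to 1 and slow-starts from there.)

Please correct me if that picture is wrong.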
> There is an overview of the BBRv2 response to packet loss here:
> https://datatracker.ietf.org/meeting/104/materials/slides-104-iccrg-an-update-on-bbr-00#page=18
My assumption came from slide 25 of that slide set: probing is
terminated if inflight > 1.25 * estimated_bdp (or a "hard ceiling"
is seen). So without experiencing more than 2% packet loss, the
inflight amount may end up beyond 1.25 * estimated_bdp, but would
it often end up at 2 * estimated_bdp?
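In pseudocode, slide 25 reads to me as follows (names are mine):

    def stop_probing(inflight, estimated_bdp, hard_ceiling_seen):
        # Terminate bandwidth probing once inflight exceeds
        # 1.25 * estimated BDP, or once a "hard ceiling" is seen.
        return inflight > 1.25 * estimated_bdp or hard_ceiling_seen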
Best regards,
Roland
>
>> This is too large for short-queue routers in the Internet core,
>> but it helps a lot with cross-traffic on large-queue edge routers.
>
> Best regards,
> Roland
>
> [1] https://ieeexplore.ieee.org/document/8117540
>
>>
>> On Wed, Jul 7, 2021 at 3:19 PM Bless, Roland (TM)
>> <roland.bless at kit.edu> wrote:
>>
>> Hi Matt,
>>
>> [sorry for the late reply, overlooked this one]
>>
>> please, see comments inline.
>>
>> On 02.07.21 at 21:46 Matt Mathis via Bloat wrote:
>>> The argument is absolutely correct for Reno, CUBIC and all
>>> other self-clocked protocols. One of the core assumptions
>>> in Jacobson88 was that the clock for the entire system
>>> comes from packets draining through the bottleneck queue.
>>> In this world, the clock is intrinsically brittle if the
>>> buffers are too small. The drain time needs to be a
>>> substantial fraction of the RTT.
>> I'd like to separate the functions here a bit:
>>
>> 1) "automatic pacing" by ACK clocking
>>
>> 2) congestion-window-based operation
>>
>> I agree that the automatic pacing generated by the ACK clock
>> (function 1) is increasingly distorted these days and may
>> consequently cause micro-bursts. This can be mitigated by using
>> paced sending, which I consider very useful.
>> However, I consider abandoning the (congestion) window-based
>> approaches with ACK feedback (function 2) harmful:
>> a congestion window has an automatic self-stabilizing property,
>> since the ACK feedback also reflects the queuing delay and the
>> congestion window limits the amount of inflight data.
>> In contrast, rate-based senders risk instability: two senders in
>> an M/D/1 setting, each sending at 50% of the bottleneck rate on
>> average, both using paced sending at 120% of that average rate,
>> suffice to cause instability (the queue grows without bound).
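>>
>> A toy discrete-time model of that effect (entirely my own sketch,
>> not any real stack): per tick, each of the two senders emits a
>> paced burst at 120% of its 50% average share, with a send
>> probability chosen so that its average stays at 50%. The mean
>> offered load then equals the bottleneck rate, and the queue
>> performs a zero-drift random walk that grows without bound:
>>
>>     import random
>>
>>     C = 1.0            # bottleneck rate (units per tick)
>>     BURST = 1.2 * 0.5  # burst size: 120% of the 50% average share
>>     P_ON = 0.5 / BURST # send probability keeping the average at 50%
>>
>>     queue = 0.0
>>     for t in range(1, 10**6 + 1):
>>         arrivals = sum(BURST for _ in range(2)
>>                        if random.random() < P_ON)
>>         queue = max(0.0, queue + arrivals - C)
>>         if t % 10**5 == 0:
>>             print(t, round(queue, 1))  # keeps drifting upward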
>>
>> IMHO, two approaches seem to be useful:
>> a) congestion-window-based operation with paced sending
>> b) rate-based/paced sending with a limit on the amount of
>> inflight data (see the sketch below)
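>>
>> For b), a minimal sketch of what I mean (names are mine): pacing
>> decides *when* the next packet may leave, while the inflight cap
>> restores the self-limiting property that a purely rate-based
>> sender lacks:
>>
>>     def can_send(now, next_send_time, inflight, inflight_cap):
>>         # Release packets at the paced times only, and never with
>>         # more than inflight_cap bytes outstanding.
>>         return now >= next_send_time and inflight < inflight_cap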
>>
>>>
>>> However, we have reached the point where we need to discard
>>> that requirement. One of the side points of BBR is that in
>>> many environments it is cheaper to burn serving CPU to pace
>>> into short-queue networks than it is to "right-size" the
>>> network queues.
>>>
>>> The fundamental problem with the old way is that in some
>>> contexts the buffer memory has to beat Moore's law, because
>>> to maintain constant drain time the memory size and BW both
>>> have to scale with the link (laser) BW.
>>>
>>> See the slides I gave at the Stanford Buffer Sizing workshop
>>> in December 2019: Buffer Sizing: Position Paper
>>> <https://docs.google.com/presentation/d/1VyBlYQJqWvPuGnQpxW4S46asHMmiA-OeMbewxo_r3Cc/edit#slide=id.g791555f04c_0_5>
>>>
>>>
>> Thanks for the pointer. I don't quite get the point that the
>> buffer must have a certain size to keep the ACK clock stable:
>> in the case of a non-application-limited sender, a very small
>> buffer suffices to keep the ACK clock running steadily. The
>> large buffers were mainly required by loss-based CCs to let the
>> standing queue build up that keeps the bottleneck busy during
>> the CWnd reduction after packet loss, thereby keeping the
>> (bottleneck link) utilization high.
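>>
>> As a back-of-the-envelope for that standing queue (the usual
>> textbook argument, with beta the multiplicative-decrease factor):
>> after a loss, a single flow's window drops to beta * (BDP + B),
>> and the link stays fully utilized only if that is still >= BDP,
>> i.e. B >= (1 - beta) / beta * BDP:
>>
>>     def min_buffer(bdp, beta):
>>         # Smallest buffer B keeping the bottleneck busy across one
>>         # multiplicative-decrease step of a single flow.
>>         return (1 - beta) / beta * bdp
>>
>>     print(min_buffer(1.0, 0.5))  # Reno,  beta = 0.5: 1.00 * BDP
>>     print(min_buffer(1.0, 0.7))  # CUBIC, beta = 0.7: ~0.43 * BDP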
>>
>> Regards,
>>
>> Roland
>>
>>
>>> Note that we are talking about DC and Internet core. At the
>>> edge, BW is low enough that memory is relatively cheap. In
>>> some sense BB came about because memory is too cheap in
>>> these environments.
>>>
>>> Thanks,
>>> --MM--
>>> The best way to predict the future is to create it. - Alan Kay
>>>
>>> We must not tolerate intolerance;
>>> however our response must be carefully measured:
>>> too strong would be hypocritical and risks
>>> spiraling out of control;
>>> too weak risks being mistaken for tacit approval.
>>>
>>>
>>> On Fri, Jul 2, 2021 at 9:59 AM Stephen Hemminger
>>> <stephen at networkplumber.org> wrote:
>>>
>>> On Fri, 2 Jul 2021 09:42:24 -0700
>>> Dave Taht <dave.taht at gmail.com> wrote:
>>>
>>> > "Debunking Bechtolsheim credibly would get a lot of
>>> attention to the
>>> > bufferbloat cause, I suspect." - dpreed
>>> >
>>> > "Why Big Data Needs Big Buffer Switches" -
>>> >
>>> http://www.arista.com/assets/data/pdf/Whitepapers/BigDataBigBuffers-WP.pdf
>>> >
>>>
>>> Also, a lot depends on the TCP congestion control
>>> algorithm being used.
>>> They are using NewReno, which only researchers use in
>>> real life.
>>>
>>> Even TCP Cubic has gone through several revisions. In my
>>> experience, the
>>> NS-2 models don't correlate well to real world behavior.
>>>
>>> In real world tests, TCP Cubic will consume any buffer
>>> it sees at a
>>> congested link. Maybe that is what they mean by capture
>>> effect.
>>>
>>> There is also a weird oscillation effect with multiple
>>> streams, where one flow will take the buffer, then see a
>>> packet loss and back off, and the other flow will take over
>>> the buffer until it sees a loss.
>>>