From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <mattmathis@google.com>
Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com
 [IPv6:2a00:1450:4864:20::42f])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by lists.bufferbloat.net (Postfix) with ESMTPS id 8C6053B29E
 for <bloat@lists.bufferbloat.net>; Wed,  7 Jul 2021 18:39:01 -0400 (EDT)
Received: by mail-wr1-x42f.google.com with SMTP id d2so4966635wrn.0
 for <bloat@lists.bufferbloat.net>; Wed, 07 Jul 2021 15:39:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=OAONE92Ma8cIQwvfvtdmLQvt6kzcZNtuQyoseo444Sg=;
 b=GaTP/G2+JOyX1DSLmpJF/3NcjNzBVTWFEc3gYiEjtiPd/hc9i43Nrh9ix0z/J78xM+
 Q5GMUVywxUcnBEOGn3Ee6KSB+Z9X1NJV1c503HVHxOJfHRcdcZ2mtepmx3YqPeHiJK2/
 LXr5fBm1gkx3mQrEza8Vm3q+UkVii+UOqzK0f/lHUadJH8b6gTki0lknxnw2ruJiPRYb
 zfOA3bBrxjMIPHX+1//0mDSnNLkECWqM88US+VYEAx+bGLKpgWEz6C99GeWzBVOinChw
 8lR2aH7ZiyDONKzN4MzHEluQY/2I7z4zzY4eUWk3QVgwlLqIznZWaLLowMXnfkH7O58C
 LPAg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=OAONE92Ma8cIQwvfvtdmLQvt6kzcZNtuQyoseo444Sg=;
 b=JucTFsKROPqA598zvOVcjCK8ev4kt/GsL+RbdlJSUwJE75PoFHN9G5+vSbdbTFE4YI
 B/Aipylxe8xlsXw9E/jg/oPDCJuB7jEekrFpiCO8VNefwERtuk/O6NK9y1hyY+cbai9W
 /6nwmBzs/6FgkfMZhmowde/mHfh5kEdtK1tAOb1YRdqQyWmO+ajP6EHvsu5Rn/4dbRtt
 pL9zvuAa111HJF04DuJJj++J45+aivPMw6BAVRkrLG37eZt15+mUzlC8v4FedIKD+Ak1
 dCfoIoNqCLxYS9xv8QVFXvMHiFgLsHq6O8sxiy5s1d90Ruh97YMlCIXBYhpDhug8KQzs
 VU2Q==
X-Gm-Message-State: AOAM530QVCjYKTcd9zoUYbCD1aD8EbNicg2vNvZN0/nBpauZukAjEETl
 NUdnyZrV6z4/e9pubQUh9gKQRWhfG2CKClFJEhsmTw==
X-Google-Smtp-Source: ABdhPJwdDy1GbdqWvW+bnVqGdCwiOdMvmaeSSH2EFjoYZfIvhO/nOA5twraJ6TjqnklPKrXpGn4iJZ4M4n10XOuw7uY=
X-Received: by 2002:a5d:560c:: with SMTP id l12mr31155731wrv.310.1625697540314; 
 Wed, 07 Jul 2021 15:39:00 -0700 (PDT)
MIME-Version: 1.0
References: <55fdf513-9c54-bea9-1f53-fe2c5229d7ba@eggo.org>
 <871t4as1h9.fsf@toke.dk> <3D32F19B-5DEA-48AD-97E7-D043C4EAEC51@gmail.com>
 <alpine.DEB.2.02.1606062029380.28955@uplift.swm.pp.se>
 <CAD6NSj6vA=bjHt3Txyw8VuV9tqg-A7wvLd6ovJG4Jxabvvjw4g@mail.gmail.com>
 <1465267957.902610235@apps.rackspace.com>
 <CAA93jw5gT=9bHG_vmC_kKs6U-brhrvi5Abqe9cJrgYuH5Aoi3A@mail.gmail.com>
 <20210702095924.0427b579@hermes.local>
 <CAH56bmCCj36ZijO8Bzs-j3f+LuDULMTcn5iUq0gJpf-+kUZSmg@mail.gmail.com>
 <1bab95a0-7904-2807-02fe-62674c19948f@kit.edu>
In-Reply-To: <1bab95a0-7904-2807-02fe-62674c19948f@kit.edu>
From: Matt Mathis <mattmathis@google.com>
Date: Wed, 7 Jul 2021 15:38:48 -0700
Message-ID: <CAH56bmAwCSc82VkZQxdf2DW=NvUVjh+utV5aLGtN=WyUqG=QLg@mail.gmail.com>
To: "Bless, Roland (TM)" <roland.bless@kit.edu>
Cc: Dave Taht <dave.taht@gmail.com>, 
 "cerowrt-devel@lists.bufferbloat.net" <cerowrt-devel@lists.bufferbloat.net>,
 bloat <bloat@lists.bufferbloat.net>
Content-Type: multipart/alternative; boundary="000000000000679dcd05c69034ce"
Subject: Re: [Bloat] Abandoning Window-based CC Considered Harmful (was Re:
	Bechtolschiem)
X-BeenThere: bloat@lists.bufferbloat.net
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: General list for discussing Bufferbloat <bloat.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/bloat>,
 <mailto:bloat-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/bloat>
List-Post: <mailto:bloat@lists.bufferbloat.net>
List-Help: <mailto:bloat-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/bloat>,
 <mailto:bloat-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Wed, 07 Jul 2021 22:39:01 -0000

--000000000000679dcd05c69034ce
Content-Type: text/plain; charset="UTF-8"

Actually BBR does have a window-based backup, which normally only comes
into play during load spikes and at very short RTTs.  It defaults to
2*minRTT*maxBW, which is twice the steady-state window in its normal paced
mode.

This is too large for short-queue routers in the Internet core, but it
helps a lot with cross traffic on large-queue edge routers.
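
For concreteness, here is a rough sketch of that arithmetic in Python.  The
function names and the 10 ms / 1 Gb/s example path are made up for
illustration, and this is not BBR's actual code.  It also assumes a single
flow whose minRTT and maxBW estimates match the path, in which case anything
inflight beyond one BDP has to sit in a queue somewhere.

```python
# Illustrative only: helper names and the example path are assumptions,
# not taken from any BBR implementation.

def bdp_bytes(min_rtt_us: int, max_bw_bps: int) -> int:
    """Bandwidth-delay product in bytes: the steady-state window in paced mode."""
    return min_rtt_us * max_bw_bps // (8 * 1_000_000)


def backup_window_bytes(min_rtt_us: int, max_bw_bps: int, gain: int = 2) -> int:
    """Window cap of gain * minRTT * maxBW (gain 2 is the default cited above)."""
    return gain * bdp_bytes(min_rtt_us, max_bw_bps)


def excess_over_pipe_bytes(min_rtt_us: int, max_bw_bps: int, gain: int = 2) -> int:
    """Bytes beyond one BDP when the flow sits at the cap.

    Under the single-flow assumption this is data that must be queued.
    """
    return backup_window_bytes(min_rtt_us, max_bw_bps, gain) - bdp_bytes(
        min_rtt_us, max_bw_bps
    )


if __name__ == "__main__":
    rtt_us, bw_bps = 10_000, 1_000_000_000  # hypothetical 10 ms, 1 Gb/s path
    print("BDP (bytes):          ", bdp_bytes(rtt_us, bw_bps))
    print("backup window (bytes):", backup_window_bytes(rtt_us, bw_bps))
    print("excess (bytes):       ", excess_over_pipe_bytes(rtt_us, bw_bps))
```

With the default gain of 2 the excess is one full BDP, i.e. about one minRTT
of queue at the bottleneck, which is the part that hurts on short-queue
routers.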

Thanks,
--MM--
The best way to predict the future is to create it.  - Alan Kay

We must not tolerate intolerance;
       however our response must be carefully measured:
            too strong would be hypocritical and risks spiraling out of
control;
            too weak risks being mistaken for tacit approval.


On Wed, Jul 7, 2021 at 3:19 PM Bless, Roland (TM) <roland.bless@kit.edu>
wrote:

> Hi Matt,
>
> [sorry for the late reply, overlooked this one]
>
> please, see comments inline.
>
> On 02.07.21 at 21:46 Matt Mathis via Bloat wrote:
>
> The argument is absolutely correct for Reno, CUBIC and all
> other self-clocked protocols.  One of the core assumptions in Jacobson88,
> was that the clock for the entire system comes from packets draining
> through the bottleneck queue.  In this world, the clock is intrinsically
> brittle if the buffers are too small.  The drain time needs to be a
> substantial fraction of the RTT.
>
> I'd like to separate the functions here a bit:
>
> 1) "automatic pacing" by ACK clocking
>
> 2) congestion-window-based operation
>
> I agree that the automatic pacing generated by the ACK clock (function 1)
> is increasingly
> distorted these days and may consequently cause micro bursts.
> This can be mitigated by using paced sending, which I consider very
> useful.
> However, I consider abandoning the (congestion) window-based approaches
> with ACK feedback (function 2) as harmful:
> a congestion window has an automatic self-stabilizing property since the
> ACK feedback reflects
> also the queuing delay and the congestion window limits the amount of
> inflight data.
> In contrast, rate-based senders risk instability: two senders in an M/D/1
> setting, each sending at 50% of the
> bottleneck rate on average, both using paced sending at 120% of the
> average rate, suffice to cause
> instability (the queue grows without bound).
>
> IMHO, two approaches seem to be useful:
> a) congestion-window-based operation with paced sending
> b) rate-based/paced sending with limiting the amount of inflight data
>
>
> However, we have reached the point where we need to discard that
> requirement.  One of the side points of BBR is that in many environments it
> is cheaper to burn serving CPU to pace into short queue networks than it is
> to "right size" the network queues.
>
> The fundamental problem with the old way is that in some contexts the
> buffer memory has to beat Moore's law, because to maintain constant drain
> time the memory size and BW both have to scale with the link (laser) BW.
>
> See the slides I gave at the Stanford Buffer Sizing workshop, December
> 2019: Buffer Sizing: Position Paper
> <https://docs.google.com/presentation/d/1VyBlYQJqWvPuGnQpxW4S46asHMmiA-OeMbewxo_r3Cc/edit#slide=id.g791555f04c_0_5>
>
>
> Thanks for the pointer. I don't quite get the point that the buffer must
> have a certain size to keep the ACK clock stable:
> in the case of a non-application-limited sender, a very small buffer suffices
> to let the ACK clock
> run steady. The large buffers were mainly required for loss-based CCs to
> let the standing queue
> build up that keeps the bottleneck busy during CWnd reduction after packet
> loss, thereby
> keeping the (bottleneck link) utilization high.
>
> Regards,
>
>  Roland
>
>
> Note that we are talking about DC and Internet core.  At the edge, BW is
> low enough that memory is relatively cheap.   In some sense BB came about
> because memory is too cheap in these environments.
>
>
> On Fri, Jul 2, 2021 at 9:59 AM Stephen Hemminger <
> stephen@networkplumber.org> wrote:
>
>> On Fri, 2 Jul 2021 09:42:24 -0700
>> Dave Taht <dave.taht@gmail.com> wrote:
>>
>> > "Debunking Bechtolsheim credibly would get a lot of attention to the
>> > bufferbloat cause, I suspect." - dpreed
>> >
>> > "Why Big Data Needs Big Buffer Switches" -
>> >
>> http://www.arista.com/assets/data/pdf/Whitepapers/BigDataBigBuffers-WP.pdf
>> >
>>
>> Also, a lot depends on the TCP congestion control algorithm being used.
>> They are using NewReno which only researchers use in real life.
>>
>> Even TCP Cubic has gone through several revisions. In my experience, the
>> NS-2 models don't correlate well to real world behavior.
>>
>> In real world tests, TCP Cubic will consume any buffer it sees at a
>> congested link. Maybe that is what they mean by capture effect.
>>
>> There is also a weird oscillation effect with multiple streams, where one
>> flow will take the buffer, then see a packet loss and back off, the
>> other flow will take over the buffer until it sees loss.
>>

--000000000000679dcd05c69034ce--