From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Taht
Date: Sun, 7 Aug 2022 10:34:17 -0700
To: dip
Cc: Masataka Ohta, NANOG, bloat
Subject: Re: [Bloat] 400G forwarding - how does it work?
List-Id: General list for discussing Bufferbloat

If it's of any help... the bloat mailing list at lists.bufferbloat.net
has the largest concentration of queue theorists and network operators
and developers I know of. (Also, bloat readers, this ongoing thread on
NANOG about 400G forwarding is fascinating.)

There is 10+ years' worth of debate in the archives:
https://lists.bufferbloat.net/pipermail/bloat/2012-May/thread.html
as one example.

On Sun, Aug 7, 2022 at 10:14 AM dip wrote:
>
> Disclaimer: I often use the M/M/1 queuing assumption for much of my
> work to keep the maths simple, and believe I am reasonably aware of
> the contexts in which it is a right or a wrong application :). Also, I
> don't intend to change the core topic of the thread, but since this
> has come up, I couldn't resist.
>
> >> With 99% load M/M/1, 500 packets (750kB for 1500B MTU) of
> >> buffer is enough to make packet drop probability less than
> >> 1%. With 98% load, the probability is 0.0041%.
>
> To expand on the above a bit so that there is no ambiguity.
> The above assumes that the router behaves like an M/M/1 queue. The
> expected number of packets in the system is
>
>     E[N] = rho / (1 - rho)
>
> where rho is the utilization. The probability that at least B packets
> are in the system is rho**B, where B is the number of packets in the
> system. For a link utilization of 0.98, the packet drop probability is
> 0.98**500 = 0.0041%. For a link utilization of 0.99, it is
> 0.99**500 = 0.657%.

Regrettably, TCP congestion controls by design do not stop window
growth until you get that drop, i.e. at 100+% utilization.

> >> When many TCPs are running, burst is averaged and traffic
> >> is poisson.
>
> M/M/1 queuing assumes that traffic is Poisson, and the Poisson
> assumption is:
> 1) The number of sources is infinite.
> 2) The traffic arrival pattern is random.
>
> I think the second assumption is where I often question whether the
> traffic arrival pattern is truly random. I have seen cases where
> traffic behaves more like a self-similar process. Most Poisson models
> rely on the central limit theorem, which loosely states that the
> sample distribution will approach a normal distribution as we
> aggregate more from various distributions; the mean will smooth
> towards a value.
>
> Do you have any good pointers to research showing that today's
> internet traffic can be modeled accurately by Poisson? For as many
> papers supporting Poisson, I have seen as many papers saying it's not
> Poisson.
>
> https://www.icir.org/vern/papers/poisson.TON.pdf
> https://www.cs.wustl.edu/~jain/cse567-06/ftp/traffic_models2/#sec1.2

I am firmly in the not-Poisson camp; however, by inserting (especially)
FQ and AQM techniques on the bottleneck links it is very possible to
smooth traffic into this more easily analyzable model - and gain
enormous benefits from doing so.
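The two tail probabilities above are easy to check numerically. A
minimal sketch (using the same rho**B tail formula quoted in this
thread for the probability of at least B packets in an M/M/1 system):

```python
def mm1_expected_packets(rho: float) -> float:
    """Expected number of packets in an M/M/1 system: rho / (1 - rho)."""
    return rho / (1.0 - rho)

def mm1_tail_probability(rho: float, b: int) -> float:
    """Probability that at least b packets are in the system: rho**b."""
    return rho ** b

# Reproduce the figures from the thread: 500-packet buffer at 98% and
# 99% utilization.
for rho in (0.98, 0.99):
    p = mm1_tail_probability(rho, 500)
    print(f"rho={rho}: E[N] = {mm1_expected_packets(rho):.0f} packets, "
          f"P(N >= 500) = {p * 100:.4f}%")
```

Running this gives 0.0041% at rho = 0.98 and 0.66% at rho = 0.99,
matching the "less than 1% drop with 500 packets at 99% load" claim
quoted above.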
> On Sun, 7 Aug 2022 at 04:18, Masataka Ohta
> <mohta@necom830.hpcl.titech.ac.jp> wrote:
>
>> Saku Ytti wrote:
>>
>> >> I'm afraid you imply too much buffer bloat only to cause
>> >> unnecessary and unpleasant delay.
>> >>
>> >> With 99% load M/M/1, 500 packets (750kB for 1500B MTU) of
>> >> buffer is enough to make packet drop probability less than
>> >> 1%. With 98% load, the probability is 0.0041%.
>>
>> > I feel like I'll live to regret asking. Which congestion control
>> > algorithm are you thinking of?
>>
>> I'm not assuming a LAN environment, for which paced TCP may
>> be desirable (if the bandwidth requirement is tight, which is
>> unlikely in a LAN).
>>
>> > But Cubic and Reno will burst tcp window growth at sender rate,
>> > which may be much more than receiver rate; someone has to store
>> > that growth and pace it out at receiver rate, otherwise the window
>> > won't grow, and receiver rate won't be achieved.
>>
>> When many TCPs are running, bursts are averaged and traffic
>> is Poisson.
>>
>> > So in an ideal scenario, no, we don't need a lot of buffer; in
>> > practical situations today, yes, we need quite a bit of buffer.
>>
>> That is an old theory known to be invalid (Ethernet switches with
>> small buffers are enough for IXes) and theoretically denied by:
>>
>>         Sizing router buffers
>>         https://dl.acm.org/doi/10.1145/1030194.1015499
>>
>> after which paced TCP was developed for unimportant exceptional
>> cases of LAN.
>>
>> > Now add to this multiple logical interfaces, each having 4-8
>> > queues; it adds up.
>>
>> Having so many queues requires sorting of queues to properly
>> prioritize them, which costs a lot of computation (and
>> performance loss) for no benefit and is a bad idea.
>>
>> > Also the shallow ingress buffers discussed in the thread are not
>> > delay buffers, and the problem is complex because no device is
>> > marketable that can accept wire rate of minimum packet size, so
>> > what trade-offs do we carry when we get bad traffic at wire rate
>> > at small packet size? We can't empty the ingress buffers fast
>> > enough; do we have physical memory for each port, do we share,
>> > how do we share?
>>
>> People who use irrationally small packets will suffer, which is
>> not a problem for the rest of us.
>>
>>                         Masataka Ohta

-- 
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht CEO, TekLibre, LLC
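[A footnote for readers chasing the "Sizing router buffers" reference
cited in the quoted exchange: its headline result replaces the classic
bandwidth-delay-product rule B = RTT * C with B = RTT * C / sqrt(n) for
n long-lived flows. A rough sketch of what that means at 400 Gbit/s;
the 100 ms RTT and the flow counts are illustrative assumptions, not
numbers from this thread:]

```python
import math

def classic_buffer_bits(rtt_s: float, capacity_bps: float) -> float:
    """Classic bandwidth-delay-product rule of thumb: B = RTT * C."""
    return rtt_s * capacity_bps

def stanford_buffer_bits(rtt_s: float, capacity_bps: float,
                         n_flows: int) -> float:
    """'Sizing router buffers' rule: B = RTT * C / sqrt(n)."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)

RTT = 0.1    # 100 ms round-trip time (illustrative assumption)
C = 400e9    # 400 Gbit/s link

# n = 1 recovers the classic rule; large n shrinks the requirement
# by sqrt(n), which is the paper's core argument.
for n in (1, 10_000, 1_000_000):
    mb = stanford_buffer_bits(RTT, C, n) / 8 / 1e6
    print(f"n = {n:>9} flows: buffer = {mb:,.1f} MB")
```

With 10,000 flows the rule asks for 50 MB instead of the 5,000 MB the
classic rule would demand at this speed, which is why the paper matters
to the small-buffer argument above.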