From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Taht
Date: Sun, 7 Aug 2022 10:34:17 -0700
To: dip
Cc: Masataka Ohta, NANOG, bloat
Subject: Re: [Bloat] 400G forwarding - how does it work?
List-Id: General list for discussing Bufferbloat

If it's of any help... the bloat mailing list at lists.bufferbloat.net
has the largest concentration of queue theorists and network operators
and developers I know of. (Also, bloat readers, this ongoing thread on
NANOG about 400G forwarding is fascinating.)

There is 10+ years' worth of debate in the archives:
https://lists.bufferbloat.net/pipermail/bloat/2012-May/thread.html
as one example.

On Sun, Aug 7, 2022 at 10:14 AM dip wrote:
>
> Disclaimer: I often use the M/M/1 queuing assumption for much of my
> work to keep the maths simple, and believe I am reasonably aware of
> the contexts in which it is a right or a wrong application :). Also, I
> don't intend to change the core topic of the thread, but since this
> has come up, I couldn't resist.
>
> >> With 99% load M/M/1, 500 packets (750kB for 1500B MTU) of
> >> buffer is enough to make packet drop probability less than
> >> 1%. With 98% load, the probability is 0.0041%.
>
> To expand on the above a bit so that there is no ambiguity.
> The above assumes that the router behaves like an M/M/1 queue. The
> expected number of packets in the system is
>
>     E[N] = rho / (1 - rho)
>
> where rho is the utilization. The probability that at least B packets
> are in the system is rho**B, where B is the number of packets in the
> system. For a link utilization of 0.98, the packet drop probability is
> 0.98**500 = 0.0041%. For a link utilization of 0.99, it is
> 0.99**500 = 0.657%.

Regrettably, TCP congestion controls by design do not stop window
growth until you get that drop, i.e. at 100+% utilization.

> >> When many TCPs are running, burst is averaged and traffic
> >> is poisson.
>
> M/M/1 queuing assumes that traffic is Poisson, and the Poisson
> assumption is:
> 1) The number of sources is infinite.
> 2) The traffic arrival pattern is random.
>
> I think the second assumption is where I often question whether the
> traffic arrival pattern is truly random. I have seen cases where
> traffic behaves more like a self-similar process. Most Poisson models
> rely on the central limit theorem, which loosely states that the
> sample distribution will approach a normal distribution as we
> aggregate more from various distributions; the mean will smooth
> towards a value.
>
> Do you have any good pointers to research showing that today's
> internet traffic can be modeled accurately by Poisson? For as many
> papers supporting Poisson, I have seen as many papers saying it's not
> Poisson.
>
> https://www.icir.org/vern/papers/poisson.TON.pdf
> https://www.cs.wustl.edu/~jain/cse567-06/ftp/traffic_models2/#sec1.2

I am firmly in the not-Poisson camp; however, by inserting (especially)
FQ and AQM techniques on the bottleneck links it is very possible to
smooth traffic into this more easily analyzable model - and gain
enormous benefits from doing so.
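The two tail probabilities above are easy to check numerically. A
minimal sketch (using the same rho**B tail formula quoted in this
thread for the probability of at least B packets in an M/M/1 system):

```python
def mm1_expected_packets(rho: float) -> float:
    """Expected number of packets in an M/M/1 system: rho / (1 - rho)."""
    return rho / (1.0 - rho)

def mm1_tail_probability(rho: float, b: int) -> float:
    """Probability that at least b packets are in the system: rho**b."""
    return rho ** b

# Reproduce the figures from the thread: 500-packet buffer at 98% and
# 99% utilization.
for rho in (0.98, 0.99):
    p = mm1_tail_probability(rho, 500)
    print(f"rho={rho}: E[N] = {mm1_expected_packets(rho):.0f} packets, "
          f"P(N >= 500) = {p * 100:.4f}%")
```

Running this gives 0.0041% at rho = 0.98 and 0.66% at rho = 0.99,
matching the "less than 1% drop with 500 packets at 99% load" claim
quoted above.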
> On Sun, 7 Aug 2022 at 04:18, Masataka Ohta
> <mohta@necom830.hpcl.titech.ac.jp> wrote:
>
>> Saku Ytti wrote:
>>
>> >> I'm afraid you imply too much buffer bloat only to cause
>> >> unnecessary and unpleasant delay.
>> >>
>> >> With 99% load M/M/1, 500 packets (750kB for 1500B MTU) of
>> >> buffer is enough to make packet drop probability less than
>> >> 1%. With 98% load, the probability is 0.0041%.
>>
>> > I feel like I'll live to regret asking. Which congestion control
>> > algorithm are you thinking of?
>>
>> I'm not assuming a LAN environment, for which paced TCP may
>> be desirable (if the bandwidth requirement is tight, which is
>> unlikely in a LAN).
>>
>> > But Cubic and Reno will burst tcp window growth at sender rate,
>> > which may be much more than receiver rate; someone has to store
>> > that growth and pace it out at receiver rate, otherwise the window
>> > won't grow, and receiver rate won't be achieved.
>>
>> When many TCPs are running, bursts are averaged and traffic
>> is Poisson.
>>
>> > So in an ideal scenario, no, we don't need a lot of buffer; in
>> > practical situations today, yes, we need quite a bit of buffer.
>>
>> That is an old theory known to be invalid (Ethernet switches with
>> small buffers are enough for IXes) and theoretically denied by:
>>
>>         Sizing router buffers
>>         https://dl.acm.org/doi/10.1145/1030194.1015499
>>
>> after which paced TCP was developed for unimportant exceptional
>> cases of LAN.
>>
>> > Now add to this multiple logical interfaces, each having 4-8
>> > queues; it adds up.
>>
>> Having so many queues requires sorting of queues to properly
>> prioritize them, which costs a lot of computation (and
>> performance loss) for no benefit and is a bad idea.
>>
>> > Also the shallow ingress buffers discussed in the thread are not
>> > delay buffers, and the problem is complex because no device is
>> > marketable that can accept wire rate of minimum packet size, so
>> > what trade-offs do we carry when we get bad traffic at wire rate
>> > at small packet size? We can't empty the ingress buffers fast
>> > enough; do we have physical memory for each port, do we share,
>> > how do we share?
>>
>> People who use irrationally small packets will suffer, which is
>> not a problem for the rest of us.
>>
>>                         Masataka Ohta

-- 
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht CEO, TekLibre, LLC
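[A footnote for readers chasing the "Sizing router buffers" reference
cited in the quoted exchange: its headline result replaces the classic
bandwidth-delay-product rule B = RTT * C with B = RTT * C / sqrt(n) for
n long-lived flows. A rough sketch of what that means at 400 Gbit/s;
the 100 ms RTT and the flow counts are illustrative assumptions, not
numbers from this thread:]

```python
import math

def classic_buffer_bits(rtt_s: float, capacity_bps: float) -> float:
    """Classic bandwidth-delay-product rule of thumb: B = RTT * C."""
    return rtt_s * capacity_bps

def stanford_buffer_bits(rtt_s: float, capacity_bps: float,
                         n_flows: int) -> float:
    """'Sizing router buffers' rule: B = RTT * C / sqrt(n)."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)

RTT = 0.1    # 100 ms round-trip time (illustrative assumption)
C = 400e9    # 400 Gbit/s link

# n = 1 recovers the classic rule; large n shrinks the requirement
# by sqrt(n), which is the paper's core argument.
for n in (1, 10_000, 1_000_000):
    mb = stanford_buffer_bits(RTT, C, n) / 8 / 1e6
    print(f"n = {n:>9} flows: buffer = {mb:,.1f} MB")
```

With 10,000 flows the rule asks for 50 MB instead of the 5,000 MB the
classic rule would demand at this speed, which is why the paper matters
to the small-buffer argument above.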