From: dpreed@reed.com
To: dpreed@reed.com
Cc: "Ketan Kulkarni", "Jonathan Morton", "cerowrt-devel@lists.bufferbloat.net"
Date: Mon, 6 Jun 2016 22:58:49 -0400 (EDT)
Subject: Re: [Cerowrt-devel] trying to make sense of what switch vendors say wrt buffer bloat
Message-ID: <1465268329.938313737@apps.rackspace.com>
In-Reply-To: <1465267957.902610235@apps.rackspace.com>
List-Id: Development issues regarding the cerowrt test router project

Even better, it would be fun to get access to an Arista switch and some high-performance TCP sources and sinks, and demonstrate extreme bufferbloat compared to a small-buffer switch. Just a demo, not a simulation full of assumptions and guesses.

RRUL, basically.


On Monday, June 6, 2016 10:52pm, dpreed@reed.com said:

So did anyone write a response debunking their paper? Their NS-2 simulation is most likely the erroneous part of their analysis - the white paper would not pass a review by qualified referees because there is no way to check their results and some of what they say beggars belief.

Bechtolsheim is one of those guys who can write any damn thing and it becomes "truth" - mostly because he co-founded Sun. But that doesn't mean that he can't make huge errors - any of us can.

The so-called TCP/IP Bandwidth Capture effect that he refers to doesn't sound like any capture effect I've ever heard of. There is an "Ethernet Capture Effect" (which is cited), which is due to properties of CSMA/CD binary exponential backoff, not anything to do with TCP's flow/congestion control. So it has that "truthiness" that makes glib people sound like they know what they are talking about, but I'd like to see a reference that says this is a property of TCP!

What's interesting is that the reference to the Ethernet Capture Effect in that white paper proposes a solution that involves changing the backoff algorithm slightly at the Ethernet level - NOT increasing buffer size!

Another thing that would probably improve matters a great deal would be to drop/ECN-mark packets when a contended output port on an Arista switch develops a backlog. This will throttle TCP sources sharing the path.

The comments in the white paper that say that ACK contention in TCP in the reverse direction is the problem that causes the "so-called TCP/IP Bandwidth Capture effect" that is invented by the authors appear to be hogwash of the first order.

Debunking Bechtolsheim credibly would bring a lot of attention to the bufferbloat cause, I suspect.


On Monday, June 6, 2016 5:16pm, "Ketan Kulkarni" said:

Some time back they had this whitepaper -
"Why Big Data Needs Big Buffer Switches"

http://www.arista.com/assets/data/pdf/Whitepapers/BigDataBigBuffers-WP.pdf

The type of apps they talk about is big data, hadoop etc.


On Mon, Jun 6, 2016 at 11:37 AM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:

On Mon, 6 Jun 2016, Jonathan Morton wrote:

At 100ms buffering, their 10Gbps switch is effectively turning any DC it’s installed in into a transcontinental Internet path, as far as peak latency is concerned. Just because RAM is cheap these days…

Nono, nononononono. I can tell you they're spending serious money on inserting this kind of buffering memory into these kinds of devices. Buying these devices without deep buffers is a lot lower cost.

These types of switch chips either have on-die memory (usually 16MB or less), or they have very expensive (a direct cost of lowered port density) off-chip buffering memory.

Typically you do this:

ports ---|-------
ports ---|      |
ports ---| chip |
ports ---|-------

Or you do this

ports ---|------|---buffer
ports ---| chip |---TCAM
         --------

or if you do a multi-linecard-device

ports ---|------|---buffer
         | chip |---TCAM
         --------
            |
         switch fabric

(or any variant of them)

So basically if you want to buffer and if you want large L2-L4 lookup tables, you have to sacrifice ports. Sacrifice lots of ports.

So never say these kinds of devices add buffering because RAM is cheap. This is most definitely not why they're doing it. Buffer memory for them is EXTREMELY EXPENSIVE.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
_______________________________________________
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel
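
For a sense of scale, here is a rough sketch of the arithmetic behind the figures quoted in this thread: Jonathan Morton's 100 ms of buffering at 10 Gbps and Mikael Abrahamsson's "16MB or less" of on-die memory. The function names are made up for illustration. The sketch also assumes the whole buffer backs up behind a single 10 Gbps output port and reads "16MB" as 16 MiB; neither assumption comes from the thread.

```python
def buffer_bytes(link_bps: int, delay_ms: int) -> int:
    """Bytes that must be queued to add delay_ms of latency on a link draining at link_bps."""
    # bits = link_bps * delay_ms / 1000; bytes = bits / 8
    return link_bps * delay_ms // 8000


def drain_time_ms(buffer_size_bytes: int, link_bps: int) -> float:
    """Milliseconds needed to drain a full buffer through one port at link_bps."""
    return buffer_size_bytes * 8 * 1000 / link_bps


TEN_GBPS = 10_000_000_000

# 100 ms of queueing at 10 Gbps corresponds to 125,000,000 bytes (125 MB) of backlog.
deep_buffer = buffer_bytes(TEN_GBPS, 100)

# Assumption: 16 MiB of on-die memory all queued behind one 10 Gbps port.
on_die_delay = drain_time_ms(16 * 2**20, TEN_GBPS)

print(f"100 ms at 10 Gbps = {deep_buffer:,} bytes")
print(f"16 MiB at 10 Gbps drains in {on_die_delay:.1f} ms")
```

Under those assumptions the on-die buffer holds roughly 13 ms of traffic for one saturated port, about an eighth of the 100 ms figure.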

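
The suggestion above to drop or ECN-mark packets once a contended output port develops a backlog could look roughly like the sketch below. It is a plain step-threshold marker, not what any Arista switch does and not something spelled out in the thread. The class names, the byte thresholds, and the choice to drop non-ECN traffic above the threshold are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Packet:
    size: int          # bytes
    ect: bool = True   # sender is ECN-capable (hypothetical flag for this sketch)
    ce: bool = False   # Congestion Experienced mark, set by the port


@dataclass
class OutputPort:
    mark_threshold: int   # backlog (bytes) at which marking/dropping starts; assumed value
    capacity: int         # hard buffer limit in bytes; assumed value
    backlog: int = 0
    queue: deque = field(default_factory=deque)

    def enqueue(self, pkt: Packet) -> bool:
        """Return True if the packet was queued, False if it was dropped."""
        if self.backlog + pkt.size > self.capacity:
            return False              # buffer full: tail drop
        if self.backlog >= self.mark_threshold:
            if pkt.ect:
                pkt.ce = True         # signal congestion without losing the packet
            else:
                return False          # non-ECN flow: drop is the only signal available
        self.queue.append(pkt)
        self.backlog += pkt.size
        return True

    def dequeue(self) -> Optional[Packet]:
        """Remove and return the packet at the head of the queue, if any."""
        if not self.queue:
            return None
        pkt = self.queue.popleft()
        self.backlog -= pkt.size
        return pkt
```

A TCP sender that sees the CE mark echoed back, or that detects the drop, reduces its congestion window, which is the throttling effect described in the message. Real AQM schemes such as RED or CoDel decide when to signal in more careful ways; the fixed threshold here only illustrates the idea.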