From: Sam Stickland
Date: Fri, 6 May 2011 12:40:07 +0100
To: Neil Davies
Cc: Stephen Hemminger, bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Burst Loss

On 5 May 2011, at 17:49, Neil Davies wrote:

> On the issue of loss - we did a study of the UK's ADSL access network back in 2006, over several weeks, looking at the loss and delay that was introduced into the bi-directional traffic.
>
> We found that the delay variability (that bit left over after you've taken out the effects of geography and line sync rates) was broadly the same over the half dozen locations we studied - it was there all the time, to the same level of variance - and that what did vary by time of day was the loss rate.
>
> We also found out, at the time much to our surprise - though we understand why now - that loss was broadly independent of the offered load; we used a constant data rate (with either fixed or variable packet sizes).
>
> We found that loss rates were in the range 1% to 3% (which is what would be expected from a large number of TCP streams contending for a limiting resource).
>
> As for burst loss, yes it does occur - but it could be argued that this is more the fault of the sending TCP stack than the network.
>
> This phenomenon was well covered in the academic literature in the '90s (if I remember correctly, folks at INRIA led the way) - it is all down to the nature of random processes and how you observe them.
>
> Back-to-back packets see higher loss rates than packets more spread out in time. Consider a pair of packets, back to back, arriving over a 1 Gbit/s link into a queue being serviced at 34 Mbit/s: the first packet being 'lost' is equivalent to saying that the first packet 'observed' the queue to be full - the system's state is no longer a random variable, it is known to be full. The second packet (let's assume it is also a full-size one) 'makes an observation' of the state of that queue about 12 us later - but that is only 3% of the time it takes to service such a large packet at 34 Mbit/s. The system has not had any time to 'relax' back towards its steady state; it is highly likely that it is still full.
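Just to check I'm following the arithmetic - a quick sketch, assuming 1500-byte full-size packets (the 12 us and ~3% figures only hold for something like MTU-sized frames):

PACKET_BITS = 1500 * 8               # one full-size packet

arrival_gap  = PACKET_BITS / 1e9     # gap between back-to-back arrivals at 1 Gbit/s
service_time = PACKET_BITS / 34e6    # time to drain one such packet at 34 Mbit/s

print(arrival_gap * 1e6)             # ~12 microseconds
print(service_time * 1e6)            # ~353 microseconds
print(arrival_gap / service_time)    # ~0.034, i.e. the ~3% figure

So the second packet re-samples the queue long before even one packet's worth of draining could have happened - no wonder the losses cluster.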
> Fixing this makes a phenomenal difference to the goodput (with the usual delay effects that implies). We've even built and deployed systems with this sort of engineering embedded (deployed as a network 'wrap') that mean that end users can sustainably (days on end) achieve effective throughput that is better than 98% of the (transmission-media-imposed) maximum. What we had done is make the network behave closer to the underlying statistical assumptions made in TCP's design.

How did you fix this? What alters the packet spacing? The network or the host?

Sam
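P.S. To make concrete what I mean by "packet spacing": I'm imagining something along the lines of sender-side pacing, i.e. releasing packets at roughly the bottleneck's drain rate rather than bursting them out at line rate. This is purely a guess at the kind of thing your network 'wrap' might do - the paced_send() function, the 34 Mbit/s figure and the send hook below are made up for illustration, not taken from your deployment:

import time

def paced_send(packets, bottleneck_bps, send):
    """Release packets no faster than the bottleneck can drain them,
    rather than bursting them back-to-back at line rate."""
    next_release = time.monotonic()
    for pkt in packets:
        now = time.monotonic()
        if now < next_release:
            time.sleep(next_release - now)   # spread the packets out in time
        send(pkt)
        # space the next packet by this packet's service time at the bottleneck
        next_release = max(now, next_release) + len(pkt) * 8 / bottleneck_bps

# e.g. paced_send(burst, 34e6, udp_socket.send)  # hypothetical 34 Mbit/s bottleneck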