Date: Thu, 5 May 2011 09:10:46 -0700
From: Stephen Hemminger
To: Jim Gettys
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Goodput fraction w/ AQM vs bufferbloat
Message-ID: <20110505091046.3c73e067@nehalam>
In-Reply-To: <4DC2C9D2.8040703@freedesktop.org>
References: <4DB70FDA.6000507@mti-systems.com> <4DC2C9D2.8040703@freedesktop.org>
Organization: Vyatta

On Thu, 05 May 2011 12:01:22 -0400
Jim Gettys wrote:

> On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
> > I'm curious: has anyone done some simulations to check whether the
> > following qualitative statement holds true, and if so, what the
> > quantitative effect is?
> >
> > With bufferbloat, the TCP congestion control reaction is unduly
> > delayed. When it finally happens, the TCP stream is likely facing a
> > "burst loss" event - multiple consecutive packets get dropped. Worse
> > yet, the sender with the lowest RTT across the bottleneck will likely
> > start to retransmit while the (tail-drop) queue is still overflowing.
> >
> > And a lost retransmission means a major setback in bandwidth (except
> > for Linux with bulk transfers and SACK enabled), as the standard
> > (RFC-documented) behaviour calls for an RTO (1 sec nominally,
> > 200-500 ms typically) to recover such a lost retransmission...
> >
> > The second part (more important as an incentive to the ISPs,
> > actually): how does the fraction of goodput vs. throughput change
> > when AQM schemes are deployed and TCP CC reacts in a timely manner?
> > Small ISPs have to pay for their upstream volume, regardless of
> > whether it is "real" work (goodput) or unnecessary retransmissions.
> >
> > When I was at a small cable ISP in Switzerland last week, sure
> > enough, bufferbloat was readily observable (17 ms -> 220 ms after
> > 30 sec of a bulk transfer), but at first they had the "not our
> > problem" view, until I started discussing burst loss /
> > retransmissions / goodput vs. throughput - with the last point being
> > a real commercial incentive to them. (They promised to check whether
> > AQM would be available in the CPE / CMTS, and to put latency bounds
> > in their tenders going forward.)
>
> I wish I had a good answer to your very good questions. Simulation
> would be interesting, though real data is more convincing.
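
As a back-of-the-envelope illustration of the goodput vs. throughput
fraction being asked about here (a hypothetical sketch, not measured
data; the 2% retransmission rate and the small Python helper below are
assumptions chosen only to make the arithmetic concrete), the
bookkeeping is simply unique payload bytes divided by total payload
bytes sent:

    def goodput_fraction(unique_bytes, retransmitted_bytes):
        """Fraction of the paid-for upstream volume that is goodput."""
        total = unique_bytes + retransmitted_bytes
        return unique_bytes / total if total else 1.0

    # Example: 1 GB of unique data with 2% of it retransmitted
    # (an assumed rate, used only to make the numbers concrete).
    unique = 1_000_000_000
    retrans = int(unique * 0.02)
    print(goodput_fraction(unique, retrans))  # ~0.980
    print(retrans)                            # ~20 MB carried (and paid for) twice

Whatever an AQM scheme shaves off the retransmission rate shows up
directly in that second number, which is the commercial argument being
made to the ISP above.
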
>
> I haven't looked in detail at all that many traces to try to get a
> feel for how much bandwidth waste there actually is, and more formal
> studies like Netalyzr, SamKnows, or the Bismark project would be
> needed to quantify the loss on the network as a whole.
>
> I did spend some time last fall with the traces I've taken. In those,
> I've typically been seeing 1-3% packet loss in the main TCP transfers.
> On the wireless trace I took, I saw 9% loss, but whether that is
> bufferbloat-induced loss or not, I don't know (the data is out there
> for those who might want to dig). And as you note, the losses are
> concentrated in bursts (probably due to the details of Cubic, so I'm
> told).
>
> I've had anecdotal reports (and some first-hand experience) of much
> higher loss rates, for example from Nick Weaver at ICSI; but I believe
> in playing things conservatively with any numbers I quote, and I've
> not gotten consistent results when I've tried, so I just report what's
> in the packet captures I did take.
>
> A phenomenon that could be occurring is that during congestion
> avoidance (until TCP loses its cookies entirely and probes for a
> higher operating point), TCP is carefully timing its packets to keep
> the buffers almost exactly full, so that competing flows (in my case,
> simple pings) are likely to arrive just when there is no buffer space
> to accept them, and therefore you see higher losses on them than you
> would on the single flow I've been tracing and getting loss statistics
> from.
>
> People who want to look into this further would be a great help.
>     - Jim

I would not put a lot of trust in measuring loss with pings. I have
heard that some ISPs do different processing on the ICMP packets used
for ping. They either prioritize them high to provide artificially good
response times (better marketing numbers), or prioritize them low since
they aren't useful traffic. There are also filters that allow only N
ICMP requests per second, which means repeated probes will be dropped.
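
For anyone who wants to look at this without trusting ICMP at all,
retransmissions can be counted straight out of a packet capture. Below
is a minimal, untested sketch; scapy and the "trace.pcap" filename are
assumptions, and a data segment is crudely treated as a retransmission
when its (sequence number, payload length) pair repeats within a flow:

    from collections import defaultdict
    from scapy.all import rdpcap, IP, TCP

    def retransmission_rate(pcap_path):
        """Fraction of captured TCP data segments that look retransmitted."""
        seen = defaultdict(set)   # flow 4-tuple -> set of (seq, payload len)
        data_segments = 0
        retransmitted = 0
        for pkt in rdpcap(pcap_path):
            if IP not in pkt or TCP not in pkt:
                continue
            payload_len = len(bytes(pkt[TCP].payload))
            if payload_len == 0:
                continue          # skip pure ACKs
            flow = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
            key = (pkt[TCP].seq, payload_len)
            data_segments += 1
            if key in seen[flow]:
                retransmitted += 1
            else:
                seen[flow].add(key)
        return retransmitted / data_segments if data_segments else 0.0

    print(retransmission_rate("trace.pcap"))

Capturing near the sender makes that fraction a reasonable proxy for
the upstream volume wasted on retransmissions; a capture taken
downstream of the drop point will undercount, since the lost originals
never show up there.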