From: Neil Davies
Date: Sat, 7 Apr 2012 16:08:20 +0100
To: Fred Baker
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Best practices for paced TCP on Linux?
Message-Id: <8C78AC1F-7305-4623-ADE6-1535CAA1FCBF@pnsol.com>
References: <20120406213725.GA12641@uio.no>

Fred

That is the general idea - the issue is that the dynamic arrival rate, as the "round trip window size" doubles, just dramatically exceeds the available buffering at some intermediate point - it is self-inflicted (intra-stream) congestion, with the effect of dramatically increasing the quality attenuation (delay and loss) for streams flowing through that point.

The packet train may also be an issue, especially if there is h/w assist for TCP (which might well be the case here, as the interface was a 10G one - comments, Steinar?) - we have observed an interesting phenomenon in access networks where packet trains arrive (8+ packets back-to-back at 10G) for service down a low-speed (2M) link - this leads to the effective transport delay being highly non-stationary, with all that implies for the other flows on that link.

Neil

On 7 Apr 2012, at 15:17, Fred Baker wrote:

>
> On Apr 7, 2012, at 4:54 AM, Neil Davies wrote:
>
>> The answer was rather simple - calculate the amount of buffering needed to achieve
>> say 99% of the "theoretical" throughput (this took some measurement as to exactly what
>> that was) and limit the sender to that.
>
> So what I think I hear you saying is that we need some form of ioctl interface in the sockets library that will allow the sender to state the rate it associates with the data (e.g., the video codec rate), and let TCP calculate
>
>                       f(rate in bits per second, pmtu)
> cwnd_limit = ceiling (--------------------------------) + C
>                       g(rtt in microseconds)
>
> Where C is a fudge factor, probably a single-digit number, and f and g are appropriate conversion functions.
>
> I suspect there may also be value in considering Jain's "Packet Trains" paper. Something you can observe in a simple trace is that the doubling behavior in slow start has the effect of bunching a TCP session's data together. If I have two 5 Mbps data exchanges sharing a 10 Mbps pipe, it's not unusual to observe one of the sessions dominating the pipe for a while and then the other one, for a long time.
> One of the benefits of per-flow WFQ in the network is that it consciously breaks that up - it forces the TCPs to interleave packets instead of bursts, which means that a downstream device on a more limited-bandwidth link sees packets arrive at what it considers a more rational rate. It might be nice if, in its initial burst, TCP consciously broke the initial window into 2, or 3, or 4, or ten individual packet trains - spaced those packets some number of milliseconds apart, so that their acknowledgements were similarly spaced, and the resulting packet trains in subsequent RTTs were relatively small.
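
A rough, back-of-the-envelope sketch of the cwnd limit Fred describes above - assuming his f() reduces the stated rate to a bandwidth-delay product and g() is just the microseconds-to-seconds conversion, so the limit works out to ceil(BDP / PMTU) plus the fudge factor C. The function name and the example numbers here are purely illustrative, not part of any real sockets API:

#include <math.h>
#include <stdio.h>

/* One plausible reading of f() and g(): the bandwidth-delay product in
 * bytes, divided by the path MTU, rounded up, plus a small fudge factor. */
static long cwnd_limit_segments(double rate_bps, double rtt_us,
                                double pmtu_bytes, long fudge_c)
{
    double bdp_bytes = (rate_bps / 8.0) * (rtt_us / 1e6); /* bytes in flight at that rate */
    return (long)ceil(bdp_bytes / pmtu_bytes) + fudge_c;
}

int main(void)
{
    /* e.g. a 5 Mbit/s stream, 100 ms RTT, 1500-byte PMTU, C = 4 (all illustrative) */
    printf("cwnd limit: %ld segments\n",
           cwnd_limit_segments(5e6, 100000.0, 1500.0, 4));
    return 0;
}

For those numbers this prints a limit of 46 segments: 42 segments of bandwidth-delay product plus C = 4.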
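And a similar sketch of the arithmetic behind the packet-train point - the 8-packet, 10G-into-2M numbers are the ones from the discussion above; the choice of splitting into 4 sub-trains is illustrative only:

#include <stdio.h>

int main(void)
{
    const double pkt_bits  = 1500 * 8.0;  /* one full-size packet        */
    const double fast_bps  = 10e9;        /* arrival link, 10 Gbit/s     */
    const double slow_bps  = 2e6;         /* bottleneck link, 2 Mbit/s   */
    const int    train_len = 8;           /* packets arriving back-to-back */

    double arrive_ms = train_len * pkt_bits / fast_bps * 1e3;
    double drain_ms  = train_len * pkt_bits / slow_bps * 1e3;
    printf("8-packet train: arrives in %.3f ms, drains in %.1f ms\n",
           arrive_ms, drain_ms);

    /* Splitting the same burst into 4 sub-trains of 2 packets: spacing them
     * at least one sub-train's drain time apart keeps the bottleneck queue
     * to ~2 packets instead of ~8. */
    int sub_trains = 4;
    double gap_ms = (train_len / sub_trains) * pkt_bits / slow_bps * 1e3;
    printf("minimum spacing between %d-packet sub-trains: %.1f ms\n",
           train_len / sub_trains, gap_ms);
    return 0;
}

The train shows up in about 10 microseconds but takes roughly 48 ms to drain, which is the sudden, non-stationary delay the other flows on that link see; spacing 2-packet sub-trains at least ~12 ms apart keeps the standing queue to a couple of packets - essentially the interleaving that per-flow WFQ imposes in the middle of the network, imposed instead by pacing at the sender.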