From: Neil Davies
Date: Thu, 18 Aug 2011 08:45:29 +0100
To: Stephen Hemminger
Cc: "Patrick J. LoPresti", bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Not all the world's a WAN

Stephen

I disagree with you - Patrick has solved his problem.

As for papering over the cracks - that is just pure provocation - what is wrong with more than *one* packet in the buffer?

Any finite queueing system has two degrees of freedom - there are three variables in play: loading factor (the ratio of arrivals to departures, along with the distribution of each); delay (and its distribution); and loss (and its distribution).

And in that system it is a trade. Patrick's trade is to constrain the arrival distribution/pattern so as to keep the total quality attenuation (delay and loss) at his buffer points within an acceptable bound for his application. He's made a rational choice for his requirements - he's bounded the induced quality attenuation.

To solve the 'general' case you need to solve the general 'induced quality attenuation' problem - there is the nub of the issue.

Neil

On 18 Aug 2011, at 04:57, Stephen Hemminger wrote:

> On Wed, 17 Aug 2011 18:26:00 -0700
> "Patrick J. LoPresti" wrote:
> 
>> Hello, BufferBloat crusaders.
>> 
>> Permit me briefly to describe my application. I have a rack full of
>> Linux systems, all with 10GbE NICs tied together by a 10GbE switch.
>> There are no routers or broader Internet connectivity. (At least,
>> none that matters here.) Round-trip "ping" times between systems are
>> 100 microseconds or so.
>> 
>> Some of the systems are "servers", some are "clients". Any single
>> client may decide to slurp data from multiple servers. For example,
>> the servers could be serving up a distributed file system, so when a
>> client accesses a file striped across multiple servers, it tries to
>> pull data from multiple servers simultaneously. (This is not my
>> literal application, but it does represent the same access pattern.)
>> 
>> The purpose of my cluster is to process data sets measured in hundreds
>> of gigabytes, as fast as possible.
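
With the 10 Gb/s links and roughly 100 microsecond round trips described above, the bandwidth-delay product - the largest window a single flow can usefully fill - works out to about 125 KB. A minimal sketch in C of that arithmetic (the 10 Gb/s and 100 us figures are from the message; everything else is illustrative):

/* bdp.c - bandwidth-delay product for the cluster described above.
   Build: cc -o bdp bdp.c */
#include <stdio.h>

int main(void)
{
    double link_bps = 10e9;    /* 10GbE line rate */
    double rtt_s    = 100e-6;  /* ~100 us round-trip "ping" time */

    double bdp_bytes = link_bps * rtt_s / 8.0;
    printf("BDP = %.0f bytes (~%.0f KB)\n", bdp_bytes, bdp_bytes / 1e3);
    /* ~125 KB: a per-flow window much beyond this adds no
       throughput; the excess can only sit in the switch buffer. */
    return 0;
}

Any window beyond that figure buys no speed on this fabric; it only deepens the queue at the congested switch port, which is exactly the overrun described next.
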
>> So, for my application:
>> 
>> - Speed = Throughput (latency is irrelevant)
>> - TCP retransmissions are a disaster, not least because
>> - 200ms is an eternity
>> 
>> The problem I have is this: At 10 gigabits/second, it takes very
>> little time to overrun even a sizable buffer in a 10GbE switch.
>> Although each client does have a 10GbE connection, it is reading
>> multiple sockets from multiple servers, so over short intervals the
>> switch's aggregate incoming bandwidth (multiple 10GbE links from
>> servers) is larger than its outgoing bandwidth (single 10GbE link to
>> client). If the servers do not throttle themselves -- say, because
>> the TCP windows are large -- packets overrun the switch's buffer and
>> get lost.
> 
> You need faster switches ;-)
> 
>> I have "fixed" this problem by using a switch with a _large_ buffer,
>> plus using TCP_WINDOW_CLAMP on the clients to ensure the TCP window
>> never gets very large. This ensures that the servers never send so
>> much data that they overrun the switch. And it is actually working
>> great; I am able to saturate all of my 10GbE links with zero
>> retransmissions.
> 
> You just papered over the problem. If the mean queue length over
> time is greater than one, you will lose packets. This may be a case
> where Ethernet flow control might help. It does have the problem
> of head-of-line blocking when cascading switches, but if the switch
> is just a pass-through it might help.
> 
>> I have not read all of the messages on this list, but I have read
>> enough to make me a little nervous. And thus I send this message in
>> the hope that, in your quest to slay the "buffer bloat" dragon, you do
>> not completely forget applications like mine. I would hate to have to
>> switch to InfiniBand or whatever just because everyone decided that
>> Web browsers are the only TCP/IP application in the world.
> 
> My view is that this is all about getting the defaults right for average
> users. People with big servers will always end up tuning; that's what
> they get paid for. Think of it as the difference between a Formula 1
> car and an average sedan. You want the sedan to just work, and
> have all the traction control and rev limiters. For the F1 race
> car, the driver knows best.
> 
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
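
Neil's "two degrees of freedom" point can be made concrete on the simplest finite queue, M/M/1/K. The sketch below is illustrative only - it assumes Poisson arrivals (a synchronized striped read is burstier than this) and measures delay in units of the mean service time. Holding the loading factor fixed and varying only the buffer size K shows loss and delay moving in opposite directions: you choose where the quality attenuation lands, not whether it exists.

/* mm1k.c - loss/delay trade in an M/M/1/K queue at fixed load.
   Illustrative: Poisson arrivals, exponential service, buffer
   holding K packets. Build: cc -o mm1k mm1k.c -lm */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double rho = 0.95;                 /* loading factor: arrival/departure */
    for (int K = 2; K <= 64; K *= 2) {
        double denom = 1.0 - pow(rho, K + 1);
        double loss  = (1.0 - rho) * pow(rho, K) / denom;  /* P(buffer full) */
        double inq   = rho / (1.0 - rho)
                     - (K + 1) * pow(rho, K + 1) / denom;  /* mean occupancy */
        double delay = inq / (rho * (1.0 - loss));         /* mean sojourn,
                                                              service times */
        printf("K=%2d  loss=%6.4f  delay=%6.2f\n", K, loss, delay);
    }
    /* Bigger K trades loss for delay; smaller K, the reverse. At a
       fixed loading factor you move the attenuation, never remove it. */
    return 0;
}

Patrick's clamp itself is a single setsockopt on Linux. A minimal sketch, assuming a client-side TCP socket: TCP_WINDOW_CLAMP is the real option name from <netinet/tcp.h>, but the 128 KB value is only illustrative - sized just above the ~125 KB bandwidth-delay product computed earlier - and is not his actual setting.

/* clamp.c - cap the advertised TCP receive window, bounding how
   much data each server can have in flight toward this client. */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    int clamp = 128 * 1024;   /* bytes; illustrative, roughly the BDP */
    if (setsockopt(fd, IPPROTO_TCP, TCP_WINDOW_CLAMP,
                   &clamp, sizeof clamp) < 0) {
        perror("setsockopt(TCP_WINDOW_CLAMP)");
        return 1;
    }
    /* connect() and read() as usual; each server now stops sending
       once 'clamp' bytes are unacknowledged, so N servers can queue
       at most about N * clamp bytes at the congested switch port. */
    printf("receive window clamped to %d bytes\n", clamp);
    return 0;
}

With the clamp sized near the bandwidth-delay product, the switch buffer only has to absorb the short-term excess of N clamped windows - which is why a large-buffer switch plus clamped clients can saturate the links with zero retransmissions.
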