From: "Patrick J. LoPresti"
To: bloat@lists.bufferbloat.net
Date: Wed, 17 Aug 2011 18:26:00 -0700
Subject: [Bloat] Not all the world's a WAN

Hello, BufferBloat crusaders. Permit me briefly to describe my application.

I have a rack full of Linux systems, all with 10GbE NICs tied together by a 10GbE switch. There are no routers or broader Internet connectivity. (At least, none that matters here.) Round-trip "ping" times between systems are 100 microseconds or so.

Some of the systems are "servers", some are "clients". Any single client may decide to slurp data from multiple servers. For example, the servers could be serving up a distributed file system, so when a client accesses a file striped across multiple servers, it tries to pull data from multiple servers simultaneously. (This is not my literal application, but it does represent the same access pattern.)

The purpose of my cluster is to process data sets measured in hundreds of gigabytes, as fast as possible. So, for my application:

 - Speed = Throughput (latency is irrelevant)
 - TCP retransmissions are a disaster, not least because
 - 200 ms is an eternity

The problem I have is this: At 10 gigabits/second, it takes very little time to overrun even a sizable buffer in a 10GbE switch. Although each client does have a 10GbE connection, it is reading multiple sockets from multiple servers, so over short intervals the switch's aggregate incoming bandwidth (multiple 10GbE links from servers) is larger than its outgoing bandwidth (a single 10GbE link to the client). Just two servers sending at line rate to one client already produce 10 gigabits/second (roughly 1.25 gigabytes/second) of excess, enough to fill a megabyte of switch buffer in under a millisecond. If the servers do not throttle themselves -- say, because the TCP windows are large -- packets overrun the switch's buffer and get lost.

I have "fixed" this problem by using a switch with a _large_ buffer, plus using TCP_WINDOW_CLAMP on the clients to ensure the TCP window never gets very large. This ensures that the servers never send so much data that they overrun the switch. And it is actually working great; I am able to saturate all of my 10GbE links with zero retransmissions. (A minimal sketch of the clamp follows below.)

I have not read all of the messages on this list, but I have read enough to make me a little nervous.
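For anyone curious how the clamp is applied, here is a minimal sketch of a client socket set up before connect(). The 64 KB value and the helper name are illustrative assumptions, not our tuned numbers; the real goal is to keep (servers per client) x (clamp) comfortably below the switch's shared buffer.

/* Minimal sketch, illustrative values only: clamp the TCP receive
 * window on a client socket before connecting, so no server can ever
 * have more than `clamp` bytes in flight toward this client. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

int open_clamped_socket(const struct sockaddr_in *server)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int clamp = 64 * 1024;  /* upper bound on the advertised window, bytes (assumed) */
    if (setsockopt(fd, IPPROTO_TCP, TCP_WINDOW_CLAMP,
                   &clamp, sizeof(clamp)) < 0) {
        close(fd);
        return -1;
    }

    if (connect(fd, (const struct sockaddr *)server, sizeof(*server)) < 0) {
        close(fd);
        return -1;
    }
    return fd;  /* ready for recv(); the window never grows past the clamp */
}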
And thus I send this message in the hope that, in your quest to slay the "buffer bloat" dragon, you do not completely forget applications like mine. I would hate to have to switch to InfiniBand or whatever just because everyone decided that Web browsers are the only TCP/IP application in the world.

Thanks for reading.

 - Pat