[Bloat] Not all the world's a WAN

Patrick J. LoPresti lopresti at gmail.com
Wed Aug 17 21:26:00 EDT 2011


Hello, BufferBloat crusaders.

Permit me briefly to describe my application.  I have a rack full of
Linux systems, all with 10GbE NICs tied together by a 10GbE switch.
There are no routers or broader Internet connectivity.  (At least,
none that matters here.)  Round trip "ping" times between systems are
100 microseconds or so.

Some of the systems are "servers", some are "clients".  Any single
client may decide to slurp data from multiple servers.  For example,
the servers could be serving up a distributed file system, so when a
client accesses a file striped across multiple servers, it tries to
pull data from multiple servers simultaneously.  (This is not my
literal application, but it does represent the same access pattern.)

The purpose of my cluster is to process data sets measured in hundreds
of gigabytes, as fast as possible.  So, for my application:

 - Speed = Throughput (latency is irrelevant)
 - TCP retransmissions are a disaster, not least because
 - 200ms is an eternity


The problem I have is this:  At 10 gigabits/second, it takes very
little time to overrun even a sizable buffer in a 10GbE switch.
Although each client does have a 10GbE connection, it is reading
multiple sockets from multiple servers, so over short intervals the
switch's aggregate incoming bandwidth (multiple 10GbE links from
servers) is larger than its outgoing bandwidth (single 10GbE link to
client).  If the servers do not throttle themselves -- say, because
the TCP windows are large -- packets overrun the switch's buffer and
get lost.
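
To put rough numbers on "very little time", here is a back-of-the-envelope sketch. It assumes every server sends at full line rate and the switch drains toward the client at a single link's rate. The 2 MB buffer size and the server counts are made-up figures for illustration, not measurements from the cluster described here, and the function name is hypothetical.

```python
def seconds_until_overrun(buffer_bytes, n_servers, link_bps=10e9):
    """Time for n_servers sending at line rate to fill a switch buffer
    that drains through one outgoing link of the same speed."""
    if n_servers < 2:
        # One sender cannot outrun an outgoing link of equal speed.
        return float("inf")
    # Bits arrive on n_servers links and leave on one link.
    net_fill_bps = (n_servers - 1) * link_bps
    return buffer_bytes * 8 / net_fill_bps


if __name__ == "__main__":
    # Assumed 2 MB of buffer available to the client's port.
    for n in (2, 4, 8):
        t = seconds_until_overrun(2_000_000, n)
        print(f"{n} servers: buffer full after about {t * 1e6:.0f} microseconds")
```

Under these assumptions the buffer fills in a few hundred microseconds to a couple of milliseconds, which is only a handful of round trips at a 100 microsecond RTT.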

I have "fixed" this problem by using a switch with a _large_ buffer,
plus using TCP_WINDOW_CLAMP on the clients to ensure the TCP window
never gets very large.  This ensures that the servers never send so
much data that they overrun the switch.  And it is actually working
great; I am able to saturate all of my 10GbE links with zero
retransmissions.
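
A minimal sketch of the clamping idea, for Linux only. The sizing rule is an assumption made for illustration, not necessarily the rule used on this cluster: it keeps the sum of all windows below the switch buffer plus one bandwidth-delay product. The function names, the 2 MB buffer, and the server count are hypothetical. TCP_WINDOW_CLAMP itself is a real Linux socket option, documented in tcp(7).

```python
import socket

# Python exposes this constant on Linux; 10 is its value in <linux/tcp.h>
# and is used here only as a fallback.
TCP_WINDOW_CLAMP = getattr(socket, "TCP_WINDOW_CLAMP", 10)


def per_socket_clamp(buffer_bytes, n_servers, link_bps=10e9, rtt_s=100e-6):
    """Assumed sizing rule: n_servers * window <= buffer + BDP."""
    # Bytes "on the wire" at line rate during one round trip.
    bdp_bytes = round(link_bps * rtt_s / 8)
    return (buffer_bytes + bdp_bytes) // n_servers


def clamp_window(sock, clamp_bytes):
    """Bound the size of the receive window this socket advertises."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_WINDOW_CLAMP, clamp_bytes)


if __name__ == "__main__":
    # Assumed: 2 MB of switch buffer, a client reading from 4 servers.
    clamp = per_socket_clamp(2_000_000, 4)
    print(f"clamp each socket to {clamp} bytes")
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    clamp_window(s, clamp)  # applied on the client, before connecting
    s.close()
```

The point of clamping on the client side is that the receiver's advertised window caps how much each server can have in flight, so no change is needed on the servers.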

I have not read all of the messages on this list, but I have read
enough to make me a little nervous.  And thus I send this message in
the hope that, in your quest to slay the "buffer bloat" dragon, you do
not completely forget applications like mine.  I would hate to have to
switch to InfiniBand or whatever just because everyone decided that
Web browsers are the only TCP/IP application in the world.

Thanks for reading.

 - Pat


