From: "Richard Scheffenegger" <rscheff@gmx.at>
To: "Patrick J. LoPresti" <lopresti@gmail.com>,
<bloat@lists.bufferbloat.net>
Subject: Re: [Bloat] Not all the world's a WAN
Date: Tue, 23 Aug 2011 09:44:15 +0200
Message-ID: <D263C3AE91D44335A5D5858BD31C0262@srichardlxp2>
In-Reply-To: <CAKGkousxwnvog=De9X9ynDs=_iqXXqD93opTcM_gGCshautHHg@mail.gmail.com>
Your problem is called incast, and there is a vast literature on the
subject and on how to alleviate it, more or less.
There are simple approaches, with limited benefits, such as RTO reduction,
high-resolution TCP timers, and short random delays before the responses.
None of these will give optimal bandwidth, though.
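
As a rough illustration of the random-delay approach, here is a minimal
Python sketch of a server waiting a short random time before it sends its
response, so that servers answering the same client do not all burst at
once. The function names and the 1 ms default are assumptions made for
illustration, not values from any particular system.

```python
import random
import time


def pick_response_delay(max_jitter_s, rng=random):
    """Return a delay drawn uniformly from [0, max_jitter_s] seconds."""
    if max_jitter_s < 0:
        raise ValueError("max_jitter_s must be non-negative")
    return rng.uniform(0.0, max_jitter_s)


def send_with_jitter(send, payload, max_jitter_s=0.001, rng=random,
                     sleep=time.sleep):
    """Sleep for a random delay, then hand the payload to `send`.

    `send` is any callable that transmits the payload (hypothetical here);
    `sleep` is injectable so the function can be exercised without waiting.
    Returns the delay that was used.
    """
    delay = pick_response_delay(max_jitter_s, rng)
    sleep(delay)
    send(payload)
    return delay
```

The jitter only spreads the bursts out in time; it does not stop the
senders from overrunning the switch once enough of them overlap, which is
consistent with the limited benefit noted above.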
If you have a cheap 10G switch built around the Broadcom chipsets, rather
than the expensive gear from another, better-known vendor, you can perhaps
deploy DCTCP, yielding up to 98% bandwidth with optimal latency and near-zero
loss, even when the setup is prone to severe incast...
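
For context, the sender side of DCTCP can be sketched as below, following
the published description of the algorithm: the switch ECN-marks packets
once its queue passes a threshold, the sender keeps a running estimate
(alpha) of the fraction of marked packets, and it cuts its congestion
window in proportion to that estimate instead of halving it. This is an
illustrative model only, not a kernel implementation; the function names
and the min_cwnd floor are assumptions.

```python
def update_alpha(alpha, marked_pkts, total_pkts, g=1.0 / 16):
    """Update the running estimate of the marked fraction.

    alpha <- (1 - g) * alpha + g * F, where F is the fraction of packets
    that were ECN-marked during the last window of data.
    """
    if total_pkts <= 0:
        return alpha  # nothing was acknowledged, keep the old estimate
    frac = marked_pkts / total_pkts
    return (1.0 - g) * alpha + g * frac


def reduce_cwnd(cwnd, alpha, min_cwnd=2.0):
    """Shrink the window in proportion to the congestion estimate.

    cwnd <- cwnd * (1 - alpha / 2); min_cwnd is an assumed floor.
    """
    return max(min_cwnd, cwnd * (1.0 - alpha / 2.0))
```

With alpha near 0 the window is barely reduced, and with alpha equal to 1
the reduction is the same halving that standard TCP applies, which is how
the queue can stay short without giving up throughput.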
rgds
----- Original Message -----
From: "Patrick J. LoPresti" <lopresti@gmail.com>
To: <bloat@lists.bufferbloat.net>
Sent: Thursday, August 18, 2011 3:26 AM
Subject: [Bloat] Not all the world's a WAN
> Hello, BufferBloat crusaders.
>
> Permit me briefly to describe my application. I have a rack full of
> Linux systems, all with 10GbE NICs tied together by a 10GbE switch.
> There are no routers or broader Internet connectivity. (At least,
> none that matters here.) Round trip "ping" times between systems are
> 100 microseconds or so.
>
> Some of the systems are "servers", some are "clients". Any single
> client may decide to slurp data from multiple servers. For example,
> the servers could be serving up a distributed file system, so when a
> client accesses a file striped across multiple servers, it tries to
> pull data from multiple servers simultaneously. (This is not my
> literal application, but it does represent the same access pattern.)
>
> The purpose of my cluster is to process data sets measured in hundreds
> of gigabytes, as fast as possible. So, for my application:
>
> - Speed = Throughput (latency is irrelevant)
> - TCP retransmissions are a disaster, not least because
> - 200ms is an eternity
>
>
> The problem I have is this: At 10 gigabits/second, it takes very
> little time to overrun even a sizable buffer in a 10GbE switch.
> Although each client does have a 10GbE connection, it is reading
> multiple sockets from multiple servers, so over short intervals the
> switch's aggregate incoming bandwidth (multiple 10GbE links from
> servers) is larger than its outgoing bandwidth (single 10GbE link to
> client). If the servers do not throttle themselves -- say, because
> the TCP windows are large -- packets overrun the switch's buffer and
> get lost.
>
> I have "fixed" this problem by using a switch with a _large_ buffer,
> plus using TCP_WINDOW_CLAMP on the clients to ensure the TCP window
> never gets very large. This ensures that the servers never send so
> much data that they overrun the switch. And it is actually working
> great; I am able to saturate all of my 10GbE links with zero
> retransmissions.
>
> I have not read all of the messages on this list, but I have read
> enough to make me a little nervous. And thus I send this message in
> the hope that, in your quest to slay the "buffer bloat" dragon, you do
> not completely forget applications like mine. I would hate to have to
> switch to Infiniband or whatever just because everyone decided that
> Web browsers are the only TCP/IP application in the world.
>
> Thanks for reading.
>
> - Pat
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
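
The window clamp described in the quoted message can be sketched roughly
as follows. The sizing rule (the bandwidth-delay product plus the switch
buffer, split evenly across the servers) and the 1 MB buffer in the usage
note are assumptions for illustration; the message does not say how the
clamp was chosen. TCP_WINDOW_CLAMP is a Linux-specific socket option.

```python
import socket


def bdp_bytes(link_bps, rtt_us):
    """Bandwidth-delay product in bytes, using integer arithmetic."""
    return link_bps * rtt_us // (8 * 1_000_000)


def per_server_window(link_bps, rtt_us, switch_buffer_bytes, n_servers):
    """Assumed rule: share (BDP + switch buffer) evenly among the servers."""
    if n_servers < 1:
        raise ValueError("n_servers must be at least 1")
    budget = bdp_bytes(link_bps, rtt_us) + switch_buffer_bytes
    return budget // n_servers


def clamp_window(sock, window_bytes):
    """Bound the advertised receive window on a client socket (Linux only)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_WINDOW_CLAMP, window_bytes)
```

For the 10GbE link and 100 microsecond round trip mentioned above,
bdp_bytes(10_000_000_000, 100) is 125000 bytes. With a hypothetical 1 MB
switch buffer and 8 servers, per_server_window(10_000_000_000, 100,
1_000_000, 8) gives 140625 bytes per connection.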