General list for discussing Bufferbloat
From: Neil Davies <neil.davies@pnsol.com>
To: Stephen Hemminger <shemminger@vyatta.com>
Cc: "Patrick J. LoPresti" <lopresti@gmail.com>, bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Not all the world's a WAN
Date: Thu, 18 Aug 2011 08:45:29 +0100
Message-ID: <21FE0F13-C946-4212-93F7-64112F63E8CF@pnsol.com>
In-Reply-To: <20110817205724.4b91e188@nehalam.ftrdhcpuser.net>

Stephen

I disagree with you - Patrick has solved his problem.

As for papering over the cracks - that is just pure provocation. Losing packets whenever there is more than *one* packet in the buffer?

Any finite queueing system has two degrees of freedom among three variables: the loading factor (the ratio of arrival rate to departure rate, together with the distributions of each); delay (and its distribution); and loss (and its distribution). Fix any two and the third is determined.

And in that system it is a trade. Patrick's trade is to constrain the arrival distribution/pattern so as to keep the total quality attenuation (delay and loss) at his buffering points within an acceptable bound for his application. He has made a rational choice for his requirements: he has bounded the induced quality attenuation.
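
To make the trade concrete, here is a minimal sketch using the textbook M/M/1/K queue (nothing more sophisticated) of how load, delay and loss move together in a finite buffer. The buffer of K packets and the 1500-byte packets on a 10 Gb/s link are illustrative assumptions, not Patrick's actual figures:

    # Sketch: the load/delay/loss trade in a finite M/M/1/K queue.
    # K packets of buffer and 1500-byte packets on a 10 Gb/s link
    # are assumptions for illustration.

    def mm1k(rho, K):
        """Loss probability and mean occupancy of an M/M/1/K queue."""
        if rho == 1.0:
            return 1.0 / (K + 1), K / 2.0
        p_loss = (1 - rho) * rho**K / (1 - rho**(K + 1))
        mean_n = (rho * (1 - (K + 1) * rho**K + K * rho**(K + 1))
                  / ((1 - rho) * (1 - rho**(K + 1))))
        return p_loss, mean_n

    K = 64                     # buffer size in packets (assumed)
    mu = 10e9 / (1500 * 8)     # service rate: 10 Gb/s, 1500-byte packets

    for rho in (0.5, 0.9, 0.99, 1.1):
        p_loss, mean_n = mm1k(rho, K)
        lam_eff = rho * mu * (1 - p_loss)   # accepted arrival rate
        delay_us = mean_n / lam_eff * 1e6   # Little's law: W = L / lambda_eff
        print(f"load {rho:4.2f}: loss {p_loss:.2e}, delay {delay_us:7.2f} us")

Run it and the trade is visible directly: push the loading factor toward one and you pay in delay, loss, or both; constrain the arrivals, as Patrick does, and both stay bounded.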

To solve the 'general' case you need to bound the general 'induced quality attenuation' - that is the nub of the issue.

Neil

On 18 Aug 2011, at 04:57, Stephen Hemminger wrote:

> On Wed, 17 Aug 2011 18:26:00 -0700
> "Patrick J. LoPresti" <lopresti@gmail.com> wrote:
> 
>> Hello, BufferBloat crusaders.
>> 
>> Permit me briefly to describe my application.  I have a rack full of
>> Linux systems, all with 10GbE NICs tied together by a 10GbE switch.
>> There are no routers or broader Internet connectivity.  (At least,
>> none that matters here.)  Round trip "ping" times between systems are
>> 100 microseconds or so.
>> 
>> Some of the systems are "servers", some are "clients".  Any single
>> client may decide to slurp data from multiple servers.  For example,
>> the servers could be serving up a distributed file system, so when a
>> client accesses a file striped across multiple servers, it tries to
>> pull data from multiple servers simultaneously.  (This is not my
>> literal application, but it does represent the same access pattern.)
>> 
>> The purpose of my cluster is to process data sets measured in hundreds
>> of gigabytes, as fast as possible.  So, for my application:
>> 
>> - Speed = Throughput (latency is irrelevant)
>> - TCP retransmissions are a disaster, not least because
>> - 200ms is an eternity
>> 
>> 
>> The problem I have is this:  At 10 gigabits/second, it takes very
>> little time to overrun even a sizable buffer in a 10GbE switch.
>> Although each client does have a 10GbE connection, it is reading
>> multiple sockets from multiple servers, so over short intervals the
>> switch's aggregate incoming bandwidth (multiple 10GbE links from
>> servers) is larger than its outgoing bandwidth (single 10GbE link to
>> client).  If the servers do not throttle themselves -- say, because
>> the TCP windows are large -- packets overrun the switch's buffer and
>> get lost.
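
A back-of-the-envelope sketch of how quickly that overrun happens; the fan-in and buffer size below are assumptions for illustration, not Patrick's actual figures:

    # Time for N senders at line rate to overrun a shared output
    # buffer draining through one 10GbE port.

    LINK = 10e9           # bits/s per port
    N_SERVERS = 4         # simultaneous senders (assumed)
    BUFFER_BYTES = 2e6    # 2 MB shared packet buffer (assumed)

    excess = (N_SERVERS - 1) * LINK        # arrival rate beyond the drain rate
    overflow_s = BUFFER_BYTES * 8 / excess
    print(f"buffer overruns in {overflow_s * 1e6:.0f} us")   # ~533 us

Half a millisecond of sustained fan-in is enough; set against a 200 ms retransmission timeout, that really is an eternity.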
> 
> You need faster switches ;-)
> 
>> I have "fixed" this problem by using a switch with a _large_ buffer,
>> plus using TCP_WINDOW_CLAMP on the clients to ensure the TCP window
>> never gets very large.  This ensures that the servers never send so
>> much data that they overrun the switch.  And it is actually working
>> great; I am able to saturate all of my 10GbE links with zero
>> retransmissions.
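
For reference, the clamp Patrick describes is the Linux TCP_WINDOW_CLAMP socket option. A minimal client-side sketch in Python; the 64 KB value and the server address are assumptions, and the clamp would need to be sized so that all servers together cannot exceed the switch buffer:

    import socket

    # TCP_WINDOW_CLAMP is 10 in <linux/tcp.h>; older Pythons may not
    # expose the constant, so fall back to the raw value.
    TCP_WINDOW_CLAMP = getattr(socket, "TCP_WINDOW_CLAMP", 10)

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Bound the advertised receive window: the server can never have
    # more than 64 KB in flight toward this client.
    sock.setsockopt(socket.IPPROTO_TCP, TCP_WINDOW_CLAMP, 64 * 1024)
    sock.connect(("server.example", 9000))   # hypothetical address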
> 
> You just papered over the problem. If the mean queue length over
> time is greater than one, you will lose packets. This may be a case
> where Ethernet flow control could help. It does have the problem
> of head-of-line blocking when cascading switches, but if the switch
> is just a pass-through it might help.
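
(For what it's worth: on Linux hosts, 802.3x pause frames can usually be toggled with "ethtool -A <iface> rx on tx on", assuming both the NIC and the switch support them; the switch side is normally enabled per port from its own management interface.)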
> 
>> I have not read all of the messages on this list, but I have read
>> enough to make me a little nervous.  And thus I send this message in
>> the hope that, in your quest to slay the "buffer bloat" dragon, you do
>> not completely forget applications like mine.  I would hate to have to
>> switch to Infiniband or whatever just because everyone decided that
>> Web browsers are the only TCP/IP application in the world.
>> 
> 
> My view is that this is all about getting the defaults right for
> average users. People with big servers will always end up tuning;
> that's what they get paid for. Think of it as the difference between
> a Formula 1 car and an average sedan. You want the sedan to just work, and
> have all the traction control and rev limiters. For the F1 race
> car, the driver knows best.



Thread overview: 7+ messages
2011-08-18  1:26 Patrick J. LoPresti
2011-08-18  3:57 ` Stephen Hemminger
2011-08-18  7:45   ` Neil Davies [this message]
2011-08-18  5:08 ` Steinar H. Gunderson
2011-08-19 14:59   ` BeckW
2011-08-23  7:37     ` Richard Scheffenegger
2011-08-23  7:44 ` Richard Scheffenegger
