From: Jonathan Morton
Date: Mon, 16 May 2011 03:31:41 +0300
To: Fred Baker
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)

On 15 May, 2011, at 11:49 pm, Fred Baker wrote:

> On May 15, 2011, at 11:28 AM, Jonathan Morton wrote:
>> The fundamental thing is that the sender must be able to know when sent frames can be flushed from the buffer because they don't need to be retransmitted. So if there's a NACK, there must also be an ACK - at which point the ACK serves the purpose of the NACK, as it does in TCP. The only alternative is a wall-time TTL, which is doable on single hops but requires careful design.
>
> To a point. NORM holds a frame for possible retransmission for a stated period of time, and if retransmission isn't requested in that interval forgets it. So the ack isn't actually necessary; what is necessary is that the retention interval be long enough that a nack has a high probability of succeeding in getting the message through.

Okay, so because it can fall back to TCP's retransmit, the retention requirements can be relaxed.
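To make the retention-timer idea concrete, here is a rough sketch of a sender-side buffer that forgets frames once their interval expires (plain Python, invented names, not NORM's actual API; the 2-second interval is just an assumed multiple of the link RTT):

    import time

    RETENTION_SECONDS = 2.0   # assumed value, a few multiples of the link RTT

    class RetentionBuffer:
        """Hold sent frames only for a fixed retention interval.

        No ACKs needed: a frame is kept just long enough that a NACK has a
        high probability of arriving before the frame is forgotten.
        """

        def __init__(self, retention=RETENTION_SECONDS):
            self.retention = retention
            self.frames = {}  # seq -> (payload, expiry time)

        def record_sent(self, seq, payload):
            self.frames[seq] = (payload, time.monotonic() + self.retention)

        def on_nack(self, seq):
            """Return the payload to retransmit, or None if already forgotten."""
            entry = self.frames.get(seq)
            return entry[0] if entry else None

        def expire(self):
            """Forget frames whose retention interval has passed."""
            now = time.monotonic()
            self.frames = {s: e for s, e in self.frames.items() if e[1] > now}

If a NACK arrives after the frame has been forgotten, recovery falls back to the end-to-end (TCP) retransmit - which is why the retention interval can be relaxed.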
>> ...recent versions of Ethernet *do* support a throttling feedback mechanism, and this can and should be exploited to tell the edge host or router that ECN *might* be needed. Also, with throttling feedback throughout the LAN, the Ethernet can for practical purposes be treated as almost-reliable. This is *better* in terms of packet loss than ARQ or NACK, although if the Ethernet's buffers are large, it will still increase delay. (With small buffers, it will just decrease throughput to the capacity, which is fine.)
>
> It increases the delay anyway. It just pushes the retention buffer to another place. What do you think the packet is doing during the "don't transmit" interval?

Most packets delayed by Ethernet throttling would, with small buffers, end up waiting in the sending host (or router). They thus spend more time in a potentially active queue instead of in a dumb one. But even if the host queue is dumb, the overall delay is no worse than with the larger Ethernet buffers.

> Throughput never exceeds capacity. If I have a 10 Gbps link, I will never get more than 10 Gbps through it. Buffer fill rate is statistically predictable. With small buffers, the fill rate achieves the top sooner. They increase the probability that the buffers are full, which is to say the drop probability. Which puts us back to end-to-end retransmission, which is the worst case of what you were worried about.

Let's suppose someone has generously provisioned an office with GigE throughout, using a two-level hierarchy of switches. Some dumb schmuck then schedules every single computer to run its backups (to a single fileserver) at the same time. That's, say, 100 computers all competing for one GigE link to the fileserver. If the switches are fair, each computer should get 10 Mbps - that's the capacity.

With throttling, each computer sees the link closed 99% of the time. It can send at link rate for the remaining 1% of the time. On medium timescales, that looks like a 10 Mbps bottleneck at the first link. So the throughput on that link equals the capacity, and hopefully the goodput matches it. The only queue that is likely to overflow is the one on the sending computer, and one would hope there is enough feedback in a host's own TCP/IP stack to prevent that.

Without throttling but with ARQ, NACK or whatever you want to call it, the host has no signal to tell it to slow down - so the throughput on the edge link is more than 10 Mbps (but the goodput will be less). The buffer in the outer switch fills up - no matter how big or small it is - and starts dropping packets. The switch then won't ask for retransmission of packets it has just dropped, because it has nowhere to put them. The same process then repeats at the inner switch. Finally, the server sees the missing packets and asks for the retransmission - but these requests have to be switched all the way back to the clients, because the missing packets aren't in the switches' buffers. It's therefore no better than a TCP SACK retransmission.

So there you have a classic congested network scenario in which throttling solves the problem, but link-level retransmission can't.
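For concreteness, here is the arithmetic behind that scenario as a back-of-the-envelope sketch (the duty-cycle view of PAUSE throttling is a deliberate simplification, and the numbers are just the ones above):

    LINK_RATE_MBPS = 1000     # GigE everywhere, including the fileserver's uplink
    CLIENTS = 100             # computers all running backups at once

    # With fair switches, each client's share of the single GigE link:
    fair_share_mbps = LINK_RATE_MBPS / CLIENTS        # 10 Mbps

    # PAUSE-style throttling achieves that share by closing the link most of
    # the time; the fraction of time each client may transmit at line rate:
    open_fraction = fair_share_mbps / LINK_RATE_MBPS  # 0.01 -> open 1%, closed 99%

    print(f"{fair_share_mbps:.0f} Mbps per client; "
          f"link open {open_fraction:.0%}, closed {1 - open_fraction:.0%} of the time")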
Where ARQ and/or NACK come in handy is where the link itself is unreliable, such as on WLANs (hence the use in amateur radio) and last-mile links. In that case, the reason for the packet loss is not a full receive buffer, so asking for a retransmission is not inherently self-defeating.

> I'm not going to argue against letting retransmission go end to end; it's an endless debate. I'll simply note that several link layers, including but not limited to those you mention, find that applications using them work better if there is a high probability of retransmission in an interval on the order of the link RTT as opposed to the end-to-end RTT. You brought up data centers (aka variable delays in LAN networks); those have been heavily the province of Fibre Channel, which is a link-layer protocol with retransmission. Think about it.

What I'd like to see is a complete absence of need for retransmission on a properly built wired network. Obviously the capability still needs to be there to cope with the parts that aren't properly built or aren't wired, but TCP can do that. Throttling (in the form of Ethernet PAUSE) is simply the third possible method of signalling congestion in the network, alongside delay and loss - and it happens to be quite widely deployed already.

 - Jonathan