From: Fred Baker
Date: Sun, 15 May 2011 13:49:22 -0700
To: Jonathan Morton
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Jumbo frames and LAN buffers (was: RE: Burst Loss)

On May 15, 2011, at 11:28 AM, Jonathan Morton wrote:

> The fundamental thing is that the sender must be able to know when sent frames can be flushed from the buffer because they don't need to be retransmitted. So if there's a NACK, there must also be an ACK - at which point the ACK serves the purpose of the NACK, as it does in TCP. The only alternative is a wall-time TTL, which is doable on single hops but requires careful design.

To a point. NORM (NACK-Oriented Reliable Multicast) holds a frame for possible retransmission for a stated period of time and, if no retransmission is requested within that interval, forgets it. So the ACK isn't actually necessary; what is necessary is that the retention interval be long enough that a NACK has a high probability of getting the message through. A 100 Gbit/s interface can carry 97,656 128-byte frames per millisecond (100 Gbit/s / (8 bits x 128 bytes x 1000 ms/s)). Allowing 4 ms to retransmit without falling back to TCP, we're looking at a sequence number on the order of 18 bits at 100 Gbps; 16 bits would be enough at 10 Gbps, and 12 bits would be enough at 1 Gbps.

> ...recent versions of Ethernet *do* support a throttling feedback mechanism, and this can and should be exploited to tell the edge host or router that ECN *might* be needed. Also, with throttling feedback throughout the LAN, the Ethernet can for practical purposes be treated as almost-reliable.
> This is *better* in terms of packet loss than ARQ or NACK, although if the Ethernet's buffers are large, it will still increase delay. (With small buffers, it will just decrease throughput to the capacity, which is fine.)

It increases the delay anyway; it just moves the retention buffer somewhere else. What do you think the packet is doing during the "don't transmit" interval?

Throughput never exceeds capacity. If I have a 10 Gbps link, I will never get more than 10 Gbps through it. Buffer fill rate is statistically predictable, and a small buffer reaches the top sooner. Small buffers therefore increase the probability that the buffer is full, which is to say the drop probability. That forces an end-to-end retransmission, which is the worst case of what you were worried about.

I'm not going to argue against letting retransmission go end to end; it's an endless debate. I'll simply note that several link layers, including but not limited to those you mention, find that applications using them work better if there is a high probability of retransmission within an interval on the order of the link RTT rather than the end-to-end RTT. You brought up data centers (aka variable delays in LAN networks); those have largely been the province of Fibre Channel, which is a link-layer protocol with retransmission. Think about it.
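To make the NACK-only retention idea concrete, here is a minimal sketch of a sender-side buffer that keeps each frame for a fixed hold interval and then forgets it, so no positive ACK is ever required. It is loosely modeled on the retention behavior described for NORM, not on NORM's actual data structures; the class and method names are hypothetical, and it assumes the sequence space is large enough that a number is never reused within one hold interval.

```python
from collections import deque
from typing import Optional


class RetentionBuffer:
    """Hypothetical NACK-only retention buffer.

    Each sent frame is held for `hold_seconds` and then silently dropped,
    so the sender never needs a positive ACK to free buffer space.
    Assumes sequence numbers are not reused within one hold interval.
    """

    def __init__(self, hold_seconds: float) -> None:
        self.hold_seconds = hold_seconds
        self._frames: dict[int, bytes] = {}               # seq -> frame payload
        self._expiry: deque[tuple[float, int]] = deque()  # (deadline, seq), oldest first

    def expire(self, now: float) -> None:
        # Forget every frame whose hold interval has elapsed.
        while self._expiry and self._expiry[0][0] <= now:
            _, seq = self._expiry.popleft()
            self._frames.pop(seq, None)

    def sent(self, seq: int, frame: bytes, now: float) -> None:
        # Clear out expired frames first, then hold this one until now + hold_seconds.
        self.expire(now)
        self._frames[seq] = frame
        self._expiry.append((now + self.hold_seconds, seq))

    def on_nack(self, seq: int, now: float) -> Optional[bytes]:
        # Return the frame for link-level retransmission if it is still held;
        # None means recovery is left to the end-to-end protocol (e.g. TCP).
        self.expire(now)
        return self._frames.get(seq)
```

With a 4 ms hold, a NACK for a frame sent at time 0 recovers it at 3 ms, while the same NACK at 4 ms gets `None`. Keeping deadlines in a FIFO works because frames are sent in time order, so the oldest deadline is always at the front.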
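The sequence-number sizing can be checked with a few lines of arithmetic, assuming (as the text does) 128-byte frames and a 4 ms retention window; the function names are illustrative only. The strict ceiling works out to 19 bits at 100 Gbps (4 ms is about 390,625 frames, above 2^18 = 262,144), so "on the order of 18 bits" is slightly low, while 16 and 12 bits do cover 10 and 1 Gbps.

```python
import math


def frames_per_ms(link_bps: float, frame_bytes: int = 128) -> float:
    # Frames per millisecond = bits/s / (8 bits/byte * bytes/frame * 1000 ms/s).
    return link_bps / (8 * frame_bytes * 1000)


def seq_bits(link_bps: float, retention_ms: float = 4.0, frame_bytes: int = 128) -> int:
    # Smallest sequence-number width whose space covers every frame that can be
    # sent during the retention interval, so no number is reused while its
    # frame might still be NACKed.
    frames_in_window = frames_per_ms(link_bps, frame_bytes) * retention_ms
    return math.ceil(math.log2(frames_in_window))


for gbps in (100, 10, 1):
    rate = gbps * 1e9
    print(f"{gbps:>3} Gbps: {frames_per_ms(rate):,.2f} frames/ms, {seq_bits(rate)} bits")
```

Changing `retention_ms` or `frame_bytes` shows how sensitive the width is: each doubling of the window or halving of the frame size adds one bit.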
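The claim that small buffers raise the drop probability can be illustrated with the textbook M/M/1/K queue. This is a swapped-in model that assumes Poisson arrivals and exponential service times, which real LAN traffic need not follow; the function name is hypothetical. At a fixed offered load, the probability that an arriving packet finds the buffer full falls as the capacity K grows.

```python
def mm1k_drop_probability(load: float, capacity: int) -> float:
    """Blocking (drop) probability of an M/M/1/K queue.

    load: offered load rho = arrival rate / service rate.
    capacity: K, the maximum number of packets in the system (queue plus server).
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if load == 1.0:
        # Limit of the general formula as rho -> 1: all K+1 states equally likely.
        return 1.0 / (capacity + 1)
    return (1 - load) * load**capacity / (1 - load ** (capacity + 1))


for k in (4, 16, 64):
    print(f"K={k:>2}: drop probability at 90% load = {mm1k_drop_probability(0.9, k):.4f}")
```

The same model also shows the other half of the trade-off in the thread: a larger K lowers the drop probability but lets the queue, and therefore the delay, grow.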