From: "Bill Ver Steeg (versteb)"
To: David Lang, Jonathan Morton
Cc: Hal Murray, bloat@lists.bufferbloat.net
Date: Wed, 28 May 2014 22:15:26 +0000
Subject: Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"

This really speaks to the difference between cross-traffic-induced delay and self-induced delay.

There are several toolkits that can be brought to bear, and we must be careful to examine the impact of each of them. The one that we tend to think about most (at least recently) is the AQM algorithm that manages the depth of a given queue. It is important to note that waiting for the buffer to fill up before dropping is not optimal, because by then it is too late. You want to provide mark/drop back pressure a bit earlier, so that you do not grind all of the flows to a halt at once. See the PIE and CoDel papers for the details. There are also several technologies that can be used to segregate flows to lessen the impact of cross traffic. And there are congestion avoidance algorithms that can be used on the hosts to recognize and avoid bloat.
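To make the "too late" point concrete, here is a toy Python sketch of the early-drop idea. This is only the flavor of CoDel, not the real algorithm (see the CoDel paper for that); all of the names below are invented for illustration, though the 5 ms / 100 ms constants mirror CoDel's defaults.

from collections import deque
import time

TARGET = 0.005     # 5 ms acceptable standing-queue delay
INTERVAL = 0.100   # delay must persist this long before we start dropping

class ToyCoDelQueue:
    def __init__(self):
        self.q = deque()          # (enqueue_time, packet) pairs
        self.above_since = None   # when sojourn time first exceeded TARGET

    def enqueue(self, packet):
        self.q.append((time.monotonic(), packet))

    def dequeue(self):
        while self.q:
            enq_time, packet = self.q.popleft()
            sojourn = time.monotonic() - enq_time
            if sojourn < TARGET:
                self.above_since = None   # queue is draining fine
                return packet
            if self.above_since is None:
                self.above_since = time.monotonic()
            if time.monotonic() - self.above_since < INTERVAL:
                return packet             # above target, but not for long enough yet
            # Persistent standing queue: drop (or ECN-mark) this packet and
            # try the next one. Real CoDel also ramps the drop rate with the
            # square root of the drop count; this toy just waits another
            # INTERVAL before the next drop.
            self.above_since = time.monotonic()
        self.above_since = None           # queue emptied; no standing queue left
        return None

Note that a tail-drop queue only signals the sender once the buffer is completely full, whereas this drops from the head as soon as the standing queue has persisted, which is exactly the earlier back pressure described above.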
There are hybrids of these schemes, and multiple technologies with their own sweet spots in each of these domains. There is no magic bullet, and a successful system will need to draw from each of these disciplines.

In the specific case of short-lived flows vs long-lived flows, one could make a case that hashing the several flows into a set of discrete queues would provide tremendous benefit. IMHO, this is the best approach - but I am looking into this in some detail. One could also argue that not all middleboxes are able to support multiple queues (and that the number of queues is finite), so an intelligent AQM algorithm is also important for limiting cross-traffic-induced delay. One could also make the point that some (hopefully fewer and fewer) middleboxes will not have any sort of rational buffer management capabilities and will just do tail-drop with large buffers, so the hosts need to do what they can to avoid bloat.

Bill VerSteeg

-----Original Message-----
From: bloat-bounces@lists.bufferbloat.net [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of David Lang
Sent: Wednesday, May 28, 2014 2:56 PM
To: Jonathan Morton
Cc: Hal Murray; bloat@lists.bufferbloat.net
Subject: Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"

On Wed, 28 May 2014, Jonathan Morton wrote:

> On 28 May, 2014, at 12:39 pm, Hal Murray wrote:
>
>>> in non-discarding scheduling, total delay is conserved, irrespective
>>> of the scheduling discipline
>>
>> Is that true for all backplane/switching topologies?
>
> It's a mathematical truth for any topology that you can reduce to a
> black box with one or more inputs and one output, which you call a
> "queue" and which *does not discard* packets. Non-discarding queues
> don't exist in the real world, of course.
>
> The intuitive proof is that every time you promote a packet to be
> transmitted earlier, you must demote one to be transmitted later. A
> non-FIFO queue tends to increase the maximum delay and decrease the
> minimum delay, but the average delay will remain constant.

True, but not all traffic is equal. Delays in DNS lookups and short TCP connections are far more noticeable than the same total delay in long TCP connections (because users tend to be serialized on the short connections while doing the long ones in parallel). So queueing that favors short-duration flows over long-duration ones still averages the same latency overall, but the latency/connection_length ratio stays very small in all cases, instead of letting that ratio become very large for short connections.

David Lang
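To illustrate the hash-into-discrete-queues approach Bill describes, here is a rough Python sketch - roughly the shape of fq_codel's flow segregation, minus the per-queue CoDel and byte accounting, with all names invented:

from collections import deque
import zlib

NUM_QUEUES = 1024

def flow_hash(*five_tuple):
    # Hash the 5-tuple so every packet of a flow lands in the same queue.
    key = "|".join(map(str, five_tuple)).encode()
    return zlib.crc32(key) % NUM_QUEUES

class FlowQueues:
    def __init__(self):
        self.queues = [deque() for _ in range(NUM_QUEUES)]
        self.rr = 0   # round-robin pointer over the queues

    def enqueue(self, five_tuple, packet):
        self.queues[flow_hash(*five_tuple)].append(packet)

    def dequeue(self):
        # Serve the hashed queues in round-robin order, so a short flow's
        # packets are never stuck behind a bulk flow's standing queue.
        for _ in range(NUM_QUEUES):
            q = self.queues[self.rr]
            self.rr = (self.rr + 1) % NUM_QUEUES
            if q:
                return q.popleft()
        return None

The fairness property David describes falls out of the round-robin service alone: a DNS exchange pays only for its own queue, not for the bulk transfers hashed into other queues.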
>>> The question is whether (codel/pie/whatever) AQM makes sense at all
>>> for 10G/40G hardware and higher-performance iron. Ingress/egress
>>> bandwidth is nearly identical, so larger/longer buffering should not
>>> happen. Line card memory is limited, so larger buffering is de facto
>>> excluded.
>>
>> The simplest interesting case is where you have two input lines
>> feeding the same output line.
>>
>> AQM may not be the best solution, but you have to do something.
>> Dropping any packet that won't fit into the buffer is probably simplest.
>
> The relative bandwidths of the input(s) and output(s) are also
> relevant. You *can* have a saturated 5-port switch with no dropped
> packets, even if one of them is a common uplink, provided the uplink
> port has four times the bandwidth and the traffic coming in on it is
> evenly distributed to the other four.
>
> Which yields you the classic tail-drop FIFO, whose faults are by now
> well documented. If you have the opportunity to do something better
> than that, you probably should. The simplest improvement I can think
> of is a *head*-drop FIFO, which gets the congestion signal back to the
> source quicker. It *should*, I think, be possible to do Codel at 10G
> (if not 40G) by now; whether or not it is *easy* probably depends on
> your transistor budget.
>
> - Jonathan Morton
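For what it's worth, the head-drop FIFO Jonathan describes is trivial to sketch in software - the same buffer as tail-drop, just evicting from the opposite end when full (toy Python, invented names):

from collections import deque

class HeadDropFIFO:
    def __init__(self, capacity):
        self.q = deque()
        self.capacity = capacity

    def enqueue(self, packet):
        if len(self.q) >= self.capacity:
            self.q.popleft()    # full: evict the *oldest* packet instead
        self.q.append(packet)   # the new arrival is always admitted

    def dequeue(self):
        return self.q.popleft() if self.q else None

The dropped packet sits at the head of the standing queue, so the packets queued behind it arrive (and trigger duplicate ACKs) almost immediately, rather than the sender having to wait a full queue-drain time to notice the loss - which is Jonathan's point about getting the congestion signal back to the source quicker.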