From: Neil Davies <neil.davies@pnsol.com>
Date: Thu, 29 May 2014 08:20:42 +0100
To: Jonathan Morton
Cc: Hal Murray, bloat@lists.bufferbloat.net
Subject: Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"

On 28 May 2014, at 12:00, Jonathan Morton wrote:

> On 28 May, 2014, at 12:39 pm, Hal Murray wrote:
>
>>> in non discarding scheduling total delay is conserved,
>>> irrespective of the scheduling discipline
>>
>> Is that true for all backplane/switching topologies?
>
> It's a mathematical truth for any topology that you can reduce to a black box with one or more inputs and one output, which you call a "queue" and which *does not discard* packets. Non-discarding queues don't exist in the real world, of course.
>
> The intuitive proof is that every time you promote a packet to be transmitted earlier, you must demote one to be transmitted later. A non-FIFO queue tends to increase the maximum delay and decrease the minimum delay, but the average delay will remain constant.

Jonathan - there is a mathematical underpinning for this. When you (mathematically) construct queueing systems that differentially allocate both delay and loss, you find that the underlying state space has a particular property: "lumpability". Apart from making the state space dramatically smaller, lumpability has another, profound, implication: a set of states in a "lump" are equivalent, in that it doesn't matter how you leave the "lump" - the overall system properties are unaffected.

In the systems we studied (in which there was a ranking in "order of service" (delay/urgency), and a ranking in discarding (loss/cherish)), this implied that the overall system properties - the total "amount" of loss and delay - were independent of that choice. The "quality attenuation" (the loss and delay) was thus conserved.

>
>>> The question is if (codel/pie/whatever) AQM makes sense at all for 10G/40G
>>> hardware and higher performance irons? Ingress/egress bandwidth is nearly
>>> identical, a larger/longer buffering should not happen. Line card memory is
>>> limited, a larger buffering is de facto excluded.
>>
>> The simplest interesting case is where you have two input lines feeding the
>> same output line.
>>
>> AQM may not be the best solution, but you have to do something. Dropping any
>> packet that won't fit into the buffer is probably simplest.
>
> The relative bandwidths of the input(s) and output(s) are also relevant. You *can* have a saturated 5-port switch with no dropped packets, even if one of them is a common uplink, provided the uplink port has four times the bandwidth and the traffic coming in on it is evenly distributed to the other four.
>
> Which yields you the classic tail-drop FIFO, whose faults are by now well documented. If you have the opportunity to do something better than that, you probably should. The simplest improvement I can think of is a *head*-drop FIFO, which gets the congestion signal back to the source quicker. It *should*, I think, be possible to do Codel at 10G (if not 40G) by now; whether or not it is *easy* probably depends on your transistor budget.

Caveat: this is probably the best strategy only for networks that consist solely of long-lived, non-service-critical TCP flows - for the rest of networking requirements, think carefully. There are several real-world scenarios where this is not the best strategy, and where you are looking to make any form of "safety" case (be it fiscal or safety of life) it does create new performance-related attack vectors. We know this because we've been asked this and we've done the analysis.

>
> - Jonathan Morton

---------------------------------------------------
Neil Davies, PhD, CEng, CITP, MBCS
Chief Scientist
Predictable Network Solutions Ltd
Tel:   +44 3333 407715
Mob:   +44 7974 922445
neil.davies@pnsol.com
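The conservation claim at the heart of the thread is easy to check numerically: run the same arrival pattern through one work-conserving, non-discarding server under two different disciplines (FIFO and strict priority) and the mean delay comes out identical, even though the per-packet delays differ. This is only an illustrative sketch, not anyone's production code; the arrival trace and priority values are made up for the example.

```python
def simulate(arrivals, pick):
    """Single server, unit service time, non-discarding.
    arrivals: list of (arrival_time, priority) tuples.
    pick: function choosing the index of the next packet to serve
    from the current backlog - this is the scheduling discipline."""
    pending = sorted(arrivals)              # by arrival time
    backlog, delays, t, i = [], [], 0, 0
    while i < len(pending) or backlog:
        # admit everything that has arrived by time t
        while i < len(pending) and pending[i][0] <= t:
            backlog.append(pending[i])
            i += 1
        if not backlog:                     # idle until the next arrival
            t = pending[i][0]
            continue
        arr, _prio = backlog.pop(pick(backlog))
        t += 1                              # one time unit of service
        delays.append(t - arr)              # departure minus arrival
    return sum(delays) / len(delays)

arrivals = [(0, 2), (0, 1), (1, 2), (1, 1), (2, 1), (2, 2), (3, 1)]
fifo = simulate(arrivals, lambda b: 0)      # first-come-first-served
prio = simulate(arrivals,                   # serve lowest priority value first
                lambda b: min(range(len(b)), key=lambda k: b[k][1]))
print(fifo, prio)                           # same mean delay under both
```

The priority scheduler gives its favoured class a delay of 1 unit each and pushes the other class out to 5 units, but the total (and hence the mean) is exactly what FIFO produces - delay is moved around, not removed, just as the "quality attenuation is conserved" argument says.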
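Jonathan's head-drop suggestion can likewise be sketched in a few lines: a bounded FIFO that, when full, evicts the *oldest* queued packet instead of refusing the newcomer, so the loss lands on the packet whose retransmission timer has been running longest and the congestion signal reaches the sender sooner. The class and names below are illustrative, not taken from any real switch implementation.

```python
from collections import deque

class HeadDropFifo:
    """Bounded FIFO that drops from the head when full: the newest
    packet is always admitted and the oldest queued one is evicted."""

    def __init__(self, limit):
        self.q = deque()
        self.limit = limit
        self.dropped = []                   # evicted packets, for inspection

    def enqueue(self, pkt):
        if len(self.q) >= self.limit:
            self.dropped.append(self.q.popleft())   # head drop
        self.q.append(pkt)

    def dequeue(self):
        return self.q.popleft() if self.q else None

q = HeadDropFifo(limit=3)
for pkt in range(5):                        # offer packets 0..4 to the queue
    q.enqueue(pkt)
print(list(q.q), q.dropped)                 # newest three kept, oldest two dropped
```

A tail-drop queue offered the same five packets would instead discard packets 3 and 4 and keep the three stale ones at the front, which is exactly why the head-drop variant delivers the congestion signal to the source a full queue-length earlier.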