From: Dave Taht
To: Neil Davies
Cc: Hal Murray, bloat
Date: Thu, 29 May 2014 09:58:08 -0700
Subject: Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"

I am really enjoying this thread.

There was a video and presentation from Stanford last (?) year that
concluded that the "right" number of buffers at really high rates (10gb+)
was really small - like 20 - and it used tens of thousands of flows to
make its point. I think it came out of the optical networking group...
anybody remember the paper/preso/video I'm talking about? It seemed like
a pretty radical conclusion at the time.

On Thu, May 29, 2014 at 12:20 AM, Neil Davies wrote:
>
> On 28 May 2014, at 12:00, Jonathan Morton wrote:
>
>> On 28 May, 2014, at 12:39 pm, Hal Murray wrote:
>>
>>>> in non discarding scheduling total delay is conserved,
>>>> irrespective of the scheduling discipline
>>>
>>> Is that true for all backplane/switching topologies?
>>
>> It's a mathematical truth for any topology that you can reduce to a
>> black box with one or more inputs and one output, which you call a
>> "queue" and which *does not discard* packets. Non-discarding queues
>> don't exist in the real world, of course.
>>
>> The intuitive proof is that every time you promote a packet to be
>> transmitted earlier, you must demote one to be transmitted later. A
>> non-FIFO queue tends to increase the maximum delay and decrease the
>> minimum delay, but the average delay will remain constant.
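That conservation claim is easy to sanity-check numerically. Below is a
minimal sketch (my own, not taken from the Stanford work or from PNSol's
models), assuming a slotted model with equal-sized packets and a
work-conserving, non-discarding output that sends one packet per slot:
FIFO, LIFO and random service order end up with exactly the same mean
delay over the same arrivals, and only the min/max spread changes.

import random

def simulate(arrivals, pick):
    """arrivals[t] = number of packets arriving in slot t; pick() chooses
    which queued packet to transmit next. Returns per-packet delays."""
    queue, delays = [], []
    for t, n in enumerate(arrivals):
        queue.extend([t] * n)       # remember each packet's arrival slot
        if queue:                   # work-conserving: send whenever backlogged
            delays.append(t - queue.pop(pick(queue)))
    t = len(arrivals)
    while queue:                    # drain the backlog after arrivals stop
        delays.append(t - queue.pop(pick(queue)))
        t += 1
    return delays

random.seed(1)
arrivals = [random.choice([0, 0, 1, 2, 3]) for _ in range(10000)]

for name, pick in [("fifo", lambda q: 0),
                   ("lifo", lambda q: len(q) - 1),
                   ("random", lambda q: random.randrange(len(q)))]:
    d = simulate(arrivals, pick)
    print(name, "mean", sum(d) / len(d), "min", min(d), "max", max(d))

The mean comes out identical in all three cases because the multiset of
departure slots is fixed by the backlog process alone; the discipline
only decides which packet gets which departure slot.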
There are two cases here, under congestion, that are of interest. One is
X into 1, where figuring out what to shoot at, and when, is important.

The other is where X into 1 at one rate is ultimately being stepped down
from, say, 10gbit to 10mbit, e2e. In the latter case I'm reasonably
confident that stochastic fair queueing with the number of flows
proportional to the ultimate step-down is a win (and you still have to
decide what to shoot at) - and it makes tons of sense for hosts servicing
a limited number of users to also disburse their packet payloads at a
similar ratio.

In either case, as rates and numbers of flows get insanely high, my gut
(which has been wrong before!) agrees with the Stanford result (short
queues, drop tail), and conflicts with the observation that breaking up
high-speed clumps into highly mixed packets is a good thing.

I wish it were possible to experiment with a 10+gbit, congested, internet
backbone link and observe the results of these lines of thought...

> Jonathan - there is a mathematical underpinning for this. When you
> (mathematically) construct queueing systems that will differentially
> allocate both delay and loss, you find that the underlying state space
> has certain properties - it has "lumpability" - and this lumpability
> (apart from making the state space dramatically smaller) has another,
> profound, implication. A set of states that form a "lump" has an
> interesting equivalence: it doesn't matter how you leave the "lump",
> the overall system properties are unaffected.

The work at http://www.pnsol.com/publications.html has invented several
terms that I don't fully understand.

> In the systems we studied (in which there was a ranking in "order of
> service" (delay/urgency) for incoming things, and a ranking in
> discarding (loss/cherish) things), this basically implied that the
> overall system properties (the total "amount" of loss and delay) were
> independent of that choice. The "quality attenuation" (the loss and
> delay) was thus conserved.
>
>>>> The question is if (codel/pie/whatever) AQM makes sense at all for
>>>> 10G/40G hardware and higher performance irons? Ingress/egress
>>>> bandwidth is nearly identical, so larger/longer buffering should not
>>>> happen. Line card memory is limited, so larger buffering is de facto
>>>> excluded.
>>>
>>> The simplest interesting case is where you have two input lines
>>> feeding the same output line.
>>>
>>> AQM may not be the best solution, but you have to do something.
>>> Dropping any packet that won't fit into the buffer is probably
>>> simplest.
>>
>> The relative bandwidths of the input(s) and output(s) are also
>> relevant. You *can* have a saturated 5-port switch with no dropped
>> packets, even if one of the ports is a common uplink, provided the
>> uplink port has four times the bandwidth and the traffic coming in on
>> it is evenly distributed to the other four.
>>
>> Which yields you the classic tail-drop FIFO, whose faults are by now
>> well documented. If you have the opportunity to do something better
>> than that, you probably should. The simplest improvement I can think
>> of is a *head*-drop FIFO, which gets the congestion signal back to the
>> source quicker. It *should*, I think, be possible to do Codel at 10G
>> (if not 40G) by now; whether or not it is *easy* probably depends on
>> your transistor budget.
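To make the head-drop idea concrete, here is a minimal sketch (not
anyone's actual switch code, and obviously not how you would do it in
hardware): the buffer bound is exactly the same as for tail drop, but on
overflow the oldest queued packet is discarded, so the loss the sender
eventually notices refers to an earlier point in the flow and the
congestion signal arrives roughly one queue-drain time sooner.

from collections import deque

class TailDropFifo:
    def __init__(self, limit):
        self.q, self.limit = deque(), limit
    def enqueue(self, pkt):
        if len(self.q) >= self.limit:
            return pkt                  # full: drop the arriving packet
        self.q.append(pkt)
        return None
    def dequeue(self):
        return self.q.popleft() if self.q else None

class HeadDropFifo(TailDropFifo):
    def enqueue(self, pkt):
        dropped = None
        if len(self.q) >= self.limit:
            dropped = self.q.popleft()  # full: drop the *oldest* packet
        self.q.append(pkt)              # the new arrival always gets queued
        return dropped

q = HeadDropFifo(limit=3)
for seq in range(5):
    lost = q.enqueue(seq)
    if lost is not None:
        print("dropped packet", lost)   # drops 0, then 1: the oldest ones
print("still queued:", list(q.q))       # -> [2, 3, 4]

Either way the buffer never holds more than limit packets; the only
difference is which packet carries the congestion signal and how stale
the data still sitting in the queue is.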
> Caveat: this is probably the best strategy for networks that consist
> solely of long-lived, non-service-critical TCP flows - for the rest of
> networking requirements, think carefully. There are several real-world
> scenarios where this is not the best strategy and, where you are
> looking to make any form of "safety" case (be it fiscal or safety of
> life), it does create new performance-related attack vectors. We know
> this, because we've been asked this and we've done the analysis.
>
>> - Jonathan Morton
>
> ---------------------------------------------------
> Neil Davies, PhD, CEng, CITP, MBCS
> Chief Scientist
> Predictable Network Solutions Ltd
> Tel: +44 3333 407715
> Mob: +44 7974 922445
> neil.davies@pnsol.com

--
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article