From: Luca Muscariello <luca.muscariello@gmail.com>
Date: Wed, 28 Nov 2018 10:56:31 +0100
To: Dave Taht <dave.taht@gmail.com>
Cc: Jonathan Morton <chromatix99@gmail.com>, bloat <bloat@lists.bufferbloat.net>
Subject: Re: [Bloat] when does the CoDel part of fq_codel help in the real world?

Dave,

A single BDP in flight is a rule of thumb that does not account for
fluctuations of the RTT. And I am not talking about random fluctuations
and noise. I am talking about fluctuations in the control-theoretic
sense: the trajectory of the system variable has to reach the optimal
point, and stay there, no matter the initial conditions (Lyapunov
stability). The ACM Queue paper on CoDel gives a fairly intuitive and
accessible explanation of that.
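Just to make the sensitivity to the RTT concrete, a back-of-the-envelope
sketch (my own illustration, with arbitrary numbers):

# Illustrative only: how much "one BDP" moves when the RTT moves.
# The link rate and RTTs are arbitrary example values.

LINK_RATE_BPS = 100e6          # 100 Mbit/s bottleneck

def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product in bytes."""
    return rate_bps * rtt_s / 8

for rtt_ms in (10, 40, 100):   # base RTT vs. RTT inflated by queueing/jitter
    print("RTT %3d ms -> BDP %6.0f kB"
          % (rtt_ms, bdp_bytes(LINK_RATE_BPS, rtt_ms / 1000) / 1e3))

# A window sized for the 10 ms case starves the pipe when the RTT grows
# to 100 ms; a window sized for 100 ms builds roughly a megabyte of
# standing queue when the RTT is actually 10 ms.

So "one BDP" is only well defined once you say which RTT you mean, which
is exactly where the control problem starts.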
There is a less accessible literature on this, which dates back some
time and may be worth re-reading:

Damon Wischik and Nick McKeown. 2005. Part I: buffer sizes for core routers.
SIGCOMM Comput. Commun. Rev. 35, 3 (July 2005), 75-78.
DOI=http://dx.doi.org/10.1145/1070873.1070884
http://klamath.stanford.edu/~nickm/papers/BufferSizing.pdf.pdf

and

Gaurav Raina, Don Towsley, and Damon Wischik. 2005. Part II: control theory for buffer sizing.
SIGCOMM Comput. Commun. Rev. 35, 3 (July 2005), 79-82.
DOI=http://dx.doi.org/10.1145/1070873.1070885
http://www.statslab.cam.ac.uk/~gr224/PAPERS/Control_Theory_Buffers.pdf

One of the things that Frank Kelly brought to the literature is optimal
control. From a pure optimization point of view we have known since
Robert Gallager (and Bertsekas, 1981) that the optimal sending rate is a
function of the shadow price at the bottleneck. This shadow price is
nothing more than the Lagrange multiplier of the capacity constraint at
the bottleneck. Some protocols such as XCP or RCP propose to carry
something very close to a shadow price in the ECN field, but it is not
that simple. And what we currently have is a 0/1 "shadow price", which
is way insufficient.

Optimal control as developed by Frank Kelly since 1998 tells you that
there is a stability region you have to stay in to get to the optimum.

Wischik's work, IMO, helps quite a lot in understanding the trade-offs
when designing AQM and congestion control. I feel that the people who
wrote the CoDel ACM Queue paper are very much aware of this literature,
because the CoDel design principles seem to take it into account. And
the BBR paper too.
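To make the shadow-price point concrete, here is a toy primal-dual
iteration in the Kelly framework (my own sketch in Python with made-up
numbers, not a description of any real protocol): each source maximises
a log utility, the link adapts its price in proportion to excess demand,
and the optimal rate of each source is a function of that price alone.

# Toy primal-dual iteration in the Kelly framework (illustrative sketch only).
# Sources maximise sum_i w_i * log(x_i) subject to sum_i x_i <= C at a single
# bottleneck.  The link adapts a "shadow price" p (the Lagrange multiplier of
# its capacity constraint) in proportion to excess demand; each source then
# sends at the rate that is optimal for the current price, x_i = w_i / p.

C = 100.0               # bottleneck capacity (arbitrary units)
w = [1.0, 2.0, 5.0]     # per-flow utility weights, U_i(x) = w_i * log(x)
p = 0.01                # initial shadow price
gamma = 0.0005          # price adaptation gain (too large -> instability)

for _ in range(2000):
    x = [wi / p for wi in w]            # each source's best response to p
    excess = sum(x) - C                 # aggregate demand minus capacity
    p = max(1e-9, p + gamma * excess)   # dual (price) update, kept positive

print([round(xi, 1) for xi in x], round(p, 3))
# Converges to rates proportional to the weights, (12.5, 25.0, 62.5),
# with p = sum(w) / C = 0.08: the equilibrium shadow price of the link.

In a real network a source only ever sees a 0/1 quantisation of that
price (a mark or a drop), which is the insufficiency mentioned above.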
On Tue, Nov 27, 2018 at 9:58 PM Dave Taht <dave.taht@gmail.com> wrote:
> OK, wow, this conversation got long. And I'm still 20 messages behind.
>
> Two points, and I'm going to go back to work, and maybe I'll try to
> summarize a table of the competing viewpoints, as there's far more than
> a BDP of discussion here, and what we need is sqrt(BDP) to deal with
> all the different conversational flows. :)
>
> On Tue, Nov 27, 2018 at 1:24 AM Luca Muscariello
> <luca.muscariello@gmail.com> wrote:
> >
> > I think that this is a very good comment to the discussion at the
> > defense about the comparison between SFQ with longest-queue drop and
> > FQ_CoDel.
> >
> > A congestion-controlled protocol such as TCP or others, including
> > QUIC, LEDBAT and so on, needs at least the BDP in the transmission
> > queue to get full link efficiency, i.e. so that the queue never
> > empties out.
>
> No, I think it needs a BDP in flight.
>
> I think some of the confusion here is that your TCP stack needs to keep
> around a BDP in order to deal with retransmits, but that lives in
> another set of buffers entirely.
>
> > This gives a rule of thumb to size buffers which is also very
> > practical and, thanks to flow isolation, becomes very accurate.
> >
> > Which is:
> >
> > 1) find a way to keep the number of backlogged flows at a reasonable
> > value. This largely depends on the minimum fair rate an application
> > may need in the long term. We discussed a little bit of the mechanisms
> > available in the literature to achieve that.
> >
> > 2) fix the largest RTT you want to serve at full utilization and size
> > the buffer using BDP * N_backlogged. Or the other way round: check how
> > much memory you can use in the router/line card/device and, for a
> > fixed N, compute the largest RTT you can serve at full utilization.
>
> My own take on the whole BDP argument is that *so long as the flows in
> that BDP are thoroughly mixed* you win.
>
> > 3) there is still some memory to dimension for sparse flows in
> > addition to that, but this is not based on BDP. It is just enough to
> > compute the total utilization of sparse flows and use the same simple
> > model Toke has used to compute the (de)prioritization probability.
> >
> > This procedure would allow sizing not only FQ_CoDel but also SFQ.
> > It would be interesting to compare the two under this buffer sizing.
> > It would also be interesting to compare another mechanism that we
> > mentioned during the defense, which is AFD + a sparse flow queue.
> > That one is, BTW, already available in Cisco Nexus switches for data
> > centres.
> >
> > I think that the CoDel part would still provide the ECN feature, which
> > all the others cannot have. However the others, the last one
> > especially, can be implemented in silicon at reasonable cost.
> >
> > On Mon, 26 Nov 2018 at 22:30, Jonathan Morton <chromatix99@gmail.com> wrote:
> >>
> >> > On 26 Nov, 2018, at 9:08 pm, Pete Heist <pete@heistp.net> wrote:
> >> >
> >> > So I just thought to continue the discussion- when does the CoDel
> >> > part of fq_codel actually help in the real world?
> >>
> >> Fundamentally, without CoDel the only limits on the congestion window
> >> would be when the sender or receiver hit configured or calculated
> >> rwnd and cwnd limits (the rwnd is visible on the wire and is usually
> >> chosen to be large enough to be a non-factor), or when the queue
> >> overflows. Large windows require buffer memory in both sender and
> >> receiver, increasing costs on the sender in particular (who typically
> >> has many flows to manage per machine).
> >>
> >> Queue overflow tends to result in burst loss and head-of-line
> >> blocking in the receiver, which is visible to the user as a pause and
> >> subsequent jump in the progress of their download, accompanied by a
> >> major fluctuation in the estimated time to completion. The lost
> >> packets also consume capacity upstream of the bottleneck which does
> >> not contribute to application throughput. These effects are
> >> independent of whether overflow dropping occurs at the head or tail
> >> of the bottleneck queue, though recovery occurs more quickly (and
> >> fewer packets might be lost) if dropping occurs from the head of the
> >> queue.
> >>
> >> From a pure throughput-efficiency standpoint, CoDel allows using ECN
> >> for congestion signalling instead of packet loss, potentially
> >> eliminating packet loss and the associated head-of-line blocking
> >> entirely. Even without ECN, the actual cwnd is kept near the minimum
> >> necessary to satisfy the BDP of the path, reducing memory
> >> requirements and significantly shortening the recovery time of each
> >> loss cycle, to the point where the end-user may not notice that
> >> delivery is not perfectly smooth, and implementing accurate
> >> completion-time estimators is considerably simplified.
> >>
> >> An important use case is where two sequential bottlenecks exist on
> >> the path, the upstream one being only slightly higher capacity but
> >> lacking any queue management at all. This is presently common in
> >> cases where home CPE implements inbound shaping on a generic ISP
> >> last-mile link. In that case, without CoDel running on the second
> >> bottleneck, traffic would collect in the first bottleneck's queue as
> >> well, greatly reducing the beneficial effects of FQ implemented on
> >> the second bottleneck. In this topology, the overall effect is
> >> inter-flow as well as intra-flow.
> >>
> >> The combination of CoDel with FQ is done in such a way that a
> >> separate instance of CoDel is implemented for each flow. This means
> >> that congestion signals are only sent to flows that require them, and
> >> non-saturating flows are unmolested. This makes the combination
> >> synergistic, where each component offers an improvement to the
> >> behaviour of the other.
> >>
> >>  - Jonathan Morton
> >>
> >> _______________________________________________
> >> Bloat mailing list
> >> Bloat@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/bloat
> >
> > _______________________________________________
> > Bloat mailing list
> > Bloat@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/bloat
>
> --
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740
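To make the per-flow combination Jonathan describes concrete, here is a
schematic sketch of the idea in Python. It is my own simplification, not
the Linux fq_codel code: it leaves out the DRR scheduler and CoDel's
interval/sqrt(count) drop scheduling, and just shows one CoDel state per
flow, so a mark or drop is only ever triggered by that flow's own
sojourn time.

# Schematic FQ+CoDel sketch (illustrative only, heavily simplified).
# Each flow gets its own queue and its own CoDel state, so congestion
# signals land only on flows whose packets actually sit in the queue
# longer than the target for a sustained period.

import time
from collections import deque, defaultdict

TARGET = 0.005      # target sojourn time, 5 ms
INTERVAL = 0.100    # observation interval, 100 ms

class CodelState:
    """Minimal per-flow CoDel bookkeeping (no drop-rate escalation)."""
    def __init__(self):
        self.first_above = None   # deadline set when sojourn first exceeds TARGET

    def should_signal(self, sojourn, now):
        if sojourn < TARGET:
            self.first_above = None
            return False
        if self.first_above is None:
            self.first_above = now + INTERVAL
            return False
        return now >= self.first_above    # above target for a whole interval

flows = defaultdict(lambda: (deque(), CodelState()))   # flow_key -> (queue, codel)

def mark_or_drop(packet):
    """Placeholder: ECN-mark the packet if the flow supports it, else drop."""
    pass

def enqueue(flow_key, packet):
    flows[flow_key][0].append((packet, time.monotonic()))

def dequeue(flow_key):
    queue, codel = flows[flow_key]
    if not queue:
        return None
    packet, enq_time = queue.popleft()
    now = time.monotonic()
    if codel.should_signal(now - enq_time, now):
        mark_or_drop(packet)     # only this flow gets the congestion signal
    return packet

A sparse flow whose queue drains within the target never arms its CoDel
state, which is the "non-saturating flows are unmolested" property above.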