[Bloat] when does the CoDel part of fq_codel help in the real world?

Luca Muscariello luca.muscariello at gmail.com
Wed Nov 28 05:48:19 EST 2018


On Wed, Nov 28, 2018 at 11:40 AM Dave Taht <dave.taht at gmail.com> wrote:

> On Wed, Nov 28, 2018 at 1:56 AM Luca Muscariello
> <luca.muscariello at gmail.com> wrote:
> >
> > Dave,
> >
> > The single BDP inflight is a rule of thumb that does not account for
> > fluctuations of the RTT. And I am not talking about random
> > fluctuations and noise. I am talking about fluctuations from a
> > control theoretic point of view to stabilise the system, i.e. the
> > trajectory of the system variable that gets to the optimal point no
> > matter the initial conditions (Lyapunov).
>
> I have been trying all day to summon the gumption to make this argument:
>
> IF you have a good idea of the actual RTT...
>
> it is also nearly certain that there will be *at least* one other flow
> you will be competing with...
> therefore the fluctuations from every point of view are dominated by
> the interaction between these flows and
> the goal, in general, is not to take up a full BDP for your single flow.
>
> And BBR aims for some tiny percentage less than what it thinks it can
> get, when, well, everybody's seen it battle it out with itself and
> with cubic. I hand it FQ at the bottleneck link and it works well.
>
> single flows exist only in the minds of theorists and labs.
>
> There's a relevant passage worth citing in the Kleinrock paper, I
> thought (did he write two recently?) that talked about this problem...
> I *swear* when I first read it, it had a deeper discussion of the
> second sentence below and had two paragraphs that went into the issues
> with multiple flows:
>
> "ch earlier and led to the Flow Deviation algorithm [28]. 17 The
> reason that the early work of 40 years ago took so long to make its
> current impact is because in [31] it was shown that the mechanism
> presented in [2] and [3] could not be implemented in a decentralized
> algorithm. This delayed the application of Power until the recent work
> by the Google team in [1] demonstrated that the key elements of
> response time and bandwidth could indeed be estimated using a
> distributed control loop sliding window spanning approximately 10
> round-trip times."
>
> but I can't find it today.
>
>
Here it is

https://www.lk.cs.ucla.edu/data/files/Kleinrock/Internet%20Congestion%20Control%20Using%20the%20Power%20Metric-Keep%20the%20Pipe%20Just%20Full%2C%20But%20No%20Fuller%20July%202018.pdf
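
For readers who haven't seen it: the "power" metric is just throughput
divided by delay, and it peaks when exactly one BDP is in flight. A toy
numeric sketch of my own (assumed numbers, not code from the paper):

# Toy model of Kleinrock's power metric for a single fixed-rate pipe.
C = 1000.0   # bottleneck capacity, packets per second (assumed)
R = 0.050    # base round-trip time, seconds (assumed)
BDP = C * R  # bandwidth-delay product: 50 packets

def power(inflight):
    """Throughput/delay as a function of packets kept in flight."""
    if inflight <= BDP:   # pipe not yet full: delay stays at the base RTT
        throughput, delay = inflight / R, R
    else:                 # pipe full: extra packets only add queueing delay
        throughput, delay = C, inflight / C
    return throughput / delay

for w in (10, 25, 50, 100, 200):
    print(f"inflight={w:4d}  power={power(w):9.1f}")

Power peaks at inflight == BDP (50 packets here): "keep the pipe just
full, but no fuller".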


> > The ACM Queue paper talking about Codel gives a fairly intuitive and
> > accessible explanation of that.
>
> I haven't re-read the lola paper. I just wanted to make the assertion
> above. And then duck. :)
>
> Also, when I last looked at BBR, it made a false assumption that 200ms
> was "long enough" to probe the actual RTT, when my comcast links and
> others are measured at 680ms+ of buffering.
>

This is essentially the same paper I cited, which is Part I.



>
> And I always liked the Stanford work, here, which tried to assert that
> a link with n flows requires no more than B = (RTT × C) / √n.
>
> http://yuba.stanford.edu/techreports/TR04-HPNG-060800.pdf


That paper does not say that the rule ALWAYS applies. It does under
certain conditions. But my point is about optimality.

It does NOT mean that the system ALWAYS HAS to operate at that point,
because things change.
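
To make the regime concrete, here is the arithmetic the rule implies
(illustrative numbers of my own choosing):

from math import sqrt

# Stanford small-buffer rule: B = (RTT * C) / sqrt(n), which assumes
# the n long-lived flows are desynchronized.  Link numbers are assumed.
RTT = 0.100      # seconds
C = 10e9 / 8     # 10 Gb/s link, in bytes per second
bdp = RTT * C    # one BDP: 125 MB

for n in (1, 100, 10_000):
    print(f"n={n:6d}  buffer = {bdp / sqrt(n) / 1e6:8.2f} MB")

With n = 1 you recover the classic one-BDP rule (125 MB); with 10,000
well-mixed flows the buffer can shrink a hundredfold, to 1.25 MB. And
again: this only holds under the paper's conditions.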

And for BBR, I would say that the design principles are one thing and
the implementations another, and we had better distinguish between
them. The key design principles are all valid.



>
>
> night!
>
>
night ;)


>
>
> > There is a less accessible literature talking about that, which
> > dates back to some time ago and may be useful to re-read:
> >
> > Damon Wischik and Nick McKeown. 2005.
> > Part I: buffer sizes for core routers.
> > SIGCOMM Comput. Commun. Rev. 35, 3 (July 2005), 75-78.
> > DOI=http://dx.doi.org/10.1145/1070873.1070884
> > http://klamath.stanford.edu/~nickm/papers/BufferSizing.pdf.pdf
> >
> > and
> >
> > Gaurav Raina, Don Towsley, and Damon Wischik. 2005.
> > Part II: control theory for buffer sizing.
> > SIGCOMM Comput. Commun. Rev. 35, 3 (July 2005), 79-82.
> > DOI=http://dx.doi.org/10.1145/1070873.1070885
> > http://www.statslab.cam.ac.uk/~gr224/PAPERS/Control_Theory_Buffers.pdf
> >
> > One of the things that Frank Kelly has brought to the literature is
> > optimal control. From a pure optimization point of view we have
> > known since Robert Gallager (and Bertsekas, 1981) that the optimal
> > sending rate is a function of the shadow price at the bottleneck.
> > This shadow price is nothing more than the Lagrange multiplier of
> > the capacity constraint at the bottleneck. Some protocols such as
> > XCP or RCP propose to carry something very close to a shadow price
> > in the ECN field, but that's not that simple. And currently we have
> > a 0/1 "shadow price", which is way insufficient.
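
To illustrate what a real (multi-bit) shadow price buys you, here is a
minimal sketch of Kelly-style rate control on one bottleneck, with log
utilities (proportional fairness); all numbers are assumptions of mine:

# The link price p is the Lagrange multiplier of the capacity
# constraint; each sender picks the rate where U'(x) = p.
C, n, gamma = 100.0, 4, 1e-4    # capacity, flows, price step size
p = 0.1                         # initial shadow price

for _ in range(5000):
    x = 1.0 / p                              # U(x) = log x => U'(x) = 1/x = p
    p = max(1e-9, p + gamma * (n * x - C))   # dual update on the price

print(f"price={p:.4f}  per-flow rate={1.0/p:.2f}  fair share={C/n:.2f}")

This converges to p = n/C and x = C/n: every flow ends at its fair
share. A 0/1 ECN mark is a one-bit quantization of that price, which is
why XCP/RCP-style signalling is attractive but hard to retrofit.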
> >
> > Optimal control as developed by Frank Kelly since 1998 tells you
> > that you have a stability region that is needed to get to the
> > optimum.
> >
> > Wischik's work, IMO, helps quite a lot to understand the tradeoffs
> > in designing AQM and CC. I feel like the people who wrote the Codel
> > ACM Queue paper are very much aware of this literature, because
> > Codel's design principles seem to take it into account. And the BBR
> > paper too.
> >
> >
> > On Tue, Nov 27, 2018 at 9:58 PM Dave Taht <dave.taht at gmail.com> wrote:
> >>
> >> OK, wow, this conversation got long. and I'm still 20 messages behind.
> >>
> >> Two points, and I'm going to go back to work, and maybe I'll try to
> >> summarize a table
> >> of the competing viewpoints, as there's far more than BDP of
> >> discussion here, and what
> >> we need is sqrt(bdp) to deal with all the different conversational
> >> flows. :)
> >>
> >> On Tue, Nov 27, 2018 at 1:24 AM Luca Muscariello
> >> <luca.muscariello at gmail.com> wrote:
> >> >
> >> > I think that this is a very good comment on the discussion at the
> >> > defense about the comparison between SFQ with longest queue drop
> >> > and FQ_Codel.
> >> >
> >> > A congestion-controlled protocol such as TCP or others, including
> >> > QUIC, LEDBAT and so on, needs at least the BDP in the transmission
> >> > queue to get full link efficiency, i.e. the queue never empties out.
> >>
> >> no, I think it needs a BDP in flight.
> >>
> >> I think some of the confusion here is that your TCP stack needs to
> >> keep around a BDP in order to deal with
> >> retransmits, but that lives in another set of buffers entirely.
> >>
> >> > This gives a rule of thumb to size buffers which is also very
> >> > practical and, thanks to flow isolation, becomes very accurate.
> >> >
> >> > Which is:
> >> >
> >> > 1) find a way to keep the number of backlogged flows at a
> >> > reasonable value. This largely depends on the minimum fair rate an
> >> > application may need in the long term. We discussed some of the
> >> > available mechanisms to achieve that in the literature.
> >> >
> >> > 2) fix the largest RTT you want to serve at full utilization and
> >> > size the buffer using BDP * N_backlogged. Or the other way round:
> >> > check how much memory you can use in the router/line card/device
> >> > and, for a fixed N, compute the largest RTT you can serve at full
> >> > utilization.
> >>
> >> My own take on the whole BDP argument is that *so long as the flows in
> >> that BDP are thoroughly mixed* you win.
> >>
> >> >
> >> > 3) there is still some memory to dimension for sparse flows in
> >> > addition to that, but this is not based on BDP. It is enough to
> >> > compute the total utilization of sparse flows and use the same
> >> > simple model Toke has used to compute the (de)prioritization
> >> > probability.
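
(Interjecting a numeric sketch of how I read steps 2 and 3 above, with
assumed numbers, and taking the per-flow BDP at the fair rate C/N,
which flow isolation makes meaningful:)

C = 1e9 / 8          # 1 Gb/s bottleneck, in bytes per second (assumed)
rtt_max = 0.100      # largest RTT served at full utilization, seconds
n_back = 50          # backlogged flows we accept (step 1)

# Step 2: one BDP per backlogged flow at its fair rate.
bulk_buffer = n_back * rtt_max * (C / n_back)    # = rtt_max * C

# Step 2, the other way round: memory budget -> largest RTT served.
mem_budget = 8 * 1024**2                         # 8 MB of queue memory
rtt_served = mem_budget / C

# Step 3: a small pool for sparse flows, not BDP-based.
sparse_pool = 64 * 1500                          # e.g. 64 full packets

print(f"bulk buffer = {bulk_buffer/1e6:.1f} MB")
print(f"largest RTT at full utilization = {rtt_served*1e3:.0f} ms")
print(f"sparse pool = {sparse_pool/1e3:.0f} kB")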
> >> >
> >> > This procedure would allow sizing FQ_codel but also SFQ.
> >> > It would be interesting to compare the two under this buffer sizing.
> >> > It would also be interesting to compare another mechanism that we
> >> > have mentioned during the defense, which is AFD + a sparse flow
> >> > queue. That is, BTW, already available in Cisco Nexus switches for
> >> > data centres.
> >> >
> >> > I think that the codel part would still provide the ECN feature
> >> > that all the others cannot have. However the others, the last one
> >> > especially, can be implemented in silicon at reasonable cost.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Mon 26 Nov 2018 at 22:30, Jonathan Morton <chromatix99 at gmail.com>
> >> > wrote:
> >> >>
> >> >> > On 26 Nov, 2018, at 9:08 pm, Pete Heist <pete at heistp.net> wrote:
> >> >> >
> >> >> > So I just thought to continue the discussion- when does the CoDel
> >> >> > part of fq_codel actually help in the real world?
> >> >>
> >> >> Fundamentally, without Codel the only limits on the congestion
> >> >> window would be when the sender or receiver hit configured or
> >> >> calculated rwnd and cwnd limits (the rwnd is visible on the wire
> >> >> and usually chosen to be large enough to be a non-factor), or
> >> >> when the queue overflows.  Large windows require buffer memory in
> >> >> both sender and receiver, increasing costs on the sender in
> >> >> particular (who typically has many flows to manage per machine).
> >> >>
> >> >> Queue overflow tends to result in burst loss and head-of-line
> >> >> blocking in the receiver, which is visible to the user as a pause
> >> >> and subsequent jump in the progress of their download,
> >> >> accompanied by a major fluctuation in the estimated time to
> >> >> completion.  The lost packets also consume capacity upstream of
> >> >> the bottleneck which does not contribute to application
> >> >> throughput.  These effects are independent of whether overflow
> >> >> dropping occurs at the head or tail of the bottleneck queue,
> >> >> though recovery occurs more quickly (and fewer packets might be
> >> >> lost) if dropping occurs from the head of the queue.
> >> >>
> >> >> From a pure throughput-efficiency standpoint, Codel allows using
> >> >> ECN for congestion signalling instead of packet loss, potentially
> >> >> eliminating packet loss and associated head-of-line blocking
> >> >> entirely.  Even without ECN, the actual cwnd is kept near the
> >> >> minimum necessary to satisfy the BDP of the path, reducing memory
> >> >> requirements and significantly shortening the recovery time of
> >> >> each loss cycle, to the point where the end-user may not notice
> >> >> that delivery is not perfectly smooth, and implementing accurate
> >> >> completion time estimators is considerably simplified.
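
(For readers following along, the control law Jonathan is describing is
roughly the following; a condensed sketch of my own that omits the
first-above-time bookkeeping of the real algorithm:)

TARGET = 0.005     # 5 ms acceptable standing queue
INTERVAL = 0.100   # 100 ms, a worst-case expected RTT

count, dropping, next_signal = 0, False, 0.0

def on_dequeue(now, sojourn):
    """True if this packet should be ECN-marked or dropped."""
    global count, dropping, next_signal
    if sojourn < TARGET:
        dropping, count = False, 0      # queue drained: stand down
        return False
    if not dropping:
        dropping, count = True, 1
        next_signal = now + INTERVAL    # one interval of grace first
        return False
    if now >= next_signal:
        count += 1                      # signal at an increasing rate:
        next_signal += INTERVAL / count ** 0.5
        return True
    return False

The sqrt(count) schedule is what walks the sender's cwnd down to just
above the path BDP and holds it there.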
> >> >>
> >> >> An important use-case is where two sequential bottlenecks exist
> >> >> on the path, the upstream one being only slightly higher capacity
> >> >> but lacking any queue management at all.  This is presently
> >> >> common in cases where home CPE implements inbound shaping on a
> >> >> generic ISP last-mile link.  In that case, without Codel running
> >> >> on the second bottleneck, traffic would collect in the first
> >> >> bottleneck's queue as well, greatly reducing the beneficial
> >> >> effects of FQ implemented on the second bottleneck.  In this
> >> >> topology, the overall effect is inter-flow as well as intra-flow.
> >> >>
> >> >> The combination of Codel with FQ is done in such a way that a
> >> >> separate instance of Codel is implemented for each flow.  This
> >> >> means that congestion signals are only sent to flows that require
> >> >> them, and non-saturating flows are unmolested.  This makes the
> >> >> combination synergistic, where each component offers an
> >> >> improvement to the behaviour of the other.
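
(And a sketch of that combination's shape, with illustrative names of
my own; this is not the kernel implementation:)

from collections import deque

NUM_QUEUES = 1024
queues = {}                          # bucket index -> FlowQueue
new_flows, old_flows = deque(), deque()

class FlowQueue:
    def __init__(self):
        self.pkts = deque()
        self.codel_count = 0         # per-flow CoDel state lives here, so
        self.codel_dropping = False  # one flow's congestion signals never
        self.next_signal = 0.0       # touch another flow's packets

def enqueue(five_tuple, pkt):
    i = hash(five_tuple) % NUM_QUEUES        # stochastic flow isolation
    q = queues.setdefault(i, FlowQueue())
    if not q.pkts and q not in new_flows and q not in old_flows:
        new_flows.append(q)                  # sparse flow: priority pass
    q.pkts.append(pkt)

def dequeue():
    # Sparse flows are served first, then the backlogged round-robin;
    # each dequeued packet would pass through its queue's own CoDel.
    for flows, demote_to in ((new_flows, old_flows),
                             (old_flows, old_flows)):
        while flows:
            q = flows.popleft()
            if q.pkts:
                demote_to.append(q)          # back of the rotation
                return q.pkts.popleft()
    return None

The real fq_codel adds a DRR byte quantum on top of this, but the shape
is the same: one CoDel instance per isolated queue.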
> >> >>
> >> >>  - Jonathan Morton
> >> >>
> >> >> _______________________________________________
> >> >> Bloat mailing list
> >> >> Bloat at lists.bufferbloat.net
> >> >> https://lists.bufferbloat.net/listinfo/bloat
> >> >
> >> > _______________________________________________
> >> > Bloat mailing list
> >> > Bloat at lists.bufferbloat.net
> >> > https://lists.bufferbloat.net/listinfo/bloat
> >>
> >>
> >>
> >> --
> >>
> >> Dave Täht
> >> CTO, TekLibre, LLC
> >> http://www.teklibre.com
> >> Tel: 1-831-205-9740
>
>
>
> --
>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740
>