[Codel] [aqm] I-D Action: draft-ietf-aqm-codel-03.txt

Thu Jul 7 15:11:55 EDT 2016

On 7/7/16 1:34 PM, Michael Menth wrote:
> Dear Polina, dear all,
> 
> Am 07.07.2016 um 13:49 schrieb Goltsman, Polina:
>> Dear Andrew, Dear Wesley,
>>
>>
>> First of all, sorry for the long waiting time. I still haven't processed
>> "nits" but I have 2 major issues.
>>
>>
>> *Issue 1: *This issue is still related to the reentering dropping state.
>>
>>
>> As of the latest I-D, the "reentering dropping state"  appears in form
>> of one sentence in the text and a small code block in the pseudo-code.
>>
>>
>> The text is on page 13:
>>
>>> Additional logic prevents re-
>>> entering the dropping state too soon after exiting it and resumes the
>>> dropping state at a recent control level, if one exists.
>>
>> The comment to the pseudo-code on page 20:
>>
>>
>>            // If min went above target close to when it last went
>>            // below, assume that the drop rate that controlled the
>>            // queue on the last cycle is a good starting point to
>>            // control it now. ('drop_next' will be at most 'interval'
>>            // later than the time of the last drop so 'now - drop_next'
>>            // is a good approximation of the time from the last drop
>>            // until now.) *Implementations vary slightly here; this is // the Linux version, which
>> is more widely deployed and // tested.
>>
>> *
>>
>>
>> Or summarized: this issue is not addressed in text, which instead just
>> mentions that the issue exists, there are several versions of this logic
>> and here is one of them.
>>
>>
>> As it reads now, this "reentering" should be a tiny insignificant part
>> of the algorithm that (1) doesn't happen in "most" of the cases, and (2)
>> when it does happen, it will only "slightly" affect "performance" (with
>> proper definitions of "most", "slightly", and "performance").
>>
>>
>> Is it really so? 
>>
>>
>> Based on our evaluations, with pure CoDel (without FQ-CoDel),
>> "reentering" is actually a common case. I think Dave and Toke should
>> have more experimental results to answer this question. 

It is actually a common case. The initial run-up of the count variable
dominates in the early stages of a many-TCP onslaught, then later on it
falls into and out of the drop state portion in the latter part of the
code, being discussed here.

Flent has suport for sampling the delay, drops, and queue backlogs, and
we have data for 1-1000 flows, <1 to 100s ms of rtts through bottleneck
links ranging from 1mbit through a gbit.

We have not been sampling the value of the count variable in those
tests. Most of our testing has been limited to persistent loads of 20
minutes or less, and has tended to focus on bidirectional and web
traffic, using fq_codel derivatives, rather than pounding X number of
full rate flows through the system.

>>(I included Dave in CC)

I am on the aqm, bloat, and codel lists, but am traveling through the EU
(Well, england just now), through mid-august, so replies may be
infrequent and/or incomplete. I will be at ietf.

> 
> We also studied this issue about one or two years ago. The algorithm
> controlling the reentering of the dropping mode may have a significant
> impact on control variables, drop rates, drop pattern depending on the
> traffic. It can also impact utilization. Most interesting is probably
> Fig. 6 on page 7 in
> https://atlas.informatik.uni-tuebingen.de/~menth/papers/Menth16e.pdf

A couple quick, scattered notes:

re:

I have never been happy with the resume/decay portion of the algorithm,
and thank you for exploring some of the alternatives left behind on the
simulation bench.

I'd have liked the linux version of this bit plugged into the experiment.

Floating point is unavailable in the linux kernel, and while we could do
a fixed point fractional count, getting up to the number of colliding
flows needed to start needing that in fq_codel seemed not needed.

Adding the -ACT ideas to fq_codel would be an interesting test.
I think, from reading this paper, that we could easily recreate the
conditions of this experiment in flent, but I would always welcome
source code so as to be able to change variables like the kernel tcp
version from kernel 2.6.21 to 4.8, factor in BQL, look at RTTs between
10 and 100, bandwidths > 10mbit, and other underlying issues like that.

In the 64 flows case at 10mbit, with the sustained low delay of
codel-act, and pie, I am curious how often tcp flows are resetting or
having RTOs? I do not think that a fixed delay is desirable. "You aim
for a target", not that you need to hit it. More important is high
utilization, and trying to keep the connections afloat, with as minimum
latency as possible.

I would have liked to seen a staggered start attempted for this many
flows, reverse traffic, etc.

> Kind regards,
> 
> Michael
> 
> 
> 
>>
>>
>> If it is so, I still think that the "common case" should be at the very
>> least explained in text too.
>>
>>
>> *Issue 2:* The "initial drop spacing" is inconsistent
>>
>>
>> Regarding the "initial drop spacing" there are following lines in the
>> document.
>>
>>
>> between pages 11 and 12:
>>
>>> and so the initial drop spacing 
>>> SHOULD be set to the estimator's *interval plus twice the target*
>>> (i.e., initial drop spacing = 1.1 * interval)
>>
>> page 12:
>>
>>> the initial next
>>> drop spacing is intended to be long enough to give the endpoints time
>>> to react to the single drop so SHOULD be set to a value *of 1.1 times >the interval*
>>
>> page 16:
>>
>>
>>> and the initial drop
>>> spacing is also set to *interval*.
>>
>> and it is also set to*one interval* in pseudo-code, which I believe is
>> this line:
>>
>>
>>> drop_next_ = control_law(now, count_);
>>
>>
>> Note: I think there was an email on CoDel mailing list
>> (https://lists.bufferbloat.net/pipermail/codel/) about this issue.
>>
>>
>>
>> ------------------------------------------------------------------------
>> *From:* Andrew Mcgregor <andrewmcgr at google.com>
>> *Sent:* Friday, June 3, 2016 3:50 AM
>> *To:* Goltsman, Polina
>> *Cc:* Wesley Eddy; aqm at ietf.org; Jana Iyengar
>> *Subject:* Re: [aqm] I-D Action: draft-ietf-aqm-codel-03.txt
>>  
>> Thank you Polina.
>>
>> We have incorporated most of these (sometimes with slightly different
>> solutions than you suggested), and a new draft will be out very
>> shortly.  I felt that by far the most important was the different way
>> the Linux implementation addressed re-entry to the dropping state, and
>> the pseudocode now does exactly that.
>>
>> Andrew
>>
>> On 22 March 2016 at 08:10, Polina Goltsman
>> <polina.goltsman at student.kit.edu
>> <mailto:polina.goltsman at student.kit.edu>> wrote:
>>
>>     Dear Wesley, Dear All,
>>
>>     First of all our feedback regarding different "re-entering dropping
>>     state" in the document and in the Linux implementation
>>     (http://www.ietf.org/mail-archive/web/aqm/current/msg01686.html) was
>>     not addressed.
>>
>>     As FQ-CoDel  relies on CoDel, this issue is also (partly) relevant
>>     for the FQ-CoDel document. In the introduction FQ-CoDel references
>>     ns-3 and Linux implementations where the first one uses the
>>     re-entering logic from the CoDel document while the second from
>>     CoDel Linux implementation. The algorithm that has seen widespread
>>     testing according to Section 7 is (I suppose) the Linux version. Is
>>     this situation acceptable for an algorithm specification?
>>
>>     /[since this comment was supposed to be sent before the end of 2015,
>>     feel free to (silently) ignore it]/
>>     Second, unlike Rasool Al-Saadi (see
>>     http://www.ietf.org/mail-archive/web/aqm/current/msg01693.html) I do
>>     not like the document. Although I agree that the pseudocode is
>>     sufficient to create a working implementation, however, in my
>>     opinion, the rest of the document makes implementing CoDeL more
>>     confusing (at least without reading [CODEL2012] first). Is it normal
>>     for a RFC, which, as I assume, should primarily contain an algorithm
>>     specification to contain the algorithm specification ONLY in form of
>>     pseudocode?  
>>     These are two items that I found the most confusing:
>>     (1) section 3 introduces estimator, setpoint, and control loop which
>>     are not clearly distinguishable in pseudocode. It would be nice if
>>     section 4 explained how the three entities transition into the
>>     routines in pseudocode.
>>     (2) IMHO, there is some missing explanations. For example, the
>>     document never says how exactly "bad"/persistent queue is
>>     determined. The document says in Section 3.1 that the minimum is
>>     tracked, but it never says that persistent queue is when the minimum
>>     is above setpoint/target for at least an interval. As another
>>     example, Section 3.3 could say how the controller looks like exactly.
>>
>>     I also have some small comments and nits regarding the latest
>>     version (-03):
>>
>>     ---
>>
>>     in the whole document there are several places of using "we" to
>>     (supposedly) refer to the authors. According to Bob Briscoe's
>>     message
>>     (http://www.ietf.org/mail-archive/web/aqm/current/msg01805.html),
>>     "we" do not necessary refer to the authors in a rfc. Are these "we"s ok?
>>
>>     ---
>>
>>     abstract:
>>
>>     This document describes a general *framework* called CoDel 
>>
>>     IMHO the word "framework" is too overloaded and too broad. It might
>>     be better to call CoDel an AQM or something that has "queue" in it.
>>     Other ways (e.g., http://queue.acm.org/detail.cfm?id=2839461)
>>     probably also use it to control a queue.
>>
>>     ---
>>
>>     section 1:
>>
>>     (nit)
>>        Despite this
>>        awareness, the problem has only gotten worse as Moore's Law growth in
>>        memory density fueled an exponential increase in buffer pool size.
>>        Efforts to deploy AQM have been frustrated by difficult configuration
>>        and negative impact on network utilization.  *This problem,* recently
>>        christened "bufferbloat", [TSV2011] [BB2011] has become increasingly
>>        important throughout the Internet but particularly at the consumer
>>        edge.
>>
>>     it reads like "this problem" refers to "efforts to deploy AQM" and
>>     not to "increase in buffer pool size"
>>
>>     ---
>>
>>     (nit)
>>
>>        To understand
>>        queue management, it is critical to understand the difference between
>>        the necessary, useful *"good" queue*, and the counterproductive *"bad" queue*.
>>
>>     strictly speaking "good" and "bad" queue were not defined either
>>     before or in this sentence (it is explained in section 3 or in
>>     [CODEL2012])
>>
>>     ---
>>
>>     (nit)
>>
>>        o  treat "good queue" and "bad queue" differently, that is, keep
>>           delay low while permitting necessary bursts of traffic
>>
>>     (1) see above
>>     (2) I would say that the goal is "to keep delay low while ..." and
>>     to treat good/bad queue differently is a mechanism used to achieve
>>     the goal
>>
>>     ---
>>
>>     section 2:
>>
>>     (nit)
>>           In this document, *the characters ">>" preceding an indented
>>     line(s)*
>>        indicates a compliance requirement statement using the key words
>>        listed above.  This convention aids reviewers in quickly identifying
>>        or finding the explicit compliance requirements of this RFC.
>>
>>     I maybe missing something, but I can't find any use of "characters
>>     ">>"" anywhere except this line in any (txt,html, pdf) of the versions.
>>
>>     ---
>>
>>     section 3:
>>
>>     generally IMHO it would be nice to introduce /interval/ and /target/
>>     so it is more obvious that these are parameters (or some other kind
>>     of variables). Interval is more or less obvious (although imho
>>     should be more explicit). The target is first introduced at the end
>>     of section 3.2 as:
>>
>>        This results in a particularly simple form for the
>>        setpoint: the ideal range for the permitted standing queue is between
>>        5% and 10% of the TCP connection's RTT.  Thus *target* is simply 5% of
>>        the interval of section 3.1
>>     <https://tools.ietf.org/html/draft-ietf-aqm-codel-03#section-3.1>.
>>
>>     It was not previously defined that /target/ is a parameter with a
>>     meaning of /setpoint/. I don't think that this is clear for people
>>     who do not know what target is in advance.
>>
>>     [It seems that it was intended to introduce /interval/ and /target/
>>     parameters in section 4]
>>
>>     ---
>>
>>        Instead of averages
>>        *we recommend* tracking the minimum sojourn time, then, 
>>
>>     "we recommend" or "CoDeL does"? According to the last paragraph in
>>     Section 3.1, tracking minimum is a core part of CoDeL:
>>
>>        These two innovations, use of sojourn time as observed values *and the local minimum as the statistic to monitor queue congestion* are key to
>>        CoDel's estimator building block.
>>
>>     ---
>>
>>        Since the peak queue delay*is simply f r*, power is solely a function
>>
>>     is it f/r ?
>>
>>     ---
>>     (nit)
>>
>>        The only remaining building block needed for a basic AQM is a
>>        'control loop' algorithm to effectively drive the queueing system
>>        from any *'persistent queue above target'* state to a state where the
>>        *persistent queue is below target*.
>>
>>     It would be nice to explicitly state that the "persistent queue
>>     above target" is indication of "bad" queue/congestion somewhere
>>     above or in this line. I think that these terms are used in
>>     [codel2012] but were not used before in the document. See also the
>>     next item.
>>
>>     ---
>>     In section 3.3 there are three "states":
>>
>>       *
>>
>>         'persistent queue above target' state
>>
>>       *
>>
>>         'has persistent queue' state
>>
>>       *
>>
>>          dropping state
>>
>>     It would be nice to a) use the same one b) explain how CoDeL gets
>>     in/out of there
>>
>>     ---
>>
>>     (probably nit)
>>
>>        When
>>        *the minimum sojourn time first crosses the target* and *CoDel drops a packet*, the earliest the controller could see the effect of the drop
>>
>>     from current sentence it could be inferred that the two events occur
>>     at the same time which is not the case. Maybe " when persistent
>>     queue above target is detected" would be better.
>>
>>     ---
>>
>>        variation is less than the target, and so the *initial drop spacing SHOULD be set to the estimator's interval plus
>>     twice the target (i.e., initial drop spacing = 1.1 * interval)* to ensure that the
>>
>>     this contradicts the pseudocode:
>>
>>        dodequeue_result codel_queue_t::dodequeue(time_t now)
>>
>>     	...
>>     	first_above_time_ = now + interval_;
>>
>>     section 4:
>>
>>        To keep from
>>        making drops when it would starve the output link, CoDel makes
>>        another check before dropping to see if at least an MTU worth of
>>        bytes remains in the buffer. If not, the packet SHOULD NOT be
>>        dropped and, *currently*, CoDel exits the drop state. 
>>
>>     what is the meaning of "currently" ? in the current experimental
>>     version?
>>
>>     ---
>>     (nit)
>>
>>        In practice, this value is not known or measured *(*though see
>>        Section 6.2
>>     <https://tools.ietf.org/html/draft-ietf-aqm-codel-03#section-6.2> for an application where interval is measured.
>>
>>     missing closing bracket
>>
>>     Section 5 / pseudocode: (all nits)
>>
>>     The variables are introduced as first_time_above, dropping, but are
>>     used as first_time_above_, dropping_ (with underscores at end). 
>>
>>     ---
>>
>>     *Packet** CoDelQueue::dequeue()
>>
>>     a pointer to current packet was introduced as packet_t
>>
>>     *--- double* now = clock();;
>>
>>     time units were supposed to be of type time_t
>>
>>     *--- dodequeueResult* r = dodequeue(now);
>>
>>     compare with: typedef struct { ... } *dodequeue_result* below
>>     (present in two routines)
>>
>>     ---
>>
>>     dodequeueResult r = {*NULL, queue_t::dequeue() *};
>>
>>     shouldn't it be {queue_t::dequeue(), false} ?
>>
>>
>>     Best Regards,
>>     Polina
>>
>>
>>     * note the reply by Dave Taht:
>>     http://www.ietf.org/mail-archive/web/aqm/current/msg01687.html
>>
>>     On 03/20/2016 04:35 PM, Wesley Eddy wrote:
>>>     It looks like the WGLC feedback on the document body is
>>>     incorporated, so this is good.
>>>
>>>     Is there a reason to stay with Informational and not Experimental
>>>     like we've done with PIE an d FQ-CoDel?
>>>
>>>     Also, idnits has some problems with the references that should be
>>>     fixed (e.g. "NATAL2010" is probably supposed to be "NETAL2010"):
>>>     https://tools.ietf.org/idnits?url=https://tools.ietf.org/id/draft-ietf-aqm-codel-03.txt
>>
>>
>>     _______________________________________________
>>     aqm mailing list
>>     aqm at ietf.org <mailto:aqm at ietf.org>
>>     https://www.ietf.org/mailman/listinfo/aqm
>>
>>
>>
>>
>> -- 
>> Andrew McGregor | SRE | andrewmcgr at google.com
>> <mailto:andrewmcgr at google.com> | +61 4 1071 2221
>>
>>
>> _______________________________________________
>> aqm mailing list
>> aqm at ietf.org
>> https://www.ietf.org/mailman/listinfo/aqm
>>
>