* [Codel] Fwd: [aqm] Codel's count variable and re-entering dropping state at small time intervals
[not found] <55AD2695.8050605@kit.edu>
@ 2015-07-20 17:04 ` Dave Taht
2015-07-20 19:34 ` [Codel] " Jonathan Morton
[not found] ` <7A2801D5E40DD64A85E38DF22117852C70AE040F@wdc1exchmbxp05.hq.corp.viasat.com>
2 siblings, 0 replies; 6+ messages in thread
From: Dave Taht @ 2015-07-20 17:04 UTC (permalink / raw)
To: codel
---------- Forwarded message ----------
From: Roland Bless <roland.bless@kit.edu>
Date: Mon, Jul 20, 2015 at 6:49 PM
Subject: [aqm] Codel's count variable and re-entering dropping state
at small time intervals
To: aqm@ietf.org
Dear All,
we (Polina and I) have two questions concerning the behavior of the
`count` variable in Codel which can be summarized as:
1. after exiting dropping state, count is usually reset, unless "the
next drop state is entered too close to the previous one". This special
case is not explained in the text of the codel draft, but is present in
all implementations, and, currently, there are at least three different
versions of it (see below). We feel that implementers need more guidance
here...
2. is 'count' supposed to be reset or saturated on overflow and what
should be its maximum value (it makes a difference whether you are using
16-, 32-, or 64-bit counter)?
Regarding the first question:
Here are references to code that we looked to:
1) reference implementation in the draft:
https://tools.ietf.org/html/draft-ietf-aqm-codel-01#page-20
2) ns-2 code:
https://github.com/hbatmit/ns2.35/blob/master/queue/codel.cc#L144
3) linux code:
https://github.com/torvalds/linux/blob/110bc76729d448fdbcb5cdb63b83d9fd65ce5e26/include/net/codel.h
4) cake code: https://github.com/dtaht/sch_cake/blob/master/codel5.h
When codel encounters congestion it enters drop state, counts the number
the of packets that were dropped since congestion was encountered, and
drops packets in intervals of "interval" / sqrt (count). When there is
no more congestion, the drop state is exited and this counter is reset
unless "the next drop state is entered too close to the previous one".
This "too close to the previous one" is a part that is not well
documented and differs in implementations:
1) In the ns-2 code it is in lines 187-200. The most important part is
the comment that says that this control law is bad, but there is no
better one available.
2) In the reference code these lines are the last lines on page 20. It
has the same code (except // kmn decay tests line), but doesn't say that
this solution is temporary.
3) In Linux code the lines in question are 334-348 and 352; first it
uses 16 instead of 8, but this is consistent with comments, second it
sets the count to something else that wasn't comprehensible (It doesn't
look like the difference between previous count and count before that as
the names would suggest, but rather a some improvement of count - 2,
where 2 can be 3 or 1 or something ...)
4) For Cake the lines in question are 401-411, and in particular 404-405.
What is the "official" recommended version or is there any guidance on
how to select one?
Regarding the second question:
In the reference implementation of the draft, count is 32-bit integer,
but one cannot find any comments about the overflow behavior.
In Linux there is a comment to ignore overflow since there is no
division, and due to the sqrt^-1 approximation it won't break at 0.
According to these two emails on the Cake archive:
https://lists.bufferbloat.net/pipermail/cake/2015-June/000301.html,
https://lists.bufferbloat.net/pipermail/cake/2015-June/000302.html , it
seems that the original intention of count was to be reset by overflow
at 2^16, but according to experiments it is better to saturate it.
So, what is the desired overflow behavior for count: should it overflow
or saturate, and if the latter applies, at which maximal value?
Best Regards,
Roland and Polina (currently implementing stuff)
_______________________________________________
aqm mailing list
aqm@ietf.org
https://www.ietf.org/mailman/listinfo/aqm
--
Dave Täht
worldwide bufferbloat report:
http://www.dslreports.com/speedtest/results/bufferbloat
And:
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Codel] [aqm] Codel's count variable and re-entering dropping state at small time intervals
[not found] <55AD2695.8050605@kit.edu>
2015-07-20 17:04 ` [Codel] Fwd: [aqm] Codel's count variable and re-entering dropping state at small time intervals Dave Taht
@ 2015-07-20 19:34 ` Jonathan Morton
[not found] ` <55CDE8E3.4080207@student.kit.edu>
[not found] ` <7A2801D5E40DD64A85E38DF22117852C70AE040F@wdc1exchmbxp05.hq.corp.viasat.com>
2 siblings, 1 reply; 6+ messages in thread
From: Jonathan Morton @ 2015-07-20 19:34 UTC (permalink / raw)
To: Roland Bless, Anil.Agarwal; +Cc: codel, aqm
I can’t speak for Kathy, but as Cake’s author I have some insight into the changes I’ve made to the associated version of Codel.
> Ronald Bless:
> 1. after exiting dropping state, count is usually reset, unless "the
> next drop state is entered too close to the previous one”.
In fact, in most cases (ie. where the true RTT is near to or less than the one assumed by the interval parameter, and the flow is persistent) the drop state will be re-entered sufficiently soon to prevent the reset. The backoff behaviour for that case is thus more significant in practice.
The standard backoff behaviour is to reduce count by just 2. This is obviously wrong in the case where count has already reached a significant value; with a high drop frequency, count will climb a lot during a single RTT. But if the elevated drop frequency is necessary and sufficient to control the flow, we don’t want it to grow indefinitely.
Cake’s version halves the count, reducing the drop frequency by a factor of sqrt(2) after a pause. This multiplicative backoff prevents count from growing indefinitely, and allows it to reduce gradually to a lower value if that becomes appropriate due to reduced network load.
Anil's suggestion of continuously reducing count during the non-dropping state is an interesting alternative which might also make sense. It has the pleasing property of relating the backoff rate to the length of time spent outside the dropping state.
> 2. is 'count' supposed to be reset or saturated on overflow and what
> should be its maximum value (it makes a difference whether you are using
> 16-, 32-, or 64-bit counter)?
It’s clear to me that a saturating increment is necessary to deal with unresponsive flows adequately. Otherwise, the drop frequency will wrap around to a low value periodically when it needs to be consistently high.
My test case for this involves 50 upstream TCP flows on a narrow link (say 10Mbit) and a single downstream TCP flow on a wide link (say 100Mbit). In this configuration, the acks for the downstream flow constitute an unresponsive (in fact, anti-responsive) upstream flow of greater throughput demand than the individual upstream TCP flows. Anti-responsive means that dropping packets actually increases the network load, because reducing the ack density permits higher throughput on the downlink, which in turn generates additional acks on the uplink.
Since Cake uses very robust flow isolation, simply delivering all the acks in order, using a fair-share of the bandwidth, would severely limit the throughput of the downstream flow. However, the standard Codel behaviour resulted in wildly oscillating throughput on the downstream flow. Straightforward calculations suggested that the count variable was probably wrapping.
At first, I only put in the modified backoff behaviour mentioned above. Since the downstream flow was often reaching full throughput, I thought that occasionally the drop rate was managing to empty the ack queue, making the backoff behaviour significant. This assumption turned out to be correct, but the backoff modification was insufficient to completely correct the behaviour. Only when I changed count to use a saturating increment instead of a wrapping one did the downstream flow achieve full throughput consistently.
It’s possible that if count was a wider variable (32 instead of 16 bits) that the backoff modification would have been sufficient to prevent wrapping in this case. However, using saturating arithmetic is inexpensive on modern CPUs (many have dedicated instructions which could be used by highly optimised implementations) and is the most robust solution.
Note that in the absence of robust flow isolation, eg. with plain Codel or PIE, the upstream TCP flows will back off to make room for all the acks. This does not require a particularly high drop rate, but it reduces the upstream goodput significantly. Cake instead achieves high goodput in both directions in this especially challenging case.
> Anil Agarwal:
> I have attached the updated document.
There are indeed a lot of suggestions here, which I haven’t yet fully assimilated. I’ll cherry-pick a few that stood out:
> When RTT is small compared to the default Interval, CoDel is slow in reacting and controlling queue size/delay during TCP slow start.
> For example, if Interval = 100 ms and RTT = 20 ms, CoDel will not drop or mark a packet for the first 100 ms of a new TCP connection; if link rate = 10 Mbps, initial window size = 14600 bytes, then during this period, the TCP cwnd will grow to 104 kbytes; queuing delay will be 63 ms.
Cake’s modified Codel does vary its sensitivity according to how quickly the queue is filling. With the default parameters, it may trigger as quickly as 35ms after a large burst arrives, or as slowly as 200ms if the queue is growing very slowly.
It could be improved further in theory, by giving Codel visibility of the length and drain rate of the queue (and thus the ability to predict future delay) rather than only observing the exit delay. However, in the general case the drain rate is variable, so this may be difficult to make robust.
Ultimately, triggering instantly when the queue begins to fill is optimal to deal with slow-start, which will double its cwnd further while the congestion signal is in transit, before responding with a halving. However, an instant trigger is suboptimal for the congestion-avoidance phase of TCP, which is why Codel always delays its response. Flow isolation is a good (if imperfect) countermeasure for the suboptimal slow-start behaviour this causes.
My “Explicit Load Regulation” proposal, which hasn’t yet been widely circulated, aims to get closer to optimal behaviour in both situations. It is an extension to ECN behaviour, giving the option of more moderate signals than ECN permits, while being easier to deploy incrementally than DCTCP is. Hopefully I’ll be able to write it up properly later.
> The draft rfc states - “For example, there is about an 86% probability that no more than two of the 100 VoIP sessions will be involved in any given collision”
> Based on analytical equations for hash collision probabilities (I derived them independently some time ago), the probability of no collision = 90.78%…
Cake’s set-associative hash function reduces the probability of flow collisions further, to negligible values for the above number of flows.
> Alternatively, codel_dequeue() can be re-structured as two routines…
Certainly a refactoring would be beneficial for readability. I haven’t got around to doing it yet.
- Jonathan Morton
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Codel] [aqm] Codel's count variable and re-entering dropping state at small time intervals
[not found] ` <55CDE8E3.4080207@student.kit.edu>
@ 2015-08-14 14:52 ` Jonathan Morton
[not found] ` <55CE085F.5080700@student.kit.edu>
2015-08-14 16:11 ` Jonathan Morton
1 sibling, 1 reply; 6+ messages in thread
From: Jonathan Morton @ 2015-08-14 14:52 UTC (permalink / raw)
To: Polina Goltsman; +Cc: Roland Bless, codel, aqm
> 1. in Cake, count saturates at 2^32-1, am I right (https://github.com/dtaht/sch_cake/blob/master/codel5.h#L352).
>
> could it make sense to saturate count when interval/sqrt(count) and interval/sqrt(count+1) are indistinguishable in timer resolution?
Actually it saturates at 2^16-1, since count is declared as u16 (though it could reasonably be a wider type, too).
At that point, there are about 2560 drops (or marks) per second; if it did go up to 2^32-1, that would be 655360 drops/marks per second. Linux timers on modern hardware generally have microsecond resolution or better, so there is no reason to saturate early even in the latter case.
> 2. I found three more changes from the original Codel:
>
> 2.1 Line292: if (sojourn_time < target || queue is empty) instead of (sojourn_time < target || there is less than MTU bytes in the queue).
>
> was it changed because queue_length_in_bytes < MTU causes sojourn_time < target ?
We found no reason to special-case the last packet in the queue, so we simplified the code by removing it. Detecting MTU (via the “maxpacket” statistic) was rather fragile in the face of GRO and GSO, with some common hardware regularly producing 64K aggregates.
Instead of special-casing the last packet, Cake constrains the target to 1.5 hardware MTUs (detected via the interface API, not via maxpacket) at the shaped rate. This has more nearly the desired effect.
> 2.2. Line 304:
>
> else if (vars->count > 1 && now - vars->drop_next < 8 * interval) {
> /* we were recently dropping; be more aggressive */
> return now - vars->first_above_time >
> codel_control_law(now, interval, vars->rec_inv_sqrt);
> }
>
> 2.3. Line 308:
>
> else if (((now - vars->first_above_time) >> 15) * ((now - codel_get_enqueue_time(skb)) >> 15) > threshold) { return true; }.
> This line is explained in this email from Cake mailing list: https://lists.bufferbloat.net/pipermail/cake/2015-May/000227.html
>
>
> My question is - are these changes valid for a standalone Codel or are they specifically made to optimize Cake's scheduler?
I believe they could be. The intent is to bias interval according to how quickly the queue is growing, so as to react more aggressively to slow-start’s rapid growth than to congestion-avoidance slow growth. It might be worth testing in a standalone context.
- Jonathan Morton
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Codel] [aqm] Codel's count variable and re-entering dropping state at small time intervals
[not found] ` <55CE085F.5080700@student.kit.edu>
@ 2015-08-14 15:59 ` Jonathan Morton
0 siblings, 0 replies; 6+ messages in thread
From: Jonathan Morton @ 2015-08-14 15:59 UTC (permalink / raw)
To: Polina Goltsman; +Cc: Roland Bless, codel, aqm
> On 14 Aug, 2015, at 18:25, Polina Goltsman <polina.goltsman@student.kit.edu> wrote:
>
> What is the default value of "threshold" for default interval and target?
From sch_cake.c 694-696:
fqcd->cparams.target = max(byte_target_ns, ns_target);
fqcd->cparams.interval = max(MS2TIME(100) + fqcd->cparams.target - ns_target, fqcd->cparams.target * 8);
fqcd->cparams.threshold = (fqcd->cparams.target >> 15) * (fqcd->cparams.interval >> 15) * 2;
Given default interval is 100ms (100,000,000 ns) and target is 5ms (5,000,000 ns), threshold will be about 931,322. This is compared to the scaled product of sojourn time and the time since the target was first exceeded, so its units are not time but time-squared.
The calculation is designed to place the trigger when the queue is growing as slowly as possible (ie. resting just above the target sojourn time) at twice the nominal interval. This also places the fastest possible trigger at about a third of the nominal interval.
- Jonathan Morton
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Codel] [aqm] Codel's count variable and re-entering dropping state at small time intervals
[not found] ` <55CDE8E3.4080207@student.kit.edu>
2015-08-14 14:52 ` Jonathan Morton
@ 2015-08-14 16:11 ` Jonathan Morton
1 sibling, 0 replies; 6+ messages in thread
From: Jonathan Morton @ 2015-08-14 16:11 UTC (permalink / raw)
To: Polina Goltsman; +Cc: Roland Bless, codel, aqm
> On 14 Aug, 2015, at 16:10, Polina Goltsman <polina.goltsman@student.kit.edu> wrote:
>
> My question is - are these changes valid for a standalone Codel or are they specifically made to optimize Cake's scheduler?
Incidentally, if you want to test this version of Codel standalone, you can do so within Cake, by turning off all of the other fancy features:
tc qdisc change dev eth0 handle 1: cake unlimited raw besteffort flowblind
This leaves you with no flow-isolation, no Diffserv support, no shaping and no overhead compensation - just Codel and GSO peeling.
- Jonathan Morton
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Codel] [aqm] Codel's count variable and re-entering dropping state at small time intervals
[not found] ` <7A2801D5E40DD64A85E38DF22117852C8813002D@wdc1exchmbxp05.hq.corp.viasat.com>
@ 2015-10-14 19:05 ` Dave Taht
0 siblings, 0 replies; 6+ messages in thread
From: Dave Taht @ 2015-10-14 19:05 UTC (permalink / raw)
To: Agarwal, Anil; +Cc: Toke Høiland-Jørgensen, codel
this is not a day to discuss codel. it is a day to fix the internet.
http://www.businesswire.com/news/home/20151014005564/en/Global-Internet-Experts-Reveal-Plan-Secure-Reliable
Please expect any codel related discussion to go by the wayside for a while.
On Wed, Oct 14, 2015 at 9:01 PM, Agarwal, Anil <Anil.Agarwal@viasat.com> wrote:
> Toke,
>
> Here is the info you requested.
> Please let me know if you find any errors.
>
> For the example in the draft rfc,
> based on analytical equations for hash collision probabilities,
> the probability of no collision = 90.78%,
> probability that no more than two of the 100 VoIP sessions will be involved in any given collision = 99.57%,
> probability that no more than three of the 100 VoIP sessions will be involved in any given collision = 99.99%.
>
> With a 4-way set associative hash table of equivalent size,
> the probability of no collision = ~100%,
>
> With an 8-way set associative hash table of equivalent size,
> the probability of no collision = 99.93%,
>
> Regards,
> Anil
>
> -----Original Message-----
> From: Toke Høiland-Jørgensen [mailto:toke@toke.dk]
> Sent: Wednesday, October 14, 2015 1:58 PM
> To: Agarwal, Anil
> Subject: RE: [aqm] Codel's count variable and re-entering dropping state at small time intervals
>
> Yes, my understanding is #2 (Paul wrote the section originally, will check with him). Wasn't planning to mention set-associative hashes in this draft, but we do use it in cake (the "successor"), so would love to see the analysis!
>
> Incidentally, are you aware of any other fairness queuing systems that use set-associative hashes?
>
> -Toke
>
> On 14 October 2015 19:23:51 CEST, "Agarwal, Anil" <Anil.Agarwal@viasat.com> wrote:
>>Toke,
>>
>>Also, do you plan to add some notes on the use of set-associative hash
>>tables?
>>My analysis also shows collision probabilities for N-way
>>set-associative hash tables.
>>
>>Anil
>>
>>From: Agarwal, Anil
>>Sent: Wednesday, October 14, 2015 1:13 PM
>>To: 'Toke Høiland-Jørgensen'
>>Subject: RE: [aqm] Codel's count variable and re-entering dropping
>>state at small time intervals
>>
>>
>>Toke,
>>
>>
>>
>>I want to make sure I precisely understand the following statement in
>>the draft rfc -
>>
>>
>>
>>There is about an 86% probability that no more than two of the 100 VoIP
>>
>>sessions will be involved in any given collision.
>>
>>
>>
>>Is it saying that -
>>
>>1. There is an 86% probability that 98 sessions will be
>>collision-free and only 2 sessions will collide.
>>
>>I think not.
>>
>>or
>>
>>2. There is an 86% probability that among the sessions that
>>collide, the number of sessions that collide together at a common bin
>>value will be 2.
>>
>>There is a 14% chance that 3 or more sessions will collide into the
>>same bin (and hence share the same queue).
>>
>>or
>>
>>3. Something else
>>
>>
>>
>>I assume it is #2.
>>
>>Please confirm and then I will send you the analysis.
>>
>>
>>
>>Thanks,
>>
>>Anil
>>
>>
>>
>>
>>
>>-----Original Message-----
>>From: Toke Høiland-Jørgensen [mailto:toke@toke.dk]
>>Sent: Wednesday, October 14, 2015 9:13 AM
>>To: Agarwal, Anil
>>Subject: Re: [aqm] Codel's count variable and re-entering dropping
>>state at small time intervals
>>
>>
>>
>>"Agarwal, Anil"
>><Anil.Agarwal@viasat.com<mailto:Anil.Agarwal@viasat.com>> writes:
>>
>>
>>
>>> I will clean up the equations and send you the doc in a few hours.
>>
>>
>>
>>Cool, thanks!
>>
>>
>>
>>> Are in the process of revising draft-ietf-aqm-fq-codel-01? It's
>>
>>> expiration date is shown as January 5, 2016. But I suppose it is
>>
>>> useful to post the next update earlier than that.
>>
>>
>>
>>Yeah, figured I'd do another revision before the next meeting, even if
>>the AQM group is not scheduled. Current draft here:
>>
>>https://urldefense.proofpoint.com/v2/url?u=https-3A__kau.toke.dk_ietf_d
>>raft-2Dietf-2Daqm-2Dfq-2Dcodel-2D02.html&d=BQIBAg&c=jcv3orpCsv7C4ly8-ub
>>Dob57ycZ4jvhoYZNDBA06fPk&r=FyvaklKYrHaSCPjbBTdviWIW9uSbnxdNSheSGz1Jvq4&
>>m=2GbxJwOxywTH1qXvOYaEDXpi8xnj4PMSw3vn7aDEszU&s=Sj61jStdaiIE503n8_vQ9Bf
>>k-F9uk_hdcDi9x47m66U&e=
>>
>>
>>
>>-Toke
--
Dave Täht
Do you want faster, better, wifi? https://www.patreon.com/dtaht
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-10-14 19:05 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <55AD2695.8050605@kit.edu>
2015-07-20 17:04 ` [Codel] Fwd: [aqm] Codel's count variable and re-entering dropping state at small time intervals Dave Taht
2015-07-20 19:34 ` [Codel] " Jonathan Morton
[not found] ` <55CDE8E3.4080207@student.kit.edu>
2015-08-14 14:52 ` Jonathan Morton
[not found] ` <55CE085F.5080700@student.kit.edu>
2015-08-14 15:59 ` Jonathan Morton
2015-08-14 16:11 ` Jonathan Morton
[not found] ` <7A2801D5E40DD64A85E38DF22117852C70AE040F@wdc1exchmbxp05.hq.corp.viasat.com>
[not found] ` <87a8rlvmxy.fsf@toke.dk>
[not found] ` <7A2801D5E40DD64A85E38DF22117852C8812F7E5@wdc1exchmbxp05.hq.corp.viasat.com>
[not found] ` <87io69twyi.fsf@toke.dk>
[not found] ` <7A2801D5E40DD64A85E38DF22117852C8812FD9E@wdc1exchmbxp05.hq.corp.viasat.com>
[not found] ` <703269EE-2B09-4F2B-9462-6CB644371875@toke.dk>
[not found] ` <7A2801D5E40DD64A85E38DF22117852C8813002D@wdc1exchmbxp05.hq.corp.viasat.com>
2015-10-14 19:05 ` Dave Taht
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox