* [Codel] fq_codel: revenge of the standing queue
@ 2012-09-04 18:35 Dave Taht
2012-09-04 18:50 ` Eric Dumazet
2012-09-04 19:02 ` Eric Dumazet
0 siblings, 2 replies; 7+ messages in thread
From: Dave Taht @ 2012-09-04 18:35 UTC (permalink / raw)
To: codel
As presently implemented, multi-queue fq_codel has a few FIFO codel
assumptions baked into it.
The first, already fixed, was that codel's state was completely reset
when a fq_codel queue was emptied. This has been fixed via a patch to
mainline.
However there are other places where codel's state is reset where
(perhaps!) it shouldn't. I note that I am NOT at this time
recommending these
changes to the codebase, the current behavior "does little harm", and
eventually gets the standing queue down to a reasonable, if not ideal,
minimum. I think it has the potential to run away at seriously high
numbers of flows, but haven't tested that.
I've tried various combinations of fixes below and run into multiple
issues. I think some of the stability/fairness problems I've had might
be an interaction with TSQ, and need to revise the testbed to be a
pure routing environment, and that is going to take time that I don't
have right now.
I hope that in pointing out these issue others will fiddle, and I do
have various patches that I can make available, that try different
stuff.
My basic test is running 75 TCP_MAERTS streams and 75 TCP_STREAM
streams at 10 or 100Mbit, for 60 seconds or more, with which I end up
with a standing queue of 80-150 packets (depending on codel version
and TSQ) and consequent large RTT. I am mostly using the
nearly-current-ns2 model (ported to linux) of codel.h. These results
hold as a function of the number of streams.
1) An empty queue in fq_codel has no meaningful information.
in: codel_should_drop
if (!skb) {
vars->first_above_time = 0;
return false;
}
I believe the codel intent here was to reset codel's state when the
single FIFO queue was emptied. In fq_codel's case, an empty queue
contains no information about the state, really, and a
if (!skb) {
if(sch->qstats.backlog <= mtu)
vars->first_above_time = 0;
return false;
}
comes closer to the intent. That said, the null skb then bleeds into
the rest of the algorithm
in codel_dequeue
if (!skb) {
vars->dropping = false;
return skb;
}
later on there's
if (vars->dropping) {
if (!drop) {
/* sojourn time below target - leave dropping state */
vars->dropping = false;
and the same assumption within the while loop and in the else if(drop)...
I'm pretty sure that leaving of dropping state just because this
fq_codel queue is (temporarily) empty is not the right thing, and that
the main reason for exiting the dropping state should be getting under
the target delay. It might make sense to reschedule the next drop on a
null skb, perhaps after reducing count...
While I've fiddled with these ideas, and got some drainage, I do get
fairly big oscillations in queue depth, and starvation of some flows,
in various versions of my explorations. Which led me to looking at
quantums and...
2) New flows
The next issue (at these 10 and 100Mbit rates) is the "new flow" idea
in fq_codel. It is VERY useful pushing sparse flows to the fore of the
queues, and also provided a boost to long RTT flows. However at high
levels of occupancy and low bandwidth, flows empty their queue
rapidly, become "new flows", and then mislead codel for that flow into
resetting itself again, since we end up with a short delay for the
re-entrance. This isn't a particularly horrible behavior, but as my
own hope for this feature was to make voip better primarily, and
everything else we got from it was a bonus.
TSQ really made this oddity show up big. I think on routed traffic it
will be less of an issue.
Anyway, instead of depending on "empty queue" as a sign of a new flow,
I have fiddled with adding the consideration
if (list_empty(&flow->flowchain)) {
if(codel_time_after(now - flow->cvars.last_delivery,
q->cparams.target)) {
list_add_tail(&flow->flowchain, &q->new_flows);
q->new_flow_count++;
flow->deficit = q->quantum;
} else {
list_add_tail(&flow->flowchain, &q->old_flows);
}
}
which seems to do more of what I'd wanted in the first place. (last
delivery is set on every delivery from the flow)
...
3) I do kind of like amortizing the slope of the delay over an
interval (are we winning? Losing? By how much?) and trying
to inform decisions that way...
--
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
with fq_codel!"
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Codel] fq_codel: revenge of the standing queue
2012-09-04 18:35 [Codel] fq_codel: revenge of the standing queue Dave Taht
@ 2012-09-04 18:50 ` Eric Dumazet
2012-09-04 20:02 ` Kathleen Nichols
2012-09-04 19:02 ` Eric Dumazet
1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2012-09-04 18:50 UTC (permalink / raw)
To: Dave Taht; +Cc: codel
On Tue, 2012-09-04 at 11:35 -0700, Dave Taht wrote:
> 2) New flows
>
> The next issue (at these 10 and 100Mbit rates) is the "new flow" idea
> in fq_codel. It is VERY useful pushing sparse flows to the fore of the
> queues, and also provided a boost to long RTT flows. However at high
> levels of occupancy and low bandwidth, flows empty their queue
> rapidly, become "new flows", and then mislead codel for that flow into
> resetting itself again, since we end up with a short delay for the
> re-entrance. This isn't a particularly horrible behavior, but as my
> own hope for this feature was to make voip better primarily, and
> everything else we got from it was a bonus.
>
> TSQ really made this oddity show up big. I think on routed traffic it
> will be less of an issue.
This is simply not true, and based on misunderstanding on what does
fq_codel.
A flow never leaves the new flow list before doing a full RR cycle in
old flow list (even without any packet in this flow)
_IF_ we did a full RR cycle, then we accumulated a full quantum of
credit, and lost our rank the 'fq_codel RR queue' (the combination of
the 2 internal queues, new and old)
Therefore, we allow the flow to be queued in 'new flow', to recover a
bit of the lost rank in queue.
Basically you dont interpret correctly the 'new_flow_count' counter.
It means almost nothing at all, since for a given TCP flow, we _can_
increment this counter several time.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Codel] fq_codel: revenge of the standing queue
2012-09-04 18:50 ` Eric Dumazet
@ 2012-09-04 20:02 ` Kathleen Nichols
2012-09-04 20:22 ` Dave Taht
0 siblings, 1 reply; 7+ messages in thread
From: Kathleen Nichols @ 2012-09-04 20:02 UTC (permalink / raw)
To: codel
I can't keep up with the pace here, but I'm compelled to express puzzlement
at the notion that fq_codel gives *longer* standing queues than codel. The
opposite happens with sfq_codel which is, I believe, doing the same basic
things as fq_codel (a similar approach to that below for queues/bins that go
empty) except for being packet scheduled rather than byte scheduled. I
see this wonderful, well-controlled behavior without "reverse traffic"
issues.
This is why I think Eric's work on this rocks.
imho,
Kathie
On 9/4/12 11:50 AM, Eric Dumazet wrote:
> On Tue, 2012-09-04 at 11:35 -0700, Dave Taht wrote:
>
>> 2) New flows
>>
>> The next issue (at these 10 and 100Mbit rates) is the "new flow" idea
>> in fq_codel. It is VERY useful pushing sparse flows to the fore of the
>> queues, and also provided a boost to long RTT flows. However at high
>> levels of occupancy and low bandwidth, flows empty their queue
>> rapidly, become "new flows", and then mislead codel for that flow into
>> resetting itself again, since we end up with a short delay for the
>> re-entrance. This isn't a particularly horrible behavior, but as my
>> own hope for this feature was to make voip better primarily, and
>> everything else we got from it was a bonus.
>>
>> TSQ really made this oddity show up big. I think on routed traffic it
>> will be less of an issue.
>
> This is simply not true, and based on misunderstanding on what does
> fq_codel.
>
> A flow never leaves the new flow list before doing a full RR cycle in
> old flow list (even without any packet in this flow)
>
> _IF_ we did a full RR cycle, then we accumulated a full quantum of
> credit, and lost our rank the 'fq_codel RR queue' (the combination of
> the 2 internal queues, new and old)
>
> Therefore, we allow the flow to be queued in 'new flow', to recover a
> bit of the lost rank in queue.
>
> Basically you dont interpret correctly the 'new_flow_count' counter.
>
> It means almost nothing at all, since for a given TCP flow, we _can_
> increment this counter several time.
>
>
>
> _______________________________________________
> Codel mailing list
> Codel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/codel
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Codel] fq_codel: revenge of the standing queue
2012-09-04 20:02 ` Kathleen Nichols
@ 2012-09-04 20:22 ` Dave Taht
0 siblings, 0 replies; 7+ messages in thread
From: Dave Taht @ 2012-09-04 20:22 UTC (permalink / raw)
To: Kathleen Nichols; +Cc: codel
On Tue, Sep 4, 2012 at 1:02 PM, Kathleen Nichols <nichols@pollere.com> wrote:
>
>
> I can't keep up with the pace here, but I'm compelled to express puzzlement
> at the notion that fq_codel gives *longer* standing queues than codel.
I'm not saying that. fq_codel < codel < pfifo_fast, certainly and we
don't run into needing
floating point on count, and it's totally great.
I'm saying that at high numbers of flows we end up with a standing
queue in fq_codel.
That's it.
It would be cool to find a way to drain that swamp more, or understand
better what I see, and that was the basic thrust of that long email
(that I had been
sitting on a week), where I tried to point at what I felt were flawed
assumptions
regarding temporarily empty queues.
Eric views fq_codel as a "codel multiplexor", and I was trying to describe,
obviously unsuccessfully, possible ways to do a merger of fq PLUS codel that
might work better.
>The
> opposite happens with sfq_codel which is, I believe, doing the same basic
> things as fq_codel (a similar approach to that below for queues/bins that go
> empty) except for being packet scheduled rather than byte scheduled. I
> see this wonderful, well-controlled behavior without "reverse traffic"
> issues.
If you run your sfqcodel sim at 10 and 100mbit with 150 bidirectional
flows, what do you get?
300?
1000?
> This is why I think Eric's work on this rocks.
I think it rocks too.
Most of my own battle of late has been with uTP.
>
> imho,
> Kathie
I'm going to crawl back under my rock now.
>
>
> On 9/4/12 11:50 AM, Eric Dumazet wrote:
>> On Tue, 2012-09-04 at 11:35 -0700, Dave Taht wrote:
>>
>>> 2) New flows
>>>
>>> The next issue (at these 10 and 100Mbit rates) is the "new flow" idea
>>> in fq_codel. It is VERY useful pushing sparse flows to the fore of the
>>> queues, and also provided a boost to long RTT flows. However at high
>>> levels of occupancy and low bandwidth, flows empty their queue
>>> rapidly, become "new flows", and then mislead codel for that flow into
>>> resetting itself again, since we end up with a short delay for the
>>> re-entrance. This isn't a particularly horrible behavior, but as my
>>> own hope for this feature was to make voip better primarily, and
>>> everything else we got from it was a bonus.
>>>
>>> TSQ really made this oddity show up big. I think on routed traffic it
>>> will be less of an issue.
>>
>> This is simply not true, and based on misunderstanding on what does
>> fq_codel.
>>
>> A flow never leaves the new flow list before doing a full RR cycle in
>> old flow list (even without any packet in this flow)
>>
>> _IF_ we did a full RR cycle, then we accumulated a full quantum of
>> credit, and lost our rank the 'fq_codel RR queue' (the combination of
>> the 2 internal queues, new and old)
>>
>> Therefore, we allow the flow to be queued in 'new flow', to recover a
>> bit of the lost rank in queue.
>>
>> Basically you dont interpret correctly the 'new_flow_count' counter.
>>
>> It means almost nothing at all, since for a given TCP flow, we _can_
>> increment this counter several time.
>>
>>
>>
>> _______________________________________________
>> Codel mailing list
>> Codel@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/codel
>>
>
> _______________________________________________
> Codel mailing list
> Codel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/codel
--
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
with fq_codel!"
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Codel] fq_codel: revenge of the standing queue
2012-09-04 18:35 [Codel] fq_codel: revenge of the standing queue Dave Taht
2012-09-04 18:50 ` Eric Dumazet
@ 2012-09-04 19:02 ` Eric Dumazet
2012-09-04 19:51 ` Dave Taht
1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2012-09-04 19:02 UTC (permalink / raw)
To: Dave Taht; +Cc: codel
On Tue, 2012-09-04 at 11:35 -0700, Dave Taht wrote:
> 1) An empty queue in fq_codel has no meaningful information.
>
> in: codel_should_drop
>
> if (!skb) {
> vars->first_above_time = 0;
> return false;
> }
> I believe the codel intent here was to reset codel's state when the
> single FIFO queue was emptied. In fq_codel's case, an empty queue
> contains no information about the state, really, and a
>
> if (!skb) {
> if(sch->qstats.backlog <= mtu)
> vars->first_above_time = 0;
> return false;
> }
>
> comes closer to the intent. That said, the null skb then bleeds into
> the rest of the algorithm
>
> in codel_dequeue
>
>
> if (!skb) {
> vars->dropping = false;
> return skb;
> }
>
> later on there's
>
> if (vars->dropping) {
> if (!drop) {
> /* sojourn time below target - leave dropping state */
> vars->dropping = false;
>
> and the same assumption within the while loop and in the else if(drop)...
>
> I'm pretty sure that leaving of dropping state just because this
> fq_codel queue is (temporarily) empty is not the right thing, and that
> the main reason for exiting the dropping state should be getting under
> the target delay. It might make sense to reschedule the next drop on a
> null skb, perhaps after reducing count...
>
> While I've fiddled with these ideas, and got some drainage, I do get
> fairly big oscillations in queue depth, and starvation of some flows,
> in various versions of my explorations. Which led me to looking at
> quantums and...
I have no idea of what you try to say.
Each flow has its own cvars :
skb = codel_dequeue(sch, &q->cparams, &flow->cvars, &q->cstats,
dequeue);
So each flow has its own codel unit.
Try to view fq_codel as a multiplexor, then a Codel unit for each flow.
So if you believe there is a bug in Codel, try to describe the big in
Codel, not in fq_codel, because fq_codel is _not_ a codel variant.
The only thing that fq_codel codel units share are the parameters and
stats.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Codel] fq_codel: revenge of the standing queue
2012-09-04 19:02 ` Eric Dumazet
@ 2012-09-04 19:51 ` Dave Taht
2012-09-04 20:10 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Dave Taht @ 2012-09-04 19:51 UTC (permalink / raw)
To: Eric Dumazet; +Cc: codel
On Tue, Sep 4, 2012 at 12:02 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2012-09-04 at 11:35 -0700, Dave Taht wrote:
>
>> 1) An empty queue in fq_codel has no meaningful information.
>>
>> in: codel_should_drop
>>
>> if (!skb) {
>> vars->first_above_time = 0;
>> return false;
>> }
>
>
>
>> I believe the codel intent here was to reset codel's state when the
>> single FIFO queue was emptied. In fq_codel's case, an empty queue
>> contains no information about the state, really, and a
>>
>> if (!skb) {
>> if(sch->qstats.backlog <= mtu)
>> vars->first_above_time = 0;
>> return false;
>> }
>>
>> comes closer to the intent. That said, the null skb then bleeds into
>> the rest of the algorithm
>>
>> in codel_dequeue
>>
>>
>> if (!skb) {
>> vars->dropping = false;
>> return skb;
>> }
>>
>> later on there's
>>
>> if (vars->dropping) {
>> if (!drop) {
>> /* sojourn time below target - leave dropping state */
>> vars->dropping = false;
>>
>> and the same assumption within the while loop and in the else if(drop)...
>>
>> I'm pretty sure that leaving of dropping state just because this
>> fq_codel queue is (temporarily) empty is not the right thing, and that
>> the main reason for exiting the dropping state should be getting under
>> the target delay. It might make sense to reschedule the next drop on a
>> null skb, perhaps after reducing count...
>>
>> While I've fiddled with these ideas, and got some drainage, I do get
>> fairly big oscillations in queue depth, and starvation of some flows,
>> in various versions of my explorations. Which led me to looking at
>> quantums and...
>
> I have no idea of what you try to say.
>
> Each flow has its own cvars :
>
> skb = codel_dequeue(sch, &q->cparams, &flow->cvars, &q->cstats,
> dequeue);
- and we exit dropping state when that queue empties, when globally,
across all the fq_codel queues, we still need to be dropping in order
to get to the target.
And that first_above time for that cvars for that fq_codel queue is
reset to 0 for that fq_codel queue when it empties, forcing a recalc
of the right interval (sojourn) for re-entering dropping state, with a
"hands off" interval...
It's seems reasonably ok for a fq_codel queue to go empty for a while
but not have to go through a sojourn again to start dropping. It makes
sense to always deliver one packet after going empty...
these thoughts are half formed, and I did my damnedest to describe the
behavior I was seeing.
> So each flow has its own codel unit.
>
> Try to view fq_codel as a multiplexor, then a Codel unit for each flow.
I do.
> So if you believe there is a bug in Codel, try to describe the big in
> Codel, not in fq_codel, because fq_codel is _not_ a codel variant.
No, this thread was about fq_codel's assumptions differences from
codel's assumptions.
Codel has it's own bugs, which I didn't talk to in this thread.
> The only thing that fq_codel codel units share are the parameters and
> stats.
Understood.
>
>
>
>
>
--
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
with fq_codel!"
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Codel] fq_codel: revenge of the standing queue
2012-09-04 19:51 ` Dave Taht
@ 2012-09-04 20:10 ` Eric Dumazet
0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2012-09-04 20:10 UTC (permalink / raw)
To: Dave Taht; +Cc: codel
On Tue, 2012-09-04 at 12:51 -0700, Dave Taht wrote:
> - and we exit dropping state when that queue empties, when globally,
> across all the fq_codel queues, we still need to be dropping in order
> to get to the target.
>
Not at all. You are trying to view fq_codel as a global controller,
which it is not.
> And that first_above time for that cvars for that fq_codel queue is
> reset to 0 for that fq_codel queue when it empties, forcing a recalc
> of the right interval (sojourn) for re-entering dropping state, with a
> "hands off" interval...
>
> It's seems reasonably ok for a fq_codel queue to go empty for a while
> but not have to go through a sojourn again to start dropping. It makes
> sense to always deliver one packet after going empty...
>
> these thoughts are half formed, and I did my damnedest to describe the
> behavior I was seeing.
Then you describe a codel bug.
But its not a bug. If a queue is empty, we set dropping to false, so
that next time we dequeue a packet, we can check how long queue was
empty.
Once again, this has nothing to do with fq_codel.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-09-04 20:22 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-04 18:35 [Codel] fq_codel: revenge of the standing queue Dave Taht
2012-09-04 18:50 ` Eric Dumazet
2012-09-04 20:02 ` Kathleen Nichols
2012-09-04 20:22 ` Dave Taht
2012-09-04 19:02 ` Eric Dumazet
2012-09-04 19:51 ` Dave Taht
2012-09-04 20:10 ` Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox