[Codel] [PATCH] codel: Refine re-entering drop state to react sooner

Sun Aug 26 14:08:28 EDT 2012

I had an entertaining couple of days wrapping my head around
the behavior of the ns2 model, running tons of experiments and putting
together multiple variants of codel and fq_codel. This patch was not
optimal in several ways and should be ignored.

(I should have marked it as an RFC anyway.)

I have a new patch coming up that adheres closely to the ns2 model
that won... some notes follow:

On Thu, Aug 23, 2012 at 1:37 AM, Dave Täht <dave.taht at bufferbloat.net> wrote:
> From: Dave Taht <dave.taht at bufferbloat.net>
>
> This patch attempts to smooth out codel behavior in several ways.
>
> These first two are arguably bugs.
>
> 1) Newton's method doesn't run well in reverse, run it twice on a decline

Seems to help.
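
A minimal sketch of why a single step struggles on a decline, assuming the usual Newton iteration for 1/sqrt(count) and using doubles rather than the fixed-point arithmetic a qdisc would use. The names newton_step and residual are hypothetical, picked for illustration:

```c
/* One Newton-Raphson step toward 1/sqrt(count), starting from the
 * previous estimate x: x' = x * (3 - count * x * x) / 2. */
double newton_step(double x, double count)
{
    return x * (3.0 - count * x * x) / 2.0;
}

/* Distance of count * x^2 from 1; zero when x is exactly 1/sqrt(count). */
double residual(double x, double count)
{
    double r = count * x * x - 1.0;
    return r < 0.0 ? -r : r;
}
```

Starting from the exact value 0.125 for count = 64, one step toward count = 65 leaves a residual below 1e-3. One step toward count = 4 leaves roughly 0.87, and a second step brings that down to roughly 0.72: better, but still far from converged after a large decline.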

> 2) Account for the idea of dropping out of drop state after a drop
> upon entering drop state.

Didn't work as well as I thought it would. It's an interesting piece
of information but...

>
> 3) the old "count - lastcount" method gyrates between a heavy dropping state
>    and nearly nothing when it should find an optimum. For example, if
>    the optimum count was 66, which was found by going up 6 from lastcount
>    of 60, the old result would be 6. In this version of the code, it
>    would be 63. Arguably this could be curved by the width of the
>    8*interval between entering drop states, so > interval * 4 could be
>    something like count = count - (3 * 4), or an ewma based on ldelay.

The ns2 notion of a steady count - 2 works well, together with the ns2
change in the ok_to_drop routine that re-runs the control law. I'd done
the former in this version of the patch, but not the latter. With both in
the upcoming patch, life got better. The latter change was key to seeing
the bump up in codel's utilization.
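
For comparison, the two re-entry rules discussed here can be sketched as below. This only illustrates the arithmetic, not the patch itself: the function names are hypothetical, and the floor of 1 is an assumption added so the count never reaches zero.

```c
/* Old rule: resume from the growth in count since the previous
 * drop state (floor of 1 is an assumption of this sketch). */
unsigned int reentry_count_delta(unsigned int count, unsigned int lastcount)
{
    return count > lastcount ? count - lastcount : 1;
}

/* ns2-style rule: back off by a steady 2 from the previous count
 * (floor of 1 is an assumption of this sketch). */
unsigned int reentry_count_ns2(unsigned int count)
{
    return count > 2 ? count - 2 : 1;
}
```

With the numbers from the quoted example (count 66, lastcount 60), the old rule restarts at 6 while count - 2 restarts at 64, much closer to the optimum that had already been found.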

> 4) Note that in heavy dropping states, count now increases slower, as well,
> as it is moved outside of the while loop.

Moving it inside the while loop worked much better...

I did most of my testing at 100Mbit (line rate) and 1Mbit (htb), with
8 bidirectional streams to two devices (from a 3rd), and a competing
ping, all with the same (short) RTTs.

Testing at higher rates and with tons more streams and a variety of
RTTs would be useful.

> Some of this is borrowed from ideas in the ns2 code.

The code I have is now identical to the current ns2 code, with
the exception of this bit in s3

                // kmn decay tests
                if(count_ > 126) count_ = 0.9844 * (count_ + 2);

In fq_codel's case, it's really rare for count to get this large, except
under a udp flood (where the basic assumption of codel vs tcp doesn't
hold anyway).

Codel can get up there...
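
To get a feel for what the quoted decay test does to a large count, here is a minimal sketch that applies it once. It assumes count_ is an integer, so the product is truncated on assignment; kmn_decay is a hypothetical name.

```c
/* Apply the quoted decay test once. The double product is truncated
 * back to an int, which assumes count_ is declared as an integer. */
int kmn_decay(int count)
{
    if (count > 126)
        count = (int)(0.9844 * (count + 2));
    return count;
}
```

Under that assumption a count of 127 becomes 126, 200 becomes 198, and 1000 becomes 986, while anything at or below 126 is untouched. The factor 0.9844 is close to 63/64, for which c = (63/64)(c + 2) balances at exactly c = 126. That reading of the threshold is an inference, not something stated in the ns2 code.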

In the process of testing I gradually switched the test server box to
running 3.6-rc{1,2,3}, and now kind of need to re-run everything.

But in the upcoming patch:

Bidirectional utilization is nearly perfect. ~182.X consistently for
anything with fq in it. Even pfifo_fast (when coupled with fq on the
other side) is in the same ballpark most of the time.

On standard deviation across 8 streams, fq_codel wins over everything else.

Codel'd ECN and non-ECN streams now perform ~identically. Codel
improved from 153/163 Mbit in one test type to ~172/172. To what
extent this is the CE fix, TSQ, other changes in 3.6-rc3, or what I
just did, I don't know yet; working on it.

TCP small queues is pretty amazing. I need to backport it to 3.3.8
(only one (puzzling) line of the patch doesn't apply), as I can no
longer "trust" the test server box to misbehave on TCP with any qdisc.
I now have to toss more boxes inline on the wire to get interesting
results, and I figure it will do helpful things for netserver on the
routers... D**n it, eric...

-- 
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
with fq_codel!"