[Codel] codel "oversteer"
Kathleen Nichols
nichols at pollere.com
Wed Jun 20 16:07:20 EDT 2012
If most of the buffering is at the device driver level, then fq_codel isn't
the answer.
When you get your drop burst, are those codel drops or tail drops? If the
driver has just enough buffering/delay to properly service the link,
then you don't really want to involve that in the queue management.
If there are bugs then who knows? But it would be good to be able to
instrument the drops and get some trace information.
Kathie
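Answering "codel drops or tail drops?" requires that every drop be tagged with its cause when it happens. Below is a minimal Python sketch of that instrumentation. It assumes a hypothetical `InstrumentedQueue` with a packet-count limit (tail drops at enqueue) and a caller-supplied `should_drop` predicate standing in for the AQM's dequeue-time decision. That predicate is a placeholder, not the CoDel control law, and none of this is the Linux qdisc code.

```python
from collections import Counter, deque


class InstrumentedQueue:
    """Toy packet FIFO that records why each packet was dropped (hypothetical)."""

    def __init__(self, limit, should_drop):
        self.limit = limit              # tail-drop threshold, in packets
        self.should_drop = should_drop  # placeholder AQM policy: sojourn_ms -> bool
        self.q = deque()                # arrival timestamps (ms)
        self.drops = Counter()          # drop counts keyed by cause
        self.trace = []                 # (time_ms, cause, sojourn_ms or None)

    def enqueue(self, now_ms):
        if len(self.q) >= self.limit:   # queue full: tail drop
            self.drops["tail"] += 1
            self.trace.append((now_ms, "tail", None))
            return False
        self.q.append(now_ms)
        return True

    def dequeue(self, now_ms):
        """Return the sojourn time of the packet sent, or None if empty."""
        while self.q:
            sojourn = now_ms - self.q.popleft()
            if not self.should_drop(sojourn):
                return sojourn
            self.drops["aqm"] += 1      # AQM drop, logged with its sojourn
            self.trace.append((now_ms, "aqm", sojourn))
        return None


q = InstrumentedQueue(limit=4, should_drop=lambda sojourn: sojourn > 5)
for _ in range(6):
    q.enqueue(0)        # 6 arrivals at t=0: the last 2 are tail drops
print(q.dequeue(3))     # 3  (sojourn 3 ms is under the 5 ms threshold: sent)
print(q.dequeue(10))    # None  (the other 3 packets exceed 5 ms: AQM drops)
print(dict(q.drops))    # {'tail': 2, 'aqm': 3}
```

Keeping a per-drop trace, not just counters, makes it possible to see whether a burst of drops lines up with enqueue overflow or with rising sojourn time.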
On 6/19/12 6:32 PM, Dave Taht wrote:
> I've been forming a theory regarding codel behavior in some
> pathological conditions. For the sake of developing the theory, I'm
> going to return to the original car analogy published here and add a
> new one: "oversteer".
>
> Briefly:
>
> If the underlying interface device driver is overbuffered, then when the
> packet backlog finally makes it into the qdisc layer, it bursts up
> rapidly and codel rapidly ramps up its drop strategy. That corrects
> the problem, but leaves us "oversteering", as with a car on ice or a
> very loose connection to the steering wheel: codel is not actually
> measuring the entire time-width of the queue, and so is unable to
> control it well, even if it could.
>
> What I observe on wireless now with fq_codel under heavy load is
> oscillation in the qdisc layer between a zero-length queue and 70 or
> more packets backlogged, a burst of drops when that happens, and a far
> higher ratio of drops to ECN marks than I expected (the new, arbitrary
> "drop ECN packets if > 2 * target" idea I was fiddling with now
> illustrates the point better). It's difficult to gain further direct
> insight without time and packet traces, and maybe exporting more data
> to userspace, but this would roughly explain a report I got privately
> on x86 (no ECN drop enabled), and the behavior of fq_codel on wireless
> on the present version of cerowrt.
>
> (I could always have introduced a bug, too. If it weren't for the
> private report, and having to get on a plane shortly, I wouldn't be
> posting this now.)
>
> Further tests that others(!) could try:
>
> - Increase BQL's setting to over-large values on a BQL-enabled
>   interface and see what happens.
> - Test with an overbuffered ethernet interface in the first place.
> - Improve the ns3 model to have an emulated network interface with
>   user-settable buffering.
>
> Assuming I'm right and others can reproduce this, it implies that at
> this point we need to focus much harder on BQL and overbuffering-related
> issues in the dozens (hundreds?) of ethernet drivers that are not yet
> BQL-enabled. And we already know that much more hard work on fixing
> wifi is needed.
>
> Despite this, I'm generally pleased with the fq_codel results over
> wireless that I'm currently getting from today's build of cerowrt.
> And the BQL-enabled ethernet drivers I've worked with (ar71xx, e1000)
> certainly don't display this behavior, nor does soft rate limiting
> using htb. Instead they reach a steady state for the packet backlog,
> accept bursts, and are otherwise "nice".
>
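The "oversteer" theory comes down to codel measuring sojourn time only in the qdisc, while packets then wait, unmeasured, in the driver's buffer. Below is a toy Python model of that measurement gap. It does not reproduce the cerowrt results. It assumes 1 ms ticks, a link that sends one packet per tick, unresponsive traffic at about 10% overload, and a 100-packet driver FIFO standing in for an overbuffered driver. `CoDel` is a simplified, packet-count rendering of the RFC 8289 pseudocode (no byte accounting, no ECN). `CoDel` and `simulate` are hypothetical names.

```python
import math
from collections import deque

TARGET_MS = 5.0      # RFC 8289 default target
INTERVAL_MS = 100.0  # RFC 8289 default interval


class CoDel:
    """Simplified packet-count CoDel after the RFC 8289 pseudocode (no byte
    accounting, no one-MTU backlog check). Queue entries are arrival times."""

    def __init__(self):
        self.q = deque()
        self.first_above_time = None
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0
        self.lastcount = 0
        self.drops = 0

    def enqueue(self, now):
        self.q.append(now)

    def _do_dequeue(self, now):
        if not self.q:
            self.first_above_time = None
            return None, False
        t_in = self.q.popleft()
        ok_to_drop = False
        if now - t_in < TARGET_MS:           # sojourn under target: reset
            self.first_above_time = None
        elif self.first_above_time is None:  # first time above target
            self.first_above_time = now + INTERVAL_MS
        elif now >= self.first_above_time:   # above target for an interval
            ok_to_drop = True
        return t_in, ok_to_drop

    def _control_law(self, t):
        return t + INTERVAL_MS / math.sqrt(self.count)

    def dequeue(self, now):
        t_in, ok_to_drop = self._do_dequeue(now)
        if t_in is None:                     # empty queue ends dropping state
            self.dropping = False
            return None
        if self.dropping:
            if not ok_to_drop:
                self.dropping = False
            while self.dropping and now >= self.drop_next:
                self.drops += 1              # drop t_in, look at the next one
                self.count += 1
                t_in, ok_to_drop = self._do_dequeue(now)
                if not ok_to_drop:
                    self.dropping = False
                else:
                    self.drop_next = self._control_law(self.drop_next)
        elif ok_to_drop:
            self.drops += 1                  # first drop: enter dropping state
            t_in, _ = self._do_dequeue(now)
            self.dropping = True
            delta = self.count - self.lastcount
            recent = now - self.drop_next < 16 * INTERVAL_MS
            self.count = delta if delta > 1 and recent else 1
            self.drop_next = self._control_law(now)
            self.lastcount = self.count
        return t_in


def simulate(ticks=3000, driver_limit=100):
    """1 ms ticks; the link sends one packet per tick from the driver FIFO.
    Arrivals: one per tick plus one extra every 10th tick (~10% overload)."""
    codel = CoDel()
    driver = deque()   # (arrival_ms, qdisc_sojourn_ms); invisible to CoDel
    samples = []       # (qdisc_sojourn_ms, total_delay_ms) per sent packet
    for now in range(ticks):
        for _ in range(2 if now % 10 == 0 else 1):
            codel.enqueue(float(now))
        if driver:                           # link transmits one packet
            t_in, sojourn = driver.popleft()
            samples.append((sojourn, now - t_in))
        while len(driver) < driver_limit:    # driver pulls from the qdisc
            t_in = codel.dequeue(float(now))
            if t_in is None:
                break
            driver.append((t_in, now - t_in))
    return samples, codel.drops


samples, drops = simulate()
print("CoDel drops:", drops)
print("max sojourn CoDel measured (ms):", max(s for s, _ in samples))
print("max end-to-end delay (ms):", max(t for _, t in samples))
```

In this model, packets pulled into the driver before its FIFO fills report a codel sojourn time of zero even while their end-to-end delay climbs past 50 ms. After the FIFO fills, end-to-end delay reaches at least the 100 ms of driver buffer, none of which codel's target applies to. The numbers are illustrative only.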