CoDel AQM discussions
From: Dave Taht <dave.taht@gmail.com>
To: codel@lists.bufferbloat.net
Cc: Eric Dumazet <edumazet@google.com>
Subject: [Codel] codel janitorial: increasing signal strength on serious overload, and tail drop, and other ideas
Date: Sun, 19 Aug 2018 18:32:12 -0700
Message-ID: <CAA93jw68xwoMAYJmY7aw+SOA3+UdhRY127Df=kJ1LULfmZmFig@mail.gmail.com>

Based on the carnage over on this bug report:
https://github.com/systemd/systemd/issues/9725 and after exploring
various options (I really want the pacing rate to be independent of
the cwnd, but don't know how)...

I'm contemplating two ideas for when fq_codel has lost control of the
queue. Neither "solves" the problem stated there, but both might help
in the general case. There have been a few others too...

1) increase codel's signal strength on a bulk drop, by incrementing
count in bulk and rescheduling the next drop.

I *loved* the bulk dropper idea, as it made a big difference cpu-wise
while responding to udp floods on a string of tests in an early
deployment.

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/net/sched/sch_fq_codel.c#n153

However, I think (!) it should also basically replicate what codel
would have done here:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/include/net/codel_impl.h#n176

to increase the drop rate for that flow. And that's easy, just slam in

    /* i = number of packets the bulk dropper just dropped */
    vars->count += i;
    codel_Newton_step(vars);
    vars->drop_next = codel_control_law(now, params->interval,
                                        vars->rec_inv_sqrt);

looks like a win to me. I'll go experiment.
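
A rough userspace sketch of that idea, for illustration only. It is not
the kernel code: the struct and every sketch_* name are hypothetical, and
it computes interval / sqrt(count) directly with an integer square root
instead of the kernel's fixed-point codel_Newton_step(). One thing it does
not model, and which may be worth checking in the real code, is whether a
single Newton step is accurate enough after count jumps by a large amount.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the per-flow codel state. */
struct sketch_codel_vars {
    uint32_t count;     /* drops since entering the dropping state */
    uint64_t drop_next; /* time of the next scheduled drop, in ns */
};

/* Integer square root (floor), so the sketch needs no libm. */
static uint64_t sketch_isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;

    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

/* Control law: schedule the next drop at now + interval / sqrt(count). */
static uint64_t sketch_control_law(uint64_t now, uint64_t interval,
                                   uint32_t count)
{
    uint64_t scaled_root;

    if (count == 0)
        count = 1;
    /* sqrt(count) scaled by 1024 to keep some fractional precision */
    scaled_root = sketch_isqrt64((uint64_t)count << 20);
    return now + (interval << 10) / scaled_root;
}

/* After a bulk drop of 'dropped' packets: bump count, reschedule. */
static void sketch_bulk_signal(struct sketch_codel_vars *vars,
                               uint32_t dropped, uint64_t now,
                               uint64_t interval)
{
    vars->count += dropped;
    vars->drop_next = sketch_control_law(now, interval, vars->count);
}
```

With a 100ms interval, going from count 1 to count 4 in one bulk drop
halves the gap to the next drop; going to 64 cuts it to an eighth.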

2) add tail drop mode for local packets

pfifo_fast and sch_fq will refuse additional packets when the packet
limit is hit. The local stack won't drop them, but will reduce cwnd
and resubmit. Codel is usually running below its limits in the first
place, but if we identify that a packet is from the local stack
(how?), perhaps we can just push back there instead of hitting the
bulk dropper.

I don't care for this idea because I feel it should hit the fattest
flow instead, or be exerting backpressure via some other means.
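
The decision being weighed in idea 2 can be modelled in a few lines. This
is a toy userspace model, not kernel code: the types and names are
hypothetical, and the from_local_stack flag is simply assumed to be known,
since how to identify a local packet is exactly the open question above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes of an enqueue attempt in this model. */
enum sketch_verdict {
    SKETCH_ACCEPT,       /* room in the queue, take the packet */
    SKETCH_REFUSE_LOCAL, /* tail drop: push back on the local stack */
    SKETCH_BULK_DROP     /* over limit: fall back to the bulk dropper */
};

struct sketch_queue {
    uint32_t qlen;  /* packets currently queued */
    uint32_t limit; /* configured packet limit */
};

static enum sketch_verdict
sketch_enqueue_verdict(const struct sketch_queue *q, bool from_local_stack)
{
    if (q->qlen < q->limit)
        return SKETCH_ACCEPT;
    if (from_local_stack)
        return SKETCH_REFUSE_LOCAL;
    return SKETCH_BULK_DROP;
}
```

Note that this model penalises whichever local flow happens to arrive at
the limit, not the fattest flow, which is the objection raised above.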

Neither of these ideas actually solves "the problem". I didn't agree
with the problem statement or the proposed solution on that particular
bug report either, and it took me a while to do a variety of
experiments (and calm down).

Other stuff that has been in my out-of-tree patchset for a while:

3) getting rid of the ce_threshold mode, which so far as I know (?) was
the topic of a failed experiment elsewhere in Google and is entirely
unused anywhere else I know of. It's hard to delete "features" in
Linux, but I could put it out there (at least for openwrt) and see if
anyone screams. Saves 32 bits and a couple of if statements.

4) the flow->dropped stat is not very useful as it is reset every time
the queue empties and is only visible if you do a tc class dump. It
could be made useful by not zeroing it out, but I don't think it's
used....

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/net/sched/sch_fq_codel.c#n217

5) It seems possible to get the bulk dropper to be closer to O(1) by
merely keeping track of the max backlogged flow continuously. You do a
bulk drop, and then you have up to 64 rounds of fq to find the next
biggest fat flow.
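
One way to read "keeping track of the max backlogged flow continuously"
is sketched below. This is an assumption about the intent, not the
patchset itself: all names are hypothetical, and the tracked index is only
approximate, since it can go stale after dequeues or a bulk drop and is
corrected as other flows enqueue.

```c
#include <stdint.h>

#define SKETCH_FLOWS 1024 /* hypothetical number of flow buckets */

struct sketch_sched {
    uint32_t backlog[SKETCH_FLOWS]; /* bytes queued per flow */
    uint32_t max_idx;               /* best-known fattest flow */
};

/* O(1) per enqueue: one compare against the current candidate. */
static void sketch_note_enqueue(struct sketch_sched *s, uint32_t idx,
                                uint32_t len)
{
    s->backlog[idx] += len;
    if (s->backlog[idx] > s->backlog[s->max_idx])
        s->max_idx = idx;
}

/* Dequeues and drops shrink a flow; the candidate may become stale. */
static void sketch_note_dequeue(struct sketch_sched *s, uint32_t idx,
                                uint32_t len)
{
    s->backlog[idx] -= (len > s->backlog[idx]) ? s->backlog[idx] : len;
}

/* The bulk dropper reads the candidate instead of scanning all flows. */
static uint32_t sketch_bulk_drop_target(const struct sketch_sched *s)
{
    return s->max_idx;
}
```

The trade-off is accuracy for speed: right after a bulk drop the
candidate may no longer be the true maximum until traffic on other flows
updates it.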

Moving the per flow backlog statistic

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/net/sched/sch_fq_codel.c#n60

out of its own array and sticking it in codel_vars will save on a
cache miss as well. There happen to be 15 spare bits in codel_vars
(https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/net/sched/sch_fq_codel.c#n217).
A maxbacklog of packet len shifted right 6 leaves about 2MB for that
(15 bits << 6), which seems almost enough...... and it doesn't need to
be that accurate as we keep track of the total backlog elsewhere.
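
The packing arithmetic can be checked with a small sketch. The macro and
function names are hypothetical; only the 15 bits and the shift of 6 come
from the text above. By this arithmetic a 15-bit field with a shift of 6
saturates just under 2 MiB (32767 * 64 bytes); reaching 4 MiB would take
a shift of 7 or a 16th bit.

```c
#include <stdint.h>

#define SKETCH_BACKLOG_BITS  15 /* spare bits assumed available */
#define SKETCH_BACKLOG_SHIFT 6  /* 64-byte granularity */
#define SKETCH_BACKLOG_MAX   ((1u << SKETCH_BACKLOG_BITS) - 1)

/* Store a byte count in 64-byte units, saturating at the field max. */
static uint16_t sketch_backlog_pack(uint32_t bytes)
{
    uint32_t units = bytes >> SKETCH_BACKLOG_SHIFT;

    return (uint16_t)(units > SKETCH_BACKLOG_MAX ? SKETCH_BACKLOG_MAX
                                                 : units);
}

/* Recover an approximate byte count (rounded down to 64 bytes). */
static uint32_t sketch_backlog_unpack(uint16_t packed)
{
    return (uint32_t)packed << SKETCH_BACKLOG_SHIFT;
}
```

A 1500-byte packet packs to 23 units and unpacks to 1472 bytes, so the
per-flow figure is lossy, which matches the point that it does not need
to be that accurate.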

(part of my reason for proposing 3 & 4 is that it makes 5 a zero cost
new option, and after the cpu monster that cake became, I had an urge
to save cycles somewhere to balance my karma)

6) I don't think the bulk dropper ever made it to the wifi code?

-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
