From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Toke Høiland-Jørgensen" <toke@toke.dk>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: codel <codel@lists.bufferbloat.net>, bloat <bloat@lists.bufferbloat.net>
Subject: Re: [Codel] [Bloat] Describing fq_codel
Date: Sun, 9 Feb 2014 21:09:38 +0100 [thread overview]
Message-ID: <20140209210938.54a1a231@redhat.com> (raw)
In-Reply-To: <87ha8cl3qw.fsf@toke.dk>
[-- Attachment #1: Type: text/plain, Size: 7746 bytes --]
(top post)
Thank you Toke for this excellent description, it is really good!
AFAIKR Paul also did a description of fq_codel, but with a focus on the
SFQ part. I actually think, these two descriptions could be combined.
Perhaps Paul can give us a link to his desc?
And where do you plan to put this excellent description (so its more
accessible to mankind)?
--Jesper
On Thu, 06 Feb 2014 15:23:03 +0100
Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> I recently had occasion to describe the scheduling mechanism in fq_codel
> in plain text. I thought this might be useful to have for general
> reference somewhere, so I'm posting it here to elicit comments and have
> a plan to post it to the bufferbloat wiki somewhere afterwards.
>
> See below.
>
> -Toke
>
>
> FQ_CODEL
>
> The fq_codel queueing discipline in Linux is an implementation of a
> somewhat modified stochastic fairness queueing algorithm, with CoDel
> added as an AQM for the individual queues. As such, it consists of two
> parts: the scheduler which selects which queue to dequeue a packet from,
> and the CoDel AQM which works on each of the queues built in to the
> qdisc. The subtleties of fq_codel are mostly in the scheduling part,
> which is the subject of this description. For a description of CoDel,
> refer to Kathy and Van's paper[1].
>
> The interaction between the scheduler and the CoDel algorithm are fairly
> straight forward: At initiation (i.e. at boot, or when fq_codel is first
> installed on the interface, or its parameters are changed), the list of
> queues is initialised so that each queue has a separate set of CoDel
> state variables. By default, 1024 queues are created, and packets are
> hashed into them at enqueue time. Each queue maintains the CoDel state
> variables throughout its lifetime, and so acts the same as the non-fq
> CoDel variant would (including retaining the control law state when the
> queue drains, etc).
>
> As for the scheduler, it is different from a straight-forward conceptual
> SFQ scheduler[2] in a number of respects:
>
> - fq_codel is byte-based, employing a deficit round-robin mechanism[3]
> between queues. The quantum is configurable, but defaults to the
> interface MTU. This means that if one flow sends packets of size
> MTU/3, and another sends MTU-sized packets, the first flow will
> dequeue three packets each time it gets a turn, whereas the second
> flow only dequeues one. This is kept track of by maintaining a byte
> dequeue deficit for each queue, which is first initialised to the
> quantum value, and decreased by the packet size on each dequeue from
> the queue.
>
> - Queue space is shared: there's a global limit on the number of packets
> the queues can hold, but not one per queue. If the space runs out at
> enqueue time, the queue with the largest number of *bytes* in it will
> get a packet dropped. This means that the packet being enqueued will
> pretty much never be dropped; rather a different packet is dropped,
> and not necessarily from the same queue. Packets are always dropped
> from the head of a queue.
>
> - fq_codel maintains two lists of active queues, called "new" and "old"
> queues ("new" and "old" being the terms used in the code). When a
> packet is added to a queue that is not currently active, that queue is
> added to the list of new queues. This is the source of some subtlety
> in the packet scheduling at dequeue time, explained below.
>
> - Most of fq_codel's scheduling is done at packet dequeue time. It
> consists of three parts: selecting a queue from which to dequeue a
> packet, actually dequeuing it (employing the CoDel algorithm in the
> process), and some final bookkeeping.
>
> For the first part, the scheduler first looks at the list of new
> queues; for each queue in that list, if that queue has a negative
> deficit (i.e. it has already dequeued a quantum of bytes (or more)),
> its deficit is increased by one quantum, and the queue is put onto the
> end of the list of old queues, and the routine starts again.
> Otherwise, that queue is selected for dequeue. If the list of new
> queues is empty, the scheduler proceeds down the list of old queues in
> the same fashion (checking the deficit, and either selecting the queue
> for dequeuing, or increasing the deficit and putting the queue back at
> the end of the list).
>
> After having selected a queue from which to dequeue a packet, the
> CoDel algorithm is invoked on that queue. This applies the CoDel
> control law in the usual fashion, and may discard one or more packets
> from the head of the queue, before returning the packet that should be
> dequeued (or nothing if the queue is or becomes empty while being
> handled by the CoDel algorithm).
>
> Finally, if the CoDel algorithm did not return a packet, the queue is
> empty, and the scheduler does one of two things: if the queue selected
> for dequeue came from the list of new queues, it is moved to the end
> of the list of old queues. If instead it came from the list of old
> queues, that queue is removed from the list, to be added back (as a
> new queue) the next time a packet arrives that hashes to that queue.
> Then, (since no packet was available for dequeue), the whole dequeue
> process is restarted from the beginning.
>
> If, instead, the scheduler *did* get a packet back from the CoDel
> algorithm, it updates the byte deficit for the selected queue before
> returning the packet to the lower layers of the kernel networking
> stack for sending.
>
> The new/old queue distinction has a particular consequence for queues
> that don't build up more than a quantum bytes before being visited by
> the scheduler (as a new queue), *and* then stay empty until the
> scheduler comes back around to it in the list of old queues: Such queues
> will get removed from the list, and then re-added as a new queue each
> time a packet arrives for it, and so will always get priority over
> queues that do not empty out each round. Exactly how much data a flow
> has to send to keep its queue in this state is somewhat difficult to
> reason about, because it depends on both the egress link speed and the
> number of concurrent flows. This makes it harder to reason about the
> behaviour of fq_codel. However, in practice many things that are
> beneficial to have prioritised for typical internet use (ACKs, DNS
> lookups, interactive SSH, HTTP requests; and also ICMP pings) *tend* to
> fall in this category, which is why fq_codel performs so well for many
> practical applications.
>
> For those interested in examining the behaviour of fq_codel in more
> detail, the code can be found in your local Linux source tree, in
> net/sched/sch_fq_codel.c[4]. While some of it can be somewhat of a
> challenge to comprehend, it overall is a very instructive example of a
> practical implementation of a queueing algorithm in a modern operating
> system.
>
> [1] http://queue.acm.org/detail.cfm?id=2209336
>
> [2] For a description of SFQ, refer to your system man page `man tc-sfq`
> or to the paper by Paul McKenney:
> http://www2.rdrop.com/~paulmck/scalability/paper/sfq.2002.06.04.pdf
>
> [3] See http://en.wikipedia.org/wiki/Deficit_round_robin
>
> [4] Or online at
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/sched/sch_fq_codel.c
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 230 bytes --]
next prev parent reply other threads:[~2014-02-09 20:10 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-06 14:23 [Codel] " Toke Høiland-Jørgensen
2014-02-09 3:52 ` [Codel] [Bloat] Describing fq_codel to a layperson Rich Brown
2014-02-09 18:45 ` Toke Høiland-Jørgensen
2014-02-09 20:09 ` Jesper Dangaard Brouer [this message]
2014-02-09 21:36 ` [Codel] [Bloat] Describing fq_codel Paul E. McKenney
2014-02-09 21:46 ` Toke Høiland-Jørgensen
2014-02-10 16:06 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.bufferbloat.net/postorius/lists/codel.lists.bufferbloat.net/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140209210938.54a1a231@redhat.com \
--to=brouer@redhat.com \
--cc=bloat@lists.bufferbloat.net \
--cc=codel@lists.bufferbloat.net \
--cc=paulmck@linux.vnet.ibm.com \
--cc=toke@toke.dk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox