[Codel] Describing fq_codel
toke at toke.dk
Thu Feb 6 09:23:03 EST 2014
I recently had occasion to describe the scheduling mechanism in fq_codel
in plain text. I thought this might be useful to have for general
reference somewhere, so I'm posting it here to elicit comments and have
a plan to post it to the bufferbloat wiki somewhere afterwards.
The fq_codel queueing discipline in Linux is an implementation of a
somewhat modified stochastic fairness queueing algorithm, with CoDel
added as an AQM for the individual queues. As such, it consists of two
parts: the scheduler which selects which queue to dequeue a packet from,
and the CoDel AQM which works on each of the queues built in to the
qdisc. The subtleties of fq_codel are mostly in the scheduling part,
which is the subject of this description. For a description of CoDel,
refer to Kathy and Van's paper.
The interaction between the scheduler and the CoDel algorithm are fairly
straight forward: At initiation (i.e. at boot, or when fq_codel is first
installed on the interface, or its parameters are changed), the list of
queues is initialised so that each queue has a separate set of CoDel
state variables. By default, 1024 queues are created, and packets are
hashed into them at enqueue time. Each queue maintains the CoDel state
variables throughout its lifetime, and so acts the same as the non-fq
CoDel variant would (including retaining the control law state when the
queue drains, etc).
As for the scheduler, it is different from a straight-forward conceptual
SFQ scheduler in a number of respects:
- fq_codel is byte-based, employing a deficit round-robin mechanism
between queues. The quantum is configurable, but defaults to the
interface MTU. This means that if one flow sends packets of size
MTU/3, and another sends MTU-sized packets, the first flow will
dequeue three packets each time it gets a turn, whereas the second
flow only dequeues one. This is kept track of by maintaining a byte
dequeue deficit for each queue, which is first initialised to the
quantum value, and decreased by the packet size on each dequeue from
- Queue space is shared: there's a global limit on the number of packets
the queues can hold, but not one per queue. If the space runs out at
enqueue time, the queue with the largest number of *bytes* in it will
get a packet dropped. This means that the packet being enqueued will
pretty much never be dropped; rather a different packet is dropped,
and not necessarily from the same queue. Packets are always dropped
from the head of a queue.
- fq_codel maintains two lists of active queues, called "new" and "old"
queues ("new" and "old" being the terms used in the code). When a
packet is added to a queue that is not currently active, that queue is
added to the list of new queues. This is the source of some subtlety
in the packet scheduling at dequeue time, explained below.
- Most of fq_codel's scheduling is done at packet dequeue time. It
consists of three parts: selecting a queue from which to dequeue a
packet, actually dequeuing it (employing the CoDel algorithm in the
process), and some final bookkeeping.
For the first part, the scheduler first looks at the list of new
queues; for each queue in that list, if that queue has a negative
deficit (i.e. it has already dequeued a quantum of bytes (or more)),
its deficit is increased by one quantum, and the queue is put onto the
end of the list of old queues, and the routine starts again.
Otherwise, that queue is selected for dequeue. If the list of new
queues is empty, the scheduler proceeds down the list of old queues in
the same fashion (checking the deficit, and either selecting the queue
for dequeuing, or increasing the deficit and putting the queue back at
the end of the list).
After having selected a queue from which to dequeue a packet, the
CoDel algorithm is invoked on that queue. This applies the CoDel
control law in the usual fashion, and may discard one or more packets
from the head of the queue, before returning the packet that should be
dequeued (or nothing if the queue is or becomes empty while being
handled by the CoDel algorithm).
Finally, if the CoDel algorithm did not return a packet, the queue is
empty, and the scheduler does one of two things: if the queue selected
for dequeue came from the list of new queues, it is moved to the end
of the list of old queues. If instead it came from the list of old
queues, that queue is removed from the list, to be added back (as a
new queue) the next time a packet arrives that hashes to that queue.
Then, (since no packet was available for dequeue), the whole dequeue
process is restarted from the beginning.
If, instead, the scheduler *did* get a packet back from the CoDel
algorithm, it updates the byte deficit for the selected queue before
returning the packet to the lower layers of the kernel networking
stack for sending.
The new/old queue distinction has a particular consequence for queues
that don't build up more than a quantum bytes before being visited by
the scheduler (as a new queue), *and* then stay empty until the
scheduler comes back around to it in the list of old queues: Such queues
will get removed from the list, and then re-added as a new queue each
time a packet arrives for it, and so will always get priority over
queues that do not empty out each round. Exactly how much data a flow
has to send to keep its queue in this state is somewhat difficult to
reason about, because it depends on both the egress link speed and the
number of concurrent flows. This makes it harder to reason about the
behaviour of fq_codel. However, in practice many things that are
beneficial to have prioritised for typical internet use (ACKs, DNS
lookups, interactive SSH, HTTP requests; and also ICMP pings) *tend* to
fall in this category, which is why fq_codel performs so well for many
For those interested in examining the behaviour of fq_codel in more
detail, the code can be found in your local Linux source tree, in
net/sched/sch_fq_codel.c. While some of it can be somewhat of a
challenge to comprehend, it overall is a very instructive example of a
practical implementation of a queueing algorithm in a modern operating
 For a description of SFQ, refer to your system man page `man tc-sfq`
or to the paper by Paul McKenney:
 See http://en.wikipedia.org/wiki/Deficit_round_robin
 Or online at
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 489 bytes
Desc: not available
More information about the Codel