* [Codel] Describing fq_codel
@ 2014-02-06 14:23 Toke Høiland-Jørgensen
2014-02-09 3:52 ` [Codel] [Bloat] Describing fq_codel to a layperson Rich Brown
2014-02-09 20:09 ` [Codel] [Bloat] Describing fq_codel Jesper Dangaard Brouer
0 siblings, 2 replies; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2014-02-06 14:23 UTC (permalink / raw)
To: bloat, codel
[-- Attachment #1: Type: text/plain, Size: 6695 bytes --]
I recently had occasion to describe the scheduling mechanism in fq_codel
in plain text. I thought this might be useful to have for general
reference somewhere, so I'm posting it here to elicit comments; I plan
to post it to the bufferbloat wiki afterwards.
See below.
-Toke
FQ_CODEL
The fq_codel queueing discipline in Linux is an implementation of a
somewhat modified stochastic fairness queueing algorithm, with CoDel
added as an AQM for the individual queues. As such, it consists of two
parts: the scheduler which selects which queue to dequeue a packet from,
and the CoDel AQM which works on each of the queues built into the
qdisc. The subtleties of fq_codel are mostly in the scheduling part,
which is the subject of this description. For a description of CoDel,
refer to Kathy and Van's paper[1].
The interaction between the scheduler and the CoDel algorithm is fairly
straightforward: at initiation (i.e. at boot, or when fq_codel is first
installed on the interface, or its parameters are changed), the list of
queues is initialised so that each queue has a separate set of CoDel
state variables. By default, 1024 queues are created, and packets are
hashed into them at enqueue time. Each queue maintains the CoDel state
variables throughout its lifetime, and so acts the same as the non-fq
CoDel variant would (including retaining the control law state when the
queue drains, etc).
As for the scheduler, it differs from a straightforward conceptual
SFQ scheduler[2] in a number of respects:
- fq_codel is byte-based, employing a deficit round-robin mechanism[3]
between queues. The quantum is configurable, but defaults to the
interface MTU. This means that if one flow sends packets of size
MTU/3, and another sends MTU-sized packets, the first flow will
dequeue three packets each time it gets a turn, whereas the second
flow only dequeues one. This is tracked by maintaining a byte deficit
for each queue, initialised to the quantum value and decreased by the
packet size on each dequeue from the queue.
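The deficit mechanism above can be sketched as follows. This is a toy illustration in Python, not the kernel code; all names (`drr_turn`, `QUANTUM`) are invented for clarity, and packets are represented simply by their byte sizes:

```python
QUANTUM = 1500  # assume an interface MTU of 1500 bytes

def drr_turn(pkts, deficit):
    """Dequeue packets (given as byte sizes) from one queue while its
    byte deficit lasts; return the packets sent and the leftover deficit."""
    sent = []
    while pkts and deficit > 0:
        pkt = pkts.pop(0)
        deficit -= pkt  # charge the packet's size against the deficit
        sent.append(pkt)
    return sent, deficit

# Flow A sends MTU/3-sized packets, flow B full-MTU packets; in one
# turn, A dequeues three packets where B dequeues only one:
a_sent, _ = drr_turn([500] * 6, QUANTUM)
b_sent, _ = drr_turn([1500] * 2, QUANTUM)
```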
- Queue space is shared: there's a global limit on the number of packets
the queues can hold, but not one per queue. If the space runs out at
enqueue time, the queue with the largest number of *bytes* in it will
get a packet dropped. This means that the packet being enqueued will
pretty much never be dropped; rather a different packet is dropped,
and not necessarily from the same queue. Packets are always dropped
from the head of a queue.
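A rough sketch of this enqueue-time behaviour (illustrative Python with invented names; the real qdisc tracks byte counts incrementally rather than recomputing them on every enqueue):

```python
LIMIT = 4  # global packet limit across all queues (tiny, for illustration)

def enqueue(queues, idx, pkt):
    """Enqueue pkt (a byte size) on queues[idx]; if the shared limit is
    exceeded, head-drop from the queue holding the most *bytes*."""
    queues[idx].append(pkt)
    if sum(len(q) for q in queues) > LIMIT:
        fattest = max(queues, key=sum)  # queue with the largest byte count
        fattest.pop(0)                  # drop from the head, not the tail

# The packet being enqueued is hardly ever the one dropped: here the
# small flow's new packet survives, and the 3000-byte queue loses its head.
queues = [[1500, 1500], [100, 100]]
enqueue(queues, 1, 100)
```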
- fq_codel maintains two lists of active queues, called "new" and "old"
queues ("new" and "old" being the terms used in the code). When a
packet is added to a queue that is not currently active, that queue is
added to the list of new queues. This is the source of some subtlety
in the packet scheduling at dequeue time, explained below.
- Most of fq_codel's scheduling is done at packet dequeue time. It
consists of three parts: selecting a queue from which to dequeue a
packet, actually dequeuing it (employing the CoDel algorithm in the
process), and some final bookkeeping.
For the first part, the scheduler first looks at the list of new
queues; for each queue in that list, if that queue has a negative
deficit (i.e. it has already dequeued at least a quantum of bytes),
its deficit is increased by one quantum, the queue is put onto the
end of the list of old queues, and the routine starts again.
Otherwise, that queue is selected for dequeue. If the list of new
queues is empty, the scheduler proceeds down the list of old queues in
the same fashion (checking the deficit, and either selecting the queue
for dequeuing, or increasing the deficit and putting the queue back at
the end of the list).
After having selected a queue from which to dequeue a packet, the
CoDel algorithm is invoked on that queue. This applies the CoDel
control law in the usual fashion, and may discard one or more packets
from the head of the queue, before returning the packet that should be
dequeued (or nothing if the queue is or becomes empty while being
handled by the CoDel algorithm).
Finally, if the CoDel algorithm did not return a packet, the queue is
empty, and the scheduler does one of two things: if the queue selected
for dequeue came from the list of new queues, it is moved to the end
of the list of old queues. If instead it came from the list of old
queues, that queue is removed from the list, to be added back (as a
new queue) the next time a packet arrives that hashes to that queue.
Then, since no packet was available for dequeue, the whole dequeue
process is restarted from the beginning.
If, instead, the scheduler *did* get a packet back from the CoDel
algorithm, it updates the byte deficit for the selected queue before
returning the packet to the lower layers of the kernel networking
stack for sending.
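The three dequeue steps above can be sketched as a single loop. This is illustrative Python under simplifying assumptions (not the kernel code): CoDel is reduced to a stub that just returns the head packet, packets are byte sizes, and all names are invented.

```python
QUANTUM = 1514  # default quantum: the interface MTU

class Queue:
    def __init__(self, pkts=()):
        self.pkts = list(pkts)  # packets, represented by byte size
        self.deficit = QUANTUM  # byte deficit, replenished in quanta

def codel_dequeue(q):
    """Stand-in for CoDel: the real algorithm may drop packets from the
    head before returning one; this stub just returns the head packet."""
    return q.pkts.pop(0) if q.pkts else None

def fq_dequeue(new_queues, old_queues):
    while True:
        lst = new_queues or old_queues  # new queues are served first
        if not lst:
            return None                 # nothing to send at all
        q = lst[0]
        if q.deficit <= 0:
            # Used up its quantum: replenish and demote to the old list.
            q.deficit += QUANTUM
            old_queues.append(lst.pop(0))
            continue
        pkt = codel_dequeue(q)
        if pkt is None:
            # Queue turned out empty: a new queue moves to the end of
            # the old list; an old queue leaves the rotation entirely.
            lst.pop(0)
            if lst is new_queues:
                old_queues.append(q)
            continue
        q.deficit -= pkt  # final bookkeeping: charge the packet's size
        return pkt
```

Driving this with one small-packet queue and one MTU-sized queue reproduces the three-to-one dequeue ratio discussed earlier.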
The new/old queue distinction has a particular consequence for queues
that don't build up more than a quantum of bytes before being visited
by the scheduler (as a new queue), *and* then stay empty until the
scheduler comes back around to them in the list of old queues: such
queues will be removed from the list, then re-added as new queues each
time a packet arrives for them, and so will always get priority over
queues that do not empty out each round. Exactly how much data a flow
can send while keeping its queue in this state is difficult to reason
about, because it depends on both the egress link speed and the number
of concurrent flows. However, in practice many things that are
beneficial to have prioritised for typical internet use (ACKs, DNS
lookups, interactive SSH, HTTP requests; and also ICMP pings) *tend* to
fall in this category, which is why fq_codel performs so well for many
practical applications.
For those interested in examining the behaviour of fq_codel in more
detail, the code can be found in your local Linux source tree, in
net/sched/sch_fq_codel.c[4]. While parts of it can be a challenge to
comprehend, overall it is a very instructive example of a practical
implementation of a queueing algorithm in a modern operating system.
[1] http://queue.acm.org/detail.cfm?id=2209336
[2] For a description of SFQ, refer to your system man page `man tc-sfq`
or to the paper by Paul McKenney:
http://www2.rdrop.com/~paulmck/scalability/paper/sfq.2002.06.04.pdf
[3] See http://en.wikipedia.org/wiki/Deficit_round_robin
[4] Or online at
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/sched/sch_fq_codel.c
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 489 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Codel] [Bloat] Describing fq_codel to a layperson
2014-02-06 14:23 [Codel] Describing fq_codel Toke Høiland-Jørgensen
@ 2014-02-09 3:52 ` Rich Brown
2014-02-09 18:45 ` Toke Høiland-Jørgensen
2014-02-09 20:09 ` [Codel] [Bloat] Describing fq_codel Jesper Dangaard Brouer
1 sibling, 1 reply; 7+ messages in thread
From: Rich Brown @ 2014-02-09 3:52 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: codel, bloat
Hi folks,
Serendipity is delightful. Toke had a chance to write a technical definition of fq_codel. *I* recently had occasion to describe fq_codel and bufferbloat to a layperson.
So I'm posting this version to elicit comments; I plan to post it to the bufferbloat wiki afterwards.
Rich
-----------------
What is Bufferbloat?
Bufferbloat makes your kids say, "The Internet is slow today, Daddy". It happens because routers and other network equipment buffer (that is, accept for delivery) more data than can be delivered in a timely way. Much of the poor performance and human pain experienced using today’s Internet comes from bufferbloat. Here's an analogy to explain what's going on:
Imagine a ski shop with one employee. That employee handles everything: small purchases, renting skis, installing new bindings, making repairs, etc. He also handles customers in first-come, first-served order, and accepts all the jobs, even if there's already a big backlog. Imagine, too, that he never stops working with a customer until their purchase is complete. He never goes out of order, never pauses a job in the middle, not even to sell a Chapstick.
That's dumb, you say. No store would do that. Their customers - if they had any left - would get really terrible service, and never know when they're going to be served.
Unfortunately, a lot of network routers (both home and commercial) work just like that fictitious ski shop. And the people who use them get terrible service. On a packet-by-packet basis, the router has no notion of whether it's sending a big packet or small, whether there has been a lot of traffic for a particular destination, or whether things are getting backed up.
Since the router has no global knowledge of what's happening, it cannot tell a big sender to slow down, or throttle a particular stream of traffic by discarding some data (in the ski shop analogy, sending away a customer who has another long repair job). A dumb router simply "buffers up" the data, expecting it to be sent sooner or later. To make matters worse, in networking (but not in ski shops), if delays get long enough, the computers sometimes resend the data (thinking that the original data must have been lost). These "retransmissions" further increase delay, because there are now two copies of the same data buffered up, waiting to be sent...
This is the genesis of the name "bufferbloat" - the memory buffers within the router get bloated with old, outdated packets. When the router doggedly determines to send that data, it blocks newer sessions from even starting, and everything on the network gets slow.
What's the solution?
The members of the CeroWrt team have been working for the last two years to solve the problem of bufferbloat. We've largely succeeded: the CeroWrt firmware works really well. CeroWrt users no longer see problems with "the internet being slow" even when uploading and downloading files, watching videos, etc.
CeroWrt introduces a new queueing discipline called fq_codel [link to Toke's full technical description] that can detect flows (streams of data between two endpoints) that are using more than their share of the bottleneck link (usually, the connection to the ISP). It works by dividing the traffic into multiple queues, one per flow, and sending the first packet in each queue in round-robin order. (The algorithm is somewhat more involved, so read the full description for details.) fq_codel also measures the time that each packet has been queued. If a packet has been in the queue for "too long", then fq_codel discards it, preventing it from using more than its fair share of bandwidth on the bottleneck.
Wait a minute - discarding packets? Doesn't that make things worse?
It does slow the affected flow, but that is exactly what should happen. If a sender has sent so many packets that they're building up in the queue, then it's fair to offer back pressure for that particular flow by dropping some of its packets.
In the meantime, all the other flows (from the other queues) have their packets sent promptly, since they're not building up a queue and haven't been waiting long to be sent. This keeps everything responsive: short packets and those from low-volume flows automatically get sent first. The big senders, whose packets are dropped, will re-send the data, but at a slower rate, bringing the entire system back into balance.
What about Quality of Service (QoS)? Doesn't that help?
Yes, it helps a bit. If you configure your router for QoS, the router will use that information to prioritize certain packets and send them first, ahead of the bulk traffic that's buffered up. But there are several problems with QoS:
- It's annoying to configure QoS. You have to understand the configuration GUI of the router and manually make the changes. This is something that only a network geek could come to enjoy.
- If you use a new application, QoS may not help you until you adjust the rules to take it into account.
- It doesn't solve the problem of overbuffering. The QoS rules allow the router to send certain packets first. But the packets from large flows are still buffered up, and will be sent at some point, potentially starving out other traffic.
- As a corollary, there's no throttling of the big senders: they don't get prompt feedback that their streams are using more than their fair share of the capacity, so they don’t fall back to a lower rate.
- Finally, QoS doesn't help in the "other direction". It can improve traffic being sent from your local (home) network toward the Internet. But if the equipment at the far end, at your DSL or cable provider, is bloated (and very often it is), then QoS in your router won't make things any better for traffic coming toward your local network.
fq_codel and the other algorithms in CeroWrt handle all this automatically. The only configuration parameters are the kind of link you have (DSL, cable, etc.) and the speeds of those links. You don't have to adjust QoS settings or make other tweaks.
Does Bufferbloat affect my network?
Quite possibly - here's one symptom: If the network works well when no one else is using it (early morning, or late at night after everyone else is asleep), but gets really slow when others are "on the net", then you are likely to be suffering from Bufferbloat. Another symptom is a degradation of your voice, video chat, or gaming experience when others are using the network.
Here's a more scientific test: start a ping test to a reliable host, like google.com. Examine the response times when no one is using the network (again, early morning or late at night). You will likely see ping times in the 30-100 msec range. Then do the same ping test when things are busy, say while running a speed test and up- or downloading a big file. If your router is bloated, the response times will often be 10 to 100 times larger. For more details, see the Quick Test for Bufferbloat at: http://www.bufferbloat.net/projects/cerowrt/wiki/Quick_Test_for_Bufferbloat
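To put a number on that comparison, the rule of thumb above can be expressed as a ratio of ping times. This is a toy helper with invented names, not part of any bufferbloat tool:

```python
def bloat_factor(idle_ms, loaded_ms):
    """Ratio of the median ping time under load to the median idle ping
    time; per the rule of thumb above, a bloated link often shows a
    ratio of 10 to 100."""
    median = lambda xs: sorted(xs)[len(xs) // 2]
    return median(loaded_ms) / median(idle_ms)

# e.g. ~40 ms pings when idle vs. ~1200 ms during a big upload:
factor = bloat_factor([35, 40, 45], [900, 1200, 1500])  # 30x slower
```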
What can I do about this?
Two years of network research have paid off: the networks work great at our houses. Our algorithms are being adopted and implemented in operating systems and some commercial network equipment. We are making the code changes available at no charge, and are encouraging all vendors to embrace this code. We are also pushing these changes into the OpenWrt firmware project (http://openwrt.org), so it will be available in many different routers. If your router is bloated (based on the test above), and you're not willing to try OpenWrt, call your vendor's support line to ask when they're going to fix it. Tell them to read our site. We need visibility across all kinds of network equipment to convince vendors to solve the problem everywhere.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Codel] [Bloat] Describing fq_codel to a layperson
2014-02-09 3:52 ` [Codel] [Bloat] Describing fq_codel to a layperson Rich Brown
@ 2014-02-09 18:45 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2014-02-09 18:45 UTC (permalink / raw)
To: Rich Brown; +Cc: codel, bloat
[-- Attachment #1: Type: text/plain, Size: 1077 bytes --]
Rich Brown <richb.hanover@gmail.com> writes:
> So I'm posting this version to elicit comments and have a plan to post
> it to the bufferbloat wiki somewhere afterwards.
Hi Rich
I like the writeup, however I'm probably not the best person to validate
how it will sound to an outsider...
One thing, though...
> Imagine a ski shop with one employee. That employee handles
> everything: small purchases, renting skis, installing new bindings,
> making repairs, etc. He also handles customers in first-come,
> first-served order, and accepts all the jobs, even if there's already
> a big backlog. Imagine, too, that he never stops working with a
> customer until their purchase is complete. He never goes out of order,
> never pauses a job in the middle, not even to sell a Chapstick.
Sometimes I feel like this describes way too many actual ski shops (or
maybe not ski shops, but certainly other establishments)... I.e. I'm not
sure it is really as obvious an absurdity as one would think; maybe
stating more explicitly what a ski shop 'ought' to do would help?
-Toke
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 489 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Codel] [Bloat] Describing fq_codel
2014-02-06 14:23 [Codel] Describing fq_codel Toke Høiland-Jørgensen
2014-02-09 3:52 ` [Codel] [Bloat] Describing fq_codel to a layperson Rich Brown
@ 2014-02-09 20:09 ` Jesper Dangaard Brouer
2014-02-09 21:36 ` Paul E. McKenney
2014-02-09 21:46 ` Toke Høiland-Jørgensen
1 sibling, 2 replies; 7+ messages in thread
From: Jesper Dangaard Brouer @ 2014-02-09 20:09 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, Paul E. McKenney; +Cc: codel, bloat
[-- Attachment #1: Type: text/plain, Size: 7746 bytes --]
(top post)
Thank you Toke for this excellent description, it is really good!
AFAIR Paul also did a description of fq_codel, but with a focus on the
SFQ part. I actually think these two descriptions could be combined.
Perhaps Paul can give us a link to his desc?
And where do you plan to put this excellent description (so it's more
accessible to mankind)?
--Jesper
On Thu, 06 Feb 2014 15:23:03 +0100
Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> [full fq_codel description quoted; snipped]
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 230 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Codel] [Bloat] Describing fq_codel
2014-02-09 20:09 ` [Codel] [Bloat] Describing fq_codel Jesper Dangaard Brouer
@ 2014-02-09 21:36 ` Paul E. McKenney
2014-02-09 21:46 ` Toke Høiland-Jørgensen
1 sibling, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2014-02-09 21:36 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: Toke Høiland-Jørgensen, codel, bloat
[-- Attachment #1: Type: text/plain, Size: 8329 bytes --]
On Sun, Feb 09, 2014 at 09:09:38PM +0100, Jesper Dangaard Brouer wrote:
>
> (top post)
>
> Thank you Toke for this excellent description, it is really good!
>
> AFAIR Paul also did a description of fq_codel, but with a focus on the
> SFQ part. I actually think these two descriptions could be combined.
> Perhaps Paul can give us a link to his desc?
Please see attached for a tarball. I dropped this effort about a year
ago when it became apparent that I was working to conflicting sets
of requirements. This is not all that unusual, but I had not budgeted
sufficient time to resolve them. I believe that others (e.g., Dave Taht
and Jim Gettys, as well as Toke) have done more since.
Thanx, Paul
> And where do you plan to put this excellent description (so its more
> accessible to mankind)?
>
> --Jesper
>
>
> On Thu, 06 Feb 2014 15:23:03 +0100
> Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
> > I recently had occasion to describe the scheduling mechanism in fq_codel
> > in plain text. I thought this might be useful to have for general
> > reference somewhere, so I'm posting it here to elicit comments and have
> > a plan to post it to the bufferbloat wiki somewhere afterwards.
> >
> > See below.
> >
> > -Toke
> >
> >
> > FQ_CODEL
> >
> > The fq_codel queueing discipline in Linux is an implementation of a
> > somewhat modified stochastic fairness queueing algorithm, with CoDel
> > added as an AQM for the individual queues. As such, it consists of two
> > parts: the scheduler which selects which queue to dequeue a packet from,
> > and the CoDel AQM which works on each of the queues built in to the
> > qdisc. The subtleties of fq_codel are mostly in the scheduling part,
> > which is the subject of this description. For a description of CoDel,
> > refer to Kathy and Van's paper[1].
> >
> > The interaction between the scheduler and the CoDel algorithm are fairly
> > straight forward: At initiation (i.e. at boot, or when fq_codel is first
> > installed on the interface, or its parameters are changed), the list of
> > queues is initialised so that each queue has a separate set of CoDel
> > state variables. By default, 1024 queues are created, and packets are
> > hashed into them at enqueue time. Each queue maintains the CoDel state
> > variables throughout its lifetime, and so acts the same as the non-fq
> > CoDel variant would (including retaining the control law state when the
> > queue drains, etc).
> >
> > As for the scheduler, it is different from a straight-forward conceptual
> > SFQ scheduler[2] in a number of respects:
> >
> > - fq_codel is byte-based, employing a deficit round-robin mechanism[3]
> > between queues. The quantum is configurable, but defaults to the
> > interface MTU. This means that if one flow sends packets of size
> > MTU/3, and another sends MTU-sized packets, the first flow will
> > dequeue three packets each time it gets a turn, whereas the second
> > flow only dequeues one. This is kept track of by maintaining a byte
> > dequeue deficit for each queue, which is first initialised to the
> > quantum value, and decreased by the packet size on each dequeue from
> > the queue.
> >
> > - Queue space is shared: there's a global limit on the number of packets
> > the queues can hold, but not one per queue. If the space runs out at
> > enqueue time, the queue with the largest number of *bytes* in it will
> > get a packet dropped. This means that the packet being enqueued will
> > pretty much never be dropped; rather a different packet is dropped,
> > and not necessarily from the same queue. Packets are always dropped
> > from the head of a queue.
> >
> > - fq_codel maintains two lists of active queues, called "new" and "old"
> > queues ("new" and "old" being the terms used in the code). When a
> > packet is added to a queue that is not currently active, that queue is
> > added to the list of new queues. This is the source of some subtlety
> > in the packet scheduling at dequeue time, explained below.
> >
> > - Most of fq_codel's scheduling is done at packet dequeue time. It
> > consists of three parts: selecting a queue from which to dequeue a
> > packet, actually dequeuing it (employing the CoDel algorithm in the
> > process), and some final bookkeeping.
> >
> > For the first part, the scheduler first looks at the list of new
> > queues; for each queue in that list, if that queue has a negative
> > deficit (i.e. it has already dequeued a quantum of bytes (or more)),
> > its deficit is increased by one quantum, and the queue is put onto the
> > end of the list of old queues, and the routine starts again.
> > Otherwise, that queue is selected for dequeue. If the list of new
> > queues is empty, the scheduler proceeds down the list of old queues in
> > the same fashion (checking the deficit, and either selecting the queue
> > for dequeuing, or increasing the deficit and putting the queue back at
> > the end of the list).
> >
> > After having selected a queue from which to dequeue a packet, the
> > CoDel algorithm is invoked on that queue. This applies the CoDel
> > control law in the usual fashion, and may discard one or more packets
> > from the head of the queue, before returning the packet that should be
> > dequeued (or nothing if the queue is or becomes empty while being
> > handled by the CoDel algorithm).
> >
> > Finally, if the CoDel algorithm did not return a packet, the queue is
> > empty, and the scheduler does one of two things: if the queue selected
> > for dequeue came from the list of new queues, it is moved to the end
> > of the list of old queues. If instead it came from the list of old
> > queues, that queue is removed from the list, to be added back (as a
> > new queue) the next time a packet arrives that hashes to that queue.
> > Then, (since no packet was available for dequeue), the whole dequeue
> > process is restarted from the beginning.
> >
> > If, instead, the scheduler *did* get a packet back from the CoDel
> > algorithm, it updates the byte deficit for the selected queue before
> > returning the packet to the lower layers of the kernel networking
> > stack for sending.
> >
> > The new/old queue distinction has a particular consequence for queues
> > that don't build up more than a quantum of bytes before being visited
> > by the scheduler (as a new queue), *and* then stay empty until the
> > scheduler comes back around to them in the list of old queues: such
> > queues will be removed from the list, then re-added as new queues each
> > time a packet arrives for them, and so will always get priority over
> > queues that do not empty out each round. Exactly how much data a flow
> > can send while keeping its queue in this state is somewhat difficult
> > to pin down, because it depends on both the egress link speed and the
> > number of concurrent flows. This makes the behaviour of fq_codel
> > harder to analyse precisely. However, in practice many kinds of
> > traffic that benefit from prioritisation in typical internet use
> > (ACKs, DNS lookups, interactive SSH, HTTP requests; and also ICMP
> > pings) *tend* to fall into this category, which is why fq_codel
> > performs so well for many practical applications.
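As a very rough back-of-envelope (my own simplification: it assumes each backlogged bulk flow is served exactly one quantum per scheduler round and ignores packetisation and CoDel drops), the rate below which a flow can stay in this favoured "sparse" state works out to its fair share of the link:

```python
QUANTUM = 1514                 # bytes per DRR quantum (illustrative)
LINK_RATE = 100e6 / 8          # 100 Mbit/s egress link, in bytes/s (assumed)
BULK_FLOWS = 10                # concurrent backlogged flows (assumed)

# Time for one full DRR round over all backlogged queues:
round_time = BULK_FLOWS * QUANTUM / LINK_RATE

# A flow stays sparse roughly while it sends at most one quantum per
# round, i.e. below this rate:
max_sparse_rate = QUANTUM / round_time

print(f"round ~ {round_time * 1e3:.2f} ms, "
      f"sparse below ~ {max_sparse_rate * 8 / 1e6:.1f} Mbit/s")
```

With these assumed numbers `max_sparse_rate` is exactly `LINK_RATE / BULK_FLOWS`, i.e. 10 Mbit/s here: both of the dependencies mentioned above (link speed and flow count) show up directly in the result.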
> >
> > For those interested in examining the behaviour of fq_codel in more
> > detail, the code can be found in your local Linux source tree, in
> > net/sched/sch_fq_codel.c[4]. While some of it can be a challenge to
> > comprehend, overall it is a very instructive example of a
> > practical implementation of a queueing algorithm in a modern operating
> > system.
> >
> > [1] http://queue.acm.org/detail.cfm?id=2209336
> >
> > [2] For a description of SFQ, refer to your system man page `man tc-sfq`
> > or to the paper by Paul McKenney:
> > http://www2.rdrop.com/~paulmck/scalability/paper/sfq.2002.06.04.pdf
> >
> > [3] See http://en.wikipedia.org/wiki/Deficit_round_robin
> >
> > [4] Or online at
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/sched/sch_fq_codel.c
>
>
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Sr. Network Kernel Developer at Red Hat
> Author of http://www.iptv-analyzer.org
> LinkedIn: http://www.linkedin.com/in/brouer
[-- Attachment #2: SFQ2014.02.09a.tgz --]
[-- Type: application/x-gtar-compressed, Size: 428019 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Codel] [Bloat] Describing fq_codel
2014-02-09 20:09 ` [Codel] [Bloat] Describing fq_codel Jesper Dangaard Brouer
2014-02-09 21:36 ` Paul E. McKenney
@ 2014-02-09 21:46 ` Toke Høiland-Jørgensen
2014-02-10 16:06 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2014-02-09 21:46 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: Paul E. McKenney, codel, bloat
[-- Attachment #1: Type: text/plain, Size: 422 bytes --]
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> Thank you Toke for this excellent description, it is really good!
Thanks!
> And where do you plan to put this excellent description (so it's more
> accessible to mankind)?
Planning to stick it on the bufferbloat wiki somewhere, once I deem
people have had enough of a chance to comment. So probably sometime in
the coming week. Will link it here once I do :)
-Toke
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 489 bytes --]
* Re: [Codel] [Bloat] Describing fq_codel
2014-02-09 21:46 ` Toke Høiland-Jørgensen
@ 2014-02-10 16:06 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2014-02-10 16:06 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: Paul E. McKenney, codel, bloat
[-- Attachment #1: Type: text/plain, Size: 479 bytes --]
Toke Høiland-Jørgensen <toke@toke.dk> writes:
> Planning to stick it on the bufferbloat wiki somewhere, once I deem
> people have had enough of a chance to comment. So probably sometime in
> the coming week. Will link it here once I do :)
Right, so I stuck it on the wiki pretty much as it was posted to the list.
Link:
http://www.bufferbloat.net/projects/codel/wiki/Technical_description_of_FQ_CoDel
-- also linked from the front page of the CoDel wiki. :)
-Toke
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 489 bytes --]
end of thread, other threads:[~2014-02-10 16:07 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-06 14:23 [Codel] Describing fq_codel Toke Høiland-Jørgensen
2014-02-09 3:52 ` [Codel] [Bloat] Describing fq_codel to a layperson Rich Brown
2014-02-09 18:45 ` Toke Høiland-Jørgensen
2014-02-09 20:09 ` [Codel] [Bloat] Describing fq_codel Jesper Dangaard Brouer
2014-02-09 21:36 ` Paul E. McKenney
2014-02-09 21:46 ` Toke Høiland-Jørgensen
2014-02-10 16:06 ` Toke Høiland-Jørgensen