[Codel] [Bloat] Describing fq_codel to a layperson
Rich Brown
richb.hanover at gmail.com
Sat Feb 8 22:52:42 EST 2014
Hi folks,
Serendipity is delightful. Toke had a chance to write a technical definition of fq_codel. *I* recently had occasion to describe fq_codel and bufferbloat to a layperson.
So I'm posting this version to elicit comments, and I plan to post it to the bufferbloat wiki afterwards.
Rich
-----------------
What is Bufferbloat?
Bufferbloat makes your kids say, "The Internet is slow today, Daddy". It happens because routers and other network equipment buffer (that is, accept for delivery) more data than can be delivered in a timely way. Much of the poor performance and human pain experienced using today’s Internet comes from bufferbloat. Here's an analogy to explain what's going on:
Imagine a ski shop with one employee. That employee handles everything: small purchases, renting skis, installing new bindings, making repairs, etc. He also handles customers in first-come, first-served order, and accepts all the jobs, even if there's already a big backlog. Imagine, too, that he never stops working with a customer until their purchase is complete. He never goes out of order, never pauses a job in the middle, not even to sell a Chapstick.
That's dumb, you say. No store would do that. Their customers - if they had any left - would get really terrible service, and never know when they're going to be served.
Unfortunately, a lot of network routers (both home and commercial) work just like that fictitious ski shop. And the people who use them get terrible service. On a packet-by-packet basis, the router has no notion of whether it's sending a big packet or small, whether there has been a lot of traffic for a particular destination, or whether things are getting backed up.
Since the router has no global knowledge of what's happening, it cannot inform a big sender to slow down, or throttle a particular stream of traffic by discarding some data (in the ski shop analogy, sending away a customer who has another long repair job). A dumb router simply "buffers up" the data, expecting it to be sent sooner or later. To make matters worse, in networking (but not in ski shops), if delays get long enough, sometimes the computers resend the data (thinking that the original data must have been lost). These "retransmissions" further increase delay, because there are now two copies of the same data buffered up, waiting to be sent...
This is the genesis of the name "bufferbloat" - the memory buffers within the router get bloated with old, outdated packets. When the router doggedly determines to send that data, it blocks newer sessions from even starting, and everything on the network gets slow.
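To put a rough number on how a bloated buffer turns into delay, here is a minimal sketch. The buffer size and link speed are hypothetical values chosen for illustration, not measurements from any particular router; the only idea taken from the text above is that a new packet has to wait behind everything already buffered.

```python
def queue_delay_seconds(buffered_bytes: int, link_bits_per_second: float) -> float:
    """Time a newly arrived packet waits behind data already in a first-come, first-served buffer."""
    return buffered_bytes * 8 / link_bits_per_second


# Hypothetical numbers: 256 KiB already buffered on a 1 Mbit/s uplink.
delay = queue_delay_seconds(256 * 1024, 1_000_000)
print(f"{delay:.2f} s")  # prints "2.10 s"
```

Under those assumed numbers, even a tiny packet arriving at the back of the line waits about two seconds before it is sent.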
What's the solution?
The members of the CeroWrt team have been working for the last two years to solve the problem of bufferbloat. We've largely succeeded: the CeroWrt firmware works really well. CeroWrt users no longer see problems with "the internet being slow" even when uploading and downloading files, watching videos, etc.
CeroWrt introduces a new queueing discipline called fq_codel [link to Toke's full technical description] that can detect flows (streams of data between two endpoints) that are using more than their share of the bottleneck link (usually, the connection to the ISP). It works by dividing the traffic into multiple queues, one per flow, and sending the first packet in each queue in round-robin order. (The algorithm is somewhat more involved, so read the full description for details.) fq_codel also measures the time that each packet has been queued. If a packet has been in the queue for "too long", then fq_codel discards it, which prevents the flow that sent it from using more than its fair share of bandwidth on the bottleneck.
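For readers who think in code, the description above can be sketched as a toy model. This is not the real fq_codel: the class name, the `max_wait` parameter, and the fixed "drop anything older than max_wait" rule are simplifications invented for this illustration (the real algorithm does not simply drop every packet past a fixed age). It shows only the two ideas from the paragraph: one queue per flow served round-robin, and discarding packets that have waited too long.

```python
from collections import OrderedDict, deque


class ToyFairQueue:
    """Toy model only: per-flow queues, round-robin service, drop stale packets."""

    def __init__(self, max_wait):
        self.max_wait = max_wait      # hypothetical "too long" threshold, in seconds
        self.flows = OrderedDict()    # flow_id -> deque of (enqueue_time, packet)
        self.dropped = []             # (flow_id, packet) pairs that were discarded

    def enqueue(self, flow_id, packet, now):
        self.flows.setdefault(flow_id, deque()).append((now, packet))

    def dequeue(self, now):
        """Return (flow_id, packet) for the next packet to send, or None if idle."""
        while self.flows:
            flow_id, queue = next(iter(self.flows.items()))
            # Discard packets at the head of this flow that have waited too long.
            while queue and now - queue[0][0] > self.max_wait:
                self.dropped.append((flow_id, queue.popleft()[1]))
            if not queue:
                del self.flows[flow_id]
                continue
            _, packet = queue.popleft()
            if queue:
                self.flows.move_to_end(flow_id)  # this flow goes to the back of the line
            else:
                del self.flows[flow_id]
            return flow_id, packet
        return None


fq = ToyFairQueue(max_wait=0.1)
for i in range(5):
    fq.enqueue("bulk", f"bulk-{i}", now=0.0)
fq.enqueue("voip", "voip-0", now=0.0)

# The lone voip packet is sent second, not behind all five bulk packets.
order = [fq.dequeue(now=0.0) for _ in range(3)]

# One second later the remaining bulk packets are stale and get dropped.
fq.enqueue("voip", "voip-1", now=1.0)
late = fq.dequeue(now=1.0)
```

In this run, `order` is bulk-0, voip-0, bulk-1, and `late` is the fresh voip-1 packet, with bulk-2 through bulk-4 ending up in `fq.dropped`.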
Wait a minute - discarding packets? Doesn't that make things worse?
It does slow the affected flow, but that is exactly what should happen. If a sender has sent so many packets that they're building up in the queue, then it's fair to offer back pressure for that particular flow by dropping some of its packets.
In the meantime, all the other flows (from the other queues) have their packets sent promptly, since they're not building up a queue and haven't been waiting a long time to be sent. This automatically keeps everything responsive: short packets and packets from low-volume flows automatically get sent first. The big senders, whose packets are dropped, will re-send the data, but at a slower rate, bringing the entire system back into balance.
What about Quality of Service (QoS)? Doesn't that help?
Yes, it helps a bit. If you configure your router for QoS, the router will use that information to prioritize certain packets and send them first, ahead of the bulk traffic that's buffered up. But there are several problems with QoS:
- It's annoying to configure QoS. You have to understand the configuration GUI of the router and manually make the changes. This is something that only a network geek could come to enjoy.
- If you use a new application, QoS may not help you until you adjust the rules to take it into account.
- It doesn't solve the problem of overbuffering. The QoS rules allow the router to send certain packets first. But the packets from large flows are still queued up, and will be sent at some point, potentially starving out other traffic.
- As a corollary, there's no throttling of the big senders: they don't get prompt feedback that their streams are using more than their fair share of the capacity, so they don’t fall back to a lower rate.
- Finally, QoS doesn't help for the "other direction". It can improve traffic being sent from your local (home) network toward the Internet. But if the equipment at the far end, at your DSL or cable provider, is bloated (and very often it is), then QoS in your router won't make things any better for traffic coming toward your local network.
The fq_codel and other algorithms in CeroWrt handle all this automatically. The only configuration parameters are what kind of link you have (DSL, Cable, etc.) and the speeds of those links. You don't have to adjust QoS settings or make other adjustments.
Does Bufferbloat affect my network?
Quite possibly - here's one symptom: If the network works well when no one else is using it (early morning, or late at night after everyone else is asleep), but gets really slow when others are "on the net", then you are likely to be suffering from Bufferbloat. Another symptom is a degradation of your voice, video chat, or gaming experience when others are using the network.
Here's a more scientific test: Start a ping test to a reliable host, like google.com. Examine the response times when no one is using the network (again, early morning or late at night). You will likely see ping times in the 30-100 msec range. Then do the same ping test when things are busy, say, while running a speed test and uploading or downloading a big file. If your router is bloated, the response times will often be as much as 10 to 100 times larger. For more details, see the Quick Test for Bufferbloat at: http://www.bufferbloat.net/projects/cerowrt/wiki/Quick_Test_for_Bufferbloat
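If you save the output of the two ping runs, the comparison can be done with a few lines of code. This is a rough sketch, not part of the Quick Test: the function names are made up for this example, it assumes ping output in the common "time=31.2 ms" style, and the sample lines below are invented placeholders rather than real measurements.

```python
import re
from statistics import median


def ping_times_ms(ping_output: str) -> list[float]:
    """Pull the round-trip times (in ms) out of saved ping output."""
    return [float(t) for t in re.findall(r"time[=<]([\d.]+)\s*ms", ping_output)]


def bloat_factor(idle_output: str, loaded_output: str) -> float:
    """How many times larger the median ping time is under load than when idle."""
    return median(ping_times_ms(loaded_output)) / median(ping_times_ms(idle_output))


# Invented sample output (203.0.113.1 is a documentation-only address).
idle = """64 bytes from 203.0.113.1: icmp_seq=1 ttl=56 time=31.2 ms
64 bytes from 203.0.113.1: icmp_seq=2 ttl=56 time=29.8 ms
64 bytes from 203.0.113.1: icmp_seq=3 ttl=56 time=33.0 ms"""

loaded = """64 bytes from 203.0.113.1: icmp_seq=1 ttl=56 time=410.0 ms
64 bytes from 203.0.113.1: icmp_seq=2 ttl=56 time=655.5 ms
64 bytes from 203.0.113.1: icmp_seq=3 ttl=56 time=502.3 ms"""

factor = bloat_factor(idle, loaded)
print(f"Ping times are about {factor:.1f}x larger under load")
```

The median is used here instead of the average so that one unusually slow reply does not dominate the result. A factor of 10 or more under load matches the symptom described above.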
What can I do about this?
Two years of network research have paid off: the networks work great at our houses. Our algorithms are being adopted and implemented in operating systems and some commercial network equipment. We are making the code changes available at no charge, and are encouraging all vendors to embrace this code. We are also pushing these changes into the OpenWrt firmware project (http://openwrt.org), so they will be available in many different routers. If your router is bloated (based on the test above), and you’re not willing to try OpenWrt, call your vendor's support line to ask when they're going to fix it. Tell them to read our site. We need the visibility across all kinds of network equipment to convince vendors to solve the problem everywhere.