[Bloat] Seen in passing: A little bump in the wire that makes your Internet faster

Wed Aug 8 09:15:22 EDT 2018

A report on addressing bufferbloat in a monopoly telco DSL lashup.

--dave

-------- Forwarded Message --------
Subject: 	A little bump in the wire that makes your Internet faster
Date: 	Wed, 08 Aug 2018 10:32:47 +0000
From: 	apenwarr - Business is Programming <>

http://apenwarr.ca/log/?m=201808#08
A little bump in the wire that makes your Internet faster

My parents live in a rural area, where the usual monopolist Internet 
service provider provides the usual monopolist Internet service: DSL, 
really far from the exchange point, very /very/ asymmetric, and with 
insanely oversized buffers (i.e. bufferbloat), especially in the upstream 
direction. The result is that, basically, if you tried to browse the web 
while uploading anything, it pretty much didn't work at all.

I wrote about the causes of these problems (software, of course) in my 
bufferbloat rant from 2011 <https://apenwarr.ca/log/?m=201101#10>. For some reason, there's been 
a recent resurgence of interest in that article. Upon rereading it, I 
(re-)discovered that it's very... uh... stream-of-consciousness. I find 
it interesting that some people like it so much. Even I barely 
understand what I wrote anymore. Also, it's now obsolete, because there 
are much better solutions to the problems than there used to be, so even 
people who understand it are not going to get the best possible results. 
Time for an update!

*The Challenge*

I don't live in the same city as my parents, and I won't be back for a 
few months, but I did find myself with some spare time and a desire to 
pre-emptively make their Internet connection more usable for the next 
time I visit. So, I wanted to build a device (a "bump in the wire" 
<https://en.wikipedia.org/wiki/Bump-in-the-wire>) that:

  * Needs zero configuration at install time
  * Does not interfere with the existing network (no DHCP, firewall,
    double NAT, etc)
  * Doesn't reduce security (no new admin ports in the data path)
  * Doesn't need periodic reboots
  * Actually solves their bufferbloat problem

Let me ruin the surprise: it works. I'll describe the actual setup 
<#openwrt> down below. But first, we have to define "works."

*This is an improvement, I promise!*

Here's the fast.com <http://fast.com/> test result before we installed 
the Bump.

(Side note: there are a lot of speedtests out there. I like fast.com for 
two reasons. First, they have an easy-to-understand bufferbloat test. 
Second, their owner has strong incentives to test actual Internet speeds 
including peering 
<https://www.cnet.com/news/fcc-whats-up-with-those-netflix-isp-peering-deals/>, 
and to bypass various monopolistic ISPs' various speedtest-cheating 
traffic shaping techniques.)

And here's what it looked like after we added the Bump:

...okay, so you're probably thinking, hey, that big number is lower now! 
It got worse! Yes. In a very narrow sense, it did get worse. But in 
/most/ senses (including all the numbers in smaller print), it got 
better. And even the big number is not as much worse as it appears at first.

Unfortunately, it would take far too long to explain exactly how these 
numbers interact and why it matters. But luckily for you, I'm on vacation!

*Download speed is the wrong measurement*

In my wifi data presentation <https://apenwarr.ca/log/?m=201603#28> from 
2016, I spent a lot of time exploring what makes an Internet connection 
feel "fast." In particular, I showed a slide from an FCC report from 
2015 
<https://www.fcc.gov/reports-research/reports/measuring-broadband-america/measuring-broadband-america-2015#_Toc431901638> 
(back when the FCC was temporarily anti-monopolist):

What's that slide saying? Basically, that beyond 20 Mbps or so, typical 
web page load times stop improving.^1 Sure, if you're downloading large 
files, a faster connection will make it finish sooner.^2 But most people 
spend most of their time just browsing, not downloading.

Web page load times are limited by things other than bandwidth, 
including javascript parsing time, rendering time, and (most relevant to 
us here) round trip times to the server. (Most people use "lag", 
"latency", and "round trip time" to mean about the same thing, so we'll 
do that here too.) Loading a typical web page requires several round 
trips: to one or more DNS servers, then the TCP three-way handshake, then 
SSL negotiation, then grabbing the HTML, then grabbing the javascript it 
points to, then grabbing whatever other files are requested by the HTML 
and javascript. If that's, say, 10 round trips, at 100ms each, you can 
see how a typical page would take at least a second to load, even with 
no bandwidth constraints. (Maybe there are fewer round trips needed, 
each with lower latencies; same idea.)
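
A minimal sketch of that arithmetic, assuming page load time is bounded only by sequential round trips (no bandwidth, parsing, or rendering cost). The function name is made up for illustration.

```python
def min_page_load_seconds(round_trips: int, rtt_ms: float) -> float:
    """Lower bound on load time when only sequential round trips matter."""
    return round_trips * rtt_ms / 1000.0


if __name__ == "__main__":
    # Same 10 round trips, different latencies: load time scales linearly with RTT.
    for rtt_ms in (100, 50, 25):
        seconds = min_page_load_seconds(10, rtt_ms)
        print(f"10 round trips at {rtt_ms} ms each: at least {seconds:.2f} s")
```

Halving the round trip time halves this floor, no matter how much bandwidth is available.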

So that's the first secret: if your page load times are limited by round 
trip time, and round trip time goes from 80ms (or 190ms) to 31ms (or 
42ms), then you could see a roughly 2.5x (or 4.5x) improvement in page load speed, 
just from cutting latency. Our Bump achieved that - which I'll explain 
in a moment.

It also managed to /improve/ the measured uplink speed in this test. How 
is that possible? Well, probably several interconnected reasons, but a 
major one is: TCP takes longer to get up to speed when the round trip 
time is longer. (One algorithm for this is called TCP slow start 
<https://en.wikipedia.org/wiki/TCP_congestion_control#Slow_start>.) And 
it has even more trouble converging if the round trip time is variable, 
like it was in the first test above. The Bump makes round trip time 
lower, but also more consistent, so it improves TCP performance in both 
ways.
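
To get a feel for why round trip time matters during ramp-up, here is a rough sketch. It assumes an idealized slow start that doubles the congestion window once per round trip with no losses, an initial window of 10 segments (the RFC 6928 value), and 1460-byte segments. Real stacks differ, so treat the output as order-of-magnitude only.

```python
import math


def slow_start_rtts(bandwidth_bps: float, rtt_s: float,
                    init_segments: int = 10, mss_bytes: int = 1460) -> int:
    """Round trips of window doubling needed before the window covers one BDP."""
    bdp_bytes = bandwidth_bps / 8 * rtt_s
    target_segments = bdp_bytes / mss_bytes
    if target_segments <= init_segments:
        return 0  # the initial window already fills the pipe
    return math.ceil(math.log2(target_segments / init_segments))


if __name__ == "__main__":
    for rtt_s in (0.100, 0.010):
        rtts = slow_start_rtts(100e6, rtt_s)
        print(f"RTT {rtt_s * 1000:.0f} ms: {rtts} round trips, "
              f"about {rtts * rtt_s:.2f} s to reach full speed")
```

A longer round trip costs twice: more doublings are needed because the BDP is bigger, and each doubling takes longer.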

*But how does it work?*

Alert readers will have noticed that by adding a Bump in the wire, that 
is, by adding an /extra/ box and thus extra overhead, I have managed to 
make latency /less/. Alert readers will hate this, as they should, 
because it's called "negative latency," and alert readers know that 
there is no such thing. (I tried to find a good explanation of it on the 
web, but all the pages I could find sucked. I guess that's fair, for a 
concept that does not exist. Shameless self-plug then: I did write a fun 
article involving this topic back in 2009 
<https://apenwarr.ca/log/?m=200910#29> about work we did back in 2003. 
Apparently I've been obsessing over this for a long time.)

So, right, the impossible. As usual, the impossible is a magic trick. 
Our Bump doesn't subtract latency; it just tricks another device - in 
this case the misconfigured DSL router provided by the monopolistic ISP 
- into adding /less/ latency, by precisely adding a bit of its own. The 
net result is less than the DSL router on its own.

*Bufferbloat (and chocolate)*

Stop me if you've heard this one before. Most DSL routers and cable 
modems have buffers that were sized to achieve the maximum steady-state 
throughput on a very fast connection - the one that the monopolistic ISP 
benchmarks on, for its highest priced plan. To max out the speed in such 
a case, you need a buffer some multiple of the "bandwidth delay 
product," (BDP) which is an easier concept than it sounds like: just 
multiply the bandwidth by the round trip time (delay). So if you have 
100ms round trip time and your upstream is about 25 Mbps = ~2.5 
MBytes/sec, then your BDP is 2.5 MBytes/sec * 0.1sec = 0.25 MBytes. If 
you think about it, the BDP is "how much data fits in the wire," the 
same way a pipe's capacity is how much water fits in the pipe. For 
example, if a pipe spits out 1L of water per second, and it takes 10 
seconds for water to traverse the pipe, then the pipe contains 1L/sec x 
10 seconds = 10L.
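
The same calculation as a small sketch (the function name is illustrative). It uses exactly 8 bits per byte rather than the roughly-10 shortcut in the text, so the 25 Mbps, 100 ms example comes out at about 0.31 MBytes.

```python
def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: how many bytes 'fit in the wire'."""
    bits_per_second = bandwidth_mbps * 1_000_000
    return bits_per_second / 8 * (rtt_ms / 1000)


if __name__ == "__main__":
    print(f"{bdp_bytes(25, 100) / 1e6:.2f} MBytes in flight at 25 Mbps, 100 ms")
    print(f"{bdp_bytes(5, 100) / 1e6:.4f} MBytes in flight at 5 Mbps, 100 ms")
```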

Anyway, the pipe is the Internet,^3 and we can't control the 
bandwidth-delay product of the Internet from our end. People spend a lot 
of time trying to optimize that, but they get paid a lot, and I'm on 
vacation, and they don't let me fiddle with their million-dollar 
equipment, so too bad. What I /can/ control is the equipment that feeds 
into the pipe: my router, or, in our plumbing analogy, the drain.

Duck in a bathtub drain vortex,
via pinterest

You know how when you drain the bathtub, near the end it starts making 
that sqlrshplshhh sucking noise? That's the sound of a pipe that's not 
completely full. Now, after a nice bath that sound is a key part of the 
experience and honestly makes me feel disproportionately gleeful, but it 
also means your drain is underutilized. Err, which I guess is a good 
thing for the environment. Uh.

Okay, new analogy: oil pipelines! Wait, those are unfashionable now too. 
Uh... beer taps... no, apparently beer is bad for diversity or 
something... chocolate fountains!

Chocolate fountain via indiamart 
<https://www.indiamart.com/proddetail/chocolate-fountain-machine-15067222530.html>

Okay! Let's say you rented one of those super fun chocolate fountain 
machines for a party: the ones where a pool of delicious liquid 
chocolate goes down a drain at the bottom, and then gets pumped back up 
to the top, only to trickle gloriously down a chocolate waterfall (under 
which you can bathe various fruits or whatever) and back into the pool, 
forever, until the party is over and the lucky, lucky party consultants 
get to take it home every night to their now-very-diabetic children.

Mmmm, tasty, tasty chocolate. What were we talking about again?

Oh right. The drain+pump is the Internet. The pool at the bottom is the 
buffer in your DSL router. And the party consultant is, uh, me, 
increasingly sure that I've ended up on the wrong side of this analogy, 
because you can't eat bits, and now I'm hungry.

Aaaaanyway, a little known fact about these chocolate fountain machines 
is that they stop dripping chocolate before the pool completely empties. 
In order to keep the pump running at capacity, there needs to be enough 
chocolate in the pool to keep it fully fed. In an ideal world, the 
chocolate would drip into the pool and then the pump at a perfectly 
constant rate, so you could adjust the total amount of chocolate in the 
system to keep the pool+pump content at the absolute minimum, which is 
the bandwidth-delay product (FINALLY HE IS BACK ON TOPIC). But that 
would require your chocolate to be far too watery; thicker chocolate is 
more delicious (SIGH), but has the annoying habit of dripping in clumps 
(as shown in the picture) and not running smoothly into the drain unless 
the pool has extra chocolate to push it along. So what we do is to make 
the chocolate thicker and clumpier (not negotiable) and so, to keep the 
machine running smoothly, we have to add extra chocolate so that the 
pool stays filled, and our children thus become more diabetic than would 
otherwise be necessary.

Getting back to the math of the situation, if you could guarantee 
perfectly smooth chocolate (packet) flow, the capacity of the system 
could be the bandwidth-delay product, which is the minimum you need in 
order to keep the chocolate (bits) dripping at the maximum speed. If you 
make a taller chocolate tower (increase the delay), you need more 
chocolate, because the BDP increases. If you supersize your choco-pump 
(get a faster link), it moves the chocolate faster, so you need more 
chocolate, because the BDP increases. And if your chocolate is more 
gloppy (bursty traffic), you need more chocolate (bits of buffer) to 
make sure the pump is always running smoothly.

Moving back into pure networking (FINALLY), we have very little control 
over the burstiness of traffic. We generally assume it follows some 
statistical distribution, but in any case, while there's an average flow 
rate, the flow rate will always fluctuate, and sometimes it fluctuates 
by a lot. That means you might receive very little traffic for a while 
(draining your buffer aka chocolate pool) or you might get a big burst 
of traffic all at once (flooding your buffer aka chocolate pool). 
Because of a phenomenon called self-similarity 
<https://www.cse.wustl.edu/~jain/cse567-06/ftp/traffic_models1/#self-similar>, 
you will often get the big bursts near the droughts, which means your 
pool will tend to fill up and empty out, or vice versa.

(Another common analogy for network traffic is road traffic. When a road 
is really busy, car traffic naturally arranges itself into bursts 
<https://www.researchgate.net/publication/245307183_Self-Similar_Characteristics_of_Vehicle_Arrival_Pattern_on_Highways>, 
just like network traffic does.)

Okay! So your router is going to receive bursts of traffic, and the 
amount of data in transit will fluctuate. To keep your uplink fully 
utilized, there must always be 1 BDP of traffic in the Internet link 
(the round trip from your router to whatever server and back). To fill 
the Internet uplink, you need to have a transmit queue in the router 
with packets. Because the packets arrive in bursts, you need to keep 
that transmit queue nonempty: there's an ideal fill level so that it 
(almost) never empties out, but so our children don't get unnecessarily 
diabetic, um, I mean, so that our traffic is not unnecessarily delayed.

An empty queue isn't our only concern: the router has limited memory. If 
the queue memory fills up because of a really large incoming burst, then 
the only thing we can do is throw away packets, either the newly-arrived 
ones ("tail drop") or some of the older ones ("head drop" or more 
generally, "active queue management").

When we throw away packets, TCP slows down. When TCP slows down, you get 
slower speedtest results. When you get slower speedtest results, and 
you're a DSL modem salesperson, you sell fewer DSL modems. So what do we 
do? We add more RAM to DSL modems so hopefully the queue /never/ fills 
up.^4 The DSL vendors who don't do this get a few percent slower speeds 
in the benchmarks, so nobody buys their DSL modem. Survival of the fittest!

...except, as we established earlier, that's the wrong benchmark. If 
customers would time page load times instead of raw download speeds, 
shorter buffers would be better. But they don't, unless they're the FCC 
in 2015, and we pay the price. (By the way, if you're an ISP, use better 
benchmarks 
<https://www.bufferbloat.net/projects/codel/wiki/RRUL_test_suite/>! 
Seriously.)

So okay, that's the (very long) story of what went wrong. That's 
"bufferbloat 
<https://www.bufferbloat.net/projects/bloat/wiki/Introduction/>." How do 
we fix it?

*"Active" queue management*

Imagine for a moment that we're making DSL routers, and we want the best 
of both worlds: an "unlimited" queue so it never gets so full we have to 
drop packets, and the shortest possible latency. (Now we're in the realm 
of pure fiction, because real DSL router makers clearly don't care about 
the latter, but bear with me for now. We'll get back to reality later.)

What we want is to have lots of /space/ in the queue - so that when a 
really big burst happens, we don't have to drop packets - but for the 
/steady state/ length of the queue to be really short.

But that raises a question. Where does the steady state length of the 
queue come from? We know why a queue can't be mainly empty - because we 
wouldn't have enough packets to keep the pipe full - and we know that 
the ideal queue utilization has something to do with the BDP and the 
burstiness. But who controls the rate of /incoming/ traffic into the router?

The answer: nobody, directly. The Internet uses a very weird distributed 
algorithm (or family of algorithms) called "TCP congestion control 
<https://en.wikipedia.org/wiki/TCP_congestion_control>." The most common 
TCP congestion controls (Reno and CUBIC) will basically just keep 
sending faster and faster until packets start getting dropped. Dropped 
packets, the thinking goes, mean that there isn't enough capacity so 
we'd better slow down. (This is why, as I mentioned above, TCP slows 
down when packets get dropped. It's designed that way.)

Unfortunately, a side effect of this behaviour is that the obvious dumb 
queue implementation - FIFO - will always be full. That's because the 
obvious dumb router doesn't drop packets until the queue is full. TCP 
doesn't slow down until packets are dropped,^5 so it doesn't slow down 
until the queue is full. If the queue is not full, TCP will speed up 
until packets get dropped.^6

So, all these TCP streams are sending as fast as they can until packets 
get dropped, and that means our queue fills up. What can we do? Well, 
perversely... we can drop packets /before/ our queue fills up. As far as 
I know, the first proposal of this idea was Random Early Detection (RED) 
<http://www.icir.org/floyd/papers/red/red.html>, by Sally Floyd and Van 
Jacobson. The idea here is that we calculate the ideal queue utilization 
(based on throughput and burstiness), then drop more packets if we 
exceed that length, and fewer packets if we're below that length.
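
As a rough illustration of the RED idea, here is the classic linear drop curve: no drops below a minimum threshold, certain drop above a maximum, and a probability that ramps up in between. The threshold values are hypothetical, and picking them well is exactly the hard part.

```python
def red_drop_probability(avg_queue: float, min_th: float,
                         max_th: float, max_p: float) -> float:
    """Drop probability as a function of the (smoothed) average queue length."""
    if avg_queue < min_th:
        return 0.0  # queue is short: never drop
    if avg_queue >= max_th:
        return 1.0  # queue is far too long: always drop
    # In between, ramp linearly from 0 up to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


if __name__ == "__main__":
    # Hypothetical thresholds, in packets.
    for avg in (5, 20, 40):
        print(avg, red_drop_probability(avg, min_th=10, max_th=30, max_p=0.1))
```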

The only catch is that it's super hard to calculate the ideal queue 
utilization. RED works great if you know that value, but nobody ever 
does. I think I heard that Van Jacobson later proved that it's 
impossible to know that value, which explains a lot. Anyway, this led to 
the development of Controlled Delay (CoDel) 
<https://en.wikipedia.org/wiki/CoDel>, by Kathleen Nichols and Van 
Jacobson. Instead of trying to figure out the ideal queue size in 
packets, CoDel just sees how long it takes for packets to traverse the 
queue. If it consistently takes "too long," then it starts dropping 
packets, which signals TCP to slow down, which shortens the average 
queue length, which means a shorter delay. The cool thing about this 
design is it's nearly configuration-free: "too long," in milliseconds, 
is pretty well defined no matter how fast your link is. (Note: CoDel has 
a lot of details I'm skipping over here. Read the research paper if you 
care.)
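
For the curious, a simplified sketch of the sojourn-time idea, loosely following the pseudocode in RFC 8289. The 5 ms target and 100 ms interval are that RFC's defaults. The class and method names are made up, and several details of the real algorithm are left out (the one-MTU backlog check, and reusing the drop count when re-entering the dropping state), so this is an illustration rather than a faithful implementation.

```python
import math
from collections import deque

TARGET = 0.005    # acceptable standing queue delay, seconds
INTERVAL = 0.100  # how long delay must stay above TARGET before dropping


class CoDelQueue:
    def __init__(self):
        self.q = deque()         # entries are (enqueue_time, packet)
        self.first_above = None  # deadline set when delay first exceeds TARGET
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0

    def enqueue(self, packet, now):
        self.q.append((now, packet))

    def _pop(self, now):
        """Pop the head packet; also report whether dropping it is allowed."""
        if not self.q:
            self.first_above = None
            return None, False
        t_in, packet = self.q.popleft()
        if now - t_in < TARGET:
            self.first_above = None  # delay is fine again: reset
            return packet, False
        if self.first_above is None:
            self.first_above = now + INTERVAL  # start the clock
            return packet, False
        return packet, now >= self.first_above

    def dequeue(self, now):
        packet, ok_to_drop = self._pop(now)
        if self.dropping:
            if not ok_to_drop:
                self.dropping = False
            while self.dropping and packet is not None and now >= self.drop_next:
                self.count += 1
                packet, ok_to_drop = self._pop(now)  # drop, then take the next one
                if ok_to_drop:
                    # Drop faster and faster until the delay comes down.
                    self.drop_next += INTERVAL / math.sqrt(self.count)
                else:
                    self.dropping = False
        elif ok_to_drop:
            packet, _ = self._pop(now)  # drop one packet, deliver the next
            self.dropping = True
            self.count = 1
            self.drop_next = now + INTERVAL / math.sqrt(self.count)
        return packet
```

Note that nothing here depends on the link speed or the queue size in packets; the only inputs are timestamps.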

Anyway, sure enough, CoDel really works, and you don't need to configure 
it. It produces the best of both worlds: typically short queues that can 
absorb bursts. Which is why it's so annoying that DSL routers still 
don't use it. Jerks. Seriously.

*Flow queueing (FQ)*

A discussion on queue management wouldn't be complete without a 
discussion about flow queueing (FQ), the second half of the now very 
popular (except among DSL router vendors) fq_codel magic combination.

CoDel is a very exciting invention that should be in approximately every 
router, because it can be implemented in hardware, requires almost no 
extra memory, and is very fast. But it does have some limitations: it 
takes a while to converge, and it's not really "fair."^7 Burstiness in 
one stream (or ten streams) can increase latency for another, which 
kinda sucks.

Imagine, for example, that I have an ssh session running. It uses almost 
no bandwidth: most of the time it just goes as fast as I can type, and 
no faster. But I'm also running some big file transfers, both upload and 
download, and that results in an upload queue that has something to do 
with the BDP and burstiness of the traffic, which could build up to 
hundreds of extra milliseconds. If the big file transfers /weren't/ 
happening, my queue would be completely empty, which means my ssh 
traffic would get through right away, which would be optimal (just the 
round trip time, with no queue delay).

A naive way to work around this is prioritization: whenever an ssh 
packet arrives, put it at the front of the queue, and whenever a "bulk 
data" packet arrives, put it at the end of the queue. That way, ssh 
never has to wait. There are a few problems with that method though. For 
example, if I use scp to copy a large file /over/ my ssh session, then 
that file transfer takes precedence over everything else. Oops. If I use 
ssh on a different port, there's no way to tag it. And so on. It's very 
brittle.

FQ tries to give you (nearly) the same low latency, even on a busy link, 
with no special configuration. To make a long story short, it keeps a 
separate queue for every active flow (or stream), then alternates 
"fairly"^7 between them. Simple round-robin would work pretty well, but 
they take it one step further, detecting so-called "fat" flows (which 
send as fast as they can) and "thin" flows (which send slower than they 
can) and giving higher priority to the thin ones. An interactive ssh 
session is a thin flow; an scp-over-ssh file transfer is a fat flow.

And then you put CoDel on each of the separate FQ queues, and you get 
Linux's fq_codel, which works really well.
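
A toy sketch of the flow queueing half, assuming one queue per flow and plain per-packet round robin. The real fq_codel hashes flows into a fixed number of buckets, schedules by bytes rather than packets, and treats new flows specially; none of that is modelled here, and the names are illustrative.

```python
from collections import OrderedDict, deque


class FlowQueue:
    def __init__(self):
        # flow_id -> packets waiting for that flow, kept in service order
        self.flows = OrderedDict()

    def enqueue(self, flow_id, packet):
        self.flows.setdefault(flow_id, deque()).append(packet)

    def dequeue(self):
        if not self.flows:
            return None
        flow_id, packets = self.flows.popitem(last=False)  # flow at the front
        packet = packets.popleft()
        if packets:
            self.flows[flow_id] = packets  # still busy: go to the back of the line
        return packet


if __name__ == "__main__":
    fq = FlowQueue()
    for i in range(100):
        fq.enqueue("scp", f"bulk-{i}")   # a fat flow with a long backlog
    fq.enqueue("ssh", "keystroke")       # a thin flow arriving last
    print([fq.dequeue() for _ in range(3)])
```

In a single FIFO the keystroke would wait behind all 100 bulk packets; here it goes out second.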

Incidentally, it turns out that FQ alone - forget about CoDel or any 
other active queue management - gets you most of the benefits of CoDel, 
plus more. You have really long queues for your fat flows, but the thin 
flows don't care. The CoDel part still helps (for example, if you're 
doing a videoconference, you really want the latency inside that one 
video stream to be as low as possible; and TCP always works better with 
lower latency), and it's cheap, so we include it. But FQ has very 
straightforward benefits that are hard to resist, as long as you and FQ 
agree on what "fairness"^7 means.

FQ is a lot more expensive than CoDel: it requires you to maintain more 
queues - which costs more memory and CPU time and thrashes the cache 
more - and you have to screw around with hash table algorithms, and so 
on. As far as I know, nobody knows how to implement FQ in hardware, so 
it's not really appropriate for routers running at the limit of their 
hardware capacity. This includes super-cheap home routers running 
gigabit ports, or backbone routers pushing terabits. On the other hand, 
if you're limited mainly by wifi (typically much less than a gigabit) or 
a super slow DSL link, the benefits of FQ outweigh its costs.^8

*Back to the Bump*

Ok, after all that discussion about CoDel and FQ and fq_codel, you might 
have forgotten that this whole exercise hinged on the idea that we were 
making DSL routers, which we aren't, but if we were, we could really cut 
down that latency. Yay! Except that's not us, it's some hypothetical 
competent DSL router manufacturer.

I bet you're starting to guess what the Bump is, though, right? You 
insert it between your DSL modem and your LAN, and it runs fq_codel, and 
it fixes all the queuing, and life is grand, right?

Well, almost. The problem is, the Bump has two ethernet ports, the LAN 
side and the WAN side, and they're both really fast (in my case, 100 
Mbps ethernet, but they could be gigabit ethernet, or whatever). So the 
data comes in at 100 Mbps, gets enqueued, then gets dequeued at 100 
Mbps. If you think about it for a while, you'll see this means the queue 
length is always 0 or 1, which is... really short. No bufferbloat there, 
which means CoDel won't work, and no queue at all, which means there's 
nothing for FQ to prioritize either.

What went wrong? Well, we're missing one trick. We have to release the 
packets out the WAN port (toward the DSL modem) more slowly. Ideally, we 
want to let them out perfectly smoothly at exactly the rate that the DSL 
modem can transmit them over the DSL link. This will allow the packets 
to enqueue in the Bump instead, where we can fq_codel them, and will 
leave the DSL modem's dumb queue nearly empty. (Why can that queue be 
empty without sacrificing DSL link utilization? Because the /burstiness/ 
going into the DSL modem is near zero, thanks to our smooth release of 
packets from the Bump. Remember our chocolate fountain: if the chocolate 
were perfectly smooth, we wouldn't need a pool of chocolate at the 
bottom. There would always be exactly the right amount of chocolate to 
keep the pump going.)

Slowing down the packet outflow from the Bump is pretty easy using 
something called a token bucket filter (tbf). But it turns out that 
nowadays there's a new thing called "cake" which is basically 
fq_codel+tbf combined 
<https://www.bufferbloat.net/projects/codel/wiki/Cake/>. Combining them 
has some advantages that I don't really understand, but one of them is 
that it's really easy to set up. You just load the cake qdisc, tell it 
the upload and download speeds, and it does the magic. Apparently it's 
also less bursty and takes less CPU. So use that.
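
For reference, the token bucket idea can be sketched in a few lines. This is a generic illustration with made-up names, not how cake is implemented internally: tokens accumulate at the configured rate up to a small burst allowance, and a packet may leave only when enough tokens are available.

```python
class TokenBucket:
    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0   # refill rate in bytes per second
        self.burst = burst_bytes     # bucket size: the largest allowed burst
        self.tokens = burst_bytes    # start with a full bucket
        self.last = 0.0              # time of the previous refill

    def try_send(self, size_bytes: int, now: float) -> bool:
        """Return True if a packet of this size may leave at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False  # a shaper would hold the packet and retry later


if __name__ == "__main__":
    # Hypothetical 800 kbit/s uplink, bursts limited to one 1500-byte packet.
    tb = TokenBucket(rate_bps=800_000, burst_bytes=1500)
    for t in (0.0, 0.005, 0.020):
        print(t, tb.try_send(1500, now=t))
```

A shaper queues the packets that fail this check (which is where fq_codel gets something to work on); a policer would simply drop them.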

The only catch is... what upload/download speeds should we give to cake? 
Okay, I cheated for that one. I just asked my dad what speed his DSL 
link goes in real life, and plugged those in. (Someday I want to build a 
system that can calculate this automatically, but... it's tricky.)

*But what about the downstream?*

Oh, you caught me! All that stuff was talking about the upstream 
direction. Admittedly, on DSL links, the upstream direction is usually 
the worst, because it's typically about 10x slower than the downstream, 
which means upstream bufferbloat problems are about 10x worse than 
downstream. But of course, not to be left out, the people making the big 
heavy multi-port DSL equipment at the ISP added plenty of bufferbloat 
too. Can we fix that?

Kind of. I mean, ideally they'd get a Bump over on their end, between 
the ISP and their DSL megarouter, which would manage the uplink's queue. 
Or, if we're dreaming anyway, the surprisingly competent vendor of the 
DSL megarouter would just include fq_codel, or at least CoDel, and they 
wouldn't need an extra Bump. Fat chance.

It turns out, though, that if you're crazy enough, you can almost make 
it work in the downstream direction. There are two catches: first, FQ is 
pretty much impossible (the downstream queue is just one queue, not 
multiple queues, so tough). And second, it's a pretty blunt instrument. 
What you can do is throw away packets /after/ they've traversed the 
downstream queue, a process called "policing" (as in, we punish your 
stream for breaking the rules, rather than just "shaping" all streams so 
that they follow the rules). With policing, the best you can do is 
detect that data is coming in too fast, and start dropping packets to 
slow it down. Unfortunately, the CoDel trick - dropping traffic only if 
the queue is persistently too long - doesn't work, because on the 
receiving side, you don't know how big the queue is. When you get a 
packet from the WAN side, you just send it to the LAN side, and there's 
no bottleneck, so your queue is always empty. You have to resort to just 
throwing away packets whenever the incoming rate is even /close/ to the 
maximum. That is, you have to police to a rate somewhat slower than the 
DSL modem's downlink speed.

Whereas in the upload direction, you could use, say, 99.9% of the upload 
rate and still have an empty queue on the DSL router, you don't have the 
precise measurements needed for that in the download direction. In my 
experience you have to use 80-90%.

That's why the download speed in the second fast.com test at the top of 
this article was reduced from the first test: I set the shaping rate 
pretty low. (I think I set it /too/ low, because I wanted to ensure it 
would cut the latency. I had to pick some guaranteed-to-work number 
before shipping the Bump cross-country to my parents, and I only got one 
chance. More tuning would help.)

*Phew!*

I know, right? But, assuming you read all that, now you know how the 
Bump works. All that's left is learning how to build one.

*BYOB (Build Your Own Bump)*

Modern Linux already contains cake, which is almost all you need. So any 
Linux box will do, but the obvious choice is a router where you install 
openwrt. I used a D-Link DIR-825 because I didn't need it to go more 
than 100 Mbps (that's a lot faster than a 5 Mbps DSL link) and I liked 
the idea of a device with 0% proprietary firmware. But basically any 
openwrt hardware will work, as long as it has at least two ethernet ports.

You need a sufficiently new version of openwrt. I used 18.06.0. From 
there, install the SQM packages, as described in the openwrt wiki 
<https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm>.

*Setting up the cake queue manager*

This part is really easy: once the SQM packages are installed in 
openwrt, you just activate them in the web console. First, enable SQM 
like this:

In the Queue Discipline tab, make sure you're using cake instead of 
whatever the overcomplicated and mostly-obsolete default is:

(You could mess with the Link Layer Adaptation tab, but that's mostly 
for benchmark twiddlers. You're unlikely to notice if you just set your 
download speed to about 80%, and upload speed to about 90%, of the 
available bandwidth. You should probably also avoid the "advanced" 
checkboxes. I tried them and consistently made things worse.)
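
A tiny helper for working out the numbers to type in, assuming the SQM fields expect kbit/s and using the 80% and 90% rules of thumb above. The 5 Mbps downstream figure matches the DSL link mentioned earlier; the 0.5 Mbps upstream is a hypothetical value for illustration.

```python
def sqm_rates_kbps(measured_down_mbps: float, measured_up_mbps: float,
                   down_fraction: float = 0.80, up_fraction: float = 0.90):
    """Return (download, upload) shaper rates in kbit/s."""
    down = round(measured_down_mbps * 1000 * down_fraction)
    up = round(measured_up_mbps * 1000 * up_fraction)
    return down, up


if __name__ == "__main__":
    down, up = sqm_rates_kbps(5.0, 0.5)
    print(f"download: {down} kbit/s, upload: {up} kbit/s")
```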

If you're boring, you now have a perfectly good wifi/ethernet/NAT router 
that happens to have awesome queue management. Who needs a Bump? Just 
throw away your old wifi/router/firewall and use this instead, attached 
to your DSL modem.

*Fancy bridge mode*

...On the other hand, if, like me, you're not boring, you'll want to 
configure it as a bridge, so that nothing else about the destination 
network needs to be reconfigured when you install it. This approach just 
feels more magical, because you'll have a physical box that produces 
negative latency. It's not as cool if the negative and positive 
latencies are added together all in one box; that's just latency.

What I did was to configure the port marked "4" on the DIR-825 to talk 
to its internal network (with a DHCP server), and configure the port 
marked "1" to bridge directly to the WAN port. I disabled ports 2 and 3 
to prevent bridging loops during installation.

To do this, I needed two VLANs, like this:

(Note: the DIR-825 labels have the ports in the opposite order from 
openwrt. In this screenshot, port LAN4 is on VLAN1, but that's labelled 
"1" on the physical hardware. I wanted to be able to say "use ports 1 
and WAN" when installing, and reserve port 4 only for configuration 
purposes, so I chose to go by the hardware labels.)

Next, make sure VLAN2 (aka eth0.2) is /not/ bridged to the wan port 
(it's the management network, only for configuring openwrt):

And finally, bridge VLAN1 (aka eth0.1) with the wan port:

You may need to reboot to activate the new settings.

*Footnotes*

^1 Before and since that paper in 2015, many many people have been 
working on cutting the number of round trips, not just the time per 
round trip. Some of the recent improvements include TCP fast open 
<https://en.wikipedia.org/wiki/TCP_Fast_Open>, TLS session resumption 
<https://hpbn.co/transport-layer-security-tls/#tls-session-resumption>, 
and QUIC <https://en.wikipedia.org/wiki/QUIC> (which opens encrypted 
connections in "zero" round trips). And of course, javascript and 
rendering engines have both gotten faster, cutting the other major 
sources of page load times. (Meanwhile, pages have continued getting 
larger 
<https://www.wired.com/2016/04/average-webpage-now-size-original-doom/>, 
sigh.) It would be interesting to see an updated version of the FCC's 
2015 paper to see if the curve has changed.

^2 Also, if you're watching videos, a faster connection will improve 
video quality (peaking at about 5 Mbps/stream for a 1080p stream or 25 
Mbps/stream for 4K <https://help.netflix.com/en/node/306>, in Netflix's 
case). But even a 20 Mbps Internet connection will let you stream four 
HD videos at once, which is more than most people usually need to do.

^3 We like to make fun of politicians, but it's actually very accurate 
to describe the Internet as a "series of tubes," albeit virtual ones.

^4 A more generous interpretation is that DSL modems end up with a queue 
size calculated using a reasonable formula, but for one particular use 
case, and fixed to a number of bytes. For example, a 100ms x 100 Mbps 
link might need 0.1s x 100 Mbit/sec x ~0.1 bytes/bit = 1 Mbyte of 
buffer. But on a 5 Mbit/sec link, that same 1 Mbyte would take 10 Mbits 
/ 5 Mbit/sec = 2 seconds to empty out, which is way too long. 
Unfortunately, until a few years ago, nobody understood that too-large 
buffers could be just as destructive as too-small ones. They just 
figured that maxing out the buffer would max out the benchmark, and that 
was that.
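
The drain-time arithmetic can be checked with a short sketch (the function name is illustrative). It uses exactly 8 bits per byte, so the result for a 1 MByte buffer on a 5 Mbit/sec link is 1.6 seconds rather than the rounded 2 seconds above; the conclusion is the same.

```python
def drain_seconds(buffer_bytes: float, link_mbps: float) -> float:
    """Time for a full buffer to empty onto a link of the given speed."""
    return buffer_bytes * 8 / (link_mbps * 1_000_000)


if __name__ == "__main__":
    one_mbyte = 1_000_000
    for mbps in (100, 5):
        print(f"1 MByte at {mbps} Mbit/sec: {drain_seconds(one_mbyte, mbps):.2f} s")
```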

^5 Various TCP implementations try to avoid this situation. My favourite 
is the rather new TCP BBR 
<https://cloudplatform.googleblog.com/2017/07/TCP-BBR-congestion-control-comes-to-GCP-your-Internet-just-got-faster.html>, 
which does an almost magically good job of using all available bandwidth 
without filling queues. If everyone used something like BBR, we mostly 
wouldn't need any of the stuff in this article.

^6 To be more precise, in a chain of routers, only the "bottleneck" 
router's queue will be full. The others all have excess capacity because 
the link attached to the bottleneck is overloaded. For a home Internet 
connection, the bottleneck is almost always the home router, so this 
technicality doesn't matter to our analysis.

^7 Some people say that "fair" is a stupid goal in a queue. They 
probably say this because fairness is so hard to define: there is no 
queue that can be fair by all possible definitions, and no definition of 
fair will be the "best" thing to do in all situations. For example, 
let's say I'm doing a videoconference call that takes 95% of my 
bandwidth and my roommate wants to visit a web site. Should we now each 
get 50% of the bandwidth? Probably not: video calls are much more 
sensitive to bandwidth fluctuations, whereas when loading a web page, it 
mostly doesn't matter if it takes 3 seconds instead of 1 second right 
now, as long as it loads. I'm not going to try to take sides in this 
debate, except to point out that if you use FQ, the latency for most 
streams is much lower than if you don't, and I really like low latency.

^8 Random side note: FQ is also really annoying because it makes your 
pings look fast even when you're building up big queues. That's because 
pings are "thin" and so they end up prioritized in front of your fat 
flows. Weirdly, this means that most benchmarks of FQ vs fq_codel show 
exactly the same latencies; FQ hides the CoDel improvements unless you 
very carefully code your benchmarks.
