[Cerowrt-devel] [Bloat] DOCSIS 3+ recommendation?

Jonathan Morton chromatix99 at gmail.com
Fri Mar 20 14:14:03 EDT 2015


> On 20 Mar, 2015, at 16:11, Livingood, Jason <Jason_Livingood at cable.comcast.com> wrote:
> 
>> Even when you get to engineers in the organizations who build the equipment, it's hard.  First you have to explain that "more is not better", and "some packet loss is good for you".
> 
> That’s right, Jim. The “some packet loss is good” part is from what I have seen the hardest thing for people to understand. People have been trained to believe that any packet loss is terrible, not to mention that you should never fill a link to capacity (meaning either there should never be a bottleneck link anywhere on the Internet and/or that congestion should never occur anywhere). 

That’s a rather interesting combination of viewpoints to have - and very revealing, too, of a fundamental disconnect between their mental theory of how the Internet works and how the Internet is actually used.

So here are some talking points that might be useful in an elevator pitch.  The wording will need to be adjusted to circumstances.

In short, they’re thinking only about the *core* Internet.  There, not being the bottleneck is a reasonably good idea, and packet loss is a reasonable metric of performance.  Buffers are used to absorb momentary bursts exceeding the normal rate, and since the link is supposed to never be congested, it doesn’t matter for latency how big those buffers are.  Adding capacity to satisfy that assumption is relatively easy, too - just plug in another 10G Ethernet module for peering, or another optical transceiver on a spare light-frequency for transit.  Or so I hear.

But nobody sees the core Internet except a few technician types in shadowy datacentres.  At least 99.999% of Internet users have to deal with the last mile on a daily basis - and it’s usually the last mile that is the bottleneck, unless someone *really* screwed up on a peering arrangement.  The key technologies in the last mile are the head-end, the CPE modem, and the CPE router; the last two may well live in the same physical box.  Those three are where we’re focusing our attention.

There, the basic assumption that the link should never be loaded to capacity is utter bunk.  The only common benchmarks of Internet performance that most people have access to (and which CPE vendors run) do precisely that: load the link to capacity and see just how big the resulting bandwidth number can be made.  And as soon as anyone starts a big TCP/IP-based upload or download, such as a software update or a video, the TCP stack in any modern OS will do its level best to load the link to capacity - and beyond.  This is more than a simple buffer - of *any* size - can deal with.
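
To put a number on that "simple buffer of any size" point, here is a toy simulation in Python - with assumed rates and buffer sizes, not a model of any real modem - of a drop-tail FIFO fed by a sender pushing 25% faster than the link can drain.  Whatever buffer you give it ends up standing full, so a bigger buffer buys no extra goodput, only extra delay:

    # Toy illustration only (assumed figures, not a model of real hardware):
    # a saturating sender feeding a drop-tail FIFO.  The queue ends up sitting
    # at the buffer limit, so queueing delay scales with buffer size while
    # goodput stays pinned at the link rate.

    LINK_RATE = 10e6 / 8        # assume a 10 Mbit/s link, in bytes per second
    PACKET    = 1500            # bytes per packet
    DT        = 0.001           # simulate in 1 ms steps

    def standing_delay(buffer_bytes, seconds=10, overload=1.25):
        """Queueing delay after `seconds` of a sender pushing `overload` x the link rate."""
        queue = 0.0
        for _ in range(int(seconds / DT)):
            queue += overload * LINK_RATE * DT         # bytes arriving this step
            queue  = min(queue, buffer_bytes)          # drop-tail: the rest is simply lost
            queue  = max(0.0, queue - LINK_RATE * DT)  # bytes the link drains this step
        return queue / LINK_RATE                       # time for the last byte to reach the wire

    for buf_pkts in (64, 256, 1024):
        delay = standing_delay(buf_pkts * PACKET)
        print(f"{buf_pkts:5d}-packet buffer -> ~{delay * 1000:4.0f} ms standing queue")

The drops still happen once the buffer is full; all the extra buffer bought was latency.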

As an aside, it’s occasionally difficult to convince last-mile ISPs that packet loss (of several percent, due to line quality, not congestion) *is* a problem.  But in that case, it’s probably because it would cost money (and thus profit margin) to send someone out to fix the underlying physical cause.  It really is a different world.

Once upon a time, the receive window of TCP was limited to 64 KB, and the momentary bursts that could be expected from a single flow were limited accordingly.  Those days are long gone.  Given the chance, a modern TCP stack will increase the receive and congestion windows to multi-megabyte proportions.  Even on a premium 100 Mbps cable or FTTC downlink (which most consumers can’t afford and often can’t even obtain), that corresponds to roughly a whole second of buffering - an order of magnitude above the usual rule of thumb for buffer sizing, which is about one round-trip time, i.e. on the order of 100 ms.  On slower links, the proportions are even more outrageous.  Something to think about next time you’re negotiating microseconds with a high-frequency trading outfit.
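
The arithmetic is simply window size divided by link rate.  A quick sanity check with assumed, purely illustrative figures - a 12 MiB window and a few representative downlink speeds:

    # Back-of-the-envelope only: how long can a fully-opened TCP window
    # sit queued in front of a given downlink?  (Illustrative figures.)

    WINDOW = 12 * 1024 * 1024          # assume a 12 MiB receive/congestion window

    for mbps in (100, 20, 4):          # premium cable/FTTC, mid-range DSL, slow DSL
        bytes_per_sec = mbps * 1e6 / 8
        print(f"{mbps:3d} Mbit/s downlink: up to {WINDOW / bytes_per_sec:5.1f} s of buffering")

That is the whole second at 100 Mbps, and tens of seconds at DSL speeds.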

I count myself among the camp of “packet loss is bad”.  However, I have the sense to realise that if more packets are persistently coming into a box than can be sent out the other side, some of those packets *will* be lost, sooner or later.  What AQM does is to signal (either through early loss or ECN marking) to the TCP endpoints that the link capacity has been reached, and it can stop pushing now - please - thank you.  This allows the buffer to do its designed job of absorbing momentary bursts.
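
To be clear about what "signal early" means in practice, here is a deliberately over-simplified, CoDel-flavoured sketch - not the real CoDel or PIE algorithm, and the 5 ms / 100 ms constants are just the commonly quoted defaults, assumed here for illustration.  The decision runs as packets leave the queue: if the time packets spend waiting has stayed above a small target for a sustained interval, drop or ECN-mark one to tell TCP to back off, long before the buffer physically overflows:

    # Simplified sketch of the AQM idea (illustration, not the real algorithm):
    # signal the sender once the *standing* queue delay has persisted too long,
    # instead of waiting for the buffer to overflow.

    TARGET   = 0.005    # 5 ms of acceptable standing queue delay (assumed default)
    INTERVAL = 0.100    # delay must persist this long before we start signalling

    class ToyAQM:
        def __init__(self):
            self.over_target_since = None

        def should_signal(self, sojourn_time, now):
            """sojourn_time: how long the packet being dequeued sat in the queue."""
            if sojourn_time < TARGET:
                self.over_target_since = None   # queue is draining fine; reset
                return False
            if self.over_target_since is None:
                self.over_target_since = now    # start the clock on this episode
                return False
            # Above target for a whole interval: drop this packet (or set its
            # ECN CE mark) so the TCP sender eases off while the buffer still
            # has headroom left for genuine bursts.
            return now - self.over_target_since >= INTERVAL

    aqm = ToyAQM()
    print(aqm.should_signal(sojourn_time=0.020, now=0.0))   # False: just started the clock
    print(aqm.should_signal(sojourn_time=0.020, now=0.2))   # True: 20 ms standing queue for 200 ms

Momentary bursts still sail straight through, because they drain back below the target before the interval expires.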

Given that last-mile links are often congested, it becomes important to distinguish between latency-sensitive and throughput-sensitive traffic flows.  VoIP and online gaming are the most obvious examples of latency-sensitive traffic, but Web browsing is *also* more latency-sensitive than throughput-sensitive, for typical modern Web pages.  Video streaming, software updates and uploading photos are good examples of throughput-sensitive applications; latency doesn’t matter much to them, since all they want to do is use the full link capacity.

The trouble is that often, in the same household, there are several different people using the same last-mile link, and they will tend to get home and spend their leisure time on the Internet at roughly the same time as each other.  The son fires up his console to frag some noobs, and Mother calls her sister over VoIP; so far so good.  But then Father decides on which movie to watch later that evening and starts downloading it, and the daughter starts uploading photos from her school field trip to goodness knows where.  So there are now two latency-sensitive and two throughput-sensitive applications using this single link simultaneously, and the throughput-sensitive ones have immediately loaded the link to capacity in both directions (one each).

So what happens then?  You tell me - you know your hardware the best.  Or haven’t you measured its behaviour under those conditions? Oh, for shame!

Okay, I’ll tell you what happens with 99.9% of head-end and CPE hardware out there today:  Mother can’t hear her sister properly any more, nor vice versa.  And not just because the son has just stormed out of his bedroom yelling about lag and how he would have pwned that lamer if only that crucial shot had actually gone where he knows he aimed it.  But as far as Father and the daughter are concerned, the Internet is still working just fine - look, the progress bars are ticking along nicely! - until, that is, Father wants to read the evening news, but the news site’s front page takes half a minute to load, and half the images are missing when it does.

And Father knows that calling the ISP in the morning (when their call centre is open) won’t help.  They’ll run tests and find absolutely nothing wrong, and not-so-subtly imply that he (or more likely his wife) is an idiotic time-waster.  Of course, a weekday morning isn’t when everyone’s using it, so nothing *is* wrong.  The link is uncongested at the time of testing, latency is as low as it should be, and there’s no line-quality packet loss.  The problem has mysteriously disappeared - only to reappear in the evening.  It’s not even weather-related, and the ISP insists that they have adequate backhaul and peering capacity.

So why?  Because the throughput-sensitive applications fill not only the link capacity but the buffers in front of it (on both sides).  Since it takes time for a packet at the back of each queue to reach the link, this induces latency - typically *hundreds* of milliseconds of it, and sometimes even much more than that; *minutes* in extreme cases.  But both a VoIP call and a typical online game require latencies *below one hundred* milliseconds for optimum performance.  That’s why Mother and the son had their respective evening activities ruined, and Father’s experience with the news site is representative of a particularly bad case.
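
The arithmetic behind those figures is just queued bytes divided by link rate, compared against the roughly 100 ms budget.  With assumed (illustrative) queue occupancies and uplink speeds:

    # Illustrative figures only: induced latency = bytes queued ahead of you / link rate,
    # versus the ~100 ms that VoIP and online games can tolerate.

    BUDGET = 0.100    # seconds

    # (queued KB, uplink Mbit/s): a short queue first, then increasingly bloated ones.
    for queued_kb, uplink_mbps in ((4, 1), (64, 1), (256, 2), (8 * 1024, 1)):
        delay = queued_kb * 1024 / (uplink_mbps * 1e6 / 8)
        verdict = "fine" if delay < BUDGET else "game and call ruined"
        print(f"{queued_kb:5d} KB queued on a {uplink_mbps} Mbit/s uplink"
              f" -> {delay:6.1f} s ({verdict})")

Even the mildest of the bloated cases is several times the budget, and the worst is over a minute.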

The better AQM systems now available (e.g. fq_codel) can separate latency-sensitive traffic from throughput-sensitive traffic and give them both the service they need.  This will give your customers a far better experience in the reasonably common situation I just outlined - but only if you put it in your hardware product and make sure that it actually works.  Otherwise, you’ll start losing customers to the first competitor who does.
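
For the flow-separation half of that, here is a toy sketch in Python.  It is *not* fq_codel itself - the real thing hashes flows into a fixed array of queues, runs deficit round robin with a sparse-flow optimisation, and applies CoDel to each queue - just the core idea that giving each flow its own queue and serving the queues round-robin keeps a voice packet from waiting behind a download's backlog:

    from collections import defaultdict, deque

    # Toy flow-queueing sketch (not the real fq_codel): one FIFO per flow,
    # served round-robin, so a sparse latency-sensitive flow never waits
    # behind a bulk flow's multi-megabyte backlog.

    class ToyFlowQueue:
        def __init__(self):
            self.queues = defaultdict(deque)     # flow id -> that flow's own FIFO
            self.rr = deque()                    # round-robin order of active flows

        def enqueue(self, flow_id, packet):
            if not self.queues[flow_id]:
                self.rr.append(flow_id)          # flow just became active
            self.queues[flow_id].append(packet)

        def dequeue(self):
            while self.rr:
                flow_id = self.rr.popleft()
                q = self.queues[flow_id]
                if q:
                    packet = q.popleft()
                    if q:
                        self.rr.append(flow_id)  # still backlogged: take another turn later
                    return flow_id, packet
            return None

    # The download dumps 1000 packets into its queue; a voice frame arriving
    # afterwards still goes out second, not 1001st.
    fq = ToyFlowQueue()
    for i in range(1000):
        fq.enqueue("download", f"bulk-{i}")
    fq.enqueue("voip", "voice-frame")
    print(fq.dequeue())   # ('download', 'bulk-0')
    print(fq.dequeue())   # ('voip', 'voice-frame')

The real fq_codel then runs a CoDel instance on each of those per-flow queues, so the bulk flows are also kept from building a standing queue of their own.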

 - Jonathan Morton



