[Cerowrt-devel] [Bloat] DC behaviors today
Mikael Abrahamsson
swmike at swm.pp.se
Thu Dec 14 03:22:20 EST 2017
On Wed, 13 Dec 2017, Jonathan Morton wrote:
> Ten times average demand estimated at time of deployment, and struggling
> badly with peak demand a decade later, yes. And this is the
> transportation industry, where a decade is a *short* time - like less
> than a year in telecoms.
I've worked in ISPs since 1999 or so. I've been at startups and I've been
at established ISPs.
Traffic growth follows something like an S curve. While you're adding
customers you can easily see 100%-300% growth per year (or more).
Then, once the market becomes saturated, growth comes from increased
per-customer usage, and for the past 20 years or so this has been in the
neighbourhood of 20-30% per year.
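
To put those growth rates in perspective, here is a rough sketch of how
long it takes traffic to double at a constant compound growth rate. The
function name is made up for illustration, and constant year-over-year
growth is a simplifying assumption.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for traffic to double at a constant compound annual growth rate.

    annual_growth is a fraction, e.g. 0.25 for 25% per year.
    """
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # 20% and 30% are the per-customer growth figures from the text;
    # 100% is the low end of the customer-acquisition phase.
    for rate in (0.20, 0.30, 1.00):
        years = doubling_time_years(rate)
        print(f"{rate:.0%} per year -> doubles in {years:.1f} years")
```

At 20-30% per year, traffic doubles roughly every 2.6 to 3.8 years, so
a link that is comfortable today will not stay that way for long.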
When you run a network that congests during parts of the day, it's hard to
tell what "Quality of Experience" your customers will have. I've heard
horror stories from the 90s where a then-large US ISP was running an OC3
(155 megabit/s) full most of the day. So someone said "oh, we need to
upgrade this", and after a while they did, to 2xOC3. Great, right? No,
after that upgrade both OC3s were completely congested. Ok, then upgrade to
OC12 (622 megabit/s). After that upgrade, the link was evidently free of
congestion for just a few hours of the day, and of course needed more
upgrades.
So at the places I've been, I've advocated for planning rules that say
that when a link's peak 5-minute average exceeds 50% of link capacity, an
upgrade needs to be ordered. This 50% number can be higher if the link
aggregates a larger number of customers, because typically your
"statistical overbooking" varies less the more customers participate.
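
A minimal sketch of that planning rule, assuming you already collect
5-minute average throughput samples per link (for example from interface
counters). The function and parameter names are hypothetical. The 50%
default comes from the rule above and would be raised for links that
aggregate many customers.

```python
def needs_upgrade(samples_bps, capacity_bps, threshold=0.5):
    """Return True if the peak 5-minute average exceeds threshold * capacity.

    samples_bps:  iterable of 5-minute average rates in bit/s
    capacity_bps: link capacity in bit/s
    threshold:    fraction of capacity that triggers an upgrade order
    """
    samples = list(samples_bps)
    if not samples:
        return False  # no data, nothing to conclude
    return max(samples) > threshold * capacity_bps


if __name__ == "__main__":
    ten_gig = 10e9
    quiet_day = [2.1e9, 3.4e9, 4.6e9]
    busy_day = [3.1e9, 4.8e9, 5.2e9]
    print(needs_upgrade(quiet_day, ten_gig))  # peak is 46% of capacity
    print(needs_upgrade(busy_day, ten_gig))   # peak is 52% of capacity
```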
The backbone devices do not do per-flow anything. They might have 10G or
100G links to/from them with many, many millions of flows, and it's all
NPU forwarding. Typically they might do DiffServ-based queueing and WRED to
mitigate excessive buffering. Today, they typically don't even do ECN
marking (which I have advocated for, but there is not much support from
other ISPs in this mission).
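
For readers unfamiliar with WRED, here is a simplified sketch of the
classic RED drop curve with per-class thresholds. The class names and
threshold values are invented for illustration, and real NPU
implementations differ in detail (averaging, units, extra drop
precedences).

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Classic RED curve: no drops below min_th, linear ramp up to max_p
    at max_th, and drop everything at or above max_th."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical per-class profiles: (min_th, max_th, max_p), with the
# thresholds expressed in packets of averaged queue depth.
PROFILES = {
    "best_effort": (20, 60, 0.10),
    "assured": (40, 80, 0.05),
}


def wred_drop_probability(traffic_class, avg_queue):
    """The 'weighted' part: each class gets its own RED thresholds."""
    min_th, max_th, max_p = PROFILES[traffic_class]
    return red_drop_probability(avg_queue, min_th, max_th, max_p)


if __name__ == "__main__":
    for depth in (10, 40, 70):
        for cls in PROFILES:
            p = wred_drop_probability(cls, depth)
            print(f"avg queue {depth:>2}, {cls}: drop probability {p:.3f}")
```

Note that the decision depends only on the averaged queue depth and the
traffic class, never on which flow a packet belongs to, which is why
this is cheap to do in an NPU.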
Now, on the customer access line it's a completely different matter.
Typically people build with a BRAS or similar, where (tens of) thousands
of customers might sit on a (very expensive) access card with hundreds of
thousands of queues per NPU. This still leaves just a few queues per
customer, unfortunately, so these do not do per-flow anything either. This
is where PIE comes in: devices like these can do PIE in the NPU fairly
easily because it's kind of like WRED.
So back to the capacity issue. Since these devices typically aren't good
at assuring per-customer access to the shared medium (backbone links),
it's easier to just make sure the backbone links are not regularly full.
This doesn't mean you're going to have 10x capacity all the time. It
probably means you're going to be bouncing between 25-70% utilization of
your links (for the normal case, because you need spare capacity to handle
events that increase traffic temporarily, plus handle loss of capacity in
case of a link fault). The upgrade might be to add another link, or a
higher-tier speed interface, bringing the utilization down to typically
half or a quarter of what you had before.
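
As a back-of-the-envelope sketch of that upgrade cycle: the function
names are hypothetical, and it assumes traffic is unchanged by the
upgrade itself and then grows at a constant annual rate.

```python
import math


def utilization_after_upgrade(utilization, old_capacity, new_capacity):
    """Same traffic spread over more capacity gives lower utilization."""
    return utilization * old_capacity / new_capacity


def years_until_threshold(utilization, threshold, annual_growth):
    """Years of compound growth before utilization reaches the threshold."""
    if utilization >= threshold:
        return 0.0
    return math.log(threshold / utilization) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # A 10G link peaking at 60% gets a second 10G link added next to it.
    after = utilization_after_upgrade(0.60, old_capacity=10, new_capacity=20)
    print(f"peak utilization after upgrade: {after:.0%}")

    # With 25% yearly growth, how long until the 50% planning rule trips?
    years = years_until_threshold(after, threshold=0.50, annual_growth=0.25)
    print(f"years until peak reaches 50% again: {years:.1f}")
```

Doubling capacity buys a bit over two years in this example. A 4x step,
such as OC3 to OC12, brings the same link down to a quarter of its
previous utilization and buys correspondingly more time.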
--
Mikael Abrahamsson email: swmike at swm.pp.se