<div dir="ltr">This is an interesting topic to me. Over the past 5+ years, I've been reading about GPON fiber aggregators(GPON chassis for lack of a proper term) with 400Gb-1Tb/s of uplink, 1-2Tb/s line-cards, and enough GPON ports for several thousand customers.<div><br></div><div>When my current ISP started rolling out fiber(all of it underground, no above-ground fiber), I called support during a graveyard hour on the weekend, and I got a senior network admin answering the phone instead of normal tech support. When talking to him, I asked him what they claimed by "guaranteed" bandwidth. I guess I should mention that my ISP claims dedicated bandwidth for everyone. He told me that they played with over-subscription for a while, but it just resulted in complex situations that caused customers to complain. Complaining customers are expensive because they eat up support phone time. They eventually went to a non-oversubscribed flat model. He told me that the GPON chassis plugs strait into the core router. I asked him about GPON port shared bandwidth and the GPON uplink. He said they will not over-subscribe a GPON port, so all ONTs on the port can use 100% of their provisioned rate, and they will not place more provision bandwidth on a single GPON chassis than what they uplink can support.</div><div><br></div><div>For the longest time, their max sold bandwidth was 50Mb/s. After some time, they were having some issues resulting in packet-loss during peak hours. Turned out their old core router could not support all of the new customers in the ARP cache and was causing massive amounts of broadcasted packets. I actually helped them solve this issue. They had me work with a hired consulting service that was having issues diagnosing the problem, much because of the older hardware not supporting modern diagnostic features. They fixed the problem by upgrading the core router. Because I was already in contact with them during this issue, I was made privy that their new core router could handle about 10Tb/s with a lot of room for 100Gb+ ports. No exact details, but told their slowest internal link was now 100Gb.</div><div><br></div><div>Their core router actually had traffic shaping and an AQM built in. They switched from using ONT rate limiting for provisioning to letting the core router handle provisioning. I can actually see 1Gb bursts as their shaping seems to be like a sliding window over a few tens of ms. I have actually tested their AQM a bit via a DOS testing service. At the time, I had a 100Mb/100Mb service, and externally flooding my connection with 110Mb/s resulted in about 10% packetloss, but my ping stayed under 20ms. I tried 200Mb/s for about 20 seconds, which resulted in about 50% loss and still ~20ms pings. For about 10 seconds I tested 1Gb/s DOS and had about 90% loss(not a long time to sample, but was sampled at a rate of 10pps against their speed-test server), but 20-40ms pings. I tested this during off hours, like 1am.</div><div><br></div><div>A few months after the upgrade, I got upgraded to a 100Mb connection with no change in price and several new higher tiers were added, all the way up to 1Gb/s. I asked them about this. Yes, the 1Gb tier was also not over-subscribed. I'm not sure if some lone customer pretty much got their own GPON port or they had some WDM-PON linecards.</div><div><br></div><div>I'm currently paying about $40/m for 150/150 for a "dedicated" connection. I'm currently getting about 1ms+-0.1ms pings to my ISP's speedtest server 24/7. 
<div>A few months after the upgrade, I got upgraded to a 100Mb connection with no change in price, and several new higher tiers were added, all the way up to 1Gb/s. I asked them about this. Yes, the 1Gb tier was also not over-subscribed. I'm not sure if some lone customer pretty much got their own GPON port or if they had some WDM-PON line cards.</div><div><br></div><div>I'm currently paying about $40/m for 150/150 on a "dedicated" connection, and I get about 1ms ±0.1ms pings to my ISP's speedtest server 24/7. If I do a ping flood, I can get my average ping down near 0.12ms; I assume this is because of GPON scheduling. Of course, I only test this against their speedtest server and during off hours.</div><div><br></div><div>As for the trunk, I've also talked to them about that, at least in the past; I can't speak for more recent times. They had 3 trunks: 2 to Level 3 Chicago and one to Global Crossing Minnesota. I was told each link was a paired link for immediate fail-over, and that in some cases they've bonded the pairs, primarily during DDOS attacks, to quickly double their bandwidth. Their GX link was the fail-over and the two Chicago Level 3 links were the load-balanced primaries. Based on trace-routes, they seemed to be load-balanced on some of the lower bits of the IP address. This gave a total of 6 links. The network admin told me that any given link had enough bandwidth provisioned that if all 5 other links were down, the remaining link would still have a 95th percentile below 80% utilization during peak hours, and customers should be completely unaffected.</div><div><br></div>
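<div>The per-address balancing I saw in trace-routes would be consistent with a simple hash or modulo over the low-order bits of the destination address across the active links. A minimal Python sketch of that idea (the link names and the choice of the last octet are my assumptions, not anything my ISP confirmed; real routers typically hash over more fields than this):</div><div><br></div><pre>
import ipaddress

LINKS = ["L3-Chicago-A", "L3-Chicago-B"]  # the two load-balanced primaries

def pick_link(dst_ip: str) -> str:
    # Hypothetical: choose a link from the low-order bits of the
    # destination IPv4 address (just the last octet here).
    low_bits = int(ipaddress.ip_address(dst_ip)) & 0xFF
    return LINKS[low_bits % len(LINKS)]

for ip in ("198.51.100.7", "198.51.100.8"):
    print(ip, "->", pick_link(ip))
</pre><div><br></div><div>Something along those lines would look exactly like what I saw: neighbouring addresses taking different Chicago links.</div><div><br></div>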
<div>They've been advertising guaranteed dedicated bandwidth for over 15 years now. They recently had a marketing campaign against the local incumbent where they poked fun at them for only selling "up to" bandwidth. This went on for at least a year. They openly advertised that their bandwidth was not "up to", but that customers will always get all of their bandwidth all the time. The small print said "to their transit provider". In short, my ISP is claiming I should always get my provisioned bandwidth to Level 3, 24/7. As far as I have cared to measure, this is true. At one point I ran a month-long ping at 2 pps against AWS Frankfurt: ~140ms avg, ~139ms min, ~3ms std-dev, ~160ms max, and fewer than 100 lost packets. It's 6-12ms to Chicago and 30-35ms to New York City, depending on the link, 90ms to London, and 110ms to Paris. Interesting note: AWS Frankfurt was only about 6 hops from the Midwest USA. That's impressive.<br></div><div><br></div><div>Back when I was load testing my 100Mb connection, I queued up a bunch of well-seeded large Linux ISOs and downloaded them to my SSDs. Between my traffic shaping via pfSense and my ISP's unknown AQM, I averaged 99.8Mb/s, with the 1-minute slices reported by pfSense never dropping below 99.5Mb/s, sampled over a 1.5-hour window from 8:30p to 10p. Zero ping packets were lost to my ISP, no ping was more than ~10ms, and the avg/std-dev was identical to idle to within 0.1ms. During the DOS tests, pfSense reported exactly 100.0Mb/s hitting the WAN with zero dips.</div><div><br></div><div>In short, if I wanted to, I could purchase a 500/500 "dedicated" connection for $110/m, plus tax but no other fees, with free install, a passive point-to-point self-healing ring back to the CO from my house, and a /29 static block for an additional $10/m, and I'm told I can do web hosting. There is no SLA, even though I get near-perfect connectivity and single-digit minutes of yearly downtime, all in the 1a-2a window.</div><div><br></div><div>This is all from a local private ISP that openly brags that they do not accept any government grants, loans, or other subsidies. My ISP is about 120 years old and started off as a telegraph service. I've gotten the feeling that fast dedicated bandwidth is cheap and easy, assuming you're an established ISP that doesn't have to fight through red tape. We've got farmers with 1Gb/1Gb dedicated fiber connections, all without government support.</div><div><br></div><div>About 3 years ago I was reading about petabit core routers with 1Tb/s ports and single-fiber ~40Tb/s multiplexers. Recently I heard that 100Gb PON with 2.5Tb/s of bandwidth is already partially working in labs, with an expected cost not much more than current-day XG2-PON, which is what... 300Gb/s or so split among 32 customers? As far as I can tell, last-mile bandwidth is a solved problem short of incompetence, greed, or extreme circumstances.</div><div><br></div><div>Ahh yes, statistical over-subscription was the topic. This works well for backbone providers, where they have many peering links with a heavy mix of flows. Level 3 has a blog post showing off a 10Gb link: below the 95th percentile the link had zero packets lost and a queuing delay of less than 0.1ms, but above 80% utilization, loss and jitter suddenly went up in a hockey-stick curve. Then they showed a 400Gb link. It was at 98% utilization for the 95th percentile, and it had zero packets lost and a max queuing delay of 0.01ms with an average of 0.00ms.</div><div><br></div><div>There was a major European IX that had a blog post about bandwidth planning and over-provisioning. They had a 95th percentile in the many terabits, and they said they could always predict peak bandwidth to within 1% for any given day. Given a large mix of flow types, the statistics work out very well.</div><div><br></div>
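<div>Both observations fall out of basic queueing statistics. For a single queue, the expected backlog grows roughly like rho/(1-rho), which is flat until about 80% utilization and then turns into the hockey stick Level 3 showed; and the relative variation of an aggregate of N roughly independent flows shrinks like 1/sqrt(N), which is why a multi-terabit 95th percentile can be predicted to within ~1%. A toy Python sketch of both effects (the M/M/1 model and the flow counts are purely illustrative):</div><div><br></div><pre>
import math
import random

# 1) Hockey stick: in an M/M/1 queue the mean backlog is rho / (1 - rho),
#    so queuing is tame below ~80% utilization and explodes near 100%.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"utilization {rho:.2f} -> mean backlog {rho / (1 - rho):6.1f} packets")

# 2) Aggregation: the relative spread of the sum of N independent flows
#    shrinks roughly like 1/sqrt(N), so very large links are very predictable.
random.seed(1)
for n_flows in (10, 1_000, 10_000):
    totals = [sum(random.expovariate(1.0) for _ in range(n_flows)) for _ in range(100)]
    mean = sum(totals) / len(totals)
    stdev = math.sqrt(sum((t - mean) ** 2 for t in totals) / len(totals))
    print(f"{n_flows:>6} flows -> stdev/mean ~ {stdev / mean:.3f}")
</pre><div><br></div><div>With numbers like that, it's easy to see why a backbone with enough flow diversity can run a 400Gb link at 98% with essentially zero loss, while a 10Gb link starts hurting at 80%.</div><div><br></div>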
<div>On a slightly different topic, I wonder what trunk providers are using for AQMs. My ISP was under a massive DDOS some time in the past year, and I used a Level 3 looking glass in Chicago, which showed only a 40ms delta between the previous hop and my ISP, where it was normally about 11ms for that link. You could say about 30ms of buffering was going on. The really interesting thing is that I was only getting about 5-10Mb/s, which means there was virtually zero free bandwidth, yet I had almost no packet loss. I called my ISP shortly after the issue started, and that's when they told me they were under a DDOS, were at 100% trunk utilization, and were going to have their trunk bandwidth increased shortly. Five minutes later, the issue was gone. About 30 minutes later I was called back and told the DDOS was still ongoing; they had just upgraded to enough bandwidth to soak it all. I found it very interesting that a DDOS large enough to effectively kill 95% of my provisioned bandwidth and increase my ping 30ms over normal barely affected packet loss at all. It was well under 0.1%. Is this due to the statistical nature of large links, or did Level 3 have an AQM towards my ISP?</div>
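<div><br></div><div>For what it's worth, 30ms of standing delay at line rate implies a deep but not outlandish queue. A rough calculation, assuming the trunk was a 10Gb/s link (my guess; nobody told me the actual speed):</div><div><br></div><pre>
# All numbers here are my own assumptions/observations, not anything
# Level 3 or my ISP confirmed.
link_gbps = 10.0        # assumed trunk speed
extra_delay_s = 0.030   # ~40ms observed minus ~11ms normal

queue_bytes = extra_delay_s * link_gbps * 1e9 / 8
print(f"~{queue_bytes / 1e6:.0f} MB of standing queue")     # ~38 MB

# A classic bandwidth-delay-product rule of thumb for a ~70ms RTT
# path would size the buffer at roughly:
bdp_bytes = 0.070 * link_gbps * 1e9 / 8
print(f"~{bdp_bytes / 1e6:.0f} MB for a BDP-sized buffer")  # ~88 MB
</pre><div><br></div><div>That is well within a normal BDP-sized buffer, and if most of the traffic was responsive TCP, flows backing off would keep loss tiny even at 100% utilization. That alone could explain what I saw without any AQM on Level 3's side, but I'd still like to know what they actually run.</div>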
</div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Dec 14, 2017 at 2:22 AM, Mikael Abrahamsson <span dir="ltr"><<a href="mailto:swmike@swm.pp.se" target="_blank">swmike@swm.pp.se</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Wed, 13 Dec 2017, Jonathan Morton wrote:<br>
<br>
</span><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Ten times average demand estimated at time of deployment, and struggling badly with peak demand a decade later, yes. And this is the transportation industry, where a decade is a *short* time - like less than a year in telecoms.<br>
</blockquote>
<br></span>
I've worked in ISPs since 1999 or so. I've been at startups and I've been at established ISPs.<br>
<br>
It's kind of an S curve when it comes to traffic growth, when you're adding customers you can easily see 100%-300% growth per year (or more). Then after market becomes saturated growth comes from per-customer increased usage, and for the past 20 years or so, this has been in the neighbourhood of 20-30% per year.<br>
<br>
Running a network that congests parts of the day, it's hard to tell what "Quality of Experience" your customers will have. I've heard of horror stories from the '90s where a then-large US ISP was running an OC3 (155 megabit/s) full most of the day. So someone said "oh, we need to upgrade this", and after a while, they did, to 2xOC3. Great, right? No, after that upgrade both OC3s were completely congested. Ok, then upgrade to OC12 (622 megabit/s). After that upgrade, evidently that link was not congested a few hours of the day, and of course needed more upgrades.<br>
<br>
So at the places I've been, I've advocated for planning rules that say that when the link is peaking at 5-minute averages of more than 50% of link capacity, then an upgrade needs to be ordered. This 50% number can be larger if the link aggregates a larger number of customers, because typically your "statistical overbooking" varies less the more customers participate.<br>
<br>
These devices do not do per-flow anything. They might have 10G or 100G link to/from it with many many millions of flows, and it's all NPU forwarding. Typically they might do DIFFserv-based queueing and WRED to mitigate excessive buffering. Today, they typically don't even do ECN marking (which I have advocated for, but there is not much support from other ISPs in this mission).<br>
<br>
Now, on the customer access line it's a completely different matter. Typically people build with BRAS or similar, where (tens of) thousands of customers might sit on a (very expensive) access card with hundreds of thousands of queues per NPU. This still leaves just a few queues per customer, unfortunately. So these do not do per-flow anything either. This is where PIE comes in, because devices like these can do PIE in the NPU fairly easily, because it's kind of like WRED.<br>
<br>
So back to the capacity issue. Since these devices typically aren't good at assuring per-customer access to the shared medium (backbone links), it's easier to just make sure the backbone links are not regularily full. This doesn't mean you're going to have 10x capacity all the time, it probably means you're going to be bouncing between 25-70% utilization of your links (for the normal case, because you need spare capacity to handle events that increase traffic temporarily, plus handle loss of capacity in case of a link fault). The upgrade might be to add another link, or a higher tier speed interface, bringing down the utilization to typically half or quarter of what you had before.<div class="HOEnZb"><div class="h5"><br>
<br>
-- <br>
Mikael Abrahamsson email: <a href="mailto:swmike@swm.pp.se" target="_blank">swmike@swm.pp.se</a><br>
______________________________<wbr>_________________<br>
Bloat mailing list<br>
<a href="mailto:Bloat@lists.bufferbloat.net" target="_blank">Bloat@lists.bufferbloat.net</a><br>
<a href="https://lists.bufferbloat.net/listinfo/bloat" rel="noreferrer" target="_blank">https://lists.bufferbloat.net/<wbr>listinfo/bloat</a><br>
</div></div></blockquote></div><br></div>