[NNagain] transit and peering costs projections
rjmcmahon
rjmcmahon at rjmcmahon.com
Sun Oct 15 16:39:13 EDT 2023
Hi Jack,
Thanks again for sharing. It's very interesting to me.
Today, the networks are shifting from capacity constrained to latency
constrained, as can be seen in the IX discussions about how the speed of
light over fiber is too slow even between Houston & Dallas.
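As a back-of-the-envelope check on the fiber point, here is a minimal sketch of the propagation delay arithmetic. The 360 km figure (a rough straight-line Houston to Dallas distance) and the refractive index of 1.47 are assumptions for illustration, not numbers from this thread; real fiber routes run longer than the straight line.

```python
# Speed of light in vacuum, in km per second.
C_VACUUM_KM_S = 299_792.458


def fiber_delay_ms(distance_km: float, refractive_index: float = 1.47) -> float:
    """One-way propagation delay in milliseconds over a fiber of this length.

    ASSUMPTION: 1.47 is a typical refractive index for single-mode fiber,
    which puts light in glass at roughly two thirds of its vacuum speed.
    """
    speed_km_s = C_VACUUM_KM_S / refractive_index
    return distance_km / speed_km_s * 1000.0


if __name__ == "__main__":
    # ASSUMPTION: ~360 km straight line; the actual fiber path is longer.
    one_way = fiber_delay_ms(360.0)
    print(f"one-way {one_way:.2f} ms, round trip {2 * one_way:.2f} ms")
```

No amount of added capacity reduces this floor; only a shorter path does.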
The mitigations against standing queues (which cause bloat today) are:
o) Shrink the e2e bottleneck queue so it will drop packets in a flow and
TCP will respond to that "signal"
o) Use some form of ECN marking where the network forwarding plane
ultimately informs the TCP source state machine so it can slow down or
pace effectively. This can be an earlier feedback signal and, if done
well, can inform the sources to avoid bottleneck queuing. There are a
couple of approaches with ECN. Comcast is trialing L4S now which seems
interesting to me as a WiFi test & measurement engineer. The jury is
still out on this and measurements are needed.
o) Mitigate source side bloat via TCP_NOTSENT_LOWAT
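
For the last item, a minimal sketch of setting TCP_NOTSENT_LOWAT on a socket. The helper name set_notsent_lowat and the 16 KiB threshold are hypothetical, chosen only for illustration; the fallback value 25 is the Linux definition of the option and is used only when the Python build does not export the constant.

```python
import socket
import sys

# ASSUMPTION: 25 is the Linux value of TCP_NOTSENT_LOWAT. It is used only as
# a fallback on Linux when the socket module lacks the constant.
TCP_NOTSENT_LOWAT = getattr(
    socket, "TCP_NOTSENT_LOWAT", 25 if sys.platform.startswith("linux") else None
)


def set_notsent_lowat(sock: socket.socket, nbytes: int) -> bool:
    """Ask the kernel to report the socket writable only while the amount of
    not-yet-sent data queued in the socket is below nbytes.

    Returns False if the option is unknown or the platform refuses it.
    """
    if TCP_NOTSENT_LOWAT is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, nbytes)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # 16 KiB is an arbitrary illustrative threshold, not a recommendation.
        print("TCP_NOTSENT_LOWAT set:", set_notsent_lowat(s, 16 * 1024))
```

The point of the option is that the application writes late, so fresh data is not stuck behind a deep send buffer on the source host.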
The QoS priority approach to congestion is orthogonal in my judgment, as
it's typically not supported e2e; many networks will bleach DSCP
markings. And it's really too late in my judgment.
Also, on clock sync, yes your generation did us both a service and
disservice by getting rid of the PSTN TDM clock ;) So IP networking
devices kinda ignored clock sync, which makes e2e one way delay (OWD)
measurements impossible. Thankfully, the GPS atomic clock is now
available mostly everywhere and many devices use TCXO oscillators so
it's possible to get clock sync and use oscillators that can minimize
drift. I pay $14 for an RPi 4 GPS chip with pulse per second as an
example.
It seems silly to me that clocks aren't synced to the GPS atomic clock,
even if by proxy, and even if only for measurement and monitoring.
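
To make the dependence on clock sync concrete, here is a minimal sketch of the OWD arithmetic. The function and parameter names are hypothetical, and the numbers are made up for illustration.

```python
def one_way_delay_us(tx_us: int, rx_us: int, rx_clock_offset_us: int = 0) -> int:
    """One-way delay in microseconds.

    tx_us is stamped by the sender's clock and rx_us by the receiver's clock.
    rx_clock_offset_us is how far the receiver's clock runs ahead of the
    sender's; with synced clocks it is zero.
    """
    return (rx_us - rx_clock_offset_us) - tx_us


if __name__ == "__main__":
    true_delay_us = 5_000   # illustrative 5 ms path delay
    tx = 1_000_000          # sender timestamp, sender's clock

    # Receiver clock 20 ms ahead: the naive measurement is inflated.
    print(one_way_delay_us(tx, tx + true_delay_us + 20_000))
    # Receiver clock 20 ms behind: the naive measurement goes negative.
    print(one_way_delay_us(tx, tx + true_delay_us - 20_000))
    # With the offset known (clocks disciplined), the delay is recovered.
    print(one_way_delay_us(tx, tx + true_delay_us + 20_000, 20_000))
```

Any unknown offset between the two clocks lands directly in the OWD result, which is why round trip time is usually all that unsynced devices can report.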
Note: As Richard Roy will point out, there really is no such thing as
synchronized clocks across geographies per general relativity - so those
syncing clocks need to keep those effects in mind. I limited the iperf 2
timestamps to microsecond precision in hopes of avoiding those issues.
Note: With WiFi, a packet drop can occur because of an intermittent RF
channel condition. TCP can't tell the difference between an RF drop vs a
congested queue drop. That's another reason ECN markings from network
devices may be better than dropped packets.
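
For reference, the ECN marking and the DSCP marking share one byte in the IP header: DSCP is the upper six bits of the former TOS byte and ECN is the lower two. A minimal sketch of decoding that byte follows; the helper names split_tos and bleach_dscp are hypothetical.

```python
from typing import Tuple

# ECN codepoints in the two low-order bits of the former TOS byte.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def split_tos(tos: int) -> Tuple[int, str]:
    """Split a TOS / Traffic Class byte into (DSCP value, ECN codepoint name)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS byte must be in 0..255")
    return tos >> 2, ECN_NAMES[tos & 0b11]


def bleach_dscp(tos: int) -> int:
    """Model a network that zeroes DSCP but leaves the ECN bits untouched."""
    return tos & 0b11


if __name__ == "__main__":
    print(split_tos(0xB8))               # DSCP 46 (EF), not ECN capable
    print(split_tos(0xB9))               # DSCP 46, ECT(1), the L4S identifier
    print(split_tos(bleach_dscp(0xBB)))  # DSCP bleached to 0, CE mark survives
```

A CE mark tells the sender about queuing specifically, which an RF loss does not. Whether a given device preserves the ECN bits when it bleaches DSCP is something to measure rather than assume.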
Note: I've added some iperf 2 test support around pacing as that seems
to be the direction the industry is heading as networks are less and
less capacity constrained and user quality of experience is being driven by
tail latencies. One can also test with the Prague CCA for the L4S
scenarios. (This is a fun project: https://www.l4sgear.com/ and fairly
low cost)
--fq-rate n[kmgKMG]
Set a rate to be used with fair-queuing based socket-level pacing, in
bytes or bits per second. Only available on platforms supporting the
SO_MAX_PACING_RATE socket option. (Note: Here the suffixes indicate
bytes/sec or bits/sec per use of uppercase or lowercase, respectively)
--fq-rate-step n[kmgKMG]
Set a step of rate to be used with fair-queuing based socket-level
pacing, in bytes or bits per second. Step occurs every
fq-rate-step-interval (defaults to one second)
--fq-rate-step-interval n
Time in seconds before stepping the fq-rate
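
A minimal sketch of what these options amount to at the socket level, not the iperf 2 implementation. The function names are hypothetical. Assumptions: the suffixes are decimal multipliers, a bare number means bits/sec, the step is additive, and 47 is the Linux value of SO_MAX_PACING_RATE (used only if the socket module does not export it). The kernel takes the pacing rate in bytes per second.

```python
import socket
import sys
from typing import List

# ASSUMPTION: 47 is the Linux value of SO_MAX_PACING_RATE.
SO_MAX_PACING_RATE = getattr(
    socket, "SO_MAX_PACING_RATE", 47 if sys.platform.startswith("linux") else None
)

# ASSUMPTION: decimal multipliers; check the man page for the exact scaling.
_MULT = {"k": 10**3, "m": 10**6, "g": 10**9}


def fq_rate_to_bytes_per_sec(text: str) -> int:
    """Convert 'n[kmgKMG]' to bytes/sec: uppercase suffix means bytes/sec,
    lowercase means bits/sec (as the option text above describes)."""
    if not text:
        raise ValueError("empty rate")
    suffix = text[-1]
    if suffix.lower() in _MULT:
        value = float(text[:-1]) * _MULT[suffix.lower()]
        in_bytes = suffix.isupper()
    else:
        value = float(text)
        in_bytes = False  # ASSUMPTION: a bare number is bits/sec
    return int(value) if in_bytes else int(value / 8)


def stepped_rates(start: str, step: str, count: int) -> List[int]:
    """Rates in bytes/sec for each step interval, assuming an additive step."""
    base = fq_rate_to_bytes_per_sec(start)
    delta = fq_rate_to_bytes_per_sec(step)
    return [base + i * delta for i in range(count)]


def apply_pacing(sock: socket.socket, bytes_per_sec: int) -> bool:
    """Set the socket's maximum pacing rate; False if unsupported or refused."""
    if SO_MAX_PACING_RATE is None:
        return False
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, bytes_per_sec)
    except (OSError, OverflowError):
        return False
    return True


if __name__ == "__main__":
    print(stepped_rates("8m", "8m", 3))
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("pacing set:", apply_pacing(s, fq_rate_to_bytes_per_sec("100m")))
```

A test loop would call apply_pacing once per step interval with the next value from stepped_rates.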
Bob
PS. Iperf 2 man page https://iperf2.sourceforge.io/iperf-manpage.html
> The "VGV User" (Voice, Gaming, Videoconferencing) cares a lot about
> latency. It's not just "rewarding" to have lower latencies; high
> latencies may make VGV unusable. Average (or "typical") latency as
> the FCC label proposes isn't a good metric to judge usability. A path
> which has high variance in latency can be unusable even if the average
> is quite low. Having your voice or video or gameplay "break up"
> every minute or so when latency spikes to 500 msec makes the "user
> experience" intolerable.
>
> A few years ago, I ran some simple "ping" tests to help a friend who
> was trying to use a gaming app. My data was only for one specific
> path so it's anecdotal. What I saw was surprising - zero data loss,
> every datagram was delivered, but occasionally a datagram would take
> up to 30 seconds to arrive. I didn't have the ability to poke around
> inside, but I suspected it was an experience of "bufferbloat", enabled
> by the dramatic drop in price of memory over the decades.
>
> It's been a long time since I was involved in operating any part of
> the Internet, so I don't know much about the inner workings today.
> Apologies for my ignorance....
>
> There was a scenario in the early days of the Internet for which we
> struggled to find a technical solution. Imagine some node in the
> bowels of the network, with 3 connected "circuits" to some other
> nodes. On two of those inputs, traffic is arriving to be forwarded
> out the third circuit. The incoming flows are significantly more than
> the outgoing path can accept.
>
> What happens? How is "backpressure" generated so that the incoming
> flows are reduced to the point that the outgoing circuit can handle
> the traffic?
>
> About 45 years ago, while we were defining TCPV4, we struggled with
> this issue, but didn't find any consensus solutions. So "placeholder"
> mechanisms were defined in TCPV4, to be replaced as research continued
> and found a good solution.
>
> In that "placeholder" scheme, the "Source Quench" (SQ) IP message was
> defined; it was to be sent by a switching node back toward the sender
> of any datagram that had to be discarded because there wasn't any
> place to put it.
>
> In addition, the TOS (Type Of Service) and TTL (Time To Live) fields
> were defined in IP.
>
> TOS would allow the sender to distinguish datagrams based on their
> needs. For example, we thought "Interactive" service might be needed
> for VGV traffic, where timeliness of delivery was most important.
> "Bulk" service might be useful for activities like file transfers,
> backups, et al. "Normal" service might now mean activities like
> using the Web.
>
> The TTL field was an attempt to inform each switching node about the
> "expiration date" for a datagram. If a node somehow knew that a
> particular datagram was unlikely to reach its destination in time to
> be useful (such as a video datagram for a frame that has already been
> displayed), the node could, and should, discard that datagram to free
> up resources for useful traffic. Sadly we had no mechanisms for
> measuring delay, either in transit or in queuing, so TTL was defined
> in terms of "hops", which is not an accurate proxy for time. But
> it's all we had.
>
> Part of the complexity was that the "flow control" mechanism of the
> Internet had put much of the mechanism in the users' computers' TCP
> implementations, rather than the switches which handle only IP.
> Without mechanisms in the users' computers, all a switch could do is
> order more circuits, and add more memory to the switches for queuing.
> Perhaps that led to "bufferbloat".
>
> So TOS, SQ, and TTL were all placeholders, for some mechanism in a
> future release that would introduce a "real" form of Backpressure and
> the ability to handle different types of traffic. Meanwhile, these
> rudimentary mechanisms would provide some flow control. Hopefully the
> users' computers sending the flows would respond to the SQ
> backpressure, and switches would prioritize traffic using the TTL and
> TOS information.
>
> But, being way out of touch, I don't know what actually happens
> today. Perhaps the current operators and current government watchers
> can answer?
>
> 1/ How do current switches exert Backpressure to reduce competing
> traffic flows? Do they still send SQs?
>
> 2/ How do the current and proposed government regulations treat the
> different needs of different types of traffic, e.g., "Bulk" versus
> "Interactive" versus "Normal"? Are Internet carriers permitted to
> treat traffic types differently? Are they permitted to charge
> different amounts for different types of service?
>
> Jack Haverty
>
> On 10/15/23 09:45, Dave Taht via Nnagain wrote:
>> For starters I would like to apologize for cc-ing both nanog and my
>> new nn list. (I will add sender filters)
>>
>> A bit more below.
>>
>> On Sun, Oct 15, 2023 at 9:32 AM Tom Beecher <beecher at beecher.cc>
>> wrote:
>>>> So for now, we'll keep paying for transit to get to the others
>>>> (since it’s about as much as transporting IXP from Dallas), and
>>>> hoping someone at Google finally sees Houston as more than a third
>>>> rate city hanging off of Dallas. Or… someone finally brings a
>>>> worthwhile IX to Houston that gets us more than peering to Kansas
>>>> City. Yeah, I think the former is more likely. 😊
>>>
>>> There is often a chicken/egg scenario here with the economics. As an
>>> eyeball network, your costs to build out and connect to Dallas are
>>> greater than your transit cost, so you do that. Totally fair.
>>>
>>> However think about it from the content side. Say I want to build
>>> into Houston. I have to put routers in, and a bunch of cache
>>> servers, so I have capital outlay, plus opex for space, power,
>>> IX/backhaul/transit costs. That's not cheap, so there's a lot of
>>> calculations that go into it. Is there enough total eyeball traffic
>>> there to make it worth it? Is saving 8-10ms enough of a performance
>>> boost to justify the spend? What are the long term trends in that
>>> market? These answers are of course different for a company running
>>> their own CDN vs the commercial CDNs.
>>>
>>> I don't work for Google and obviously don't speak for them, but I
>>> would suspect that they're happy to eat an 8-10ms performance hit to
>>> serve from Dallas, versus the amount of capital outlay to build out
>>> there right now.
>> The three forms of traffic I care most about are voip, gaming, and
>> videoconferencing, which are rewarding to have at lower latencies.
>> When I was a kid, we had switched phone networks, and while the sound
>> quality was poorer than today, the voice latency cross-town was just
>> like "being there". Nowadays we see 500+ms latencies for this kind of
>> traffic.
>>
>> As to how to make calls across town work that well again, cost-wise, I
>> do not know, but the volume of traffic that would be better served by
>> these interconnects is quite low, relative to the overall gains in
>> lower latency experiences for them.
>>
>>
>>
>>> On Sat, Oct 14, 2023 at 11:47 PM Tim Burke <tim at mid.net> wrote:
>>>> I would say that a 1Gbit IP transit in a carrier neutral DC can be
>>>> had for a good bit less than $900 on the wholesale market.
>>>>
>>>> Sadly, IXPs are seemingly turning into a pay to play game, with
>>>> rates almost costing as much as transit in many cases after you
>>>> factor in loop costs.
>>>>
>>>> For example, in the Houston market (one of the largest and fastest
>>>> growing regions in the US!), we do not have a major IX, so to get up
>>>> to Dallas it’s several thousand for a 100g wave, plus several
>>>> thousand for a 100g port on one of those major IXes. Or, a better
>>>> option, we can get a 100g flat internet transit for just a little
>>>> bit more.
>>>>
>>>> Fortunately, for us as an eyeball network, there are a good number
>>>> of major content networks that are allowing for private peering in
>>>> markets like Houston for just the cost of a cross connect and a QSFP
>>>> if you’re in the right DC, with Google and some others being the
>>>> outliers.
>>>>
>>>> So for now, we'll keep paying for transit to get to the others
>>>> (since it’s about as much as transporting IXP from Dallas), and
>>>> hoping someone at Google finally sees Houston as more than a third
>>>> rate city hanging off of Dallas. Or… someone finally brings a
>>>> worthwhile IX to Houston that gets us more than peering to Kansas
>>>> City. Yeah, I think the former is more likely. 😊
>>>>
>>>> See y’all in San Diego this week,
>>>> Tim
>>>>
>>>> On Oct 14, 2023, at 18:04, Dave Taht <dave.taht at gmail.com> wrote:
>>>>> This set of trendlines was very interesting. Unfortunately the data
>>>>> stops in 2015. Does anyone have more recent data?
>>>>>
>>>>> https://drpeering.net/white-papers/Internet-Transit-Pricing-Historical-And-Projected.php
>>>>>
>>>>> I believe a gbit circuit that an ISP can resell still runs at about
>>>>> $900 - $1.4k (?) in the usa? How about elsewhere?
>>>>>
>>>>> ...
>>>>>
>>>>> I am under the impression that many IXPs remain very successful,
>>>>> states without them suffer, and I also find the concept of doing micro
>>>>> IXPs at the city level, appealing, and now achievable with cheap gear.
>>>>> Finer grained cross connects between telco and ISP and IXP would lower
>>>>> latencies across town quite hugely...
>>>>>
>>>>> PS I hear ARIN is planning on dropping the price for, and bundling 3
>>>>> BGP AS numbers at a time, as of the end of this year, also.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Oct 30:
>>>>> https://netdevconf.info/0x17/news/the-maestro-and-the-music-bof.html
>>>>> Dave Täht CSO, LibreQos
>>
>>
>
> _______________________________________________
> Nnagain mailing list
> Nnagain at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/nnagain