[Cerowrt-devel] [Bloat] viability of the data center in the internet of the future
Fred Baker (fred)
fred at cisco.com
Tue Jul 1 12:38:47 EDT 2014
On Jul 1, 2014, at 1:37 AM, Dave Taht <dave.taht at gmail.com> wrote:
> On Sat, Jun 28, 2014 at 5:50 PM, Fred Baker (fred) <fred at cisco.com> wrote:
>> There is in fact a backbone. Once upon a time, it was run by a single company, BBN. Then it was more like five, and then ... and now it’s 169. There are, if the BGP report (http://seclists.org/nanog/2014/Jun/495) is to be believed, 47136 ASNs in the system, of which 35929 don’t show up as transit for anyone and are therefore presumably edge networks and potentially multihomed, and of those 16325 only announce a single prefix. Of the 6101 ASNs that show up as transit, 169 ONLY show up as transit. Yes, the core is 169 ASNs, and it’s not a little dot off to the side. If you want to know where it is, do a traceroute (tracert on Windows).
>
> The fact that the internet has grown to 10+ billion devices (by some
> estimates), and from 1 transit provider to only 169 doesn't impress
> me. There are 206 countries in the world...
Did I say that there was only one transit provider? I said there were 169 AS’s that, in potaroo’s equivalent of route views, *only* show up as transit. There are, this morning, 195 transit-only AS’s, 40724 origin-only AS’s (AS’s that are only found at the edge), and 6573 AS’s that show up both as origin AS’s and as transit AS’s.
http://bgp.potaroo.net/as2.0/bgp-active.html
> It is a shame that multi-homing has never been easily obtainable nor
> widely available, it would be nice to be able to have multiple links
> for any business critically dependent on the continuous operation of
> the internet and cloud.
Actually, it is pretty common. Again, from potaroo.net, there are 30620 origin AS’s announced via a single AS path. The implication is that there are 40724-30620=10104 origin AS’s being announced to AS65000 via multiple AS paths. I don’t know whether they or their upstreams are multi-homed, but I’ll bet a significant subset of them are multihomed.
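The arithmetic behind those counts can be checked with a short Python sketch. The four input numbers are the ones quoted in this message; the variable names are made up for illustration, and the subtraction assumes (as the message does) that the single-path origins are a subset of the origin-only AS's.

```python
# Counts quoted in the message (from bgp.potaroo.net on the morning of 1 July 2014).
transit_only = 195           # AS's seen only as transit
origin_only = 40724          # AS's seen only at the edge
origin_and_transit = 6573    # AS's seen in both roles
single_path_origins = 30620  # origin AS's announced via a single AS path

# Total number of distinct AS's implied by the three categories.
total_asns = transit_only + origin_only + origin_and_transit

# Assumption: every single-path origin is counted among the origin-only AS's,
# so the remainder are origins visible via more than one AS path.
multi_path_origins = origin_only - single_path_origins
multi_path_share = multi_path_origins / origin_only

print(f"total AS's seen:       {total_asns}")
print(f"multi-path origin AS's: {multi_path_origins}")
print(f"share of origin-only:   {multi_path_share:.1%}")
```

Being seen via multiple AS paths is only a hint of multihoming: the extra paths may belong to an upstream rather than to the origin itself.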
>> I’ll give you two, one through Cisco and one through my residential provider.
>>
>> traceroute to reed.com (67.223.249.82), 64 hops max, 52 byte packets
>> 1 sjc-fred-881.cisco.com (10.19.64.113) 1.289 ms 12.000 ms 1.130 ms
>
> This is through your vpn?
Yes
>> 2 sjce-access-hub1-tun10.cisco.com (10.27.128.1) 47.661 ms 45.281 ms 42.995 ms
>
>> 3 ...
>> 11 sjck-isp-gw1-ten1-1-0.cisco.com (128.107.239.217) 44.972 ms 45.094 ms 43.670 ms
>> 12 tengige0-2-0-0.gw5.scl2.alter.net (152.179.99.153) 48.806 ms 49.338 ms 47.975 ms
>> 13 0.xe-9-1-0.br1.sjc7.alter.net (152.63.51.101) 43.998 ms 45.595 ms 49.838 ms
>> 14 206.111.6.121.ptr.us.xo.net (206.111.6.121) 52.110 ms 45.492 ms 47.373 ms
>> 15 207.88.14.225.ptr.us.xo.net (207.88.14.225) 126.696 ms 124.374 ms 127.983 ms
>> 16 te-2-0-0.rar3.washington-dc.us.xo.net (207.88.12.70) 127.639 ms 132.965 ms 131.415 ms
>> 17 te-3-0-0.rar3.nyc-ny.us.xo.net (207.88.12.73) 129.747 ms 125.680 ms 123.907 ms
>> 18 ae0d0.mcr1.cambridge-ma.us.xo.net (216.156.0.26) 125.009 ms 123.152 ms 126.992 ms
>> 19 ip65-47-145-6.z145-47-65.customer.algx.net (65.47.145.6) 118.244 ms 118.024 ms 117.983 ms
>> 20 * * *
>> 21 209.59.211.175 (209.59.211.175) 119.378 ms * 122.057 ms
>> 22 reed.com (67.223.249.82) 120.051 ms 120.146 ms 118.672 ms
>
>
>> traceroute to reed.com (67.223.249.82), 64 hops max, 52 byte packets
>> 1 10.0.2.1 (10.0.2.1) 1.728 ms 1.140 ms 1.289 ms
>> 2 10.6.44.1 (10.6.44.1) 122.289 ms 126.330 ms 14.782 ms
>
> ^^^^^ is this a wireless hop or something? Seeing your traceroute jump
> all the way to 122+ms strongly suggests you are either wireless or
> non-pied/fq_codeled.
The zeroth hop is wireless - I pull my Ethernet plug and turn on the wifi interface, which is instantiated by two Apple Airport APs in the home. 10.0.2.1 is the residential slice of my router. To be honest, I’m hard-pressed to say what 10.6.44.1 is; I suspect it’s an address of my CMTS. The address *I* have for my CMTS is 98.173.193.1, and my address in that subnet is 98.173.193.12. If you want my guess, Cox is returning an RFC 1918 address to prevent non-customers from pinging it.
--- 10.6.44.1 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 7.668/10.102/12.012/1.520 ms
--- 98.173.193.1 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 8.414/30.501/120.407/41.031 ms
and 98.173.193.1 doesn’t show up in my traceroute.
Absent per-hop timestamps, I’m not in a position to say where the delay came from. For all I know, it has something to do with the Wifi in the house. Wifi can have really strange delays.
Whatever.
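For anyone who wants to pick out hops like that one mechanically, here is a minimal Python sketch that parses traceroute lines and flags hops whose three probes disagree widely. The function name and the 50 ms cutoff are arbitrary choices for illustration. It does not say where the delay was introduced, only which hops had inconsistent probes.

```python
import re

# Matches every "<number> ms" probe time on a traceroute line.
RTT_RE = re.compile(r"(\d+(?:\.\d+)?) ms")

def probe_spread(line):
    """Return (min, max, max - min) of the probe RTTs on one hop line, or None."""
    rtts = [float(x) for x in RTT_RE.findall(line)]
    if not rtts:
        return None  # a "* * *" hop has no timings at all
    return min(rtts), max(rtts), max(rtts) - min(rtts)

# The first three hops of the residential traceroute quoted above.
hops = [
    "1 10.0.2.1 (10.0.2.1) 1.728 ms 1.140 ms 1.289 ms",
    "2 10.6.44.1 (10.6.44.1) 122.289 ms 126.330 ms 14.782 ms",
    "3 ip68-4-12-20.oc.oc.cox.net (68.4.12.20) 13.208 ms 12.667 ms 8.941 ms",
]

THRESHOLD_MS = 50.0  # arbitrary cutoff chosen for this illustration

for line in hops:
    spread = probe_spread(line)
    if spread is not None and spread[2] > THRESHOLD_MS:
        hop_number = line.split()[0]
        print(f"hop {hop_number}: probes range from {spread[0]} to {spread[1]} ms")
```

With the three lines above, only hop 2 exceeds the cutoff.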
>> 3 ip68-4-12-20.oc.oc.cox.net (68.4.12.20) 13.208 ms 12.667 ms 8.941 ms
>> 4 ip68-4-11-96.oc.oc.cox.net (68.4.11.96) 17.025 ms 13.911 ms 13.835 ms
>> 5 langbprj01-ae1.rd.la.cox.net (68.1.1.13) 131.855 ms 14.677 ms 129.860 ms
>> 6 68.105.30.150 (68.105.30.150) 16.750 ms 31.627 ms 130.134 ms
>> 7 ae11.cr2.lax112.us.above.net (64.125.21.173) 40.754 ms 31.873 ms 130.246 ms
>> 8 ae3.cr2.iah1.us.above.net (64.125.21.85) 162.884 ms 77.157 ms 69.431 ms
>> 9 ae14.cr2.dca2.us.above.net (64.125.21.53) 97.115 ms 113.428 ms 80.068 ms
>> 10 ae8.mpr4.bos2.us.above.net.29.125.64.in-addr.arpa (64.125.29.33) 109.957 ms 124.964 ms 122.447 ms
>> 11 * 64.125.69.90.t01470-01.above.net (64.125.69.90) 86.163 ms 103.232 ms
>> 12 250.252.148.207.static.yourhostingaccount.com (207.148.252.250) 111.068 ms 119.984 ms 114.022 ms
>> 13 209.59.211.175 (209.59.211.175) 103.358 ms 87.412 ms 86.345 ms
>> 14 reed.com (67.223.249.82) 87.276 ms 102.752 ms 86.800 ms
>
> Doing me to you:
>
> d at ida:$ traceroute -n 68.4.12.20
Through Cox:
--- 68.4.12.20 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 12.954/16.348/28.209/4.777 ms
traceroute to 68.4.12.20 (68.4.12.20), 64 hops max, 52 byte packets
1 10.0.2.1 1.975 ms 9.026 ms 1.397 ms
2 * * *
3 * * *
Traceroute to Facebook works, though:
traceroute www.facebook.com
traceroute to star.c10r.facebook.com (31.13.77.65), 64 hops max, 52 byte packets
1 10.0.2.1 (10.0.2.1) 1.490 ms 1.347 ms 0.934 ms
2 10.6.44.1 (10.6.44.1) 9.253 ms 11.308 ms 10.974 ms
3 ip68-4-12-20.oc.oc.cox.net (68.4.12.20) 11.275 ms 13.531 ms 20.180 ms
4 ip68-4-11-96.oc.oc.cox.net (68.4.11.96) 18.901 ms 13.013 ms 18.723 ms
5 sanjbprj01-ae0.0.rd.sj.cox.net (68.1.5.184) 29.397 ms 28.944 ms 30.062 ms
6 sv1.br01.sjc1.tfbnw.net (206.223.116.166) 31.011 ms 31.082 ms
sv1.pr02.tfbnw.net (206.223.116.153) 32.035 ms
7 ae1.bb01.sjc1.tfbnw.net (74.119.76.23) 32.932 ms 33.251 ms
po126.msw01.05.sjc1.tfbnw.net (31.13.31.131) 31.822 ms
8 edge-star-shv-05-sjc1.facebook.com (31.13.77.65) 38.234 ms 44.150 ms 31.165 ms
So it’s not that the router is dropping incoming ICMP.
Through Cisco:
--- 68.4.12.20 ping statistics ---
10 packets transmitted, 0 packets received, 100.0% packet loss
traceroute to 68.4.12.20 (68.4.12.20), 64 hops max, 52 byte packets
1 10.19.64.113 1.173 ms 0.932 ms 0.932 ms
2 10.27.128.1 36.256 ms 36.478 ms 37.376 ms
3 10.20.1.205 35.831 ms 36.211 ms 36.090 ms
4 171.69.14.249 36.084 ms 36.345 ms 37.889 ms
5 171.69.14.206 38.342 ms 37.791 ms 39.771 ms
6 171.69.7.178 37.699 ms 36.662 ms 41.758 ms
7 128.107.236.39 43.112 ms 36.401 ms 39.407 ms
8 128.107.239.6 35.576 ms 35.092 ms 37.770 ms
9 128.107.239.218 35.846 ms 35.337 ms 36.488 ms
10 128.107.239.250 35.504 ms 36.924 ms 39.353 ms
11 128.107.239.217 36.881 ms 38.063 ms 37.892 ms
12 152.179.99.153 38.745 ms 39.754 ms 39.665 ms
13 152.63.51.97 38.322 ms 37.466 ms 41.380 ms
14 129.250.9.249 39.924 ms 40.913 ms 39.690 ms
15 129.250.5.52 46.302 ms 43.463 ms 39.334 ms
16 129.250.6.10 49.332 ms 45.380 ms 47.309 ms
17 129.250.5.86 46.556 ms 48.806 ms
129.250.5.70 48.635 ms
18 129.250.6.181 48.020 ms
129.250.6.203 47.502 ms 47.111 ms
19 129.250.194.166 47.373 ms 48.532 ms 48.723 ms
20 68.1.0.179 66.514 ms
68.1.0.185 63.758 ms
68.1.0.189 61.326 ms
21 * * *
> Using ping rather than traceroute I get a typical min RTT to you
> of 32ms.
>
> As the crow drives between santa barbara and los gatos, (280 miles) at
> the speed of light in cable, we have roughly 4ms of RTT between us, or
> 28ms of induced latency due to the characteristics of the underlying
> media technologies, and the quality and limited quantity of the
> interconnects.
>
> A number I've long longed to have from fios, dsl, and cable are
> measurements of "cross-town" latency - in the prior age of
> circuit-switched networks, I can't imagine it being much higher than
> 4ms, and local telephony used to account for a lot of calls.
Well, if it’s of any interest, I once upon a time had a fractional T-1 to the home (a different one, but here in Santa Barbara), and ping RTT to Cisco was routinely 30ish ms much as it is now through Cox. I did have it jump once to about 600 ms, and I called to complain.
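The propagation estimate quoted above (280 miles, roughly 4 ms of RTT, about 28 ms of induced latency against a 32 ms minimum ping) can be reproduced with a small Python sketch. The velocity factor of 2/3 is an assumption, typical for optical fiber; the quoted message only says "the speed of light in cable". The function name is hypothetical.

```python
MILES_TO_KM = 1.609344
C_VACUUM_KM_PER_S = 299_792.458
VELOCITY_FACTOR = 2 / 3  # assumption: signal speed in fiber relative to vacuum

def propagation_rtt_ms(distance_miles, velocity_factor=VELOCITY_FACTOR):
    """Round-trip propagation delay over a straight cable run, in milliseconds."""
    distance_km = distance_miles * MILES_TO_KM
    one_way_s = distance_km / (C_VACUUM_KM_PER_S * velocity_factor)
    return 2 * one_way_s * 1000.0

rtt_floor_ms = propagation_rtt_ms(280)       # Santa Barbara to Los Gatos, as quoted
measured_min_rtt_ms = 32.0                   # minimum ping RTT, as quoted
induced_ms = measured_min_rtt_ms - rtt_floor_ms

print(f"propagation floor: {rtt_floor_ms:.1f} ms")
print(f"induced latency:   {induced_ms:.1f} ms")
```

Under that assumption the floor comes out between 4 and 5 ms, which agrees with the rough figures in the quoted text. Real cable routes are longer than the straight-line distance, so the true floor is somewhat higher.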
> Going cable to cable, between two comcast cablemodems on (so far as I
> know) different CMTSes, the 20 miles between los gatos and scotts
> valley:
>
> 1 50.197.142.150 0.794 ms 0.692 ms 0.517 ms
> 2 67.180.184.1 19.266 ms 18.397 ms 8.726 ms
> 3 68.85.102.173 14.953 ms 9.347 ms 10.213 ms
> 4 69.139.198.146 20.477 ms 69.139.198.142 12.434 ms
> 69.139.198.138 16.116 ms
> 5 68.87.226.205 17.850 ms 15.375 ms 13.954 ms
> 6 68.86.142.250 28.254 ms 33.133 ms 28.546 ms
> 7 67.180.229.17 21.987 ms 23.831 ms 27.354 ms
>
> gfiber testers are reporting 3-5ms RTT to speedtest (co-lo'd in their
> data center), which is a very encouraging statistic, but I don't have
> subscriber-2-subscriber numbers there. Yet.
>
>>
>> Cisco->AlterNet->XO->ALGX is one path, and Cox->AboveNet->Presumably ALGX is another. They both traverse the core.
>>
>> Going to bufferbloat.net, I actually do skip the core in one path. Through Cisco, I go through core site and Hurricane Electric and finally into ISC. ISC, it turns out, is a Cox customer; taking my residential path, since Cox serves us both, the traffic never goes upstream from Cox.
>>
>> Yes, there are CDNs. I don’t think you’d like the way Video/IP and especially adaptive bitrate video - Netflix, Youtube, etc - worked if they didn’t exist.
>
> I totally favor CDNs of all sorts. My worry - not successfully
> mirrored in the fast/slow lane debate - was over the vertical
> integration of certain providers preventing future CDN deployments of
> certain kinds of content.
Personally, I think most of that is blarney. A contract to colo a CDN provider is money for the service provider. I haven’t noticed any service providers turning down money.
>> Akamai is probably the prototypical one, and when they deployed theirs it made the Internet quite a bit snappier - and that helped the economics of Internet sales. Google and Facebook actually do operate large data centers, but a lot of their common content (or at least Google’s) is in CDNlets. Netflix uses several CDNs, or so I’m told; the best explanation I have found of their issues with Comcast and Level 3 is at http://www.youtube.com/watch?v=tR1sLLOYxnY (and it has imperfections). And yes, part of the story is business issues over CDNs. Netflix’s data traverses the core once to each CDN download server, and from the server to its customers.
>
> Yes, that description mostly mirrors my understanding, and the viewpoint we
> put forth in the Wired article, which I hoped would help to defuse the hysteria.
>
> Then what gfiber published shortly afterwards on their co-lo policy
> scored some points, I thought.
>
> http://googlefiberblog.blogspot.com/2014/05/minimizing-buffering.html
>
> In addition to the wayward political arguments, what bothered me
> about level3's argument is that they made unsubstantiated claims about
> packet loss and latency that I'd have loved to hear more about,
> notably whether or not they had any AQM in place.
Were I Netflix and company, and for that matter Youtube, I would handle delay at the TCP sender by using a delay-based TCP congestion control algorithm. There is at least one common data center provider that I think does that; they told me that they had purchased a congestion control algorithm (although the guy I was speaking with didn’t know what they bought or from whom), and the only one I know of that is for sale in that sense is a pretty effective delay-based algorithm. The point of TCP congestion control is to maximize throughput while protecting the Internet. I would argue that it SHOULD be to maximize throughput while minimizing latency. Rant available on request.
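To make "delay-based" concrete, here is a minimal Python sketch of a window update in the style of TCP Vegas. Vegas is used only because it is a well-known public example; it is not the purchased algorithm mentioned above, which is not named. The sketch omits slow start, loss handling and timers, and the toy bottleneck link (`toy_rtt`, 1000 packets/s, 50 ms base RTT) is entirely hypothetical.

```python
def vegas_update(cwnd, base_rtt, rtt, alpha=2.0, beta=4.0):
    """One per-RTT window adjustment in the style of TCP Vegas.

    cwnd is in packets; base_rtt (smallest RTT seen) and rtt are in seconds.
    alpha and beta bound the number of packets the sender lets sit in queues.
    """
    expected = cwnd / base_rtt               # rate if nothing were queued
    actual = cwnd / rtt                      # rate actually achieved
    queued = (expected - actual) * base_rtt  # estimated packets held in queues
    if queued < alpha:
        return cwnd + 1                      # path looks underused: probe upward
    if queued > beta:
        return max(cwnd - 1, 2)              # queue is building: back off before loss
    return cwnd                              # inside the target band: hold steady

def toy_rtt(cwnd, base_rtt, capacity_pps):
    """Hypothetical bottleneck: packets beyond the bandwidth-delay product queue up."""
    bdp = capacity_pps * base_rtt
    queue = max(0.0, cwnd - bdp)
    return base_rtt + queue / capacity_pps

BASE_RTT = 0.05   # seconds (hypothetical)
CAPACITY = 1000   # packets per second (hypothetical); BDP is 50 packets

cwnd = 10
for _ in range(100):
    cwnd = vegas_update(cwnd, BASE_RTT, toy_rtt(cwnd, BASE_RTT, CAPACITY))

final_cwnd = cwnd
print(f"window settles at {final_cwnd} packets (BDP is 50)")
```

In this toy model the window stops growing a few packets above the bandwidth-delay product, so the standing queue stays small. A purely loss-based sender would instead keep growing the window until the buffer overflowed.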
>> The IETF uses a CDN, as of recently. It’s called Cloudflare.
>>
>> One of the places I worry is Chrome and Silk’s SPDY Proxies, which are somewhere in Google and Amazon respectively.
>
> Well, the current focus on e2e encryption everywhere is breaking good
> old fashioned methods of minimizing dns and web traffic inside an
> organization and coping with odd circumstances like satellite links. I
> liked web proxies, they were often capable of reducing traffic by 10s
> of percentage points, reduce latency enormously for lossy or satellite
> links, and were frequently used by large organizations (like schools)
> to manage content.
Well, yes. They also have the effect of gerrymandering routing. All traffic through a proxy could go directly to the destination but goes first to the proxy. If the proxy is on the path, well and good. If it’s off-path, it adds RTT.
>> Chrome and Silk send https and SPDY traffic directly to the targeted service, but http traffic to their proxies, which do their magic and send the result back. One of the potential implications is that instead of going to the CDN nearest me, it then goes to the CDN nearest the proxy. That’s not good for me. I just hope that the CDNs I use accept https from me, because that will give me the best service (and btw encrypts my data).
>>
>> Blind men and elephants, and they’re all right.