[Bloat] viability of the data center in the internet of the future

Tue Jul 1 12:38:47 EDT 2014

On Jul 1, 2014, at 1:37 AM, Dave Taht <dave.taht at gmail.com> wrote:

> On Sat, Jun 28, 2014 at 5:50 PM, Fred Baker (fred) <fred at cisco.com> wrote:
>> There is in fact a backbone. Once upon a time, it was run by a single company, BBN. Then it was more like five, and then ... and now it’s 169. There are, if the BGP report (http://seclists.org/nanog/2014/Jun/495) is to be believed, 47136 ASNs in the system, of which 35929 don’t show up as transit for anyone and are therefore presumably edge networks and potentially multihomed, and of those 16325 only announce a single prefix. Of the 6101 ASNs that show up as transit, 169 ONLY show up as transit. Yes, the core is 169 ASNs, and it’s not a little dot off to the side. If you want to know where it is, do a traceroute (tracert on Windows).
> 
> The fact that the internet has grown to 10+ billion devices (by some
> estimates), and from 1 transit provider to only 169, doesn't impress
> me. There are 206 countries in the world...

Did I say that there was only one transit provider? I said there were 169 AS’s that, in potaroo’s equivalent of Route Views, *only* show up as transit. There are, this morning, 195 transit-only AS’s, 40724 origin-only AS’s (AS’s that are only found at the edge), and 6573 AS’s that show up as both origin AS’s and transit AS’s.

http://bgp.potaroo.net/as2.0/bgp-active.html
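The three-way split above (transit-only, origin-only, both) can be derived from a table of AS paths: the last AS in a path originated the prefix, and every AS before it carried the route for someone else. The following is a minimal Python sketch under that assumption; it is not potaroo's actual code, and the function name and the ASNs in the example are hypothetical.

```python
def classify_asns(as_paths):
    """Classify ASNs as origin-only, transit-only, or both.

    Each path is a list of ASNs as seen from a route collector, with the
    origin (the AS that announced the prefix) as the last element.
    """
    origin, transit = set(), set()
    for path in as_paths:
        if not path:
            continue
        # Collapse AS-path prepending (e.g. [64501, 64523, 64523]) so an AS
        # repeated at the end is not counted as transit for itself.
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        origin.add(hops[-1])        # last AS originated the prefix
        transit.update(hops[:-1])   # every earlier AS relayed it
    return {
        "origin_only": origin - transit,
        "transit_only": transit - origin,
        "both": origin & transit,
    }

# Hypothetical AS paths (documentation/private ASNs), origin last.
paths = [
    [64500, 64510, 64520],
    [64500, 64511, 64521],
    [64501, 64510, 64522],
    [64500, 64510],
    [64501, 64523, 64523],  # origin prepends itself
]
for kind, asns in classify_asns(paths).items():
    print(kind, sorted(asns))
```

Here 64510 lands in "both" because it originates one prefix and carries others, which is the middle category in the counts above.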

> It is a shame that multi-homing has never been either easily obtainable
> or widely available; it would be nice to be able to have multiple links
> for any business critically dependent on the continuous operation of
> the internet and cloud.

Actually, it is pretty common. Again, from potaroo.net, there are 30620 origin AS’s announced via a single AS path. The implication is that there are 40724 - 30620 = 10104 origin AS’s being announced to AS65000 via multiple AS paths. I don’t know whether they or their upstreams are multihomed, but I’ll bet a significant subset of them are.
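The single-path versus multi-path count can be sketched the same way: group the paths a collector sees by origin AS and count the distinct paths per origin. This is a minimal Python illustration with a hypothetical function name and hypothetical ASNs, not potaroo's method. As the paragraph says, path diversity only suggests multihoming, since it may also come from an upstream's own connectivity.

```python
from collections import defaultdict

def origins_by_path_count(as_paths):
    """Split origin ASNs into those seen via one distinct AS path and those
    seen via several (candidates for being multihomed)."""
    paths_per_origin = defaultdict(set)
    for path in as_paths:
        if path:
            paths_per_origin[path[-1]].add(tuple(path))  # origin is last
    single = {asn for asn, seen in paths_per_origin.items() if len(seen) == 1}
    multiple = set(paths_per_origin) - single
    return single, multiple

paths = [
    [64500, 64510, 64520],
    [64501, 64511, 64520],  # 64520 also reachable via a second upstream
    [64500, 64510, 64521],
    [64500, 64510, 64521],  # duplicate path: still only one distinct path
]
single, multiple = origins_by_path_count(paths)
print(sorted(single), sorted(multiple))
```

A single collector only sees the paths its peers choose to export, so a count like this is a lower bound on path diversity.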

>> I’ll give you two, one through Cisco and one through my residential provider.
>> 
>> traceroute to reed.com (67.223.249.82), 64 hops max, 52 byte packets
>> 1  sjc-fred-881.cisco.com (10.19.64.113)  1.289 ms  12.000 ms  1.130 ms
> 
> This is through your vpn?

Yes

>> 2  sjce-access-hub1-tun10.cisco.com (10.27.128.1)  47.661 ms  45.281 ms  42.995 ms
> 
>> 3  ...
>> 11  sjck-isp-gw1-ten1-1-0.cisco.com (128.107.239.217)  44.972 ms  45.094 ms  43.670 ms
>> 12  tengige0-2-0-0.gw5.scl2.alter.net (152.179.99.153)  48.806 ms  49.338 ms  47.975 ms
>> 13  0.xe-9-1-0.br1.sjc7.alter.net (152.63.51.101)  43.998 ms  45.595 ms  49.838 ms
>> 14  206.111.6.121.ptr.us.xo.net (206.111.6.121)  52.110 ms  45.492 ms  47.373 ms
>> 15  207.88.14.225.ptr.us.xo.net (207.88.14.225)  126.696 ms  124.374 ms  127.983 ms
>> 16  te-2-0-0.rar3.washington-dc.us.xo.net (207.88.12.70)  127.639 ms  132.965 ms  131.415 ms
>> 17  te-3-0-0.rar3.nyc-ny.us.xo.net (207.88.12.73)  129.747 ms  125.680 ms  123.907 ms
>> 18  ae0d0.mcr1.cambridge-ma.us.xo.net (216.156.0.26)  125.009 ms  123.152 ms  126.992 ms
>> 19  ip65-47-145-6.z145-47-65.customer.algx.net (65.47.145.6)  118.244 ms  118.024 ms  117.983 ms
>> 20  * * *
>> 21  209.59.211.175 (209.59.211.175)  119.378 ms *  122.057 ms
>> 22  reed.com (67.223.249.82)  120.051 ms  120.146 ms  118.672 ms
> 
> 
>> traceroute to reed.com (67.223.249.82), 64 hops max, 52 byte packets
>> 1  10.0.2.1 (10.0.2.1)  1.728 ms  1.140 ms  1.289 ms
>> 2  10.6.44.1 (10.6.44.1)  122.289 ms  126.330 ms  14.782 ms
> 
> ^^^^^ is this a wireless hop or something? Seeing your traceroute jump
> all the way to 122+ms strongly suggests you are either on wireless or
> not running PIE/fq_codel.

The zeroth hop is wireless: I pull my Ethernet plug and turn on the WiFi interface, which is served by two Apple AirPort base stations in the home. 10.0.2.1 is the residential slice of my router. To be honest, I’m hard-pressed to say what 10.6.44.1 is; I suspect it’s an address of my CMTS. The address *I* have for my CMTS is 98.173.193.1, and my address in that subnet is 98.173.193.12. If you want my guess, Cox is returning an RFC 1918 address to prevent non-customers from pinging it.

--- 10.6.44.1 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 7.668/10.102/12.012/1.520 ms

--- 98.173.193.1 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 8.414/30.501/120.407/41.031 ms

and 98.173.193.1 doesn’t show up in my traceroute. 

Absent per-hop timestamps, I’m not in a position to say where the delay came from. For all I know, it has something to do with the WiFi in the house; WiFi can have really strange delays.

Whatever.
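The 98.173.193.1 run shows how a single slow reply dominates a ping summary: the minimum stays near 8 ms while the average and stddev balloon. The following minimal Python sketch shows how such a summary line is computed. The sample RTTs are hypothetical (nine replies near 10 ms plus one 120 ms straggler), and the use of a population standard deviation is an assumption about how BSD-derived ping implementations compute stddev.

```python
import statistics

def ping_summary(rtts_ms):
    """Return (min, avg, max, stddev) as a ping summary line reports them.

    Assumes a population standard deviation (divide by n).
    """
    return (min(rtts_ms), statistics.fmean(rtts_ms),
            max(rtts_ms), statistics.pstdev(rtts_ms))

# Hypothetical samples: nine replies near 10 ms and one 120 ms outlier.
samples = [9, 10, 11, 10, 9, 10, 11, 10, 10, 120]
mn, avg, mx, sd = ping_summary(samples)
print(f"min/avg/max/stddev = {mn:.3f}/{avg:.3f}/{mx:.3f}/{sd:.3f} ms")
print(f"median = {statistics.median(samples):.3f} ms")
```

With these samples the average roughly doubles and the stddev lands above 30 ms, while the median stays at 10 ms, which is why the minimum or median is usually the more telling number for path delay.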

>> 3  ip68-4-12-20.oc.oc.cox.net (68.4.12.20)  13.208 ms  12.667 ms  8.941 ms
>> 4  ip68-4-11-96.oc.oc.cox.net (68.4.11.96)  17.025 ms  13.911 ms  13.835 ms
>> 5  langbprj01-ae1.rd.la.cox.net (68.1.1.13)  131.855 ms  14.677 ms  129.860 ms
>> 6  68.105.30.150 (68.105.30.150)  16.750 ms  31.627 ms  130.134 ms
>> 7  ae11.cr2.lax112.us.above.net (64.125.21.173)  40.754 ms  31.873 ms  130.246 ms
>> 8  ae3.cr2.iah1.us.above.net (64.125.21.85)  162.884 ms  77.157 ms  69.431 ms
>> 9  ae14.cr2.dca2.us.above.net (64.125.21.53)  97.115 ms  113.428 ms  80.068 ms
>> 10  ae8.mpr4.bos2.us.above.net.29.125.64.in-addr.arpa (64.125.29.33)  109.957 ms  124.964 ms  122.447 ms
>> 11  * 64.125.69.90.t01470-01.above.net (64.125.69.90)  86.163 ms  103.232 ms
>> 12  250.252.148.207.static.yourhostingaccount.com (207.148.252.250)  111.068 ms  119.984 ms  114.022 ms
>> 13  209.59.211.175 (209.59.211.175)  103.358 ms  87.412 ms  86.345 ms
>> 14  reed.com (67.223.249.82)  87.276 ms  102.752 ms  86.800 ms
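Hops 5 through 7 in the Cox trace above mix ~15 ms and ~130 ms probes at the same hop, which looks like transient queueing rather than distance. A common heuristic is to take the minimum of the probes at each hop. Below is a minimal Python sketch of that, assuming plain `traceroute` output with the `>>` quoting stripped; `min_rtt_per_hop` is a hypothetical helper, not part of any tool.

```python
import re

HOP_RE = re.compile(r"^\s*(\d+)\s")          # leading hop number
RTT_RE = re.compile(r"(\d+(?:\.\d+)?) ms")   # each "12.345 ms" probe

def min_rtt_per_hop(lines):
    """Map hop number -> minimum probe RTT (ms) from traceroute output.

    Lines without a leading hop number (alternate responders at the same
    TTL) are folded into the preceding hop; hops that answered only with
    '*' are omitted.
    """
    result, hop = {}, None
    for line in lines:
        m = HOP_RE.match(line)
        if m:
            hop = int(m.group(1))
        rtts = [float(x) for x in RTT_RE.findall(line)]
        if hop is not None and rtts:
            result[hop] = min([*rtts, result.get(hop, float("inf"))])
    return result

trace = [
    " 3  ip68-4-12-20.oc.oc.cox.net (68.4.12.20)  13.208 ms  12.667 ms  8.941 ms",
    " 4  ip68-4-11-96.oc.oc.cox.net (68.4.11.96)  17.025 ms  13.911 ms  13.835 ms",
    " 5  langbprj01-ae1.rd.la.cox.net (68.1.1.13)  131.855 ms  14.677 ms  129.860 ms",
    " 6  68.105.30.150 (68.105.30.150)  16.750 ms  31.627 ms  130.134 ms",
    " 7  ae11.cr2.lax112.us.above.net (64.125.21.173)  40.754 ms  31.873 ms  130.246 ms",
]
print(min_rtt_per_hop(trace))
```

Even the minimum only bounds the round trip to each hop: it cannot say which direction the delay was on, and many routers generate ICMP replies on a slow path, so a high value at one hop that does not persist downstream is usually not where the delay lives.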
> 
> Doing me to you:
> 
> d at ida:$ traceroute -n 68.4.12.20

Through Cox:

--- 68.4.12.20 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 12.954/16.348/28.209/4.777 ms

traceroute to 68.4.12.20 (68.4.12.20), 64 hops max, 52 byte packets
 1  10.0.2.1  1.975 ms  9.026 ms  1.397 ms
 2  * * *
 3  * * *

Traceroute to Facebook works, though:

traceroute www.facebook.com
traceroute to star.c10r.facebook.com (31.13.77.65), 64 hops max, 52 byte packets
 1  10.0.2.1 (10.0.2.1)  1.490 ms  1.347 ms  0.934 ms
 2  10.6.44.1 (10.6.44.1)  9.253 ms  11.308 ms  10.974 ms
 3  ip68-4-12-20.oc.oc.cox.net (68.4.12.20)  11.275 ms  13.531 ms  20.180 ms
 4  ip68-4-11-96.oc.oc.cox.net (68.4.11.96)  18.901 ms  13.013 ms  18.723 ms
 5  sanjbprj01-ae0.0.rd.sj.cox.net (68.1.5.184)  29.397 ms  28.944 ms  30.062 ms
 6  sv1.br01.sjc1.tfbnw.net (206.223.116.166)  31.011 ms  31.082 ms
    sv1.pr02.tfbnw.net (206.223.116.153)  32.035 ms
 7  ae1.bb01.sjc1.tfbnw.net (74.119.76.23)  32.932 ms  33.251 ms
    po126.msw01.05.sjc1.tfbnw.net (31.13.31.131)  31.822 ms
 8  edge-star-shv-05-sjc1.facebook.com (31.13.77.65)  38.234 ms  44.150 ms  31.165 ms

So it’s not that the router is dropping incoming ICMP.

Through Cisco:

--- 68.4.12.20 ping statistics ---
10 packets transmitted, 0 packets received, 100.0% packet loss

traceroute to 68.4.12.20 (68.4.12.20), 64 hops max, 52 byte packets
 1  10.19.64.113  1.173 ms  0.932 ms  0.932 ms
 2  10.27.128.1  36.256 ms  36.478 ms  37.376 ms
 3  10.20.1.205  35.831 ms  36.211 ms  36.090 ms
 4  171.69.14.249  36.084 ms  36.345 ms  37.889 ms
 5  171.69.14.206  38.342 ms  37.791 ms  39.771 ms
 6  171.69.7.178  37.699 ms  36.662 ms  41.758 ms
 7  128.107.236.39  43.112 ms  36.401 ms  39.407 ms
 8  128.107.239.6  35.576 ms  35.092 ms  37.770 ms
 9  128.107.239.218  35.846 ms  35.337 ms  36.488 ms
10  128.107.239.250  35.504 ms  36.924 ms  39.353 ms
11  128.107.239.217  36.881 ms  38.063 ms  37.892 ms
12  152.179.99.153  38.745 ms  39.754 ms  39.665 ms
13  152.63.51.97  38.322 ms  37.466 ms  41.380 ms
14  129.250.9.249  39.924 ms  40.913 ms  39.690 ms
15  129.250.5.52  46.302 ms  43.463 ms  39.334 ms
16  129.250.6.10  49.332 ms  45.380 ms  47.309 ms
17  129.250.5.86  46.556 ms  48.806 ms
    129.250.5.70  48.635 ms
18  129.250.6.181  48.020 ms
    129.250.6.203  47.502 ms  47.111 ms
19  129.250.194.166  47.373 ms  48.532 ms  48.723 ms
20  68.1.0.179  66.514 ms
    68.1.0.185  63.758 ms
    68.1.0.189  61.326 ms
21  * * *

> Using ping rather than traceroute I get a typical min RTT to you
> of 32ms.
> 
> As the crow drives between Santa Barbara and Los Gatos (280 miles), at
> the speed of light in cable we have roughly 4ms of RTT between us, or
> 28ms of induced latency due to the characteristics of the underlying
> media technologies, and the quality and limited quantity of the
> interconnects.
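The propagation figure can be checked with a one-line formula: RTT = 2d / (v·c), where d is the distance and v the velocity factor of the medium. The minimal Python sketch below assumes a velocity factor of about 0.67, a common rule of thumb for optical fiber, and a straight-line path; real fiber routes are longer than the crow's line, so this is a lower bound.

```python
def propagation_rtt_ms(distance_km, velocity_factor=0.67):
    """Round-trip propagation delay (ms) over a straight path of distance_km.

    velocity_factor is signal speed as a fraction of c; ~0.67 is an
    assumed rule of thumb for fiber (refractive index ~1.5).
    """
    c_km_per_ms = 299_792.458 / 1000  # speed of light in vacuum, km per ms
    return 2 * distance_km / (c_km_per_ms * velocity_factor)

MILES_TO_KM = 1.609344
base = propagation_rtt_ms(280 * MILES_TO_KM)
print(f"propagation RTT ~ {base:.1f} ms; of a 32 ms ping, ~{32 - base:.1f} ms is not distance")
```

This gives about 4.5 ms for 280 miles, so roughly 27 to 28 ms of the observed 32 ms minimum comes from something other than the speed of light.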
> 
> The numbers I've long wanted from FiOS, DSL, and cable are
> measurements of "cross-town" latency. In the prior age of
> circuit-switched networks, I can't imagine it being much higher than
> 4ms, and local telephony used to account for a lot of calls.

Well, if it’s of any interest, I once had a fractional T-1 to the home (a different home, but also here in Santa Barbara), and ping RTT to Cisco was routinely 30-ish ms, much as it is now through Cox. I did once see it jump to about 600 ms, and I called to complain.

> Going cable to cable, between two Comcast cable modems on (so far as I
> know) different CMTSes, over the 20 miles between Los Gatos and Scotts
> Valley:
> 
> 1  50.197.142.150  0.794 ms  0.692 ms  0.517 ms
> 2  67.180.184.1  19.266 ms  18.397 ms  8.726 ms
> 3  68.85.102.173  14.953 ms  9.347 ms  10.213 ms
> 4  69.139.198.146  20.477 ms  69.139.198.142  12.434 ms
>    69.139.198.138  16.116 ms
> 5  68.87.226.205  17.850 ms  15.375 ms  13.954 ms
> 6  68.86.142.250  28.254 ms  33.133 ms  28.546 ms
> 7  67.180.229.17  21.987 ms  23.831 ms  27.354 ms
> 
> gfiber testers are reporting 3-5ms RTT to speedtest (co-lo'd in their
> data center), which is a very encouraging statistic, but I don't have
> subscriber-to-subscriber numbers there. Yet.
> 
>> 
>> Cisco->AlterNet->XO->ALGX is one path, and Cox->AboveNet->(presumably) ALGX is another. They both traverse the core.
>> 
>> Going to bufferbloat.net, I actually do skip the core in one path. Through Cisco, I go through CoreSite and Hurricane Electric and finally into ISC. ISC, it turns out, is a Cox customer; on my residential path, since Cox serves us both, the traffic never goes upstream from Cox.
>> 
>> Yes, there are CDNs. I don’t think you’d like the way Video/IP and especially adaptive bitrate video - Netflix, YouTube, etc - worked if they didn’t exist.
> 
> I totally favor CDNs of all sorts. My worry - not successfully
> mirrored in the fast/slow lane debate - was over the vertical
> integration of certain providers preventing future CDN deployments of
> certain kinds of content.

Personally, I think most of that is blarney. A contract to colo a CDN provider is money for the service provider. I haven’t noticed any service providers turning down money.

>> Akamai is probably the prototypical one, and when they deployed theirs it made the Internet quite a bit snappier - and that helped the economics of Internet sales. Google and Facebook actually do operate large data centers, but a lot of their common content (or at least Google’s) is in CDNlets. Netflix uses several CDNs, or so I’m told; the best explanation I have found of their issues with Comcast and Level 3 is at http://www.youtube.com/watch?v=tR1sLLOYxnY (and it has imperfections). And yes, part of the story is business issues over CDNs. Netflix’s data traverses the core once to reach each CDN download server, and then goes from the server to its customers.
> 
> Yes, that description mostly mirrors my understanding, and the viewpoint we
> put forth in the Wired article, which I hoped would help defuse the hysteria.
> 
> Then what gfiber published shortly afterwards on their co-lo policy
> scored some points, I thought.
> 
> http://googlefiberblog.blogspot.com/2014/05/minimizing-buffering.html
> 
> In addition to the wayward political arguments, what bothered me
> about Level 3's argument is that they made unsubstantiated claims about
> packet loss and latency that I'd have loved to hear more about,
> notably whether or not they had any AQM in place.

Were I Netflix and company (and, for that matter, YouTube), I would handle delay at the TCP sender by using a delay-based TCP congestion control algorithm. There is at least one common data center provider that I think does that; they told me that they had purchased a congestion control algorithm (although the guy I was speaking with didn’t know what they bought or from whom), and the only one I know of that is for sale in that sense is a pretty effective delay-based algorithm. The point of TCP congestion control is to maximize throughput while protecting the Internet. I would argue that it SHOULD be to maximize throughput while minimizing latency. Rant available on request.
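The message does not name the purchased algorithm, so the following is not it. As a generic illustration of what "delay-based" means, here is a minimal Python sketch in the style of TCP Vegas: the sender treats the lowest RTT it has seen as the propagation delay, estimates how many of its own packets are sitting in queues, and adjusts its window before the buffer overflows. The class name and the alpha/beta thresholds are illustrative assumptions; slow start and loss handling are omitted.

```python
from dataclasses import dataclass

@dataclass
class VegasLikeController:
    """Per-RTT congestion window update in the style of TCP Vegas (sketch)."""
    cwnd: float = 10.0              # congestion window, in packets
    base_rtt: float = float("inf")  # lowest RTT seen, taken as propagation delay
    alpha: float = 2.0              # below this many queued packets: grow
    beta: float = 4.0               # above this many queued packets: shrink

    def on_rtt_sample(self, rtt: float) -> float:
        self.base_rtt = min(self.base_rtt, rtt)
        # Estimated packets of this flow sitting in queues:
        # (expected rate - actual rate) * base_rtt = cwnd * (1 - base_rtt / rtt)
        queued = self.cwnd * (1.0 - self.base_rtt / rtt)
        if queued < self.alpha:
            self.cwnd += 1                        # little queueing: probe for more
        elif queued > self.beta:
            self.cwnd = max(2.0, self.cwnd - 1)   # queue building: back off early
        return self.cwnd

cc = VegasLikeController()
for rtt in [20.0, 20.0] + [40.0] * 10:  # RTT doubles once a queue forms
    window = cc.on_rtt_sample(rtt)
print(window)
```

With the RTT held artificially at 40 ms, the window backs off until only alpha to beta packets are estimated to be queued and then holds, instead of growing until the buffer overflows as a loss-based sender would. In a real network the RTT would fall as the window shrinks, which is the latency benefit being argued for.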

>> The IETF uses a CDN, as of recently. It’s called Cloudflare.
>> 
>> One thing I worry about is Chrome’s and Silk’s SPDY proxies, which are somewhere in Google and Amazon respectively.
> 
> Well, the current focus on e2e encryption everywhere is breaking good
> old-fashioned methods of minimizing DNS and web traffic inside an
> organization and of coping with odd circumstances like satellite links.
> I liked web proxies: they could often reduce traffic by tens of
> percentage points and cut latency enormously on lossy or satellite
> links, and they were frequently used by large organizations (like
> schools) to manage content.

Well, yes. They also have the effect of gerrymandering routing. Traffic that could go directly to the destination goes to the proxy first. If the proxy is on the path, well and good; if it’s off-path, it adds RTT.
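The detour cost is simple arithmetic: the relayed round trip is the client-to-proxy RTT plus the proxy-to-destination RTT, compared with the direct RTT. Below is a minimal Python sketch with hypothetical numbers; it ignores proxy processing time and any caching benefit the proxy might provide.

```python
def extra_rtt_via_proxy(rtt_client_dest, rtt_client_proxy, rtt_proxy_dest):
    """RTT penalty (ms) for relaying through a proxy instead of going direct.

    For an on-path proxy the two legs sum to about the direct RTT, so the
    penalty is near zero; for an off-path proxy it is the detour.
    """
    return (rtt_client_proxy + rtt_proxy_dest) - rtt_client_dest

# Hypothetical: a CDN node 10 ms from the client, a proxy 40 ms away,
# and a CDN node 5 ms from the proxy that the proxy fetches from instead.
print(extra_rtt_via_proxy(10, 40, 5))   # off-path detour
print(extra_rtt_via_proxy(30, 10, 20))  # proxy sitting on the direct path
```

The off-path case also captures the CDN effect in the quoted paragraph: the proxy's fetch goes to the CDN node nearest the proxy, not the one nearest the client.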

>> Chrome and Silk send https and SPDY traffic directly to the targeted service, but http traffic to their proxies, which do their magic and send the result back. One of the potential implications is that my traffic then goes to the CDN node nearest the proxy instead of the one nearest me. That’s not good for me. I just hope that the CDNs I use accept https from me, because that will give me the best service (and btw encrypts my data).
>> 
>> Blind men and elephants, and they’re all right.
