From: Dave Taht
Date: Sun, 15 Oct 2023 18:39:23 -0700
To: Network Neutrality is back! Let´s make the technical aspects heard this time!
Subject: [NNagain] The history of congestion control on the internet

It is wonderful to have your original perspectives here, Jack. But please, everyone, before a major subject change, change the subject? Jack's email conflates a few things that probably deserve threads of their own. One is VGV - great acronym! Another is about the "Placeholders" of TTL and TOS. The last is the history of congestion control - and its future!
Having been a part of the most recent episodes here, I have written extensively on the subject, but what I most like to point people to are my fun talks trying to make it more accessible, like this one at APNIC: https://blog.apnic.net/2020/01/22/bufferbloat-may-be-solved-but-its-not-over-yet/ or my more recent one at TTI/Vanguard. Most recently one of our LibreQos clients has been collecting 10ms samples and movies of what real-world residential traffic actually looks like: https://www.youtube.com/@trendaltoews7143 - and it is my hope that this conveys intuition to others... as compared to speedtest traffic, which proves nothing about the actual behaviors of VGV traffic. I ranted about that here: https://blog.cerowrt.org/post/speedtests/ - I am glad that these speedtests now have latency under load reports almost universally, but see the rant for more detail.

Most people only have a picture of traffic in the large, over 5 minute intervals, which behaves quite differently, or a preconception that backpressure actually exists across the internet. It doesn't. An explicit ack for every packet was ripped out of the arpanet as costing too much time. Wifi, to some extent, recreates the arpanet problem by having explicit acks on the local loop that are repeated until by god the packet comes through, usually without exponential backoff. We have some really amazing encoding schemes now - I do not understand how starlink works without retries, for example, and my grip on 5G's encodings is non-existent, except knowing that it is the most bufferbloated of all our technologies.

...

Anyway, my hope for this list is that we come up with useful technical feedback to the powers-that-be that want to regulate the internet under some Title II provisions, and I certainly hope we can make strides towards fixing bufferbloat along the way! There are many other issues. Let's talk about those instead! But...

...

In "brief" response to the notes below - source quench died due to easy DDoS; AQMs from RED (1992) until CoDel (2012) struggled with measuring the wrong things (Kathie's updated paper on RED in a Different Light: https://pollere.net/Codel.html); SFQ was adopted by many devices, WRR used in others, and ARED I think is common in Juniper boxes; fq_codel is pretty much the default now for most of Linux, and I helped write CAKE. TCPs evolved from Reno to Vegas to CUBIC to BBR, and the paper on BBR is excellent: https://research.google/pubs/pub45646/ - as is Len Kleinrock's monograph on it. However, problems with self-congestion and excessive packet loss were observed, and after entering the IETF process, BBR is now in its 3rd revision, which looks pretty good.

Hardware pause frames in ethernet are often available, there are all kinds of specialized new hardware flow control standards in 802.1, and a new, more centralized controller in wifi 7. To this day I have no idea how InfiniBand works. Or how ATM was supposed to work. I have a good grip on wifi up to version 6, and the work we did on wifi is in use now on a lot of wifi gear like OpenWrt, eero and evenroute. I am proudest of all of my teams' work on achieving airtime fairness and the better scheduling for wifi described in this paper: https://www.cs.kau.se/tohojo/airtime-fairness/ - with MOS to die for.

There is new work on this thing called L4S, which has a bunch of RFCs for it, leverages multi-bit DCTCP-style ECN, and is under test by Apple and Comcast; it is discussed on the tsvwg list a lot. I encourage users to jump in on the Comcast/Apple beta, and operators to at least read this: https://datatracker.ietf.org/doc/draft-ietf-tsvwg-l4sops/
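For anyone who wants to see what the fq_codel/CAKE deployment mentioned above looks like in practice on a Linux box, here is a minimal sketch. It assumes iproute2 plus the sch_cake / sch_fq_codel qdiscs and root; the interface name and bandwidth figure are placeholders, not a recommendation.

#!/usr/bin/env python3
# Toy helper: put CAKE (or fq_codel as a fallback) on an egress interface.
# Assumes Linux + iproute2 + the sch_cake / sch_fq_codel modules; needs root.
# IFACE and RATE_MBIT are placeholders - set the rate a bit below your real
# uplink speed, since CAKE's shaper only helps if it owns the bottleneck.
import subprocess

IFACE = "eth0"      # placeholder WAN-facing interface
RATE_MBIT = 100     # placeholder uplink rate

def tc(*args):
    cmd = ["tc", *args]
    print("+", " ".join(cmd))
    return subprocess.run(cmd).returncode

# Try CAKE first: a shaper plus per-flow queueing plus AQM in one qdisc.
if tc("qdisc", "replace", "dev", IFACE, "root",
      "cake", "bandwidth", f"{RATE_MBIT}mbit") != 0:
    # Fall back to plain fq_codel (no shaper) if sch_cake is not there.
    tc("qdisc", "replace", "dev", IFACE, "root", "fq_codel")

# Show what ended up installed, with drop/mark/delay statistics.
tc("-s", "qdisc", "show", "dev", IFACE)

On OpenWrt the same idea ships packaged up as the SQM scripts (luci-app-sqm), which is the easier path on most home routers.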
Knowing that there is a book or three left to write on this subject that nobody will read is an issue, and coming up with an architecture to take packet handling as we know it to the moon and the rest of the solar system seems kind of difficult. Ideally I would love to be working on that earth-moon architecture rather than trying to finish getting stuff we designed in 2012-2016 deployed.

I am going to pull out a few specific questions from the below and answer separately.

On Sun, Oct 15, 2023 at 1:00 PM Jack Haverty via Nnagain wrote:
>
> The "VGV User" (Voice, Gaming, Videoconferencing) cares a lot about latency. It's not just "rewarding" to have lower latencies; high latencies may make VGV unusable. Average (or "typical") latency as the FCC label proposes isn't a good metric to judge usability. A path which has high variance in latency can be unusable even if the average is quite low. Having your voice or video or gameplay "break up" every minute or so when latency spikes to 500 msec makes the "user experience" intolerable.
>
> A few years ago, I ran some simple "ping" tests to help a friend who was trying to use a gaming app. My data was only for one specific path so it's anecdotal. What I saw was surprising - zero data loss, every datagram was delivered, but occasionally a datagram would take up to 30 seconds to arrive. I didn't have the ability to poke around inside, but I suspected it was an experience of "bufferbloat", enabled by the dramatic drop in price of memory over the decades.
>
> It's been a long time since I was involved in operating any part of the Internet, so I don't know much about the inner workings today. Apologies for my ignorance....
>
> There was a scenario in the early days of the Internet for which we struggled to find a technical solution. Imagine some node in the bowels of the network, with 3 connected "circuits" to some other nodes. On two of those inputs, traffic is arriving to be forwarded out the third circuit. The incoming flows are significantly more than the outgoing path can accept.
>
> What happens? How is "backpressure" generated so that the incoming flows are reduced to the point that the outgoing circuit can handle the traffic?
>
> About 45 years ago, while we were defining TCPV4, we struggled with this issue, but didn't find any consensus solutions. So "placeholder" mechanisms were defined in TCPV4, to be replaced as research continued and found a good solution.
>
> In that "placeholder" scheme, the "Source Quench" (SQ) IP message was defined; it was to be sent by a switching node back toward the sender of any datagram that had to be discarded because there wasn't any place to put it.
>
> In addition, the TOS (Type Of Service) and TTL (Time To Live) fields were defined in IP.
>
> TOS would allow the sender to distinguish datagrams based on their needs. For example, we thought "Interactive" service might be needed for VGV traffic, where timeliness of delivery was most important. "Bulk" service might be useful for activities like file transfers, backups, et al. "Normal" service might now mean activities like using the Web.
>
> The TTL field was an attempt to inform each switching node about the "expiration date" for a datagram. If a node somehow knew that a particular datagram was unlikely to reach its destination in time to be useful (such as a video datagram for a frame that has already been displayed), the node could, and should, discard that datagram to free up resources for useful traffic. Sadly we had no mechanisms for measuring delay, either in transit or in queuing, so TTL was defined in terms of "hops", which is not an accurate proxy for time. But it's all we had.
>
> Part of the complexity was that the "flow control" mechanism of the Internet had put much of the mechanism in the users' computers' TCP implementations, rather than the switches which handle only IP. Without mechanisms in the users' computers, all a switch could do is order more circuits, and add more memory to the switches for queuing. Perhaps that led to "bufferbloat".
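To put a toy number on that two-circuits-in, one-circuit-out picture, here is a little fluid-model sketch - purely hypothetical rates and buffer sizes, with senders that never slow down, i.e. no backpressure at all. With a small buffer the node mostly drops; with a big, cheap-memory buffer it mostly delays, which is bufferbloat in miniature:

#!/usr/bin/env python3
# Toy fluid model of a node with two inflows feeding one slower outflow.
# All numbers are made up for illustration; no real router works this way.
PKT_BITS = 12000            # 1500-byte packets
IN_RATE = 2 * 1_000_000     # combined arrival rate, bits/s (two 1 Mbit/s flows)
OUT_RATE = 1_500_000        # departure rate of the outgoing circuit, bits/s

def simulate(buffer_pkts, seconds=10):
    q = 0.0           # packets currently queued
    dropped = 0.0
    arrived = 0.0
    for _ in range(seconds * 1000):          # step the model in 1 ms ticks
        arrivals = IN_RATE / 1000 / PKT_BITS
        departures = OUT_RATE / 1000 / PKT_BITS
        arrived += arrivals
        q = max(0.0, q + arrivals - departures)
        if q > buffer_pkts:                  # tail-drop once the FIFO is full
            dropped += q - buffer_pkts
            q = buffer_pkts
    delay_ms = q * PKT_BITS / OUT_RATE * 1000   # standing queue delay at the end
    print(f"buffer={buffer_pkts:6d} pkts  loss={100 * dropped / arrived:5.1f}%  "
          f"queue delay={delay_ms:7.1f} ms")

for b in (20, 200, 2000):
    simulate(b)

The point is not the exact numbers; it is that without some signal back to the senders, extra memory just converts loss into latency.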
> So TOS, SQ, and TTL were all placeholders, for some mechanism in a future release that would introduce a "real" form of Backpressure and the ability to handle different types of traffic. Meanwhile, these rudimentary mechanisms would provide some flow control. Hopefully the users' computers sending the flows would respond to the SQ backpressure, and switches would prioritize traffic using the TTL and TOS information.
>
> But, being way out of touch, I don't know what actually happens today. Perhaps the current operators and current government watchers can answer?:

I would love more feedback about RED's deployment at scale in particular.

> 1/ How do current switches exert Backpressure to reduce competing traffic flows? Do they still send SQs?

Some send various forms of hardware flow control, an ethernet pause frame derivative.

> 2/ How do the current and proposed government regulations treat the different needs of different types of traffic, e.g., "Bulk" versus "Interactive" versus "Normal"? Are Internet carriers permitted to treat traffic types differently? Are they permitted to charge different amounts for different types of service?
>
> Jack Haverty
>
> On 10/15/23 09:45, Dave Taht via Nnagain wrote:
> > For starters I would like to apologize for cc-ing both nanog and my new nn list. (I will add sender filters)
> >
> > A bit more below.
> >
> > On Sun, Oct 15, 2023 at 9:32 AM Tom Beecher wrote:
> >>> So for now, we'll keep paying for transit to get to the others (since it's about as much as transporting IXP from Dallas), and hoping someone at Google finally sees Houston as more than a third rate city hanging off of Dallas. Or… someone finally brings a worthwhile IX to Houston that gets us more than peering to Kansas City. Yeah, I think the former is more likely. 😊
> >>
> >> There is often a chicken/egg scenario here with the economics. As an eyeball network, your costs to build out and connect to Dallas are greater than your transit cost, so you do that. Totally fair.
> >>
> >> However think about it from the content side. Say I want to build into Houston. I have to put routers in, and a bunch of cache servers, so I have capital outlay, plus opex for space, power, IX/backhaul/transit costs. That's not cheap, so there's a lot of calculations that go into it. Is there enough total eyeball traffic there to make it worth it? Is saving 8-10ms enough of a performance boost to justify the spend? What are the long term trends in that market? These answers are of course different for a company running their own CDN vs the commercial CDNs.
> >> I don't work for Google and obviously don't speak for them, but I would suspect that they're happy to eat an 8-10ms performance hit to serve from Dallas, versus the amount of capital outlay to build out there right now.
> >
> > The three forms of traffic I care most about are voip, gaming, and videoconferencing, which are rewarding to have at lower latencies. When I was a kid, we had switched phone networks, and while the sound quality was poorer than today, the voice latency cross-town was just like "being there". Nowadays we see 500+ms latencies for this kind of traffic.
> >
> > As to how to make calls across town work that well again, cost-wise, I do not know, but the volume of traffic that would be better served by these interconnects is quite low relative to the overall gains in lower latency experiences for them.
> >
> >
> >> On Sat, Oct 14, 2023 at 11:47 PM Tim Burke wrote:
> >>> I would say that a 1Gbit IP transit in a carrier neutral DC can be had for a good bit less than $900 on the wholesale market.
> >>>
> >>> Sadly, IXPs are seemingly turning into a pay to play game, with rates almost costing as much as transit in many cases after you factor in loop costs.
> >>>
> >>> For example, in the Houston market (one of the largest and fastest growing regions in the US!), we do not have a major IX, so to get up to Dallas it's several thousand for a 100g wave, plus several thousand for a 100g port on one of those major IXes. Or, a better option, we can get a 100g flat internet transit for just a little bit more.
> >>>
> >>> Fortunately, for us as an eyeball network, there are a good number of major content networks that are allowing for private peering in markets like Houston for just the cost of a cross connect and a QSFP if you're in the right DC, with Google and some others being the outliers.
> >>>
> >>> So for now, we'll keep paying for transit to get to the others (since it's about as much as transporting IXP from Dallas), and hoping someone at Google finally sees Houston as more than a third rate city hanging off of Dallas. Or… someone finally brings a worthwhile IX to Houston that gets us more than peering to Kansas City. Yeah, I think the former is more likely. 😊
> >>>
> >>> See y'all in San Diego this week,
> >>> Tim
> >>>
> >>> On Oct 14, 2023, at 18:04, Dave Taht wrote:
> >>>> This set of trendlines was very interesting. Unfortunately the data stops in 2015. Does anyone have more recent data?
> >>>>
> >>>> https://drpeering.net/white-papers/Internet-Transit-Pricing-Historical-And-Projected.php
> >>>>
> >>>> I believe a gbit circuit that an ISP can resell still runs at about $900 - $1.4k (?) in the usa? How about elsewhere?
> >>>>
> >>>> ...
> >>>>
> >>>> I am under the impression that many IXPs remain very successful, states without them suffer, and I also find the concept of doing micro IXPs at the city level appealing, and now achievable with cheap gear. Finer grained cross connects between telco and ISP and IXP would lower latencies across town quite hugely...
> >>>>
> >>>> PS I hear ARIN is planning on dropping the price for, and bundling 3 BGP AS numbers at a time, as of the end of this year, also.
> >>>>
> >>>>
> >>>> --
> >>>> Oct 30: https://netdevconf.info/0x17/news/the-maestro-and-the-music-bof.html
> >>>> Dave Täht CSO, LibreQos
> >
> >
> _______________________________________________
> Nnagain mailing list
> Nnagain@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/nnagain

--
Oct 30: https://netdevconf.info/0x17/news/the-maestro-and-the-music-bof.html
Dave Täht CSO, LibreQos