[NNagain] The history of congestion control on the internet

Robert McMahon rjmcmahon at rjmcmahon.com
Mon Oct 16 13:37:29 EDT 2023


We in semiconductors test TCP on hundreds of test rigs and multiple operating systems, apply statistical process controls before our chips ship, and support software for system integrators and device manufacturers. Then those companies do their own work and test more before shipping to their customers. There is a lot of testing baked in now. If there weren't, billions of TCP state machines wouldn't function or interoperate, and people wouldn't buy these products, since networks are essential infrastructure.

Reading the code doesn't really work for this class of problem. Code reviews are good as a human process, but their escape rate is quite high. Computers have to engage too, and they are now doing the heavy lifting.
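
For a sense of what those statistical process controls look like in practice, here is a toy sketch (the rig names, numbers, and the 3-sigma rule of thumb are illustrative, not our actual release criteria): flag any test rig whose TCP throughput drifts outside its historical control limits before parts ship.

import statistics

# Historical "in control" throughput runs for one test, in Mb/s.
baseline_mbps = [941, 939, 940, 942, 938, 941, 940, 939, 942, 940]
mean = statistics.mean(baseline_mbps)
sigma = statistics.stdev(baseline_mbps)
lower, upper = mean - 3 * sigma, mean + 3 * sigma

# Hypothetical new measurements from two rigs.
new_runs = {"rig-17": 940.5, "rig-23": 903.2}
for rig, mbps in new_runs.items():
    ok = lower <= mbps <= upper
    print(f"{rig}: {mbps:.1f} Mb/s ({'ok' if ok else 'out of control - hold for debug'})")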

Bob

On Oct 16, 2023, at 10:21 AM, Spencer Sevilla via Nnagain <nnagain at lists.bufferbloat.net> wrote:
>That Flakeway tool makes me think of an early version of the Chaos
>Monkey. On that note, Apple maintains a developer tool called Network
>Link Conditioner that does a good job simulating reduced network
>performance.
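>
>For anyone on Linux, tc/netem can approximate that kind of link
>conditioning. A minimal sketch (the interface name and impairment
>numbers are hypothetical; needs root):
>
># Rough Network-Link-Conditioner-style impairment on Linux via tc/netem.
>import subprocess
>
>IFACE = "eth0"  # placeholder interface name
>
>def condition(delay_ms=80, jitter_ms=20, loss_pct=1.0, rate_mbit=5):
>    # Add (or update) delay+jitter, random loss, and a rate limit.
>    subprocess.run(
>        ["tc", "qdisc", "replace", "dev", IFACE, "root", "netem",
>         "delay", f"{delay_ms}ms", f"{jitter_ms}ms",
>         "loss", f"{loss_pct}%", "rate", f"{rate_mbit}mbit"],
>        check=True)
>
>def restore():
>    subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"], check=True)
>
>condition()   # emulate a slow, lossy, jittery link
># ... exercise the application under test here ...
>restore()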
>
>> On Oct 15, 2023, at 23:30, Jack Haverty via Nnagain
><nnagain at lists.bufferbloat.net> wrote:
>> 
>> Even back in 1978, I didn't think Source Quench would work.   I
>recall that I was trying to adapt my TCP2.5 Unix implementation to
>become TCP4, and I asked what my TCP should do if it sent the first IP
>datagram to open a TCP connection and received a Source Quench.  It
>wasn't clear at all how I should "slow down".   Other TCP implementors
>took the receipt of an SQ as an indication that a datagram they had
>sent had been discarded, so the obvious reaction for user satisfaction
>was to retransmit immediately.   Slowing down would simply degrade
>their user's experience.
>> 
>> Glad to hear SQ is gone.   I hope whatever replaced it works.
>> 
>> There's some confusion about the Arpanet.  The Arpanet was known as a
>"packet switching network", but it had lots of internal mechanisms that
>essentially created virtual circuits between attached computers.  
>Every packet sent in to the network by a user computer came out at the
>destination intact, in order, and not duplicated or lost.   The Arpanet
>switches even had a hardware mechanism for flow control; a switch could
>halt data transfer from a user computer when necessary.   During the
>80s, the Arpanet evolved to have an X.25 interface, and operated as a
>true "virtual circuit" provider.   Even in the Defense Data Network
>(DDN), the network delivered a virtual circuit service.  The attached
>users' computers had TCP, but the TCP didn't need to deal with most of
>the network behavior that TCP was designed to handle.  Congestion was
>similarly handled by internal Arpanet mechanisms (there were several
>technical reports from BBN to ARPA with details).    I don't remember
>any time that "an explicit ack for every packet was ripped out of the
>arpanet." None of those events happened when two TCP computers were
>connected to the Arpanet.
>> 
>> The Internet grew up around the Arpanet, which provided most of the
>wide-area connectivity through the mid-80s.   Since the Arpanet
>provided the same "reliable byte stream" behavior as TCP provided, and
>most user computers were physically attached to an Arpanet switch, it
>wasn't obvious how to test a TCP implementation, to see how well it
>dealt with reordering, duplication, dropping, or corruption of IP
>datagrams.   
>> 
>> We (at BBN) actually had to implement a software package called a
>"Flakeway", which ran on a SparcStation.   Using a "feature" of
>Ethernets and ARP (some would call it a vulnerability), the Flakeway
>could insert itself invisibly in the stream of datagrams between any
>two computers on that LAN (e.g., between a user computer and the
>gateway/router providing a path to other sites).  The Flakeway could
>then simulate "real" Internet behavior by dropping, duplicating,
>reordering, mangling, delaying, or otherwise interfering with the flow.
>That was extremely useful in testing and diagnosing TCP
>implementations.
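>> 
>> (A rough modern sketch of that ARP trick, using scapy; the addresses
>> are hypothetical, and this is only the "insert ourselves in the path"
>> step, not the whole Flakeway:)
>> 
>> # Answer ARP on behalf of two hosts so their traffic flows through
>> # this machine, which can then drop/duplicate/reorder/delay it.
>> # Needs root, scapy, and net.ipv4.ip_forward=1 to pass traffic;
>> # the impairments themselves could then be applied with netem here.
>> import time
>> from scapy.all import ARP, getmacbyip, send
>> 
>> HOST_A = "192.168.1.10"   # e.g. a user computer (hypothetical)
>> HOST_B = "192.168.1.1"    # e.g. the gateway/router (hypothetical)
>> mac_a, mac_b = getmacbyip(HOST_A), getmacbyip(HOST_B)
>> 
>> while True:
>>     # op=2 is an ARP reply ("is-at"); omitting hwsrc uses our own MAC,
>>     # so each host learns the other's IP at this machine's address.
>>     send(ARP(op=2, pdst=HOST_A, hwdst=mac_a, psrc=HOST_B), verbose=False)
>>     send(ARP(op=2, pdst=HOST_B, hwdst=mac_b, psrc=HOST_A), verbose=False)
>>     time.sleep(2)   # refresh before the poisoned cache entries expire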
>> 
>> I understand that there has been a lot of technical work over the
>years, and lots of new mechanisms defined for use in the Internet to
>solve various problems.  But one issue that has not been addressed --
>how do you know whether or not some such mechanism has actually been
>implemented, and configured correctly, in the millions of devices that
>are now using TCP (and UDP, IP, etc.)?  AFAIK, there's no way to tell
>unless you can examine the actual code.
>> 
>> The Internet, and TCP, was an experiment.  One aspect of that
>experiment involved changing the traditional role of a network
>"switch", and moving the flow control, error control, and other
>mechanisms used to create "virtual circuit" behavior.   Instead
>of being implemented inside some switching equipment, TCP's mechanisms
>are implemented inside users' computers.    That was a significant
>break from traditional network architecture.
>> 
>> I didn't realize it at the time, but now, with users' devices being
>uncountable handheld or desktop computers rather than huge racks in
>relatively few data centers, moving all those mechanisms from switches
>to users' computers significantly complicates the system design and
>especially operation.
>> 
>> That may be one of the more important results of the long-running
>experiment.
>> 
>> Jack Haverty
>> 
>> On 10/15/23 18:39, Dave Taht wrote:
>>> It is wonderful to have your original perspectives here, Jack.
>>> 
>>> But please, everyone, before a major subject change, change the
>subject?
>>> 
>>> Jack's email conflates a few things that probably deserve threads of
>>> their own. One is VGV - great acronym! Another is about the
>>> "Placeholders" of TTL and TOS. The last is the history of congestion
>>> control - and its future! Having been a part of the most recent episodes
>>> here, I have written extensively on the subject, but what I most like
>>> to point people to is my fun talks trying to make it more accessible,
>>> like this one at apnic
>>>
>https://blog.apnic.net/2020/01/22/bufferbloat-may-be-solved-but-its-not-over-yet/
>>> or my more recent one at tti/vanguard.
>>> 
>>> Most recently one of our LibreQos clients has been collecting 10ms
>>> samples and movies of what real-world residential traffic actually
>>> looks like:
>>> 
>>> https://www.youtube.com/@trendaltoews7143
>>> 
>>> And it is my hope that that conveys intuition to others... as
>compared
>>> to speedtest traffic, which proves nothing about the actual behaviors
>>> of VGV traffic, which I ranted about here:
>>> https://blog.cerowrt.org/post/speedtests/ - I am glad that these
>>> speedtests now have latency under load reports almost universally,
>but
>>> see the rant for more detail.
>>> 
>>> Most people only have a picture of traffic in the large, over 5
>minute
>>> intervals, which behaves quite differently, or a pre-conception that
>>> backpressure actually exists across the internet. It doesn't. An
>>> explicit ack for every packet was ripped out of the arpanet as
>costing
>>> too much time. Wifi, to some extent, recreates the arpanet problem
>by
>>> having explicit acks on the local loop that are repeated until by
>god
>>> the packet comes through, usually without exponential backoff.
>>> 
>>> We have some really amazing encoding schemes now - I do not
>understand
>>> how starlink works without retries, for example, and my grip on 5G's
>>> encodings is non-existent, except knowing it is the most
>bufferbloated
>>> of all our technologies.
>>> 
>>> ...
>>> 
>>> Anyway, my hope for this list is that we come up with useful
>technical
>>> feedback to the powers-that-be that want to regulate the internet
>>> under some title ii provisions, and I certainly hope we can make
>>> strides towards fixing bufferbloat along the way! There are many
>other
>>> issues. Let's talk about those instead!
>>> 
>>> But...
>>> ...
>>> 
>>> In "brief" response to the notes below - source quench died due to
>>> easy ddos, AQMs from RED (1992) until codel (2012) struggled with
>>> measuring the wrong things (Kathie's updated paper on RED in a
>>> different light: https://pollere.net/Codel.html), SFQ was adopted
>by
>>> many devices, WRR used in others, ARED I think is common in juniper
>>> boxes, fq_codel is pretty much the default now for most of linux,
>and
>>> I helped write CAKE.
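>>>
>>> (For anyone who hasn't read Kathie's paper, a toy sketch of codel's
>>> central idea: drop based on how long packets *sit* in the queue, not
>>> on how long the queue is, and space the drops ever closer together.
>>> This is a simplification, not the RFC 8289 state machine:)
>>>
>>> import time
>>> from collections import deque
>>> from math import sqrt
>>>
>>> TARGET = 0.005     # 5 ms of acceptable standing queue delay
>>> INTERVAL = 0.100   # delay must exceed TARGET this long before reacting
>>>
>>> class ToyCodel:
>>>     def __init__(self):
>>>         self.q = deque()          # (enqueue_time, packet)
>>>         self.first_above = None   # when sojourn time first exceeded TARGET
>>>         self.dropping = False
>>>         self.count = 0
>>>         self.next_drop = 0.0
>>>
>>>     def enqueue(self, pkt):
>>>         self.q.append((time.monotonic(), pkt))
>>>
>>>     def dequeue(self):
>>>         while self.q:
>>>             t_in, pkt = self.q.popleft()
>>>             now = time.monotonic()
>>>             if now - t_in < TARGET:
>>>                 # standing delay is under control again: stop dropping
>>>                 self.first_above, self.dropping, self.count = None, False, 0
>>>                 return pkt
>>>             if self.first_above is None:
>>>                 self.first_above = now        # start the clock
>>>                 return pkt
>>>             if not self.dropping and now - self.first_above >= INTERVAL:
>>>                 self.dropping, self.count, self.next_drop = True, 0, now
>>>             if self.dropping and now >= self.next_drop:
>>>                 self.count += 1               # drop pkt, tighten schedule
>>>                 self.next_drop = now + INTERVAL / sqrt(self.count)
>>>                 continue
>>>             return pkt
>>>         return None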
>>> 
>>> TCPs evolved from reno to vegas to cubic to BBR, and the paper on BBR
>>> is excellent: https://research.google/pubs/pub45646/ as is Len
>>> Kleinrock's monograph on it. However, problems with self-congestion and
>>> excessive packet loss were observed, and after entering the IETF
>>> process, BBR is now in its 3rd revision, which looks pretty good.
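>>>
>>> (A tiny sketch of how you pick one of these per socket on Linux,
>>> assuming the tcp_bbr module is loaded and allowed:)
>>>
>>> import socket
>>>
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> # TCP_CONGESTION is Linux-specific (exposed in Python 3.6+)
>>> s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
>>> # read back the active algorithm for this socket
>>> print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16))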
>>> 
>>> Hardware pause frames in ethernet are often available; there are all
>>> kinds of specialized new hardware flow control standards in 802.1, and
>>> a new, more centralized controller in wifi7.
>>> 
>>> To this day I have no idea how infiniband works. Or how ATM was
>>> supposed to work. I have a good grip on wifi up to version 6, and
>the
>>> work we did on wifi is in use now on a lot of wifi gear like
>openwrt,
>>> eero and evenroute, and I am proudest of all my teams' work on
>>> achieving airtime fairness, and better scheduling described in this
>>> paper here: https://www.cs.kau.se/tohojo/airtime-fairness/ for wifi
>>> and MOS to die for.
>>> 
>>> There is new work on this thing called L4S, which has a bunch of RFCs
>>> for it, leverages multi-bit DCTCP-style ECN, and is under test by Apple
>>> and Comcast; it is discussed a lot on the tsvwg list. I encourage users
>to
>>> jump in on the comcast/apple beta, and operators to at least read
>>> this: https://datatracker.ietf.org/doc/draft-ietf-tsvwg-l4sops/
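>>>
>>> (The DCTCP-style part in a nutshell: react in proportion to the
>>> *fraction* of ECN-marked packets rather than halving on any single
>>> mark. A sketch with made-up numbers:)
>>>
>>> # Keep an EWMA of the marked fraction (alpha) and cut cwnd by alpha/2.
>>> G = 1.0 / 16   # the usual DCTCP gain
>>>
>>> def on_window_of_acks(cwnd, alpha, marked, total):
>>>     frac = marked / total if total else 0.0
>>>     alpha = (1 - G) * alpha + G * frac           # alpha <- (1-g)alpha + gF
>>>     if marked:
>>>         cwnd = max(2.0, cwnd * (1 - alpha / 2))  # cwnd <- cwnd(1 - alpha/2)
>>>     return cwnd, alpha
>>>
>>> cwnd, alpha = 100.0, 0.0
>>> for marked in (0, 3, 10, 0):                     # hypothetical marks per RTT
>>>     cwnd, alpha = on_window_of_acks(cwnd, alpha, marked, total=100)
>>>     print(f"marked={marked:3d} alpha={alpha:.3f} cwnd={cwnd:.1f}")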
>>> 
>>> Knowing that there is a book or three left to write on this subject
>>> that nobody will read is an issue. Coming up with an architecture to
>>> take packet handling as we know it to the moon and the rest of the
>>> solar system also seems kind of difficult.
>>> 
>>> Ideally I would love to be working on that earth-moon architecture
>>> rather than trying to finish getting stuff we designed in 2012-2016
>>> deployed.
>>> 
>>> I am going to pull out a few specific questions from the below and
>>> answer separately.
>>> 
>>> On Sun, Oct 15, 2023 at 1:00 PM Jack Haverty via Nnagain
>>> <nnagain at lists.bufferbloat.net>
><mailto:nnagain at lists.bufferbloat.net> wrote:
>>>> The "VGV User" (Voice, Gaming, Videoconferencing) cares a lot about
>>>> latency.   It's not just "rewarding" to have lower latencies; high
>>>> latencies may make VGV unusable.   Average (or "typical") latency
>as the
>>>> FCC label proposes isn't a good metric to judge usability.  A path
>which
>>>> has high variance in latency can be unusable even if the average is
>>>> quite low.   Having your voice or video or gameplay "break up"
>every
>>>> minute or so when latency spikes to 500 msec makes the "user
>experience"
>>>> intolerable.
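>>>>
>>>> (A quick illustration with made-up numbers of how an average hides
>>>> exactly the spikes that break VGV traffic:)
>>>>
>>>> import statistics
>>>> # 600 one-second pings: mostly 20 ms, with ten 500 ms spikes.
>>>> rtts_ms = [20] * 590 + [500] * 10
>>>> print("mean:", statistics.mean(rtts_ms), "ms")                  # 28.0
>>>> print("p99 :", sorted(rtts_ms)[int(0.99 * len(rtts_ms))], "ms") # 500
>>>> print("max :", max(rtts_ms), "ms")                              # 500
>>>> # The mean would look fine on a label; the ten spikes are what you feel.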
>>>> 
>>>> A few years ago, I ran some simple "ping" tests to help a friend
>who was
>>>> trying to use a gaming app.  My data was only for one specific path
>so
>>>> it's anecdotal.  What I saw was surprising - zero data loss, every
>>>> datagram was delivered, but occasionally a datagram would take up
>to 30
>>>> seconds to arrive.  I didn't have the ability to poke around
>inside, but
>>>> I suspected it was an experience of "bufferbloat", enabled by the
>>>> dramatic drop in price of memory over the decades.
>>>> 
>>>> It's been a long time since I was involved in operating any part of
>the
>>>> Internet, so I don't know much about the inner workings today.
>Apologies
>>>> for my ignorance....
>>>> 
>>>> There was a scenario in the early days of the Internet for which we
>>>> struggled to find a technical solution.  Imagine some node in the
>bowels
>>>> of the network, with 3 connected "circuits" to some other nodes. 
>On two
>>>> of those inputs, traffic is arriving to be forwarded out the third
>>>> circuit.  The incoming flows are significantly more than the
>outgoing
>>>> path can accept.
>>>> 
>>>> What happens?   How is "backpressure" generated so that the
>incoming
>>>> flows are reduced to the point that the outgoing circuit can handle
>the
>>>> traffic?
>>>> 
>>>> About 45 years ago, while we were defining TCPV4, we struggled with
>this
>>>> issue, but didn't find any consensus solutions.  So "placeholder"
>>>> mechanisms were defined in TCPV4, to be replaced as research
>continued
>>>> and found a good solution.
>>>> 
>>>> In that "placeholder" scheme, the "Source Quench" (SQ) IP message
>was
>>>> defined; it was to be sent by a switching node back toward the
>sender of
>>>> any datagram that had to be discarded because there wasn't any
>place to
>>>> put it.
>>>> 
>>>> In addition, the TOS (Type Of Service) and TTL (Time To Live)
>fields
>>>> were defined in IP.
>>>> 
>>>> TOS would allow the sender to distinguish datagrams based on their
>>>> needs.  For example, we thought "Interactive" service might be
>needed
>>>> for VGV traffic, where timeliness of delivery was most important.
>>>> "Bulk" service might be useful for activities like file transfers,
>>>> backups, et al.   "Normal" service might now mean activities like
>using
>>>> the Web.
>>>> 
>>>> The TTL field was an attempt to inform each switching node about
>the
>>>> "expiration date" for a datagram.   If a node somehow knew that a
>>>> particular datagram was unlikely to reach its destination in time
>to be
>>>> useful (such as a video datagram for a frame that has already been
>>>> displayed), the node could, and should, discard that datagram to
>free up
>>>> resources for useful traffic.  Sadly we had no mechanisms for
>measuring
>>>> delay, either in transit or in queuing, so TTL was defined in terms
>of
>>>> "hops", which is not an accurate proxy for time.   But it's all we
>had.
>>>> 
>>>> Part of the complexity was that the "flow control" mechanism of the
>>>> Internet had put much of the mechanism in the users' computers' TCP
>>>> implementations, rather than the switches which handle only IP.
>Without
>>>> mechanisms in the users' computers, all a switch could do is order
>more
>>>> circuits, and add more memory to the switches for queuing.  Perhaps
>that
>>>> led to "bufferbloat".
>>>> 
>>>> So TOS, SQ, and TTL were all placeholders, for some mechanism in a
>>>> future release that would introduce a "real" form of Backpressure
>and
>>>> the ability to handle different types of traffic.   Meanwhile,
>these
>>>> rudimentary mechanisms would provide some flow control. Hopefully
>the
>>>> users' computers sending the flows would respond to the SQ
>backpressure,
>>>> and switches would prioritize traffic using the TTL and TOS
>information.
>>>> 
>>>> But, being way out of touch, I don't know what actually happens
>today.
>>>> Perhaps the current operators and current government watchers can
>answer?:
>>> I would love more feedback about RED's deployment at scale in
>particular.
>>> 
>>>> 1/ How do current switches exert Backpressure to  reduce competing
>>>> traffic flows?  Do they still send SQs?
>>> Some send various forms of hardware flow control, an ethernet pause
>>> frame derivative.
>>> 
>>>> 2/ How do the current and proposed government regulations treat the
>>>> different needs of different types of traffic, e.g., "Bulk" versus
>>>> "Interactive" versus "Normal"?  Are Internet carriers permitted to
>treat
>>>> traffic types differently?  Are they permitted to charge different
>>>> amounts for different types of service?
>>> 
>>>> Jack Haverty
>>>> 
>>>> On 10/15/23 09:45, Dave Taht via Nnagain wrote:
>>>>> For starters I would like to apologize for cc-ing both nanog and
>my
>>>>> new nn list. (I will add sender filters)
>>>>> 
>>>>> A bit more below.
>>>>> 
>>>>> On Sun, Oct 15, 2023 at 9:32 AM Tom Beecher <beecher at beecher.cc>
><mailto:beecher at beecher.cc> wrote:
>>>>>>> So for now, we'll keep paying for transit to get to the others
>(since it’s about as much as transporting IXP from Dallas), and hoping
>someone at Google finally sees Houston as more than a third rate city
>hanging off of Dallas. Or… someone finally brings a worthwhile IX to
>Houston that gets us more than peering to Kansas City. Yeah, I think
>the former is more likely. 😊
>>>>>> There is often a chicken/egg scenario here with the economics. As
>an eyeball network, your costs to build out and connect to Dallas are
>greater than your transit cost, so you do that. Totally fair.
>>>>>> 
>>>>>> However, think about it from the content side. Say I want to build
>into Houston. I have to put routers in, and a bunch of cache
>servers, so I have capital outlay, plus opex for space, power,
>IX/backhaul/transit costs. That's not cheap, so there's a lot of
>calculations that go into it. Is there enough total eyeball traffic
>there to make it worth it? Is saving 8-10ms enough of a performance
>boost to justify the spend? What are the long term trends in that
>market? These answers are of course different for a company running
>their own CDN vs the commercial CDNs.
>>>>>> 
>>>>>> I don't work for Google and obviously don't speak for them, but I
>would suspect that they're happy to eat an 8-10ms performance hit to
>serve from Dallas , versus the amount of capital outlay to build out
>there right now.
>>>>> The three forms of traffic I care most about are voip, gaming, and
>>>>> videoconferencing, which are rewarding to have at lower latencies.
>>>>> When I was a kid, we had switched phone networks, and while the
>sound
>>>>> quality was poorer than today, the voice latency cross-town was
>just
>>>>> like "being there". Nowadays we see 500+ms latencies for this kind
>of
>>>>> traffic.
>>>>> 
>>>>> As to how to make calls across town work that well again,
>cost-wise, I
>>>>> do not know, but the volume of traffic that would be better served
>by
>>>>> these interconnects is quite low, relative to the overall gains in
>>>>> lower latency experiences for them.
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Sat, Oct 14, 2023 at 11:47 PM Tim Burke <tim at mid.net>
><mailto:tim at mid.net> wrote:
>>>>>>> I would say that a 1Gbit IP transit in a carrier neutral DC can
>be had for a good bit less than $900 on the wholesale market.
>>>>>>> 
>>>>>>> Sadly, IXP’s are seemingly turning into a pay to play game, with
>rates almost costing as much as transit in many cases after you factor
>in loop costs.
>>>>>>> 
>>>>>>> For example, in the Houston market (one of the largest and
>fastest growing regions in the US!), we do not have a major IX, so to
>get up to Dallas it’s several thousand for a 100g wave, plus several
>thousand for a 100g port on one of those major IXes. Or, a better
>option, we can get a 100g flat internet transit for just a little bit
>more.
>>>>>>> 
>>>>>>> Fortunately, for us as an eyeball network, there are a good
>number of major content networks that are allowing for private peering
>in markets like Houston for just the cost of a cross connect and a QSFP
>if you’re in the right DC, with Google and some others being the
>outliers.
>>>>>>> 
>>>>>>> So for now, we'll keep paying for transit to get to the others
>(since it’s about as much as transporting IXP from Dallas), and hoping
>someone at Google finally sees Houston as more than a third rate city
>hanging off of Dallas. Or… someone finally brings a worthwhile IX to
>Houston that gets us more than peering to Kansas City. Yeah, I think
>the former is more likely. 😊
>>>>>>> 
>>>>>>> See y’all in San Diego this week,
>>>>>>> Tim
>>>>>>> 
>>>>>>> On Oct 14, 2023, at 18:04, Dave Taht <dave.taht at gmail.com>
><mailto:dave.taht at gmail.com> wrote:
>>>>>>>> This set of trendlines was very interesting. Unfortunately the
>data
>>>>>>>> stops in 2015. Does anyone have more recent data?
>>>>>>>> 
>>>>>>>>
>https://drpeering.net/white-papers/Internet-Transit-Pricing-Historical-And-Projected.php
>>>>>>>> 
>>>>>>>> I believe a gbit circuit that an ISP can resell still runs at
>about
>>>>>>>> $900 - $1.4k (?) in the usa? How about elsewhere?
>>>>>>>> 
>>>>>>>> ...
>>>>>>>> 
>>>>>>>> I am under the impression that many IXPs remain very
>successful,
>>>>>>>> states without them suffer, and I also find the concept of
>doing micro
>>>>>>>> IXPs at the city level appealing, and now achievable with
>cheap gear.
>>>>>>>> Finer grained cross connects between telco and ISP and IXP
>would lower
>>>>>>>> latencies across town quite hugely...
>>>>>>>> 
>>>>>>>> PS: I hear ARIN is planning to drop the price for BGP AS numbers,
>>>>>>>> and to bundle 3 at a time, as of the end of this year.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Oct 30:
>https://netdevconf.info/0x17/news/the-maestro-and-the-music-bof.html
>>>>>>>> Dave Täht CSO, LibreQos
>>>>> 
>>>> _______________________________________________
>>>> Nnagain mailing list
>>>> Nnagain at lists.bufferbloat.net
><mailto:Nnagain at lists.bufferbloat.net>
>>>> https://lists.bufferbloat.net/listinfo/nnagain
>>> 
>>> 
>> 
>> _______________________________________________
>> Nnagain mailing list
>> Nnagain at lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/nnagain