From: Robert McMahon
Date: Mon, 23 Oct 2023 22:16:35 -0700
To: karl@cavebear.com
CC: Dave Taht via Nnagain
Subject: Re: [NNagain] upgrading old routers to modern, secure FOSS
Message-ID: <972bc1d3-e4bd-40c1-81ee-85e85892ce7d@rjmcmahon.com>
In-Reply-To: <50c07326-781d-40de-8e2c-92d84bd84cf1@cavebear.com>
References: <7dd9294f-2871-46cc-bbc5-e72f3becd73d@cavebear.com> <50c07326-781d-40de-8e2c-92d84bd84cf1@cavebear.com>
List-Id: Network Neutrality is back! Let´s make the technical aspects heard this time!

Thanks, this is very interesting. I wrote code to DMA packets in an early Cisco switch, and the hardware ASIC that moved them across the fabric returned only a simple success-or-failure status. Unfortunately, the ASIC would at times indicate success without ever moving the packet across the fabric, and would then get stuck using the wrong egress for all subsequent packets. It wasn't possible to change the ASIC, as its design had been locked down years earlier. Luckily, we could query the ASIC for more information on what it had actually done, so the code could see when a fix was needed. We did the FDIR (fault detection, isolation, and recovery) in software, lost a bunch of packets, and assumed TCP would handle it. Of course, the TCP designers assumed loss was due to congestion, so TCP's state machines reacted to the wrong cause, but they would ultimately recover.
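
The detect-then-recover pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeFabric`, `query_delivered`, `reset`, and `send_with_fdir` are hypothetical names, not the real ASIC interface, and the "recovery" is reduced to clearing a wedged state.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFabric:
    """Hypothetical stand-in for the ASIC: reports success even when it drops."""
    drop_ids: set = field(default_factory=set)
    delivered: list = field(default_factory=list)

    def send(self, pkt_id: int) -> bool:
        if pkt_id not in self.drop_ids:
            self.delivered.append(pkt_id)
        return True  # the status is always "success", like the faulty ASIC

    def query_delivered(self, pkt_id: int) -> bool:
        # The richer query path: what actually happened to the packet.
        return pkt_id in self.delivered

    def reset(self) -> None:
        # Recovery action (assumed): clear whatever wedged the fabric.
        self.drop_ids.clear()


def send_with_fdir(fabric: FakeFabric, pkt_id: int) -> str:
    """Do not trust the status bit; verify, and recover on a silent drop."""
    fabric.send(pkt_id)
    if fabric.query_delivered(pkt_id):
        return "delivered"
    fabric.reset()  # the fabric is repaired, but this packet stays lost
    return "lost-recovered"
```

The lost packet is deliberately not retransmitted here, which mirrors the decision to leave retransmission to TCP.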

I started my career working on a NASA FDDI network. Software had gotten so complex that its states could no longer all be inspected by humans, as was done on the Shuttle, nor even tested by computers. The strategy became commercial off-the-shelf (COTS) because, through "market magic", it was assumed to be fully tested.

I think the same naivety is now applied to open source code. There is no magic here either. Testing has to go way beyond simple scenarios repeated over and over again as if they were the only tests that matter.

Networks and distributed systems have bugs. I think a current Linux kernel is 30M lines of code and 1100 config options. Good luck testing that.

This is beyond complex and not easy. FDIR has to be designed in from the get-go.

Bob
On Oct 23, 2023, at 4:22 PM, Karl Auerbach <karl@cavebear.com> wrote:
On 10/23/23 2:54 PM, rjmcmahon wrote:
Home networks today are embarrassing to me. Our industry is woefully
behind here.

I would be more expansive.

(Bringing this back to network neutrality - my argument, not clearly
suggested below, is that "neutrality" is more than bandwidth or
connectivity and ought also to include other aspects, including
robust and repairable service in the face of reasonably foreseeable
events.  By the way, when I was involved in the early days of the net, I
worked for groups [such as the US Joint Chiefs] who thought that routers
being vaporized by nuclear explosions were "reasonably foreseeable".)

The lawyer half of me lives in fear of the harm that can come from bad
code in network devices.  I've seen the growth of strict product
liability laws in the consumer space (sometimes resulting in those silly
"do not eat" labels on silica gel packets, but also resulting in
important steps, like pressure-release closures on cleaning products
that contain dry sodium hydroxide, or dual braking systems in automobiles).

And the railroad nut in me remembers that Murphy's law is as strong as
ever.  (Just ask "Why are highway and railroad traffic control signals
red and green [actually a quite bluish green]?" [Hint: they originally
were red and white, and sometimes the red colored lens would fall out.])

When I was working with the DARPA Robotics Challenge my job was to
introduce network problems - the kinds of things that can happen in real
life when a robot operates in a disaster zone.  I could introduce a
simple change - like increasing the level of lost Ethernet frames when a
robot went through a door into a (simulated) concrete reactor building -
and the robot would simply stop or fall over.
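
That kind of fault injection can be approximated in a few lines. This is a minimal sketch, not the tooling used at the Challenge: `lossy_link` and `loss_rate_at` are hypothetical names, and the seeded generator is an assumption made so that a failing run can be replayed.

```python
import random
from typing import Callable, Iterable, List


def lossy_link(frames: Iterable, loss_rate_at: Callable[[int], float],
               seed: int = 0) -> List:
    """Return the frames that survive a link whose loss rate varies over time.

    loss_rate_at(i) gives the drop probability for the i-th frame, so a
    schedule can model "the robot just went through the door".
    """
    rng = random.Random(seed)  # seeded so a test run is repeatable
    return [f for i, f in enumerate(frames) if rng.random() >= loss_rate_at(i)]
```

For example, `lossy_link(frames, lambda i: 0.0 if i < 500 else 0.4)` would leave the first 500 frames alone and then drop each later frame with probability 0.4.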

I've seen videos of animal surgery performed by remote control over a
long-distance (50 km) network link where the doctors presumed that the
net was endlessly flawless.  (I have this mental image of a robotic
scalpel overshooting its cut due to a non-idempotent command contained
in a packet that was replicated on the net.)
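
To make the idempotency point concrete, here is a small sketch. The `Scalpel` class, its methods, and the millimetre units are all hypothetical; the only claim is the general technique of sending absolute targets tagged with a command ID so that a replicated packet is harmless.

```python
class Scalpel:
    """Hypothetical actuator used to contrast two command styles."""

    def __init__(self) -> None:
        self.depth_mm = 0.0
        self._seen = set()  # command IDs already applied

    def move_relative(self, delta_mm: float) -> None:
        # Non-idempotent: a duplicated packet doubles the cut.
        self.depth_mm += delta_mm

    def apply(self, command_id: int, target_depth_mm: float) -> bool:
        # Idempotent: absolute target plus duplicate suppression by ID.
        if command_id in self._seen:
            return False  # replayed packet, ignored
        self._seen.add(command_id)
        self.depth_mm = target_depth_mm
        return True
```

A real device would also need to bound the set of remembered IDs; that is left out here.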

And I've seen users of satellites fail to remember that every now and
then, from the point of view of a ground station, a satellite may
transit across the face of the sun (a highly predictable event) and be
temporarily blinded and unable to receive data.

Many of our implementations today are hanging on only because modern
machines have gobs upon gobs of memory and nobody notices if a couple of
gigabytes leak or are uselessly allocated for a few minutes.

(For instance, one way to stop a Linux stack is to send it patterns of
tiny IPv4 fragments that overlap or have gaps, so that reassembly is
impossible (or at least difficult) and buffers just sit there waiting
for a rather long timeout before being reclaimed.)
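
One defensive design for the fragment problem is to cap the number of datagrams under reassembly and to expire stale ones on every arrival. The sketch below is an assumption-laden illustration, not how any particular kernel does it: `ReassemblyTable`, its parameters, and the opaque `key` (standing in for source, destination, protocol, and IP ID) are hypothetical.

```python
class ReassemblyTable:
    """Tracks in-progress datagrams with a hard entry cap and a timeout."""

    def __init__(self, max_entries: int, timeout_s: float) -> None:
        self.max_entries = max_entries
        self.timeout_s = timeout_s
        self._first_seen = {}  # key -> arrival time of the first fragment

    def expire(self, now: float) -> int:
        """Drop entries older than the timeout; return how many were dropped."""
        stale = [k for k, t in self._first_seen.items()
                 if now - t >= self.timeout_s]
        for k in stale:
            del self._first_seen[k]
        return len(stale)

    def add_fragment(self, key, now: float) -> bool:
        """Accept a fragment, or refuse it if the table is already full."""
        self.expire(now)
        if key in self._first_seen:
            return True
        if len(self._first_seen) >= self.max_entries:
            return False  # refuse new datagrams rather than grow without bound
        self._first_seen[key] = now
        return True
```

The trade-off is explicit: under attack, legitimate fragmented traffic may be refused, but memory use stays bounded.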

It seems that everybody and her brothers think they can write code.  And
they do.  And in our open source world the code they write is often
protocol code.  Often it is badly written protocol code containing
monumental flaws, such as use of "integer" types in C (when a fixed-width
unsigned type such as "uint16_t" is needed), failure to recognize that
number spaces wrap, assumptions that "everything is in ASCII" or that
character sequences do not contain null bytes.  (Last time I looked some
major libraries went down in flames when string data in packets happened
to contain nulls - the code was using ancient Unix/C string routines.)  I
once sent several SIP phones into the weeds when I sent length fields
(in text form) with leading zero characters (e.g. 050 rather than 50) -
some code treated that as octal!
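
Three of these flaws are easy to demonstrate. The helpers below are illustrative names only (`seq16_less_than`, `parse_decimal_length`, `c_string_view`); the wraparound comparison follows the usual serial-number-arithmetic idea, assumed here for a 16-bit space.

```python
def seq16_less_than(a: int, b: int) -> bool:
    """True if 16-bit sequence number a precedes b, allowing for wraparound."""
    return a != b and ((b - a) & 0xFFFF) < 0x8000


def parse_decimal_length(field: bytes) -> int:
    """Parse a text length field strictly as base 10, never as octal."""
    if not field.isdigit():  # also rejects the empty field
        raise ValueError("length field must contain only ASCII digits")
    return int(field.decode("ascii"), 10)


def c_string_view(data: bytes) -> bytes:
    """What a NUL-terminated C string routine would see of this buffer."""
    return data.split(b"\x00", 1)[0]
```

With these, 65535 correctly precedes 0 after a wrap, `b"050"` parses as 50 (an octal reading would give 40), and a payload of `b"abc\x00def"` is seen as just `b"abc"` by NUL-terminated string handling.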

It would certainly be nice if we had a body of network design and
implementation rules - similar in concept to the engineering design
rules used in bridges, aircraft, electrical networks, etc - for use when
writing code.  Anyone who wanted to do something outside of those rules
could do so, but would be strongly "encouraged" to seek the advice and
oversight of others.

Once the Interop show net was brought to a stop (by infinitely looping
packets) when two brands of routers had different notions of how to expand
IPv4 multicast addresses into MAC addresses.  (I can't remember the
details, but when every light in the NOC turned red everybody in the
Interop NOC turned to look at me, guessing [incorrectly in this
instance] that I was the cause.)
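
For reference, the standard IPv4-multicast-to-Ethernet mapping (RFC 1112) copies the low 23 bits of the group address into the 01:00:5e prefix. The sketch below shows only that standard mapping; it does not attempt to reconstruct what the two router brands actually disagreed about, and `multicast_mac` is a hypothetical helper name.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF      # only 23 bits fit under the prefix
    mac = 0x01005E000000 | low23      # 01:00:5e prefix, top bit of octet 4 clear
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
```

Because 5 bits of the group address are discarded, 32 different groups share each MAC address, for example 224.0.0.1 and 224.128.0.1, which is one reason implementations have room to disagree.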

It would be nice if we built our network devices so that they each had a
little introspective daemon that frequently asked "am I healthy, am I
still connected, are packets still moving through me?"  (For consumer
devices an answer of "no" could trigger a full device reboot or reset.)
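
A rough sketch of such a daemon's core loop step follows. Everything here is an assumption for illustration: the probe names, the consecutive-failure threshold (added so that one flaky probe does not reboot the box), and the injected `reboot` callback.

```python
from typing import Callable, Dict


def health_check(probes: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every probe; a probe that raises counts as unhealthy."""
    results = {}
    for name, probe in probes.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    return results


def watchdog_step(probes: Dict[str, Callable[[], bool]], failures: int,
                  threshold: int, reboot: Callable[[], None]) -> int:
    """Run one check and return the updated consecutive-failure count.

    reboot() is called once `threshold` consecutive checks have failed.
    """
    if all(health_check(probes).values()):
        return 0
    failures += 1
    if failures >= threshold:
        reboot()
        return 0
    return failures
```

A caller would invoke `watchdog_step` on a timer with probes such as "link is up" or "forwarded-packet counter advanced since the last check".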

For larger devices, such as routers, we could have some machinery,
internal or external, that did a bit of modelling and informed the
routing machinery of anticipated queue lengths and similar metrics.
Then the router could monitor itself to check if it was wobbling outside
of those anticipated ranges and take appropriate action to signal the
issue.  (I was once quite surprised to learn that on at least one large
type of router it was difficult-to-impossible to obtain queue length
data because so much function had been pushed into hardware that had few
test or measurement points.)
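
The self-monitoring idea can be sketched as a check of sampled queue depths against a modelled band. The function name, the band `[low, high]`, and the `patience` parameter (consecutive out-of-band samples required before signalling) are assumptions made for this illustration.

```python
from typing import Iterable, List, Tuple


def queue_alarms(depths: Iterable[int], low: int, high: int,
                 patience: int = 3) -> List[Tuple[int, int]]:
    """Return (sample_index, depth) where an out-of-band streak reaches `patience`."""
    alarms = []
    streak = 0
    for i, depth in enumerate(depths):
        if low <= depth <= high:
            streak = 0  # back inside the anticipated range
            continue
        streak += 1
        if streak == patience:  # signal once per excursion
            alarms.append((i, depth))
    return alarms
```

Requiring a streak rather than a single sample is one simple way to avoid alarming on ordinary bursts.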

My grandfather and father were radio and TV repair guys.  I learned from
an early age the value of good tools and of looking outside the basic
operation of a device for symptoms.  (You could often hear a failing
capacitor or inductor; or you could smell a slowly burning resistor.)
Our modern networks and code usually lack that kind of observational
(and active testing) plane.

I can see a big net neutrality differentiator between providers being
"time to detect" and "time to repair".

        --karl--