[Cerowrt-devel] Fwd: Dave's wishlist [was: Source-specific routing merged]

dpreed at reed.com dpreed at reed.com
Tue Mar 17 16:38:47 EDT 2015


I agree wholeheartedly with your point, David.

One other clarifying point (I'm not trying to be pedantic, here, but it may sound that way):

Reliability is not the same as Availability.  The two are quite different.

Bufferbloat is pretty much an "availability" issue, not a reliability issue.  In other words, packets are not getting lost.  The system is just preventing desired use.

Availability issues can be due to actual failures of components, but there are lots of availability issues that are caused (as you suggest) by attempts to focus narrowly on "loss of data" or "component failures".

When you build a system, there is a temptation to apply what is called the Fallacy of Composition (look it up on Wikipedia for a precise definition).  The key error is to assume that a system as a whole has a property only if every component has that property.  (Strictly, Composition reasons from the parts to the whole; its converse, the Fallacy of Division, reasons from the whole to the parts.)

(The end-to-end argument is a specific rule that is based on a recognition of the Fallacy of Composition in one case.)

We all know that there is never a single moment when any moderately large part of the Internet does not contain failed components.  Yet the Internet has *very* high availability - 24x7x365, and we don't need to know very much about what parts are failing.  That's by design, of course.  And it is a design that does not derive its properties from a trivial notion of "proof of correctness", or even "bug freeness".
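
To make that concrete, here is a rough sketch of the arithmetic, not a model of the real Internet.  It assumes that link failures are independent and that the redundant paths share no links; the function names and the 0.99 figure are made up for illustration.

```python
def path_availability(link_availabilities):
    """A single path is up only if every link on it is up (series)."""
    result = 1.0
    for a in link_availabilities:
        result *= a
    return result


def system_availability(paths):
    """With redundant paths, the system is down only if every path is down.

    Assumes independent failures and fully disjoint paths (an idealization).
    """
    all_down = 1.0
    for links in paths:
        all_down *= 1.0 - path_availability(links)
    return 1.0 - all_down


# Hypothetical numbers: five links per path, each up 99% of the time.
one_path = [0.99] * 5
single = path_availability(one_path)              # roughly 0.951
redundant = system_availability([one_path] * 3)   # roughly 0.9999

print(f"single path:     {single:.6f}")
print(f"three alternates: {redundant:.6f}")
```

Under these assumptions, at any moment some link is quite likely to be down, yet the system as a whole is far more available than any one of its paths.  The components did not get better; the recovery (picking another path) did the work.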

The relevance of a "failure" or even a "design flaw" to system availability is a matter of a much bigger perspective of what the system does, and what its users perceive as to whether they can get work done.




On Tuesday, March 17, 2015 3:30pm, "David Lang" <david at lang.hm> said:

> On Tue, 17 Mar 2015, Dave Taht wrote:
> 
>> My quest is always for an extra "9" of reliability. Anyplace where you can
>> make something more robust (even if it is out at the .9999999999) level, I
>> tend to like to do in order to have the highest MTBF possible in
>> combination with all the other moving parts on the spacecraft (spaceship
>> earth).
> 
> There are different ways to add reliability
> 
> one is to try and make sure nothing ever fails
> 
> the second is to have a way of recovering when things go wrong.
> 
> 
> Bufferbloat came about because people got trapped into the first mode of
> thinking (packets should never get lost), when the right answer ended up being
> to realize that we have a recovery method and use it.
> 
> Sometimes trying to make sure nothing ever fails adds a lot of complexity to the
> code to handle all the corner cases, and the overall reliability will improve if
> you instead simplify the normal flow, even if that adds a small number of
> failures, provided you can have a common set of recovery code that's well
> exercised and tested.
> 
> As you are talking about losing packets with route changes, watch out that you
> don't fall into this trap.
> 
> David Lang




