From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <david@lang.hm>
Received: from bifrost.lang.hm (mail.lang.hm [64.81.33.126])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client did not present a certificate)
	by huchra.bufferbloat.net (Postfix) with ESMTPS id 92B0321F618
	for <cerowrt-devel@lists.bufferbloat.net>;
	Fri, 25 Jul 2014 14:03:39 -0700 (PDT)
Received: from asgard (asgard.lang.hm [10.0.0.100])
	by bifrost.lang.hm (8.13.4/8.13.4/Debian-3) with ESMTP id
	s6PL3cKL005896 for <cerowrt-devel@lists.bufferbloat.net>;
	Fri, 25 Jul 2014 14:03:38 -0700
Received: from lang.hm (localhost [127.0.0.1])
	by asgard (Postfix) with ESMTP id AAABD13818B
	for <cerowrt-devel@lists.bufferbloat.net>;
	Fri, 25 Jul 2014 14:03:38 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8;
 format=flowed
Content-Transfer-Encoding: 7bit
Date: Fri, 25 Jul 2014 14:03:38 -0700
From: David Lang <david@lang.hm>
To: <cerowrt-devel@lists.bufferbloat.net>
In-Reply-To: <13144.1406313454@turing-police.cc.vt.edu>
References: <CACj-SW2xRzNJa_c7CyOGzY-Yvun7UjNyp0W0aeF5DjO_Guu=ag@mail.gmail.com>
	<13144.1406313454@turing-police.cc.vt.edu>
Message-ID: <36889fad276c5cdd1cd083d1c83f2265@lang.hm>
X-Sender: david@lang.hm
User-Agent: Roundcube Webmail/0.5.1
Subject: Re: [Cerowrt-devel] Ideas on how to simplify and popularize
 bufferbloat control for consideration.
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
	<cerowrt-devel.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/cerowrt-devel>,
	<mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/cerowrt-devel>
List-Post: <mailto:cerowrt-devel@lists.bufferbloat.net>
List-Help: <mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/cerowrt-devel>,
	<mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Fri, 25 Jul 2014 21:03:39 -0000

 On Fri, 25 Jul 2014 14:37:34 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Sat, 24 May 2014 10:02:53 -0400, "R." said:
>
>> Further, this function could be auto-scheduled or made enabled on
>> router boot up.
>
> Yeah, if such a thing worked, it would be good.
>
> (Note in the following that a big part of my *JOB* is doing "What could
> possibly go wrong?" analysis on mission-critical systems, which tends to
> color my viewpoint on projects. I still think the basic concept is good,
> just difficult to do, and am listing the obvious challenges for anybody
> brave enough to tackle it... :)
>
>> I must be missing something important which prevents this. What is it?
>
> There's a few biggies.  The first is what the linux-kernel calls
> -ENOPATCH - nobody's written the code.  The second is you need an
> upstream target someplace to test against.  You need to deal with both
> the "server is unavailable due to a backhoe incident 2 time zones away"
> problem (which isn't *that* hard, just default to Something Not
> Obviously Bad(TM)), and "server is slashdotted" (which is a bit harder
> to deal with).  Remember that there's some really odd corner cases to
> worry about - for instance, if there's a power failure in a town, then
> when the electric company restores power you're going to have every
> cerowrt box hit the server within a few seconds - all over the same
> uplink most likely.  No good data can result from that... (Holy crap,
> it's been almost 3 decades since I first saw a Sun 3/280 server tank
> because 12 Sun 3/50s all rebooted over the network at once when
> building power was restored).
>
> And if you're in Izbekistan and the closest server netwise is at 60
> Hudson, the analysis to compute the correct values becomes....
> interesting.
>
> Dealing with non-obvious error conditions is also a challenge - a router
> may only boot once every few months.  And if you happen to be booting
> just as a BGP routing flap is causing your traffic to take a vastly
> suboptimal path, you may end up encoding a vastly inaccurate setting and
> have it stuck there, causing suckage for non-obvious reasons for the
> non-technical, so you really don't want to enable auto-tuning unless you
> also have a good plan for auto-*RE*tuning....

 Have the router record its finding, then repeat the test periodically, 
 recording each new finding as well. If the new finding is substantially 
 different from the prior ones, schedule a retest 'soon' (or default to 
 the prior setting if it's bad enough). Otherwise, if there aren't many 
 samples, schedule a test 'soon'; if there are a lot of samples, 
 schedule a test in a while.
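
 A rough sketch of that retest logic, in Python for readability. Nothing 
 like this exists in cerowrt as far as I know; the function name, the 
 thresholds, and the delays are all made-up placeholders, and it assumes 
 each finding is a positive bandwidth number in Mbit/s.

```python
from statistics import median

# All of these values are illustrative assumptions, not measured defaults.
SOON_S = 3600            # 'soon': retest in an hour
LATER_S = 7 * 86400      # 'in a while': retest in a week
MIN_SAMPLES = 5          # fewer than this counts as "not many samples"
DIFF_FRACTION = 0.25     # "substantially different" from prior findings
BAD_FRACTION = 0.50      # "bad enough" to fall back to the prior setting


def plan_next_test(history, new_mbps):
    """Return (setting_to_use, seconds_until_next_test, updated_history).

    history is the list of earlier findings (positive numbers);
    new_mbps is the finding from the test that just ran.
    """
    if not history:
        # First ever finding: use it, but confirm it soon.
        return new_mbps, SOON_S, [new_mbps]

    baseline = median(history)
    deviation = abs(new_mbps - baseline) / baseline
    updated = history + [new_mbps]   # every finding is recorded

    if deviation > DIFF_FRACTION:
        # Substantially different: retest soon, and if it is bad
        # enough keep using the prior setting in the meantime.
        setting = baseline if deviation > BAD_FRACTION else new_mbps
        return setting, SOON_S, updated

    if len(updated) < MIN_SAMPLES:
        # Consistent so far, but not many samples yet.
        return new_mbps, SOON_S, updated

    # Consistent and plenty of samples: test again in a while.
    return new_mbps, LATER_S, updated
```

 Using the median of the prior findings as the baseline is just one 
 choice; it keeps a single test taken during a routing flap from 
 dragging the baseline around.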

 However, I think the big question is how much tuning is actually required.

 If a connection with BQL and fq_codel is 90% as good as a tuned setup, 
 default to untuned unless the user explicitly hits a button to measure 
 (and then a second button to accept the measurement).

 If BQL and fq_codel by default are 70% as good as a tuned setup, 
 there's more space to argue that all setups must be tuned, but then the 
 question is how do they fare against an old, non-BQL, non-fq_codel 
 setup? If they are considerably better, it may still be worthwhile.

 David Lang