From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from bifrost.lang.hm (mail.lang.hm [64.81.33.126])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (Client did not present a certificate)
    by huchra.bufferbloat.net (Postfix) with ESMTPS id 92B0321F618
    for ; Fri, 25 Jul 2014 14:03:39 -0700 (PDT)
Received: from asgard (asgard.lang.hm [10.0.0.100])
    by bifrost.lang.hm (8.13.4/8.13.4/Debian-3) with ESMTP id s6PL3cKL005896
    for ; Fri, 25 Jul 2014 14:03:38 -0700
Received: from lang.hm (localhost [127.0.0.1])
    by asgard (Postfix) with ESMTP id AAABD13818B
    for ; Fri, 25 Jul 2014 14:03:38 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Date: Fri, 25 Jul 2014 14:03:38 -0700
From: David Lang
To:
In-Reply-To: <13144.1406313454@turing-police.cc.vt.edu>
References: <13144.1406313454@turing-police.cc.vt.edu>
Message-ID: <36889fad276c5cdd1cd083d1c83f2265@lang.hm>
X-Sender: david@lang.hm
User-Agent: Roundcube Webmail/0.5.1
Subject: Re: [Cerowrt-devel] Ideas on how to simplify and popularize bufferbloat control for consideration.
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 25 Jul 2014 21:03:39 -0000

On Fri, 25 Jul 2014 14:37:34 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Sat, 24 May 2014 10:02:53 -0400, "R." said:
>
>> Further, this function could be auto-scheduled or made enabled on
>> router boot up.
>
> Yeah, if such a thing worked, it would be good.
>
> (Note in the following that a big part of my *JOB* is doing "What could
> possibly go wrong?" analysis on mission-critical systems, which tends to
> color my viewpoint on projects.
> I still think the basic concept is good, just difficult to do, and am
> listing the obvious challenges for anybody brave enough to tackle
> it... :)
>
>> I must be missing something important which prevents this. What is
>> it?
>
> There's a few biggies. The first is what the linux-kernel calls
> -ENOPATCH - nobody's written the code. The second is you need an
> upstream target someplace to test against. You need to deal with both
> the "server is unavailable due to a backhoe incident 2 time zones away"
> problem (which isn't *that* hard, just default to Something Not
> Obviously Bad(TM)), and "server is slashdotted" (which is a bit harder
> to deal with). Remember that there's some really odd corner cases to
> worry about - for instance, if there's a power failure in a town, then
> when the electric company restores power you're going to have every
> cerowrt box hit the server within a few seconds - all over the same
> uplink most likely. No good data can result from that... (Holy crap,
> it's been almost 3 decades since I first saw a Sun 3/280 server tank
> because 12 Sun 3/50s all rebooted over the network at once when
> building power was restored).
>
> And if you're in Izbekistan and the closest server netwise is at 60
> Hudson, the analysis to compute the correct values becomes....
> interesting.
>
> Dealing with non-obvious error conditions is also a challenge - a
> router may only boot once every few months. And if you happen to be
> booting just as a BGP routing flap is causing your traffic to take a
> vastly suboptimal path, you may end up encoding a vastly inaccurate
> setting and have it stuck there, causing suckage for non-obvious
> reasons for the non-technical, so you really don't want to enable
> auto-tuning unless you also have a good plan for auto-*RE*tuning....

Have the router record its findings, then repeat the test periodically,
recording those findings as well.
If the new finding is substantially different from the prior ones,
schedule a retest 'soon' (or fall back to the prior setting if it's bad
enough). Otherwise, if there aren't many samples yet, schedule a test
'soon'; if there are a lot of samples, schedule a test in a while.

However, I think the big question is how much tuning is actually
required.

If a connection with BQL and fq_codel is 90% as good as a tuned setup,
default to untuned unless the user explicitly hits a button to measure
(and then a second button to accept the measurement).

If BQL and fq_codel by default are 70% as good as a tuned setup, there's
more room to argue that all setups must be tuned, but then the question
is how do they fare against an old, non-BQL, non-fq_codel setup? If they
are considerably better, it may still be worthwhile.

David Lang
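
For illustration, the retest scheduling described above could be
sketched roughly as follows. This is only a sketch under assumptions:
the function name plan_next_test, the constants, and every threshold
and interval are hypothetical and do not come from the thread. It also
assumes measurements are positive bandwidth figures and that the median
of prior samples is a reasonable stand-in for "the prior ones".

```python
from statistics import median

# All values below are illustrative assumptions, not from the thread.
SOON_S = 3600            # 'soon': retest in an hour
LATER_S = 7 * 86400      # 'in a while': retest in a week
MIN_SAMPLES = 5          # fewer than this counts as "not many samples"
DIFF_FRACTION = 0.25     # 'substantially different' from prior findings
BAD_FRACTION = 0.50      # 'bad enough' to fall back to the prior setting


def plan_next_test(history, new_mbps):
    """Return (setting_to_use, seconds_until_next_test).

    history  -- earlier measurements (positive numbers), oldest first
    new_mbps -- the measurement just taken
    The caller is expected to append new_mbps to history afterwards.
    """
    if not history:
        # Nothing to compare against yet: use the reading, confirm soon.
        return new_mbps, SOON_S

    baseline = median(history)
    deviation = abs(new_mbps - baseline) / baseline

    if deviation > DIFF_FRACTION:
        # Outlier: retest soon, and keep the prior setting if it is
        # far enough off to look like a bad measurement.
        setting = baseline if deviation > BAD_FRACTION else new_mbps
        return setting, SOON_S

    if len(history) + 1 < MIN_SAMPLES:
        # Consistent, but too few samples to trust yet.
        return new_mbps, SOON_S

    # Consistent with plenty of samples: back off.
    return new_mbps, LATER_S
```

Keeping the function free of side effects means the same logic could
run at boot or from a periodic job, with the history stored wherever
the router persists its settings.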