[Cerowrt-devel] Ideas on how to simplify and popularize bufferbloat control for consideration.

Fri Jul 25 17:03:38 EDT 2014

 On Fri, 25 Jul 2014 14:37:34 -0400, Valdis.Kletnieks at vt.edu wrote:
> On Sat, 24 May 2014 10:02:53 -0400, "R." said:
>
>> Further, this function could be auto-scheduled or enabled on
>> router boot up.
>
> Yeah, if such a thing worked, it would be good.
>
> (Note in the following that a big part of my *JOB* is doing "What could
> possibly go wrong?" analysis on mission-critical systems, which tends to
> color my viewpoint on projects. I still think the basic concept is good,
> just difficult to do, and am listing the obvious challenges for anybody
> brave enough to tackle it... :)
>
>> I must be missing something important which prevents this. What is it?
>
> There's a few biggies.  The first is what the linux-kernel calls
> -ENOPATCH - nobody's written the code.  The second is you need an
> upstream target someplace to test against.  You need to deal with both
> the "server is unavailable due to a backhoe incident 2 time zones away"
> problem (which isn't *that* hard, just default to Something Not
> Obviously Bad(TM)), and "server is slashdotted" (which is a bit harder
> to deal with).  Remember that there's some really odd corner cases to
> worry about - for instance, if there's a power failure in a town, then
> when the electric company restores power you're going to have every
> cerowrt box hit the server within a few seconds - all over the same
> uplink most likely.  No good data can result from that... (Holy crap,
> it's been almost 3 decades since I first saw a Sun 3/280 server tank
> because 12 Sun 3/50s all rebooted over the network at once when
> building power was restored).
>
> And if you're in Izbekistan and the closest server netwise is at 60
> Hudson, the analysis to compute the correct values becomes....
> interesting.
>
> Dealing with non-obvious error conditions is also a challenge - a
> router may only boot once every few months.  And if you happen to be
> booting just as a BGP routing flap is causing your traffic to take a
> vastly suboptimal path, you may end up encoding a vastly inaccurate
> setting and have it stuck there, causing suckage for non-obvious
> reasons for the non-technical, so you really don't want to enable
> auto-tuning unless you also have a good plan for auto-*RE*tuning....

 Have the router record its finding, then repeat the test periodically,
 recording each new finding as well. If a new finding is substantially
 different from the prior ones, schedule a retest 'soon' (or fall back to
 the prior setting if the result is bad enough). Otherwise, if there
 aren't many samples yet, schedule a test 'soon'; if there are a lot of
 samples, schedule a test in a while.
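
 A rough sketch of that retest policy, in Python for readability. Every
 name and number here (next_step, SOON, IN_A_WHILE, MIN_SAMPLES,
 DIFFERENT, BAD, and using the median of past results as the baseline) is
 a made-up assumption for illustration, not something CeroWrt implements.
 It also assumes a "substantially different" result is still applied
 while waiting for the early retest, which is only one way to read the
 idea.

```python
from statistics import median

# Illustrative assumptions only; none of these values come from the thread.
SOON = 3600              # seconds until a "soon" retest
IN_A_WHILE = 7 * 86400   # seconds until a routine retest
MIN_SAMPLES = 5          # fewer samples than this means "not many"
DIFFERENT = 0.25         # relative deviation counted as "substantially different"
BAD = 0.60               # relative deviation counted as "bad enough" to reject


def next_step(history, new_kbps):
    """Record a measurement and decide what to do next.

    history  -- list of earlier bandwidth measurements (kbit/s)
    new_kbps -- the measurement just taken
    Returns (setting_kbps, delay_seconds, updated_history).
    """
    updated = history + [new_kbps]  # always record the finding

    if not history:
        # Nothing to compare against: use the result, but confirm it soon.
        return new_kbps, SOON, updated

    baseline = median(history)  # the median tolerates a few outliers
    deviation = abs(new_kbps - baseline) / baseline

    if deviation >= BAD:
        # Bad enough: keep the prior setting and retest soon.
        return baseline, SOON, updated
    if deviation >= DIFFERENT:
        # Substantially different: accept it for now, but retest soon.
        return new_kbps, SOON, updated
    if len(updated) < MIN_SAMPLES:
        # Consistent, but not many samples yet.
        return new_kbps, SOON, updated
    # Consistent and well sampled: retest in a while.
    return new_kbps, IN_A_WHILE, updated
```

 Because every result is recorded, a measurement taken during something
 like a routing flap ends up as one outlier among several samples rather
 than as the stuck setting, and a real change in line speed gradually
 moves the baseline as retests keep agreeing with each other.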

 However, I think the big question is how much tuning is actually required.

 If a connection with BQL and fq_codel is 90% as good as a tuned setup, 
 default to untuned unless the user explicitly hits a button to measure 
 (and then a second button to accept the measurement).

 If BQL and fq_codel by default are 70% as good as a tuned setup, 
 there's more space to argue that all setups must be tuned, but then the 
 question is how do they fare against an old, non-BQL, non-fq_codel setup? 
 If they are considerably better, it may still be worthwhile.

 David Lang