[Cerowrt-devel] Ideas on how to simplify and popularize bufferbloat control for consideration.

Sat Jul 26 07:30:08 EDT 2014

Hi David,

On Jul 25, 2014, at 23:03 , David Lang <david at lang.hm> wrote:

> On Fri, 25 Jul 2014 14:37:34 -0400, Valdis.Kletnieks at vt.edu wrote:
>> On Sat, 24 May 2014 10:02:53 -0400, "R." said:
>> 
>>> Further, this function could be auto-scheduled or made enabled on
>>> router boot up.
>> 
>> Yeah, if such a thing worked, it would be good.
>> 
>> (Note in the following that a big part of my *JOB* is doing "What could
>> possibly go wrong?" analysis on mission-critical systems, which tends to color
>> my viewpoint on projects. I still think the basic concept is good, just
>> difficult to do, and am listing the obvious challenges for anybody brave
>> enough to tackle it... :)
>> 
>>> I must be missing something important which prevents this. What is it?
>> 
>> There's a few biggies.  The first is what the linux-kernel calls -ENOPATCH -
>> nobody's written the code.  The second is you need an upstream target someplace
>> to test against.  You need to deal with both the "server is unavailable due
>> to a backhoe incident 2 time zones away" problem (which isn't *that* hard, just
>> default to Something Not Obviously Bad(TM)), and "server is slashdotted" (which
>> is a bit harder to deal with).  Remember that there's some really odd corner
>> cases to worry about - for instance, if there's a power failure in a town, then
>> when the electric company restores power you're going to have every cerowrt box
>> hit the server within a few seconds - all over the same uplink most likely.  No
>> good data can result from that... (Holy crap, it's been almost 3 decades since
>> I first saw a Sun 3/280 server tank because 12 Sun 3/50s all rebooted over the
>> network at once when building power was restored).
>> 
>> And if you're in Izbekistan and the closest server netwise is at 60 Hudson, the
>> analysis to compute the correct values becomes.... interesting.
>> 
>> Dealing with non-obvious error conditions is also a challenge - a router
>> may only boot once every few months.  And if you happen to be booting just
>> as a BGP routing flap is causing your traffic to take a vastly suboptimal
>> path, you may end up encoding a vastly inaccurate setting and have it stuck
>> there, causing suckage for non-obvious reasons for the non-technical, so you
>> really don't want to enable auto-tuning unless you also have a good plan for
>> auto-*RE*tuning....
> 
> Have the router record its findings, then repeat the test periodically and record those findings as well. If the new finding is substantially different from the prior ones, schedule a retest 'soon' (or default to the prior setting if it's bad enough). Otherwise, if there aren't many samples, schedule a test 'soon'; if there are a lot of samples, schedule a test in a while.

	Yeah, keeping some history to “predict” when to measure next sounds clever.
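
	As a rough sketch of that scheduling idea (Python purely for illustration; the function name, every threshold, and both intervals are assumed placeholders, not values anybody in this thread proposed or measured):

```python
from statistics import median

# All constants below are illustrative assumptions.
DEVIATION_LIMIT = 0.20   # relative change treated as "substantially different"
BAD_LIMIT = 0.50         # relative change treated as "bad enough" to distrust
MIN_SAMPLES = 5          # fewer recorded samples than this counts as "not many"
SOON_S = 15 * 60         # 'soon': retest in 15 minutes
LATER_S = 24 * 60 * 60   # 'in a while': retest in a day


def next_action(history, new_kbps):
    """Return (rate_to_use_kbps, seconds_until_next_test).

    history:  earlier recorded measurements in kbit/s, all > 0 (may be empty).
    new_kbps: the measurement that was just taken.
    """
    if not history:
        # Nothing to compare against: use the value, but confirm it soon.
        return new_kbps, SOON_S

    baseline = median(history)
    deviation = abs(new_kbps - baseline) / baseline

    if deviation > BAD_LIMIT:
        # Far from everything seen so far: keep the prior setting, retest soon.
        return baseline, SOON_S
    if deviation > DEVIATION_LIMIT:
        # Substantially different: use it, but retest soon.
        return new_kbps, SOON_S
    if len(history) + 1 < MIN_SAMPLES:
        # Consistent, but not many samples yet: retest soon.
        return new_kbps, SOON_S
    # Consistent and plenty of samples: retest in a while.
    return new_kbps, LATER_S
```

	The caller would append new_kbps to the stored history and arm a timer for the returned delay. Using the median as the "prior" value is one choice among several; it just keeps a single odd sample (say, one taken during a routing flap) from dragging the baseline around.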

> 
> However, I think the big question is how much the tuning is required.

I assume that in most cases you need to measure the home router’s bandwidth only rarely (say, on DSL, only after a re-sync with the DSLAM), but you need to measure it early, as only then can you properly shape the downlink. And we need to know the link’s capacity to use traffic shaping so that BQL and fq_codel in the router have control over the bottleneck queue… An equivalent of BQL and fq_codel running in the DSLAM/CMTS and CPE obviously would be what we need, because then BQL and fq_codel on the router would be all that is required. But that does not seem to be happening anytime soon, so I fear we will need to work around the limitations in the equipment for a long time to come.
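
To make the dependency on the measurement concrete, here is a minimal sketch (Python for illustration only) of deriving a shaper rate from a measured link rate. The helper name and the 10% default margin are assumptions of mine, not tested recommendations; the only point is that the shaper has to run somewhat below the real link rate so that the queue builds in the router, where fq_codel can manage it, rather than in the modem or DSLAM/CMTS.

```python
def shaper_rate_kbps(measured_kbps, margin_percent=10):
    """Return a shaper rate kept below the measured link rate.

    measured_kbps:  measured link capacity in kbit/s (must be positive).
    margin_percent: headroom below the measurement, in whole percent
                    (assumed default; the right value depends on the link).
    """
    if measured_kbps <= 0:
        raise ValueError("measured rate must be positive")
    if not 0 <= margin_percent < 100:
        raise ValueError("margin must be in the range 0..99 percent")
    # Integer arithmetic avoids floating-point rounding surprises.
    return measured_kbps * (100 - margin_percent) // 100


# Example: a downlink measured at 16000 kbit/s.
print(shaper_rate_kbps(16000))
```

If the measurement is stale or wrong in the optimistic direction, the computed rate ends up above what the link can carry and the bottleneck queue moves back out of the router's control, which is why the measurement needs to happen early.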

> 
> If a connection with BQL and fq_codel is 90% as good as a tuned setup, default to untuned unless the user explicitly hits a button to measure (and then a second button to accept the measurement)
> 
> If BQL and fq_codel by default are 70% as good as a tuned setup, there's more space to argue that all setups must be tuned, but then the question is how do they fare against an old, non-BQL, non-fq_codel setup? If they are considerably better, it may still be worthwhile.

Best Regards
	Sebastian

> 
> David Lang