[Cerowrt-devel] Ideas on how to simplify and popularize bufferbloat control for consideration.

Sat Jul 26 16:39:59 EDT 2014

On Sat, 26 Jul 2014, Sebastian Moeller wrote:

> Hi David,
>
>
> On Jul 25, 2014, at 23:03 , David Lang <david at lang.hm> wrote:
>
>> On Fri, 25 Jul 2014 14:37:34 -0400, Valdis.Kletnieks at vt.edu wrote:
>>> On Sat, 24 May 2014 10:02:53 -0400, "R." said:
>>>
>>>> Further, this function could be auto-scheduled or made enabled on
>>>> router boot up.
>>>
>>> Yeah, if such a thing worked, it would be good.
>>>
>>> (Note in the following that a big part of my *JOB* is doing "What could
>>> possibly go wrong?" analysis on mission-critical systems, which tends to
>>> color my viewpoint on projects. I still think the basic concept is good,
>>> just difficult to do, and am listing the obvious challenges for anybody
>>> brave enough to tackle it... :)
>>>
>>>> I must be missing something important which prevents this. What is it?
>>>
>>> There are a few biggies.  The first is what the linux-kernel calls
>>> -ENOPATCH - nobody's written the code.  The second is you need an upstream
>>> target someplace to test against.  You need to deal with both the "server
>>> is unavailable due to a backhoe incident 2 time zones away" problem (which
>>> isn't *that* hard, just default to Something Not Obviously Bad(TM)), and
>>> "server is slashdotted" (which is a bit harder to deal with).  Remember
>>> that there are some really odd corner cases to worry about - for instance,
>>> if there's a power failure in a town, then when the electric company
>>> restores power you're going to have every cerowrt box hit the server
>>> within a few seconds - all over the same uplink most likely.  No good
>>> data can result from that... (Holy crap, it's been almost 3 decades since
>>> I first saw a Sun 3/280 server tank because 12 Sun 3/50s all rebooted over
>>> the network at once when building power was restored).
>>>
>>> And if you're in Izbekistan and the closest server netwise is at 60 Hudson,
>>> the analysis to compute the correct values becomes.... interesting.
>>>
>>> Dealing with non-obvious error conditions is also a challenge - a router
>>> may only boot once every few months.  And if you happen to be booting just
>>> as a BGP routing flap is causing your traffic to take a vastly suboptimal
>>> path, you may end up encoding a vastly inaccurate setting and have it stuck
>>> there, causing suckage for non-obvious reasons for the non-technical, so you
>>> really don't want to enable auto-tuning unless you also have a good plan for
>>> auto-*RE*tuning....
>>
>> Have the router record its finding, and then repeat the test periodically,
>> recording each new finding as well. If the new finding is substantially
>> different from the prior ones, schedule a retest 'soon' (or default to the
>> prior setting if it's bad enough). Otherwise, if there aren't many samples,
>> schedule a test 'soon'; if there are a lot of samples, schedule a test in a
>> while.
>
> 	Yeah, keeping some history to “predict” when to measure next sounds clever.
>
>>
>> However, I think the big question is how much the tuning is required.
>
> I assume in most cases you need to measure the home router's bandwidth rarely
> (say on DSL only after a re-sync with the DSLAM), but you need to measure the
> bandwidth early, as only then can you properly shape the downlink. And we need
> to know the link’s capacity to use traffic shaping so that BQL and fq_codel in
> the router have control over the bottleneck queue… An equivalent of BQL and
> fq_codel running in the DSLAM/CMTS and CPE obviously would be what we need,
> because then BQL and fq_codel on the router would be all that is required. But
> that does not seem like it is happening anytime soon, so we still need to
> work around the limitations in the equipment for a long time to come, I fear.
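
The retest policy quoted earlier in this thread (record each finding, retest
'soon' when a result looks off or history is thin, otherwise retest 'in a
while') could be sketched roughly as follows. This is only an illustration
under stated assumptions: the function name, the use of the median as the
baseline, and every threshold and interval are hypothetical placeholders, not
values anyone in the thread proposed.

```python
from statistics import median

# All of these numbers are hypothetical placeholders for illustration.
SOON = 15 * 60         # seconds until a 'soon' retest
LATER = 24 * 60 * 60   # seconds until an 'in a while' retest
MIN_SAMPLES = 5        # fewer findings than this counts as "not many samples"
TOLERANCE = 0.20       # more than 20% off the baseline is "substantially different"
REJECT = 0.50          # more than 50% off is "bad enough" to keep the prior setting


def next_step(history, new_rate):
    """Return (rate_to_use, seconds_until_next_test) for a new finding.

    history is a list of earlier bandwidth findings (positive numbers in any
    consistent unit). The new finding is always recorded in it.
    """
    if not history:
        history.append(new_rate)
        return new_rate, SOON

    baseline = median(history)            # assumed summary of prior findings
    deviation = abs(new_rate - baseline) / baseline
    history.append(new_rate)

    if deviation > TOLERANCE:
        # Substantially different: retest soon, and fall back to the prior
        # setting if the new finding is bad enough.
        rate = baseline if deviation > REJECT else new_rate
        return rate, SOON
    if len(history) < MIN_SAMPLES:
        return new_rate, SOON             # not many samples yet
    return new_rate, LATER                # plenty of consistent samples
```

A single measurement taken during a routing flap would then be recorded but
not applied, and would trigger an early retest instead of sticking for months.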

By "how much tuning is required", I didn't mean how frequently to tune, but how
close default settings can come to the performance of an expertly tuned setup.

Ideally the tuning takes into account the characteristics of the hardware of the 
link layer. If it's IP encapsulated in something else (ATM, PPPoE, VPN, VLAN 
tagging, ethernet with jumbo packet support for example), then you have overhead 
from the encapsulation that you would ideally take into account when tuning 
things.
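
As a rough illustration of the kind of encapsulation overhead meant here,
assume an ATM link using AAL5 framing: each 53-byte cell carries 48 bytes of
payload, and an 8-byte AAL5 trailer is added before padding to a whole number
of cells. The function name and the encap_overhead parameter are hypothetical;
the per-packet header cost for PPPoE, VLAN tags and so on depends on the link
and is left to the caller.

```python
ATM_CELL = 53      # bytes on the wire per ATM cell
ATM_PAYLOAD = 48   # payload bytes carried by each cell
AAL5_TRAILER = 8   # AAL5 trailer appended before padding to whole cells


def atm_wire_bytes(ip_len, encap_overhead=0):
    """Bytes actually sent on an ATM/AAL5 link for one IP packet.

    encap_overhead is the extra per-packet header cost in bytes; it is an
    assumption supplied by the caller, not something this sketch detects.
    """
    pdu = ip_len + encap_overhead + AAL5_TRAILER
    cells = -(-pdu // ATM_PAYLOAD)   # integer ceiling division
    return cells * ATM_CELL


for size in (40, 41, 1500):
    wire = atm_wire_bytes(size)
    print(size, wire, round(size / wire, 3))
```

Under these assumptions a 1500-byte packet occupies 32 cells (1696 bytes on the
wire), and growing a 40-byte packet by a single byte doubles its cost from one
cell to two, which is why a shaper that ignores the framing can misjudge the
real link rate.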

The question I'm talking about below is how much do you lose compared to the
ideal if you ignore this sort of thing and just assume that the wire is dumb and
puts the bits on it as you send them? By dumb I mean don't even allow for
inter-packet gaps, don't measure the bandwidth, don't try to pace inbound
connections by the timing of your acks, etc. Just run BQL and fq_codel and start
the BQL sizes based on the wire speed of your link (Gig-E on the 3800) and
shrink them based on long-term passive observation of the sender.

If you end up only losing 5-10% of your overall network performance by ignoring
the details of the wire, then we should ignore them by default.

If however, not measuring anything first results in significantly worse 
performance than a tuned setup, then we need to figure out how to do the 
measurements needed for tuning.

Some people seem to have fallen into the "perfect is the enemy of good enough"
trap on this topic. They are so fixated on getting the absolute best performance
out of a link that they are forgetting how bad the status quo is right now.

If you look at the graph that Dave Taht put on page 6 of his slide deck 
http://snapon.lab.bufferbloat.net/~d/Presos/CaseForComprehensiveQueueManagement/assets/player/KeynoteDHTMLPlayer.html#5 
it's important to realize that even the worst of the BQL+fq_codel graphs is
worlds better than the default setting. While it would be nice to get to the
green trace on the left, even getting to the middle traces instead of the black
trace on the right would be a huge win for the public.

David Lang

>> If a connection with BQL and fq_codel is 90% as good as a tuned setup,
>> default to untuned unless the user explicitly hits a button to measure (and
>> then a second button to accept the measurement).
>>
>> If BQL and fq_codel by default are 70% as good as a tuned setup, there's
>> more space to argue that all setups must be tuned, but then the question is
>> how do they fare against an old, non-BQL, non-fq_codel setup? If they are
>> considerably better, it may still be worthwhile.