[Cerowrt-devel] Ideas on how to simplify and popularize bufferbloat control for consideration.
moeller0 at gmx.de
Sat Jul 26 17:25:35 EDT 2014
On Jul 26, 2014, at 22:39 , David Lang <david at lang.hm> wrote:
> On Sat, 26 Jul 2014, Sebastian Moeller wrote:
>> Hi David,
>> On Jul 25, 2014, at 23:03 , David Lang <david at lang.hm> wrote:
>>> On Fri, 25 Jul 2014 14:37:34 -0400, Valdis.Kletnieks at vt.edu wrote:
>>>> On Sat, 24 May 2014 10:02:53 -0400, "R." said:
>>>>> Further, this function could be auto-scheduled or made enabled on
>>>>> router boot up.
>>>> Yeah, if such a thing worked, it would be good.
>>>> (Note in the following that a big part of my *JOB* is doing "What could
>>>> possibly go wrong?" analysis on mission-critical systems, which tends
>>>> to color my viewpoint on projects. I still think the basic concept is
>>>> good, just difficult to do, and am listing the obvious challenges for
>>>> anybody brave enough to tackle it... :)
>>>>> I must be missing something important which prevents this. What is it?
>>>> There's a few biggies. The first is what the linux-kernel calls -ENOPATCH -
>>>> nobody's written the code. The second is you need an upstream target
>>>> to test against. You need to deal with both the "server is unavailable due
>>>> to a backhoe incident 2 time zones away" problem (which isn't *that* hard,
>>>> just default to Something Not Obviously Bad(TM)), and "server is slashdotted"
>>>> (which is a bit harder to deal with). Remember that there's some really odd
>>>> corner cases to worry about - for instance, if there's a power failure in a
>>>> town, then when the electric company restores power you're going to have
>>>> every cerowrt box hit the server within a few seconds - all over the same
>>>> uplink most likely. No good data can result from that... (Holy crap, it's
>>>> been almost 3 decades since I first saw a Sun 3/280 server tank because 12
>>>> Sun 3/50s all rebooted over the network at once when building power was
>>>> restored).
>>>> And if you're in Izbekistan and the closest server netwise is at 60 Hudson,
>>>> the analysis to compute the correct values becomes.... interesting.
>>>> Dealing with non-obvious error conditions is also a challenge - a router
>>>> may only boot once every few months. And if you happen to be booting just
>>>> as a BGP routing flap is causing your traffic to take a vastly suboptimal
>>>> path, you may end up encoding a vastly inaccurate setting and have it stuck
>>>> there, causing suckage for non-obvious reasons for the non-technical, so you
>>>> really don't want to enable auto-tuning unless you also have a good plan for
>>> Have the router record its finding, and then repeat the test periodically, recording each new finding as well. If the new finding is substantially different from the prior ones, schedule a retest 'soon' (or default to the prior setting if it's bad enough). Otherwise, if there aren't many samples, schedule a test 'soon'; if there are a lot of samples, schedule a test in a while.
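A rough sketch of the retest policy David describes above, just to make the decision order concrete. Everything named here is an assumption for illustration: the thresholds, the intervals behind 'soon' and 'in a while', the use of the median as "the prior ones", and the function name `next_step`. None of it comes from CeroWrt code.

```python
from statistics import median

# All constants are illustrative assumptions, not values from this thread.
SOON_S = 15 * 60          # "soon": retest in 15 minutes
LATER_S = 24 * 60 * 60    # "in a while": retest in a day
MIN_SAMPLES = 5           # fewer samples than this counts as "not many"
DIFF_THRESHOLD = 0.20     # "substantially different": >20% off the median
BAD_THRESHOLD = 0.50      # "bad enough": >50% off, keep the prior setting


def next_step(history, new_kbps):
    """Record a measurement; return (rate_to_use_kbps, seconds_until_next_test)."""
    if not history:
        # First measurement ever: use it, but confirm it soon.
        history.append(new_kbps)
        return new_kbps, SOON_S

    prior = median(history)               # robust summary of earlier findings
    deviation = abs(new_kbps - prior) / prior
    history.append(new_kbps)

    if deviation > DIFF_THRESHOLD:
        # Substantially different: retest soon, and fall back to the
        # prior setting if the new value looks bad enough.
        rate = prior if deviation > BAD_THRESHOLD else new_kbps
        return rate, SOON_S
    if len(history) < MIN_SAMPLES:
        return new_kbps, SOON_S           # not many samples yet
    return new_kbps, LATER_S              # plenty of consistent samples
```

Using the median means a single measurement taken during something like a routing flap does not drag the stored estimate along with it, which is one possible answer to the "stuck with a bad value" worry raised earlier in the thread.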
>> Yeah, keeping some history to “predict” when to measure next sounds clever.
>>> However, I think the big question is how much the tuning is required.
>> I assume in most cases you need to measure the home router's bandwidth rarely (say, on DSL, only after a re-sync with the DSLAM), but you need to measure the bandwidth early, as only then can you properly shape the downlink. And we need to know the link’s capacity to use traffic shaping so that BQL and fq_codel in the router have control over the bottleneck queue… An equivalent of BQL and fq_codel running in the DSLAM/CMTS and CPE obviously would be what we need, because then BQL and fq_codel on the router would be all that is required. But that does not seem to be happening anytime soon, so we still need to work around the limitations in the equipment for a long time to come, I fear.
> By how much tuning is required, I didn't mean how frequently to tune, but how close default settings can come to the performance of an expertly tuned setup.
> Ideally the tuning takes into account the characteristics of the hardware of the link layer. If it's IP encapsulated in something else (ATM, PPPoE, VPN, VLAN tagging, ethernet with jumbo packet support for example), then you have overhead from the encapsulation that you would ideally take into account when tuning things.
> The question I'm talking about below is how much you lose compared to the ideal if you ignore this sort of thing and just assume that the wire is dumb and puts the bits on it as you send them. By dumb I mean don't even allow for inter-packet gaps, don't measure the bandwidth, don't try to pace inbound connections by the timing of your acks, etc. Just run BQL and fq_codel and start the BQL sizes based on the wire speed of your link (Gig-E on the 3800) and shrink them based on long-term passive observation of the sender.
As data talks, I just did a quick experiment with my ADSL2+ line at home. The solid lines in the attached plot show the results for proper shaping with SQM (shaping to 95% of the link rates of downstream and upstream while taking the link layer properties, that is, ATM encapsulation and per-packet overhead, into account); the broken lines show the same system with just the link layer adjustments and per-packet overhead adjustments disabled, but still shaping to 95% of link rate (this is roughly equivalent to a 15% underestimation of the packet size). The actual test is netperf-wrapper's RRUL (4 TCP streams up, 4 TCP streams down while measuring latency with ping and UDP probes). As you can see from the plot, just getting the link layer encapsulation wrong destroys latency under load badly. The host is ~52ms RTT away, and with fq_codel the ping time per leg is increased by just one codel target of 5ms each, resulting in a modest latency increase of ~10ms with proper shaping for a total of ~65ms; with improper shaping RTTs increase to ~95ms (they almost double), so RTT increases by ~43ms. Also note how the extremes for the broken lines are much worse than for the solid lines. In short, I would estimate that a slight misjudgment (15%) results in an almost 80% increase of latency under load. In other words, getting the rates right matters a lot. (I should also note that in my setup there is a secondary router that limits RTT to max 300ms, otherwise the broken lines might look even worse...)
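To make the link layer point concrete, here is a small sketch of why ignoring ATM encapsulation underestimates packet size. The 53-byte cell with 48 bytes of payload is standard ATM; the 40-byte per-packet overhead is a hypothetical value picked for illustration (the real figure depends on the encapsulation in use), and the function names are made up.

```python
import math

ATM_CELL_BYTES = 53      # one ATM cell on the wire (5-byte header + payload)
ATM_PAYLOAD_BYTES = 48   # payload bytes carried per cell


def atm_wire_bytes(ip_packet_bytes, per_packet_overhead=40):
    """Bytes actually sent on an ATM link for one IP packet.

    per_packet_overhead is an assumed, illustrative value; it stands in
    for whatever the link's encapsulation adds to each packet.
    """
    framed = ip_packet_bytes + per_packet_overhead
    # The last cell is padded, so the packet always uses whole cells.
    cells = math.ceil(framed / ATM_PAYLOAD_BYTES)
    return cells * ATM_CELL_BYTES


def underestimate_fraction(ip_packet_bytes, per_packet_overhead=40):
    """Fraction by which a shaper counting only IP bytes undercounts."""
    wire = atm_wire_bytes(ip_packet_bytes, per_packet_overhead)
    return 1 - ip_packet_bytes / wire


if __name__ == "__main__":
    for size in (64, 576, 1500):
        print(size, atm_wire_bytes(size), round(underestimate_fraction(size), 3))
```

With these assumed numbers a 1500-byte packet needs 33 cells, so 1749 bytes on the wire, and a shaper that counts only the 1500 bytes undercounts by roughly 14%, in the same ballpark as the ~15% mentioned above. Small packets fare much worse, since padding and overhead dominate.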
> If you end up only losing 5-10% of your overall network performance by ignoring the details of the wire, then we should ignore them by default.
> If however, not measuring anything first results in significantly worse performance than a tuned setup, then we need to figure out how to do the measurements needed for tuning.
> Some people seem to have fallen into the "perfect is the enemy of good enough" trap on this topic. They are so fixated on getting the absolute best performance out of a link that they are forgetting how bad the status-quo is right now.
> If you look at the graph that Dave Taht put on page 6 of his slide deck http://snapon.lab.bufferbloat.net/~d/Presos/CaseForComprehensiveQueueManagement/assets/player/KeynoteDHTMLPlayer.html#5 it's important to realize that even the worst of the BQL+fq_codel graphs is worlds better than the default setting. While it would be nice to get to the green trace on the left, even getting to the middle traces instead of the black trace on the right would be a huge win for the public.
Just to note: in the plot above, the connection to the DSL modem was always mediated by fq_codel and BQL, and since shaping was used, BQL would not come into effect…
> David Lang
>>> If a connection with BQL and fq_codel is 90% as good as a tuned setup, default to untuned unless the user explicitly hits a button to measure (and then a second button to accept the measurement)
>>> If BQL and fq_codel by default are 70% as good as a tuned setup, there's more space to argue that all setups must be tuned, but then the question is how do they fare against an old, non-BQL, non-fq_codel setup? If they are considerably better, it may still be worthwhile.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 344086 bytes
Desc: not available