[Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos
davecb.42 at gmail.com
Sat Jan 9 18:01:32 EST 2021
At work, I recently had a database outage due to network saturation and
timeouts, which we proposed to address by setting up a QoS policy for
the machines in question. However, from the discussion in Ms Drucker's
BBR talk, that could lead us to doing /A Bad Thing/ (;-))
Let's start at the beginning, though. The talk, mentioned before in the
list, was about the interaction of BBR and large values of buffering,
specifically for video traffic. I attended it, and listened with
interest to the questions from the committee. She subsequently gave me a
copy of the paper and presentation, which I appreciate: it's very good work.
She reported the severity of the effect of large buffers on BBR. I've
attached a screenshot, but the list probably won't take it, so I'll
describe it. With large buffers, after the first few packets the RTT
rises and throughput plummets, then stays low for about 200,000 ms.
Throughput then rises to about half its initial value for about
50,000 ms as the RTT falls, and then plummets once more. This pattern
repeats throughout the test.
Increasing the buffering in the test environment turns perfectly
reasonable performance into a real disappointment, even though BBR is
trying to estimate /the network's bandwidth-delay product, BDP, and
regulating its sending rate to maximize throughput while attempting to
maintain BDP worth of packets in the buffer, irrespective of the size
of the buffer/.
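To make the quoted mechanism concrete, here is an illustrative sketch (my own, not BBR's actual code) of how a BBR-like sender derives an inflight cap from its estimates of bottleneck bandwidth and minimum RTT; the function names and the gain value are assumptions for illustration:

```python
# Illustrative sketch, not BBR's implementation: a BBR-like sender
# estimates the bottleneck bandwidth and the minimum RTT, multiplies
# them to get the bandwidth-delay product (BDP), and caps the data in
# flight at a small multiple of that -- regardless of buffer size.

def bdp_bytes(bottleneck_bw_bps, min_rtt_s):
    """Bandwidth-delay product: the bytes in flight needed to fill the pipe."""
    return bottleneck_bw_bps / 8 * min_rtt_s

def inflight_cap(bottleneck_bw_bps, min_rtt_s, gain=2.0):
    """Cap inflight data at 'gain' times the BDP (gain value assumed here)."""
    return gain * bdp_bytes(bottleneck_bw_bps, min_rtt_s)

# Example: a 100 Mbit/s bottleneck with a 40 ms minimum RTT.
bdp = bdp_bytes(100e6, 0.040)              # 500,000 bytes
cap = inflight_cap(100e6, 0.040)           # 1,000,000 bytes
print(bdp, cap)
```

The point of the quote is the last clause: the target is set by the estimated BDP, not by however much buffering the path happens to have.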
One of the interesting questions was about the token-bucket algorithm
used in the router to limit performance. In her paper, she discusses the
token bucket filter used by OpenWRT 19.07.1 on a Linksys WRT1900ACS
router. Allowing more than the actual bandwidth of the interface as the
/burst rate/ can exacerbate the buffering problem, so the listener was
concerned that routers "in the wild" might also be contributing to the
poor performance by using token-bucket algorithms with "excess burst
sizes".
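For readers who haven't looked inside one, here is a minimal token-bucket sketch (names and parameters are my own, not OpenWRT's) showing why a bucket sized above the line rate lets a sender momentarily exceed it:

```python
# Minimal token-bucket sketch (hypothetical names, not OpenWRT's tbf):
# tokens refill at the configured rate, and a full bucket lets a whole
# burst through back-to-back, faster than the line rate can drain it.

class TokenBucket:
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8       # refill rate in bytes/second
        self.burst = burst_bytes       # bucket capacity in bytes
        self.tokens = burst_bytes      # start with a full bucket
        self.last = 0.0                # time of last conformance check

    def conform(self, now, pkt_bytes):
        """True if the packet may be sent now; consumes tokens if so."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if pkt_bytes <= self.tokens:
            self.tokens -= pkt_bytes
            return True
        return False

tb = TokenBucket(rate_bps=10e6, burst_bytes=30000)
# At t=0 a full bucket passes 20 x 1500-byte packets back-to-back:
# a 30 KB burst, far more than a 10 Mbit/s line carries in an instant.
sent = sum(tb.conform(0.0, 1500) for _ in range(25))
print(sent)  # 20
```

The larger the bucket, the bigger the instantaneous burst that arrives at the next queue downstream, which is exactly the buffering interaction the questioner was worried about.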
The very first Cisco manual I found in a Google search explained how to
*/set/* the excess burst size (!). It defined excess burst traffic as
/traffic that falls between the normal burst size and the Excess Burst
size/ and specifies that it will be sent regardless, /with a probability
that increases as the burst size increases/.
A little later, it explains that the excess or "extended" burst size
/exists so as to avoid tail-drop behavior, and, instead,
engage behavior like that of Random Early Detection (RED)./
In order to avoid tail drop, they suggest the "extended burst" be set to
twice the burst size, where the burst size by definition is the capacity
of the interface, per unit time.
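Putting the quoted description together, here is one plausible reading of that policy as a sketch (my interpretation of the manual's wording, not Cisco's actual CAR algorithm): always send below the normal burst, always drop above the excess burst, and in between drop with a probability that rises with the burst size.

```python
# Sketch of the RED-like policy the Cisco text describes (my reading,
# not Cisco's actual algorithm): below the normal burst, always send;
# above the excess burst, always drop; in between, drop with a
# probability that increases linearly with the burst size.

def drop_probability(burst_bytes, normal_burst, excess_burst):
    if burst_bytes <= normal_burst:
        return 0.0
    if burst_bytes >= excess_burst:
        return 1.0
    return (burst_bytes - normal_burst) / (excess_burst - normal_burst)

# With the recommended excess burst = 2 x normal burst, a burst 1.5x
# the normal size is dropped about half the time.
p = drop_probability(15000, 10000, 20000)
print(p)  # 0.5
```

Under this reading, everything up to twice the interface's per-interval capacity has some chance of being forwarded, which is what makes the recommendation look alarming from a buffering standpoint.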
So, folks, am I right in thinking that Cisco's recommendation just might
be a /terrible/ piece of advice?
As a capacity planner, it sounds a lot like they're praying for a
conveniently timed lull after every time they let too many bytes through.
As a follower of the discussion here, I find the references to tail drop
and RED faintly ... antique.
David Collier-Brown, | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb at spamcop.net | -- Mark Twain