* [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos
@ 2021-01-09 23:01 David Collier-Brown
  2021-01-10  5:39 ` Erik Auerswald
  2021-01-10 14:25 ` David Collier-Brown
  0 siblings, 2 replies; 7+ messages in thread

From: David Collier-Brown @ 2021-01-09 23:01 UTC (permalink / raw)
To: bloat; +Cc: dave.collier-brown

At work, I recently had a database outage due to network saturation and
timeouts, which we proposed to address by setting up a QoS policy for the
machines in question.  However, from the discussion in Ms Drucker's BBR
talk, that could lead us to doing /A Bad Thing/ (;-))

Let's start at the beginning, though.  The talk, mentioned before on the
list[1], was about the interaction of BBR and large amounts of buffering,
specifically for video traffic.  I attended it, and listened with
interest to the questions from the committee.  She subsequently gave me a
copy of the paper and presentation, which I appreciate: it's very good
work.

She reported the severity of the effect of large buffers on BBR.  I've
attached a screenshot, but the list probably won't take it, so I'll
describe it.  After the first few packets with large buffers, RTT rises,
throughput plummets, and then throughput stays low for about 200,000 ms.
Then it rises to about half the initial throughput for about 50,000 ms as
RTT falls, then throughput plummets once more.  This pattern repeats
throughout the test.

Increasing the buffering in the test environment turns perfectly
reasonable performance into a real disappointment, even though BBR is
trying to estimate /the network's bandwidth-delay product, BDP, and
regulating its sending rate to maximize throughput while attempting to
maintain BDP worth of packets in the buffer, irrespective of the size of
the buffer/.

One of the interesting questions was about the token-bucket algorithm
used in the router to limit performance.
In her paper, she discusses the token bucket filter used by OpenWRT
19.07.1 on a Linksys WRT1900ACS router.  Allowing more than the actual
bandwidth of the interface as the /burst rate/ can exacerbate the
buffering problem, so the listener was concerned that routers "in the
wild" might also be contributing to the poor performance by using
token-bucket algorithms with "excess burst size" parameters.

The very first Cisco manual I found in a Google search explained how to
*/set/* excess burst size (!)

https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/qos_plcshp/configuration/12-4/qos-plcshp-12-4-book.pdf

It defined excess burst size as /traffic that falls between the normal
burst size and the Excess Burst size/, and specifies that such traffic
will be sent regardless, /with a probability that increases as the burst
size increases/.  A little later, it explains that the excess or
"extended" burst size /exists so as to avoid tail-drop behavior, and,
instead, engage behavior like that of Random Early Detection (RED)/.
To avoid tail drop, they suggest the "extended burst" be set to twice
the burst size, where the burst size is by definition the capacity of
the interface per unit time.

So, folks, am I right in thinking that Cisco's recommendation just might
be a /terrible/ piece of advice?  As a capacity planner, it sounds a lot
like they're praying for a conveniently timed lull after every time they
let too many bytes through.  As a follower of the discussion here, the
references to tail drop and RED sound faintly ... antique.

--dave c-b

[1. https://www.cs.stonybrook.edu/Rebecca-Drucker-Research-Proficiency-Presentation-Investigating-BBR-Bufferbloat-Problem-DASH-Video ]

-- 
David Collier-Brown,         | Always do right.  This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           | -- Mark Twain

^ permalink raw reply [flat|nested] 7+ messages in thread
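[Editor's note: the "send up to the normal burst, probabilistically drop
between normal and excess burst, tail-drop beyond" behaviour the manual
describes can be sketched in a few lines of Python.  This is an
illustration of the documented behaviour only, not Cisco's actual
algorithm (IOS uses a "compounded debt" calculation); the names `bc`/`be`
and the linear drop ramp are assumptions.]

```python
import random

def police(burst_used, pkt_bytes, bc, be, rng=random.random):
    """Illustrative policer with an 'extended burst', loosely modelled
    on the behaviour the IOS 12.4 QoS guide describes (NOT Cisco's
    actual algorithm, which uses a 'compounded debt' calculation).

    burst_used: bytes already sent in the current burst interval.
    bc: normal burst size (bytes); be: excess/extended burst size (bytes).
    """
    total = burst_used + pkt_bytes
    if total <= bc:
        return "send"                      # conforms: within the normal burst
    if total <= be:
        # Between Bc and Be: drop with a probability that rises as the
        # burst approaches Be -- the RED-like behaviour the manual cites.
        p_drop = (total - bc) / (be - bc)
        return "drop" if rng() < p_drop else "send"
    return "drop"                          # beyond Be: tail-drop
```

With Bc = 8000 bytes and Be = 16000 bytes, a 1000-byte packet arriving
with 9000 bytes of burst already used is dropped about a quarter of the
time; with 14000 bytes already used, about seven-eighths of the time.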
* Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

From: Erik Auerswald @ 2021-01-10 5:39 UTC (permalink / raw)
To: bloat

Hi,

On Sat Jan 9 18:01:32 EST 2021, David Collier-Brown wrote:
> At work, I recently had a database outage due to network saturation and
> timeouts, which we proposed to address by setting up a QOS policy for
> the machines in question. However, from the discussion in Ms Drucker's
> BBR talk, that could lead us to doing /A Bad Thing/ (;-))

QoS policies are dangerous; they seldom work exactly as intended.

I'll assume you have already convinced yourself that you want to apply
the QoS policy at a congested link, e.g., a WAN router, as opposed to
shallow-buffered LAN switches running Cisco IOS.  If you are not using
Cisco gear, then please do not assume that Cisco documentation can help
you solve your problem. ;-)

> Let's start at the beginning, though. The talk, mentioned before
> in the list[1], was about the interaction of BBR and large values of
> buffering, specifically for video traffic. I attended it, and listened
> with interest to the questions from the committee. She subsequently
> gave me a copy of the paper and presentation, which I appreciate:
> it's very good work.

The link to the talk announcement leads to an error page now.  I did not
find slides or a paper either. :-(

> [...]
> Increasing the buffering in the test environment turns perfectly
> reasonable performance into a real disappointment
> [...]
Since I neither attended the talk nor could read a paper or look at
presentation slides, I'll just continue with the assumption that BBR does
not successfully mitigate bufferbloat effects even for video delivery
(which I would assume to be an important use case for Google, or rather
YouTube).

> [...]
> One of the interesting questions was about the token-bucket algorithm
> used in the router to limit performance. In her paper, she discusses
> the token bucket filter used by OpenWRT 19.07.1 on a Linksys WRT1900ACS
> router. Allowing more than the actual bandwidth of the interface as
> the /burst rate/ can exacerbate the buffering problem, so the listener
> was concerned that routers "in the wild" might also be contributing
> to the poor performance by using token-bucket algorithms with "excess
> burst size" parameters.

The burst *time* is essential in any QoS configuration, because only the
combination of time, size, and interface speed allows one to reason about
the behaviour.  Most QoS documentation for enterprise networking gear
glosses over this, since the time is usually not configurable, and it
varies widely between devices and device generations.

In my experience, asking about token-bucket algorithm details is often a
sign that the asker does not see the forest for the trees.

> The very first Cisco manual I found in a Google search explained how
> to */set/* excess burst size (!)
>
> https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/qos_plcshp/configuration/12-4/qos-plcshp-12-4-book.pdf

IOS 12.4 is quite old.  I do not expect current documentation to have
improved significantly, but IOS 12.4 was a thing well before CoDel
existed.

> [...]
> A little later, it explains that the excess or "extended" burst size
> /exists so as to avoid tail-drop behavior, and, instead,
> engage behavior like that of Random Early Detection (RED)./

The desire to avoid tail-drop is rooted in the desire to maximize
throughput.  As long as the queue is short, tail-drop is not a problem in
practice.
Assuming you just want a working network and have sufficient network
capacity to support your applications.

> [...]
> So, folks, am I right in thinking that Cisco's recommendation just
> might be a /terrible/ piece of advice?

No comment. ;-)

> As a capacity planner, it sounds a lot like they're praying for a
> conveniently timed lull after every time they let too many bytes
> through.

Yes.  This is a necessary assumption if you want your packet-switched
network to actually function.  The network must not be consistently
overloaded, so that buffers only absorb bursts and are mostly empty.

TCP's congestion control is an attempt to reach this despite end points
having the capacity to overwhelm the network, combined with the desire to
make good use of the available network capacity.  Bloated buffers break
this scheme by excessively delaying the signals TCP's congestion control
requires to work.  Thus excessive buffers lead to persistent congestion,
limited only by end points timing out.

> As a follower of the discussion here, the reference to tail drop and
> RED sound faintly ... antique.

One reason might be that you looked at antique documentation. ;-)
Looking at recent documentation does not really change this impression,
though, at least in my experience.

Anyway, in an attempt to actually help you: Cisco IOS routers allow the
configuration of the queue size (in packets).  Thus you could consider
just limiting the queue size to guarantee a maximum queuing delay with
MTU-sized packets.  That may well hurt throughput, but it transfers you
back to a pre-bufferbloat time.

As long as the queues are short, you can consider fair queuing.  I would
suggest not even attempting any prioritization, because chances are that
it makes the situation worse.  With Cisco IOS, beware (strict) priority
queuing, since a priority queue there *always* has a policer, and your
traffic usually does not conform to your mental model.
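[Editor's note: the queue-size suggestion above is easy to sanity-check
with arithmetic -- with the queue capped at N packets, the worst-case
queuing delay is N MTU-sized serialisation times.  A sketch, with
illustrative figures not taken from the thread:]

```python
def worst_case_queue_delay_ms(queue_packets, mtu_bytes, link_mbps):
    """Worst-case queuing delay when a packet-limited queue fills
    entirely with MTU-sized packets."""
    bits_queued = queue_packets * mtu_bytes * 8
    return bits_queued * 1000 / (link_mbps * 1_000_000)

# A 64-packet queue of 1500-byte packets on a 10 Mbit/s link:
print(worst_case_queue_delay_ms(64, 1500, 10))    # 76.8 (ms)
# The same link with a bloated 1000-packet queue:
print(worst_case_queue_delay_ms(1000, 1500, 10))  # 1200.0 (ms)
```

Bounding the queue at a few tens of packets keeps the worst case in the
tens of milliseconds, which is the "pre-bufferbloat time" trade-off
described above.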
Please be aware that Cisco sells routers with different operating
systems, and even within one operating system family, QoS details vary
widely; thus I would suggest you carefully search for the documentation
for your specific devices.

Cisco (and most of the other enterprise network device vendors) provide
many tuning knobs.  Many even try to give helpful advice in their
documentation.  But QoS is a sufficiently hard problem that it is not yet
solved by a widely available "do the right thing" tuning knob in
specialized networking gear (I am explicitly excluding Linux-based home
routers and similar devices here).

Generic advice on how to tune networking gear for QoS purposes is nigh
impossible.  As a result, QoS configurations often create more problems
than they solve, and I do not even think this is an addressable
documentation issue.  Here be dragons.  Just Say No.

To preempt vendor fan persons: I am not bashing Cisco, but the original
email explicitly mentioned Cisco.  IMHO all the vendors are similar in a
generic sense, with specific differences for specific use cases.  Some
vendors are worse because they hide their documentation from the public,
and hide more of their implementation details than, for argument's sake,
Cisco.

Thanks and HTH,
Erik

P.S. I actually solved quite a few QoS-related problems by disabling QoS.

P.P.S. Sometimes I solved QoS-related problems by introducing a QoS
configuration.  YMMV.

-- 
In the beginning, there was static routing.
                                  -- RFC 1118
* Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

From: Jonathan Morton @ 2021-01-10 7:19 UTC (permalink / raw)
To: Erik Auerswald; +Cc: bloat

> On 10 Jan, 2021, at 7:39 am, Erik Auerswald <auerswal@unix-ag.uni-kl.de> wrote:
>
> In my experience, asking about token-bucket algorithm details is often
> a sign for the asker to not see the forest for the trees.

IMHO, token-bucket is an obsolete algorithm that should not be used.
Like RED, it requires tuning parameters whose correct values are not
obvious to the typical end-user, nor even to automatic algorithms.  Codel
replaces RED, and virtual-clock algorithms can similarly replace
token-bucket.

Token-bucket is essentially a credit-mode algorithm.  The notional
"bucket" is replenished at regular (frequent) intervals by an amount
proportional to the configured rate of delivery.  Traffic may be
delivered as long as there is sufficient credit in the bucket to cover
it.  This inherently leads to the delivery of traffic bursts at line
rate, rather than at the delivery rate, and those bursts may be as large
as the bucket.  Conversely, if the bucket is too small, then scheduling
and other quantum effects may conspire to reduce achievable throughput.
Since the bucket size must be chosen, manually, in advance, it is almost
always wrong (and usually much too large).

Many token-bucket implementations further complicate this by having two
nested token-buckets.  A larger bucket is replenished at exactly the
configured rate from an infinite source, while a smaller bucket is
replenished at some higher rate from the larger bucket.
This reduces the incidence of line-rate bursts and accommodates Reno-like
sawtooth behaviour but, as noted, has the potential to seriously confuse
BBR if the buckets are too large.  BBRv2 may handle it better if you add
ECN and AQM, as the latter will help to correct bad estimates of
throughput capacity resulting from the buckets initially being drained.

The virtual-clock algorithm I implemented in Cake is essentially a
deficit-mode algorithm.  During any continuous period of traffic
delivery, defined as finding a packet in the queue when one is scheduled
to deliver, the time of delivering the next packet is updated after every
packet is delivered, by calculating the serialisation time of that packet
and adding it to the previous delivery schedule.  As long as that time is
in the past, the next packet may be delivered immediately.  When it goes
into the future, the time to wait before delivering the next packet is
precisely known.  Hence bursts occur only due to quantum effects and are
automatically of the minimum size necessary to maintain throughput,
without any configuration (explicit or otherwise).

Since the scenario here involves an OpenWRT device, you should be able to
install Cake on it, if it isn't there already.  Please give it a try and
let us know if it improves matters.

 - Jonathan Morton

^ permalink raw reply [flat|nested] 7+ messages in thread
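[Editor's note: the deficit-mode virtual clock described above condenses
to a few lines.  This is an illustrative sketch of the idea, not Cake's
actual code; the function and variable names are invented.]

```python
def schedule(sizes_bytes, arrivals_s, rate_bps):
    """Deficit-mode shaper: each packet departs no earlier than its
    arrival and no earlier than the previous schedule; the schedule
    then advances by the packet's own serialisation time."""
    next_slot = 0.0
    departures = []
    for size, arrival in zip(sizes_bytes, arrivals_s):
        depart = max(arrival, next_slot)          # wait only if the clock is ahead
        next_slot = depart + size * 8 / rate_bps  # serialisation time at the shaped rate
        departures.append(depart)
    return departures

# Three back-to-back 1250-byte packets at 1 Mbit/s depart 10 ms apart:
# the shaper never emits a line-rate burst, and needs no bucket-size
# parameter to get that right.
print(schedule([1250, 1250, 1250], [0.0, 0.0, 0.0], 1_000_000))
```

Contrast this with the credit-mode token bucket: there is no stored
credit that can later be spent in a line-rate burst, and after an idle
period the first packet simply departs on arrival.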
* Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

From: Erik Auerswald @ 2021-01-10 7:59 UTC (permalink / raw)
To: Jonathan Morton; +Cc: bloat

Hi,

On 10.01.21 08:19, Jonathan Morton wrote:
>> On 10 Jan, 2021, at 7:39 am, Erik Auerswald <auerswal@unix-ag.uni-kl.de> wrote:
>>
>> In my experience, asking about token-bucket algorithm details is often
>> a sign for the asker to not see the forest for the trees.
>
> IMHO, token-bucket is an obsolete algorithm that should not be used.

This level of detail seems useful (use of a different class of algorithm
instead of implementation details for a given algorithm class).

> [...]
> Many token-bucket implementations further complicate this by having
> two nested token-buckets.

Those are the details I had in mind as not of general importance.  They
may matter in specific circumstances, but probably not much in the
context of short TCP streams using BBR vs. bufferbloat.

> [...]

Thanks,
Erik

-- 
Thinking doesn't guarantee that we won't make mistakes.  But not
thinking guarantees that we will.
                                  -- Leslie Lamport
* Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

From: Toke Høiland-Jørgensen @ 2021-01-10 13:21 UTC (permalink / raw)
To: Jonathan Morton, Erik Auerswald; +Cc: bloat, Jesper Dangaard Brouer

Jonathan Morton <chromatix99@gmail.com> writes:

> The virtual-clock algorithm I implemented in Cake is essentially a
> deficit-mode algorithm. During any continuous period of traffic
> delivery, defined as finding a packet in the queue when one is
> scheduled to deliver, the time of delivering the next packet is
> updated after every packet is delivered, by calculating the
> serialisation time of that packet and adding it to the previous
> delivery schedule. As long as that time is in the past, the next
> packet may be delivered immediately. When it goes into the future,
> the time to wait before delivering the next packet is precisely known.
> Hence bursts occur only due to quantum effects and are automatically
> of the minimum size necessary to maintain throughput, without any
> configuration (explicit or otherwise).

Also, while CAKE's shaper predates it, the rest of the Linux kernel is
also moving to a timing-based packet scheduling model, following Van
Jacobson's talk at Netdevconf in 2018:
https://netdevconf.info/0x12/session.html?evolving-from-afap-teaching-nics-about-time

In particular, the TCP stack has used early departure times since 2018:
https://lwn.net/Articles/766564/

The (somewhat misnamed) sch_fq packet scheduler will also obey packet
timestamps when scheduling; this works with the timestamps set by the TCP
stack as per the commit above, but they can also be set from userspace
with a socket option, or from a BPF filter.
Jesper wrote a BPF-based implementation of a shaper that uses a BPF
filter to set packet timestamps to shape traffic at a set rate with
precise timing (avoiding bursts):
https://github.com/xdp-project/bpf-examples/tree/master/traffic-pacing-edt

The use case here is an ISP middlebox that can smooth out traffic to
avoid tail drops in shallow-buffered switches.  He tells me it scales
quite well, although some tuning of the kernel and drivers is necessary
to completely avoid microbursts.  There's also a BPF implementation of
CoDel in there, BTW.

I've been talking to Jesper about comparing his implementation's
performance to the shaper in CAKE, but we haven't gotten around to it
yet.  We'll share data once we do, obviously :)

-Toke

^ permalink raw reply [flat|nested] 7+ messages in thread
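[Editor's note: the earliest-departure-time (EDT) model described above
amounts to one timestamp computation per packet -- the same deficit-mode
arithmetic, in the integer-nanosecond form a queueing layer consumes.
Sketched here in Python rather than BPF, with invented names:]

```python
NS_PER_SEC = 1_000_000_000

def edt_stamp(now_ns, next_slot_ns, pkt_bytes, rate_bps):
    """Stamp a packet with its earliest departure time: never earlier
    than 'now' or the end of the previous packet's serialisation slot.
    Returns (departure_ns, new_next_slot_ns) for the next packet."""
    depart = max(now_ns, next_slot_ns)
    new_slot = depart + pkt_bytes * 8 * NS_PER_SEC // rate_bps
    return depart, new_slot

# Two 1250-byte packets arriving at t=0 on a 1 Mbit/s shaped rate:
# the second is stamped 10 ms after the first.
print(edt_stamp(0, 0, 1250, 1_000_000))           # (0, 10000000)
print(edt_stamp(0, 10_000_000, 1250, 1_000_000))  # (10000000, 20000000)
```

The scheduler (sch_fq, or hardware with launch-time support) then only
has to hold each packet until its stamp; all policy lives in the stamping
code.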
* Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

From: David Collier-Brown @ 2021-01-12 12:31 UTC (permalink / raw)
To: bloat

Just FYI, we're running 16.09.03: I found the reference via Google, and
considered it antique.

On 2021-01-10 12:39 a.m., Erik Auerswald wrote:
> In my experience, asking about token-bucket algorithm details is often
> a sign for the asker to not see the forest for the trees.
>
>> The very first Cisco manual I found in a Google search explained how
>> to */set/* excess burst size (!)
>>
>> https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/qos_plcshp/configuration/12-4/qos-plcshp-12-4-book.pdf
>
> IOS 12.4 is quite old. I do not expect current documentation to have
> improved significantly, but IOS 12.4 was a thing well before CoDel
> existed.

Looking at the current manual set, it emphasizes "Weighted Random Early
Detection", and does not discuss the token-bucket algorithm at all,
though pages describing QoS say it is used.  Amusingly, the page about
WRED carefully repeats itself, suggesting a slight proofreading problem
(;-))

https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/qos_conavd/configuration/xe-16/qos-conavd-xe-16-book/qos-conavd-oview.html

-- 
David Collier-Brown,         | Always do right.  This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           | -- Mark Twain
* Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

From: David Collier-Brown @ 2021-01-10 14:25 UTC (permalink / raw)
To: davecb, bloat

The announcement moved: it used to be

> [1. https://www.cs.stonybrook.edu/Rebecca-Drucker-Research-Proficiency-Presentation-Investigating-BBR-Bufferbloat-Problem-DASH-Video ]

Today I find it at
https://www.cs.stonybrook.edu/Rebecca-Drucker-PhD-Research-Proficiency-Presentation-Investigating-BBR-Bufferbloat-Problem-DASH

-- 
David Collier-Brown,         | Always do right.  This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           | -- Mark Twain