* [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos
@ 2021-01-09 23:01 David Collier-Brown
  2021-01-10  5:39 ` Erik Auerswald
  2021-01-10 14:25 ` David Collier-Brown
  0 siblings, 2 replies; 7+ messages in thread

From: David Collier-Brown @ 2021-01-09 23:01 UTC (permalink / raw)
To: bloat; +Cc: dave.collier-brown

At work, I recently had a database outage due to network saturation and
timeouts, which we proposed to address by setting up a QoS policy for the
machines in question.  However, from the discussion in Ms Drucker's BBR
talk, that could lead us to doing /A Bad Thing/ (;-))

Let's start at the beginning, though.  The talk, mentioned before on the
list[1], was about the interaction of BBR and large amounts of buffering,
specifically for video traffic.  I attended it, and listened with
interest to the questions from the committee.  She subsequently gave me a
copy of the paper and presentation, which I appreciate: it's very good
work.

She reported the severity of the effect of large buffers on BBR.  I've
attached a screenshot, but the list probably won't take it, so I'll
describe it.  After the first few packets with large buffers, RTT rises,
throughput plummets, and then throughput stays low for about 200,000 ms.
Then it rises to about half the initial throughput for about 50,000 ms as
RTT falls, then throughput plummets once more.  This pattern repeats
throughout the test.

Increasing the buffering in the test environment turns perfectly
reasonable performance into a real disappointment, even though BBR is
trying to estimate /the network's bandwidth-delay product, BDP, and
regulating its sending rate to maximize throughput while attempting to
maintain BDP worth of packets in the buffer, irrespective of the size of
the buffer/.

One of the interesting questions was about the token-bucket algorithm
used in the router to limit performance.
In her paper, she discusses the token bucket filter used by OpenWRT
19.07.1 on a Linksys WRT1900ACS router.  Allowing more than the actual
bandwidth of the interface as the /burst rate/ can exacerbate the
buffering problem, so the listener was concerned that routers "in the
wild" might also be contributing to the poor performance by using
token-bucket algorithms with "excess burst size" parameters.

The very first Cisco manual I found in a Google search explained how to
*/set/* excess burst size (!)

https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/qos_plcshp/configuration/12-4/qos-plcshp-12-4-book.pdf

It defined excess burst size as /traffic that falls between the normal
burst size and the Excess Burst size/, and specifies that such traffic
will be sent regardless, /with a probability that increases as the burst
size increases/.  A little later, it explains that the excess or
"extended" burst size /exists so as to avoid tail-drop behavior, and,
instead, engage behavior like that of Random Early Detection (RED)/.
To avoid tail drop, they suggest the "extended burst" be set to twice
the burst size, where the burst size is by definition the capacity of
the interface per unit time.

So, folks, am I right in thinking that Cisco's recommendation just might
be a /terrible/ piece of advice?  As a capacity planner, it sounds a lot
like they're praying for a conveniently timed lull after every time they
let too many bytes through.  As a follower of the discussion here, the
references to tail drop and RED sound faintly ... antique.

--dave c-b

[1. https://www.cs.stonybrook.edu/Rebecca-Drucker-Research-Proficiency-Presentation-Investigating-BBR-Bufferbloat-Problem-DASH-Video ]

-- 
David Collier-Brown,         | Always do right.  This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           | -- Mark Twain

^ permalink raw reply [flat|nested] 7+ messages in thread
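[Editor's note: the "send up to the normal burst, probabilistically drop
between normal and excess burst, tail-drop beyond" behaviour the manual
describes can be sketched in a few lines of Python.  This is an
illustration of the documented behaviour only, not Cisco's actual
algorithm (IOS uses a "compounded debt" calculation); the names `bc`/`be`
and the linear drop ramp are assumptions.]

```python
import random

def police(burst_used, pkt_bytes, bc, be, rng=random.random):
    """Illustrative policer with an 'extended burst', loosely modelled
    on the behaviour the IOS 12.4 QoS guide describes (NOT Cisco's
    actual algorithm, which uses a 'compounded debt' calculation).

    burst_used: bytes already sent in the current burst interval.
    bc: normal burst size (bytes); be: excess/extended burst size (bytes).
    """
    total = burst_used + pkt_bytes
    if total <= bc:
        return "send"                      # conforms: within the normal burst
    if total <= be:
        # Between Bc and Be: drop with a probability that rises as the
        # burst approaches Be -- the RED-like behaviour the manual cites.
        p_drop = (total - bc) / (be - bc)
        return "drop" if rng() < p_drop else "send"
    return "drop"                          # beyond Be: tail-drop
```

With Bc = 8000 bytes and Be = 16000 bytes, a 1000-byte packet arriving
with 9000 bytes of burst already used is dropped about a quarter of the
time; with 14000 bytes already used, about seven-eighths of the time.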
* Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

From: Erik Auerswald @ 2021-01-10 5:39 UTC (permalink / raw)
To: bloat

Hi,

On Sat Jan 9 18:01:32 EST 2021, David Collier-Brown wrote:
> At work, I recently had a database outage due to network saturation and
> timeouts, which we proposed to address by setting up a QOS policy for
> the machines in question. However, from the discussion in Ms Drucker's
> BBR talk, that could lead us to doing /A Bad Thing/ (;-))

QoS policies are dangerous; they seldom work exactly as intended.

I'll assume you have already convinced yourself that you want to apply
the QoS policy at a congested link, e.g., a WAN router, as opposed to
shallow-buffered LAN switches running Cisco IOS.  If you are not using
Cisco gear, then please do not assume that Cisco documentation can help
you solve your problem. ;-)

> Let's start at the beginning, though. The talk, mentioned before
> in the list[1], was about the interaction of BBR and large values of
> buffering, specifically for video traffic. I attended it, and listened
> with interest to the questions from the committee. She subsequently
> gave me a copy of the paper and presentation, which I appreciate:
> it's very good work.

The link to the talk announcement leads to an error page now.  I did not
find slides or a paper either. :-(

> [...]
> Increasing the buffering in the test environment turns perfectly
> reasonable performance into a real disappointment
> [...]
Since I neither attended the talk nor could read a paper or look at
presentation slides, I'll just continue with the assumption that BBR does
not successfully mitigate bufferbloat effects even for video delivery
(which I would assume to be an important use case for Google, or rather
YouTube).

> [...]
> One of the interesting questions was about the token-bucket algorithm
> used in the router to limit performance. In her paper, she discusses
> the token bucket filter used by OpenWRT 19.07.1 on a Linksys WRT1900ACS
> router. Allowing more than the actual bandwidth of the interface as
> the /burst rate/ can exacerbate the buffering problem, so the listener
> was concerned that routers "in the wild" might also be contributing
> to the poor performance by using token-bucket algorithms with "excess
> burst size" parameters.

The burst *time* is essential in any QoS configuration, because only the
combination of time, size, and interface speed allows one to reason about
the behaviour.  Most QoS documentation for enterprise networking gear
glosses over this, since the time is usually not configurable, and it
varies widely between devices and device generations.

In my experience, asking about token-bucket algorithm details is often a
sign that the asker does not see the forest for the trees.

> The very first Cisco manual I found in a Google search explained how
> to */set/* excess burst size (!)
>
> https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/qos_plcshp/configuration/12-4/qos-plcshp-12-4-book.pdf

IOS 12.4 is quite old.  I do not expect current documentation to have
improved significantly, but IOS 12.4 was a thing well before CoDel
existed.

> [...]
> A little later, it explains that the excess or "extended" burst size
> /exists so as to avoid tail-drop behavior, and, instead,
> engage behavior like that of Random Early Detection (RED)./

The desire to avoid tail-drop is rooted in the desire to maximize
throughput.  As long as the queue is short, tail-drop is not a problem in
practice.
Assuming you just want a working network and have sufficient network
capacity to support your applications.

> [...]
> So, folks, am I right in thinking that Cisco's recommendation just
> might be a /terrible/ piece of advice?

No comment. ;-)

> As a capacity planner, it sounds a lot like they're praying for a
> conveniently timed lull after every time they let too many bytes
> through.

Yes.  This is a necessary assumption if you want your packet-switched
network to actually function.  The network must not be consistently
overloaded, so that buffers only absorb bursts and are mostly empty.

TCP's congestion control is an attempt to reach this despite end points
having the capacity to overwhelm the network, combined with the desire to
make good use of the available network capacity.  Bloated buffers break
this scheme by excessively delaying the signals TCP's congestion control
requires to work.  Thus excessive buffers lead to persistent congestion,
limited only by end points timing out.

> As a follower of the discussion here, the reference to tail drop and
> RED sound faintly ... antique.

One reason might be that you looked at antique documentation. ;-)
Looking at recent documentation does not really change this impression,
though, at least in my experience.

Anyway, in an attempt to actually help you: Cisco IOS routers allow the
configuration of the queue size (in packets).  Thus you could consider
just limiting the queue size to guarantee a maximum queuing delay with
MTU-sized packets.  That may well hurt throughput, but it transfers you
back to a pre-bufferbloat time.

As long as the queues are short, you can consider fair queuing.  I would
suggest not even attempting any prioritization, because chances are that
it makes the situation worse.  With Cisco IOS, beware (strict) priority
queuing, since a priority queue there *always* has a policer, and your
traffic usually does not conform to your mental model.
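[Editor's note: the queue-size suggestion above is easy to sanity-check
with arithmetic -- with the queue capped at N packets, the worst-case
queuing delay is N MTU-sized serialisation times.  A sketch, with
illustrative figures not taken from the thread:]

```python
def worst_case_queue_delay_ms(queue_packets, mtu_bytes, link_mbps):
    """Worst-case queuing delay when a packet-limited queue fills
    entirely with MTU-sized packets."""
    bits_queued = queue_packets * mtu_bytes * 8
    return bits_queued * 1000 / (link_mbps * 1_000_000)

# A 64-packet queue of 1500-byte packets on a 10 Mbit/s link:
print(worst_case_queue_delay_ms(64, 1500, 10))    # 76.8 (ms)
# The same link with a bloated 1000-packet queue:
print(worst_case_queue_delay_ms(1000, 1500, 10))  # 1200.0 (ms)
```

Bounding the queue at a few tens of packets keeps the worst case in the
tens of milliseconds, which is the "pre-bufferbloat time" trade-off
described above.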
Please be aware that Cisco sells routers with different operating
systems, and even within one operating system family, QoS details vary
widely; thus I would suggest you carefully search for the documentation
for your specific devices.

Cisco (and most of the other enterprise network device vendors) provide
many tuning knobs.  Many even try to give helpful advice in their
documentation.  But QoS is a sufficiently hard problem that it is not yet
solved by a widely available "do the right thing" tuning knob in
specialized networking gear (I am explicitly excluding Linux-based home
routers and similar devices here).

Generic advice on how to tune networking gear for QoS purposes is nigh
impossible.  As a result, QoS configurations often create more problems
than they solve, and I do not even think this is an addressable
documentation issue.  Here be dragons.  Just Say No.

To preempt vendor fan persons: I am not bashing Cisco, but the original
email explicitly mentioned Cisco.  IMHO all the vendors are similar in a
generic sense, with specific differences for specific use cases.  Some
vendors are worse because they hide their documentation from the public,
and hide more of their implementation details than, for argument's sake,
Cisco.

Thanks and HTH,
Erik

P.S. I actually solved quite a few QoS-related problems by disabling QoS.

P.P.S. Sometimes I solved QoS-related problems by introducing a QoS
configuration.  YMMV.

-- 
In the beginning, there was static routing.
                                  -- RFC 1118
* Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

From: Jonathan Morton @ 2021-01-10 7:19 UTC (permalink / raw)
To: Erik Auerswald; +Cc: bloat

> On 10 Jan, 2021, at 7:39 am, Erik Auerswald <auerswal@unix-ag.uni-kl.de> wrote:
>
> In my experience, asking about token-bucket algorithm details is often
> a sign for the asker to not see the forest for the trees.

IMHO, token-bucket is an obsolete algorithm that should not be used.
Like RED, it requires tuning parameters whose correct values are not
obvious to the typical end-user, nor even to automatic algorithms.  Codel
replaces RED, and virtual-clock algorithms can similarly replace
token-bucket.

Token-bucket is essentially a credit-mode algorithm.  The notional
"bucket" is replenished at regular (frequent) intervals by an amount
proportional to the configured rate of delivery.  Traffic may be
delivered as long as there is sufficient credit in the bucket to cover
it.  This inherently leads to the delivery of traffic bursts at line
rate, rather than at the delivery rate, and those bursts may be as large
as the bucket.  Conversely, if the bucket is too small, then scheduling
and other quantum effects may conspire to reduce achievable throughput.
Since the bucket size must be chosen, manually, in advance, it is almost
always wrong (and usually much too large).

Many token-bucket implementations further complicate this by having two
nested token-buckets.  A larger bucket is replenished at exactly the
configured rate from an infinite source, while a smaller bucket is
replenished at some higher rate from the larger bucket.
This reduces the incidence of line-rate bursts and accommodates Reno-like
sawtooth behaviour but, as noted, has the potential to seriously confuse
BBR if the buckets are too large.  BBRv2 may handle it better if you add
ECN and AQM, as the latter will help to correct bad estimates of
throughput capacity resulting from the buckets initially being drained.

The virtual-clock algorithm I implemented in Cake is essentially a
deficit-mode algorithm.  During any continuous period of traffic
delivery, defined as finding a packet in the queue when one is scheduled
to deliver, the time of delivering the next packet is updated after every
packet is delivered, by calculating the serialisation time of that packet
and adding it to the previous delivery schedule.  As long as that time is
in the past, the next packet may be delivered immediately.  When it goes
into the future, the time to wait before delivering the next packet is
precisely known.  Hence bursts occur only due to quantum effects and are
automatically of the minimum size necessary to maintain throughput,
without any configuration (explicit or otherwise).

Since the scenario here involves an OpenWRT device, you should be able to
install Cake on it, if it isn't there already.  Please give it a try and
let us know if it improves matters.

 - Jonathan Morton

^ permalink raw reply [flat|nested] 7+ messages in thread
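[Editor's note: the deficit-mode virtual clock described above condenses
to a few lines.  This is an illustrative sketch of the idea, not Cake's
actual code; the function and variable names are invented.]

```python
def schedule(sizes_bytes, arrivals_s, rate_bps):
    """Deficit-mode shaper: each packet departs no earlier than its
    arrival and no earlier than the previous schedule; the schedule
    then advances by the packet's own serialisation time."""
    next_slot = 0.0
    departures = []
    for size, arrival in zip(sizes_bytes, arrivals_s):
        depart = max(arrival, next_slot)          # wait only if the clock is ahead
        next_slot = depart + size * 8 / rate_bps  # serialisation time at the shaped rate
        departures.append(depart)
    return departures

# Three back-to-back 1250-byte packets at 1 Mbit/s depart 10 ms apart:
# the shaper never emits a line-rate burst, and needs no bucket-size
# parameter to get that right.
print(schedule([1250, 1250, 1250], [0.0, 0.0, 0.0], 1_000_000))
```

Contrast this with the credit-mode token bucket: there is no stored
credit that can later be spent in a line-rate burst, and after an idle
period the first packet simply departs on arrival.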
* Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

From: Erik Auerswald @ 2021-01-10 7:59 UTC (permalink / raw)
To: Jonathan Morton; +Cc: bloat

Hi,

On 10.01.21 08:19, Jonathan Morton wrote:
>> On 10 Jan, 2021, at 7:39 am, Erik Auerswald <auerswal@unix-ag.uni-kl.de> wrote:
>>
>> In my experience, asking about token-bucket algorithm details is often
>> a sign for the asker to not see the forest for the trees.
>
> IMHO, token-bucket is an obsolete algorithm that should not be used.

This level of detail seems useful (use of a different class of algorithm
instead of implementation details for a given algorithm class).

> [...]
> Many token-bucket implementations further complicate this by having
> two nested token-buckets.

Those are the details I had in mind as not of general importance.  They
may matter in specific circumstances, but probably not much in the
context of short TCP streams using BBR vs. bufferbloat.

> [...]

Thanks,
Erik

-- 
Thinking doesn't guarantee that we won't make mistakes.  But not
thinking guarantees that we will.
                                  -- Leslie Lamport
* Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

From: Toke Høiland-Jørgensen @ 2021-01-10 13:21 UTC (permalink / raw)
To: Jonathan Morton, Erik Auerswald; +Cc: bloat, Jesper Dangaard Brouer

Jonathan Morton <chromatix99@gmail.com> writes:

> The virtual-clock algorithm I implemented in Cake is essentially a
> deficit-mode algorithm. During any continuous period of traffic
> delivery, defined as finding a packet in the queue when one is
> scheduled to deliver, the time of delivering the next packet is
> updated after every packet is delivered, by calculating the
> serialisation time of that packet and adding it to the previous
> delivery schedule. As long as that time is in the past, the next
> packet may be delivered immediately. When it goes into the future,
> the time to wait before delivering the next packet is precisely known.
> Hence bursts occur only due to quantum effects and are automatically
> of the minimum size necessary to maintain throughput, without any
> configuration (explicit or otherwise).

Also, while CAKE's shaper predates it, the rest of the Linux kernel is
also moving to a timing-based packet scheduling model, following Van
Jacobson's talk at Netdevconf in 2018:
https://netdevconf.info/0x12/session.html?evolving-from-afap-teaching-nics-about-time

In particular, the TCP stack has used early departure times since 2018:
https://lwn.net/Articles/766564/

The (somewhat misnamed) sch_fq packet scheduler will also obey packet
timestamps when scheduling; this works with the timestamps set by the TCP
stack as per the commit above, but they can also be set from userspace
with a socket option, or from a BPF filter.
Jesper wrote a BPF-based implementation of a shaper that uses a BPF
filter to set packet timestamps to shape traffic at a set rate with
precise timing (avoiding bursts):
https://github.com/xdp-project/bpf-examples/tree/master/traffic-pacing-edt

The use case here is an ISP middlebox that can smooth out traffic to
avoid tail drops in shallow-buffered switches.  He tells me it scales
quite well, although some tuning of the kernel and drivers is necessary
to completely avoid microbursts.  There's also a BPF implementation of
CoDel in there, BTW.

I've been talking to Jesper about comparing his implementation's
performance to the shaper in CAKE, but we haven't gotten around to it
yet.  We'll share data once we do, obviously :)

-Toke

^ permalink raw reply [flat|nested] 7+ messages in thread
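[Editor's note: the earliest-departure-time (EDT) model described above
amounts to one timestamp computation per packet -- the same deficit-mode
arithmetic, in the integer-nanosecond form a queueing layer consumes.
Sketched here in Python rather than BPF, with invented names:]

```python
NS_PER_SEC = 1_000_000_000

def edt_stamp(now_ns, next_slot_ns, pkt_bytes, rate_bps):
    """Stamp a packet with its earliest departure time: never earlier
    than 'now' or the end of the previous packet's serialisation slot.
    Returns (departure_ns, new_next_slot_ns) for the next packet."""
    depart = max(now_ns, next_slot_ns)
    new_slot = depart + pkt_bytes * 8 * NS_PER_SEC // rate_bps
    return depart, new_slot

# Two 1250-byte packets arriving at t=0 on a 1 Mbit/s shaped rate:
# the second is stamped 10 ms after the first.
print(edt_stamp(0, 0, 1250, 1_000_000))           # (0, 10000000)
print(edt_stamp(0, 10_000_000, 1250, 1_000_000))  # (10000000, 20000000)
```

The scheduler (sch_fq, or hardware with launch-time support) then only
has to hold each packet until its stamp; all policy lives in the stamping
code.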
* Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

From: David Collier-Brown @ 2021-01-12 12:31 UTC (permalink / raw)
To: bloat

Just FYI, we're running 16.09.03: I found the reference via Google, and
considered it antique.

On 2021-01-10 12:39 a.m., Erik Auerswald wrote:
> In my experience, asking about token-bucket algorithm details is often
> a sign for the asker to not see the forest for the trees.
>
>> The very first Cisco manual I found in a Google search explained how
>> to */set/* excess burst size (!)
>>
>> https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/qos_plcshp/configuration/12-4/qos-plcshp-12-4-book.pdf
>
> IOS 12.4 is quite old. I do not expect current documentation to have
> improved significantly, but IOS 12.4 was a thing well before CoDel
> existed.

Looking at the current manual set, it emphasizes "Weighted Random Early
Detection", and does not discuss the token-bucket algorithm at all,
though pages describing QoS say it is used.  Amusingly, the page about
WRED carefully repeats itself, suggesting a slight proofreading problem
(;-))

https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/qos_conavd/configuration/xe-16/qos-conavd-xe-16-book/qos-conavd-oview.html

-- 
David Collier-Brown,         | Always do right.  This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           | -- Mark Twain
* Re: [Bloat] Rebecca Drucker's talk sounds like it exposes an addressable bloat issue in Ciscos

From: David Collier-Brown @ 2021-01-10 14:25 UTC (permalink / raw)
To: davecb, bloat

The announcement moved: it used to be

> [1. https://www.cs.stonybrook.edu/Rebecca-Drucker-Research-Proficiency-Presentation-Investigating-BBR-Bufferbloat-Problem-DASH-Video ]

Today I find it at
https://www.cs.stonybrook.edu/Rebecca-Drucker-PhD-Research-Proficiency-Presentation-Investigating-BBR-Bufferbloat-Problem-DASH

-- 
David Collier-Brown,         | Always do right.  This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           | -- Mark Twain