[Cerowrt-devel] trying to make sense of what switch vendors say wrt buffer bloat

Tue Jun 7 06:46:21 EDT 2016

On Mon, 6 Jun 2016, dpreed at reed.com wrote:

> Even better, it would be fun to get access to an Arista switch and some 
> high performance TCP sources and sinks, and demonstrate extreme 
> bufferbloat compared to a small-buffer switch.  Just a demo, not a 
> simulation full of assumptions and guesses.

It can rightfully be argued that we don't need 100ms worth of buffering, 
but here it actually is kind of correct to say "RAM is cheap", because as 
soon as you go for off-chip RAM, it's cheap.

So these vendors have two choices:

1. 8-16MB on-chip buffer.
2. External RAM

If you choose the external RAM one, you might as well put a lot of RAM 
there, and give the option to the customer to configure the port buffer 
settings any way they want.

For the small on-chip buffer one, take 80 10GE ports all sharing 8 
megabytes of buffer. Let's say 10 ports are congested, meaning each 
congested port gets 800 kilobytes of buffer, and each port is doing 1.25 
gigabytes/s of data: that's 0.64ms worth of buffer per congested port (I 
hope I got my math right). That is just too little unless you control the 
TCP stacks of the clients and are only doing low-RTT communication.
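To make the arithmetic above easy to check, here is a minimal sketch in Python. The function name and parameters are mine, purely for illustration, and it assumes decimal units (8 MB = 8,000,000 bytes) and an even split of the shared buffer among the congested ports, which a real switch's buffer manager may not do.

```python
def buffer_time_ms(shared_buffer_bytes, congested_ports, line_rate_bits_per_s):
    """How long a congested port's share of the buffer lasts at line rate.

    Assumes the shared buffer is split evenly among the congested ports
    (an illustrative simplification, not how any specific switch works).
    """
    per_port_bytes = shared_buffer_bytes / congested_ports
    drain_bytes_per_s = line_rate_bits_per_s / 8  # 10GE -> 1.25e9 bytes/s
    return per_port_bytes / drain_bytes_per_s * 1000


# 8 MB shared, 10 congested ports, 10 Gbit/s per port
print(round(buffer_time_ms(8e6, 10, 10e9), 2), "ms")
```

With these inputs the result is about 0.64 ms, matching the figure above. Plugging in 16e6 for the larger on-chip option only doubles that.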

So while I'd admit that 100ms worth of FIFO is too much, what needs to 
happen now is for the big-buffer switches to be configured to do something 
clever, aiming to never have prolonged use of more than a few ms worth of 
buffer.
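For a sense of scale, the same calculation can be run the other way, from a delay to the number of bytes it takes to hold that much delay on one 10GE port. This is only a sketch: the function name is hypothetical, and 5 ms is my stand-in for "a few ms", not a recommended value.

```python
def buffer_bytes_for_delay(delay_ms, line_rate_bits_per_s):
    """Bytes of queue needed to hold delay_ms of traffic at line rate."""
    return delay_ms / 1000 * line_rate_bits_per_s / 8


# 100 ms of FIFO versus an assumed "few ms" target (5 ms), on one 10GE port
for delay_ms in (100, 5):
    megabytes = buffer_bytes_for_delay(delay_ms, 10e9) / 1e6
    print(f"{delay_ms} ms at 10GE is about {megabytes:.2f} MB")
```

Under these assumptions 100 ms on a single port is on the order of 125 MB, while 5 ms is about 6.25 MB, which is already most of an 8 MB on-chip buffer shared by all ports.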

It's hard to do AQM with half a millisecond worth of buffer, right?

At least this has been shown by the previous generation of datacenter 
switches, which had minuscule buffers: when ISPs tried to use them, 
microbursts caused uncontrolled packet loss.

-- 
Mikael Abrahamsson    email: swmike at swm.pp.se