[Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"

Wed May 28 07:00:42 EDT 2014

On 28 May, 2014, at 12:39 pm, Hal Murray wrote:

>> in non discarding scheduling total delay is conserved,
>> irrespective of the scheduling discipline
> 
> Is that true for all backplane/switching topologies?

It's a mathematical truth for any topology that you can reduce to a black box with one or more inputs and one output, which you call a "queue" and which *does not discard* packets.  Non-discarding queues don't exist in the real world, of course.

The intuitive proof is that every time you promote a packet to be transmitted earlier, you must demote one to be transmitted later.  A non-FIFO queue tends to increase the maximum delay and decrease the minimum delay, but the average delay will remain constant.
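To make the promote/demote argument concrete, here is a minimal sketch in Python.  It is a toy model, not anything from real switch hardware: it assumes unit-size packets, one packet served per tick, and a link that never idles while packets are waiting.  The names `serve`, `pick` and `disciplines` are invented for this illustration.

```python
import random

def serve(arrivals, pick):
    """Serve unit-size packets, one per tick, never discarding any.

    `pick(queue)` returns the index of the packet to send next.
    Returns the waiting time of each packet, in service order.
    """
    pending = sorted(arrivals)
    queue, delays = [], []
    t = i = 0
    while i < len(pending) or queue:
        # Admit every packet that has arrived by tick t (oldest first).
        while i < len(pending) and pending[i] <= t:
            queue.append(pending[i])
            i += 1
        if queue:
            # Delay = tick at which service starts minus arrival tick.
            delays.append(t - queue.pop(pick(queue)))
        t += 1
    return delays

rng = random.Random(1)
# 60 packets arriving over 50 ticks, so a backlog builds up.
arrivals = [rng.randrange(50) for _ in range(60)]

disciplines = {
    "fifo": lambda q: 0,                        # oldest packet first
    "lifo": lambda q: len(q) - 1,               # newest packet first
    "random": lambda q: rng.randrange(len(q)),  # arbitrary promotion
}
results = {name: serve(arrivals, pick) for name, pick in disciplines.items()}
for name, d in results.items():
    print(name, "total", sum(d), "min", min(d), "max", max(d))
```

Under these assumptions the total (and hence the average) comes out the same for every discipline, because the ticks at which *some* packet is sent do not depend on *which* packet is chosen; only the minimum and maximum move.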

>> The question is if (codel/pie/whatever) AQM makes sense at all for 10G/40G
>> hardware and higher performance irons? Ingress/egress bandwidth is nearly
>> identical, a larger/longer buffering should not happen. Line card memory is
>> limited, a larger buffering is de facto excluded. 
> 
> The simplest interesting case is where you have two input lines feeding the 
> same output line.
> 
> AQM may not be the best solution, but you have to do something.  Dropping any 
> packet that won't fit into the buffer is probably simplest.

The relative bandwidths of the input(s) and output(s) are also relevant.  You *can* have a saturated 5-port switch with no dropped packets, even if one of the ports is a common uplink, provided the uplink port has four times the bandwidth of each of the others and the traffic coming in on it is evenly distributed to the other four.
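The arithmetic behind the 5-port case can be sketched as below.  The port names, the 1 and 4 unit rates, and the helper `overloaded_outputs` are all made up for illustration; the only point is that no output is offered more than it can carry when the uplink traffic is spread evenly, and that one is as soon as it is skewed.

```python
def overloaded_outputs(capacity, demand):
    """Return the ports whose offered output load exceeds their capacity.

    capacity: {port: rate}
    demand:   {(src_port, dst_port): rate}
    """
    load = {port: 0.0 for port in capacity}
    for (src, dst), rate in demand.items():
        load[dst] += rate
    return sorted(p for p in capacity if load[p] > capacity[p])

# Hypothetical rates: four access ports at 1 unit, uplink at 4 units.
capacity = {"up": 4.0, "a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
access = ["a", "b", "c", "d"]

# Every port saturated; uplink traffic split evenly across access ports.
even = {}
for p in access:
    even[(p, "up")] = 1.0
    even[("up", p)] = 1.0

# Same total from the uplink, but port "a" is sent twice its capacity.
skewed = dict(even)
skewed[("up", "a")] = 2.0
skewed[("up", "b")] = 0.0

print("even:", overloaded_outputs(capacity, even))
print("skewed:", overloaded_outputs(capacity, skewed))
```

In the skewed case the excess aimed at one port has to be buffered or dropped, which is where the queue management question comes back in.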

Dropping whatever won't fit into the buffer yields the classic tail-drop FIFO, whose faults are by now well documented.  If you have the opportunity to do something better than that, you probably should.  The simplest improvement I can think of is a *head*-drop FIFO, which gets the congestion signal back to the source quicker, because the loss shows up at the front of the queue instead of behind everything already buffered.  It *should*, I think, be possible to do CoDel at 10G (if not 40G) by now; whether or not it is *easy* probably depends on your transistor budget.
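For anyone who wants to see the difference between the two drop policies, here is a rough sketch.  `BoundedFifo` and its `drop_head` flag are hypothetical names for this example only, and the packets are just sequence numbers; it says nothing about how a real line card would implement either policy.

```python
from collections import deque

class BoundedFifo:
    """Fixed-capacity FIFO that drops at the tail or at the head when full."""

    def __init__(self, capacity, drop_head=False):
        self.q = deque()
        self.capacity = capacity
        self.drop_head = drop_head
        self.dropped = []

    def enqueue(self, pkt):
        if len(self.q) < self.capacity:
            self.q.append(pkt)
        elif self.drop_head:
            # Head drop: discard the oldest queued packet, keep the new one.
            self.dropped.append(self.q.popleft())
            self.q.append(pkt)
        else:
            # Tail drop: discard the arriving packet.
            self.dropped.append(pkt)

    def dequeue(self):
        return self.q.popleft() if self.q else None

tail = BoundedFifo(capacity=3)
head = BoundedFifo(capacity=3, drop_head=True)
for seq in range(5):  # five packets arrive before any are sent
    tail.enqueue(seq)
    head.enqueue(seq)

print("tail-drop queue", list(tail.q), "dropped", tail.dropped)
print("head-drop queue", list(head.q), "dropped", head.dropped)
```

With tail drop, the receiver only notices the gap after the whole backlog ahead of it has drained.  With head drop, the very next packet delivered already reveals that something is missing, so the sender can hear about it roughly one queue-drain time sooner.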

 - Jonathan Morton