General list for discussing Bufferbloat
* [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
@ 2014-05-27  8:21 Hagen Paul Pfeifer
  2014-05-27 10:45 ` Neil Davies
  0 siblings, 1 reply; 12+ messages in thread
From: Hagen Paul Pfeifer @ 2014-05-27  8:21 UTC (permalink / raw)
  To: bloat

Details are missing, like line card buffering mechanisms, line card
buffer management, ... anyway:

http://blog.ipspace.net/2014/05/queuing-mechanisms-in-modern-switches.html

Hagen

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
  2014-05-27  8:21 [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES" Hagen Paul Pfeifer
@ 2014-05-27 10:45 ` Neil Davies
  2014-05-27 12:20   ` Hagen Paul Pfeifer
  0 siblings, 1 reply; 12+ messages in thread
From: Neil Davies @ 2014-05-27 10:45 UTC (permalink / raw)
  To: Hagen Paul Pfeifer; +Cc: bloat

[-- Attachment #1: Type: text/plain, Size: 652 bytes --]

Of course it misses out the first principle.

	in non discarding scheduling total delay is conserved, irrespective of the scheduling discipline

(there is a similar statement when discarding is taking place).

Neil


On 27 May 2014, at 09:21, Hagen Paul Pfeifer <hagen@jauu.net> wrote:

> Details are missing, like line card buffering mechanisms, line card
> buffer management, ... anyway:
> 
> http://blog.ipspace.net/2014/05/queuing-mechanisms-in-modern-switches.html
> 
> Hagen
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat


[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 235 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
  2014-05-27 10:45 ` Neil Davies
@ 2014-05-27 12:20   ` Hagen Paul Pfeifer
  2014-05-27 12:34     ` Neil Davies
  2014-05-28 18:44     ` David Lang
  0 siblings, 2 replies; 12+ messages in thread
From: Hagen Paul Pfeifer @ 2014-05-27 12:20 UTC (permalink / raw)
  To: Neil Davies; +Cc: bloat

The question is whether (CoDel/PIE/whatever) AQM makes sense at all
for 10G/40G hardware and higher-performance iron. Ingress/egress
bandwidth is nearly identical, so larger/longer buffering should not
happen. Line card memory is limited, so larger buffering is de facto
excluded.

Are there any documents/papers about high bandwidth equipment and
bufferbloat effects?

Hagen

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
  2014-05-27 12:20   ` Hagen Paul Pfeifer
@ 2014-05-27 12:34     ` Neil Davies
  2014-05-28 18:44     ` David Lang
  1 sibling, 0 replies; 12+ messages in thread
From: Neil Davies @ 2014-05-27 12:34 UTC (permalink / raw)
  To: Hagen Paul Pfeifer; +Cc: bloat

Hagen

It comes down to the portion of the end-to-end quality attenuation (quality attenuation - ∆Q - incorporates both loss and delay) budget you want to assign to the device and how you want it distributed amongst the competing flows (given that is all you can do - you can’t “destroy” loss or “destroy” delay, only differentially distribute it).

As for ingress/egress capacity being almost the same, that *REALLY* depends on the deployment scenario….

You can’t do traffic performance engineering in a vacuum - you need to have objectives for the application outcomes - that makes the problem context dependent. 

When we do this for people we often find that there are several locations in the architecture where FIFO is the best solution (where you can prove that the natural relaxation times of the queues, given the offered load pattern, are sufficiently small so as not to induce too much quality attenuation). In other places you need to do more analysis.
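
A rough back-of-envelope illustration of that relaxation-time point, with
purely illustrative numbers (not from any real line card):

# How quickly a FIFO "relaxes" once an input burst stops.
LINK_RATE_BPS = 10e9      # assumed 10 Gbit/s egress line rate
PACKET_BITS   = 1500 * 8  # full-size Ethernet frame
QUEUE_DEPTH   = 128       # hypothetical standing backlog, in packets

serialization_s = PACKET_BITS / LINK_RATE_BPS
drain_s = QUEUE_DEPTH * serialization_s
print(f"per-packet serialization: {serialization_s * 1e6:.2f} us")
print(f"time to drain {QUEUE_DEPTH} packets: {drain_s * 1e6:.1f} us")
# ~1.2 us per packet and ~154 us to drain 128 packets: if bursts relax
# faster than the delay budget allows, a plain FIFO adds little ∆Q.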

Neil

On 27 May 2014, at 13:20, Hagen Paul Pfeifer <hagen@jauu.net> wrote:

> The question is whether (CoDel/PIE/whatever) AQM makes sense at all
> for 10G/40G hardware and higher-performance iron. Ingress/egress
> bandwidth is nearly identical, so larger/longer buffering should not
> happen. Line card memory is limited, so larger buffering is de facto
> excluded.
> 
> Are there any documents/papers about high bandwidth equipment and
> bufferbloat effects?
> 
> Hagen


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
  2014-05-27 12:20   ` Hagen Paul Pfeifer
  2014-05-27 12:34     ` Neil Davies
@ 2014-05-28 18:44     ` David Lang
  1 sibling, 0 replies; 12+ messages in thread
From: David Lang @ 2014-05-28 18:44 UTC (permalink / raw)
  To: Hagen Paul Pfeifer; +Cc: bloat

On Tue, 27 May 2014, Hagen Paul Pfeifer wrote:

> The question is whether (CoDel/PIE/whatever) AQM makes sense at all
> for 10G/40G hardware and higher-performance iron. Ingress/egress
> bandwidth is nearly identical, so larger/longer buffering should not
> happen. Line card memory is limited, so larger buffering is de facto
> excluded.

What if your router has more than two 40G interfaces? Then you can have traffic
patterns where traffic inbound on interfaces #1 and #2 is trying to go out #3
at a rate higher than it can handle.

At that point, you have two options

1. drop the packets

2. buffer them and hope that this is a temporary spike

If you buffer them, then the questions become what queuing to use (simple FIFO, 
CoDel, or something else), how large the buffer should be allowed to grow before 
you start dropping, and, at that point, which packets to drop.

So I think that even on such big iron devices, there is room for the same sort 
of queueing options as for lower speed connections, but processor speed and 
memory size may limit how much you can do.
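
A toy sketch of that choice (Python; the queue limit is hypothetical, and
the head-drop flag is just the simplest alternative to tail drop):

from collections import deque

QUEUE_LIMIT = 256                    # packets, made up for illustration
queue = deque()

def enqueue(pkt, head_drop=False):
    """Enqueue pkt; return the packet that was dropped, if any."""
    if len(queue) < QUEUE_LIMIT:
        queue.append(pkt)            # the buffer-and-hope case
        return None
    if head_drop:
        dropped = queue.popleft()    # drop the oldest packet instead
        queue.append(pkt)
        return dropped
    return pkt                       # classic tail drop: refuse the newcomer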

David Lang

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
  2014-05-29  7:20   ` Neil Davies
  2014-05-29 14:06     ` Jonathan Morton
@ 2014-05-29 16:58     ` Dave Taht
  1 sibling, 0 replies; 12+ messages in thread
From: Dave Taht @ 2014-05-29 16:58 UTC (permalink / raw)
  To: Neil Davies; +Cc: Hal Murray, bloat

I am really enjoying this thread. There was a video and presentation
from Stanford last (?) year that concluded that the "right" number of
buffers at really high rates (10Gb+) was really small, like 20, and
that used tens of thousands of flows to make its point.

I think it came out of the optical networking group... anybody remember the
paper/preso/video I'm talking about? It seemed like a pretty radical conclusion
at the time.

On Thu, May 29, 2014 at 12:20 AM, Neil Davies <neil.davies@pnsol.com> wrote:
>
> On 28 May 2014, at 12:00, Jonathan Morton <chromatix99@gmail.com> wrote:
>
>>
>> On 28 May, 2014, at 12:39 pm, Hal Murray wrote:
>>
>>>> in non discarding scheduling total delay is conserved,
>>>> irrespective of the scheduling discipline
>>>
>>> Is that true for all backplane/switching topologies?
>>
>> It's a mathematical truth for any topology that you can reduce to a black box with one or more inputs and one output, which you call a "queue" and which *does not discard* packets.  Non-discarding queues don't exist in the real world, of course.
>>
>> The intuitive proof is that every time you promote a packet to be transmitted earlier, you must demote one to be transmitted later.  A non-FIFO queue tends to increase the maximum delay and decrease the minimum delay, but the average delay will remain constant.

There are two cases here, under congestion, that are of interest. One
is X into 1, where figuring out what to shoot at, and when, is
important.

The other is where X into 1 at one rate is ultimately being stepped
down from, say, 10 Gbit to 10 Mbit, end to end. In the latter case I'm
reasonably confident that stochastic fair queueing, with the number of
flow queues proportional to the ultimate step-down, is a win (and you
still have to decide what to shoot at) - and it makes tons of sense
for hosts servicing a limited number of users to also disperse their
packet payloads at a similar ratio.
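
A rough sketch of that sizing rule (Python; the scaling constant and hash
are purely illustrative, not anything from a real SFQ implementation):

import hashlib

def n_queues(ingress_bps, egress_bps, scale=1.0):
    # e.g. 10 Gbit/s stepping down to 10 Mbit/s -> on the order of 1000 queues
    return max(1, int(scale * ingress_bps / egress_bps))

def queue_index(five_tuple, queues, perturb=b"salt"):
    digest = hashlib.sha1(perturb + repr(five_tuple).encode()).digest()
    return int.from_bytes(digest[:4], "big") % queues

nq = n_queues(10e9, 10e6)
print(nq, queue_index(("10.0.0.1", 12345, "192.0.2.1", 80, "tcp"), nq))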

In either case, as rates and numbers of flows get insanely high, my
gut (which has been wrong before!) agrees with the Stanford result
(short queues, drop tail), and that conflicts with the observation
that breaking up high-speed clumps into highly mixed packets is a
good thing.

I wish it were possible to experiment with a 10+gbit, congested,
internet backbone link and observe the results of these lines of
thought...

>
> Jonathan - there is a mathematical underpinning for this, when you (mathematically) construct queueing systems that will differentially allocate both delay and loss you find that the underlying state space has certain properties - they have "lumpability" - this lumpability (apart from making the state space dramatically smaller) has another, profound, implication. A set of states that are in a "lump" have an interesting equivalence, it doesn't matter how you leave the "lump" the overall system properties are unaffected.

The papers at http://www.pnsol.com/publications.html have invented
several terms that I don't fully understand.


> In the systems we studied (in which there was a ranking in "order of service" (delay/urgency) things in, and a ranking in discarding (loss/cherish) things) this basically implied that the overall system properties (the total "amount" of loss and delay) was independent of that choice. The "quality attenuation" (the loss and delay) was thus conserved.
>
>>
>>>> The question is whether (CoDel/PIE/whatever) AQM makes sense at all for
>>>> 10G/40G hardware and higher-performance iron. Ingress/egress bandwidth is
>>>> nearly identical, so larger/longer buffering should not happen. Line card
>>>> memory is limited, so larger buffering is de facto excluded.
>>>
>>> The simplest interesting case is where you have two input lines feeding the
>>> same output line.
>>>
>>> AQM may not be the best solution, but you have to do something.  Dropping any
>>> packet that won't fit into the buffer is probably simplest.
>>
>> The relative bandwidths of the input(s) and output(s) is also relevant.  You *can* have a saturated 5-port switch with no dropped packets, even if one of them is a common uplink, provided the uplink port has four times the bandwidth and the traffic coming in on it is evenly distributed to the other four.
>>
>> Which yields you the classic tail-drop FIFO, whose faults are by now well documented.  If you have the opportunity to do something better than that, you probably should.  The simplest improvement I can think of is a *head*-drop FIFO, which gets the congestion signal back to the source quicker.  It *should* I think be possible to do Codel at 10G (if not 40G) by now; whether or not it is *easy* probably depends on your transistor budget.
>
> Caveat: this is probably the best strategy for networks that consist solely of long lived, non service critical, TCP flows - for the rest of networking requirements think carefully. There are several, real world, scenarios where this is not the best strategy and, where you are looking to make any form of "safety" case (be it fiscal or safety of life) it does create new performance related attack vectors. We know this, because we've been asked this and we've done the analysis.
>
>>
>> - Jonathan Morton
>>
>
> ---------------------------------------------------
> Neil Davies, PhD, CEng, CITP, MBCS
> Chief Scientist
> Predictable Network Solutions Ltd
> Tel:   +44 3333 407715
> Mob: +44 7974 922445
> neil.davies@pnsol.com
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat



-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
  2014-05-29  7:20   ` Neil Davies
@ 2014-05-29 14:06     ` Jonathan Morton
  2014-05-29 16:58     ` Dave Taht
  1 sibling, 0 replies; 12+ messages in thread
From: Jonathan Morton @ 2014-05-29 14:06 UTC (permalink / raw)
  To: Neil Davies; +Cc: Hal Murray, bloat

[-- Attachment #1: Type: text/plain, Size: 2557 bytes --]

> > Which yields you the classic tail-drop FIFO, whose faults are by now
> > well documented.  If you have the opportunity to do something better than
> > that, you probably should.  The simplest improvement I can think of is a
> > *head*-drop FIFO, which gets the congestion signal back to the source
> > quicker.  It *should* I think be possible to do Codel at 10G (if not 40G)
> > by now; whether or not it is *easy* probably depends on your transistor
> > budget.
>
> Caveat: this is probably the best strategy for networks that consist
> solely of long lived, non service critical, TCP flows - for the rest of
> networking requirements think carefully. There are several, real world,
> scenarios where this is not the best strategy and, where you are looking to
> make any form of "safety" case (be it fiscal or safety of life) it does
> create new performance related attack vectors. We know this, because we've
> been asked this and we've done the analysis.

That sounds like you're talking about applications where reliable packet
delivery trumps latency. In which case, go ahead and build big buffers and
use them; build the hardware so that AQM can be switched off. I happen to
believe that AQM serves the more common applications, so it should still be
built into hardware whenever practical to do so.

Speaking of which, here are a couple more ideas for hardware-simple AQM
strategies:

RANDOM qdisc, which has no queue head or tail. On dequeue, it delivers a
random packet from the queue. On enqueue to a full buffer, it drops random
packets from the queue until the new packet will fit. Your primary
congestion signal is then packets arriving out of order and with delay
jitter, which increases with the average fill of the queue. As a backup, it
will revert to dropping packets in an approximately fair manner.
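
A toy sketch of that idea (Python; not code from any real qdisc, and the
queue limit is arbitrary):

import random

class RandomQueue:
    def __init__(self, limit=128):
        self.pkts = []
        self.limit = limit

    def enqueue(self, pkt):
        dropped = []
        while len(self.pkts) >= self.limit:
            # full: evict random victims until the newcomer fits
            dropped.append(self.pkts.pop(random.randrange(len(self.pkts))))
        self.pkts.append(pkt)
        return dropped

    def dequeue(self):
        if not self.pkts:
            return None
        # reordering and jitter grow with queue fill: the congestion signal
        return self.pkts.pop(random.randrange(len(self.pkts)))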

HALF MARK qdisc, which is essentially a head-drop FIFO, but when the queue
is half-full it begins marking all ECN capable packets (on dequeue) and
dropping the rest (according to ECN RFC). I know, it's theoretically
inferior to RED, but it's far more deployable. It is also capable of giving
a congestion signal without dropping packets, as long as everything
supports ECN. Easily generalised into HIGH WATER MARK qdisc where the ECN
threshold is not necessarily at half-full.
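
And a matching toy of HALF MARK (Python; packets modelled as dicts, with
the threshold and ECN handling simplified relative to the RFC):

from collections import deque

class HalfMarkQueue:
    def __init__(self, limit=128, threshold=None):
        self.q = deque()
        self.limit = limit
        self.threshold = threshold if threshold is not None else limit // 2

    def enqueue(self, pkt):
        if len(self.q) >= self.limit:
            self.q.popleft()               # head drop once completely full
        self.q.append(pkt)

    def dequeue(self):
        while self.q:
            over = len(self.q) >= self.threshold
            pkt = self.q.popleft()
            if not over:
                return pkt                 # below the mark: pass untouched
            if pkt.get("ecn_capable"):
                pkt["ce"] = True           # mark Congestion Experienced
                return pkt
            # not ECN capable: drop it and try the next packet
        return None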

You may also notice that RANDOM and HALF MARK can be implemented
simultaneously on the same queue. This is generally true of any two AQM
strategies which respectively target packet ordering and packet marking
exclusively. You could thus also have RANDOM+Codel or similar.

- Jonathan Morton

[-- Attachment #2: Type: text/html, Size: 2745 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
  2014-05-28 11:00 ` Jonathan Morton
  2014-05-28 18:56   ` David Lang
@ 2014-05-29  7:20   ` Neil Davies
  2014-05-29 14:06     ` Jonathan Morton
  2014-05-29 16:58     ` Dave Taht
  1 sibling, 2 replies; 12+ messages in thread
From: Neil Davies @ 2014-05-29  7:20 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Hal Murray, bloat


On 28 May 2014, at 12:00, Jonathan Morton <chromatix99@gmail.com> wrote:

> 
> On 28 May, 2014, at 12:39 pm, Hal Murray wrote:
> 
>>> in non discarding scheduling total delay is conserved,
>>> irrespective of the scheduling discipline
>> 
>> Is that true for all backplane/switching topologies?
> 
> It's a mathematical truth for any topology that you can reduce to a black box with one or more inputs and one output, which you call a "queue" and which *does not discard* packets.  Non-discarding queues don't exist in the real world, of course.
> 
> The intuitive proof is that every time you promote a packet to be transmitted earlier, you must demote one to be transmitted later.  A non-FIFO queue tends to increase the maximum delay and decrease the minimum delay, but the average delay will remain constant.

Jonathan - there is a mathematical underpinning for this: when you (mathematically) construct queueing systems that will differentially allocate both delay and loss, you find that the underlying state space has certain properties - they have "lumpability" - and this lumpability (apart from making the state space dramatically smaller) has another, profound, implication. A set of states that are in a "lump" have an interesting equivalence: it doesn't matter how you leave the "lump", the overall system properties are unaffected.

In the systems we studied (in which there was a ranking for serving (delay/urgency) things and a ranking for discarding (loss/cherish) things) this basically implied that the overall system properties (the total "amount" of loss and delay) were independent of that choice. The "quality attenuation" (the loss and delay) was thus conserved.

> 
>>> The question is whether (CoDel/PIE/whatever) AQM makes sense at all for
>>> 10G/40G hardware and higher-performance iron. Ingress/egress bandwidth is
>>> nearly identical, so larger/longer buffering should not happen. Line card
>>> memory is limited, so larger buffering is de facto excluded.
>> 
>> The simplest interesting case is where you have two input lines feeding the 
>> same output line.
>> 
>> AQM may not be the best solution, but you have to do something.  Dropping any 
>> packet that won't fit into the buffer is probably simplest.
> 
> The relative bandwidths of the input(s) and output(s) is also relevant.  You *can* have a saturated 5-port switch with no dropped packets, even if one of them is a common uplink, provided the uplink port has four times the bandwidth and the traffic coming in on it is evenly distributed to the other four.
> 
> Which yields you the classic tail-drop FIFO, whose faults are by now well documented.  If you have the opportunity to do something better than that, you probably should.  The simplest improvement I can think of is a *head*-drop FIFO, which gets the congestion signal back to the source quicker.  It *should* I think be possible to do Codel at 10G (if not 40G) by now; whether or not it is *easy* probably depends on your transistor budget.

Caveat: this is probably the best strategy for networks that consist solely of long-lived, non-service-critical TCP flows - for the rest of networking requirements, think carefully. There are several real-world scenarios where this is not the best strategy and, where you are looking to make any form of "safety" case (be it fiscal or safety of life), it does create new performance-related attack vectors. We know this because we've been asked this and we've done the analysis.

> 
> - Jonathan Morton
> 

---------------------------------------------------
Neil Davies, PhD, CEng, CITP, MBCS
Chief Scientist
Predictable Network Solutions Ltd
Tel:   +44 3333 407715
Mob: +44 7974 922445
neil.davies@pnsol.com


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
  2014-05-28 18:56   ` David Lang
@ 2014-05-28 22:15     ` Bill Ver Steeg (versteb)
  0 siblings, 0 replies; 12+ messages in thread
From: Bill Ver Steeg (versteb) @ 2014-05-28 22:15 UTC (permalink / raw)
  To: David Lang, Jonathan Morton; +Cc: Hal Murray, bloat

This really speaks to the difference between cross-traffic-induced delay and self-induced delay.

There are several toolkits that can be brought to bear, and we must be careful to examine the impact of each of them. The one that we tend to think about most (at least recently) is the AQM algorithm that manages the depth of a given queue. It is important to note that waiting for the buffer to fill up before dropping is not optimal, because it is then too late. You want to provide mark/drop back pressure a bit earlier so that you do not grind all of the flows to a halt at once. See the PIE and CoDel papers for the details. There are also several technologies that can be used to segregate flows to lessen the impact of cross traffic. There are also congestion avoidance algorithms that can be used on the hosts to recognize/avoid bloat. There are hybrids of these schemes, and multiple technologies with their own sweet spots in each of these domains.
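
As a rough illustration of the "back pressure a bit earlier" point, a
RED-style ramp (Python; not PIE or CoDel, and the thresholds are made up):

import random

MIN_TH, MAX_TH, MAX_P = 30, 90, 0.1        # packets / max drop probability

def early_drop(avg_depth):
    if avg_depth < MIN_TH:
        return False                       # plenty of headroom: never drop
    if avg_depth >= MAX_TH:
        return True                        # effectively full: always drop/mark
    p = MAX_P * (avg_depth - MIN_TH) / (MAX_TH - MIN_TH)
    return random.random() < p             # pressure ramps up with the queue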

There is no magic bullet, and a successful system will need to draw from each of these disciplines.

In the specific case of short-lived flows vs long-lived flows, one could make a case that hashing the several flows into a set of discrete queues would provide tremendous benefit. IMHO, this is the best approach - but I am looking into this in some detail. One could also argue that not all middleboxes are able to support multiple queues (and that the number of queues is finite), so an intelligent AQM algorithm is also important for limiting cross-traffic-induced delay. One could also make the point that some (hopefully fewer and fewer) middleboxes will not have any sort of rational buffer management capabilities and will just do tail-drop with large buffers, so the hosts need to do what they can to avoid bloat.

Bill VerSteeg

-----Original Message-----
From: bloat-bounces@lists.bufferbloat.net [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of David Lang
Sent: Wednesday, May 28, 2014 2:56 PM
To: Jonathan Morton
Cc: Hal Murray; bloat@lists.bufferbloat.net
Subject: Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"

On Wed, 28 May 2014, Jonathan Morton wrote:

> On 28 May, 2014, at 12:39 pm, Hal Murray wrote:
>
>>> in non discarding scheduling total delay is conserved, irrespective 
>>> of the scheduling discipline
>>
>> Is that true for all backplane/switching topologies?
>
> It's a mathematical truth for any topology that you can reduce to a 
> black box with one or more inputs and one output, which you call a 
> "queue" and which *does not discard* packets.  Non-discarding queues 
> don't exist in the real world, of course.
>
> The intuitive proof is that every time you promote a packet to be 
> transmitted earlier, you must demote one to be transmitted later.  A 
> non-FIFO queue tends to increase the maximum delay and decrease the 
> minimum delay, but the average delay will remain constant.

True, but not all traffic is equal. Delays in DNS and short TCP connections are far more noticeable than the same total delay in long TCP connections (because the users tend to be serialized on the short connections while doing the long ones in parallel).

So queueing that favors short-duration flows over long-duration ones still averages the same latency overall, but the latency/connection_length ratio will remain very small in all cases instead of letting this ratio become very large for short connections.

David Lang

>>> The question is whether (CoDel/PIE/whatever) AQM makes sense at all
>>> for 10G/40G hardware and higher-performance iron. Ingress/egress
>>> bandwidth is nearly identical, so larger/longer buffering should not
>>> happen. Line card memory is limited, so larger buffering is de facto excluded.
>>
>> The simplest interesting case is where you have two input lines 
>> feeding the same output line.
>>
>> AQM may not be the best solution, but you have to do something.  
>> Dropping any packet that won't fit into the buffer is probably simplest.
>
> The relative bandwidths of the input(s) and output(s) is also relevant.  You *can* have a saturated 5-port switch with no dropped packets, even if one of them is a common uplink, provided the uplink port has four times the bandwidth and the traffic coming in on it is evenly distributed to the other four.
>
> Which yields you the classic tail-drop FIFO, whose faults are by now well documented.  If you have the opportunity to do something better than that, you probably should.  The simplest improvement I can think of is a *head*-drop FIFO, which gets the congestion signal back to the source quicker.  It *should* I think be possible to do Codel at 10G (if not 40G) by now; whether or not it is *easy* probably depends on your transistor budget.
>
> - Jonathan Morton
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>
_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
  2014-05-28 11:00 ` Jonathan Morton
@ 2014-05-28 18:56   ` David Lang
  2014-05-28 22:15     ` Bill Ver Steeg (versteb)
  2014-05-29  7:20   ` Neil Davies
  1 sibling, 1 reply; 12+ messages in thread
From: David Lang @ 2014-05-28 18:56 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Hal Murray, bloat

On Wed, 28 May 2014, Jonathan Morton wrote:

> On 28 May, 2014, at 12:39 pm, Hal Murray wrote:
>
>>> in non discarding scheduling total delay is conserved,
>>> irrespective of the scheduling discipline
>>
>> Is that true for all backplane/switching topologies?
>
> It's a mathematical truth for any topology that you can reduce to a black box 
> with one or more inputs and one output, which you call a "queue" and which 
> *does not discard* packets.  Non-discarding queues don't exist in the real 
> world, of course.
>
> The intuitive proof is that every time you promote a packet to be transmitted 
> earlier, you must demote one to be transmitted later.  A non-FIFO queue tends 
> to increase the maximum delay and decrease the minimum delay, but the average 
> delay will remain constant.

True, but not all traffic is equal. Delays in DNS and short TCP connections are 
far more noticeable than the same total delay in long TCP connections (because 
the users tend to be serialized on the short connections while doing the long 
ones in parallel).

So queueing that favors short-duration flows over long-duration ones still 
averages the same latency overall, but the latency/connection_length ratio will 
remain very small in all cases instead of letting this ratio become very large 
for short connections.
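
A toy calculation of that ratio point (Python; made-up flow sizes, both
flows present at t=0, one packet transmitted per time unit):

def per_packet_departures(order):
    t, out = 0, []
    for name, size in order:
        for _ in range(size):
            t += 1
            out.append((name, t))          # departure time == delay here
    return out

for label, order in (("long first", [("long", 100), ("short", 1)]),
                     ("short first", [("short", 1), ("long", 100)])):
    dep = per_packet_departures(order)
    avg = sum(t for _, t in dep) / len(dep)
    short_done = max(t for n, t in dep if n == "short")
    print(f"{label}: avg per-packet delay {avg:.1f}, "
          f"short flow done at t={short_done}")
# avg per-packet delay is ~51 either way (conserved), but the short flow's
# delay/length ratio drops from 101 to 1 when it goes first.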

David Lang

>>> The question is whether (CoDel/PIE/whatever) AQM makes sense at all for
>>> 10G/40G hardware and higher-performance iron. Ingress/egress bandwidth is
>>> nearly identical, so larger/longer buffering should not happen. Line card
>>> memory is limited, so larger buffering is de facto excluded.
>>
>> The simplest interesting case is where you have two input lines feeding the
>> same output line.
>>
>> AQM may not be the best solution, but you have to do something.  Dropping any
>> packet that won't fit into the buffer is probably simplest.
>
> The relative bandwidths of the input(s) and output(s) is also relevant.  You *can* have a saturated 5-port switch with no dropped packets, even if one of them is a common uplink, provided the uplink port has four times the bandwidth and the traffic coming in on it is evenly distributed to the other four.
>
> Which yields you the classic tail-drop FIFO, whose faults are by now well documented.  If you have the opportunity to do something better than that, you probably should.  The simplest improvement I can think of is a *head*-drop FIFO, which gets the congestion signal back to the source quicker.  It *should* I think be possible to do Codel at 10G (if not 40G) by now; whether or not it is *easy* probably depends on your transistor budget.
>
> - Jonathan Morton
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
  2014-05-28  9:39 Hal Murray
@ 2014-05-28 11:00 ` Jonathan Morton
  2014-05-28 18:56   ` David Lang
  2014-05-29  7:20   ` Neil Davies
  0 siblings, 2 replies; 12+ messages in thread
From: Jonathan Morton @ 2014-05-28 11:00 UTC (permalink / raw)
  To: Hal Murray; +Cc: bloat


On 28 May, 2014, at 12:39 pm, Hal Murray wrote:

>> in non discarding scheduling total delay is conserved,
>> irrespective of the scheduling discipline
> 
> Is that true for all backplane/switching topologies?

It's a mathematical truth for any topology that you can reduce to a black box with one or more inputs and one output, which you call a "queue" and which *does not discard* packets.  Non-discarding queues don't exist in the real world, of course.

The intuitive proof is that every time you promote a packet to be transmitted earlier, you must demote one to be transmitted later.  A non-FIFO queue tends to increase the maximum delay and decrease the minimum delay, but the average delay will remain constant.
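
A quick numerical check of that, for equal-size packets in a single
work-conserving, non-discarding queue (Python toy, FIFO vs LIFO):

import random

random.seed(1)
arrivals = sorted(random.uniform(0, 100) for _ in range(80))  # arrival times
SERVICE = 1.0                                                 # per-packet tx time

def run(policy):
    pending, delays, t, i = [], [], 0.0, 0
    while i < len(arrivals) or pending:
        if not pending:
            t = max(t, arrivals[i])        # idle until the next arrival
        while i < len(arrivals) and arrivals[i] <= t:
            pending.append(arrivals[i]); i += 1
        pkt = pending.pop(0) if policy == "fifo" else pending.pop()
        t += SERVICE                       # transmit one packet
        delays.append(t - pkt)             # that packet's sojourn time
    return delays

for policy in ("fifo", "lifo"):
    d = run(policy)
    print(policy, "mean:", round(sum(d) / len(d), 3), "max:", round(max(d), 3))
# The mean comes out identical for both policies; the min and max do not.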

>> The question is whether (CoDel/PIE/whatever) AQM makes sense at all for
>> 10G/40G hardware and higher-performance iron. Ingress/egress bandwidth is
>> nearly identical, so larger/longer buffering should not happen. Line card
>> memory is limited, so larger buffering is de facto excluded.
> 
> The simplest interesting case is where you have two input lines feeding the 
> same output line.
> 
> AQM may not be the best solution, but you have to do something.  Dropping any 
> packet that won't fit into the buffer is probably simplest.

The relative bandwidths of the input(s) and output(s) are also relevant.  You *can* have a saturated 5-port switch with no dropped packets, even if one of them is a common uplink, provided the uplink port has four times the bandwidth and the traffic coming in on it is evenly distributed to the other four.
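
Making the arithmetic behind that explicit (R is an arbitrary access-port
rate; the uplink runs at 4R):

R = 1.0
uplink = 4 * R
assert 4 * R <= uplink   # all four access ports sending up at full rate: fits
assert uplink / 4 <= R   # uplink traffic split evenly across the four: fits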

Which yields you the classic tail-drop FIFO, whose faults are by now well documented.  If you have the opportunity to do something better than that, you probably should.  The simplest improvement I can think of is a *head*-drop FIFO, which gets the congestion signal back to the source quicker.  It *should* I think be possible to do Codel at 10G (if not 40G) by now; whether or not it is *easy* probably depends on your transistor budget.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
@ 2014-05-28  9:39 Hal Murray
  2014-05-28 11:00 ` Jonathan Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Hal Murray @ 2014-05-28  9:39 UTC (permalink / raw)
  To: bloat; +Cc: Hal Murray

> in non discarding scheduling total delay is conserved,
> irrespective of the scheduling discipline

Is that true for all backplane/switching topologies?


> The question is whether (CoDel/PIE/whatever) AQM makes sense at all for
> 10G/40G hardware and higher-performance iron. Ingress/egress bandwidth is
> nearly identical, so larger/longer buffering should not happen. Line card
> memory is limited, so larger buffering is de facto excluded.

The simplest interesting case is where you have two input lines feeding the 
same output line.

AQM may not be the best solution, but you have to do something.  Dropping any 
packet that won't fit into the buffer is probably simplest.



-- 
These are my opinions.  I hate spam.




^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-05-29 16:58 UTC | newest]

Thread overview: 12+ messages
-- links below jump to the message on this page --
2014-05-27  8:21 [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES" Hagen Paul Pfeifer
2014-05-27 10:45 ` Neil Davies
2014-05-27 12:20   ` Hagen Paul Pfeifer
2014-05-27 12:34     ` Neil Davies
2014-05-28 18:44     ` David Lang
2014-05-28  9:39 Hal Murray
2014-05-28 11:00 ` Jonathan Morton
2014-05-28 18:56   ` David Lang
2014-05-28 22:15     ` Bill Ver Steeg (versteb)
2014-05-29  7:20   ` Neil Davies
2014-05-29 14:06     ` Jonathan Morton
2014-05-29 16:58     ` Dave Taht
