From: "Bill Ver Steeg (versteb)" <versteb@cisco.com>
To: David Lang <david@lang.hm>, Jonathan Morton <chromatix99@gmail.com>
Cc: Hal Murray <hmurray@megapathdsl.net>,
"bloat@lists.bufferbloat.net" <bloat@lists.bufferbloat.net>
Subject: Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
Date: Wed, 28 May 2014 22:15:26 +0000
Message-ID: <AE7F97DB5FEE054088D82E836BD15BE92452C67F@xmb-aln-x05.cisco.com>
In-Reply-To: <alpine.DEB.2.02.1405281146090.32611@nftneq.ynat.uz>
This really speaks to the difference between cross-traffic-induced delay and self-induced delay.
There are several toolkits that can be brought to bear, and we must be careful to examine the impact of each of them. The one that we tend to think about most (at least recently) is the AQM algorithm that manages the depth of a given queue. It is important to note that waiting for the buffer to fill up before dropping is not optimal, because it is then too late. You want to provide mark/drop back pressure a bit earlier so that you do not grind all of the flows to a halt at once. See the PIE and CoDel papers for the details. There are also several technologies that can be used to segregate flows to lessen the impact of cross traffic. There are also congestion avoidance algorithms that can be used on the hosts to recognize/avoid bloat. There are hybrids of these schemes, and multiple technologies with their own sweet spots in each of these domains.
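To make the "back pressure a bit earlier" point concrete, here is a minimal Python sketch that contrasts plain tail-drop with a RED-style linear ramp on queue depth. This is not PIE or CoDel (both of which work from queueing delay rather than raw depth, so see the papers for those); the function names, thresholds, and the use of instantaneous rather than averaged depth are all assumptions chosen for illustration.

```python
def tail_drop_probability(depth, capacity=100):
    """Tail-drop: no signal at all until the buffer is completely full."""
    return 1.0 if depth >= capacity else 0.0


def early_drop_probability(depth, min_th=20, max_th=80, max_p=0.1):
    """RED-style ramp (simplified): start marking/dropping before the buffer fills.

    Thresholds are hypothetical values in packets. Real RED uses a moving
    average of the depth; this sketch uses the instantaneous depth.
    """
    if depth < min_th:
        return 0.0          # short queue: leave traffic alone
    if depth >= max_th:
        return 1.0          # queue is too deep: drop/mark everything
    # Between the thresholds the probability rises linearly from 0 to max_p,
    # so only a few flows are signalled at a time.
    fraction = (depth - min_th) / (max_th - min_th)
    return max_p * fraction


if __name__ == "__main__":
    for depth in (10, 50, 79, 99, 100):
        print(depth, tail_drop_probability(depth), early_drop_probability(depth))
```

The point of the ramp is that a handful of flows see a mark or drop while the queue is still moderate, rather than every flow being hit at once when the buffer overflows.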
There is no magic bullet, and a successful system will need to draw from each of these disciplines.
In the specific case of short-lived flows vs long-lived flows, one could make a case that hashing the several flows into a set of discrete queues would provide tremendous benefit. IMHO, this is the best approach, but I am still looking into this in some detail. One could also argue that not all middleboxes are able to support multiple queues (and that the number of queues is finite), so an intelligent AQM algorithm is also important for limiting cross-traffic-induced delay. One could also make the point that some (hopefully fewer and fewer) middleboxes will not have any sort of rational buffer management capabilities and will just do tail-drop with large buffers, so the hosts need to do what they can to avoid bloat.
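As a rough sketch of the "hash the flows into a set of discrete queues" idea, the Python below hashes a flow tuple to one of a fixed number of queues and serves the queues round-robin. The class name, the choice of CRC32 as the hash, and the 5-tuple layout are assumptions for illustration only; real implementations add per-queue AQM, byte-based fairness, and a perturbed hash.

```python
import zlib
from collections import deque


class HashedFlowQueues:
    """Hypothetical flow-queuing sketch: hash each flow to a queue, serve round-robin."""

    def __init__(self, num_queues=4):
        # The number of queues is finite, so distinct flows can still collide.
        self.queues = [deque() for _ in range(num_queues)]
        self.next_index = 0

    def queue_index(self, flow):
        # `flow` is any tuple identifying the flow, e.g. (src, dst, proto, sport, dport).
        key = "|".join(str(field) for field in flow).encode()
        return zlib.crc32(key) % len(self.queues)

    def enqueue(self, flow, packet):
        self.queues[self.queue_index(flow)].append(packet)

    def dequeue(self):
        # Visit queues in round-robin order, skipping empty ones.
        n = len(self.queues)
        for offset in range(n):
            i = (self.next_index + offset) % n
            if self.queues[i]:
                self.next_index = (i + 1) % n
                return self.queues[i].popleft()
        return None  # nothing queued anywhere
```

With this arrangement, a single DNS packet that hashes to a different queue than a bulk transfer waits for at most one bulk packet instead of the whole backlog. When two flows collide in the same queue, they are back to sharing a FIFO, which is why the AQM on each queue still matters.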
Bill VerSteeg
-----Original Message-----
From: bloat-bounces@lists.bufferbloat.net [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of David Lang
Sent: Wednesday, May 28, 2014 2:56 PM
To: Jonathan Morton
Cc: Hal Murray; bloat@lists.bufferbloat.net
Subject: Re: [Bloat] ipspace.net: "QUEUING MECHANISMS IN MODERN SWITCHES"
On Wed, 28 May 2014, Jonathan Morton wrote:
> On 28 May, 2014, at 12:39 pm, Hal Murray wrote:
>
>>> in non discarding scheduling total delay is conserved, irrespective
>>> of the scheduling discipline
>>
>> Is that true for all backplane/switching topologies?
>
> It's a mathematical truth for any topology that you can reduce to a
> black box with one or more inputs and one output, which you call a
> "queue" and which *does not discard* packets. Non-discarding queues
> don't exist in the real world, of course.
>
> The intuitive proof is that every time you promote a packet to be
> transmitted earlier, you must demote one to be transmitted later. A
> non-FIFO queue tends to increase the maximum delay and decrease the
> minimum delay, but the average delay will remain constant.
True, but not all traffic is equal. Delays in DNS and short TCP connections are far more noticeable than the same total delay in long TCP connections (because users tend to be serialized on the short connections while doing the long ones in parallel).
So queueing that favors short-duration flows over long-duration ones still averages the same latency overall, but the latency/connection_length ratio will remain very small in all cases instead of becoming very large for short connections.
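A small worked example of this, as a Python sketch under simplifying assumptions: one output link that sends one equal-sized packet per time slot, no drops, and all packets already queued at time zero. The function name and the definition of the ratio (completion time divided by flow length in packets) are illustrative choices, not anything standardized.

```python
def schedule(flows, order):
    """Serve whole flows back-to-back in the given order.

    flows: dict mapping flow name -> number of packets (all queued at t=0).
    order: list of flow names, first served first.
    Returns (sum of per-packet waiting times, {flow: completion_time / length}).
    """
    clock = 0
    total_packet_delay = 0
    ratio = {}
    for name in order:
        for _ in range(flows[name]):
            total_packet_delay += clock  # this packet waited `clock` slots
            clock += 1                   # sending it takes one slot
        ratio[name] = clock / flows[name]
    return total_packet_delay, ratio


flows = {"bulk": 100, "dns": 1}
long_first = schedule(flows, ["bulk", "dns"])
short_first = schedule(flows, ["dns", "bulk"])
print(long_first)   # total delay 5050; dns ratio 101.0, bulk ratio 1.0
print(short_first)  # total delay 5050; dns ratio 1.0, bulk ratio 1.01
```

The total (and therefore average) packet delay is identical in both orderings, which is the conservation argument. What changes is who pays: serving the short flow first costs the bulk flow about 1% while sparing the DNS lookup a hundredfold slowdown.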
David Lang
>>> The question is if (codel/pie/whatever) AQM makes sense at all for
>>> 10G/40G hardware and higher performance irons? Ingress/egress
>>> bandwidth is nearly identical, a larger/longer buffering should not
>>> happen. Line card memory is limited, a larger buffering is de facto excluded.
>>
>> The simplest interesting case is where you have two input lines
>> feeding the same output line.
>>
>> AQM may not be the best solution, but you have to do something.
>> Dropping any packet that won't fit into the buffer is probably simplest.
>
> The relative bandwidths of the input(s) and output(s) are also relevant. You *can* have a saturated 5-port switch with no dropped packets, even if one of them is a common uplink, provided the uplink port has four times the bandwidth and the traffic coming in on it is evenly distributed to the other four.
>
> Which yields you the classic tail-drop FIFO, whose faults are by now well documented. If you have the opportunity to do something better than that, you probably should. The simplest improvement I can think of is a *head*-drop FIFO, which gets the congestion signal back to the source quicker. It *should* I think be possible to do Codel at 10G (if not 40G) by now; whether or not it is *easy* probably depends on your transistor budget.
>
> - Jonathan Morton
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>