This *is* commonly a problem. Look up "TCP incast". The scenario is exactly as you describe. A distributed database sends queries over the same switch to K other nodes in order to verify the integrity of the answer. The data is served from memory, so access times are roughly the same on all K nodes and the responses arrive back nearly simultaneously. If the responses are sizable, the switch output port is overwhelmed and drops packets, and TCP's congestion algorithm comes into play. It is almost like resonance in engineering: at the wrong "frequency", the bridge/switch resonates and everything goes haywire. A back-of-the-envelope sketch of the buffer arithmetic is in the P.S. below.

On Sun, Jun 12, 2016 at 11:24 PM, Steinar H. Gunderson <sgunderson@bigfoot.com> wrote:

> On Sun, Jun 12, 2016 at 01:25:17PM -0500, Benjamin Cronce wrote:
> > Internal networks rarely have bandwidth issues and congestion only
> > happens when you don't have enough bandwidth.
>
> I don't think this is true. You might not have an aggregate bandwidth
> issue, but given the burstiness of TCP and the typical switch buffer
> depth (64 frames is a typical number), it's very, very easy to lose
> packets in your switch even on a relatively quiet network with no
> downconversion. (Witness the rise of DCTCP, made especially for
> internal traffic on this kind of network.)
>
> /* Steinar */
> --
> Homepage: https://www.sesse.net/

--
J.
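
P.S. Here is a minimal Python sketch of the incast arithmetic. Only the 64-frame buffer depth comes from Steinar's message; the frame size, response size, and the "everything beyond the buffer in one synchronized burst is dropped" model are my assumptions (it ignores draining during the burst, so it overstates loss somewhat, but it shows where the cliff is).

  FRAME_BYTES = 1500    # assumed MTU-sized Ethernet frames
  BUFFER_FRAMES = 64    # per-port buffer depth, per Steinar's message

  def frames_dropped(k_servers, response_bytes):
      """Worst case: all K responses hit the output port in one burst.

      The port can queue BUFFER_FRAMES; anything beyond that in a single
      synchronized burst is dropped. Draining during the burst is ignored,
      so this is pessimistic, but the cliff is real.
      """
      frames_per_response = -(-response_bytes // FRAME_BYTES)  # ceiling division
      offered = k_servers * frames_per_response
      return max(0, offered - BUFFER_FRAMES)

  # Assumed 32 KB responses: fine at K=2, heavy loss from K=4 on.
  for k in (2, 4, 8, 16, 32):
      print(k, "servers:", frames_dropped(k, response_bytes=32_000), "frames dropped")

Note the all-or-nothing behaviour: below the buffer limit nothing is lost, and one step past it a large fraction of every node's response is dropped at once, which is exactly the synchronized-loss "resonance" that makes TCP's recovery so ugly here.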