[Bloat] Network test tools for many parallel/concurrent connections?

Tue May 14 14:13:46 EDT 2013

There are really three kinds of killer traffic here, and it's important to
understand the differences so as to best design testing:

   1) long lived flows that clobber you and ruin your whole day.

   2) "streaming" video traffic (e.g. netflix, youtube, hulu), which is
actually "chunking" data over TCP and putting periodic latency into your
connection as it temporarily builds some queue.

fq_codel can deal really, really well with both 1 and 2.  But the number of
flows is usually not very large.

   3) the DoS attack of visiting a new sharded web page on your
broadband/wireless connection, where you get the multiplication of N
connections * TCP Initial Window size, sometimes resulting in pulses on the
order of a hundred packets across a ton of new flows.  I've measured
transient latency on the order of hundreds of milliseconds on a 50Mbps
cable system! These web sites generate a bunch of flows effectively
simultaneously, each often with only a few packets, so they never even do
slow start to speak of.
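To put rough numbers on case 3, here is a back-of-envelope sketch in Python.
The function name, the 1500-byte packet size and the initial window of 10
packets are my assumptions for illustration, not measurements; the 50Mbps
link rate is the cable system mentioned above.  It only computes how long
the bottleneck needs to drain one burst if every connection delivers a full
initial window at once, which is a worst case given how small many of the
objects are.

```python
# Back-of-envelope: how long does a burst of initial windows occupy the
# bottleneck link?  All inputs are illustrative assumptions.

def burst_delay_ms(connections, initial_window_pkts, pkt_bytes, link_bps):
    """Time in milliseconds to drain the whole burst through the link."""
    burst_bits = connections * initial_window_pkts * pkt_bytes * 8
    return 1000.0 * burst_bits / link_bps


if __name__ == "__main__":
    link_bps = 50_000_000      # 50Mbps bottleneck
    iw_pkts = 10               # assumed initial window, in packets
    pkt_bytes = 1500           # assumed full-size packets
    for conns in (10, 30, 60):
        delay = burst_delay_ms(conns, iw_pkts, pkt_bytes, link_bps)
        print(f"{conns:3d} connections -> {conns * iw_pkts:4d} packets, "
              f"about {delay:.0f} ms to drain")
```

Under these assumptions a hundred-packet pulse costs a few tens of
milliseconds, and it takes several dozen simultaneous connections to reach
the hundreds of milliseconds range.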

Exactly what damage 3 does under fq_codel's algorithm isn't entirely clear
to me.  Many/most images on such sharded web sites are quite small, even
less than one packet at times.

fq_codel is clearly radically better than nothing at handling 3, but I
suspect we still have work to do...  SPDY will help if/when fully deployed,
but the ability to game buffers remains, and will continue to give
anti-social applications an incentive to misbehave.  We're really far from
done, but as Matt Mathis notes, what we have now in fq_codel is soooo,
sooooo much better than the current disaster, we shouldn't wait to deploy
something 'better' while working out problems like that.

I've thought for a while that exactly how we want to define a "flow" may
depend on where we are in the network: what's appropriate for an ISP is
different than what we do in the home, for example.

How best to test for the problems these generate, at various points in the
network, is still a somewhat open question.  And ensuring everything works
well at scale is extremely important.
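For the "did every connection get a fair share" part of Jesper's question
(quoted below), one common summary number is Jain's fairness index.  Nobody
in this thread proposed it, so treat it as my assumption about a reasonable
metric; the function name and the sample throughputs are hypothetical.  The
index is 1.0 when all flows get the same throughput and falls toward 1/n
when a single flow takes everything.

```python
# Jain's fairness index over per-flow throughputs (any consistent unit).
# The sample numbers below are made up for illustration.

def jain_index(throughputs):
    """Return (sum x)^2 / (n * sum x^2), or 0.0 for empty/all-zero input."""
    n = len(throughputs)
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if n == 0 or squares == 0:
        return 0.0
    return (total * total) / (n * squares)


if __name__ == "__main__":
    even = [1.0] * 1024                  # every flow gets the same share
    hog = [1024.0] + [0.0] * 1023        # one flow takes it all
    print("even split:", jain_index(even))
    print("one hog:   ", jain_index(hog))
```

A single number like this hides which flows lost out, so it is a complement
to looking at per-flow traces, not a replacement.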

I'm glad Jesper is doing scaling tests!
                                        - Jim

On Tue, May 14, 2013 at 1:01 PM, Dave Taht <dave.taht at gmail.com> wrote:

>
> On May 14, 2013 12:21 PM, "Stephen Hemminger" <stephen at networkplumber.org>
> wrote:
> >
> > On Tue, 14 May 2013 15:48:38 +0200
> > Jesper Dangaard Brouer <jbrouer at redhat.com> wrote:
> >
> > >
> > > (I'm testing fq_codel and codel)
> > >
> > > I need a test tool that can start many TCP streams (>1024).
> > > During/after the testrun I want to know if the connections got a fair
> > > share of the bandwidth.
> > >
> > > Can anyone recommend tools for this?
> > >
> > > After the test I would also like to "deep-dive" analyse one of the TCP
> > > streams to see how the congestion window, outstanding-win/data is
> > > behaving.  Back in 2005 I used a tool called
> > > "tcptrace" (http://www.tcptrace.org).
> > > Have any better tools surfaced?
> > >
> >
> >
> > You may want to look at some of the "realistic" load tools since
> > in real life not all flows are 100% of bandwidth and long lived.
>
> You may want to look at some realistic load tools since in real life
> 99.9Xx% of all flows are 100% of bandwidth AND long lived.
>
> At various small timescales a flow or flows can be 100% of bandwidth.
>
> But it still takes one full rate flow to mess up your whole day.
>
> This is why I suggested ab.
>
> Here bandwidth is an average usually taken over a second and often much
> more. If you sample at a higher resolution, like a ms, you are either at
> capacity or empty.
>
> Another way of thinking about it is for example, mrtg takes samples every
> 30 seconds and the most detailed presentation of that data it gives you is
> on a 5 minute interval. The biggest fq codel site I have almost never shows
> a 5 minute average over 60% of capacity, but I know full well that Netflix
> users are clobbering things on a 10 sec interval and that there are
> frequent peaks where it is running at capacity for a few seconds at a time
> from looking at the data on a much finer interval and the fq codel drop
> statistics.
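
The averaging point above is easy to reproduce with synthetic data.  This
is only a sketch: the on/off pattern (link at capacity for the first 3
seconds of every 10) and the helper name are invented for illustration and
are not taken from any real site.  Each per-millisecond sample is either 1
(at capacity) or 0 (empty), and the same samples are then averaged over
larger and larger windows.

```python
# Show how coarse averaging hides short periods at full capacity.
# The traffic pattern is synthetic and purely illustrative.

def window_averages(samples, window):
    """Average consecutive non-overlapping windows of `window` samples."""
    return [sum(samples[i:i + window]) / window
            for i in range(0, len(samples) - window + 1, window)]


# 5 minutes of per-millisecond samples: busy (1) for the first 3 s of
# every 10 s, otherwise idle (0).
samples = [1 if (t % 10_000) < 3_000 else 0 for t in range(300_000)]

if __name__ == "__main__":
    for label, window in (("1 ms", 1), ("1 s", 1_000),
                          ("10 s", 10_000), ("5 min", 300_000)):
        peak = max(window_averages(samples, window))
        print(f"{label:>5} windows: peak utilisation {peak:.0%}")
```

With this pattern the 1 ms and 1 s views both show the link pinned at
capacity, while the 10 s and 5 minute averages never rise above 30%.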
>
> > _______________________________________________
> > Bloat mailing list
> > Bloat at lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/bloat