From: Jim Gettys <gettysjim@gmail.com>
To: Dave Taht
Cc: codel@lists.bufferbloat.net, Jesper Dangaard Brouer, bloat
Date: Tue, 14 May 2013 14:13:46 -0400
Subject: Re: [Codel] [Bloat] Network test tools for many parallel/concurrent connections?
There are really three kinds of killer traffic here, and it's important to
understand the differences so as to best design testing:

   1) Long-lived flows that clobber you and ruin your whole day.

   2) "Streaming" video traffic (e.g. Netflix, YouTube, Hulu), which is
actually "chunking" data over TCP, putting periodic latency into your
connection as it temporarily builds some queue.

fq_codel can deal really, really well with both 1 and 2. But the number of
flows is usually not very large.

   3) The DOS attack of visiting a new sharded web page on your
broadband/wireless connection, where you get the multiplication of
N connections * TCP Initial Window size, sometimes resulting in pulses on
the order of a hundred packets across a ton of new flows. I've measured
transient latency on the order of hundreds of milliseconds on a 50 Mbps
cable system! These web sites generate a bunch of flows effectively
simultaneously, each often with only a few packets, so they never even do
slow start to speak of.

Exactly what damage case 3 does under fq_codel's algorithm isn't entirely
clear to me. Many/most images on such sharded web sites are quite small, at
times even less than one packet.

fq_codel is clearly radically better than nothing at handling 3, but I
suspect we still have work to do... SPDY will help if/when fully deployed,
but the ability to game buffers remains, and will continue to give
anti-social applications an incentive to misbehave. We're really far from
done, but as Matt Mathis notes, what we have now in fq_codel is soooo,
sooooo much better than the current disaster that we shouldn't wait to
deploy something 'better' while working out problems like these.
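The "N connections * TCP Initial Window" multiplication in point 3 can be sketched with a back-of-envelope calculation. This is a minimal sketch with illustrative numbers (100 connections, IW10, 1500-byte packets), not measurements from the thread:

```python
# Transient queue delay from a sharded page load: N simultaneous
# connections each bursting a full TCP Initial Window into one
# bottleneck link before any congestion feedback arrives.

def burst_delay_ms(n_conns, init_window_pkts, pkt_bytes, link_mbps):
    """Time (ms) to drain a burst of n_conns * IW packets at link rate."""
    burst_bits = n_conns * init_window_pkts * pkt_bytes * 8
    return burst_bits / (link_mbps * 1e6) * 1e3

# 100 sharded connections, IW10, 1500-byte packets, 50 Mbit/s cable link:
print(burst_delay_ms(100, 10, 1500, 50))  # 240.0 ms
```

With those assumed numbers the burst alone accounts for 240 ms of transient delay, which is consistent with the "hundreds of milliseconds on a 50 Mbps cable system" observation above.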
I've thought for a while that exactly how we want to define a "flow" may
depend on where we are in the network: what's appropriate for an ISP is
different from what we do in the home, for example.

How best to test for the problems these generate, at various points in the
network, is still a somewhat open question. And ensuring everything works
well at scale is extremely important.

I'm glad Jesper is doing scaling tests!
                                       - Jim


On Tue, May 14, 2013 at 1:01 PM, Dave Taht wrote:
>
> On May 14, 2013 12:21 PM, "Stephen Hemminger" wrote:
> >
> > On Tue, 14 May 2013 15:48:38 +0200
> > Jesper Dangaard Brouer wrote:
> > >
> > > (I'm testing fq_codel and codel.)
> > >
> > > I need a test tool that can start many TCP streams (>1024).
> > > During/after the test run I want to know if the connections got a fair
> > > share of the bandwidth.
> > >
> > > Can anyone recommend tools for this?
> > >
> > > After the test I would also like to "deep-dive" analyse one of the TCP
> > > streams to see how the congestion window and outstanding window/data
> > > are behaving. Back in 2005 I used to use a tool called
> > > "tcptrace" (http://www.tcptrace.org).
> > > Have any better tools surfaced?
> >
> > You may want to look at some of the "realistic" load tools, since
> > in real life not all flows are 100% of bandwidth and long lived.
>
> You may want to look at some realistic load tools, since in real life
> 99.9Xx% of all flows are 100% of bandwidth AND long lived.
>
> At various small timescales a flow or flows can be 100% of bandwidth.
>
> But it still takes only one full-rate flow to mess up your whole day.
>
> This is why I suggested ab.
>
> Here bandwidth is an average, usually taken over a second and often much
> more. If you sample at a higher resolution, like a ms, you are either at
> capacity or empty.
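Jesper's "did the connections get a fair share" question above has a standard quantitative answer: Jain's fairness index over per-flow throughputs. A minimal sketch (the throughput numbers are made-up examples, not measurements):

```python
# Jain's fairness index: 1.0 means all flows got equal shares;
# 1/n means a single flow got everything.

def jain_index(throughputs):
    """Fairness of a list of per-flow throughputs (any consistent unit)."""
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))

print(jain_index([10.0, 10.0, 10.0, 10.0]))  # 1.0  (perfectly fair)
print(jain_index([40.0, 0.0, 0.0, 0.0]))     # 0.25 (one flow starved the rest)
```

Feeding this the per-stream byte counts from a >1024-flow test run would give a single number to track across fq_codel configurations.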
> Another way of thinking about it: for example, mrtg takes samples every
> 30 seconds, and the most detailed presentation of that data it gives you
> is on a 5-minute interval. The biggest fq_codel site I have almost never
> shows a 5-minute average over 60% of capacity, but I know full well that
> Netflix users are clobbering things on a 10-second interval, and that
> there are frequent peaks where it is running at capacity for a few seconds
> at a time, from looking at the data on a much finer interval and the
> fq_codel drop statistics.

_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
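Dave's point about sampling resolution can be sketched numerically: a link that is pinned at capacity for a few seconds at a time looks lightly loaded once averaged over a 5-minute interval. The traffic pattern below is synthetic, not real mrtg data:

```python
# A 5-minute (300 x 1-second) window: 3 seconds of every 30 are
# saturated bursts, the rest near-idle.

link_capacity = 50.0  # Mbit/s, an assumed cable link

samples = [link_capacity if s % 30 < 3 else 2.0 for s in range(300)]

five_min_avg = sum(samples) / len(samples)
peak = max(samples)
print(f"5-min average: {five_min_avg:.1f} Mbit/s, 1-s peak: {peak:.1f} Mbit/s")
```

The 5-minute average (6.8 Mbit/s, well under 60% of capacity) completely hides the fact that the link repeatedly hits 100% at a 1-second resolution, which is exactly when queues build and latency appears.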