From: Dave Taht
To: Török Edwin
Cc: cerowrt-devel@lists.bufferbloat.net, bloat
Date: Sat, 18 Aug 2012 10:07:45 -0700
Subject: Re: [Bloat] [Cerowrt-devel] cerowrt 3.3.8-17: nice latency improvements, some issues with bind

Thx again for the benchmarks on your hardware! Can I get you to go
one more time to the well? There's a subtle point to be made here,
which basically involves the difference between testing under lab
conditions and in the real world.

On Sat, Aug 18, 2012 at 2:38 AM, Török Edwin wrote:
> Baseline (only ping, no other traffic):               0.806/ 1.323/  8.753/ 1.333 ms
> no fq_codel on laptop, cerowrt defaults,  nttcp -t: 1.192/16.605/107.351/25.265 ms;  94 Mbps
> no fq_codel on laptop, cerowrt qlen_*=4,  nttcp -t: 1.285/25.108/105.519/22.607 ms; 107 Mbps
> no fq_codel on laptop, cerowrt qlen_*=12, nttcp -t: 2.195/24.277/131.490/21.161 ms; 127 Mbps

Stripping out some of the incremental steps will save you time on
benchmarking, so let's go with 3, 4, 12, 35, 100.

Wireless data is incredibly noisy, and I usually end up coping with
that by using CDF plots like this old one

http://www.teklibre.com/~d/bloat/hoqvssfqred.ps

built from tons and tons of voip-like pings

http://www.teklibre.com/~d/bloat/ping_log.ps   (also old)

but moving forward, we can do some stuff with this, so see below..

(To explain the first plot: sfqred was the predecessor to fq_codel,
and that plot showed a distinct advantage to optimizing for new
streams, an idea which ended up (more elegantly) in fq_codel. The
second plot shows the effect of a small bandwidth change on latency
when the underlying buffering was large. Yes, I need to get around to
newer plots, but we still have some analysis and optimization to do
on the underlying codel algorithm.)
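If you want to collect the same kind of data those plots came from,
a rough sketch of the measurement loop is below; the 172.30.42.1
address and the two-minute window are placeholders for whatever host
and duration you're actually testing against, not something from
your setup:

  # log voip-like pings for the whole run, in the background
  ping -i 0.2 -w 120 172.30.42.1 | tee ping_log.txt &

  # saturate the link while the pings run (repeat or lengthen as needed)
  nttcp -t 172.30.42.1

  # strip out just the RTTs afterwards; a sorted column is all a CDF plot needs
  grep -o 'time=[0-9.]*' ping_log.txt | cut -d= -f2 | sort -n > rtt_sorted.txt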
> no fq_codel on laptop, cerowrt defaults,   nttcp -r: 1.332/ 3.129/ 41.900/ 5.221 ms;  39 Mbps
> no fq_codel on laptop, cerowrt qlen_*=4,   nttcp -r: 1.514/ 3.205/  8.595/ 1.817 ms;  46 Mbps
> no fq_codel on laptop, cerowrt qlen_*=12,  nttcp -r: 2.025/ 5.173/ 16.890/ 3.763 ms;  81 Mbps
> no fq_codel on laptop, cerowrt qlen_*=35,  nttcp -r: 2.893/ 7.895/130.859/17.621 ms; 119 Mbps
> no fq_codel on laptop, cerowrt qlen_*=50,  nttcp -r: 0.951/ 7.810/ 47.646/ 6.428 ms; 131 Mbps
> no fq_codel on laptop, cerowrt qlen_*=100, nttcp -r: 5.149/ 8.766/ 14.371/ 2.191 ms; 128 Mbps
>
> To get twice the speed a qlen=11 is enough already, and to get all the speed back a qlen=35 is needed.

This is an incomplete conclusion. It is incomplete in that A) these
tests were done under laboratory conditions at the highest data rate
(MCS15); B) it was a single point-to-point link to an AP, which would
normally be handling more than one client; and C) it tests a single
full-throttle TCP stream, when typical websites and usage involve 70+
DNS lookups and 70 separate short streams.

I can live with B) and C) for now, although I'll note that running
the chrome benchmark while a full-blown stream test like yours runs
in the background, plus ping, is quite useful for looking at C).
Let's tackle A)...

> And here are the results with fq_codel on the laptop too (just nttcp -t as that's the one affected):
>
> fq_codel on laptop, cerowrt defaults,  nttcp -t:  1.248/12.960/108.490/16.733 ms;  90 Mbps
> fq_codel on laptop, cerowrt qlen_*=4,  nttcp -t:  1.205/10.843/ 76.983/12.460 ms; 105 Mbps
> fq_codel on laptop, cerowrt qlen_*=8,  nttcp -t:  4.034/16.088/ 98.611/17.050 ms; 120 Mbps
> fq_codel on laptop, cerowrt qlen_*=11, nttcp -t:  3.766/15.687/ 56.684/11.135 ms; 114 Mbps
> fq_codel on laptop, cerowrt qlen_*=35, nttcp -t: 11.360/26.742/ 48.051/ 7.489 ms; 113 Mbps

So, could you move your laptop to where it gets MCS4 on a fairly
reliable basis and repeat the tests? A wall or three will do it.

I will predict several things:

1) The bulk of the buffering problem is going to move to your laptop,
as it has weaker antennas than the wndrs. Most likely you will end up
with tx on the one side higher than rx on the other.

2) You will see much higher jitter and latency and much lower
throughput, and your results will get wildly more variable run to
run. (I tend to run tests for 2 minutes or longer and toss out the
first few seconds.)

3) The lower fixed buffering sizes in cero's qlens will start making
a lot more sense, but that may be hard to see because of 1) and 2).

The thing I honestly don't know is how well fq_codel reacts to sudden
bandwidth changes when the underlying device driver (the iwl in this
case) is overbuffered, or how well codel's "target" idea really works
in the wifi case in general. It would be nice to have some data on
it. (hint, hint) Some work was done on debloating the iwl last year;
I don't know if any of it made it into mainline.

Lastly, I put a version of Linux 3.6-rc2 up here:

http://snapon.lab.bufferbloat.net/~cero1/deb/

It has a fix to codel that was needed (I think, but I have not
checked whether it's already in 3.5.1), and it also incorporates
"TCP small queues", which reduces tcp-related buffering in pfifo_fast
enormously and helps on other qdiscs as well. Switching to it will
invalidate the testing you've done so far... (Another reason why I'm
reluctant to post graphs on codel/fq_codel right now is that good
stuff keeps happening above and below it in Linux.) So please don't
change your kernel out before trying that MCS4 test... and I make no
warranties about the reliability/usefulness of an rc2!
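For whoever repeats these runs: the laptop side of the experiment is
only a couple of commands. A rough sketch follows; wlan0 and the
queue length are placeholders, and treating cero's qlen_* knobs as
roughly equivalent to a plain interface txqueuelen is my assumption,
not something established in this thread:

  # put fq_codel on the laptop's wireless interface in place of pfifo_fast
  tc qdisc replace dev wlan0 root fq_codel

  # shrink the interface tx queue, analogous to the qlen_* settings on the AP
  ip link set dev wlan0 txqueuelen 12

  # confirm what's actually in effect, and watch drop/mark counts during a run
  tc -s qdisc show dev wlan0
  ip link show dev wlan0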
> Shouldn't wireless N be able to do 200 - 300 Mbps though? If I enable debugging in iwl4965 I see that it
> starts TX aggregation, so I'm not sure what's wrong (router or laptop?). With encryption off I can get at most 160 Mbps.

A UDP test will usually get you into the 270 Mbit range.

> iw dev sw10 station dump shows:
> ...
>         signal:         -56 [-60, -59] dBm
>         signal avg:     -125 [-65, -58] dBm
>         tx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
>         rx bitrate:     300.0 MBit/s MCS 15 40MHz short GI
>
> On laptop:
>         tx bitrate:     300.0 Mbit/s MCS 15 40MHz short GI

In non-lab conditions you generally don't lock into a single rate.
The minstrel algorithm tries various strategies to get the packets
through, so you can get a grip on what's really happening by looking
at the rc_stats file for your particular device. Example here:

http://www.bufferbloat.net/projects/cerowrt/wiki/Minstrel_Wireless_Rate_Selection
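For anyone who hasn't dug those out before: on a mac80211 box the
per-rate minstrel statistics live in debugfs. A rough sketch (phy0
and wlan0 are placeholders, the exact path varies by kernel and
driver, and the iwl4965 may keep its own rate-scaling stats
elsewhere):

  # debugfs has to be mounted
  mount -t debugfs none /sys/kernel/debug 2>/dev/null

  # find the associated station(s)
  iw dev wlan0 station dump | grep Station

  # per-rate success/throughput table kept by minstrel / minstrel_ht
  cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/*/rc_stats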