From: Dave Taht
To: Tom Herbert
Cc: bloat
Date: Sat, 19 Nov 2011 23:47:38 +0100
Subject: Re: [Bloat] some (very good) preliminary results from fiddling with byte queue limits on 100Mbit ethernet

On Sat, Nov 19, 2011 at 10:53 PM, Tom Herbert wrote:
> Thanks for trying this out Dave!

I note that there was MAJOR churn in the 3.2 directory layouts; if you
could rebase that patchset on 3.2, it would be good.
>> With byte queue limits at mtu*3 + the SFQ qdisc, latency under load
>> can be hammered down below 6ms when running at a 100Mbit line rate.
>> No CBQ needed.
>
> I'm hoping that we didn't have to set the BQL max_limit. I would
> guess that this might indicate some periodic spikes in interrupt
> latency (BQL will increase the limit aggressively in that case). You
> might want to try adjusting the hold_time to a lower value. Also,
> disabling TSO might lower the limit.

You will find it helpful in debugging (and the results more pleasing)
to artificially lower your line rate to 100Mbit as per the ethtool
trick noted in the prior email. This also disables TSO, at least on
the e1000e.

> Without lowering the max_limit, what values do you see for limit and
> inflight? If you set min_limit to a really big number (effectively
> turning off BQL), what does inflight grow to?

It is very late in Paris right now. I'll apply your suggestions in the
morning.

>> Anyway, the script could use improvement, and I'm busily patching
>> BQL into the ag71xx driver as I write.

I wish I could make QFQ work without a CBQ. So far, no luck. It should
be better than SFQ with the right classifier, and SFQ might be better
with a different classifier... finally, to have options higher in the
stack!

> Cool, I look forward to those results!

>> Sorry it's taken me so long to get to this since your bufferbloat
>> talks at Linux Plumbers. APPLAUSE.
>> It's looking like BQL + SFQ is an effective means of improving
>> fairness and reducing latency on drivers that can support it, even
>> if they have large tx rings that the hardware demands.

> Great. I actually got back to looking at this a little last week.
> AFAICT the overhead of BQL is < 1% CPU and throughput (still need
> more testing to verify that).
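For anyone wanting to reproduce this, the knobs discussed above look
roughly like the following. The interface name, tx queue number, and
MTU are assumptions on my part; the BQL sysfs entries are per tx
queue:

```shell
# Force the link down to 100Mbit (the "ethtool trick"); on the e1000e
# this also ends up disabling TSO, but it can be turned off explicitly.
ethtool -s eth0 speed 100 duplex full
ethtool -K eth0 tso off

# Cap BQL at roughly mtu*3 (1500 * 3 = 4500 bytes) on tx queue 0.
echo 4500 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max

# Watch what BQL is actually doing: the current limit, and the bytes
# currently in flight to the NIC.
cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/inflight

# And put SFQ on top as the root qdisc.
tc qdisc add dev eth0 root sfq perturb 10
```

(hold_time and limit_min live in the same byte_queue_limits directory,
for the experiments Tom suggests above.)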
Seeing it work well at 100Mbit (which much of the world still runs at
- notably, most ADSL and cable modems run at that rate or less, as do
all 3 of my laptops) *really* made my night. I've been fighting a
losing battle with the wireless stack architecture of late...

You don't get a factor of ~50 improvement in something every day at
nearly zero cost! I mean, with BQL, a saturated 100Mbit system will
start new tcp connections ~50x faster, do local dns lookups in roughly
22 ms rather than 140, and so on, and so on.

At the moment I don't care if it eats 10% of CPU! - so long as it
saves the most important component of the network - the user - time.
:) (thus my interest in QFQ now) (particularly as I assume your < 1%
of cpu is for gigE speeds?)

And being self-clocked, BQL can handle scary hardware things like
pause frames better, too. A win all the way across the board.

Effectively tying the driver to the line rate, as BQL seems to do,
moves the need for more intelligence in queue management back up into
the qdisc layer. I recently learned that with multi-cores it's
actually possible to have more than one packet in the qdisc even at
gigE speeds, so a better qdisc up there may help even at that speed,
assuming BQL scales up right.

> There are some (very) minor performance improvements that might be
> possible, but I don't have any major modifications pending at this
> point.

My major thought is that bytes on the wire is a proxy for 'time'. If
you did a smoothed EWMA of bytes per time interval, you might be able
to hold latencies down even better, and still use a lightweight
timesource like jiffies for the calculation.

All the same, the BQL API is wonderfully clean, and you can fiddle as
much as you want with the core algorithm without exposing the actual
scheme elsewhere in the stack. My hat is off to you. I HATED the
artificially low tx queue rings I was using in cerowrt...
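To make the EWMA idea concrete, here is a toy sketch - not kernel
code; the sample values and the 1/8 gain are entirely my assumptions -
of smoothing a bytes-per-interval estimate, in the shift-friendly form
an in-kernel version could use with jiffies-based intervals:

```shell
# Toy EWMA over bytes-sent-per-interval samples, gain 1/8:
#   est <- est - est/8 + sample/8
echo "800 1600 1200 900" | awk '{
  est = 0
  for (i = 1; i <= NF; i++)
    est = est - est / 8 + $i / 8
  printf "%.0f\n", est
}'
# prints 464
```

The attraction is that old intervals decay away geometrically, so a
single burst doesn't permanently inflate the limit.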
-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net