Subject: Re: [RFC] mac80211: implement eBDP algorithm to fight bufferbloat
From: Nathaniel Smith
To: "John W. Linville"
Cc: bloat-devel@lists.bufferbloat.net, johannes@sipsolutions.net, linux-wireless@vger.kernel.org
Date: Wed, 16 Feb 2011 20:26:44 -0800

On Wed, Feb 16, 2011 at 5:49 PM, John W. Linville wrote:
> I'm sure this isn't ideal.  This should be combined with the ALT
> algorithm to yield the A* algorithm.  There are parameters that
> probably should be tunable (or at least better researched).  This may
> not be ideal for 802.11n -- it may even be detrimental to it.

Excellent!

General thoughts: I think you're wiping out any interrupt mitigation
any drivers are doing, because they'll never get filled up to even
their low water mark?
(http://article.gmane.org/gmane.linux.kernel.wireless.general/64843
has a scheme for adapting the eBDP idea to interrupt mitigation.)

It's important to keep in mind the distinction between:
 -- a host's total tx buffer
 -- the individual queues that make up that buffer

In Linux we have two queues in series: the net subsystem's Qdisc,
which feeds the driver's tx queue. It's this distinction that makes it
reasonable to shrink the tx queue down to really tiny sizes (a few
ms), because while a router needs a few hundred milliseconds (~one
RTT) of *total* buffering to absorb bursty packet arrivals, we want as
much of that buffering as possible to happen in the Qdisc, where AQM
and QoS can be applied. A* is an algorithm for estimating the right
total host buffer size.

(Of course, we might want to just disable the Qdisc buffering and move
everything inside the driver -- Felix Fietkau is considering this [1],
because when aggregating you really want a separate buffer per
destination STA, and there's no easy way to do that with the current
system -- but obviously that raises its own issues...)

[1] https://lists.bufferbloat.net/pipermail/bloat-devel/2011-February/000013.html
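To make the "keep the buffering in the Qdisc" point concrete, here's
the rough shape I have in mind (untested sketch; the "enqueued"
counter and the example_* names are made up, and the limit would come
from the service-time estimate, not a constant):

	#include <net/mac80211.h>

	/* hypothetical per-interface state: how many frames are sitting
	 * in the small driver-side queue right now */
	struct example_sdata {
		atomic_t enqueued;
	};

	/* tx path: count the frame, and once the small driver-side queue
	 * is full, stop the mac80211 queue so further packets back up in
	 * the Qdisc, where AQM and QoS can act on them */
	static void example_tx_admit(struct ieee80211_hw *hw,
				     struct example_sdata *sdata,
				     struct sk_buff *skb, int max_enqueued)
	{
		if (atomic_inc_return(&sdata->enqueued) >= max_enqueued)
			ieee80211_stop_queue(hw, skb_get_queue_mapping(skb));
	}

	/* tx-status path: the frame has been serviced, so make room and
	 * wake the queue again */
	static void example_tx_complete(struct ieee80211_hw *hw,
					struct example_sdata *sdata,
					struct sk_buff *skb, int max_enqueued)
	{
		if (atomic_dec_return(&sdata->enqueued) < max_enqueued)
			ieee80211_wake_queue(hw, skb_get_queue_mapping(skb));
	}

The important part is that nothing gets dropped here; the flow control
just pushes the standing queue back up into the Qdisc layer.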
> @@ -212,6 +216,11 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
>                 if (memcmp(hdr->addr2, sta->sdata->vif.addr, ETH_ALEN))
>                         continue;
>
> +               atomic_dec(&sta->sdata->enqueued);
> +
> +               /* factor current tserv into service time estimate */
> +               ewma_add(&sta->sdata->tserv_ns_avg, ktime_to_ns(tserv));
> +

I think you're calculating the total time that the packet was resident
in the queue, and treating it like the time to service a single
packet. In my patch I also stored the current queue length at the time
the packet was enqueued, and then divided the time delta by the number
of packets that were serviced in that time.

> @@ -1323,6 +1325,20 @@ static int __ieee80211_tx(struct ieee80211_local *local,
>
>                 sdata = vif_to_sdata(info->control.vif);
>
> +               /* test for queue admission qualifications */
> +               tserv_ns_avg = ewma_read(&sdata->tserv_ns_avg);
> +               /* constants 2 msec and offset 5 should be tunable? */
> +               max_enqueued = 2 * NSEC_PER_MSEC / tserv_ns_avg + 5;

5 packets worth of fudge factor seems high. I can measure 15-20 ms
single-packet service times here just by turning down to 1 Mbps on an
uncontended network; the Li et al. paper you link has a graph
suggesting that on contended networks, 50-100 ms/packet is not
uncommon. (And even if I don't force my card to 1 Mbps, with my patch
I'm still seeing appropriate buffer sizes in the 1-2 packet range.) So
you may be unconditionally adding a few hundred milliseconds of
latency here.

They make a good point that you might want some extra space to absorb
short-term fluctuations in packet service times, but I think it'd make
more sense to do that by clamping the queue length to some minimum
value (2 packets?), and possibly bumping up the magic "2 ms" constant.
Or be even more clever and estimate the standard deviation of the
single-packet service times...
(http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm)

> +               if (atomic_read(&sdata->enqueued) > max_enqueued) {
> +                       /* silently drop */
> +                       dev_kfree_skb(skb);
> +                       return IEEE80211_TX_OK;
> +               }

Shouldn't you be stopping the queue, too? I think by leaving the queue
open and discarding everything, you effectively disable the net
layer's Qdisc buffering, which might also be a factor in your
observations of reduced bufferbloat :-).

-- Nathaniel
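P.S. In case it's useful, here's the shape of that on-line variance
calculation (plain userspace C just to illustrate -- a kernel version
would need fixed-point arithmetic, and all the names here are made
up):

	#include <math.h>

	/* Welford's on-line algorithm: running mean and variance of the
	 * single-packet service times, updated one sample at a time */
	struct tserv_stats {
		unsigned long n;   /* samples seen so far */
		double mean;       /* running mean (ns) */
		double m2;         /* sum of squared deviations from the mean */
	};

	static void tserv_stats_add(struct tserv_stats *s, double tserv_ns)
	{
		double delta = tserv_ns - s->mean;

		s->n++;
		s->mean += delta / s->n;
		s->m2 += delta * (tserv_ns - s->mean);
	}

	static double tserv_stats_stddev(const struct tserv_stats *s)
	{
		return s->n > 1 ? sqrt(s->m2 / (s->n - 1)) : 0.0;
	}

Then instead of a fixed "+ 5", the fudge factor could be something
like a couple of standard deviations' worth of packets on top of the
2 ms target.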