From: Dave Taht
Date: Tue, 12 Jul 2016 16:02:15 +0200
Subject: Re: [Make-wifi-fast] TCP performance regression in mac80211 triggered by the fq code
To: Felix Fietkau
Cc: make-wifi-fast@lists.bufferbloat.net, linux-wireless, Michal Kazior, Toke Høiland-Jørgensen
In-Reply-To: <097af8e4-5393-8e1b-1748-36233e605867@nbd.name>
References: <11fa6d16-21e2-2169-8d18-940f6dc11dca@nbd.name> <097af8e4-5393-8e1b-1748-36233e605867@nbd.name>

On Tue, Jul 12, 2016 at 3:21 PM, Felix Fietkau wrote:
> On 2016-07-12 14:13, Dave Taht wrote:
>> On Tue, Jul 12, 2016 at 12:09 PM, Felix Fietkau wrote:
>>> Hi,
>>>
>>> With Toke's ath9k txq patch I've noticed a pretty nasty performance
>>> regression when running local iperf on an AP (running the txq stuff) to
>>> a wireless client.
>>
>> Your kernel? cpu architecture?
> QCA9558, 720 MHz, running Linux 4.4.14
>
>> What happens when going through the AP to a server from the wireless client?
> Will test that next.
>
>> Which direction?
> AP->STA, iperf running on the AP. Client is a regular MacBook Pro
> (Broadcom).

There are always 2 wifi chips in play. Like the Sith.

>>> Here's some things that I found:
>>> - when I use only one TCP stream I get around 90-110 Mbit/s
>>
>> with how much cpu left over?
> ~20%
>
>>> - when running multiple TCP streams, I get only 35-40 Mbit/s total
>> with how much cpu left over?
> ~30%

Hmm. Care to try netperf?

>> context switch difference between the two tests?
> What's the easiest way to track that?

If you have GNU "time":

  time -v the_process

or:

  perf record -e context-switches -ag

or: process /proc/$PID/status for the ctxt_switches counters.

>> tcp_limit_output_bytes is?
> 262144

I keep hoping to be able to reduce this to something saner like 4096 one
day. It got bumped to 64k based on bad wifi performance once, and then to
its current size to make the Xen folk happier.

The other param I'd like to see fiddled with is tcp_notsent_lowat. In both
cases reductions will increase your context switches but reduce memory
pressure and lead to a more reactive TCP. And in neither case do I think
this is the real cause of this problem.

>> got perf?
> Need to make a new build for that.
>
>>> - fairness between TCP streams looks completely fine
>>
>> A codel will get to long term fairness pretty fast. Packet captures
>> from a fq will show much more regular interleaving of packets,
>> regardless.
>>
>>> - there's no big queue buildup, the code never actually drops any packets
>>
>> A "trick" I have been using to observe codel behavior has been to
>> enable ecn on server and client, then checking in wireshark for ect(3)
>> marked packets.
> I verified this with printk. The same issue already appears if I have
> just the fq patch (with the codel patch reverted).

OK. A four flow test "should" trigger codel....

Running out of cpu (or hitting some other bottleneck), without
loss/marking, "should" result in a tcptrace -G and xplot.org of the
packet capture showing the window continuing to increase....

>>> - if I put a hack in the fq code to force the hash to a constant value
>>
>> You could also set "flows" to 1 to keep the hash being generated, but
>> not actually use it.
>>
>>> (effectively disabling fq without disabling codel), the problem
>>> disappears and even multiple streams get proper performance.
>>
>> Meaning you get 90-110 Mbit/s?
> Right.
>
>> Do you have a "before toke" figure for this platform?
> It's quite similar.
>
>>> Please let me know if you have any ideas.
>>
>> I am in berlin, packing hardware...
> Nice!
>
> - Felix

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
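[Editor's sketch] The three ways of counting context switches suggested
mid-thread (GNU time, perf, /proc/$PID/status) can be illustrated in shell.
The field names below are the real ones Linux exposes in /proc/$PID/status;
the sample values are made up, and the parsing is demonstrated on canned
text via a here-doc so the snippet runs without a live target process:

```shell
#!/bin/sh
# Ways to measure context switches, as suggested in the thread:
#   /usr/bin/time -v the_process          # reports voluntary/involuntary switches
#   perf record -e context-switches -ag   # samples switches system-wide
#   grep ctxt_switches /proc/$PID/status  # cumulative per-process counters

# Extract the two counters from /proc-style status text.
parse_ctxt() {
  grep 'ctxt_switches' | awk '{print $1, $2}'
}

# Canned sample of what /proc/$PID/status contains (values are invented).
sample_status() {
  cat <<'EOF'
Name:	iperf
voluntary_ctxt_switches:	1234
nonvoluntary_ctxt_switches:	56
EOF
}

sample_status | parse_ctxt
```

For a real process you would replace `sample_status` with
`cat /proc/$PID/status`, sampling before and after each iperf run and
comparing the deltas between the one-stream and multi-stream tests.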