From: Dave Taht
Date: Tue, 12 Jul 2016 16:02:15 +0200
Subject: Re: [Make-wifi-fast] TCP performance regression in mac80211 triggered by the fq code
To: Felix Fietkau
Cc: make-wifi-fast@lists.bufferbloat.net, linux-wireless, Michal Kazior, Toke Høiland-Jørgensen
In-Reply-To: <097af8e4-5393-8e1b-1748-36233e605867@nbd.name>
References: <11fa6d16-21e2-2169-8d18-940f6dc11dca@nbd.name> <097af8e4-5393-8e1b-1748-36233e605867@nbd.name>

On Tue, Jul 12, 2016 at 3:21 PM, Felix Fietkau wrote:
> On 2016-07-12 14:13, Dave Taht wrote:
>> On Tue, Jul 12, 2016 at 12:09 PM, Felix Fietkau wrote:
>>> Hi,
>>>
>>> With Toke's ath9k txq patch I've noticed a pretty nasty performance
>>> regression when running local iperf on an AP (running the txq stuff) to
>>> a wireless client.
>>
>> Your kernel? cpu architecture?
> QCA9558, 720 MHz, running Linux 4.4.14
>
>> What happens when going through the AP to a server from the wireless client?
> Will test that next.
>
>> Which direction?
> AP->STA, iperf running on the AP. Client is a regular MacBook Pro
> (Broadcom).

There are always 2 wifi chips in play. Like the Sith.

>>> Here's some things that I found:
>>> - when I use only one TCP stream I get around 90-110 Mbit/s
>>
>> with how much cpu left over?
> ~20%
>
>>> - when running multiple TCP streams, I get only 35-40 Mbit/s total
>> with how much cpu left over?
> ~30%

Hmm. Care to try netperf?

>> context switch difference between the two tests?
> What's the easiest way to track that?

If you have GNU "time":

  time -v the_process

or:

  perf record -e context-switches -ag

or: process /proc/$PID/status for the ctxt_switches counters.

>> tcp_limit_output_bytes is?
> 262144

I keep hoping to be able to reduce this to something saner like 4096 one
day. It got bumped to 64k based on bad wifi performance once, and then to
its current size to make the Xen folk happier.

The other param I'd like to see fiddled with is tcp_notsent_lowat. In both
cases reductions will increase your context switches but reduce memory
pressure and lead to a more reactive TCP. And in neither case do I think
this is the real cause of this problem.

>> got perf?
> Need to make a new build for that.
>
>>> - fairness between TCP streams looks completely fine
>>
>> A codel will get to long term fairness pretty fast. Packet captures
>> from a fq will show much more regular interleaving of packets,
>> regardless.
>>
>>> - there's no big queue buildup, the code never actually drops any packets
>>
>> A "trick" I have been using to observe codel behavior has been to
>> enable ecn on server and client, then checking in wireshark for ect(3)
>> marked packets.
> I verified this with printk. The same issue already appears if I have
> just the fq patch (with the codel patch reverted).

OK. A four flow test "should" trigger codel....

Running out of cpu (or hitting some other bottleneck), without
loss/marking, "should" result in a tcptrace -G and xplot.org of the
packet capture showing the window continuing to increase....

>>> - if I put a hack in the fq code to force the hash to a constant value
>>
>> You could also set "flows" to 1 to keep the hash being generated, but
>> not actually use it.
>>
>>> (effectively disabling fq without disabling codel), the problem
>>> disappears and even multiple streams get proper performance.
>>
>> Meaning you get 90-110 Mbit/s?
> Right.
>
>> Do you have a "before toke" figure for this platform?
> It's quite similar.
>
>>> Please let me know if you have any ideas.
>>
>> I am in berlin, packing hardware...
> Nice!
>
> - Felix

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
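[Editor's sketch] The three ways of counting context switches suggested
mid-thread (GNU time, perf, /proc/$PID/status) can be illustrated in shell.
The field names below are the real ones Linux exposes in /proc/$PID/status;
the sample values are made up, and the parsing is demonstrated on canned
text via a here-doc so the snippet runs without a live target process:

```shell
#!/bin/sh
# Ways to measure context switches, as suggested in the thread:
#   /usr/bin/time -v the_process          # reports voluntary/involuntary switches
#   perf record -e context-switches -ag   # samples switches system-wide
#   grep ctxt_switches /proc/$PID/status  # cumulative per-process counters

# Extract the two counters from /proc-style status text.
parse_ctxt() {
  grep 'ctxt_switches' | awk '{print $1, $2}'
}

# Canned sample of what /proc/$PID/status contains (values are invented).
sample_status() {
  cat <<'EOF'
Name:	iperf
voluntary_ctxt_switches:	1234
nonvoluntary_ctxt_switches:	56
EOF
}

sample_status | parse_ctxt
```

For a real process you would replace `sample_status` with
`cat /proc/$PID/status`, sampling before and after each iperf run and
comparing the deltas between the one-stream and multi-stream tests.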