References: <11fa6d16-21e2-2169-8d18-940f6dc11dca@nbd.name> <097af8e4-5393-8e1b-1748-36233e605867@nbd.name>
From: Dave Taht
Date: Wed, 13 Jul 2016 09:57:47 +0200
To: Felix Fietkau
Cc: make-wifi-fast@lists.bufferbloat.net, linux-wireless, Michal Kazior, Toke Høiland-Jørgensen
Subject: Re: [Make-wifi-fast] TCP performance regression in mac80211 triggered by the fq code

On Tue, Jul 12, 2016 at 4:02 PM, Dave Taht wrote:
> On Tue, Jul 12, 2016 at 3:21 PM, Felix Fietkau wrote:
>> On 2016-07-12 14:13, Dave Taht wrote:
>>> On Tue, Jul 12, 2016 at 12:09 PM, Felix Fietkau wrote:
>>>> Hi,
>>>>
>>>> With Toke's ath9k txq patch I've noticed a pretty nasty performance
>>>> regression when running local iperf on an AP (running the txq stuff) to
>>>> a wireless client.
>>>
>>> Your kernel? cpu architecture?
>> QCA9558, 720 MHz, running Linux 4.4.14

So this is a single core at the near-bottom end of the range. I guess
we also should find a MIPS 24c derivative that runs at 400Mhz or so.

What HZ? (I no longer know how much higher HZ settings make any
difference, but I'm usually at NOHZ and 250, rather than 100.)

And all the testing to date was on much higher end multi-cores.

>>> What happens when going through the AP to a server from the wireless client?
>> Will test that next.

Anddddd?

>>
>>> Which direction?
>> AP->STA, iperf running on the AP. Client is a regular MacBook Pro
>> (Broadcom).
>
> There are always 2 wifi chips in play. Like the Sith.
>
>>>> Here's some things that I found:
>>>> - when I use only one TCP stream I get around 90-110 Mbit/s
>>>
>>> with how much cpu left over?
>> ~20%
>>
>>>> - when running multiple TCP streams, I get only 35-40 Mbit/s total
>>> with how much cpu left over?
>> ~30%

To me this implies a contending lock issue, too much work in the irq
handler or too delayed work in the softirq handler....

I thought you were very brave to try and backport this.

>
> Hmm.
>
> Care to try netperf?
>
>>
>>> context switch difference between the two tests?
>> What's the easiest way to track that?
>
> if you have gnu "time" time -v the_process
>
> or:
>
> perf record -e context-switches -ag
>
> or: process /proc/$PID/status for cntx
>
>>> tcp_limit_output_bytes is?
>> 262144
>
> I keep hoping to be able to reduce this to something saner like 4096
> one day. It got bumped to 64k based on bad wifi performance once, and
> then to its current size to make the Xen folk happier.
>
> The other param I'd like to see fiddled with is tcp_notsent_lowat.
>
> In both cases reductions will increase your context switches but
> reduce memory pressure and lead to a more reactive tcp.
>
> And in neither case I think this is the real cause of this problem.
>
>
>>> got perf?
>> Need to make a new build for that.
>>
>>>> - fairness between TCP streams looks completely fine
>>>
>>> A codel will get to long term fairness pretty fast. Packet captures
>>> from a fq will show much more regular interleaving of packets,
>>> regardless.
>>>
>>>> - there's no big queue buildup, the code never actually drops any packets
>>>
>>> A "trick" I have been using to observe codel behavior has been to
>>> enable ecn on server and client, then checking in wireshark for ect(3)
>>> marked packets.
>> I verified this with printk. The same issue already appears if I have
>> just the fq patch (with the codel patch reverted).
>
> OK. A four flow test "should" trigger codel....
>
> Running out of cpu (or hitting some other bottleneck), without
> loss/marking "should" result in a tcptrace -G and xplot.org of the
> packet capture showing the window continuing to increase....
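
On the "process /proc/$PID/status" suggestion above: a minimal sketch of
one way to do it, assuming a Linux /proc and the usual
voluntary_ctxt_switches / nonvoluntary_ctxt_switches fields. The function
names here are made up for illustration and not taken from any existing
tool. The idea is to snapshot the status file before and after each iperf
run and compare the deltas between the single-stream and multi-stream
tests.

```python
import re

# Field names as they appear in /proc/<pid>/status on Linux.
_CTXT_RE = re.compile(
    r"(voluntary_ctxt_switches|nonvoluntary_ctxt_switches):\s+(\d+)"
)


def parse_ctxt_switches(status_text):
    """Extract the context-switch counters from /proc/<pid>/status text."""
    counts = {}
    for line in status_text.splitlines():
        m = _CTXT_RE.match(line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


def ctxt_switch_delta(before_text, after_text):
    """Counters accumulated between two snapshots of the same process."""
    before = parse_ctxt_switches(before_text)
    after = parse_ctxt_switches(after_text)
    return {name: after[name] - before.get(name, 0) for name in after}


def read_status(pid):
    """Read the raw status text for a pid (hypothetical helper, Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()
```

Usage would be roughly: call read_status(pid) on the iperf process at the
start and end of each test, then feed both snapshots to
ctxt_switch_delta(). Note these counters are per task, so they will not
show softirq or driver work done outside the iperf process.
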
>
>
>>>> - if I put a hack in the fq code to force the hash to a constant value
>>>
>>> You could also set "flows" to 1 to keep the hash being generated, but
>>> not actually use it.
>>>
>>>> (effectively disabling fq without disabling codel), the problem
>>>> disappears and even multiple streams get proper performance.
>>>
>>> Meaning you get 90-110Mbits ?
>> Right.
>>
>>> Do you have a "before toke" figure for this platform?
>> It's quite similar.
>>
>>>> Please let me know if you have any ideas.
>>>
>>> I am in berlin, packing hardware...
>> Nice!
>>
>> - Felix
>>
>
>
>
> --
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> http://blog.cerowrt.org

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
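
P.S. A toy model of the two "disable fq" experiments discussed in the
thread (forcing the hash to a constant versus setting "flows" to 1). This
is only a sketch: the CRC32 hash, the tuple layout and the function name
are assumptions for illustration, not what the mac80211 fq code actually
does. It just shows that both approaches put every flow into the same
queue, the difference being that the flows=1 variant still pays for
computing the hash.

```python
import zlib


def flow_bucket(five_tuple, flows, force_constant=False):
    """Pick a queue index for a flow (toy stand-in for an fq hash)."""
    if force_constant:
        # The "hack" variant: skip hashing, everything lands in queue 0.
        return 0
    key = "|".join(str(field) for field in five_tuple).encode()
    # The hash is still computed here even when flows == 1.
    return zlib.crc32(key) % flows


# Four hypothetical TCP streams that differ only in source port.
streams = [
    ("10.0.0.1", 40000 + i, "10.0.0.2", 5001, "tcp") for i in range(4)
]

hacked = [flow_bucket(s, flows=4096, force_constant=True) for s in streams]
single = [flow_bucket(s, flows=1) for s in streams]
spread = [flow_bucket(s, flows=4096) for s in streams]
```

If the regression also went away with flows set to 1, that would point at
the per-flow queueing itself rather than at the cost of hashing.
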