From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt0-x243.google.com (mail-qt0-x243.google.com [IPv6:2607:f8b0:400d:c0d::243]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.bufferbloat.net (Postfix) with ESMTPS id 47C183B25D for ; Tue, 16 Aug 2016 19:13:07 -0400 (EDT) Received: by mail-qt0-x243.google.com with SMTP id u25so3913801qtb.3 for ; Tue, 16 Aug 2016 16:13:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:in-reply-to:references:from:date:message-id :subject:to:cc; bh=1OxILELq0uAkILreOFG0hogGk2mEfkcc8tiCp22eEJs=; b=NC90Y16CccUHIp2DKqaXuuopl1MhqJdlBl4M+SZJ/ENu4zOo2prnmto6iye4Ed0TGl C70rH52RVcQKd0C/eKJYKL9vaVuojDIIkN2nhzbi4J14i5Rv8yivMQvoHt65bnVKnQeZ nvNs7VGBCb0gvCnbW/VGQlkK42Be0u5qxwTBsspR+ps2TSYglRfyPONoDD2RrbFfIscA +zTc0nH8C2dFYFwlZenVwiGoHQg7/o0y2YwzMfMUu0fNcCtKLEH1h3W442lGQd0WAjJX LEWuxr4o/i8xlbssyKKKoiCS8LyA3oSjO6ZSnGqKgg6OZZQrrINiMfqcDnfrgSKPBm8n p2zg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:reply-to:in-reply-to:references :from:date:message-id:subject:to:cc; bh=1OxILELq0uAkILreOFG0hogGk2mEfkcc8tiCp22eEJs=; b=E7OlsE7Ch6pqrw1o666qcIM3/Pnm1/88BccjkzhDp2zskH1phUCIyEOg+GZo4Ycu2P QuuRomUCTlZh3bWNpCaCrwWXXRwlbeeZLlKqDxRJI1PKLoRavBFtAK3fPUZXgZCljYuv eAVFDgkxExUJvKEw2gD3hd9+hg7niYfhkYmPO2U1FNiTowY6pmMmL7vEzTGCo4aUuwtH i6bAzU+h8+kX9Cyv2p0t5RQP8cXQIuNzjz+X/2zqWlTWsb4oPBCEJC01v/sfjkZrvBA/ +qxWHpBfG3NlWFx24A6BS1sdukwa0mJtbXLjf4Lradh1E9WpzB/EDSV3DFRM+na573oK lwnA== X-Gm-Message-State: AEkoouv1kkPHnJMcC6TMuEa7EspoSlPBFHwUfKW+OHgzK8n2JIHbARPWchGXb9ikzVrlRpYGFspXQrl7oX2cSw== X-Received: by 10.237.35.201 with SMTP id k9mr41514454qtc.92.1471389186870; Tue, 16 Aug 2016 16:13:06 -0700 (PDT) MIME-Version: 1.0 Received: by 10.237.50.196 with HTTP; Tue, 16 Aug 2016 16:13:06 -0700 (PDT) Reply-To: spikin.kev@gmail.com In-Reply-To: 
<1471380444.4943.17.camel@edumazet-glaptop3.roam.corp.google.com>
References: <87pop85tvr.fsf@toke.dk> <1471380444.4943.17.camel@edumazet-glaptop3.roam.corp.google.com>
From: Kevin Hayes
Message-ID:
To: Eric Dumazet
Cc: =?UTF-8?B?VG9rZSBIw7hpbGFuZC1Kw7hyZ2Vuc2Vu?= , make-wifi-fast@lists.bufferbloat.net, linux-wireless@vger.kernel.org, Felix Fietkau
Content-Type: multipart/alternative; boundary=001a11396598cf5717053a387d6a
X-Mailman-Approved-At: Mon, 28 Nov 2016 08:47:10 -0500
Subject: Re: [Make-wifi-fast] On the ath9k performance regression with FQ and crypto
X-BeenThere: make-wifi-fast@lists.bufferbloat.net
X-Mailman-Version: 2.1.20
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Date: Tue, 16 Aug 2016 23:13:07 -0000
X-Original-Date: Tue, 16 Aug 2016 16:13:06 -0700
X-List-Received-Date: Tue, 16 Aug 2016 23:13:07 -0000

--001a11396598cf5717053a387d6a
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

>And the crypto itself should be offloaded to hardware (shouldn't it? we do see a marked drop in overall throughput from just enabling crypto)

Seems like you need to deterministically determine if the hw crypto is enabled, and actually happening. If not, then SW crypto could be consuming, um, more CPU than you want. I assume you are working on this.

Beyond that, yes, the use of crypto will add about 16 B of overhead to each MPDU, so maybe 1%. Maybe the reported MTU might be less than 1500B so something must fragment??

K++

On Tue, Aug 16, 2016 at 1:47 PM, Eric Dumazet wrote:

>
> Do you have tcpdumps of
>
> 1) sample with crypto
>
> 2) sample without crypto.
>
> Looks like some TCP Small queue interaction with skb->truesize, if GSO
> is involved, or encapsulation adding overhead.
>
>
> On Tue, 2016-08-16 at 22:41 +0200, Toke Høiland-Jørgensen wrote:
> > So Dave and I have been spending the last couple of days trying to
> > narrow down why there's a performance regression in some cases on ath9k
> > with the softq-FQ patches. Felix first noticed this regression, and LEDE
> > currently carries a patch [1] to disable the FQ portion of the softq
> > patches to avoid it.
> >
> > While we have been able to narrow it down a little bit, no solution has
> > been forthcoming, so this is an attempt to describe the bug in the hope
> > that someone else will have an idea about what could be causing it.
> >
> > What we're seeing is the following (when the access point is running
> > ath9k with the softq patches):
> >
> > When running two or more flows to a station, their combined throughput
> > will be roughly 20-30% lower than the throughput of a single flow to the
> > same station. This happens:
> >
> > - for both TCP and UDP traffic.
> > - independent of the base rate (i.e. signal quality).
> > - but only with crypto enabled (WPA2 CCMP in this case).
> >
> > However, the regression completely disappears if either of the
> > following is true:
> >
> > - no crypto is enabled.
> > - the FQ part of mac80211 is disabled (as in [1]).
> >
> > We have been able to reproduce this behaviour on two different ath9k
> > hardware chips and two different architectures.
> >
> > The cause of the regression seems to be that the aggregates are smaller
> > when there are two flows than when there is only one. Adding debug
> > statements to the aggregate forming code indicates that this is because
> > no more packets are available when the aggregates are built (i.e.
> > ieee80211_tx_dequeue() returns NULL).
> >
> > We have not been able to determine why the queues run empty when this
> > combination of circumstances arise. Since we easily get upwards of 120
> > Mbps of TCP throughput without crypto but with full FQ, it's clearly not
> > the hashing overhead in itself that does it (and the hashing also
> > happens with just one flow, so the overhead is still there). And the
> > crypto itself should be offloaded to hardware (shouldn't it? we do see a
> > marked drop in overall throughput from just enabling crypto), so how
> > would the queueing (say, mixing of packets from different flows)
> > influence that?
> >
> > Does anyone have any ideas? We are stumped...
> >
> > -Toke
> >
> > [1] https://git.lede-project.org/?p=lede/nbd/staging.git;a=blob;f=package/kernel/mac80211/patches/220-fq_disable_hack.patch;h=7f420beea56335d5043de6fd71b5febae3e9bd79;hb=HEAD
> > _______________________________________________
> > Make-wifi-fast mailing list
> > Make-wifi-fast@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/make-wifi-fast
> >
>

--
Kevin Hayes

--001a11396598cf5717053a387d6a
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a11396598cf5717053a387d6a--
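
A quick sanity check of the "about 16 B of overhead to each MPDU, so maybe 1%" estimate in the reply above, as a minimal sketch only. It assumes WPA2 CCMP-128, where each protected MPDU carries an 8-byte CCMP header and an 8-byte MIC, and it measures the overhead against the payload size alone, ignoring the 802.11 MAC header, aggregation and airtime effects. The function name is made up for illustration and does not come from the thread or from any driver.

```python
# Assumption: CCMP-128 as used by WPA2, which adds an 8-byte CCMP header
# and an 8-byte MIC to every protected MPDU (16 bytes in total).
CCMP_HEADER_BYTES = 8
CCMP_MIC_BYTES = 8


def ccmp_overhead_fraction(payload_bytes: int) -> float:
    """Return the CCMP byte overhead as a fraction of the payload size.

    Hypothetical helper for illustration; it only compares byte counts and
    says nothing about CPU cost or software vs. hardware crypto.
    """
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return (CCMP_HEADER_BYTES + CCMP_MIC_BYTES) / payload_bytes


if __name__ == "__main__":
    # 1500 B is the usual Ethernet-sized payload; smaller payloads are shown
    # to illustrate how the relative overhead grows as packets shrink.
    for payload in (1500, 1000, 500):
        fraction = ccmp_overhead_fraction(payload)
        print(f"{payload} B payload: {fraction:.2%} CCMP overhead")
```

For a 1500 B payload this comes to 16/1500, a little over 1%, which matches the estimate in the reply and is far smaller than the 20-30% drop described in the quoted report.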