* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
@ 2016-03-21 17:10 Dave Taht
2016-03-22 8:05 ` Michal Kazior
0 siblings, 1 reply; 16+ messages in thread
From: Dave Taht @ 2016-03-21 17:10 UTC (permalink / raw)
To: Michal Kazior
Cc: Jasmine Strong, Network Development, linux-wireless, ath10k,
codel, make-wifi-fast
[-- Attachment #1: Type: text/plain, Size: 818 bytes --]
thx.
a lot to digest.
A) quick notes on "flent-gui bursts_11e-2016-03-21T09*.gz"
1) the new bursts_11e test *should* have stuck stuff in the VI and VO
queues, and there *should* have been some sort of difference shown on
the plots with it. There wasn't.
For diffserv markings I used BE=CS0, BK=CS1, VI=CS5, and VO=EF.
CS6/CS7 should also land in VO (at least with the soft mac handler
last I looked). Is there a way to check if you are indeed exercising
all four 802.11e hardware queues in this test? in ath9k it is the
"xmit" sysfs var....
2) In all the old cases the BE UDP_RR flow died on the first burst
(why?), and the fullpatch preserved it. (I would have kind of hoped to
have seen the BK flow die, actually, in the fullpatch)
3) I am also confused on 802.11ac - can VO aggregate? (it can't in 802.11n).
[-- Attachment #2: vivosame.png --]
[-- Type: image/png, Size: 276624 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-21 17:10 [Codel] [RFCv2 0/3] mac80211: implement fq codel Dave Taht
@ 2016-03-22 8:05 ` Michal Kazior
2016-03-22 9:51 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 16+ messages in thread
From: Michal Kazior @ 2016-03-22 8:05 UTC (permalink / raw)
To: Dave Taht
Cc: Jasmine Strong, Network Development, linux-wireless, ath10k,
codel, make-wifi-fast
On 21 March 2016 at 18:10, Dave Taht <dave.taht@gmail.com> wrote:
> thx.
>
> a lot to digest.
>
> A) quick notes on "flent-gui bursts_11e-2016-03-21T09*.gz"
>
> 1) the new bursts_11e test *should* have stuck stuff in the VI and VO
> queues, and there *should* have been some sort of difference shown on
> the plots with it. There wasn't.
traffic-gen generates only BE traffic. Everything else runs UDP_RR
which doesn't generate a lot of traffic.
> For diffserv markings I used BE=CS0, BK=CS1, VI=CS5, and VO=EF.
> CS6/CS7 should also land in VO (at least with the soft mac handler
> last I looked). Is there a way to check if you are indeed exercising
> all four 802.11e hardware queues in this test? in ath9k it is the
> "xmit" sysfs var....
Hmm.. there are no txq stats. I guess it makes sense to have them?
There is /sys/kernel/debug/ieee80211/phy*/fq which dumps state of all
queues which will be mostly empty with UDP_RR. You can run netperf UDP
stream with diffserv marking to see onto which tid they are mapped.
You can see tid-AC mappings here:
https://wireless.wiki.kernel.org/en/developers/documentation/mac80211/queues
I just checked and EF ends up as tid5 which is VI. It's actually the
same as CS5. You can use CS7 to run on tid7 which is VO.
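
The markings discussed here can be checked with a small sketch. It assumes the default classification, where the TID is simply the top three bits of the 6-bit DSCP and no QoS map overrides it, plus the usual TID-to-AC table from the wiki page above. The helper names are made up for illustration and are not kernel functions.

```python
# Sketch only: assumes TID = top three bits of the DSCP (default mapping,
# no QoS map configured). Helper names are hypothetical.
DSCP = {"CS0": 0, "CS1": 8, "CS5": 40, "EF": 46, "CS6": 48, "CS7": 56}

# Default 802.11e TID -> access category table.
TID_TO_AC = {0: "BE", 1: "BK", 2: "BK", 3: "BE",
             4: "VI", 5: "VI", 6: "VO", 7: "VO"}


def dscp_to_tid(dscp):
    """Return the TID for a 6-bit DSCP value (its top three bits)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return dscp >> 3


def dscp_to_ac(dscp):
    """Return the access category name for a 6-bit DSCP value."""
    return TID_TO_AC[dscp_to_tid(dscp)]


if __name__ == "__main__":
    for name, value in DSCP.items():
        print(name, "tid%d" % dscp_to_tid(value), dscp_to_ac(value))
```

Under these assumptions EF and CS5 both land on tid5 (VI), while CS6 and CS7 land in VO.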
> 2) In all the old cases the BE UDP_RR flow died on the first burst
> (why?), and the fullpatch preserved it.
I think it's related to my setup which involves veth pairs. I use them
to simulate bridging/AP behavior but maybe it's not doing the job
right, hmm..
> (I would have kind of hoped to
> have seen the BK flow die, actually, in the fullpatch)
There's no extra weight priority to BK. The difference between BE and
BK in 802.11 is contention window access time so BK gets less txops
statistically. Both share the same txop, which is 5.484ms in most
cases.
> 3) I am also confused on 802.11ac - can VO aggregate? (it can't in 802.11n).
Yes, it should be able to, albeit VI and VO have a shorter txop than
BE/BK: 3.008ms and 1.504ms respectively.
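
To put those limits in perspective, a rough upper bound on how much data fits into one txop at a given PHY rate can be worked out as below. This is back-of-the-envelope arithmetic only: it ignores preambles, interframe spacing, block acks and retries, so real numbers are lower. The helper name is illustrative.

```python
# txop limits quoted above, in microseconds.
TXOP_US = {"BE": 5484, "BK": 5484, "VI": 3008, "VO": 1504}


def max_bytes_per_txop(ac, rate_mbps):
    """Upper bound on bytes sent in one txop, ignoring all overhead.

    Mbit/s multiplied by microseconds gives bits directly.
    """
    bits = rate_mbps * TXOP_US[ac]
    return int(bits // 8)


if __name__ == "__main__":
    for ac in ("BE", "BK", "VI", "VO"):
        print(ac, max_bytes_per_txop(ac, 6), max_bytes_per_txop(ac, 300))
```

At a fixed 6mbps a BE txop holds at most 4113 bytes by this estimate, i.e. fewer than three full-size frames, which is worth keeping in mind when reading the rate6m plots.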
UDP_RR doesn't really create a lot of opportunities for aggregation.
If you want to see how different queues behave when loaded you'll need
to modify traffic-gen and add bursts across different ACs in the
bursts_11e test.
Michał
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-22 8:05 ` Michal Kazior
@ 2016-03-22 9:51 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 16+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-03-22 9:51 UTC (permalink / raw)
To: Michal Kazior
Cc: Dave Taht, Network Development, linux-wireless, ath10k,
Jasmine Strong, codel, make-wifi-fast
Michal Kazior <michal.kazior@tieto.com> writes:
> traffic-gen generates only BE traffic. Everything else runs UDP_RR
> which doesn't generate a lot of traffic.
Good point. Fixed that: the newest git version of traffic-gen supports a
-t parameter which will be set as the TOS byte on outgoing traffic
(literal; no smart diffserv handling, so you can override the ECN bits
as well).
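
Since the value is the literal TOS byte, the DSCP has to be shifted past the two ECN bits by hand. A minimal sketch of that arithmetic follows; the helper name is hypothetical and not part of traffic-gen.

```python
def tos_byte(dscp, ecn=0):
    """Build the 8-bit TOS byte: DSCP in the top six bits, ECN in the low two."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must be in 0..3")
    return (dscp << 2) | ecn


if __name__ == "__main__":
    # Code points used elsewhere in this thread.
    for name, dscp in (("CS1", 8), ("CS5", 40), ("EF", 46), ("CS7", 56)):
        print(name, tos_byte(dscp), hex(tos_byte(dscp)))
```

For example, EF (DSCP 46) with the ECN bits clear becomes 184 (0xb8), and CS1 becomes 32 (0x20).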
Added support for a burst-tos test parameter in the Flent burst test
configs which will use this new parameter if set.
-Toke
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Codel] [RFCv2 0/3] mac80211: implement fq codel
@ 2016-03-16 10:17 Michal Kazior
2016-03-16 10:26 ` Michal Kazior
0 siblings, 1 reply; 16+ messages in thread
From: Michal Kazior @ 2016-03-16 10:17 UTC (permalink / raw)
To: linux-wireless
Cc: ath10k, johannes, netdev, dave.taht, emmanuel.grumbach, nbd,
Tim Shepard, make-wifi-fast, codel, Michal Kazior
Hi,
Most notable changes:
* fixes (duh); fairness should work better now,
* EWMA codel target based on estimated service
time,
* new tx scheduling helper with in-flight
duration limiting (same idea Emmanuel
had for iwlwifi),
* added a few debugfs hooks.
* ath10k proof-of-concept that uses the new tx
scheduling (will post results in separate
email)
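
For readers unfamiliar with the EWMA item in the list above, the general idea can be sketched as follows. The smoothing weight, the floor value and the way a target is derived from the average are assumptions made purely for illustration; they are not taken from the patch.

```python
class ServiceTimeEwma:
    """Exponentially weighted moving average of per-frame service time."""

    def __init__(self, weight=0.125):
        self.weight = weight   # assumed smoothing factor, not from the patch
        self.avg_us = None     # no samples seen yet

    def update(self, sample_us):
        """Fold one service-time sample (microseconds) into the average."""
        if self.avg_us is None:
            self.avg_us = float(sample_us)
        else:
            self.avg_us += self.weight * (sample_us - self.avg_us)
        return self.avg_us


def codel_target_us(avg_service_us, floor_us=5000):
    """Hypothetical policy: track the service time, never below a floor."""
    return max(floor_us, avg_service_us)


if __name__ == "__main__":
    ewma = ServiceTimeEwma()
    for sample in (2000, 2500, 40000, 3000):
        avg = ewma.update(sample)
        print(sample, round(avg), round(codel_target_us(avg)))
```

The point of smoothing is that a single slow sample moves the target only a little, while a sustained change in service time moves it all the way.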
The patch grew pretty big and I plan on splitting
it before next submission. Any suggestions?
The tx scheduling probably needs more work and
testing. I didn't evaluate how CPU intensive it is
nor how it influences things like peak throughput
(lab conditions et al) yet.
I've uploaded a branch for convenience:
https://github.com/kazikcz/linux/tree/fqmac-rfc-v2
This is based on Kalle's ath tree.
Michal Kazior (3):
mac80211: implement fq_codel for software queuing
ath10k: report per-station tx/rate rates to mac80211
ath10k: use ieee80211_tx_schedule()
drivers/net/wireless/ath/ath10k/core.c | 2 -
drivers/net/wireless/ath/ath10k/core.h | 8 +-
drivers/net/wireless/ath/ath10k/debug.c | 61 ++-
drivers/net/wireless/ath/ath10k/mac.c | 126 +++---
drivers/net/wireless/ath/ath10k/wmi.h | 2 +-
include/net/mac80211.h | 96 ++++-
net/mac80211/agg-tx.c | 8 +-
net/mac80211/cfg.c | 2 +-
net/mac80211/codel.h | 264 +++++++++++++
net/mac80211/codel_i.h | 89 +++++
net/mac80211/debugfs.c | 267 +++++++++++++
net/mac80211/ieee80211_i.h | 45 ++-
net/mac80211/iface.c | 25 +-
net/mac80211/main.c | 9 +-
net/mac80211/rx.c | 2 +-
net/mac80211/sta_info.c | 10 +-
net/mac80211/sta_info.h | 27 ++
net/mac80211/status.c | 64 ++++
net/mac80211/tx.c | 658 ++++++++++++++++++++++++++++++--
net/mac80211/util.c | 21 +-
20 files changed, 1629 insertions(+), 157 deletions(-)
create mode 100644 net/mac80211/codel.h
create mode 100644 net/mac80211/codel_i.h
--
2.1.4
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-16 10:17 Michal Kazior
@ 2016-03-16 10:26 ` Michal Kazior
2016-03-16 15:37 ` Dave Taht
0 siblings, 1 reply; 16+ messages in thread
From: Michal Kazior @ 2016-03-16 10:26 UTC (permalink / raw)
To: linux-wireless
Cc: ath10k, Johannes Berg, Network Development, Dave Taht,
Emmanuel Grumbach, Felix Fietkau, Tim Shepard, make-wifi-fast,
codel, Michal Kazior
[-- Attachment #1: Type: text/plain, Size: 1041 bytes --]
On 16 March 2016 at 11:17, Michal Kazior <michal.kazior@tieto.com> wrote:
> Hi,
>
> Most notable changes:
[...]
> * ath10k proof-of-concept that uses the new tx
> scheduling (will post results in separate
> email)
I'm attaching a bunch of tests I've done using flent. They are all
"burst" tests with burst-ports=1 and burst-length=2. The testing
topology is:
AP ----> STA
AP )) (( STA
[veth]--[br]--[wlan] )) (( [wlan]
You can notice that in some tests plot data gets cut-off. There are 2
problems I've identified:
- excess drops (not a problem with the patchset and can be seen when
there's no codel-in-mac or scheduling isn't used)
- UDP_RR hangs (apparently the QCA99X0 I have sometimes hangs for a few
hundred ms and doesn't Rx frames, causing UDP_RR to stop
mid-way; confirmed with logs and sniffer; I haven't figured out *why*
exactly, could be some hw/fw quirk)
Let me know if you have questions or comments regarding my testing/results.
Michał
[-- Attachment #2: fq.tar.gz --]
[-- Type: application/x-gzip, Size: 63753 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-16 10:26 ` Michal Kazior
@ 2016-03-16 15:37 ` Dave Taht
2016-03-16 18:36 ` Dave Taht
2016-03-17 9:03 ` Michal Kazior
0 siblings, 2 replies; 16+ messages in thread
From: Dave Taht @ 2016-03-16 15:37 UTC (permalink / raw)
To: Michal Kazior
Cc: linux-wireless, ath10k, Johannes Berg, Network Development,
Emmanuel Grumbach, Felix Fietkau, Tim Shepard, make-wifi-fast,
codel
[-- Attachment #1: Type: text/plain, Size: 1695 bytes --]
it is helpful to name the test files coherently in the flent tests, in
addition to using a directory structure and timestamp. It makes doing
comparison plots in data->add-other-open-data-files simpler. "-t
patched-mac-300mbps", for example.
Also netperf from svn (maybe 2.7, don't remember) will restart udp_rr
250ms after a packet loss. Seeing a loss on UDP_RR and it stopping for
a while is "ok".
Dave Täht
Let's go make home routers and wifi faster! With better software!
https://www.gofundme.com/savewifi
On Wed, Mar 16, 2016 at 3:26 AM, Michal Kazior <michal.kazior@tieto.com> wrote:
> On 16 March 2016 at 11:17, Michal Kazior <michal.kazior@tieto.com> wrote:
>> Hi,
>>
>> Most notable changes:
> [...]
>> * ath10k proof-of-concept that uses the new tx
>> scheduling (will post results in separate
>> email)
>
> I'm attaching a bunch of tests I've done using flent. They are all
> "burst" tests with burst-ports=1 and burst-length=2. The testing
> topology is:
>
> AP ----> STA
> AP )) (( STA
> [veth]--[br]--[wlan] )) (( [wlan]
>
> You can notice that in some tests plot data gets cut-off. There are 2
> problems I've identified:
> - excess drops (not a problem with the patchset and can be seen when
> there's no codel-in-mac or scheduling isn't used)
> - UDP_RR hangs (apparently QCA99X0 I have hangs for a few hundred ms
> sometimes at times and doesn't Rx frames causing UDP_RR to stop
> mid-way; confirmed with logs and sniffer; I haven't figured out *why*
> exactly, could be some hw/fw quirk)
>
> Let me know if you have questions or comments regarding my testing/results.
>
>
> Michał
[-- Attachment #2: cdf_comparison.png --]
[-- Type: image/png, Size: 87203 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-16 15:37 ` Dave Taht
@ 2016-03-16 18:36 ` Dave Taht
2016-03-16 18:55 ` Bob Copeland
2016-03-17 9:43 ` Michal Kazior
2016-03-17 9:03 ` Michal Kazior
1 sibling, 2 replies; 16+ messages in thread
From: Dave Taht @ 2016-03-16 18:36 UTC (permalink / raw)
To: Michal Kazior
Cc: linux-wireless, ath10k, Network Development, make-wifi-fast, codel
[-- Attachment #1: Type: text/plain, Size: 2224 bytes --]
That is the sanest 802.11e queue behavior I have ever seen! (at both
6 and 300mbit! in the ath10k patched mac test)
It would be good to add a flow to this test that exercises the VI
queue (CS5 diffserv marking?), and to repeat this test with wmm
disabled for comparison.
Dave Täht
Let's go make home routers and wifi faster! With better software!
https://www.gofundme.com/savewifi
On Wed, Mar 16, 2016 at 8:37 AM, Dave Taht <dave.taht@gmail.com> wrote:
> it is helpful to name the test files coherently in the flent tests, in
> addition to using a directory structure and timestamp. It makes doing
> comparison plots in data->add-other-open-data-files simpler. "-t
> patched-mac-300mbps", for example.
>
> Also netperf from svn (maybe 2.7, don't remember) will restart udp_rr
> after a packet loss in 250ms. Seeing a loss on UDP_RR and it stop for
> a while is "ok".
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> https://www.gofundme.com/savewifi
>
>
> On Wed, Mar 16, 2016 at 3:26 AM, Michal Kazior <michal.kazior@tieto.com> wrote:
>> On 16 March 2016 at 11:17, Michal Kazior <michal.kazior@tieto.com> wrote:
>>> Hi,
>>>
>>> Most notable changes:
>> [...]
>>> * ath10k proof-of-concept that uses the new tx
>>> scheduling (will post results in separate
>>> email)
>>
>> I'm attaching a bunch of tests I've done using flent. They are all
>> "burst" tests with burst-ports=1 and burst-length=2. The testing
>> topology is:
>>
>> AP ----> STA
>> AP )) (( STA
>> [veth]--[br]--[wlan] )) (( [wlan]
>>
>> You can notice that in some tests plot data gets cut-off. There are 2
>> problems I've identified:
>> - excess drops (not a problem with the patchset and can be seen when
>> there's no codel-in-mac or scheduling isn't used)
>> - UDP_RR hangs (apparently QCA99X0 I have hangs for a few hundred ms
>> sometimes at times and doesn't Rx frames causing UDP_RR to stop
>> mid-way; confirmed with logs and sniffer; I haven't figured out *why*
>> exactly, could be some hw/fw quirk)
>>
>> Let me know if you have questions or comments regarding my testing/results.
>>
>>
>> Michał
[-- Attachment #2: sanest_802.11eresult_i_have_ever_seen.png --]
[-- Type: image/png, Size: 146956 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-16 18:36 ` Dave Taht
@ 2016-03-16 18:55 ` Bob Copeland
2016-03-16 19:49 ` Jasmine Strong
2016-03-17 9:43 ` Michal Kazior
1 sibling, 1 reply; 16+ messages in thread
From: Bob Copeland @ 2016-03-16 18:55 UTC (permalink / raw)
To: Dave Taht
Cc: Michal Kazior, linux-wireless, ath10k, Network Development,
make-wifi-fast, codel
On Wed, Mar 16, 2016 at 11:36:31AM -0700, Dave Taht wrote:
> That is the sanest 802.11e queue behavior I have ever seen! (at both
> 6 and 300mbit! in the ath10k patched mac test)
Out of curiosity, why does BE have larger latency than BK in that chart?
I'd have expected the opposite.
--
Bob Copeland %% http://bobcopeland.com/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-16 18:55 ` Bob Copeland
@ 2016-03-16 19:49 ` Jasmine Strong
2016-03-17 8:55 ` Michal Kazior
0 siblings, 1 reply; 16+ messages in thread
From: Jasmine Strong @ 2016-03-16 19:49 UTC (permalink / raw)
To: Bob Copeland
Cc: Dave Taht, Network Development, linux-wireless, ath10k, codel,
Michal Kazior, make-wifi-fast
[-- Attachment #1: Type: text/plain, Size: 634 bytes --]
BK usually has 0 txop, so it doesn't do aggregation.
On Wed, Mar 16, 2016 at 11:55 AM, Bob Copeland <me@bobcopeland.com> wrote:
> On Wed, Mar 16, 2016 at 11:36:31AM -0700, Dave Taht wrote:
> > That is the sanest 802.11e queue behavior I have ever seen! (at both
> > 6 and 300mbit! in the ath10k patched mac test)
>
> Out of curiosity, why does BE have larger latency than BK in that chart?
> I'd have expected the opposite.
>
> --
> Bob Copeland %% http://bobcopeland.com/
>
> _______________________________________________
> ath10k mailing list
> ath10k@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/ath10k
>
[-- Attachment #2: Type: text/html, Size: 1329 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-16 19:49 ` Jasmine Strong
@ 2016-03-17 8:55 ` Michal Kazior
2016-03-17 11:12 ` Bob Copeland
2016-03-17 17:00 ` Dave Taht
0 siblings, 2 replies; 16+ messages in thread
From: Michal Kazior @ 2016-03-17 8:55 UTC (permalink / raw)
To: Jasmine Strong
Cc: Bob Copeland, Dave Taht, Network Development, linux-wireless,
ath10k, codel, make-wifi-fast
[-- Attachment #1: Type: text/plain, Size: 2216 bytes --]
TxOP 0 has a special meaning in the standard. For HT/VHT it means the
txop is actually limited to 5484us (mixed-mode) or 10000us (greenfield).
I suspect the BK/BE latency difference has to do with the fact that
there's bulk traffic going on BE queues (this isn't reflected
explicitly in the plots). The `bursts` flent test includes short
bursts of traffic on tid0 (BE) which is shared with ICMP and BE UDP_RR
(seen as green and blue lines on the plot). Due to (intended) limited
outflow (6mbps) BE queues build up and don't drain for the duration of
the entire test creating more opportunities for aggregating BE traffic
while other queues are near-empty and very short (time wise as well).
If you consider Wi-Fi is half-duplex and latency in the entire stack
(for processing ICMP and UDP_RR) is greater than 11e contention window
timings you can get your BE flow responses with extra delay (since
other queues might have responses ready quicker).
I've modified traffic-gen and re-run tests with bursts on all tested
tids/ACs (tid0, tid1, tid5). I'm attaching the results.
With bursts on all tids you can clearly see BK has much higher latency than BE.
(Note, I've changed my AP to QCA988X with oldie firmware 10.1.467 for
this test; it doesn't have the weird hiccups I was seeing on QCA99X0
and newer QCA988X firmware reports bogus expected throughput which is
most likely a result of my sloppy proof-of-concept change in ath10k).
Michał
On 16 March 2016 at 20:48, Jasmine Strong <jas@eero.com> wrote:
> BK usually has 0 txop, so it doesn't do aggregation.
>
> On Wed, Mar 16, 2016 at 11:55 AM, Bob Copeland <me@bobcopeland.com> wrote:
>>
>> On Wed, Mar 16, 2016 at 11:36:31AM -0700, Dave Taht wrote:
>> > That is the sanest 802.11e queue behavior I have ever seen! (at both
>> > 6 and 300mbit! in the ath10k patched mac test)
>>
>> Out of curiosity, why does BE have larger latency than BK in that chart?
>> I'd have expected the opposite.
>>
>> --
>> Bob Copeland %% http://bobcopeland.com/
>
>
[-- Attachment #2: bursts-2016-03-17T083932.549858.qca988x_10_1_467_fqmac_ath10k_with_tx_sched_6mbps_.flent.gz --]
[-- Type: application/x-gzip, Size: 14649 bytes --]
[-- Attachment #3: bursts-2016-03-17T083803.348752.qca988x_10_1_467_fqmac_ath10k_with_tx_sched_6mbps_.flent.gz --]
[-- Type: application/x-gzip, Size: 15029 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-17 8:55 ` Michal Kazior
@ 2016-03-17 11:12 ` Bob Copeland
2016-03-17 17:00 ` Dave Taht
1 sibling, 0 replies; 16+ messages in thread
From: Bob Copeland @ 2016-03-17 11:12 UTC (permalink / raw)
To: Michal Kazior
Cc: Jasmine Strong, Dave Taht, Network Development, linux-wireless,
ath10k, codel, make-wifi-fast
On Thu, Mar 17, 2016 at 09:55:03AM +0100, Michal Kazior wrote:
> If you consider Wi-Fi is half-duplex and latency in the entire stack
> (for processing ICMP and UDP_RR) is greater than 11e contention window
> timings you can get your BE flow responses with extra delay (since
> other queues might have responses ready quicker).
Got it, that makes sense. Thanks for the explanation!
--
Bob Copeland %% http://bobcopeland.com/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-17 8:55 ` Michal Kazior
2016-03-17 11:12 ` Bob Copeland
@ 2016-03-17 17:00 ` Dave Taht
2016-03-17 17:24 ` Rick Jones
2016-03-21 11:57 ` Michal Kazior
1 sibling, 2 replies; 16+ messages in thread
From: Dave Taht @ 2016-03-17 17:00 UTC (permalink / raw)
To: Michal Kazior
Cc: Jasmine Strong, Network Development, linux-wireless, ath10k,
codel, make-wifi-fast
On Thu, Mar 17, 2016 at 1:55 AM, Michal Kazior <michal.kazior@tieto.com> wrote:
> I suspect the BK/BE latency difference has to do with the fact that
> there's bulk traffic going on BE queues (this isn't reflected
> explicitly in the plots). The `bursts` flent test includes short
> bursts of traffic on tid0 (BE) which is shared with ICMP and BE UDP_RR
> (seen as green and blue lines on the plot). Due to (intended) limited
> outflow (6mbps) BE queues build up and don't drain for the duration of
> the entire test creating more opportunities for aggregating BE traffic
> while other queues are near-empty and very short (time wise as well).
I agree with your explanation. Access to the media and queue length
are the two variables at play here.
I just committed a new flent test that should exercise the vo,vi,be,
and bk queues, "bursts_11e". I dropped the conventional ping from it
and just rely on netperf's udp_rr for each queue. It seems to "do the
right thing" on the ath9k....
And while I'm all in favor of getting 802.11e's behaviors more right,
and this seems like a good way to get there...
netperf's udp_rr is not how much traffic conventionally behaves. It
doesn't do tcp slow start or congestion control in particular...
In the case of the VO queue, for example, the (2004) intended behavior
was 1 isochronous packet per 10ms per voice sending station and one
from the ap, not a "ping". And at the time, VI was intended to be
unicast video. TCP was an afterthought. (wifi's original (1993) mac
was actually designed for ipx/spx!)
I long for regular "rrul" and "rrul_be" tests against the new stuff to
blow it up thoroughly as references along the way.
(tcp_upload, tcp_download, (and several of the rtt_fair tests also
between stations)). Will get formal about it here as soon as we end up
on the same kernel trees....
Furthermore 802.11e is not widely used - in particular, not much
internet bound/sourced traffic falls into more than BE and BK,
presently. Some cases are weirder: Comcast re-marks a very large
percentage of inbound traffic to the home as CS1 (BK), btw, and
stations tend to use CS0, so data comes in on BK and acks go out on BE.
I/we will try to come up with intermediate tests between the burst
tests and the rrul tests as we go along the way.
> If you consider Wi-Fi is half-duplex and latency in the entire stack
In the context of this test regime...
<pedantry>
Saying wifi is "half"-duplex is a misleading way to think about it in
many respects. it is a shared medium more like early, non-switched
ethernet, with a weird mac that governs what sort of packets get
access to (a txop) the medium first, across all stations co-operating
within EDCA.
Half or full duplex is something that mostly applied to p2p serial
connections (or p2p wifi), not P2MP. Additionally characteristics like
exponential backoff make no sense were wifi any form of duplex, full
or half.
Certainly much stuff within a txop (block acks for example) can be
considered half duplex in a microcosmic context.
I wish we actually had words that accurately described wifi's actual behavior.
</pedantry>
> (for processing ICMP and UDP_RR) is greater than 11e contention window
> timings you can get your BE flow responses with extra delay (since
> other queues might have responses ready quicker).
yes. always having a request pending for each of the 802.11e queues is
actually not the best idea, it is better to take advantage of better
aggregation afforded by 802.11n/ac, to only have one or two of the
queues in use against any given station and promote or demote traffic
into a more-right queue.
A simple example of the damage done by having all 4 queues always
contending can be seen by running the rrul and rrul_be tests against
nearly any given AP.
>
> I've modified traffic-gen and re-run tests with bursts on all tested
> tids/ACs (tid0, tid1, tid5). I'm attaching the results.
>
> With bursts on all tids you can clearly see BK has much higher latency than BE.
The long term goal here, of course, is for BK (or the other queues) to
not have seconds of queuing latency but something more bounded to 2x
media access time...
> (Note, I've changed my AP to QCA988X with oldie firmware 10.1.467 for
> this test; it doesn't have the weird hiccups I was seeing on QCA99X0
> and newer QCA988X firmware reports bogus expected throughput which is
> most likely a result of my sloppy proof-of-concept change in ath10k).
So I should avoid Ben Greear's firmware for now?
>
>
> Michał
>
> On 16 March 2016 at 20:48, Jasmine Strong <jas@eero.com> wrote:
>> BK usually has 0 txop, so it doesn't do aggregation.
>>
>> On Wed, Mar 16, 2016 at 11:55 AM, Bob Copeland <me@bobcopeland.com> wrote:
>>>
>>> On Wed, Mar 16, 2016 at 11:36:31AM -0700, Dave Taht wrote:
>>> > That is the sanest 802.11e queue behavior I have ever seen! (at both
>>> > 6 and 300mbit! in the ath10k patched mac test)
>>>
>>> Out of curiosity, why does BE have larger latency than BK in that chart?
>>> I'd have expected the opposite.
>>>
>>> --
>>> Bob Copeland %% http://bobcopeland.com/
>>
>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-17 17:00 ` Dave Taht
@ 2016-03-17 17:24 ` Rick Jones
2016-03-21 11:57 ` Michal Kazior
1 sibling, 0 replies; 16+ messages in thread
From: Rick Jones @ 2016-03-17 17:24 UTC (permalink / raw)
To: Dave Taht, Michal Kazior
Cc: Network Development, linux-wireless, ath10k, Jasmine Strong,
codel, make-wifi-fast
On 03/17/2016 10:00 AM, Dave Taht wrote:
> netperf's udp_rr is not how much traffic conventionally behaves. It
> doesn't do tcp slow start or congestion control in particular...
Nor would one expect it to need to, unless one were using "burst mode"
to have more than one transaction inflight at one time.
And unless one uses the test-specific -e option to provide a very crude
retransmission mechanism based on a socket read timeout, neither does
UDP_RR recover from lost datagrams.
happy benchmarking,
rick jones
http://www.netperf.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-17 17:00 ` Dave Taht
2016-03-17 17:24 ` Rick Jones
@ 2016-03-21 11:57 ` Michal Kazior
1 sibling, 0 replies; 16+ messages in thread
From: Michal Kazior @ 2016-03-21 11:57 UTC (permalink / raw)
To: Dave Taht
Cc: Jasmine Strong, Network Development, linux-wireless, ath10k,
codel, make-wifi-fast
[-- Attachment #1: Type: text/plain, Size: 5516 bytes --]
On 17 March 2016 at 18:00, Dave Taht <dave.taht@gmail.com> wrote:
> On Thu, Mar 17, 2016 at 1:55 AM, Michal Kazior <michal.kazior@tieto.com> wrote:
>
>> I suspect the BK/BE latency difference has to do with the fact that
>> there's bulk traffic going on BE queues (this isn't reflected
>> explicitly in the plots). The `bursts` flent test includes short
>> bursts of traffic on tid0 (BE) which is shared with ICMP and BE UDP_RR
>> (seen as green and blue lines on the plot). Due to (intended) limited
>> outflow (6mbps) BE queues build up and don't drain for the duration of
>> the entire test creating more opportunities for aggregating BE traffic
>> while other queues are near-empty and very short (time wise as well).
>
> I agree with your explanation. Access to the media and queue length
> are the two variables at play here.
>
> I just committed a new flent test that should exercise the vo,vi,be,
> and bk queues, "bursts_11e". I dropped the conventional ping from it
> and just rely on netperf's udp_rr for each queue. It seems to "do the
> right thing" on the ath9k....
[...]
> I long for regular "rrul" and "rrul_be" tests against the new stuff to
> blow it up thoroughly as references along the way.
> (tcp_upload, tcp_download, (and several of the rtt_fair tests also
> between stations)). Will get formal about it here as soon as we end up
> on the same kernel trees....
[...]
> simple example of the damage having all 4 queues always contending is
> exemplified by running the rrul and rrul_be tests against nearly any
> given AP.
Thanks! I've run more tests and am attaching results.
A couple of words on the test naming:
- "fast" means 1x1 station with good RF conditions
- "slow" means 1x1 station with bad RF conditions (antenna unplugged)
- "fast+slow" means traffic is directed to both "fast" and "slow" stations
- "verfast" means 4x4 station for peak tput measurement
- "autorate" means rate control is enabled
- "rate6m" means 6mbps fixed tx rate on DUT
- the DUT is acting as AP in all tests
- other devices in the setup *do not* have any extra patches (so
bidirectional tests must be carefully analyzed)
- 4 sets of software patches:
- fullpatch contains all codel patches (mac80211+ath10k)
- macpatch contains only mac80211 changes (so ath10k at least gets
to use per-txq fq-codel like queuing)
- pre-waketx is ath10k with some patches reverted (before
pull-push/wake-tx-queue stuff was applied)
- waketx is current ath10k (i.e. with simple wake_tx_queue implementation)
Observations/ notes:
- "slow" case proves my naive get_expected_throughput() for ath10k is
highly inaccurate due to not considering retries. because of that
latency gets bad as mac80211's tx scheduling is queuing up more than
necessary; ath9k should do a lot better with minstrel
- i kept netperf2.6 (which has no udp-rr recovery) for now as it's
easier to spot glitches
Please let me know if you see anything interesting or worrying in these plots.
>> I've modified traffic-gen and re-run tests with bursts on all tested
>> tids/ACs (tid0, tid1, tid5). I'm attaching the results.
>>
>> With bursts on all tids you can clearly see BK has much higher latency than BE.
>
> The long term goal here, of course, is for BK (or the other queues) to
> not have seconds of queuing latency but something more bounded to 2x
> media access time...
My patch already tries to maintain txop-based in-flight tx queue
depth. Current defaults are to keep between 3-4 txops per hardware and
roughly 2txops per tid. You could argue these are too big but I wanted
to keep them conservative, at least initially, to make sure to not
affect peak throughput badly. All of these are knobs you can play with
via debugfs.
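
The idea of limiting in-flight frames by estimated airtime rather than by packet count can be sketched roughly as below. This is a toy model under stated assumptions (one fixed expected throughput, no aggregation or per-frame overhead) and not the code from the patch; the class and method names are invented for illustration.

```python
class InflightLimiter:
    """Admit frames while the estimated airtime in flight fits a txop budget."""

    def __init__(self, txop_us=5484, max_txops=2):
        self.limit_us = txop_us * max_txops
        self.inflight_us = 0.0

    @staticmethod
    def frame_airtime_us(length_bytes, expected_mbps):
        """Bits divided by Mbit/s gives microseconds (overhead ignored)."""
        return length_bytes * 8 / expected_mbps

    def try_send(self, length_bytes, expected_mbps):
        """Return the frame's airtime cost if admitted, else None."""
        cost = self.frame_airtime_us(length_bytes, expected_mbps)
        if self.inflight_us + cost > self.limit_us:
            return None  # keep the frame in the software queue for now
        self.inflight_us += cost
        return cost

    def complete(self, cost_us):
        """Call on tx completion to release the frame's share of the budget."""
        self.inflight_us = max(0.0, self.inflight_us - cost_us)


if __name__ == "__main__":
    limiter = InflightLimiter()
    sent = 0
    while limiter.try_send(1500, 6) is not None:
        sent += 1
    print("frames admitted at 6mbps:", sent)
```

Frames that are not admitted stay in the software queues, which is where flow fairness and codel get a chance to act on them.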
This requires drivers to use ieee80211_tx_schedule(). If driver merely
uses wake_tx_queue it will only benefit from flow fairness (albeit
limited) but it will not keep queues at N txop fill level (unless
driver does that on its own).
This means that Tim's ath9k patch will need to be adjusted a bit to
make use of this new API prototype for full effect. Unfortunately I
didn't have time to play on this front yet.
>> (Note, I've changed my AP to QCA988X with oldie firmware 10.1.467 for
>> this test; it doesn't have the weird hiccups I was seeing on QCA99X0
>> and newer QCA988X firmware reports bogus expected throughput which is
>> most likely a result of my sloppy proof-of-concept change in ath10k).
>
> So I should avoid ben greer's firmware for now?
I'm guessing his 10.1 fork should work fine. Not sure about the 10.2.4 though.
Anyway, keep in mind you'll get mixed results with ath10k. The
throughput estimation I've done for now is an ugly hack. It works in
fixed-rate conditions (which I use to prove a point that given
adequate rate estimation you can keep fw/hw tx queues at a reasonable
latency). It doesn't consider tx retries and unstable RF conditions
(rate control is in firmware and there's limited information available
to the driver) though which leads to more frames being queued than
necessary (and therefore increasing latency). This becomes apparent
with real-life interference and tx retries (just compare
"autorate,slow" against "rate6m,fast").
ath9k should do a lot better job at this (although that requires Tim's
patches; I haven't tested that myself) because it uses minstrel, which
should predict throughput a lot more reliably.
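
The effect of ignoring retries in the throughput estimate can be illustrated with simple arithmetic. The model below is a deliberate simplification and an assumption of this sketch: a fixed fraction of airtime is lost to retransmissions, and the queue was sized using the optimistic estimate.

```python
def queued_latency_ms(budget_ms, estimated_mbps, retry_fraction):
    """Time to drain a queue sized for budget_ms at estimated_mbps.

    retry_fraction is the assumed share of airtime spent on retransmissions.
    """
    if not 0 <= retry_fraction < 1:
        raise ValueError("retry_fraction must be in [0, 1)")
    # Throughput actually achieved once retries eat part of the airtime.
    effective_mbps = estimated_mbps * (1 - retry_fraction)
    # Mbit/s times ms gives kbit, hence the factor of 1000 to get bits.
    queued_bits = budget_ms * estimated_mbps * 1000
    return queued_bits / (effective_mbps * 1000)


if __name__ == "__main__":
    for retries in (0.0, 0.5, 0.9):
        print(retries, queued_latency_ms(10, 6, retries))
```

In this model a queue sized for 10ms takes 20ms to drain when half the airtime goes to retries, so the latency inflation grows quickly as RF conditions get worse.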
Michał
[-- Attachment #2: flent-2016-03-21.tar.gz --]
[-- Type: application/x-gzip, Size: 2029295 bytes --]
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-16 18:36 ` Dave Taht
2016-03-16 18:55 ` Bob Copeland
@ 2016-03-17 9:43 ` Michal Kazior
1 sibling, 0 replies; 16+ messages in thread
From: Michal Kazior @ 2016-03-17 9:43 UTC (permalink / raw)
To: Dave Taht
Cc: linux-wireless, ath10k, Network Development, make-wifi-fast, codel
[-- Attachment #1: Type: text/plain, Size: 2810 bytes --]
I've re-tested selected cases with wmm_enabled=0 set on the DUT AP.
I'm attaching results.
Naming:
* "old-" is without mac/ath10k changes (referred to as kvalo-reverts
previously) and fq_codel on qdiscs,
* "patched-" is all patches applied (both mac and ath),
* "-be-bursts" is stock "bursts" flent test,
* "-all-bursts" is modified "bursts" flent test to burst on all 3
tids simultaneously: tid0(BE), tid1(BK), tid5(VI).
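
For reference, a small sketch of how the three tids above land in access categories. It assumes the usual WMM mapping of user priorities 0-7 to BE/BK/VI/VO; the table and the helper name are written out here for illustration and are not copied from mac80211.

```python
# Assumed default WMM mapping: user priority (TID 0-7) -> access category.
TID_TO_AC = {
    0: "BE", 3: "BE",
    1: "BK", 2: "BK",
    4: "VI", 5: "VI",
    6: "VO", 7: "VO",
}


def ac_for_tid(tid):
    """Return the access category name for a TID in the range 0-7."""
    return TID_TO_AC[tid]


if __name__ == "__main__":
    burst_tids = [0, 1, 5]  # the tids used by the "-all-bursts" runs
    print({tid: ac_for_tid(tid) for tid in burst_tids})
```

Under that mapping the "-all-bursts" runs cover BE, BK and VI but leave VO untouched.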
Michał
On 16 March 2016 at 19:36, Dave Taht <dave.taht@gmail.com> wrote:
> That is the sanest 802.11e queue behavior I have ever seen! (at both
> 6 and 300mbit! in the ath10k patched mac test)
>
> It would be good to add a flow to this test that exercises the VI
> queue (CS5 diffserv marking?), and to repeat this test with wmm
> disabled for comparison.
>
>
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> https://www.gofundme.com/savewifi
>
>
> On Wed, Mar 16, 2016 at 8:37 AM, Dave Taht <dave.taht@gmail.com> wrote:
>> it is helpful to name the test files coherently in the flent tests, in
>> addition to using a directory structure and timestamp. It makes doing
>> comparison plots in data->add-other-open-data-files simpler. "-t
>> patched-mac-300mbps", for example.
>>
>> Also netperf from svn (maybe 2.7, don't remember) will restart udp_rr
>> 250ms after a packet loss. Seeing a loss on UDP_RR and it stopping for
>> a while is "ok".
>> Dave Täht
>> Let's go make home routers and wifi faster! With better software!
>> https://www.gofundme.com/savewifi
>>
>>
>> On Wed, Mar 16, 2016 at 3:26 AM, Michal Kazior <michal.kazior@tieto.com> wrote:
>>> On 16 March 2016 at 11:17, Michal Kazior <michal.kazior@tieto.com> wrote:
>>>> Hi,
>>>>
>>>> Most notable changes:
>>> [...]
>>>> * ath10k proof-of-concept that uses the new tx
>>>> scheduling (will post results in separate
>>>> email)
>>>
>>> I'm attaching a bunch of tests I've done using flent. They are all
>>> "burst" tests with burst-ports=1 and burst-length=2. The testing
>>> topology is:
>>>
>>> AP ----> STA
>>> AP )) (( STA
>>> [veth]--[br]--[wlan] )) (( [wlan]
>>>
>>> You can notice that in some tests plot data gets cut-off. There are 2
>>> problems I've identified:
>>> - excess drops (not a problem with the patchset and can be seen when
>>> there's no codel-in-mac or scheduling isn't used)
>>> - UDP_RR hangs (apparently the QCA99X0 I have sometimes hangs for a
>>> few hundred ms and doesn't Rx frames, causing UDP_RR to stop
>>> mid-way; confirmed with logs and sniffer; I haven't figured out *why*
>>> exactly, could be some hw/fw quirk)
>>>
>>> Let me know if you have questions or comments regarding my testing/results.
>>>
>>>
>>> Michał
[-- Attachment #2: bursts-2016-03-17T093033.443115.patched_all_bursts.flent.gz --]
[-- Type: application/x-gzip, Size: 13841 bytes --]
[-- Attachment #3: bursts-2016-03-17T092946.721003.patched_be_bursts.flent.gz --]
[-- Type: application/x-gzip, Size: 13786 bytes --]
[-- Attachment #4: bursts-2016-03-17T092445.132728.old_be_bursts.flent.gz --]
[-- Type: application/x-gzip, Size: 6349 bytes --]
[-- Attachment #5: bursts-2016-03-17T091952.053950.old_all_bursts.flent.gz --]
[-- Type: application/x-gzip, Size: 5458 bytes --]
[-- Attachment #6: patched-be-bursts.gif --]
[-- Type: image/gif, Size: 17961 bytes --]
* Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel
2016-03-16 15:37 ` Dave Taht
2016-03-16 18:36 ` Dave Taht
@ 2016-03-17 9:03 ` Michal Kazior
1 sibling, 0 replies; 16+ messages in thread
From: Michal Kazior @ 2016-03-17 9:03 UTC (permalink / raw)
To: Dave Taht
Cc: linux-wireless, ath10k, Johannes Berg, Network Development,
Emmanuel Grumbach, Felix Fietkau, Tim Shepard, make-wifi-fast,
codel
On 16 March 2016 at 16:37, Dave Taht <dave.taht@gmail.com> wrote:
> it is helpful to name the test files coherently in the flent tests, in
> addition to using a directory structure and timestamp. It makes doing
> comparison plots in data->add-other-open-data-files simpler. "-t
> patched-mac-300mbps", for example.
Sorry. I'm still trying to figure out what variables are worth
considering for comparison purposes.
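
A small sketch of how consistently titled result files can be sorted out afterwards. It assumes file names of the form test-timestamp.title.flent.gz, as seen in the attachments in this thread; the regular expression and the helper name are hypothetical and not part of flent.

```python
import re
from datetime import datetime

# Assumed layout: <test>-<YYYY-MM-DD>T<HHMMSS>.<usec>.<title>.flent.gz
_NAME_RE = re.compile(
    r"^(?P<test>.+?)-"
    r"(?P<stamp>\d{4}-\d{2}-\d{2}T\d{6}\.\d+)\."
    r"(?P<title>.+)\.flent\.gz$"
)


def parse_flent_name(filename):
    """Split a flent result file name into test, timestamp and title."""
    m = _NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized flent file name: {filename}")
    when = datetime.strptime(m.group("stamp"), "%Y-%m-%dT%H%M%S.%f")
    return {"test": m.group("test"), "when": when, "title": m.group("title")}


if __name__ == "__main__":
    name = "bursts-2016-03-17T093033.443115.patched_all_bursts.flent.gz"
    print(parse_flent_name(name))
```

With a title in every file name, runs can be grouped by title instead of by guessing from timestamps.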
> Also netperf from svn (maybe 2.7, don't remember) will restart udp_rr
> 250ms after a packet loss. Seeing a loss on UDP_RR and it stopping for
> a while is "ok".
I'm using 2.6 straight out of debian repos so yeah. I guess I'll try
using more recent netperf if I can't figure out the hiccups.
Michał
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> https://www.gofundme.com/savewifi
>
>
> On Wed, Mar 16, 2016 at 3:26 AM, Michal Kazior <michal.kazior@tieto.com> wrote:
>> On 16 March 2016 at 11:17, Michal Kazior <michal.kazior@tieto.com> wrote:
>>> Hi,
>>>
>>> Most notable changes:
>> [...]
>>> * ath10k proof-of-concept that uses the new tx
>>> scheduling (will post results in separate
>>> email)
>>
>> I'm attaching a bunch of tests I've done using flent. They are all
>> "burst" tests with burst-ports=1 and burst-length=2. The testing
>> topology is:
>>
>> AP ----> STA
>> AP )) (( STA
>> [veth]--[br]--[wlan] )) (( [wlan]
>>
>> You can notice that in some tests plot data gets cut-off. There are 2
>> problems I've identified:
>> - excess drops (not a problem with the patchset and can be seen when
>> there's no codel-in-mac or scheduling isn't used)
>> - UDP_RR hangs (apparently the QCA99X0 I have sometimes hangs for a
>> few hundred ms and doesn't Rx frames, causing UDP_RR to stop
>> mid-way; confirmed with logs and sniffer; I haven't figured out *why*
>> exactly, could be some hw/fw quirk)
>>
>> Let me know if you have questions or comments regarding my testing/results.
>>
>>
>> Michał
end of thread, other threads:[~2016-03-22 9:51 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
2016-03-21 17:10 [Codel] [RFCv2 0/3] mac80211: implement fq codel Dave Taht
2016-03-22 8:05 ` Michal Kazior
2016-03-22 9:51 ` Toke Høiland-Jørgensen
-- strict thread matches above, loose matches on Subject: below --
2016-03-16 10:17 Michal Kazior
2016-03-16 10:26 ` Michal Kazior
2016-03-16 15:37 ` Dave Taht
2016-03-16 18:36 ` Dave Taht
2016-03-16 18:55 ` Bob Copeland
2016-03-16 19:49 ` Jasmine Strong
2016-03-17 8:55 ` Michal Kazior
2016-03-17 11:12 ` Bob Copeland
2016-03-17 17:00 ` Dave Taht
2016-03-17 17:24 ` Rick Jones
2016-03-21 11:57 ` Michal Kazior
2016-03-17 9:43 ` Michal Kazior
2016-03-17 9:03 ` Michal Kazior