Lets make wifi fast again!
 help / color / mirror / Atom feed
* Re: [Make-wifi-fast] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
       [not found] <E1bETEY-0000BM-00@www.xplot.org>
@ 2016-06-19  8:52 ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 6+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-06-19  8:52 UTC (permalink / raw)
  To: Tim Shepard; +Cc: linux-wireless, make-wifi-fast, ath9k-devel, Felix Fietkau

Tim Shepard <shep@alum.mit.edu> writes:

>> +static struct sk_buff *
>> +ath_tid_pull(struct ath_atx_tid *tid)
>> +{
>> +	struct ath_softc *sc = tid->an->sc;
>> +	struct ieee80211_hw *hw = sc->hw;
>> +	struct ath_tx_control txctl = {
>> +		.txq = tid->txq,
>> +		.sta = tid->an->sta,
>> +	};
>> +	struct sk_buff *skb;
>> +	struct ath_frame_info *fi;
>> +	int q;
>> +
>> +	if (!tid->has_queued)
>> +		return NULL;
>> +
>> +	skb = ieee80211_tx_dequeue(hw, container_of((void*)tid, struct ieee80211_txq, drv_priv));
>> +	if (!skb) {
>> +		tid->has_queued = false;
>> +		return NULL;
>> +	}
>> +
>> +	if (ath_tx_prepare(hw, skb, &txctl)) {
>> +		ieee80211_free_txskb(hw, skb);
>> +		return NULL;
>> +	}
>> +
>> +	q = skb_get_queue_mapping(skb);
>> +	if (tid->txq == sc->tx.txq_map[q]) {
>> +		fi = get_frame_info(skb);
>> +		fi->txq = q;
>> +		++tid->txq->pending_frames;
>> +	}
>> +
>> +	return skb;
>> + }
>> +
>> +
>
> The increment of ->pending_frames lacks a corresponding check against
> sc->tx.txq_max_pending to see if we've reached the limit.  (Which begs
> the question: what to do if it has?)
>
> I discovered this bug doing experiments by trying to turn down the
> various /sys/kernel/debug/ieee80211/phy0/ath9k/qlen_* to low numbers
> (including as low as one, and then even zero) and found it had no
> effect.

You're right that it doesn't check the max. However, this is less of a
problem now that there is no intermediate queueing in the driver; and
indeed the utility of haven the qlen_* tunables is somewhat questionable
with the patch applied: The only thing this is going to control is the
size of the retry queue, and possible limit the size of the retry queue.
The actual queueing is happening in the mac80211 layer, which these
tunables can't control (and which is not FQ-CoDel controlled in
mac80211-next). So it might actually be that simply removing the
tunables is the right thing to do with this patch.

Removing the limits would also probably mean getting rid of txq->stopped
and the calls to ieee80211_wake_queue() and ieee80211_stop_queue().
I suspect that is fine when using the mac80211 intermediate queues, but
I'm not sure.

Felix, care to comment? :)

> The second more mysterious bug which I'm still struggling to
> understand is why doesn't large values in these ath9k/qlen_* (or more
> accurately, given the first bug above, the failure to check these qlen
> limit values at all) allow for increased hardware queue bloat (with
> observable delay).

Because there's a second limit in play (which has always been there): in
ath_tx_sched_aggr() there is this check:

	if ((aggr && txq->axq_ampdu_depth >= ATH_AGGR_MIN_QDEPTH) ||
	    (!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
		__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
		*stop = true;
		return false;
	}

The two constants are 2 and 8 respectively. This means that, with
aggregation enabled, no more than two full aggregates will be queued up.
The size of the aggregates is dynamically computed from the current
rate: they are limited a maximum of four milliseconds of (estimated)
airtime (for the BE queue; the others have different limits).

So in a sense there's already a dynamic limit on the hardware queues.
Now, whether four milliseconds is the right maximum aggregate size might
be worth discussing. It is the maximum allowed by the standard. Dave and
I have been 

-Toke

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Make-wifi-fast] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-06 13:23   ` Felix Fietkau
@ 2016-07-06 14:45     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 6+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-06 14:45 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: Tim Shepard, linux-wireless, make-wifi-fast, ath9k-devel

Felix Fietkau <nbd@nbd.name> writes:

>> Well, presumably the upper layers won't try to transmit anything through
>> the old TX path if the start/stop logic is implemented properly. The
>> chanctx code already seems to call the ieee80211_{start,stop}_queue()
>> functions when changing context, so not sure what else is needed. Guess
>> I'll go see if I can provoke an actual triggering of the bug, unless
>> Felix elaborates on what he means before I get around to it (poke,
>> Felix? :)).
> Then I guess the logic in ath_tx_start was a leftover from a time before
> some queue related rework happened to the chanctx code.
> In that case you can simply remove the chanctx related software queueing
> stuff from ath_tx_start.

Awesome. I'll double-check that I can't get the WARN_ON to trigger, then
send a v2 :)

-Toke

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Make-wifi-fast] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-04 17:46 ` Toke Høiland-Jørgensen
@ 2016-07-06 13:23   ` Felix Fietkau
  2016-07-06 14:45     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 6+ messages in thread
From: Felix Fietkau @ 2016-07-06 13:23 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Tim Shepard
  Cc: linux-wireless, make-wifi-fast, ath9k-devel

On 2016-07-04 19:46, Toke Høiland-Jørgensen wrote:
> Tim Shepard <shep@alum.mit.edu> writes:
> 
>> Thanks for unconfusing me a couple weeks ago, and cluing me into how
>> the limit on ->pending_frames is not really relevant for the data
>> packets that go through the new intermediate queues.
>>
>> But I'm not sure if this would allow us to remove the limit on
>> pending_frames because even though normal data packets would not
>> normally build up that many packets, there are other packet types
>> which bypass the intermediate queues and are transmitted directly
>> (also in most cases bypassing the ath9k internal queues in the way
>> ath9k worked before we patched it to use the mac80211 intermediate
>> queues).
> 
> Yes, but, well, since they're not queued they are not going to overflow
> anything. The aggregation building logic stops at two queued aggregates,
> so the default limit of 123 packets is never going to be hit when the
> queue is moved into the mac80211 layer. So keeping the knobs around only
> helps people who purposefully want to cripple their ability to do
> aggregation; and it won't be doing what it promises (limiting qlen),
> since that is now moved out of the driver. So IMO, removing the knobs is
> the right thing to do. I have already updated my patch to do so, which
> I'll send as a v2 once the other bits are resolved.
I agree.

>> Earlier Felix said:
>>
>>> Channel context based queue handling can be dealt with by
>>> stopping/starting relevant queues on channel context changes.
>>
>> But I don't see how to handle the case here where packets get passed
>> to the driver with ath_tx_start() and wind up in the scenario above.
> 
> Well, presumably the upper layers won't try to transmit anything through
> the old TX path if the start/stop logic is implemented properly. The
> chanctx code already seems to call the ieee80211_{start,stop}_queue()
> functions when changing context, so not sure what else is needed. Guess
> I'll go see if I can provoke an actual triggering of the bug, unless
> Felix elaborates on what he means before I get around to it (poke,
> Felix? :)).
Then I guess the logic in ath_tx_start was a leftover from a time before
some queue related rework happened to the chanctx code.
In that case you can simply remove the chanctx related software queueing
stuff from ath_tx_start.

- Felix

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Make-wifi-fast] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
       [not found] <E1bJYSp-0001M8-00@www.xplot.org>
@ 2016-07-04 17:46 ` Toke Høiland-Jørgensen
  2016-07-06 13:23   ` Felix Fietkau
  0 siblings, 1 reply; 6+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-04 17:46 UTC (permalink / raw)
  To: Tim Shepard; +Cc: linux-wireless, make-wifi-fast, ath9k-devel, Felix Fietkau

Tim Shepard <shep@alum.mit.edu> writes:

> Thanks for unconfusing me a couple weeks ago, and cluing me into how
> the limit on ->pending_frames is not really relevant for the data
> packets that go through the new intermediate queues.
>
> But I'm not sure if this would allow us to remove the limit on
> pending_frames because even though normal data packets would not
> normally build up that many packets, there are other packet types
> which bypass the intermediate queues and are transmitted directly
> (also in most cases bypassing the ath9k internal queues in the way
> ath9k worked before we patched it to use the mac80211 intermediate
> queues).

Yes, but, well, since they're not queued they are not going to overflow
anything. The aggregation building logic stops at two queued aggregates,
so the default limit of 123 packets is never going to be hit when the
queue is moved into the mac80211 layer. So keeping the knobs around only
helps people who purposefully want to cripple their ability to do
aggregation; and it won't be doing what it promises (limiting qlen),
since that is now moved out of the driver. So IMO, removing the knobs is
the right thing to do. I have already updated my patch to do so, which
I'll send as a v2 once the other bits are resolved.

> Along similar lines, from reading the code I think your patch has
> introduced a bug (but I don't know how to demonstrate it at runtime).
>
> Looking in the body of ath_tx_start() at the result of applying your
> patch, we now see this:
>
> 	[...]
>
> 	/* Force queueing of all frames that belong to a virtual interface on
> 	 * a different channel context, to ensure that they are sent on the
> 	 * correct channel.
> 	 */
> 	if (((avp && avp->chanctx != sc->cur_chan) ||
> 	     sc->cur_chan->stopped) && !txctl->force_channel) {
> 		if (!txctl->an)
> 			txctl->an = &avp->mcast_node;
> 		queue = true;
> 		skip_uapsd = true;
> 	}
>
> 	if (!skip_uapsd && ps_resp) {
> 		ath_txq_unlock(sc, txq);
> 		txq = sc->tx.uapsdq;
> 		ath_txq_lock(sc, txq);
> 	} else if(WARN_ON(txctl->an && queue)) 
> 		ath_txq_unlock(sc, txq);
> 		return -EINVAL;
> 	}
>
> 	[...]
>
> In the case where the first if body above is run to force queuing of
> all packets (not just normal data packets), then the else case of the
> second if statement above will surely run and its if statement will
> surely be true, so your new WARN_ON will happen.

Yup, I'm aware of that (and it's why I put in the WARN_ON instead of
just removing those code paths). Haven't seen it trigger yet, but
haven't tried very hard either. Guess you're right that it requires
vifs on different channels...

> Earlier Felix said:
>
>> Channel context based queue handling can be dealt with by
>> stopping/starting relevant queues on channel context changes.
>
> But I don't see how to handle the case here where packets get passed
> to the driver with ath_tx_start() and wind up in the scenario above.

Well, presumably the upper layers won't try to transmit anything through
the old TX path if the start/stop logic is implemented properly. The
chanctx code already seems to call the ieee80211_{start,stop}_queue()
functions when changing context, so not sure what else is needed. Guess
I'll go see if I can provoke an actual triggering of the bug, unless
Felix elaborates on what he means before I get around to it (poke,
Felix? :)).

-Toke

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Make-wifi-fast] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
       [not found] <E1bEcwS-0006jR-00@www.xplot.org>
@ 2016-06-19 13:50 ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 6+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-06-19 13:50 UTC (permalink / raw)
  To: Tim Shepard; +Cc: linux-wireless, make-wifi-fast, ath9k-devel, Felix Fietkau

Tim Shepard <shep@alum.mit.edu> writes:

>> 
>> You're right that it doesn't check the max. However, this is less of a
>> problem now that there is no intermediate queueing in the driver; and
>> indeed the utility of haven the qlen_* tunables is somewhat questionable
>> with the patch applied: The only thing this is going to control is the
>> size of the retry queue, and possible limit the size of the retry queue.
>> [....]
>
> The driver queues things up for the hardware to DMA and transmit.
> Something has to limit the amount of packets handed over to the
> hardware.  (We lack access to hardware documentation (grrrr!) but it
> appears to me that the hardware has a hard limit on how many packets
> can be handed to it.)

There's a ring buffer eight entries long that the aggregates (or
packets) are put on when actually being handed to the hardware.

This is in ath_txq->txq_fifo.

>> Because there's a second limit in play (which has always been there): in
>> ath_tx_sched_aggr() there is this check:
>> 
>> 	if ((aggr && txq->axq_ampdu_depth >= ATH_AGGR_MIN_QDEPTH) ||
>> 	    (!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
>> 		__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
>> 		*stop = true;
>> 		return false;
>> 	}
>> 
>> The two constants are 2 and 8 respectively. This means that, with
>> aggregation enabled, no more than two full aggregates will be queued up.
>> The size of the aggregates is dynamically computed from the current
>> rate: they are limited a maximum of four milliseconds of (estimated)
>> airtime (for the BE queue; the others have different limits).
>> 
>> So in a sense there's already a dynamic limit on the hardware queues.
>> Now, whether four milliseconds is the right maximum aggregate size might
>> be worth discussing. It is the maximum allowed by the standard. Dave and
>> I have been 
>
> Ah that may be the clue that I lacked.  There's got to be a dependency
> on processor speed (how quickly the system and driver can get another
> packet hooked up for transmission after completions) but perhaps with
> aggregates being so large in time, with full aggregates even the
> slowest processors are fast enough to avoid starvation.
>
> If there's no aggregation, a limit of some sort is needed (probably to
> prevent malfunction of the hardware/driver, but in any case to limit
> excess latency).  And this limit will depend on processor speed (and
> will need to be autotuned at runtime).

ATH_NON_AGGR_MIN_QDEPTH is 8 -- so yeah, the limit is higher if there is
no aggregation.

These are hard-coded values, so presumably they are large enough to keep
the hardware busy on most platforms (or someone would have noticed and
changed them?). So I doubt there is much to be gained to add a mechanism
to dynamically tune them (between 0 and 2?).

The exception being in case pulling from the mac80211 queue is too slow
to keep the hardware busy at the current settings. I see no problems
with this on my hardware, but that's an x86 box. I would probably hold
off on the dynamic tuning until having proven that there's actually a
bottleneck, though... ;)

-Toke

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Make-wifi-fast] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
       [not found] <20160617090929.31606-2-toke@toke.dk>
@ 2016-06-18 19:05 ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 6+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-06-18 19:05 UTC (permalink / raw)
  To: linux-wireless, make-wifi-fast, ath9k-devel
  Cc: Toke Høiland-Jørgensen, Tim Shepard, Felix Fietkau

This switches ath9k over to using the mac80211 intermediate software
queueing mechanism for data packets. It removes the queueing inside the
driver, except for the retry queue, and instead pulls from mac80211 when
a packet is needed. The retry queue is used to store a packet that was
pulled but can't be sent immediately.

The old code path in ath_tx_start that would queue packets has been
disabled and turned into a WARN_ON() and failure. Figure it can be
removed in a v2 (or kept and removed later).

Based on Tim's original patch set, but reworked quite thoroughly.

Cc: Tim Shepard <shep@alum.mit.edu>
Cc: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
 drivers/net/wireless/ath/ath9k/ath9k.h     |   8 +-
 drivers/net/wireless/ath/ath9k/debug_sta.c |   4 +-
 drivers/net/wireless/ath/ath9k/init.c      |   1 +
 drivers/net/wireless/ath/ath9k/main.c      |   1 +
 drivers/net/wireless/ath/ath9k/xmit.c      | 254 ++++++++++++++---------------
 5 files changed, 129 insertions(+), 139 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 5294595..b9cdf20 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -145,7 +145,9 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
 #define BAW_WITHIN(_start, _bawsz, _seqno) \
 	((((_seqno) - (_start)) & 4095) < (_bawsz))
 
-#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
+#define ATH_STA_2_TID(_sta, _tidno) ((struct ath_atx_tid *)(_sta)->txq[_tidno]->drv_priv)
+#define ATH_VIF_2_TID(_vif) ((struct ath_atx_tid *)(_vif)->txq->drv_priv)
+#define ATH_AN_2_TID(_an, _tidno) ((_an)->sta ? ATH_STA_2_TID((_an)->sta, _tidno) : ATH_VIF_2_TID((_an)->vif))
 
 #define IS_HT_RATE(rate)   (rate & 0x80)
 #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
@@ -232,7 +234,6 @@ struct ath_buf {
 
 struct ath_atx_tid {
 	struct list_head list;
-	struct sk_buff_head buf_q;
 	struct sk_buff_head retry_q;
 	struct ath_node *an;
 	struct ath_txq *txq;
@@ -247,13 +248,13 @@ struct ath_atx_tid {
 	s8 bar_index;
 	bool active;
 	bool clear_ps_filter;
+	bool has_queued;
 };
 
 struct ath_node {
 	struct ath_softc *sc;
 	struct ieee80211_sta *sta; /* station struct we're part of */
 	struct ieee80211_vif *vif; /* interface with which we're associated */
-	struct ath_atx_tid tid[IEEE80211_NUM_TIDS];
 
 	u16 maxampdu;
 	u8 mpdudensity;
@@ -585,6 +586,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 				   u16 tids, int nframes,
 				   enum ieee80211_frame_release_type reason,
 				   bool more_data);
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *queue);
 
 /********/
 /* VIFs */
diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c b/drivers/net/wireless/ath/ath9k/debug_sta.c
index c2ca57a..d789798 100644
--- a/drivers/net/wireless/ath/ath9k/debug_sta.c
+++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
@@ -52,8 +52,8 @@ static ssize_t read_file_node_aggr(struct file *file, char __user *user_buf,
 			 "TID", "SEQ_START", "SEQ_NEXT", "BAW_SIZE",
 			 "BAW_HEAD", "BAW_TAIL", "BAR_IDX", "SCHED", "PAUSED");
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_STA_2_TID(an->sta, tidno);
 		txq = tid->txq;
 		ath_txq_lock(sc, txq);
 		if (tid->active) {
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 1c226d6..1434018 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -867,6 +867,7 @@ static void ath9k_set_hw_capab(struct ath_softc *sc, struct ieee80211_hw *hw)
 	hw->max_rate_tries = 10;
 	hw->sta_data_size = sizeof(struct ath_node);
 	hw->vif_data_size = sizeof(struct ath_vif);
+	hw->txq_data_size = sizeof(struct ath_atx_tid);
 	hw->extra_tx_headroom = 4;
 
 	hw->wiphy->available_antennas_rx = BIT(ah->caps.max_rxchains) - 1;
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 3aed43a..f584e19 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -2673,4 +2673,5 @@ struct ieee80211_ops ath9k_ops = {
 	.sw_scan_start	    = ath9k_sw_scan_start,
 	.sw_scan_complete   = ath9k_sw_scan_complete,
 	.get_txpower        = ath9k_get_txpower,
+	.wake_tx_queue      = ath9k_wake_tx_queue,
 };
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index fe795fc..81fd480 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -65,6 +65,8 @@ static struct ath_buf *ath_tx_setup_buffer(struct ath_softc *sc,
 					   struct ath_txq *txq,
 					   struct ath_atx_tid *tid,
 					   struct sk_buff *skb);
+static int ath_tx_prepare(struct ieee80211_hw *hw, struct sk_buff *skb,
+			  struct ath_tx_control *txctl);
 
 enum {
 	MCS_HT20,
@@ -118,6 +120,26 @@ static void ath_tx_queue_tid(struct ath_softc *sc, struct ath_txq *txq,
 		list_add_tail(&tid->list, list);
 }
 
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *queue)
+{
+	struct ath_softc *sc = hw->priv;
+	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+	struct ath_atx_tid *tid = (struct ath_atx_tid *) queue->drv_priv;
+	struct ath_txq *txq = tid->txq;
+
+	ath_dbg(common, QUEUE, "Waking TX queue: %pM (%d)\n",
+		queue->sta ? queue->sta->addr : queue->vif->addr,
+		tid->tidno);
+
+	ath_txq_lock(sc, txq);
+
+	tid->has_queued = true;
+	ath_tx_queue_tid(sc, txq, tid);
+	ath_txq_schedule(sc, txq);
+
+	ath_txq_unlock(sc, txq);
+}
+
 static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
 {
 	struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
@@ -173,9 +195,47 @@ ath_get_skb_tid(struct ath_softc *sc, struct ath_node *an, struct sk_buff *skb)
 	return ATH_AN_2_TID(an, tidno);
 }
 
+static struct sk_buff *
+ath_tid_pull(struct ath_atx_tid *tid)
+{
+	struct ath_softc *sc = tid->an->sc;
+	struct ieee80211_hw *hw = sc->hw;
+	struct ath_tx_control txctl = {
+		.txq = tid->txq,
+		.sta = tid->an->sta,
+	};
+	struct sk_buff *skb;
+	struct ath_frame_info *fi;
+	int q;
+
+	if (!tid->has_queued)
+		return NULL;
+
+	skb = ieee80211_tx_dequeue(hw, container_of((void*)tid, struct ieee80211_txq, drv_priv));
+	if (!skb) {
+		tid->has_queued = false;
+		return NULL;
+	}
+
+	if (ath_tx_prepare(hw, skb, &txctl)) {
+		ieee80211_free_txskb(hw, skb);
+		return NULL;
+	}
+
+	q = skb_get_queue_mapping(skb);
+	if (tid->txq == sc->tx.txq_map[q]) {
+		fi = get_frame_info(skb);
+		fi->txq = q;
+		++tid->txq->pending_frames;
+	}
+
+	return skb;
+ }
+
+
 static bool ath_tid_has_buffered(struct ath_atx_tid *tid)
 {
-	return !skb_queue_empty(&tid->buf_q) || !skb_queue_empty(&tid->retry_q);
+	return !skb_queue_empty(&tid->retry_q) || tid->has_queued;
 }
 
 static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
@@ -184,46 +244,11 @@ static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
 
 	skb = __skb_dequeue(&tid->retry_q);
 	if (!skb)
-		skb = __skb_dequeue(&tid->buf_q);
+		skb = ath_tid_pull(tid);
 
 	return skb;
 }
 
-/*
- * ath_tx_tid_change_state:
- * - clears a-mpdu flag of previous session
- * - force sequence number allocation to fix next BlockAck Window
- */
-static void
-ath_tx_tid_change_state(struct ath_softc *sc, struct ath_atx_tid *tid)
-{
-	struct ath_txq *txq = tid->txq;
-	struct ieee80211_tx_info *tx_info;
-	struct sk_buff *skb, *tskb;
-	struct ath_buf *bf;
-	struct ath_frame_info *fi;
-
-	skb_queue_walk_safe(&tid->buf_q, skb, tskb) {
-		fi = get_frame_info(skb);
-		bf = fi->bf;
-
-		tx_info = IEEE80211_SKB_CB(skb);
-		tx_info->flags &= ~IEEE80211_TX_CTL_AMPDU;
-
-		if (bf)
-			continue;
-
-		bf = ath_tx_setup_buffer(sc, txq, tid, skb);
-		if (!bf) {
-			__skb_unlink(skb, &tid->buf_q);
-			ath_txq_skb_done(sc, txq, skb);
-			ieee80211_free_txskb(sc->hw, skb);
-			continue;
-		}
-	}
-
-}
-
 static void ath_tx_flush_tid(struct ath_softc *sc, struct ath_atx_tid *tid)
 {
 	struct ath_txq *txq = tid->txq;
@@ -858,7 +883,7 @@ static int ath_compute_num_delims(struct ath_softc *sc, struct ath_atx_tid *tid,
 
 static struct ath_buf *
 ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
-			struct ath_atx_tid *tid, struct sk_buff_head **q)
+			struct ath_atx_tid *tid)
 {
 	struct ieee80211_tx_info *tx_info;
 	struct ath_frame_info *fi;
@@ -867,11 +892,7 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 	u16 seqno;
 
 	while (1) {
-		*q = &tid->retry_q;
-		if (skb_queue_empty(*q))
-			*q = &tid->buf_q;
-
-		skb = skb_peek(*q);
+		skb = ath_tid_dequeue(tid);
 		if (!skb)
 			break;
 
@@ -883,7 +904,6 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 			bf->bf_state.stale = false;
 
 		if (!bf) {
-			__skb_unlink(skb, *q);
 			ath_txq_skb_done(sc, txq, skb);
 			ieee80211_free_txskb(sc->hw, skb);
 			continue;
@@ -912,8 +932,18 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 		seqno = bf->bf_state.seqno;
 
 		/* do not step over block-ack window */
-		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno))
+		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno)) {
+			__skb_queue_tail(&tid->retry_q, skb);
+
+			/* If there are other skbs in the retry q, they are
+			 * probably within the BAW, so loop immediately to get
+			 * one of them. Otherwise the queue can get stuck.
+			 *
+			 * FIXME: Do we need to protect against looping forever? */
+			if (!skb_queue_is_first(&tid->retry_q, skb))
+				continue;
 			break;
+		}
 
 		if (tid->bar_index > ATH_BA_INDEX(tid->seq_start, seqno)) {
 			struct ath_tx_status ts = {};
@@ -921,7 +951,6 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 
 			INIT_LIST_HEAD(&bf_head);
 			list_add(&bf->list, &bf_head);
-			__skb_unlink(skb, *q);
 			ath_tx_update_baw(sc, tid, seqno);
 			ath_tx_complete_buf(sc, bf, txq, &bf_head, &ts, 0);
 			continue;
@@ -933,11 +962,10 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 	return NULL;
 }
 
-static bool
+static int
 ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		 struct ath_atx_tid *tid, struct list_head *bf_q,
-		 struct ath_buf *bf_first, struct sk_buff_head *tid_q,
-		 int *aggr_len)
+		 struct ath_buf *bf_first)
 {
 #define PADBYTES(_len) ((4 - ((_len) % 4)) % 4)
 	struct ath_buf *bf = bf_first, *bf_prev = NULL;
@@ -947,12 +975,13 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 	struct ieee80211_tx_info *tx_info;
 	struct ath_frame_info *fi;
 	struct sk_buff *skb;
-	bool closed = false;
+
 
 	bf = bf_first;
 	aggr_limit = ath_lookup_rate(sc, bf, tid);
 
-	do {
+	while (bf)
+	{
 		skb = bf->bf_mpdu;
 		fi = get_frame_info(skb);
 
@@ -961,12 +990,12 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		if (nframes) {
 			if (aggr_limit < al + bpad + al_delta ||
 			    ath_lookup_legacy(bf) || nframes >= h_baw)
-				break;
+				goto stop;
 
 			tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
 			if ((tx_info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE) ||
 			    !(tx_info->flags & IEEE80211_TX_CTL_AMPDU))
-				break;
+				goto stop;
 		}
 
 		/* add padding for previous frame to aggregation length */
@@ -988,20 +1017,18 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 			ath_tx_addto_baw(sc, tid, bf);
 		bf->bf_state.ndelim = ndelim;
 
-		__skb_unlink(skb, tid_q);
 		list_add_tail(&bf->list, bf_q);
 		if (bf_prev)
 			bf_prev->bf_next = bf;
 
 		bf_prev = bf;
 
-		bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
-		if (!bf) {
-			closed = true;
-			break;
-		}
-	} while (ath_tid_has_buffered(tid));
-
+		bf = ath_tx_get_tid_subframe(sc, txq, tid);
+	}
+	goto finish;
+stop:
+	__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
+finish:
 	bf = bf_first;
 	bf->bf_lastbf = bf_prev;
 
@@ -1012,9 +1039,7 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		TX_STAT_INC(txq->axq_qnum, a_aggr);
 	}
 
-	*aggr_len = al;
-
-	return closed;
+	return al;
 #undef PADBYTES
 }
 
@@ -1391,18 +1416,15 @@ static void ath_tx_fill_desc(struct ath_softc *sc, struct ath_buf *bf,
 static void
 ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
 		  struct ath_atx_tid *tid, struct list_head *bf_q,
-		  struct ath_buf *bf_first, struct sk_buff_head *tid_q)
+		  struct ath_buf *bf_first)
 {
 	struct ath_buf *bf = bf_first, *bf_prev = NULL;
-	struct sk_buff *skb;
 	int nframes = 0;
 
 	do {
 		struct ieee80211_tx_info *tx_info;
-		skb = bf->bf_mpdu;
 
 		nframes++;
-		__skb_unlink(skb, tid_q);
 		list_add_tail(&bf->list, bf_q);
 		if (bf_prev)
 			bf_prev->bf_next = bf;
@@ -1411,13 +1433,15 @@ ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
 		if (nframes >= 2)
 			break;
 
-		bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
+		bf = ath_tx_get_tid_subframe(sc, txq, tid);
 		if (!bf)
 			break;
 
 		tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
-		if (tx_info->flags & IEEE80211_TX_CTL_AMPDU)
+		if (tx_info->flags & IEEE80211_TX_CTL_AMPDU) {
+			__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
 			break;
+		}
 
 		ath_set_rates(tid->an->vif, tid->an->sta, bf);
 	} while (1);
@@ -1428,34 +1452,33 @@ static bool ath_tx_sched_aggr(struct ath_softc *sc, struct ath_txq *txq,
 {
 	struct ath_buf *bf;
 	struct ieee80211_tx_info *tx_info;
-	struct sk_buff_head *tid_q;
 	struct list_head bf_q;
 	int aggr_len = 0;
-	bool aggr, last = true;
+	bool aggr;
 
 	if (!ath_tid_has_buffered(tid))
 		return false;
 
 	INIT_LIST_HEAD(&bf_q);
 
-	bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
+	bf = ath_tx_get_tid_subframe(sc, txq, tid);
 	if (!bf)
 		return false;
 
 	tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
 	aggr = !!(tx_info->flags & IEEE80211_TX_CTL_AMPDU);
 	if ((aggr && txq->axq_ampdu_depth >= ATH_AGGR_MIN_QDEPTH) ||
-		(!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
+	    (!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
+		__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
 		*stop = true;
 		return false;
 	}
 
 	ath_set_rates(tid->an->vif, tid->an->sta, bf);
 	if (aggr)
-		last = ath_tx_form_aggr(sc, txq, tid, &bf_q, bf,
-					tid_q, &aggr_len);
+		aggr_len = ath_tx_form_aggr(sc, txq, tid, &bf_q, bf);
 	else
-		ath_tx_form_burst(sc, txq, tid, &bf_q, bf, tid_q);
+		ath_tx_form_burst(sc, txq, tid, &bf_q, bf);
 
 	if (list_empty(&bf_q))
 		return false;
@@ -1498,9 +1521,6 @@ int ath_tx_aggr_start(struct ath_softc *sc, struct ieee80211_sta *sta,
 		an->mpdudensity = density;
 	}
 
-	/* force sequence number allocation for pending frames */
-	ath_tx_tid_change_state(sc, txtid);
-
 	txtid->active = true;
 	*ssn = txtid->seq_start = txtid->seq_next;
 	txtid->bar_index = -1;
@@ -1525,7 +1545,6 @@ void ath_tx_aggr_stop(struct ath_softc *sc, struct ieee80211_sta *sta, u16 tid)
 	ath_txq_lock(sc, txq);
 	txtid->active = false;
 	ath_tx_flush_tid(sc, txtid);
-	ath_tx_tid_change_state(sc, txtid);
 	ath_txq_unlock_complete(sc, txq);
 }
 
@@ -1535,14 +1554,12 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
 	struct ath_atx_tid *tid;
 	struct ath_txq *txq;
-	bool buffered;
 	int tidno;
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_STA_2_TID(an->sta, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -1552,13 +1569,9 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 			continue;
 		}
 
-		buffered = ath_tid_has_buffered(tid);
-
 		list_del_init(&tid->list);
 
 		ath_txq_unlock(sc, txq);
-
-		ieee80211_sta_set_buffered(sta, tidno, buffered);
 	}
 }
 
@@ -1571,19 +1584,12 @@ void ath_tx_aggr_wakeup(struct ath_softc *sc, struct ath_node *an)
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_STA_2_TID(an->sta, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
 		tid->clear_ps_filter = true;
-
-		if (ath_tid_has_buffered(tid)) {
-			ath_tx_queue_tid(sc, txq, tid);
-			ath_txq_schedule(sc, txq);
-		}
-
 		ath_txq_unlock_complete(sc, txq);
 	}
 }
@@ -1606,11 +1612,6 @@ void ath_tx_aggr_resume(struct ath_softc *sc, struct ieee80211_sta *sta,
 
 	tid->baw_size = IEEE80211_MIN_AMPDU_BUF << sta->ht_cap.ampdu_factor;
 
-	if (ath_tid_has_buffered(tid)) {
-		ath_tx_queue_tid(sc, txq, tid);
-		ath_txq_schedule(sc, txq);
-	}
-
 	ath_txq_unlock_complete(sc, txq);
 }
 
@@ -1626,7 +1627,6 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 	struct ieee80211_tx_info *info;
 	struct list_head bf_q;
 	struct ath_buf *bf_tail = NULL, *bf;
-	struct sk_buff_head *tid_q;
 	int sent = 0;
 	int i;
 
@@ -1641,11 +1641,10 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 
 		ath_txq_lock(sc, tid->txq);
 		while (nframes > 0) {
-			bf = ath_tx_get_tid_subframe(sc, sc->tx.uapsdq, tid, &tid_q);
+			bf = ath_tx_get_tid_subframe(sc, sc->tx.uapsdq, tid);
 			if (!bf)
 				break;
 
-			__skb_unlink(bf->bf_mpdu, tid_q);
 			list_add_tail(&bf->list, &bf_q);
 			ath_set_rates(tid->an->vif, tid->an->sta, bf);
 			if (bf_isampdu(bf)) {
@@ -1660,7 +1659,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 			sent++;
 			TX_STAT_INC(txq->axq_qnum, a_queued_hw);
 
-			if (an->sta && !ath_tid_has_buffered(tid))
+			if (an->sta && skb_queue_empty(&tid->retry_q))
 				ieee80211_sta_set_buffered(an->sta, i, false);
 		}
 		ath_txq_unlock_complete(sc, tid->txq);
@@ -2349,30 +2348,13 @@ int ath_tx_start(struct ieee80211_hw *hw, struct sk_buff *skb,
 		skip_uapsd = true;
 	}
 
-	if (txctl->an && queue)
-		tid = ath_get_skb_tid(sc, txctl->an, skb);
-
 	if (!skip_uapsd && ps_resp) {
 		ath_txq_unlock(sc, txq);
 		txq = sc->tx.uapsdq;
 		ath_txq_lock(sc, txq);
-	} else if (txctl->an && queue) {
-		WARN_ON(tid->txq != txctl->txq);
-
-		if (info->flags & IEEE80211_TX_CTL_CLEAR_PS_FILT)
-			tid->clear_ps_filter = true;
-
-		/*
-		 * Add this frame to software queue for scheduling later
-		 * for aggregation.
-		 */
-		TX_STAT_INC(txq->axq_qnum, a_queued_sw);
-		__skb_queue_tail(&tid->buf_q, skb);
-		if (!txctl->an->sleeping)
-			ath_tx_queue_tid(sc, txq, tid);
-
-		ath_txq_schedule(sc, txq);
-		goto out;
+	} else if(WARN_ON(txctl->an && queue)) {
+		ath_txq_unlock(sc, txq);
+		return -EINVAL;
 	}
 
 	bf = ath_tx_setup_buffer(sc, txq, tid, skb);
@@ -2856,9 +2838,8 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 	struct ath_atx_tid *tid;
 	int tidno, acno;
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS;
-	     tidno++, tid++) {
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_AN_2_TID(an, tidno);
 		tid->an        = an;
 		tid->tidno     = tidno;
 		tid->seq_start = tid->seq_next = 0;
@@ -2866,11 +2847,14 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 		tid->baw_head  = tid->baw_tail = 0;
 		tid->active	   = false;
 		tid->clear_ps_filter = true;
-		__skb_queue_head_init(&tid->buf_q);
+		tid->has_queued  = false;
 		__skb_queue_head_init(&tid->retry_q);
 		INIT_LIST_HEAD(&tid->list);
 		acno = TID_TO_WME_AC(tidno);
 		tid->txq = sc->tx.txq_map[acno];
+
+		if (!an->sta)
+			break; /* just one multicast ath_atx_tid */
 	}
 }
 
@@ -2880,9 +2864,8 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
 	struct ath_txq *txq;
 	int tidno;
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_AN_2_TID(an, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -2894,6 +2877,9 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
 		tid->active = false;
 
 		ath_txq_unlock(sc, txq);
+
+		if (!an->sta)
+			break; /* just one multicast ath_atx_tid */
 	}
 }
 
-- 
2.8.3

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-07-06 14:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <E1bETEY-0000BM-00@www.xplot.org>
2016-06-19  8:52 ` [Make-wifi-fast] [PATCH] ath9k: Switch to using mac80211 intermediate software queues Toke Høiland-Jørgensen
     [not found] <E1bJYSp-0001M8-00@www.xplot.org>
2016-07-04 17:46 ` Toke Høiland-Jørgensen
2016-07-06 13:23   ` Felix Fietkau
2016-07-06 14:45     ` Toke Høiland-Jørgensen
     [not found] <E1bEcwS-0006jR-00@www.xplot.org>
2016-06-19 13:50 ` Toke Høiland-Jørgensen
     [not found] <20160617090929.31606-2-toke@toke.dk>
2016-06-18 19:05 ` Toke Høiland-Jørgensen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox