From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from arrakis.dune.hu (arrakis.dune.hu [78.24.191.176]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by huchra.bufferbloat.net (Postfix) with ESMTPS id 809F821F22F for ; Mon, 14 Jul 2014 03:48:44 -0700 (PDT) Received: from arrakis.dune.hu (localhost [127.0.0.1]) by arrakis.dune.hu (Postfix) with ESMTP id 821E92806A8; Mon, 14 Jul 2014 12:46:26 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on arrakis.dune.hu X-Spam-Level: **** X-Spam-Status: No, score=4.2 required=5.0 tests=BAYES_00, HELO_DYNAMIC_IPADDR, RDNS_DYNAMIC autolearn=no version=3.3.2 Received: from ip-109-47-141-111.web.vodafone.de (ip-109-47-141-111.web.vodafone.de [109.47.141.111]) by arrakis.dune.hu (Postfix) with ESMTPSA; Mon, 14 Jul 2014 12:46:26 +0200 (CEST) Message-ID: <53C3B57E.7070302@openwrt.org> Date: Mon, 14 Jul 2014 12:48:30 +0200 From: Felix Fietkau User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Sujith Manoharan , Dave Taht References: <21443.23457.914407.892188@gargle.gargle.HOWL> In-Reply-To: <21443.23457.914407.892188@gargle.gargle.HOWL> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: "ath9k-devel@lists.ath9k.org" , Antonio Quartulli , cerowrt-devel Subject: Re: [Cerowrt-devel] [ath9k-devel] periodic hang of ath9k X-BeenThere: cerowrt-devel@lists.bufferbloat.net X-Mailman-Version: 2.1.13 Precedence: list List-Id: Development issues regarding the cerowrt test router project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Jul 2014 10:48:45 -0000 On 2014-07-14 06:25, Sujith Manoharan wrote: > Dave Taht wrote: >> cc-ing ath9k-devel for this update on http://www.bufferbloat.net/issues/442 >> >> this bug, which some people (usually on macs with low signal strength) >> can get to occur fairly rapidly, but I can't, is driving me 9 kinds of >> crazy... > > Does stock OpenWrt also have this bug, or is this specific to Cerowrt ? After receiving some useful debug output from Antonio Quartulli (who was able to reproduce it easily), I believe that I have tracked down this issue to some bugs in counting pending tx frames. When frames get pushed through the U-APSD queue for PS responses and dropped there due to retransmissions, the counter probably does not get decremented properly. I've come up with an untested patch that should fix this codepath and make it easier to verify. If you're affected by the bug, please test this patch: --- --- a/drivers/net/wireless/ath/ath9k/ath9k.h +++ b/drivers/net/wireless/ath/ath9k/ath9k.h @@ -185,7 +185,8 @@ struct ath_atx_ac { struct ath_frame_info { struct ath_buf *bf; - int framelen; + u16 framelen; + s8 txq; enum ath9k_key_type keytype; u8 keyix; u8 rtscts_rate; --- a/drivers/net/wireless/ath/ath9k/xmit.c +++ b/drivers/net/wireless/ath/ath9k/xmit.c @@ -147,15 +147,13 @@ static void ath_set_rates(struct ieee802 static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq, struct sk_buff *skb) { - int q; - - q = skb_get_queue_mapping(skb); - if (txq == sc->tx.uapsdq) - txq = sc->tx.txq_map[q]; + struct ath_frame_info *fi = get_frame_info(skb); + int q = fi->txq; - if (txq != sc->tx.txq_map[q]) + if (q < 0) return; + txq = sc->tx.txq_map[q]; if (WARN_ON(--txq->pending_frames < 0)) txq->pending_frames = 0; @@ -1999,6 +1997,7 @@ static void setup_frame_info(struct ieee an = (struct ath_node *) sta->drv_priv; memset(fi, 0, sizeof(*fi)); + fi->txq = -1; if (hw_key) fi->keyix = hw_key->hw_key_idx; else if (an && ieee80211_is_data(hdr->frame_control) && an->ps_key > 0) @@ -2150,6 +2149,7 @@ int ath_tx_start(struct ieee80211_hw *hw struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); struct ieee80211_sta *sta = txctl->sta; struct ieee80211_vif *vif = info->control.vif; + struct ath_frame_info *fi = get_frame_info(skb); struct ath_softc *sc = hw->priv; struct ath_txq *txq = txctl->txq; struct ath_atx_tid *tid = NULL; @@ -2170,11 +2170,13 @@ int ath_tx_start(struct ieee80211_hw *hw q = skb_get_queue_mapping(skb); ath_txq_lock(sc, txq); - if (txq == sc->tx.txq_map[q] && - ++txq->pending_frames > sc->tx.txq_max_pending[q] && - !txq->stopped) { - ieee80211_stop_queue(sc->hw, q); - txq->stopped = true; + if (txq == sc->tx.txq_map[q]) { + fi->txq = q; + if (++txq->pending_frames > sc->tx.txq_max_pending[q] && + !txq->stopped) { + ieee80211_stop_queue(sc->hw, q); + txq->stopped = true; + } } if (txctl->an && ieee80211_is_data_present(hdr->frame_control))