* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
@ 2013-12-13 9:27 ` Sujith Manoharan
2013-12-13 9:48 ` Sebastian Moeller
2013-12-13 20:56 ` Dave Taht
0 siblings, 2 replies; 14+ messages in thread
From: Sujith Manoharan @ 2013-12-13 9:27 UTC
To: Sebastian Moeller; +Cc: ath9k-devel, linux-wireless, cerowrt-devel
Sebastian Moeller wrote:
> It is a Netgear WNDR3700 v2, so according to:
> http://wiki.openwrt.org/toh/netgear/wndr3700 it is an Atheros AR7161 rev 2 680
> MHz SoC with the following wireless parts: Atheros AR9223 802.11bgn / Atheros
> AR9220 802.11an.
>
> Sure, I hope I got the right one. Now this is not from the same boot as the
> one with the errors, but I assume that does not make a difference… Since I am
> located in Germany I set the regulatory domain to DE. Please let me know if
> you need any additional information or testing (note I am not set up to build
> CeroWrt myself, so I would need Dave Täht's help to build a modified firmware)
Can you try this patch ?
diff --git a/drivers/net/wireless/ath/ath9k/ar9002_mac.c b/drivers/net/wireless/ath/ath9k/ar9002_mac.c
index 8d78253..0337de7 100644
--- a/drivers/net/wireless/ath/ath9k/ar9002_mac.c
+++ b/drivers/net/wireless/ath/ath9k/ar9002_mac.c
@@ -76,9 +76,16 @@ static bool ar9002_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked)
mask2 |= ATH9K_INT_CST;
if (isr2 & AR_ISR_S2_TSFOOR)
mask2 |= ATH9K_INT_TSFOOR;
+
+ if (!(pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED)) {
+ REG_WRITE(ah, AR_ISR_S2, isr2);
+ isr &= ~AR_ISR_BCNMISC;
+ }
}
- isr = REG_READ(ah, AR_ISR_RAC);
+ if (pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED)
+ isr = REG_READ(ah, AR_ISR_RAC);
+
if (isr == 0xffffffff) {
*masked = 0;
return false;
@@ -97,11 +104,23 @@ static bool ar9002_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked)
*masked |= ATH9K_INT_TX;
- s0_s = REG_READ(ah, AR_ISR_S0_S);
+ if (pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED) {
+ s0_s = REG_READ(ah, AR_ISR_S0_S);
+ s1_s = REG_READ(ah, AR_ISR_S1_S);
+ } else {
+ s0_s = REG_READ(ah, AR_ISR_S0);
+ REG_WRITE(ah, AR_ISR_S0, s0_s);
+ s1_s = REG_READ(ah, AR_ISR_S1);
+ REG_WRITE(ah, AR_ISR_S1, s1_s);
+
+ isr &= ~(AR_ISR_TXOK |
+ AR_ISR_TXDESC |
+ AR_ISR_TXERR |
+ AR_ISR_TXEOL);
+ }
+
ah->intr_txqs |= MS(s0_s, AR_ISR_S0_QCU_TXOK);
ah->intr_txqs |= MS(s0_s, AR_ISR_S0_QCU_TXDESC);
-
- s1_s = REG_READ(ah, AR_ISR_S1_S);
ah->intr_txqs |= MS(s1_s, AR_ISR_S1_QCU_TXERR);
ah->intr_txqs |= MS(s1_s, AR_ISR_S1_QCU_TXEOL);
}
@@ -120,7 +139,12 @@ static bool ar9002_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked)
if (isr & AR_ISR_GENTMR) {
u32 s5_s;
- s5_s = REG_READ(ah, AR_ISR_S5_S);
+ if (pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED) {
+ s5_s = REG_READ(ah, AR_ISR_S5_S);
+ } else {
+ s5_s = REG_READ(ah, AR_ISR_S5);
+ }
+
ah->intr_gen_timer_trigger =
MS(s5_s, AR_ISR_S5_GENTIMER_TRIG);
@@ -133,6 +157,16 @@ static bool ar9002_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked)
if ((s5_s & AR_ISR_S5_TIM_TIMER) &&
!(pCap->hw_caps & ATH9K_HW_CAP_AUTOSLEEP))
*masked |= ATH9K_INT_TIM_TIMER;
+
+ if (!(pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED)) {
+ REG_WRITE(ah, AR_ISR_S5, s5_s);
+ isr &= ~AR_ISR_GENTMR;
+ }
+ }
+
+ if (!(pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED)) {
+ REG_WRITE(ah, AR_ISR, isr);
+ REG_READ(ah, AR_ISR);
}
if (sync_cause) {
A version that applies over OpenWrt trunk is here:
http://msujith.org/dir/patches/wl/Dec-13-2013/0001-ath9k-Interrupt-handling-fix-for-AR9002-family.patch
Sujith
* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
2013-12-13 9:27 ` [Cerowrt-devel] Wireless failures 3.10.17-3 Sujith Manoharan
@ 2013-12-13 9:48 ` Sebastian Moeller
2013-12-13 16:51 ` Felix Fietkau
2013-12-13 20:56 ` Dave Taht
1 sibling, 1 reply; 14+ messages in thread
From: Sebastian Moeller @ 2013-12-13 9:48 UTC
To: Sujith Manoharan; +Cc: ath9k-devel, linux-wireless, cerowrt-devel
Hi Sujith,
On Dec 13, 2013, at 10:27 , Sujith Manoharan <sujith@msujith.org> wrote:
> Can you try this patch ?
I will, but it will take some time, as I cannot build the firmware for this device myself and need help. I will let you know once I have tested the patched kernel.
Best Regards & many thanks
Sebastian
--
Sandra, Okko, Joris, & Sebastian Moeller
Telefon: +49 7071 96 49 783, +49 7071 96 49 784, +49 7071 96 49 785
GSM: +49-1577-190 31 41
GSM: +49-1517-00 70 355
Moltkestrasse 6
72072 Tuebingen
Deutschland
* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
2013-12-13 9:48 ` Sebastian Moeller
@ 2013-12-13 16:51 ` Felix Fietkau
2013-12-13 19:08 ` Sebastian Moeller
0 siblings, 1 reply; 14+ messages in thread
From: Felix Fietkau @ 2013-12-13 16:51 UTC
To: Sebastian Moeller, Sujith Manoharan
Cc: ath9k-devel, linux-wireless, cerowrt-devel
On 2013-12-13 10:48, Sebastian Moeller wrote:
> Hi Sujith,
>
> On Dec 13, 2013, at 10:27 , Sujith Manoharan <sujith@msujith.org> wrote:
>
>> Can you try this patch ?
>
> I will, but it will take some time, as I cannot build the firmware for this device myself, but need help. So I let you know once I tested the patched kernel.
On OpenWrt/CeroWrt you should not patch it into the kernel. You need to
add it as a patch for package/kernel/mac80211.
- Felix
* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
2013-12-13 16:51 ` Felix Fietkau
@ 2013-12-13 19:08 ` Sebastian Moeller
0 siblings, 0 replies; 14+ messages in thread
From: Sebastian Moeller @ 2013-12-13 19:08 UTC
To: Felix Fietkau; +Cc: ath9k-devel, linux-wireless, cerowrt-devel
Hello Felix,
On Dec 13, 2013, at 17:51 , Felix Fietkau <nbd@openwrt.org> wrote:
> On 2013-12-13 10:48, Sebastian Moeller wrote:
>> Hi Sujith,
>>
>> On Dec 13, 2013, at 10:27 , Sujith Manoharan <sujith@msujith.org> wrote:
>>
>>> Can you try this patch ?
>>
>> I will, but it will take some time, as I cannot build the firmware for this device myself, but need help. So I let you know once I tested the patched kernel.
> On OpenWrt/CeroWrt you should not patch it into the kernel. You need to
> add it as a patch for package/kernel/mac80211.
Ah, thanks, good to know. Many thanks! (I still need Dave's help to integrate this patch into a firmware image so I can actually test it...)
Best Regards
Sebastian
>
> - Felix
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
2013-12-13 9:27 ` [Cerowrt-devel] Wireless failures 3.10.17-3 Sujith Manoharan
2013-12-13 9:48 ` Sebastian Moeller
@ 2013-12-13 20:56 ` Dave Taht
2013-12-13 23:02 ` Dave Taht
1 sibling, 1 reply; 14+ messages in thread
From: Dave Taht @ 2013-12-13 20:56 UTC
To: Sujith Manoharan
Cc: Sebastian Moeller, ath9k-devel, linux-wireless, cerowrt-devel
On Fri, Dec 13, 2013 at 1:27 AM, Sujith Manoharan <sujith@msujith.org> wrote:
THANK YOU!
I have applied the patch to the next build, cerowrt-3.10.24-1, for
the WNDR3700v2 and WNDR3800; it will be here when the build completes:
http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.10.24-1
Completely untested by me until Sunday! Don't try this on your
primary home router.
While I'm here on linux-wireless:
CeroWrt really needs a new maintainer and more people able to build
it. Right now I am mostly working on queuing theory (in wireless/WiFi)
and on fixing a new chipset in a new box that I can't talk about
(yet). I am low on free time, and working on standardizing fq_codel in
the IETF is eating what little spare time I have left.
Although I dedicate my Sundays to Cero, I'm losing the general-purpose
skill set required to keep the continuous integration from OpenWrt to
Cero on the WNDR3800 going. I care about keeping Cero going, but after
3 years of building it, and after struggling to make it stable since
August, I'm feeling washed up and burned out on it. I think we are very
close to a stable release, though, and I'll feel much better once this
bug is gone…
But while I'm limping along...
Any volunteers to help get the next release after this one out? Any
suggestions for doing it mo better? Or a better strategy for testing
more fixes for bufferbloat?
There MIGHT be some funding for Cero next year. There never has been
before, and there have been too many broken promises, sooo the only
true reward I know of for working on bufferbloat with CeroWrt (and it
is a major one!) is doing bleeding-edge research on the Internet's most
nagging problems... and *solving them*.
OK, then there's also the user base, which is wonderful. And the
notoriety. And kicking the vendors and ISPs making crappy routers in
the shins on a regular basis. Etc.
I'd like to add a next-generation bleeding edge chip to the effort but
can't without more funding and more volunteers.
> Can you try this patch ?
I have folded this into cerowrt-3.10.24-1. Note that in addition to
this problem, the last couple of builds have been testing dnsmasq 2.68,
which may also have broken at the same time, and I am far from the
yurtlab right now, so I am unable to test before Sunday. (Use fixed IP
addresses if it's still busted.)
:Crossed fingers:
I don't know whether there is a cause-and-effect relationship between
the DMA TX bug and what we are actually seeing, with radios falling off
the net. I have a similar long-standing bug with babel in IPv6 ad-hoc
mode: multicasts are sent and received and other nodes are seen, but
no unicast traffic can actually be transmitted. That too seems to
happen after the DMA TX bug shows up and after days of uptime.
I have also set up an ath9k in several x86 boxes to see if this problem
occurs there. I'd thought it didn't, which would point to some sort of
write barrier problem, maybe...
Thanks again for taking a stab at the problem! I was merely going to
add a WARN_ON to start searching; I didn't think a fix would arrive in my
mailbox this morning!
> A version that applies over OpenWrt trunk is here:
> http://msujith.org/dir/patches/wl/Dec-13-2013/0001-ath9k-Interrupt-handling-fix-for-AR9002-family.patch
Lots of whitespace errors in the git tree. Applied. THANKS!
--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
2013-12-13 20:56 ` Dave Taht
@ 2013-12-13 23:02 ` Dave Taht
2013-12-14 4:00 ` Sujith Manoharan
0 siblings, 1 reply; 14+ messages in thread
From: Dave Taht @ 2013-12-13 23:02 UTC
To: Sujith Manoharan
Cc: Sebastian Moeller, ath9k-devel, linux-wireless, cerowrt-devel
OK, I couldn't help myself and booted that release. Wet paint! It
successfully brought up the 5 GHz radio, but did not manage to assign
an IP address to it (netifd bug?), and it failed utterly on the 2.4 GHz
radio.
Trying to restart it manually fails to bring up the 5 GHz radio as well.
Here's an strace of that:
http://snapon.lab.bufferbloat.net/~d/hostapd.strace.txt
I don't see it beacon, either.
Now, I don't have a grip on what started happening two releases back
(I was out of town), but I figure it is perhaps more relevant than
chasing the DMA TX issue. And ENOTIME for me on this until Sunday. I
will revert this patch and bisect backwards.
root@CMTS:~# wifi enable
command failed: Device or resource busy (-16)
Configuration file: /var/run/hostapd-phy0.conf
nl80211: Could not configure driver mode
nl80211 driver initialization failed.
hostapd_free_hapd_data: Interface gw00 wasn't started
hostapd_free_hapd_data: Interface gw00 wasn't started
hostapd_free_hapd_data: Interface sw00 wasn't started
Failed to start hostapd for phy0
command failed: Too many open files in system (-23)
command failed: Too many open files in system (-23)
ifconfig: SIOCSIFHWADDR: Device or resource busy
command failed: Device or resource busy (-16)
Configuration file: /var/run/hostapd-phy1.conf
nl80211: Could not configure driver mode
nl80211 driver initialization failed.
hostapd_free_hapd_data: Interface gw10 wasn't started
hostapd_free_hapd_data: Interface sw10 wasn't started
Failed to start hostapd for phy1
netifd: Interface 'sw10' is enabled
--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
2013-12-13 23:02 ` Dave Taht
@ 2013-12-14 4:00 ` Sujith Manoharan
2013-12-14 21:40 ` Dave Taht
0 siblings, 1 reply; 14+ messages in thread
From: Sujith Manoharan @ 2013-12-14 4:00 UTC
To: Dave Taht; +Cc: Sebastian Moeller, ath9k-devel, linux-wireless, cerowrt-devel
Dave Taht wrote:
> OK, I couldn't help myself but boot up that release. Wet paint! It
> successfully brought up
> the 5ghz radio, but did not manage to assign an ip address to it
> (netifd bug?) and failed on the 2ghz radio utterly.
>
> trying to restart it manually fails to bring up the 5ghz radio as well.
> Here's an strace of that.
>
> http://snapon.lab.bufferbloat.net/~d/hostapd.strace.txt
>
> I don't see it beacon, either.
>
> Now, I don't have a grip on what started happening two releases back
> (I was out of town) but I figure it is perhaps more relevant than
> chasing the DMA tx thing. And ENOTIME for me on this til sunday. I
> will revert this patch and bisect backwards.
I am not sure how the patch would break things.
Booting OpenWrt trunk (with the patch) on an AP96 reference board seems
to work fine: http://pastebin.com/3rPSfuad
Sujith
* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
2013-12-14 4:00 ` Sujith Manoharan
@ 2013-12-14 21:40 ` Dave Taht
0 siblings, 0 replies; 14+ messages in thread
From: Dave Taht @ 2013-12-14 21:40 UTC
To: Sujith Manoharan
Cc: Sebastian Moeller, ath9k-devel, linux-wireless, cerowrt-devel
On Fri, Dec 13, 2013 at 8:00 PM, Sujith Manoharan <sujith@msujith.org> wrote:
> I am not sure how the patch would break things.
>
> Booting OpenWrt trunk (with the patch) on an AP96 reference board seems
> to work fine: http://pastebin.com/3rPSfuad
>
> Sujith
We appear to have a deeper problem. I reverted your patch and moved my
build back to 3.10.21 to match OpenWrt, and Felix reverted some stuff
in netifd...
... and on first boot, somehow, all the wireless interfaces come up.
After a reboot, most don't. Running OpenWrt's "wifi up" is also
interesting, with all sorts of failures when trying to bring up
different interfaces.
So I suspect netifd still has a race.
After we get that sorted, we can move on to poking at the DMA TX
error. Or, since it seems I can get an interface up by hammering on
it, I can move forward a bit on that in parallel.
--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
2013-12-11 20:41 ` Sebastian Moeller
@ 2013-12-11 22:05 ` Jim Gettys
0 siblings, 0 replies; 14+ messages in thread
From: Jim Gettys @ 2013-12-11 22:05 UTC
To: Sebastian Moeller; +Cc: cerowrt-devel
Yes, those are the error messages I saw in my log.
It is wonderful you seem to be able to trigger them at will.
- Jim
On Wed, Dec 11, 2013 at 3:41 PM, Sebastian Moeller <moeller0@gmx.de> wrote:
> Hi List, hi Dave,
>
>
> On Dec 11, 2013, at 19:41 , Dave Taht <dave.taht@gmail.com> wrote:
>
> > I have the regrettable problem of mostly testing the 5ghz channel due
> > to interference issues on the 2ghz band.
> >
> > What I am seeing in the last several releases of the 3.8.x and 3.10
> > series is, after tons of traffic and multiple days of uptime, a DMA TX
> > error, which you can see via logread or dmesg. Once it
> > happens, at least sometimes, that radio can "go away" and not be
> > resettable. "cannot stop tx dma" is the error.
>
> I think I can make the error appear "at will" by running
> netperf-wrapper against my WNDR3700v2; just tested under 3.10.21-1:
> /netperf-wrapper -l 300 -H gw.home.lan rrul -p all -t
> hms-beagle_cerowrt3.10.21-1_2_nacktmulle
>
> dmesg on the router:
> [ 53.007812] IPv6: ADDRCONF(NETDEV_CHANGE): gw11: link becomes ready
> [28792.039062] ath: phy1: Failed to stop TX DMA, queues=0x00e!
> [28794.078125] ath: phy1: Failed to stop TX DMA, queues=0x00e!
> [28807.164062] ath: phy1: Failed to stop TX DMA, queues=0x00e!
> [28809.191406] ath: phy1: Failed to stop TX DMA, queues=0x002!
> [28823.269531] ath: phy1: Failed to stop TX DMA, queues=0x00e!
>
> dmesg was clean before so these 5 failures are from the rrul test over the
> 5GHz radio
>
> running the same over the 2.4GHz radio adds the following:
>
> [29200.921875] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29206.980468] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29209.019531] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29211.066406] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29215.109375] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29227.195312] ath: phy0: Failed to stop TX DMA, queues=0x006!
> [29233.257812] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29238.308593] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29240.351562] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29247.417968] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29251.480468] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29253.515625] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29256.558593] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29262.617187] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29264.652343] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29269.699218] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29273.750000] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29278.804687] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29281.859375] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29291.933593] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29294.972656] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29304.050781] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29312.117187] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29315.167968] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29322.246093] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29325.292968] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29330.355468] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29332.390625] ath: phy0: Failed to stop TX DMA, queues=0x00a!
> [29334.445312] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29336.484375] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29337.527343] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29343.617187] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29349.679687] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29358.757812] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29361.816406] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29363.851562] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29364.882812] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29370.937500] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29371.976562] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29376.031250] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29378.062500] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29381.105468] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29388.175781] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29393.230468] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29401.292968] ath: phy0: Failed to stop TX DMA, queues=0x003!
> [29403.332031] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29413.429687] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29417.480468] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29422.542968] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29424.582031] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29427.636718] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29429.671875] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29431.718750] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29433.765625] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29445.835937] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29449.898437] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29454.960937] ath: phy0: Failed to stop TX DMA, queues=0x00f!
> [29461.023437] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29463.062500] ath: phy0: Failed to stop TX DMA, queues=0x00e!
> [29466.117187] ath: phy0: Failed to stop TX DMA, queues=0x00f!
>
> I have to admit before today I never tested with 2.4GHz and only say the 4
> to 5 messages in the 5GHz band.
>
> Running the same over the wired interface does not cause these messages…
>
> And running from a 5GHz client through the router to a wired client (both
> on the internal side) just adds:
> [30643.500000] ath: phy1: Failed to stop TX DMA, queues=0x00c!
> [30736.898437] ath: phy1: Failed to stop TX DMA, queues=0x00e!
>
> It does not immediately lead to a drop of the radio though...
>
> Maybe this can be helpful in the hands of a real expert?
>
>
> > I have seen this error
> > many, many times in cerowrt releases for the last 2 years, but this
> > time it seems more severe than usual.
> >
> > There was also a bug in dnsmasq or somewhere in the lower level of the
> > stack where it stops responding to multicast dhcp packets.
> >
> > The upcoming 3.10.23-1 development release has a refresh of mac80211,
> > and a bug fix related to multicast, so I have some hope for it.
> >
> > It has also the latest dnsmasq 2.68 (which fixes a bug in cname
> > handling in particular), and also pie v3 but I am (as usual) not in a
> > position to test it right now.
> >
> > It is my hope that now that the bug happens a lot we can track it
> > down. Or, that it's fixed. :)
> >
> > I just put that release up at:
> >
> > http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.10.23-1/
> >
> > It does not have the updated aqm-scripts code and gui (sorry
> > sebastian),
>
> Ah, even better, I finished the discussed cosmetic changes and
> tested them, I will try to send them before Sunday, so they might end up in
> the next cero release. That means you will have to integrate with your
> changes to avoid HTB for high bandwidths… (or you just put your version in
> and I will do the integration after the next release :) )
> Also, I still need to figure out how to make mutually exclusive
> with the default QOS system...
>
>
> > nor the pie v4 drop that just got rejected for kernel
> > mainline. I'll try to do a respin this weekend with those, and poke
> > harder at the dma tx issue after I get back in the lab. Thoughts
> > towards being able to isolate the cause and minimize the effect are
> > welcomed - it's one of the biggest barriers to declaring a stable
> > release at this point!
> >
> >
> > On Wed, Dec 11, 2013 at 8:58 AM, Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> >> Has anyone seen wireless failing after several days with 3.10.17-3?
> >>
> >> The symptoms are devices fall off the net several days (or a week) after
> >> router has been running. I saw the bg AP go away, but the 5 Ghz AP still
> >> working. Wired attachment works.
> >> _______________________________________________
> >> Cerowrt-devel mailing list
> >> Cerowrt-devel@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> >
> >
> >
> > --
> > Dave Täht
> >
> > Fixing bufferbloat with cerowrt:
> http://www.teklibre.com/cerowrt/subscribe.html
> > _______________________________________________
> > Cerowrt-devel mailing list
> > Cerowrt-devel@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
2013-12-11 18:41 ` Dave Taht
@ 2013-12-11 20:41 ` Sebastian Moeller
2013-12-11 22:05 ` Jim Gettys
0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Moeller @ 2013-12-11 20:41 UTC
To: Dave Taht; +Cc: cerowrt-devel
Hi List, hi Dave,
On Dec 11, 2013, at 19:41 , Dave Taht <dave.taht@gmail.com> wrote:
> I have the regrettable problem of mostly testing the 5ghz channel due
> to interference issues on the 2ghz band.
>
> What I am seeing in the last several releases of the 3.8.x and 3.10
> series is after tons of traffic and multiple days of uptime a DMA tx
> error which you can see via the logread or dmesg tool, and once it
> happens, at least sometimes, that radio can "go away" and not be
> resettable. "cannot stop tx dma" is the error.
I think I can make the error appear "at will" by running netperf-wrapper against my wndr3700v2, just tested under 3.10.21-1:
./netperf-wrapper -l 300 -H gw.home.lan rrul -p all -t hms-beagle_cerowrt3.10.21-1_2_nacktmulle
dmesg on the router:
[ 53.007812] IPv6: ADDRCONF(NETDEV_CHANGE): gw11: link becomes ready
[28792.039062] ath: phy1: Failed to stop TX DMA, queues=0x00e!
[28794.078125] ath: phy1: Failed to stop TX DMA, queues=0x00e!
[28807.164062] ath: phy1: Failed to stop TX DMA, queues=0x00e!
[28809.191406] ath: phy1: Failed to stop TX DMA, queues=0x002!
[28823.269531] ath: phy1: Failed to stop TX DMA, queues=0x00e!
dmesg was clean before, so these 5 failures are from the rrul test over the 5GHz radio.
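For readers trying to decode these lines: the queues= value looks like a bitmask of the TX queues that still had frames pending when the driver tried to stop DMA. In our reading of ath9k's xmit.c (ath_drain_all_txq(); worth checking against the tree actually being built), bit i is set for hardware TX queue i. The minimal Python sketch below, using a hypothetical helper pending_queues(), only lists the set bits; it deliberately makes no claim about which access category each queue number maps to.

```python
def pending_queues(mask):
    """List the bit positions set in a 'queues=0x...' value.

    Assumption: bit i stands for TX queue i (our reading of ath9k's
    ath_drain_all_txq(); check it against the driver source you build).
    """
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# The distinct masks that appear in the dmesg excerpts in this thread.
for mask in (0x00e, 0x002, 0x00f, 0x006, 0x00a, 0x003, 0x00c):
    print(f"queues=0x{mask:03x} -> queues {pending_queues(mask)}")
```

For example, 0x00e decodes to queues 1, 2 and 3, 0x00f to queues 0 through 3, and 0x00c to queues 2 and 3.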
Running the same test over the 2.4GHz radio adds the following:
[29200.921875] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29206.980468] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29209.019531] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29211.066406] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29215.109375] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29227.195312] ath: phy0: Failed to stop TX DMA, queues=0x006!
[29233.257812] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29238.308593] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29240.351562] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29247.417968] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29251.480468] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29253.515625] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29256.558593] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29262.617187] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29264.652343] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29269.699218] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29273.750000] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29278.804687] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29281.859375] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29291.933593] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29294.972656] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29304.050781] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29312.117187] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29315.167968] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29322.246093] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29325.292968] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29330.355468] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29332.390625] ath: phy0: Failed to stop TX DMA, queues=0x00a!
[29334.445312] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29336.484375] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29337.527343] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29343.617187] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29349.679687] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29358.757812] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29361.816406] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29363.851562] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29364.882812] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29370.937500] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29371.976562] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29376.031250] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29378.062500] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29381.105468] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29388.175781] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29393.230468] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29401.292968] ath: phy0: Failed to stop TX DMA, queues=0x003!
[29403.332031] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29413.429687] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29417.480468] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29422.542968] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29424.582031] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29427.636718] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29429.671875] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29431.718750] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29433.765625] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29445.835937] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29449.898437] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29454.960937] ath: phy0: Failed to stop TX DMA, queues=0x00f!
[29461.023437] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29463.062500] ath: phy0: Failed to stop TX DMA, queues=0x00e!
[29466.117187] ath: phy0: Failed to stop TX DMA, queues=0x00f!
I have to admit that before today I never tested with 2.4GHz and only saw the 4 to 5 messages in the 5GHz band.
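When a run produces dozens of these lines, a per-radio tally makes the 2.4GHz vs 5GHz comparison easier. Below is a minimal Python sketch, assuming dmesg/logread output in exactly the format quoted above; summarize() and TX_DMA_RE are hypothetical helpers, not part of any existing tool. It counts failures per phy and per (phy, queue mask) and records the first and last timestamp per phy. The bracketed dmesg timestamps are seconds since boot, so this also shows how far into uptime the errors appeared. The sample reuses the five phy1 lines from the 5GHz run.

```python
import re
from collections import Counter

# Matches lines such as:
#   [28792.039062] ath: phy1: Failed to stop TX DMA, queues=0x00e!
TX_DMA_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+ath: (?P<phy>phy\d+): "
    r"Failed to stop TX DMA, queues=0x(?P<mask>[0-9a-fA-F]+)!"
)


def summarize(dmesg_text):
    """Tally 'Failed to stop TX DMA' lines per radio and per (radio, mask).

    Returns (per_phy, per_mask, span), where span maps each phy to the
    (first, last) kernel timestamp in seconds since boot.
    """
    per_phy = Counter()
    per_mask = Counter()
    span = {}
    for line in dmesg_text.splitlines():
        m = TX_DMA_RE.search(line)
        if m is None:
            continue  # unrelated kernel messages are skipped
        ts = float(m["ts"])
        phy = m["phy"]
        mask = int(m["mask"], 16)
        per_phy[phy] += 1
        per_mask[(phy, mask)] += 1
        first, last = span.get(phy, (ts, ts))
        span[phy] = (min(first, ts), max(last, ts))
    return per_phy, per_mask, span


sample = """\
[ 53.007812] IPv6: ADDRCONF(NETDEV_CHANGE): gw11: link becomes ready
[28792.039062] ath: phy1: Failed to stop TX DMA, queues=0x00e!
[28794.078125] ath: phy1: Failed to stop TX DMA, queues=0x00e!
[28807.164062] ath: phy1: Failed to stop TX DMA, queues=0x00e!
[28809.191406] ath: phy1: Failed to stop TX DMA, queues=0x002!
[28823.269531] ath: phy1: Failed to stop TX DMA, queues=0x00e!
"""

per_phy, per_mask, span = summarize(sample)
for phy, count in sorted(per_phy.items()):
    first, last = span[phy]
    print(f"{phy}: {count} failures, {first / 3600:.1f} h to {last / 3600:.1f} h after boot")
for (phy, mask), count in sorted(per_mask.items()):
    print(f"  {phy} queues=0x{mask:03x}: {count}")
```

On these five lines it reports 5 failures on phy1 (four with mask 0x00e, one with 0x002), all between about 28792 s and 28823 s after boot, i.e. roughly 8 hours of uptime. Feeding it the full 2.4GHz excerpt would give the phy0 counts the same way.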
Running the same over the wired interface does not cause these messages…
And running from a 5GHz client through the router to a wired client (both on the internal side) just adds:
[30643.500000] ath: phy1: Failed to stop TX DMA, queues=0x00c!
[30736.898437] ath: phy1: Failed to stop TX DMA, queues=0x00e!
It does not immediately lead to a drop of the radio though...
Maybe this can be helpful in the hands of a real expert?
> I have seen this error
> many, many times in cerowrt releases for the last 2 years, but this
> time it seems more severe than usual.
>
> There was also a bug in dnsmasq or somewhere in the lower level of the
> stack where it stops responding to multicast dhcp packets.
>
> The upcoming 3.10.23-1 development release has a refresh of mac80211,
> and a bug fix related to multicast, so I have some hope for it.
>
> It has also the latest dnsmasq 2.68 (which fixes a bug in cname
> handling in particular), and also pie v3 but I am (as usual) not in a
> position to test it right now.
>
> It is my hope that now that the bug happens a lot we can track it
> down. Or, that it's fixed. :)
>
> I just put that release up at:
>
> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.10.23-1/
>
> It does not have the updated aqm-scripts code and gui (sorry
> sebastian),
Ah, even better: I finished the discussed cosmetic changes and tested them. I will try to send them before Sunday, so they might end up in the next cero release. That means you will have to integrate them with your changes to avoid HTB for high bandwidths… (or you just put your version in and I will do the integration after the next release :) )
Also, I still need to figure out how to make it mutually exclusive with the default QoS system...
> nor the pie v4 drop that just got rejected for kernel
> mainline. I'll try to do a respin this weekend with those, and poke
> harder at the dma tx issue after I get back in the lab. Thoughts
> towards being able to isolate the cause and minimize the effect are
> welcomed - it's one of the biggest barriers to declaring a stable
> release at this point!
>
>
> On Wed, Dec 11, 2013 at 8:58 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> Has anyone seen wireless failing after several days with 3.10.17-3?
>>
>> The symptoms are devices fall off the net several days (or a week) after
>> router has been running. I saw the bg AP go away, but the 5 Ghz AP still
>> working. Wired attachment works.
>> _______________________________________________
>> Cerowrt-devel mailing list
>> Cerowrt-devel@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
>
>
> --
> Dave Täht
>
> Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
2013-12-11 16:58 Stephen Hemminger
2013-12-11 18:25 ` Jim Gettys
2013-12-11 18:30 ` David Personette
@ 2013-12-11 18:41 ` Dave Taht
2013-12-11 20:41 ` Sebastian Moeller
2 siblings, 1 reply; 14+ messages in thread
From: Dave Taht @ 2013-12-11 18:41 UTC
To: Stephen Hemminger; +Cc: cerowrt-devel
I have the regrettable problem of mostly testing the 5GHz radio due
to interference issues on the 2.4GHz band.
What I am seeing in the last several releases of the 3.8.x and 3.10
series is, after tons of traffic and multiple days of uptime, a DMA tx
error which you can see via the logread or dmesg tool. Once it
happens, at least sometimes, that radio can "go away" and not be
resettable. "Failed to stop TX DMA" is the error. I have seen this error
many, many times in cerowrt releases over the last 2 years, but this
time it seems more severe than usual.
There was also a bug in dnsmasq or somewhere in the lower level of the
stack where it stops responding to multicast dhcp packets.
The upcoming 3.10.23-1 development release has a refresh of mac80211,
and a bug fix related to multicast, so I have some hope for it.
It also has the latest dnsmasq (2.68, which in particular fixes a bug in
CNAME handling) and pie v3, but I am (as usual) not in a
position to test it right now.
Now that the bug happens a lot, I hope we can track it
down. Or that it's fixed. :)
I just put that release up at:
http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.10.23-1/
It does not have the updated aqm-scripts code and gui (sorry
sebastian), nor the pie v4 drop that just got rejected for kernel
mainline. I'll try to do a respin this weekend with those, and poke
harder at the dma tx issue after I get back in the lab. Thoughts
towards being able to isolate the cause and minimize the effect are
welcomed - it's one of the biggest barriers to declaring a stable
release at this point!
On Wed, Dec 11, 2013 at 8:58 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> Has anyone seen wireless failing after several days with 3.10.17-3?
>
> The symptoms are devices fall off the net several days (or a week) after
> router has been running. I saw the bg AP go away, but the 5 Ghz AP still
> working. Wired attachment works.
--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
2013-12-11 16:58 Stephen Hemminger
2013-12-11 18:25 ` Jim Gettys
@ 2013-12-11 18:30 ` David Personette
2013-12-11 18:41 ` Dave Taht
2 siblings, 0 replies; 14+ messages in thread
From: David Personette @ 2013-12-11 18:30 UTC
To: Stephen Hemminger; +Cc: cerowrt-devel
I've seen similar behavior (not necessarily on 3.10.17-3) on earlier versions of
cerowrt and openwrt, and a co-worker has had it happen on dd-wrt.
The solution (band-aid / workaround / hack) was to restart dnsmasq on the
router. I added a line to crontab (included below) to do it preemptively
and haven't seen the problem manifest in close to a year now.
# restart dnsmasq at 03:17 every morning:
17 3 * * * /etc/init.d/dnsmasq restart
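For readers less familiar with crontab syntax: the five leading fields are minute, hour, day of month, month and day of week, so "17 3 * * *" fires once a day at 03:17. The small Python sketch below checks a timestamp against those fields. cron_matches() is a hypothetical helper that only understands "*" and plain numbers (no ranges, lists, steps or names), which is enough for this line but is not a full cron implementation.

```python
from datetime import datetime


def field_matches(field, value):
    """True if one cron field matches; only '*' and a single integer are supported."""
    return field == "*" or int(field) == value


def cron_matches(crontab_line, when):
    """Check `when` against the five schedule fields of a crontab line.

    Simplified sketch: no ranges, lists, steps or names, and no special
    handling of the day-of-month / day-of-week interaction.
    """
    minute, hour, day, month, weekday = crontab_line.split()[:5]
    # cron numbers Sunday as 0; isoweekday() is Monday=1 .. Sunday=7.
    cron_weekday = when.isoweekday() % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )


CRON_LINE = "17 3 * * * /etc/init.d/dnsmasq restart"
print(cron_matches(CRON_LINE, datetime(2013, 12, 12, 3, 17)))   # 03:17 -> True
print(cron_matches(CRON_LINE, datetime(2013, 12, 12, 15, 17)))  # 15:17 -> False
```

Note the hour field is on a 24-hour clock, so the job runs at 03:17 and never at 15:17.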
I never found anything useful in the logs, or any better way to reproduce
it than waiting 2-7 days for devices to stop working. If I'd had
better luck coming up with a cause or an error message, I'd have filed a
bug, but I felt embarrassed to. Now I'm sorry that I didn't share what I
found to help others it was affecting. I initially just rebooted the
router when it flared up, but eventually got fed up and tried restarting
daemons one at a time until things worked again. Hope this helps.
--
David P.
On Wed, Dec 11, 2013 at 11:58 AM, Stephen Hemminger <
stephen@networkplumber.org> wrote:
> Has anyone seen wireless failing after several days with 3.10.17-3?
>
> The symptoms are devices fall off the net several days (or a week) after
> router has been running. I saw the bg AP go away, but the 5 Ghz AP still
> working. Wired attachment works.
* Re: [Cerowrt-devel] Wireless failures 3.10.17-3
2013-12-11 16:58 Stephen Hemminger
@ 2013-12-11 18:25 ` Jim Gettys
2013-12-11 18:30 ` David Personette
2013-12-11 18:41 ` Dave Taht
2 siblings, 0 replies; 14+ messages in thread
From: Jim Gettys @ 2013-12-11 18:25 UTC
To: Stephen Hemminger; +Cc: cerowrt-devel
On Wed, Dec 11, 2013 at 11:58 AM, Stephen Hemminger <
stephen@networkplumber.org> wrote:
> Has anyone seen wireless failing after several days with 3.10.17-3?
>
I'm running 3.10.21-1, which I installed late last week.
I've had the router's wireless go catatonic several times, and there are
error messages from the ath9k driver in the log.
I got the logs to Dave so he can look at them, which won't be before the
weekend at best: he's conferencing this week.
> The symptoms are devices fall off the net several days (or a week) after
> router has been running. I saw the bg AP go away, but the 5 Ghz AP still
> working. Wired attachment works.
>
I didn't check whether the 5GHz radio was still running; wired was definitely
running fine. (I have a second router running stock Netgear firmware, so I
can associate with it and poke at the main router, which runs CeroWrt.)
* [Cerowrt-devel] Wireless failures 3.10.17-3
@ 2013-12-11 16:58 Stephen Hemminger
2013-12-11 18:25 ` Jim Gettys
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Stephen Hemminger @ 2013-12-11 16:58 UTC
To: Dave Täht; +Cc: cerowrt-devel
Has anyone seen wireless failing after several days with 3.10.17-3?
The symptom is that devices fall off the net several days (or a week) after
the router has been running. I saw the bg AP go away, but the 5GHz AP was still
working. Wired attachment works.