* [Make-wifi-fast] tx queue stuck for many minutes
From: Joshua Zhao @ 2019-04-11 23:38 UTC
To: make-wifi-fast
Hi,
I ran into a weird symptom: occasionally the AQM TX queue gets stuck for
quite a long time (maybe around 30 minutes). We're sending low-bandwidth
UDP traffic from the host to multiple peers simultaneously over one WiFi
radio, using the open-source mac80211 + ath10k driver. Very occasionally,
when a temporary problem disrupts sending from the host to one of the
peers, the queues to all peers can get stuck and stay stuck for quite a
long time, even after the link problem has gone away or TX has been
stopped. Only after quite a while does the queue clean up.
The following dump shows an example in which the backlog-packets count
stays at 74 for around 30 minutes after TX has completely stopped. While
the video queue (TID 5, ac 1) is stuck, traffic on other TIDs can still go
through.
I wonder why it takes so long for the queue to clean up. Shouldn't packets
be dropped quickly if they cannot be sent?
Many thanks!
cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan3/stations/xx:xx:xx/aqm
target 19999us interval 99999us ecn yes
tid  ac  backlog-bytes  backlog-packets  new-flows  drops  marks  overlimit  collisions  tx-bytes  tx-packets
  0   2              0                0          3      0      0          0           0       390           3
  1   3              0                0          0      0      0          0           0         0           0
  2   3              0                0          0      0      0          0           0         0           0
  3   2              0                0          0      0      0          0           0         0           0
  4   1              0                0          0      0      0          0           0         0           0
  5   1         112344               74       1627      0      0          0           0   2373618        1632
  6   0              0                0          0      0      0          0           0         0           0
  7   0              0                0          1      0      0          0           0        32           1
  8   2              0                0          0      0      0          0           0         0           0
  9   3              0                0          0      0      0          0           0         0           0
 10   3              0                0          0      0      0          0           0         0           0
 11   2              0                0          0      0      0          0           0         0           0
 12   1              0                0          0      0      0          0           0         0           0
 13   1              0                0          0      0      0          0           0         0           0
 14   0              0                0          0      0      0          0           0         0           0
 15   0              0                0          0      0      0          0           0         0           0
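A minimal C sketch, assuming the column order printed by the header line above, for parsing rows of this aqm file and flagging TIDs that look stuck, i.e. TIDs that still have a backlog while tx-packets has not advanced between two snapshots. The struct, the function names, and the stuck heuristic are hypothetical illustrations, not part of mac80211.

```c
#include <assert.h>
#include <stdio.h>

/* One data row of the per-station aqm debugfs file, in the column order of
 * its header: tid ac backlog-bytes backlog-packets new-flows drops marks
 * overlimit collisions tx-bytes tx-packets */
struct aqm_row {
    int tid, ac;
    unsigned long backlog_bytes, backlog_packets, new_flows, drops, marks,
        overlimit, collisions, tx_bytes, tx_packets;
};

/* Returns 1 if `line` is a data row; 0 for the "target ..." line, the header,
 * or anything else that does not have 11 numeric columns. */
static int aqm_parse_row(const char *line, struct aqm_row *r)
{
    return sscanf(line, "%d %d %lu %lu %lu %lu %lu %lu %lu %lu %lu",
                  &r->tid, &r->ac, &r->backlog_bytes, &r->backlog_packets,
                  &r->new_flows, &r->drops, &r->marks, &r->overlimit,
                  &r->collisions, &r->tx_bytes, &r->tx_packets) == 11;
}

/* Hypothetical heuristic: the same TID had a backlog in both snapshots and
 * its tx-packets counter did not move in between. */
static int aqm_tid_looks_stuck(const struct aqm_row *before,
                               const struct aqm_row *after)
{
    assert(before->tid == after->tid);
    return before->backlog_packets > 0 && after->backlog_packets > 0 &&
           after->tx_packets == before->tx_packets;
}

/* Print every TID with a non-empty backlog; returns how many were found. */
static int aqm_report_backlog(FILE *f)
{
    char line[256];
    struct aqm_row r;
    int found = 0;

    while (fgets(line, sizeof line, f)) {
        if (aqm_parse_row(line, &r) && r.backlog_packets > 0) {
            printf("tid %d (ac %d): %lu packets, %lu bytes queued\n",
                   r.tid, r.ac, r.backlog_packets, r.backlog_bytes);
            found++;
        }
    }
    return found;
}
```

Usage: fopen() the aqm file shown above (the station address is elided here), call aqm_report_backlog() periodically, and keep the previous rows around to compare with aqm_tid_looks_stuck().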
* Re: [Make-wifi-fast] tx queue stuck for many minutes
From: Joshua Zhao @ 2019-04-12 0:04 UTC
To: make-wifi-fast
And, BTW, what's the best way to disable or bypass AQM for comparison purposes?
Thanks!
Joshua
* Re: [Make-wifi-fast] tx queue stuck for many minutes
From: Toke Høiland-Jørgensen @ 2019-04-12 9:18 UTC
To: Joshua Zhao, make-wifi-fast
Joshua Zhao <swzhao@gmail.com> writes:
> Hi,
> I ran into a weird symptom: occasionally the AQM TX queue gets stuck for
> quite a long time (maybe around 30 minutes). We're sending low-bandwidth
> UDP traffic from the host to multiple peers simultaneously over one WiFi
> radio, using the open-source mac80211 + ath10k driver. Very occasionally,
> when a temporary problem disrupts sending from the host to one of the
> peers, the queues to all peers can get stuck and stay stuck for quite a
> long time, even after the link problem has gone away or TX has been
> stopped. Only after quite a while does the queue clean up.
This sorta sounds like a driver bug where the queue doesn't get
restarted after the issue is resolved? I'd suggest you email the ath10k
list (ath10k@lists.infradead.org; maybe cc linux-wireless@vger.kernel.org
and this list) with a description of the issue that triggers this bug,
as well as system details (hardware model, firmware version, and kernel
version).
-Toke
* Re: [Make-wifi-fast] tx queue stuck for many minutes
From: Joshua Zhao @ 2019-04-12 17:26 UTC
To: Toke Høiland-Jørgensen; +Cc: make-wifi-fast
Hi,
Thanks for the reply! I've also emailed the ath10k and linux-wireless lists
and am waiting to hear back.
In the meantime, can you explain how the AQM queue interacts with the WiFi
driver? Does the driver pull from the queue periodically, rather than the
AQM pushing to the network interface? How often does the driver pull, or
what triggers it?
I'd like to verify this myself if you can point me to the relevant code :)
And for the queue itself, how long is it supposed to take to drop packets
and clean up? It seems that when it's full, it signals back-pressure to the
socket instead of simply dropping packets from the head or the tail of the
queue?
Many thanks!
Joshua
* Re: [Make-wifi-fast] tx queue stuck for many minutes
From: Toke Høiland-Jørgensen @ 2019-04-12 19:57 UTC
To: Joshua Zhao; +Cc: make-wifi-fast
Joshua Zhao <swzhao@gmail.com> writes:
> Hi,
> Thanks for the reply! I've also emailed the ath10k and linux-wireless lists
> and am waiting to hear back.
> In the meantime, can you explain how the AQM queue interacts with the WiFi
> driver? Does the driver pull from the queue periodically, rather than the
> AQM pushing to the network interface? How often does the driver pull, or
> what triggers it?
Generally two paths:
1. Packet comes in from upper netdev -> mac80211 queues the packet to tx ->
driver is notified through wake_tx_queue() op, driver initiates
transmission scheduling and pulls from TXQ
and
2. Driver gets notification from hardware (mostly TX completion) ->
driver initiates TX scheduling and pulls from TXQ
There are some more cases that are variants of the above (e.g., wakeup
from powersave etc). My guess is that in your case it is one of the
cases in the second category that goes wrong...
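The two paths can be modeled with a toy single-station simulation. This is a sketch with hypothetical names (the sim_* functions and HW_RING_SLOTS), not the mac80211 or ath10k API: the driver pulls from the TXQ only when it is notified of a new packet or of a TX completion, so if completions stop arriving while the hardware ring is full, the TXQ backlog never drains until something flushes it.

```c
#include <assert.h>

/* Toy model of the pull-based TX path; sim_* names and HW_RING_SLOTS are
 * hypothetical, not mac80211/ath10k symbols. */
#define HW_RING_SLOTS 4

struct sim_sta {
    unsigned txq_backlog;  /* packets waiting in the mac80211-side TXQ */
    unsigned hw_inflight;  /* packets handed to hardware, not yet completed */
    unsigned completed;    /* TX completions processed so far */
};

/* Driver scheduling step: pull from the TXQ while the hardware has room. */
static void sim_schedule(struct sim_sta *s)
{
    while (s->txq_backlog > 0 && s->hw_inflight < HW_RING_SLOTS) {
        s->txq_backlog--;
        s->hw_inflight++;
    }
}

/* Path 1: a packet is queued and the driver is notified (cf. wake_tx_queue()). */
static void sim_wake_tx_queue(struct sim_sta *s)
{
    s->txq_backlog++;
    sim_schedule(s);
}

/* Path 2: the hardware reports a TX completion, freeing a ring slot. */
static void sim_tx_completion(struct sim_sta *s)
{
    assert(s->hw_inflight > 0);
    s->hw_inflight--;
    s->completed++;
    sim_schedule(s);
}

/* Hardware reset or disassociation: everything queued is flushed. */
static void sim_flush(struct sim_sta *s)
{
    s->txq_backlog = 0;
    s->hw_inflight = 0;
}
```

In this model, new packets do not help once the ring is full and completions are lost: every pull attempt finds no free slot, so the backlog sits unchanged until sim_flush() runs, which resembles the symptom in the dump.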
> I'd like to verify this myself if you can point me to the relevant code :)
> And for the queue itself, how long is it supposed to take to drop packets
> and clean up?
Well, when the hardware is reset, or the station is disassociated, the
queue will be flushed. Other than that, there's no separate "cleanup"
per se; rather, the two mechanisms outlined above should ensure that
packets keep flowing towards the station at the other end.
> It seems that when it's full, it signals back-pressure to the socket
> instead of simply dropping packets from the head or the tail of the
> queue?
No, it doesn't generally do much back-pressure. Rather, when it fills
up, it will drop packets from the head of the longest flow to clear
space (see fq_tin_enqueue()). The limit is pretty high, though - 8192
packets or 16 Mbytes of memory...
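A simplified model of that overlimit behavior, in the spirit of fq_tin_enqueue() but not its actual code: after each enqueue, while the total exceeds a packet limit or a byte limit, drop the head packet of the flow with the largest byte backlog. Measuring "longest" in bytes, the fixed-size per-flow FIFOs, and all toy_* names are assumptions of this sketch; the defaults quoted above would correspond to limit_packets = 8192 and limit_bytes = 16 MiB.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FLOWS 4
#define TOY_FLOW_CAP  64          /* per-flow FIFO slots in this toy */

struct toy_flow {
    unsigned len[TOY_FLOW_CAP];   /* packet sizes, used as a ring buffer */
    unsigned head, count;
    unsigned long backlog;        /* bytes queued in this flow */
};

struct toy_fq {
    struct toy_flow flows[TOY_MAX_FLOWS];
    unsigned limit_packets;       /* 8192 by default, per the thread */
    unsigned long limit_bytes;    /* 16 MiB by default, per the thread */
    unsigned packets;             /* packets queued across all flows */
    unsigned long bytes;          /* bytes queued across all flows */
    unsigned long overlimit;      /* packets dropped because a limit was hit */
};

static void toy_fq_init(struct toy_fq *fq, unsigned limit_packets,
                        unsigned long limit_bytes)
{
    memset(fq, 0, sizeof *fq);
    fq->limit_packets = limit_packets;
    fq->limit_bytes = limit_bytes;
}

/* The non-empty flow with the largest byte backlog, or NULL. */
static struct toy_flow *toy_fattest_flow(struct toy_fq *fq)
{
    struct toy_flow *fat = NULL;
    for (int i = 0; i < TOY_MAX_FLOWS; i++) {
        struct toy_flow *f = &fq->flows[i];
        if (f->count > 0 && (fat == NULL || f->backlog > fat->backlog))
            fat = f;
    }
    return fat;
}

/* Remove the oldest packet of a flow and count it as an overlimit drop. */
static void toy_drop_head(struct toy_fq *fq, struct toy_flow *f)
{
    unsigned size = f->len[f->head];
    f->head = (f->head + 1) % TOY_FLOW_CAP;
    f->count--;
    f->backlog -= size;
    fq->packets--;
    fq->bytes -= size;
    fq->overlimit++;
}

/* Append to the tail of flow `idx`, then shed packets from the head of the
 * fattest flow until both limits are satisfied again. */
static void toy_fq_enqueue(struct toy_fq *fq, int idx, unsigned size)
{
    assert(idx >= 0 && idx < TOY_MAX_FLOWS);
    struct toy_flow *f = &fq->flows[idx];
    assert(f->count < TOY_FLOW_CAP);

    f->len[(f->head + f->count) % TOY_FLOW_CAP] = size;
    f->count++;
    f->backlog += size;
    fq->packets++;
    fq->bytes += size;

    while (fq->packets > fq->limit_packets || fq->bytes > fq->limit_bytes) {
        struct toy_flow *fat = toy_fattest_flow(fq);
        if (fat == NULL)
            break;
        toy_drop_head(fq, fat);
    }
}
```

Because drops come from the fattest flow, a light flow sharing the queue keeps its packets. Each drop bumps the overlimit counter, roughly analogous to the overlimit column in the aqm dump, which reads 0 for the stuck TID.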
-Toke
* Re: [Make-wifi-fast] tx queue stuck for many minutes
From: Joshua Zhao @ 2019-04-12 23:03 UTC
To: Toke Høiland-Jørgensen; +Cc: make-wifi-fast
That makes sense. I guess a missing TX completion could be a potential
suspect, and I'll check on that.
On the other hand, the reason I asked about back-pressure is that when the
problem happens, the UDP TX socket appears stuck and doesn't accept any new
packets.
~# netstat -tulnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
udp        0  22400 0.0.0.0:48439           0.0.0.0:*                           2407/audiod-xxxxx
Basically, the "Send-Q" value stays very high for a long time (I didn't
save the exact number from when the problem happened; the output above is
just an example), so sendto() simply fails.
This is why I wondered whether back-pressure was being applied. Otherwise,
shouldn't the UDP socket keep sending, with packets being dropped by the
queue scheduler?
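For UDP on Linux, the Send-Q column that netstat shows is, as far as I know, the number of bytes still charged to the socket's send buffer, and the TIOCOUTQ ioctl (listed as SIOCOUTQ in udp(7)) returns the same counter. A minimal sketch, assuming a non-blocking send is acceptable in the application, for logging that value at the moment sendto() starts failing; try_send() and udp_send_queue_bytes() are hypothetical helper names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Bytes still charged to the socket's send buffer. For UDP on Linux this is
 * the counter netstat reports as Send-Q. Returns -1 on error. */
static int udp_send_queue_bytes(int fd)
{
    int outq = 0;
    if (ioctl(fd, TIOCOUTQ, &outq) < 0)   /* TIOCOUTQ == SIOCOUTQ */
        return -1;
    return outq;
}

/* Non-blocking sendto(): 0 on success, 1 if the kernel pushed back
 * (EAGAIN/EWOULDBLOCK/ENOBUFS, logged together with Send-Q), -1 otherwise. */
static int try_send(int fd, const void *buf, size_t len,
                    const struct sockaddr_in *dst)
{
    assert(dst != NULL);
    ssize_t n = sendto(fd, buf, len, MSG_DONTWAIT,
                       (const struct sockaddr *)dst, sizeof *dst);
    if (n >= 0)
        return 0;
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == ENOBUFS) {
        fprintf(stderr, "sendto blocked: Send-Q = %d bytes\n",
                udp_send_queue_bytes(fd));
        return 1;
    }
    perror("sendto");
    return -1;
}
```

Logging udp_send_queue_bytes() alongside the aqm backlog while the problem is happening would show whether the socket's Send-Q stays pinned for the same period.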
Thanks,
Joshua
* Re: [Make-wifi-fast] tx queue stuck for many minutes
From: Toke Høiland-Jørgensen @ 2019-04-13 6:45 UTC
To: Joshua Zhao; +Cc: make-wifi-fast
Joshua Zhao <swzhao@gmail.com> writes:
> That makes sense. I guess a missing TX completion could be a potential
> suspect, and I'll check on that.
>
> On the other hand, the reason I asked about back-pressure is that when the
> problem happens, the UDP TX socket appears stuck and doesn't accept any new
> packets.
>
> ~# netstat -tulnp
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
> udp        0  22400 0.0.0.0:48439           0.0.0.0:*                           2407/audiod-xxxxx
>
> Basically, the "Send-Q" value stays very high for a long time (I didn't
> save the exact number from when the problem happened; the output above is
> just an example), so sendto() simply fails.
> This is why I wondered whether back-pressure was being applied. Otherwise,
> shouldn't the UDP socket keep sending, with packets being dropped by the
> queue scheduler?
I would expect so; mac80211 only ever returns NETDEV_TX_OK from its
ndo_start_xmit() handler (ieee80211_subif_start_xmit()). So I'd guess the
socket layer is stalling for some other reason?
-Toke
Thread overview: 7+ messages
2019-04-11 23:38 [Make-wifi-fast] tx queue stuck for many minutes Joshua Zhao
2019-04-12 0:04 ` Joshua Zhao
2019-04-12 9:18 ` Toke Høiland-Jørgensen
2019-04-12 17:26 ` Joshua Zhao
2019-04-12 19:57 ` Toke Høiland-Jørgensen
2019-04-12 23:03 ` Joshua Zhao
2019-04-13 6:45 ` Toke Høiland-Jørgensen