* [Make-wifi-fast] tx queue stuck for many minutes
From: Joshua Zhao @ 2019-04-11 23:38 UTC
To: make-wifi-fast

Hi,

I've run into a weird symptom: occasionally the AQM TX queue gets stuck for
quite a long time (maybe around 30 minutes). We're sending low-bandwidth UDP
traffic from the host to multiple peers simultaneously over one WiFi radio,
running the open-source mac80211 + ath10k driver. Very occasionally, when a
temporary issue causes trouble sending from the host to one of the peers, the
queues to all peers get stuck and stay stuck for quite a long time (even after
the link trouble has gone away or TX has been stopped). Only after quite a
while does the queue clean up.

The following dump shows an example in which the backlog-packets count stayed
at 74 for around 30 minutes after TX had stopped completely. While the video
queue is stuck, traffic on other TIDs can still go through.

I wonder why it takes so long for the queue to clean up. Shouldn't packets be
dropped quickly if they cannot be sent?

Many thanks!

cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan3/stations/xx:xx:xx/aqm
target 19999us interval 99999us ecn yes
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
  0  2             0               0         3     0     0         0          0      390          3
  1  3             0               0         0     0     0         0          0        0          0
  2  3             0               0         0     0     0         0          0        0          0
  3  2             0               0         0     0     0         0          0        0          0
  4  1             0               0         0     0     0         0          0        0          0
  5  1        112344              74      1627     0     0         0          0  2373618       1632
  6  0             0               0         0     0     0         0          0        0          0
  7  0             0               0         1     0     0         0          0       32          1
  8  2             0               0         0     0     0         0          0        0          0
  9  3             0               0         0     0     0         0          0        0          0
 10  3             0               0         0     0     0         0          0        0          0
 11  2             0               0         0     0     0         0          0        0          0
 12  1             0               0         0     0     0         0          0        0          0
 13  1             0               0         0     0     0         0          0        0          0
 14  0             0               0         0     0     0         0          0        0          0
 15  0             0               0         0     0     0         0          0        0          0
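A per-station aqm file like the one dumped above can be polled to spot a backlog that is not draining. What follows is only a minimal Python sketch, assuming the file keeps the whitespace-separated layout shown in the dump. The helper names `parse_aqm` and `stuck_tids` are hypothetical, `SAMPLE` reuses three rows from the dump, and the rule "backlog present but tx-packets unchanged between two snapshots" is an assumed heuristic, not something mac80211 reports.

```python
from typing import Dict, List

# Three rows copied from the dump above, used as sample input.
SAMPLE = """\
target 19999us interval 99999us ecn yes
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
0 2 0 0 3 0 0 0 0 390 3
5 1 112344 74 1627 0 0 0 0 2373618 1632
7 0 0 0 1 0 0 0 0 32 1
"""


def parse_aqm(text: str) -> List[Dict[str, int]]:
    """Parse the per-TID table into one dict per row, keyed by column name."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    # The column header is the line whose first field is "tid".
    header_idx = next(i for i, fields in enumerate(lines) if fields[0] == "tid")
    columns = lines[header_idx]
    rows = []
    for fields in lines[header_idx + 1:]:
        if len(fields) != len(columns):
            continue  # skip anything that does not look like a table row
        rows.append({name: int(value) for name, value in zip(columns, fields)})
    return rows


def stuck_tids(before: List[Dict[str, int]], after: List[Dict[str, int]]) -> List[int]:
    """Assumed heuristic: backlog is non-zero and tx-packets did not advance."""
    previous = {row["tid"]: row for row in before}
    return [
        row["tid"]
        for row in after
        if row["backlog-packets"] > 0
        and row["tid"] in previous
        and row["tx-packets"] == previous[row["tid"]]["tx-packets"]
    ]
```

In practice the two snapshots would be read from the debugfs file a few seconds apart. Passing the same snapshot twice, as in `stuck_tids(parse_aqm(SAMPLE), parse_aqm(SAMPLE))`, flags only TID 5.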
* Re: [Make-wifi-fast] tx queue stuck for many minutes
From: Joshua Zhao @ 2019-04-12 0:04 UTC
To: make-wifi-fast

And, BTW, what's the best way to disable or skip AQM for comparison purposes?

Thanks!
Joshua
* Re: [Make-wifi-fast] tx queue stuck for many minutes
From: Toke Høiland-Jørgensen @ 2019-04-12 9:18 UTC
To: Joshua Zhao, make-wifi-fast

Joshua Zhao <swzhao@gmail.com> writes:

> Hi,
> I run into a weird symptom that occasionally the aqm tx queue can get stuck
> for quite long time (maybe around 30min). What happened is that we're
> sending a low bandwidth UDP traffic from the host to multiple peers over a
> WiFi radio simultaneously. We're running opensource mac80211 + ath10k
> driver. Very occasionally when there are temporary issues causing trouble
> on the host sending to one of the peers, the queue to all peers can get
> stuck and stay stuck for quite long (even though the link trouble had been
> gone or TX had been stopped). Only after quite a while, then the queue
> cleans up.

This sorta sounds like a driver bug where the queue doesn't get
restarted after the issue is resolved? I'd suggest you email the ath10k
list (ath10k@lists.infradead.org - maybe cc linux-wireless@vger.kernel.org
and this list) with a description of the issue that triggers this bug,
as well as system details (hardware model, firmware version and kernel
version).

-Toke
* Re: [Make-wifi-fast] tx queue stuck for many minutes
From: Joshua Zhao @ 2019-04-12 17:26 UTC
To: Toke Høiland-Jørgensen; +Cc: make-wifi-fast

Hi,

Thanks for the reply! I've also emailed the ath10k and linux-wireless lists
and am waiting to hear back.

In the meantime, can you educate me on how the AQM queue interacts with the
WiFi driver? Is it that the driver pulls from the queue from time to time,
rather than the AQM pushing to the network interface? How often does the
driver pull, and what triggers it? I hope I can verify that if you can point
me to the code :)

And for the queue itself, how long is it supposed to take to drop packets and
clean up? It seems that when it's full, it applies back-pressure to the socket
instead of simply dropping packets from the head or the tail of the queue?

Many thanks!
Joshua
* Re: [Make-wifi-fast] tx queue stuck for many minutes
From: Toke Høiland-Jørgensen @ 2019-04-12 19:57 UTC
To: Joshua Zhao; +Cc: make-wifi-fast

Joshua Zhao <swzhao@gmail.com> writes:

> Hi,
> Thanks for the reply! I've also emailed the ath10k and linux-wireless list
> and waiting to hear back suggestions.
> In the meantime can you educate me how the aqm queue interacts with wifi
> driver? Is that the driver pulls from the queue from time to time, instead
> of aqm pushes to the network interface? How often or what triggers the
> driver to pull?

Generally two paths:

1. Packet comes in from upper netdev -> mac80211 queues the packet to tx ->
   driver is notified through wake_tx_queue() op, driver initiates
   transmission scheduling and pulls from TXQ

and

2. Driver gets notification from hardware (mostly TX completion) ->
   driver initiates TX scheduling and pulls from TXQ

There are some more cases that are variants of the above (e.g., wakeup
from powersave etc). My guess is that in your case it is one of the
cases in the second category that goes wrong...

> I hope I can verify that if you can point me to the code to check that
> :) And, for the queue itself, how long it's supposed to drop packets
> and clean up?

Well, when the hardware is reset, or the station is disassociated, the
queue will be flushed. Other than that, there's no separate "cleanup"
per se; rather, the two mechanisms outlined above should ensure that
packets keep flowing towards the station at the other end.

> It seems that when it's full, it notifies back-pressure to the socket
> instead of simply dropping the packets from the head or the tail of
> the queue?

No, it doesn't generally do much back-pressure. Rather, when it fills
up, it will drop packets from the head of the longest flow to clear
space (see fq_tin_enqueue()). The limit is pretty high, though - 8192
packets or 16 Mbytes of memory...

-Toke
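The two scheduling paths and the head-drop behaviour described in this message can be illustrated with a toy model. This is a simplified Python sketch, not mac80211 or ath10k code. `ToyTxq`, `ToyDriver` and `hw_slots` are hypothetical names. The real fq code also enforces a memory limit and has its own flow scheduling, whereas this model only counts packets and drains flows in insertion order.

```python
from collections import deque


class ToyTxq:
    """Toy stand-in for a per-station TXQ holding several flows."""

    def __init__(self, limit=8192):
        self.limit = limit   # packet limit only; memory limit not modelled
        self.flows = {}      # flow id -> deque of queued packets
        self.backlog = 0
        self.drops = 0

    def enqueue(self, flow, pkt):
        self.flows.setdefault(flow, deque()).append(pkt)
        self.backlog += 1
        if self.backlog > self.limit:
            # Over the limit: drop from the head of the longest flow.
            longest = max(self.flows.values(), key=len)
            longest.popleft()
            self.backlog -= 1
            self.drops += 1

    def dequeue(self):
        # Simplification: take from the first non-empty flow.
        for queue in self.flows.values():
            if queue:
                self.backlog -= 1
                return queue.popleft()
        return None


class ToyDriver:
    """Toy driver that pulls from the TXQ while hardware slots are free."""

    def __init__(self, txq, hw_slots=2):
        self.txq = txq
        self.free_slots = hw_slots  # hypothetical in-flight frame budget
        self.sent = []

    def _schedule(self):
        while self.free_slots > 0:
            pkt = self.txq.dequeue()
            if pkt is None:
                break
            self.free_slots -= 1
            self.sent.append(pkt)

    def wake_tx_queue(self):
        # Path 1: the stack queued a packet and notified the driver.
        self._schedule()

    def tx_complete(self):
        # Path 2: hardware finished a frame, so a slot frees up.
        self.free_slots += 1
        self._schedule()
```

In this model, once the hardware slots are full, only `tx_complete()` makes the driver pull again. If completions never arrive, `wake_tx_queue()` finds no free slot and the backlog just sits there until the packet limit forces head drops, which is one way to picture the lost-completion guess above.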
* Re: [Make-wifi-fast] tx queue stuck for many minutes
From: Joshua Zhao @ 2019-04-12 23:03 UTC
To: Toke Høiland-Jørgensen; +Cc: make-wifi-fast

That makes sense. I guess a missing TX completion could be a potential suspect,
and I'll check on that.

On the other hand, the reason I asked about back-pressure is that when the
problem happens, the UDP TX socket shows as stuck and doesn't take any new
packets.

~# netstat -tulnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State  PID/Program name
udp        0  22400 0.0.0.0:48439    0.0.0.0:*               2407/audiod-xxxxx

Basically, the "Send-Q" number in the above example stays very high for a long
time (I didn't save the exact number from when the problem happened), so the
sendto() function simply fails. This is why I wondered about back-pressure
being applied. Otherwise, shouldn't the UDP socket keep sending, with the
packets being dropped by the queue scheduler?

Thanks,
Joshua
* Re: [Make-wifi-fast] tx queue stuck for many minutes
From: Toke Høiland-Jørgensen @ 2019-04-13 6:45 UTC
To: Joshua Zhao; +Cc: make-wifi-fast

Joshua Zhao <swzhao@gmail.com> writes:

> That makes sense. I guess missing TX completion could be potential suspect
> and I'll check on that.
>
> On the other hand, why I ask about back-pressure is because when the
> problem happens the UDP TX socket shows as stuck and doesn't take any new
> packets.
>
> ~# netstat -tulnp
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address    Foreign Address  State  PID/Program name
> udp        0  22400 0.0.0.0:48439    0.0.0.0:*               2407/audiod-xxxxx
>
> Basically the "Send-Q" number stays as a very high number for long time (I
> didn't save what the exact number is when the problem happens) in the above
> example, so that the sendto() function simply fails.
> This is why I wondered about back-pressure being applied. Otherwise
> shouldn't UDP socket keeps sending and packets would be dropped by the
> queue scheduler?

I would expect so; mac80211 only ever returns NETDEV_TX_OK from its
netif_start_xmit() function. Guess the socket layer can stall out for
some reason, or something?

-Toke
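To narrow down where a sendto() failure like the one discussed above comes from, it can help to record how each call fails. The following is a small Python sketch under stated assumptions: `try_send` is a hypothetical helper, the socket is assumed to be a non-blocking UDP socket, and the three result labels are illustrative. It only classifies the error seen by the sender and says nothing about what mac80211 does with packets that were accepted.

```python
import errno
import socket


def try_send(sock, payload: bytes, addr) -> str:
    """Attempt one datagram send and report how it went.

    "would_block" means the socket refused the datagram for now
    (EAGAIN/EWOULDBLOCK on a non-blocking socket), which is consistent
    with a send buffer that is not draining.
    """
    try:
        sock.sendto(payload, addr)
        return "sent"
    except BlockingIOError:
        return "would_block"
    except OSError as exc:
        if exc.errno == errno.ENOBUFS:
            return "no_buffers"
        raise  # anything else is unexpected here; let the caller see it


def make_nonblocking_udp_socket() -> socket.socket:
    """Create a UDP socket that raises instead of blocking when it is full."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    return sock
```

Counting "would_block" results over time, next to the Send-Q value from netstat, would show whether the application is being held up at the socket while the driver-side queue is not being drained.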