That makes sense. I guess a missing TX completion could be a potential suspect, and I'll check on that.

On the other hand, the reason I asked about back-pressure is that when the problem happens, the UDP TX socket appears stuck and doesn't accept any new packets.

~# netstat -tulnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
udp        0  22400 0.0.0.0:48439           0.0.0.0:*                           2407/audiod-xxxxx


Basically, the "Send-Q" number stays very high for a long time, as illustrated in the example above (I didn't save the exact number from when the problem happened), so the sendto() call simply fails.
This is why I wondered about back-pressure being applied. Otherwise, shouldn't the UDP socket keep sending, with packets being dropped by the queue scheduler?
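
Since the exact Send-Q value was not captured, here is a minimal sketch of how it could be pulled out of the netstat output and logged the next time the problem occurs. The function name parse_send_q is made up for illustration, and the column layout is assumed to match the output above.

```python
def parse_send_q(netstat_output, port):
    """Return the Send-Q value for the UDP socket bound to `port`, or None.

    Assumes the column order shown above:
    Proto Recv-Q Send-Q Local-Address Foreign-Address ...
    """
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-UDP sockets.
        if len(fields) < 5 or not fields[0].startswith("udp"):
            continue
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            return int(fields[2])
    return None


sample = """Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
udp        0  22400 0.0.0.0:48439           0.0.0.0:*                           2407/audiod-xxxxx"""

print(parse_send_q(sample, 48439))
```

Running this periodically against live netstat output would show whether Send-Q ever drains or stays pinned while sendto() is failing.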

Thanks,
Joshua


On Fri, Apr 12, 2019 at 12:57 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
Joshua Zhao <swzhao@gmail.com> writes:

> Hi,
> Thanks for the reply!  I've also emailed the ath10k and linux-wireless
> lists and am waiting to hear back with suggestions.
> In the meantime, can you educate me on how the AQM queue interacts with
> the wifi driver? Does the driver pull from the queue from time to time,
> rather than the AQM pushing to the network interface? How often does the
> driver pull, and what triggers it?

Generally two paths:

1. Packet comes in from upper netdev -> mac80211 queues the packet to tx ->
   driver is notified through wake_tx_queue() op, driver initiates
   transmission scheduling and pulls from TXQ

and

2. Driver gets notification from hardware (mostly TX completion) ->
   driver initiates TX scheduling and pulls from TXQ

There are some more cases that are variants of the above (e.g., wakeup
from powersave etc). My guess is that in your case it is one of the
cases in the second category that goes wrong...
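
To illustrate the two paths, here is a toy model (not mac80211 or ath10k code). The class ToyTxq, its hw_slots field, and the run() helper are all invented for this sketch; the only thing it tries to show is that the driver pulls from the TXQ on two triggers, and that packets pile up if the second trigger (TX completion) never arrives.

```python
from collections import deque


class ToyTxq:
    """Toy stand-in for a TXQ that the driver pulls from (hypothetical)."""

    def __init__(self, hw_slots):
        self.queue = deque()      # packets waiting in the TXQ
        self.hw_slots = hw_slots  # free hardware TX slots (assumed model)
        self.sent = []            # packets handed to the hardware

    def _schedule(self):
        # The driver pulls from the TXQ while the hardware has room.
        while self.queue and self.hw_slots > 0:
            self.sent.append(self.queue.popleft())
            self.hw_slots -= 1

    def enqueue(self, pkt):
        # Path 1: packet queued, driver woken (stands in for wake_tx_queue()).
        self.queue.append(pkt)
        self._schedule()

    def tx_completion(self):
        # Path 2: hardware reports a completion, freeing a slot.
        self.hw_slots += 1
        self._schedule()


def run(completions_arrive):
    txq = ToyTxq(hw_slots=2)
    for pkt in range(6):
        txq.enqueue(pkt)
        if completions_arrive:
            txq.tx_completion()
    return len(txq.sent), len(txq.queue)


print(run(True))   # completions delivered
print(run(False))  # completions lost: packets remain queued
```

In this model, once the hardware slots are used up and no completion arrives, nothing triggers another pull, so later packets stay in the queue.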

> I hope I can verify that if you can point me to the code to check
> :) And, for the queue itself, how long is it supposed to take to drop
> packets and clean up?

Well, when the hardware is reset, or the station is disassociated, the
queue will be flushed. Other than that, there's no separate "cleanup"
per se; rather, the two mechanisms outlined above should ensure that
packets keep flowing towards the station at the other end.

> It seems that when it's full, it notifies back-pressure to the socket
> instead of simply dropping the packets from the head or the tail of
> the queue?

No, it doesn't generally do much back-pressure. Rather, when it fills
up, it will drop packets from the head of the longest flow to clear
space (see fq_tin_enqueue()). The limit is pretty high, though - 8192
packets or 16 Mbytes of memory...
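
As a rough illustration of the head-drop behaviour, here is a simplified sketch. It is not the actual fq_tin_enqueue() code: the function name enqueue_with_head_drop is made up, flow length is measured in packets only, and the memory limit is ignored.

```python
from collections import deque


def enqueue_with_head_drop(flows, flow_id, pkt, limit):
    """Enqueue pkt; while over limit, drop from the head of the longest flow.

    Simplified model: 'longest' is counted in packets, and only a single
    global packet limit is enforced.
    """
    flows.setdefault(flow_id, deque()).append(pkt)
    dropped = []
    while sum(len(q) for q in flows.values()) > limit:
        longest = max(flows, key=lambda f: len(flows[f]))
        dropped.append((longest, flows[longest].popleft()))
    return dropped


flows = {}
drops = []
for i in range(5):
    drops += enqueue_with_head_drop(flows, "A", f"a{i}", limit=4)
drops += enqueue_with_head_drop(flows, "B", "b0", limit=4)

print(drops)
print({k: list(v) for k, v in flows.items()})
```

In this run the new packet for flow B is accepted, and the oldest packets of the bulky flow A are the ones dropped, which is the opposite of pushing back on the sender.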

-Toke