[Make-wifi-fast] [PATCH v11 4/4] mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

Felix Fietkau nbd at nbd.name
Thu Feb 27 05:42:55 EST 2020


On 2020-02-27 09:24, Kan Yan wrote:
>> - AQL estimated airtime does not take into account A-MPDU, so it is
>> significantly overestimating airtime use for aggregated traffic,
>> especially at high rates.
>> My proposed solution would be to check for a running aggregation session
>> and set estimated tx time to something like:
>> expected_airtime(16 * skb->len) / 16.
> 
> Yes, that's a known limitation that needs to be improved. I actually
> posted a comment about this on the earlier patch:
> "[PATCH v10 2/4] mac80211: Import airtime calculation code from mt76"
>>
>> When a txq is subjected to the queue limit,
>> there is usually a significant number of frames queued, and those
>> frames are likely to be sent out in large aggregations. So, in most
>> cases when AQL is active, the medium access overhead can be amortized
>> over many frames and the per-frame overhead could be considerably
>> lower than 36, especially at higher data rates. As a result, the
>> pending airtime calculated this way could be higher than the actual
>> airtime. In my tests, I had to compensate for that by increasing
>> "aql_txq_limit" via debugfs to get the same peak throughput as without
>> AQL.
> 
> 
>> My proposed solution would be to check for a running aggregation session
>> and set estimated tx time to something like:
>> expected_airtime(16 * skb->len) / 16.
> 
> I think that's a reasonable approximation, but I doubt that aggregation
> information is available in all architectures. In some architectures,
> the firmware may only report aggregation information after the frame
> has been transmitted.
I'm not proposing to use the actual per-frame aggregation information.
It would be a simple approximation that uses a fixed assumed average
aggregation size whenever an aggregation session is established, roughly
like the sketch below.
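
Untested sketch of what I mean; expected_airtime() here is just a crude
stand-in for the rate-based estimator, and the average aggregation size
of 16 is an assumption, not a measured value:

#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_AVG_AMPDU_LEN	16

/* stand-in for the rate-based estimator: usec for len bytes at an
 * assumed ~100 Mbit/s plus a fixed per-transmission overhead */
static uint32_t expected_airtime(uint32_t len)
{
	return (len * 8) / 100 + 36;
}

static uint32_t aql_estimated_airtime(uint32_t skb_len, bool ampdu_running)
{
	if (!ampdu_running)
		return expected_airtime(skb_len);

	/* Assume the frame goes out as part of an average-sized A-MPDU,
	 * so the per-transmission overhead (preamble, IFS, block ack)
	 * is amortized over ASSUMED_AVG_AMPDU_LEN subframes. */
	return expected_airtime(ASSUMED_AVG_AMPDU_LEN * skb_len) /
	       ASSUMED_AVG_AMPDU_LEN;
}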

> In my earlier version of AQL for the out-of-tree ChromeOS kernel,
> A-MPDU is dealt with this way: the medium access overhead is only
> counted once per TXQ for all frames released to the hardware within
> a 4 ms period, assuming those frames are likely to be aggregated
> together.
I think that would be more accurate, but probably also more expensive to
calculate.
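
For reference, my understanding of that scheme as an untested sketch
(the names and the 36 usec overhead value are made up for illustration):

#include <stdint.h>

#define MEDIUM_ACCESS_OVERHEAD_US	36
#define OVERHEAD_WINDOW_US		4000

struct txq_aql_state {
	uint64_t last_overhead_ts;	/* usec timestamp of last charged overhead */
	uint32_t pending_airtime;	/* usec currently charged to this txq */
};

static void aql_charge_frame(struct txq_aql_state *aql, uint64_t now_us,
			     uint32_t frame_airtime_us)
{
	aql->pending_airtime += frame_airtime_us;

	/* charge the medium access overhead only once per 4 ms burst of
	 * releases, assuming those frames end up in the same aggregate */
	if (now_us - aql->last_overhead_ts >= OVERHEAD_WINDOW_US) {
		aql->pending_airtime += MEDIUM_ACCESS_OVERHEAD_US;
		aql->last_overhead_ts = now_us;
	}
}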

> Instead of calculating the estimated airtime using the last known phy
> rate and then trying to add some estimated overhead for medium access
> time, another, potentially better approach is to use the average data
> rate, i.e. bytes_transmitted / firmware_reported_actual_airtime. The
> average rate not only includes medium access overhead, but also takes
> retries into account.
Also an interesting idea, though probably not available on all hardware.
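
Where the hardware does report actual airtime, the bookkeeping could
look roughly like this (sketch only, all names invented; assumes tx
status carries the transmitted byte count and the airtime used):

#include <stdint.h>

struct avg_rate {
	uint32_t bytes_per_ms;	/* EWMA of effective throughput, warms up from 0 */
};

/* on tx status: fold in what the firmware reported, EWMA weight 1/8 */
static void avg_rate_update(struct avg_rate *r, uint32_t bytes,
			    uint32_t actual_airtime_us)
{
	uint32_t sample;

	if (!actual_airtime_us)
		return;

	sample = bytes * 1000 / actual_airtime_us;	/* bytes per ms */
	r->bytes_per_ms -= r->bytes_per_ms >> 3;
	r->bytes_per_ms += sample >> 3;
}

/* on dequeue: estimate pending airtime from the averaged effective rate,
 * which already includes medium access overhead and retries */
static uint32_t avg_rate_estimate_us(const struct avg_rate *r, uint32_t bytes)
{
	if (!r->bytes_per_ms)
		return 0;

	return bytes * 1000 / r->bytes_per_ms;
}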

>> - We need an API that allows the driver to change the pending airtime
>> values, e.g. subtract the estimated tx time for a packet.
>> mt76 and ath9k can queue packets inside the driver that are not currently
>> in the hardware queues. Typically, if the txqs have more data than what
>> gets put into the hardware queue, both drivers will pull an extra frame
>> and queue it in their private txq struct. This frame will get used on the
>> next txq scheduling round for that particular station.
>> If you have lots of stations doing traffic (or with driver-buffered
>> frames in powersave mode), this could use up a sizable chunk of the AQL
>> budget.
> 
> Not sure I fully understand your concerns here. The AQL budget is per
> STA, per TID, so frames queued in the driver's special queue for one
> station won't impact another station's budget. Those frames are
> counted towards the per-interface pending airtime, which could trigger
> AQL to switch to the lower queue limit. In that case, switching to the
> lower limit could be the desirable behavior when there is heavy traffic.
Yes, the per-interface limit is what I'm concerned about. I'm not sure
if it will be an issue in practice; it's just something that I noticed
while reviewing the code. A rough sketch of the kind of pending-airtime
adjustment helper I have in mind is below.
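
None of this exists in mac80211 today; the names and the structure are
made up, but the idea is to let the driver take a frame's estimated tx
time back out of the AQL counters while the frame sits in a private
driver queue, and charge it again once it really goes to hardware:

#include <stdint.h>

struct aql_counters {
	uint32_t sta_pending_us[4];	/* per-STA, per-AC pending airtime */
	uint32_t iface_pending_us;	/* per-interface pending airtime */
};

/* driver parked the frame in its private txq struct: stop counting it */
static void aql_return_airtime(struct aql_counters *aql, int ac,
			       uint32_t airtime_us)
{
	aql->sta_pending_us[ac] -= airtime_us;
	aql->iface_pending_us -= airtime_us;
}

/* driver finally queued the frame to hardware: count it again */
static void aql_recharge_airtime(struct aql_counters *aql, int ac,
				 uint32_t airtime_us)
{
	aql->sta_pending_us[ac] += airtime_us;
	aql->iface_pending_us += airtime_us;
}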

- Felix

