From: Yibo Zhao <yiboz@codeaurora.org>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: make-wifi-fast@lists.bufferbloat.net,
linux-wireless@vger.kernel.org, Felix Fietkau <nbd@nbd.name>,
Rajkumar Manoharan <rmanohar@codeaurora.org>,
Kan Yan <kyan@google.com>,
linux-wireless-owner@vger.kernel.org
Subject: Re: [Make-wifi-fast] [RFC/RFT] mac80211: Switch to a virtual time-based airtime scheduler
Date: Fri, 12 Apr 2019 14:34:11 +0800 [thread overview]
Message-ID: <e0b66d5ea193c2bf402d6f2c7c50a991@codeaurora.org> (raw)
In-Reply-To: <877ec0u6mu.fsf@toke.dk>
On 2019-04-11 19:24, Toke Høiland-Jørgensen wrote:
> Yibo Zhao <yiboz@codeaurora.org> writes:
>
>> On 2019-04-10 18:40, Toke Høiland-Jørgensen wrote:
>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>
>>>> On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>
>>>>>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>>
>>>>>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>>>>>>> This switches the airtime scheduler in mac80211 to use a
>>>>>>>>> virtual
>>>>>>>>> time-based
>>>>>>>>> scheduler instead of the round-robin scheduler used before.
>>>>>>>>> This
>>>>>>>>> has
>>>>>>>>> a
>>>>>>>>> couple of advantages:
>>>>>>>>>
>>>>>>>>> - No need to sync up the round-robin scheduler in
>>>>>>>>> firmware/hardware
>>>>>>>>> with
>>>>>>>>> the round-robin airtime scheduler.
>>>>>>>>>
>>>>>>>>> - If several stations are eligible for transmission we can
>>>>>>>>> schedule
>>>>>>>>> both of
>>>>>>>>> them; no need to hard-block the scheduling rotation until the
>>>>>>>>> head
>>>>>>>>> of
>>>>>>>>> the
>>>>>>>>> queue has used up its quantum.
>>>>>>>>>
>>>>>>>>> - The check of whether a station is eligible for transmission
>>>>>>>>> becomes
>>>>>>>>> simpler (in ieee80211_txq_may_transmit()).
>>>>>>>>>
>>>>>>>>> The drawback is that scheduling becomes slightly more
>>>>>>>>> expensive,
>>>>>>>>> as
>>>>>>>>> we
>>>>>>>>> need
>>>>>>>>> to maintain an rbtree of TXQs sorted by virtual time. This
>>>>>>>>> means
>>>>>>>>> that
>>>>>>>>> ieee80211_register_airtime() becomes O(logN) in the number of
>>>>>>>>> currently
>>>>>>>>> scheduled TXQs. However, hopefully this number rarely grows too
>>>>>>>>> big
>>>>>>>>> (it's
>>>>>>>>> only TXQs currently backlogged, not all associated stations),
>>>>>>>>> so
>>>>>>>>> it
>>>>>>>>> shouldn't be too big of an issue.
>>>>>>>>>
>>>>>>>>> @@ -1831,18 +1830,32 @@ void
>>>>>>>>> ieee80211_sta_register_airtime(struct
>>>>>>>>> ieee80211_sta *pubsta, u8 tid,
>>>>>>>>> {
>>>>>>>>> struct sta_info *sta = container_of(pubsta, struct sta_info,
>>>>>>>>> sta);
>>>>>>>>> struct ieee80211_local *local = sta->sdata->local;
>>>>>>>>> + struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>>>>>> u8 ac = ieee80211_ac_from_tid(tid);
>>>>>>>>> - u32 airtime = 0;
>>>>>>>>> + u64 airtime = 0, weight_sum;
>>>>>>>>> +
>>>>>>>>> + if (!txq)
>>>>>>>>> + return;
>>>>>>>>>
>>>>>>>>> if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>>>>>> airtime += tx_airtime;
>>>>>>>>> if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>>>>>> airtime += rx_airtime;
>>>>>>>>>
>>>>>>>>> + /* Weights scale so the unit weight is 256 */
>>>>>>>>> + airtime <<= 8;
>>>>>>>>> +
>>>>>>>>> spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>>>> +
>>>>>>>>> sta->airtime[ac].tx_airtime += tx_airtime;
>>>>>>>>> sta->airtime[ac].rx_airtime += rx_airtime;
>>>>>>>>> - sta->airtime[ac].deficit -= airtime;
>>>>>>>>> +
>>>>>>>>> + weight_sum = local->airtime_weight_sum[ac] ?:
>>>>>>>>> sta->airtime_weight;
>>>>>>>>> +
>>>>>>>>> + local->airtime_v_t[ac] += airtime / weight_sum;
>>>>>>>> Hi Toke,
>>>>>>>>
>>>>>>>> Please ignore the previous two broken emails regarding this new
>>>>>>>> proposal
>>>>>>>> from me.
>>>>>>>>
>>>>>>>> It looks like local->airtime_v_t acts like a Tx criteria. Only
>>>>>>>> the
>>>>>>>> stations with less airtime than that are valid for Tx. That
>>>>>>>> means
>>>>>>>> there
>>>>>>>> are situations, like 50 clients, that some of the stations can
>>>>>>>> be
>>>>>>>> used
>>>>>>>> to Tx when putting next_txq in the loop. Am I right?
>>>>>>>
>>>>>>> I'm not sure what you mean here. Are you referring to the case
>>>>>>> where
>>>>>>> new
>>>>>>> stations appear with a very low (zero) airtime_v_t? That is
>>>>>>> handled
>>>>>>> when
>>>>>>> the station is enqueued.
>>>>>> Hi Toke,
>>>>>>
>>>>>> Sorry for the confusion. I am not referring to the case that you
>>>>>> mentioned though it can be solved by your subtle design, max(local
>>>>>> vt,
>>>>>> sta vt). :-)
>>>>>>
>>>>>> Actually, my concern is situation about putting next_txq in the
>>>>>> loop.
>>>>>> Let me explain a little more and see below.
>>>>>>
>>>>>>> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>>>>>>> struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw
>>>>>>> *hw,
>>>>>>> u8
>>>>>>> ac)
>>>>>>> {
>>>>>>> struct ieee80211_local *local = hw_to_local(hw);
>>>>>>> + struct rb_node *node = local->schedule_pos[ac];
>>>>>>> struct txq_info *txqi = NULL;
>>>>>>> + bool first = false;
>>>>>>>
>>>>>>> lockdep_assert_held(&local->active_txq_lock[ac]);
>>>>>>>
>>>>>>> - begin:
>>>>>>> - txqi = list_first_entry_or_null(&local->active_txqs[ac],
>>>>>>> - struct txq_info,
>>>>>>> - schedule_order);
>>>>>>> - if (!txqi)
>>>>>>> + if (!node) {
>>>>>>> + node = rb_first_cached(&local->active_txqs[ac]);
>>>>>>> + first = true;
>>>>>>> + } else
>>>>>>> + node = rb_next(node);
>>>>>>
>>>>>> Consider below piece of code from ath10k_mac_schedule_txq:
>>>>>>
>>>>>> ieee80211_txq_schedule_start(hw, ac);
>>>>>> while ((txq = ieee80211_next_txq(hw, ac))) {
>>>>>> while (ath10k_mac_tx_can_push(hw, txq)) {
>>>>>> ret = ath10k_mac_tx_push_txq(hw, txq);
>>>>>> if (ret < 0)
>>>>>> break;
>>>>>> }
>>>>>> ieee80211_return_txq(hw, txq);
>>>>>> ath10k_htt_tx_txq_update(hw, txq);
>>>>>> if (ret == -EBUSY)
>>>>>> break;
>>>>>> }
>>>>>> ieee80211_txq_schedule_end(hw, ac);
>>>>>>
>>>>>> If my understanding is right, local->schedule_pos is used to
>>>>>> record
>>>>>> the
>>>>>> last scheduled node and used for traversal rbtree for valid txq.
>>>>>> There
>>>>>> is chance that an empty txq is feeded to return_txq and got
>>>>>> removed
>>>>>> from
>>>>>> rbtree. The empty txq will always be the rb_first node. Then in
>>>>>> the
>>>>>> following next_txq, local->schedule_pos becomes meaningless since
>>>>>> its
>>>>>> rb_next will return NULL and the loop break. Only rb_first get
>>>>>> dequeued
>>>>>> during this loop.
>>>>>>
>>>>>> if (!node || RB_EMPTY_NODE(node)) {
>>>>>> node = rb_first_cached(&local->active_txqs[ac]);
>>>>>> first = true;
>>>>>> } else
>>>>>> node = rb_next(node);
>>>>>
>>>>> Ah, I see what you mean. Yes, that would indeed be a problem - nice
>>>>> catch! :)
>>>>>
>>>>>> How about this? The nodes on the rbtree will be dequeued and
>>>>>> removed
>>>>>> from rbtree one by one until HW is busy. Please note local vt and
>>>>>> sta
>>>>>> vt will not be updated since txq lock is held during this time.
>>>>>
>>>>> Insertion and removal from the rbtree are relatively expensive, so
>>>>> I'd
>>>>> rather not do that for every txq. I think a better way to solve
>>>>> this
>>>>> is to just defer the actual removal from the tree until
>>>>> ieee80211_txq_schedule_end()... Will fix that when I submit this
>>>>> again.
>>>>
>>>> Do you mean we keep the empty txqs in the rbtree until loop finishes
>>>> and
>>>> remove them in ieee80211_txq_schedule_end(may be put return_txq in
>>>> it)?
>>>> If it is the case, I suppose a list is needed to store the empty
>>>> txqs
>>>> so
>>>> as to dequeue them in ieee80211_txq_schedule_end.
>>>
>>> Yeah, return_txq() would just put "to be removed" TXQs on a list, and
>>> schedule_end() would do the actual removal (after checking whether a
>>> new
>>> packet showed up in the meantime).
>>
>> SGTM
>>
>>>
>>>> And one more thing,
>>>>
>>>>> + if (sta->airtime[ac].v_t > local->airtime_v_t[ac])
>>>>> {
>>>>> + if (first)
>>>>> + local->airtime_v_t[ac] =
>>>>> sta->airtime[ac].v_t;
>>>>> + else
>>>>> + return NULL;
>>>>
>>>> As local->airtime_v_t will not be updated during loop, we don't need
>>>> to
>>>> return NULL.
>>>
>>> Yes we do; this is actually the break condition. I.e., stations whose
>>> virtual time are higher than the global time (in local->airtime_v_t)
>>> are
>>> not allowed to transmit. And since we are traversing them in order,
>>> when
>>> we find the first such station, we are done and can break out of the
>>> scheduling loop entirely (which is what we do by returning NULL). The
>>> other branch in the inner if() is just for the case where no stations
>>> are currently eligible to transmit according to this rule; here we
>>> don't
>>> want to stall, so we advance the global timer so the first station
>>> becomes eligible...
>>
>> Yes,the inner if() make sure first node always get scheduled no matter
>> its vt.
>>
>> To detail my concern, let's assume only two nodes in the tree and
>> empty nodes will be in tree until schedule_end(). In the loop and in
>> case hw is not busy, ath10k will drain every node next_txq returned
>> before asking for another txq again. Then as we are traversing to next
>> rb node, it is highly possible the second node is not allowed to
>> transmit since the global time has not been updated yet as the active
>> txq lock is held. At this time, only second node on the tree has data
>> and hw is capable of sending more data. I don't think the second node
>> is not valid for transmission in this situation.
>>
>> With more nodes in the tree in this situation, I think same thing
>> happens that all nodes except the first node are not allowed to
>> transmit since none of their vts are less than the global time which
>> is not updated in time. The loop breaks when we are checking the
>> second node.
>
> Yeah, in many cases we will end up throttling all but the first (couple
> of) node(s). This is by design; otherwise we can't ensure fairness. As
> long as we are making forward progress that is fine, though...
All right, I see your point. Fairness is our first priority. :)
>
> -Toke
--
Yibo
next prev parent reply other threads:[~2019-04-12 6:34 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-15 17:05 Toke Høiland-Jørgensen
2019-02-15 19:44 ` Dave Taht
2019-03-05 15:45 ` Toke Høiland-Jørgensen
2019-03-06 23:09 ` Rajkumar Manoharan
2019-03-07 9:46 ` Toke Høiland-Jørgensen
2019-03-07 14:27 ` Felix Fietkau
2019-03-08 11:05 ` Toke Høiland-Jørgensen
2019-03-08 18:16 ` Felix Fietkau
2019-03-08 19:06 ` Toke Høiland-Jørgensen
2019-04-04 4:41 ` Yibo Zhao
2019-04-04 4:43 ` Yibo Zhao
2019-04-04 5:00 ` Yibo Zhao
2019-04-04 8:31 ` Toke Høiland-Jørgensen
2019-04-04 8:36 ` Dave Taht
2019-04-04 8:50 ` Toke Høiland-Jørgensen
2019-04-09 13:25 ` Yibo Zhao
2019-04-09 20:41 ` Toke Høiland-Jørgensen
2019-04-10 6:35 ` Yibo Zhao
2019-04-10 10:40 ` Toke Høiland-Jørgensen
2019-04-11 3:12 ` Yibo Zhao
2019-04-11 11:24 ` Toke Høiland-Jørgensen
2019-04-12 6:34 ` Yibo Zhao [this message]
2019-04-19 15:05 ` Yibo Zhao
2019-04-20 21:15 ` Toke Høiland-Jørgensen
2019-04-30 9:45 ` Yibo Zhao
2019-04-30 10:39 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.bufferbloat.net/postorius/lists/make-wifi-fast.lists.bufferbloat.net/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e0b66d5ea193c2bf402d6f2c7c50a991@codeaurora.org \
--to=yiboz@codeaurora.org \
--cc=kyan@google.com \
--cc=linux-wireless-owner@vger.kernel.org \
--cc=linux-wireless@vger.kernel.org \
--cc=make-wifi-fast@lists.bufferbloat.net \
--cc=nbd@nbd.name \
--cc=rmanohar@codeaurora.org \
--cc=toke@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox