From mboxrd@z Thu Jan 1 00:00:00 1970
From: Toke Høiland-Jørgensen
To: Yibo Zhao
Cc: make-wifi-fast@lists.bufferbloat.net, linux-wireless@vger.kernel.org,
 Felix Fietkau, Rajkumar Manoharan, Kan Yan,
 linux-wireless-owner@vger.kernel.org
In-Reply-To: <73077ba7cda566d5eeb2395978b3524c@codeaurora.org>
References: <20190215170512.31512-1-toke@redhat.com>
 <753b328855b85f960ceaf974194a7506@codeaurora.org>
 <87ftqy41ea.fsf@toke.dk>
 <877ec2ykrh.fsf@toke.dk>
 <89d32174b282006c8d4e7614657171be@codeaurora.org>
 <87a7gyw3cu.fsf@toke.dk>
 <73077ba7cda566d5eeb2395978b3524c@codeaurora.org>
Date: Thu, 11 Apr 2019 13:24:25 +0200
Message-ID: <877ec0u6mu.fsf@toke.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Subject: Re: [Make-wifi-fast] [RFC/RFT] mac80211: Switch to a virtual time-based airtime scheduler

Yibo Zhao writes:

> On 2019-04-10 18:40, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao writes:
>>
>>> On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao writes:
>>>>
>>>>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao writes:
>>>>>>
>>>>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>>>>>> This switches the airtime scheduler in mac80211 to use a virtual
>>>>>>>> time-based scheduler instead of the round-robin scheduler used before.
>>>>>>>> This has a couple of advantages:
>>>>>>>>
>>>>>>>> - No need to sync up the round-robin scheduler in firmware/hardware
>>>>>>>>   with the round-robin airtime scheduler.
>>>>>>>>
>>>>>>>> - If several stations are eligible for transmission we can schedule
>>>>>>>>   both of them; no need to hard-block the scheduling rotation until
>>>>>>>>   the head of the queue has used up its quantum.
>>>>>>>>
>>>>>>>> - The check of whether a station is eligible for transmission becomes
>>>>>>>>   simpler (in ieee80211_txq_may_transmit()).
>>>>>>>>
>>>>>>>> The drawback is that scheduling becomes slightly more expensive, as we
>>>>>>>> need to maintain an rbtree of TXQs sorted by virtual time. This means
>>>>>>>> that ieee80211_register_airtime() becomes O(logN) in the number of
>>>>>>>> currently scheduled TXQs. However, hopefully this number rarely grows
>>>>>>>> too big (it's only TXQs currently backlogged, not all associated
>>>>>>>> stations), so it shouldn't be too big of an issue.
>>>>>>>>
>>>>>>>> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
>>>>>>>>  {
>>>>>>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>>>>>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>>>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>>>>>> -	u32 airtime = 0;
>>>>>>>> +	u64 airtime = 0, weight_sum;
>>>>>>>> +
>>>>>>>> +	if (!txq)
>>>>>>>> +		return;
>>>>>>>>
>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>>>>>  		airtime += tx_airtime;
>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>>>>>  		airtime += rx_airtime;
>>>>>>>>
>>>>>>>> +	/* Weights scale so the unit weight is 256 */
>>>>>>>> +	airtime <<= 8;
>>>>>>>> +
>>>>>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>>> +
>>>>>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>>>>>> -	sta->airtime[ac].deficit -= airtime;
>>>>>>>> +
>>>>>>>> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
>>>>>>>> +
>>>>>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>>>>>>> Hi Toke,
>>>>>>>
>>>>>>> Please ignore the previous two broken emails regarding this new
>>>>>>> proposal from me.
>>>>>>>
>>>>>>> It looks like local->airtime_v_t acts like a Tx criteria. Only the
>>>>>>> stations with less airtime than that are valid for Tx. That means
>>>>>>> there are situations, like 50 clients, that some of the stations can
>>>>>>> be used to Tx when putting next_txq in the loop. Am I right?
>>>>>>
>>>>>> I'm not sure what you mean here. Are you referring to the case where
>>>>>> new stations appear with a very low (zero) airtime_v_t? That is
>>>>>> handled when the station is enqueued.
>>>>> Hi Toke,
>>>>>
>>>>> Sorry for the confusion.
>>>>> I am not referring to the case that you mentioned though it can be
>>>>> solved by your subtle design, max(local vt, sta vt). :-)
>>>>>
>>>>> Actually, my concern is situation about putting next_txq in the loop.
>>>>> Let me explain a little more and see below.
>>>>>
>>>>>> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>>>>>>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac)
>>>>>>  {
>>>>>>  	struct ieee80211_local *local = hw_to_local(hw);
>>>>>> +	struct rb_node *node = local->schedule_pos[ac];
>>>>>>  	struct txq_info *txqi = NULL;
>>>>>> +	bool first = false;
>>>>>>
>>>>>>  	lockdep_assert_held(&local->active_txq_lock[ac]);
>>>>>>
>>>>>> - begin:
>>>>>> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
>>>>>> -					struct txq_info,
>>>>>> -					schedule_order);
>>>>>> -	if (!txqi)
>>>>>> +	if (!node) {
>>>>>> +		node = rb_first_cached(&local->active_txqs[ac]);
>>>>>> +		first = true;
>>>>>> +	} else
>>>>>> +		node = rb_next(node);
>>>>>
>>>>> Consider below piece of code from ath10k_mac_schedule_txq:
>>>>>
>>>>> 	ieee80211_txq_schedule_start(hw, ac);
>>>>> 	while ((txq = ieee80211_next_txq(hw, ac))) {
>>>>> 		while (ath10k_mac_tx_can_push(hw, txq)) {
>>>>> 			ret = ath10k_mac_tx_push_txq(hw, txq);
>>>>> 			if (ret < 0)
>>>>> 				break;
>>>>> 		}
>>>>> 		ieee80211_return_txq(hw, txq);
>>>>> 		ath10k_htt_tx_txq_update(hw, txq);
>>>>> 		if (ret == -EBUSY)
>>>>> 			break;
>>>>> 	}
>>>>> 	ieee80211_txq_schedule_end(hw, ac);
>>>>>
>>>>> If my understanding is right, local->schedule_pos is used to record
>>>>> the last scheduled node and used for traversal rbtree for valid txq.
>>>>> There is chance that an empty txq is feeded to return_txq and got
>>>>> removed from rbtree. The empty txq will always be the rb_first node.
>>>>> Then in the following next_txq, local->schedule_pos becomes
>>>>> meaningless since its rb_next will return NULL and the loop break.
>>>>> Only rb_first get dequeued during this loop.
>>>>>
>>>>> 	if (!node || RB_EMPTY_NODE(node)) {
>>>>> 		node = rb_first_cached(&local->active_txqs[ac]);
>>>>> 		first = true;
>>>>> 	} else
>>>>> 		node = rb_next(node);
>>>>
>>>> Ah, I see what you mean. Yes, that would indeed be a problem - nice
>>>> catch! :)
>>>>
>>>>> How about this? The nodes on the rbtree will be dequeued and removed
>>>>> from rbtree one by one until HW is busy. Please note local vt and sta
>>>>> vt will not be updated since txq lock is held during this time.
>>>>
>>>> Insertion and removal from the rbtree are relatively expensive, so I'd
>>>> rather not do that for every txq. I think a better way to solve this
>>>> is to just defer the actual removal from the tree until
>>>> ieee80211_txq_schedule_end()... Will fix that when I submit this again.
>>>
>>> Do you mean we keep the empty txqs in the rbtree until loop finishes and
>>> remove them in ieee80211_txq_schedule_end(may be put return_txq in it)?
>>> If it is the case, I suppose a list is needed to store the empty txqs so
>>> as to dequeue them in ieee80211_txq_schedule_end.
>>
>> Yeah, return_txq() would just put "to be removed" TXQs on a list, and
>> schedule_end() would do the actual removal (after checking whether a new
>> packet showed up in the meantime).
>
> SGTM
>
>>
>>> And one more thing,
>>>
>>>> +	if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) {
>>>> +		if (first)
>>>> +			local->airtime_v_t[ac] = sta->airtime[ac].v_t;
>>>> +		else
>>>> +			return NULL;
>>>
>>> As local->airtime_v_t will not be updated during loop, we don't need to
>>> return NULL.
>>
>> Yes we do; this is actually the break condition.
>> I.e., stations whose virtual time are higher than the global time (in
>> local->airtime_v_t) are not allowed to transmit. And since we are
>> traversing them in order, when we find the first such station, we are
>> done and can break out of the scheduling loop entirely (which is what
>> we do by returning NULL). The other branch in the inner if() is just
>> for the case where no stations are currently eligible to transmit
>> according to this rule; here we don't want to stall, so we advance the
>> global timer so the first station becomes eligible...
>
> Yes, the inner if() make sure first node always get scheduled no matter
> its vt.
>
> To detail my concern, let's assume only two nodes in the tree and
> empty nodes will be in tree until schedule_end(). In the loop and in
> case hw is not busy, ath10k will drain every node next_txq returned
> before asking for another txq again. Then as we are traversing to next
> rb node, it is highly possible the second node is not allowed to
> transmit since the global time has not been updated yet as the active
> txq lock is held. At this time, only second node on the tree has data
> and hw is capable of sending more data. I don't think the second node
> is not valid for transmission in this situation.
>
> With more nodes in the tree in this situation, I think same thing
> happens that all nodes except the first node are not allowed to
> transmit since none of their vts are less than the global time which
> is not updated in time. The loop breaks when we are checking the
> second node.

Yeah, in many cases we will end up throttling all but the first (couple
of) node(s). This is by design; otherwise we can't ensure fairness. As
long as we are making forward progress that is fine, though...

-Toke
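
For anyone following the thread, here is a rough userspace sketch of the
eligibility rule discussed above. It is not the mac80211 code: the names
sim_sta and sim_schedule_round are made up for illustration, and a plain
array sorted by ascending v_t stands in for the rbtree walk. It only
models the quoted "if (first) ... else return NULL" hunk, assuming the
global virtual time does not change during one scheduling round.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a backlogged station; not a mac80211 struct. */
struct sim_sta {
	uint64_t v_t;	/* the station's virtual time */
};

/*
 * Walk stations in ascending v_t order and return how many of them may
 * transmit in one scheduling round. *global_v_t is only advanced when
 * even the first station is ahead of it, so the round never stalls.
 */
static size_t sim_schedule_round(const struct sim_sta *stas, size_t n,
				 uint64_t *global_v_t)
{
	size_t i;

	assert(stas != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (stas[i].v_t > *global_v_t) {
			if (i != 0)
				break;	/* first ineligible station ends the round */
			/* nobody is eligible: advance the global clock */
			*global_v_t = stas[i].v_t;
		}
	}
	return i;
}
```

With stations at v_t 100, 150 and 200 and a global time of 50, only the
first station is served and the global time moves to 100, which is the
throttling case Yibo describes. With a global time of 160, the first two
stations are served and the third ends the round.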