From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Mar 2026 18:47:13 -0700
From: Jakub Kicinski <kuba@kernel.org>
To: Jamal Hadi Salim
Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com,
 pabeni@redhat.com, horms@kernel.org, jiri@resnulli.us, toke@toke.dk,
 vinicius.gomes@intel.com, stephen@networkplumber.org, vladbu@nvidia.com,
 cake@lists.bufferbloat.net, bpf@vger.kernel.org, ghandatmanas@gmail.com,
 km.kim1503@gmail.com, security@kernel.org, Victor Nogueira
Message-ID: <20260310184713.7e810431@kernel.org>
In-Reply-To: <20260307212058.169511-1-jhs@mojatatu.com>
References: <20260307212058.169511-1-jhs@mojatatu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: [Cake] Re: [PATCH net] net/sched: Mark qdisc for deletion if graft cannot delete

On Sat, 7 Mar 2026 16:20:58 -0500 Jamal Hadi Salim wrote:
> Note: We tried a couple of different approaches that had smaller code
> footprint but were a bit fugly. The first approach was to use recursion
> on the qdisc hash table to iterate the descendants of the qdisc; however,
> the challenge here is if the graph depth is "high" - we may overflow the
> stack. The second approach was to use a breadth first search to achieve
> the same goal; the challenge here was it was a quadratic algorithm.

Lots of complexity when realistically only ingress/clsact support the
unlocked operations. Can we not just take rtnl before the references
and not bother all the real qdiscs with this @#%$ ?

(diff just to illustrate the point, not even compiled)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 4829c27446e3..21b461f3323d 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2255,6 +2255,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	int err;
 	int tp_created;
 	bool rtnl_held = false;
+	bool rtnl_take = false;
 	u32 flags;
 
 replay:
@@ -2290,11 +2291,17 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		}
 	}
 
+	/* Realistically only INGRESS supports unlocked ops */
+	if (parent != TC_H_INGRESS) {
+		rtnl_held = true;
+		rtnl_lock();
+	}
+
 	/* Find head of filter chain. */
 	err = __tcf_qdisc_find(net, &q, &parent, t->tcm_ifindex, false,
 			       extack);
 	if (err)
-		return err;
+		goto errout;
 
 	if (tcf_proto_check_kind(tca[TCA_KIND], name)) {
 		NL_SET_ERR_MSG(extack, "Specified TC filter name too long");
@@ -2306,11 +2313,12 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	 * block is shared (no qdisc found), qdisc is not unlocked, classifier
 	 * type is not specified, classifier is not unlocked.
 	 */
-	if (rtnl_held ||
+	if (rtnl_take ||
 	    (q && !(q->ops->cl_ops->flags & QDISC_CLASS_OPS_DOIT_UNLOCKED)) ||
 	    !tcf_proto_is_unlocked(name)) {
+		if (!rtnl_held)
+			rtnl_lock();
 		rtnl_held = true;
-		rtnl_lock();
 	}
 
 	err = __tcf_qdisc_cl_find(q, parent, &cl, t->tcm_ifindex, extack);
@@ -2451,17 +2459,16 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	}
 
 	tcf_block_release(q, block, rtnl_held);
-	if (rtnl_held)
-		rtnl_unlock();
-
 	if (err == -EAGAIN) {
 		/* Take rtnl lock in case EAGAIN is caused by concurrent flush
 		 * of target chain.
 		 */
-		rtnl_held = true;
+		rtnl_take = true;
 		/* Replay the request. */
 		goto replay;
 	}
+	if (rtnl_held)
+		rtnl_unlock();
 
 	return err;
 
 errout_locked: