Date: Thu, 12 Mar 2026 16:51:13 -0700
From: Jakub Kicinski
To: Jamal Hadi Salim
Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, horms@kernel.org, jiri@resnulli.us, toke@toke.dk, vinicius.gomes@intel.com, stephen@networkplumber.org, vladbu@nvidia.com, cake@lists.bufferbloat.net, bpf@vger.kernel.org, ghandatmanas@gmail.com, km.kim1503@gmail.com, security@kernel.org, Victor Nogueira
Subject: [Cake] Re: [PATCH net] net/sched: Mark qdisc for deletion if graft cannot delete
Message-ID: <20260312165113.773a5f44@kernel.org>
References: <20260307212058.169511-1-jhs@mojatatu.com> <20260310184713.7e810431@kernel.org> <20260311175249.54abe1b6@kernel.org>
X-MailFrom: kuba@kernel.org
List-Id: Cake - FQ_codel the next generation

On Thu, 12 Mar 2026 16:36:48 -0400 Jamal Hadi Salim wrote:
> > > Two of the several (I think 4!) patches we had took a similar path. I
> > > am trying to remember at least one variant was bad for performance and
> > > the other was unstable. Let's see if we can revive it and take a
> > > closer look. BTW - none were pretty, it was maybe half the lines of
> > > code but touched many things.
> >
> > FWIW / of course, we have to apply similar change to all(?) callers of
> > __tcf_qdisc_find in cls_api. So LOC-wise it may end up also pretty long.
> > And it's not going to help the already spaghetti-looking locking. But
> > even if it's more LoC I quite like the idea of containing the poopy
> > code to where problems originate which is the lockless filter handling.
> > Fingers crossed..
>
> Something like attached.
> Unfortunately after running it for a few hours it reproduced.
> The action code path (entered by virtue of filter code path execution)
> releases the rtnl when attempting to load an action module. A parallel
> qdisc operation waiting for the lock then grabs it and we hit the same
> issue...
>
> So now we have to be more invasive and start coordinating the action
> code etc, which is not appealing. Thoughts?

I see.
Doesn't seem entirely crazy to let tcf_proto_lookup_ops() return -EAGAIN without actually loading the module, and have its call paths (of which there are only 2?) do the module loading once all the locks are released. The call paths already handle the EAGAIN and retry; they just assume tcf_proto_lookup_ops() has loaded the module so they don't have to.
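
To make the shape of that concrete, here is a rough userspace sketch, not kernel code. Every name in it (fake_lock(), fake_request_module(), sketch_lookup_ops(), sketch_new_filter(), the "known" kind) is a hypothetical stand-in invented for illustration; the real functions, their signatures and the real locking are not reproduced. It only models the ordering: the lookup never drops the lock and never loads anything, it just reports -EAGAIN, and the caller loads the module after unlocking and then replays.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Userspace stand-ins only; none of these are the kernel's definitions. */
static bool fake_rtnl_held;     /* models "we currently hold the lock"   */
static bool fake_module_loaded; /* models "the classifier module exists" */
static int fake_load_calls;     /* counts how often we loaded a module   */

void fake_lock(void)   { fake_rtnl_held = true; }
void fake_unlock(void) { fake_rtnl_held = false; }

/* Hypothetical loader: the whole point is that it runs with no lock held. */
void fake_request_module(const char *kind)
{
	(void)kind;
	assert(!fake_rtnl_held);
	fake_module_loaded = true;
	fake_load_calls++;
}

/*
 * Lookup that never releases the lock. If the ops are missing it only
 * reports -EAGAIN and leaves the loading to the caller.
 * Assumption: "known" is the only kind a module could provide.
 */
int sketch_lookup_ops(const char *kind)
{
	assert(fake_rtnl_held);
	if (strcmp(kind, "known") != 0)
		return -ENOENT;
	return fake_module_loaded ? 0 : -EAGAIN;
}

/* Caller: do the locked work, unlock, then load and replay if asked to. */
int sketch_new_filter(const char *kind)
{
	int tries = 0;
	int err;

replay:
	fake_lock();
	err = sketch_lookup_ops(kind);
	/* ... the rest of the locked work would happen here on success ... */
	fake_unlock();

	if (err == -EAGAIN && tries++ < 1) {
		/* All locks are released, so loading cannot open a window
		 * in the middle of the locked section. */
		fake_request_module(kind);
		goto replay;
	}
	return err;
}
```

Calling sketch_new_filter("known") on a fresh state takes the -EAGAIN path once, loads the module with the lock dropped, and succeeds on the replay; later calls succeed without loading again. Whether the action module loading path can be restructured the same way is a separate question that this sketch does not address.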