From: Dave Taht
Date: Thu, 21 Mar 2024 01:18:06 -0400
To: libreqos
Subject: [LibreQoS] Fwd: [PATCH v5 net 0/3] Report RCU QS for busy network kthreads

---------- Forwarded message ---------
From: Yan Zhai
Date: Tue, Mar 19, 2024 at 4:44 PM
Subject: [PATCH v5 net 0/3] Report RCU QS for busy network kthreads
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Jiri Pirko, Simon Horman, Daniel Borkmann, Lorenzo Bianconi, Coco Li,
    Wei Wang, Alexander Duyck, Joel Fernandes, Paul E. McKenney,
    Toke Høiland-Jørgensen, Alexei Starovoitov, Steven Rostedt,
    Jesper Dangaard Brouer, Sebastian Andrzej Siewior

This changeset fixes a common problem for busy networking kthreads.
These threads, e.g. NAPI threads, typically will:

* poll a batch of packets
* if there is more work, call cond_resched() to allow scheduling
* continue to poll more packets while the rx queue is not empty

We observed this being a problem in production, since it can block RCU
tasks from making progress under heavy load. Investigation indicates that
just calling cond_resched() is insufficient for RCU tasks to reach
quiescent states.
Calling cond_resched() also has the side effect of frequently clearing the
TIF_NEED_RESCHED flag on voluntary preempt kernels. As a result, schedule()
will not be called in these circumstances, even though schedule() does in
fact provide the required quiescent states. This affects at least NAPI
threads, napi_busy_loop, and the cpumap kthread.

By reporting RCU QSes in these kthreads periodically before cond_resched(),
the blocked RCU waiters can correctly progress. Rather than reporting a QS
for RCU tasks only, this code shares the same concern as noted in commit
d28139c4e967 ("rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when
safe"). So report a consolidated QS for safety.

It is worth noting that, although this problem is reproducible in
napi_busy_loop, it only shows up when the polling interval is set as high
as 2ms, which is far larger than the 50us-100us recommended in the
documentation. So napi_busy_loop is left untouched.

Lastly, this does not affect RT kernels, which do not enter the scheduler
through cond_resched(). Without the mentioned side effect, schedule() will
be called from time to time and clear the RCU task holdouts.
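As a rough illustration of the loop and the fix described in the cover letter, here is a toy userspace model in C. It is not kernel code and not the helper added by the series: struct poll_model, model_cond_resched(), model_report_qs_periodic() and run_poll_loop() are hypothetical names, and the tick-based reporting interval is an assumption made only so the model has something to count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct poll_model {
	unsigned long now;      /* simulated clock, one tick per batch */
	unsigned long last_qs;  /* tick of the last reported quiescent state */
	unsigned long qs_count; /* quiescent states visible to the RCU waiter */
	bool need_resched;      /* stand-in for TIF_NEED_RESCHED */
};

/*
 * Models cond_resched() as the cover letter describes it on a
 * voluntary-preempt kernel: the flag is cleared, and no quiescent
 * state is recorded for the waiter.
 */
static void model_cond_resched(struct poll_model *m)
{
	m->need_resched = false;
}

/* Models the periodic report: record a QS at most once per interval. */
static void model_report_qs_periodic(struct poll_model *m,
				     unsigned long interval)
{
	assert(interval > 0);
	if (m->now - m->last_qs >= interval) {
		m->qs_count++;
		m->last_qs = m->now;
	}
}

/*
 * Busy polling loop whose rx queue never drains.
 * Returns how many quiescent states the waiter saw.
 */
static unsigned long run_poll_loop(unsigned long batches, bool report_qs,
				   unsigned long interval)
{
	struct poll_model m = { 0 };

	for (unsigned long i = 0; i < batches; i++) {
		m.now++;               /* poll one batch of packets */
		m.need_resched = true; /* more work is pending */
		if (report_qs)
			model_report_qs_periodic(&m, interval);
		model_cond_resched(&m);
	}
	return m.qs_count;
}
```

In this model, run_poll_loop(1000, false, 100) leaves the waiter with no quiescent states at all, while enabling the periodic report gives it one every 100 ticks. The real series additionally disables preemption around the report and reports a consolidated QS, which this sketch does not attempt to model.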
V4: https://lore.kernel.org/bpf/cover.1710525524.git.yan@cloudflare.com/
V3: https://lore.kernel.org/lkml/20240314145459.7b3aedf1@kernel.org/t/
V2: https://lore.kernel.org/bpf/ZeFPz4D121TgvCje@debian.debian/
V1: https://lore.kernel.org/lkml/Zd4DXTyCf17lcTfq@debian.debian/#t

changes since v4:
 * polished comments and docs for the RCU helper as Paul McKenney suggested

changes since v3:
 * fixed kernel-doc errors

changes since v2:
 * created a helper in rcu header to abstract the behavior
 * fixed cpumap kthread in addition

changes since v1:
 * disable preemption first as Paul McKenney suggested

Yan Zhai (3):
  rcu: add a helper to report consolidated flavor QS
  net: report RCU QS on threaded NAPI repolling
  bpf: report RCU QS in cpumap kthread

 include/linux/rcupdate.h | 31 +++++++++++++++++++++++++++++++
 kernel/bpf/cpumap.c      |  3 +++
 net/core/dev.c           |  3 +++
 3 files changed, 37 insertions(+)

-- 
2.30.2

-- 
https://www.youtube.com/watch?v=N0Tmvv5jJKs Epik Mellon Podcast
Dave Täht CSO, LibreQos