From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pete Heist
To: Toke Høiland-Jørgensen
Cc: Cake List
Subject: Re: [Cake] lockup with multiple cake instances on 3.16.7
Date: Thu, 31 Jan 2019 21:25:31 +0100
Message-Id: <60A1337C-DE0E-43DE-B5CA-5815F615124D@heistp.net>
In-Reply-To: <87r2csj2uk.fsf@toke.dk>

> On Jan 31, 2019, at 3:53 PM, Toke Høiland-Jørgensen wrote:
>
> Well, the backtrace is definitely hanging on that lock in
> gnet_stats_start_copy_compat(). If it's not related to the CAKE logging,
> I guess it must be a bug in the upstream kernel; which we probably can't
> fix from the cake side anyway.
>
> I don't suppose you can reproduce this on a newer kernel?

So far it works fine on 3.10.107 (with mipsel EdgeOS build) and 4.9.0-8 amd64, haven’t tried anything in between, but…

printk tells me that it’s not locking up in cake_dump_class_stats, but after a failure in cake_dump_stats. When “tc qdisc” is run after adding the fifth cake instance, line 2974 is failing:

	PUT_TSTAT_U32(TARGET_US, ktime_to_us(ns_to_ktime(b->cparams.target)));

So the call to nla_put_u32 returns nonzero.
Then it ends up at nla_put_failure where nla_nest_cancel is called. The function returns, but the lock is not being released by the kernel in the failure case. The following patch “fixes it”:

diff --git a/sch_cake.c b/sch_cake.c
index 3a26db0..ae3e16c 100644
--- a/sch_cake.c
+++ b/sch_cake.c
@@ -3010,6 +3010,7 @@ static int cake_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 
 nla_put_failure:
 	nla_nest_cancel(d->skb, stats);
+	sch_tree_unlock(sch);
 	return -1;
 }

Two questions:

1) Why is nla_put_u32 suddenly failing for TARGET_US after adding five cake instances?

2) Is calling sch_tree_unlock the right thing to do in the failure case, or am I working around a kernel bug, and doing something that would fail in other kernels?
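
To make the failure mode above easier to reason about, here is a minimal userspace sketch of it. It is not kernel code, and it does not settle either question. The names msg_buf, toy_lock, put_u32 and dump_instance are hypothetical stand-ins (for the skb, the lock taken at the start of the stats copy, nla_put_u32 and cake_dump_stats respectively). The 64-byte buffer and 8-byte attribute size are arbitrary, chosen only so that a fifth two-attribute dump runs out of room. The one kernel behaviour it leans on is that nla_put fails with -EMSGSIZE when the remaining tailroom is smaller than the attribute, which may or may not be what happens for TARGET_US here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_HDR_LEN 4	/* illustrative attribute header size (assumption) */

/* Hypothetical stand-in for the skb: a small fixed-size message buffer. */
struct msg_buf {
	unsigned char data[64];
	size_t len;
};

/* Hypothetical stand-in for the lock held while stats are copied. */
struct toy_lock {
	int held;
};

/* Take the lock, or report -EBUSY where a real spinlock would spin forever. */
static int toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return -EBUSY;
	l->held = 1;
	return 0;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = 0;
}

/* Append header + u32 payload; fail if the remaining room is too small. */
static int put_u32(struct msg_buf *b, uint32_t value)
{
	const size_t need = ATTR_HDR_LEN + sizeof(value);

	if (sizeof(b->data) - b->len < need)
		return -EMSGSIZE;
	memset(b->data + b->len, 0, ATTR_HDR_LEN);
	memcpy(b->data + b->len + ATTR_HDR_LEN, &value, sizeof(value));
	b->len += need;
	return 0;
}

/*
 * Dump nattrs attributes for one instance. The lock is taken first and
 * released on success. On failure the partial output is trimmed (the
 * role nla_nest_cancel plays in the patch), and the lock is released
 * only if unlock_on_failure is set.
 */
static int dump_instance(struct msg_buf *b, struct toy_lock *l,
			 int nattrs, int unlock_on_failure)
{
	const size_t start = b->len;
	int i;

	if (toy_trylock(l) < 0)
		return -EBUSY;

	for (i = 0; i < nattrs; i++)
		if (put_u32(b, (uint32_t)i) < 0)
			goto put_failure;

	toy_unlock(l);
	return 0;

put_failure:
	b->len = start;			/* roll back partial attributes */
	if (unlock_on_failure)
		toy_unlock(l);		/* the line the patch adds */
	return -1;
}
```

Calling dump_instance(&buf, &lock, 2, 0) repeatedly on a zeroed msg_buf and toy_lock succeeds four times, fails on the fifth call once the buffer is full, and leaves the lock held, so every later call returns -EBUSY. That is the toy version of the lockup. Passing 1 as the last argument releases the lock on the error path, and later calls fail cleanly instead.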