From: Pete Heist
To: Toke Høiland-Jørgensen
Cc: Cake List
Date: Fri, 1 Feb 2019 00:10:29 +0100
Subject: Re: [Cake] lockup with multiple cake instances on 3.16.7

> On Jan 31, 2019, at 9:25 PM, Pete Heist wrote:
>
> 1) Why is nla_put_u32 suddenly failing for TARGET_US after adding five cake instances?

nla_put_u32 is returning -EMSGSIZE, so the tailroom of the skb isn’t large enough (per the nla_put doc).

After it fails, cake_dump_stats is called a second time right away, which succeeds. I _think_ what’s happening here is that after it sees -EMSGSIZE, the kernel allocates more tailroom and calls cake_dump_stats again. This doesn’t happen on kernel 4.9.0, where it always succeeds, so presumably the initial size is larger.

> 2) Is calling sch_tree_unlock the right thing to do in the failure case, or am I working around a kernel bug, and doing something that would fail in other kernels?

I don’t think we’d want to do this after that same edb09eb17 commit.
:)

So at a minimum, to unlock after error:

 nla_put_failure:
 	nla_nest_cancel(d->skb, stats);
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 8, 0)
+	sch_tree_unlock(sch);
+#endif
 	return -1;
 }

The question still is, would that break other kernel versions if an error occurs? If the tree is unlocked again elsewhere in some kernel versions after an error, that would end badly. It looks like in tc_fill_qdisc (sch_api.c) that gnet_stats_finish_copy, which unlocks, is not called if q->ops->dump_stats returns < 0. So I’m not sure how it ever unlocked properly in case of error.

Hrm. Do you think this is a correct patch?