From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.bufferbloat.net (Postfix) with ESMTPS id E17E83B29E for ; Thu, 31 Jan 2019 18:18:15 -0500 (EST) Received: by mail-wr1-x42f.google.com with SMTP id p7so5203241wru.0 for ; Thu, 31 Jan 2019 15:18:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=heistp.net; s=google; h=from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=aVyDllO0bB2DrQhGeQZSA8ybaLxpMoGJM9UNoUOKJQ4=; b=iLIWcE7rQCj/iOItA8T4Wsh9tI4Te3xGzxK94lrzbvgg+zsvPeSEQdMtrB0NnjhnRL UHLu3djGd0Yo5brPki9/XbZRrRomYtiQcXkabcElWFcbRq1sh6zGA0mfPjT69htOr46j mNpWqYJOruz1D0ADeA3sG+r6gN28tLJGgtU5mVVNNrmsFfT8yKQGHRBg/6uh5UiGlGSF sMRmpNHIY+CZqJYVIDH4Rq2AtRMTYFnE4Yv12rKvCcIf9uBg77PyOCa6qc5WW6TdGuWK A5yAJA81+4Qkkrz7WmUN3USkmJRmynGg9kJqfNWPDt9IcEx2DbsElVaTVVr0E4cnO/Cw /GEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=aVyDllO0bB2DrQhGeQZSA8ybaLxpMoGJM9UNoUOKJQ4=; b=ctOYaVbVRlS7KyfkrFGInFkKSgJPhX8ZLvUqxhkvTerSUiHirfD2+MpWJLvEQkzrWr zT/ZAlhl2ufaaSOI9epuQ8ZDaqArw0qlfcSkzVv/6EJysUzMoLGFIPM6oogfIZbwzHbJ Z+AwALinyzF30GljxlpPeoM5s4i6DMyIJpWNG2yMrKkAyde8OXnD3sdZXm906K+twZgV KEw+FXOsa86LpebixVLZPc+w1EoxdCHFkuA5Joei6lLH8NPhmBzKt79HWtqDgcRp6GWx B9CWue1052dVosOFS5vyBkOSVprjEeeUGR9ldPUrvj6g8toaD4qbeeERN+kN0jDxTahQ 11Kg== X-Gm-Message-State: AJcUukfw6E0L+isjRc/Y33bumfg4MB2dhXk7SsCCqtUSxfkNl/6pitPE w745vN+aXjxJlC1BmVeRiabmDIPxWkw= X-Google-Smtp-Source: ALg8bN61DPg08raT6y+sgGWx5V0gF69hirISjUvM0izuCIE+o6YVwqr3qAormOz7wlLkD9F35rCKRg== X-Received: by 2002:adf:ea11:: with SMTP id q17mr34761392wrm.328.1548976695014; Thu, 31 Jan 2019 15:18:15 -0800 (PST) Received: from tron.luk.heistp.net (h-1169.lbcfree.net. 
 [185.193.85.130]) by smtp.gmail.com with ESMTPSA id y12sm420899wmi.7.2019.01.31.15.18.14 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 31 Jan 2019 15:18:14 -0800 (PST)
From: Pete Heist
Message-Id: <002E991C-EE0E-4288-B18A-D0FD7BF3152F@heistp.net>
Content-Type: multipart/alternative; boundary="Apple-Mail=_52A310DE-7801-476B-904A-C14F3D2B8C8B"
Mime-Version: 1.0 (Mac OS X Mail 11.5 \(3445.9.1\))
Date: Fri, 1 Feb 2019 00:18:13 +0100
In-Reply-To: <87h8doifve.fsf@toke.dk>
Cc: Cake List
To: Toke Høiland-Jørgensen
References: <15FB76CC-44B2-496B-80EC-8D00AD2AF9B7@heistp.net> <87zhrhiwfv.fsf@toke.dk> <9540B582-7B7C-4846-BA40-54419DF109D4@heistp.net> <87r2csj2uk.fsf@toke.dk> <60A1337C-DE0E-43DE-B5CA-5815F615124D@heistp.net> <87h8doifve.fsf@toke.dk>
X-Mailer: Apple Mail (2.3445.9.1)
Subject: Re: [Cake] lockup with multiple cake instances on 3.16.7
X-BeenThere: cake@lists.bufferbloat.net
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Cake - FQ_codel the next generation
X-List-Received-Date: Thu, 31 Jan 2019 23:18:16 -0000

--Apple-Mail=_52A310DE-7801-476B-904A-C14F3D2B8C8B
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf-8

> On Feb 1, 2019, at 12:09 AM, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> 1) Why is nla_put_u32 suddenly failing for TARGET_US after adding five
>> cake instances?
>
> Probably because it's running out of kernel memory? How much system
> memory do you have on the system you are testing this on?

Plenty of memory (used 131308, free 1911900). I’m guessing this was by design where earlier kernels allocated a smaller initial size for tail space, but that’s only a guess as I haven’t found where that’s done.

>> 2) Is calling sch_tree_unlock the right thing to do in the failure
>> case, or am I working around a kernel bug, and doing something that
>> would fail in other kernels?
>
> Yes, I think you are working around a kernel bug. See
> https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_api.c#L1330
>
> The lock is taken in gnet_stats_start_copy_compat() and released in
> gnet_stats_finish_copy(). The latter is skipped in the failure path. It
> seems this bug is present all the way up to Eric's change to remove the
> locking entirely (which went into 4.8). So I guess you could get a patch
> accepted for the stable trees in 3.16 and 4.4; not that this would help
> you much if you are stuck on 3.16.7…

Hehe, “crossing the streams” here. :) That’s what I gathered after looking at that code for a while, but I’m glad to be sure about it.

Would you accept my workaround in cake_dump_stats, or rather not?

--Apple-Mail=_52A310DE-7801-476B-904A-C14F3D2B8C8B
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=utf-8

--Apple-Mail=_52A310DE-7801-476B-904A-C14F3D2B8C8B--
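
The locking imbalance discussed in this message can be modeled in userspace. The sketch below is only an illustration under stated assumptions, not kernel code and not the actual patch: every function name in it is a hypothetical stand-in, the tree lock is replaced by a plain counter, and the control flow is inferred from the description in the thread (lock taken at the start of the stats copy, released at the end, with the failure path skipping the release). The `unlock_on_failure` flag stands in for the workaround being asked about, where the dump callback releases the lock itself before returning an error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the qdisc tree lock: a counter, not a spinlock. */
static int lock_depth = 0;

static void tree_lock(void)
{
    lock_depth++;
}

static void tree_unlock(void)
{
    assert(lock_depth > 0);   /* catch an unlock without a matching lock */
    lock_depth--;
}

/* Models the "start copy" step, which takes the lock. */
static void stats_start_copy(void)
{
    tree_lock();
}

/* Models the "finish copy" step, which releases the lock. */
static void stats_finish_copy(void)
{
    tree_unlock();
}

/* Models a qdisc's dump-stats callback. It fails when attr_fails is set,
 * the way an attribute put fails when the message has no room left. */
static int dump_stats(bool attr_fails, bool unlock_on_failure)
{
    if (attr_fails) {
        if (unlock_on_failure)
            tree_unlock();    /* the workaround: release before bailing out */
        return -1;
    }
    return 0;
}

/* Models the caller: on failure it returns without the finish step. */
static int fill_stats(bool attr_fails, bool unlock_on_failure)
{
    stats_start_copy();
    if (dump_stats(attr_fails, unlock_on_failure) < 0)
        return -1;            /* failure path skips stats_finish_copy() */
    stats_finish_copy();
    return 0;
}

/* Runs one scenario from a clean state and returns the leftover lock depth. */
int leaked_locks(bool attr_fails, bool unlock_on_failure)
{
    lock_depth = 0;
    fill_stats(attr_fails, unlock_on_failure);
    return lock_depth;
}
```

In this model, `leaked_locks(true, false)` leaves the counter at 1, which corresponds to the lock that is never released, while `leaked_locks(true, true)` and the success case both return 0. The model also suggests why such a workaround is tied to kernels that still take the lock here: on a caller that does release the lock on failure, or takes no lock at all, the extra unlock would be unbalanced.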