From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Pete Heist <pete@heistp.net>
Cc: Cake List <cake@lists.bufferbloat.net>
Subject: Re: [Cake] lockup with multiple cake instances on 3.16.7
Date: Fri, 01 Feb 2019 00:09:25 +0100 [thread overview]
Message-ID: <87h8doifve.fsf@toke.dk> (raw)
In-Reply-To: <60A1337C-DE0E-43DE-B5CA-5815F615124D@heistp.net>
Pete Heist <pete@heistp.net> writes:
>> On Jan 31, 2019, at 3:53 PM, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Well, the backtrace is definitely hanging on that lock in
>> gnet_stats_start_copy_compat(). If it's not related to the CAKE logging,
>> I guess it must be a bug in the upstream kernel; which we probably can't
>> fix from the cake side anyway.
>>
>> I don't suppose you can reproduce this on a newer kernel?
>
> So far it works fine on 3.10.107 (with mipsel EdgeOS build) and
> 4.9.0-8 amd64, haven’t tried anything in between, but…
>
> printk tells me that it’s not locking up in cake_dump_class_stats, but
> after a failure in cake_dump_stats. When “tc qdisc” is run after
> adding the fifth cake instance, line 2974 is failing:
>
> PUT_TSTAT_U32(TARGET_US,
> ktime_to_us(ns_to_ktime(b->cparams.target)));
>
> So the call to nla_put_u32 returns nonzero. Then it ends up at
> nla_put_failure where nla_nest_cancel is called. The function returns,
> but the lock is not being released by the kernel in the failure case.
> The following patch “fixes it”:
Ah, good find!
> diff --git a/sch_cake.c b/sch_cake.c
> index 3a26db0..ae3e16c 100644
> --- a/sch_cake.c
> +++ b/sch_cake.c
> @@ -3010,6 +3010,7 @@ static int cake_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
>
> nla_put_failure:
> nla_nest_cancel(d->skb, stats);
> + sch_tree_unlock(sch);
> return -1;
> }
>
> Two questions:
>
> 1) Why is nla_put_u32 suddenly failing for TARGET_US after adding five
> cake instances?
Probably because it's running out of kernel memory? How much system
memory do you have on the system you are testing this on?
> 2) Is calling sch_tree_unlock the right thing to do in the failure
> case, or am I working around a kernel bug, and doing something that
> would fail in other kernels?
Yes, I think you are working around a kernel bug. See
https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_api.c#L1330
The lock is taken in gnet_stats_start_copy_compat() and released in
gnet_stats_finish_copy(). The latter is skipped in the failure path. It
seems this bug is present all the way up to Eric's change to remove the
locking entirely (which went into 4.8). So I guess you could get a patch
accepted for the stable trees in 3.16 and 4.4; not that this would help
you much if you are stuck on 3.16.7...
-Toke
next prev parent reply other threads:[~2019-01-31 23:09 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-30 8:58 Pete Heist
2019-01-30 22:59 ` Toke Høiland-Jørgensen
2019-01-31 7:15 ` Pete Heist
2019-01-31 14:53 ` Toke Høiland-Jørgensen
2019-01-31 20:25 ` Pete Heist
2019-01-31 23:09 ` Toke Høiland-Jørgensen [this message]
2019-01-31 23:18 ` Pete Heist
2019-01-31 23:10 ` Pete Heist
2019-01-31 23:18 ` Toke Høiland-Jørgensen
2019-01-31 23:23 ` Pete Heist
2019-01-31 23:31 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.bufferbloat.net/postorius/lists/cake.lists.bufferbloat.net/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87h8doifve.fsf@toke.dk \
--to=toke@redhat.com \
--cc=cake@lists.bufferbloat.net \
--cc=pete@heistp.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox