From: Toke Høiland-Jørgensen
To: Pete Heist, cake@lists.bufferbloat.net
Subject: Re: [Cake] lockup with cake and veth
Date: Fri, 06 Jul 2018 15:29:28 +0200
Message-ID: <87y3eoa3wn.fsf@toke.dk>
In-Reply-To: <761C7004-247B-42B4-B56C-2527816826C7@heistp.net>

Pete Heist writes:

> I don't know if we want to call this an issue, but...
>
> I'm seeing a lockup with cake (and also sfq, but not with either pfifo
> or fq_codel) when run over veth devices. Two network namespaces are
> created, one for client and one for server, each with one veth device.
> Netem is added as the root qdisc with a delay of 1ms, and a leaf qdisc
> may be added.
> Lockups occur on my box when the leaf qdisc is either cake or sfq, and
> I'm running flent's tcp_ndown test with >= 4 download streams. Note
> that I happen to be running on a quad-core.
>
> - If no leaf qdisc is added below netem, no lockup occurs.
> - If either pfifo or fq_codel is added below netem, no lockup occurs.
> - If either cake or sfq is the leaf, the lockup occurs.
>
> The symptoms (lockup with >= 4 streams on a quad-core box), and the
> fact that it occurs with both cake and sfq, make me think that it may
> simply have to do with the code not being re-entrant, which may be the
> case for veth, and this is just by design? Maybe it's something that we
> should consider fixing but that wouldn't be a show-stopper? But that
> should be confirmed.
>
> I'll keep investigating, but am sharing the scripts I'm running
> meanwhile in case anyone else wants to look. See README.txt in the
> attached...

Thanks for investigating! I'll take a look later. The fact that it
happens with sfq as well means it's probably not cake-specific, though,
so I don't think we should hold off on the upstream submission until
we've figured it out. Using leaf qdiscs with netem has been dodgy for a
while IIRC...

-Toke
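For anyone who wants to approximate the quoted setup without the attached scripts, here is a minimal sketch that only builds (and prints) the `ip`/`tc` commands for two namespaces joined by a veth pair, with netem (delay 1ms) as the root qdisc and an optional leaf qdisc grafted beneath it. It is not Pete's script: the namespace names, interface names, and addresses are hypothetical, and putting netem on both veth ends is an assumption, since the report does not say which side carries the qdiscs.

```python
from __future__ import annotations

import shlex

# Hypothetical names/addresses; not taken from the scripts attached to the report.
CLIENT_NS, SERVER_NS = "client", "server"
CLIENT_DEV, SERVER_DEV = "veth-c", "veth-s"

# Leaf qdiscs mentioned in the report ("none" = netem alone).
LEAF_QDISCS = {"none", "pfifo", "fq_codel", "sfq", "cake"}


def qdisc_cmds(ns: str, dev: str, leaf: str, delay: str = "1ms") -> list[list[str]]:
    """netem as the root qdisc on dev, optionally with a leaf qdisc below it."""
    if leaf not in LEAF_QDISCS:
        raise ValueError(f"unsupported leaf qdisc: {leaf}")
    nsexec = ["ip", "netns", "exec", ns]
    cmds = [nsexec + ["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:",
                      "netem", "delay", delay]]
    if leaf != "none":
        # netem exposes a single class (1:1); the leaf qdisc is grafted there.
        cmds.append(nsexec + ["tc", "qdisc", "add", "dev", dev, "parent", "1:1",
                              "handle", "10:", leaf])
    return cmds


def setup_cmds(leaf: str) -> list[list[str]]:
    """Two namespaces joined by a veth pair, netem (+ leaf) on both ends (assumed)."""
    cmds = [
        ["ip", "netns", "add", CLIENT_NS],
        ["ip", "netns", "add", SERVER_NS],
        ["ip", "link", "add", CLIENT_DEV, "type", "veth", "peer", "name", SERVER_DEV],
        ["ip", "link", "set", CLIENT_DEV, "netns", CLIENT_NS],
        ["ip", "link", "set", SERVER_DEV, "netns", SERVER_NS],
        ["ip", "-n", CLIENT_NS, "addr", "add", "10.9.9.1/24", "dev", CLIENT_DEV],
        ["ip", "-n", SERVER_NS, "addr", "add", "10.9.9.2/24", "dev", SERVER_DEV],
        ["ip", "-n", CLIENT_NS, "link", "set", CLIENT_DEV, "up"],
        ["ip", "-n", SERVER_NS, "link", "set", SERVER_DEV, "up"],
    ]
    cmds += qdisc_cmds(CLIENT_NS, CLIENT_DEV, leaf)
    cmds += qdisc_cmds(SERVER_NS, SERVER_DEV, leaf)
    return cmds


if __name__ == "__main__":
    # Only prints the commands; running them requires root.
    for cmd in setup_cmds("cake"):
        print(shlex.join(cmd))
```

Swapping the `leaf` argument between "none", "pfifo", "fq_codel", "sfq" and "cake" mirrors the comparison in the report. The load itself (flent's tcp_ndown with four or more streams, run from the client namespace) is left out, since its exact invocation is not given in the thread.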