From: Toke Høiland-Jørgensen
To: Pete Heist
Cc: Jonathan Morton, Cake List
Subject: Re: [Cake] cake at 60gbit
Date: Fri, 06 Jul 2018 14:04:56 +0200
Message-ID: <877em8bmdz.fsf@toke.dk>
In-Reply-To: <8184CEEA-64C0-4CCD-A831-D90CFDC56F22@heistp.net>
List-Id: Cake - FQ_codel the next generation

Pete Heist writes:

>> On Jul 6, 2018, at 1:33 PM, Toke Høiland-Jørgensen wrote:
>>
>> AHA! Found the culprit!
>>
>> The bulk dequeue mechanism in sch_generic.c will dequeue a bunch of
>> packets at once, then check if they belong on the same hardware txq. If
>> they don't, they will be put back on a separate queue in the qdisc
>> structure (sch->skb_bad_txq), and the qlen will be increased, without
>> telling the qdisc about it.
>
> Solid, nice work!

Thanks :)

>> This obviously only happens on hardware with multiple TXQs, which is why
>> the bug doesn't happen on veth.
>
>
> It would be nice if veth were mq capable.
>
> For whatever reason, I didn’t see this on my i210at’s (1gbit ethernet
> with 4 transmit and 4 receive queues).

Well, you have to hit the exact conditions; i.e., a bulk dequeue that
happens to get a bunch of packets that hit different TX queues. So that
depends on both the TXQ hashing, and the queue state, number of flows
etc. I only get a handful of "lockups" (debug lines) on a 10-sec netperf
test with 6 flows.

> I’m now playing with netem, cake and veth for the first time (two
> namespaces with netem as the parent qdisc to cake for each namespace).
> I’ve gotten the setup not to lock up in an infinite loop but to
> occasionally stop passing traffic sometimes after a netperf test. This
> could easily be a problem specific to netns though, so I’ll be playing
> with it some more and will post if I can narrow it down to something
> specific.

Yay, more fun! :P

Please do see if you can narrow this down; it would be good to fix this
as well before we submit another version upstream...

-Toke
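
For anyone following along, the accounting mismatch described above can be
illustrated with a toy model. This is only a sketch in Python, not kernel
code: ToyQdisc, bulk_dequeue, own_count and demo are hypothetical names made
up for the illustration, and only qlen and bad_txq are meant to echo
sch->q.qlen and sch->skb_bad_txq. It assumes the bulk dequeue stops at the
first packet that maps to a different txq and parks it, bumping qlen as it
does so.

```python
from collections import deque


class ToyQdisc:
    """Hypothetical stand-in for a qdisc; not the real kernel structures."""

    def __init__(self):
        self.packets = deque()   # packets the qdisc itself is holding
        self.own_count = 0       # the qdisc's private idea of how many it holds
        self.qlen = 0            # shared counter, loosely like sch->q.qlen
        self.bad_txq = deque()   # side queue, loosely like sch->skb_bad_txq

    def enqueue(self, pkt):
        self.packets.append(pkt)
        self.own_count += 1
        self.qlen += 1

    def dequeue(self):
        if not self.packets:
            return None
        self.own_count -= 1
        self.qlen -= 1
        return self.packets.popleft()


def bulk_dequeue(q, budget=8):
    """Pull up to `budget` packets that share the first packet's txq.

    Assumption for this sketch: a packet for another txq is parked on
    q.bad_txq and q.qlen is incremented, without the qdisc being told.
    """
    if q.bad_txq:
        first = q.bad_txq.popleft()
        q.qlen -= 1
    else:
        first = q.dequeue()
    if first is None:
        return []
    batch = [first]
    while len(batch) < budget:
        pkt = q.dequeue()
        if pkt is None:
            break
        if pkt["txq"] != first["txq"]:
            q.bad_txq.append(pkt)
            q.qlen += 1          # qlen grows behind the qdisc's back
            break
        batch.append(pkt)
    return batch


def demo():
    q = ToyQdisc()
    q.enqueue({"id": 1, "txq": 0})
    q.enqueue({"id": 2, "txq": 1})   # hashes to a different hardware queue
    batch = bulk_dequeue(q)
    return len(batch), q.qlen, q.own_count, len(q.bad_txq)
```

After demo() runs, the toy qdisc holds no packets of its own, yet qlen still
reads 1 because of the parked packet. A qdisc that treats a non-zero qlen as
"I still have something to dequeue" would then go looking for a packet it
does not have, which is presumably the kind of condition behind the
"lockups" mentioned above. With a single txq the mismatch branch is never
taken, which fits the bug not showing up on veth.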