From: Dave Taht
Date: Thu, 5 Jul 2018 18:21:54 -0700
To: George Amanakis
Cc: Toke Høiland-Jørgensen, Cake List
Subject: Re: [Cake] cake at 60gbit

0-length packet? Maybe coming out of the new GSO/GRO code? Check truesize also?

On Thu, Jul 5, 2018 at 4:48 PM Georgios Amanakis wrote:
>
> I am going to give it a try tonight with your patch applied, and report back.
> Thank you!
>
> George
>
> On Thu, Jul 5, 2018, 6:31 PM Toke Høiland-Jørgensen wrote:
>>
>> Toke Høiland-Jørgensen writes:
>>
>> > Jonathan Morton writes:
>> >
>> >>> On 3 Jul, 2018, at 1:23 am, Toke Høiland-Jørgensen wrote:
>> >>>
>> >>> My hunch is that this has something to do with the way mlx5 uses
>> >>> multiple receive queues (and thus multiple CPUs), which is probably
>> >>> different from veth...
>> >>
>> >> At this stage I'm pretty confident it has nothing to do with Cake, and
>> >> everything to do with the Mellanox hardware and driver. It does strike
>> >> me that Linux's default handling of multiqueue hardware doesn't map
>> >> very well to the qdisc interface.
>> >
>> > Well, it doesn't happen with fq_codel, so even if it is a driver bug, it
>> > is being triggered by cake specifically...
>>
>> Right, so I finally got some time to investigate this further.
>>
>> I suspected that cake_dequeue() was looping forever, so I added some
>> debug statements to investigate, and it turns out I was right. Using
>> the debug patch below, I get loop aborts on loop 'i' in unlimited mode,
>> and on loop 'l' if I enable the shaper at 70 Gbit. It happens pretty
>> reliably, but only when I load up the link sufficiently (it needs 4-6
>> TCP flows getting ~50 Gbps of total throughput).
>>
>> The weird thing is what appears to be happening: cake somehow gets
>> into a state where sch->q.qlen is > 0 while all tin backlogs are 0. I
>> have no clue how this happens; as far as I can tell, every change to
>> tin_backlog is paired with a change to q.qlen. The only thing outside
>> of cake itself that modifies q.qlen is peek(), which is not being used
>> here.
>>
>> I'm giving up for tonight; if anyone else has any ideas, I'm all ears.
>>
>> -Toke
>>
>> Sample debug output:
>>
>> [ 5456.068281] Loop counter i hit 100k; aborting! i 100001 j 0 k 180 l 3 m 0 qlen 2 qbkllog 33184 tin 2 deficit 172 tot backlog 0
>>
>> With this debug patch:
>>
>> @@ -1892,6 +1892,20 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
>>  	u64 delay;
>>  	u32 len;
>>  
>> +	int i=0,j=0,k=0,l=0,m=0;
>> +
>> +#define COUNT_LOOP(v) do { \
>> +		if (++v > 100000) { \
>> +			int tot_bkl = 0; \
>> +			struct cake_tin_data *t; \
>> +			int n; \
>> +			for(n=0,t = q->tins; n < CAKE_MAX_TINS; n++,t++) \
>> +				tot_bkl += t->tin_backlog; \
>> +			net_warn_ratelimited("Loop counter " #v " hit 100k; aborting! i %d j %d k %d l %d m %d qlen %d qbkllog %d tin %d deficit %d tot backlog %d", i, j, k, l, m, sch->q.qlen, sch->qstats.backlog, q->cur_tin, b->tin_deficit, tot_bkl); \
>> +			return NULL; \
>> +		} \
>> +	} while(0);
>> +
>>  begin:
>>  	if (!sch->q.qlen)
>>  		return NULL;
>> @@ -1912,6 +1926,7 @@ begin:
>>  		/* In unlimited mode, can't rely on shaper timings, just balance
>>  		 * with DRR
>>  		 */
>> +		i=0;
>>  		while (b->tin_deficit < 0 ||
>>  		       !(b->sparse_flow_count + b->bulk_flow_count)) {
>>  			if (b->tin_deficit <= 0)
>> @@ -1923,6 +1938,7 @@ begin:
>>  				q->cur_tin = 0;
>>  				b = q->tins;
>>  			}
>> +			COUNT_LOOP(i);
>>  		}
>>  	} else {
>>  		/* In shaped mode, choose:
>> @@ -1960,8 +1976,10 @@ retry:
>>  			head = &b->old_flows;
>>  			if (unlikely(list_empty(head))) {
>>  				head = &b->decaying_flows;
>> -				if (unlikely(list_empty(head)))
>> +				if (unlikely(list_empty(head))) {
>> +					COUNT_LOOP(j);
>>  					goto begin;
>> +				}
>>  			}
>>  		}
>>  	}
>> @@ -2008,6 +2026,7 @@ retry:
>>  				flow->set = CAKE_SET_SPARSE_WAIT;
>>  			}
>>  		}
>> +		COUNT_LOOP(k);
>>  		goto retry;
>>  	}
>>  
>> @@ -2050,6 +2069,7 @@ retry:
>>  				srchost->srchost_refcnt--;
>>  				dsthost->dsthost_refcnt--;
>>  			}
>> +			COUNT_LOOP(l);
>>  			goto begin;
>>  		}
>>  
>> @@ -2075,6 +2095,8 @@ retry:
>>  		kfree_skb(skb);
>>  		if (q->rate_flags & CAKE_FLAG_INGRESS)
>>  			goto retry;
>> +
>> +		COUNT_LOOP(m);
>>  	}
>>  
>>  	b->tin_ecn_mark += !!flow->cvars.ecn_marked;
>>
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake

-- 
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
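To make the zero-length-packet question concrete, here is a hypothetical userspace sketch, not kernel code: struct model_qdisc, model_enqueue() and the other names are invented for illustration. It assumes that tin_backlog counts bytes while q.qlen counts packets (an assumption about sch_cake's accounting, not something stated in this thread). Under that assumption, even perfectly paired updates can leave qlen > 0 with every tin backlog at 0 if the packets are accounted as 0 bytes long, and a search that waits for a non-empty tin then spins until a COUNT_LOOP-style bound stops it. The search keys on byte backlog only, a simplification of cake's real loop condition (deficits and flow counts).

```c
#include <assert.h>

/* Hypothetical userspace model, NOT the sch_cake code: names only
 * loosely mirror the fields printed by the debug patch. */
#define MODEL_MAX_TINS 8
#define MODEL_LOOP_LIMIT 100000L

struct model_tin {
	unsigned int backlog_bytes;	/* ASSUMPTION: like tin_backlog, in bytes */
};

struct model_qdisc {
	unsigned int qlen;		/* like sch->q.qlen: packets queued */
	struct model_tin tins[MODEL_MAX_TINS];
};

/* Paired update: one packet and its length are accounted together. */
static void model_enqueue(struct model_qdisc *q, int tin, unsigned int len)
{
	assert(tin >= 0 && tin < MODEL_MAX_TINS);
	q->qlen++;
	q->tins[tin].backlog_bytes += len;
}

/* Same summation as the tot_bkl loop in the debug macro. */
static unsigned int model_total_backlog(const struct model_qdisc *q)
{
	unsigned int sum = 0;

	for (int n = 0; n < MODEL_MAX_TINS; n++)
		sum += q->tins[n].backlog_bytes;
	return sum;
}

/* The reported state: packets counted, but no bytes in any tin. */
static int model_counters_disagree(const struct model_qdisc *q)
{
	return q->qlen > 0 && model_total_backlog(q) == 0;
}

/* Bounded search for a tin with bytes queued, in the spirit of
 * COUNT_LOOP: give up with -1 after MODEL_LOOP_LIMIT steps instead of
 * spinning forever. *steps reports how many iterations were used. */
static int model_find_busy_tin(const struct model_qdisc *q, int start,
			       long *steps)
{
	int cur = start;

	assert(start >= 0 && start < MODEL_MAX_TINS);
	*steps = 0;
	while (q->tins[cur].backlog_bytes == 0) {
		if (++*steps > MODEL_LOOP_LIMIT)
			return -1;
		cur = (cur + 1) % MODEL_MAX_TINS;
	}
	return cur;
}
```

For example, enqueueing two 0-byte packets into tin 2 gives qlen 2 with a total backlog of 0, and model_find_busy_tin() then gives up with -1 after 100001 steps, similar to the "i 100001" in the log above. This only shows that the counters can disagree this way; it does not establish that 0-length packets are the actual cause.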