[Cake] cake at 60gbit

Georgios Amanakis gamanakis at gmail.com
Thu Jul 5 19:48:07 EDT 2018


I am going to give it a try tonight with your patch applied, and will report back.
Thank you!

George

On Thu, Jul 5, 2018, 6:31 PM Toke Høiland-Jørgensen <toke at toke.dk> wrote:

> Toke Høiland-Jørgensen <toke at toke.dk> writes:
>
> > Jonathan Morton <chromatix99 at gmail.com> writes:
> >
> >>> On 3 Jul, 2018, at 1:23 am, Toke Høiland-Jørgensen <toke at toke.dk> wrote:
> >>>
> >>> My hunch is that this has something to do with the way mlx5 uses
> >>> multiple receive queues (and thus multiple CPUs). Which is probably
> >>> different from veth...
> >>
> >> At this stage I'm pretty confident it has nothing to do with Cake, and
> >> everything to do with the Mellanox hardware and driver. It does strike
> >> me that Linux' default handling of multiqueue hardware doesn't map
> >> very well to the qdisc interface.
> >
> > Well, it doesn't happen with fq_codel, so even if it is a driver bug, it
> > is being triggered by cake specifically...
>
> Right, so I finally got some time to investigate this further.
>
> I suspected that cake_dequeue() was looping forever, so I added some
> debug statements to investigate, and it turns out I was right. Using
> the debug patch below, I get loop aborts on loop 'i' in unlimited mode
> and on loop 'l' if I enable the shaper at 70 Gbit. It happens pretty
> reliably, but only when I load up the link sufficiently (it needs 4-6
> TCP flows which get ~50 Gbps of total throughput).
>
> The weird thing is that cake somehow appears to get into a state where
> sch->q.qlen is >0 while all tin backlogs are 0. I have no clue how this
> happens; as far as I can tell, all changes to tin_backlog are paired
> with a change to q.qlen. The only thing outside of cake itself that
> modifies q.qlen is peek(), which is not being used here.
>
> I'm giving up for tonight; if anyone else has any ideas, I'm all ears.
>
> -Toke
>
> Sample debug output:
>
> [ 5456.068281] Loop counter i hit 100k; aborting! i 100001 j 0 k 180 l 3 m 0 qlen 2 qbkllog 33184 tin 2 deficit 172 tot backlog 0
>
> With this debug patch:
>
> @@ -1892,6 +1892,20 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
>         u64 delay;
>         u32 len;
>
> +       int i=0,j=0,k=0,l=0,m=0;
> +
> +#define COUNT_LOOP(v) do {                     \
> +               if (++v > 100000) {             \
> +                       int tot_bkl = 0;                                \
> +                       struct cake_tin_data *t;                        \
> +                       int n;                                          \
> +                       for(n=0,t = q->tins; n < CAKE_MAX_TINS; n++,t++) \
> +                               tot_bkl += t->tin_backlog;              \
> +                       net_warn_ratelimited("Loop counter " #v " hit 100k; aborting! i %d j %d k %d l %d m %d qlen %d qbkllog %d tin %d deficit %d tot backlog %d", i, j, k, l, m, sch->q.qlen, sch->qstats.backlog, q->cur_tin, b->tin_deficit, tot_bkl); \
> +                       return NULL;                                    \
> +               }                                                       \
> +       } while(0)
> +
>  begin:
>         if (!sch->q.qlen)
>                 return NULL;
> @@ -1912,6 +1926,7 @@ begin:
>                 /* In unlimited mode, can't rely on shaper timings, just balance
>                  * with DRR
>                  */
> +               i=0;
>                 while (b->tin_deficit < 0 ||
>                        !(b->sparse_flow_count + b->bulk_flow_count)) {
>                         if (b->tin_deficit <= 0)
> @@ -1923,6 +1938,7 @@ begin:
>                                 q->cur_tin = 0;
>                                 b = q->tins;
>                         }
> +                       COUNT_LOOP(i);
>                 }
>         } else {
>                 /* In shaped mode, choose:
> @@ -1960,8 +1976,10 @@ retry:
>                         head = &b->old_flows;
>                         if (unlikely(list_empty(head))) {
>                                 head = &b->decaying_flows;
> -                               if (unlikely(list_empty(head)))
> +                               if (unlikely(list_empty(head))) {
> +                                       COUNT_LOOP(j);
>                                         goto begin;
> +                               }
>                         }
>                 }
>         }
> @@ -2008,6 +2026,7 @@ retry:
>                                 flow->set = CAKE_SET_SPARSE_WAIT;
>                         }
>                 }
> +               COUNT_LOOP(k);
>                 goto retry;
>         }
>
> @@ -2050,6 +2069,7 @@ retry:
>                                 srchost->srchost_refcnt--;
>                                 dsthost->dsthost_refcnt--;
>                         }
> +                       COUNT_LOOP(l);
>                         goto begin;
>                 }
>
> @@ -2075,6 +2095,8 @@ retry:
>                 kfree_skb(skb);
>                 if (q->rate_flags & CAKE_FLAG_INGRESS)
>                         goto retry;
> +
> +               COUNT_LOOP(m);
>         }
>
>         b->tin_ecn_mark += !!flow->cvars.ecn_marked;
>
>
>
>
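
The inconsistency described above (q.qlen > 0 while every tin backlog is 0) can be sketched in user space. This is only an illustration under stated assumptions, not code from sch_cake: the demo_* names, DEMO_MAX_TINS and the struct layouts are hypothetical stand-ins for the real qdisc state. It shows a consistency check between the packet count and the summed tin backlogs, plus a tin scan that gives up after a fixed number of steps instead of spinning.

```c
#include <assert.h>   /* for callers that assert on the checks below */
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_TINS 8   /* hypothetical; stands in for CAKE_MAX_TINS */

/* Hypothetical, stripped-down stand-ins for the qdisc state. */
struct demo_tin {
	uint32_t tin_backlog;               /* bytes queued in this tin */
};

struct demo_sched {
	uint32_t qlen;                      /* packets queued overall */
	struct demo_tin tins[DEMO_MAX_TINS];
};

/* Sum the per-tin backlogs, as the debug macro does for its report. */
static uint64_t demo_total_backlog(const struct demo_sched *s)
{
	uint64_t total = 0;

	for (int n = 0; n < DEMO_MAX_TINS; n++)
		total += s->tins[n].tin_backlog;
	return total;
}

/* True when the packet count and the byte backlog agree on emptiness. */
static bool demo_state_consistent(const struct demo_sched *s)
{
	return (s->qlen == 0) == (demo_total_backlog(s) == 0);
}

/*
 * Walk the tins round-robin from 'start' looking for one with backlog.
 * Returns its index, or -1 once 'limit' steps have been taken, so an
 * inconsistent state ends the scan instead of looping forever.
 */
static int demo_find_busy_tin(const struct demo_sched *s, int start, int limit)
{
	int tin = start;

	for (int steps = 0; steps < limit; steps++) {
		if (s->tins[tin].tin_backlog > 0)
			return tin;
		tin = (tin + 1) % DEMO_MAX_TINS;
	}
	return -1;
}
```

With qlen set to 2 and all tin backlogs at 0, as in the sample debug output, demo_state_consistent() returns false and demo_find_busy_tin() returns -1 once the step limit is reached. The real dequeue loop has no such limit, which is why the debug patch adds one.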