[Cake] cake at 60gbit
Dave Taht
dave.taht at gmail.com
Thu Jul 5 21:21:54 EDT 2018
Could this be a 0-length packet, maybe coming out of the new GSO/GRO code?
It might be worth checking truesize as well.
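
To make the 0-length idea concrete, here is a toy userspace model of the
accounting. It is only a sketch, not sch_cake code: every toy_* name and
TOY_MAX_TINS are made up for illustration, and it assumes that enqueue bumps
the packet count by one and the tin backlog by the packet length. Under that
assumption a 0-length packet is counted in qlen but adds nothing to any tin
backlog, which is the "qlen > 0 while all tin backlogs are 0" state described
in the quoted message.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_TINS 8  /* hypothetical stand-in for CAKE_MAX_TINS */

struct toy_tin {
	uint32_t tin_backlog;   /* bytes queued in this tin */
};

struct toy_sched {
	uint32_t qlen;          /* packets queued overall, like sch->q.qlen */
	struct toy_tin tins[TOY_MAX_TINS];
};

/* Account for one packet of 'len' bytes in tin 'tin'. */
static void toy_enqueue(struct toy_sched *s, unsigned int tin, uint32_t len)
{
	assert(tin < TOY_MAX_TINS);
	s->tins[tin].tin_backlog += len;  /* a 0-length packet adds nothing */
	s->qlen++;                        /* but it is still counted here */
}

/* Sum the per-tin byte backlogs, as the debug macro in the patch does. */
static uint32_t toy_total_backlog(const struct toy_sched *s)
{
	uint32_t tot = 0;
	int n;

	for (n = 0; n < TOY_MAX_TINS; n++)
		tot += s->tins[n].tin_backlog;
	return tot;
}

/* Returns 1 when packets are counted but no bytes are. */
static int toy_is_inconsistent(const struct toy_sched *s)
{
	return s->qlen > 0 && toy_total_backlog(s) == 0;
}

/* Enqueue a single 0-length packet and report whether the model is stuck. */
int toy_zero_len_demo(void)
{
	struct toy_sched s = {0};

	toy_enqueue(&s, 2, 0);
	return toy_is_inconsistent(&s);
}
```

In this model toy_zero_len_demo() returns 1. Whether the real enqueue path can
ever see a 0-length skb is the open question, and this would not by itself
explain the non-zero qbkllog in the sample output.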
On Thu, Jul 5, 2018 at 4:48 PM Georgios Amanakis <gamanakis at gmail.com> wrote:
>
> I am going to give it a try, with your patch applied tonight and report.
> Thank you!
>
> George
>
> On Thu, Jul 5, 2018, 6:31 PM Toke Høiland-Jørgensen <toke at toke.dk> wrote:
>>
>> Toke Høiland-Jørgensen <toke at toke.dk> writes:
>>
>> > Jonathan Morton <chromatix99 at gmail.com> writes:
>> >
>> >>> On 3 Jul, 2018, at 1:23 am, Toke Høiland-Jørgensen <toke at toke.dk> wrote:
>> >>>
>> >>> My hunch is that this has something to do with the way mlx5 uses
>> >>> multiple receive queues (and thus multiple CPUs). Which is probably
>> >>> different from veth...
>> >>
>> >> At this stage I'm pretty confident it has nothing to do with Cake, and
>> >> everything to do with the Mellanox hardware and driver. It does strike
>> >> me that Linux' default handling of multiqueue hardware doesn't map
>> >> very well to the qdisc interface.
>> >
>> > Well, it doesn't happen with fq_codel, so even if it is a driver bug, it
>> > is being triggered by cake specifically...
>>
>> Right, so finally got some time to investigate this further.
>>
>> I suspected that cake_dequeue() was looping forever, so I added some
>> debug statements to investigate this, and it turns out I was right.
>> Using the debug patch below, I get loop aborts on loop 'i' in unlimited
>> mode and on loop 'l' if I enable the shaper at 70 gbit. It
>> happens pretty reliably, but only when I load up the link sufficiently
>> (need 4-6 TCP flows which get ~50 Gbps of total throughput).
>>
>> The weird thing is that cake appears to somehow get into a state where
>> sch->q.qlen is >0 while all tin backlogs are 0. I have no clue how this
>> happens; as far as I can tell, all changes to tin_backlog are paired
>> with a change to q.qlen. The only thing outside of cake itself that
>> modifies q.qlen is peek(), which is not being used here.
>>
>> I'm giving up for tonight; if anyone else has any ideas, I'm all ears.
>>
>> -Toke
>>
>> Sample debug output:
>>
>> [ 5456.068281] Loop counter i hit 100k; aborting! i 100001 j 0 k 180 l 3 m 0 qlen 2 qbkllog 33184 tin 2 deficit 172 tot backlog 0
>>
>> With this debug patch:
>>
>> @@ -1892,6 +1892,20 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
>> u64 delay;
>> u32 len;
>>
>> + int i=0,j=0,k=0,l=0,m=0;
>> +
>> +#define COUNT_LOOP(v) do { \
>> + if (++v > 100000) { \
>> + int tot_bkl = 0; \
>> + struct cake_tin_data *t; \
>> + int n; \
>> + for(n=0,t = q->tins; n < CAKE_MAX_TINS; n++,t++) \
>> + tot_bkl += t->tin_backlog; \
>> + net_warn_ratelimited("Loop counter " #v " hit 100k; aborting! i %d j %d k %d l %d m %d qlen %d qbkllog %d tin %d deficit %d tot backlog %d", i, j, k, l, m, sch->q.qlen, sch->qstats.backlog, q->cur_tin, b->tin_deficit, tot_bkl); \
>> + return NULL; \
>> + } \
>> + } while (0)
>> +
>> begin:
>> if (!sch->q.qlen)
>> return NULL;
>> @@ -1912,6 +1926,7 @@ begin:
>> /* In unlimited mode, can't rely on shaper timings, just balance
>> * with DRR
>> */
>> + i=0;
>> while (b->tin_deficit < 0 ||
>> !(b->sparse_flow_count + b->bulk_flow_count)) {
>> if (b->tin_deficit <= 0)
>> @@ -1923,6 +1938,7 @@ begin:
>> q->cur_tin = 0;
>> b = q->tins;
>> }
>> + COUNT_LOOP(i);
>> }
>> } else {
>> /* In shaped mode, choose:
>> @@ -1960,8 +1976,10 @@ retry:
>> head = &b->old_flows;
>> if (unlikely(list_empty(head))) {
>> head = &b->decaying_flows;
>> - if (unlikely(list_empty(head)))
>> + if (unlikely(list_empty(head))) {
>> + COUNT_LOOP(j);
>> goto begin;
>> + }
>> }
>> }
>> }
>> @@ -2008,6 +2026,7 @@ retry:
>> flow->set = CAKE_SET_SPARSE_WAIT;
>> }
>> }
>> + COUNT_LOOP(k);
>> goto retry;
>> }
>>
>> @@ -2050,6 +2069,7 @@ retry:
>> srchost->srchost_refcnt--;
>> dsthost->dsthost_refcnt--;
>> }
>> + COUNT_LOOP(l);
>> goto begin;
>> }
>>
>> @@ -2075,6 +2095,8 @@ retry:
>> kfree_skb(skb);
>> if (q->rate_flags & CAKE_FLAG_INGRESS)
>> goto retry;
>> +
>> + COUNT_LOOP(m);
>> }
>>
>> b->tin_ecn_mark += !!flow->cvars.ecn_marked;
>>
>>
>>
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619