From: Adrian Popescu
To: Dave Taht
Cc: Cake List
Date: Mon, 18 Feb 2019 22:42:56 +0200
Subject: Re: [Cake] Dropping dropped
List-Id: Cake - FQ_codel the next generation
Hello,

This answers some of my own questions.

It seems the mirred and ifb combination is indeed what reduces performance
in my case. None of the optimizations made to fq_codel helped with ingress.

A simple fq_police would be a better solution for ingress than cake or
fq_codel.
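For context, the mirred + ifb ingress path being blamed here is typically set up like this (a sketch only; the interface names and the shaped rate are illustrative):

```shell
# Redirect ingress traffic from eth0 into an ifb device so a shaping
# qdisc can be attached to it; this extra redirect hop is the CPU cost
# under discussion.
ip link add ifb0 type ifb
ip link set ifb0 up
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: matchall \
    action mirred egress redirect dev ifb0
tc qdisc add dev ifb0 root cake bandwidth 100Mbit
```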
On Sat, Feb 16, 2019 at 11:35 AM Adrian Popescu <adriannnpopescu@gmail.com> wrote:
> Hello,
>
> On Fri, Feb 15, 2019 at 10:45 PM Dave Taht <dave.taht@gmail.com> wrote:
>> I still regard inbound shaping as our biggest deployment problem,
>> especially on cheap hardware.
>>
>> Some days I want to go back to revisiting the ideas in the "bobbie"
>> shaper, other days...
>>
>> In terms of speeding up cake:
>>
>> * At higher speeds (e.g. > 200mbit) cake tends to bottleneck on a
>> single cpu, in softirq. A lwn article just went by about a proposed
>> set of improvements for that:
>> https://lwn.net/SubscriberLink/779738/771e8f7050c26ade/
>
> Will this help devices with a single core CPU?

>> * Hardware multiqueue is more and more common (APU2 has 4). FQ_codel
>> is inherently parallel and could take advantage of hardware
>> multiqueue, if there was a better way to express it. What happens
>> nowadays is you get the "mq" scheduler with 4 fq_codel instances when
>> running at line rate, but I tend to think with 64 hardware queues,
>> increasingly common in the >10GigE space, having 64k fq_codel queues
>> is excessive. I'd love it if there was a way to have a divisor in the
>> mq -> subqdisc code so that we would have, oh, 32 queues per hw queue
>> in this case.
>>
>> Worse, there's no way to attach a global shaped instance to that
>> hardware, e.g. in cake, which forces all those hardware queues (even
>> across cpus) into one. The ingress mirred code, here, is also a
>> problem. A "cake-mq" seemed feasible (basically you just turn the
>> shaper tracking into an atomic operation in three places), but the
>> overlying qdisc architecture for sch_mq -> subqdiscs has to be
>> extended or bypassed, somehow. (There's no way for sch_mq to
>> automagically pass sub-qdisc options to the next qdisc, and there's no
>> reason to have sch_mq

The problem I dea= l with is performance on even lower end hardware with a single queue. My ex= perience with mq has been limited.
=C2=A0
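For reference, the mq + fq_codel arrangement described above can be reproduced by hand roughly like this (a sketch; the queue count depends on the NIC, 4 is illustrative):

```shell
# Root mq exposes one class per hardware tx queue; attach a separate
# fq_codel instance to each of them.
tc qdisc replace dev eth0 root handle 1: mq
for i in 1 2 3 4; do
    tc qdisc replace dev eth0 parent 1:$i fq_codel
done
# Inspect the resulting per-queue qdisc tree and statistics.
tc -s qdisc show dev eth0
```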

>> * I really liked the ingress "skb list" rework, but I'm not sure how
>> to get that from A to B.
>
> What was this skb list rework? Is there a patch somewhere?

>> * and I have a long standing dream of being able to kill off mirred
>> entirely and just be able to write
>>
>> tc qdisc add dev eth0 ingress cake bandwidth X
>
> Ingress on its own seems to be a performance hit. Do you think this
> would reduce the performance hit?

>> * native codel is 32 bit, cake is 64 bit. I
>
> Was there something else you forgot to write here?

>> * hashing three times as cake does is expensive. Getting a partial
>> hash and combining it into a final would be faster.
>
> Could you elaborate on how this would look, please? I read the code a
> while ago. It might be that I didn't figure out all the places where
> hashing is done.

>> * 8 way set associative is slower than 4 way and almost
>> indistinguishable from 8. Even direct mapping
>
> This should be easy to address by changing the 8 ways to 4. Was there
> something else you wanted to write here?
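The set-associative lookup under discussion, sketched at 4 ways (the table sizes and the "tag 0 means empty" convention are illustrative, not cake's actual layout):

```c
#include <stdint.h>

#define SETS 256   /* buckets; a power of two so masking works */
#define WAYS 4     /* tags probed per lookup */

/* Tag 0 is reserved to mean "empty slot" in this sketch. */
static uint32_t tags[SETS][WAYS];

/* Return the way holding `tag`, claiming an empty way on a miss;
 * -1 when all ways in the set are taken (hash collision). */
static int flow_lookup(uint32_t tag)
{
    uint32_t set = tag & (SETS - 1);
    for (int way = 0; way < WAYS; way++) {
        if (tags[set][way] == tag)
            return way;              /* hit */
        if (tags[set][way] == 0) {
            tags[set][way] = tag;    /* miss: take the empty slot */
            return way;
        }
    }
    return -1;                       /* set exhausted */
}
```

Cutting WAYS from 8 to 4 halves the worst-case tag probes per packet, which is the change being suggested above.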

>> * The cake blue code is rarely triggered and inline
>>
>> I really did want cake to be faster than htb+fq_codel. I started a
>> project to basically resurrect "early cake" - which WAS 40% faster
>> than htb+fq_codel - and add in the idea *only* of an atomic builtin
>> hw-mq shaper a while back, but haven't got back to it.
>>
>> https://github.com/dtaht/fq_codel_fast
>>
>> With everything I ripped out, it was about 5% less cpu to start with.
>
> Perhaps further improvements made to the codel_vars struct will also
> help fq_codel_fast. Do you think this could be improved further?
>
> A cake_fast might be worth a shot.

>> I can't tell you how many times I've looked over
>>
>> https://elixir.bootlin.com/linux/latest/source/net/sched/sch_mqprio.c
>>
>> hoping that enlightenment would strike and there was a clean way to
>> get rid of that layer of abstraction.
>>
>> But coming up with how to run more stuff in parallel was beyond my
>> rcu-foo.