From: Aaron Wood
To: Dave Taht
Cc: cake@lists.bufferbloat.net, cerowrt-devel
Date: Wed, 3 Jun 2015 15:43:53 -0700
Subject: Re: [Cake] [Cerowrt-devel] ingress rate limiting falling short

On Wed, Jun 3, 2015 at 3:27 PM, Dave Taht wrote:

>> kbps = quantum = time
>> 20000 = 3000 = 1.2ms
>> 30000 = 6000 = 1.6ms
>> 40000 = 12000 = 2.4ms
>> 50000 = 24000 = 3.84ms
>> 60000 = 48000 = 6.4ms
>> 80000 = 96000 = 9.6ms
>>
>> So it appears that the goal of these values was to keep increasing the
>> quantum as rates went up, to provide more bytes per operation, but that's
>> going to risk adding latency as the time-per-quantum crosses the delay
>> target in fq_codel (if I'm understanding this correctly).
>>
>> So one thing that I can do is play around with this, and see if I can
>> keep that quantum time at a constant level (i.e., 10ms, which seems
>> _awfully_ long), or continue increasing it (which seems like a bad idea).
>> I'd love to hear from whoever put this in as to what its goal was (or was
>> it just empirically tuned?)
>
> Empirical, and tested only to about 60Mbits. I got back about 15% cpu to
> do it this way at the time I did it on the wndr3800.

Basically, increasing the quantums to get more cpu available... So a
too-small quantum is going to be excessive cpu, and a too-large quantum is
going to be poor fairness?

> and WOW, thx for the analysis! I did not think much about this crossover
> point at the time - because we'd maxed on cpu long beforehand.

No problem, this is the sort of thing I _can_ help with, since I don't know
the kernel internals very well.

I can certainly see this batching interacting with the codel target. Which
may also explain your comments about poor fairness in my 3800 results up at
60-80Mbps, when htb's quantum has crossed over fq_codel's target?
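To make that crossover concrete, here's the back-of-the-envelope math I've
been using (just a sketch, assuming fq_codel's default 5ms target; it isn't
pulled from the sqm scripts):

```python
# Time to serve one HTB quantum at a given shaped rate, compared against
# fq_codel's default 5ms target. Quantum values are the ones from the table.
rate_to_quantum = {          # kbps -> quantum in bytes
    20000: 3000,  30000: 6000,  40000: 12000,
    50000: 24000, 60000: 48000, 80000: 96000,
}
CODEL_TARGET_MS = 5.0        # fq_codel default target

for kbps, quantum in sorted(rate_to_quantum.items()):
    bytes_per_ms = kbps / 8.0          # kbit/s is bits/ms, so /8 gives bytes/ms
    quantum_ms = quantum / bytes_per_ms
    note = "  <-- exceeds codel target" if quantum_ms > CODEL_TARGET_MS else ""
    print("%5d kbps: %6d bytes = %.2f ms%s" % (kbps, quantum, quantum_ms, note))
```

By that math the quantum time crosses the 5ms target somewhere between 50
and 60Mbit, which lines up with the 60-80Mbps runs where fairness started
looking poor.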
> On the other hand, you gotta not be running out of cpu in the first place.
> I am liking where cake is going.

Yeah. That's what I _also_ need to figure out. Load seems "reasonable", but
load and cpu stats get reported oddly on multi-core (some things are
per-core, some are per-total-available, etc). I know I've seen the
"soft_irq" thread at 70% in top during some tests (in the past). I wouldn't
be surprised if this is a single-core-only bit of code. (Or can htb
processing and fq_codel processing be shoved onto separate cores?)

> One of my daydreams is that once we have writable custom ethernet
> hardware, we can easily do hardware outbound rate limiting/shaping merely
> by programming a register to return a completion interrupt at the set
> rate rather than the actual rate.

Well, inbound is certainly more of an issue than outbound right now...

So, for my next rounds of tests, I can play around with different quantum
values/schemes, try simple.qos vs. simplest.qos, and instrument the whole
thing to capture processor utilization vs. bandwidth.
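For the utilization side, I'm thinking of something along these lines (a
hypothetical helper that just samples /proc/stat; it's not part of the
existing sqm scripts), run alongside each throughput test so I can line up
per-core busy and softirq load against the measured bandwidth:

```python
#!/usr/bin/env python
# Hypothetical sketch: sample /proc/stat once a second and report per-cpu
# busy% and softirq%, to see whether the shaping work is pinned to one core.

import time

def read_cpu_times():
    """Return {cpu_name: (busy, softirq, total)} jiffy counters from /proc/stat."""
    stats = {}
    with open("/proc/stat") as f:
        for line in f:
            if not line.startswith("cpu"):
                continue
            parts = line.split()
            name, fields = parts[0], [int(x) for x in parts[1:]]
            idle = fields[3] + fields[4]        # idle + iowait
            stats[name] = (sum(fields) - idle, fields[6], sum(fields))  # fields[6] is softirq
    return stats

prev = read_cpu_times()
while True:
    time.sleep(1)
    cur = read_cpu_times()
    for name in sorted(cur):
        busy = cur[name][0] - prev[name][0]
        sirq = cur[name][1] - prev[name][1]
        total = cur[name][2] - prev[name][2]
        if total:
            print("%-6s busy %5.1f%%  softirq %5.1f%%"
                  % (name, 100.0 * busy / total, 100.0 * sirq / total))
    prev = cur
```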
-Aaron