From: Aaron Wood
To: Dave Taht
Cc: cake@lists.bufferbloat.net, cerowrt-devel
Date: Wed, 3 Jun 2015 15:43:53 -0700
Subject: Re: [Cake] [Cerowrt-devel] ingress rate limiting falling short

On Wed, Jun 3, 2015 at 3:27 PM, Dave Taht wrote:

>> kbps = quantum = time
>> 20000 = 3000 = 1.2ms
>> 30000 = 6000 = 1.6ms
>> 40000 = 12000 = 2.4ms
>> 50000 = 24000 = 3.84ms
>> 60000 = 48000 = 6.4ms
>> 80000 = 96000 = 9.6ms
>>
>> So it appears that the goal of these values was to keep increasing the
>> quantum as rates went up, to provide more bytes per operation, but that's
>> going to risk adding latency as the time-per-quantum crosses the delay
>> target in fq_codel (if I'm understanding this correctly).
>>
>> So one thing that I can do is play around with this, and see if I can
>> keep that quantum time at a constant level (i.e., 10ms, which seems
>> _awfully_ long), or continue increasing it (which seems like a bad idea).
>> I'd love to hear from whoever put this in as to what its goal was (or was
>> it just empirically tuned?)
>
> Empirical, and tested only to about 60Mbits. I got back about 15% cpu to
> do it this way at the time I did it on the wndr3800.

Basically, increasing the quantums to get more cpu available... So a
too-small quantum is going to be excessive cpu, and a too-large quantum is
going to be poor fairness?

> and WOW, thx for the analysis! I did not think much about this crossover
> point at the time - because we'd maxed on cpu long beforehand.

No problem, this is the sort of thing I _can_ help with, since I don't know
the kernel internals very well.

I can certainly see this batching interacting with the codel target. Which
may also explain your comments about poor fairness in my 3800 results up at
60-80Mbps, when htb's quantum has crossed over fq_codel's target?
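To make that crossover concrete, here's the back-of-the-envelope math I've
been using (just a sketch, assuming fq_codel's default 5ms target; it isn't
pulled from the sqm scripts):

```python
# Time to serve one HTB quantum at a given shaped rate, compared against
# fq_codel's default 5ms target. Quantum values are the ones from the table.
rate_to_quantum = {          # kbps -> quantum in bytes
    20000: 3000,  30000: 6000,  40000: 12000,
    50000: 24000, 60000: 48000, 80000: 96000,
}
CODEL_TARGET_MS = 5.0        # fq_codel default target

for kbps, quantum in sorted(rate_to_quantum.items()):
    bytes_per_ms = kbps / 8.0          # kbit/s is bits/ms, so /8 gives bytes/ms
    quantum_ms = quantum / bytes_per_ms
    note = "  <-- exceeds codel target" if quantum_ms > CODEL_TARGET_MS else ""
    print("%5d kbps: %6d bytes = %.2f ms%s" % (kbps, quantum, quantum_ms, note))
```

By that math the quantum time crosses the 5ms target somewhere between 50
and 60Mbit, which lines up with the 60-80Mbps runs where fairness started
looking poor.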
> On the other hand, you gotta not be running out of cpu in the first place.
> I am liking where cake is going.

Yeah. That's what I _also_ need to figure out. Load seems "reasonable", but
load and cpu stats get reported oddly on multi-core (some things are
per-core, some are per-total-available, etc). I know I've seen the
"soft_irq" thread at 70% in top during some tests (in the past). I wouldn't
be surprised if this is a single-core-only bit of code. (Or can htb
processing and fq_codel processing be shoved onto separate cores?)

> One of my daydreams is that once we have writable custom ethernet
> hardware, we can easily do hardware outbound rate limiting/shaping merely
> by programming a register to return a completion interrupt at the set
> rate rather than the actual rate.

Well, inbound is certainly more of an issue than outbound right now...

So, for my next rounds of tests, I can play around with different quantum
values/schemes, try simple.qos vs. simplest.qos, and instrument the whole
thing to capture processor utilization vs. bandwidth.
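For the utilization side, I'm thinking of something along these lines (a
hypothetical helper that just samples /proc/stat; it's not part of the
existing sqm scripts), run alongside each throughput test so I can line up
per-core busy and softirq load against the measured bandwidth:

```python
#!/usr/bin/env python
# Hypothetical sketch: sample /proc/stat once a second and report per-cpu
# busy% and softirq%, to see whether the shaping work is pinned to one core.

import time

def read_cpu_times():
    """Return {cpu_name: (busy, softirq, total)} jiffy counters from /proc/stat."""
    stats = {}
    with open("/proc/stat") as f:
        for line in f:
            if not line.startswith("cpu"):
                continue
            parts = line.split()
            name, fields = parts[0], [int(x) for x in parts[1:]]
            idle = fields[3] + fields[4]        # idle + iowait
            stats[name] = (sum(fields) - idle, fields[6], sum(fields))  # fields[6] is softirq
    return stats

prev = read_cpu_times()
while True:
    time.sleep(1)
    cur = read_cpu_times()
    for name in sorted(cur):
        busy = cur[name][0] - prev[name][0]
        sirq = cur[name][1] - prev[name][1]
        total = cur[name][2] - prev[name][2]
        if total:
            print("%-6s busy %5.1f%%  softirq %5.1f%%"
                  % (name, 100.0 * busy / total, 100.0 * sirq / total))
    prev = cur
```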
-Aaron