Cake - FQ_codel the next generation
* Re: [Cake] [Cerowrt-devel] ingress rate limiting falling short
       [not found]   ` <CALQXh-MSbgCJxvamiSGH0xS83Dd3v++_2a2rkdzQi4bB3nQUmQ@mail.gmail.com>
@ 2015-06-03 22:27     ` Dave Taht
  2015-06-03 22:34       ` Dave Taht
  2015-06-03 22:43       ` Aaron Wood
  0 siblings, 2 replies; 3+ messages in thread
From: Dave Taht @ 2015-06-03 22:27 UTC (permalink / raw)
  To: Aaron Wood, cake; +Cc: cerowrt-devel



On Wed, Jun 3, 2015 at 3:16 PM, Aaron Wood <woody77@gmail.com> wrote:

>
> > On the 3800, it never meets the rate, but it's only off by maybe 5%.
>>
>>         As Jonathan pointed out already, this is in the range of the
>> difference between raw rates and TCP goodput, so nothing to write home
>> about ;)
>>
>
> Yeah, I'm not too worried about that 5%, based on that explanation.
>
>
>>
>> > But on my new WRT1900AC, it's wildly off, even over the same
>> performance range (I tested it from 80-220Mbps rates in 20Mbps jumps, and
>> saw from 40-150Mbps).
>>
>>         So you started with the WRT1900AC where the wndr3800 dropped off?
>> I wonder if maybe the Belkin is also almost linear for the lower range?
>
>
> Yeah, good point on a methodology fail.  I'll run another series of tests
> walking up the same series of rate limits and see what I get.
>
>
>> I also note we adjust the quantum based on the rates:
>> from functions.sh:
>> get_mtu() {
>>
> ... snip
>
>> }
>>
>> which we use in the htb invocations via this indirection:
>> LQ="quantum `get_mtu $IFACE $CEIL`"
>>
>>
> That is odd, and that's quite the aggressive curve on quantum, doubling
> every 10-20Mbps.
>
> I did some math, and plotted out the quantum vs. bandwidth based on that
> snippet of code (and assuming a 1500-byte MTU):
>
> [attached: quantum_per_kbps.png]
>
> And then plotted out the corresponding time in ms that each quantum's
> bytes (it is bytes, right?) spend on the wire:
>
> [attached: quantum_in_ms_per_kbps.png]
>
> Which I think is a really interesting plot (and here are the points that
> line up with the steps in the script):
>
> kbps = quantum (bytes) = time
> 20000 = 3000 = 1.2ms
> 30000 = 6000 = 1.6ms
> 40000 = 12000 = 2.4ms
> 50000 = 24000 = 3.84ms
> 60000 = 48000 = 6.4ms
> 80000 = 96000 = 9.6ms
>


> So it appears that the goal of these values was to keep increasing the
> quantum as rates went up, to provide more bytes per operation, but that's
> going to risk adding latency as the time-per-quantum crosses the delay
> target in fq_codel (if I'm understanding this correctly).
>
> So one thing that I can do is play around with this, and see if I can hold
> that quantum time constant (ie, at 10ms, which seems _awfully_ long), or
> continue increasing it (which seems like a bad idea).  I'd love to hear
> from whoever put this in as to what its goal was (or was it just
> empirically tuned?)
>

Empirical, and tested only to about 60Mbit. I got back about 15% cpu by
doing it this way at the time, on the wndr3800.
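
For the record, the curve amounts to roughly this (a from-memory sketch of
the step function your table implies, not the verbatim functions.sh body;
the 1514 floor for low rates is my assumption):

# Hypothetical reconstruction of the quantum schedule, one doubling per
# step, keyed off the shaped rate in kbps:
get_quantum() {
    RATE=$1                                      # shaped rate in kbps
    if   [ "$RATE" -lt 20000 ]; then echo 1514   # assumed MTU-sized floor
    elif [ "$RATE" -lt 30000 ]; then echo 3000
    elif [ "$RATE" -lt 40000 ]; then echo 6000
    elif [ "$RATE" -lt 50000 ]; then echo 12000
    elif [ "$RATE" -lt 60000 ]; then echo 24000
    elif [ "$RATE" -lt 80000 ]; then echo 48000
    else                             echo 96000
    fi
}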

and WOW, thx for the analysis! I did not think much about this crossover
point at the time - because we'd maxed out on cpu long beforehand.

I can certainly see this batching interacting with the codel target.
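
Back of the envelope: a quantum of B bytes at R kbps sits on the wire for
B*8/R ms, so (assuming fq_codel's default 5ms target) the schedule crosses
the target somewhere between 50 and 60Mbit:

# time-on-wire per quantum, in ms: bytes * 8 / kbps
awk -v q=24000 -v kbps=50000 'BEGIN { printf "%.2f ms\n", q * 8 / kbps }'
# -> 3.84 ms, still under a 5ms target
awk -v q=48000 -v kbps=60000 'BEGIN { printf "%.2f ms\n", q * 8 / kbps }'
# -> 6.40 ms, already past it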

On the other hand, you gotta not be running out of cpu in the first place.
I am liking where cake is going.

One of my daydreams is that once we have writable custom ethernet hardware,
we can easily do hardware outbound rate limiting/shaping merely by
programming a register to return a completion interrupt at the set rate
rather than the actual rate.


> >
>> > I have no idea where to start looking for the cause.  But for now, I'm
>> just setting my ingress rate MUCH higher than I should, because it's
>> working out to the right value as a result.
>>
>>         It would be great to understand why we need to massively
>> under-shape in that situation to get decent shaping and decent latency
>> under load.
>>
>
> Agreed.
>
> -Aaron
>


-- 
Dave Täht
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast

[-- Attachment #2: quantum_per_kbps.png --]
[-- Type: image/png, Size: 5851 bytes --]

[-- Attachment #3: quantum_in_ms_per_kbps.png --]
[-- Type: image/png, Size: 8686 bytes --]


* Re: [Cake] [Cerowrt-devel] ingress rate limiting falling short
  2015-06-03 22:27     ` [Cake] [Cerowrt-devel] ingress rate limiting falling short Dave Taht
@ 2015-06-03 22:34       ` Dave Taht
  2015-06-03 22:43       ` Aaron Wood
  1 sibling, 0 replies; 3+ messages in thread
From: Dave Taht @ 2015-06-03 22:34 UTC (permalink / raw)
  To: Aaron Wood, cake; +Cc: cerowrt-devel



On Wed, Jun 3, 2015 at 3:27 PM, Dave Taht <dave.taht@gmail.com> wrote:

> ... snip ...
>
> Empirical, and tested only to about 60Mbit. I got back about 15% cpu by
> doing it this way at the time, on the wndr3800.
>
> and WOW, thx for the analysis! I did not think much about this crossover
> point at the time - because we'd maxed out on cpu long beforehand.
>

And most of my testing on x86 has been with this htb quantum scaling
entirely disabled, with the quantum just set to 1514.
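
For illustration, the difference is just the quantum parameter on the htb
class, something like this (device, classid and rates are placeholders):

# pin the htb quantum at one full-size ethernet frame instead of scaling it
tc class add dev eth0 parent 1:1 classid 1:11 htb \
    rate 100mbit ceil 100mbit quantum 1514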

the "production" sqm-scripts and my own hacked up version(s) have differed
in this respect for quite some time. (at least 6 months).

Great spot on this discrepancy.

:egg, otherwise, on face:



-- 
Dave Täht
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast


* Re: [Cake] [Cerowrt-devel] ingress rate limiting falling short
  2015-06-03 22:27     ` [Cake] [Cerowrt-devel] ingress rate limiting falling short Dave Taht
  2015-06-03 22:34       ` Dave Taht
@ 2015-06-03 22:43       ` Aaron Wood
  1 sibling, 0 replies; 3+ messages in thread
From: Aaron Wood @ 2015-06-03 22:43 UTC (permalink / raw)
  To: Dave Taht; +Cc: cake, cerowrt-devel


On Wed, Jun 3, 2015 at 3:27 PM, Dave Taht <dave.taht@gmail.com> wrote:

> ... snip ...
>
> Empirical, and tested only to about 60Mbit. I got back about 15% cpu by
> doing it this way at the time, on the wndr3800.
>

Basically, increasing the quantum to get more cpu headroom...  So a
too-small quantum is going to cost excessive cpu, and a too-large quantum
is going to cost fairness?
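
Quick arithmetic on the cpu side of that tradeoff (my numbers, just
dividing the bit rate by the quantum):

# dequeue operations per second ~= rate_in_bits / (quantum_bytes * 8)
awk -v kbps=80000 -v q=1514  'BEGIN { printf "%.0f ops/s\n", kbps*1000/(q*8) }'
# -> ~6605 ops/s at 80Mbit with a one-packet quantum
awk -v kbps=80000 -v q=96000 'BEGIN { printf "%.0f ops/s\n", kbps*1000/(q*8) }'
# -> ~104 ops/s at 80Mbit with the 96000-byte quantum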



> and WOW, thx for the analysis! I did not think much about this crossover
> point at the time - because we'd maxed out on cpu long beforehand.
>

No problem, this is the sort of thing I _can_ help with, since I don't know
the kernel internals very well.


> I can certainly see this batching interacting with the codel target.

Which may also explain your comments about poor fairness on my 3800 results
when up at 60-80Mbps, when htb's quantum has crossed over fq_codel's target?



> On the other hand, you gotta not be running out of cpu in the first place.
> I am liking where cake is going.
>

Yeah.  That's what I _also_ need to figure out.  Load seems "reasonable",
but load and cpu stats get reported oddly on multi-core (some things are
per-core, some are totals across all cores, etc).  I know I've seen the
"soft_irq" thread at 70% in top during some past tests.  I wouldn't be
surprised if this is a single-core-only bit of code?  (or can htb
processing and fq_codel processing be shoved onto separate cores?)
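
(One crude way I can check whether it all lands on one core, assuming the
kernel exposes the usual per-cpu counters:)

# per-core softirq counters; if the NET_RX/NET_TX counts only climb in one
# column, the shaping work is effectively single-core
grep -E 'NET_(RX|TX)' /proc/softirqs; sleep 10; grep -E 'NET_(RX|TX)' /proc/softirqs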

> One of my daydreams is that once we have writable custom ethernet hardware,
> we can easily do hardware outbound rate limiting/shaping merely by
> programming a register to return a completion interrupt at the set rate
> rather than the actual rate.
>

well, inbound is certainly more of an issue than outbound right now...

So, for my next rounds of tests, I can play around with different quantum
values/schemes, and also play with simple.qos vs. simplest.qos, and
instrument the whole thing to capture processor utilization vs. bandwidth.
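
Roughly this shape of loop, as a sketch (the shaper reconfiguration step
and the netperf host are placeholders for whatever the build provides):

# step the ingress limit through the same 20Mbps jumps as before, driving
# traffic and snapshotting softirq counters at each rate
for RATE in 80000 100000 120000 140000 160000 180000 200000 220000; do
    # reconfigure the shaper to $RATE kbps here (build-specific)
    grep -E 'NET_(RX|TX)' /proc/softirqs > softirqs_${RATE}_before.txt
    netperf -H netperf.example.com -t TCP_MAERTS -l 30 > tput_${RATE}.txt
    grep -E 'NET_(RX|TX)' /proc/softirqs > softirqs_${RATE}_after.txt
done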

-Aaron


end of thread, other threads:[~2015-06-03 22:43 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CALQXh-OjiUaStSrVAOcRodA8eGCL2eExNO76Ncu-7i3JJPRPPw@mail.gmail.com>
     [not found] ` <5A699476-8E71-4D38-BABE-F755931447B5@gmx.de>
     [not found]   ` <CALQXh-MSbgCJxvamiSGH0xS83Dd3v++_2a2rkdzQi4bB3nQUmQ@mail.gmail.com>
2015-06-03 22:27     ` [Cake] [Cerowrt-devel] ingress rate limiting falling short Dave Taht
2015-06-03 22:34       ` Dave Taht
2015-06-03 22:43       ` Aaron Wood
