On Wed, Jun 3, 2015 at 3:27 PM, Dave Taht <dave.taht@gmail.com> wrote:

>
>
>> rate (kbps) = quantum (bytes) = time per quantum
>> 20000 = 3000 = 1.2ms
>> 30000 = 6000 = 1.6ms
>> 40000 = 12000 = 2.4ms
>> 50000 = 24000 = 3.84ms
>> 60000 = 48000 = 6.4ms
>> 80000 = 96000 = 9.6ms
>>
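The time column is the time it takes to send one quantum at the shaped rate. Since 1 kbps is one bit per millisecond, time in ms = quantum bytes * 8 / rate in kbps. A quick Python check that reproduces the table. The `TABLE` name is only for illustration:

```python
# Time to send one quantum at the shaped rate.
# 1 kbps = 1000 bits/s = 1 bit/ms, so: time_ms = quantum_bytes * 8 / rate_kbps
def quantum_time_ms(rate_kbps: int, quantum_bytes: int) -> float:
    """Milliseconds needed to send quantum_bytes at rate_kbps."""
    return quantum_bytes * 8 / rate_kbps


# (rate in kbps, quantum in bytes), copied from the table above
TABLE = [
    (20000, 3000),
    (30000, 6000),
    (40000, 12000),
    (50000, 24000),
    (60000, 48000),
    (80000, 96000),
]

if __name__ == "__main__":
    for rate, quantum in TABLE:
        print(f"{rate} kbps, {quantum} B -> {quantum_time_ms(rate, quantum):.2f} ms")
```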
>
>
>> So it appears that the goal of these values was to keep increasing the
>> quantum as rates went up, to provide more bytes per operation, but that's
>> going to risk adding latency as the time-per-quantum crosses the delay
>> target in fq_codel (if I'm understanding this correctly).
>>
>> So one thing that I can do is play around with this, and see if I can
>> keep that quantum time constant (i.e., 10ms, which seems _awfully_
>> long), or continue increasing it (which seems like a bad idea).  I'd love
>> to hear from whoever put this in what its goal was (or was it just
>> empirically tuned?)
>>
>
> Empirical, and tested only to about 60Mbit. Doing it this way got me back
> about 15% cpu on the wndr3800 at the time.
>

Basically, increasing the quantum frees up CPU...  So a too-small quantum
costs excessive CPU, and a too-large quantum gives poor fairness?
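One way to put numbers on the CPU side of that trade-off is to count how many quanta per second the shaper hands out, which is just the reciprocal of the time per quantum. A rough Python sketch; treating per-quantum work as the dominant cost is an assumption, since per-packet work does not shrink when the quantum grows:

```python
def quanta_per_second(rate_kbps: int, quantum_bytes: int) -> float:
    """Full quanta handed out per second when traffic flows at rate_kbps.

    A rough proxy for per-batch overhead: a smaller quantum means more
    batches (and more per-batch work) for the same amount of traffic.
    """
    bytes_per_second = rate_kbps * 1000 / 8
    return bytes_per_second / quantum_bytes


if __name__ == "__main__":
    # Same 80 Mbit/s rate: an MTU-sized quantum vs. the 96000-byte one.
    for quantum in (1514, 96000):
        print(f"quantum {quantum} B: {quanta_per_second(80000, quantum):.0f} per second")
```

The spread between the two is roughly the 63x difference in quantum size, which is where any per-batch savings would come from; the fairness cost is that each batch now holds the link for longer.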


> And WOW, thx for the analysis! I did not think much about this crossover
> point at the time, because we'd maxed out the cpu long before reaching it.
>

No problem, this is the sort of thing I _can_ help with, since I don't know
the kernel internals very well.


> I can certainly see this batching interacting with the codel target.
>

Which may also explain your comments about poor fairness in my 3800 results
at 60-80Mbps, where the time per htb quantum has crossed fq_codel's target?
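Put the other way around, fq_codel's default target of 5ms caps the quantum at rate * 5 / 8 bytes (again using 1 kbps = 1 bit/ms), and the quoted table's 60000 and 80000 kbps rows exceed that cap. A sketch assuming the default target and a one-frame (1514-byte) floor; both are illustrative assumptions, not values taken from the existing scripts:

```python
FQ_CODEL_TARGET_MS = 5.0  # fq_codel's default target; substitute a configured value
MIN_QUANTUM_BYTES = 1514  # assumption: never go below one full Ethernet frame


def max_quantum_under_target(rate_kbps: int, target_ms: float = FQ_CODEL_TARGET_MS) -> int:
    """Largest quantum (bytes) whose send time at rate_kbps stays within target_ms.

    bits = rate_kbps (bits per ms) * target_ms; bytes = bits / 8.
    At very low rates even one frame exceeds the target, so the floor wins.
    """
    return max(int(rate_kbps * target_ms / 8), MIN_QUANTUM_BYTES)


if __name__ == "__main__":
    # (rate kbps, current quantum bytes) from the table quoted above
    for rate, current in [(50000, 24000), (60000, 48000), (80000, 96000)]:
        cap = max_quantum_under_target(rate)
        print(rate, current, cap, "over target" if current > cap else "ok")
```

Scaling the quantum linearly with rate like this also gives the "constant time per quantum" scheme discussed above, just with the target as the time budget.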


> On the other hand, you gotta not be running out of cpu in the first place.
> I am liking where cake is going.
>

Yeah.  That's what I _also_ need to figure out.  Load seems "reasonable",
but load and cpu stats get reported oddly on multi-core (some are per-core,
some are relative to total available capacity, etc.).  I know I've seen the
"soft_irq" thread at 70% in top during some past tests.  I wouldn't be
surprised if this code only runs on a single core.  (Or can htb processing
and fq_codel processing be moved to separate cores?)
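On Linux, /proc/stat has both an aggregate `cpu` line and per-core `cpuN` lines, each with a `softirq` column, so a single core pegged in softirq stands out on its own line even when it is diluted in the aggregate. A sketch for sampling it; the helper names are made up, and the column order follows proc(5):

```python
from __future__ import annotations

# Column order of cpu lines in /proc/stat (see proc(5)):
# user nice system idle iowait irq softirq steal guest guest_nice
SOFTIRQ = 6
BASE_FIELDS = 8  # user..steal; guest and guest_nice are already counted in user/nice


def parse_proc_stat(text: str) -> dict[str, list[int]]:
    """Map 'cpu' (all cores) and 'cpu0', 'cpu1', ... to their tick counters."""
    counters: dict[str, list[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            counters[fields[0]] = [int(v) for v in fields[1:]]
    return counters


def softirq_share(before: str, after: str) -> dict[str, float]:
    """Fraction of each CPU's ticks spent in softirq between two snapshots."""
    old, new = parse_proc_stat(before), parse_proc_stat(after)
    shares: dict[str, float] = {}
    for cpu, counts in new.items():
        delta = [n - o for o, n in zip(old[cpu], counts)]
        total = sum(delta[:BASE_FIELDS])
        shares[cpu] = delta[SOFTIRQ] / total if total else 0.0
    return shares


if __name__ == "__main__":
    import time

    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(1.0)
    with open("/proc/stat") as f:
        after = f.read()
    for cpu, share in sorted(softirq_share(before, after).items()):
        print(f"{cpu}: {share:.0%} softirq")
```

Running it alongside a throughput test at each shaped rate would give per-core softirq load against bandwidth.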

> One of my daydreams is that once we have writable custom ethernet hardware
> that we can easily do hardware outbound rate limiting/shaping merely by
> programming a register to return a completion interrupt at the set rate
> rather than the actual rate.
>
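In rough terms, the daydream is pacing by completion time: the host is told a packet is done when it would have finished at the configured rate, so it sees the link drain at that rate rather than at line rate. A toy Python model of those completion times; `paced_completions` is a hypothetical name, and nothing here corresponds to a real NIC or driver interface:

```python
from __future__ import annotations


def paced_completions(arrivals_s: list[float], sizes_bytes: list[int],
                      rate_bps: float) -> list[float]:
    """Completion times (s) if 'done' were signalled at rate_bps, not line rate.

    Hypothetical model: each packet completes size * 8 / rate_bps seconds
    after the later of its arrival and the previous packet's completion.
    """
    completions: list[float] = []
    clock = 0.0
    for arrival, size in zip(arrivals_s, sizes_bytes):
        clock = max(clock, arrival) + size * 8 / rate_bps
        completions.append(clock)
    return completions


if __name__ == "__main__":
    # Three back-to-back 1500-byte packets, then one after an idle gap,
    # shaped to a hypothetical 12 Mbit/s.
    print(paced_completions([0.0, 0.0, 0.0, 0.01], [1500] * 4, 12_000_000))
```

Each 1500-byte packet occupies 1 ms at 12 Mbit/s, so back-to-back packets complete 1 ms apart no matter how fast the wire actually is.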

Well, inbound is certainly more of an issue than outbound right now...

So, for my next rounds of tests, I can play around with different quantum
values/schemes, and also play with simple.qos vs. simplest.qos, and
instrument the whole thing to capture processor utilization vs. bandwidth.

-Aaron