On Wed, Jun 3, 2015 at 3:27 PM, Dave Taht wrote:

>> kbps = quantum = time
>> 20000 = 3000 = 1.2ms
>> 30000 = 6000 = 1.6ms
>> 40000 = 12000 = 2.4ms
>> 50000 = 24000 = 3.84ms
>> 60000 = 48000 = 6.4ms
>> 80000 = 96000 = 9.6ms
>>
>> So it appears that the goal of these values was to keep increasing the
>> quantum as rates went up to provide more bytes per operation, but that's
>> going to risk adding latency as the time-per-quantum crosses the delay
>> target in fq_codel (if I'm understanding this correctly).
>>
>> So one thing that I can do is play around with this, and see if I can
>> keep that quantum time at a linear level (ie, 10ms, which seems _awfully_
>> long), or continue increasing it (which seems like a bad idea). I'd love
>> to hear from whoever put this in as to what its goal was (or was it just
>> empirically tuned?)
>
> Empirical and tested only to about 60Mbits. I got back about 15% cpu to do
> it this way at the time I did it on the wndr3800.
> Basically, increasing the quantums to get more cpu available...

So a too-small quantum is going to be excessive cpu, and a too-large
quantum is going to be poor fairness?

> and WOW, thx for the analysis! I did not think much about this crossover
> point at the time - because we'd maxed on cpu long beforehand.

No problem, this is the sort of thing I _can_ help with, since I don't know
the kernel internals very well.

> I can certainly see this batching interacting with the codel target.

Which may also explain your comments about poor fairness on my 3800 results
when up at 60-80Mbps, when htb's quantum has crossed over fq_codel's target?

> On the other hand, you gotta not be running out of cpu in the first place.
> I am liking where cake is going.

Yeah. That's what I _also_ need to figure out. Load seems "reasonable", but
load and cpu stats get reported oddly on multi-core (some things are
per-core, some are per-total available, etc).
I know I've seen the "soft_irq" thread at 70% in top doing some tests (in
the past). I wouldn't be surprised if this is a single-core-only bit of
code? (or can htb processing and fq_codel processing be shoved to separate
cores?)

> One of my daydreams is that once we have writable custom ethernet hardware
> that we can easily do hardware outbound rate limiting/shaping merely by
> programming a register to return a completion interrupt at the set rate
> rather than the actual rate.

well, inbound is certainly more of an issue than outbound right now...

So, for my next rounds of tests, I can play around with different quantum
values/schemes, and also play with simple.qos vs. simplest.qos, and
instrument the whole thing to capture processor utilization vs. bandwidth.

-Aaron
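
For anyone wanting to reproduce the "time" column in the quoted table, here is a rough sketch of the arithmetic. It assumes the quantum is in bytes and the rate in kbit/s, and it compares against a 5 ms target, which is fq_codel's default (the thread does not say what target was actually configured). The function names are made up for illustration and are not part of any script discussed here.

```python
# Time needed to drain one HTB quantum at the shaped rate, compared with
# the fq_codel target. Assumes quantum in bytes and rate in kbit/s.

FQ_CODEL_TARGET_MS = 5.0  # fq_codel default; an assumption, not from the thread

# (rate_kbps, quantum_bytes) pairs copied from the quoted table
TABLE = [
    (20000, 3000),
    (30000, 6000),
    (40000, 12000),
    (50000, 24000),
    (60000, 48000),
    (80000, 96000),
]


def quantum_time_ms(rate_kbps, quantum_bytes):
    """Milliseconds needed to send quantum_bytes at rate_kbps."""
    # bits divided by kbit/s gives milliseconds directly
    return quantum_bytes * 8 / rate_kbps


def quantum_for_time(rate_kbps, time_ms):
    """Quantum in bytes that takes time_ms to send at rate_kbps."""
    return round(rate_kbps * time_ms / 8)


if __name__ == "__main__":
    for rate, quantum in TABLE:
        t = quantum_time_ms(rate, quantum)
        flag = "over target" if t > FQ_CODEL_TARGET_MS else "under target"
        print(f"{rate:>6} kbps  {quantum:>6} B  {t:5.2f} ms  {flag}")
```

With these assumptions, only the 60000 and 80000 kbps rows come out above the 5 ms target, which lines up with the 60-80Mbps range mentioned above. `quantum_for_time` goes the other way, for experimenting with a quantum that holds the per-quantum time constant across rates.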