[Codel] [PATCH] Preliminary codel implementation

Dave Taht dave.taht at gmail.com
Thu May 3 14:35:25 PDT 2012


On Thu, May 3, 2012 at 1:47 PM, Jim Gettys
> There is a note in the article at this point in its exposition of the
> pseudo code:
>
> "Note that for embedded systems or kernel implementation that the
> inverse *sqrt* can be computed efficiently using only integer
> multiplication."
>
> That will likely be faster than the division by sqrt; if/when this
> optimisation is done, we should leave comments about why it is division
> by a square root, to correspond to TCP's control law.
>
> Before we upstream this, I'd recommend taking the comments documenting
> the algorithm from the article and including them.  I think Kathie had
> to trim on the slides.
>                                    - Jim


1) My own hurry is because I want to see this work outside of a model,
before Monday.

2) I strongly believe in correctness first, optimizations last. I
guess I'm the only guy here that kind of wishes we were using floating
point for this function!

3) I also would massively prefer that we measure queue length from
actual ingress (e.g. the application or incoming network card)
to egress. I regard scribbling on the cb as a hack; I'd rather use a
normalized hardware timestamp gained on ingress.

There are 3+ pages of code that need to be traversed to get from
ingress to egress, and those delays are significant (as we proved last
year) and not accounted for in a fluid model.

4) Despite Eric's request that I rebase on net-next, there's plenty of
time for that after the code is proven correct and stable.

I'm very glad we're at a point where it will be very easy to port
forward to net-next, but I'm totally unwilling to cope with random
breakage that goes with stepping over that edge.
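
To make point 3 concrete, here is a minimal user-space sketch of why
the starting timestamp matters. It is not code from the patch: struct
pkt, its fields, sojourn_ns() and above_target() are hypothetical names,
and the 5 ms target in the usage note is an assumed value used only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-packet bookkeeping, for illustration only. */
struct pkt {
    uint64_t ingress_ns; /* stamped where the packet entered the host */
    uint64_t enqueue_ns; /* stamped when the packet reached the queue */
};

/* Time spent between a starting timestamp and now. */
static uint64_t sojourn_ns(uint64_t start_ns, uint64_t now_ns)
{
    assert(now_ns >= start_ns); /* timestamps must share one clock */
    return now_ns - start_ns;
}

/* Nonzero if the delay measured from start_ns exceeds target_ns. */
static int above_target(uint64_t start_ns, uint64_t now_ns,
                        uint64_t target_ns)
{
    return sojourn_ns(start_ns, now_ns) > target_ns;
}
```

A packet that entered at t = 0, was enqueued at 4 ms and is dequeued at
7 ms shows 3 ms of delay when measured from enqueue (below an assumed
5 ms target) but 7 ms when measured from ingress (above it). The 4 ms
spent before the queue is the part a timestamp taken at enqueue cannot
see.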

I have a dozen machines in the lab, set up on 3.3.4, ready to let me do
head to head comparisons of various technologies in play, on three
different architectures, using 3 forms of wireless and 2 forms of
ethernet.

(If I can beg pity from anyone, one of those arches has 300+ out of
tree patches and it's a cast iron bitch to work on net-next in it)

Yes, we absolutely want something to go upstream. It doesn't need to
happen next week or even next month. I'd like a sane API and naming
scheme, and a man page to go with it.

I happen to be fond of ECN for long RTTs, and think it has
great potential for wireless. I know others are not,
and the code structure turned out to make ECN hard to put in...

I didn't get sufficient time to evaluate sfq+red before it went
upstream, and I really regret that.

There is plenty of time in net-next's upcoming window, regardless,
to get something into Linux. I'd like to see someone working on BSD,
and on an ns3 model that can integrate with the wifi model.

Let's try to do some quality work here!

I'd like very much to explore, for example, what happens in a massive
dropping scenario, such as what Eric alludes to might happen on bql +
gigE. And to have effective measurements of what the actual overhead
of a sqrt and divide is - so far as I know on modern arches it's like
11-25 clocks, much of which is lost in the noise on superscalar
architectures. This particular portion of the algorithm is not
triggered particularly often.

I'd like to run a real 1000 streams of real ledbat through it, voip,
web access, and games... play with prioritization and FQ techniques,
etc.

But first up: the simplest possible implementation that's correct.
OK?

Measure first. Get it correct. Then analyze the real behavior. Then
optimize.... Wash, rinse, repeat. This is such an old lesson I can't
believe you guys are so overfocused on optimization! A sqrt seems
correct to me for TCP traffic but what about everything else?

I very much appreciate the code review that happened earlier today,
and hopefully my next attempt will be closer to correct.

(In particular I hadn't realized how gnarly dealing with ktime was)

But: I assure you all I NEVER get it right on the first couple of tries.

And I wasn't planning on being the guy to write the code, I just
couldn't sleep, not seeing, feeling, experiencing, it work.

So I'd like to set up a git repo for this and iproute2 ASAP.

I'd be willing to wait until after Monday and just rely on code review.


-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net

