[Codel] The next slice of cake

Dave Taht dave.taht at gmail.com
Mon Mar 30 15:28:45 EDT 2015


On Mon, Mar 30, 2015 at 10:28 AM, Jonathan Morton <chromatix99 at gmail.com> wrote:
> I made a lot of progress over the weekend, and got cake3 running on a sacrificial box today.  No crashes yet!

Preliminary patches are always appreciated. I plan to do a build on
Thursday for a test cycle of various things this weekend.

> The set-associative hashing is in, currently hard-coded to 8-way, which should be more than adequate for typical home workloads (especially if BitTorrent is marked as background).  I’m now adding statistics outputs so that we can all see how well it works under real loads, distinguishing between direct hits (the fast path, where the hash points directly to a correctly allocated queue), indirect hits (where a search through the set locates an allocated queue), misses (where an empty queue was allocated) and collisions (where all the queues were already occupied by other flows).  Note that “allocation” in this case simply means updating the tag on the queue to match the packet, not allocating memory.  These statistics should help with tuning the number of flow queues in each class.
>
> The new Diffserv logic is also in and apparently working well.  I’m able to observe both the priority-balanced and bandwidth-balanced behaviours under appropriate circumstances.  It is, however, fairly sensitive to the baseline quantum value from which the outer DRR weights are derived.  Setting it to the MTU lets the high-priority classes burst too much, temporarily starving the lower classes.  It seems to work a lot better using 256 as a baseline, but I hope the overhead of going around the DRR loop several times per packet isn’t too high.  It might be sane to vary the baseline quantum based on the bandwidth setting, but we can experiment with that later.
>
>> We keep track of how many bytes of system memory are consumed by a packet in 'skb->truesize'.
>
> The buffer-occupancy queue limit is in, using the field quoted above.  Previously, it would just count packets.  As with previous versions of cake (and fq_codel), it checks the limit immediately after enqueuing each packet, and drops one packet from the head of the longest queue (by byte count) if it’s violated.

Did this include the overload-protection fallback to dropping, rather
than ECN-marking, packets?

> Calculating the appropriate size to use here was an interesting problem.  I start by estimating the BDP (from the shaper bandwidth and the Codel interval), quadrupling that and enforcing a lower limit of 64KB.  If it’s set to “unlimited bandwidth”, I set the size to an arbitrary 1MB.  Then I look at the packet limit, multiply that by the MTU and clamp the resulting size down to that - providing a straightforward mechanism to set a limit on memory usage.

Hmm..
>
> This all comes with a *fifth* almost-copy of codel.h, which incorporates the more efficient dequeue callback from codel4.h, but puts back the separate treatment of interval and target.  Cake auto-tunes both of them to related values, but the relationship isn’t as simple as scaling one from the other, and I’m not comfortable with the mental contortions involved in the new reasoning.

k. will test.

> I also added a lookup table for the first few inverse square roots, because I found (using a spreadsheet) that these are very inaccurate if you only do one Newton iteration on them - something like 1.0, 0.500, 0.563, 0.488 instead of 1.0, 0.707, 0.577, 0.500.  Nebulous problems with “insufficiently tight control” might possibly be traced to this.  With the first fifteen entries calculated up-front using four iterations and stored, accuracy is vastly improved and, in practice, almost all square-root calculations will now reduce to a simple table lookup.  The memory cost is trivial, and Newton is good enough for exceptionally large drop counts.

We explored the sqrt-cache idea in the very first implementations of
codel, which you can revisit in the archives from the earliest,
exciting days of the codel mailing list.

I am not allergic to trying it again. What we found then was that the
count variable was increasing without bound (with the originally
published ns2 version), for which we substituted another modification
that works well at low bandwidths but less well at higher ones.

My sole mathematical contribution to the invsqrt approximation in
Linux codel was the discovery that Newton's method has much larger
errors than 2% when run in reverse (e.g., when decreasing count by
more than 1, as various forms of the algorithm have done while trying
to come up with a better estimate for the resumption portion of the
algorithm).

When using alternate forms of recalculating count, I have usually run
Newton's method twice to smooth out that error somewhat, but I am open
to other alternatives.

One thing I would like us to look at much harder this time, instead of
the latency of the FQ'd measurement flows, is the actual size of cwnd
and other TCP behaviors from the packet captures, against modern TCPs.
I think our perpetually publishing just the FQ results in rrul is
misleading some people.

It would also be nice to sample these over the course of a run from
netperf-wrapper, but I don't know how to do that without web10g.

> Along with the hashing statistics, I’m also adding three measures of latency to replace the one from the first version.  This no longer relies on data provided by Codel (so there’s no space penalty per flow for collecting it, and the inner loop in the stats dump is also eliminated), but performs the necessary calculation just after dequeuing each packet, storing the aggregated values per class instead of per flow.  Each measure is an EWMA (using optimised weights which reduce to shifts and adds), but two of them are biased - one towards high latencies (so it’ll mostly track the fattest flows), and one towards low latencies (so it’ll mostly track sparse flows).

Nifty. Note that I tried hard in codel4 (I think it was) to reduce the
size of the codel stats to a single 32-byte cache line on 32-bit
arches - down from 48 or so bytes - and succeeded with one word to
spare. I am not allergic to keeping lots of statistics, but we are
also aiming for speed on lame processors here....

>
> Still lots of little things to do…
>
>  - Jonathan Morton



-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb



More information about the Codel mailing list