[Codel] The next slice of cake
Sebastian Moeller
moeller0 at gmx.de
Mon Mar 30 14:23:35 EDT 2015
On Mar 30, 2015, at 19:28 , Jonathan Morton <chromatix99 at gmail.com> wrote:
> I made a lot of progress over the weekend, and got cake3 running on a sacrificial box today. No crashes yet!
>
> The set-associative hashing is in, currently hard-coded to 8-way, which should be more than adequate for typical home workloads (especially if BitTorrent is marked as background). I’m now adding statistics outputs so that we can all see how well it works under real loads, distinguishing between direct hits (the fast path, where the hash points directly to a correctly allocated queue), indirect hits (where a search through the set locates an allocated queue), misses (where an empty queue was allocated) and collisions (where all the queues were already occupied by other flows). Note that “allocation” in this case simply means updating the tag on the queue to match the packet, not allocating memory. These statistics should help with tuning the number of flow queues in each class.
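If I understand the set-associative scheme correctly, the lookup could be sketched roughly like this (a toy model; the constants, names and tie-breaking are mine, not from the actual cake3 source):

```c
#include <assert.h>
#include <stdint.h>

#define SET_WAYS   8            /* hard-coded 8-way, as described */
#define NUM_QUEUES 1024         /* hypothetical total queue count */

enum lookup_result { DIRECT_HIT, INDIRECT_HIT, MISS, COLLISION };

struct flow_queue {
    uint32_t tag;               /* flow hash currently owning this queue */
    int      backlog;           /* nonzero while the flow has packets queued */
};

static struct flow_queue queues[NUM_QUEUES];

/* The hash selects a set of SET_WAYS queues plus a preferred slot
 * within it.  "Allocation" just re-tags an empty queue - no memory
 * is allocated. */
static enum lookup_result find_queue(uint32_t hash, int *out_idx)
{
    uint32_t base   = (hash % (NUM_QUEUES / SET_WAYS)) * SET_WAYS;
    uint32_t direct = base + hash % SET_WAYS;

    if (queues[direct].backlog && queues[direct].tag == hash) {
        *out_idx = (int)direct;
        return DIRECT_HIT;          /* fast path */
    }
    for (uint32_t i = base; i < base + SET_WAYS; i++)
        if (queues[i].backlog && queues[i].tag == hash) {
            *out_idx = (int)i;
            return INDIRECT_HIT;    /* found by searching the set */
        }
    if (!queues[direct].backlog) {
        queues[direct].tag = hash;  /* re-tag the preferred empty slot */
        *out_idx = (int)direct;
        return MISS;
    }
    for (uint32_t i = base; i < base + SET_WAYS; i++)
        if (!queues[i].backlog) {
            queues[i].tag = hash;
            *out_idx = (int)i;
            return MISS;
        }
    *out_idx = (int)direct;         /* every way busy with other flows */
    return COLLISION;
}
```

An indirect hit then occurs when a flow's preferred slot was taken by another flow at allocation time, so its queue lives elsewhere in the set.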
>
> The new Diffserv logic is also in and apparently working well. I’m able to observe both the priority-balanced and bandwidth-balanced behaviours under appropriate circumstances. It is, however, fairly sensitive to the baseline quantum value from which the outer DRR weights are derived. Setting it to the MTU lets the high-priority classes burst too much, temporarily starving the lower classes. It seems to work a lot better using 256 as a baseline, but I hope the overhead of going around the DRR loop several times per packet isn’t too high. It might be sane to vary the baseline quantum based on the bandwidth setting, but we can experiment with that later.
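Just to make sure I follow the quantum trade-off: with a small baseline, a class needs several passes of the outer DRR loop to accumulate enough deficit for one full-size packet, which is exactly the loop overhead you mention. A toy calculation (weights and numbers are mine):

```c
#include <assert.h>

/* How many passes of the outer DRR loop a class needs before its
 * deficit covers one packet of pkt_len bytes, given per-round credit
 * of w * q (weight times baseline quantum).  Illustrative only. */
static int rounds_to_send(int pkt_len, int w, int q)
{
    int deficit = 0, rounds = 0;
    while (deficit < pkt_len) {
        deficit += w * q;
        rounds++;
    }
    return rounds;
}
```

With a 1500-byte packet, a weight-1 class needs six rounds at baseline 256 but only one at an MTU-sized baseline - hence the loop overhead, but also the much smaller burst any single class can grab per round.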
>
>> We keep track of how many bytes of system memory are consumed by a packet in 'skb->truesize'.
>
> The buffer-occupancy queue limit is in, using the field quoted above. Previously, it would just count packets. As with previous versions of cake (and fq_codel), it checks the limit immediately after enqueuing each packet, and drops one packet from the head of the longest queue (by byte count) if the limit is violated.
Does this mean the queue can temporarily occupy up to the limit plus the last packet’s truesize? And what about the case where a full-MTU packet comes in while the eligible packet at the head of the longest queue is, say, 64 bytes?
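A toy model of the check-after-enqueue policy as I read it (all names and numbers are mine) suggests both effects are possible:

```c
#include <assert.h>

#define NQ   4      /* hypothetical number of flow queues */
#define QCAP 64     /* per-queue packet slots in this toy model */

struct queue { int len[QCAP]; int head, count, bytes; };
static struct queue q[NQ];
static int total_bytes;

static void enqueue(int qi, int truesize)
{
    q[qi].len[(q[qi].head + q[qi].count++) % QCAP] = truesize;
    q[qi].bytes += truesize;
    total_bytes += truesize;
}

/* Drop one packet from the head of the queue with the largest byte
 * backlog, returning the bytes reclaimed. */
static int drop_from_longest(void)
{
    int fat = 0;
    for (int i = 1; i < NQ; i++)
        if (q[i].bytes > q[fat].bytes)
            fat = i;
    int sz = q[fat].len[q[fat].head];
    q[fat].head = (q[fat].head + 1) % QCAP;
    q[fat].count--;
    q[fat].bytes -= sz;
    total_bytes -= sz;
    return sz;
}

/* Check-after-enqueue: occupancy transiently exceeds 'limit' by up to
 * one truesize, and the single head-drop may reclaim fewer bytes than
 * the arriving packet consumed. */
static void enqueue_with_limit(int qi, int truesize, int limit)
{
    enqueue(qi, truesize);
    if (total_bytes > limit)
        drop_from_longest();
}
```

So a full-MTU arrival dropping a small head packet can indeed leave the total above the configured limit until further drops catch up on later enqueues.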
>
> Calculating the appropriate size to use here was an interesting problem. I start by estimating the BDP (from the shaper bandwidth and the Codel interval), quadrupling that and enforcing a lower limit of 64KB. If it’s set to “unlimited bandwidth”, I set the size to an arbitrary 1MB. Then I look at the packet limit, multiply that by the MTU and clamp the resulting size down to that - providing a straightforward mechanism to set a limit on memory usage.
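If I read the sizing rule correctly, it would come out to something like this (function and parameter names are mine; a rate of 0 here stands in for “unlimited bandwidth”):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the described buffer sizing:
 *  - estimate the BDP from the shaper rate and the Codel interval,
 *  - quadruple it, with a 64 KB floor,
 *  - use an arbitrary 1 MB when the shaper is unlimited,
 *  - clamp to packet_limit * MTU as the memory-usage cap. */
static uint64_t buffer_bytes(uint64_t rate_bps, uint64_t interval_us,
                             uint32_t packet_limit, uint32_t mtu)
{
    uint64_t size;

    if (rate_bps == 0) {
        size = 1 << 20;                 /* unlimited: arbitrary 1 MB */
    } else {
        uint64_t bdp = rate_bps / 8 * interval_us / 1000000;
        size = 4 * bdp;
        if (size < 65536)
            size = 65536;               /* 64 KB floor */
    }
    uint64_t cap = (uint64_t)packet_limit * mtu;
    if (size > cap)
        size = cap;                     /* limit-based memory cap */
    return size;
}
```

For example, 100 Mbit/s with a 100 ms interval gives a 1.25 MB BDP and hence a 5 MB buffer, well under the default packet-limit cap.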
>
> This all comes with a *fifth* almost-copy of codel.h, which incorporates the more efficient dequeue callback from codel4.h, but puts back the separate treatment of interval and target. Cake auto-tunes both of them to related values, but the relationship isn’t as simple as scaling one from the other, and I’m not comfortable with the mental contortions involved in the new reasoning.
I thought the main argument was that interval needs to be roughly in the right range, and that target is supposed to be between 5 and 10 percent of interval.
>
> I also added a lookup table for the first few inverse square roots, because I found (using a spreadsheet) that these are very inaccurate if you only do one Newton iteration on them - something like 1.0, 0.500, 0.563, 0.488 instead of 1.0, 0.707, 0.577, 0.500. Nebulous problems with “insufficiently tight control” might possibly be traced to this. With the first fifteen entries calculated up-front using four iterations and stored, accuracy is vastly improved and, in practice, almost all square-root calculations will now reduce to a simple table lookup. The memory cost is trivial, and Newton is good enough for exceptionally large drop counts.
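For illustration, the table-plus-Newton scheme might look like this in floating point (the kernel code presumably works in fixed point; the seeding and iteration counts follow the description, everything else is mine):

```c
#include <assert.h>
#include <math.h>

#define TAB 16

static double inv_sqrt_tab[TAB];   /* [n] holds ~1/sqrt(n), n >= 1 */

/* One Newton-Raphson step refining x towards 1/sqrt(n). */
static double newton(double x, double n)
{
    return x * (3.0 - n * x * x) / 2.0;
}

/* Precompute the first entries with four Newton iterations each,
 * seeding from the previous entry.  A single iteration gives the
 * inaccurate 1.0, 0.500, 0.563, 0.488 sequence mentioned above. */
static void build_table(void)
{
    inv_sqrt_tab[1] = 1.0;
    for (int n = 2; n < TAB; n++) {
        double x = inv_sqrt_tab[n - 1];
        for (int it = 0; it < 4; it++)
            x = newton(x, n);
        inv_sqrt_tab[n] = x;
    }
}

/* Runtime path: table hit for small drop counts; for large counts a
 * single Newton step from the previous value suffices, since 1/sqrt
 * changes slowly there. */
static double inv_sqrt(int n, double prev)
{
    if (n < TAB)
        return inv_sqrt_tab[n];
    return newton(prev, n);
}
```
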
>
> Along with the hashing statistics, I’m also adding three measures of latency to replace the one from the first version. This no longer relies on data provided by Codel (so there’s no space penalty per flow for collecting it, and the inner loop in the stats dump is also eliminated), but performs the necessary calculation just after dequeuing each packet, storing the aggregated values per class instead of per flow. Each measure is an EWMA (using optimised weights which reduce to shifts and adds), but two of them are biased - one towards high latencies (so it’ll mostly track the fattest flows), and one towards low latencies (so it’ll mostly track sparse flows).
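The biased EWMAs could be sketched like so (the shift values are my guesses at “optimised weights which reduce to shifts and adds”; only the asymmetry is the point):

```c
#include <assert.h>
#include <stdint.h>

/* Per-class latency trackers.  The peak tracker follows increases
 * quickly and decays slowly; the sparse tracker does the opposite. */
struct latency_stats {
    uint64_t avg;    /* neutral EWMA */
    uint64_t peak;   /* biased towards high latencies (fat flows) */
    uint64_t base;   /* biased towards low latencies (sparse flows) */
};

/* v += (sample - v) / 2^shift, done with a shift instead of a divide. */
static void ewma(uint64_t *v, uint64_t sample, int shift)
{
    int64_t delta = (int64_t)sample - (int64_t)*v;
    *v += (uint64_t)(delta >> shift);   /* arithmetic shift of the delta */
}

/* Called once per dequeued packet with its sojourn time. */
static void record_latency(struct latency_stats *s, uint64_t sample)
{
    ewma(&s->avg,  sample, 4);                          /* symmetric */
    ewma(&s->peak, sample, sample > s->peak ? 2 : 8);   /* rises fast */
    ewma(&s->base, sample, sample < s->base ? 2 : 8);   /* falls fast */
}
```

Feeding alternating 100 us and 10000 us samples, the peak tracker settles near the high value, the sparse tracker near the low one, and the neutral average in between.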
>
> Still lots of little things to do…
>
> - Jonathan Morton
Wahoo, this sounds great. I can’t wait to do a “test-drive” with this ;) (once you are looking for testers, let me know).
Best Regards
Sebastian
More information about the Codel mailing list