[Codel] The next slice of cake

Mon Mar 30 23:22:58 EDT 2015

> On 30 Mar, 2015, at 21:23, Sebastian Moeller <moeller0 at gmx.de> wrote:
> 
>>> We keep track of how many bytes of system memory are consumed by a packet in 'skb->truesize'.
>> 
>> The buffer-occupancy queue limit is in, using the field quoted above.  Previously, it would just count packets.  As with previous versions of cake (and fq_codel), it checks the limit immediately after enqueuing each packet, and drops one packet from the head of the longest queue (by byte count) if it’s violated.
> 
> Does this mean the largest size the queue actually occupies is limit plus the last packet’s true size temporarily? What about the case a full MTU packet comes in while the eligible packet at the head of the longest queue is say 64 byte?

Yes - I could make it capable of dropping multiple packets to cope better with the latter case.

The former case is not a problem really, since the packet is already in memory when we’re given it to enqueue; to fix it we’d have to anticipate the largest possible arriving packet and pre-emptively leave space for it.

> I thought the main argument was, that interval needs to be roughly in the right range and target is supposed to be between 5 to 10 percent of interval.

Yes, but why?

I have seen other arguments, both recently and otherwise, which modify target to accommodate the MAC grant latency of some medium, while interval is still described as a rough estimate of the RTT.

The existence of both explanations gives me epic cognitive-dissonance headaches, which I can’t resolve *unless* I am able to vary target and interval independently.  Since the cost to do so appears to be small, I put that facility back in.

>>> We keep track of how many bytes of system memory are consumed by a packet in 'skb->truesize'.
>> 
>> The buffer-occupancy queue limit is in, using the field quoted above.  Previously, it would just count packets.  As with previous versions of cake (and fq_codel), it checks the limit immediately after enqueuing each packet, and drops one packet from the head of the longest queue (by byte count) if it’s violated.
> 
> did this include the overload protection fallback to dropping rather
> than ecn-ing packets?

Codel already has such a protection mechanism, but I haven’t yet converted it to byte mode.

> We explored the sqrt cache idea in the very first implementations of
> codel which you can revisit in the earliest, exciting days of the
> existence of the codel mailing list.
> 
> I am not allergic to trying it again. What we had found then was the
> count variable was increasing without bound (with the original
> published ns2 version), for which we substituted another modification
> that works well at low bandwidths but less well at higher ones.
> 
> My sole mathematical contribution to the invsqrt approximation in
> linux codel was the discovery that newton's method had much larger
> errors than 2% when run in reverse (eg, when decreasing count by more
> than 1, as various forms of the algorithm have done while trying to
> come up with a better estimate for the the resumption portion of the
> algorithm)

A quick check in that spreadsheet suggests that this inaccuracy only occurs for low values of count, in basically the same places as running it forward is also inaccurate.  So having the pre-calculated lookup table for those first values works, and Newton is still used directly for larger counts if they occur.

> Note that I tried hard in codel4 (I think it was) to reduce the
> size of the codel stats to a single 32byte cache-line on 32 bit arches
> - down from 48 or so bytes - and succeeded with one word to spare. I
> am not allergic to keeping lots of statistics but we are also aiming
> for speed on lame processors here…

I’ve managed to add these statistics without increasing the size of the per-flow data, only the per-class data.  The weakness of the CPUs involved is why I’m aiming for shifts and adds (which an ARM in particular can do extremely efficiently, but a MIPS isn’t hurt much either) rather than multiplies in the fast path.  There’s a stray multiply in the hash function which I still want to excise.

It might also be possible, later on, to add a kernel config switch to drop out the stats computations.  It probably won’t make very much difference, but when every little helps...

 - Jonathan Morton