[Codel] The next slice of cake

Sebastian Moeller moeller0 at gmx.de
Tue Mar 31 03:12:58 EDT 2015


Hi Jonathan,

On Mar 31, 2015, at 05:22, Jonathan Morton <chromatix99 at gmail.com> wrote:

> 
>> On 30 Mar, 2015, at 21:23, Sebastian Moeller <moeller0 at gmx.de> wrote:
>> 
>>>> We keep track of how many bytes of system memory are consumed by a packet in 'skb->truesize'.
>>> 
>>> The buffer-occupancy queue limit is in, using the field quoted above.  Previously, it would just count packets.  As with previous versions of cake (and fq_codel), it checks the limit immediately after enqueuing each packet, and drops one packet from the head of the longest queue (by byte count) if it’s violated.
>> 
>> Does this mean the largest size the queue actually occupies is, temporarily, the limit plus the last packet’s truesize? What about the case where a full-MTU packet comes in while the eligible packet at the head of the longest queue is, say, 64 bytes?
> 
> Yes - I could make it capable of dropping multiple packets to cope better with the latter case.
> 
> The former case is not a problem really, since the packet is already in memory when we’re given it to enqueue; to fix it we’d have to anticipate the largest possible arriving packet and pre-emptively leave space for it.

	Well, we at least need to drop as much data as we add, otherwise a UDP flood can certainly make us exhaust all memory (the queue fills with small packets, then a flood of full-MTU packets arrives, boom). So on resource-starved machines it would be nice if we could set a “hard” limit on cake’s memory consumption (and yes, this leaves out the buffers leading into cake).
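
	To make that concrete, here is a rough userspace model of the byte-mode limit under discussion (the structures and helper names are illustrative assumptions, not cake’s actual code): after accounting the new packet, it drops from the head of the fattest flow until occupancy is back under the limit, possibly shedding several small packets for one big one.

#include <stdlib.h>

/* Illustrative model only - not cake's actual data structures. */
struct pkt  { struct pkt *next; unsigned truesize; };
struct flow { struct pkt *head, *tail; unsigned bytes; };
struct sched {
	struct flow *flows;
	unsigned nflows;
	unsigned buffer_used, buffer_limit;
};

/* Find the flow currently holding the most bytes. */
static struct flow *fattest_flow(struct sched *q)
{
	struct flow *fat = &q->flows[0];
	for (unsigned i = 1; i < q->nflows; i++)
		if (q->flows[i].bytes > fat->bytes)
			fat = &q->flows[i];
	return fat;
}

static void enqueue(struct sched *q, struct flow *f, struct pkt *p)
{
	/* Tail-enqueue and account the packet's memory footprint. */
	p->next = NULL;
	if (f->tail)
		f->tail->next = p;
	else
		f->head = p;
	f->tail = p;
	f->bytes += p->truesize;
	q->buffer_used += p->truesize;

	/* Drop head packets from the fattest flow until we are back
	 * under the byte limit - i.e. "drop as much data as we add",
	 * covering the small-packets-then-MTU-flood case. */
	while (q->buffer_used > q->buffer_limit) {
		struct flow *fat = fattest_flow(q);
		struct pkt *victim = fat->head;
		if (!victim)
			break;
		fat->head = victim->next;
		if (!fat->head)
			fat->tail = NULL;
		fat->bytes -= victim->truesize;
		q->buffer_used -= victim->truesize;
		free(victim);
	}
}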

> 
>> I thought the main argument was that interval needs to be roughly in the right range and target is supposed to be between 5 and 10 percent of interval.
> 
> Yes, but why?

	I think this is explained well in sections 4.3 and 4.4 of https://datatracker.ietf.org/doc/draft-ietf-aqm-codel/?include_text=1

> 
> I have seen other arguments, both recently and otherwise, which modify target to accommodate the MAC grant latency of some medium, while interval is still described as a rough estimate of the RTT.

	According to Kathleen Nichols (http://pollere.net/Pdfdocs/noteburstymacs.pdf) the reasoning of the RFC draft should also apply to the MAC issue. I admit that in sqm-scripts I went the easy way and sized target to allow ~1.5 packets, then just added the target duration to interval; but according to Kathleen Nichols this basically sacrifices more latency than necessary to reclaim some bandwidth. Adding the slow link’s transfer time to interval appealed to me, since effectively all RTTs are increased by that amount, but empirically it is target we want to fiddle with. I guess I will go and implement a smarter adjustment that first ramps target from 5 to 10% of interval (the RFC argues for target in the 5-10% range), and only then starts to increase interval, so that adjusted_interval = adjusted_target * 10. But increasing target above a full-MTU transfer time is still necessary, otherwise people on slow links sacrifice (too?) much of their already scarce bandwidth.
	(I believe Dave argued that this problem does not really exist in the ns version of the ACM paper, but since ns2 is not a full model of the actual Linux kernel, the do-not-drop-for-single-packet-queues heuristic in codel does not seem to do the right thing in real life. I could easily have gotten this wrong, though.)
	I guess I should go measure and only then come back to this discussion.
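
	For what it is worth, a rough sketch of that adjustment (untested; the ~1.5-packet floor mirrors sqm-scripts, everything else is my own assumption): keep target at 5% of interval when possible, let the full-MTU serialisation floor push it up towards 10%, and only then stretch interval so that interval = target * 10.

#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

struct aqm_params { uint64_t target_ns, interval_ns; };

/* Serialisation time of one MTU-sized packet at the shaped rate. */
static uint64_t mtu_time_ns(unsigned mtu_bytes, uint64_t rate_bps)
{
	return (uint64_t)mtu_bytes * 8 * NSEC_PER_SEC / rate_bps;
}

static struct aqm_params adjust(uint64_t interval_ns,
				unsigned mtu_bytes, uint64_t rate_bps)
{
	struct aqm_params p;
	/* allow ~1.5 packets' worth of queue, as sqm-scripts does */
	uint64_t min_target = mtu_time_ns(mtu_bytes, rate_bps) * 3 / 2;

	p.target_ns = interval_ns / 20;		/* start at 5% of interval */
	if (p.target_ns < min_target)
		p.target_ns = min_target;	/* ramp target first... */
	if (p.target_ns > interval_ns / 10)
		interval_ns = p.target_ns * 10;	/* ...then stretch interval */
	p.interval_ns = interval_ns;
	return p;
}

	At 100 ms interval, MTU 1500 and 500 kbit/s this yields target = 36 ms and interval = 360 ms; at broadband rates it degenerates to the usual 5 ms / 100 ms.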


> 
> The existence of both explanations gives me epic cognitive-dissonance headaches, which I can’t resolve *unless* I am able to vary target and interval independently.  Since the cost to do so appears to be small, I put that facility back in.

	I am all for manual overrides; I could also live with a mode where, if only one of the two is given, the other is calculated according to the above ideas.

> 
>>>> We keep track of how many bytes of system memory are consumed by a packet in 'skb->truesize'.
>>> 
>>> The buffer-occupancy queue limit is in, using the field quoted above.  Previously, it would just count packets.  As with previous versions of cake (and fq_codel), it checks the limit immediately after enqueuing each packet, and drops one packet from the head of the longest queue (by byte count) if it’s violated.
>> 
>> did this include the overload protection fallback to dropping rather
>> than ecn-ing packets?
> 
> Codel already has such a protection mechanism, but I haven’t yet converted it to byte mode.

	Ah, good; this is one more important step to make smallish routers survive a UDP flood. Not that I think a router should work great during a flood/attack, but it certainly should not crash...
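
	A byte-mode version of that fallback might look something like the check below; the one-half threshold and the names are pure assumptions on my part, merely illustrating the idea that once the buffer is (say) half full by bytes, we should stop ECN-marking and really drop, so a flood of ECN-capable packets cannot pin the router’s memory.

#include <stdbool.h>

/* Hypothetical byte-mode overload check - not cake's actual logic. */
static bool can_ecn_mark(unsigned buffer_used, unsigned buffer_limit,
			 bool ecn_capable)
{
	return ecn_capable && buffer_used < buffer_limit / 2;
}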

> 
>> We explored the sqrt cache idea in the very first implementations of
>> codel which you can revisit in the earliest, exciting days of the
>> existence of the codel mailing list.
>> 
>> I am not allergic to trying it again. What we had found then was the
>> count variable was increasing without bound (with the original
>> published ns2 version), for which we substituted another modification
>> that works well at low bandwidths but less well at higher ones.
>> 
>> My sole mathematical contribution to the invsqrt approximation in
>> Linux codel was the discovery that Newton's method had much larger
>> errors than 2% when run in reverse (e.g. when decreasing count by more
>> than 1, as various forms of the algorithm have done while trying to
>> come up with a better estimate for the resumption portion of the
>> algorithm)
> 
> A quick check in that spreadsheet suggests that this inaccuracy only occurs for low values of count, in basically the same places as running it forward is also inaccurate.  So having the pre-calculated lookup table for those first values works, and Newton is still used directly for larger counts if they occur.
> 
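
	To illustrate the hybrid scheme described above (a lookup table for the small counts where Newton is least accurate, a single Newton step elsewhere), here is a self-contained fixed-point sketch; the Q0.16 format, table size and all names are my assumptions, not the kernel’s actual encoding.

#include <stdint.h>
#include <math.h>

#define FRAC_BITS 16		/* Q0.16 fixed point: 1.0 == 1 << 16 */
#define LUT_SIZE  16

static uint32_t inv_sqrt_lut[LUT_SIZE];	/* round(2^16 / sqrt(count)) */

/* Precompute the small-count entries, where the Newton iteration (run
 * in either direction) is least accurate; count starts at 1. */
static void lut_init(void)
{
	for (int c = 1; c < LUT_SIZE; c++)
		inv_sqrt_lut[c] =
			(uint32_t)lround((1 << FRAC_BITS) / sqrt(c));
}

/* One Newton step refining x ~= 1/sqrt(count):
 *   x' = x * (3 - count * x^2) / 2
 * accurate when x was already close, i.e. count changed by one. */
static uint32_t newton_step(uint32_t x, uint32_t count)
{
	uint64_t x2 = ((uint64_t)x * x) >> FRAC_BITS;	/* x^2       */
	uint64_t t  = (3ULL << FRAC_BITS) - count * x2;	/* 3 - c*x^2 */
	return (uint32_t)(((uint64_t)x * t) >> (FRAC_BITS + 1));
}

/* Small counts hit the table; larger ones refine the last estimate. */
static uint32_t inv_sqrt(uint32_t count, uint32_t prev)
{
	return count < LUT_SIZE ? inv_sqrt_lut[count]
				: newton_step(prev, count);
}

/* CoDel's control law then costs one multiply and a shift:
 *   next_drop = now + interval / sqrt(count)                      */
static uint64_t control_law(uint64_t now_ns, uint64_t interval_ns,
			    uint32_t inv_sqrt_q16)
{
	return now_ns + ((interval_ns * inv_sqrt_q16) >> FRAC_BITS);
}
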
>> Note that I tried hard in codel4 (I think it was) to reduce the
>> size of the codel stats to a single 32-byte cache line on 32-bit arches
>> - down from 48 or so bytes - and succeeded with one word to spare. I
>> am not allergic to keeping lots of statistics but we are also aiming
>> for speed on lame processors here…
> 
> I’ve managed to add these statistics without increasing the size of the per-flow data, only the per-class data.  The weakness of the CPUs involved is why I’m aiming for shifts and adds (which an ARM in particular can do extremely efficiently, but a MIPS isn’t hurt much either) rather than multiplies in the fast path.  There’s a stray multiply in the hash function which I still want to excise.
> 
> It might also be possible, later on, to add a kernel config switch to drop out the stats computations.  It probably won’t make very much difference, but when every little helps...
> 
> - Jonathan Morton
> 
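
	On the shifts-and-adds point above: one classic idiom (shown here purely as an example, since we do not know which statistics are involved) is an exponentially weighted moving average with a power-of-two weight, which needs no multiply at all.

#include <stdint.h>

/* EWMA with weight 1/16: avg = avg * 15/16 + sample * 1/16.
 * Two shifts and two adds - cheap on both ARM and MIPS. */
static inline uint32_t ewma_update(uint32_t avg, uint32_t sample)
{
	return avg - (avg >> 4) + (sample >> 4);
}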


Best Regards
	Sebastian



