[Cake] Long-RTT broken again
chromatix99 at gmail.com
Tue Nov 3 11:43:12 EST 2015
> On 3 Nov, 2015, at 13:50, Toke Høiland-Jørgensen <toke at toke.dk> wrote:
>> The question remains why a 15MB buffer (which comfortably exceeds the
>> traditional FIFO rule of thumb for 1 second * 100Mbps) is apparently
>> insufficient according to Toke’s tests, even with the target increased
>> as requested.
> Because it's not a 15MB buffer; it's a 10240 packet buffer. So anything
> from ~.5 to ~15MB. In this case, it's a bidirectional test, so about
> half the packets will be tiny ACKs. I guess doing byte accounting would
> actually be better here, regardless of what happens to the overall limit... :)
Cake does its queue accounting in bytes, with a default upper limit of 15MB. It’s *not* meant to be a packet buffer.
However, the bytes counted are those allocated, not the on-wire packet sizes, because this limit is meant to avoid consuming all of a small router’s RAM for one queue. This isn’t just about a hard OOM, which the kernel probably has handling for already, but about sharing RAM between different queues on the same device; note the different behaviour of the upload and download streams in the results given.
The only way this could behave like a “packet buffer” instead of a byte-accounted queue is if there is a fixed size allocation per packet, regardless of the size of said packet. There are hints that this might actually be the case, and that the allocation is a hugely wasteful (for an ack) 2KB. (This also means that it’s not a 10240 packet buffer, but about 7500.)
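As a sketch of that hypothesis (this is a toy model, not Cake’s actual code): a queue that accounts *allocated* bytes with a fixed 2KB-per-packet allocation turns the 15MB byte limit into roughly the 7500-packet buffer described above. The constants are the figures quoted in the text; the class and names are invented for illustration.

```python
# Hypothetical model, not Cake's real implementation: a queue that
# accounts allocated bytes (RAM consumed) rather than on-wire bytes.

BUFFER_LIMIT = 15 * 1024 * 1024   # 15MB default byte limit
ALLOC_QUANTUM = 2 * 1024          # assumed fixed 2KB allocation per packet

class ByteAccountedQueue:
    def __init__(self, limit=BUFFER_LIMIT):
        self.limit = limit
        self.allocated = 0        # RAM accounted, not wire bytes
        self.packets = []

    def enqueue(self, wire_len):
        # A 64-byte ack still costs a full 2KB allocation.
        alloc = max(wire_len, ALLOC_QUANTUM)
        if self.allocated + alloc > self.limit:
            return False          # over the RAM budget: drop
        self.allocated += alloc
        self.packets.append(wire_len)
        return True

q = ByteAccountedQueue()
q.enqueue(64)                     # tiny ack
q.enqueue(1514)                   # full-size data packet
print(q.allocated)                # 4096: each packet costs one quantum

# With every packet costing one quantum, the byte limit holds only:
print(BUFFER_LIMIT // ALLOC_QUANTUM)   # 7680, i.e. "about 7500"
```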
But in a bidirectional TCP scenario with ECN, only about a third of the packets should be acks (ignoring the relatively low number of ICMP and UDP probes); ECN causes an ack to be sent immediately, but normal delayed-ack processing should then resume. That makes 6KB allocated per ~3KB transmitted: two ~1.5KB data packets plus one ack, each taking a 2KB allocation. The effective buffer size is thus halved to 7.5MB, which is still compliant with the traditional rule of thumb (BDP / sqrt(flows)), given that there are four bulk flows each way.
This effect is therefore not enough to explain the huge deficit Toke measured. The arithmetic also changes by only a small factor if we ignore delayed acks and assume an ack for every data packet, making 8KB allocated per ~3KB transmitted.
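The arithmetic in the two paragraphs above can be worked through explicitly; the values are the ones quoted in the text (15MB limit, assumed 2KB allocations, ~1.5KB data packets, 100Mbps * 1s BDP, four bulk flows), and the helper function is just for illustration.

```python
from math import sqrt

LIMIT = 15e6    # 15MB byte limit
ALLOC = 2e3     # assumed fixed 2KB allocation per packet
DATA = 1.5e3    # approximate full-size data packet on the wire

def effective_buffer(acks_per_two_data):
    # Per two data packets transmitted (~3KB on the wire), account
    # (2 + acks) fixed allocations; scale the limit by wire/allocated.
    wire = 2 * DATA
    allocated = (2 + acks_per_two_data) * ALLOC
    return LIMIT * wire / allocated

print(effective_buffer(1))   # delayed acks: 15MB * 3/6 = 7.5MB
print(effective_buffer(2))   # ack every packet: 15MB * 3/8 = 5.625MB

# Rule-of-thumb target: BDP / sqrt(flows), with four bulk flows.
bdp = 100e6 / 8 * 1.0        # 100Mbps * 1 second = 12.5MB
print(bdp / sqrt(4))         # 6.25MB
```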
So, again - what’s going on? Are there any clues in packet traces with sequence analysis?
I’ll put in a configurable memory limit anyway, but I really do want to understand why this is happening.
- Jonathan Morton