[Cake] A few puzzling Cake results

Wed Apr 18 09:21:22 EDT 2018

Jonas Mårtensson <martensson.jonas at gmail.com> writes:

> On Wed, Apr 18, 2018 at 1:25 PM, Toke Høiland-Jørgensen <toke at toke.dk>
> wrote:
>
>> Toke Høiland-Jørgensen <toke at toke.dk> writes:
>>
>> > Jonathan Morton <chromatix99 at gmail.com> writes:
>> >
>> >>> On 17 Apr, 2018, at 12:42 pm, Toke Høiland-Jørgensen <toke at toke.dk>
>> wrote:
>> >>>
>> >>> - The TCP RTT of the 32 flows is *way* higher for Cake. FQ-CoDel
>> >>>  controls TCP flow latency to around 65 ms, while for Cake it is all
>> >>>  the way up around the 180ms mark. Is the Codel version in Cake too
>> >>>  lenient, or what is going on here?
>> >>
>> >> A recent change was to increase the target dynamically so that at
>> >> least 4 MTUs per flow could fit in each queue without AQM activity.
>> >> That should improve throughput in high-contention scenarios, but it
>> >> does come at the expense of intra-flow latency when it's relevant.
>> >
>> > Ah, right, that might explain it. In the 128 flow case each flow has
>> > less than 100 Kbps available to it, so four MTUs are going to take a
>> > while to dequeue...
>>
>> OK, so I went and looked at the code and found this:
>>
>>         bool over_target = sojourn > p->target &&
>>                            sojourn > p->mtu_time * bulk_flows * 4;
>>
>>
>> Which means that we scale the allowed sojourn time for each flow by the
>> time of four packets *times the number of bulk flows*.
>>
>> So if there is one active bulk flow, we allow each flow to queue four
>> packets. But if there are ten active bulk flows, we allow *each* flow to
>> queue *40* packets.
>
>
> I'm confused. Isn't the sojourn time for a packet a result of the
> total number of queued packets from all flows? If each flow were
> allowed to queue 40 packets, the sojourn time would be mtu_time *
> bulk_flows * 40, no?

No, the 40 in my example came from the bulk_flows multiplier.

Basically, the current code scales the AQM target by the number of
active bulk flows, so the less bandwidth a flow has available, the more
lenient the AQM becomes.

That is wrong: the AQM should signal a flow to slow down when it
exceeds its available bandwidth and starts building a queue. So if the
available bandwidth decreases (because more flows share it), the AQM is
*expected* to react by sending more "slow down" signals (dropping more
packets).
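
To make the scaling concrete, here is a rough standalone sketch of the
quoted check. It is not the Cake source: the helper names, the
nanosecond units and the reading of mtu_time as "time to send one MTU
at the shaped rate" are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Time to serialise one MTU-sized packet at the shaped rate, in ns.
 * Hypothetical helper standing in for p->mtu_time. */
static uint64_t mtu_time_ns(uint64_t mtu_bytes, uint64_t rate_bps)
{
	assert(rate_bps > 0);
	return mtu_bytes * 8 * 1000000000ULL / rate_bps;
}

/* Same shape as the quoted condition: both thresholds must be exceeded. */
static bool over_target(uint64_t sojourn_ns, uint64_t target_ns,
			uint64_t mtu_ns, unsigned int bulk_flows)
{
	return sojourn_ns > target_ns &&
	       sojourn_ns > mtu_ns * bulk_flows * 4;
}

/* Sojourn time a packet has to exceed before over_target() is true,
 * i.e. the larger of the configured target and the scaled term. */
static uint64_t effective_target_ns(uint64_t target_ns, uint64_t mtu_ns,
				    unsigned int bulk_flows)
{
	uint64_t scaled = mtu_ns * bulk_flows * 4;

	return scaled > target_ns ? scaled : target_ns;
}
```

With made-up numbers (10 Mbit/s, 1500-byte MTU, 5 ms target), mtu_time
is 1.2 ms, so the effective target is 5 ms with one bulk flow, 48 ms
with ten and 153.6 ms with 32. These are illustrative values, not the
parameters of the test runs discussed above.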

-Toke