[Cake] More on 'target' corner cases - rate/target/interval confusion?
chromatix99 at gmail.com
Mon Nov 9 10:07:40 EST 2015
> On 9 Nov, 2015, at 14:12, Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> wrote:
> In the presence of a full link, that link having competing ‘full' flows in all 'tins', then how should cake split the link in terms of bandwidth?
That’s a good question, and one that I think becomes more critical at low bandwidths. I’ve tended towards generous allocations for that reason, so as to avoid causing trouble to low-latency applications.
The main requirement I keep in mind is that an application should not be able to guarantee itself an excessive bandwidth share simply by selecting a particular DSCP. At the same time, there are applications for which a relatively large, consistent bandwidth is a requirement for satisfactory performance (consider streaming video), such that best-effort traffic should defer to them. These are conflicting requirements, so a compromise has to be established somehow.
The current thresholds are at 100%, 15/16, 3/4 and 1/4. Under saturated conditions, this gives throughputs of 1/16, 3/16, 1/2 and 1/4. The “video” class (Tin 2) can usurp 3/4 of the bandwidth when competing against any mixture of best-effort and bulk traffic. This, admittedly, might turn out to be too much, so I could consider setting Tin 2’s threshold at 1/2 instead of 3/4.
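To make the arithmetic above concrete, here is a small sketch (not Cake’s actual code) of how those thresholds yield those saturated throughputs, assuming higher-numbered tins take strict priority and each tin’s threshold caps the total bandwidth used by itself plus all higher-priority tins:

```python
from fractions import Fraction

# Thresholds as fractions of the link rate, tin 0 (bulk) .. tin 3 (voice).
THRESHOLDS = [Fraction(1), Fraction(15, 16), Fraction(3, 4), Fraction(1, 4)]

def saturated_shares(thresholds):
    """Each tin's throughput share when every tin has saturating demand."""
    shares = [Fraction(0)] * len(thresholds)
    used = Fraction(0)  # bandwidth already taken by higher-priority tins
    for i in reversed(range(len(thresholds))):  # highest priority first
        shares[i] = max(thresholds[i] - used, Fraction(0))
        used += shares[i]
    return shares

print(saturated_shares(THRESHOLDS))
# -> [Fraction(1, 16), Fraction(3, 16), Fraction(1, 2), Fraction(1, 4)]
```

Note how the 1/16, 3/16, 1/2, 1/4 split falls out directly: each tin receives whatever headroom its threshold leaves after the higher tins are served.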
And yes, I have long noticed that Flent’s standard RRUL test doesn’t use Tin 2 at all.
> With each increasing priority of tin[0-3], we decrease the 'expected' bandwidth (good) but also as a result increase the target and interval.
Yes, there are a number of counter-intuitive things happening here.
Most of Cake’s latency-reducing power comes from the liberal application of flow isolation, *not* from AQM itself. Diffserv prioritisation plays a lesser role, mainly restoring the desired bandwidth allocations and replacing the reliance on measured queue fill level that some protocols presently use to stay out from underfoot. Both of these mechanisms primarily control the effects that one flow can have on another, and say little about the latency that a flow causes to itself.
This latter is the domain of AQM, specifically Codel, which is what we’re talking about when we mention “interval” and “target”. As I mentioned elsewhere recently, Codel is designed specifically to give congestion signals to TCP-like flows, and deals rather less efficiently with unresponsive and anti-responsive flows, which as a result tend to spend some time bouncing off the queue’s hard limit until Codel finds the correct operating point to control them. There are other AQMs which are designed with unresponsive flows more in mind, but which conversely perform less well with TCP-like flows.
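For readers less familiar with Codel, its control law (as specified in RFC 8289, which the behaviour described here follows) is simple to sketch: once the sojourn time has exceeded target for a full interval, signalling begins, and the spacing between successive signals shrinks as the inverse square root of the signal count:

```python
import math

# Illustrative sketch of Codel's drop-spacing schedule, per RFC 8289.
def next_drop_delay_us(interval_us, drop_count):
    """Time until the next congestion signal while in the dropping state."""
    return interval_us / math.sqrt(drop_count)

# With interval = 100 ms, the first few signal spacings in microseconds:
print([round(next_drop_delay_us(100_000, n)) for n in (1, 2, 3, 4)])
# -> [100000, 70711, 57735, 50000]
```

This is what “increases steadily in frequency” means in practice: a responsive TCP flow backs off before the frequency climbs far, while an unresponsive flow drives the count ever higher.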
A key design principle of Codel is that no packet whose sojourn time is below target will be signalled. However, if the sojourn time is consistently above target, signalling begins and increases steadily in frequency. It is also a fundamental truth that if it takes longer than target to transmit the previous packet, the following packet can have a “congested” sojourn time even if there is consistently precisely one packet in the queue (which is the ideal state). This is why I constrain “target” to be at least 1.5 packet times at MTU; the difference can be substantial at low bandwidths.
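The “at least 1.5 packet times at MTU” constraint can be sketched as follows (a simplified illustration, not Cake’s actual code; the 5 ms figure is Codel’s conventional default target):

```python
DEFAULT_TARGET_US = 5000  # Codel's conventional 5 ms default target

def min_target_us(bandwidth_bps, mtu_bytes=1500):
    """Floor 'target' at 1.5 serialisation times of an MTU-sized packet."""
    packet_time_us = mtu_bytes * 8 * 1_000_000 / bandwidth_bps
    return max(DEFAULT_TARGET_US, 1.5 * packet_time_us)

# At 100 Mbit/s an MTU packet serialises in 120 us, so the default holds;
# at 1 Mbit/s it takes 12 ms, so target rises to 18 ms.
print(min_target_us(100_000_000))  # -> 5000
print(min_target_us(1_000_000))    # -> 18000.0
```

This illustrates why the difference is substantial at low bandwidths: below roughly 3.6 Mbit/s (with a 1500-byte MTU), the serialisation floor exceeds the 5 ms default and takes over.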
But there are subtleties here too.
If there are multiple flows, isolated into multiple queues, then the effective packet-to-packet time for each queue will be increased proportionately. Early versions of Codel refused to signal on the last packet in the queue, to account for this. However, if there were a large number of occupied queues, this meant that the minimum queue fill could be rather high, and this seemed to lead to high induced latencies in fq_codel under heavy load. Due to the statistical multiplexing effect, it turned out to be sufficient to tune “target” as above for the final output bandwidth (even though this is unknown to fq_codel) and to remove the special status of the last remaining packet.
The same logic could naively be applied to traffic in separate tins. However, unlike queues for flow isolation, bandwidth is not shared evenly between tins. More subtly, traffic characteristics also differ - low-latency traffic tends to be unresponsive to TCP-style congestion signalling, and dropping any of it tends to reduce service quality in some way. Note that network-control traffic (most relevantly NTP) falls into the “voice” category. Since unresponsive flows aren’t what Codel is meant to deal with, the mere fact that “target” is higher is not meaningful - and in any case this has no effect on the primary flow-isolation mechanism.
The tin-specific tuning of target and interval was introduced when Cake had a separate hard shaper per tin. It was the obvious design at the time. Now that Cake uses soft shaping between tins (allowing any tin to use the full link bandwidth if uncontended), it’s possible that choosing identical target and interval for all tins might suffice. Alternatively, we might choose an even more conservative strategy.
But which - and how do we decide?
As a final point, I haven’t even mentioned “rtt” as the user-specified input to this mayhem. That parameter must be understood to be *separate* from both “target” and “interval”, even though Codel specifies the latter to be related to expected RTT. Simply put, the user tells us what the expected RTT is (on the understanding that an order of magnitude variation either way is typical), and we calculate “target” and “interval” to be as consistent with that estimate as is practical, given the link bandwidth and other constraints they have also specified. So there is a firm conceptual distinction between the user’s intent and the implementation details.
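That conceptual split can be sketched like so. The 5%-of-interval guideline comes from the Codel literature; Cake’s exact formula may differ, so treat this purely as an illustration of user intent (“rtt”) feeding derived parameters (“target”, “interval”):

```python
def codel_params_us(rtt_us, bandwidth_bps, mtu_bytes=1500):
    """Derive (target, interval) from the user's expected RTT - a sketch."""
    interval = rtt_us  # interval tracks the expected RTT
    packet_time_us = mtu_bytes * 8 * 1_000_000 / bandwidth_bps
    # target: ~5% of interval, floored at 1.5 MTU serialisation times
    target = max(interval * 0.05, 1.5 * packet_time_us)
    return target, interval

# 100 ms expected RTT on a 10 Mbit/s link:
print(codel_params_us(100_000, 10_000_000))  # -> (5000.0, 100000)
```

The point is that the user never sets “target” or “interval” directly; they state an estimate of path RTT, and the qdisc reconciles that with the link’s physical constraints.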
- Jonathan Morton