[Cake] Proposing COBALT

Fri May 20 08:18:11 EDT 2016

>> One of the major reasons why Codel fails on UDP floods is that its drop schedule is time-based.  This is the correct behaviour for TCP flows, which respond adequately to one congestion signal per RTT, regardless of the packet rate.  However, it means it is easily overwhelmed by high-packet-rate unresponsive (or anti-responsive, as with TCP acks) floods, which an attacker or lab test can easily produce on a high-bandwidth ingress, especially using small packets.
> 
> In essence I agree, but want to point out that the protocol itself does not really matter but rather the observed behavior of a flow. Civilized UDP applications (that expect their data to be carried over the best-effort internet) will also react to drops similar to decent TCP flows, and crappy TCP implementations might not. I would guess with the maturity of TCP stacks misbehaving TCP flows will be rarer than misbehaving UDP flows (which might be for example well-behaved fixed-rate isochronous flows that simply should never have been sent over the internet).

Codel properly handles both actual TCP flows and other flows supporting TCP-friendly congestion control.  The intent of COBALT is for BLUE to activate whenever Codel clearly cannot cope, rather than on a protocol-specific basis.  This happens to dovetail neatly with the way BLUE works anyway.

>> BLUE’s up-trigger should be on a packet drop due to overflow (only) targeting the individual subqueue managed by that particular BLUE instance.  It is not correct to trigger BLUE globally when an overall overflow occurs.  Note also that BLUE has a timeout between triggers, which should I think be scaled according to the estimated RTT.
> 
> That sounds nice in that no additional state is required. But with the current fq_codel I believe, the packet causing the memory limit overrun, is not necessarily from the flow that actually caused the problem to beginn with, and I doesn’t fq_codel actuall search the fattest flow and drops from there. But I guess that selection procedure could be run with blue as as well.

Yes, both fq_codel and Cake search for the longest extant queue and drop packets from that on overflow.  It is this longest queue which would receive the BLUE up-trigger at that point, which is not necessarily the queue for the arriving packet.

>> BLUE’s down-trigger is on the subqueue being empty when a packet is requested from it, again on a timeout.  To ensure this occurs, it may be necessary to retain subqueues in the DRR list while BLUE’s drop probability is nonzero.
> 
> Question, doesn’t this mean the affected flow will be throttled quite harshly? Will blue slowly decrease the drop probability p if the flow behaves? If so, blue could just disengage if p drops below a threshold?

Given that within COBALT, BLUE will normally only trigger on unresponsive flows, an aggressive up-trigger response from BLUE is in fact desirable.  Codel is far too meek to handle this situation; we should not seek to emulate it when designing a scheme to work around its limitations.

BLUE’s down-trigger decreases the drop probability by a smaller amount (say 1/4000) than the up-trigger increases it (say 1/400).  These figures are the best-performing configuration from the original paper, which is very readable, and behaviour doesn’t seem to be especially sensitive to the precise values (though only highly-aggregated traffic was considered, and probably on a long timescale).  For an actual implementation, I would choose convenient binary fractions, such as 1/256 up and 1/4096 down, and a relatively short trigger timeout.

If the relative load from the flow decreases, BLUE’s action will begin to leave the subqueue empty when serviced, causing BLUE’s drop probability to fall off gradually, potentially until it reaches zero.  At this point the subqueue is naturally reset and will react normally to subsequent traffic using it.

The BLUE paper: http://www.eecs.umich.edu/techreports/cse/99/CSE-TR-387-99.pdf

>> Note that this does nothing to improve the situation regarding fragmented packets.  I think the correct solution in that case is to divert all fragments (including the first) into a particular queue dependent only on the host pair, by assuming zero for src and dst ports and a “special” protocol number.  
> 
> I believe the RFC recommends using the SRC IP, DST IP, Protocol, Identity tuple, as otherwise all fragmented flows between a host pair will hash into the same bucket…

I disagree with that recommendation, because the Identity field will be different for each fragmented packet, even if many such packets belong to the same flow.  This would spread these packets across many subqueues and give them an unfair advantage over normal flows, which is the opposite of what we want.

Normal traffic does not include large numbers of fragmented packets (I would expect a mere handful from certain one-shot request-response protocols which can produce large responses), so it is better to shunt them to a single queue per host-pair.

 - Jonathan Morton