[Cake] Control theory and congestion control

Sun May 10 14:32:38 EDT 2015

> On 10 May, 2015, at 19:48, Sebastian Moeller <moeller0 at gmx.de> wrote:
> 
>> Congestion control looks like a simple problem too. If there is no congestion, increase the amount of data in flight; if there is, reduce it. We even have Explicit Congestion Notification now to tell us that crucial data point, but we could always infer it from dropped packets before.
> 
> I think we critically depend on being able to interpret lost packets as well, as a) not all network nodes use ECN signaling, and b) even those that do can go into “drop-everything” mode if overloaded.

Yes, but I consider that a degraded mode of operation, even if it is, for the time being, the dominant mode.

> 1) Competition with simple greedy non-ECN flows: if these push the router into the dropping regime, how will well-behaved ECN flows be able to compete?

Backwards compatibility for current ECN means dropping non-ECN packets that would have been marked.  That works, so we can use it as a model.

Backwards compatibility for “enhanced” ECN - let’s call it ELR for Explicit Load Regulation - would mean providing legacy ECN signals to legacy ECN traffic.  But, in the absence of flow isolation, if we only marked packets with ECN when they fell into the “fast down” category (which corresponds to their actual behaviour), then they’d get a clear advantage over ELR, similar to TCP Vegas vs. Reno back in the day (and for basically the same reason).

The solution is to provide robust flow isolation, and/or to ECN-mark packets in “hold” and “slow down” states as well as “fast down”.  This ensures that legacy ECN does not unfairly outcompete ELR, although it might reduce ECN traffic’s throughput.
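
To make the two marking policies concrete, here is a rough sketch of the per-packet decision a router might make. It is only an illustration: the names (Signal, Kind, router_action), the "elr:" result strings and the rule that non-ECN packets are dropped exactly where legacy ECN packets would be marked are my assumptions, not an existing API or a specified encoding.

```python
from enum import Enum

class Signal(Enum):          # congestion states named above (NONE is assumed)
    NONE = "none"
    HOLD = "hold"
    SLOW_DOWN = "slow down"
    FAST_DOWN = "fast down"

class Kind(Enum):            # what the packet's sender understands
    NON_ECN = "non-ECN"
    LEGACY_ECN = "legacy ECN"
    ELR = "ELR"              # hypothetical "enhanced" ECN

def legacy_congested(signal, flow_isolation):
    """Should legacy traffic see a congestion signal in this state?"""
    if flow_isolation:
        # Isolation protects ELR flows, so mark only on "fast down".
        return signal is Signal.FAST_DOWN
    # No isolation: mark conservatively so legacy ECN cannot outcompete ELR.
    return signal in (Signal.HOLD, Signal.SLOW_DOWN, Signal.FAST_DOWN)

def router_action(kind, signal, flow_isolation):
    """Return 'forward', 'drop', 'ce-mark' or 'elr:<state>' for one packet."""
    if signal is Signal.NONE:
        return "forward"
    if kind is Kind.ELR:
        return "elr:" + signal.value      # ELR senders get the full signal
    if not legacy_congested(signal, flow_isolation):
        return "forward"
    # Assumption: non-ECN packets are dropped where ECN ones would be marked.
    return "ce-mark" if kind is Kind.LEGACY_ECN else "drop"
```

With flow_isolation=False a legacy ECN packet is marked already in the "hold" state, which is where the possible throughput penalty for ECN traffic comes from.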

The other side of the compatibility coin is what happens when ELR traffic hits a legacy router (whether ECN enabled or not).  Such a router should be able to recognise ELR packets as ECN and perform ECN marking when appropriate, to be interpreted as a “fast down” signal.  Or, of course, to simply drop packets if it doesn’t even support ECN.
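
On the sender side, that fallback could look roughly like the sketch below. The function name and its arguments are hypothetical, and treating a lost packet the same as a legacy ECN mark is my assumption for the non-ECN router case.

```python
def interpret_feedback(elr_signal=None, ce_marked=False, lost=False):
    """Map feedback for one packet onto the state an ELR sender acts on.

    elr_signal: state string from an ELR-aware router, or None if absent.
    ce_marked:  True if a legacy ECN router set a congestion mark.
    lost:       True if the packet was dropped (assumed equal to a mark).
    """
    if ce_marked or lost:
        # A legacy router can only say "congested", so take the strongest
        # reading of it.
        return "fast down"
    return elr_signal

# A legacy mark overrides whatever finer-grained signal was also seen.
print(interpret_feedback(elr_signal="hold", ce_marked=True))
```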

> And how can the intermediate router control/check that a flow truly is well-behaved, especially with all the allergies against keeping per-flow state that routers seem to have?

Core routers don’t track flow state, but they are typically provisioned to not saturate their links in the first place.  Adequate backwards-compatibility handling will do here.

Edge routers are rather more capable of keeping sufficient per-flow state for effective flow isolation, as cake and fq_codel do.

Unresponsive flows are already just as much of a problem with ECN as they would be with ELR.  Flow isolation contains the problem neatly.  Transitioning to packet drops (ignoring both ECN and ELR) under overload conditions is also a good safety valve.
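
As a toy illustration of why flow isolation contains an unresponsive flow, consider per-flow queues served round-robin, each with its own length limit. This is a deliberately simplified sketch with a hypothetical IsolatingQueue class; it is not how cake or fq_codel are implemented.

```python
from collections import OrderedDict, deque

class IsolatingQueue:
    """Per-flow FIFO queues served in round-robin order."""

    def __init__(self, per_flow_limit):
        self.per_flow_limit = per_flow_limit
        self.flows = OrderedDict()   # flow_id -> deque, in service order

    def enqueue(self, flow_id, packet):
        """Queue a packet; return False if the flow's own queue is full."""
        queue = self.flows.setdefault(flow_id, deque())
        if len(queue) >= self.per_flow_limit:
            return False             # only the offending flow loses packets
        queue.append(packet)
        return True

    def dequeue(self):
        """Send one packet from the flow at the head of the rotation."""
        if not self.flows:
            return None
        flow_id, queue = self.flows.popitem(last=False)
        packet = queue.popleft()
        if queue:
            self.flows[flow_id] = queue   # rejoin at the back of the rotation
        return flow_id, packet

q = IsolatingQueue(per_flow_limit=10)
accepted = sum(q.enqueue("bulk", i) for i in range(100))  # unresponsive flood
q.enqueue("voip", 0)
q.enqueue("voip", 1)
order = [q.dequeue()[0] for _ in range(4)]
```

The flood only ever fills its own queue (accepted ends up at 10), and the two "voip" packets are sent within the first four dequeues instead of waiting behind the whole backlog.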

> Is a steady-state link, potentially outside of the home, truly likely enough that a non-oscillating congestion controller will effectively work better? In other words, would the intermediate node ever signal hold sufficiently often that implementing this stage seems reasonable?

It’s a fair question, and probably requires further research to answer reliably.  However, you should also probably consider the typical nature of the *bottleneck* link, rather than every possible Internet link.  It’s usually the last mile.

> True, but how stable is a network path actually over time frames of seconds?

Stable enough for VoIP and multiplayer twitch games to work already, if the link is idle.

> Could an intermediate router actually figure out what signal to send all flows realistically?

I described a possible method of doing so, using information already available in fq_codel and cake.  Whether it would work satisfactorily in practice is an open question.

 - Jonathan Morton