[Cake] Control theory and congestion control

Sebastian Moeller moeller0 at gmx.de
Mon May 11 03:36:30 EDT 2015


Hi Jonathan,

On May 10, 2015, at 20:32 , Jonathan Morton <chromatix99 at gmail.com> wrote:

> 
>> On 10 May, 2015, at 19:48, Sebastian Moeller <moeller0 at gmx.de> wrote:
>> 
>>> Congestion control looks like a simple problem too. If there is no congestion, increase the amount of data in flight; if there is, reduce it. We even have Explicit Congestion Notification now to tell us that crucial data point, but we could always infer it from dropped packets before.
>> 
>> I think we critically depend on being able to interpret lost packets as well, as a) not all network nodes use ECN signaling, and b) even those that do can go into “drop-everything” mode if overloaded.
> 
> Yes, but I consider that a degraded mode of operation.  Even if it is, for the time being, the dominant mode.
> 
>> 1) Competition with simple greedy non-ECN flows: if these push the router into the dropping regime, how will well-behaved ECN flows be able to compete?
> 
> Backwards compatibility for current ECN means dropping non-ECN packets that would have been marked.  That works, so we can use it as a model.

	Let me elaborate: what I mean is, if we get an ECN “reduce slowly” signal on the ECN flow and the router then goes into overload, what guarantees that our flow, after the double reduction of the “reduce slowly” ECN signal plus the reduce-hard drop, will not end up at a disadvantage against greedy non-ECN flows? It is probably quite simple, but I cannot see it right now.
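	To make the worry concrete, here is a toy AIMD sketch (the reaction factors are made-up illustrations, not constants from any real TCP implementation): the flow that honours the mark and then also eats the drop ends up behind the flow that only saw the drop.

```python
# Toy model of the "double reduction" concern: an ECN flow that first
# obeys a mark and then also suffers a drop ends up below a non-ECN
# flow that only sees the drop.  All factors here are illustrative
# assumptions, not values from any real TCP stack.

def react(cwnd, signal):
    """Return the new congestion window after one signal (toy model)."""
    if signal == "ack":        # additive increase
        return cwnd + 1.0
    if signal == "ecn_mark":   # "reduce slowly": a gentle cut
        return cwnd * 0.8
    if signal == "drop":       # classic multiplicative decrease
        return cwnd * 0.5
    return cwnd

# Router goes into overload shortly after marking: the ECN flow gets
# mark *then* drop; the greedy non-ECN flow only ever gets the drop.
ecn_cwnd = react(react(100.0, "ecn_mark"), "drop")   # 100 * 0.8 * 0.5
non_ecn_cwnd = react(100.0, "drop")                  # 100 * 0.5

print(ecn_cwnd, non_ecn_cwnd)  # the well-behaved flow is now behind
```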

> 
> Backwards compatibility for “enhanced” ECN - let’s call it ELR for Explicit Load Regulation - would mean providing legacy ECN signals to legacy ECN traffic.  But, in the absence of flow isolation, if we only marked packets with ECN when they fell into the “fast down” category (which corresponds to their actual behaviour), then they’d get a clear advantage over ELR, similar to TCP Vegas vs. Reno back in the day (and for basically the same reason).

	In other words, ELR would be outcompeted by classic ECN?

> 
> The solution is to provide robust flow isolation, and/or to ECN-mark packets in “hold” and “slow down” states as well as “fast down”.  This ensures that legacy ECN does not unfairly outcompete ELR, although it might reduce ECN traffic’s throughput.

	Well, if we want ELR to be the next big thing, we should aim to make it more competitive than classic ECN (assuming we get enough “buy-in” from the regulating parties, like the IETF and friends).
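	For illustration, a hypothetical sender-side mapping of the ELR signals discussed above might look like the sketch below; the thread only names “fast down”, “slow down” and “hold”, so the “up” signal and all response factors are my own assumptions.

```python
# Hypothetical sketch of a sender reacting to the ELR signals discussed
# in this thread.  The "up" signal and the response factors are
# assumptions for illustration; the thread does not pin down semantics.

ELR_RESPONSE = {
    "fast down": 0.5,   # like a legacy ECN mark / drop: halve the rate
    "slow down": 0.95,  # gentle decrease, avoids deep oscillation
    "hold":      1.0,   # steady state: keep the current rate
    "up":        1.05,  # probe for more bandwidth
}

def next_rate(rate, signal):
    # Unknown signals are treated conservatively, like "fast down".
    return rate * ELR_RESPONSE.get(signal, 0.5)

rate = 10_000.0  # arbitrary units
for s in ["up", "up", "hold", "slow down", "hold"]:
    rate = next_rate(rate, s)
```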

> 
> The other side of the compatibility coin is what happens when ELR traffic hits a legacy router (whether ECN enabled or not).  Such a router should be able to recognise ELR packets as ECN and perform ECN marking when appropriate, to be interpreted as a “fast down” signal.  Or, of course, to simply drop packets if it doesn’t even support ECN.
> 
>> And how can the intermediate router control/check that a flow truly is well-behaved, especially with all the allergies against keeping per-flow state that routers seem to have?
> 
> Core routers don’t track flow state, but they are typically provisioned to not saturate their links in the first place.  

	This I hear quite often; it always makes me wonder whether there is a better way to design a network to work well at capacity, instead of working around the problem by simply over-provisioning. I thought it is called network engineering, not network-“brute-forcing”…

> Adequate backwards-compatibility handling will do here.
> 
> Edge routers are rather more capable of keeping sufficient per-flow state for effective flow isolation, as cake and fq_codel do.

	But we already have a hard time convincing the operators of the edge routers (telcos, cable cos, …) to actually implement something saner than deep buffers in those devices. If they would at least own up to the head-end buffers for the downlink, we would be in much better shape; and if they would offer to handle uplink bufferbloat as part of their optional ISP-router-thingy, the issue would be stamped out already. But have you looked inside a typical CPE recently? Still a kernel from the 2.X series, so no codel/fq_codel and whatever other fixes were found in the several years since 2.X was the hot new thing…

> 
> Unresponsive flows are already just as much of a problem with ECN as they would be with ELR.  Flow isolation contains the problem neatly.  Transitioning to packet drops (ignoring both ECN and ELR) under overload conditions is also a good safety valve.
> 
>> Is the steady state, potentially outside of the home, link truly likely enough that an non-oscillating congestion controller will effectively work better? In other words would the intermediate node ever signal hold sufficiently often that implementing this stage seems reasonable?
> 
> It’s a fair question, and probably requires further research to answer reliably.  However, you should also probably consider the typical nature of the *bottleneck* link, rather than every possible Internet link.  It’s usually the last mile.

	I wish that were true… I switched to a 100/40 link and since then suffer from bad peering at my ISP. (This seems to be on purpose, to incentivise content providers to agree to paid peering with my ISP; but it seems only very few content providers went along, and so I feel that even the routers connecting different networks could work much better/fairer under saturating load… but I have no real data nor a way to measure it, so this is conjecture.)

> 
>> True, but how stable is a network path actually over seconds time frames?
> 
> Stable enough for VoIP and multiplayer twitch games to work already, if the link is idle.

	Both of which pretty much try to keep constant-bitrate UDP flows going, I believe, so they only care whether the immediate network path (and its alternatives) a) has sufficient headroom for the data, and b) keeps latency changes due to path re-routing inside the de-jitter/de-lag buffers in use. Put differently, these traffic types will not attempt to saturate a given link by themselves, so they are not the most sensitive probes for network path stability, no?
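	A minimal de-jitter sketch of that point (packet interval and buffer depth are made-up numbers): a latency step stays invisible to the application as long as it fits inside the buffer, so such traffic tells us little about the path beyond that.

```python
# Minimal de-jitter buffer sketch: play out each packet at a fixed
# offset after the *first* arrival; a packet is "late" only if path
# latency grows by more than the buffer depth.  Numbers illustrative.

def late_packets(arrival_ms, interval_ms=20, buffer_ms=60):
    """arrival_ms: arrival times of packets sent every interval_ms."""
    base = arrival_ms[0]
    late = 0
    for i, t in enumerate(arrival_ms):
        deadline = base + buffer_ms + i * interval_ms
        if t > deadline:
            late += 1
    return late

# 40 ms of added latency mid-stream stays inside a 60 ms buffer:
print(late_packets([0, 20, 60, 80, 100]))    # -> 0
# a 100 ms re-routing jump does not:
print(late_packets([0, 20, 140, 160, 180]))  # -> 3
```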

> 
>> Could an intermediate router actually figure out what signal to send all flows realistically?
> 
> I described a possible method of doing so, using information already available in fq_codel and cake.  

	We are back at the issue of how to make sure big routers learn codel/fq_codel as options in their AQM subsystems… It would be interesting to know what the Ciscos/Junipers/Huaweis of the world actually test in their private labs ;)
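	Just to speculate what such a method might look like: a CoDel-style AQM could map the queue sojourn time, relative to its target (CoDel defaults to 5 ms), onto the three signals. The thresholds and the mapping below are pure guesswork on my part, not Jonathan’s actual proposal.

```python
# Guessed mapping from CoDel-style sojourn times to ELR signals:
# compare against the target and look at whether the queue is growing.
# Thresholds and mapping are assumptions for illustration only.

TARGET_MS = 5.0  # CoDel's default target sojourn time

def elr_signal(sojourn_ms, prev_sojourn_ms):
    if sojourn_ms > 2 * TARGET_MS:
        return "fast down"   # standing queue: act hard
    if sojourn_ms > TARGET_MS:
        # above target: gentle pressure, none if already draining
        return "slow down" if sojourn_ms >= prev_sojourn_ms else "hold"
    # at or below target: steady state, or room to grow if draining
    return "hold" if sojourn_ms >= prev_sojourn_ms else "up"
```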

Best Regards
	Sebastian


> Whether they would work satisfactorily in practice is an open question.
> 
> - Jonathan Morton
> 



