[Cake] Control theory and congestion control

Jonathan Morton chromatix99 at gmail.com
Mon May 11 07:34:16 EDT 2015


>>>> Congestion control looks like a simple problem too. If there is no congestion, increase the amount of data in flight; if there is, reduce it. We even have Explicit Congestion Notification now to tell us that crucial data point, but we could always infer it from dropped packets before.
>>> 
>>> I think we critically depend on being able to interpret lost packets as well, as a) not all network nodes use ECN signaling, and b) even those that do can go into “drop-everything” mode if overloaded.
>> 
>> Yes, but I consider that a degraded mode of operation.  Even if it is, for the time being, the dominant mode.
>> 
>>> 1) Competition with simple greedy non-ECN flows: if these push the router into the dropping regime, how will well-behaved ECN flows be able to compete?
>> 
>> Backwards compatibility for current ECN means dropping non-ECN packets that would have been marked.  That works, so we can use it as a model.
> 
> 	Let me elaborate: what I mean is, if we get an ECN reduce-slowly signal on the ECN flow and the router goes into overload, what guarantees that our flow, with the doubled reduce-slowly ECN signal plus the reduce-hard drop, will not end up at a disadvantage against greedy non-ECN flows? It is probably quite simple, but I cannot see it right now.

There are two possible answers to this:

1) The most restrictive signal seen during an RTT is the one to react to.  So a “fast down” signal overrides anything else.

2) If ELR signals are being received which indicate that the bottleneck queue is basically under control, then it might be reasonable to assume that packet drops in the same RTT are *not* congestion related, but due to random losses.  This is not in itself novel behaviour: Westwood+ uses RTT variation to infer the same thing.
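Option 1 can be made concrete with a small sketch. ELR doesn't exist yet, so the signal names and their severity ordering below are purely illustrative assumptions, not a specification:

```python
# Hypothetical ELR signal levels, ordered least to most restrictive.
# ELR is unspecified; these names and this ordering are illustrative only.
SEVERITY = {"up": 0, "hold": 1, "slow_down": 2, "fast_down": 3}

def reaction_for_rtt(signals):
    """Return the single signal to react to for one RTT's worth of
    feedback: the most restrictive one seen wins."""
    return max(signals, key=SEVERITY.__getitem__)

# A drop (treated as "fast_down") overrides any gentler ELR feedback
# received within the same RTT.
```

So a sender collecting, say, `["hold", "up", "fast_down"]` during one RTT would back off hard, exactly as if only the drop had been seen.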

>> Backwards compatibility for “enhanced” ECN - let’s call it ELR for Explicit Load Regulation - would mean providing legacy ECN signals to legacy ECN traffic.  But, in the absence of flow isolation, if we only marked packets with ECN when they fell into the “fast down” category (which corresponds to their actual behaviour), then they’d get a clear advantage over ELR, similar to TCP Vegas vs. Reno back in the day (and for basically the same reason).
> 
> 	In other words ELR will be outcompeted by ECN classic?

Given such a naive implementation, yes.  Bear in mind that I’m essentially thinking out loud here.  The details are *not* all worked out.

>> The solution is to provide robust flow isolation, and/or to ECN-mark packets in “hold” and “slow down” states as well as “fast down”.  This ensures that legacy ECN does not unfairly outcompete ELR, although it might reduce ECN traffic’s throughput.
> 
> 	Well if we want ELR to be the next big thing we should aim to make it more competitive than classic ECN (assuming we get enough “buy-in” from the regulating parties, like IETF and friends)

It’s one possible approach.  Unambiguous throughput improvements probably do sell well.

I’m also now thinking about how to approximate fairness between ELR flows *without* flow isolation.  Since ELR would aim to provide a continuous signal rather than a stochastic one, this is actually a harder problem than it sounds; naively, a new flow would stay at minimum cwnd as long as a competing flow was saturating the link, since both would be given the same up/down signals.  There might need to be some non-obvious properties in the way the signal is provided to overcome that; I have the beginnings of an idea, but need to work it out.

>> Edge routers are rather more capable of keeping sufficient per-flow state for effective flow isolation, as cake and fq_codel do.
> 
> 	But we already have a hard time convincing the operators of the edge routers (telcos, cable cos…) to actually implement something saner than deep buffers at those devices. If they would at least own up to the head-end buffers for the downlink we would be in much better shape, and if they would offer to handle uplink bufferbloat as part of their optional ISP-router-thingy the issue would be stamped out already. But have you looked inside a typical CPE recently? Still a kernel from the 2.X series, so no codel/fq_codel and whatever other fixes were found in the several years since 2.X was the hot new thing…

For CPE at least, there exists a market opportunity for somebody to fill.  OpenWRT shows what can be done with existing hardware with some user engagement.  In principle, it’s only a short step from there to a new commercial product that Does the Right Things.

>>> Is a steady-state link, potentially outside of the home, truly likely enough that a non-oscillating congestion controller will effectively work better? In other words, would the intermediate node ever signal hold sufficiently often that implementing this stage seems reasonable?
>> 
>> It’s a fair question, and probably requires further research to answer reliably.  However, you should also probably consider the typical nature of the *bottleneck* link, rather than every possible Internet link.  It’s usually the last mile.
> 
> 	I wish that were true… I switched to a 100/40 link and since then suffer from bad peering by my ISP (this seems to be on purpose, to incentivise content providers to agree to paid peering with my ISP, but it seems only very few of the content providers went along, and so I feel that even the routers connecting different networks could work much better/fairer under saturating load… but I have no real data nor ways to measure it, so this is conjecture)

>> Core routers don’t track flow state, but they are typically provisioned to not saturate their links in the first place.  
> 
> 	This I hear quite often; it always makes me wonder whether there is a better way to design a network to work well at capacity, instead of working around this by simply over-provisioning. I thought it was called network engineering, not network-“brute-forcing”…

Peering points are one of the few “core like” locations where adequate capacity cannot be relied on.  Fortunately, what I hear is that peering links are often made using a set of 10GbE cables.  At 10Gbps, it’s entirely feasible to run fq_codel (probably based on IP addresses, not individual flows) in software, never mind in hardware.  So that’s a solvable problem at the technical level.
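Keying the queues on address pairs rather than full 5-tuples, as suggested above, might look something like this. This is an illustrative sketch only: the constants, the CRC32 choice, and the queue count are assumptions, not how fq_codel actually hashes (it uses the 5-tuple with a per-instance perturbation):

```python
import ipaddress
import zlib

# Assumed queue count for a hypothetical peering-point deployment.
NQUEUES = 1024

def queue_index(src_ip, dst_ip, perturbation=0x5CA1AB1E):
    """Map a host *pair* (not a full 5-tuple) to a queue index, so
    per-host-pair fairness is approximated without tracking every
    individual flow at 10Gbps line rate."""
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed)
    # CRC32 seeded with a perturbation value, reduced to a queue slot.
    return zlib.crc32(key, perturbation) % NQUEUES
```

The point is only that the per-packet work is a short hash plus an array index, which is cheap enough for software at those speeds.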

The fact that certain ISPs are *deliberately* restricting capacity is a thornier problem, and one that’s entirely political.

True core networks are, I hear, often made using optical switches rather than routers per se.  It’s a very alien environment.  I wouldn’t be surprised if there was difficulty even running something as simple as RED at the speeds they use.  I’m perfectly happy with the idea of them aiming to keep the bottlenecks elsewhere - at the peering points if nowhere else.

>>> True, but how stable is a network path actually over seconds time frames?
>> 
>> Stable enough for VoIP and multiplayer twitch games to work already, if the link is idle.
> 
> 	Both of which pretty much try to keep constant-bitrate UDP flows going, I believe, so they only care whether the immediate network path (or its alternatives) a) has sufficient headroom for the data and b) keeps latency changes due to path re-routing inside the de-jitter/de-lag buffer systems in use. Put differently, these traffic types will not attempt to saturate a given link by themselves, so they are not the most sensitive probes for network path stability, no?

I fully appreciate that *some* network paths may be unstable, and any congestion control system will need to chase the sweet spot up and down under such conditions.

Most of the time, however, baseline RTT is stable over timescales of the order of minutes, and available bandwidth is dictated by the last-mile link as the bottleneck.  BDP and therefore the ideal cwnd is a simple function of baseline RTT and bandwidth.  Hence there are common scenarios in which a steady-state condition can exist.  That’s enough to justify the “hold” signal.
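That "simple function" is just the bandwidth-delay product expressed in segments. A worked example, with assumed figures (the 100 Mbit/s rate, 20 ms RTT, and 1448-byte MSS are illustrative, not from the thread):

```python
def ideal_cwnd_segments(bandwidth_bps, rtt_s, mss_bytes=1448):
    """Ideal steady-state cwnd: the bandwidth-delay product
    (BDP = bandwidth * baseline RTT) expressed in MSS-sized segments."""
    bdp_bytes = bandwidth_bps / 8 * rtt_s
    return bdp_bytes / mss_bytes

# e.g. a 100 Mbit/s last mile with a 20 ms baseline RTT:
# BDP = 100e6/8 * 0.02 = 250000 bytes, i.e. roughly 172 segments.
```

As long as baseline RTT and last-mile bandwidth hold still, that target cwnd holds still too, which is exactly the condition under which a "hold" signal is meaningful.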

>>> Could an intermediate router actually figure out what signal to send all flows realistically?
>> 
>> I described a possible method of doing so, using information already available in fq_codel and cake.  
> 
> 	We are back at the issue of how to make sure big routers learn codel/fq_codel as options in their AQM subsystems… It would be interesting to know what the cisco’s/juniper’s/huawei’s of the world actually test in their private labs ;)






More information about the Cake mailing list