[Cake] Control theory and congestion control

David Lang david at lang.hm
Tue May 12 23:12:04 EDT 2015


On Wed, 13 May 2015, Jonathan Morton wrote:

>> On 13 May, 2015, at 02:23, David Lang <david at lang.hm> wrote:
>>
>>> 1) The most restrictive signal seen during an RTT is the one to react to. 
>>> So a “fast down” signal overrides anything else.
>>
>> Sorry for joining in late, but I think you are modeling something that 
>> doesn't match reality.
>>
>> Are you really going to see two bottlenecks in a given round trip (or even 
>> one connection)? Since you are ramping up fairly slowly, aren't you far more 
>> likely to only see one bottleneck (and once you get through that one, you are 
>> pretty much set through the rest of the link)?
>
> It’s important to remember that link speeds can change drastically over time 
> (usually if it’s *anything* wireless), that new competing traffic might reduce 
> the available bandwidth suddenly, and that as a result the bottleneck can 
> *move* from an ELR-enabled queue to a different queue which might not be.  I 
> consider that far more likely than an ELR queue abruptly losing control as 
> Sebastian originally suggested, but it looks similar to the endpoints.

agreed.

> So what you might have is an ELR queue happily controlling the cwnd based on 
> the assumption that *it* is the bottleneck, which until now it has been.  But 
> *after* that queue is another one which has just *become* the bottleneck, and 
> it’s not ELR - it’s plain ECN.  The only way it can tell the flow to slow down 
> is by giving “fast down” signals.  But that’s okay, the endpoints will react 
> to that just as they should do, as long as they correctly interpret the most 
> restrictive signal as being the operative one.

How would the ELR queue know that things should slow down? If it isn't the 
bottleneck, how does it know that there is a bottleneck at all, and that the 
traffic it's seeing isn't just the application behaving normally?

If the ELR queue is on the endpoint, it has some chance of knowing what the 
application is trying to do; but if it's on the router that was previously the 
bottleneck (usually the 'last mile' device), it has no way of knowing.
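To make the "most restrictive signal wins" rule from earlier in the thread concrete, here is a minimal sketch. The signal names and ordering are illustrative assumptions, not identifiers defined by ELR or any RFC:

```python
# Hypothetical congestion signals, ordered from most to least restrictive.
SEVERITY = ["fast_down", "slow_down", "hold", "slow_up", "fast_up"]

def operative_signal(signals_this_rtt):
    """Return the single signal an endpoint should react to this RTT."""
    if not signals_this_rtt:
        return "fast_up"  # no feedback at all: keep probing upward
    # min() by position in SEVERITY picks the most restrictive signal seen.
    return min(signals_this_rtt, key=SEVERITY.index)

# A plain-ECN bottleneck downstream of the ELR queue emits "fast down";
# it overrides the ELR queue's "hold" seen in the same RTT.
print(operative_signal(["hold", "slow_up", "fast_down"]))  # fast_down
```

This matches the scenario above: the ELR queue keeps signalling "hold" because it still believes it is the bottleneck, but the endpoint reacts to the harsher signal from the new bottleneck.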

> The safe option here is to react like an ECN-enabled flow, treating any lost 
> packet as a “fast down” signal.  An alternative is to treat a lost packet as 
> “slow down” *if* it is accompanied by “slow up” or “hold” signals in the same 
> RTT (ie. there’s a reasonable belief that we’re being properly controlled by 
> ELR).  While “slow down” doesn’t react as quickly as a new bottleneck queue 
> might prefer, it does at least respond; if enough drops appear, the ELR 
> queue’s control loop will be shifted to “fast up”, relinquishing control. 
> Or, if the AQM isn’t tight enough to do that, the corresponding increase in 
> RTT will do it instead.

It's the application or the endpoint machine that needs to react, not the queue 
device.
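The quoted loss-interpretation policy (running on the endpoint, per the point above) could be sketched roughly like this. Names are hypothetical; a real stack would work in terms of ECN marks and the actual ELR codepoints:

```python
def classify_loss(loss_seen, elr_signals_this_rtt):
    """Hypothetical endpoint policy for interpreting packet loss under ELR.

    The safe default treats any loss as "fast down", exactly as an
    ECN-enabled flow would.  But if the same RTT also carried "slow up"
    or "hold" ELR signals, there is reasonable belief we are still under
    an ELR queue's control, so soften the reaction to "slow down".
    """
    if not loss_seen:
        return None
    if {"slow_up", "hold"} & set(elr_signals_this_rtt):
        return "slow_down"
    return "fast_down"

print(classify_loss(True, ["hold"]))  # slow_down
print(classify_loss(True, []))        # fast_down
```

As the quote notes, if drops keep appearing, the ELR queue's own control loop shifts toward "fast up" and relinquishes control, so the softened reaction is only transitional.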

>> (if it's a new flow, it should start slow and ramp up, so you, and the other 
>> affected flows, should all be good with a 'slow down' signal)
>
> Given that slow-start grows the cwnd exponentially, that might not be the case 
> after the first few RTTs.  But that’s all part of the control loop, and ELR 
> would normally signal it with the CE codepoint rather than dropping packets. 
> Sebastian’s scenario of “slow down” suddenly changing to “omgwtfbbq drop 
> everything now” within the same queue is indeed unlikely.
>
>>> I fully appreciate that *some* network paths may be unstable, and any 
>>> congestion control system will need to chase the sweet spot up and down 
>>> under such conditions.
>>>
>>> Most of the time, however, baseline RTT is stable over timescales of the 
>>> order of minutes, and available bandwidth is dictated by the last-mile link 
>>> as the bottleneck.  BDP and therefore the ideal cwnd is a simple function of 
>>> baseline RTT and bandwidth.  Hence there are common scenarios in which a 
>>> steady-state condition can exist.  That’s enough to justify the “hold” 
>>> signal.
>>
>> Unless you prevent other traffic from showing up on the network (phones 
>> checking e-mail, etc.), I don't believe that you are ever going to have 
>> stable bandwidth available for any noticeable timeframe.
>
> On many links, light traffic such as e-mail will disturb the balance too 
> little to even notice, especially with flow isolation.

This depends on the bandwidth and the type of e-mail. It's very common for 
single e-mails in an office environment to be several MB (and not just the ones 
with big document or spreadsheet attachments, but things like holiday party 
announcements and other similar mail).

E-mail to a mobile device can have a rather significant impact on cell or wifi 
bandwidth.
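As a sanity check on the BDP point quoted earlier (ideal cwnd is just bandwidth times baseline RTT), here is the arithmetic with illustrative numbers that are not from the thread:

```python
def ideal_cwnd_bytes(bandwidth_bps, base_rtt_s):
    # BDP: the steady-state window a "hold" signal would aim to pin.
    return bandwidth_bps * base_rtt_s / 8  # bits -> bytes

# Hypothetical last-mile link: 20 Mbit/s with a 40 ms baseline RTT.
print(ideal_cwnd_bytes(20e6, 0.040))  # 100000.0 bytes, ~69 segments of 1448 B
```

A single multi-MB e-mail is tens of such windows, which is why it can visibly perturb a link of this size even though it is "light" traffic.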

>  Assuming ELR is 
> implemented as per my later post, running without flow isolation will allow 
> light traffic to perturb the ELR signal slightly, converting a “hold” into a 
> random sequence of “slow up”, “hold” and “slow down”, but this will 
> self-correct conservatively, with ELR transitioning to a true “slow up” 
> briefly if required.
>
> Of course, as with any speculation of this nature, simulations and other 
> experiments will tell a more convincing story.

I have a significant distrust of simulations at this point. We can only simulate 
how we think the devices in the network act. Bufferbloat came to be because of 
the disconnect between the mental models of the people designing the protocols 
and the people designing the equipment.

David Lang

