* [Cake] Control theory and congestion control
@ 2015-05-09 19:02 Jonathan Morton
2015-05-10 3:35 ` Dave Taht
2015-05-10 16:48 ` [Cake] " Sebastian Moeller
0 siblings, 2 replies; 18+ messages in thread
From: Jonathan Morton @ 2015-05-09 19:02 UTC (permalink / raw)
To: cake
> The "right" amount of buffering is *1* packet, all the time (the goal is
nearly 0 latency with 100% utilization). We are quite far from achieving
that on anything...
And control theory shows, I think, that we never will unless the mechanisms
available to us for signalling congestion improve. ECN is good, but it's
not sufficient to achieve that ultimate goal. I'll try to explain why.
Aside from computer networking, I also dabble in computer simulated trains.
Some of my bigger projects involve detailed simulations of what goes on
inside them, especially the older ones which are relatively simple. These
were built at a time when the idea of putting anything as delicate as a
transistor inside what was effectively a megawatt-class power station was
unthinkable, so the control gear tended to be electromechanical or even
electropneumatic. The control laws therefore tended to be the simplest ones
they could get away with.
The bulk of the generated power went into the main traction circuit, where
a dedicated main generator is connected rather directly to the traction
motors through a small amount of switchgear (mainly to reverse the fields
on the motors at either end of the line). Control of the megawatts of
power surging through this circuit was effected by varying the excitation
of the main generator. Excitation is in turn provided by shunting the
auxiliary voltage through an automatic rheostat known as the Load Regulator
before it reaches the field winding of the generator. Without field
current, the generator produces no power.
The load regulator is what I want to focus on here. Its job was to adjust
the output of the generator to match the power - more precisely the torque
- that the engine was capable of producing (or, in English Electric locos
at least, the torque set by the driver's controls, which wasn't always the
maximum). The load regulator had a little electric motor to move it up and
down. A good proxy for engine torque was available in the form of the fuel
rack position; the torque output of a diesel engine is closely related to
the amount of fuel injected per cycle. The fuel rack, of course, was
controlled by the governor which was set to maintain a particular engine
speed; a straightforward PI control problem solved by a reasonably simple
mechanical device.
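To make the control law concrete, here is a toy discrete PI loop in C - a minimal sketch only, with an invented first-order engine response and made-up gains, not a model of any real governor:

    /* Toy discrete PI controller: hold a target engine speed by moving the
     * fuel rack.  The gains, time step and first-order "engine" response
     * are invented purely for illustration. */
    #include <stdio.h>

    int main(void)
    {
        const double target_rpm = 850.0;   /* speed asked for by the driver's controls */
        const double kp = 0.002, ki = 0.001, dt = 0.1;
        double rpm = 400.0, rack = 0.0, integral = 0.0;

        for (int step = 0; step < 200; step++) {
            double error = target_rpm - rpm;
            integral += error * dt;
            rack = kp * error + ki * integral;   /* the PI law */
            if (rack < 0.0) rack = 0.0;          /* fuel rack travel limits */
            if (rack > 1.0) rack = 1.0;
            rpm += (rack * 1000.0 - rpm) * 0.05; /* crude engine response */
            if (step % 50 == 0)
                printf("t=%5.1fs rpm=%6.1f rack=%4.2f\n", step * dt, rpm, rack);
        }
        return 0;
    }

The real governor did essentially the same job mechanically, with flyweights and springs instead of arithmetic.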
So it looks like a simple control problem; if the torque is too low,
increase the excitation, and vice versa.
Congestion control looks like a simple problem too. If there is no
congestion, increase the amount of data in flight; if there is, reduce it.
We even have Explicit Congestion Notification now to tell us that crucial
data point, but we could always infer it from dropped packets before.
So what does the load regulator's control system look like? It has as many
as five states: fast down, slow down, hold, slow up, fast up. It turns out
that trains really like changes in tractive effort to be slow and smooth,
and as infrequent as possible. So while a very simple "bang bang" control
scheme would be possible, it would inevitably oscillate around the set
point instead of settling on it. Introducing a central hold state allows it
to settle when cruising at constant speed, and the two slow states allow
the sort of fine adjustments needed as a train gradually accelerates or
slows, putting the generator only slightly out of balance with the engine.
The fast states remain to allow for quick response to large changes - the
driver moves the throttle, or the motors abruptly reconfigure for a
different speed range (the electrical equivalent of changing gear).
On the Internet, we're firmly stuck with bang-bang control. As big an
improvement as ECN is, it still provides only one bit of information to the
sender: whether or not there was congestion reported during the last RTT.
Thus we can only use the "slow up" and "fast down" states of our virtual
load regulator (except for slow start, which ironically uses the "fast up"
state), and we are doomed to oscillate around the ideal congestion window,
never actually settling on it.
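To see that oscillation in miniature, here is a toy AIMD sender in C reacting to that single bit; the BDP, the slow-start threshold and the AIMD constants are arbitrary illustrative choices, not taken from any real stack:

    /* Toy AIMD congestion window against one bit of feedback per RTT. */
    #include <stdio.h>

    int main(void)
    {
        const double bdp = 100.0;           /* the ideal window, in packets */
        double cwnd = 1.0;

        for (int rtt = 0; rtt < 60; rtt++) {
            int congested = cwnd > bdp;     /* the single bit we get back */
            if (congested)
                cwnd *= 0.5;                /* "fast down": multiplicative decrease */
            else if (cwnd < bdp / 2)
                cwnd *= 2.0;                /* slow start: the ironic "fast up" */
            else
                cwnd += 1.0;                /* "slow up": additive increase */
            printf("rtt=%2d cwnd=%6.1f%s\n", rtt, cwnd, congested ? " (mark)" : "");
        }
        return 0;
    }

The window never rests at 100 packets: it climbs past, gets the one-bit signal, halves, and climbs again.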
Bufferbloat is fundamentally about having insufficient information at the
endpoints about conditions in the network. We've done a lot to improve
that, by moving from zero information to one bit per RTT. But to achieve
that holy grail, we need more information still.
Specifically, we need to know when we're at the correct BDP, not just when
it's too high. And it'd be nice if we also knew if we were close to it. But
there is currently no way to provide that information from the network to
the endpoints.
- Jonathan Morton
* Re: [Cake] Control theory and congestion control
2015-05-09 19:02 [Cake] Control theory and congestion control Jonathan Morton
@ 2015-05-10 3:35 ` Dave Taht
2015-05-10 6:55 ` Jonathan Morton
` (2 more replies)
2015-05-10 16:48 ` [Cake] " Sebastian Moeller
1 sibling, 3 replies; 18+ messages in thread
From: Dave Taht @ 2015-05-10 3:35 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake, codel, bloat
On Sat, May 9, 2015 at 12:02 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>> The "right" amount of buffering is *1* packet, all the time (the goal is
>> nearly 0 latency with 100% utilization). We are quite far from achieving
>> that on anything...
>
> And control theory shows, I think, that we never will unless the mechanisms
> available to us for signalling congestion improve. ECN is good, but it's not
> sufficient to achieve that ultimate goal. I'll try to explain why.
The conex and dctcp work explored using ecn for multi-bit signalling.
While this is a great set of analogies below (and why I am broadening
the cc) there are two things missing from it.
>
> Aside from computer networking, I also dabble in computer simulated trains.
> Some of my bigger projects involve detailed simulations of what goes on
> inside them, especially the older ones which are relatively simple. These
> were built at a time when the idea of putting anything as delicate as a
> transistor inside what was effectively a megawatt-class power station was
> unthinkable, so the control gear tended to be electromechanical or even
> electropneumatic. The control laws therefore tended to be the simplest ones
> they could get away with.
>
> The bulk of the generated power went into the main traction circuit, where a
> dedicated main generator is connected rather directly to the traction motors
> through a small amount of switchgear (mainly to reverse the fields on the
> motors at either end off the line). Control of the megawatts of power
> surging through this circuit was effected by varying the excitation of the
> main generator. Excitation is in turn provided by shunting the auxiliary
> voltage through an automatic rheostat known as the Load Regulator before it
> reaches the field winding of the generator. Without field current, the
> generator produces no power.
>
> The load regulator is what I want to focus on here. Its job was to adjust
> the output of the generator to match the power - more precisely the torque -
> that the engine was capable of producing (or, in English Electric locos at
> least, the torque set by the driver's controls, which wasn't always the
> maximum). The load regulator had a little electric motor to move it up and
> down. A good proxy for engine torque was available in the form of the fuel
> rack position; the torque output of a diesel engine is closely related to
> the amount of fuel injected per cycle. The fuel rack, of course, was
> controlled by the governor which was set to maintain a particular engine
> speed; a straightforward PI control problem solved by a reasonably simple
> mechanical device.
>
> So it looks like a simple control problem; if the torque is too low,
> increase the excitation, and vice versa.
>
> Congestion control looks like a simple problem too. If there is no
> congestion, increase the amount of data in flight; if there is, reduce it.
> We even have Explicit Congestion Notification now to tell us that crucial
> data point, but we could always infer it from dropped packets before.
>
> So what does the load regulator's control system look like? It has as many
> as five states: fast down, slow down, hold, slow up, fast up. It turns out
> that trains really like changes in tractive effort to be slow and smooth,
> and as infrequent as possible. So while a very simple "bang bang" control
> scheme would be possible, it would inevitably oscillate around the set point
> instead of settling on it. Introducing a central hold state allows it to
> settle when cruising at constant speed, and the two slow states allow the
> sort of fine adjustments needed as a train gradually accelerates or slows,
> putting the generator only slightly out of balance with the engine. The fast
> states remain to allow for quick response to large changes - the driver
> moves the throttle, or the motors abruptly reconfigure for a different speed
> range (the electrical equivalent of changing gear).
>
> On the Internet, we're firmly stuck with bang-bang control. As big an
> improvement as ECN is, it still provides only one bit of information to the
> sender: whether or not there was congestion reported during the last RTT.
> Thus we can only use the "slow up" and "fast down" states of our virtual
> load regulator (except for slow start, which ironically uses the "fast up"
> state), and we are doomed to oscillate around the ideal congestion window,
> never actually settling on it.
>
> Bufferbloat is fundamentally about having insufficient information at the
> endpoints about conditions in the network.
Well said.
> We've done a lot to improve that,
> by moving from zero information to one bit per RTT. But to achieve that holy
> grail, we need more information still.
context being aqm + ecn, fq, fq+aqm, fq+aqm+ecn, dctcp, conex, etc.
> Specifically, we need to know when we're at the correct BDP, not just when
> it's too high. And it'd be nice if we also knew if we were close to it. But
> there is currently no way to provide that information from the network to
> the endpoints.
This is where I was pointing out that FQ and the behavior of multiple
flows in their two phases (slow start and congestion avoidance)
provide a few pieces of useful information that could possibly be
used to get closer to the ideal.
We know total service times for all active flows. We also have a
separate calculable service time for "sparse flows" in two algorithms
we understand deeply.
We could have some grip on the history of flows that are not currently queued.
We know that the way we currently seek new set points tends to be
bursty ("chasing the inchworm" - I still gotta use that title on a
paper!).
New flows tend to be extremely bursty - and new flows in the real
world also tend to be pretty short, with 95% of all web traffic
fitting into a single IW10.
If, end to end, we know we are being FQ'd, and yet are bursting to find
new setpoints, we can infer from the packet spacing observed at the
other endpoint what the contention really is.
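As a back-of-envelope sketch of that inference (all numbers below are assumed): if the bottleneck runs round-robin FQ over N backlogged flows, a burst we send back-to-back arrives spaced roughly N packet serialisation times apart, so the receiver can estimate N from the gaps - provided it also has some idea of the bottleneck rate.

    /* Back-of-envelope: under round-robin FQ with N backlogged flows, one
     * flow's back-to-back burst leaves the bottleneck about N packet
     * serialisation times apart.  Link rate and measured gap are assumed. */
    #include <stdio.h>

    int main(void)
    {
        const double link_bps   = 100e6;        /* assumed bottleneck rate      */
        const double pkt_bits   = 1500 * 8;     /* full-size packet             */
        const double observed_s = 480e-6;       /* gap measured at the receiver */

        double pkt_time = pkt_bits / link_bps;      /* 120 us at 100 Mbit   */
        double flows    = observed_s / pkt_time;    /* estimated contention */

        printf("about %.0f backlogged flows share the bottleneck\n", flows);
        return 0;
    }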
There was a Stanford result for tens of thousands of flows that found
an ideal setpoint much lower than what we are achieving for dozens, at
much higher rates.
A control theory-ish issue with codel is that it depends on an
arbitrary ideal (5 ms) as a definition for "good queue", where "a gooder
queue" is, in my definition at the moment, "1 packet outstanding ever
closer to 100% of the time while there is 100% utilization".
We could continue to bang on things (reducing the target or other
methods) and aim for a lower ideal setpoint until utilization dropped
below 100%.
Which becomes easier the more flows we know are in progress.
> - Jonathan Morton
>
>
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake
>
--
Dave Täht
Open Networking needs **Open Source Hardware**
https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
* Re: [Cake] Control theory and congestion control
2015-05-10 3:35 ` Dave Taht
@ 2015-05-10 6:55 ` Jonathan Morton
2015-05-10 17:00 ` [Cake] [Codel] " Sebastian Moeller
2015-05-10 14:46 ` [Cake] " Jonathan Morton
2015-05-10 17:04 ` [Cake] [Codel] " Sebastian Moeller
2 siblings, 1 reply; 18+ messages in thread
From: Jonathan Morton @ 2015-05-10 6:55 UTC (permalink / raw)
To: Dave Taht; +Cc: cake, codel, bloat
> On 10 May, 2015, at 06:35, Dave Taht <dave.taht@gmail.com> wrote:
>
> On Sat, May 9, 2015 at 12:02 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>>> The "right" amount of buffering is *1* packet, all the time (the goal is
>>> nearly 0 latency with 100% utilization). We are quite far from achieving
>>> that on anything...
>>
>> And control theory shows, I think, that we never will unless the mechanisms
>> available to us for signalling congestion improve. ECN is good, but it's not
>> sufficient to achieve that ultimate goal. I'll try to explain why.
>
> The conex and dctcp work explored using ecn for multi-bit signalling.
A quick glance at those indicates that they’re focusing on the echo path - getting the data back from the receiver to the sender. That’s the *easy* part; all you need is a small TCP option, which can be slotted into the padding left by TCP Timestamps and/or SACK, so it doesn’t even take any extra space.
But they do nothing to address the problem of allowing routers to provide a “hold” signal. Even a single ECN mark has to be taken to mean “back off”; being able to signal that more than one ECN mark happened in one RTT simply means that you now have a way to say “back off harder”.
The problem is that we need a three-bit signal (five new-style signalling states, two states indicating legacy ECN support, and one “ECN unsupported” state) at the IP layer to do it properly, and we’re basically out of bits there, at least in IPv4. The best solution I can think of right now is to use both of the ECT states somehow, but we’d have to make sure that doesn’t conflict too badly with existing uses of ECT(1), such as the “nonce sum”. Backwards and forwards compatibility here is essential.
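For reference, the ECN field gives us only four codepoints, and the sketch below shows one purely hypothetical way to wring a five-state signal out of the two ECT codepoints by re-marking a fraction of packets per RTT - an illustration of the design space, not a proposal, and precisely the kind of trick that has to be reconciled with the nonce sum:

    /* The ECN field (RFC 3168) and one hypothetical five-state overlay that
     * re-marks a fraction of ECT(0) packets as ECT(1) per RTT.  Purely an
     * illustration of the design space, not a proposal. */
    #include <stdio.h>

    enum ecn   { NOT_ECT = 0x0, ECT_1 = 0x1, ECT_0 = 0x2, CE = 0x3 };
    enum state { FAST_UP, SLOW_UP, HOLD, SLOW_DOWN, FAST_DOWN };

    /* Hypothetical: fraction of the flow's packets flipped ECT(0) -> ECT(1)
     * in one RTT; CE keeps its legacy "back off hard" meaning. */
    static double ect1_fraction(enum state s)
    {
        static const double frac[] = { 0.00, 0.25, 0.50, 0.75, 1.00 };
        return frac[s];
    }

    int main(void)
    {
        printf("codepoints: Not-ECT=%d ECT(1)=%d ECT(0)=%d CE=%d\n",
               NOT_ECT, ECT_1, ECT_0, CE);
        printf("HOLD -> re-mark %.0f%% of packets as ECT(1) this RTT\n",
               100.0 * ect1_fraction(HOLD));
        return 0;
    }

Counting marks over an RTT like this would also need the receiver to echo the ratio back, which is the easy part noted above.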
I’m thinking about the problem.
>> Bufferbloat is fundamentally about having insufficient information at the
>> endpoints about conditions in the network.
>
> Well said.
>
>> We've done a lot to improve that,
>> by moving from zero information to one bit per RTT. But to achieve that holy
>> grail, we need more information still.
>
> context being aqm + ecn, fq, fq+aqm, fq+aqm+ecn, dctcp, conex, etc.
>
>> Specifically, we need to know when we're at the correct BDP, not just when
>> it's too high. And it'd be nice if we also knew if we were close to it. But
>> there is currently no way to provide that information from the network to
>> the endpoints.
>
> This is where I was pointing out that FQ and the behavior of multiple
> flows in their two phases (slow start and congestion avoidance)
> provides a few pieces of useful information that could possibly be
> used to get closer to the ideal.
There certainly is enough information available in fq_codel and cake to derive a five-state congestion signal, rather than a two-state one, with very little extra effort.
Flow is sparse -> “Fast up”
Flow is saturating, but no standing queue -> “Slow up”
Flow is saturating, with small standing queue -> “Hold”
Flow is saturating, with large standing queue -> “Slow down”
Flow is saturating, with large, *persistent* standing queue -> “Fast down”
In simple terms, “fast” here means “multiplicative” and “slow” means “additive”, in the sense of AIMD being the current standard for TCP behaviour. AIMD itself is a result of the two-state “bang-bang” control model introduced back in the 1980s.
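As a sketch of how small the classification itself could be - the thresholds below (one MTU of standing queue, codel's target and interval) are placeholders for illustration, not anything fq_codel or cake computes today:

    /* Sketch: map per-flow queue observations onto the five signals above.
     * Thresholds are illustrative, not existing fq_codel/cake behaviour. */
    #include <stdbool.h>
    #include <stdio.h>

    enum signal { FAST_UP, SLOW_UP, HOLD, SLOW_DOWN, FAST_DOWN };

    struct flow_obs {
        bool     sparse;           /* flow is on the sparse-flow list          */
        unsigned standing_bytes;   /* estimated standing queue for this flow   */
        unsigned sojourn_us;       /* latest packet sojourn time               */
        unsigned above_target_us;  /* how long sojourn has stayed above target */
    };

    static enum signal classify(const struct flow_obs *f, unsigned mtu,
                                unsigned target_us, unsigned interval_us)
    {
        if (f->sparse)
            return FAST_UP;
        if (f->standing_bytes == 0)
            return SLOW_UP;
        if (f->standing_bytes <= mtu)
            return HOLD;
        if (f->sojourn_us > target_us && f->above_target_us > interval_us)
            return FAST_DOWN;      /* the only case codel reacts to today */
        return SLOW_DOWN;
    }

    int main(void)
    {
        struct flow_obs f = { false, 3000, 7000, 20000 };
        printf("signal = %d\n", classify(&f, 1514, 5000, 100000));
        return 0;
    }

A legacy ECN mark would then correspond to only the last of those five outputs.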
It’s worth remembering that the Great Internet Congestion Collapse Event was 30 years ago, and ECN was specified 15 years ago.
> A control theory-ish issue with codel is that it depends on an arbitrary ideal (5ms) as a definition for "good queue", where "a
> gooder queue” is, in my definition at the moment, "1 packet outstanding ever closer to 100% of the time while there is 100% utilization”.
As the above table shows, Codel reacts (by design) only to the most extreme situation that we would want to plug into an improved congestion-control model. It’s really quite remarkable, in that context, that it works as well as it does. I don’t think we can hope to do significantly better until a better signalling mechanism is available.
But it does highlight that the correct meaning of an ECN mark is “back off hard, now”. That’s how it’s currently interpreted by TCPs, in accordance with the ECN RFCs, and Codel relies on that behaviour too. We have to use some other, deliberately softer signal to give a “hold” or even a “slow down” indication.
- Jonathan Morton
* Re: [Cake] Control theory and congestion control
2015-05-10 3:35 ` Dave Taht
2015-05-10 6:55 ` Jonathan Morton
@ 2015-05-10 14:46 ` Jonathan Morton
2015-05-10 17:04 ` [Cake] [Codel] " Sebastian Moeller
2 siblings, 0 replies; 18+ messages in thread
From: Jonathan Morton @ 2015-05-10 14:46 UTC (permalink / raw)
To: Dave Taht; +Cc: cake, codel, bloat
> On 10 May, 2015, at 06:35, Dave Taht <dave.taht@gmail.com> wrote:
>
> New flows tend to be extremely bursty - and new flows in the real
> world also tend to be pretty short, with 95% of all web traffic
> fitting into a single IW10.
There is some hope that HTTP/2 will reduce the prevalence of this characteristic. It might take a while to reach full effect, due to how much sharding there is right now, but I’m mildly optimistic - it’s an application software change rather than at kernel level. So there’ll be more flows spending more time in the congestion-avoidance state than in slow-start.
Meanwhile, I understand the reasons behind IW10, but it’s clear that pacing those packets according to the RTT measured during the TCP handshake is desirable. That *does* need kernel support, but it has the fairly large benefit of not attempting to stuff ten packets into a buffer at the same time. On drop-tail FIFOs, that inevitably leads to a spike in induced latency (more so if several flows start up in parallel) and a relatively high chance of burst loss, requiring retransmission of some of those packets anyway.
Aside from sch_fq, what support for TCP pacing is out there?
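For scale, with assumed numbers (RTT, MSS): pacing an IW10 over the handshake RTT simply means one segment every RTT/10.

    /* Back-of-envelope for pacing an IW10 burst over the handshake RTT -
     * a sketch of the idea, not how sch_fq or any TCP stack computes it. */
    #include <stdio.h>

    int main(void)
    {
        const double rtt_ms = 50.0;     /* RTT measured from the SYN/SYN-ACK */
        const int    iw     = 10;       /* initial window, packets           */
        const int    mss    = 1448;     /* payload bytes per segment         */

        double gap_ms   = rtt_ms / iw;                      /* one segment per gap */
        double rate_bps = iw * mss * 8.0 / (rtt_ms / 1e3);  /* implied pacing rate */

        printf("send one %d-byte segment every %.1f ms (~%.2f Mbit/s)\n",
               mss, gap_ms, rate_bps / 1e6);
        return 0;
    }

At a 50 ms RTT that is only about 2.3 Mbit/s - far gentler on a drop-tail FIFO than ten back-to-back packets.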
> If e2e we know we are being FQ´d…
In general, E2E we don’t know what’s in the middle unless we receive notice about it. Or unless we’re in a lab.
- Jonathan Morton
* Re: [Cake] Control theory and congestion control
2015-05-09 19:02 [Cake] Control theory and congestion control Jonathan Morton
2015-05-10 3:35 ` Dave Taht
@ 2015-05-10 16:48 ` Sebastian Moeller
2015-05-10 18:32 ` Jonathan Morton
1 sibling, 1 reply; 18+ messages in thread
From: Sebastian Moeller @ 2015-05-10 16:48 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake
Hi Jonathan,
interesting post, lots of points to think about ;)
On May 9, 2015, at 21:02 , Jonathan Morton <chromatix99@gmail.com> wrote:
> > The "right" amount of buffering is *1* packet, all the time (the goal is nearly 0 latency with 100% utilization). We are quite far from achieving that on anything...
>
> And control theory shows, I think, that we never will unless the mechanisms available to us for signalling congestion improve. ECN is good, but it's not sufficient to achieve that ultimate goal. I'll try to explain why.
I wonder, given the potentially hostile state of the internet, can we realistically expect more than what we have right now?
>
> Aside from computer networking, I also dabble in computer simulated trains. Some of my bigger projects involve detailed simulations of what goes on inside them, especially the older ones which are relatively simple. These were built at a time when the idea of putting anything as delicate as a transistor inside what was effectively a megawatt-class power station was unthinkable, so the control gear tended to be electromechanical or even electropneumatic. The control laws therefore tended to be the simplest ones they could get away with.
>
> The bulk of the generated power went into the main traction circuit, where a dedicated main generator is connected rather directly to the traction motors through a small amount of switchgear (mainly to reverse the fields on the motors at either end off the line). Control of the megawatts of power surging through this circuit was effected by varying the excitation of the main generator. Excitation is in turn provided by shunting the auxiliary voltage through an automatic rheostat known as the Load Regulator before it reaches the field winding of the generator. Without field current, the generator produces no power.
>
> The load regulator is what I want to focus on here. Its job was to adjust the output of the generator to match the power - more precisely the torque - that the engine was capable of producing (or, in English Electric locos at least, the torque set by the driver's controls, which wasn't always the maximum). The load regulator had a little electric motor to move it up and down. A good proxy for engine torque was available in the form of the fuel rack position; the torque output of a diesel engine is closely related to the amount of fuel injected per cycle. The fuel rack, of course, was controlled by the governor which was set to maintain a particular engine speed; a straightforward PI control problem solved by a reasonably simple mechanical device.
>
> So it looks like a simple control problem; if the torque is too low, increase the excitation, and vice versa.
>
> Congestion control looks like a simple problem too. If there is no congestion, increase the amount of data in flight; if there is, reduce it. We even have Explicit Congestion Notification now to tell us that crucial data point, but we could always infer it from dropped packets before.
I think we critically depend on being able to interpret lost packets as well, as a) not all network nodes use ECN signaling, and b) even those that do can go into “drop-everything” mode if overloaded.
>
> So what does the load regulator's control system look like? It has as many as five states: fast down, slow down, hold, slow up, fast up. It turns out that trains really like changes in tractive effort to be slow and smooth, and as infrequent as possible. So while a very simple "bang bang" control scheme would be possible, it would inevitably oscillate around the set point instead of settling on it. Introducing a central hold state allows it to settle when cruising at constant speed, and the two slow states allow the sort of fine adjustments needed as a train gradually accelerates or slows, putting the generator only slightly out of balance with the engine. The fast states remain to allow for quick response to large changes - the driver moves the throttle, or the motors abruptly reconfigure for a different speed range (the electrical equivalent of changing gear).
I think I see where you are going with this ;). Question: how would a five-state system look at an intermediate network router? I have two points I do not see clearly at all. 1) Competition with simple greedy non-ECN flows: if these push the router into the dropping regime, how will well-behaved ECN flows be able to compete? And how can the intermediate router control/check that a flow truly is well-behaved, especially with all the allergies against keeping per-flow state that routers seem to have?
>
> On the Internet, we're firmly stuck with bang-bang control. As big an improvement as ECN is, it still provides only one bit of information to the sender: whether or not there was congestion reported during the last RTT. Thus we can only use the "slow up" and "fast down" states of our virtual load regulator (except for slow start, which ironically uses the "fast up" state), and we are doomed to oscillate around the ideal congestion window, never actually settling on it.
Is a steady-state link, potentially outside of the home, truly likely enough that a non-oscillating congestion controller will effectively work better? In other words, would the intermediate node ever signal "hold" sufficiently often that implementing this state seems reasonable?
>
> Bufferbloat is fundamentally about having insufficient information at the endpoints about conditions in the network. We've done a lot to improve that, by moving from zero information to one bit per RTT. But to achieve that holy grail, we need more information still.
True, but how stable is a network path actually over time frames of seconds?
>
> Specifically, we need to know when we're at the correct BDP, not just when it's too high. And it'd be nice if we also knew if we were close to it. But there is currently no way to provide that information from the network to the endpoints.
Could an intermediate router realistically figure out what signal to send to all flows?
Best Regards
Sebastian
>
> - Jonathan Morton
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake
* Re: [Cake] [Codel] Control theory and congestion control
2015-05-10 6:55 ` Jonathan Morton
@ 2015-05-10 17:00 ` Sebastian Moeller
0 siblings, 0 replies; 18+ messages in thread
From: Sebastian Moeller @ 2015-05-10 17:00 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake, codel, bloat
Hi Jonathan,
On May 10, 2015, at 08:55 , Jonathan Morton <chromatix99@gmail.com> wrote:
>
>> On 10 May, 2015, at 06:35, Dave Taht <dave.taht@gmail.com> wrote:
>>
>> On Sat, May 9, 2015 at 12:02 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>>>> The "right" amount of buffering is *1* packet, all the time (the goal is
>>>> nearly 0 latency with 100% utilization). We are quite far from achieving
>>>> that on anything...
>>>
>>> And control theory shows, I think, that we never will unless the mechanisms
>>> available to us for signalling congestion improve. ECN is good, but it's not
>>> sufficient to achieve that ultimate goal. I'll try to explain why.
>>
>> The conex and dctcp work explored using ecn for multi-bit signalling.
>
> A quick glance at those indicates that they’re focusing on the echo path - getting the data back from the receiver to the sender. That’s the *easy* part; all you need is a small TCP option, which can be slotted into the padding left by TCP Timestamps and/or SACK, so it doesn’t even take any extra space.
>
> But they do nothing to address the problem of allowing routers to provide a “hold” signal. Even a single ECN mark has to be taken to mean “back off”; being able to signal that more than one ECN mark happened in one RTT simply means that you now have a way to say “back off harder”.
>
> The problem is that we need a three-bit signal (five new-style signalling states, two states indicating legacy ECN support, and one “ECN unsupported” state) at the IP layer to do it properly, and we’re basically out of bits there, at least in IPv4. The best solution I can think of right now is to use both of the ECT states somehow, but we’d have to make sure that doesn’t conflict too badly with existing uses of ECT(1), such as the “nonce sum”. Backwards and forwards compatibility here is essential.
At the danger of sounding like I had a tin of snark for breakfast: what about re-dedicating 3 of the 6 TOS bits for this ;) (If I understand correctly, Ethernet and MPLS transports only carry 3 priority bits anyway, so the 6 bits are fiction outside of L3 routers.) And the BCP is still to re-color the TOS bits on ingress, so I guess 3 bits should be plenty.
Best Regards
Sebastian
>
> I’m thinking about the problem.
>
>>> Bufferbloat is fundamentally about having insufficient information at the
>>> endpoints about conditions in the network.
>>
>> Well said.
>>
>>> We've done a lot to improve that,
>>> by moving from zero information to one bit per RTT. But to achieve that holy
>>> grail, we need more information still.
>>
>> context being aqm + ecn, fq, fq+aqm, fq+aqm+ecn, dctcp, conex, etc.
>>
>>> Specifically, we need to know when we're at the correct BDP, not just when
>>> it's too high. And it'd be nice if we also knew if we were close to it. But
>>> there is currently no way to provide that information from the network to
>>> the endpoints.
>>
>> This is where I was pointing out that FQ and the behavior of multiple
>> flows in their two phases (slow start and congestion avoidance)
>> provides a few pieces of useful information that could possibly be
>> used to get closer to the ideal.
>
> There certainly is enough information available in fq_codel and cake to derive a five-state congestion signal, rather than a two-state one, with very little extra effort.
>
> Flow is sparse -> “Fast up”
> Flow is saturating, but no standing queue -> “Slow up”
> Flow is saturating, with small standing queue -> “Hold”
> Flow is saturating, with large standing queue -> “Slow down”
> Flow is saturating, with large, *persistent* standing queue -> “Fast down”
>
> In simple terms, “fast” here means “multiplicative” and “slow” means “additive”, in the sense of AIMD being the current standard for TCP behaviour. AIMD itself is a result of the two-state “bang-bang” control model introduced back in the 1980s.
>
> It’s worth remembering that the Great Internet Congestion Collapse Event was 30 years ago, and ECN was specified 15 years ago.
>
>> A control theory-ish issue with codel is that it depends on an arbitrary ideal (5ms) as a definition for "good queue", where "a
>> gooder queue” is, in my definition at the moment, "1 packet outstanding ever closer to 100% of the time while there is 100% utilization”.
>
> As the above table shows, Codel reacts (by design) only to the most extreme situation that we would want to plug into an improved congestion-control model. It’s really quite remarkable, in that context, that it works as well as it does. I don’t think we can hope to do significantly better until a better signalling mechanism is available.
>
> But it does highlight that the correct meaning of an ECN mark is “back off hard, now”. That’s how it’s currently interpreted by TCPs, in accordance with the ECN RFCs, and Codel relies on that behaviour too. We have to use some other, deliberately softer signal to give a “hold” or even a “slow down” indication.
>
> - Jonathan Morton
>
> _______________________________________________
> Codel mailing list
> Codel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/codel
* Re: [Cake] [Codel] Control theory and congestion control
2015-05-10 3:35 ` Dave Taht
2015-05-10 6:55 ` Jonathan Morton
2015-05-10 14:46 ` [Cake] " Jonathan Morton
@ 2015-05-10 17:04 ` Sebastian Moeller
2015-05-10 17:48 ` Dave Taht
2 siblings, 1 reply; 18+ messages in thread
From: Sebastian Moeller @ 2015-05-10 17:04 UTC (permalink / raw)
To: Dave Täht; +Cc: cake, codel, bloat
On May 10, 2015, at 05:35 , Dave Taht <dave.taht@gmail.com> wrote:
> On Sat, May 9, 2015 at 12:02 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>>> The "right" amount of buffering is *1* packet, all the time (the goal is
>>> nearly 0 latency with 100% utilization). We are quite far from achieving
>>> that on anything...
>>
>> And control theory shows, I think, that we never will unless the mechanisms
>> available to us for signalling congestion improve. ECN is good, but it's not
>> sufficient to achieve that ultimate goal. I'll try to explain why.
>
> The conex and dctcp work explored using ecn for multi-bit signalling.
>
> While this is a great set of analogies below (and why I am broadening
> the cc) there are two things missing from it.
>
>>
>> Aside from computer networking, I also dabble in computer simulated trains.
>> Some of my bigger projects involve detailed simulations of what goes on
>> inside them, especially the older ones which are relatively simple. These
>> were built at a time when the idea of putting anything as delicate as a
>> transistor inside what was effectively a megawatt-class power station was
>> unthinkable, so the control gear tended to be electromechanical or even
>> electropneumatic. The control laws therefore tended to be the simplest ones
>> they could get away with.
>>
>> The bulk of the generated power went into the main traction circuit, where a
>> dedicated main generator is connected rather directly to the traction motors
>> through a small amount of switchgear (mainly to reverse the fields on the
>> motors at either end off the line). Control of the megawatts of power
>> surging through this circuit was effected by varying the excitation of the
>> main generator. Excitation is in turn provided by shunting the auxiliary
>> voltage through an automatic rheostat known as the Load Regulator before it
>> reaches the field winding of the generator. Without field current, the
>> generator produces no power.
>>
>> The load regulator is what I want to focus on here. Its job was to adjust
>> the output of the generator to match the power - more precisely the torque -
>> that the engine was capable of producing (or, in English Electric locos at
>> least, the torque set by the driver's controls, which wasn't always the
>> maximum). The load regulator had a little electric motor to move it up and
>> down. A good proxy for engine torque was available in the form of the fuel
>> rack position; the torque output of a diesel engine is closely related to
>> the amount of fuel injected per cycle. The fuel rack, of course, was
>> controlled by the governor which was set to maintain a particular engine
>> speed; a straightforward PI control problem solved by a reasonably simple
>> mechanical device.
>>
>> So it looks like a simple control problem; if the torque is too low,
>> increase the excitation, and vice versa.
>>
>> Congestion control looks like a simple problem too. If there is no
>> congestion, increase the amount of data in flight; if there is, reduce it.
>> We even have Explicit Congestion Notification now to tell us that crucial
>> data point, but we could always infer it from dropped packets before.
>>
>> So what does the load regulator's control system look like? It has as many
>> as five states: fast down, slow down, hold, slow up, fast up. It turns out
>> that trains really like changes in tractive effort to be slow and smooth,
>> and as infrequent as possible. So while a very simple "bang bang" control
>> scheme would be possible, it would inevitably oscillate around the set point
>> instead of settling on it. Introducing a central hold state allows it to
>> settle when cruising at constant speed, and the two slow states allow the
>> sort of fine adjustments needed as a train gradually accelerates or slows,
>> putting the generator only slightly out of balance with the engine. The fast
>> states remain to allow for quick response to large changes - the driver
>> moves the throttle, or the motors abruptly reconfigure for a different speed
>> range (the electrical equivalent of changing gear).
>>
>> On the Internet, we're firmly stuck with bang-bang control. As big an
>> improvement as ECN is, it still provides only one bit of information to the
>> sender: whether or not there was congestion reported during the last RTT.
>> Thus we can only use the "slow up" and "fast down" states of our virtual
>> load regulator (except for slow start, which ironically uses the "fast up"
>> state), and we are doomed to oscillate around the ideal congestion window,
>> never actually settling on it.
>>
>> Bufferbloat is fundamentally about having insufficient information at the
>> endpoints about conditions in the network.
>
> Well said.
>
>> We've done a lot to improve that,
>> by moving from zero information to one bit per RTT. But to achieve that holy
>> grail, we need more information still.
>
> context being aqm + ecn, fq, fq+aqm, fq+aqm+ecn, dctcp, conex, etc.
>
>> Specifically, we need to know when we're at the correct BDP, not just when
>> it's too high. And it'd be nice if we also knew if we were close to it. But
>> there is currently no way to provide that information from the network to
>> the endpoints.
>
> This is where I was pointing out that FQ and the behavior of multiple
> flows in their two phases (slow start and congestion avoidance)
> provides a few pieces of useful information that could possibly be
> used to get closer to the ideal.
>
> We know total service times for all active flows. We also have a
> separate calculable service time for "sparse flows" in two algorithms
> we understand deeply.
>
> We could have some grip on the history for flows that are not currently queued.
>
> We know that the way we currently seek new set points tend to be
> bursty ("chasing the inchworm" - I still gotta use that title on a
> paper!).
>
> New flows tend to be extremely bursty - and new flows in the real
> world also tend to be pretty short, with 95% of all web traffic
> fitting into a single IW10.
>
> If e2e we know we are being FQ´d, and yet are bursting to find new
> setpoints we can infer from the spacing on the other endpoint what the
> contention really is.
>
> There was a stanford result for 10s of thousands of flows that found
> an ideal setpoint much lower than we are achieving for dozens, at much
> higher rates.
>
> A control theory-ish issue with codel is that it depends on an
> arbitrary ideal (5ms) as a definition for "good queue", where "a
> gooder queue”
I thought that our set point really is 5% of the estimated RTT, and we just default to 5 ms since we guesstimate our RTT to be 100 ms. Not that I complain; these two numbers seem to work decently over a relatively broad range of true RTTs…
Best Regards
Sebastian
> is, in my definition at the moment, "1 packet outstanding ever closer
> to 100% of the time while there is 100% utilization".
>
> We could continue to bang on things (reducing the target or other
> methods) and aim for a lower ideal setpoint until utilization dropped
> below 100%.
>
> Which becomes easier the more flows we know are in progress.
>
>> - Jonathan Morton
>>
>>
>> _______________________________________________
>> Cake mailing list
>> Cake@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cake
>>
>
>
>
> --
> Dave Täht
> Open Networking needs **Open Source Hardware**
>
> https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
> _______________________________________________
> Codel mailing list
> Codel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/codel
* Re: [Cake] [Codel] Control theory and congestion control
2015-05-10 17:04 ` [Cake] [Codel] " Sebastian Moeller
@ 2015-05-10 17:48 ` Dave Taht
2015-05-10 17:58 ` Dave Taht
2015-05-10 18:25 ` Dave Taht
0 siblings, 2 replies; 18+ messages in thread
From: Dave Taht @ 2015-05-10 17:48 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: cake, codel, bloat
On Sun, May 10, 2015 at 10:04 AM, Sebastian Moeller <moeller0@gmx.de> wrote:
>
> On May 10, 2015, at 05:35 , Dave Taht <dave.taht@gmail.com> wrote:
>
>> On Sat, May 9, 2015 at 12:02 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>>>> The "right" amount of buffering is *1* packet, all the time (the goal is
>>>> nearly 0 latency with 100% utilization). We are quite far from achieving
>>>> that on anything...
>>>
>>> And control theory shows, I think, that we never will unless the mechanisms
>>> available to us for signalling congestion improve. ECN is good, but it's not
>>> sufficient to achieve that ultimate goal. I'll try to explain why.
>>
>> The conex and dctcp work explored using ecn for multi-bit signalling.
>>
>> While this is a great set of analogies below (and why I am broadening
>> the cc) there are two things missing from it.
>>
>>>
>>> Aside from computer networking, I also dabble in computer simulated trains.
>>> Some of my bigger projects involve detailed simulations of what goes on
>>> inside them, especially the older ones which are relatively simple. These
>>> were built at a time when the idea of putting anything as delicate as a
>>> transistor inside what was effectively a megawatt-class power station was
>>> unthinkable, so the control gear tended to be electromechanical or even
>>> electropneumatic. The control laws therefore tended to be the simplest ones
>>> they could get away with.
>>>
>>> The bulk of the generated power went into the main traction circuit, where a
>>> dedicated main generator is connected rather directly to the traction motors
>>> through a small amount of switchgear (mainly to reverse the fields on the
>>> motors at either end off the line). Control of the megawatts of power
>>> surging through this circuit was effected by varying the excitation of the
>>> main generator. Excitation is in turn provided by shunting the auxiliary
>>> voltage through an automatic rheostat known as the Load Regulator before it
>>> reaches the field winding of the generator. Without field current, the
>>> generator produces no power.
>>>
>>> The load regulator is what I want to focus on here. Its job was to adjust
>>> the output of the generator to match the power - more precisely the torque -
>>> that the engine was capable of producing (or, in English Electric locos at
>>> least, the torque set by the driver's controls, which wasn't always the
>>> maximum). The load regulator had a little electric motor to move it up and
>>> down. A good proxy for engine torque was available in the form of the fuel
>>> rack position; the torque output of a diesel engine is closely related to
>>> the amount of fuel injected per cycle. The fuel rack, of course, was
>>> controlled by the governor which was set to maintain a particular engine
>>> speed; a straightforward PI control problem solved by a reasonably simple
>>> mechanical device.
>>>
>>> So it looks like a simple control problem; if the torque is too low,
>>> increase the excitation, and vice versa.
>>>
>>> Congestion control looks like a simple problem too. If there is no
>>> congestion, increase the amount of data in flight; if there is, reduce it.
>>> We even have Explicit Congestion Notification now to tell us that crucial
>>> data point, but we could always infer it from dropped packets before.
>>>
>>> So what does the load regulator's control system look like? It has as many
>>> as five states: fast down, slow down, hold, slow up, fast up. It turns out
>>> that trains really like changes in tractive effort to be slow and smooth,
>>> and as infrequent as possible. So while a very simple "bang bang" control
>>> scheme would be possible, it would inevitably oscillate around the set point
>>> instead of settling on it. Introducing a central hold state allows it to
>>> settle when cruising at constant speed, and the two slow states allow the
>>> sort of fine adjustments needed as a train gradually accelerates or slows,
>>> putting the generator only slightly out of balance with the engine. The fast
>>> states remain to allow for quick response to large changes - the driver
>>> moves the throttle, or the motors abruptly reconfigure for a different speed
>>> range (the electrical equivalent of changing gear).
>>>
>>> On the Internet, we're firmly stuck with bang-bang control. As big an
>>> improvement as ECN is, it still provides only one bit of information to the
>>> sender: whether or not there was congestion reported during the last RTT.
>>> Thus we can only use the "slow up" and "fast down" states of our virtual
>>> load regulator (except for slow start, which ironically uses the "fast up"
>>> state), and we are doomed to oscillate around the ideal congestion window,
>>> never actually settling on it.
>>>
>>> Bufferbloat is fundamentally about having insufficient information at the
>>> endpoints about conditions in the network.
>>
>> Well said.
>>
>>> We've done a lot to improve that,
>>> by moving from zero information to one bit per RTT. But to achieve that holy
>>> grail, we need more information still.
>>
>> context being aqm + ecn, fq, fq+aqm, fq+aqm+ecn, dctcp, conex, etc.
>>
>>> Specifically, we need to know when we're at the correct BDP, not just when
>>> it's too high. And it'd be nice if we also knew if we were close to it. But
>>> there is currently no way to provide that information from the network to
>>> the endpoints.
>>
>> This is where I was pointing out that FQ and the behavior of multiple
>> flows in their two phases (slow start and congestion avoidance)
>> provides a few pieces of useful information that could possibly be
>> used to get closer to the ideal.
>>
>> We know total service times for all active flows. We also have a
>> separate calculable service time for "sparse flows" in two algorithms
>> we understand deeply.
>>
>> We could have some grip on the history for flows that are not currently queued.
>>
>> We know that the way we currently seek new set points tend to be
>> bursty ("chasing the inchworm" - I still gotta use that title on a
>> paper!).
>>
>> New flows tend to be extremely bursty - and new flows in the real
>> world also tend to be pretty short, with 95% of all web traffic
>> fitting into a single IW10.
>>
>> If e2e we know we are being FQ´d, and yet are bursting to find new
>> setpoints we can infer from the spacing on the other endpoint what the
>> contention really is.
>>
>> There was a stanford result for 10s of thousands of flows that found
>> an ideal setpoint much lower than we are achieving for dozens, at much
>> higher rates.
>>
>> A control theory-ish issue with codel is that it depends on an
>> arbitrary ideal (5ms) as a definition for "good queue", where "a
>> gooder queue”
>
> I thought that our set point really is 5% of the estimated RTT, and we just default to 5 sincere we guestimate our RTT to be 100ms. Not that I complain, these two numbers seem to work decently over a relive broad range of true RTTs…
Yes, I should have talked about it as estimated RTT (interval) and a
seemingly desirable percentage of it (target). It is very helpful to think of
it that way if (as in my current testing) you are trying to see how
much better you can do at very short (sub 1ms) RTTs, where it really
is the interval you want to be modifying...
I have been fiddling, as a proof of concept - not an actual
algorithm - with how much shorter you can make the queues at short RTTs.
What I did was gradually (per packet) subtract 10ns from the cake
target while at 100% utilization until the target hit 1ms (or bytes
outstanding dropped below 3k). Had the cake code still used a
calculated target from the interval (target >> 4) I would have fiddled
with the interval instead. Using the netperf-wrapper tcp_upload test:
There were two significant results from that (I really should just
start a blog so I can do images inline)
1) At 100Mbit, TSO offloads (bulking) add significant latency to
competing streams:
http://snapon.lab.bufferbloat.net/~d/cake_reduced_target/offload_damage_100mbit.png
This gets much worse as you add TCP flows. I figure day traders would
take notice. TSO packets have much more mass: a full 64 KB superpacket
takes roughly 5 ms to serialise at 100 Mbit, versus about 0.12 ms for a
single 1500-byte packet.
2) You CAN get fewer packets outstanding at this RTT and still keep the
link 100% utilized.
The default codel algo stayed steady at 30-31 packets outstanding with
no losses or marks evident (TSQ?) while the shrinking dynamic target
ecn marked fairly heavily and ultimately reduced the packets
outstanding to 7-17 packets with a slight improvement in actual
throughput. (This stuff is so totally inside the noise floor that it
is hard to discern a difference at all - and you can see the linux
de-optimization for handing ping packets off to hardware in some of
the tests, after the tcp flows end, which skews the latency figures)
http://snapon.lab.bufferbloat.net/~d/cake_reduced_target/dynamic_target_vs_static.png
I think it is back to ns3 to get better grips on some of this.
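For anyone who wants the shape of the experiment without reading the code, here is a userspace caricature of the heuristic (not the actual sch_cake patch); the saturation test and the constants are stand-ins, and like the proof of concept it only ever shrinks the target:

    /* Userspace caricature of the decreasing-target heuristic described
     * above.  The "saturated" flag and backlog value are stand-ins for
     * state a real qdisc would already have. */
    #include <stdbool.h>
    #include <stdio.h>

    static long long target_ns = 5 * 1000 * 1000;   /* start at codel's 5 ms */

    static void adjust_target(bool link_saturated, unsigned backlog_bytes)
    {
        if (link_saturated && backlog_bytes >= 3000) {
            if (target_ns > 1 * 1000 * 1000)        /* floor at 1 ms       */
                target_ns -= 10;                    /* 10 ns per packet    */
        }
        /* the proof of concept only shrank the target; a real algorithm
         * would also need a rule for growing it again */
    }

    int main(void)
    {
        for (long i = 0; i < 400000; i++)           /* 400k packets at 100% load */
            adjust_target(true, 30000);
        printf("target after 400k packets: %.2f ms\n", target_ns / 1e6);
        return 0;
    }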
>
>
> Best Regards
> Sebastian
>
>> is, in my definition at the moment, "1 packet outstanding ever closer
>> to 100% of the time while there is 100% utilization".
>>
>> We could continue to bang on things (reducing the target or other
>> methods) and aim for a lower ideal setpoint until utilization dropped
>> below 100%.
>>
>> Which becomes easier the more flows we know are in progress.
>>
>>> - Jonathan Morton
>>>
>>>
>>> _______________________________________________
>>> Cake mailing list
>>> Cake@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cake
>>>
>>
>>
>>
>> --
>> Dave Täht
>> Open Networking needs **Open Source Hardware**
>>
>> https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
>> _______________________________________________
>> Codel mailing list
>> Codel@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/codel
>
--
Dave Täht
Open Networking needs **Open Source Hardware**
https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
* Re: [Cake] [Codel] Control theory and congestion control
2015-05-10 17:48 ` Dave Taht
@ 2015-05-10 17:58 ` Dave Taht
2015-05-10 18:25 ` Dave Taht
1 sibling, 0 replies; 18+ messages in thread
From: Dave Taht @ 2015-05-10 17:58 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: cake, codel, bloat
This was that patch against the https://github.com/dtaht/sch_cake repo.
http://snapon.lab.bufferbloat.net/~d/cake_reduced_target/0001-sch_cake-add-experimental-decreasing-target-at-100-p.patch
(I am not seriously proposing this for anything... but I am loving
having cake be out of the main linux tree. I can have a new idea, make
a change to the algo, compile, test in a matter of seconds, and/or run
a comprehensive netperf-wrapper suite over a cup of coffee, and then
try something else)...
there is something of a backlog of new ideas on the cake mailing list
and elsewhere.
I guess I should track the random ideas in branches in the github repo.
* Re: [Cake] [Codel] Control theory and congestion control
2015-05-10 17:48 ` Dave Taht
2015-05-10 17:58 ` Dave Taht
@ 2015-05-10 18:25 ` Dave Taht
1 sibling, 0 replies; 18+ messages in thread
From: Dave Taht @ 2015-05-10 18:25 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: cake, codel, bloat
On Sun, May 10, 2015 at 10:48 AM, Dave Taht <dave.taht@gmail.com> wrote:
>>> A control theory-ish issue with codel is that it depends on an
>>> arbitrary ideal (5ms) as a definition for "good queue", where "a
>>> gooder queue”
>>
>> I thought that our set point really is 5% of the estimated RTT, and we just default to 5 sincere we guestimate our RTT to be 100ms. Not that I complain, these two numbers seem to work decently over a relive broad range of true RTTs…
>
> Yes, I should have talked about it as estimated RTT (interval) and a
> seemingly desirable percentage(target). It is very helpful to think of
> it that way if (as in my current testing) you are trying to see how
> much better you can do at very short (sub 1ms) RTTs, where it really
> is the interval you want to be modifying...
Oops - I meant target = interval >> 4. I would have decreased the
interval by a larger amount, or by something relative to the rate, but I
merely wanted to see the slope of the curve - and I really need to write
a cake_drop_monitor rather than just "watch tc -s qdisc show dev eth0".
>
> I have been fiddling with as a proof of concept - not an actual
> algorithm - how much shorter you can make the queues at short RTTs.
> What I did was gradually (per packet) subtract 10ns from the cake
> target while at 100% utilization until the target hit 1ms (or bytes
> outstanding dropped below 3k). Had the cake code still used a
> calculated target from the interval (target >> 4) I would have fiddled
> with the interval instead. Using the netperf-wrapper tcp_upload test:
>
> There were two significant results from that (I really should just
> start a blog so I can do images inline)
>
> 1) At 100Mbit, TSO offloads (bulking) add significant latency to
> competing streams:
>
> http://snapon.lab.bufferbloat.net/~d/cake_reduced_target/offload_damage_100mbit.png
>
> This gets much worse as you add tcp flows. I figure day traders would
> take notice. TSO packets have much more mass.
>
> 2) You CAN get less packets outstanding at this RTT and still keep the
> link 100% utilized.
>
> The default codel algo stayed steady at 30-31 packets outstanding with
> no losses or marks evident (TSQ?) while the shrinking dynamic target
> ecn marked fairly heavily and ultimately reduced the packets
> outstanding to 7-17 packets with a slight improvement in actual
> throughput. (This stuff is so totally inside the noise floor that it
> is hard to discern a difference at all - and you can see the linux
> de-optimization for handing ping packets off to hardware in some of
> the tests, after the tcp flows end, which skews the latency figures)
>
> http://snapon.lab.bufferbloat.net/~d/cake_reduced_target/dynamic_target_vs_static.png
>
> I think it is back to ns3 to get better grips on some of this.
>
>>
>>
>> Best Regards
>> Sebastian
>>
>>> is, in my definition at the moment, "1 packet outstanding ever closer
>>> to 100% of the time while there is 100% utilization".
>>>
>>> We could continue to bang on things (reducing the target or other
>>> methods) and aim for a lower ideal setpoint until utilization dropped
>>> below 100%.
>>>
>>> Which becomes easier the more flows we know are in progress.
>>>
>>>> - Jonathan Morton
>>>>
>>>>
>>>> _______________________________________________
>>>> Cake mailing list
>>>> Cake@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/cake
>>>>
>>>
>>>
>>>
>>> --
>>> Dave Täht
>>> Open Networking needs **Open Source Hardware**
>>>
>>> https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
>>> _______________________________________________
>>> Codel mailing list
>>> Codel@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/codel
>>
>
>
>
> --
> Dave Täht
> Open Networking needs **Open Source Hardware**
>
> https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
--
Dave Täht
Open Networking needs **Open Source Hardware**
https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
* Re: [Cake] Control theory and congestion control
2015-05-10 16:48 ` [Cake] " Sebastian Moeller
@ 2015-05-10 18:32 ` Jonathan Morton
2015-05-11 7:36 ` Sebastian Moeller
0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Morton @ 2015-05-10 18:32 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: cake
> On 10 May, 2015, at 19:48, Sebastian Moeller <moeller0@gmx.de> wrote:
>
>> Congestion control looks like a simple problem too. If there is no congestion, increase the amount of data in flight; if there is, reduce it. We even have Explicit Congestion Notification now to tell us that crucial data point, but we could always infer it from dropped packets before.
>
> I think we critically depend on being able to interpret lost packets as well, as a) not all network nodes use ECN signaling, and b) even those that do can go into “drop-everything” mode if overloaded.
Yes, but I consider that a degraded mode of operation. Even if it is, for the time being, the dominant mode.
> 1) Competiton with simple greedy non-ECN flows, if these push the router into the dropping regime how will well behaved ECN flows be able to compete?
Backwards compatibility for current ECN means dropping non-ECN packets that would have been marked. That works, so we can use it as a model.
Backwards compatibility for “enhanced” ECN - let’s call it ELR for Explicit Load Regulation - would mean providing legacy ECN signals to legacy ECN traffic. But, in the absence of flow isolation, if we only marked packets with ECN when they fell into the “fast down” category (which corresponds to their actual behaviour), then they’d get a clear advantage over ELR, similar to TCP Vegas vs. Reno back in the day (and for basically the same reason).
The solution is to provide robust flow isolation, and/or to ECN-mark packets in “hold” and “slow down” states as well as “fast down”. This ensures that legacy ECN does not unfairly outcompete ELR, although it might reduce ECN traffic’s throughput.
The other side of the compatibility coin is what happens when ELR traffic hits a legacy router (whether ECN enabled or not). Such a router should be able to recognise ELR packets as ECN and perform ECN marking when appropriate, to be interpreted as a “fast down” signal. Or, of course, to simply drop packets if it doesn’t even support ECN.
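In code form, that per-packet decision could look like this - a sketch using the state names from this thread, not proposed code, and with the drop/mark policy simply restating the compatibility rules above:

    /* Sketch of the backwards-compatibility rule: what an ELR-aware queue
     * does with a packet, given the flow's five-state signal and the
     * transport's capabilities.  Illustrative only. */
    #include <stdio.h>

    enum elr_state { FAST_UP, SLOW_UP, HOLD, SLOW_DOWN, FAST_DOWN };
    enum transport { LEGACY_NO_ECN, LEGACY_ECN, ELR_CAPABLE };
    enum verdict   { FORWARD, MARK_CE, DROP, SIGNAL_ELR };

    static enum verdict treat(enum transport t, enum elr_state s)
    {
        if (t == ELR_CAPABLE)
            return SIGNAL_ELR;      /* carry the full five-state signal */
        if (t == LEGACY_ECN && (s == HOLD || s == SLOW_DOWN || s == FAST_DOWN))
            return MARK_CE;         /* so legacy ECN cannot outcompete ELR */
        if (t == LEGACY_NO_ECN && s == FAST_DOWN)
            return DROP;            /* codel-style: drop where we would mark */
        return FORWARD;
    }

    int main(void)
    {
        printf("legacy ECN flow in HOLD state -> %s\n",
               treat(LEGACY_ECN, HOLD) == MARK_CE ? "CE mark" : "forward");
        return 0;
    }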
> And how can the intermediate router control/check that a flow truly is well-behaved, especially with all the allergies against keeping per-flow state that routers seem to have?
Core routers don’t track flow state, but they are typically provisioned to not saturate their links in the first place. Adequate backwards-compatibility handling will do here.
Edge routers are rather more capable of keeping sufficient per-flow state for effective flow isolation, as cake and fq_codel do.
Unresponsive flows are already just as much of a problem with ECN as they would be with ELR. Flow isolation contains the problem neatly. Transitioning to packet drops (ignoring both ECN and ELR) under overload conditions is also a good safety valve.
> Is the steady-state link, potentially outside of the home, truly likely enough that a non-oscillating congestion controller will effectively work better? In other words, would the intermediate node ever signal hold sufficiently often that implementing this stage seems reasonable?
It’s a fair question, and probably requires further research to answer reliably. However, you should also probably consider the typical nature of the *bottleneck* link, rather than every possible Internet link. It’s usually the last mile.
> True, but how stable is a network path actually over time frames of seconds?
Stable enough for VoIP and multiplayer twitch games to work already, if the link is idle.
> Could an intermediate router realistically figure out what signal to send to all flows?
I described a possible method of doing so, using information already available in fq_codel and cake. Whether they would work satisfactorily in practice is an open question.
- Jonathan Morton
* Re: [Cake] Control theory and congestion control
2015-05-10 18:32 ` Jonathan Morton
@ 2015-05-11 7:36 ` Sebastian Moeller
2015-05-11 11:34 ` Jonathan Morton
0 siblings, 1 reply; 18+ messages in thread
From: Sebastian Moeller @ 2015-05-11 7:36 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake
Hi Jonathan,
On May 10, 2015, at 20:32 , Jonathan Morton <chromatix99@gmail.com> wrote:
>
>> On 10 May, 2015, at 19:48, Sebastian Moeller <moeller0@gmx.de> wrote:
>>
>>> Congestion control looks like a simple problem too. If there is no congestion, increase the amount of data in flight; if there is, reduce it. We even have Explicit Congestion Notification now to tell us that crucial data point, but we could always infer it from dropped packets before.
>>
>> I think we critically depend on being able to interpret lost packets as well, as a) not all network nodes use ECN signaling, and b) even those that do can go into “drop-everything” mode if overloaded.
>
> Yes, but I consider that a degraded mode of operation. Even if it is, for the time being, the dominant mode.
>
>> 1) Competition with simple greedy non-ECN flows: if these push the router into the dropping regime, how will well-behaved ECN flows be able to compete?
>
> Backwards compatibility for current ECN means dropping non-ECN packets that would have been marked. That works, so we can use it as a model.
Let me elaborate. What I mean is: if we got an ECN reduce-slowly signal on the ECN flow and the router then goes into overload, what guarantees that our flow, with the double reduce-slowly ECN signal plus the reduce-hard drop, will not end up at a disadvantage against greedy non-ECN flows? It is probably quite simple, but I cannot see it right now.
>
> Backwards compatibility for “enhanced” ECN - let’s call it ELR for Explicit Load Regulation - would mean providing legacy ECN signals to legacy ECN traffic. But, in the absence of flow isolation, if we only marked packets with ECN when they fell into the “fast down” category (which corresponds to their actual behaviour), then they’d get a clear advantage over ELR, similar to TCP Vegas vs. Reno back in the day (and for basically the same reason).
In other words, ELR will be outcompeted by classic ECN?
>
> The solution is to provide robust flow isolation, and/or to ECN-mark packets in “hold” and “slow down” states as well as “fast down”. This ensures that legacy ECN does not unfairly outcompete ELR, although it might reduce ECN traffic’s throughput.
Well, if we want ELR to be the next big thing, we should aim to make it more competitive than classic ECN (assuming we get enough “buy-in” from the regulating parties, like the IETF and friends).
>
> The other side of the compatibility coin is what happens when ELR traffic hits a legacy router (whether ECN enabled or not). Such a router should be able to recognise ELR packets as ECN and perform ECN marking when appropriate, to be interpreted as a “fast down” signal. Or, of course, to simply drop packets if it doesn’t even support ECN.
>
>> And how can the intermediate router control/check that a flow truly is well-behaved, especially with all the allergies against keeping per-flow state that routers seem to have?
>
> Core routers don’t track flow state, but they are typically provisioned to not saturate their links in the first place.
I hear this quite often; it always makes me wonder whether there is a better way to design a network to work well at capacity instead of working around the problem by simply over-provisioning. I thought it was called network engineering, not network “brute-forcing”…
> Adequate backwards-compatibility handling will do here.
>
> Edge routers are rather more capable of keeping sufficient per-flow state for effective flow isolation, as cake and fq_codel do.
But we already have a hard time convincing the operators of the edge routers (telcos, cable cos…) to actually implement something saner than deep buffers in those devices. If they would at least take responsibility for the head-end buffers for the downlink we would be in much better shape, and if they also offered to handle uplink bufferbloat as part of their optional ISP-router-thingy, the issue would be stamped out already. But have you looked inside a typical CPE recently? Still a kernel from the 2.x series, so no codel/fq_codel and whatever other fixes were found in the several years since 2.x was the hot new thing…
>
> Unresponsive flows are already just as much of a problem with ECN as they would be with ELR. Flow isolation contains the problem neatly. Transitioning to packet drops (ignoring both ECN and ELR) under overload conditions is also a good safety valve.
>
>> Is the steady-state link, potentially outside of the home, truly likely enough that a non-oscillating congestion controller will effectively work better? In other words, would the intermediate node ever signal hold sufficiently often that implementing this stage seems reasonable?
>
> It’s a fair question, and probably requires further research to answer reliably. However, you should also probably consider the typical nature of the *bottleneck* link, rather than every possible Internet link. It’s usually the last mile.
I wish that were true… I switched to a 100/40 link and have since suffered from my ISP's bad peering (this seems to be deliberate, to incentivise content providers to agree to paid peering with my ISP, but it seems very few content providers went along, and so I feel that even the routers connecting different networks could work much better/fairer under saturating load… but I have no real data nor a way to measure it, so this is conjecture).
>
>> True, but how stable is a network path actually over time frames of seconds?
>
> Stable enough for VoIP and multiplayer twitch games to work already, if the link is idle.
Both of which, I believe, pretty much try to keep constant-bitrate UDP flows going, so they only care whether the immediate network path (and/or its alternatives) a) has sufficient headroom for the data and b) keeps latency changes due to path re-routing inside the de-jitter/de-lag buffers that are in use; or, put differently, these traffic types will not attempt to saturate a given link by themselves, so they are not the most sensitive probes for network path stability, no?
>
>> Could an intermediate router realistically figure out what signal to send to all flows?
>
> I described a possible method of doing so, using information already available in fq_codel and cake.
We are back at the issue of how to make sure big routers learn codel/fq_codel as options in their AQM subsystems… It would be interesting to know what the Ciscos/Junipers/Huaweis of the world actually test in their private labs ;)
Best Regards
Sebastian
> Whether they would work satisfactorily in practice is an open question.
>
> - Jonathan Morton
>
* Re: [Cake] Control theory and congestion control
2015-05-11 7:36 ` Sebastian Moeller
@ 2015-05-11 11:34 ` Jonathan Morton
2015-05-11 13:54 ` [Cake] Explicit Load Regulation - was: " Jonathan Morton
2015-05-12 23:23 ` [Cake] " David Lang
0 siblings, 2 replies; 18+ messages in thread
From: Jonathan Morton @ 2015-05-11 11:34 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: cake
>>>> Congestion control looks like a simple problem too. If there is no congestion, increase the amount of data in flight; if there is, reduce it. We even have Explicit Congestion Notification now to tell us that crucial data point, but we could always infer it from dropped packets before.
>>>
>>> I think we critically depend on being able to interpret lost packets as well, as a) not all network nodes use ECN signaling, and b) even those that do can go into “drop-everything” mode if overloaded.
>>
>> Yes, but I consider that a degraded mode of operation. Even if it is, for the time being, the dominant mode.
>>
>>> 1) Competition with simple greedy non-ECN flows: if these push the router into the dropping regime, how will well-behaved ECN flows be able to compete?
>>
>> Backwards compatibility for current ECN means dropping non-ECN packets that would have been marked. That works, so we can use it as a model.
>
> Let me elaborate. What I mean is: if we got an ECN reduce-slowly signal on the ECN flow and the router then goes into overload, what guarantees that our flow, with the double reduce-slowly ECN signal plus the reduce-hard drop, will not end up at a disadvantage against greedy non-ECN flows? It is probably quite simple, but I cannot see it right now.
There are two possible answers to this:
1) The most restrictive signal seen during an RTT is the one to react to. So a “fast down” signal overrides anything else.
2) If ELR signals are being received which indicate that the bottleneck queue is basically under control, then it might be reasonable to assume that packet drops in the same RTT are *not* congestion related, but due to random losses. This is not in itself novel behaviour: Westwood+ uses RTT variation to infer the same thing.
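For concreteness, a minimal Python sketch of option (1); the names and ordering here are mine and purely illustrative, not part of any proposal:

    # Act on the most restrictive congestion signal observed in one RTT.
    PRECEDENCE = ["fast up", "slow up", "hold", "slow down", "fast down"]

    def operative_signal(signals_this_rtt):
        return max(signals_this_rtt, key=PRECEDENCE.index, default="fast up")

    # A single "fast down" (CE mark or drop) overrides a steady run of "hold":
    print(operative_signal(["hold", "hold", "fast down", "hold"]))   # -> fast down

Option (2) would then be a refinement layered on top of this: discount the drop when the surviving ELR signals say the bottleneck queue is still under control.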
>> Backwards compatibility for “enhanced” ECN - let’s call it ELR for Explicit Load Regulation - would mean providing legacy ECN signals to legacy ECN traffic. But, in the absence of flow isolation, if we only marked packets with ECN when they fell into the “fast down” category (which corresponds to their actual behaviour), then they’d get a clear advantage over ELR, similar to TCP Vegas vs. Reno back in the day (and for basically the same reason).
>
> In other words, ELR will be outcompeted by classic ECN?
Given such a naive implementation, yes. Bear in mind that I’m essentially thinking out loud here. The details are *not* all worked out.
>> The solution is to provide robust flow isolation, and/or to ECN-mark packets in “hold” and “slow down” states as well as “fast down”. This ensures that legacy ECN does not unfairly outcompete ELR, although it might reduce ECN traffic’s throughput.
>
> Well, if we want ELR to be the next big thing, we should aim to make it more competitive than classic ECN (assuming we get enough “buy-in” from the regulating parties, like the IETF and friends).
It’s one possible approach. Unambiguous throughput improvements probably do sell well.
I’m also now thinking about how to approximate fairness between ELR flows *without* flow isolation. Since ELR would aim to provide a continuous signal rather than a stochastic one, this is actually a harder problem than it sounds; naively, a new flow would stay at minimum cwnd as long as a competing flow was saturating the link, since both would be given the same up/down signals. There might need to be some non-obvious properties in the way the signal is provided to overcome that; I have the beginnings of an idea, but need to work it out.
>> Edge routers are rather more capable of keeping sufficient per-flow state for effective flow isolation, as cake and fq_codel do.
>
> But we already have a hard time convincing the operators of the edge routers (telcos, cable cos…) to actually implement something saner than deep buffers in those devices. If they would at least take responsibility for the head-end buffers for the downlink we would be in much better shape, and if they also offered to handle uplink bufferbloat as part of their optional ISP-router-thingy, the issue would be stamped out already. But have you looked inside a typical CPE recently? Still a kernel from the 2.x series, so no codel/fq_codel and whatever other fixes were found in the several years since 2.x was the hot new thing…
For CPE at least, there exists a market opportunity for somebody to fill. OpenWRT shows what can be done with existing hardware with some user engagement. In principle, it’s only a short step from there to a new commercial product that Does the Right Things.
>>> Is the steady-state link, potentially outside of the home, truly likely enough that a non-oscillating congestion controller will effectively work better? In other words, would the intermediate node ever signal hold sufficiently often that implementing this stage seems reasonable?
>>
>> It’s a fair question, and probably requires further research to answer reliably. However, you should also probably consider the typical nature of the *bottleneck* link, rather than every possible Internet link. It’s usually the last mile.
>
> I wish that were true… I switched to a 100/40 link and have since suffered from my ISP's bad peering (this seems to be deliberate, to incentivise content providers to agree to paid peering with my ISP, but it seems very few content providers went along, and so I feel that even the routers connecting different networks could work much better/fairer under saturating load… but I have no real data nor a way to measure it, so this is conjecture).
>> Core routers don’t track flow state, but they are typically provisioned to not saturate their links in the first place.
>
> I hear this quite often; it always makes me wonder whether there is a better way to design a network to work well at capacity instead of working around the problem by simply over-provisioning. I thought it was called network engineering, not network “brute-forcing”…
Peering points are one of the few “core like” locations where adequate capacity cannot be relied on. Fortunately, what I hear is that peering links are often made using a set of 10GbE cables. At 10Gbps, it’s entirely feasible to run fq_codel (probably based on IP addresses, not individual flows) in software, never mind in hardware. So that’s a solvable problem at the technical level.
The fact that certain ISPs are *deliberately* restricting capacity is a thornier problem, and one that’s entirely political.
True core networks are, I hear, often made using optical switches rather than routers per se. It’s a very alien environment. I wouldn’t be surprised if there was difficulty even running something as simple as RED at the speeds they use. I’m perfectly happy with the idea of them aiming to keep the bottlenecks elsewhere - at the peering points if nowhere else.
>>> True, but how stable is a network path actually over time frames of seconds?
>>
>> Stable enough for VoIP and multiplayer twitch games to work already, if the link is idle.
>
> Both of which, I believe, pretty much try to keep constant-bitrate UDP flows going, so they only care whether the immediate network path (and/or its alternatives) a) has sufficient headroom for the data and b) keeps latency changes due to path re-routing inside the de-jitter/de-lag buffers that are in use; or, put differently, these traffic types will not attempt to saturate a given link by themselves, so they are not the most sensitive probes for network path stability, no?
I fully appreciate that *some* network paths may be unstable, and any congestion control system will need to chase the sweet spot up and down under such conditions.
Most of the time, however, baseline RTT is stable over timescales of the order of minutes, and available bandwidth is dictated by the last-mile link as the bottleneck. BDP and therefore the ideal cwnd is a simple function of baseline RTT and bandwidth. Hence there are common scenarios in which a steady-state condition can exist. That’s enough to justify the “hold” signal.
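As a rough worked example (the numbers are mine, not measurements): the ideal cwnd is just the bandwidth-delay product expressed in packets.

    # Ideal steady-state cwnd = bandwidth-delay product (illustrative numbers only).
    def ideal_cwnd_packets(bandwidth_bps, base_rtt_s, mss_bytes=1500):
        bdp_bytes = bandwidth_bps / 8 * base_rtt_s   # bytes in flight at full utilisation
        return bdp_bytes / mss_bytes

    print(ideal_cwnd_packets(100e6, 0.020))   # 100 Mbit/s, 20 ms RTT -> ~167 packets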
>>> Could an intermediate router realistically figure out what signal to send to all flows?
>>
>> I described a possible method of doing so, using information already available in fq_codel and cake.
>
> We are back at the issue of how to make sure big routers learn codel/fq_codel as options in their AQM subsystems… It would be interesting to know what the Ciscos/Junipers/Huaweis of the world actually test in their private labs ;)
* [Cake] Explicit Load Regulation - was: Control theory and congestion control
2015-05-11 11:34 ` Jonathan Morton
@ 2015-05-11 13:54 ` Jonathan Morton
2015-05-12 23:23 ` [Cake] " David Lang
1 sibling, 0 replies; 18+ messages in thread
From: Jonathan Morton @ 2015-05-11 13:54 UTC (permalink / raw)
To: cake; +Cc: codel, bloat
> On 11 May, 2015, at 14:34, Jonathan Morton <chromatix99@gmail.com> wrote:
>
> I’m also now thinking about how to approximate fairness between ELR flows *without* flow isolation. Since ELR would aim to provide a continuous signal rather than a stochastic one, this is actually a harder problem than it sounds; naively, a new flow would stay at minimum cwnd as long as a competing flow was saturating the link, since both would be given the same up/down signals. There might need to be some non-obvious properties in the way the signal is provided to overcome that; I have the beginnings of an idea, but need to work it out.
And the result of a good wander is that I think we can, in fact, use the distinction between ECT(0) and ECT(1) to perform this signalling, and therefore we don’t need to somehow find extra bits in the IP headers. This might take a little while to explain:
When an ELR flow is negotiated by the endpoints, senders set ECT(1) on all relevant packets they originate. Since most ECN senders currently set ECT(0), and those that use ECT(1) at all tend to alternate between ECT(0) and ECT(1), routers are able to assume with sufficient safety that an ECT(1) packet can be used to carry an ELR signal, and that an ECT(0) packet belongs to a legacy ECN flow.
The “fast down” signal for both ECN and ELR flows is the CE (Congestion Experienced) codepoint set in any packet during one RTT. This is echoed back to the sender by the receiver, as for legacy ECN. Compliant senders should halve their congestion window, or perform an equivalent backoff.
In an ELR flow, the ratio of ECT(1) to ECT(0) packets received is significant, and carries the remaining four states of the ELR protocol. Receivers keep a history of the ECN codepoints in the most recent three data-bearing packets received on the flow. They echo back to the sender the number of such packets which had ECT(1) set. The significance of this number is as follows:
0: slow down - sender should perform a small, multiplicative (exponential) backoff in this RTT
1: hold - sender should not increase send rate in this RTT
2: slow up - sender may perform only additive (linear) increase in this RTT
3: fast up - sender may perform multiplicative (exponential) increase (eg. slow start) in this RTT
Since one byte can hold four of these indications, the receiver may indicate a twelve-packet history in this way, allowing for sparse and lost acks. Senders should perform all of the actions indicated by these signals which have *not* yet been performed, allowing for the possibility of overlap between subsequent signals.
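A minimal sketch of that receiver-side feedback in Python, assuming the RFC 3168 codepoint values; the function and field names are mine, and nothing here is a real TCP option format:

    NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11   # RFC 3168 ECN field values

    ELR_SIGNALS = {0: "slow down", 1: "hold", 2: "slow up", 3: "fast up"}

    def elr_signal(last_three_codepoints):
        """Map the ECN codepoints of the last three data packets to a signal.
        A CE mark seen in the window is the overriding "fast down" signal."""
        if CE in last_three_codepoints:
            return "fast down"
        return ELR_SIGNALS[sum(1 for cp in last_three_codepoints if cp == ECT1)]

    def pack_history(counts):
        """Pack four ECT(1) counts (0..3, newest first) into one feedback byte,
        covering a twelve-packet history as suggested above."""
        byte = 0
        for c in counts:
            byte = (byte << 2) | (c & 0b11)
        return byte

    # A queue in "should hold" lets one ECT(1) in three through unchanged:
    print(elr_signal([ECT0, ECT1, ECT0]))    # -> hold
    print(bin(pack_history([1, 1, 2, 1])))   # -> 0b1011001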
Queues implementing ELR maintain one or more five-state control variables, which may be per flow, per traffic class or global, and reflect the queue's opinion of whether senders associated with each control variable may increase, should hold or should decrease their send rates (and how quickly) in order to match link capacity, or a fair share thereof, at that queue. In most cases, there will be at most one queue on a given network path for which this opinion is not “may increase rapidly”; this is known as the bottleneck queue.
In the “may increase rapidly” state, the queue performs no modifications to the ECN field.
In the “may increase gradually” state, the queue changes one out of every three ECT(1) packets to ECT(0), and leaves all other packets unchanged.
In the “should hold” state, the queue changes two out of every three ECT(1) packets to ECT(0), and leaves all other packets unchanged.
In the “should decrease gradually” state, the queue changes all ECT(1) packets to ECT(0), additionally changes some proportion of originally-ECT(0) packets to the CE codepoint, and drops the same proportion of Not-ECT packets.
In the “should decrease rapidly” state, all of the actions of the “should decrease gradually” state are performed, but ECT(1) packets are also changed to the CE codepoint at the same rate as ECT(0) packets.
It should be obvious that these five states correspond precisely to the “fast up”, “slow up”, “hold”, “slow down” and “fast down” signals observed by the receiver of a single flow. Thus an ELR-compliant queue implementing flow isolation is able to precisely control the send rates of each flow passing through it.
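As a rough sketch of those marking rules (again Python, again my own naming; mark_prob stands in for whatever codel-like control law the queue would actually use, so treat it as a placeholder):

    import random

    NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

    # Fraction of ECT(1) packets rewritten to ECT(0) in each queue state.
    ECT1_TO_ECT0 = {"fast up": 0.0, "slow up": 1/3, "hold": 2/3,
                    "slow down": 1.0, "fast down": 1.0}

    class ElrMarker:
        def __init__(self, state="fast up", mark_prob=0.05):
            self.state = state
            self.mark_prob = mark_prob   # proportion marked/dropped in the two "down" states
            self.ect1_seen = 0           # drives the deterministic 1-in-3 / 2-in-3 pattern

        def handle(self, cp):
            """Return the possibly rewritten codepoint, or None to drop the packet."""
            if cp == ECT1:
                self.ect1_seen += 1
                if (self.ect1_seen % 3) < round(3 * ECT1_TO_ECT0[self.state]):
                    cp = ECT0
                if self.state == "fast down" and random.random() < self.mark_prob:
                    return CE            # ECT(1) marked at the same rate as ECT(0)
                return cp
            if self.state in ("slow down", "fast down"):
                if cp == ECT0 and random.random() < self.mark_prob:
                    return CE            # legacy-ECN style congestion mark
                if cp == NOT_ECT and random.random() < self.mark_prob:
                    return None          # drop the same proportion of Not-ECT
            return cp

    q = ElrMarker("hold")
    print([q.handle(ECT1) for _ in range(6)])   # two of every three ECT(1) become ECT(0)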
The behaviour of multiple flows sharing a single ELR queue with a single control variable is more complex. Consider the case where one ELR flow is established on the link, and has stabilised in the “hold” state, when a new ELR flow begins. After the new flow’s initial congestion window is sent and acknowledged, it will also see the same two-out-of-three ECT(0) pattern (on average) as the established flow, and might then appear to be stuck in the “hold” state with its initial congestion window for all subsequent traffic.
However, packets for the new flow will disrupt the regular pattern of the established flow’s ELR signal, and vice versa, resulting in a stochastic distribution of “slow down” and “slow up” signals actually being received by both flows. The resulting low-amplitude AIMD behaviour should result in the congestion windows of the two flows converging, eventually giving fair sharing of the link. While convergence time and sensitivity to RTT are both inferior to a flow-isolating queue, they should be no worse than for conventional AQM queues.
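A quick toy illustration of that convergence argument (entirely my own construction, not a claim about cake's internals): one shared “hold” control variable, two flows interleaved at random, and each receiver classifying its own three-packet windows.

    import random
    from collections import Counter

    random.seed(1)
    ECT1, ECT0 = 0b01, 0b10
    SIGNALS = {0: "slow down", 1: "hold", 2: "slow up", 3: "fast up"}

    def run(n_packets=30000, share_a=0.7):
        seen = 0                                  # aggregate ECT(1) counter at the queue
        history = {"A": [], "B": []}
        tallies = {"A": Counter(), "B": Counter()}
        for _ in range(n_packets):
            flow = "A" if random.random() < share_a else "B"
            seen += 1
            cp = ECT1 if seen % 3 == 2 else ECT0  # "hold": two of three rewritten
            history[flow].append(cp)
            window = history[flow][-3:]
            if len(window) == 3:
                tallies[flow][SIGNALS[sum(c == ECT1 for c in window)]] += 1
        return tallies

    # Both flows see a stochastic mix of "slow down", "hold" and "slow up" rather
    # than a pure "hold", which is what lets their congestion windows drift together.
    print(run())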
- Jonathan Morton
* Re: [Cake] Control theory and congestion control
2015-05-11 11:34 ` Jonathan Morton
2015-05-11 13:54 ` [Cake] Explicit Load Regulation - was: " Jonathan Morton
@ 2015-05-12 23:23 ` David Lang
2015-05-13 2:51 ` Jonathan Morton
1 sibling, 1 reply; 18+ messages in thread
From: David Lang @ 2015-05-12 23:23 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake
On Mon, 11 May 2015, Jonathan Morton wrote:
>>>>> Congestion control looks like a simple problem too. If there is no
>>>>> congestion, increase the amount of data in flight; if there is, reduce it.
>>>>> We even have Explicit Congestion Notification now to tell us that crucial
>>>>> data point, but we could always infer it from dropped packets before.
>>>>
>>>> I think we critically depend on being able to interpret lost packets as
>>>> well, as a) not all network nodes use ECN signaling, and b) even those that
>>>> do can go into “drop-everything” mode if overloaded.
>>>
>>> Yes, but I consider that a degraded mode of operation. Even if it is, for
>>> the time being, the dominant mode.
>>>
>>>> 1) Competition with simple greedy non-ECN flows: if these push the router
>>>> into the dropping regime, how will well-behaved ECN flows be able to
>>>> compete?
>>>
>>> Backwards compatibility for current ECN means dropping non-ECN packets that
>>> would have been marked. That works, so we can use it as a model.
>>
>> Let me elaborate. What I mean is: if we got an ECN reduce-slowly signal
>> on the ECN flow and the router then goes into overload, what guarantees that
>> our flow, with the double reduce-slowly ECN signal plus the reduce-hard drop,
>> will not end up at a disadvantage against greedy non-ECN flows? It is
>> probably quite simple, but I cannot see it right now.
>
> There are two possible answers to this:
>
> 1) The most restrictive signal seen during an RTT is the one to react to. So
> a “fast down” signal overrides anything else.
sorry for joining in late, but I think you are modeling something that doesn't
match reality.
are you really going to see two bottlenecks in a given round trip (or even one
connection)? Since you are ramping up fairly slowly, aren't you far more likely
to only see one bottleneck (and once you get through that one, you are pretty
much set through the rest of the link)
That one bottleneck could try to give you a 'fast down' signal, but I think you
are unlikely to get multiple 'down' signals.
Looking at the network, you have two real scenarios ('server' == fast
connection, 'client' == slow connection)
Start off with the possible combinations
1. server <-> server
2. server <-> client
3. client <-> server
4. client <-> client
#4 is the same as #1, just with smaller numbers
if both ends have the same available bandwidth (#1 and #4), then the bottleneck
you are going to have to deal with is the local one
If you have less bandwidth than the other end (#3), the bottleneck that you are
going to have to deal with is the local one
If you have more bandwidth than the other end (#2), then the bottleneck that you
are going to have to deal with is the remote one
If you end up with an underprovisioned peer somewhere in the middle, it's either
lower available bandwidth than either end, or it doesn't matter.
I think the only way you would need a 'fast down' signal is if the route changes
to go through an underprovisioned peering, or when you have a new flow starting
to contend with you due to routing changes (if it's a new flow, it should start
slow and ramp up, so you, and the other affected flows, should all be good with
a 'slow down' signal)
>>
>> But we already have a hard time convincing the operators of the edge
>> routers (telcos, cable cos…) to actually implement something saner than deep
>> buffers in those devices. If they would at least take responsibility for the
>> head-end buffers for the downlink we would be in much better shape, and if
>> they also offered to handle uplink bufferbloat as part of their optional
>> ISP-router-thingy, the issue would be stamped out already. But have you
>> looked inside a typical CPE recently? Still a kernel from the 2.x series, so
>> no codel/fq_codel and whatever other fixes were found in the several years
>> since 2.x was the hot new thing…
>
> For CPE at least, there exists a market opportunity for somebody to fill.
> OpenWRT shows what can be done with existing hardware with some user
> engagement. In principle, it’s only a short step from there to a new
> commercial product that Does the Right Things.
especially if the device is managed by the ISP who knows what speed it should be
using.
>>>> Is the steady-state link, potentially outside of the home, truly likely
>>>> enough that a non-oscillating congestion controller will effectively work
>>>> better? In other words, would the intermediate node ever signal hold
>>>> sufficiently often that implementing this stage seems reasonable?
Is there really such a thing as a steady-state link outside of a 'dark fiber'
point-to-point link used for a single application? If not, that doesn't sound
like something really worth optimizing for.
>>> Core routers don’t track flow state, but they are typically provisioned to
>>> not saturate their links in the first place.
>>
>> I hear this quite often; it always makes me wonder whether there is a
>> better way to design a network to work well at capacity instead of working
>> around the problem by simply over-provisioning. I thought it was called
>> network engineering, not network “brute-forcing”…
network engineering at that level requires knowing future usage patterns. That's
not possible yet, so you have to overbuild to account for the unexpected. This
is no different than the fact that bridges are overbuilt to allow for the
unexpected loads of the future and possible sub-standard materials creeping in
at some point in the construction process.
Only in rocket science, where ounces count, do they work to eliminate headroom
(and as the first SpaceX landing attempt showed, sometimes they cut too far).
>> Both of which, I believe, pretty much try to keep constant-bitrate UDP
>> flows going, so they only care whether the immediate network path (and/or
>> its alternatives) a) has sufficient headroom for the data and b) keeps
>> latency changes due to path re-routing inside the de-jitter/de-lag buffers
>> that are in use; or, put differently, these traffic types will not attempt
>> to saturate a given link by themselves, so they are not the most sensitive
>> probes for network path stability, no?
>
> I fully appreciate that *some* network paths may be unstable, and any
> congestion control system will need to chase the sweet spot up and down under
> such conditions.
>
> Most of the time, however, baseline RTT is stable over timescales of the order
> of minutes, and available bandwidth is dictated by the last-mile link as the
> bottleneck. BDP and therefore the ideal cwnd is a simple function of baseline
> RTT and bandwidth. Hence there are common scenarios in which a steady-state
> condition can exist. That’s enough to justify the “hold” signal.
Unless you prevent other traffic from showing up on the network (phones checking
e-mail, etc.). I don't believe that you are ever going to have stable bandwidth
available for any noticeable timeframe.
David Lang
* Re: [Cake] Control theory and congestion control
2015-05-12 23:23 ` [Cake] " David Lang
@ 2015-05-13 2:51 ` Jonathan Morton
2015-05-13 3:12 ` David Lang
0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Morton @ 2015-05-13 2:51 UTC (permalink / raw)
To: David Lang; +Cc: cake
> On 13 May, 2015, at 02:23, David Lang <david@lang.hm> wrote:
>
>> 1) The most restrictive signal seen during an RTT is the one to react to. So a “fast down” signal overrides anything else.
>
> sorry for joining in late, but I think you are modeling something that doesn't match reality.
>
> are you really going to see two bottlenecks in a given round trip (or even one connection)? Since you are ramping up fairly slowly, aren't you far more likely to only see one bottleneck (and once you get through that one, you are pretty much set through the rest of the link)
It’s important to remember that link speeds can change drastically over time (usually if it’s *anything* wireless), that new competing traffic might reduce the available bandwidth suddenly, and that as a result the bottleneck can *move* from an ELR-enabled queue to a different queue which might not be. I consider that far more likely than an ELR queue abruptly losing control as Sebastian originally suggested, but it looks similar to the endpoints.
So what you might have is an ELR queue happily controlling the cwnd based on the assumption that *it* is the bottleneck, which until now it has been. But *after* that queue is another one which has just *become* the bottleneck, and it’s not ELR - it’s plain ECN. The only way it can tell the flow to slow down is by giving “fast down” signals. But that’s okay, the endpoints will react to that just as they should do, as long as they correctly interpret the most restrictive signal as being the operative one.
Or maybe the new bottleneck is a dumb FIFO. In this case, ELR will initially hold the cwnd constant, but the FIFO will fill up, increasing latency and reducing throughput at the same BDP. This will cause ELR to start giving “slow up” and then maybe “fast up” signals, and might thereby relinquish control of the flow automatically. Note that “fast up” is signalled by ELR *not modifying* any packets.
Or maybe the new bottleneck is a drop-only AQM. In that case, the first sign of it will be a dropped packet after, if anything, only a small increase in latency (ie. not enough, for long enough, for ELR to do very much about). At this point, the observable network state is indistinguishable from a randomly-lost packet, ie. not congestion related.
The safe option here is to react like an ECN-enabled flow, treating any lost packet as a “fast down” signal. An alternative is to treat a lost packet as “slow down” *if* it is accompanied by “slow up” or “hold” signals in the same RTT (ie. there’s a reasonable belief that we’re being properly controlled by ELR). While “slow down” doesn’t react as quickly as a new bottleneck queue might prefer, it does at least respond; if enough drops appear, the ELR queue’s control loop will be shifted to “fast up”, relinquishing control. Or, if the AQM isn’t tight enough to do that, the corresponding increase in RTT will do it instead.
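Sketching that decision rule in Python (my formulation of the alternative just described, nothing more):

    # Interpret a lost packet in the light of the ELR signals seen this RTT.
    def interpret_loss(elr_signals_this_rtt):
        if any(s in ("hold", "slow up") for s in elr_signals_this_rtt):
            return "slow down"   # an ELR bottleneck still appears to have us under control
        return "fast down"       # safe default: behave like a plain ECN/loss-based flow

    print(interpret_loss(["hold", "hold", "slow up"]))   # -> slow down
    print(interpret_loss([]))                            # no ELR signals at all -> fast down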
> (if it's a new flow, it should start slow and ramp up, so you, and the other affected flows, should all be good with a 'slow down' signal)
Given that slow-start grows the cwnd exponentially, that might not be the case after the first few RTTs. But that’s all part of the control loop, and ELR would normally signal it with the CE codepoint rather than dropping packets. Sebastian’s scenario of “slow down” suddenly changing to “omgwtfbbq drop everything now” within the same queue is indeed unlikely.
>> I fully appreciate that *some* network paths may be unstable, and any congestion control system will need to chase the sweet spot up and down under such conditions.
>>
>> Most of the time, however, baseline RTT is stable over timescales of the order of minutes, and available bandwidth is dictated by the last-mile link as the bottleneck. BDP and therefore the ideal cwnd is a simple function of baseline RTT and bandwidth. Hence there are common scenarios in which a steady-state condition can exist. That’s enough to justify the “hold” signal.
>
> Unless you prevent other traffic from showing up on the network (phones checking e-mail, etc.). I don't believe that you are ever going to have stable bandwidth available for any noticeable timeframe.
On many links, light traffic such as e-mail will disturb the balance too little to even notice, especially with flow isolation. Assuming ELR is implemented as per my later post, running without flow isolation will allow light traffic to perturb the ELR signal slightly, converting a “hold” into a random sequence of “slow up”, “hold" and “slow down”, but this will self-correct conservatively, with ELR transitioning to a true “slow up” briefly if required.
Of course, as with any speculation of this nature, simulations and other experiments will tell a more convincing story.
- Jonathan Morton
* Re: [Cake] Control theory and congestion control
2015-05-13 2:51 ` Jonathan Morton
@ 2015-05-13 3:12 ` David Lang
2015-05-13 3:53 ` Jonathan Morton
0 siblings, 1 reply; 18+ messages in thread
From: David Lang @ 2015-05-13 3:12 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake
On Wed, 13 May 2015, Jonathan Morton wrote:
>> On 13 May, 2015, at 02:23, David Lang <david@lang.hm> wrote:
>>
>>> 1) The most restrictive signal seen during an RTT is the one to react to.
>>> So a “fast down” signal overrides anything else.
>>
>> sorry for joining in late, but I think you are modeling something that
>> doesn't match reality.
>>
>> are you really going to see two bottlenecks in a given round trip (or even
>> one connection)? Since you are ramping up fairly slowly, aren't you far more
>> likely to only see one bottleneck (and once you get through that one, you are
>> pretty much set through the rest of the link)
>
> It’s important to remember that link speeds can change drastically over time
> (usually if it’s *anything* wireless), that new competing traffic might reduce
> the available bandwidth suddenly, and that as a result the bottleneck can
> *move* from an ELR-enabled queue to a different queue which might not be. I
> consider that far more likely than an ELR queue abruptly losing control as
> Sebastian originally suggested, but it looks similar to the endpoints.
agreed.
> So what you might have is an ELR queue happily controlling the cwnd based on
> the assumption that *it* is the bottleneck, which until now it has been. But
> *after* that queue is another one which has just *become* the bottleneck, and
> it’s not ELR - it’s plain ECN. The only way it can tell the flow to slow down
> is by giving “fast down” signals. But that’s okay, the endpoints will react
> to that just as they should do, as long as they correctly interpret the most
> restrictive signal as being the operative one.
how would the ELR queue know that things should slow down? If it isn't the
bottleneck, how does it know that there is a bottleneck and the flow that it's
seeing isn't just the application behaving normally?
If the ELR queue is the endpoint, it has some chance of knowing what the
application is trying to do, but if it's on the router that was previously the
bottleneck (usually the 'last mile' device), it has no way of knowing.
> The safe option here is to react like an ECN-enabled flow, treating any lost
> packet as a “fast down” signal. An alternative is to treat a lost packet as
> “slow down” *if* it is accompanied by “slow up” or “hold” signals in the same
> RTT (ie. there’s a reasonable belief that we’re being properly controlled by
> ELR). While “slow down” doesn’t react as quickly as a new bottleneck queue
> might prefer, it does at least respond; if enough drops appear, the ELR
> queue’s control loop will be shifted to “fast up”, relinquishing control.
> Or, if the AQM isn’t tight enough to do that, the corresponding increase in
> RTT will do it instead.
It's the application or the endpoint machine that needs to react, not the queue
device.
>> (if it's a new flow, it should start slow and ramp up, so you, and the other
>> affected flows, should all be good with a 'slow down' signal)
>
> Given that slow-start grows the cwnd exponentially, that might not be the case
> after the first few RTTs. But that’s all part of the control loop, and ELR
> would normally signal it with the CE codepoint rather than dropping packets.
> Sebastian’s scenario of “slow down” suddenly changing to “omgwtfbbq drop
> everything now” within the same queue is indeed unlikely.
>
>>> I fully appreciate that *some* network paths may be unstable, and any
>>> congestion control system will need to chase the sweet spot up and down
>>> under such conditions.
>>>
>>> Most of the time, however, baseline RTT is stable over timescales of the
>>> order of minutes, and available bandwidth is dictated by the last-mile link
>>> as the bottleneck. BDP and therefore the ideal cwnd is a simple function of
>>> baseline RTT and bandwidth. Hence there are common scenarios in which a
>>> steady-state condition can exist. That’s enough to justify the “hold”
>>> signal.
>>
>> Unless you prevent other traffic from showing up on the network (phones
>> checking e-mail, etc.). I don't believe that you are ever going to have
>> stable bandwidth available for any noticeable timeframe.
>
> On many links, light traffic such as e-mail will disturb the balance too
> little to even notice, especially with flow isolation.
This depends on the bandwidth and the type of e-mail. It's very common for
single e-mails in an office environment to be several MB (not the ones with big
document or spreadsheet attachments, but things like holiday party
announcements and other similarly routine messages).
e-mail to a mobile device can have a rather significant impact on cell or wifi
bandwidth.
> Assuming ELR is
> implemented as per my later post, running without flow isolation will allow
> light traffic to perturb the ELR signal slightly, converting a “hold” into a
> random sequence of “slow up”, “hold" and “slow down”, but this will
> self-correct conservatively, with ELR transitioning to a true “slow up”
> briefly if required.
>
> Of course, as with any speculation of this nature, simulations and other
> experiments will tell a more convincing story.
I have a significant distrust of simulations at this point. We can only simulate
how we think the devices in the network act. Bufferbloat came to be because of
the disconnect between the mental model of the people designing the protocols
and the people designing the equipment.
David Lang
* Re: [Cake] Control theory and congestion control
2015-05-13 3:12 ` David Lang
@ 2015-05-13 3:53 ` Jonathan Morton
0 siblings, 0 replies; 18+ messages in thread
From: Jonathan Morton @ 2015-05-13 3:53 UTC (permalink / raw)
To: David Lang; +Cc: cake
>> So what you might have is an ELR queue happily controlling the cwnd based on the assumption that *it* is the bottleneck, which until now it has been. But *after* that queue is another one which has just *become* the bottleneck, and it’s not ELR - it’s plain ECN. The only way it can tell the flow to slow down is by giving “fast down” signals. But that’s okay, the endpoints will react to that just as they should do, as long as they correctly interpret the most restrictive signal as being the operative one.
>
> how would the ELR queue know that things should slow down? If it isn't the bottleneck, how does it know that there is a bottleneck and the flow that it's seeing isn't just the application behaving normally?
The ELR queue doesn’t know anything about the other one, and doesn’t need to. The *new* bottleneck sends ECN signals, which override the ELR “hold" signals (which are sent using the same two bits in the TOS byte). ELR endpoints react to both ECN and ELR signals. The send rate reduces, and the ELR queue is no longer saturated, ergo no longer the bottleneck; it then stops sending “hold”.
So it’s possible to have two queues which simultaneously believe they are the bottleneck, but only as a transient condition. In fact, we often have that today, when we insert a shaped ingress queue *after* our last-mile link.
Please read the post entitled “Explicit Load Regulation”, which I wrote after spending several hours figuring out the right way to do it, and try to keep up.
>> On many links, light traffic such as e-mail will disturb the balance too little to even notice, especially with flow isolation.
>
> This depends on the bandwidth and the type of e-mail. It's very common for single e-mails in an office environment to be several MB (not the ones with big document or spreadsheet attachments, but things like holiday party announcements and other similarly routine messages).
But you’re not getting those continuously, are you? Or, if you are, it’s time to reconfigure the office’s spam filter.
So while a *big* email is coming in, the bandwidth available to other flows might be disturbed. ELR will help to adjust to that, just like ECN does. Then it’ll adjust back when the disturbance has gone, and resume the steady state.
This is not rocket science. This is 1950s locomotive technology.
> e-mail to a mobile device can have a rather significant impact on cell or wifi bandwidth.
Yes - at least on mobile, I’d agree - but that’s one case. There are others.
And even on a mobile connection, it’s potentially useful to have a well-defined steady state, when conditions are right for it. It’s harder to get to those conditions in a wireless environment, but not impossible, especially for rate-limited subscriptions.
As they say, don’t ban steak just because a baby can’t eat it.
>> Of course, as with any speculation of this nature, simulations and other experiments will tell a more convincing story.
>
> I have a significant distrust of simulations at this point.
Hence “and other experiments”.
But it’s also likely that simulations will help to understand the emergent behaviour of something like ELR, before anyone expends too much effort on implementation and standardisation.
- Jonathan Morton