* Re: [Codel] [Cake] Control theory and congestion control
  [not found] <CAJq5cE35ptd-P=EPB4-qhnfVZiMmXWUFL4jD2_BxUUCvU2ACGw@mail.gmail.com>
@ 2015-05-10 3:35 ` Dave Taht
  2015-05-10 6:55   ` Jonathan Morton
                     ` (2 more replies)
  [not found]       ` <152DD781-725D-4DD7-AB94-C7412D92F82C@gmx.de>
  1 sibling, 3 replies; 9+ messages in thread
From: Dave Taht @ 2015-05-10 3:35 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake, codel, bloat

On Sat, May 9, 2015 at 12:02 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>> The "right" amount of buffering is *1* packet, all the time (the goal is
>> nearly 0 latency with 100% utilization). We are quite far from achieving
>> that on anything...
>
> And control theory shows, I think, that we never will unless the mechanisms
> available to us for signalling congestion improve. ECN is good, but it's not
> sufficient to achieve that ultimate goal. I'll try to explain why.

The conex and dctcp work explored using ecn for multi-bit signalling.

While this is a great set of analogies below (and why I am broadening
the cc) there are two things missing from it.

>
> Aside from computer networking, I also dabble in computer simulated trains.
> Some of my bigger projects involve detailed simulations of what goes on
> inside them, especially the older ones which are relatively simple. These
> were built at a time when the idea of putting anything as delicate as a
> transistor inside what was effectively a megawatt-class power station was
> unthinkable, so the control gear tended to be electromechanical or even
> electropneumatic. The control laws therefore tended to be the simplest ones
> they could get away with.
>
> The bulk of the generated power went into the main traction circuit, where a
> dedicated main generator is connected rather directly to the traction motors
> through a small amount of switchgear (mainly to reverse the fields on the
> motors at either end of the line). Control of the megawatts of power
> surging through this circuit was effected by varying the excitation of the
> main generator. Excitation is in turn provided by shunting the auxiliary
> voltage through an automatic rheostat known as the Load Regulator before it
> reaches the field winding of the generator. Without field current, the
> generator produces no power.
>
> The load regulator is what I want to focus on here. Its job was to adjust
> the output of the generator to match the power - more precisely the torque -
> that the engine was capable of producing (or, in English Electric locos at
> least, the torque set by the driver's controls, which wasn't always the
> maximum). The load regulator had a little electric motor to move it up and
> down. A good proxy for engine torque was available in the form of the fuel
> rack position; the torque output of a diesel engine is closely related to
> the amount of fuel injected per cycle. The fuel rack, of course, was
> controlled by the governor which was set to maintain a particular engine
> speed; a straightforward PI control problem solved by a reasonably simple
> mechanical device.
>
> So it looks like a simple control problem; if the torque is too low,
> increase the excitation, and vice versa.
>
> Congestion control looks like a simple problem too. If there is no
> congestion, increase the amount of data in flight; if there is, reduce it.
> We even have Explicit Congestion Notification now to tell us that crucial
> data point, but we could always infer it from dropped packets before.
>
> So what does the load regulator's control system look like? It has as many
> as five states: fast down, slow down, hold, slow up, fast up. It turns out
> that trains really like changes in tractive effort to be slow and smooth,
> and as infrequent as possible. So while a very simple "bang bang" control
> scheme would be possible, it would inevitably oscillate around the set point
> instead of settling on it. Introducing a central hold state allows it to
> settle when cruising at constant speed, and the two slow states allow the
> sort of fine adjustments needed as a train gradually accelerates or slows,
> putting the generator only slightly out of balance with the engine. The fast
> states remain to allow for quick response to large changes - the driver
> moves the throttle, or the motors abruptly reconfigure for a different speed
> range (the electrical equivalent of changing gear).
>
> On the Internet, we're firmly stuck with bang-bang control. As big an
> improvement as ECN is, it still provides only one bit of information to the
> sender: whether or not there was congestion reported during the last RTT.
> Thus we can only use the "slow up" and "fast down" states of our virtual
> load regulator (except for slow start, which ironically uses the "fast up"
> state), and we are doomed to oscillate around the ideal congestion window,
> never actually settling on it.
>
> Bufferbloat is fundamentally about having insufficient information at the
> endpoints about conditions in the network.

Well said.

> We've done a lot to improve that,
> by moving from zero information to one bit per RTT. But to achieve that holy
> grail, we need more information still.

context being aqm + ecn, fq, fq+aqm, fq+aqm+ecn, dctcp, conex, etc.

> Specifically, we need to know when we're at the correct BDP, not just when
> it's too high. And it'd be nice if we also knew if we were close to it. But
> there is currently no way to provide that information from the network to
> the endpoints.

This is where I was pointing out that FQ and the behavior of multiple
flows in their two phases (slow start and congestion avoidance)
provides a few pieces of useful information that could possibly be
used to get closer to the ideal.

We know total service times for all active flows. We also have a
separate calculable service time for "sparse flows" in two algorithms
we understand deeply.

We could have some grip on the history for flows that are not currently queued.

We know that the way we currently seek new set points tends to be
bursty ("chasing the inchworm" - I still gotta use that title on a
paper!).

New flows tend to be extremely bursty - and new flows in the real
world also tend to be pretty short, with 95% of all web traffic
fitting into a single IW10.

If e2e we know we are being FQ'd, and yet are bursting to find new
setpoints, we can infer from the spacing at the other endpoint what the
contention really is.

There was a Stanford result for tens of thousands of flows that found
an ideal setpoint much lower than we are achieving for dozens, at much
higher rates.

A control theory-ish issue with codel is that it depends on an
arbitrary ideal (5ms) as a definition for "good queue", where "a
gooder queue" is, in my definition at the moment, "1 packet outstanding
ever closer to 100% of the time while there is 100% utilization".

We could continue to bang on things (reducing the target or other
methods) and aim for a lower ideal setpoint until utilization dropped
below 100%.

Which becomes easier the more flows we know are in progress.

> - Jonathan Morton
>
>
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake
>

--
Dave Täht
Open Networking needs **Open Source Hardware**

https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67

^ permalink raw reply [flat|nested] 9+ messages in thread
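As a toy back-of-envelope sketch of the "infer contention from the spacing" idea above: if the bottleneck is known to serve flows round-robin, a sender that emits a short burst back-to-back can estimate how many flows it is competing with from how far apart its own packets arrive at the far end. Everything below is illustrative only - the function names, the assumption that the bottleneck rate is known, and the pure round-robin model are not from cake or fq_codel.

#include <stdio.h>

/* wire time of one packet at the bottleneck, in microseconds */
static double wire_time_us(double pkt_bytes, double link_bps)
{
	return pkt_bytes * 8.0 * 1e6 / link_bps;
}

/*
 * Under round-robin FQ, roughly one packet from each competing flow is
 * served between two of ours, so observed spacing ~= n_flows * wire_time.
 */
static double estimate_competing_flows(double spacing_us, double pkt_bytes,
				       double link_bps)
{
	double wt = wire_time_us(pkt_bytes, link_bps);

	return wt > 0.0 ? spacing_us / wt : 0.0;
}

int main(void)
{
	/* 1500-byte packets on a 100 Mbit/s bottleneck take ~120 us each;
	 * a 480 us observed spacing would suggest ~4 flows sharing it. */
	printf("~%.1f flows\n", estimate_competing_flows(480.0, 1500.0, 100e6));
	return 0;
}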
* Re: [Codel] [Cake] Control theory and congestion control
  2015-05-10 3:35 ` [Codel] [Cake] Control theory and congestion control Dave Taht
@ 2015-05-10 6:55   ` Jonathan Morton
  2015-05-10 17:00     ` Sebastian Moeller
  2015-05-10 14:46   ` Jonathan Morton
  2015-05-10 17:04   ` Sebastian Moeller
  2 siblings, 1 reply; 9+ messages in thread
From: Jonathan Morton @ 2015-05-10 6:55 UTC (permalink / raw)
  To: Dave Taht; +Cc: cake, codel, bloat

> On 10 May, 2015, at 06:35, Dave Taht <dave.taht@gmail.com> wrote:
>
> On Sat, May 9, 2015 at 12:02 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>>> The "right" amount of buffering is *1* packet, all the time (the goal is
>>> nearly 0 latency with 100% utilization). We are quite far from achieving
>>> that on anything...
>>
>> And control theory shows, I think, that we never will unless the mechanisms
>> available to us for signalling congestion improve. ECN is good, but it's not
>> sufficient to achieve that ultimate goal. I'll try to explain why.
>
> The conex and dctcp work explored using ecn for multi-bit signalling.

A quick glance at those indicates that they’re focusing on the echo path - getting the data back from the receiver to the sender. That’s the *easy* part; all you need is a small TCP option, which can be slotted into the padding left by TCP Timestamps and/or SACK, so it doesn’t even take any extra space.

But they do nothing to address the problem of allowing routers to provide a “hold” signal. Even a single ECN mark has to be taken to mean “back off”; being able to signal that more than one ECN mark happened in one RTT simply means that you now have a way to say “back off harder”.

The problem is that we need a three-bit signal (five new-style signalling states, two states indicating legacy ECN support, and one “ECN unsupported” state) at the IP layer to do it properly, and we’re basically out of bits there, at least in IPv4. The best solution I can think of right now is to use both of the ECT states somehow, but we’d have to make sure that doesn’t conflict too badly with existing uses of ECT(1), such as the “nonce sum”. Backwards and forwards compatibility here is essential.

I’m thinking about the problem.

>> Bufferbloat is fundamentally about having insufficient information at the
>> endpoints about conditions in the network.
>
> Well said.
>
>> We've done a lot to improve that,
>> by moving from zero information to one bit per RTT. But to achieve that holy
>> grail, we need more information still.
>
> context being aqm + ecn, fq, fq+aqm, fq+aqm+ecn, dctcp, conex, etc.
>
>> Specifically, we need to know when we're at the correct BDP, not just when
>> it's too high. And it'd be nice if we also knew if we were close to it. But
>> there is currently no way to provide that information from the network to
>> the endpoints.
>
> This is where I was pointing out that FQ and the behavior of multiple
> flows in their two phases (slow start and congestion avoidance)
> provides a few pieces of useful information that could possibly be
> used to get closer to the ideal.

There certainly is enough information available in fq_codel and cake to derive a five-state congestion signal, rather than a two-state one, with very little extra effort.

Flow is sparse -> “Fast up”
Flow is saturating, but no standing queue -> “Slow up”
Flow is saturating, with small standing queue -> “Hold”
Flow is saturating, with large standing queue -> “Slow down”
Flow is saturating, with large, *persistent* standing queue -> “Fast down”

In simple terms, “fast” here means “multiplicative” and “slow” means “additive”, in the sense of AIMD being the current standard for TCP behaviour. AIMD itself is a result of the two-state “bang-bang” control model introduced back in the 1980s.

It’s worth remembering that the Great Internet Congestion Collapse Event was 30 years ago, and ECN was specified 15 years ago.

> A control theory-ish issue with codel is that it depends on an arbitrary ideal (5ms) as a definition for "good queue", where "a
> gooder queue” is, in my definition at the moment, "1 packet outstanding ever closer to 100% of the time while there is 100% utilization”.

As the above table shows, Codel reacts (by design) only to the most extreme situation that we would want to plug into an improved congestion-control model. It’s really quite remarkable, in that context, that it works as well as it does. I don’t think we can hope to do significantly better until a better signalling mechanism is available.

But it does highlight that the correct meaning of an ECN mark is “back off hard, now”. That’s how it’s currently interpreted by TCPs, in accordance with the ECN RFCs, and Codel relies on that behaviour too. We have to use some other, deliberately softer signal to give a “hold” or even a “slow down” indication.

- Jonathan Morton

^ permalink raw reply [flat|nested] 9+ messages in thread
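As a rough sketch of how the per-flow state fq_codel and cake already track could be folded into that five-way signal - the struct fields, thresholds and function below are hypothetical stand-ins, not the actual qdisc internals:

#include <stdint.h>

enum cc_signal { SIG_FAST_UP, SIG_SLOW_UP, SIG_HOLD, SIG_SLOW_DOWN, SIG_FAST_DOWN };

struct flow_view {
	int      sparse;            /* flow is on the sparse/new-flow list       */
	uint64_t sojourn_ns;        /* queue delay of the packet being dequeued  */
	uint64_t above_target_ns;   /* how long sojourn has stayed above target  */
};

static enum cc_signal classify_flow(const struct flow_view *f,
				    uint64_t target_ns, uint64_t interval_ns)
{
	if (f->sparse)
		return SIG_FAST_UP;		/* sparse flow                        */
	if (f->sojourn_ns < target_ns / 2)
		return SIG_SLOW_UP;		/* saturating, no real standing queue */
	if (f->sojourn_ns < target_ns)
		return SIG_HOLD;		/* small standing queue               */
	if (f->above_target_ns < interval_ns)
		return SIG_SLOW_DOWN;		/* large standing queue               */
	return SIG_FAST_DOWN;			/* large and persistent               */
}

The point of the sketch is only that all of these inputs are already maintained for the Codel dequeue decision; the open question in the thread is how to carry the resulting signal back to the sender.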
* Re: [Codel] [Cake] Control theory and congestion control 2015-05-10 6:55 ` Jonathan Morton @ 2015-05-10 17:00 ` Sebastian Moeller 0 siblings, 0 replies; 9+ messages in thread From: Sebastian Moeller @ 2015-05-10 17:00 UTC (permalink / raw) To: Jonathan Morton; +Cc: cake, codel, bloat Hi Jonathan, On May 10, 2015, at 08:55 , Jonathan Morton <chromatix99@gmail.com> wrote: > >> On 10 May, 2015, at 06:35, Dave Taht <dave.taht@gmail.com> wrote: >> >> On Sat, May 9, 2015 at 12:02 PM, Jonathan Morton <chromatix99@gmail.com> wrote: >>>> The "right" amount of buffering is *1* packet, all the time (the goal is >>>> nearly 0 latency with 100% utilization). We are quite far from achieving >>>> that on anything... >>> >>> And control theory shows, I think, that we never will unless the mechanisms >>> available to us for signalling congestion improve. ECN is good, but it's not >>> sufficient to achieve that ultimate goal. I'll try to explain why. >> >> The conex and dctcp work explored using ecn for multi-bit signalling. > > A quick glance at those indicates that they’re focusing on the echo path - getting the data back from the receiver to the sender. That’s the *easy* part; all you need is a small TCP option, which can be slotted into the padding left by TCP Timestamps and/or SACK, so it doesn’t even take any extra space. > > But they do nothing to address the problem of allowing routers to provide a “hold” signal. Even a single ECN mark has to be taken to mean “back off”; being able to signal that more than one ECN mark happened in one RTT simply means that you now have a way to say “back off harder”. > > The problem is that we need a three-bit signal (five new-style signalling states, two states indicating legacy ECN support, and one “ECN unsupported” state) at the IP layer to do it properly, and we’re basically out of bits there, at least in IPv4. The best solution I can think of right now is to use both of the ECT states somehow, but we’d have to make sure that doesn’t conflict too badly with existing uses of ECT(1), such as the “nonce sum”. Backwards and forwards compatibility here is essential. On the danger of sounding like I had a tin of snark for breakfast; what about re-dedicating 3 of the 6 TOS bits for this ;) (if I understand correctly ethernet and MPLS transports only allow 3 bits anyway, so the 6 bits are fiction anyway, outside of l3-routers) And the BCP still is to re-color the TOS bits in ingress, so I guess 3 bits should be plenty. Best Regards Sebastian > > I’m thinking about the problem. > >>> Bufferbloat is fundamentally about having insufficient information at the >>> endpoints about conditions in the network. >> >> Well said. >> >>> We've done a lot to improve that, >>> by moving from zero information to one bit per RTT. But to achieve that holy >>> grail, we need more information still. >> >> context being aqm + ecn, fq, fq+aqm, fq+aqm+ecn, dctcp, conex, etc. >> >>> Specifically, we need to know when we're at the correct BDP, not just when >>> it's too high. And it'd be nice if we also knew if we were close to it. But >>> there is currently no way to provide that information from the network to >>> the endpoints. >> >> This is where I was pointing out that FQ and the behavior of multiple >> flows in their two phases (slow start and congestion avoidance) >> provides a few pieces of useful information that could possibly be >> used to get closer to the ideal. 
> > There certainly is enough information available in fq_codel and cake to derive a five-state congestion signal, rather than a two-state one, with very little extra effort. > > Flow is sparse -> “Fast up” > Flow is saturating, but no standing queue -> “Slow up” > Flow is saturating, with small standing queue -> “Hold” > Flow is saturating, with large standing queue -> “Slow down” > Flow is saturating, with large, *persistent* standing queue -> “Fast down” > > In simple terms, “fast” here means “multiplicative” and “slow” means “additive”, in the sense of AIMD being the current standard for TCP behaviour. AIMD itself is a result of the two-state “bang-bang” control model introduced back in the 1980s. > > It’s worth remembering that the Great Internet Congestion Collapse Event was 30 years ago, and ECN was specified 15 years ago. > >> A control theory-ish issue with codel is that it depends on an arbitrary ideal (5ms) as a definition for "good queue", where "a >> gooder queue” is, in my definition at the moment, "1 packet outstanding ever closer to 100% of the time while there is 100% utilization”. > > As the above table shows, Codel reacts (by design) only to the most extreme situation that we would want to plug into an improved congestion-control model. It’s really quite remarkable, in that context, that it works as well as it does. I don’t think we can hope to do significantly better until a better signalling mechanism is available. > > But it does highlight that the correct meaning of an ECN mark is “back off hard, now”. That’s how it’s currently interpreted by TCPs, in accordance with the ECN RFCs, and Codel relies on that behaviour too. We have to use some other, deliberately softer signal to give a “hold” or even a “slow down” indication. > > - Jonathan Morton > > _______________________________________________ > Codel mailing list > Codel@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/codel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Codel] [Cake] Control theory and congestion control
  2015-05-10 3:35 ` [Codel] [Cake] Control theory and congestion control Dave Taht
  2015-05-10 6:55 ` Jonathan Morton
@ 2015-05-10 14:46 ` Jonathan Morton
  2015-05-10 17:04 ` Sebastian Moeller
  2 siblings, 0 replies; 9+ messages in thread
From: Jonathan Morton @ 2015-05-10 14:46 UTC (permalink / raw)
  To: Dave Taht; +Cc: cake, codel, bloat

> On 10 May, 2015, at 06:35, Dave Taht <dave.taht@gmail.com> wrote:
>
> New flows tend to be extremely bursty - and new flows in the real
> world also tend to be pretty short, with 95% of all web traffic
> fitting into a single IW10.

There is some hope that HTTP/2 will reduce the prevalence of this characteristic. It might take a while to reach full effect, due to how much sharding there is right now, but I’m mildly optimistic - it’s an application software change rather than at kernel level. So there’ll be more flows spending more time in the congestion-avoidance state than in slow-start.

Meanwhile, I understand the reasons behind IW10, but it’s clear that pacing those packets according to the RTT measured during the TCP handshake is desirable. That *does* need kernel support, but it has the fairly large benefit of not attempting to stuff ten packets into a buffer at the same time. On drop-tail FIFOs, that inevitably leads to a spike in induced latency (more so if several flows start up in parallel) and a relatively high chance of burst loss, requiring retransmission of some of those packets anyway.

Aside from sch_fq, what support for TCP pacing is out there?

> If e2e we know we are being FQ'd…

In general, E2E we don’t know what’s in the middle unless we receive notice about it. Or unless we’re in a lab.

- Jonathan Morton

^ permalink raw reply [flat|nested] 9+ messages in thread
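The arithmetic behind the IW10 pacing point is simple enough to sketch. The constants and helper names below are invented, and this is not how sch_fq or any particular TCP stack actually implements pacing - just the idea of spreading the initial window over the handshake RTT.

#include <stdint.h>

#define IW_PACKETS 10

/* nanoseconds to wait between packets of the initial window */
static uint64_t iw_gap_ns(uint64_t handshake_rtt_ns)
{
	return handshake_rtt_ns / IW_PACKETS;
}

/* the same idea generalised: pace at roughly one cwnd per RTT */
static uint64_t pacing_gap_ns(uint64_t rtt_ns, uint32_t cwnd_pkts)
{
	return cwnd_pkts ? rtt_ns / cwnd_pkts : 0;
}

For example, a 40 ms handshake RTT would space the ten initial-window packets about 4 ms apart instead of dropping them into the bottleneck buffer back-to-back.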
* Re: [Codel] [Cake] Control theory and congestion control 2015-05-10 3:35 ` [Codel] [Cake] Control theory and congestion control Dave Taht 2015-05-10 6:55 ` Jonathan Morton 2015-05-10 14:46 ` Jonathan Morton @ 2015-05-10 17:04 ` Sebastian Moeller 2015-05-10 17:48 ` Dave Taht 2 siblings, 1 reply; 9+ messages in thread From: Sebastian Moeller @ 2015-05-10 17:04 UTC (permalink / raw) To: Dave Täht; +Cc: cake, Jonathan Morton, codel, bloat On May 10, 2015, at 05:35 , Dave Taht <dave.taht@gmail.com> wrote: > On Sat, May 9, 2015 at 12:02 PM, Jonathan Morton <chromatix99@gmail.com> wrote: >>> The "right" amount of buffering is *1* packet, all the time (the goal is >>> nearly 0 latency with 100% utilization). We are quite far from achieving >>> that on anything... >> >> And control theory shows, I think, that we never will unless the mechanisms >> available to us for signalling congestion improve. ECN is good, but it's not >> sufficient to achieve that ultimate goal. I'll try to explain why. > > The conex and dctcp work explored using ecn for multi-bit signalling. > > While this is a great set of analogies below (and why I am broadening > the cc) there are two things missing from it. > >> >> Aside from computer networking, I also dabble in computer simulated trains. >> Some of my bigger projects involve detailed simulations of what goes on >> inside them, especially the older ones which are relatively simple. These >> were built at a time when the idea of putting anything as delicate as a >> transistor inside what was effectively a megawatt-class power station was >> unthinkable, so the control gear tended to be electromechanical or even >> electropneumatic. The control laws therefore tended to be the simplest ones >> they could get away with. >> >> The bulk of the generated power went into the main traction circuit, where a >> dedicated main generator is connected rather directly to the traction motors >> through a small amount of switchgear (mainly to reverse the fields on the >> motors at either end off the line). Control of the megawatts of power >> surging through this circuit was effected by varying the excitation of the >> main generator. Excitation is in turn provided by shunting the auxiliary >> voltage through an automatic rheostat known as the Load Regulator before it >> reaches the field winding of the generator. Without field current, the >> generator produces no power. >> >> The load regulator is what I want to focus on here. Its job was to adjust >> the output of the generator to match the power - more precisely the torque - >> that the engine was capable of producing (or, in English Electric locos at >> least, the torque set by the driver's controls, which wasn't always the >> maximum). The load regulator had a little electric motor to move it up and >> down. A good proxy for engine torque was available in the form of the fuel >> rack position; the torque output of a diesel engine is closely related to >> the amount of fuel injected per cycle. The fuel rack, of course, was >> controlled by the governor which was set to maintain a particular engine >> speed; a straightforward PI control problem solved by a reasonably simple >> mechanical device. >> >> So it looks like a simple control problem; if the torque is too low, >> increase the excitation, and vice versa. >> >> Congestion control looks like a simple problem too. If there is no >> congestion, increase the amount of data in flight; if there is, reduce it. 
>> We even have Explicit Congestion Notification now to tell us that crucial >> data point, but we could always infer it from dropped packets before. >> >> So what does the load regulator's control system look like? It has as many >> as five states: fast down, slow down, hold, slow up, fast up. It turns out >> that trains really like changes in tractive effort to be slow and smooth, >> and as infrequent as possible. So while a very simple "bang bang" control >> scheme would be possible, it would inevitably oscillate around the set point >> instead of settling on it. Introducing a central hold state allows it to >> settle when cruising at constant speed, and the two slow states allow the >> sort of fine adjustments needed as a train gradually accelerates or slows, >> putting the generator only slightly out of balance with the engine. The fast >> states remain to allow for quick response to large changes - the driver >> moves the throttle, or the motors abruptly reconfigure for a different speed >> range (the electrical equivalent of changing gear). >> >> On the Internet, we're firmly stuck with bang-bang control. As big an >> improvement as ECN is, it still provides only one bit of information to the >> sender: whether or not there was congestion reported during the last RTT. >> Thus we can only use the "slow up" and "fast down" states of our virtual >> load regulator (except for slow start, which ironically uses the "fast up" >> state), and we are doomed to oscillate around the ideal congestion window, >> never actually settling on it. >> >> Bufferbloat is fundamentally about having insufficient information at the >> endpoints about conditions in the network. > > Well said. > >> We've done a lot to improve that, >> by moving from zero information to one bit per RTT. But to achieve that holy >> grail, we need more information still. > > context being aqm + ecn, fq, fq+aqm, fq+aqm+ecn, dctcp, conex, etc. > >> Specifically, we need to know when we're at the correct BDP, not just when >> it's too high. And it'd be nice if we also knew if we were close to it. But >> there is currently no way to provide that information from the network to >> the endpoints. > > This is where I was pointing out that FQ and the behavior of multiple > flows in their two phases (slow start and congestion avoidance) > provides a few pieces of useful information that could possibly be > used to get closer to the ideal. > > We know total service times for all active flows. We also have a > separate calculable service time for "sparse flows" in two algorithms > we understand deeply. > > We could have some grip on the history for flows that are not currently queued. > > We know that the way we currently seek new set points tend to be > bursty ("chasing the inchworm" - I still gotta use that title on a > paper!). > > New flows tend to be extremely bursty - and new flows in the real > world also tend to be pretty short, with 95% of all web traffic > fitting into a single IW10. > > If e2e we know we are being FQ´d, and yet are bursting to find new > setpoints we can infer from the spacing on the other endpoint what the > contention really is. > > There was a stanford result for 10s of thousands of flows that found > an ideal setpoint much lower than we are achieving for dozens, at much > higher rates. 
>
> A control theory-ish issue with codel is that it depends on an
> arbitrary ideal (5ms) as a definition for "good queue", where "a
> gooder queue”

	I thought that our set point really is 5% of the estimated RTT, and we just default to 5 since we guestimate our RTT to be 100ms. Not that I complain, these two numbers seem to work decently over a relatively broad range of true RTTs…

Best Regards
	Sebastian

> is, in my definition at the moment, "1 packet outstanding ever closer
> to 100% of the time while there is 100% utilization".
>
> We could continue to bang on things (reducing the target or other
> methods) and aim for a lower ideal setpoint until utilization dropped
> below 100%.
>
> Which becomes easier the more flows we know are in progress.
>
>> - Jonathan Morton
>>
>>
>> _______________________________________________
>> Cake mailing list
>> Cake@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cake
>>
>
>
>
> --
> Dave Täht
> Open Networking needs **Open Source Hardware**
>
> https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
> _______________________________________________
> Codel mailing list
> Codel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply [flat|nested] 9+ messages in thread
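Sebastian's 5%-of-RTT reading can be written out directly; the shift form Dave uses later in the thread (target = interval >> 4) is the cheaper approximation of the same relationship. Helper names here are hypothetical.

#include <stdint.h>

/* 5% of the assumed RTT: a 100 ms interval gives the familiar 5 ms target */
static uint64_t codel_target_pct(uint64_t interval_ns)
{
	return interval_ns / 20;
}

/* the shift approximation mentioned further down: 100 ms -> 6.25 ms */
static uint64_t codel_target_shift(uint64_t interval_ns)
{
	return interval_ns >> 4;
}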
* Re: [Codel] [Cake] Control theory and congestion control 2015-05-10 17:04 ` Sebastian Moeller @ 2015-05-10 17:48 ` Dave Taht 2015-05-10 17:58 ` Dave Taht 2015-05-10 18:25 ` Dave Taht 0 siblings, 2 replies; 9+ messages in thread From: Dave Taht @ 2015-05-10 17:48 UTC (permalink / raw) To: Sebastian Moeller; +Cc: cake, Jonathan Morton, codel, bloat On Sun, May 10, 2015 at 10:04 AM, Sebastian Moeller <moeller0@gmx.de> wrote: > > On May 10, 2015, at 05:35 , Dave Taht <dave.taht@gmail.com> wrote: > >> On Sat, May 9, 2015 at 12:02 PM, Jonathan Morton <chromatix99@gmail.com> wrote: >>>> The "right" amount of buffering is *1* packet, all the time (the goal is >>>> nearly 0 latency with 100% utilization). We are quite far from achieving >>>> that on anything... >>> >>> And control theory shows, I think, that we never will unless the mechanisms >>> available to us for signalling congestion improve. ECN is good, but it's not >>> sufficient to achieve that ultimate goal. I'll try to explain why. >> >> The conex and dctcp work explored using ecn for multi-bit signalling. >> >> While this is a great set of analogies below (and why I am broadening >> the cc) there are two things missing from it. >> >>> >>> Aside from computer networking, I also dabble in computer simulated trains. >>> Some of my bigger projects involve detailed simulations of what goes on >>> inside them, especially the older ones which are relatively simple. These >>> were built at a time when the idea of putting anything as delicate as a >>> transistor inside what was effectively a megawatt-class power station was >>> unthinkable, so the control gear tended to be electromechanical or even >>> electropneumatic. The control laws therefore tended to be the simplest ones >>> they could get away with. >>> >>> The bulk of the generated power went into the main traction circuit, where a >>> dedicated main generator is connected rather directly to the traction motors >>> through a small amount of switchgear (mainly to reverse the fields on the >>> motors at either end off the line). Control of the megawatts of power >>> surging through this circuit was effected by varying the excitation of the >>> main generator. Excitation is in turn provided by shunting the auxiliary >>> voltage through an automatic rheostat known as the Load Regulator before it >>> reaches the field winding of the generator. Without field current, the >>> generator produces no power. >>> >>> The load regulator is what I want to focus on here. Its job was to adjust >>> the output of the generator to match the power - more precisely the torque - >>> that the engine was capable of producing (or, in English Electric locos at >>> least, the torque set by the driver's controls, which wasn't always the >>> maximum). The load regulator had a little electric motor to move it up and >>> down. A good proxy for engine torque was available in the form of the fuel >>> rack position; the torque output of a diesel engine is closely related to >>> the amount of fuel injected per cycle. The fuel rack, of course, was >>> controlled by the governor which was set to maintain a particular engine >>> speed; a straightforward PI control problem solved by a reasonably simple >>> mechanical device. >>> >>> So it looks like a simple control problem; if the torque is too low, >>> increase the excitation, and vice versa. >>> >>> Congestion control looks like a simple problem too. If there is no >>> congestion, increase the amount of data in flight; if there is, reduce it. 
>>> We even have Explicit Congestion Notification now to tell us that crucial >>> data point, but we could always infer it from dropped packets before. >>> >>> So what does the load regulator's control system look like? It has as many >>> as five states: fast down, slow down, hold, slow up, fast up. It turns out >>> that trains really like changes in tractive effort to be slow and smooth, >>> and as infrequent as possible. So while a very simple "bang bang" control >>> scheme would be possible, it would inevitably oscillate around the set point >>> instead of settling on it. Introducing a central hold state allows it to >>> settle when cruising at constant speed, and the two slow states allow the >>> sort of fine adjustments needed as a train gradually accelerates or slows, >>> putting the generator only slightly out of balance with the engine. The fast >>> states remain to allow for quick response to large changes - the driver >>> moves the throttle, or the motors abruptly reconfigure for a different speed >>> range (the electrical equivalent of changing gear). >>> >>> On the Internet, we're firmly stuck with bang-bang control. As big an >>> improvement as ECN is, it still provides only one bit of information to the >>> sender: whether or not there was congestion reported during the last RTT. >>> Thus we can only use the "slow up" and "fast down" states of our virtual >>> load regulator (except for slow start, which ironically uses the "fast up" >>> state), and we are doomed to oscillate around the ideal congestion window, >>> never actually settling on it. >>> >>> Bufferbloat is fundamentally about having insufficient information at the >>> endpoints about conditions in the network. >> >> Well said. >> >>> We've done a lot to improve that, >>> by moving from zero information to one bit per RTT. But to achieve that holy >>> grail, we need more information still. >> >> context being aqm + ecn, fq, fq+aqm, fq+aqm+ecn, dctcp, conex, etc. >> >>> Specifically, we need to know when we're at the correct BDP, not just when >>> it's too high. And it'd be nice if we also knew if we were close to it. But >>> there is currently no way to provide that information from the network to >>> the endpoints. >> >> This is where I was pointing out that FQ and the behavior of multiple >> flows in their two phases (slow start and congestion avoidance) >> provides a few pieces of useful information that could possibly be >> used to get closer to the ideal. >> >> We know total service times for all active flows. We also have a >> separate calculable service time for "sparse flows" in two algorithms >> we understand deeply. >> >> We could have some grip on the history for flows that are not currently queued. >> >> We know that the way we currently seek new set points tend to be >> bursty ("chasing the inchworm" - I still gotta use that title on a >> paper!). >> >> New flows tend to be extremely bursty - and new flows in the real >> world also tend to be pretty short, with 95% of all web traffic >> fitting into a single IW10. >> >> If e2e we know we are being FQ´d, and yet are bursting to find new >> setpoints we can infer from the spacing on the other endpoint what the >> contention really is. >> >> There was a stanford result for 10s of thousands of flows that found >> an ideal setpoint much lower than we are achieving for dozens, at much >> higher rates. 
>>
>> A control theory-ish issue with codel is that it depends on an
>> arbitrary ideal (5ms) as a definition for "good queue", where "a
>> gooder queue”
>
> I thought that our set point really is 5% of the estimated RTT, and we just default to 5 since we guestimate our RTT to be 100ms. Not that I complain, these two numbers seem to work decently over a relatively broad range of true RTTs…

Yes, I should have talked about it as estimated RTT (interval) and a
seemingly desirable percentage (target). It is very helpful to think of
it that way if (as in my current testing) you are trying to see how
much better you can do at very short (sub 1ms) RTTs, where it really
is the interval you want to be modifying...

I have been fiddling, as a proof of concept - not an actual
algorithm - with how much shorter you can make the queues at short RTTs.
What I did was gradually (per packet) subtract 10ns from the cake
target while at 100% utilization until the target hit 1ms (or bytes
outstanding dropped below 3k). Had the cake code still used a
calculated target from the interval (target >> 4) I would have fiddled
with the interval instead. Using the netperf-wrapper tcp_upload test:

There were two significant results from that (I really should just
start a blog so I can do images inline)

1) At 100Mbit, TSO offloads (bulking) add significant latency to
competing streams:

http://snapon.lab.bufferbloat.net/~d/cake_reduced_target/offload_damage_100mbit.png

This gets much worse as you add tcp flows. I figure day traders would
take notice. TSO packets have much more mass.

2) You CAN get fewer packets outstanding at this RTT and still keep the
link 100% utilized.

The default codel algo stayed steady at 30-31 packets outstanding with
no losses or marks evident (TSQ?) while the shrinking dynamic target
ecn marked fairly heavily and ultimately reduced the packets
outstanding to 7-17 packets with a slight improvement in actual
throughput. (This stuff is so totally inside the noise floor that it
is hard to discern a difference at all - and you can see the linux
de-optimization for handing ping packets off to hardware in some of
the tests, after the tcp flows end, which skews the latency figures)

http://snapon.lab.bufferbloat.net/~d/cake_reduced_target/dynamic_target_vs_static.png

I think it is back to ns3 to get a better grip on some of this.

>
>
> Best Regards
> 	Sebastian
>
>> is, in my definition at the moment, "1 packet outstanding ever closer
>> to 100% of the time while there is 100% utilization".
>>
>> We could continue to bang on things (reducing the target or other
>> methods) and aim for a lower ideal setpoint until utilization dropped
>> below 100%.
>>
>> Which becomes easier the more flows we know are in progress.
>>
>>> - Jonathan Morton
>>>
>>>
>>> _______________________________________________
>>> Cake mailing list
>>> Cake@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cake
>>>
>>
>>
>>
>> --
>> Dave Täht
>> Open Networking needs **Open Source Hardware**
>>
>> https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
>> _______________________________________________
>> Codel mailing list
>> Codel@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/codel
>

--
Dave Täht
Open Networking needs **Open Source Hardware**

https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67

^ permalink raw reply [flat|nested] 9+ messages in thread
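The proof-of-concept Dave describes above is mechanical enough to sketch: while the link stays saturated, shave 10 ns off the codel target per dequeued packet, with a 1 ms floor and a ~3 kB backlog floor. This is a sketch only, not the actual patch (that is the 0001-sch_cake-add-experimental-decreasing-target URL in the next message); the struct fields and function name are made up for illustration.

#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_MSEC   1000000ULL
#define TARGET_FLOOR_NS (1 * NSEC_PER_MSEC)	/* never go below 1 ms      */
#define BACKLOG_FLOOR_B 3000U			/* stop near 3 kB backlog   */
#define TARGET_STEP_NS  10ULL			/* 10 ns shaved per packet  */

struct cake_like_sched {
	uint64_t target_ns;      /* current codel target                         */
	uint32_t backlog_bytes;  /* bytes currently queued                       */
	bool     link_saturated; /* shaper had work for the whole last interval  */
};

/* called once per dequeued packet */
static void shrink_target(struct cake_like_sched *q)
{
	if (!q->link_saturated)
		return;			/* only probe downward at 100% utilization */
	if (q->backlog_bytes < BACKLOG_FLOOR_B)
		return;			/* almost nothing outstanding: stop probing */
	if (q->target_ns > TARGET_FLOOR_NS + TARGET_STEP_NS)
		q->target_ns -= TARGET_STEP_NS;
	else
		q->target_ns = TARGET_FLOOR_NS;
}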
* Re: [Codel] [Cake] Control theory and congestion control
  2015-05-10 17:48 ` Dave Taht
@ 2015-05-10 17:58   ` Dave Taht
  2015-05-10 18:25   ` Dave Taht
  1 sibling, 0 replies; 9+ messages in thread
From: Dave Taht @ 2015-05-10 17:58 UTC (permalink / raw)
  To: Sebastian Moeller; +Cc: cake, Jonathan Morton, codel, bloat

This was that patch against the https://github.com/dtaht/sch_cake repo.

http://snapon.lab.bufferbloat.net/~d/cake_reduced_target/0001-sch_cake-add-experimental-decreasing-target-at-100-p.patch

(I am not seriously proposing this for anything... but I am loving
having cake be out of the main linux tree. I can have a new idea, make
a change to the algo, compile, test in a matter of seconds, and/or run
a comprehensive netperf-wrapper suite over a cup of coffee, and then
try something else)... there is something of a backlog of new ideas on
the cake mailing list and elsewhere. I guess I should track the random
ideas in branches in the github repo.

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Codel] [Cake] Control theory and congestion control 2015-05-10 17:48 ` Dave Taht 2015-05-10 17:58 ` Dave Taht @ 2015-05-10 18:25 ` Dave Taht 1 sibling, 0 replies; 9+ messages in thread From: Dave Taht @ 2015-05-10 18:25 UTC (permalink / raw) To: Sebastian Moeller; +Cc: cake, Jonathan Morton, codel, bloat On Sun, May 10, 2015 at 10:48 AM, Dave Taht <dave.taht@gmail.com> wrote: >>> A control theory-ish issue with codel is that it depends on an >>> arbitrary ideal (5ms) as a definition for "good queue", where "a >>> gooder queue” >> >> I thought that our set point really is 5% of the estimated RTT, and we just default to 5 sincere we guestimate our RTT to be 100ms. Not that I complain, these two numbers seem to work decently over a relive broad range of true RTTs… > > Yes, I should have talked about it as estimated RTT (interval) and a > seemingly desirable percentage(target). It is very helpful to think of > it that way if (as in my current testing) you are trying to see how > much better you can do at very short (sub 1ms) RTTs, where it really > is the interval you want to be modifying... oops. I meant target = interval >> 4; and would have decreased interval by a larger amount or something relative to the rate, but merely wanted to see the slope of the curve, and really need to write cake_drop_monitor rather than just "watch tc -s qdisc show dev eth0" > > I have been fiddling with as a proof of concept - not an actual > algorithm - how much shorter you can make the queues at short RTTs. > What I did was gradually (per packet) subtract 10ns from the cake > target while at 100% utilization until the target hit 1ms (or bytes > outstanding dropped below 3k). Had the cake code still used a > calculated target from the interval (target >> 4) I would have fiddled > with the interval instead. Using the netperf-wrapper tcp_upload test: > > There were two significant results from that (I really should just > start a blog so I can do images inline) > > 1) At 100Mbit, TSO offloads (bulking) add significant latency to > competing streams: > > http://snapon.lab.bufferbloat.net/~d/cake_reduced_target/offload_damage_100mbit.png > > This gets much worse as you add tcp flows. I figure day traders would > take notice. TSO packets have much more mass. > > 2) You CAN get less packets outstanding at this RTT and still keep the > link 100% utilized. > > The default codel algo stayed steady at 30-31 packets outstanding with > no losses or marks evident (TSQ?) while the shrinking dynamic target > ecn marked fairly heavily and ultimately reduced the packets > outstanding to 7-17 packets with a slight improvement in actual > throughput. (This stuff is so totally inside the noise floor that it > is hard to discern a difference at all - and you can see the linux > de-optimization for handing ping packets off to hardware in some of > the tests, after the tcp flows end, which skews the latency figures) > > http://snapon.lab.bufferbloat.net/~d/cake_reduced_target/dynamic_target_vs_static.png > > I think it is back to ns3 to get better grips on some of this. > >> >> >> Best Regards >> Sebastian >> >>> is, in my definition at the moment, "1 packet outstanding ever closer >>> to 100% of the time while there is 100% utilization". >>> >>> We could continue to bang on things (reducing the target or other >>> methods) and aim for a lower ideal setpoint until utilization dropped >>> below 100%. >>> >>> Which becomes easier the more flows we know are in progress. 
>>> >>>> - Jonathan Morton >>>> >>>> >>>> _______________________________________________ >>>> Cake mailing list >>>> Cake@lists.bufferbloat.net >>>> https://lists.bufferbloat.net/listinfo/cake >>>> >>> >>> >>> >>> -- >>> Dave Täht >>> Open Networking needs **Open Source Hardware** >>> >>> https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67 >>> _______________________________________________ >>> Codel mailing list >>> Codel@lists.bufferbloat.net >>> https://lists.bufferbloat.net/listinfo/codel >> > > > > -- > Dave Täht > Open Networking needs **Open Source Hardware** > > https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67 -- Dave Täht Open Networking needs **Open Source Hardware** https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67 ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <152DD781-725D-4DD7-AB94-C7412D92F82C@gmx.de>]
[parent not found: <1F323E22-817A-4212-A354-C6A14D2F1DBB@gmail.com>]
[parent not found: <E453FF95-5C1C-4A89-9C66-17FA33BBC83B@gmx.de>]
[parent not found: <C7425B27-6704-4269-8EFC-3D4CD9EE1FD1@gmail.com>]
* [Codel] Explicit Load Regulation - was: Control theory and congestion control
  [not found] ` <C7425B27-6704-4269-8EFC-3D4CD9EE1FD1@gmail.com>
@ 2015-05-11 13:54 ` Jonathan Morton
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Morton @ 2015-05-11 13:54 UTC (permalink / raw)
  To: cake; +Cc: codel, bloat

> On 11 May, 2015, at 14:34, Jonathan Morton <chromatix99@gmail.com> wrote:
>
> I’m also now thinking about how to approximate fairness between ELR flows *without* flow isolation. Since ELR would aim to provide a continuous signal rather than a stochastic one, this is actually a harder problem than it sounds; naively, a new flow would stay at minimum cwnd as long as a competing flow was saturating the link, since both would be given the same up/down signals. There might need to be some non-obvious properties in the way the signal is provided to overcome that; I have the beginnings of an idea, but need to work it out.

And the result of a good wander is that I think we can, in fact, use the distinction between ECT(0) and ECT(1) to perform this signalling, and therefore we don’t need to somehow find extra bits in the IP headers. This might take a little while to explain:

When an ELR flow is negotiated by the endpoints, senders set ECT(1) on all relevant packets they originate. Since most ECN senders currently set ECT(0), and those that use ECT(1) at all tend to alternate between ECT(0) and ECT(1), routers are able to assume with sufficient safety that an ECT(1) packet can be used to carry an ELR signal, and that an ECT(0) packet belongs to a legacy ECN flow.

The “fast down” signal for both ECN and ELR flows is the ECN codepoint set in any packet during one RTT. This is echoed back to the sender by the receiver, as for legacy ECN. Compliant senders should halve their congestion window, or perform an equivalent backoff.

In an ELR flow, the ratio of ECT(1) to ECT(0) packets received is significant, and carries the remaining four states of the ELR protocol. Receivers keep a history of the ECN codepoints in the most recent three data-bearing packets received on the flow. They echo back to the sender the number of such packets which had ECT(1) set. The significance of this number is as follows:

0: slow down - sender should perform a small, multiplicative (exponential) backoff in this RTT
1: hold - sender should not increase send rate in this RTT
2: slow up - sender may perform only additive (linear) increase in this RTT
3: fast up - sender may perform multiplicative (exponential) increase (eg. slow start) in this RTT

Since one byte can hold four of these indications, the receiver may indicate a twelve-packet history in this way, allowing for sparse and lost acks. Senders should perform all of the actions indicated by these signals which have *not* yet been performed, allowing for the possibility of overlap between subsequent signals.

Queues implementing ELR maintain one or more five-state control variables, which may be per flow, per traffic class or global, and reflect the queue's opinion of whether senders associated with each control variable may increase, should hold or should decrease their send rates (and how quickly) in order to match link capacity, or a fair share thereof, at that queue. In most cases, there will be at most one queue on a given network path for which this opinion is not “may increase rapidly”; this is known as the bottleneck queue.

In the “may increase rapidly” state, the queue performs no modifications to the ECN field.

In the “may increase gradually” state, the queue changes one out of every three ECT(1) packets to ECT(0), and leaves all other packets unchanged.

In the “should hold” state, the queue changes two out of every three ECT(1) packets to ECT(0), and leaves all other packets unchanged.

In the “should decrease gradually” state, the queue changes all ECT(1) packets to ECT(0), and additionally changes some proportion of originally ECT(0) packets to the ECN codepoint, and drops the same proportion of Not-ECT packets.

In the “should decrease rapidly” state, all of the actions performed in the “should decrease gradually” state are performed, but ECT(1) packets are additionally changed to the ECN codepoint at the same rate as ECT(0) packets.

It should be obvious that these five states correspond precisely to the “fast up”, “slow up”, “hold”, “slow down” and “fast down” signals observed by the receiver of a single flow. Thus an ELR-compliant queue implementing flow isolation is able to precisely control the send rates of each flow passing through it.

The behaviour of multiple flows sharing a single ELR queue with a single control variable is more complex. Consider the case where one ELR flow is established on the link, and has stabilised in the “hold” state, when a new ELR flow begins. After the new flow’s initial congestion window is sent and acknowledged, it will also see the same two-out-of-three ECT(0) pattern (on average) as the established flow, and might then appear to be stuck in the “hold” state with its initial congestion window for all subsequent traffic.

However, packets for the new flow will disrupt the regular pattern of the established flow’s ELR signal, and vice versa, resulting in a stochastic distribution of “slow down” and “slow up” signals actually being received by both flows. The resulting low-amplitude AIMD behaviour should result in the congestion windows of the two flows converging, eventually giving fair sharing of the link. While convergence time and sensitivity to RTT are both inferior to a flow-isolating queue, they should be no worse than for conventional AQM queues.

- Jonathan Morton

^ permalink raw reply [flat|nested] 9+ messages in thread
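The receiver-side and queue-side rules above are mechanical enough to sketch side by side. This is an illustration of the proposal as written, not an implementation of any deployed protocol; all identifiers are invented, and the CE-marking/dropping that accompanies the two "decrease" states is left to the ordinary AQM path and not shown.

#include <stdint.h>

enum ecn_cp { NOT_ECT, ECT0, ECT1, CE };

/* ---- receiver side: last three data packets -> echoed signal ---- */

enum elr_action { ELR_FAST_DOWN, ELR_SLOW_DOWN, ELR_HOLD, ELR_SLOW_UP, ELR_FAST_UP };

static enum elr_action elr_receiver_decode(const enum ecn_cp last3[3])
{
	int i, ect1 = 0;

	for (i = 0; i < 3; i++) {
		if (last3[i] == CE)
			return ELR_FAST_DOWN;	/* a CE mark overrides the ratio */
		if (last3[i] == ECT1)
			ect1++;
	}
	switch (ect1) {
	case 0:  return ELR_SLOW_DOWN;
	case 1:  return ELR_HOLD;
	case 2:  return ELR_SLOW_UP;
	default: return ELR_FAST_UP;		/* all three still ECT(1) */
	}
}

/* ---- queue side: demote ECT(1) according to the control variable ---- */

enum elr_queue_state { Q_FAST_UP, Q_SLOW_UP, Q_HOLD, Q_SLOW_DOWN, Q_FAST_DOWN };

/* how many ECT(1) packets out of each three get demoted to ECT(0) */
static const int demote_per_3[] = {
	[Q_FAST_UP]   = 0,
	[Q_SLOW_UP]   = 1,
	[Q_HOLD]      = 2,
	[Q_SLOW_DOWN] = 3,
	[Q_FAST_DOWN] = 3,	/* plus CE/drop handling, as for ECT(0), not shown */
};

static enum ecn_cp elr_queue_rewrite(enum elr_queue_state s, enum ecn_cp cp,
				     uint32_t *counter)
{
	uint32_t slot;

	if (cp != ECT1)
		return cp;	/* ECT(0)/Not-ECT follow ordinary legacy-ECN AQM rules */

	/* demote the first demote_per_3[s] of every three ECT(1) packets */
	slot = (*counter)++ % 3;
	return slot < (uint32_t)demote_per_3[s] ? ECT0 : ECT1;
}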
end of thread, other threads:[~2015-05-11 14:02 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAJq5cE35ptd-P=EPB4-qhnfVZiMmXWUFL4jD2_BxUUCvU2ACGw@mail.gmail.com>
2015-05-10 3:35 ` [Codel] [Cake] Control theory and congestion control Dave Taht
2015-05-10 6:55   ` Jonathan Morton
2015-05-10 17:00     ` Sebastian Moeller
2015-05-10 14:46   ` Jonathan Morton
2015-05-10 17:04   ` Sebastian Moeller
2015-05-10 17:48     ` Dave Taht
2015-05-10 17:58       ` Dave Taht
2015-05-10 18:25       ` Dave Taht
     [not found] ` <152DD781-725D-4DD7-AB94-C7412D92F82C@gmx.de>
     [not found]   ` <1F323E22-817A-4212-A354-C6A14D2F1DBB@gmail.com>
     [not found]     ` <E453FF95-5C1C-4A89-9C66-17FA33BBC83B@gmx.de>
     [not found]       ` <C7425B27-6704-4269-8EFC-3D4CD9EE1FD1@gmail.com>
2015-05-11 13:54         ` [Codel] Explicit Load Regulation - was: " Jonathan Morton