* [Codel] Proposing COBALT @ 2016-05-20 10:04 Jonathan Morton 2016-05-20 11:37 ` [Codel] [Cake] " moeller0 0 siblings, 1 reply; 45+ messages in thread From: Jonathan Morton @ 2016-05-20 10:04 UTC (permalink / raw) To: cake, codel With the recent debate over handling unresponsive flows in fq_codel, I had a brainwave involving constructing a hybrid AQM which preserves Codel’s excellent properties on responsive flows, while also reacting appropriately when faced with a UDP flood. The key difficulty was deciding when to switch over from the Codel behaviour to a PIE or RED like behaviour. It turns out that BLUE is a perfect fit for this job, because it activates when the queue is completely full - an unambiguous signal that Codel has lost the plot and is unable to control the queue alone. BLUE was one of the more promising AQMs in the days immediately prior to Codel’s ascendance, so it should be effective outside Codel’s speciality. The name COBALT, as well as referring to a nice shade of blue, can read “Codel-BLUE Alternate”. It is unnecessary to explicitly “switch over” between Codel and BLUE; they can work in parallel, since their operating characteristics are independent. It may be feasible to simplify the Codel implementation, since it will no longer need to handle overload conditions as robustly. For example, the Codel section should use ECN marking whenever possible, and never drop an ECN-Capable packet; the BLUE section should ignore ECN capability and simply drop packets, since the traffic is evidently not responding to any ECN signals if BLUE is triggered. One of the major reasons why Codel fails on UDP floods is that its drop schedule is time-based. This is the correct behaviour for TCP flows, which respond adequately to one congestion signal per RTT, regardless of the packet rate. However, it means it is easily overwhelmed by high-packet-rate unresponsive (or anti-responsive, as with TCP acks) floods, which an attacker or lab test can easily produce on a high-bandwidth ingress, especially using small packets. BLUE, by contrast, uses a drop *probability*, so its effectiveness on floods is independent of the packet rate. If necessary, its drop rate can increase to 100% in a reasonable amount of time. A couple of details are necessary to integrate BLUE with a flow-isolating qdisc: BLUE’s up-trigger should be on a packet drop due to overflow (only) targeting the individual subqueue managed by that particular BLUE instance. It is not correct to trigger BLUE globally when an overall overflow occurs. Note also that BLUE has a timeout between triggers, which should I think be scaled according to the estimated RTT. BLUE’s down-trigger is on the subqueue being empty when a packet is requested from it, again on a timeout. To ensure this occurs, it may be necessary to retain subqueues in the DRR list while BLUE’s drop probability is nonzero. Note that this does nothing to improve the situation regarding fragmented packets. I think the correct solution in that case is to divert all fragments (including the first) into a particular queue dependent only on the host pair, by assuming zero for src and dst ports and a “special” protocol number. This has the distinct advantages of keeping related fragments together, and ensuring they can’t take up a disproportionate share of bandwidth in competition with normal traffic. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
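For concreteness, a minimal sketch of the per-subqueue state the proposal above implies, with the two algorithms' variables side by side. All names are hypothetical and this is not actual sch_cake code; it only illustrates the parallel structure, with one Codel instance and one BLUE instance per subqueue.

#include <stdint.h>
#include <stdbool.h>

struct cobalt_vars {
	/* Codel half: time-scheduled signals for responsive flows */
	uint64_t drop_next;   /* time of next scheduled signal, in ns */
	uint32_t count;       /* signals delivered in this dropping cycle */
	bool     dropping;    /* currently in the dropping state? */

	/* BLUE half: drop probability for unresponsive flows */
	uint32_t p_drop;      /* fixed point: 0xFFFFFFFF ~= probability 1.0 */
	uint64_t blue_timer;  /* earliest time p_drop may change again */
};

/* Up-trigger: called only when *this* subqueue drops a packet on overflow. */
void cobalt_queue_full(struct cobalt_vars *v, uint64_t now, uint64_t timeout);

/* Down-trigger: called when this subqueue is empty at dequeue time. */
void cobalt_queue_empty(struct cobalt_vars *v, uint64_t now, uint64_t timeout);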
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 10:04 [Codel] Proposing COBALT Jonathan Morton @ 2016-05-20 11:37 ` moeller0 2016-05-20 12:18 ` Jonathan Morton 0 siblings, 1 reply; 45+ messages in thread From: moeller0 @ 2016-05-20 11:37 UTC (permalink / raw) To: Jonathan Morton; +Cc: cake, codel Hi Jonathan, interesting ideas. > On May 20, 2016, at 12:04 , Jonathan Morton <chromatix99@gmail.com> wrote: > > With the recent debate over handling unresponsive flows in fq_codel, I had a brainwave involving constructing a hybrid AQM which preserves Codel’s excellent properties on responsive flows, while also reacting appropriately when faced with a UDP flood. The key difficulty was deciding when to switch over from the Codel behaviour to a PIE or RED like behaviour. > > It turns out that BLUE is a perfect fit for this job, because it activates when the queue is completely full - an unambiguous signal that Codel has lost the plot and is unable to control the queue alone. BLUE was one of the more promising AQMs in the days immediately prior to Codel’s ascendance, so it should be effective outside Codel’s speciality. > > The name COBALT, as well as referring to a nice shade of blue, can read “Codel-BLUE Alternate”. That is important; always start with a good acronym ;) (Now, really: there are some EU funding programs that actually require you to supply an acronym when applying for a grant.) > > It is unnecessary to explicitly “switch over” between Codel and BLUE; they can work in parallel, since their operating characteristics are independent. It may be feasible to simplify the Codel implementation, since it will no longer need to handle overload conditions as robustly. For example, the Codel section should use ECN marking whenever possible, and never drop an ECN-Capable packet; the BLUE section should ignore ECN capability and simply drop packets, since the traffic is evidently not responding to any ECN signals if BLUE is triggered. > > One of the major reasons why Codel fails on UDP floods is that its drop schedule is time-based. This is the correct behaviour for TCP flows, which respond adequately to one congestion signal per RTT, regardless of the packet rate. However, it means it is easily overwhelmed by high-packet-rate unresponsive (or anti-responsive, as with TCP acks) floods, which an attacker or lab test can easily produce on a high-bandwidth ingress, especially using small packets. In essence I agree, but want to point out that the protocol itself does not really matter but rather the observed behavior of a flow. Civilized UDP applications (that expect their data to be carried over the best-effort internet) will also react to drops similarly to decent TCP flows, and crappy TCP implementations might not. I would guess that, with the maturity of TCP stacks, misbehaving TCP flows will be rarer than misbehaving UDP flows (which might be, for example, well-behaved fixed-rate isochronous flows that simply should never have been sent over the internet). > > BLUE, by contrast, uses a drop *probability*, so its effectiveness on floods is independent of the packet rate. If necessary, its drop rate can increase to 100% in a reasonable amount of time. > > A couple of details are necessary to integrate BLUE with a flow-isolating qdisc: > > BLUE’s up-trigger should be on a packet drop due to overflow (only) targeting the individual subqueue managed by that particular BLUE instance. It is not correct to trigger BLUE globally when an overall overflow occurs.
Note also that BLUE has a timeout between triggers, which should I think be scaled according to the estimated RTT. That sounds nice in that no additional state is required. But with the current fq_codel, I believe, the packet causing the memory limit overrun is not necessarily from the flow that actually caused the problem to begin with; and doesn’t fq_codel actually search for the fattest flow and drop from there? But I guess that selection procedure could be run with BLUE as well. > > BLUE’s down-trigger is on the subqueue being empty when a packet is requested from it, again on a timeout. To ensure this occurs, it may be necessary to retain subqueues in the DRR list while BLUE’s drop probability is nonzero. Question: doesn’t this mean the affected flow will be throttled quite harshly? Will BLUE slowly decrease the drop probability p if the flow behaves? If so, BLUE could just disengage if p drops below a threshold? > > Note that this does nothing to improve the situation regarding fragmented packets. I think the correct solution in that case is to divert all fragments (including the first) into a particular queue dependent only on the host pair, by assuming zero for src and dst ports and a “special” protocol number. I believe the RFC recommends using the SRC IP, DST IP, Protocol, Identity tuple, as otherwise all fragmented flows between a host pair will hash into the same bucket… Best Regards Sebastian > This has the distinct advantages of keeping related fragments together, and ensuring they can’t take up a disproportionate share of bandwidth in competition with normal traffic. > > - Jonathan Morton > > _______________________________________________ > Cake mailing list > Cake@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/cake ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 11:37 ` [Codel] [Cake] " moeller0 @ 2016-05-20 12:18 ` Jonathan Morton 2016-05-20 13:22 ` moeller0 2016-05-20 13:41 ` David Lang 0 siblings, 2 replies; 45+ messages in thread From: Jonathan Morton @ 2016-05-20 12:18 UTC (permalink / raw) To: moeller0; +Cc: cake, codel >> One of the major reasons why Codel fails on UDP floods is that its drop schedule is time-based. This is the correct behaviour for TCP flows, which respond adequately to one congestion signal per RTT, regardless of the packet rate. However, it means it is easily overwhelmed by high-packet-rate unresponsive (or anti-responsive, as with TCP acks) floods, which an attacker or lab test can easily produce on a high-bandwidth ingress, especially using small packets. > > In essence I agree, but want to point out that the protocol itself does not really matter but rather the observed behavior of a flow. Civilized UDP applications (that expect their data to be carried over the best-effort internet) will also react to drops similarly to decent TCP flows, and crappy TCP implementations might not. I would guess that, with the maturity of TCP stacks, misbehaving TCP flows will be rarer than misbehaving UDP flows (which might be, for example, well-behaved fixed-rate isochronous flows that simply should never have been sent over the internet). Codel properly handles both actual TCP flows and other flows supporting TCP-friendly congestion control. The intent of COBALT is for BLUE to activate whenever Codel clearly cannot cope, rather than on a protocol-specific basis. This happens to dovetail neatly with the way BLUE works anyway. >> BLUE’s up-trigger should be on a packet drop due to overflow (only) targeting the individual subqueue managed by that particular BLUE instance. It is not correct to trigger BLUE globally when an overall overflow occurs. Note also that BLUE has a timeout between triggers, which should I think be scaled according to the estimated RTT. > > That sounds nice in that no additional state is required. But with the current fq_codel, I believe, the packet causing the memory limit overrun is not necessarily from the flow that actually caused the problem to begin with; and doesn’t fq_codel actually search for the fattest flow and drop from there? But I guess that selection procedure could be run with BLUE as well. Yes, both fq_codel and Cake search for the longest extant queue and drop packets from that on overflow. It is this longest queue which would receive the BLUE up-trigger at that point, which is not necessarily the queue for the arriving packet. >> BLUE’s down-trigger is on the subqueue being empty when a packet is requested from it, again on a timeout. To ensure this occurs, it may be necessary to retain subqueues in the DRR list while BLUE’s drop probability is nonzero. > > Question: doesn’t this mean the affected flow will be throttled quite harshly? Will BLUE slowly decrease the drop probability p if the flow behaves? If so, BLUE could just disengage if p drops below a threshold? Given that within COBALT, BLUE will normally only trigger on unresponsive flows, an aggressive up-trigger response from BLUE is in fact desirable. Codel is far too meek to handle this situation; we should not seek to emulate it when designing a scheme to work around its limitations. BLUE’s down-trigger decreases the drop probability by a smaller amount (say 1/4000) than the up-trigger increases it (say 1/400).
These figures are the best-performing configuration from the original paper, which is very readable, and behaviour doesn’t seem to be especially sensitive to the precise values (though only highly-aggregated traffic was considered, and probably on a long timescale). For an actual implementation, I would choose convenient binary fractions, such as 1/256 up and 1/4096 down, and a relatively short trigger timeout. If the relative load from the flow decreases, BLUE’s action will begin to leave the subqueue empty when serviced, causing BLUE’s drop probability to fall off gradually, potentially until it reaches zero. At this point the subqueue is naturally reset and will react normally to subsequent traffic using it. The BLUE paper: http://www.eecs.umich.edu/techreports/cse/99/CSE-TR-387-99.pdf >> Note that this does nothing to improve the situation regarding fragmented packets. I think the correct solution in that case is to divert all fragments (including the first) into a particular queue dependent only on the host pair, by assuming zero for src and dst ports and a “special” protocol number. > > I believe the RFC recommends using the SRC IP, DST IP, Protocol, Identity tuple, as otherwise all fragmented flows between a host pair will hash into the same bucket… I disagree with that recommendation, because the Identity field will be different for each fragmented packet, even if many such packets belong to the same flow. This would spread these packets across many subqueues and give them an unfair advantage over normal flows, which is the opposite of what we want. Normal traffic does not include large numbers of fragmented packets (I would expect a mere handful from certain one-shot request-response protocols which can produce large responses), so it is better to shunt them to a single queue per host-pair. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
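A sketch of what those triggers might look like with the binary fractions suggested above (1/256 up, 1/4096 down), with the trigger timeout rate-limiting both directions. Names are hypothetical, building on the cobalt_vars fields sketched after the first message; this is an illustration, not the proposed implementation.

#define BLUE_INC (0xFFFFFFFFu / 256)   /* up step, ~1/256 in fixed point */
#define BLUE_DEC (0xFFFFFFFFu / 4096)  /* down step, ~1/4096 */

void cobalt_queue_full(struct cobalt_vars *v, uint64_t now, uint64_t timeout)
{
	if (now < v->blue_timer)
		return;                 /* still inside the trigger timeout */
	/* increase towards certainty, saturating at 100% */
	v->p_drop = (v->p_drop > 0xFFFFFFFFu - BLUE_INC) ?
		0xFFFFFFFFu : v->p_drop + BLUE_INC;
	v->blue_timer = now + timeout;
}

void cobalt_queue_empty(struct cobalt_vars *v, uint64_t now, uint64_t timeout)
{
	if (!v->p_drop || now < v->blue_timer)
		return;
	/* decay gradually, possibly all the way back to zero */
	v->p_drop -= (v->p_drop < BLUE_DEC) ? v->p_drop : BLUE_DEC;
	v->blue_timer = now + timeout;
}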
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 12:18 ` Jonathan Morton @ 2016-05-20 13:22 ` moeller0 2016-05-20 14:36 ` Jonathan Morton 2016-05-20 13:41 ` David Lang 1 sibling, 1 reply; 45+ messages in thread From: moeller0 @ 2016-05-20 13:22 UTC (permalink / raw) To: Jonathan Morton; +Cc: cake, codel Hi Jonathan, > On May 20, 2016, at 14:18 , Jonathan Morton <chromatix99@gmail.com> wrote: > >>> One of the major reasons why Codel fails on UDP floods is that its drop schedule is time-based. This is the correct behaviour for TCP flows, which respond adequately to one congestion signal per RTT, regardless of the packet rate. However, it means it is easily overwhelmed by high-packet-rate unresponsive (or anti-responsive, as with TCP acks) floods, which an attacker or lab test can easily produce on a high-bandwidth ingress, especially using small packets. >> >> In essence I agree, but want to point out that the protocol itself does not really matter but rather the observed behavior of a flow. Civilized UDP applications (that expect their data to be carried over the best-effort internet) will also react to drops similarly to decent TCP flows, and crappy TCP implementations might not. I would guess that, with the maturity of TCP stacks, misbehaving TCP flows will be rarer than misbehaving UDP flows (which might be, for example, well-behaved fixed-rate isochronous flows that simply should never have been sent over the internet). > > Codel properly handles both actual TCP flows and other flows supporting TCP-friendly congestion control. The intent of COBALT is for BLUE to activate whenever Codel clearly cannot cope, rather than on a protocol-specific basis. This happens to dovetail neatly with the way BLUE works anyway. Well, as I said, I agree; I only wanted to be a smart alec about the TCP versus UDP flood distinction. And I fully agree the behaviour should depend on observed flow behavior and not header values… > >>> BLUE’s up-trigger should be on a packet drop due to overflow (only) targeting the individual subqueue managed by that particular BLUE instance. It is not correct to trigger BLUE globally when an overall overflow occurs. Note also that BLUE has a timeout between triggers, which should I think be scaled according to the estimated RTT. >> >> That sounds nice in that no additional state is required. But with the current fq_codel, I believe, the packet causing the memory limit overrun is not necessarily from the flow that actually caused the problem to begin with; and doesn’t fq_codel actually search for the fattest flow and drop from there? But I guess that selection procedure could be run with BLUE as well. > > Yes, both fq_codel and Cake search for the longest extant queue and drop packets from that on overflow. It is this longest queue which would receive the BLUE up-trigger at that point, which is not necessarily the queue for the arriving packet. > >>> BLUE’s down-trigger is on the subqueue being empty when a packet is requested from it, again on a timeout. To ensure this occurs, it may be necessary to retain subqueues in the DRR list while BLUE’s drop probability is nonzero. >> >> Question: doesn’t this mean the affected flow will be throttled quite harshly? Will BLUE slowly decrease the drop probability p if the flow behaves? If so, BLUE could just disengage if p drops below a threshold? > > Given that within COBALT, BLUE will normally only trigger on unresponsive flows, an aggressive up-trigger response from BLUE is in fact desirable.
Sure, by that point the flow had ample/some time to react, but didn’t, so a sliding tackle is warranted. > Codel is far too meek to handle this situation; we should not seek to emulate it when designing a scheme to work around its limitations. And again, since we triggered BLUE by crossing a threshold, we know that Codel’s way of asking nicely whether the flow might reduce its bandwidth led nowhere… > > BLUE’s down-trigger decreases the drop probability by a smaller amount (say 1/4000) than the up-trigger increases it (say 1/400). These figures are the best-performing configuration from the original paper, which is very readable, and behaviour doesn’t seem to be especially sensitive to the precise values (though only highly-aggregated traffic was considered, and probably on a long timescale). For an actual implementation, I would choose convenient binary fractions, such as 1/256 up and 1/4096 down, and a relatively short trigger timeout. > > If the relative load from the flow decreases, BLUE’s action will begin to leave the subqueue empty when serviced, causing BLUE’s drop probability to fall off gradually, potentially until it reaches zero. At this point the subqueue is naturally reset and will react normally to subsequent traffic using it. But if we reach a queue length of codel’s target (for some small amount of time), would that not be the best point in time to hand back to codel? Otherwise we push the queue to zero only to have codel come in and let it grow back to target (well approximately). > > The BLUE paper: http://www.eecs.umich.edu/techreports/cse/99/CSE-TR-387-99.pdf If I had time I would read that now ;) > >>> Note that this does nothing to improve the situation regarding fragmented packets. I think the correct solution in that case is to divert all fragments (including the first) into a particular queue dependent only on the host pair, by assuming zero for src and dst ports and a “special” protocol number. >> >> I believe the RFC recommends using the SRC IP, DST IP, Protocol, Identity tuple, as otherwise all fragmented flows between a host pair will hash into the same bucket… > > I disagree with that recommendation, because the Identity field will be different for each fragmented packet, Ah, I see from RFC 791 (https://tools.ietf.org/html/rfc791): The identification field is used to distinguish the fragments of one datagram from those of another. The originating protocol module of an internet datagram sets the identification field to a value that must be unique for that source-destination pair and protocol for the time the datagram will be active in the internet system. The originating protocol module of a complete datagram sets the more-fragments flag to zero and the fragment offset to zero. I agree the Identity field decidedly does the wrong thing, by spreading even a single flow over all hash buckets. That leaves my proposal from earlier: extract the ports from the first fragment (the packet with MF=1 and fragment offset 0), store them together with the Identity value, and use the stored values to calculate the hash for all other packets in the same fragmented datagram… That sounds expensive enough to initially punt and use your idea, but certainly it is not ideal. > even if many such packets belong to the same flow. This would spread these packets across many subqueues and give them an unfair advantage over normal flows, which is the opposite of what we want.
> > Normal traffic does not include large numbers of fragmented packets (I would expect a mere handful from certain one-shot request-response protocols which can produce large responses), so it is better to shunt them to a single queue per host-pair. This kind of special-casing can easily be abused as an attack vector… Really, if possible, even fragmented flows should be hashed properly. If you are unlucky and set the wrong MTU for a PPPoE link, for example, all full-MTU packets will be fragmented, and it would be nice to show grace even under that load ;) Best Regards Sebastian > > - Jonathan Morton > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 13:22 ` moeller0 @ 2016-05-20 14:36 ` Jonathan Morton 2016-05-20 16:03 ` David Lang [not found] ` <CALnBQ5mNgHgFoTcvLxppv2P9XODc4D-4NObKyqbZJ0PccVkwiA@mail.gmail.com> 0 siblings, 2 replies; 45+ messages in thread From: Jonathan Morton @ 2016-05-20 14:36 UTC (permalink / raw) To: moeller0; +Cc: cake, codel >> If the relative load from the flow decreases, BLUE’s action will begin to leave the subqueue empty when serviced, causing BLUE’s drop probability to fall off gradually, potentially until it reaches zero. At this point the subqueue is naturally reset and will react normally to subsequent traffic using it. > > But if we reach a queue length of codel’s target (for some small amount of time), would that not be the best point in time to hand back to codel? Otherwise we push the queue to zero only to have codel come in and let it grow back to target (well approximately). No, because at that moment we can only assume that it is the heavy pressure of BLUE that is keeping the queue under control. Again, this is an aspect of Codel’s behaviour which should not be duplicated in its alternate, because it depends on assumptions which have been demonstrated not to hold. BLUE doesn’t even start backing off until it sees the queue empty, so for the simple and common case of an isochronous flow (or a steady flood limited by some upstream capacity), BLUE will rapidly increase its drop rate until the queue stops being continuously full. In all likelihood the queue will now slowly and steadily drain until it is empty. But the load is still there, so if BLUE stopped dropping entirely at that point, the queue would almost instantly be full again and it would have to ramp up from scratch. Instead, BLUE backs off slightly and waits to see if the queue *remains* empty during its timeout. If so, it backs off some more. As long as the queue is still serviced while BLUE’s drop probability is nonzero, it will back down all the way to zero *if* the traffic has really gone away or become responsive enough for Codel to deal with. Hence BLUE will hand control back to Codel only when it is sure its extra effort is not required. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
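For intuition, a toy userspace simulation of the dynamics just described: a 2:1 flood into a small queue, with the flood stopping halfway through. The numbers are purely illustrative and the trigger timeouts are omitted for brevity; this is not the proposed implementation.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	double p = 0.0;         /* BLUE drop probability */
	int backlog = 0;        /* packets currently queued */
	const int limit = 100;  /* queue limit */

	for (int t = 0; t < 16000; t++) {
		int arrivals = (t < 8000) ? 2 : 0;  /* flood stops at t=8000 */

		while (arrivals--) {
			if ((double)rand() / RAND_MAX < p)
				continue;               /* dropped by BLUE */
			if (backlog >= limit)       /* overflow: up-trigger */
				p = (p + 1.0 / 256 > 1.0) ? 1.0 : p + 1.0 / 256;
			else
				backlog++;
		}
		if (backlog > 0)
			backlog--;                  /* one departure per tick */
		else if (p > 0)                     /* empty: down-trigger */
			p = (p > 1.0 / 4096) ? p - 1.0 / 4096 : 0.0;

		if (t % 2000 == 0)
			printf("t=%5d backlog=%3d p=%.4f\n", t, backlog, p);
	}
	return 0;
}

While the flood runs, p climbs until the queue is no longer continuously full and then hovers; once the load disappears, the queue drains, and p decays gradually back to zero - the hand-back behaviour described above.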
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 14:36 ` Jonathan Morton @ 2016-05-20 16:03 ` David Lang 2016-05-20 17:31 ` Jonathan Morton 0 siblings, 1 reply; 45+ messages in thread From: David Lang @ 2016-05-20 16:03 UTC (permalink / raw) To: Jonathan Morton; +Cc: moeller0, cake, codel On Fri, 20 May 2016, Jonathan Morton wrote: >>> If the relative load from the flow decreases, BLUE’s action will begin to >>> leave the subqueue empty when serviced, causing BLUE’s drop probability to >>> fall off gradually, potentially until it reaches zero. At this point the >>> subqueue is naturally reset and will react normally to subsequent traffic >>> using it. >> >> But if we reach a queue length of codel’s target (for some small amount of >> time), would that not be the best point in time to hand back to codel? >> Otherwise we push the queue to zero only to have codel come in and let it >> grow back to target (well approximately). > > No, because at that moment we can only assume that it is the heavy pressure of > BLUE that is keeping the queue under control. Again, this is an aspect of > Codel’s behaviour which should not be duplicated in its alternate, because it > depends on assumptions which have been demonstrated not to hold. > > BLUE doesn’t even start backing off until it sees the queue empty, so for the > simple and common case of an isochronous flow (or a steady flood limited by > some upstream capacity), BLUE will rapidly increase its drop rate until the > queue stops being continuously full. In all likelihood the queue will now > slowly and steadily drain until it is empty. But the load is still there, so > if BLUE stopped dropping entirely at that point, the queue would almost > instantly be full again and it would have to ramp up from scratch. > > Instead, BLUE backs off slightly and waits to see if the queue *remains* empty > during its timeout. If so, it backs off some more. As long as the queue is > still serviced while BLUE’s drop probability is nonzero, it will back down all > the way to zero *if* the traffic has really gone away or become responsive > enough for Codel to deal with. > > Hence BLUE will hand control back to Codel only when it is sure its extra > effort is not required. How about testing having BLUE back off not when the queue is empty, but when the queue is down to the codel target. If its efforts are still needed, it will ramp back up, but waiting for the queue to go all the way to zero repeatedly, with codel still running, seems like a pessimistic way to go that will over-penalize a flow that eventually corrects itself. David Lang ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 16:03 ` David Lang @ 2016-05-20 17:31 ` Jonathan Morton 0 siblings, 0 replies; 45+ messages in thread From: Jonathan Morton @ 2016-05-20 17:31 UTC (permalink / raw) To: David Lang; +Cc: moeller0, cake, codel > On 20 May, 2016, at 19:03, David Lang <david@lang.hm> wrote: > > On Fri, 20 May 2016, Jonathan Morton wrote: > >>>> If the relative load from the flow decreases, BLUE’s action will begin to leave the subqueue empty when serviced, causing BLUE’s drop probability to fall off gradually, potentially until it reaches zero. At this point the subqueue is naturally reset and will react normally to subsequent traffic using it. >>> But if we reach a queue length of codel’s target (for some small amount of time), would that not be the best point in time to hand back to codel? Otherwise we push the queue to zero only to have codel come in and let it grow back to target (well approximately). >> >> No, because at that moment we can only assume that it is the heavy pressure of BLUE that is keeping the queue under control. Again, this is an aspect of Codel’s behaviour which should not be duplicated in its alternate, because it depends on assumptions which have been demonstrated not to hold. >> >> BLUE doesn’t even start backing off until it sees the queue empty, so for the simple and common case of an isochronous flow (or a steady flood limited by some upstream capacity), BLUE will rapidly increase its drop rate until the queue stops being continuously full. In all likelihood the queue will now slowly and steadily drain until it is empty. But the load is still there, so if BLUE stopped dropping entirely at that point, the queue would almost instantly be full again and it would have to ramp up from scratch. >> >> Instead, BLUE backs off slightly and waits to see if the queue *remains* empty during its timeout. If so, it backs off some more. As long as the queue is still serviced while BLUE’s drop probability is nonzero, it will back down all the way to zero *if* the traffic has really gone away or become responsive enough for Codel to deal with. >> >> Hence BLUE will hand control back to Codel only when it is sure its extra effort is not required. > > How about testing having BLUE back off not when the queue is empty, but when the queue is down to the codel target. > > If its efforts are still needed, it will ramp back up, but waiting for the queue to go all the way to zero repeatedly, with codel still running, seems like a pessimistic way to go that will over-penalize a flow that eventually corrects itself. I can think of three cases where BLUE might trigger on a flow which is, in fact, responsive. 1) The true RTT is *much* longer than the estimated RTT - this could reasonably happen if the estimated RTT is “regional” grade but the flow is going over a satellite link or two. Since the queue limit is auto-tuned (in Cake) with the estimated RTT in mind, it’s possible for a TCP in slow-start to bounce off the end of that queue if it has a very long RTT, even though Codel is frantically signalling to it over ECN. The damage here is limited to a few, widely-spaced dropped packets which are easily retransmitted, with no RTOs, and a somewhat excessive (temporary) reduction in the congestion window. 2) Something downstream is munging the ECN bits and erasing Codel’s signal. This should be a rare situation in practice, but effectively leaves BLUE in sole charge of managing the queue length via packet drops. It should be adequate at that job.
3) The flow is managed by DCTCP rather than a normal TCP. DCTCP’s response to ECN signalling is much softer than standard, and I suspect Codel will be unable to control it, because it doesn’t behave the way DCTCP expects (which is something more akin to a specially configured RED). However, DCTCP is (supposed to be) responsive to packet drops to the standard extent, so as in case 2, BLUE will take sole charge. I don’t really see how leaving Codel “primed” to take over from BLUE helps in any of these situations. However, Codel’s additional action superimposed on BLUE might result in a stable state emerging, rather than the queue length oscillating between “empty” and “full”. This is best achieved by having BLUE’s down-trigger *below* that of Codel. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <CALnBQ5mNgHgFoTcvLxppv2P9XODc4D-4NObKyqbZJ0PccVkwiA@mail.gmail.com>]
* Re: [Codel] [Cake] Proposing COBALT [not found] ` <CALnBQ5mNgHgFoTcvLxppv2P9XODc4D-4NObKyqbZJ0PccVkwiA@mail.gmail.com> @ 2016-05-20 16:43 ` Jonathan Morton 2016-05-23 18:30 ` Jonathan Morton 0 siblings, 1 reply; 45+ messages in thread From: Jonathan Morton @ 2016-05-20 16:43 UTC (permalink / raw) To: Luis E. Garcia; +Cc: cake, codel > On 20 May, 2016, at 19:37, Luis E. Garcia <luis@bitamins.net> wrote: > > I think this would be a great idea to implement and test. > Can COBALT's behavior be easily implemented to test it using the OpenWRT or LEVE ? I assume you mean LEDE. Yes, the BLUE algorithm is very simple (and is already in Linux, if you want to see how it behaves independently). It’s merely a case of modifying a fork of sch_codel and/or sch_fq_codel and/or sch_cake to run it in parallel with the Codel algorithm. I’ll probably get around to it once I’ve got some current Cake changes out of the way. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 16:43 ` Jonathan Morton @ 2016-05-23 18:30 ` Jonathan Morton 2016-05-24 13:47 ` Jeff Weeks 0 siblings, 1 reply; 45+ messages in thread From: Jonathan Morton @ 2016-05-23 18:30 UTC (permalink / raw) To: Luis E. Garcia; +Cc: cake, codel > On 20 May, 2016, at 19:43, Jonathan Morton <chromatix99@gmail.com> wrote: > >> On 20 May, 2016, at 19:37, Luis E. Garcia <luis@bitamins.net> wrote: >> >> I think this would be a great idea to implement and test. >> Can COBALT's behavior be easily implemented to test it using the OpenWRT or LEVE ? > > I assume you mean LEDE. > > Yes, the BLUE algorithm is very simple (and is already in Linux, if you want to see how it behaves independently). It’s merely a case of modifying a fork of sch_codel and/or sch_fq_codel and/or sch_cake to run it in parallel with the Codel algorithm. > > I’ll probably get around to it once I’ve got some current Cake changes out of the way. While I don’t have COBALT working in an actual qdisc yet, I’ve coded the core algorithm - including a major refactoring of Codel. This core code, containing *both* Codel and BLUE, is 90 lines *shorter* than codel5.h alone. Quite a surprising amount of simplification. There’s even a possibility that it’ll be faster, especially on embedded CPUs, simply because it’s smaller. The simplification partly results from a change in API structure. Rather than calling back into the qdisc to retrieve however many packets it wants, COBALT is handed one packet and a timestamp, and returns a flag indicating whether that packet should be dropped or delivered. It becomes the qdisc’s responsibility to dequeue candidate packets and perform the actual dropping. So there is no longer a gnarly branched loop in the middle of the AQM algorithm. There were objections to Codel’s “callback” structure for other reasons, too. The refactoring obviates them all. The one remaining loop in the fast path is a new backoff strategy for the Codel phase where it’s just come out of the dropping state. Originally Codel reduced count by 1 or 2 immediately, and reset count to zero after an arbitrary number of intervals had passed without the target delay being exceeded. My previous modification changed the immediate reduction to a halving, in an attempt to avoid unbounded growth of the count value. In COBALT, I keep the drop-scheduler running in this phase, but without actually dropping packets, and *decrementing* count instead of incrementing it; the backoff phase then naturally ends when count returns to zero, instead of after an arbitrary hard timeout. The loop simply ensures that count will reduce by the correct amount, even if traffic temporarily ceases on the queue. Ideally, this should cause Codel’s count value to stabilise where 50% of the time is spent above target sojourn time, and 50% below. (Actual behaviour won’t quite be ideal, but it should be closer than before.) As another simplification, I eliminated the “primed” state (waiting for interval to expire) as an explicit entity, by simply scheduling the first drop event to be at now+interval when entering the dropping state. This also eliminates the first_above_time variable. Any packets with sojourn times below target will bump Codel out of the dropping state anyway. Yet another simplification is enabled by COBALT’s hybrid structure and the tacit assumption that it will be used with flow isolation. 
Because BLUE is present to handle overloads, much logic to handle overloads and “extra” signalling in Codel is not replicated in the refactored version. For example, there is no “drop then mark” logic any more; in all probability the traffic in one flow is either all ECN capable or all not so. BLUE doesn’t add much code; only two lines in the fast path, and a couple of extra entry points for the qdisc to signal queue-full and queue-empty. While going over codel5.h, I noticed that the invsqrt cache was stored as u16 while calculations were being done in u32. This probably caused some major weirdness with Codel’s behaviour in Cake, so I fixed it in the original. Next step: get an actual qdisc working using this. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
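A sketch of the API shape described above, with the qdisc owning the dequeue loop and COBALT returning a per-packet verdict. The queue types here are toys, the cobalt_vars fields are as sketched earlier, and all names are hypothetical; the real code lives in the sch_cake repository.

#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>

struct pkt { uint64_t enq_time; struct pkt *next; };
struct subq { struct pkt *head; struct cobalt_vars vars; };

/* the core entry point: true means "drop this packet" */
bool cobalt_should_drop(struct cobalt_vars *v, uint64_t now, uint64_t sojourn);
void cobalt_queue_empty(struct cobalt_vars *v, uint64_t now, uint64_t timeout);

struct pkt *subq_dequeue(struct subq *q, uint64_t now, uint64_t timeout)
{
	struct pkt *p;

	while ((p = q->head)) {
		q->head = p->next;
		if (!cobalt_should_drop(&q->vars, now, now - p->enq_time))
			return p;   /* deliver this packet */
		free(p);            /* drop it and try the next one */
	}
	cobalt_queue_empty(&q->vars, now, timeout);  /* BLUE down-trigger */
	return NULL;
}

With this split, the gnarly branched loop moves out of the AQM and into one small, obvious loop in the qdisc.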
* Re: [Codel] [Cake] Proposing COBALT 2016-05-23 18:30 ` Jonathan Morton @ 2016-05-24 13:47 ` Jeff Weeks 2016-05-24 14:07 ` Jonathan Morton 0 siblings, 1 reply; 45+ messages in thread From: Jeff Weeks @ 2016-05-24 13:47 UTC (permalink / raw) To: Jonathan Morton, Luis E. Garcia; +Cc: cake, codel > In COBALT, I keep the drop-scheduler running in this phase, but without actually dropping packets, and *decrementing* count instead of incrementing it; the backoff phase then > naturally ends when count returns to zero, instead of after an arbitrary hard timeout. The loop simply ensures that count will reduce by the correct amount, even if traffic > temporarily ceases on the queue. Ideally, this should cause Codel’s count value to stabilise where 50% of the time is spent above target sojourn time, and 50% below. (Actual > behaviour won’t quite be ideal, but it should be closer than before.) I tried this as well, at one point, but can't remember, off-hand, why I didn't stick with it; will have to see if I can find mention of it in my notes. What trigger are you using to decrement count? I initially did a crude decrement of count every interval, but then you end up with a ramp-down time which is considerably slower than the ramp-up (and the ramp-up is slow to begin with). I assume you're actually re-calculating the next drop using the 1/sqrt(count), but instead of dropping and increasing count, you're simply decreasing count, so the time to get from 1->N is the same as the time to get from N->1? > As another simplification, I eliminated the “primed” state (waiting for interval to expire) as an explicit entity, by simply scheduling the first drop event to be at now+interval when > entering the dropping state. This also eliminates the first_above_time variable. Any packets with sojourn times below target will bump Codel out of the dropping state anyway. How do you handle the case where you've scheduled a drop event 100ms in the future, and we immediately see low latency; is the event descheduled? If not, what if we then see high latency again; can the still-scheduled event cause us to start dropping packets earlier than 100ms? --Jeff
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-24 13:47 ` Jeff Weeks @ 2016-05-24 14:07 ` Jonathan Morton 2016-05-24 15:52 ` Dave Täht 0 siblings, 1 reply; 45+ messages in thread From: Jonathan Morton @ 2016-05-24 14:07 UTC (permalink / raw) To: Jeff Weeks; +Cc: Luis E. Garcia, cake, codel > On 24 May, 2016, at 16:47, Jeff Weeks <jweeks@sandvine.com> wrote: > >> In COBALT, I keep the drop-scheduler running in this phase, but without actually dropping packets, and *decrementing* count instead of incrementing it; the backoff phase then >> naturally ends when count returns to zero, instead of after an arbitrary hard timeout. The loop simply ensures that count will reduce by the correct amount, even if traffic >> temporarily ceases on the queue. Ideally, this should cause Codel’s count value to stabilise where 50% of the time is spent above target sojourn time, and 50% below. (Actual >> behaviour won’t quite be ideal, but it should be closer than before.) > > I tried this as well, at one point, but can't remember, off-hand, why I didn't stick with it; will have to see if I can find mention of it in my notes. > What trigger are you using to decrement count? I initially did a crude decrement of count every interval, but then you end up with a ramp-down time which is considerably slower than the ramp-up (and the ramp-up is slow to begin with). > I assume you're actually re-calculating the next drop using the 1/sqrt(count), but instead of dropping and increasing count, you're simply decreasing count, so the time to get from 1->N is the same as the time to get from N->1? That’s basically right. In retrospect, it seems like a very obvious approach to the backoff problem. :-) Of course, due to the “priming” delay and the possibility of the signalling frequency exceeding the packet rate, it’s likely to take *less* time to ramp down than to ramp up; this is why the ramping down is guarded by a while loop. >> As another simplification, I eliminated the “primed” state (waiting for interval to expire) as an explicit entity, by simply scheduling the first drop event to be at now+interval when >> entering the dropping state. This also eliminates the first_above_time variable. Any packets with sojourn times below target will bump Codel out of the dropping state anyway. > > How do you handle the case where you've scheduled a drop event 100ms in the future, and we immediately see low latency; is the event descheduled? > If not, what if we then see high latency again; can the still-scheduled event cause us to start dropping packets earlier than 100ms? The first drop event is scheduled by setting the “dropping” flag, ensuring that “count” is nonzero, and setting the “drop_next” timestamp to now+interval. Any packet below the target sojourn time clears the “dropping” flag, which prevents marking or dropping from occurring - which is why the explicit “primed” state is eliminated. Since the timestamp is set in this way whenever the “dropping” flag transitions from cleared to set, there are no spurious drop events. The code is in the sch_cake repo if you want to examine the details. I promise it’s a lot easier to read than the original Codel code. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
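Putting those answers together, a sketch of the Codel half's state transitions as described in this exchange. Floating-point sqrt() stands in for the kernel's fixed-point inverse square root, the cobalt_vars fields are as sketched earlier, and the names are hypothetical as before (compile with -lm):

#include <stdint.h>
#include <stdbool.h>
#include <math.h>

static uint64_t next_step(uint32_t count, uint64_t interval)
{
	return (uint64_t)(interval / sqrt((double)(count ? count : 1)));
}

bool cobalt_codel_verdict(struct cobalt_vars *v, uint64_t now,
			  uint64_t sojourn, uint64_t target, uint64_t interval)
{
	bool drop = false;

	if (sojourn <= target) {
		/* a good packet bumps us straight out of the dropping
		 * state; no explicit "primed" state is needed */
		v->dropping = false;
	} else if (!v->dropping) {
		/* entering the dropping state: the first signal is
		 * scheduled a full interval out, which replaces the old
		 * first_above_time variable */
		v->dropping = true;
		if (!v->count)
			v->count = 1;
		v->drop_next = now + interval;
	}

	if (v->dropping) {
		if (now >= v->drop_next) {
			drop = true;  /* or mark, if the packet is ECT */
			v->count++;
			v->drop_next += next_step(v->count, interval);
		}
	} else {
		/* backoff: keep the schedule running without dropping,
		 * decrementing count; the while loop catches up even if
		 * traffic paused across several scheduled events */
		while (v->count && now >= v->drop_next) {
			v->count--;
			v->drop_next += next_step(v->count, interval);
		}
	}
	return drop;
}

On this reading, a stale drop_next cannot fire early: signals are only acted on while the dropping flag is set, and re-entering the dropping state always reschedules the first signal a full interval out.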
* Re: [Codel] [Cake] Proposing COBALT 2016-05-24 14:07 ` Jonathan Morton @ 2016-05-24 15:52 ` Dave Täht 2016-05-24 15:56 ` Jonathan Morton 2016-05-26 12:33 ` Jonathan Morton 0 siblings, 2 replies; 45+ messages in thread From: Dave Täht @ 2016-05-24 15:52 UTC (permalink / raw) To: codel, cake 1) I am all in favor of continued experimentation and coding in these areas. 2) However I strongly advise the first thing you attempt to do when futzing with an aqm is to try it at various RTTs, and then do it at high bandwidths and low. Some of the discussion below makes me nervous, in that a point of codel is to try and catch the next harmonic. There's no smooth ramp up or ramp down; there's a wave coming sometime in the future that needs to be smoothed to fill the pipe, not the queue. My last attempts with cake the way it was had it performing miserably at longer RTTs (try 50ms) vs codel or fq-codel - as in half the throughput achieved by codel, at that RTT. Please test at larger RTTs. On 5/24/16 8:07 AM, Jonathan Morton wrote: > >> On 24 May, 2016, at 16:47, Jeff Weeks <jweeks@sandvine.com> wrote: >> >>> In COBALT, I keep the drop-scheduler running in this phase, but without actually dropping packets, and *decrementing* count instead of incrementing it; the backoff phase then >>> naturally ends when count returns to zero, instead of after an arbitrary hard timeout. The loop simply ensures that count will reduce by the correct amount, even if traffic >>> temporarily ceases on the queue. Ideally, this should cause Codel’s count value to stabilise where 50% of the time is spent above target sojourn time, and 50% below. (Actual >>> behaviour won’t quite be ideal, but it should be closer than before.) >> >> I tried this as well, at one point, but can't remember, off-hand, why I didn't stick with it; will have to see if I can find mention of it in my notes. >> What trigger are you using to decrement count? I initially did a crude decrement of count every interval, but then you end up with a ramp-down time which is considerably slower than the ramp-up (and the ramp-up is slow to begin with). >> I assume you're actually re-calculating the next drop using the 1/sqrt(count), but instead of dropping and increasing count, you're simply decreasing count, so the time to get from 1->N is the same as the time to get from N->1? > > That’s basically right. In retrospect, it seems like a very obvious approach to the backoff problem. :-) > > Of course, due to the “priming” delay and the possibility of the signalling frequency exceeding the packet rate, it’s likely to take *less* time to ramp down than to ramp up; this is why the ramping down is guarded by a while loop. > >>> As another simplification, I eliminated the “primed” state (waiting for interval to expire) as an explicit entity, by simply scheduling the first drop event to be at now+interval when >>> entering the dropping state. This also eliminates the first_above_time variable. Any packets with sojourn times below target will bump Codel out of the dropping state anyway. >> >> How do you handle the case where you've scheduled a drop event 100ms in the future, and we immediately see low latency; is the event descheduled? >> If not, what if we then see high latency again; can the still-scheduled event cause us to start dropping packets earlier than 100ms? > > The first drop event is scheduled by setting the “dropping” flag, ensuring that “count” is nonzero, and setting the “drop_next” timestamp to now+interval.
Any packet below the target sojourn time clears the “dropping” flag, which prevents marking or dropping from occurring - which is why the explicit “primed” state is eliminated. > > Since the timestamp is set in this way whenever the “dropping” flag transitions from cleared to set, there are no spurious drop events. > > The code is in the sch_cake repo if you want to examine the details. I promise it’s a lot easier to read than the original Codel code. > > - Jonathan Morton > > _______________________________________________ > Codel mailing list > Codel@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/codel > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-24 15:52 ` Dave Täht @ 2016-05-24 15:56 ` Jonathan Morton 2016-05-24 16:02 ` Dave Taht 2016-05-26 12:33 ` Jonathan Morton 1 sibling, 1 reply; 45+ messages in thread From: Jonathan Morton @ 2016-05-24 15:56 UTC (permalink / raw) To: Dave Täht; +Cc: codel, cake > On 24 May, 2016, at 18:52, Dave Täht <dave@taht.net> wrote: > > My last attempts with cake the way it was had it performing miserably at > longer RTTs (try 50ms) vs codel or fq-codel - as in half the throughput > achieved by codel, at that RTT. Was that before or after I found and fixed the invsqrt cache bug yesterday? - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-24 15:56 ` Jonathan Morton @ 2016-05-24 16:02 ` Dave Taht 0 siblings, 0 replies; 45+ messages in thread From: Dave Taht @ 2016-05-24 16:02 UTC (permalink / raw) To: Jonathan Morton; +Cc: Dave Täht, cake, codel On Tue, May 24, 2016 at 9:56 AM, Jonathan Morton <chromatix99@gmail.com> wrote: > >> On 24 May, 2016, at 18:52, Dave Täht <dave@taht.net> wrote: >> >> My last attempts with cake the way it was had it performing miserably at >> longer RTTs (try 50ms) vs codel or fq-codel - as in half the throughput >> achieved by codel, at that RTT. > > Was that before or after I found and fixed the invsqrt cache bug yesterday? That was back in December, when I had also ripped out the invsqrt cache entirely with things like "bcake". (I still do not see any point to the invsqrt cache, nor ) I did some informal testing of recent cakes about a month back… and went back to fq_codel. cake ate too much CPU, queue depth was longer, and throughput at 50ms or longer RTTs was worse. I decided that I'd much rather work on wifi. Please test at longer RTTs. > - Jonathan Morton > > _______________________________________________ > Codel mailing list > Codel@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/codel -- Dave Täht Let's go make home routers and wifi faster! With better software! http://blog.cerowrt.org ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-24 15:52 ` Dave Täht 2016-05-24 15:56 ` Jonathan Morton @ 2016-05-26 12:33 ` Jonathan Morton 2016-06-03 19:09 ` Noah Causin 1 sibling, 1 reply; 45+ messages in thread From: Jonathan Morton @ 2016-05-26 12:33 UTC (permalink / raw) To: Dave Täht; +Cc: codel, cake > On 24 May, 2016, at 18:52, Dave Täht <dave@taht.net> wrote: > > My last attempts with cake the way it was had it performing miserably at > longer RTTs (try 50ms) vs codel or fq-codel - as in half the throughput > achieved by codel, at that RTT. There’s definitely something weird going on - as if the marks and drops reported by the stats are not actually occurring, except for the drops on queue overflow. Testing 50:1 flows on 1:10 bandwidth, for example, the single flow is stuck below the aggregate throughput of the 50, suggesting strongly that its acks are not being thinned. Cake used to be best-in-class on that very test. I will investigate further. But the solution may be to just perform the refactoring necessary to cleanly integrate COBALT, and eliminate subtle bugs by dint of simply sweeping away some unnecessarily gnarly code. I think COBALT’s API is probably a lot easier to integrate into a complex qdisc than Codel’s. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-26 12:33 ` Jonathan Morton @ 2016-06-03 19:09 ` Noah Causin 2016-06-03 19:34 ` Jonathan Morton 0 siblings, 1 reply; 45+ messages in thread From: Noah Causin @ 2016-06-03 19:09 UTC (permalink / raw) To: Jonathan Morton, Dave Täht; +Cc: cake, codel Was the issue, where the drops and marks did not seem to occur, resolved? On 5/26/2016 8:33 AM, Jonathan Morton wrote: >> On 24 May, 2016, at 18:52, Dave Täht <dave@taht.net> wrote: >> >> My last attempts with cake the way it was had it performing miserably at >> longer RTTs (try 50ms) vs codel or fq-codel - as in half the throughput >> achieved by codel, at that RTT. > There’s definitely something weird going on - as if the marks and drops reported by the stats are not actually occurring, except for the drops on queue overflow. > > Testing 50:1 flows on 1:10 bandwidth, for example, the single flow is stuck below the aggregate throughput of the 50, suggesting strongly that its acks are not being thinned. Cake used to be best-in-class on that very test. > > I will investigate further. But the solution may be to just perform the refactoring necessary to cleanly integrate COBALT, and eliminate subtle bugs by dint of simply sweeping away some unnecessarily gnarly code. I think COBALT’s API is probably a lot easier to integrate into a complex qdisc than Codel’s. > > - Jonathan Morton > > _______________________________________________ > Cake mailing list > Cake@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/cake ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-06-03 19:09 ` Noah Causin @ 2016-06-03 19:34 ` Jonathan Morton 2016-06-04 1:01 ` Andrew McGregor 0 siblings, 1 reply; 45+ messages in thread From: Jonathan Morton @ 2016-06-03 19:34 UTC (permalink / raw) To: Noah Causin; +Cc: Dave Täht, cake, codel > On 3 Jun, 2016, at 22:09, Noah Causin <n0manletter@gmail.com> wrote: > > Was the issue, where the drops and marks did not seem to occur, resolved? Examination of packet dumps obtained under controlled conditions showed that marking and dropping *did* occur as normal, and I got a normal response from a local machine sending through a virtual delay line. My Internet connection is such that extremely short RTTs never occur. However, it seems that some Internet servers I use often do not respond as much as they should to ECN marking, resulting in excessively long queues despite a relatively small number of flows. It rather reminds me of the symptoms one would expect to see if DCTCP found its way onto the public Internet. And these are very popular servers with an extremely large userbase. However it’s also possible that the ECN information is somehow disappearing en route. I plan to investigate in more detail once COBALT is up and running, with behaviour I can reason about more intuitively than the “evolved Codel” Cake has been using up to now. With COBALT integrated into Cake, I’ll also be able to directly track the number of unresponsive flows. Part of that investigation may be to enquire as to whether DCTCP is in fact in use. If so, the TCP Prague people should be brought into the loop, as this would constitute evidence that Codel can’t control DCTCP via ECN under practical Internet conditions. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-06-03 19:34 ` Jonathan Morton @ 2016-06-04 1:01 ` Andrew McGregor 2016-06-04 6:23 ` Jonathan Morton 2016-06-04 13:55 ` Jonathan Morton 0 siblings, 2 replies; 45+ messages in thread From: Andrew McGregor @ 2016-06-04 1:01 UTC (permalink / raw) To: Jonathan Morton; +Cc: Noah Causin, cake, codel There are undoubtedly DCTCP-like ECN responses widely deployed, since that is the default behaviour in Windows Server (gated on RTT in some versions). But also, ECN bleaching exists, as do servers with ECN response turned off even though they negotiate ECN. It would be good to know some specifics as to which site, whose DC they're hosted in, etc. Also, do you have fallback behaviour such that an ECN-unresponsive flow eventually sees drops? I think that will be essential. On Sat, Jun 4, 2016 at 5:34 AM, Jonathan Morton <chromatix99@gmail.com> wrote: > >> On 3 Jun, 2016, at 22:09, Noah Causin <n0manletter@gmail.com> wrote: >> >> Was the issue, where the drops and marks did not seem to occur, resolved? > > Examination of packet dumps obtained under controlled conditions showed that marking and dropping *did* occur as normal, and I got a normal response from a local machine sending through a virtual delay line. My Internet connection is such that extremely short RTTs never occur. > > However, it seems that some Internet servers I use often do not respond as much as they should to ECN marking, resulting in excessively long queues despite a relatively small number of flows. > > It rather reminds me of the symptoms one would expect to see if DCTCP found its way onto the public Internet. And these are very popular servers with an extremely large userbase. However it’s also possible that the ECN information is somehow disappearing en route. > > I plan to investigate in more detail once COBALT is up and running, with behaviour I can reason about more intuitively than the “evolved Codel” Cake has been using up to now. With COBALT integrated into Cake, I’ll also be able to directly track the number of unresponsive flows. > > Part of that investigation may be to enquire as to whether DCTCP is in fact in use. If so, the TCP Prague people should be brought into the loop, as this would constitute evidence that Codel can’t control DCTCP via ECN under practical Internet conditions. > > - Jonathan Morton > > _______________________________________________ > Codel mailing list > Codel@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/codel ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-06-04 1:01 ` Andrew McGregor @ 2016-06-04 6:23 ` Jonathan Morton 2016-06-04 13:55 ` Jonathan Morton 1 sibling, 0 replies; 45+ messages in thread From: Jonathan Morton @ 2016-06-04 6:23 UTC (permalink / raw) To: Andrew McGregor; +Cc: Noah Causin, cake, codel > On 4 Jun, 2016, at 04:01, Andrew McGregor <andrewmcgr@gmail.com> wrote: > > There are undoubtedly DCTCP-like ECN responses widely deployed, since > that is the default behaviour in Windows Server (gated on RTT in some > versions). But also, ECN bleaching exists, as do servers with ECN > response turned off even though they negotiate ECN. It would be good > to know some specifics as to which site, whose DC they're hosted in, > etc. I’m keeping my mouth shut until I’ve analysed the specific traffic in more detail, so I know what I’m accusing people of and precisely who to accuse. It’s even possible that the fault lies in my ISP’s network - I think they’ve made some significant changes recently. If people are really negotiating ECN and then ignoring its signals at the host level, that’s a clear RFC violation. Fortunately, I think this particular site would be interested in correcting such behaviour if confirmed and explained. > Also, do you have fallback behaviour such that an ECN-unresponsive > flow eventually sees drops? I think that will be essential. Yes, COBALT essentially *is* such a mechanism. The Codel half always uses ECN if it’s available (and drops otherwise), but the BLUE half - the part responsible for handling unresponsive flows in the first place - always uses packet drops. Cake also performs “head drop on the longest queue” when the global queue limit is reached (as does fq_codel). This can be considered a second such mechanism, though a much blunter one; it is significantly superior to tail-drop for two major reasons, but can easily result in burst loss. It is also this overflow which acts as the up-trigger for BLUE; the longest queue not only gets the instant head-drop but a notification to its COBALT instance. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
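The dual mechanism described in this message lends itself to a compact, self-contained sketch: the Codel half schedules signals in time, marks when it can, and never drops an ECN-capable packet, while the BLUE half maintains a per-subqueue drop probability that rises only on overflow head-drops aimed at that subqueue and decays only once the subqueue has drained. All constants below are illustrative, and the cadence is simplified (real Codel divides its interval by the square root of the signal count):

    /* Userspace sketch of the combined Codel+BLUE logic; constants and
     * names are illustrative, not from the actual implementation. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    typedef uint64_t ns_t;

    #define BLUE_HOLD (100 * 1000000ULL)  /* 100 ms between triggers; the
                                             real code would scale to RTT */
    #define BLUE_UP   0.0025              /* additive increase            */
    #define BLUE_DOWN 0.00025             /* slow decay once drained      */

    struct cobalt_vars {
        ns_t     target, interval;        /* Codel parameters        */
        ns_t     drop_next;
        uint32_t count;
        bool     dropping;
        double   p_drop;                  /* BLUE drop probability   */
        ns_t     blue_last;               /* BLUE trigger anchor     */
    };

    enum cobalt_verdict { COBALT_PASS, COBALT_MARK, COBALT_DROP };

    /* Up-trigger: this subqueue (not the qdisc globally) overflowed. */
    void cobalt_queue_full(struct cobalt_vars *v, ns_t now)
    {
        if (now - v->blue_last >= BLUE_HOLD) {
            v->p_drop = v->p_drop + BLUE_UP > 1.0 ? 1.0 : v->p_drop + BLUE_UP;
            v->blue_last = now;
        }
    }

    /* Down-trigger: the subqueue was found empty on dequeue. */
    void cobalt_queue_empty(struct cobalt_vars *v, ns_t now)
    {
        if (v->p_drop > 0.0 && now - v->blue_last >= BLUE_HOLD) {
            v->p_drop = v->p_drop < BLUE_DOWN ? 0.0 : v->p_drop - BLUE_DOWN;
            v->blue_last = now;
        }
        v->dropping = false;              /* Codel relaxes too */
    }

    /* Per-packet verdict on dequeue. */
    enum cobalt_verdict cobalt_should_drop(struct cobalt_vars *v, ns_t now,
                                           ns_t sojourn, bool ecn_capable)
    {
        /* BLUE: probabilistic and ECN-blind, so a flood's packet rate
         * cannot dilute the signal. */
        if (v->p_drop > 0.0 && (double)rand() / RAND_MAX < v->p_drop)
            return COBALT_DROP;

        /* Codel: time-scheduled, simplified cadence. */
        if (sojourn < v->target) {
            v->dropping = false;
            v->count = 0;
            return COBALT_PASS;
        }
        if (!v->dropping) {
            v->dropping = true;
            v->drop_next = now + v->interval;
            return COBALT_PASS;
        }
        if (now >= v->drop_next) {
            v->count++;
            v->drop_next = now + v->interval / (v->count + 1);
            /* mark when possible; never drop an ECN-capable packet here */
            return ecn_capable ? COBALT_MARK : COBALT_DROP;
        }
        return COBALT_PASS;
    }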
* Re: [Codel] [Cake] Proposing COBALT 2016-06-04 1:01 ` Andrew McGregor 2016-06-04 6:23 ` Jonathan Morton @ 2016-06-04 13:55 ` Jonathan Morton 2016-06-04 14:01 ` moeller0 2016-06-04 17:10 ` Noah Causin 1 sibling, 2 replies; 45+ messages in thread From: Jonathan Morton @ 2016-06-04 13:55 UTC (permalink / raw) To: Andrew McGregor; +Cc: Noah Causin, cake, codel > On 4 Jun, 2016, at 04:01, Andrew McGregor <andrewmcgr@gmail.com> wrote: > > ...servers with ECN response turned off even though they negotiate ECN. It appears that I’m looking at precisely that scenario. A random selection of connections from a packet dump show very high marking rates, which are apparently acknowledged using CWR, but a subsequent dropped packet (probably due to queue overflow) takes many seconds to be retransmitted (I’m using a rather high memory limit for observation purposes). Overall the TCP behaviour is approximately normal for NewReno on a dumb FIFO, and the ECN signalling is completely ignored. This doesn’t rule out the possibility that it’s a different Reno relative, such as Westwood+ or Compound. There’s often more than one CWR per RTT. This isn’t a consistent characteristic; some connections have normal-looking CWRs while others issue them every three packets, as if they’re fishing for “more accurate” ECN feedback. It might vary by host; I didn’t keep track of that. But this can’t be DCTCP; even that should back off in the face of a 100% marking rate, which is often achieved at my low bandwidth and with very persistent queues. Other servers respond normally to ECN signals, ruling out interference by my ISP. It’s possible the ECE flag is wiped and the CWRs are faked, but there’s no legitimate reason to do that. The CWRs ultimately make no difference, since at 100% CE marks, every ack has ECE set anyway. Turning off ECN negotiation at the client results in a much better managed queue with similar throughput. It’s not immediately obvious whether that’s due to a functioning congestion response or simply the AQM clearing out the queue the hard way. It’ll be interesting to see what effect COBALT has here, when I get it to actually work. As for who these servers are: Valve Software’s Steam platform. I did say they were large and popular. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
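For anyone wanting to repeat this kind of packet-dump inspection, the relevant bits are the IP ECN field (CE is the value 3 in the low two bits of the TOS byte) and the TCP ECE/CWR flags (0x40 and 0x80 in the TCP flags byte). A minimal libpcap sketch that just counts them in aggregate follows, assuming IPv4 over plain Ethernet; a real analysis would split the counts by flow and direction, as Jonathan evidently did:

    /* Rough aggregate counter of CE marks and ECE/CWR flags in a pcap
     * file; assumes IPv4 over Ethernet and does only basic bounds checks. */
    #include <pcap.h>
    #include <stdio.h>
    #include <sys/types.h>

    int main(int argc, char **argv)
    {
        char err[PCAP_ERRBUF_SIZE];
        pcap_t *p = pcap_open_offline(argc > 1 ? argv[1] : "dump.pcap", err);
        if (!p) { fprintf(stderr, "%s\n", err); return 1; }

        struct pcap_pkthdr *h;
        const u_char *d;
        long ce = 0, ece = 0, cwr = 0, tcp = 0;

        while (pcap_next_ex(p, &h, &d) == 1) {
            if (h->caplen < 34) continue;           /* eth + min IPv4 */
            const u_char *ip = d + 14;
            if ((ip[0] >> 4) != 4 || ip[9] != 6)    /* IPv4 TCP only  */
                continue;
            unsigned ihl = (ip[0] & 0xF) * 4;
            const u_char *th = ip + ihl;
            if (th + 14 > d + h->caplen) continue;
            tcp++;
            if ((ip[1] & 0x3) == 0x3) ce++;         /* ECN field == CE */
            if (th[13] & 0x40) ece++;               /* TCP ECE flag    */
            if (th[13] & 0x80) cwr++;               /* TCP CWR flag    */
        }
        printf("tcp=%ld ce=%ld ece=%ld cwr=%ld\n", tcp, ce, ece, cwr);
        pcap_close(p);
        return 0;
    }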
* Re: [Codel] [Cake] Proposing COBALT 2016-06-04 13:55 ` Jonathan Morton @ 2016-06-04 14:01 ` moeller0 2016-06-04 14:16 ` Jonathan Morton 2016-06-04 17:10 ` Noah Causin 1 sibling, 1 reply; 45+ messages in thread From: moeller0 @ 2016-06-04 14:01 UTC (permalink / raw) To: Jonathan Morton; +Cc: Andrew McGregor, cake, codel Hi Jonathan, > On Jun 4, 2016, at 15:55 , Jonathan Morton <chromatix99@gmail.com> wrote: > > >> On 4 Jun, 2016, at 04:01, Andrew McGregor <andrewmcgr@gmail.com> wrote: >> >> ...servers with ECN response turned off even though they negotiate ECN. > > It appears that I’m looking at precisely that scenario. > > A random selection of connections from a packet dump show very high marking rates, which are apparently acknowledged using CWR, but a subsequent dropped packet (probably due to queue overflow) takes many seconds to be retransmitted (I’m using a rather high memory limit for observation purposes). > > Overall the TCP behaviour is approximately normal for NewReno on a dumb FIFO, and the ECN signalling is completely ignored. This doesn’t rule out the possibility that it’s a different Reno relative, such as Westwood+ or Compound. > > There’s often more than one CWR per RTT. This isn’t a consistent characteristic; some connections have normal-looking CWRs while others issue them every three packets, as if they’re fishing for “more accurate” ECN feedback. It might vary by host; I didn’t keep track of that. But this can’t be DCTCP; even that should back off in the face of a 100% marking rate, which is often achieved at my low bandwidth and with very persistent queues. > > Other servers respond normally to ECN signals, ruling out interference by my ISP. It’s possible the ECE flag is wiped and the CWRs are faked, but there’s no legitimate reason to do that. The CWRs ultimately make no difference, since at 100% CE marks, every ack has ECE set anyway. > > Turning off ECN negotiation at the client results in a much better managed queue with similar throughput. It’s not immediately obvious whether that’s due to a functioning congestion response or simply the AQM clearing out the queue the hard way. It’ll be interesting to see what effect COBALT has here, when I get it to actually work. > > As for who these servers are: Valve Software’s Steam platform. I did say they were large and popular. Maybe cake should allow switching from the default mark-by-ECN policy to mark-by-drop via a command line argument? At least that would make in-the-field testing much easier… As is, there is only the option of disabling ECN at the endpoint(s)… Best Regards Sebastian > > - Jonathan Morton > > _______________________________________________ > Cake mailing list > Cake@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/cake ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-06-04 14:01 ` moeller0 @ 2016-06-04 14:16 ` Jonathan Morton 2016-06-04 15:03 ` moeller0 0 siblings, 1 reply; 45+ messages in thread From: Jonathan Morton @ 2016-06-04 14:16 UTC (permalink / raw) To: moeller0; +Cc: Andrew McGregor, cake, codel > On 4 Jun, 2016, at 17:01, moeller0 <moeller0@gmx.de> wrote: > > Maybe cake should allow switching from the default mark-by-ECN policy to mark-by-drop via a command line argument? At least that would make in-the-field testing much easier… As is, there is only the option of disabling ECN at the endpoint(s)… I consider ignoring ECN in the way I described to be a fault condition inevitably resulting in unresponsive traffic. As a fault condition, it should be rare. The main effect in practice is that the RTT for the affected flows grows well beyond normal, but since they are bulk transfers, this has only a minor detrimental effect (much of which is incurred sender-side in the form of retransmission buffers two orders of magnitude larger than necessary). Rather than further complicate Codel or Cake, I’d like to simply apply a general solution for unresponsive traffic, i.e. COBALT. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-06-04 14:16 ` Jonathan Morton @ 2016-06-04 15:03 ` moeller0 0 siblings, 0 replies; 45+ messages in thread From: moeller0 @ 2016-06-04 15:03 UTC (permalink / raw) To: Jonathan Morton; +Cc: Andrew McGregor, cake, codel Hi Jonathan, > On Jun 4, 2016, at 16:16 , Jonathan Morton <chromatix99@gmail.com> wrote: > > >> On 4 Jun, 2016, at 17:01, moeller0 <moeller0@gmx.de> wrote: >> >> Maybe cake should allow switching from the default mark-by-ECN policy to mark-by-drop via a command line argument? At least that would make in-the-field testing much easier… As is, there is only the option of disabling ECN at the endpoint(s)… > > I consider ignoring ECN in the way I described to be a fault condition inevitably resulting in unresponsive traffic. As a fault condition, it should be rare. Operative word being “should” in my opinion; as long as we have no reliable statistics either way, assuming rarity seems overly optimistic to me. Not giving the user control over policy requires the default policy to be almost 100% applicable; here we have a demonstrated case where this requirement seems violated. Make of that what you want; if cake were my project, I would make ECN versus drop configurable at the qdisc, as control via the endhosts seems comparatively tedious, especially for quick comparative testing. But cake is not my project, so all I can do is try to make a case for introducing a policy control toggle… > > The main effect in practice is that the RTT for the affected flows grows well beyond normal, but since they are bulk transfers, > this has only a minor detrimental effect (much of which is incurred sender-side in the form of retransmission buffers two orders of magnitude larger than necessary). > > Rather than further complicate Codel or Cake, I’d like to simply apply a general solution for unresponsive traffic, i.e. COBALT. If adding a toggle for ECN versus drop is your only concern in the complexity of cake’s configuration, you have not been reading my arguments regarding the labyrinthine overhead keywords… Really, not exposing this control might actually be a reasonable thing to do, but trying to “blame” this on added complexity seems far-fetched… but what do I know… Best Regards Sebastian > > - Jonathan Morton > ^ permalink raw reply [flat|nested] 45+ messages in thread
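Structurally, the toggle Sebastian is asking for is small once the signalling decision is funnelled through one place; the hard part is the policy argument, not the code. A sketch with entirely hypothetical names, not reflecting anything cake actually exposes:

    #include <stdbool.h>

    struct aqm_opts { bool use_ecn; };   /* hypothetical qdisc option */

    enum verdict { PASS, MARK, DROP };

    /* Gate every would-be mark through the policy: with use_ecn off,
     * or on a non-ECT packet, the signal degrades to a drop. */
    static enum verdict apply_policy(const struct aqm_opts *o,
                                     bool packet_is_ect, enum verdict v)
    {
        if (v == MARK && (!o->use_ecn || !packet_is_ect))
            return DROP;
        return v;
    }

The point of such a switch is that the A/B comparison then lives on the middlebox, flipped by one qdisc parameter, rather than requiring ECN to be renegotiated on every endhost.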
* Re: [Codel] [Cake] Proposing COBALT 2016-06-04 13:55 ` Jonathan Morton 2016-06-04 14:01 ` moeller0 @ 2016-06-04 17:10 ` Noah Causin 2016-06-04 17:49 ` Eric Dumazet 1 sibling, 1 reply; 45+ messages in thread From: Noah Causin @ 2016-06-04 17:10 UTC (permalink / raw) To: Jonathan Morton, Andrew McGregor; +Cc: cake, codel I notice that issue with Steam. Steam uses lots of ECN, which can be nice for saving bandwidth with large games. The issue I notice is that Steam is the one application that can cause me to have ping spikes of over 100ms, even though I have thoroughly tested my network using both flent and dslreports. I also notice that I get large sparse delays in the cake stats during steam downloads. The highest I can remember right now is like 22ms. On 6/4/2016 9:55 AM, Jonathan Morton wrote: >> On 4 Jun, 2016, at 04:01, Andrew McGregor <andrewmcgr@gmail.com> wrote: >> >> ...servers with ECN response turned off even though they negotiate ECN. > It appears that I’m looking at precisely that scenario. > > A random selection of connections from a packet dump show very high marking rates, which are apparently acknowledged using CWR, but a subsequent dropped packet (probably due to queue overflow) takes many seconds to be retransmitted (I’m using a rather high memory limit for observation purposes). > > Overall the TCP behaviour is approximately normal for NewReno on a dumb FIFO, and the ECN signalling is completely ignored. This doesn’t rule out the possibility that it’s a different Reno relative, such as Westwood+ or Compound. > > There’s often more than one CWR per RTT. This isn’t a consistent characteristic; some connections have normal-looking CWRs while others issue them every three packets, as if they’re fishing for “more accurate” ECN feedback. It might vary by host; I didn’t keep track of that. But this can’t be DCTCP; even that should back off in the face of a 100% marking rate, which is often achieved at my low bandwidth and with very persistent queues. > > Other servers respond normally to ECN signals, ruling out interference by my ISP. It’s possible the ECE flag is wiped and the CWRs are faked, but there’s no legitimate reason to do that. The CWRs ultimately make no difference, since at 100% CE marks, every ack has ECE set anyway. > > Turning off ECN negotiation at the client results in a much better managed queue with similar throughput. It’s not immediately obvious whether that’s due to a functioning congestion response or simply the AQM clearing out the queue the hard way. It’ll be interesting to see what effect COBALT has here, when I get it to actually work. > > As for who these servers are: Valve Software’s Steam platform. I did say they were large and popular. > > - Jonathan Morton > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-06-04 17:10 ` Noah Causin @ 2016-06-04 17:49 ` Eric Dumazet 2016-06-04 19:55 ` Jonathan Morton 0 siblings, 1 reply; 45+ messages in thread From: Eric Dumazet @ 2016-06-04 17:49 UTC (permalink / raw) To: Noah Causin; +Cc: Jonathan Morton, Andrew McGregor, cake, codel On Sat, 2016-06-04 at 13:10 -0400, Noah Causin wrote: > I notice that issue with Steam. Steam uses lots of ECN, which can be > nice for saving bandwidth with large games. The issue I notice is that > Steam is the one application that can cause me to have ping spikes of > over 100ms, even though I have thoroughly tested my network using both > flent and dslreports. > > I also notice that I get large sparse delays in the cake stats during > steam downloads. The highest I can remember right now is like 22ms. ECN (as in RFC 3168) is well known to be trivially exploited by peers pretending to be ECN ready, but not reacting to feedback, only to let their packets traverse congested hops with a lower drop probability. https://www.ietf.org/rfc/rfc3540.txt ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-06-04 17:49 ` Eric Dumazet @ 2016-06-04 19:55 ` Jonathan Morton 2016-06-04 20:56 ` Eric Dumazet 2016-06-27 3:56 ` Jonathan Morton 0 siblings, 2 replies; 45+ messages in thread From: Jonathan Morton @ 2016-06-04 19:55 UTC (permalink / raw) To: Eric Dumazet; +Cc: Noah Causin, Andrew McGregor, cake, codel > On 4 Jun, 2016, at 20:49, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > ECN (as in RFC 3168) is well known to be trivially exploited by peers > pretending to be ECN ready, but not reacting to feedback, only to let > their packets traverse congested hops with a lower drop probability. In this case it is the sender cheating, not the receiver, nor the network. ECN Nonce doesn’t apply, as it is designed to protect against the latter two forms of cheating (and in any case nobody ever deployed it). Given that it’s *Valve* doing it, we have a good chance of convincing them to correct it, simply by explaining that it has an unreasonable effect on network latency and therefore game performance while Steam is downloading in the background. This is especially pertinent since several of Valve’s own games are notoriously latency-sensitive FPSes. COBALT should turn out to be a reasonable antidote to sender-side cheating, due to the way BLUE works; the drop probability remains steady until the queue has completely emptied, and then decays slowly. Assuming the congestion-control response to packet drops is normal, BLUE should find a stable operating point where the queue is kept partly full on average. The resulting packet loss will be higher than for a dumb FIFO or a naive ECN AQM, but lower than for a loss-based AQM with a tight sojourn-time target. For this reason, I’m putting off drafting such an explanation to Valve until I have a chance to evaluate COBALT’s performance against the faulty traffic. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
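The stable operating point claimed here can be sanity-checked against the usual Mathis approximation for a Reno-family flow, rate ≈ MSS / (RTT · sqrt(2p/3)): if senders respond normally to loss, BLUE should settle near the probability at which that rate matches the link rate. A back-of-envelope computation with purely illustrative numbers (compile with -lm):

    /* Invert the Mathis approximation to estimate the equilibrium drop
     * probability BLUE would settle at; numbers are illustrative only. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double mss  = 1448.0 * 8;   /* bits per segment */
        double rtt  = 0.050;        /* 50 ms            */
        double link = 10e6;         /* 10 Mbit/s        */

        /* rate = mss/(rtt*sqrt(2p/3))  =>  p = 1.5*(mss/(rtt*rate))^2 */
        double p_star = 1.5 * pow(mss / (rtt * link), 2);
        printf("equilibrium drop probability ~ %.4f%%\n", p_star * 100);
        return 0;
    }

With these numbers the result is on the order of 0.1%, noticeably more loss than a marking AQM would inflict on a responsive flow, but far below what a tight sojourn-time target would produce against the same cheating sender.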
* Re: [Codel] [Cake] Proposing COBALT 2016-06-04 19:55 ` Jonathan Morton @ 2016-06-04 20:56 ` Eric Dumazet 0 siblings, 0 replies; 45+ messages in thread From: Eric Dumazet @ 2016-06-04 20:56 UTC (permalink / raw) To: Jonathan Morton; +Cc: Noah Causin, Andrew McGregor, cake, codel On Sat, 2016-06-04 at 22:55 +0300, Jonathan Morton wrote: > > On 4 Jun, 2016, at 20:49, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > ECN (as in RFC 3168) is well known to be trivially exploited by peers > > pretending to be ECN ready, but not reacting to feedback, only to let > > their packets traverse congested hops with a lower drop probability. > > In this case it is the sender cheating, not the receiver, nor the > network. ECN Nonce doesn’t apply, as it is designed to protect > against the latter two forms of cheating (and in any case nobody ever > deployed it). Well, this is another demonstration of how ECN can be fooled, either by malicious peers (senders and/or receivers), or simply by bugs in TOS byte remarking. Senders (or a buggy router) can mark all packets with ECT(0), regardless of whether ECN was negotiated at all in the TCP 3WHS. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-06-04 19:55 ` Jonathan Morton 2016-06-04 20:56 ` Eric Dumazet @ 2016-06-27 3:56 ` Jonathan Morton 2016-06-27 7:59 ` moeller0 1 sibling, 1 reply; 45+ messages in thread From: Jonathan Morton @ 2016-06-27 3:56 UTC (permalink / raw) To: cake, codel > On 4 Jun, 2016, at 22:55, Jonathan Morton <chromatix99@gmail.com> wrote: > > COBALT should turn out to be a reasonable antidote to sender-side cheating, due to the way BLUE works; the drop probability remains steady until the queue has completely emptied, and then decays slowly. Assuming the congestion-control response to packet drops is normal, BLUE should find a stable operating point where the queue is kept partly full on average. The resulting packet loss will be higher than for a dumb FIFO or a naive ECN AQM, but lower than for a loss-based AQM with a tight sojourn-time target. > > For this reason, I’m putting off drafting such an explanation to Valve until I have a chance to evaluate COBALT’s performance against the faulty traffic. The COBALTified Cake is now working quite nicely, after I located and excised some annoying lockup bugs. As a side-effect of these fixes (which introduced a third, lightly-serviced flowchain for “decaying flows”, which are counted as “sparse” in the stats report), the sparse and bulk flow counts should be somewhat less jittery and more useful. I replaced the defunct “last_len” stat with a new “un_flows”, meaning “unresponsive flows”, to indicate when the BLUE part of COBALT is active. This lights up nicely when passing Steam traffic, which no longer has anywhere near as detrimental an effect on my Internet connection as it did with only Codel; this indicates that BLUE’s ECN-blind dropping is successfully keeping the upstream queue empty. (Of course it wouldn’t help against a UDP flood, but nothing can do that in this topology.) While working on this, I also noticed that the triple-isolation logic is probably quite CPU-intensive. It should be feasible to do better, so I’ll have a go at that soon. Also on the to-do list is enhancing the overhead logic with new data, and adding a three-class Diffserv mode which Dave has wanted for a while. I’ve also come up with a tentative experimental setup to test the “85% rule” more robustly than the recently-found Chinese paper did. I should be able to do it with just three hosts, one having dual NICs, and using only Cake and netem qdiscs. Now if only the sauna were not the *coolest* part of my residence right now… - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
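The “decaying flows” bookkeeping is easiest to picture as a third DRR list sitting between active service and retirement, kept alive so BLUE’s down-trigger can keep firing until the drop probability reaches zero. The sketch below is inferred from the description in this message alone; every name in it is a guess, not taken from the actual Cake patch:

    #include <stdbool.h>
    #include <stddef.h>

    struct flow {
        struct flow *next;
        int    backlog;        /* queued bytes                   */
        double blue_p;         /* per-flow BLUE drop probability */
    };

    struct sched_lists {
        struct flow *new_flows;  /* sparse flows, serviced first   */
        struct flow *old_flows;  /* bulk flows, normal DRR service */
        struct flow *decaying;   /* empty, but BLUE state still live;
                                    lightly serviced, reported as
                                    "sparse" in the stats           */
    };

    /* When a flow's queue drains, decide where it goes: keeping it in
     * the rotation while blue_p > 0 lets the down-trigger keep firing
     * until the probability decays to zero. */
    static struct flow **on_flow_drained(struct sched_lists *s,
                                         struct flow *f)
    {
        (void)s;
        if (f->blue_p > 0.0)
            return &s->decaying;
        return NULL;             /* fully idle: leave the rotation */
    }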
* Re: [Codel] [Cake] Proposing COBALT 2016-06-27 3:56 ` Jonathan Morton @ 2016-06-27 7:59 ` moeller0 0 siblings, 0 replies; 45+ messages in thread From: moeller0 @ 2016-06-27 7:59 UTC (permalink / raw) To: Jonathan Morton; +Cc: cake, codel, Eric Dumazet, Andrew McGregor Hi Jonathan, all of this sounds great! One question inlined below… > On Jun 27, 2016, at 05:56 , Jonathan Morton <chromatix99@gmail.com> wrote: > > >> On 4 Jun, 2016, at 22:55, Jonathan Morton <chromatix99@gmail.com> wrote: >> >> COBALT should turn out to be a reasonable antidote to sender-side cheating, due to the way BLUE works; the drop probability remains steady until the queue has completely emptied, and then decays slowly. Assuming the congestion-control response to packet drops is normal, BLUE should find a stable operating point where the queue is kept partly full on average. The resulting packet loss will be higher than for a dumb FIFO or a naive ECN AQM, but lower than for a loss-based AQM with a tight sojourn-time target. >> >> For this reason, I’m putting off drafting such an explanation to Valve until I have a chance to evaluate COBALT’s performance against the faulty traffic. > > The COBALTified Cake is now working quite nicely, after I located and excised some annoying lockup bugs. As a side-effect of these fixes (which introduced a third, lightly-serviced flowchain for “decaying flows”, which are counted as “sparse” in the stats report), the sparse and bulk flow counts should be somewhat less jittery and more useful. > > I replaced the defunct “last_len” stat with a new “un_flows”, meaning “unresponsive flows”, to indicate when the BLUE part of COBALT is active. This lights up nicely when passing Steam traffic, which no longer has anywhere near as detrimental an effect on my Internet connection as it did with only Codel; this indicates that BLUE’s ECN-blind dropping is successfully keeping the upstream queue empty. (Of course it wouldn’t help against a UDP flood, but nothing can do that in this topology.) > > While working on this, I also noticed that the triple-isolation logic is probably quite CPU-intensive. Does this also affect the dual[src|dst]host isolation options? How do you test this option internally (I am trying to solicit testers from the openwrt forum, but they are hard to come by and understandably only want to spend limited time with testing, so the results so far are tentative at best)? > It should be feasible to do better, so I’ll have a go at that soon. Also on the to-do list is enhancing the overhead logic with new data, Could I ask nicely again that you add something to the keywords that will easily signify whether a keyword has a side-effect on the ATM encapsulation, please? A lot of our users on openwrt/lede only see the output of “tc qdisc add cake help” at best, and the different scopes of the keywords are simply not easy to understand from that. (The “scope” of the keywords could be made clearer, for example, by either a pre-/suffix to the keyword names, as in pppoe-ptm, or by using two-word configurations like “adsl-overhead pppoe-vcmux”. I admit that both are less visually pleasing and concise than the existing keywords, but they are clearer to our users.) > and adding a three-class Diffserv mode which Dave has wanted for a while. > > I’ve also come up with a tentative experimental setup to test the “85% rule” more robustly than the recently-found Chinese paper did. I should be able to do it with just three hosts, one having dual NICs, and using only Cake and netem qdiscs.
> > Now if only the sauna were not the *coolest* part of my residence right now… You are in Finland? I envy you for your nice long days… Best Regards Sebastian > > - Jonathan Morton > > _______________________________________________ > Cake mailing list > Cake@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/cake ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 12:18 ` Jonathan Morton 2016-05-20 13:22 ` moeller0 @ 2016-05-20 13:41 ` David Lang 2016-05-20 13:46 ` moeller0 2016-05-20 14:09 ` Jonathan Morton 1 sibling, 2 replies; 45+ messages in thread From: David Lang @ 2016-05-20 13:41 UTC (permalink / raw) To: Jonathan Morton; +Cc: moeller0, cake, codel On Fri, 20 May 2016, Jonathan Morton wrote: > Normal traffic does not include large numbers of fragmented packets (I would > expect a mere handful from certain one-shot request-response protocols which > can produce large responses), so it is better to shunt them to a single queue > per host-pair. I don't agree with this. Normal traffic on a well setup network should not include large numbers of fragmented packets. But I have seen too many networks that fragment almost everything as a result of there being a hop that goes through one or more tunneling layers that lower the effective MTU (and no, path mtu discovery does not always work) David Lang ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 13:41 ` David Lang @ 2016-05-20 13:46 ` moeller0 2016-05-20 14:04 ` David Lang 0 siblings, 1 reply; 45+ messages in thread From: moeller0 @ 2016-05-20 13:46 UTC (permalink / raw) To: David Lang; +Cc: Jonathan Morton, cake, codel > On May 20, 2016, at 15:41 , David Lang <david@lang.hm> wrote: > > On Fri, 20 May 2016, Jonathan Morton wrote: > >> Normal traffic does not include large numbers of fragmented packets (I would expect a mere handful from certain one-shot request-response protocols which can produce large responses), so it is better to shunt them to a single queue per host-pair. > > I don't agree with this. > > Normal traffic on a well setup network should not include large numbers of fragmented packets. But I have seen too many networks that fragment almost everything as a result of there being a hop that goes through one or more tunneling layers that lower the effective MTU (and no, path mtu discovery does not always work) True; but do you have an idea for getting the flow identity cheaply from fragmented packets, short of reassembly ;)? Best Regards Sebastian > > David Lang ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 13:46 ` moeller0 @ 2016-05-20 14:04 ` David Lang 2016-05-20 14:42 ` Kathleen Nichols 2016-05-20 15:12 ` Jonathan Morton 0 siblings, 2 replies; 45+ messages in thread From: David Lang @ 2016-05-20 14:04 UTC (permalink / raw) To: moeller0; +Cc: Jonathan Morton, cake, codel On Fri, 20 May 2016, moeller0 wrote: >> On May 20, 2016, at 15:41 , David Lang <david@lang.hm> wrote: >> >> On Fri, 20 May 2016, Jonathan Morton wrote: >> >>> Normal traffic does not include large numbers of fragmented packets (I would expect a mere handful from certain one-shot request-response protocols which can produce large responses), so it is better to shunt them to a single queue per host-pair. >> >> I don't agree with this. >> >> Normal traffic on a well setup network should not include large numbers of fragmented packets. But I have seen too many networks that fragment almost everything as a result of there being a hop that goes through one or more tunneling layers that lower the effective MTU (and no, path mtu discovery does not always work) > > True; but do you have an idea for getting the flow identity cheaply from fragmented packets, short of reassembly ;)? How big a problem is this in the real world? Are we working on a theoretical problem, or something that is actually hurting people? by default (and it's a fairly hard default to disable in OpenWRT), the kernel is doing connection tracking so that NAT (masq) and stateful firewalling can work. That process has to solve this problem. The days of allowing fragments through the firewall ended over a decade ago, and if you don't NAT the fragments correctly, things break. So, assuming that we can do as well as conntrack (or ideally use the work that it's already doing), then the only case where this starts to matter is in places that have a custom kernel with conntrack disabled and are still seeing enough fragments to matter. I strongly suspect that in the real world, grouping those fragments by source/dest IP will spread them into enough buckets to keep them from hurting any other systems, while still keeping them concentrated enough to keep fragmentation from being a backdoor around limits. Remember, perfect is the enemy of good enough. A broken network that is fragmenting a lot of traffic is going to have other problems (especially if it's the typical "fragment due to tunnel overhead" case, where each full-size packet fragments into a nearly-full packet plus a minimum-size one). Our main goal needs to be to keep such systems from hurting others. Keeping it from hurting other traffic on the same broken host is a secondary goal. Is it possible to get speed testing software to detect that it's receiving fragments and warn about that? David Lang ^ permalink raw reply [flat|nested] 45+ messages in thread
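The bucketing David describes, which is also the host-pair queue proposed at the top of the thread, amounts to collapsing the flow key for fragments: ports zeroed and the protocol replaced by a reserved marker, so every piece of a fragmented exchange between two hosts lands in one queue while isolation between host pairs survives. A sketch, with an arbitrary mixing function standing in for whatever hash the qdisc really uses:

    #include <stdbool.h>
    #include <stdint.h>

    #define FRAG_PROTO 0xFF   /* reserved marker, not a real IP protocol */

    /* Arbitrary 32-bit mixer, just for illustration. */
    static uint32_t mix(uint32_t h, uint32_t v)
    {
        h ^= v;
        h *= 0x9e3779b1u;
        return h ^ (h >> 16);
    }

    /* Fragments (MF flag set, or fragment offset nonzero) hash on the
     * host pair alone; this includes the first fragment, even though
     * its transport header is still visible, so related pieces stay
     * together. */
    static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
                              uint16_t sport, uint16_t dport,
                              uint8_t proto, bool is_fragment)
    {
        if (is_fragment) {
            sport = dport = 0;
            proto = FRAG_PROTO;
        }
        uint32_t h = 0;
        h = mix(h, saddr);
        h = mix(h, daddr);
        h = mix(h, ((uint32_t)sport << 16) | dport);
        h = mix(h, proto);
        return h;
    }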
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 14:04 ` David Lang @ 2016-05-20 14:42 ` Kathleen Nichols 2016-05-20 15:11 ` Jonathan Morton 0 siblings, 1 reply; 45+ messages in thread From: Kathleen Nichols @ 2016-05-20 14:42 UTC (permalink / raw) To: codel On 5/20/16 7:04 AM, David Lang wrote: > > How big a problem is this in the real world? Are we working on a > theoretical problem, or something that is actually hurting people? > The above seems like it should be the FIRST thing to consider. The entire thread: > On Fri, 20 May 2016, moeller0 wrote: > >>> On May 20, 2016, at 15:41 , David Lang <david@lang.hm> wrote: >>> >>> On Fri, 20 May 2016, Jonathan Morton wrote: >>> >>>> Normal traffic does not include large numbers of fragmented packets >>>> (I would expect a mere handful from certain one-shot >>>> request-response protocols which can produce large responses), so it >>>> is better to shunt them to a single queue per host-pair. >>> >>> I don't agree with this. >>> >>> Normal traffic on a well setup network should not include large >>> numbers of fragmented packets. But I have seen too many networks that >>> fragment almost everything as a result of there being a hop that goes >>> through one or more tunneling layers that lower the effective MTU >>> (and no, path mtu discovery does not always work) >> >> True; but do you have an idea for getting the flow identity >> cheaply from fragmented packets, short of reassembly ;)? > > How big a problem is this in the real world? Are we working on a > theoretical problem, or something that is actually hurting people? > > by default (and it's a fairly hard default to disable in OpenWRT), the > kernel is doing connection tracking so that NAT (masq) and stateful > firewalling can work. That process has to solve this problem. The days > of allowing fragments through the firewall ended over a decade ago, and > if you don't NAT the fragments correctly, things break. > > So, assuming that we can do as well as conntrack (or ideally use the > work that it's already doing), then the only case where this starts to > matter is in places that have a custom kernel with conntrack disabled > and are still seeing enough fragments to matter. > > I strongly suspect that in the real world, grouping those fragments by > source/dest IP will spread them into enough buckets to keep them from > hurting any other systems, while still keeping them concentrated enough > to keep fragmentation from being a backdoor around limits. > > Remember, perfect is the enemy of good enough. A broken network that is > fragmenting a lot of traffic is going to have other problems (especially > if it's the typical "fragment due to tunnel overhead" case, where each > full-size packet fragments into a nearly-full packet plus a > minimum-size one). Our main goal needs to be to keep such systems > from hurting others. Keeping it from hurting other traffic on the same > broken host is a secondary goal. > > Is it possible to get speed testing software to detect that it's > receiving fragments and warn about that? > > David Lang > _______________________________________________ > Codel mailing list > Codel@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/codel ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 14:42 ` Kathleen Nichols @ 2016-05-20 15:11 ` Jonathan Morton 0 siblings, 0 replies; 45+ messages in thread From: Jonathan Morton @ 2016-05-20 15:11 UTC (permalink / raw) To: Kathleen Nichols; +Cc: codel > On 20 May, 2016, at 17:42, Kathleen Nichols <nichols@pollere.com> wrote: > >> How big a problem is this in the real world? Are we working on a >> theoretical problem, or something that is actually hurting people? > > The above seems like it should be the FIRST thing to consider. Fragmented packets *are* a real-world problem, IMHO, in that iperf3 in UDP mode produces lots of them by default, and hardware vendors tend to use tools like iperf3 UDP (in a Faraday cage, no less) to demonstrate the throughput of their new kit. Historically, that’s been the method which produces the biggest and most impressive numbers, because there is almost no reverse traffic contending for airtime. Currently fq_codel does a *really bad* job of showing high iperf3 UDP numbers, even though it shows very good real-world TCP performance, and that is likely to severely put off hardware vendors from deploying fq_codel by default - because it’s not the TCP goodput numbers that they like to use for marketing. And that’s a real-world problem when increasing AQM deployment is a real-world goal. However, I think the relatively straightforward fix of isolating fragmented packets (including the initial fragment, in which the transport header remains visible) only by addresses should be sufficient. This will keep the different parts of each packet together (to the extent they were together on ingress) and allow more of them to be successfully reassembled, allowing iperf3 to show numbers closer to the no-AQM case. I’ve already described why it should be sufficient for real-world traffic as well. Reassembling fragmented packets would also work, provided they are only re-fragmented *after* passing through the qdisc, but carries dangers of its own due to the resources required for reassembly. Presently those costs are borne by the receiving host, which is in a better position to do so. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 14:04 ` David Lang 2016-05-20 14:42 ` Kathleen Nichols @ 2016-05-20 15:12 ` Jonathan Morton 2016-05-20 16:05 ` David Lang 2016-05-20 16:20 ` Rick Jones 1 sibling, 2 replies; 45+ messages in thread From: Jonathan Morton @ 2016-05-20 15:12 UTC (permalink / raw) To: David Lang; +Cc: moeller0, cake, codel > On 20 May, 2016, at 17:04, David Lang <david@lang.hm> wrote: > > Is it possible to get speed testing software to detect that it's receiving fragments and warn about that? Do iperf3’s maintainers accept patches? - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 15:12 ` Jonathan Morton @ 2016-05-20 16:05 ` David Lang 2016-05-20 17:06 ` Jonathan Morton 0 siblings, 1 reply; 45+ messages in thread From: David Lang @ 2016-05-20 16:05 UTC (permalink / raw) To: Jonathan Morton; +Cc: moeller0, cake, codel On Fri, 20 May 2016, Jonathan Morton wrote: >> On 20 May, 2016, at 17:04, David Lang <david@lang.hm> wrote: >> >> Is it possible to get speed testing software to detect that it's receiving fragments and warn about that? > > Do iperf3’s maintainers accept patches? don't know, I was thinking more the dslreports speedtest site and that sort of thing. David Lang ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 16:05 ` David Lang @ 2016-05-20 17:06 ` Jonathan Morton 0 siblings, 0 replies; 45+ messages in thread From: Jonathan Morton @ 2016-05-20 17:06 UTC (permalink / raw) To: David Lang; +Cc: moeller0, cake, codel > On 20 May, 2016, at 19:05, David Lang <david@lang.hm> wrote: > > don't know, I was thinking more the dslreports speedtest site and that sort of thing. I imagine both dslreports and netalyzr would find this metric interesting, if they don’t have it already. They’re both in a position to examine packet traces associated with each test. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 15:12 ` Jonathan Morton 2016-05-20 16:05 ` David Lang @ 2016-05-20 16:20 ` Rick Jones 2016-05-20 16:35 ` Jonathan Morton 1 sibling, 1 reply; 45+ messages in thread From: Rick Jones @ 2016-05-20 16:20 UTC (permalink / raw) To: Jonathan Morton, David Lang; +Cc: cake, codel On 05/20/2016 08:12 AM, Jonathan Morton wrote: > >> On 20 May, 2016, at 17:04, David Lang <david@lang.hm> wrote: >> >> Is it possible to get speed testing software to detect that it's receiving fragments and warn about that? > > Do iperf3’s maintainers accept patches? Netperf's maintainer has been known to accept patches so long as they aren't too hairy. That said, it isn't clear how something operating above the socket interface is going to know that the traffic it was receiving was in the form of reassembled IP datagram fragments. happy benchmarking, rick jones I suppose if said software were to dive below the socket interface it could find out, though that will tend to lack portability. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 16:20 ` Rick Jones @ 2016-05-20 16:35 ` Jonathan Morton 2016-05-20 17:01 ` Rick Jones 0 siblings, 1 reply; 45+ messages in thread From: Jonathan Morton @ 2016-05-20 16:35 UTC (permalink / raw) To: Rick Jones; +Cc: David Lang, cake, codel > On 20 May, 2016, at 19:20, Rick Jones <rick.jones2@hpe.com> wrote: > > On 05/20/2016 08:12 AM, Jonathan Morton wrote: >> >>> On 20 May, 2016, at 17:04, David Lang <david@lang.hm> wrote: >>> >>> Is it possible to get speed testing software to detect that it's receiving fragments and warn about that? >> >> Do iperf3’s maintainers accept patches? > > Netperf's maintainer has been known to accept patches so long as they aren't too hairy. That said, it isn't clear how something operating above the socket interface is going to know that the traffic it was receiving was in the form of reassembled IP datagram fragments. > > happy benchmarking, > > rick jones > > I suppose if said software were to dive below the socket interface it could find out, though that will tend to lack portability. I’m a little fuzzy on UDP socket semantics. Could the sender set DF on a small proportion of the packets, and listen for ICMP errors to that effect? These packets could also be salted with distinguishable data so that the receiver can tell whether the DF packets, in particular, got through. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
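On Linux, at least, the probe Jonathan describes maps onto real socket options: IP_MTU_DISCOVER with IP_PMTUDISC_DO sets DF and makes oversized sends fail with EMSGSIZE, and IP_RECVERR exposes the ICMP feedback on the socket's error queue. A sketch of the sending side follows; the socket options are genuine Linux API, the probing policy around them is the invented part, and IP_MTU is only meaningful on a connected socket:

    #include <errno.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>

    /* Send one DF-marked probe on a *connected* UDP socket; returns 0
     * if it went out, -1 if the path would have had to fragment it. */
    static int send_df_probe(int fd, const void *buf, size_t len)
    {
        int pmtu = IP_PMTUDISC_DO;  /* set DF; fail instead of fragmenting */
        int on = 1;
        setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &pmtu, sizeof(pmtu));
        setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));

        if (send(fd, buf, len, 0) < 0) {
            if (errno == EMSGSIZE) {
                int mtu = 0;
                socklen_t sl = sizeof(mtu);
                getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &sl);
                fprintf(stderr, "%zu-byte DF probe refused; path MTU %d\n",
                        len, mtu);
            }
            return -1;
        }
        return 0;
    }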
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 16:35 ` Jonathan Morton @ 2016-05-20 17:01 ` Rick Jones 2016-05-20 17:07 ` Jonathan Morton 0 siblings, 1 reply; 45+ messages in thread From: Rick Jones @ 2016-05-20 17:01 UTC (permalink / raw) To: Jonathan Morton; +Cc: David Lang, cake, codel On 05/20/2016 09:35 AM, Jonathan Morton wrote: >> On 20 May, 2016, at 19:20, Rick Jones <rick.jones2@hpe.com> wrote: >> I suppose if said software were to dive below the socket interface >> it could find out, though that will tend to lack portability. > > I’m a little fuzzy on UDP socket semantics. > > Could the sender set DF on a small proportion of the packets, and > listen for ICMP errors to that effect? These packets could also be > salted with distinguishable data so that the receiver can tell > whether the DF packets, in particular, got through. > The Linux manpage for UDP asserts: > By default, Linux UDP does path MTU (Maximum Transmission Unit) discovery. > This means the kernel will keep track of the MTU to a specific > target IP address and return EMSGSIZE when a UDP packet write exceeds > it. When this happens, the application should decrease the packet > size. Path MTU discovery can be also turned off using the IP_MTU_DISCOVER > socket option or the /proc/sys/net/ipv4/ip_no_pmtu_disc file; see > ip(7) for details. When turned off, UDP will fragment outgoing UDP > packets that exceed the interface MTU. However, disabling it is not > recommended for performance and reliability reasons. But I haven't seen that EMSGSIZE happen with netperf UDP tests - could be though I've never run them in an environment which triggered PMTUD. I don't have visibility into the assertions for *BSD and other Unices. I'm thinking that modulo not knowing with certainty it was the only thing sending and/or receiving traffic, sampling IP stats about fragments before the test and again after would be a more straightforward way to check instead of complicating the benchmark's data path. happy benchmarking, rick jones ^ permalink raw reply [flat|nested] 45+ messages in thread
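Rick's "sample the IP stats around the test" idea is straightforward on Linux, where /proc/net/snmp exposes the reassembly counters (ReasmReqds, ReasmOKs, ReasmFails) as paired name/value lines. A sketch that looks a counter up by name rather than by column position, since field order is not guaranteed across kernels:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Return a counter from the "Ip:" lines of /proc/net/snmp, or -1. */
    static long ip_counter(const char *name)
    {
        FILE *f = fopen("/proc/net/snmp", "r");
        char hdr[1024], val[1024];
        long out = -1;
        if (!f)
            return -1;
        /* Counters come as a header line of names followed by a line
         * of values, both prefixed "Ip:". */
        while (fgets(hdr, sizeof(hdr), f) && fgets(val, sizeof(val), f)) {
            char *hs, *vs, *h, *v;
            if (strncmp(hdr, "Ip:", 3) != 0)
                continue;
            h = strtok_r(hdr, " \n", &hs);
            v = strtok_r(val, " \n", &vs);
            while (h && v) {
                if (strcmp(h, name) == 0) { out = atol(v); break; }
                h = strtok_r(NULL, " \n", &hs);
                v = strtok_r(NULL, " \n", &vs);
            }
            break;
        }
        fclose(f);
        return out;
    }

    int main(void)
    {
        long before = ip_counter("ReasmOKs");
        /* ... run the benchmark here ... */
        long after = ip_counter("ReasmOKs");
        printf("datagrams reassembled during the test: %ld\n",
               after - before);
        return 0;
    }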
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 17:01 ` Rick Jones @ 2016-05-20 17:07 ` Jonathan Morton 2016-05-20 17:21 ` Rick Jones 2016-05-20 17:26 ` David Lang 1 sibling, 2 replies; 45+ messages in thread From: Jonathan Morton @ 2016-05-20 17:07 UTC (permalink / raw) To: Rick Jones; +Cc: David Lang, cake, codel > On 20 May, 2016, at 20:01, Rick Jones <rick.jones2@hpe.com> wrote: > > But I haven't seen that EMSGSIZE happen with netperf UDP tests - could be though I've never run them in an environment which triggered PMTUD. It’s entirely possible that netperf and/or iperf3 are (ab)using the IP_MTU_DISCOVER socket option mentioned. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 17:07 ` Jonathan Morton @ 2016-05-20 17:21 ` Rick Jones 0 siblings, 0 replies; 45+ messages in thread From: Rick Jones @ 2016-05-20 17:21 UTC (permalink / raw) To: Jonathan Morton; +Cc: David Lang, cake, codel On 05/20/2016 10:07 AM, Jonathan Morton wrote: > >> On 20 May, 2016, at 20:01, Rick Jones <rick.jones2@hpe.com> wrote: >> >> But I haven't seen that EMSGSIZE happen with netperf UDP tests - >> could be though I've never run them in an environment which >> triggered PMTUD. > > It’s entirely possible that netperf and/or iperf3 are (ab)using the > IP_MTU_DISCOVER socket option mentioned. I cannot speak to iperf, but netperf doesn't manipulate IP_MTU_DISCOVER. It will set IP_RECVERR for a UDP_STREAM test, and will by default set SO_DONTROUTE (*) for UDP_STREAM but no IP_MTU_DISCOVER. happy benchmarking, rick jones * QA engineers doing link up/down testing with UDP_STREAM on test systems not air-gapped from the production networks are dangerous creatures... ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 17:07 ` Jonathan Morton 2016-05-20 17:21 ` Rick Jones @ 2016-05-20 17:26 ` David Lang 2016-05-20 17:33 ` Jonathan Morton 1 sibling, 1 reply; 45+ messages in thread From: David Lang @ 2016-05-20 17:26 UTC (permalink / raw) To: Jonathan Morton; +Cc: Rick Jones, cake, codel On Fri, 20 May 2016, Jonathan Morton wrote: >> On 20 May, 2016, at 20:01, Rick Jones <rick.jones2@hpe.com> wrote: >> >> But I haven't seen that EMSGSIZE happen with netperf UDP tests - could be though I've never run them in an environment which triggered PMTUD. > > It’s entirely possible that netperf and/or iperf3 are (ab)using the IP_MTU_DISCOVER socket option mentioned. iperf3 defaults to an MSS of 1200 bytes, well below the MTU. David Lang ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 17:26 ` David Lang @ 2016-05-20 17:33 ` Jonathan Morton 0 siblings, 0 replies; 45+ messages in thread From: Jonathan Morton @ 2016-05-20 17:33 UTC (permalink / raw) To: David Lang; +Cc: Rick Jones, cake, codel > On 20 May, 2016, at 20:26, David Lang <david@lang.hm> wrote: > > iperf3 defaults to an MSS of 1200 bytes, well below the MTU. That’s not what was implied by the test run earlier. It turned out to be producing large, heavily fragmented packets by default. Unless I’ve somehow completely got the wrong end of the stick. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Codel] [Cake] Proposing COBALT 2016-05-20 13:41 ` David Lang 2016-05-20 13:46 ` moeller0 @ 2016-05-20 14:09 ` Jonathan Morton 1 sibling, 0 replies; 45+ messages in thread From: Jonathan Morton @ 2016-05-20 14:09 UTC (permalink / raw) To: David Lang; +Cc: moeller0, cake, codel > On 20 May, 2016, at 16:41, David Lang <david@lang.hm> wrote: > > On Fri, 20 May 2016, Jonathan Morton wrote: > >> Normal traffic does not include large numbers of fragmented packets (I would expect a mere handful from certain one-shot request-response protocols which can produce large responses), so it is better to shunt them to a single queue per host-pair. > > I don't agree with this. > > Normal traffic on a well setup network should not include large numbers of fragmented packets. But I have seen too many networks that fragment almost everything as a result of there being a hop that goes through one or more tunneling layers that lower the effective MTU (and no, path mtu discovery does not always work) One case of which would be the misconfigured PPPoE link Sebastian mentioned. But I don’t think this is as big a problem as you do. Most latency-sensitive protocols (and critical TCP phases such as handshake and teardown) use sub-MTU sized packets, so are less likely to be fragmented, so will still benefit from flow isolation. And under normal circumstances, most MTU-sized packets are associated with congestion-responsive protocols, which can tolerate being shunted into a single AQM-managed subqueue per host-pair. Flow isolation also still occurs between traffic to different hosts. - Jonathan Morton ^ permalink raw reply [flat|nested] 45+ messages in thread