Discussion of explicit congestion notification's impact on the Internet
 help / color / mirror / Atom feed
* [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S
@ 2019-06-13 16:48 Bob Briscoe
  2019-07-09 14:41 ` [Ecn-sane] [tsvwg] " Black, David
  2019-07-09 15:41 ` [Ecn-sane] " Jonathan Morton
  0 siblings, 2 replies; 22+ messages in thread
From: Bob Briscoe @ 2019-06-13 16:48 UTC (permalink / raw)
  To: ecn-sane, tcpm IETF list; +Cc: tsvwg IETF list

[-- Attachment #1: Type: text/plain, Size: 5668 bytes --]


[I'm sending this to ecn-sane 'cos that's where I detect that this 
concern is still rumbling.
I'm also sending to tcpm@ietf 'cos there's a question for TCP experts 
just before the quoted text below.
And tsvwg@ietf is where it ought to be discussed.]

Now that the IPR issue with L4S has been put to bed, one by one I am 
going through the other concerns that have been raised about L4S.

In the IETF draft that records all the pros and cons of different 
identifiers to use for L4S, under the "ECT(1) and CE" choice (which is 
currently the one adopted at the IETF) there was already an explanation 
of why there would be vanishingly low risk of any harmful consequences 
from CE that was originally ECT(0) being classified into the L4S queue:
https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-06#page-32

Re-reading that, I have found some things unstated that I had thought 
were obvious. So I've spelled it all out long-hand in the text below, 
which is now in my local copy of the draft and will be in the next 
revision unless people suggest improvements/corrections here.

*Q#1:* If this glosses over any concerns you have, please explain.
Otherwise I will continue to consider that this is effectively a 
non-issue, which is the conclusion everyone in the TCP community came to 
at the time the L4S identifier was chosen back in 2015.

*Q#2: *The last couple of lines are the only part I am not sure of. Do 
most of today's TCP implementations recover the reduction in congestion 
window when they discover later that a fast retransmit was spurious? 
There's a note at the end of the intro to rfc4015 saying there was 
insufficient consensus to standardize this behaviour, but that most 
likely means it's done in different ways, rather than it isn't done at all.


Bob


======================================

    Risk of reordering classic CE packets:  Classifying all CE packets
       into the L4S queue risks any CE packets that were originally
       ECT(0) being incorrectly classified as L4S.  If there were delay
       in the Classic queue, these incorrectly classified CE packets
       would arrive early, which is a form of reordering.  Reordering can
       cause TCP senders (and senders of similar transports) to
       retransmit spuriously.  However, the risk of spurious
       retransmissions would be extremely low for the following reasons:

       1.  It is quite unusual to experience queuing at more than one
           bottleneck on the same path (the available capacities have to
           be identical).

       2.  In only a subset of these unusual cases would the first
           bottleneck support classic ECN marking while the second
           supported L4S ECN marking, which would be the only scenario
           where some ECT(0) packets could be CE marked by a non-L4S AQM
           then the remainder experienced further delay through the
           Classic side of a subsequent L4S DualQ AQM.

       3.  Even then, when a few packets are delivered early, it takes
           very unusual conditions to cause a spurious retransmission, in
           contrast to when some packets are delivered late.  The first
           bottleneck has to apply CE-marks to at least N contiguous
           packets and the second bottleneck has to inject an
           uninterrupted sequence of at least N of these packets between
           two packets earlier in the stream (where N is the reordering
           window that the transport protocol allows before it considers
           a packet is lost).

              For example consider N=3, and consider the sequence of
              packets 100, 101, 102, 103,... and imagine that packets
              150,151,152 from later in the flow are injected as follows:
              100, 150, 151, 101, 152, 102, 103...  If this were late
              reordering, even one packet arriving 50 out of sequence
              would trigger a spurious retransmission, but there is no
              spurious retransmission here, because packet 101 moves the
              cumulative ACK counter forward before 3 packets have
              arrived out of order.  Later, when packets 148, 149, 153...
              arrive, even though there is a 3-packet hole, there will be
              no problem, because the packets to fill the hole are
              already in the receive buffer.

       4.  Even with the current recommended TCP (N=3) spurious
           retransmissions will be unlikely for all the above reasons.
           As RACK [I-D.ietf-tcpm-rack] is becoming widely deployed, it
           tends to adapt its reordering window to a larger value of N,
           which will make the chance of a contiguous sequence of N early
           arrivals vanishingly small.

       5.  Even a run of 2 CE marks within a classic ECN flow is
           unlikely, given FQ-CoDel is the only known widely deployed AQM
           that supports classic ECN marking and it takes great care to
           separate out flows and to space any markings evenly along each
           flow.

       It is extremely unlikely that the above set of 5 eventualities
       that are each unusual in themselves would all happen
       simultaneously.  But, even if they did, the consequences would
       hardly be dire: the odd spurious fast retransmission.  Admittedly
       TCP reduces its congestion window when it deems there has been a
       loss, but even this can be recovered once the sender detects that
       the retransmission was spurious.




-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/


[-- Attachment #2: Type: text/html, Size: 6296 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-06-13 16:48 [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S Bob Briscoe
@ 2019-07-09 14:41 ` Black, David
  2019-07-09 15:32   ` [Ecn-sane] [tcpm] " Neal Cardwell
  2019-07-09 15:41 ` [Ecn-sane] " Jonathan Morton
  1 sibling, 1 reply; 22+ messages in thread
From: Black, David @ 2019-07-09 14:41 UTC (permalink / raw)
  To: Bob Briscoe, ecn-sane, tcpm IETF list; +Cc: tsvwg IETF list, Black, David

[-- Attachment #1: Type: text/plain, Size: 8965 bytes --]

Bob,

Commenting as an individual, not a WG chair.

> Q#1: If this glosses over any concerns you have, please explain.

It does gloss over, at least for me.  The TL;DR summary is that items 1-3 aren’t relevant or helpful, IMHO, leaving items 4 and 5, whose effectiveness depends on widespread deployment of RACK and FQ AQMs (e.g., FQ-CoDel) respectively.

Items 1 & 2: The general expectation for Internet transport protocols is that they’re robust against “stupid network tricks” like reordering, but existing protocols transport wind up being designed/implemented for the network we have, not the one we wish we had.  I’m generally skeptical of “highly unlikely” arguments, as horrendous results in a highly unlikely scenario are not acceptable if that scenario occurs repeatedly, even with long intervals in between occurrences.  In light of that, I view items 1 and 2 as defining the problem scenario that needs to be addressed, particularly if L4S is to be widely deployed, and prefer to focus on items 3-5 about how the problem is dealt with.

Item 3: This begins by correctly points out that 3DupACK is the criteria for triggering conventional TCP retransmission, e.g., 2DupACK doesn’t.  An aspect that isn’t mentioned is that AQMs for classic (non-L4S) traffic should be randomly marking (above a queue threshold, CE marking probability depends on queue occupancy), not threshold marking (above a queue threshold, mark all packets with CE).  If threshold marking is used, 3 CE marks in a row is a near certainty, as for non-mice flows, one can expect to have at least that many packets in an RTT window; this is a “Doctor it hurts when I do <this>.”/”Don’t do that!” scenario where the right answer is to fix the broken threshold marking implementation.

Assuming probabilistic marking, one then needs to look at 3-in-a-row CE marking probabilities based on the marking rate.  These are not small - for example, at a 10% marking probability, the likelihood of CE-marking 3 packets in a row starting from a specific packet is 1 in 1,000 (1/10th of 1%), but across 500 packets in a flow, that probability is about 50%.   My initial take-away from this is that if the two bottlenecks (conventional followed by L4S) persist, then the “unusual scenario” of 3 CE-marked packets in a row is nearly certain to happen, which suggests that item 3 is not particularly helpful, leaving items 4 (RACK) and 5 (FQ-CoDel).

So, while I don’t have a conclusion to draw, it appears to me that the countermeasures to this conventional TCP flow misbehavior with L4S are deployment of RACK at endpoints and deployment of FQ AQMs such as FQ-CoDel at non-L4S potential bottleneck nodes.  Items 4 and 5 below effectively assert wide deployment of those algorithms – additional information and data on that would be of interest.

Thanks, --David

From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Bob Briscoe
Sent: Thursday, June 13, 2019 12:48 PM
To: ecn-sane@lists.bufferbloat.net; tcpm IETF list
Cc: tsvwg IETF list
Subject: [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S


[EXTERNAL EMAIL]

[I'm sending this to ecn-sane 'cos that's where I detect that this concern is still rumbling.
I'm also sending to tcpm@ietf 'cos there's a question for TCP experts just before the quoted text below.
And tsvwg@ietf is where it ought to be discussed.]

Now that the IPR issue with L4S has been put to bed, one by one I am going through the other concerns that have been raised about L4S.

In the IETF draft that records all the pros and cons of different identifiers to use for L4S, under the "ECT(1) and CE" choice (which is currently the one adopted at the IETF) there was already an explanation of why there would be vanishingly low risk of any harmful consequences from CE that was originally ECT(0) being classified into the L4S queue:
https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-06#page-32

Re-reading that, I have found some things unstated that I had thought were obvious. So I've spelled it all out long-hand in the text below, which is now in my local copy of the draft and will be in the next revision unless people suggest improvements/corrections here.

Q#1: If this glosses over any concerns you have, please explain.
Otherwise I will continue to consider that this is effectively a non-issue, which is the conclusion everyone in the TCP community came to at the time the L4S identifier was chosen back in 2015.

Q#2: The last couple of lines are the only part I am not sure of. Do most of today's TCP implementations recover the reduction in congestion window when they discover later that a fast retransmit was spurious? There's a note at the end of the intro to rfc4015 saying there was insufficient consensus to standardize this behaviour, but that most likely means it's done in different ways, rather than it isn't done at all.


Bob


======================================

   Risk of reordering classic CE packets:  Classifying all CE packets

      into the L4S queue risks any CE packets that were originally

      ECT(0) being incorrectly classified as L4S.  If there were delay

      in the Classic queue, these incorrectly classified CE packets

      would arrive early, which is a form of reordering.  Reordering can

      cause TCP senders (and senders of similar transports) to

      retransmit spuriously.  However, the risk of spurious

      retransmissions would be extremely low for the following reasons:



      1.  It is quite unusual to experience queuing at more than one

          bottleneck on the same path (the available capacities have to

          be identical).



      2.  In only a subset of these unusual cases would the first

          bottleneck support classic ECN marking while the second

          supported L4S ECN marking, which would be the only scenario

          where some ECT(0) packets could be CE marked by a non-L4S AQM

          then the remainder experienced further delay through the

          Classic side of a subsequent L4S DualQ AQM.



      3.  Even then, when a few packets are delivered early, it takes

          very unusual conditions to cause a spurious retransmission, in

          contrast to when some packets are delivered late.  The first

          bottleneck has to apply CE-marks to at least N contiguous

          packets and the second bottleneck has to inject an

          uninterrupted sequence of at least N of these packets between

          two packets earlier in the stream (where N is the reordering

          window that the transport protocol allows before it considers

          a packet is lost).



             For example consider N=3, and consider the sequence of

             packets 100, 101, 102, 103,... and imagine that packets

             150,151,152 from later in the flow are injected as follows:

             100, 150, 151, 101, 152, 102, 103...  If this were late

             reordering, even one packet arriving 50 out of sequence

             would trigger a spurious retransmission, but there is no

             spurious retransmission here, because packet 101 moves the

             cumulative ACK counter forward before 3 packets have

             arrived out of order.  Later, when packets 148, 149, 153...

             arrive, even though there is a 3-packet hole, there will be

             no problem, because the packets to fill the hole are

             already in the receive buffer.



      4.  Even with the current recommended TCP (N=3) spurious

          retransmissions will be unlikely for all the above reasons.

          As RACK [I-D.ietf-tcpm-rack] is becoming widely deployed, it

          tends to adapt its reordering window to a larger value of N,

          which will make the chance of a contiguous sequence of N early

          arrivals vanishingly small.



      5.  Even a run of 2 CE marks within a classic ECN flow is

          unlikely, given FQ-CoDel is the only known widely deployed AQM

          that supports classic ECN marking and it takes great care to

          separate out flows and to space any markings evenly along each

          flow.



      It is extremely unlikely that the above set of 5 eventualities

      that are each unusual in themselves would all happen

      simultaneously.  But, even if they did, the consequences would

      hardly be dire: the odd spurious fast retransmission.  Admittedly

      TCP reduces its congestion window when it deems there has been a

      loss, but even this can be recovered once the sender detects that

      the retransmission was spurious.





--

________________________________________________________________

Bob Briscoe                               http://bobbriscoe.net/

[-- Attachment #2: Type: text/html, Size: 17465 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tcpm] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-07-09 14:41 ` [Ecn-sane] [tsvwg] " Black, David
@ 2019-07-09 15:32   ` Neal Cardwell
  0 siblings, 0 replies; 22+ messages in thread
From: Neal Cardwell @ 2019-07-09 15:32 UTC (permalink / raw)
  To: Black, David; +Cc: Bob Briscoe, ecn-sane, tcpm IETF list, tsvwg IETF list

On the particular question of Q#2 ("Do most of today's TCP
implementations recover the reduction in congestion window when they
discover later that a fast retransmit was spurious?"), one relevant
data point: Linux TCP should generally be able to  revert reductions
in cwnd from spurious fast recovery episodes, using either TCP
timestamps (along the lines of RFC 4015, starting from at least as far
back as 2005 and working well since 7026b912f97d912 in 2013) or DSACKs
(functioning well from 5628adf1a0ff3 in late 2011).

thanks,
neal

On Tue, Jul 9, 2019 at 10:44 AM Black, David <David.Black@dell.com> wrote:
>
> Bob,
>
>
>
> Commenting as an individual, not a WG chair.
>
>
>
> > Q#1: If this glosses over any concerns you have, please explain.
>
>
>
> It does gloss over, at least for me.  The TL;DR summary is that items 1-3 aren’t relevant or helpful, IMHO, leaving items 4 and 5, whose effectiveness depends on widespread deployment of RACK and FQ AQMs (e.g., FQ-CoDel) respectively.
>
>
>
> Items 1 & 2: The general expectation for Internet transport protocols is that they’re robust against “stupid network tricks” like reordering, but existing protocols transport wind up being designed/implemented for the network we have, not the one we wish we had.  I’m generally skeptical of “highly unlikely” arguments, as horrendous results in a highly unlikely scenario are not acceptable if that scenario occurs repeatedly, even with long intervals in between occurrences.  In light of that, I view items 1 and 2 as defining the problem scenario that needs to be addressed, particularly if L4S is to be widely deployed, and prefer to focus on items 3-5 about how the problem is dealt with.
>
>
>
> Item 3: This begins by correctly points out that 3DupACK is the criteria for triggering conventional TCP retransmission, e.g., 2DupACK doesn’t.  An aspect that isn’t mentioned is that AQMs for classic (non-L4S) traffic should be randomly marking (above a queue threshold, CE marking probability depends on queue occupancy), not threshold marking (above a queue threshold, mark all packets with CE).  If threshold marking is used, 3 CE marks in a row is a near certainty, as for non-mice flows, one can expect to have at least that many packets in an RTT window; this is a “Doctor it hurts when I do <this>.”/”Don’t do that!” scenario where the right answer is to fix the broken threshold marking implementation.
>
>
>
> Assuming probabilistic marking, one then needs to look at 3-in-a-row CE marking probabilities based on the marking rate.  These are not small - for example, at a 10% marking probability, the likelihood of CE-marking 3 packets in a row starting from a specific packet is 1 in 1,000 (1/10th of 1%), but across 500 packets in a flow, that probability is about 50%.   My initial take-away from this is that if the two bottlenecks (conventional followed by L4S) persist, then the “unusual scenario” of 3 CE-marked packets in a row is nearly certain to happen, which suggests that item 3 is not particularly helpful, leaving items 4 (RACK) and 5 (FQ-CoDel).
>
>
>
> So, while I don’t have a conclusion to draw, it appears to me that the countermeasures to this conventional TCP flow misbehavior with L4S are deployment of RACK at endpoints and deployment of FQ AQMs such as FQ-CoDel at non-L4S potential bottleneck nodes.  Items 4 and 5 below effectively assert wide deployment of those algorithms – additional information and data on that would be of interest.
>
>
>
> Thanks, --David
>
>
>
> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Bob Briscoe
> Sent: Thursday, June 13, 2019 12:48 PM
> To: ecn-sane@lists.bufferbloat.net; tcpm IETF list
> Cc: tsvwg IETF list
> Subject: [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
>
>
>
> [EXTERNAL EMAIL]
>
>
> [I'm sending this to ecn-sane 'cos that's where I detect that this concern is still rumbling.
> I'm also sending to tcpm@ietf 'cos there's a question for TCP experts just before the quoted text below.
> And tsvwg@ietf is where it ought to be discussed.]
>
> Now that the IPR issue with L4S has been put to bed, one by one I am going through the other concerns that have been raised about L4S.
>
> In the IETF draft that records all the pros and cons of different identifiers to use for L4S, under the "ECT(1) and CE" choice (which is currently the one adopted at the IETF) there was already an explanation of why there would be vanishingly low risk of any harmful consequences from CE that was originally ECT(0) being classified into the L4S queue:
> https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-06#page-32
>
> Re-reading that, I have found some things unstated that I had thought were obvious. So I've spelled it all out long-hand in the text below, which is now in my local copy of the draft and will be in the next revision unless people suggest improvements/corrections here.
>
> Q#1: If this glosses over any concerns you have, please explain.
> Otherwise I will continue to consider that this is effectively a non-issue, which is the conclusion everyone in the TCP community came to at the time the L4S identifier was chosen back in 2015.
>
> Q#2: The last couple of lines are the only part I am not sure of. Do most of today's TCP implementations recover the reduction in congestion window when they discover later that a fast retransmit was spurious? There's a note at the end of the intro to rfc4015 saying there was insufficient consensus to standardize this behaviour, but that most likely means it's done in different ways, rather than it isn't done at all.
>
>
> Bob
>
>
> ======================================
>
>    Risk of reordering classic CE packets:  Classifying all CE packets
>
>       into the L4S queue risks any CE packets that were originally
>
>       ECT(0) being incorrectly classified as L4S.  If there were delay
>
>       in the Classic queue, these incorrectly classified CE packets
>
>       would arrive early, which is a form of reordering.  Reordering can
>
>       cause TCP senders (and senders of similar transports) to
>
>       retransmit spuriously.  However, the risk of spurious
>
>       retransmissions would be extremely low for the following reasons:
>
>
>
>       1.  It is quite unusual to experience queuing at more than one
>
>           bottleneck on the same path (the available capacities have to
>
>           be identical).
>
>
>
>       2.  In only a subset of these unusual cases would the first
>
>           bottleneck support classic ECN marking while the second
>
>           supported L4S ECN marking, which would be the only scenario
>
>           where some ECT(0) packets could be CE marked by a non-L4S AQM
>
>           then the remainder experienced further delay through the
>
>           Classic side of a subsequent L4S DualQ AQM.
>
>
>
>       3.  Even then, when a few packets are delivered early, it takes
>
>           very unusual conditions to cause a spurious retransmission, in
>
>           contrast to when some packets are delivered late.  The first
>
>           bottleneck has to apply CE-marks to at least N contiguous
>
>           packets and the second bottleneck has to inject an
>
>           uninterrupted sequence of at least N of these packets between
>
>           two packets earlier in the stream (where N is the reordering
>
>           window that the transport protocol allows before it considers
>
>           a packet is lost).
>
>
>
>              For example consider N=3, and consider the sequence of
>
>              packets 100, 101, 102, 103,... and imagine that packets
>
>              150,151,152 from later in the flow are injected as follows:
>
>              100, 150, 151, 101, 152, 102, 103...  If this were late
>
>              reordering, even one packet arriving 50 out of sequence
>
>              would trigger a spurious retransmission, but there is no
>
>              spurious retransmission here, because packet 101 moves the
>
>              cumulative ACK counter forward before 3 packets have
>
>              arrived out of order.  Later, when packets 148, 149, 153...
>
>              arrive, even though there is a 3-packet hole, there will be
>
>              no problem, because the packets to fill the hole are
>
>              already in the receive buffer.
>
>
>
>       4.  Even with the current recommended TCP (N=3) spurious
>
>           retransmissions will be unlikely for all the above reasons.
>
>           As RACK [I-D.ietf-tcpm-rack] is becoming widely deployed, it
>
>           tends to adapt its reordering window to a larger value of N,
>
>           which will make the chance of a contiguous sequence of N early
>
>           arrivals vanishingly small.
>
>
>
>       5.  Even a run of 2 CE marks within a classic ECN flow is
>
>           unlikely, given FQ-CoDel is the only known widely deployed AQM
>
>           that supports classic ECN marking and it takes great care to
>
>           separate out flows and to space any markings evenly along each
>
>           flow.
>
>
>
>       It is extremely unlikely that the above set of 5 eventualities
>
>       that are each unusual in themselves would all happen
>
>       simultaneously.  But, even if they did, the consequences would
>
>       hardly be dire: the odd spurious fast retransmission.  Admittedly
>
>       TCP reduces its congestion window when it deems there has been a
>
>       loss, but even this can be recovered once the sender detects that
>
>       the retransmission was spurious.
>
>
>
>
>
> --
>
> ________________________________________________________________
>
> Bob Briscoe                               http://bobbriscoe.net/
>
> _______________________________________________
> tcpm mailing list
> tcpm@ietf.org
> https://www.ietf.org/mailman/listinfo/tcpm

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-06-13 16:48 [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S Bob Briscoe
  2019-07-09 14:41 ` [Ecn-sane] [tsvwg] " Black, David
@ 2019-07-09 15:41 ` Jonathan Morton
  2019-07-09 23:08   ` [Ecn-sane] [tsvwg] " Yuchung Cheng
  2019-08-02  8:29   ` Ruediger.Geib
  1 sibling, 2 replies; 22+ messages in thread
From: Jonathan Morton @ 2019-07-09 15:41 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: ecn-sane, tcpm IETF list, tsvwg IETF list

> On 13 Jun, 2019, at 7:48 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
>       1.  It is quite unusual to experience queuing at more than one
>           bottleneck on the same path (the available capacities have to
>           be identical).

Following up on David Black's comments, I'd just like to note that the above is not the true criterion for multiple sequential queuing.

Many existing TCP senders are unpaced (aside from ack-clocking), including FreeBSD, resulting in potentially large line-rate bursts at the origin - especially during slow-start.  Even in congestion avoidance, each ack will trigger a closely spaced packet pair (or sometimes a triplet).  It is then easy to imagine, or to build a testbed containing, an arbitrarily long sequence of consecutively narrower links; upon entering each, the burst of packets will briefly collect in a queue and then be paced out at the new rate.

TCP pacing does largely eliminate these bursts when implemented correctly.  However, Linux' pacing and IW is specifically (and apparently deliberately) set up to issue a 10-packet line-rate burst on startup.  This effect has shown up in SCE tests to the point where we had to patch this behaviour out of the sending kernel to prevent an instant exit from slow-start.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-07-09 15:41 ` [Ecn-sane] " Jonathan Morton
@ 2019-07-09 23:08   ` Yuchung Cheng
  2019-08-02  8:29   ` Ruediger.Geib
  1 sibling, 0 replies; 22+ messages in thread
From: Yuchung Cheng @ 2019-07-09 23:08 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Bob Briscoe, tcpm IETF list, ecn-sane, tsvwg IETF list

On Tue, Jul 9, 2019 at 8:41 AM Jonathan Morton <chromatix99@gmail.com> wrote:
>
> > On 13 Jun, 2019, at 7:48 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> >
> >       1.  It is quite unusual to experience queuing at more than one
> >           bottleneck on the same path (the available capacities have to
> >           be identical).
>
> Following up on David Black's comments, I'd just like to note that the above is not the true criterion for multiple sequential queuing.
>
> Many existing TCP senders are unpaced (aside from ack-clocking), including FreeBSD, resulting in potentially large line-rate bursts at the origin - especially during slow-start.  Even in congestion avoidance, each ack will trigger a closely spaced packet pair (or sometimes a triplet).  It is then easy to imagine, or to build a testbed containing, an arbitrarily long sequence of consecutively narrower links; upon entering each, the burst of packets will briefly collect in a queue and then be paced out at the new rate.
>
> TCP pacing does largely eliminate these bursts when implemented correctly.  However, Linux' pacing and IW is specifically (and apparently deliberately) set up to issue a 10-packet line-rate burst on startup.  This effect has shown up in SCE tests to the point where we had to patch this behaviour out of the sending kernel to prevent an instant exit from slow-start.
We (Google TCP folks) are internally experimenting (always) pacing IW.
May hurt very long RTT and short transfers (<=IW), but could be an
overall win.

>
>  - Jonathan Morton
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-07-09 15:41 ` [Ecn-sane] " Jonathan Morton
  2019-07-09 23:08   ` [Ecn-sane] [tsvwg] " Yuchung Cheng
@ 2019-08-02  8:29   ` Ruediger.Geib
  2019-08-02  9:47     ` Jonathan Morton
  2019-08-02 13:15     ` Sebastian Moeller
  1 sibling, 2 replies; 22+ messages in thread
From: Ruediger.Geib @ 2019-08-02  8:29 UTC (permalink / raw)
  To: chromatix99; +Cc: tcpm, ecn-sane, tsvwg

Hi Jonathan,

could you provide a real world example of links which are consecutively narrower than sender access links?

I could figure out a small campus network which has a bottleneck at the Internet access and a second one connecting the terminal equipment. But in a small campus network, the individual terminal could very well have a higher LAN access bandwidth, than the campus - Internet connection (and then there's only one bottleneck again).

There may be a tradeoff between simplicity and general applicability. Awareness of that tradeoff is important. To me, simplicity is the design aim. 

Regards,

Ruediger 

-----Ursprüngliche Nachricht-----
Von: tsvwg <tsvwg-bounces@ietf.org> Im Auftrag von Jonathan Morton
Gesendet: Dienstag, 9. Juli 2019 17:41
An: Bob Briscoe <ietf@bobbriscoe.net>
Cc: tcpm IETF list <tcpm@ietf.org>; ecn-sane@lists.bufferbloat.net; tsvwg IETF list <tsvwg@ietf.org>
Betreff: Re: [tsvwg] [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S

> On 13 Jun, 2019, at 7:48 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
>       1.  It is quite unusual to experience queuing at more than one
>           bottleneck on the same path (the available capacities have to
>           be identical).

Following up on David Black's comments, I'd just like to note that the above is not the true criterion for multiple sequential queuing.

Many existing TCP senders are unpaced (aside from ack-clocking), including FreeBSD, resulting in potentially large line-rate bursts at the origin - especially during slow-start.  Even in congestion avoidance, each ack will trigger a closely spaced packet pair (or sometimes a triplet).  It is then easy to imagine, or to build a testbed containing, an arbitrarily long sequence of consecutively narrower links; upon entering each, the burst of packets will briefly collect in a queue and then be paced out at the new rate.

TCP pacing does largely eliminate these bursts when implemented correctly.  However, Linux' pacing and IW is specifically (and apparently deliberately) set up to issue a 10-packet line-rate burst on startup.  This effect has shown up in SCE tests to the point where we had to patch this behaviour out of the sending kernel to prevent an instant exit from slow-start.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-02  8:29   ` Ruediger.Geib
@ 2019-08-02  9:47     ` Jonathan Morton
  2019-08-02 11:10       ` Dave Taht
  2019-08-05  9:35       ` Ruediger.Geib
  2019-08-02 13:15     ` Sebastian Moeller
  1 sibling, 2 replies; 22+ messages in thread
From: Jonathan Morton @ 2019-08-02  9:47 UTC (permalink / raw)
  To: Ruediger.Geib; +Cc: tcpm, ecn-sane, tsvwg

> On 2 Aug, 2019, at 9:29 am, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> 
> Hi Jonathan,
> 
> could you provide a real world example of links which are consecutively narrower than sender access links?
> 
> I could figure out a small campus network which has a bottleneck at the Internet access and a second one connecting the terminal equipment. But in a small campus network, the individual terminal could very well have a higher LAN access bandwidth, than the campus - Internet connection (and then there's only one bottleneck again).
> 
> There may be a tradeoff between simplicity and general applicability. Awareness of that tradeoff is important. To me, simplicity is the design aim. 

A progressive narrowing of effective link capacity is very common in consumer Internet access.  Theoretically you can set up a chain of almost unlimited length of consecutively narrowing bottlenecks, such that a line-rate burst injected at the wide end will experience queuing at every intermediate node.  In practice you can expect typically three or more potentially narrowing points:

1: Core to Access network peering.  Most responsible ISPs regularly upgrade these links to minimise congestion, subject to cost effectiveness.  Some consumer ISPs however are less responsible, and permit regular congestion here, often on a daily and/or weekly cycle according to demand.  Even the responsible ones may experience short-term congestion here due to exceptional events.  Even if the originating server's NIC is slower than the peering link, queuing may occur here if the link is congested overall due to statistical multiplexing.

2: Access to Backhaul provisioning shaper.  Many ISPs have a provisioning shaper to handle "poverty tariffs" with deliberately limited capacity.  It may also be used to impose a sanity check on inbound traffic bursts, to limit backhaul network traffic to that actually deliverable to the customer (especially when the backhaul network is itself subcontracted on a gigabytes-carried basis, as is common in the UK).  In the ADSL context it's often called a BRAS.

3: Backhaul to head-end device.  Generally the backhaul network is sized to support many head-end devices, each of which serves some number of consumer last-mile links.  I'm being deliberately generic here; it could be a CMTS on a cable network, a DSLAM in a phone exchange, a cell tower, or a long-range wifi hub.  In many cases the head-end device shares several subscribers' lines over a common last-mile medium, so there is still some statistical multiplexing.  In the particular case of a cell tower, the subscriber usually gets less link capacity (due to propagation conditions) than his tariff limit.

4: CPE bufferbloat mitigation shaper.  This is *post-last-mile* ingress shaping, with AQM and FQ, to reduce the effects of the still-ubiquitous dumb FIFOs on the above bottlenecks, especially the head-end device.  The IQrouter is a ready-made CPE device which does this.

5: LAN, wifi, or powerline link.  Most people now have GigE, but 100base-TX is still in use in cheaper CPE and some low-end computers, and this will be a further bottleneck in cases where the last mile is faster.  For example, the Raspberry Pi has only just upgraded to GigE in its latest versions, and used 100base-TX before.  Wifi is also a likely bottleneck, especially if the mobile station is on the opposite side of the house from the base CPE, and is additionally a "bursty MAC" with heavy aggregation.  Some CPE now runs airtime-fairness and FQ-AQM on wifi to help manage that.

I think the above collection is not at all exotic.  Not all of these will apply in every case, or at all times in any given case, but it is certainly possible for all five to apply consecutively.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-02  9:47     ` Jonathan Morton
@ 2019-08-02 11:10       ` Dave Taht
  2019-08-02 12:05         ` Dave Taht
  2019-08-05  9:35       ` Ruediger.Geib
  1 sibling, 1 reply; 22+ messages in thread
From: Dave Taht @ 2019-08-02 11:10 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Ruediger.Geib, tcpm IETF list, ECN-Sane, tsvwg IETF list

On Fri, Aug 2, 2019 at 2:47 AM Jonathan Morton <chromatix99@gmail.com> wrote:
>
> > On 2 Aug, 2019, at 9:29 am, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> >
> > Hi Jonathan,
> >
> > could you provide a real world example of links which are consecutively narrower than sender access links?
> >
> > I could figure out a small campus network which has a bottleneck at the Internet access and a second one connecting the terminal equipment. But in a small campus network, the individual terminal could very well have a higher LAN access bandwidth, than the campus - Internet connection (and then there's only one bottleneck again).
> >
> > There may be a tradeoff between simplicity and general applicability. Awareness of that tradeoff is important. To me, simplicity is the design aim.
>
> A progressive narrowing of effective link capacity is very common in consumer Internet access.  Theoretically you can set up a chain of almost unlimited length of consecutively narrowing bottlenecks, such that a line-rate burst injected at the wide end will experience queuing at every intermediate node.  In practice you can expect typically three or more potentially narrowing points:

0: Container and vm users are frequently using htb + something to keep
their bandwidths under control.

0.5: Cloudy providers use "something" to also rate limit traffic.
Policers and shapers, I assume.

> 1: Core to Access network peering.  Most responsible ISPs regularly upgrade these links to minimise congestion, subject to cost effectiveness.  Some consumer ISPs however are less responsible, and permit regular congestion here, often on a daily and/or weekly cycle according to demand.  Even the responsible ones may experience short-term congestion here due to exceptional events.  Even if the originating server's NIC is slower than the peering link, queuing may occur here if the link is congested overall due to statistical multiplexing.
>
> 2: Access to Backhaul provisioning shaper.  Many ISPs have a provisioning shaper to handle "poverty tariffs" with deliberately limited capacity.  It may also be used to impose a sanity check on inbound traffic bursts, to limit backhaul network traffic to that actually deliverable to the customer (especially when the backhaul network is itself subcontracted on a gigabytes-carried basis, as is common in the UK).  In the ADSL context it's often called a BRAS.
>
> 3: Backhaul to head-end device.  Generally the backhaul network is sized to support many head-end devices, each of which serves some number of consumer last-mile links.  I'm being deliberately generic here; it could be a CMTS on a cable network, a DSLAM in a phone exchange, a cell tower, or a long-range wifi hub.  In many cases the head-end device shares several subscribers' lines over a common last-mile medium, so there is still some statistical multiplexing.  In the particular case of a cell tower, the subscriber usually gets less link capacity (due to propagation conditions) than his tariff limit.
>
> 4: CPE bufferbloat mitigation shaper.  This is *post-last-mile* ingress shaping, with AQM and FQ, to reduce the effects of the still-ubiquitous dumb FIFOs on the above bottlenecks, especially the head-end device.  The IQrouter is a ready-made CPE device which does this.
>
> 5: LAN, wifi, or powerline link.  Most people now have GigE, but 100base-TX is still in use in cheaper CPE and some low-end computers, and this will be a further bottleneck in cases where the last mile is faster.  For example, the Raspberry Pi has only just upgraded to GigE in its latest versions, and used 100base-TX before.  Wifi is also a likely bottleneck, especially if the mobile station is on the opposite side of the house from the base CPE, and is additionally a "bursty MAC" with heavy aggregation.  Some CPE now runs airtime-fairness and FQ-AQM on wifi to help manage that.
>
> I think the above collection is not at all exotic.  Not all of these will apply in every case, or at all times in any given case, but it is certainly possible for all five to apply consecutively.
>
>  - Jonathan Morton
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane



-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-02 11:10       ` Dave Taht
@ 2019-08-02 12:05         ` Dave Taht
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Taht @ 2019-08-02 12:05 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Ruediger.Geib, tcpm IETF list, ECN-Sane, tsvwg IETF list

On Fri, Aug 2, 2019 at 4:10 AM Dave Taht <dave.taht@gmail.com> wrote:
>
> On Fri, Aug 2, 2019 at 2:47 AM Jonathan Morton <chromatix99@gmail.com> wrote:
> >
> > > On 2 Aug, 2019, at 9:29 am, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> > >
> > > Hi Jonathan,
> > >
> > > could you provide a real world example of links which are consecutively narrower than sender access links?
> > >
> > > I could figure out a small campus network which has a bottleneck at the Internet access and a second one connecting the terminal equipment. But in a small campus network, the individual terminal could very well have a higher LAN access bandwidth, than the campus - Internet connection (and then there's only one bottleneck again).
> > >
> > > There may be a tradeoff between simplicity and general applicability. Awareness of that tradeoff is important. To me, simplicity is the design aim.
> >
> > A progressive narrowing of effective link capacity is very common in consumer Internet access.  Theoretically you can set up a chain of almost unlimited length of consecutively narrowing bottlenecks, such that a line-rate burst injected at the wide end will experience queuing at every intermediate node.  In practice you can expect typically three or more potentially narrowing points:
>
> 0: Container and vm users are frequently using htb + something to keep
> their bandwidths under control.
>
> 0.5: Cloudy providers use "something" to also rate limit traffic.
> Policers and shapers, I assume.

Stuff in the cloud thus far is looking quite jittery; sub-ms marking
thresholds do not look feasible. I have no idea what sorts of
software jitter and burstyness exist in NFV and ddpk based implementatons.

>
> > 1: Core to Access network peering.  Most responsible ISPs regularly upgrade these links to minimise congestion, subject to cost effectiveness.  Some consumer ISPs however are less responsible, and permit regular congestion here, often on a daily and/or weekly cycle according to demand.  Even the responsible ones may experience short-term congestion here due to exceptional events.  Even if the originating server's NIC is slower than the peering link, queuing may occur here if the link is congested overall due to statistical multiplexing.
> >
> > 2: Access to Backhaul provisioning shaper.  Many ISPs have a provisioning shaper to handle "poverty tariffs" with deliberately limited capacity.  It may also be used to impose a sanity check on inbound traffic bursts, to limit backhaul network traffic to that actually deliverable to the customer (especially when the backhaul network is itself subcontracted on a gigabytes-carried basis, as is common in the UK).  In the ADSL context it's often called a BRAS.
> >
> > 3: Backhaul to head-end device.  Generally the backhaul network is sized to support many head-end devices, each of which serves some number of consumer last-mile links.  I'm being deliberately generic here; it could be a CMTS on a cable network, a DSLAM in a phone exchange, a cell tower, or a long-range wifi hub.  In many cases the head-end device shares several subscribers' lines over a common last-mile medium, so there is still some statistical multiplexing.  In the particular case of a cell tower, the subscriber usually gets less link capacity (due to propagation conditions) than his tariff limit.

There are also bursts here from the vpn crypto engine.

> > 4: CPE bufferbloat mitigation shaper.  This is *post-last-mile* ingress shaping, with AQM and FQ, to reduce the effects of the still-ubiquitous dumb FIFOs on the above bottlenecks, especially the head-end device.  The IQrouter is a ready-made CPE device which does this.

I would say "inbound shaper" to be clear. And its far, far wider than
just that commercial product, Nearly everyone using this stuff
shapes both in and outbound - be it untangle, netduma, asus "adaptive
qos", edgerouter's or eero's sqm implemention, all of openwrt, dd-wrt,
and deratives, bsd-based pfsense and opfsense, preseem does a bump in
the wire for WISPs, streamboost, I think the list of off the shelf
home router qos systems that shape inbound is well above 80-90%, and
users have been trained to turn it on in both directions
and off only when they run out of cpu. We see new product sales driven
by the cpu cost of having to shape inbound, too.

Policers have become quite useless in the past decade.

> >
> > 5: LAN, wifi, or powerline link.  Most people now have GigE, but 100base-TX is still in use in cheaper CPE and some low-end

I'd so love to see powerline gain fq and AQM. I see it used a lot in
busy apt buildings to drag stuff from room to room. It can be *awful*

>computers, and this will be a further bottleneck in cases where the last mile is faster.  For example, the Raspberry Pi has only just upgraded to GigE in its latest versions, and used 100base-TX before.  Wifi is also a likely bottleneck, especially if the mobile station is on the opposite side of the house from the base CPE, and is additionally a "bursty MAC" with heavy aggregation.  Some CPE now runs airtime-fairness and FQ-AQM on wifi to help manage that.

"some" includes most of qca's ath9k or ath10k products. (40% of the AP
market?) Prominently known fq-codel for wifi users are ~3m chromebook
users and ~3m google wifi, (
http://flent-newark.bufferbloat.net/~d/Airtime%20based%20queue%20limit%20for%20FQ_CoDel%20in%20wireless%20interface.pdf
) and nearly everyone else in the 802.11s meshy market... meraki is a
known sfq + codel user... starting in 2014...

A large number of wifi APs and p2p links oft have a faster wifi speed
than their 100base-tx link, and thus use the ethernet (with either a
short fifo or fq_codel) to smooth the bursts out. I can point to a few
ubnt products that I know a bit too much about. There's also some
802.11ad and ay.... there is a quite remarkable amount of 100base-tx
gear still being deployed.

If it helps any to those here doing simulation, long ago I put a
"slot" and statistical distribution of busty mac delay feature into
netem. I can say now that that was used to help guide the development
of google "stadia", and certainly "slotting" is a MAJOR tool that I'd
plug into the SCE work to better emulate the real orld behaviors of
bursty macs. google has collected WAY more examples characterizing
real world micro(bursts) than I could ever deal with, and I hope they
publish that data someday for others to use.

> >
> > I think the above collection is not at all exotic.  Not all of these will apply in every case, or at all times in any given case, but it is certainly possible for all five to apply consecutively.

For bursty.

> >  - Jonathan Morton
> > _______________________________________________
> > Ecn-sane mailing list
> > Ecn-sane@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/ecn-sane
>
>
>
> --
>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740



-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-02  8:29   ` Ruediger.Geib
  2019-08-02  9:47     ` Jonathan Morton
@ 2019-08-02 13:15     ` Sebastian Moeller
  2019-08-05  7:26       ` Ruediger.Geib
  1 sibling, 1 reply; 22+ messages in thread
From: Sebastian Moeller @ 2019-08-02 13:15 UTC (permalink / raw)
  To: Ruediger.Geib; +Cc: Jonathan Morton, tcpm, ECN-Sane, tsvwg

Hi Ruediger,



> On Aug 2, 2019, at 10:29, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> 
> Hi Jonathan,
> 
> could you provide a real world example of links which are consecutively narrower than sender access links?

	Just an example from a network you might be comfortable with, in DTAGs internet access network there typically are traffic limiting elements at the BNGs (or at the BRAS for the legacy network), I am not 100% sure whether these are implemented as policers or shapers, but they tended to come with >= 300ms buffering. Since recently, the BNG/BRAS traffic shaper's use the message field of the PPPoE Auth ACK to transfer information about the TCP/IPv4 Goodput endusers can expect on their link as a consequence of the BNG/BRAS"s traffic limiter. In DOCSIS and GPON networks the traffic shaper seems mandated by the standards, in DSL networks it seems optional (but there even without a shaper the limited bandwidth of the access link would be a natural traffic choke point).
 Fritzbox home router's now use this information to automatically set egress (and I believe also) ingress traffic shaping on the CPE to reduce the bufferbloat users experience. I have no insight in what Telekom's own Speedport routers do, but I would not be surprised if they would do the same (at least for egress). 
	As Jonathan and Dave mentioned, quite a number of end-users, especially the latency sensitive ones, employ their own ingress and egress traffic shapers at their home routers as the 300ms buffers of the BNG's are just not acceptable for any real-timish uses (VoIP, on-line twitch gaming, even for interactive sessions like ssh 300ms delay are undesirable). E.g. personally, I use an OpenWrt router with an FQ AQM for both ingress and egress (based on Jonathan's excellent cake qdisc) that allows a family of 5 to happily share a 50/10 connection between video streaming and interactive use with very little interference between the users, the same link with out the FQ-AQM active makes interactive applications feel like submerged in molasses once the link gets saturated...
	As far as I can tell there is a number of different solutions that offer home-router based bi-directional traffic shaping to solve bufferbloat" from home (well, not fully solve it, but remedy its consequences), including commercial options like evenroute's iqrouter, and open-source options like OpenWrt (with sqm-scripts as shaper packet). 
	It is exactly this use case and the fact that latency-sensitive users often opt for this solution, that causes me to ask the L4S crowd to actually measure the effect of L4S on RFC3168-FQ-AQMs in the exact configuration it is actually used today, to remedy the same issue L4S wants to tackle.

Best Regards
	Sebastian


> 
> I could figure out a small campus network which has a bottleneck at the Internet access and a second one connecting the terminal equipment. But in a small campus network, the individual terminal could very well have a higher LAN access bandwidth, than the campus - Internet connection (and then there's only one bottleneck again).
> 
> There may be a tradeoff between simplicity and general applicability. Awareness of that tradeoff is important. To me, simplicity is the design aim. 
> 
> Regards,
> 
> Ruediger 
> 
> -----Ursprüngliche Nachricht-----
> Von: tsvwg <tsvwg-bounces@ietf.org> Im Auftrag von Jonathan Morton
> Gesendet: Dienstag, 9. Juli 2019 17:41
> An: Bob Briscoe <ietf@bobbriscoe.net>
> Cc: tcpm IETF list <tcpm@ietf.org>; ecn-sane@lists.bufferbloat.net; tsvwg IETF list <tsvwg@ietf.org>
> Betreff: Re: [tsvwg] [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S
> 
>> On 13 Jun, 2019, at 7:48 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>> 
>>      1.  It is quite unusual to experience queuing at more than one
>>          bottleneck on the same path (the available capacities have to
>>          be identical).
> 
> Following up on David Black's comments, I'd just like to note that the above is not the true criterion for multiple sequential queuing.
> 
> Many existing TCP senders are unpaced (aside from ack-clocking), including FreeBSD, resulting in potentially large line-rate bursts at the origin - especially during slow-start.  Even in congestion avoidance, each ack will trigger a closely spaced packet pair (or sometimes a triplet).  It is then easy to imagine, or to build a testbed containing, an arbitrarily long sequence of consecutively narrower links; upon entering each, the burst of packets will briefly collect in a queue and then be paced out at the new rate.
> 
> TCP pacing does largely eliminate these bursts when implemented correctly.  However, Linux' pacing and IW is specifically (and apparently deliberately) set up to issue a 10-packet line-rate burst on startup.  This effect has shown up in SCE tests to the point where we had to patch this behaviour out of the sending kernel to prevent an instant exit from slow-start.
> 
> - Jonathan Morton
> 
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-02 13:15     ` Sebastian Moeller
@ 2019-08-05  7:26       ` Ruediger.Geib
  2019-08-05 11:00         ` Sebastian Moeller
  0 siblings, 1 reply; 22+ messages in thread
From: Ruediger.Geib @ 2019-08-05  7:26 UTC (permalink / raw)
  To: moeller0; +Cc: tcpm, ecn-sane, tsvwg

Hi Sebastian,

the access link is the bottleneck, that's what's to be expected. As far as I know, in the operator world shapers here by and large removed policers.

A consecutive chain of narrower links results, if the Home Gateway runs with an additional ingress or egress shaper operating below the access bandwidth, if I get you right. 

I understand that you aren't interested in having 300ms buffer delay and may be some jitter for a phone conversation using best effort transport. A main driver for changes in consumer IP access features in Germany are publications of journals and regulators comparing IP access performance of different providers. Should one provider have an advantage over the others by deploying a solution as you (and Bob's team) work on, it likely will be generally deployed.

As far as I can see, latency aware consumers still are a minority and gamers seem to be a big group belonging here. Interest in well performing gaming seems to be growing, I guess (for me at least it's an impression rather than a clear trend).

I'd personally prefer an easy to deploy and operate standard solution offering Best Effort based transport being TCP friendly and at the same time congestion free for other flows at a BNG for traffic in access direction (and for similar devices in other architectures of course). 

Fighting bufferbloat in the upstream direction the way you describe it doesn't construct a chain of links which are consecutively narrower than the bottleneck link, I think.

Regards,

Ruediger

 



-----Ursprüngliche Nachricht-----
Von: Sebastian Moeller <moeller0@gmx.de> 
Gesendet: Freitag, 2. August 2019 15:15
An: Geib, Rüdiger <Ruediger.Geib@telekom.de>
Cc: Jonathan Morton <chromatix99@gmail.com>; tcpm@ietf.org; ECN-Sane <ecn-sane@lists.bufferbloat.net>; tsvwg@ietf.org
Betreff: Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S

Hi Ruediger,



> On Aug 2, 2019, at 10:29, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> 
> Hi Jonathan,
> 
> could you provide a real world example of links which are consecutively narrower than sender access links?

	Just an example from a network you might be comfortable with, in DTAGs internet access network there typically are traffic limiting elements at the BNGs (or at the BRAS for the legacy network), I am not 100% sure whether these are implemented as policers or shapers, but they tended to come with >= 300ms buffering. Since recently, the BNG/BRAS traffic shaper's use the message field of the PPPoE Auth ACK to transfer information about the TCP/IPv4 Goodput endusers can expect on their link as a consequence of the BNG/BRAS"s traffic limiter. In DOCSIS and GPON networks the traffic shaper seems mandated by the standards, in DSL networks it seems optional (but there even without a shaper the limited bandwidth of the access link would be a natural traffic choke point).
 Fritzbox home router's now use this information to automatically set egress (and I believe also) ingress traffic shaping on the CPE to reduce the bufferbloat users experience. I have no insight in what Telekom's own Speedport routers do, but I would not be surprised if they would do the same (at least for egress). 
	As Jonathan and Dave mentioned, quite a number of end-users, especially the latency sensitive ones, employ their own ingress and egress traffic shapers at their home routers as the 300ms buffers of the BNG's are just not acceptable for any real-timish uses (VoIP, on-line twitch gaming, even for interactive sessions like ssh 300ms delay are undesirable). E.g. personally, I use an OpenWrt router with an FQ AQM for both ingress and egress (based on Jonathan's excellent cake qdisc) that allows a family of 5 to happily share a 50/10 connection between video streaming and interactive use with very little interference between the users, the same link with out the FQ-AQM active makes interactive applications feel like submerged in molasses once the link gets saturated...
	As far as I can tell there is a number of different solutions that offer home-router based bi-directional traffic shaping to solve bufferbloat" from home (well, not fully solve it, but remedy its consequences), including commercial options like evenroute's iqrouter, and open-source options like OpenWrt (with sqm-scripts as shaper packet). 
	It is exactly this use case and the fact that latency-sensitive users often opt for this solution, that causes me to ask the L4S crowd to actually measure the effect of L4S on RFC3168-FQ-AQMs in the exact configuration it is actually used today, to remedy the same issue L4S wants to tackle.

Best Regards
	Sebastian


> 
> I could figure out a small campus network which has a bottleneck at the Internet access and a second one connecting the terminal equipment. But in a small campus network, the individual terminal could very well have a higher LAN access bandwidth, than the campus - Internet connection (and then there's only one bottleneck again).
> 
> There may be a tradeoff between simplicity and general applicability. Awareness of that tradeoff is important. To me, simplicity is the design aim. 
> 
> Regards,
> 
> Ruediger 
> 
> -----Ursprüngliche Nachricht-----
> Von: tsvwg <tsvwg-bounces@ietf.org> Im Auftrag von Jonathan Morton
> Gesendet: Dienstag, 9. Juli 2019 17:41
> An: Bob Briscoe <ietf@bobbriscoe.net>
> Cc: tcpm IETF list <tcpm@ietf.org>; ecn-sane@lists.bufferbloat.net; tsvwg IETF list <tsvwg@ietf.org>
> Betreff: Re: [tsvwg] [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S
> 
>> On 13 Jun, 2019, at 7:48 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>> 
>>      1.  It is quite unusual to experience queuing at more than one
>>          bottleneck on the same path (the available capacities have to
>>          be identical).
> 
> Following up on David Black's comments, I'd just like to note that the above is not the true criterion for multiple sequential queuing.
> 
> Many existing TCP senders are unpaced (aside from ack-clocking), including FreeBSD, resulting in potentially large line-rate bursts at the origin - especially during slow-start.  Even in congestion avoidance, each ack will trigger a closely spaced packet pair (or sometimes a triplet).  It is then easy to imagine, or to build a testbed containing, an arbitrarily long sequence of consecutively narrower links; upon entering each, the burst of packets will briefly collect in a queue and then be paced out at the new rate.
> 
> TCP pacing does largely eliminate these bursts when implemented correctly.  However, Linux' pacing and IW is specifically (and apparently deliberately) set up to issue a 10-packet line-rate burst on startup.  This effect has shown up in SCE tests to the point where we had to patch this behaviour out of the sending kernel to prevent an instant exit from slow-start.
> 
> - Jonathan Morton
> 
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-02  9:47     ` Jonathan Morton
  2019-08-02 11:10       ` Dave Taht
@ 2019-08-05  9:35       ` Ruediger.Geib
  2019-08-05 10:59         ` Jonathan Morton
  1 sibling, 1 reply; 22+ messages in thread
From: Ruediger.Geib @ 2019-08-05  9:35 UTC (permalink / raw)
  To: chromatix99; +Cc: tcpm, ecn-sane, tsvwg

Jonathan Morton marked [JM] below, Ruediger Geib [RG].

> On 2 Aug, 2019, at 9:29 am, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> 
> Hi Jonathan,
> 
> could you provide a real world example of links which are consecutively narrower than sender access links?
> 
> I could figure out a small campus network which has a bottleneck at the Internet access and a second one connecting the terminal equipment. But in a small campus network, the individual terminal could very well have a higher LAN access bandwidth, than the campus - Internet connection (and then there's only one bottleneck again).
> 
> There may be a tradeoff between simplicity and general applicability. Awareness of that tradeoff is important. To me, simplicity is the design aim. 

[JM] A progressive narrowing of effective link capacity is very common in consumer Internet access.  Theoretically you can set up a chain of almost unlimited length of consecutively narrowing bottlenecks, such that a line-rate burst injected at the wide end will experience queuing at every intermediate node.  In practice you can expect typically three or more potentially narrowing points:

[RG] deleted. Please read https://tools.ietf.org/html/rfc5127#page-3 , first two sentences. That's a sound starting point, and I don't think much has changed since 2005. 

[RG] About the bursts to expect, it's probably worth noting that today's most popular application generating traffic bursts is watching video clips streamed over the Internet. Viewers dislike the movies to stall. My impression is, all major CDNs are aware of that and try their best to avoid this situation. In particular, I don't expect streaming bursts to overwhelm access link shaper buffers by design. And that, I think, limits burst sizes of the majority of traffic.

[RG] Other people use their equipment to communicate and play games (that's what I see when I use commuters). Unless gaming pictures are rendered on a server or live pictures of communicating persons are streamed, there should be no bursts. Still I miss the consecutively narrowing bottlenecks with queues being built at each instance with a likelihood justifying major engineering and deployment efforts. Any solution for Best Effort service which is TCP friendly and support scommunication expecting no congestion at the same time should be easy to deploy and come with obvious benefits. 

[RG] I found Sebastian's response sound. I think, there are people interested in avoiding congestion at their access.

[RG] I'd like to repeat again what's important to me: no corner case engineering. Is there something to be added to Sebastian's scenario?

 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-05  9:35       ` Ruediger.Geib
@ 2019-08-05 10:59         ` Jonathan Morton
  2019-08-05 12:16           ` Ruediger.Geib
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Morton @ 2019-08-05 10:59 UTC (permalink / raw)
  To: Ruediger.Geib; +Cc: tcpm, ecn-sane, tsvwg

> [JM] A progressive narrowing of effective link capacity is very common in consumer Internet access.  Theoretically you can set up a chain of almost unlimited length of consecutively narrowing bottlenecks, such that a line-rate burst injected at the wide end will experience queuing at every intermediate node.  In practice you can expect typically three or more potentially narrowing points:
> 
> [RG] deleted. Please read https://tools.ietf.org/html/rfc5127#page-3 , first two sentences. That's a sound starting point, and I don't think much has changed since 2005. 

As I said, that reference is *usually* true for *responsible* ISPs.  Not all ISPs, however, are responsible vis a vis their subscribers, as opposed to their shareholders.  There have been some high-profile incidents of *deliberately* inadequate peering arrangements in the USA (often involving Netflix vs major cable networks, for example), and consumer ISPs in the UK *typically* have diurnal cycles of general congestion due to under-investment in the high-speed segments of their network.

To say nothing of what goes on in Asia Minor and Africa, where demand routinely far outstrips supply.  In those areas, solutions to make the best use of limited capacity would doubtless be welcomed.

> [RG] About the bursts to expect, it's probably worth noting that today's most popular application generating traffic bursts is watching video clips streamed over the Internet. Viewers dislike the movies to stall. My impression is, all major CDNs are aware of that and try their best to avoid this situation. In particular, I don't expect streaming bursts to overwhelm access link shaper buffers by design. And that, I think, limits burst sizes of the majority of traffic.

In my personal experience with YouTube, to pick a major video streaming service not-at-all at random, the bursts last several seconds and are essentially ack-clocked.  It's just a high/low watermark system in the receiving client's buffer; when it's full, it tells the server to stop sending, and after it drains a bit it tells the server to start again.  When traffic is flowing, it's no different from any other bulk flow (aside from the use of BBR instead of CUBIC or Reno) and can be managed in the same way.

The timescale I'm talking about, on the other hand, is sub-RTT.  Packet intervals may be counted in microseconds at origin, then gradually spaced out into the millisecond range as they traverse the successive bottlenecks en route.  As I mentioned, there are several circumstances when today's servers emit line-rate bursts of traffic; these can also result from aggregation in certain link types (particularly wifi), and hardware offload engines which try to treat multiple physical packets from the same flow as one.  This then results in transient queuing delays as the next bottleneck spaces them out again.

When several such bursts coincide at a single bottleneck, moreover, the queuing required to accommodate them may be as much as their sum.  This "incast effect" is particularly relevant in datacentres, which routinely produce synchronised bursts of traffic as responses to distributed queries, but can also occur in ordinary web traffic when multiple servers are involved in a single page load.  IW10 does not mean you only need 10 packets of buffer space, and many CDNs are in fact using even larger IWs as well.

These effects really do exist; we have measured them in the real world, reproduced them in lab conditions, and designed qdiscs to accommodate them as cleanly as possible.  The question is to what extent they are relevant to the design of a particular technology or deployment; some will be much more sensitive than others.  The only way to be sure of the answer is to be aware, and do the appropriate maths.

> [RG] Other people use their equipment to communicate and play games

These are examples of traffic that would be sensitive to the delay from transient queuing caused by other traffic.  The most robust answer here is to implement FQ at each such queue.  Other solutions may also exist.

> Any solution for Best Effort service which is TCP friendly and support scommunication expecting no congestion at the same time should be easy to deploy and come with obvious benefits. 

Well, obviously.  Although not everyone remembers this at design time.

> [RG] I found Sebastian's response sound. I think, there are people interested in avoiding congestion at their access.

> the access link is the bottleneck, that's what's to be expected.

It is typically *a* bottleneck, but there can be more than one from the viewpoint of a line-rate burst.

> [RG] I'd like to repeat again what's important to me: no corner case engineering. Is there something to be added to Sebastian's scenario?

He makes an essentially similar point to mine, from a different perspective.  Hopefully the additional context provided above is enlightening.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-05  7:26       ` Ruediger.Geib
@ 2019-08-05 11:00         ` Sebastian Moeller
  2019-08-05 11:47           ` Ruediger.Geib
  0 siblings, 1 reply; 22+ messages in thread
From: Sebastian Moeller @ 2019-08-05 11:00 UTC (permalink / raw)
  To: Ruediger.Geib; +Cc: tcpm, ECN-Sane, tsvwg

Hi Ruediger,


> On Aug 5, 2019, at 09:26, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> 
> Hi Sebastian,
> 
> the access link is the bottleneck, that's what's to be expected.

	Mostly, then again there are situations with 1Gbps Plans where it is not the actual access link, but rather the CPEs Gigabit ethernet LAN ports that are the true bottleneck, but that does not substantially change the issue, it is still the upstream shaper/policer that needs to be worked around.

> As far as I know, in the operator world shapers here by and large removed policers.

	Good to know, shapers are somewhat nicer to user traffic than hard policers, at least that is my interpretation.

> 
> A consecutive chain of narrower links results, if the Home Gateway runs with an additional ingress or egress shaper operating below the access bandwidth, if I get you right. 

	Yes, as you state below, this only is true for the ingress direction, egress shaping works reliably and is typically not suffering from this. That said, if the egress link bandwidth is larger than a servers connection this issue can appear also for the egress direction. For example overly hot peering/transit links can cause downstream bottlenecks considerably narrower than the internet access link's upload direction, but that, while unfortunate, is not at the core of my issue.

> 
> I understand that you aren't interested in having 300ms buffer delay and may be some jitter for a phone conversation using best effort transport.

	+1

> A main driver for changes in consumer IP access features in Germany are publications of journals and regulators comparing IP access performance of different providers.

	Good to know,

> Should one provider have an advantage over the others by deploying a solution as you (and Bob's team) work on, it likely will be generally deployed.

	I do not believe that these mechanisms are actually in play in the German market, as an example for roughly a decade the DOCSIS-ISPs offer higher bandwidth for same or less money than the incumbent telco and yet only managed to get 30% of the customers of their ~75% of possible customers, so only 75*0.3 = 22.5 % market share, with the incumbent only reaching 250/40 for the masses while the DOCSIS ISPs offer 1000/50. And unlike latency, bandwidth (or rather rate) is the number that consumers understand intuitively.
	If anything will expedite the roll-out of L4S style AQMs it is the capability to use those to implement the "special services" that the EU net neutrality regulation explicitly allows, as that is a product that can be actually sold to customers, but I might be too pessimistic here.

> 
> As far as I can see, latency aware consumers still are a minority and gamers seem to be a big group belonging here. Interest in well performing gaming seems to be growing, I guess (for me at least it's an impression rather than a clear trend).

	Put that way, I see a way for ISPs to distinguish themselves from the rest by being gaming friendly, but unless this results in gamers paying more I fail to see the business case that management probably needs before green-lighting the funds required to implement this. This is where cablelabs approach to mandate this in the specs is brilliant. 

> 
> I'd personally prefer an easy to deploy and operate standard solution offering Best Effort based transport being TCP friendly and at the same time congestion free for other flows at a BNG for traffic in access direction (and for similar devices in other architectures of course). 
> 
> Fighting bufferbloat in the upstream direction the way you describe it doesn't construct a chain of links which are consecutively narrower than the bottleneck link, I think.

	Yes, fully agreed, that said, and ISPs CPE should implement an AQM to really solve the latency issues for end-users. The initial L4S paper side-stepped that requirement by making sure the uplinks were not saturated during the test, and state that that needs a real solution for proper roll-out. In theory the ISP could do the uplink shaping on its end (and to constrain users to their contracted rates, ISPs do this already) but as in the downstream case, running an AQM in front of a bottleneck as opposed to behind it makes everything much easier. Also with uplinks typically << downlinks, the typically weak CPE CPUs will still be able to AQM the uplink, nicely distributing that computation load away from the BNG/BRAS big iron ....


Best Regards
	Sebastian

> 
> Regards,
> 
> Ruediger
> 
> 
> 
> 
> 
> -----Ursprüngliche Nachricht-----
> Von: Sebastian Moeller <moeller0@gmx.de> 
> Gesendet: Freitag, 2. August 2019 15:15
> An: Geib, Rüdiger <Ruediger.Geib@telekom.de>
> Cc: Jonathan Morton <chromatix99@gmail.com>; tcpm@ietf.org; ECN-Sane <ecn-sane@lists.bufferbloat.net>; tsvwg@ietf.org
> Betreff: Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
> 
> Hi Ruediger,
> 
> 
> 
>> On Aug 2, 2019, at 10:29, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
>> 
>> Hi Jonathan,
>> 
>> could you provide a real world example of links which are consecutively narrower than sender access links?
> 
> 	Just an example from a network you might be comfortable with, in DTAGs internet access network there typically are traffic limiting elements at the BNGs (or at the BRAS for the legacy network), I am not 100% sure whether these are implemented as policers or shapers, but they tended to come with >= 300ms buffering. Since recently, the BNG/BRAS traffic shaper's use the message field of the PPPoE Auth ACK to transfer information about the TCP/IPv4 Goodput endusers can expect on their link as a consequence of the BNG/BRAS"s traffic limiter. In DOCSIS and GPON networks the traffic shaper seems mandated by the standards, in DSL networks it seems optional (but there even without a shaper the limited bandwidth of the access link would be a natural traffic choke point).
> Fritzbox home router's now use this information to automatically set egress (and I believe also) ingress traffic shaping on the CPE to reduce the bufferbloat users experience. I have no insight in what Telekom's own Speedport routers do, but I would not be surprised if they would do the same (at least for egress). 
> 	As Jonathan and Dave mentioned, quite a number of end-users, especially the latency sensitive ones, employ their own ingress and egress traffic shapers at their home routers as the 300ms buffers of the BNG's are just not acceptable for any real-timish uses (VoIP, on-line twitch gaming, even for interactive sessions like ssh 300ms delay are undesirable). E.g. personally, I use an OpenWrt router with an FQ AQM for both ingress and egress (based on Jonathan's excellent cake qdisc) that allows a family of 5 to happily share a 50/10 connection between video streaming and interactive use with very little interference between the users, the same link with out the FQ-AQM active makes interactive applications feel like submerged in molasses once the link gets saturated...
> 	As far as I can tell there is a number of different solutions that offer home-router based bi-directional traffic shaping to solve bufferbloat" from home (well, not fully solve it, but remedy its consequences), including commercial options like evenroute's iqrouter, and open-source options like OpenWrt (with sqm-scripts as shaper packet). 
> 	It is exactly this use case and the fact that latency-sensitive users often opt for this solution, that causes me to ask the L4S crowd to actually measure the effect of L4S on RFC3168-FQ-AQMs in the exact configuration it is actually used today, to remedy the same issue L4S wants to tackle.
> 
> Best Regards
> 	Sebastian
> 
> 
>> 
>> I could figure out a small campus network which has a bottleneck at the Internet access and a second one connecting the terminal equipment. But in a small campus network, the individual terminal could very well have a higher LAN access bandwidth, than the campus - Internet connection (and then there's only one bottleneck again).
>> 
>> There may be a tradeoff between simplicity and general applicability. Awareness of that tradeoff is important. To me, simplicity is the design aim. 
>> 
>> Regards,
>> 
>> Ruediger 
>> 
>> -----Ursprüngliche Nachricht-----
>> Von: tsvwg <tsvwg-bounces@ietf.org> Im Auftrag von Jonathan Morton
>> Gesendet: Dienstag, 9. Juli 2019 17:41
>> An: Bob Briscoe <ietf@bobbriscoe.net>
>> Cc: tcpm IETF list <tcpm@ietf.org>; ecn-sane@lists.bufferbloat.net; tsvwg IETF list <tsvwg@ietf.org>
>> Betreff: Re: [tsvwg] [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S
>> 
>>> On 13 Jun, 2019, at 7:48 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>> 
>>>     1.  It is quite unusual to experience queuing at more than one
>>>         bottleneck on the same path (the available capacities have to
>>>         be identical).
>> 
>> Following up on David Black's comments, I'd just like to note that the above is not the true criterion for multiple sequential queuing.
>> 
>> Many existing TCP senders are unpaced (aside from ack-clocking), including FreeBSD, resulting in potentially large line-rate bursts at the origin - especially during slow-start.  Even in congestion avoidance, each ack will trigger a closely spaced packet pair (or sometimes a triplet).  It is then easy to imagine, or to build a testbed containing, an arbitrarily long sequence of consecutively narrower links; upon entering each, the burst of packets will briefly collect in a queue and then be paced out at the new rate.
>> 
>> TCP pacing does largely eliminate these bursts when implemented correctly.  However, Linux' pacing and IW is specifically (and apparently deliberately) set up to issue a 10-packet line-rate burst on startup.  This effect has shown up in SCE tests to the point where we had to patch this behaviour out of the sending kernel to prevent an instant exit from slow-start.
>> 
>> - Jonathan Morton
>> 
>> _______________________________________________
>> Ecn-sane mailing list
>> Ecn-sane@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/ecn-sane
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-05 11:00         ` Sebastian Moeller
@ 2019-08-05 11:47           ` Ruediger.Geib
  2019-08-05 13:47             ` Sebastian Moeller
  0 siblings, 1 reply; 22+ messages in thread
From: Ruediger.Geib @ 2019-08-05 11:47 UTC (permalink / raw)
  To: moeller0; +Cc: tcpm, ecn-sane, tsvwg

Hi Sebastian,

thanks. Three more remarks:

I'm happy for any AQM design which comes at low implementation cost and allows me to add value to network operation (be it saving cost, be it enabling value added services). And I think the representatives of other operators are so too.

For most consumers, streaming is the most bandwidth hungry application. I think someone now working for Google published research, that at the time when Internet access bandwidth no longer had an impact on the streaming quality, consumers started to lose interest in "access speed" as an important measure of quality of their Internet access. I think it takes a while until John Doe requires a n*100 Mbit/s Internet access, because any access below 100 Mbit/s causes congestion for the services consumed by John. 

I see many people around me conveniently use their smartphone to access the Internet. A handheld display likely requires less bandwidth for an acceptable display quality, than a large screen. That doesn't say, the latter disappear. But maybe only one or two of them need to be served per consumer access. That will work with 100 Mbit/s or less for a while (other bandwidth hungry applications will arrive some day; I prefer the copper access lines in the ground to be replaced by fiber ones).
 
Regards, Ruediger

-----Ursprüngliche Nachricht-----
Von: Sebastian Moeller <moeller0@gmx.de> 
Gesendet: Montag, 5. August 2019 13:00
An: Geib, Rüdiger <Ruediger.Geib@telekom.de>
Cc: tcpm@ietf.org; ECN-Sane <ecn-sane@lists.bufferbloat.net>; tsvwg@ietf.org
Betreff: Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S

Hi Ruediger,


> On Aug 5, 2019, at 09:26, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> 
> Hi Sebastian,
> 
> the access link is the bottleneck, that's what's to be expected.

	Mostly, then again there are situations with 1Gbps Plans where it is not the actual access link, but rather the CPEs Gigabit ethernet LAN ports that are the true bottleneck, but that does not substantially change the issue, it is still the upstream shaper/policer that needs to be worked around.

> As far as I know, in the operator world shapers here by and large removed policers.

	Good to know, shapers are somewhat nicer to user traffic than hard policers, at least that is my interpretation.

> 
> A consecutive chain of narrower links results, if the Home Gateway runs with an additional ingress or egress shaper operating below the access bandwidth, if I get you right. 

	Yes, as you state below, this only is true for the ingress direction, egress shaping works reliably and is typically not suffering from this. That said, if the egress link bandwidth is larger than a servers connection this issue can appear also for the egress direction. For example overly hot peering/transit links can cause downstream bottlenecks considerably narrower than the internet access link's upload direction, but that, while unfortunate, is not at the core of my issue.

> 
> I understand that you aren't interested in having 300ms buffer delay and may be some jitter for a phone conversation using best effort transport.

	+1

> A main driver for changes in consumer IP access features in Germany are publications of journals and regulators comparing IP access performance of different providers.

	Good to know,

> Should one provider have an advantage over the others by deploying a solution as you (and Bob's team) work on, it likely will be generally deployed.

	I do not believe that these mechanisms are actually in play in the German market, as an example for roughly a decade the DOCSIS-ISPs offer higher bandwidth for same or less money than the incumbent telco and yet only managed to get 30% of the customers of their ~75% of possible customers, so only 75*0.3 = 22.5 % market share, with the incumbent only reaching 250/40 for the masses while the DOCSIS ISPs offer 1000/50. And unlike latency, bandwidth (or rather rate) is the number that consumers understand intuitively.
	If anything will expedite the roll-out of L4S style AQMs it is the capability to use those to implement the "special services" that the EU net neutrality regulation explicitly allows, as that is a product that can be actually sold to customers, but I might be too pessimistic here.

> 
> As far as I can see, latency aware consumers still are a minority and gamers seem to be a big group belonging here. Interest in well performing gaming seems to be growing, I guess (for me at least it's an impression rather than a clear trend).

	Put that way, I see a way for ISPs to distinguish themselves from the rest by being gaming friendly, but unless this results in gamers paying more I fail to see the business case that management probably needs before green-lighting the funds required to implement this. This is where cablelabs approach to mandate this in the specs is brilliant. 

> 
> I'd personally prefer an easy to deploy and operate standard solution offering Best Effort based transport being TCP friendly and at the same time congestion free for other flows at a BNG for traffic in access direction (and for similar devices in other architectures of course). 
> 
> Fighting bufferbloat in the upstream direction the way you describe it doesn't construct a chain of links which are consecutively narrower than the bottleneck link, I think.

	Yes, fully agreed, that said, and ISPs CPE should implement an AQM to really solve the latency issues for end-users. The initial L4S paper side-stepped that requirement by making sure the uplinks were not saturated during the test, and state that that needs a real solution for proper roll-out. In theory the ISP could do the uplink shaping on its end (and to constrain users to their contracted rates, ISPs do this already) but as in the downstream case, running an AQM in front of a bottleneck as opposed to behind it makes everything much easier. Also with uplinks typically << downlinks, the typically weak CPE CPUs will still be able to AQM the uplink, nicely distributing that computation load away from the BNG/BRAS big iron ....


Best Regards
	Sebastian

> 
> Regards,
> 
> Ruediger
> 
> 
> 
> 
> 
> -----Ursprüngliche Nachricht-----
> Von: Sebastian Moeller <moeller0@gmx.de> 
> Gesendet: Freitag, 2. August 2019 15:15
> An: Geib, Rüdiger <Ruediger.Geib@telekom.de>
> Cc: Jonathan Morton <chromatix99@gmail.com>; tcpm@ietf.org; ECN-Sane <ecn-sane@lists.bufferbloat.net>; tsvwg@ietf.org
> Betreff: Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
> 
> Hi Ruediger,
> 
> 
> 
>> On Aug 2, 2019, at 10:29, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
>> 
>> Hi Jonathan,
>> 
>> could you provide a real world example of links which are consecutively narrower than sender access links?
> 
> 	Just an example from a network you might be comfortable with, in DTAGs internet access network there typically are traffic limiting elements at the BNGs (or at the BRAS for the legacy network), I am not 100% sure whether these are implemented as policers or shapers, but they tended to come with >= 300ms buffering. Since recently, the BNG/BRAS traffic shaper's use the message field of the PPPoE Auth ACK to transfer information about the TCP/IPv4 Goodput endusers can expect on their link as a consequence of the BNG/BRAS"s traffic limiter. In DOCSIS and GPON networks the traffic shaper seems mandated by the standards, in DSL networks it seems optional (but there even without a shaper the limited bandwidth of the access link would be a natural traffic choke point).
> Fritzbox home router's now use this information to automatically set egress (and I believe also) ingress traffic shaping on the CPE to reduce the bufferbloat users experience. I have no insight in what Telekom's own Speedport routers do, but I would not be surprised if they would do the same (at least for egress). 
> 	As Jonathan and Dave mentioned, quite a number of end-users, especially the latency sensitive ones, employ their own ingress and egress traffic shapers at their home routers as the 300ms buffers of the BNG's are just not acceptable for any real-timish uses (VoIP, on-line twitch gaming, even for interactive sessions like ssh 300ms delay are undesirable). E.g. personally, I use an OpenWrt router with an FQ AQM for both ingress and egress (based on Jonathan's excellent cake qdisc) that allows a family of 5 to happily share a 50/10 connection between video streaming and interactive use with very little interference between the users, the same link with out the FQ-AQM active makes interactive applications feel like submerged in molasses once the link gets saturated...
> 	As far as I can tell there is a number of different solutions that offer home-router based bi-directional traffic shaping to solve bufferbloat" from home (well, not fully solve it, but remedy its consequences), including commercial options like evenroute's iqrouter, and open-source options like OpenWrt (with sqm-scripts as shaper packet). 
> 	It is exactly this use case and the fact that latency-sensitive users often opt for this solution, that causes me to ask the L4S crowd to actually measure the effect of L4S on RFC3168-FQ-AQMs in the exact configuration it is actually used today, to remedy the same issue L4S wants to tackle.
> 
> Best Regards
> 	Sebastian
> 
> 
>> 
>> I could figure out a small campus network which has a bottleneck at the Internet access and a second one connecting the terminal equipment. But in a small campus network, the individual terminal could very well have a higher LAN access bandwidth, than the campus - Internet connection (and then there's only one bottleneck again).
>> 
>> There may be a tradeoff between simplicity and general applicability. Awareness of that tradeoff is important. To me, simplicity is the design aim. 
>> 
>> Regards,
>> 
>> Ruediger 
>> 
>> -----Ursprüngliche Nachricht-----
>> Von: tsvwg <tsvwg-bounces@ietf.org> Im Auftrag von Jonathan Morton
>> Gesendet: Dienstag, 9. Juli 2019 17:41
>> An: Bob Briscoe <ietf@bobbriscoe.net>
>> Cc: tcpm IETF list <tcpm@ietf.org>; ecn-sane@lists.bufferbloat.net; tsvwg IETF list <tsvwg@ietf.org>
>> Betreff: Re: [tsvwg] [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S
>> 
>>> On 13 Jun, 2019, at 7:48 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>> 
>>>     1.  It is quite unusual to experience queuing at more than one
>>>         bottleneck on the same path (the available capacities have to
>>>         be identical).
>> 
>> Following up on David Black's comments, I'd just like to note that the above is not the true criterion for multiple sequential queuing.
>> 
>> Many existing TCP senders are unpaced (aside from ack-clocking), including FreeBSD, resulting in potentially large line-rate bursts at the origin - especially during slow-start.  Even in congestion avoidance, each ack will trigger a closely spaced packet pair (or sometimes a triplet).  It is then easy to imagine, or to build a testbed containing, an arbitrarily long sequence of consecutively narrower links; upon entering each, the burst of packets will briefly collect in a queue and then be paced out at the new rate.
>> 
>> TCP pacing does largely eliminate these bursts when implemented correctly.  However, Linux' pacing and IW is specifically (and apparently deliberately) set up to issue a 10-packet line-rate burst on startup.  This effect has shown up in SCE tests to the point where we had to patch this behaviour out of the sending kernel to prevent an instant exit from slow-start.
>> 
>> - Jonathan Morton
>> 
>> _______________________________________________
>> Ecn-sane mailing list
>> Ecn-sane@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/ecn-sane
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-05 10:59         ` Jonathan Morton
@ 2019-08-05 12:16           ` Ruediger.Geib
  2019-08-05 15:55             ` Jonathan Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Ruediger.Geib @ 2019-08-05 12:16 UTC (permalink / raw)
  To: chromatix99; +Cc: tcpm, ecn-sane, tsvwg

As I said, no corner-case engineering. On my vacation site abroad, Netflix worked as well as it does at home. Germany is often criticised as a laggard in developing a competitive broadband access market. My interest in IETF work being set up to work around commercial problems between ISPs and ISPs or CDNs in a rich country like the UK or US is low. 

If your technical standardization activities help to make using the Internet more convenient in African countries without raising extra cost or requiring extra skills, I'm sure that's a fair market. I know that these countries are struggling to improve the services operated in their networks. Conditions there (and some Asian countries) differ strongly from those in many other parts of the world.
It might be a good thing to clarify under which scenarios the problem solutions you work on create the most significant benefit.

In the market that I am aware of, there's a single regular bottleneck, which is the consumer access or terminal.

-----Ursprüngliche Nachricht-----
Von: Jonathan Morton <chromatix99@gmail.com> 
Gesendet: Montag, 5. August 2019 13:00
An: Geib, Rüdiger <Ruediger.Geib@telekom.de>
Cc: tcpm@ietf.org; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
Betreff: Re: [tsvwg] [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S

> [JM] A progressive narrowing of effective link capacity is very common in consumer Internet access.  Theoretically you can set up a chain of almost unlimited length of consecutively narrowing bottlenecks, such that a line-rate burst injected at the wide end will experience queuing at every intermediate node.  In practice you can expect typically three or more potentially narrowing points:
> 
> [RG] deleted. Please read https://tools.ietf.org/html/rfc5127#page-3 , first two sentences. That's a sound starting point, and I don't think much has changed since 2005. 

As I said, that reference is *usually* true for *responsible* ISPs.  Not all ISPs, however, are responsible vis a vis their subscribers, as opposed to their shareholders.  There have been some high-profile incidents of *deliberately* inadequate peering arrangements in the USA (often involving Netflix vs major cable networks, for example), and consumer ISPs in the UK *typically* have diurnal cycles of general congestion due to under-investment in the high-speed segments of their network.

To say nothing of what goes on in Asia Minor and Africa, where demand routinely far outstrips supply.  In those areas, solutions to make the best use of limited capacity would doubtless be welcomed.

> [RG] About the bursts to expect, it's probably worth noting that today's most popular application generating traffic bursts is watching video clips streamed over the Internet. Viewers dislike the movies to stall. My impression is, all major CDNs are aware of that and try their best to avoid this situation. In particular, I don't expect streaming bursts to overwhelm access link shaper buffers by design. And that, I think, limits burst sizes of the majority of traffic.

In my personal experience with YouTube, to pick a major video streaming service not-at-all at random, the bursts last several seconds and are essentially ack-clocked.  It's just a high/low watermark system in the receiving client's buffer; when it's full, it tells the server to stop sending, and after it drains a bit it tells the server to start again.  When traffic is flowing, it's no different from any other bulk flow (aside from the use of BBR instead of CUBIC or Reno) and can be managed in the same way.

The timescale I'm talking about, on the other hand, is sub-RTT.  Packet intervals may be counted in microseconds at origin, then gradually spaced out into the millisecond range as they traverse the successive bottlenecks en route.  As I mentioned, there are several circumstances when today's servers emit line-rate bursts of traffic; these can also result from aggregation in certain link types (particularly wifi), and hardware offload engines which try to treat multiple physical packets from the same flow as one.  This then results in transient queuing delays as the next bottleneck spaces them out again.

When several such bursts coincide at a single bottleneck, moreover, the queuing required to accommodate them may be as much as their sum.  This "incast effect" is particularly relevant in datacentres, which routinely produce synchronised bursts of traffic as responses to distributed queries, but can also occur in ordinary web traffic when multiple servers are involved in a single page load.  IW10 does not mean you only need 10 packets of buffer space, and many CDNs are in fact using even larger IWs as well.

These effects really do exist; we have measured them in the real world, reproduced them in lab conditions, and designed qdiscs to accommodate them as cleanly as possible.  The question is to what extent they are relevant to the design of a particular technology or deployment; some will be much more sensitive than others.  The only way to be sure of the answer is to be aware, and do the appropriate maths.

> [RG] Other people use their equipment to communicate and play games

These are examples of traffic that would be sensitive to the delay from transient queuing caused by other traffic.  The most robust answer here is to implement FQ at each such queue.  Other solutions may also exist.

> Any solution for Best Effort service which is TCP friendly and support scommunication expecting no congestion at the same time should be easy to deploy and come with obvious benefits. 

Well, obviously.  Although not everyone remembers this at design time.

> [RG] I found Sebastian's response sound. I think, there are people interested in avoiding congestion at their access.

> the access link is the bottleneck, that's what's to be expected.

It is typically *a* bottleneck, but there can be more than one from the viewpoint of a line-rate burst.

> [RG] I'd like to repeat again what's important to me: no corner case engineering. Is there something to be added to Sebastian's scenario?

He makes an essentially similar point to mine, from a different perspective.  Hopefully the additional context provided above is enlightening.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-05 11:47           ` Ruediger.Geib
@ 2019-08-05 13:47             ` Sebastian Moeller
  2019-08-06  9:49               ` Mikael Abrahamsson
  0 siblings, 1 reply; 22+ messages in thread
From: Sebastian Moeller @ 2019-08-05 13:47 UTC (permalink / raw)
  To: Ruediger.Geib; +Cc: tcpm, ECN-Sane, tsvwg IETF list

Hello Ruediger,


> On Aug 5, 2019, at 13:47, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> 
> Hi Sebastian,
> 
> thanks. Three more remarks:
> 
> I'm happy for any AQM design which comes at low implementation cost and allows me to add value to network operation (be it saving cost, be it enabling value added services). And I think the representatives of other operators are so too.

	Yes, that is my premise too. The only cost saving opportunity I can see in both proposals would be if they allowed to run a fully saturated network without the adverse effects on latency and loss. Value added services, sure for the few users sensitive to latency under load. Maybe with the PR the 5G roll-out is getting in regards to low-latency it might be possible to convince more consumers that this is actually valuable?

> 
> For most consumers, streaming is the most bandwidth hungry application.

	I have no statistical numbers on this topic, but on the list of things that cause issues for latency sensitive home networks the following items come up repeatedly:
A) Streaming in (Youtube, Netflix, Amazon, ...) 
B) Streamin out (aka twitch and friends)
C) File sharing with bittorrent (a slight challenge for FQ-AQMs die to lots of parallel flows and a misdesigned back-off mechanism (react to 100ms induced latency under load?))
D) OS updates, (especially windows update when lveraging P2P technolgy from a close by CDN was/is notorious for being a tad too aggressive)

How these stack up proportionally to each other, I have no clue. Typically the reports are that perceived interactivity in FPS games goes down the drain if and combination of A-D are concurrently active.


> I think someone now working for Google published research, that at the time when Internet access bandwidth no longer had an impact on the streaming quality, consumers started to lose interest in "access speed" as an important measure of quality of their Internet access. I think it takes a while until John Doe requires a n*100 Mbit/s Internet access, because any access below 100 Mbit/s causes congestion for the services consumed by John. 

	That is a good question, as far as I see it, ISPs in Germany still seem to leverage access-rates as their main attraction (giga this and giga that), even though as you note higher rates have diminishing returns for most use-cases.

> 
> I see many people around me conveniently use their smartphone to access the Internet.

	So do I, but I realize how laggy this feels even for "simple" browsing duty (but I accept that for the ease of use and immediacy)... And it is not guaranteed that the smartphone uses the mobile network, it might just as well use wifi. But the latency/bottleneck issue also exists with smartphones, where the variable bandwidth nature of the radios and the opaqueness of 2-4G modems makes things even less enjoyable than on fixed networks (I see multi second stalls when browsing on a phone, versus 300ms on the fixed line without my AQM.

> A handheld display likely requires less bandwidth for an acceptable display quality, than a large screen.

	For browsing, I am not sure that is a real point, given that smartphone display resolution crossed from high to ridiculous some years ago (at least without glasses).

> That doesn't say, the latter disappear. But maybe only one or two of them need to be served per consumer access. That will work with 100 Mbit/s or less for a while

	I agree, I run a family of 5 on a 50/10 link including concurrent streaming (in SD) and thanks to employing a competent FQ-AQM on my router (ingress & egress) this works quite well even with interactive sessions. But without that AQM system the link feels noticeably worse... (and this is the reason why I want to see data, that L4S senders will not invalidate the effectiveness of my setup)

Best Regards
	Sebastian



> (other bandwidth hungry applications will arrive some day; I prefer the copper access lines in the ground to be replaced by fiber ones)
> 
> Regards, Ruediger
> 
> -----Ursprüngliche Nachricht-----
> Von: Sebastian Moeller <moeller0@gmx.de> 
> Gesendet: Montag, 5. August 2019 13:00
> An: Geib, Rüdiger <Ruediger.Geib@telekom.de>
> Cc: tcpm@ietf.org; ECN-Sane <ecn-sane@lists.bufferbloat.net>; tsvwg@ietf.org
> Betreff: Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
> 
> Hi Ruediger,
> 
> 
>> On Aug 5, 2019, at 09:26, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
>> 
>> Hi Sebastian,
>> 
>> the access link is the bottleneck, that's what's to be expected.
> 
> 	Mostly, then again there are situations with 1Gbps Plans where it is not the actual access link, but rather the CPEs Gigabit ethernet LAN ports that are the true bottleneck, but that does not substantially change the issue, it is still the upstream shaper/policer that needs to be worked around.
> 
>> As far as I know, in the operator world shapers here by and large removed policers.
> 
> 	Good to know, shapers are somewhat nicer to user traffic than hard policers, at least that is my interpretation.
> 
>> 
>> A consecutive chain of narrower links results, if the Home Gateway runs with an additional ingress or egress shaper operating below the access bandwidth, if I get you right. 
> 
> 	Yes, as you state below, this only is true for the ingress direction, egress shaping works reliably and is typically not suffering from this. That said, if the egress link bandwidth is larger than a servers connection this issue can appear also for the egress direction. For example overly hot peering/transit links can cause downstream bottlenecks considerably narrower than the internet access link's upload direction, but that, while unfortunate, is not at the core of my issue.
> 
>> 
>> I understand that you aren't interested in having 300ms buffer delay and may be some jitter for a phone conversation using best effort transport.
> 
> 	+1
> 
>> A main driver for changes in consumer IP access features in Germany are publications of journals and regulators comparing IP access performance of different providers.
> 
> 	Good to know,
> 
>> Should one provider have an advantage over the others by deploying a solution as you (and Bob's team) work on, it likely will be generally deployed.
> 
> 	I do not believe that these mechanisms are actually in play in the German market, as an example for roughly a decade the DOCSIS-ISPs offer higher bandwidth for same or less money than the incumbent telco and yet only managed to get 30% of the customers of their ~75% of possible customers, so only 75*0.3 = 22.5 % market share, with the incumbent only reaching 250/40 for the masses while the DOCSIS ISPs offer 1000/50. And unlike latency, bandwidth (or rather rate) is the number that consumers understand intuitively.
> 	If anything will expedite the roll-out of L4S style AQMs it is the capability to use those to implement the "special services" that the EU net neutrality regulation explicitly allows, as that is a product that can be actually sold to customers, but I might be too pessimistic here.
> 
>> 
>> As far as I can see, latency aware consumers still are a minority and gamers seem to be a big group belonging here. Interest in well performing gaming seems to be growing, I guess (for me at least it's an impression rather than a clear trend).
> 
> 	Put that way, I see a way for ISPs to distinguish themselves from the rest by being gaming friendly, but unless this results in gamers paying more I fail to see the business case that management probably needs before green-lighting the funds required to implement this. This is where cablelabs approach to mandate this in the specs is brilliant. 
> 
>> 
>> I'd personally prefer an easy to deploy and operate standard solution offering Best Effort based transport being TCP friendly and at the same time congestion free for other flows at a BNG for traffic in access direction (and for similar devices in other architectures of course). 
>> 
>> Fighting bufferbloat in the upstream direction the way you describe it doesn't construct a chain of links which are consecutively narrower than the bottleneck link, I think.
> 
> 	Yes, fully agreed, that said, and ISPs CPE should implement an AQM to really solve the latency issues for end-users. The initial L4S paper side-stepped that requirement by making sure the uplinks were not saturated during the test, and state that that needs a real solution for proper roll-out. In theory the ISP could do the uplink shaping on its end (and to constrain users to their contracted rates, ISPs do this already) but as in the downstream case, running an AQM in front of a bottleneck as opposed to behind it makes everything much easier. Also with uplinks typically << downlinks, the typically weak CPE CPUs will still be able to AQM the uplink, nicely distributing that computation load away from the BNG/BRAS big iron ....
> 
> 
> Best Regards
> 	Sebastian
> 
>> 
>> Regards,
>> 
>> Ruediger
>> 
>> 
>> 
>> 
>> 
>> -----Ursprüngliche Nachricht-----
>> Von: Sebastian Moeller <moeller0@gmx.de> 
>> Gesendet: Freitag, 2. August 2019 15:15
>> An: Geib, Rüdiger <Ruediger.Geib@telekom.de>
>> Cc: Jonathan Morton <chromatix99@gmail.com>; tcpm@ietf.org; ECN-Sane <ecn-sane@lists.bufferbloat.net>; tsvwg@ietf.org
>> Betreff: Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
>> 
>> Hi Ruediger,
>> 
>> 
>> 
>>> On Aug 2, 2019, at 10:29, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
>>> 
>>> Hi Jonathan,
>>> 
>>> could you provide a real world example of links which are consecutively narrower than sender access links?
>> 
>> 	Just an example from a network you might be comfortable with, in DTAGs internet access network there typically are traffic limiting elements at the BNGs (or at the BRAS for the legacy network), I am not 100% sure whether these are implemented as policers or shapers, but they tended to come with >= 300ms buffering. Since recently, the BNG/BRAS traffic shaper's use the message field of the PPPoE Auth ACK to transfer information about the TCP/IPv4 Goodput endusers can expect on their link as a consequence of the BNG/BRAS"s traffic limiter. In DOCSIS and GPON networks the traffic shaper seems mandated by the standards, in DSL networks it seems optional (but there even without a shaper the limited bandwidth of the access link would be a natural traffic choke point).
>> Fritzbox home router's now use this information to automatically set egress (and I believe also) ingress traffic shaping on the CPE to reduce the bufferbloat users experience. I have no insight in what Telekom's own Speedport routers do, but I would not be surprised if they would do the same (at least for egress). 
>> 	As Jonathan and Dave mentioned, quite a number of end-users, especially the latency sensitive ones, employ their own ingress and egress traffic shapers at their home routers as the 300ms buffers of the BNG's are just not acceptable for any real-timish uses (VoIP, on-line twitch gaming, even for interactive sessions like ssh 300ms delay are undesirable). E.g. personally, I use an OpenWrt router with an FQ AQM for both ingress and egress (based on Jonathan's excellent cake qdisc) that allows a family of 5 to happily share a 50/10 connection between video streaming and interactive use with very little interference between the users, the same link with out the FQ-AQM active makes interactive applications feel like submerged in molasses once the link gets saturated...
>> 	As far as I can tell there is a number of different solutions that offer home-router based bi-directional traffic shaping to solve bufferbloat" from home (well, not fully solve it, but remedy its consequences), including commercial options like evenroute's iqrouter, and open-source options like OpenWrt (with sqm-scripts as shaper packet). 
>> 	It is exactly this use case and the fact that latency-sensitive users often opt for this solution, that causes me to ask the L4S crowd to actually measure the effect of L4S on RFC3168-FQ-AQMs in the exact configuration it is actually used today, to remedy the same issue L4S wants to tackle.
>> 
>> Best Regards
>> 	Sebastian
>> 
>> 
>>> 
>>> I could figure out a small campus network which has a bottleneck at the Internet access and a second one connecting the terminal equipment. But in a small campus network, the individual terminal could very well have a higher LAN access bandwidth, than the campus - Internet connection (and then there's only one bottleneck again).
>>> 
>>> There may be a tradeoff between simplicity and general applicability. Awareness of that tradeoff is important. To me, simplicity is the design aim. 
>>> 
>>> Regards,
>>> 
>>> Ruediger 
>>> 
>>> -----Ursprüngliche Nachricht-----
>>> Von: tsvwg <tsvwg-bounces@ietf.org> Im Auftrag von Jonathan Morton
>>> Gesendet: Dienstag, 9. Juli 2019 17:41
>>> An: Bob Briscoe <ietf@bobbriscoe.net>
>>> Cc: tcpm IETF list <tcpm@ietf.org>; ecn-sane@lists.bufferbloat.net; tsvwg IETF list <tsvwg@ietf.org>
>>> Betreff: Re: [tsvwg] [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S
>>> 
>>>> On 13 Jun, 2019, at 7:48 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>>> 
>>>>    1.  It is quite unusual to experience queuing at more than one
>>>>        bottleneck on the same path (the available capacities have to
>>>>        be identical).
>>> 
>>> Following up on David Black's comments, I'd just like to note that the above is not the true criterion for multiple sequential queuing.
>>> 
>>> Many existing TCP senders are unpaced (aside from ack-clocking), including FreeBSD, resulting in potentially large line-rate bursts at the origin - especially during slow-start.  Even in congestion avoidance, each ack will trigger a closely spaced packet pair (or sometimes a triplet).  It is then easy to imagine, or to build a testbed containing, an arbitrarily long sequence of consecutively narrower links; upon entering each, the burst of packets will briefly collect in a queue and then be paced out at the new rate.
>>> 
>>> TCP pacing does largely eliminate these bursts when implemented correctly.  However, Linux' pacing and IW is specifically (and apparently deliberately) set up to issue a 10-packet line-rate burst on startup.  This effect has shown up in SCE tests to the point where we had to patch this behaviour out of the sending kernel to prevent an instant exit from slow-start.
>>> 
>>> - Jonathan Morton
>>> 
>>> _______________________________________________
>>> Ecn-sane mailing list
>>> Ecn-sane@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/ecn-sane
>> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-05 12:16           ` Ruediger.Geib
@ 2019-08-05 15:55             ` Jonathan Morton
  0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Morton @ 2019-08-05 15:55 UTC (permalink / raw)
  To: Ruediger.Geib; +Cc: tcpm, ecn-sane, tsvwg

> On 5 Aug, 2019, at 3:16 pm, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> 
> As I said, no corner-case engineering.

> In the market that I am aware of, there's a single regular bottleneck, which is the consumer access or terminal.

Let me explain one more time: this is *not* a corner case.  This is normal operation on today's Internet; faster core networks feed into larger numbers of slower edge networks in several stages.  If you haven't noticed it yourself, that's fortunate for you - but that "single regular bottleneck" is an illusory simplification, which only applies at relatively long timescales.

However, high-fidelity ECN markers *will* notice the difference, and the resulting congestion signals will appear at the receiver and be fed back to the sender.  That represents an engineering problem for which either a solution must be devised, or the fact that it is not a problem must be established.  At present, we're still working through the maths to determine the boundaries of acceptable operation.

It's even possible that conventional RFC-3168 AQMs notice it as well, depending on their design.  Codel, for example, is specifically designed to ignore transient queuing (if it persists for less than one 'interval', which is taken as an estimate of the RTT) and only act when a persistent standing queue exists.

And it's actually a very important problem when a dumb FIFO bottleneck is immediately followed by a slightly narrower AQM bottleneck.  In this case the dumb FIFO dilutes the benefit of the AQM significantly, because the AQM can't see that most of the queue exists in the upstream FIFO, and even if it could, it cannot apply any intelligence to it.  This is the scenario Sebastian is most concerned about, and for which I have tried to compensate in Cake with "ingress mode" shaping - because Cake is specifically designed to be deployed into that topology.

Most of the problem would go away if that dumb FIFO were merely replaced by a simple AQM.  This is a straightforward & deployable engineering solution that has been known for many years, and yet almost nobody has actually deployed it.  I would urge you to do what you can on that front; judging by your e-mail address, you probably have more influence where it counts than I do.

I repeat: this is normal operation on today's Internet.

> If your technical standardization activities help to make using the Internet more convenient in African countries without raising extra cost or requiring extra skills, I'm sure that's a fair market. I know that these countries are struggling to improve the services operated in their networks.

May I refer you to some useful work already done and widely deployed?  This is now the default in most Linux and Apple devices, and is also available in FreeBSD.  It is used by some of the better CPE devices as part of "Advanced QoS" and "Airtime Fairness", eg. in the Netgear R7000.

	https://tools.ietf.org/html/rfc8289
	https://tools.ietf.org/html/rfc8290

The problem is that it's not deployed at most of the actual bottlenecks, which mainly exist in ISPs' equipment if the core networks are indeed overprovisioned.  If it was deployed, congestion on the Internet would be a much easier problem than it is today.  And this should be especially applicable to less-developed areas of the Internet.

All it takes is enough ISPs manning up, talking to their hardware vendors, and saying: "We expect to have congestion occasionally/frequently (delete as applicable).  What AQM does your hardware support, and how do we turn it on?"  And if the answer is negative, being prepared to take business to a competitor who does.  The technology exists; money talks.

In the less developed parts of the world, one could easily jury rig an AQM router together using a Raspberry Pi and a couple of USB Ethernet dongles.  With the latest Raspberry Pi 4, that would work for up to a gigabit link; with older models, you can still handle 100Mbps.  These things are cheap enough to practically be disposable, and consume very little power.  If there's demand for that sort of thing, I and my colleagues could probably arrange for a ready-made SD card image to be built and published.

Or you could standardise on a consumer CPE router and build a no-knobs mesh-networking image for that.  There's a bunch of groups who regularly attend Battlemesh, who could help with that.  Be prepared to change models every few years as the old ones go out of production.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-05 13:47             ` Sebastian Moeller
@ 2019-08-06  9:49               ` Mikael Abrahamsson
  2019-08-06 14:34                 ` Ruediger.Geib
  0 siblings, 1 reply; 22+ messages in thread
From: Mikael Abrahamsson @ 2019-08-06  9:49 UTC (permalink / raw)
  To: Sebastian Moeller; +Cc: Ruediger.Geib, tcpm, ECN-Sane, tsvwg IETF list

On Mon, 5 Aug 2019, Sebastian Moeller wrote:

> 	That is a good question, as far as I see it, ISPs in Germany still 
> seem to leverage access-rates as their main attraction (giga this and 
> giga that), even though as you note higher rates have diminishing 
> returns for most use-cases.

It's great marketing. Same way as car manufacturers make prototypes and 
high end cars ("halo cars") to sell their high-volume mainstream cars. 
It's brand building. So don't judge the customer interest for actual 
product sales, by the marketing you see. They might not correlate 
directly.

There is also one more thing that people nowadays do that wasn't directly 
on your list. Software downloads. Either on game console or on a PC, these 
downloads can easily be in tens of gigabytes. I personally have a 250/100 
connection at home, and download of a large modern game can take 30-60 
minutes, because it's 50 GB. I do not know what congestion avoidance 
algorithms are used, but it seems to me that at least some of these 
software download services do not actually create congestion. They do very 
slow ramp-ups and from what I can see they typically keep the utilisation 
of my Internet connection below congestion (as in perhaps averaging at 80% 
of capacity).

Personally I frequently see several potential congestion points between 
user device and what it's communicating with.

There is the ISP-CDN or ISP-ISP interconnect point.
There might be congestion on the core-core links.
There is the uplink to the BNG or whatever.
There is the user-unique shaper
There is the L2 aggregation network (DOCSIS/*PON/ETTH)
There is the in-house wifi network.

So even if the ISP does a great job, we might have the user-unique shaper 
and the wifi both congesting and the wifi might be slower than the 
user-access link. This is the case in my home sometimes. Even with a great 
wifi setup (multiple 5GHz APs) I frequently get congestion there resulting 
in lower speeds than my 250 megabit/s Internet access speed. So this means 
traffic might encounter half of the time my 250 megabit/s ISP shaper as 
the bottleneck, then sporadically it encounters my wifi lowering the speed 
even more, and then returning to my ISP shaper being the slowest point.

So I think your suggestion of what we should test is useful.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-06  9:49               ` Mikael Abrahamsson
@ 2019-08-06 14:34                 ` Ruediger.Geib
  2019-08-06 15:27                   ` Jonathan Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Ruediger.Geib @ 2019-08-06 14:34 UTC (permalink / raw)
  To: swmike; +Cc: tcpm, ecn-sane, tsvwg

I'd like to sort Mikael's list a little....

-----Ursprüngliche Nachricht-----
Von: Mikael Abrahamsson <swmike@swm.pp.se> 

Of course congestion occurs. But the probability of it matters (the less likely it is, the less likely are efforts to work around it) 

Congestion is not probable in a well dimensioned backbone and at paid peerings. I think the effort put into congestion avoidance by engineering is high at these locations and whether protocol design to deal with congestion there (and make a gain as compared to todays transport performance under congestion there) is worth a larger effort. Bulk transport optimisation makes sense there, that includes suitable AQM.

Public peerings and not well dimensioned networks may suffer from regular congestion. I'm not sure to which extent technical standards can significantly improve service quality in that situation. IP transport must work as good as possible also in such a situation, of course. 

The following are access issues:

   There is the uplink to the BNG or whatever.
   There is the user-unique shaper
   There is the L2 aggregation network (DOCSIS/*PON/ETTH) There is the in-house wifi network.

Maybe one can add LTE and 5G, these are layer 2 standards missing above. Shared L2 may result in annoying performance. To me, L3 protocol design improving IP performance over one or more L2 protocols standardised by other SDOs sounds good. I wonder to which extent that's feasible.   

As you know, the user-unique (BNG or the like) shaper is my favourite site where I'd appreciate improvements. 

Home Gateways are a mass market product. I'm not familiar with ways to impact the vendors of that segment (but agree that it's worth trying). Also wireless scheduling of IP traffic certainly is an interesting topic. I'm not sure whether IETF has sufficient impact to push for improved packet transport performance there (again, it's worth trying).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-06 14:34                 ` Ruediger.Geib
@ 2019-08-06 15:27                   ` Jonathan Morton
  2019-08-06 15:35                     ` Dave Taht
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Morton @ 2019-08-06 15:27 UTC (permalink / raw)
  To: Ruediger.Geib; +Cc: swmike, tcpm, ecn-sane, tsvwg

> On 6 Aug, 2019, at 5:34 pm, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> 
> Public peerings and not well dimensioned networks may suffer from regular congestion. I'm not sure to which extent technical standards can significantly improve service quality in that situation. IP transport must work as good as possible also in such a situation, of course. 

Obviously the application of AQM will not magically improve total throughput.  What it can do, however, is reduce latency and packet loss, and thereby improve perceived reliability of the service.  It may even improve goodput for the same throughput - an increase in efficiency.

This is especially true under emergency overload conditions, which is when people most desperately want a functioning network, but it is most likely to collapse under the strain.  Often a disaster will incidentally knock out some proportion of network infrastructure in the area, turning a previously well-proportioned network into an under-proportioned one, simultaneously with a sharp increase in demand as people try to find out what is going on or communicate with friends and relatives, and emergency response teams also try to coordinate their essential work.

So IMO networks should be designed to work well when congested, even when they are also designed to never *be* congested.  Technical specifications already exist and are well tested for this purpose.  Having them built in and turned on from the factory would be a great step forward.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Ecn-sane] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
  2019-08-06 15:27                   ` Jonathan Morton
@ 2019-08-06 15:35                     ` Dave Taht
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Taht @ 2019-08-06 15:35 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Ruediger.Geib, tcpm IETF list, ECN-Sane, tsvwg IETF list

On Tue, Aug 6, 2019 at 8:28 AM Jonathan Morton <chromatix99@gmail.com> wrote:
>
> > On 6 Aug, 2019, at 5:34 pm, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> >
> > Public peerings and not well dimensioned networks may suffer from regular congestion. I'm not sure to which extent technical standards can significantly improve service quality in that situation. IP transport must work as good as possible also in such a situation, of course.
>
> Obviously the application of AQM will not magically improve total throughput.  What it can do, however, is reduce latency and packet loss, and thereby improve perceived reliability of the service.  It may even improve goodput for the same throughput - an increase in efficiency.
>
> This is especially true under emergency overload conditions, which is when people most desperately want a functioning network, but it is most likely to collapse under the strain.  Often a disaster will incidentally knock out some proportion of network infrastructure in the area, turning a previously well-proportioned network into an under-proportioned one, simultaneously with a sharp increase in demand as people try to find out what is going on or communicate with friends and relatives, and emergency response teams also try to coordinate their essential work.

Overbuffering, the silent killer:
http://blog.cerowrt.org/post/bufferbloat_on_the_backbone/

> So IMO networks should be designed to work well when congested, even when they are also designed to never *be* congested.  Technical specifications already exist and are well tested for this purpose.  Having them built in and turned on from the factory would be a great step forward.

I live in california, where I kind of expect the network to collapse
in the next Big One.

>
>  - Jonathan Morton
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane



-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-08-06 15:36 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-13 16:48 [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S Bob Briscoe
2019-07-09 14:41 ` [Ecn-sane] [tsvwg] " Black, David
2019-07-09 15:32   ` [Ecn-sane] [tcpm] " Neal Cardwell
2019-07-09 15:41 ` [Ecn-sane] " Jonathan Morton
2019-07-09 23:08   ` [Ecn-sane] [tsvwg] " Yuchung Cheng
2019-08-02  8:29   ` Ruediger.Geib
2019-08-02  9:47     ` Jonathan Morton
2019-08-02 11:10       ` Dave Taht
2019-08-02 12:05         ` Dave Taht
2019-08-05  9:35       ` Ruediger.Geib
2019-08-05 10:59         ` Jonathan Morton
2019-08-05 12:16           ` Ruediger.Geib
2019-08-05 15:55             ` Jonathan Morton
2019-08-02 13:15     ` Sebastian Moeller
2019-08-05  7:26       ` Ruediger.Geib
2019-08-05 11:00         ` Sebastian Moeller
2019-08-05 11:47           ` Ruediger.Geib
2019-08-05 13:47             ` Sebastian Moeller
2019-08-06  9:49               ` Mikael Abrahamsson
2019-08-06 14:34                 ` Ruediger.Geib
2019-08-06 15:27                   ` Jonathan Morton
2019-08-06 15:35                     ` Dave Taht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox