[Ecn-sane] Comments on L4S drafts

Discussion of explicit congestion notification's impact on the Internet
 help / color / mirror / Atom feed

* [Ecn-sane] Comments on L4S drafts
@ 2019-06-05  0:01 Holland, Jake
  2019-06-07 18:07 ` Bob Briscoe
  0 siblings, 1 reply; 84+ messages in thread
From: Holland, Jake @ 2019-06-05  0:01 UTC (permalink / raw)
  To: tsvwg, Bob Briscoe; +Cc: ecn-sane

Hi Bob,

I have a few comments and questions on draft-ietf-tsvwg-ecn-l4s-id-06
and draft-ietf-tsvwg-l4s-arch-03.

I've been re-reading these with an eye toward whether it would be
feasible to make L4S compatible with SCE[1] by using ECN capability alone
as the dualq classifier (roughly as described in Appendix B.3 of l4s-id),
and using ECT(1) to indicate a sub-loss congestion signal, assuming
some reasonable mechanism for reflecting the ECT(1) signals to sender
(such as AccECN in TCP, or even just reflecting each SCE signal in the
NS bit from receiver, if AccECN is un-negotiated).

I'm trying to understand the impact this approach would have on the
overall L4S architecture, and I thought I'd write out some of the
comments and questions that taking this angle on a review has left me
with.

This approach of course would require some minor updates to DCTCP or other
CCs that hope to make use of the sub-loss signal, but the changes seem
relatively straightforward (I believe there's a preliminary
implementation that was able to achieve similarly reduced RTT in lab) and
the idea of course comes with some tradeoffs--I've tried to articulate the
key ones I noticed below, which I think are mostly already stated in the
l4s drafts, but I thought I'd ask your opinion of whether you agree with
this interpretation of what these tradeoffs would look like, or there
are other important points you'd like to mention for consideration.

1.
Of course, I understand using SCE-style signaling with ECT capability as
the dualq classifier would come with a cost that where there's classic ECT
behavior at endpoints, the low latency queue would routinely get some
queue-building, until there's pretty wide deployment of scalable controllers
and feedback for the congestion signals at the endpoints.

This is a downside for the proposal, but of course even under this downside,
there's the gains described in Section 5.2 of l4s-arch:
   "State-of-the-art AQMs:  AQMs such as PIE and fq_CoDel give a
      significant reduction in queuing delay relative to no AQM at all."

On top of that, the same pressures that l4s-arch describes that should
cause rapid rollout of L4S should for the same reasons cause rapid rollout
of the endpoint capabilities, especially if the network capability is
there.

But regardless, the queue-building from classic ECN-capable endpoints that
only get 1 congestion signal per RTT is what I understand as the main
downside of the tradeoff if we try to use ECN-capability as the dualq
classifier.  Does that match your understanding?

2.
I ended up confused about how falling back works, and I didn't see it
spelled out anywhere.  I had assumed it was a persistent state-change
for the sender for the rest of the flow lifetime after detecting a
condition that required it, but then I saw some text that seemed to
indicate it might be temporary? From section 4.3 in l4s-id:
   "Note that a scalable congestion control is not expected to change
      to setting ECT(0) while it temporarily falls back to coexist with
      Reno ."

Can you clarify whether the fall-back is meant to be temporary or not,
and whether I missed a more complete explanation of how it's supposed to
work?

3.
I also was a little confused on the implementation status of the fallback
logic.  I was looking through some of the various links I could find, and
I think these 2 are the main ones to consider? (from
https://riteproject.eu/dctth/#code ):
- https://github.com/L4STeam/sch_dualpi2_upstream
- https://github.com/L4STeam/tcp-prague

It looks like the prague_fallback_to_ca case so far only happens when
AccECN is not negotiated, right?

To me, the logic for when to do this (especially for rtt changes) seems
fairly complicated and easy to get wrong, especially if it's meant to be
temporary for the flow, or if needs to deal with things like network path
changes unrelated to the bottleneck, or variations in rtt due an endpoint
being a mobile device, or on wi-fi.

Which brings me to:

*4.
(* I think this is the biggest point driving me to ask about this.)

I'm pretty worried about mis-categorizing CE marking from classic AQM
algorithms as L4S-style markings, when using ECT(1) as the dualq
classifier.

I did see this issue addressed in the l4s drafts, but reviewing it
left me a little confused, so I thought I'd ask about a point I
noticed for clarification:

From section 6.3.3 of l4s-arch:
   "an L4S sender will have to
   fall back to a classic ('TCP-Friendly') behaviour if it detects that
   ECN marking is accompanied by greater queuing delay or greater delay
   variation than would be expected with L4S"

From the abstract in l4s-arch:
   "In
   extensive testing the new L4S service keeps average queuing delay
   under a millisecond for _all_ applications even under very heavy
   load"

My reading of these seems to suggest that if the sender can observe
a variance or increase of more than 1 millisecond of rtt, it should fall
back to classic ECN?

I'm not sure yet how to square that with Section A.1.4 of l4s-id:
   "An increase in queuing delay or in delay variation would be
   a tell-tale sign, but it is not yet clear where a line would be drawn
   between the two behaviours."

Is the discrepancy here because the extensive testing (also mentioned in
the abstract of l4s-arch) was mainly in controlled environments, but the
internet is expected to introduce extra non-bottleneck delays even where
a dualq is present at the bottleneck, such as those from wi-fi, mobile
networks, and path changes?

Regardless, this seems to me like a worrisome gap in the spec, because if
the claim that dualq will get deployed and enabled quickly and widely is
correct, it means this will be a common scenario in deployment--basically
wherever there's existing classic AQMs deployed, especially since in CPE
devices the existing AQMs are generally configured to have a lower
bandwidth limit than the subscriber limit, so they'll (deliberately) be
the bottleneck whenever the upstream access network isn't overly
congested.

I guess if it's really a 1-2 ms variance threshold to fall back, that
would probably address the safety concern, but it seems like it would
have a lot of false positives, and unnecessarily fall back on a lot of
flows.

But worse, if there's some (not yet specified?) logic that tries to reduce
those false positives by relaxing a simple very-few-ms threshold, it seems
like there's a high likelihood of logic that produces false negatives going
undetected.

If that's the case, to me it seems like it will remain a significant risk
even while TCP Prague has been deployed for quite a long time at a sender,
as long as different endpoint and AQM implementations roll out randomly
behind different network conditions, for the various endpoints that end
up connected with the sender.

It also seems to me there's a high likelihood of causing unsafe non-
responsive sender conditions in some of the cases where this kind of false
negative happens in any kind of systematic way.

By contrast, as I understand it an SCE-based approach wouldn't need the
same kind of fallback state-change logic for the flow, since any CE would
indicate a RFC 3168-style multiplicative decrease, and only ECT(1) would
indicate sub-loss congestion.

This is one of the big advantages of the SCE-based approach in my mind,
since there's no chance of mis-classifying the meaning of a CE mark and
no need for a state change for how the sender handles the ECT backoff logic
or sets the ECT markings.  (It just goes back to treating any CE as RFC3168-
style loss equivalent, and SCE as a sub-loss signal.)

Since an SCE-based approach would avoid this problem nicely, I consider
the reduced risk of false negatives (and unresponsive flows) here one of the
important gains, to be weighed against the key downside mentioned in comment
#1.

5.
Something similar comes up again in some other places, for instance:

from A.1.4 in l4s-id:
   "Description: A scalable congestion control needs to distinguish the
   packets it sends from those sent by classic congestion controls.

   Motivation: It needs to be possible for a network node to classify
   L4S packets without flow state into a queue that applies an L4S ECN
   marking behaviour and isolates L4S packets from the queuing delay of
   classic packets."

Listing this as a requirement seems to prioritize enabling the gains of
L4S ahead of avoiding the dangers of L4S flows failing to back off in the
presence of possibly-miscategorized CE markings, if I'm reading it right?

I guess Appendix A says these "requirements" are non-normative, but I'm a
little concerned that framing it as a requirement instead of a design
choice with a tradeoff in its consequences is misleading here, and
pushes toward a less safe choice.

6.
If queuing from classic ECN-capable flows is the main issue with using
ECT as the dualq classifier, do you think it would still be possible to
get the queuing delay down to a max of ~20-40ms right away for ECN-capable
endpoints in networks that deploy this kind of dualq, and then hopefully
see it drop further to ~1-5ms as more endpoints get updated with AccECN or
some kind of ECT(1) feedback and a scalable congestion controller that
can respond to SCE-style marking?

Or is it your position that the additional gains from the ~1ms queueing delay
that should be achievable from the beginning by using ECT(1) (in connections
where enough of the key entities upgrade) are worth the risks?

(And if so, do you happen to have a pointer to any presentations or papers
that made a quantitative comparison of the benefits from those 2 options?
I don't recall any offhand, but there's a lot of papers...)

Best regards,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] Comments on L4S drafts
  2019-06-05  0:01 [Ecn-sane] Comments on L4S drafts Holland, Jake
@ 2019-06-07 18:07 ` Bob Briscoe
  2019-06-14 17:39   ` Holland, Jake
  2019-06-14 20:10   ` [Ecn-sane] [tsvwg] " Luca Muscariello
  0 siblings, 2 replies; 84+ messages in thread
From: Bob Briscoe @ 2019-06-07 18:07 UTC (permalink / raw)
  To: Holland, Jake, tsvwg; +Cc: ecn-sane

[-- Attachment #1: Type: text/plain, Size: 23692 bytes --]

Thanks Jake,

I'll address each of your questions inline. But I notice that I need to 
lay down some context first.

The problem boils down to deployment incentives. The introduction of 
fine-grained congestion control requires changes to sender, receiver and 
at least the bottleneck link before it is effective. ECN deployment 
faced the same 3-part deployment problem. So we tried hard to learn from 
it.

Faced with a 3-part deployment, no single party makes a move unless they 
judge that the potential gain is worth the effort and that /all/ the 
other parts (server, client, network) are strongly likely to make the 
same judgement {Note 1}.

The effort isn't just the coding, it's all the hassle dealing with 
unexpected consequences of making the change, e.g. the risk of people's 
Internet service being taken out by a middlebox black-holing the new 
protocol. High risk of high cost/effort needs very high gain.

So the improvement has to be remarkable. Not just incremental, but 
stunning enough to enable applications that are not even possible 
otherwise.

The aim here is to use the last unicorn in the world (ECT(1)) to the 
full. If we don't make delay extremely low and extremely consistent 
we'll have wasted it. So we must focus on 99th percentile delay (and 
more 9s if you want to take longer to measure it). Now, inline...

On 05/06/2019 01:01, Holland, Jake wrote:
> Hi Bob,
>
> I have a few comments and questions on draft-ietf-tsvwg-ecn-l4s-id-06
> and draft-ietf-tsvwg-l4s-arch-03.
>
> I've been re-reading these with an eye toward whether it would be
> feasible to make L4S compatible with SCE[1] by using ECN capability alone
> as the dualq classifier (roughly as described in Appendix B.3 of l4s-id),
> and using ECT(1) to indicate a sub-loss congestion signal, assuming
> some reasonable mechanism for reflecting the ECT(1) signals to sender
> (such as AccECN in TCP, or even just reflecting each SCE signal in the
> NS bit from receiver, if AccECN is un-negotiated).
>
> I'm trying to understand the impact this approach would have on the
> overall L4S architecture, and I thought I'd write out some of the
> comments and questions that taking this angle on a review has left me
> with.
>
> This approach of course would require some minor updates to DCTCP or other
> CCs that hope to make use of the sub-loss signal, but the changes seem
> relatively straightforward (I believe there's a preliminary
> implementation that was able to achieve similarly reduced RTT in lab) and
> the idea of course comes with some tradeoffs--I've tried to articulate the
> key ones I noticed below, which I think are mostly already stated in the
> l4s drafts, but I thought I'd ask your opinion of whether you agree with
> this interpretation of what these tradeoffs would look like, or there
> are other important points you'd like to mention for consideration.
May I give this proposal a name for brevity: ECN-DualQ-SCE (which 
sort-of represents ECN as the input classifier into 1 of 2 queues and 
SCE as the output from that queue).

>
>
> 1.
> Of course, I understand using SCE-style signaling with ECT capability as
> the dualq classifier would come with a cost that where there's classic ECT
> behavior at endpoints, the low latency queue would routinely get some
> queue-building, until there's pretty wide deployment of scalable controllers
> and feedback for the congestion signals at the endpoints.
>
> This is a downside for the proposal, but of course even under this downside,
> there's the gains described in Section 5.2 of l4s-arch:
>     "State-of-the-art AQMs:  AQMs such as PIE and fq_CoDel give a
>        significant reduction in queuing delay relative to no AQM at all."
Indeed, herein lies the problem. Imagine you are trying to convince a 
network operator to start a major project to tender for a new low 
latency technology then deploy it across their access network. You tell 
them it will also depend on:
* servers/CDNs deploying new OS code.
* and clients deploying new OS code.
Then you tell them that, until /most/ servers deploy, and /most/ clients 
deploy (maybe a decade?), the low latency queue will routinely add as 
much queue delay as we can already get (without clients and servers 
changing)....

One day, you continue, if all the other servers and clients passing 
traffic through that box get upgraded, it will be cool. Until that day, 
a gamer in augmented reality gets stunningly low delay,... except every 
time her daughter in the bedroom looks at a mate's facebook page or 
watches a YouTube clip.

Is the network operator really going to take all those risks for jam 
tomorrow (=  maybe a decade)? I really don't think so.

Then we'll have burned the last unicorn to routinely get what we've 
already got.

  * Incremental deployment means, as you deploy the new capability, old
    traffic continues to work, while new traffic gets the new service.
  * As you say, with ECN-DualQ-SCE, new traffic only gets the new
    service if there's no old traffic there. That's not only incremental
    deployment; that's also ineffective deployment.

>
> On top of that, the same pressures that l4s-arch describes that should
> cause rapid rollout of L4S should for the same reasons cause rapid rollout
> of the endpoint capabilities, especially if the network capability is
> there.

I'm afraid there are not the same pressures to cause rapid roll-out at 
all, cos it's flakey now, jam tomorrow. (Actually ECN-DualQ-SCE has a 
much greater problem - complete starvation of SCE flows - but we'll come 
on to that in Q4.)

I want to say at this point, that I really appreciate all the effort 
you've been putting in, trying to find common ground.

In trying to find a compromise, you've taken the fire that is really 
aimed at the inadequacy of underlying SCE protocol - for anything other 
than FQ. If the primary SCE proponents had attempted to articulate a way 
to use SCE in a single queue or a dual queue, as you have, that would 
have taken my fire.

>
> But regardless, the queue-building from classic ECN-capable endpoints that
> only get 1 congestion signal per RTT is what I understand as the main
> downside of the tradeoff if we try to use ECN-capability as the dualq
> classifier.  Does that match your understanding?
This is indeed a major concern of mine (not as major as the starvation 
of SCE explained under Q4, but we'll come to that).

Fine-grained (DCTCP-like) and coarse-grained (Cubic-like) congestion 
controls need to be isolated, but I don't see how, unless their packets 
are tagged for separate queues. Without a specific fine/coarse 
identifier, we're left with having to re-use other identifiers:

  * You've tried to use ECN vs Not-ECN. But that still lumps two large
    incompatible groups (fine ECN and coarse ECN) together.
  * The only alternative that would serve this purpose is the flow
    identifier at layer-4, because it isolates everything from
    everything else. FQ is where SCE started, and that seems to be as
    far as it can go.

Should we burn the last unicorn for a capability needed on 
"carrier-scale" boxes, but which requires FQ to work? Perhaps yes if 
there was no alternative. But there is: L4S.

That brings us neatly to the outstanding issues with L4S...

>
>
> 2.
> I ended up confused about how falling back works, and I didn't see it
> spelled out anywhere.  I had assumed it was a persistent state-change
> for the sender for the rest of the flow lifetime after detecting a
> condition that required it, but then I saw some text that seemed to
> indicate it might be temporary? From section 4.3 in l4s-id:
>     "Note that a scalable congestion control is not expected to change
>        to setting ECT(0) while it temporarily falls back to coexist with
>        Reno ."
>
> Can you clarify whether the fall-back is meant to be temporary or not,
> and whether I missed a more complete explanation of how it's supposed to
> work?
Firstly, as has been made clear in our latest talk/paper at Linux netdev 
and in my latest iccrg talk, currently TCP Prague only includes 
fall-back to Reno on loss. It does not do fall-back on classic ECN 
marking (yet). We're still working on RTT-independence and scaling to 
very low RTT (sub-MSS window) first.

Fall-back on loss is definitely very temporary: it does one large 
Reno-style window halving on a loss (ignoring any other losses in that 
RTT as Reno does), then immediately continues with DCTCP-style 
congestion avoidance driven by all the ECN marks (not just one per-RTT).

For classic ECN AQM detection, we only have initial design ideas. 
Olivier posted his design ideas here:
     https://github.com/L4STeam/tcp-prague/issues/2

I want to keep it simple (see response to Q4 about false negatives). 
Fall-back would be temporary, but last longer than for loss - until the 
flow next goes idle. Here's the simplest that I think might work:
     Starting X RTTs after first CE mark;    // allows end of Slow Start 
to stabilize
     if (srtt > (min_rtt + Y) || rttvar > Z) {fallback()};
Where X,Y&Z are TBD, dependent on experiments, but say X=5-6 RTT, 
Y=4-5ms & Z=dunno_without_measuring. The min_rtt could be taken only 
since the previous start-up or idle period (or perhaps the previous 
two). An idle would have to be defined as >3-4 RTT, to allow any 
self-induced queue to drain.

The whole of L4S is experimental track. So others might take different 
approaches (e.g. BBRv2) and I'm sure our approach will evolve, which is 
why the requirement is worded liberally (it has to cover real-time, etc. 
not just TCP).

>
>
> 3.
> I also was a little confused on the implementation status of the fallback
> logic.  I was looking through some of the various links I could find, and
> I think these 2 are the main ones to consider? (from
> https://riteproject.eu/dctth/#code ):
> - https://github.com/L4STeam/sch_dualpi2_upstream
> - https://github.com/L4STeam/tcp-prague
>
> It looks like the prague_fallback_to_ca case so far only happens when
> AccECN is not negotiated, right?
That's not the same sort of fall-back. That's fall-back because without 
AccECN there's only one ECN feedback signal per RTT, so it falls back to 
the configured classic congestion controller for the whole connection. 
Which controller depends on the parameter prague_ca_fallback which 
defaults to cubic.

As said above, fall-back on classic ECN has not yet been implemented in 
TCP Prague. Of the 3 things left on our list, it's the last 'cos we're 
waiting to see the results of measurements from a CDN, to see if there 
are any single queue classic ECN AQMs out there. If there aren't we 
would not plan to implement this requirement until there were. Whether 
others do is up to them of course.

>
> To me, the logic for when to do this (especially for rtt changes) seems
> fairly complicated and easy to get wrong, especially if it's meant to be
> temporary for the flow, or if needs to deal with things like network path
> changes unrelated to the bottleneck, or variations in rtt due an endpoint
> being a mobile device, or on wi-fi.
>
> Which brings me to:
>
>
> *4.
> (* I think this is the biggest point driving me to ask about this.)
>
> I'm pretty worried about mis-categorizing CE marking from classic AQM
> algorithms as L4S-style markings, when using ECT(1) as the dualq
> classifier.
>
> I did see this issue addressed in the l4s drafts, but reviewing it
> left me a little confused, so I thought I'd ask about a point I
> noticed for clarification:
>
>  From section 6.3.3 of l4s-arch:
>     "an L4S sender will have to
>     fall back to a classic ('TCP-Friendly') behaviour if it detects that
>     ECN marking is accompanied by greater queuing delay or greater delay
>     variation than would be expected with L4S"
>
>  From the abstract in l4s-arch:
>     "In
>     extensive testing the new L4S service keeps average queuing delay
>     under a millisecond for _all_ applications even under very heavy
>     load"
>
> My reading of these seems to suggest that if the sender can observe
> a variance or increase of more than 1 millisecond of rtt, it should fall
> back to classic ECN?
>
> I'm not sure yet how to square that with Section A.1.4 of l4s-id:
>     "An increase in queuing delay or in delay variation would be
>     a tell-tale sign, but it is not yet clear where a line would be drawn
>     between the two behaviours."
>
> Is the discrepancy here because the extensive testing (also mentioned in
> the abstract of l4s-arch) was mainly in controlled environments, but the
> internet is expected to introduce extra non-bottleneck delays even where
> a dualq is present at the bottleneck, such as those from wi-fi, mobile
> networks, and path changes?
No, it's simply 'cos there is no implementation of this requirement yet.

>
> Regardless, this seems to me like a worrisome gap in the spec, because if
> the claim that dualq will get deployed and enabled quickly and widely is
> correct, it means this will be a common scenario in deployment--basically
> wherever there's existing classic AQMs deployed, especially since in CPE
> devices the existing AQMs are generally configured to have a lower
> bandwidth limit than the subscriber limit, so they'll (deliberately) be
> the bottleneck whenever the upstream access network isn't overly
> congested.
I believe FQ-CoDel is the only AQM in CPE that I know of that supports 
classic ECN. In this case, an L4S-ECN congestion controller cannot 
starve a Cubic-ECN or Reno-ECN flow, cos the FQ scheduler controls their 
capacity shares.

The only other CPE AQM I am aware of is DOCSIS-PIE, which doesn't 
support ECN.

If the IETF assigns the ECT(1) codepoint to L4S, then it would be 
extremely easy to modify FQ-Codel to set a very shallow ECN threshold in 
any queue where at least one ECT(1) codepoint had been detected. This 
would work fine with highly transient flow queues.

>
> I guess if it's really a 1-2 ms variance threshold to fall back, that
> would probably address the safety concern, but it seems like it would
> have a lot of false positives, and unnecessarily fall back on a lot of
> flows.
>
> But worse, if there's some (not yet specified?) logic that tries to reduce
> those false positives by relaxing a simple very-few-ms threshold, it seems
> like there's a high likelihood of logic that produces false negatives going
> undetected.
>
> If that's the case, to me it seems like it will remain a significant risk
> even while TCP Prague has been deployed for quite a long time at a sender,
> as long as different endpoint and AQM implementations roll out randomly
> behind different network conditions, for the various endpoints that end
> up connected with the sender.
I am less worried about this. I would be comfortable erring on the side 
of reducing false positives at the expense of false negatives.

Nonetheless, this position depends on what we find in measurement studies.
* If we find no single-queue AQMs that do ECN-marking, it's a 
non-problem {Note 2}.
* If such AQMs exist but are rare, they are likely to be in specific 
operator's networks, so there would be operator-specific ways to address 
such problems. E.g. if a CDN wanted to deploy the L4S experiment on its 
caches for that network, in collaboration with the network operator it 
could set a local-use DSCP instead of using ECT(1). That would still not 
deal with L4S traffic to/from the Internet, but the probability that 
different types of long-running flows coincide is low anyway, so the 
probability that different types of flows that are both long-running and 
non-CDN will coincide must surely be tiny.

>
> It also seems to me there's a high likelihood of causing unsafe non-
> responsive sender conditions in some of the cases where this kind of false
> negative happens in any kind of systematic way.
This overstates the problem. There is no unresponsiveness. Even when two 
long-running flows coincide, an L4S flow does not actually starve a 
classic (e.g. Reno-ECN) TCP flow. They come to a balance that can be 
highly unequal in high BDP links, but never starvation or 
unresponsiveness. Indeed, as the link's BDP gets smaller, or the more 
flows there are, the more DCTCP & Reno-ECN tend to equality.

>
> By contrast, as I understand it an SCE-based approach wouldn't need the
> same kind of fallback state-change logic for the flow, since any CE would
> indicate a RFC 3168-style multiplicative decrease, and only ECT(1) would
> indicate sub-loss congestion.
I'm afraid you understand it wrong.

With the ECN-DualQ-SCE approach, any flows where the receiver does not 
feed back SCE (ECT(1)) markings starve any SCE (DCTCP-like) flows in the 
same bottleneck.

Similarly, any Reno-ECN or Cubic-ECN senders (i.e. without the logic to 
understand SCE) starve the SCE (DCTCP-like) flows in the same 
ECN-DualQ-SCE bottleneck.

And here, starve actually means starve. Not just come to a highly 
unbalanced equilibrium, but completely starve.

This is because a Cubic-ECN flow will keep pushing the queue up to the 
point where it emits CE markings, because it doesn't understand and 
therefore ignores the SCE markings. One queue can only have one length. 
So, because the Cubic flow(s) have pushed the queue past the shallower 
point where it starts to emit SCE markings, all packets not marked CE 
will be marked SCE.

For example, say Cubic flow(s) induce a fairly normal 0.5% CE marking 
(or 0.5% drop for non-ECN flows). Then there will be 99.5% SCE marking.

Then, the DCTCP-like flows designed to understand SCE will keep reducing 
in response to this saturated SCE marking and the Cubic flows will fill 
the space they leave and starve them.

We did experiments to try to minimize this starvation, with two AQMs in 
one queue where one type of CC ignores the signals from the lower 
threshold back in 2012. See:
     http://bobbriscoe.net/pubs.html#DCTCP-Internet
This led us to realize we would have to use at least two queues.

>
> This is one of the big advantages of the SCE-based approach in my mind,
> since there's no chance of mis-classifying the meaning of a CE mark and
> no need for a state change for how the sender handles the ECT backoff logic
> or sets the ECT markings.  (It just goes back to treating any CE as RFC3168-
> style loss equivalent, and SCE as a sub-loss signal.)
>
> Since an SCE-based approach would avoid this problem nicely, I consider
> the reduced risk of false negatives (and unresponsive flows) here one of the
> important gains, to be weighed against the key downside mentioned in comment
> #1.
I hope you can see now that the ECN-DualQ-SCE approach suffers from the 
same problem as you are concerned about with L4S. Except the difference 
is it's not in 'legacy' non-SCE queues, but in the queue implementing 
SCE marking itself.

Unless one separates non-SCE traffic into a different queue, it starves 
SCE traffic.

>
>
> 5.
> Something similar comes up again in some other places, for instance:
>
> from A.1.4 in l4s-id:
(it's A.1.1.)
>     "Description: A scalable congestion control needs to distinguish the
>     packets it sends from those sent by classic congestion controls.
>
>     Motivation: It needs to be possible for a network node to classify
>     L4S packets without flow state into a queue that applies an L4S ECN
>     marking behaviour and isolates L4S packets from the queuing delay of
>     classic packets."
>
> Listing this as a requirement seems to prioritize enabling the gains of
> L4S ahead of avoiding the dangers of L4S flows failing to back off in the
> presence of possibly-miscategorized CE markings, if I'm reading it right?
> I guess Appendix A says these "requirements" are non-normative, but I'm a
> little concerned that framing it as a requirement instead of a design
> choice with a tradeoff in its consequences is misleading here, and
> pushes toward a less safe choice.
As I hope you can now see from the last part of answer #4 that, if you 
try to classify ECN flows with fine-grained (DCTCP-like) and coarse 
(Cubic-like) congestion controls into the same queue (whether L4S or SCE 
marking), the Cubic-like congestion controls ruin it.

So I think this requirement stands. I've made a note-to-self to add the 
text: "To avoid having to use per-flow classification..." though.

>
>
> 6.
> If queuing from classic ECN-capable flows is the main issue with using
> ECT as the dualq classifier, do you think it would still be possible to
> get the queuing delay down to a max of ~20-40ms right away for ECN-capable
> endpoints in networks that deploy this kind of dualq, and then hopefully
> see it drop further to ~1-5ms as more endpoints get updated with AccECN or
> some kind of ECT(1) feedback and a scalable congestion controller that
> can respond to SCE-style marking?
Technically yes, but realistically no.

What I mean is, as I said from the start, if you remove the feature that 
deploying the L4S DualQ Coupled AQM gives very low and consistently very 
low latency straight away, then operators will lose interest in 
deploying it.

> Or is it your position that the additional gains from the ~1ms queueing delay
> that should be achievable from the beginning by using ECT(1) (in connections
> where enough of the key entities upgrade) are worth the risks?
Well, I'd say "probably worth the risks", cos we're waiting for 
measurements to get a feel for whether any of the CE markings seen by 
the tests Apple reported in 2016-2017 are from single queue ECN AQMs.

See 
https://datatracker.ietf.org/meeting/104/materials/slides-104-iccrg-implementing-the-prague-requirements-in-tcp-for-l4s-01#page=11

>
> (And if so, do you happen to have a pointer to any presentations or papers
> that made a quantitative comparison of the benefits from those 2 options?
> I don't recall any offhand, but there's a lot of papers...)
Latest results here (actually no different from results we reported in 
2015 - all the changes to the code since have been non-performance related):
"DUALPI2 - Low Latency, Low Loss and Scalable (L4S) AQM" Olga Albisser 
(Simula), Koen De Schepper (Nokia Bell-Labs), Bob Briscoe (Independent), 
Olivier Tilmans (Nokia Bell-Labs) and Henrik Steen (Simula), in Proc. 
Netdev 0x13 
<https://www.netdevconf.org/0x13/session.html?talk-DUALPI2-AQM> (Mar 2019).

The paper via the netdev link shows qdelay, utilization, completion time 
efficiency, etc with the most extreme traffic load we use (2 
long-running flows plus 5X Web flows per sec, where X is each link rate 
in Mb/s, e.g. 600 flows/sec over the 120Mb/s link), for a full range of 
link rates, round trip times, etc.

The plots are pretty crammed, so if you'd prefer one example qdelay 
cumulative distribution function for the same extreme traffic load, see 
here:
https://datatracker.ietf.org/meeting/104/materials/slides-104-iccrg-implementing-the-prague-requirements-in-tcp-for-l4s-01#page=22

If you want results from a range of less-extreme traffic models, just ask.

HTH

Bob

>
>
> Best regards,
> Jake
>
>

{Note 1}: Or different server, client and network operators all agree to 
deploy, but let's assume that would be a bonus and not rely on it.

{Note 2}: Even where there are no single-queue AQMs now, there might be 
a concern that some could be enabled in future. Given study after study 
since ECN was first standardized (2001) have detected hardly any CE 
marks on the Internet until FQ-CoDel was deployed about 15 years later, 
the chance of those AQMs being turned on now is surely vanishing.

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/

[-- Attachment #2: Type: text/html, Size: 29215 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] Comments on L4S drafts
  2019-06-07 18:07 ` Bob Briscoe
@ 2019-06-14 17:39   ` Holland, Jake
  2019-06-19 14:11     ` Bob Briscoe
  2019-06-14 20:10   ` [Ecn-sane] [tsvwg] " Luca Muscariello
  1 sibling, 1 reply; 84+ messages in thread
From: Holland, Jake @ 2019-06-14 17:39 UTC (permalink / raw)
  To: Bob Briscoe, tsvwg; +Cc: ecn-sane

[-- Attachment #1: Type: text/plain, Size: 4097 bytes --]

Hi Bob,

Thanks for your response, I think it helped clarify some important things

for me.

The point about starvation especially was a good one I hadn't fully

considered, and I agree if SCE-based implementations can’t demonstrate a

solution, that would be a major problem with the SCE approach for signaling.

And sorry for my slow response, I ended up restarting a few times to try to

dodge ratholes.  (Plus some day-job duties, apologies...)

I found it a bit challenging to avoid the ratholes effectively, so I'm

thinking maybe the right move is to set up a testbed.  Maybe playing with

that (very cool-looking!) L4SDemo tool can either ease my concerns, or

provide some more specific and detailed scenarios to address.

I see that the source code is published now at

https://github.com/L4STeam/l4sdemo (thanks Olivier!).  So I’ll try to

bring that up at some point, time permitting, in hopes it makes the

comments and questions more productive.

One meta-point I wanted to make:

  "In trying to find a compromise, you've taken the fire that is really

  aimed at the inadequacy of underlying SCE protocol - for anything

  other than FQ.  If the primary SCE proponents had attempted to

  articulate a way to use SCE in a single queue or a dual queue, as you

  have, that would have taken my fire."

I think "fire" here is a potentially harmful metaphor--I don't take your

comments as an attack or this discussion as a battle, but rather a

collaborative attempt to reach a common goal of a better internet.

I hope my comments on this are received the same way, even where we don't

see eye to eye yet.  While both ideas can't be the best use of ECT(1) at

the same time, I take this discussion as an effort to reach a common and

complete understanding of the issues at hand, so that we can hopefully

agree on the best approach in the end (or if we can't get there, maybe we

can at least agree on the underlying reasons we don't agree).

With that said, a few brief points I think really should be raised:

1. "non-problem" is an unreasonably strong conclusion to reach from a

snapshot failure to detect any single-queue marking AQMs.

We know that tc-pie exists in widely deployed systems, supports ECN, and

could be turned on at any moment by anybody, and we also know there's an

increased interest in ECN since Apple and Linux got it turned on on

endpoints.  Even if we measure everything today, it’s hard to be sure this

wouldn’t impact an in-progress rollout that someone has been working toward

for their network with proper due diligence, and following IETF advice

faithfully.

I think if the intent is really to deploy this experiment under the claim

that's a non-problem, it should be called out in the docs as a risk factor,

and consensus should probably be explicitly checked on that point.  It also

probably would be polite to update RFC 7567's advice in section 4, since it

seems like this position would invalidate (or at least add nuance) to

several of the SHOULDs given there, recommending the use of ECN.

2. “does not starve a classic flow, but can be highly unequal” is also

perhaps too low a bar to consider a non-problem, and also seems like maybe

it deserves to be called out as a risk factor.

3. One more meta-point: the sales-y language makes the drafts hard to

read for me, so please forgive some of my confusion.  I'm having a hard

time distinguishing the claims that are well-supported by test results in a

realistic experimental design from some of the claims that are more forward-

looking or speculative.

(4. There’s one other point I’ll mention in response to Ingemar’s comment,

about performance being sufficient to drive adoption, and the difference

between what’s achievable with classic ECN and what’s achievable with L4S,

but that thread is perhaps a better venue for discussing it.)

Best regards,

Jake

[-- Attachment #2: Type: text/html, Size: 13872 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-06-07 18:07 ` Bob Briscoe
  2019-06-14 17:39   ` Holland, Jake
@ 2019-06-14 20:10   ` Luca Muscariello
  2019-06-14 21:44     ` Dave Taht
  2019-06-19  1:15     ` [Ecn-sane] [tsvwg] Comments " Bob Briscoe
  1 sibling, 2 replies; 84+ messages in thread
From: Luca Muscariello @ 2019-06-14 20:10 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: Holland, Jake, tsvwg, ecn-sane

[-- Attachment #1: Type: text/plain, Size: 3533 bytes --]

On Fri, Jun 7, 2019 at 8:10 PM Bob Briscoe <ietf@bobbriscoe.net> wrote:

>
>  I'm afraid there are not the same pressures to cause rapid roll-out at
> all, cos it's flakey now, jam tomorrow. (Actually ECN-DualQ-SCE has a much
> greater problem - complete starvation of SCE flows - but we'll come on to
> that in Q4.)
>
> I want to say at this point, that I really appreciate all the effort
> you've been putting in, trying to find common ground.
>
> In trying to find a compromise, you've taken the fire that is really aimed
> at the inadequacy of underlying SCE protocol - for anything other than FQ.
> If the primary SCE proponents had attempted to articulate a way to use SCE
> in a single queue or a dual queue, as you have, that would have taken my
> fire.
>
> But regardless, the queue-building from classic ECN-capable endpoints that
> only get 1 congestion signal per RTT is what I understand as the main
> downside of the tradeoff if we try to use ECN-capability as the dualq
> classifier.  Does that match your understanding?
>
> This is indeed a major concern of mine (not as major as the starvation of
> SCE explained under Q4, but we'll come to that).
>
> Fine-grained (DCTCP-like) and coarse-grained (Cubic-like) congestion
> controls need to be isolated, but I don't see how, unless their packets are
> tagged for separate queues. Without a specific fine/coarse identifier,
> we're left with having to re-use other identifiers:
>
>    - You've tried to use ECN vs Not-ECN. But that still lumps two large
>    incompatible groups (fine ECN and coarse ECN) together.
>    - The only alternative that would serve this purpose is the flow
>    identifier at layer-4, because it isolates everything from everything else.
>    FQ is where SCE started, and that seems to be as far as it can go.
>
> Should we burn the last unicorn for a capability needed on "carrier-scale"
> boxes, but which requires FQ to work? Perhaps yes if there was no
> alternative. But there is: L4S.
>
>
I have a problem to understand why all traffic ends up to be classified as
either Cubic-like or DCTCP-like.
If we know that this is not true today I fail to understand why this should
be the case in the future.
It is also difficult to predict now how applications will change in the
future in terms of the traffic mix they'll generate.
I feel like we'd be moving towards more customized transport services with
less predictable patterns.

I do not see for instance much discussion about the presence of RTC traffic
and how the dualQ system behaves when the
input traffic does not respond as expected by the 2-types of sources
assumed by dualQ.

If my application is using simulcast or multi-stream techniques I can have
several video streams in the same link,  that, as far as I understand,
will get significant latency in the classic queue. Unless my app starts
cheating by marking packets to get into the priority queue.

In both cases, i.e. my RTC app is cheating or not, I do not understand how
the parametrization of the dualQ scheduler
can cope with traffic that behaves in a different way to what is assumed
while tuning parameters.
For instance, in one instantiation of dualQ based on WRR the weights are
set to 1:16.  This has to necessarily
change when RTC traffic is present. How?

Is the assumption that a trusted marker is used as in typical diffserv
deployments
or that a policer identifies and punishes cheating applications?

BTW I'd love to understand how dualQ is supposed to work under more general
traffic assumptions.

Luca

[-- Attachment #2: Type: text/html, Size: 4610 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-06-14 20:10   ` [Ecn-sane] [tsvwg] " Luca Muscariello
@ 2019-06-14 21:44     ` Dave Taht
  2019-06-15 20:26       ` [Ecn-sane] [tsvwg] CoIt'smments " David P. Reed
  2019-06-19  1:15     ` [Ecn-sane] [tsvwg] Comments " Bob Briscoe
  1 sibling, 1 reply; 84+ messages in thread
From: Dave Taht @ 2019-06-14 21:44 UTC (permalink / raw)
  To: Luca Muscariello; +Cc: Bob Briscoe, ecn-sane, tsvwg

This thread using unconventional markers for it is hard to follow.

Luca Muscariello <muscariello@ieee.org> writes:

> On Fri, Jun 7, 2019 at 8:10 PM Bob Briscoe <ietf@bobbriscoe.net>
> wrote:
>
>     
>         
>
>     I'm afraid there are not the same pressures to cause rapid
>     roll-out at all, cos it's flakey now, jam tomorrow. (Actually
>     ECN-DualQ-SCE has a much greater problem - complete starvation of
>     SCE flows - but we'll come on to that in Q4.)

Answering that statement is the only reason why I popped up here.
more below.

>     I want to say at this point, that I really appreciate all the
>     effort you've been putting in, trying to find common ground. 

I am happy to see this thread happen also, and I do plan to
stay out of it.

>     
>     In trying to find a compromise, you've taken the fire that is
>     really aimed at the inadequacy of underlying SCE protocol - for
>     anything other than FQ.

The SCE idea does, indeed work best with FQ, in a world of widely
varying congestion control ideas as explored in the recent paper, 50
shades of congestion control:

https://arxiv.org/pdf/1903.03852.pdf

>     If the primary SCE proponents had
>     attempted to articulate a way to use SCE in a single queue or a
>     dual queue, as you have, that would have taken my fire. 

I have no faith in single or dual queues with ECN either, due to
how anyone can scribble on the relevant bits, however...

>     
>         
>         But regardless, the queue-building from classic ECN-capable endpoints that
> only get 1 congestion signal per RTT is what I understand as the main
> downside of the tradeoff if we try to use ECN-capability as the dualq
> classifier.  Does that match your understanding?
>
>     This is indeed a major concern of mine (not as major as the
>     starvation of SCE explained under Q4, but we'll come to that).

I think I missed a portion of this thread. Starvation is impossible,
you are reduced to no less than cwnd 2 (non-bbr), or cwnd 4 (bbr).

Your own work points out a general problem with needing sub-packet
windows with too many flows that cause excessive marking using CE, which
so far as I know remains an unsolved problem.

https://arxiv.org/pdf/1904.07598.pdf

This is easily demonstrated via experiment, also, and the primary reason
why, even with FQ_codel in the field, we generally have turned off ecn
support at low bitrates until the first major release of sch_cake.

I had an open question outstanding about the 10% figure for converting
to drop sch_pie uses that remains unresolved.

As for what level of compatability with classic transports in a single
queue that is possible with a SCE capable receiver and sender, that
remains to be seen. Only the bits have been defined as yet. Two
approaches are being tried in public, so far.

>     
>     Fine-grained (DCTCP-like) and coarse-grained (Cubic-like)
>     congestion controls need to be isolated, but I don't see how,
>     unless their packets are tagged for separate queues. Without a
>     specific fine/coarse identifier, we're left with having to re-use
>     other identifiers:
>     
>     * You've tried to use ECN vs Not-ECN. But that still lumps two
>       large incompatible groups (fine ECN and coarse ECN) together. 
>     * The only alternative that would serve this purpose is the flow
>       identifier at layer-4, because it isolates everything from
>       everything else. FQ is where SCE started, and that seems to be
>       as far as it can go.

Actually, I was seeking a solution (and had been, for going on 5 years)
to the "too many flows not getting out of slow start fast enough",
problem, which you can see from any congested airport, public space,
small office, or coffeeshop nowadays. The vast majority of traffic
there does not consist of long duration high rate flows.

Even if you eliminate the wireless retries and rate changes and put in a
good fq_codel aqm, the traffic in such a large shared environment is
mostly flows lacking a need for congestion control at all (dns, voip,
etc), or in slow start, hammering away at ever increasing delays in
those environments until the user stops hitting the reload button.

Others have different goals and outlooks in this project and I'm
not really part of that.

I would rather like to see both approaches tried in an environment
that had a normal mix of traffic in a shared environment like that.

Some good potential solutions include reducing the slower bits of the
internet back to IW4 and/or using things like initial spreading, both of
which are good ideas and interact well with SCE's more immediate
response curve, paced chirping also.

>
>     Should we burn the last unicorn for a capability needed on
>     "carrier-scale" boxes, but which requires FQ to work? Perhaps yes
>     if there was no alternative. But there is: L4S.

The core of the internet is simply overprovisioned, with fairly short
queues. DCTCP itself did not deploy in very many places that I know of.

could you define exactly what carrier scale means?

>     
>     
>
> I have a problem to understand why all traffic ends up to be
> classified as either Cubic-like or DCTCP-like. 
> If we know that this is not true today I fail to understand why this
> should be the case in the future. 
> It is also difficult to predict now how applications will change in
> the future in terms of the traffic mix they'll generate.
> I feel like we'd be moving towards more customized transport services
> with less predictable patterns.
>
> I do not see for instance much discussion about the presence of RTC
> traffic and how the dualQ system behaves when the 
> input traffic does not respond as expected by the 2-types of sources
> assumed by dualQ.
>
> If my application is using simulcast or multi-stream techniques I can
> have several video streams in the same link, that, as far as I
> understand,
> will get significant latency in the classic queue. Unless my app
> starts cheating by marking packets to get into the priority queue.
>
> In both cases, i.e. my RTC app is cheating or not, I do not understand
> how the parametrization of the dualQ scheduler 
> can cope with traffic that behaves in a different way to what is
> assumed while tuning parameters. 
> For instance, in one instantiation of dualQ based on WRR the weights
> are set to 1:16. This has to necessarily 
> change when RTC traffic is present. How?
>
> Is the assumption that a trusted marker is used as in typical diffserv
> deployments
> or that a policer identifies and punishes cheating applications?
>
> BTW I'd love to understand how dualQ is supposed to work under more
> general traffic assumptions.
>
> Luca

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] CoIt'smments on L4S drafts
  2019-06-14 21:44     ` Dave Taht
@ 2019-06-15 20:26       ` David P. Reed
  0 siblings, 0 replies; 84+ messages in thread
From: David P. Reed @ 2019-06-15 20:26 UTC (permalink / raw)
  To: Dave Taht; +Cc: Luca Muscariello, ecn-sane, Bob Briscoe, tsvwg

[-- Attachment #1: Type: text/plain, Size: 9047 bytes --]


It's essential that the normal state of the Internet, everywhere, is that all queues, except at the ultimate source and destination, average to < 1 packet queued on the outgoing link.
 
That's for many interlocking reasons.  Transient queue buildup MUST be drained back to less than 1 packet with alacrity. All queue buildups must sit either at the entry to the shared network or at the recipient node.
 
Now it is straightforward algorithmically to prioritize competing flows, basically by changing the packet admission rates *at the entry* (through windowing or rate control). To do so requires that there be enough information reflected into each flow (by drops or ECN or whatever) to cause the rates/windowing controls to selectively prioritize the scheduling of admission at the entry endpoint.
 
These are in the nature of desirable invariants achieved by the interaction of the distributed system of flows.
 
Thus, the task of scheduling packets at every congestion point (whether there are priorities or not) must keep the queues short.
 
IMO, far too much focus is currently maintained on "within router" algorithmics.  The problem of congestion is entirely a system-wide problem, not a router problem.
 
Use a little queueing theory and control theory to understand this. It's non-standard queueing and control theory, because the control is "decentralized" and "distributed".
 
The design of the Internet *requires* no central controller. That's its design point for very good reasons. That's why we (not you kids) built it.
 
One can imagine something called "5G" (not the Internet) that has a master world control center. It won't scale, but it is a fantasy of phone companies like BT and suppliers like ALU. Feel free to design that thing. Just don't think that would be an "Internet".
 
 
 
 
On Friday, June 14, 2019 5:44pm, "Dave Taht" <dave@taht.net> said:



> 
> This thread using unconventional markers for it is hard to follow.
> 
> Luca Muscariello <muscariello@ieee.org> writes:
> 
> > On Fri, Jun 7, 2019 at 8:10 PM Bob Briscoe <ietf@bobbriscoe.net>
> > wrote:
> >
> >
> >
> >
> > I'm afraid there are not the same pressures to cause rapid
> > roll-out at all, cos it's flakey now, jam tomorrow. (Actually
> > ECN-DualQ-SCE has a much greater problem - complete starvation of
> > SCE flows - but we'll come on to that in Q4.)
> 
> Answering that statement is the only reason why I popped up here.
> more below.
> 
> > I want to say at this point, that I really appreciate all the
> > effort you've been putting in, trying to find common ground.
> 
> I am happy to see this thread happen also, and I do plan to
> stay out of it.
> 
> >
> > In trying to find a compromise, you've taken the fire that is
> > really aimed at the inadequacy of underlying SCE protocol - for
> > anything other than FQ.
> 
> The SCE idea does, indeed work best with FQ, in a world of widely
> varying congestion control ideas as explored in the recent paper, 50
> shades of congestion control:
> 
> https://arxiv.org/pdf/1903.03852.pdf
> 
> > If the primary SCE proponents had
> > attempted to articulate a way to use SCE in a single queue or a
> > dual queue, as you have, that would have taken my fire.
> 
> I have no faith in single or dual queues with ECN either, due to
> how anyone can scribble on the relevant bits, however...
> 
> >
> >
> > But regardless, the queue-building from classic ECN-capable endpoints
> that
> > only get 1 congestion signal per RTT is what I understand as the main
> > downside of the tradeoff if we try to use ECN-capability as the dualq
> > classifier. Does that match your understanding?
> >
> > This is indeed a major concern of mine (not as major as the
> > starvation of SCE explained under Q4, but we'll come to that).
> 
> I think I missed a portion of this thread. Starvation is impossible,
> you are reduced to no less than cwnd 2 (non-bbr), or cwnd 4 (bbr).
> 
> Your own work points out a general problem with needing sub-packet
> windows with too many flows that cause excessive marking using CE, which
> so far as I know remains an unsolved problem.
> 
> https://arxiv.org/pdf/1904.07598.pdf
> 
> This is easily demonstrated via experiment, also, and the primary reason
> why, even with FQ_codel in the field, we generally have turned off ecn
> support at low bitrates until the first major release of sch_cake.
> 
> I had an open question outstanding about the 10% figure for converting
> to drop sch_pie uses that remains unresolved.
> 
> As for what level of compatability with classic transports in a single
> queue that is possible with a SCE capable receiver and sender, that
> remains to be seen. Only the bits have been defined as yet. Two
> approaches are being tried in public, so far.
> 
> >
> > Fine-grained (DCTCP-like) and coarse-grained (Cubic-like)
> > congestion controls need to be isolated, but I don't see how,
> > unless their packets are tagged for separate queues. Without a
> > specific fine/coarse identifier, we're left with having to re-use
> > other identifiers:
> >
> > * You've tried to use ECN vs Not-ECN. But that still lumps two
> > large incompatible groups (fine ECN and coarse ECN) together.
> > * The only alternative that would serve this purpose is the flow
> > identifier at layer-4, because it isolates everything from
> > everything else. FQ is where SCE started, and that seems to be
> > as far as it can go.
> 
> Actually, I was seeking a solution (and had been, for going on 5 years)
> to the "too many flows not getting out of slow start fast enough",
> problem, which you can see from any congested airport, public space,
> small office, or coffeeshop nowadays. The vast majority of traffic
> there does not consist of long duration high rate flows.
> 
> Even if you eliminate the wireless retries and rate changes and put in a
> good fq_codel aqm, the traffic in such a large shared environment is
> mostly flows lacking a need for congestion control at all (dns, voip,
> etc), or in slow start, hammering away at ever increasing delays in
> those environments until the user stops hitting the reload button.
> 
> Others have different goals and outlooks in this project and I'm
> not really part of that.
> 
> I would rather like to see both approaches tried in an environment
> that had a normal mix of traffic in a shared environment like that.
> 
> Some good potential solutions include reducing the slower bits of the
> internet back to IW4 and/or using things like initial spreading, both of
> which are good ideas and interact well with SCE's more immediate
> response curve, paced chirping also.
> 
> >
> > Should we burn the last unicorn for a capability needed on
> > "carrier-scale" boxes, but which requires FQ to work? Perhaps yes
> > if there was no alternative. But there is: L4S.
> 
> The core of the internet is simply overprovisioned, with fairly short
> queues. DCTCP itself did not deploy in very many places that I know of.
> 
> could you define exactly what carrier scale means?
> 
> >
> >
> >
> > I have a problem to understand why all traffic ends up to be
> > classified as either Cubic-like or DCTCP-like.
> > If we know that this is not true today I fail to understand why this
> > should be the case in the future.
> > It is also difficult to predict now how applications will change in
> > the future in terms of the traffic mix they'll generate.
> > I feel like we'd be moving towards more customized transport services
> > with less predictable patterns.
> >
> > I do not see for instance much discussion about the presence of RTC
> > traffic and how the dualQ system behaves when the
> > input traffic does not respond as expected by the 2-types of sources
> > assumed by dualQ.
> >
> > If my application is using simulcast or multi-stream techniques I can
> > have several video streams in the same link, that, as far as I
> > understand,
> > will get significant latency in the classic queue. Unless my app
> > starts cheating by marking packets to get into the priority queue.
> >
> > In both cases, i.e. my RTC app is cheating or not, I do not understand
> > how the parametrization of the dualQ scheduler
> > can cope with traffic that behaves in a different way to what is
> > assumed while tuning parameters.
> > For instance, in one instantiation of dualQ based on WRR the weights
> > are set to 1:16. This has to necessarily
> > change when RTC traffic is present. How?
> >
> > Is the assumption that a trusted marker is used as in typical diffserv
> > deployments
> > or that a policer identifies and punishes cheating applications?
> >
> > BTW I'd love to understand how dualQ is supposed to work under more
> > general traffic assumptions.
> >
> > Luca
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane
> 

[-- Attachment #2: Type: text/html, Size: 12882 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-06-14 20:10   ` [Ecn-sane] [tsvwg] " Luca Muscariello
  2019-06-14 21:44     ` Dave Taht
@ 2019-06-19  1:15     ` Bob Briscoe
  2019-06-19  1:33       ` Dave Taht
  2019-06-19  4:24       ` Holland, Jake
  1 sibling, 2 replies; 84+ messages in thread
From: Bob Briscoe @ 2019-06-19  1:15 UTC (permalink / raw)
  To: Luca Muscariello; +Cc: Holland, Jake, tsvwg, ecn-sane

[-- Attachment #1: Type: text/plain, Size: 9763 bytes --]

Luca,

I'm still preparing a (long) reply to Jake's earlier (long) response. 
But I'll take time out to quickly clear this point up inline...

On 14/06/2019 21:10, Luca Muscariello wrote:
>
> On Fri, Jun 7, 2019 at 8:10 PM Bob Briscoe <ietf@bobbriscoe.net 
> <mailto:ietf@bobbriscoe.net>> wrote:
>
>
>      I'm afraid there are not the same pressures to cause rapid
>     roll-out at all, cos it's flakey now, jam tomorrow. (Actually
>     ECN-DualQ-SCE has a much greater problem - complete starvation of
>     SCE flows - but we'll come on to that in Q4.)
>
>     I want to say at this point, that I really appreciate all the
>     effort you've been putting in, trying to find common ground.
>
>     In trying to find a compromise, you've taken the fire that is
>     really aimed at the inadequacy of underlying SCE protocol - for
>     anything other than FQ. If the primary SCE proponents had
>     attempted to articulate a way to use SCE in a single queue or a
>     dual queue, as you have, that would have taken my fire.
>
>>     But regardless, the queue-building from classic ECN-capable endpoints that
>>     only get 1 congestion signal per RTT is what I understand as the main
>>     downside of the tradeoff if we try to use ECN-capability as the dualq
>>     classifier.  Does that match your understanding?
>     This is indeed a major concern of mine (not as major as the
>     starvation of SCE explained under Q4, but we'll come to that).
>
>     Fine-grained (DCTCP-like) and coarse-grained (Cubic-like)
>     congestion controls need to be isolated, but I don't see how,
>     unless their packets are tagged for separate queues. Without a
>     specific fine/coarse identifier, we're left with having to re-use
>     other identifiers:
>
>       * You've tried to use ECN vs Not-ECN. But that still lumps two
>         large incompatible groups (fine ECN and coarse ECN) together.
>       * The only alternative that would serve this purpose is the flow
>         identifier at layer-4, because it isolates everything from
>         everything else. FQ is where SCE started, and that seems to be
>         as far as it can go.
>
>     Should we burn the last unicorn for a capability needed on
>     "carrier-scale" boxes, but which requires FQ to work? Perhaps yes
>     if there was no alternative. But there is: L4S.
>
>
> I have a problem to understand why all traffic ends up to be 
> classified as either Cubic-like or DCTCP-like.
> If we know that this is not true today I fail to understand why this 
> should be the case in the future.
> It is also difficult to predict now how applications will change in 
> the future in terms of the traffic mix they'll generate.
> I feel like we'd be moving towards more customized transport services 
> with less predictable patterns.
>
> I do not see for instance much discussion about the presence of RTC 
> traffic and how the dualQ system behaves when the
> input traffic does not respond as expected by the 2-types of sources 
> assumed by dualQ.
I'm sorry for using "Cubic-like" and "DCTCP-like", but I was trying 
(obviously unsuccessfully) to be clearer than using 'Classic' and 
'Scalable'.

"Classic" means traffic driven by congestion controls designed to 
coexist in the same queue with Reno (TCP-friendly), which necessarily 
makes it unscalable, as explained below.

The definition of a scalable congestion control concerns the power b in 
the relationship between window, W and the fraction of congestion 
signals, p (ECN or drop) under stable conditions:
     W = k / p^b
where k is a constant (or in some cases a function of other parameters 
such as RTT).
     If b >= 1 the CC is scalable.
     If b < 1 it is not (i.e. Classic).

"Scalable" does not exclude RTC traffic. For instance the L4S variant of 
SCReAM that Ingemar just talked about is scalable ("DCTCP-like"), 
because it has b = 1.

I used "Cubic-like" 'cos there's more Cubic than Reno on the current 
Internet. Over Internet paths with typical BDP, Cubic is always in its 
Reno-friendly mode, and therefore also just as unscalable as Reno, with 
b = 1/2 (inversely proportional to the square-root). Even in its proper 
Cubic mode on high BDP paths, Cubic is still unscalable with b = 0.75.

As flow rate scales up, the increase-decrease sawteeth of unscalable CCs 
get very large and very infrequent, so the control becomes extremely 
slack during dynamics. Whereas the sawteeth of scalable CCs stay 
invariant and tiny at any scale, keeping control tight, queuing low and 
utilization high. See the example of Cubic & DCTCP at Slide 5 here:
https://www.files.netdevconf.org/f/4ebdcdd6f94547ad8b77/?dl=1

Also, there's a useful plot of when Cubic switches to Reno mode on the 
last slide.

>
> If my application is using simulcast or multi-stream techniques I can 
> have several video streams in the same link,  that, as far as I 
> understand,
> will get significant latency in the classic queue.

You are talking as if you think that queuing delay is caused by the 
buffer. You haven't said what your RTC congestion control is (gcc 
perhaps?). Whatever, assuming it's TCP-friendly, even in a queue on its 
own, it will need to induce about 1 additional base RTT of queuing delay 
to maintain full utilization.

In the coupled dualQ AQM, the classic queue runs a state-of-the-art 
classic AQM (PI2 in our implementation) with a target delay of 15ms. 
With any less, your classic congestion controlled streams would 
under-utilize the link.

> Unless my app starts cheating by marking packets to get into the 
> priority queue.
There's two misconceptions here about the DualQ Coupled AQM that I need 
to correct.

1/ As above, if a classic CC can't build ~1 base RTT of queue in the 
classic buffer, it badly underutiizes. So if you 'cheat' by directing 
traffic from a queue-building CC into the low latency queue with a 
shallow ECN threshold, you'll just massively under-utilize the capacity.

2/ Even if it were a strict priority scheduler it wouldn't determine the 
scheduling under all normal traffic conditions. The coupling between the 
AQMs dominates the scheduler. I'll explain next...

>
> In both cases, i.e. my RTC app is cheating or not, I do not understand 
> how the parametrization of the dualQ scheduler
> can cope with traffic that behaves in a different way to what is 
> assumed while tuning parameters.
> For instance, in one instantiation of dualQ based on WRR the weights 
> are set to 1:16.  This has to necessarily
> change when RTC traffic is present. How?

The coupling simply applies congestion signals from the C queue across 
into the L queue, as if the C flows were L flows. So, the L flows leave 
sufficient space for however many C flows there are. Then, in all the 
gaps that the L traffic leaves, any work-conserving scheduler can be 
used to serve the C queue.

The WRR scheduler is only there in case of overload or unresponsive L 
traffic; to prevent the Classic queue starving.

>
> Is the assumption that a trusted marker is used as in typical diffserv 
> deployments
> or that a policer identifies and punishes cheating applications?
As explained, if a classic flow cheats, it will get v low throughput. So 
it has no incentive to cheat.

There's still the possibility of bugs/accidents/malice. The need for 
general Internet flows to be responsive to congestion is also vulnerable 
to bugs/accidents/malice, but it hasn't needed policing.

Nonetheless, in Low Latency DOCSIS, we have implemented a queue 
protection function that maintains a queuing score per flow. Then, any 
packets from high-scoring flows that would cause the queue to exceed a 
threshold delay, are redirected to the classic queue instead. For 
well-behaved flows the state that holds the score ages out between 
packets, so only ill-behaved flows hold flow-state long term.

Queue protection might not be needed, but it's as well to have it in 
case. It can be disabled.

>
> BTW I'd love to understand how dualQ is supposed to work under more 
> general traffic assumptions.
Coexistence with Reno is a general requirement for long-running Internet 
traffic. That's really all we depend on. That also covers RTC flows in 
the C queue that average to similar throughput as Reno but react more 
smoothly.

The L traffic can be similarly heterogeneous - part of the L4S 
experiment is to see how broad that will stretch to. It can certainly 
accommodate other lighter traffic like VoIP, DNS, flow startups, 
transactional, etc, etc.

BBR (v1) is a good example of something different that wasn't designed 
to coexist with Reno. It sort-of avoided too many problems by being 
primarily used for app-limited flows. It does its RTT probing on much 
longer timescales than typical sawtoothing congestion controls, running 
on a model of the link between times, so it doesn't fit the formulae above.

For BBRv2 we're promised that the non-ECN side of it will coexist with 
existing Internet traffic, at least above a certain loss level. Without 
having seen it I can't be sure, but I assume that implies it will fit 
the formulae above in some way.

PS. I believe all the above is explained in the three L4S Internet 
drafts, which we've taken a lot of trouble over. I don't really want to 
have to keep explaining it longhand in response to each email. So I'd 
prefer questions to be of the form "In section X of draft Y, I don't 
understand Z". Then I can devote my time to improving the drafts.

Alternatively, there's useful papers of various lengths on the L4S 
landing page at:
https://riteproject.eu/dctth/#papers

Cheers

Bob

>
> Luca
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/

[-- Attachment #2: Type: text/html, Size: 14535 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-06-19  1:15     ` [Ecn-sane] [tsvwg] Comments " Bob Briscoe
@ 2019-06-19  1:33       ` Dave Taht
  2019-06-19  4:24       ` Holland, Jake
  1 sibling, 0 replies; 84+ messages in thread
From: Dave Taht @ 2019-06-19  1:33 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: Luca Muscariello, ECN-Sane, tsvwg

[-- Attachment #1: Type: text/plain, Size: 10166 bytes --]

I simply have one question. Is the code for the modified dctcp and dualpi
in the l4steam repos on github ready for independent testing?

On Tue, Jun 18, 2019, 6:15 PM Bob Briscoe <ietf@bobbriscoe.net> wrote:

> Luca,
>
> I'm still preparing a (long) reply to Jake's earlier (long) response. But
> I'll take time out to quickly clear this point up inline...
>
> On 14/06/2019 21:10, Luca Muscariello wrote:
>
>
> On Fri, Jun 7, 2019 at 8:10 PM Bob Briscoe <ietf@bobbriscoe.net> wrote:
>
>>
>>  I'm afraid there are not the same pressures to cause rapid roll-out at
>> all, cos it's flakey now, jam tomorrow. (Actually ECN-DualQ-SCE has a much
>> greater problem - complete starvation of SCE flows - but we'll come on to
>> that in Q4.)
>>
>> I want to say at this point, that I really appreciate all the effort
>> you've been putting in, trying to find common ground.
>>
>> In trying to find a compromise, you've taken the fire that is really
>> aimed at the inadequacy of underlying SCE protocol - for anything other
>> than FQ. If the primary SCE proponents had attempted to articulate a way to
>> use SCE in a single queue or a dual queue, as you have, that would have
>> taken my fire.
>>
>> But regardless, the queue-building from classic ECN-capable endpoints that
>> only get 1 congestion signal per RTT is what I understand as the main
>> downside of the tradeoff if we try to use ECN-capability as the dualq
>> classifier.  Does that match your understanding?
>>
>> This is indeed a major concern of mine (not as major as the starvation of
>> SCE explained under Q4, but we'll come to that).
>>
>> Fine-grained (DCTCP-like) and coarse-grained (Cubic-like) congestion
>> controls need to be isolated, but I don't see how, unless their packets are
>> tagged for separate queues. Without a specific fine/coarse identifier,
>> we're left with having to re-use other identifiers:
>>
>>    - You've tried to use ECN vs Not-ECN. But that still lumps two large
>>    incompatible groups (fine ECN and coarse ECN) together.
>>    - The only alternative that would serve this purpose is the flow
>>    identifier at layer-4, because it isolates everything from everything else.
>>    FQ is where SCE started, and that seems to be as far as it can go.
>>
>> Should we burn the last unicorn for a capability needed on
>> "carrier-scale" boxes, but which requires FQ to work? Perhaps yes if there
>> was no alternative. But there is: L4S.
>>
>>
> I have a problem to understand why all traffic ends up to be classified as
> either Cubic-like or DCTCP-like.
> If we know that this is not true today I fail to understand why this
> should be the case in the future.
> It is also difficult to predict now how applications will change in the
> future in terms of the traffic mix they'll generate.
> I feel like we'd be moving towards more customized transport services with
> less predictable patterns.
>
> I do not see for instance much discussion about the presence of RTC
> traffic and how the dualQ system behaves when the
> input traffic does not respond as expected by the 2-types of sources
> assumed by dualQ.
>
> I'm sorry for using "Cubic-like" and "DCTCP-like", but I was trying
> (obviously unsuccessfully) to be clearer than using 'Classic' and
> 'Scalable'.
>
> "Classic" means traffic driven by congestion controls designed to coexist
> in the same queue with Reno (TCP-friendly), which necessarily makes it
> unscalable, as explained below.
>
> The definition of a scalable congestion control concerns the power b in
> the relationship between window, W and the fraction of congestion signals,
> p (ECN or drop) under stable conditions:
>     W = k / p^b
> where k is a constant (or in some cases a function of other parameters
> such as RTT).
>     If b >= 1 the CC is scalable.
>     If b < 1 it is not (i.e. Classic).
>
> "Scalable" does not exclude RTC traffic. For instance the L4S variant of
> SCReAM that Ingemar just talked about is scalable ("DCTCP-like"), because
> it has b = 1.
>
> I used "Cubic-like" 'cos there's more Cubic than Reno on the current
> Internet. Over Internet paths with typical BDP, Cubic is always in its
> Reno-friendly mode, and therefore also just as unscalable as Reno, with b =
> 1/2 (inversely proportional to the square-root). Even in its proper Cubic
> mode on high BDP paths, Cubic is still unscalable with b = 0.75.
>
> As flow rate scales up, the increase-decrease sawteeth of unscalable CCs
> get very large and very infrequent, so the control becomes extremely slack
> during dynamics. Whereas the sawteeth of scalable CCs stay invariant and
> tiny at any scale, keeping control tight, queuing low and utilization high.
> See the example of Cubic & DCTCP at Slide 5 here:
> https://www.files.netdevconf.org/f/4ebdcdd6f94547ad8b77/?dl=1
>
> Also, there's a useful plot of when Cubic switches to Reno mode on the
> last slide.
>
>
> If my application is using simulcast or multi-stream techniques I can have
> several video streams in the same link,  that, as far as I understand,
> will get significant latency in the classic queue.
>
>
> You are talking as if you think that queuing delay is caused by the
> buffer. You haven't said what your RTC congestion control is (gcc
> perhaps?). Whatever, assuming it's TCP-friendly, even in a queue on its
> own, it will need to induce about 1 additional base RTT of queuing delay to
> maintain full utilization.
>
> In the coupled dualQ AQM, the classic queue runs a state-of-the-art
> classic AQM (PI2 in our implementation) with a target delay of 15ms. With
> any less, your classic congestion controlled streams would under-utilize
> the link.
>
> Unless my app starts cheating by marking packets to get into the priority
> queue.
>
> There's two misconceptions here about the DualQ Coupled AQM that I need to
> correct.
>
> 1/ As above, if a classic CC can't build ~1 base RTT of queue in the
> classic buffer, it badly underutiizes. So if you 'cheat' by directing
> traffic from a queue-building CC into the low latency queue with a shallow
> ECN threshold, you'll just massively under-utilize the capacity.
>
> 2/ Even if it were a strict priority scheduler it wouldn't determine the
> scheduling under all normal traffic conditions. The coupling between the
> AQMs dominates the scheduler. I'll explain next...
>
>
> In both cases, i.e. my RTC app is cheating or not, I do not understand how
> the parametrization of the dualQ scheduler
> can cope with traffic that behaves in a different way to what is assumed
> while tuning parameters.
> For instance, in one instantiation of dualQ based on WRR the weights are
> set to 1:16.  This has to necessarily
> change when RTC traffic is present. How?
>
>
> The coupling simply applies congestion signals from the C queue across
> into the L queue, as if the C flows were L flows. So, the L flows leave
> sufficient space for however many C flows there are. Then, in all the gaps
> that the L traffic leaves, any work-conserving scheduler can be used to
> serve the C queue.
>
> The WRR scheduler is only there in case of overload or unresponsive L
> traffic; to prevent the Classic queue starving.
>
>
>
> Is the assumption that a trusted marker is used as in typical diffserv
> deployments
> or that a policer identifies and punishes cheating applications?
>
> As explained, if a classic flow cheats, it will get v low throughput. So
> it has no incentive to cheat.
>
> There's still the possibility of bugs/accidents/malice. The need for
> general Internet flows to be responsive to congestion is also vulnerable to
> bugs/accidents/malice, but it hasn't needed policing.
>
> Nonetheless, in Low Latency DOCSIS, we have implemented a queue protection
> function that maintains a queuing score per flow. Then, any packets from
> high-scoring flows that would cause the queue to exceed a threshold delay,
> are redirected to the classic queue instead. For well-behaved flows the
> state that holds the score ages out between packets, so only ill-behaved
> flows hold flow-state long term.
>
> Queue protection might not be needed, but it's as well to have it in case.
> It can be disabled.
>
>
> BTW I'd love to understand how dualQ is supposed to work under more
> general traffic assumptions.
>
> Coexistence with Reno is a general requirement for long-running Internet
> traffic. That's really all we depend on. That also covers RTC flows in the
> C queue that average to similar throughput as Reno but react more smoothly.
>
> The L traffic can be similarly heterogeneous - part of the L4S experiment
> is to see how broad that will stretch to. It can certainly accommodate
> other lighter traffic like VoIP, DNS, flow startups, transactional, etc,
> etc.
>
>
> BBR (v1) is a good example of something different that wasn't designed to
> coexist with Reno. It sort-of avoided too many problems by being primarily
> used for app-limited flows. It does its RTT probing on much longer
> timescales than typical sawtoothing congestion controls, running on a model
> of the link between times, so it doesn't fit the formulae above.
>
> For BBRv2 we're promised that the non-ECN side of it will coexist with
> existing Internet traffic, at least above a certain loss level. Without
> having seen it I can't be sure, but I assume that implies it will fit the
> formulae above in some way.
>
>
> PS. I believe all the above is explained in the three L4S Internet drafts,
> which we've taken a lot of trouble over. I don't really want to have to
> keep explaining it longhand in response to each email. So I'd prefer
> questions to be of the form "In section X of draft Y, I don't understand
> Z". Then I can devote my time to improving the drafts.
>
> Alternatively, there's useful papers of various lengths on the L4S landing
> page at:
> https://riteproject.eu/dctth/#papers
>
>
> Cheers
>
>
>
> Bob
>
>
>
> Luca
>
>
>
>
> --
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane
>

[-- Attachment #2: Type: text/html, Size: 15430 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-06-19  1:15     ` [Ecn-sane] [tsvwg] Comments " Bob Briscoe
  2019-06-19  1:33       ` Dave Taht
@ 2019-06-19  4:24       ` Holland, Jake
  2019-06-19 13:02         ` Luca Muscariello
  2019-07-04 13:45         ` Bob Briscoe
  1 sibling, 2 replies; 84+ messages in thread
From: Holland, Jake @ 2019-06-19  4:24 UTC (permalink / raw)
  To: Bob Briscoe, Luca Muscariello; +Cc: tsvwg, ecn-sane

Hi Bob and Luca,

Thank you both for this discussion, I think it helped crystallize a
comment I hadn't figured out how to make yet, but was bothering me.

I’m reading Luca’s question as asking about fixed-rate traffic that does
something like a cutoff or downshift if loss gets bad enough for long
enough, but is otherwise unresponsive.

The dualq draft does discuss unresponsive traffic in 3 of the sub-
sections in section 4, but there's a point that seems sort of swept
aside without comment in the analysis to me.

The referenced paper[1] from that section does examine the question
of sharing a link with unresponsive traffic in some detail, but the
analysis seems to bake in an assumption that there's a fixed amount
of unresponsive traffic, when in fact for a lot of the real-life
scenarios for unresponsive traffic (games, voice, and some of the
video conferencing) there's some app-level backpressure, in that
when the quality of experience goes low enough, the user (or a qoe
trigger in the app) will often change the traffic demand at a higher
layer than a congestion controller (by shutting off video, for
instance).

The reason I mention it is because it seems like unresponsive
traffic has an incentive to mark L4S and get low latency.  It doesn't
hurt, since it's a fixed rate and not bandwidth-seeking, so it's
perfectly happy to massively underutilize the link. And until the
link gets overloaded it will no longer suffer delay when using the
low latency queue, whereas in the classic queue queuing delay provides
a noticeable degradation in the presence of competing traffic.

I didn't see anywhere in the paper that tried to check the quality
of experience for the UDP traffic as non-responsive traffic approached
saturation, except by inference that loss in the classic queue will
cause loss in the LL queue as well.

But letting unresponsive flows get away with pushing out more classic
traffic and removing the penalty that classic flows would give it seems
like a risk that would result in more use of this kind of unresponsive
traffic marking itself for the LL queue, since it just would get lower
latency almost up until overload.

Many of the apps that send unresponsive traffic would benefit from low
latency and isolation from the classic traffic, so it seems a mistake
to claim there's no benefit, and it furthermore seems like there's
systematic pressures that would often push unresponsive apps into this
domain.

If that line of reasoning holds up, the "rather specific" phrase in
section 4.1.1 of the dualq draft might not turn out to be so specific
after all, and could be seen as downplaying the risks.

Best regards,
Jake

[1] https://riteproject.files.wordpress.com/2018/07/thesis-henrste.pdf

PS: This seems like a consequence of the lack of access control on
setting ECT(1), and maybe the queue protection function would address
it, so that's interesting to hear about.

But I thought the whole point of dualq over fq was that fq state couldn't
scale properly in aggregating devices with enough expected flows sharing
a queue?  If this protection feature turns out to be necessary, would that
advantage be gone?  (Also: why would one want to turn this protection off
if it's available?)

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-06-19  4:24       ` Holland, Jake
@ 2019-06-19 13:02         ` Luca Muscariello
  2019-07-04 11:54           ` Bob Briscoe
  2019-07-04 13:45         ` Bob Briscoe
  1 sibling, 1 reply; 84+ messages in thread
From: Luca Muscariello @ 2019-06-19 13:02 UTC (permalink / raw)
  To: Holland, Jake; +Cc: Bob Briscoe, tsvwg, ecn-sane

[-- Attachment #1: Type: text/plain, Size: 5469 bytes --]

Jake,

Yes, that is one scenario that I had in mind.
Your response comforts me that I my message was not totally unreadable.

My understanding was
- There are incentives to mark packets  if they get privileged treatment
because of that marking. This is similar to the diffserv model with all the
consequences in terms of trust.
- Unresponsive traffic in particular (gaming, voice, video etc.) has
incentives to mark. Assuming there is x% of unresponsive traffic in the
priority queue, it is non trivial to guess how the system works.
- in particular it is easy to see the extreme cases,
               (a) x is very small, assuming the system is stable, the
overall equilibrium will not change.
               (b) x is very large so the dctcp like sources fall back to
cubic like and the systems behave almost like a single FIFO.
               (c) in all other cases x varies according to the
unresponsive sources' rates.
                    Several different equilibria may exist, some of which
may include oscillations. Including oscillations of all fallback
mechanisms.
The reason I'm asking is that these cases are not discussed in the I-D
documents or in the references, despite these are very common use cases.

If we add the queue protection mechanism, all unresponsive  flows that are
caught cheating are registered in a blacklist and always scheduled in the
non-priority queue.
It that happens unresponsive flows will get a service quality that is worse
than if using a single FIFO for all flows.

Using a flow blacklist brings back the complexity that dualq is supposed to
remove compared to flow-isolation by flow-queueing.
It seems to me that the blacklist is actually necessary to make dualq work
under the assumption that x is small, because in the other cases the
behavior
of the dualq system is unspecified and likely subject to instabilities,
i.e. potentially different kind of oscillations.

Luca




On Tue, Jun 18, 2019 at 9:25 PM Holland, Jake <jholland@akamai.com> wrote:

> Hi Bob and Luca,
>
> Thank you both for this discussion, I think it helped crystallize a
> comment I hadn't figured out how to make yet, but was bothering me.
>
> I’m reading Luca’s question as asking about fixed-rate traffic that does
> something like a cutoff or downshift if loss gets bad enough for long
> enough, but is otherwise unresponsive.
>
> The dualq draft does discuss unresponsive traffic in 3 of the sub-
> sections in section 4, but there's a point that seems sort of swept
> aside without comment in the analysis to me.
>
> The referenced paper[1] from that section does examine the question
> of sharing a link with unresponsive traffic in some detail, but the
> analysis seems to bake in an assumption that there's a fixed amount
> of unresponsive traffic, when in fact for a lot of the real-life
> scenarios for unresponsive traffic (games, voice, and some of the
> video conferencing) there's some app-level backpressure, in that
> when the quality of experience goes low enough, the user (or a qoe
> trigger in the app) will often change the traffic demand at a higher
> layer than a congestion controller (by shutting off video, for
> instance).
>
> The reason I mention it is because it seems like unresponsive
> traffic has an incentive to mark L4S and get low latency.  It doesn't
> hurt, since it's a fixed rate and not bandwidth-seeking, so it's
> perfectly happy to massively underutilize the link. And until the
> link gets overloaded it will no longer suffer delay when using the
> low latency queue, whereas in the classic queue queuing delay provides
> a noticeable degradation in the presence of competing traffic.
>
> I didn't see anywhere in the paper that tried to check the quality
> of experience for the UDP traffic as non-responsive traffic approached
> saturation, except by inference that loss in the classic queue will
> cause loss in the LL queue as well.
>
> But letting unresponsive flows get away with pushing out more classic
> traffic and removing the penalty that classic flows would give it seems
> like a risk that would result in more use of this kind of unresponsive
> traffic marking itself for the LL queue, since it just would get lower
> latency almost up until overload.
>
> Many of the apps that send unresponsive traffic would benefit from low
> latency and isolation from the classic traffic, so it seems a mistake
> to claim there's no benefit, and it furthermore seems like there's
> systematic pressures that would often push unresponsive apps into this
> domain.
>
> If that line of reasoning holds up, the "rather specific" phrase in
> section 4.1.1 of the dualq draft might not turn out to be so specific
> after all, and could be seen as downplaying the risks.
>
> Best regards,
> Jake
>
> [1] https://riteproject.files.wordpress.com/2018/07/thesis-henrste.pdf
>
> PS: This seems like a consequence of the lack of access control on
> setting ECT(1), and maybe the queue protection function would address
> it, so that's interesting to hear about.
>
> But I thought the whole point of dualq over fq was that fq state couldn't
> scale properly in aggregating devices with enough expected flows sharing
> a queue?  If this protection feature turns out to be necessary, would that
> advantage be gone?  (Also: why would one want to turn this protection off
> if it's available?)
>
>
>

[-- Attachment #2: Type: text/html, Size: 6373 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] Comments on L4S drafts
  2019-06-14 17:39   ` Holland, Jake
@ 2019-06-19 14:11     ` Bob Briscoe
  2019-07-10 13:55       ` Holland, Jake
  0 siblings, 1 reply; 84+ messages in thread
From: Bob Briscoe @ 2019-06-19 14:11 UTC (permalink / raw)
  To: Holland, Jake; +Cc: tsvwg, ecn-sane

[-- Attachment #1: Type: text/plain, Size: 5612 bytes --]

Jake,

On 14/06/2019 18:39, Holland, Jake wrote:
>
> Hi Bob,
>
> Thanks for your response, I think it helped clarify some important things
>
> for me.
>
> The point about starvation especially was a good one I hadn't fully
>
> considered, and I agree if SCE-based implementations can’t demonstrate a
>
> solution, that would be a major problem with the SCE approach for 
> signaling.
>
> And sorry for my slow response, I ended up restarting a few times to 
> try to
>
> dodge ratholes.  (Plus some day-job duties, apologies...)
>
> I found it a bit challenging to avoid the ratholes effectively, so I'm
>
> thinking maybe the right move is to set up a testbed.  Maybe playing with
>
> that (very cool-looking!) L4SDemo tool can either ease my concerns, or
>
> provide some more specific and detailed scenarios to address.
>
> I see that the source code is published now at
>
> https://github.com/L4STeam/l4sdemo (thanks Olivier!).  So I’ll try to
>
> bring that up at some point, time permitting, in hopes it makes the
>
> comments and questions more productive.
>
[BB] Cool.

> One meta-point I wanted to make:
>
>   "In trying to find a compromise, you've taken the fire that is really
>
>   aimed at the inadequacy of underlying SCE protocol - for anything
>
>   other than FQ.  If the primary SCE proponents had attempted to
>
>   articulate a way to use SCE in a single queue or a dual queue, as you
>
>   have, that would have taken my fire."
>
> I think "fire" here is a potentially harmful metaphor--I don't take your
>
> comments as an attack or this discussion as a battle, but rather a
>
> collaborative attempt to reach a common goal of a better internet.
>
> I hope my comments on this are received the same way, even where we don't
>
> see eye to eye yet.  While both ideas can't be the best use of ECT(1) at
>
> the same time, I take this discussion as an effort to reach a common and
>
> complete understanding of the issues at hand, so that we can hopefully
>
> agree on the best approach in the end (or if we can't get there, maybe we
>
> can at least agree on the underlying reasons we don't agree).
>
[BB] Understood. I was concerned that I was demolishing your idea in 
public, and I was trying to thank you for being willing to put up a 
strawman.

My quest is also solely to improve the Internet. I spend my 
half-sleeping hours thinking through all the possible side-effects and 
combinations of problems with different solutions. I hope I give due 
weight to problems with my own ideas vs. problems with those of others. 
However, recently I have had to counter some rather nasty slurs on our 
work and our motivations, that did require some over-compensation.


> With that said, a few brief points I think really should be raised:
>
> 1. "non-problem" is an unreasonably strong conclusion to reach from a
>
> snapshot failure to detect any single-queue marking AQMs.
>
> We know that tc-pie exists in widely deployed systems, supports ECN, and
>
> could be turned on at any moment by anybody, and we also know there's an
>
> increased interest in ECN since Apple and Linux got it turned on on
>
> endpoints.  Even if we measure everything today, it’s hard to be sure this
>
> wouldn’t impact an in-progress rollout that someone has been working 
> toward
>
> for their network with proper due diligence, and following IETF advice
>
> faithfully.
>
> I think if the intent is really to deploy this experiment under the claim
>
> that's a non-problem, it should be called out in the docs as a risk 
> factor,
>
> and consensus should probably be explicitly checked on that point.  It 
> also
>
> probably would be polite to update RFC 7567's advice in section 4, 
> since it
>
> seems like this position would invalidate (or at least add nuance) to
>
> several of the SHOULDs given there, recommending the use of ECN.
>
[BB] I understand this, and indeed I've been on the other side of it 
(where someone else's inconsiderate deployment screwed up something I 
had been working on for years - and screwed up other things others had 
been working on). Nonetheless, to a certain extent, it is the Wild West 
out there, and we cannot interminably walk on egg-shells to the extent 
that nothing gets done.

Indeed, FQ itself screwed up the work on background transport protocols, 
and many other plans for novel applications of unequal throughput (I'll 
start a separate thread on that).

Don't worry. Classic ECN fall-back is on the ToDo list. I just didn't 
want to do it unless we have to, cos I prefer simplicity.



> 2. “does not starve a classic flow, but can be highly unequal” is also
>
> perhaps too low a bar to consider a non-problem, and also seems like maybe
>
> it deserves to be called out as a risk factor.
>
[BB] To be clear, I wasn't trying to say that a lesser problem was a 
non-problem.

I was pointing out that the word starvation has a specific meaning, that 
doesn't apply to Scalable vs Classic, but does apply to SCE vs Cubic 
(both in a single queue).

> 3. One more meta-point: the sales-y language makes the drafts hard to
>
> read for me, so please forgive some of my confusion.  I'm having a hard
>
> time distinguishing the claims that are well-supported by test results 
> in a
>
> realistic experimental design from some of the claims that are more 
> forward-
>
> looking or speculative.
>
[BB] if there are any you want changed, pls call them out.

Cheers



Bob
>
> Best regards,
>
> Jake
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/


[-- Attachment #2: Type: text/html, Size: 17257 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-06-19 13:02         ` Luca Muscariello
@ 2019-07-04 11:54           ` Bob Briscoe
  2019-07-04 12:24             ` Jonathan Morton
  2019-07-05  9:48             ` Luca Muscariello
  0 siblings, 2 replies; 84+ messages in thread
From: Bob Briscoe @ 2019-07-04 11:54 UTC (permalink / raw)
  To: Luca Muscariello, Holland, Jake; +Cc: ecn-sane, tsvwg

[-- Attachment #1: Type: text/plain, Size: 13699 bytes --]

Luca,

On 19/06/2019 14:02, Luca Muscariello wrote:
> Jake,
>
> Yes, that is one scenario that I had in mind.
> Your response comforts me that I my message was not totally unreadable.
>
> My understanding was
> - There are incentives to mark packets  if they get privileged 
> treatment because of that marking. This is similar to the diffserv 
> model with all the consequences in terms of trust.
[BB] I'm afraid this is a common misunderstanding. We have gone to great 
lengths to ensure that the coupled dualQ does not give any privilege, by 
separating out latency from throughput, so:

  * It solely isolates traffic that gives /itself/ low latency from
    traffic that doesn't.
  * It is very hard to get any throughput advantage from the mechanism,
    relative to a FIFO (see further down this email).

The phrase "relative to a FIFO" is important. In a FIFO, it is of course 
possible for flows to take more throughput than others. We see that as a 
feature of the Internet not a bug. But we accept that some might disagree...

So those that want equal flow rates can add per-flow bandwidth policing, 
e.g. AFD, to the coupled dualQ. But that should be (and now can be) a 
separate policy choice.

An important advance of the coupled dualQ is to cut latency without 
interfering with throughput.

> - Unresponsive traffic in particular (gaming, voice, video etc.) has 
> incentives to mark. Assuming there is x% of unresponsive traffic in 
> the priority queue, it is non trivial to guess how the system works.
> - in particular it is easy to see the extreme cases,
>                (a) x is very small, assuming the system is stable, the 
> overall equilibrium will not change.
>                (b) x is very large so the dctcp like sources fall back 
> to cubic like and the systems behave almost like a single FIFO.
>                (c) in all other cases x varies according to the 
> unresponsive sources' rates.
>                     Several different equilibria may exist, some of 
> which may include oscillations. Including oscillations of all 
> fallback  mechanisms.
> The reason I'm asking is that these cases are not discussed in the I-D 
> documents or in the references, despite these are very common use cases.
[BB] This has all already been explained and discussed at length during 
various IETF meetings. I had an excellent student (Henrik Steen) act as 
a "red-team" guy. His challenge was: Can you contrive a mis-marking 
strategy with unresponsive traffic to cause any more harm than in a 
FIFO? We wanted to make sure that introducing a priority scheduler could 
not be exploited as a significant new attack vector.

Have you looked at his thesis - the [DualQ-Test 
<https://tools.ietf.org/html/draft-ietf-tsvwg-aqm-dualq-coupled-09#ref-DualQ-Test>] 
reference at the end of this subsection of the Security Considerations 
in the aqm-dualq-coupled draft:
4.1.3. Protecting against Unresponsive ECN-Capable Traffic 
<https://tools.ietf.org/html/draft-ietf-tsvwg-aqm-dualq-coupled-09#section-4.1.3> 
?
(we ruled evaluation results out of scope of this already over-long 
draft - instead giving references).

Firstly, when unresponsive traffic < link rate, counter-intuitively it 
doesn't matter which queue it classifies itself into. Any responsive 
traffic in either or both queues still shares out the remaining capacity 
as if the unresponsive traffic had subtracted from the overall capacity 
(like a FIFO).

Beyond that, Henrik tested whether the persistent overload mechanism 
that switches off any distinction between the queues (code in the 
reference Linux implementation 
<https://github.com/L4STeam/sch_dualpi2_upstream/blob/master/net/sched/sch_dualpi2.c>, 
pseudocode and explanation in Appendix A.2 
<https://tools.ietf.org/html/draft-ietf-tsvwg-aqm-dualq-coupled-09#appendix-A.2>) 
left any room for mis-marked traffic to gain an advantage before the 
switch-over. There was a narrow region in which unresponsive traffic 
mismarked as ECN could strengthen its attack relative to the same attack 
on the Classic queue without mismarking.

I presented a one-slide summary of Henrik's experiment here in 2017 in 
IETF tcpm 
<https://datatracker.ietf.org/meeting/99/materials/slides-99-tcpm-ecn-adding-explicit-congestion-notification-ecn-to-tcp-control-packets-02#page=12>.
I tried to make the legends self-explanatory as long as you work at it, 
but shout if you need it explained.
Each column of plots shows attack traffic at increasing fractions of the 
link rate; from 70% to 200%.

Try to spot the difference between the odd columns and the even columns 
- they're just a little different in the narrow window either side of 
100% - a sharp kink instead of a smooth kink.
I included log-scale plots of the bottom end of the range to magnify the 
difference.

Yes, the system oscillates around the switch-over point, but you can see 
from the tcpm slide that the oscillations are also there in the 3rd 
column (which emulates the same switch-over in a FIFO). So we haven't 
added a new problem.

In summary, the advantage of mismarking was small and it was hard for 
the attacker not to trip the dualQ into overload state when it applies 
the same drop level in either queue. And that was when the victim 
traffic was just a predictable long-running flow. With normal less 
predictable victim traffic, I cannot think how to get this attack to be 
effective.

> If we add the queue protection mechanism, all unresponsive flows that 
> are caught cheating are registered in a blacklist and always scheduled 
> in the non-priority queue.
[BB]
1/ Queue protection is an alternative to overload protection, not an 
addition.

  * The Linux implementation solely uses the overload mechanism, which
    is sufficient to prevent the priority scheduler amplifying a
    mismarking attack (whether ECN or DSCP).
  * The DOCSIS implementation use per-flow queue protection instead.

2/ Aligned incentives

The coupled dualQ with just overload protection ensures incentives are 
aligned so that, normal developers won't intentionally mismark traffic. 
As explained at the start of this email:

    the DualQ solely isolates traffic that gives /itself/ low latency
    from traffic that doesn't. Low latency solely depends on the
    traffic's own behaviour. Traffic doesn't /get/ anything from the low
    latency queue, so there's no point mismarking to get into it.

However, incentives only address rational behaviour, not accidents and 
malice. That's why DOCSIS operators asked for Q protection - to protect 
against something accidentally or deliberately introducing bursty or 
excessive traffic into the low latency queue.

The Linux code is sufficient under normal circumstances though. There 
are already other mechanisms that deal with the worms, trojans, etc. 
that might launch these attacks.

3/ DOCSIS Q protection does not black-list flows.

It redirects certain /packets/ from those flows with the highest queuing 
scores into the Classic queue, only if those packets would otherwise 
risk a threshold delay for the low latency queue being exceeded.

If a flow has a temporary wobble, some of its packets get redirected to 
protect the low latency queue, but if it gets back on track, then 
there's just no further packet redirection.

> It that happens unresponsive flows will get a service quality that is 
> worse than if using a single FIFO for all flows.
4/ Slight punishment is a feature, not a bug

If an unresponsive flow is well-paced and not contributing to queuing, 
it will accumulate only a low queuing score, and experience no 
redirected packets.

If it is contributing to queuing and it is mismarking itself, then Q 
Prot will redirect some of its packets, and the continual reordering 
will (intentionally) give it worse service quality. This deliberate 
slight punishment gives developers a slight incentive to mark their 
flows correctly.

I could explain more about the queuing score (I think I already did for 
you on these lists), but it's all in Annex P of the DOCSIS spec 
<https://specification-search.cablelabs.com/CM-SP-MULPIv3.1>. and I'm 
trying to write a stand-alone document about it at the moment.

>
> Using a flow blacklist brings back the complexity that dualq is 
> supposed to remove compared to flow-isolation by flow-queueing.
> It seems to me that the blacklist is actually necessary to make dualq 
> work under the assumption that x is small,
[BB] As above, the Linux implementation works and aligns incentives 
without Q Prot, which is merely an optional additional protection 
against accidents and malice.

(and there's no flow black-list).

> because in the other cases the behavior
> of the dualq system is unspecified and likely subject to 
> instabilities, i.e. potentially different kind of oscillations.

I do find the tone of these emails rather disheartening. We've done all 
this work that we think is really cool. And all we get in return is 
criticism in an authoritative tone as if it is backed by experiments. 
But so far it is not. There seems to be a presumption that we are not 
professional and we are somehow not to be trusted to have done a sound job.

Yes, I'm sure mistakes can be found in our work. But it would be nice if 
the tone of these emails could become more constructive. Possibly even 
some praise. There seems to be a presumption of disrespect that I'm not 
used to, and I would rather it stopped.

Sorry for going silent recently - had too much backlog. I'm working my 
way backwards through this thread. Next I'll reply to Jake's email, 
which is, as always, perfectly constructive.

Cheers

Bob

> Luca
>
>
>
>
> On Tue, Jun 18, 2019 at 9:25 PM Holland, Jake <jholland@akamai.com 
> <mailto:jholland@akamai.com>> wrote:
>
>     Hi Bob and Luca,
>
>     Thank you both for this discussion, I think it helped crystallize a
>     comment I hadn't figured out how to make yet, but was bothering me.
>
>     I’m reading Luca’s question as asking about fixed-rate traffic
>     that does
>     something like a cutoff or downshift if loss gets bad enough for long
>     enough, but is otherwise unresponsive.
>
>     The dualq draft does discuss unresponsive traffic in 3 of the sub-
>     sections in section 4, but there's a point that seems sort of swept
>     aside without comment in the analysis to me.
>
>     The referenced paper[1] from that section does examine the question
>     of sharing a link with unresponsive traffic in some detail, but the
>     analysis seems to bake in an assumption that there's a fixed amount
>     of unresponsive traffic, when in fact for a lot of the real-life
>     scenarios for unresponsive traffic (games, voice, and some of the
>     video conferencing) there's some app-level backpressure, in that
>     when the quality of experience goes low enough, the user (or a qoe
>     trigger in the app) will often change the traffic demand at a higher
>     layer than a congestion controller (by shutting off video, for
>     instance).
>
>     The reason I mention it is because it seems like unresponsive
>     traffic has an incentive to mark L4S and get low latency.  It doesn't
>     hurt, since it's a fixed rate and not bandwidth-seeking, so it's
>     perfectly happy to massively underutilize the link. And until the
>     link gets overloaded it will no longer suffer delay when using the
>     low latency queue, whereas in the classic queue queuing delay provides
>     a noticeable degradation in the presence of competing traffic.
>
>     I didn't see anywhere in the paper that tried to check the quality
>     of experience for the UDP traffic as non-responsive traffic approached
>     saturation, except by inference that loss in the classic queue will
>     cause loss in the LL queue as well.
>
>     But letting unresponsive flows get away with pushing out more classic
>     traffic and removing the penalty that classic flows would give it
>     seems
>     like a risk that would result in more use of this kind of unresponsive
>     traffic marking itself for the LL queue, since it just would get lower
>     latency almost up until overload.
>
>     Many of the apps that send unresponsive traffic would benefit from low
>     latency and isolation from the classic traffic, so it seems a mistake
>     to claim there's no benefit, and it furthermore seems like there's
>     systematic pressures that would often push unresponsive apps into this
>     domain.
>
>     If that line of reasoning holds up, the "rather specific" phrase in
>     section 4.1.1 of the dualq draft might not turn out to be so specific
>     after all, and could be seen as downplaying the risks.
>
>     Best regards,
>     Jake
>
>     [1] https://riteproject.files.wordpress.com/2018/07/thesis-henrste.pdf
>
>     PS: This seems like a consequence of the lack of access control on
>     setting ECT(1), and maybe the queue protection function would address
>     it, so that's interesting to hear about.
>
>     But I thought the whole point of dualq over fq was that fq state
>     couldn't
>     scale properly in aggregating devices with enough expected flows
>     sharing
>     a queue?  If this protection feature turns out to be necessary,
>     would that
>     advantage be gone?  (Also: why would one want to turn this
>     protection off
>     if it's available?)
>
>
>
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/

[-- Attachment #2: Type: text/html, Size: 19085 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-04 11:54           ` Bob Briscoe
@ 2019-07-04 12:24             ` Jonathan Morton
  2019-07-04 13:43               ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-05  9:48             ` Luca Muscariello
  1 sibling, 1 reply; 84+ messages in thread
From: Jonathan Morton @ 2019-07-04 12:24 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: Luca Muscariello, Holland, Jake, ecn-sane, tsvwg

> On 4 Jul, 2019, at 2:54 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> The phrase "relative to a FIFO" is important. In a FIFO, it is of course possible for flows to take more throughput than others. We see that as a feature of the Internet not a bug. But we accept that some might disagree...

Chalk me up as among those who consider "no worse than a FIFO" to not be very reassuring.  As is well documented and even admitted in L4S drafts, L4S flows tend to squash "classic" flows in a FIFO.

So the difficulty here is twofold:

1: DualQ or FQ is needed to make L4S coexist with existing traffic, and

2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.

I'll read your reply to Jake when it arrives.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-04 12:24             ` Jonathan Morton
@ 2019-07-04 13:43               ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-04 14:03                 ` Jonathan Morton
  0 siblings, 1 reply; 84+ messages in thread
From: De Schepper, Koen (Nokia - BE/Antwerp) @ 2019-07-04 13:43 UTC (permalink / raw)
  To: Jonathan Morton, Bob Briscoe; +Cc: ecn-sane, tsvwg

Jonathan,

>> 2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.

Not correct. DualQ works not different than any (single q) FIFO, which can be defeated by non-responsive traffic.
It even does not matter what type of traffic the adversary is (L4S or Classic drop/mark), as the adversary will push away the responsive traffic only by the congestion signal it invokes in the AQM (drop or classic or L4S marking). The switch to drop for all traffic from 25% onwards avoids that ECN flows get a benefit under overload caused by non-responsive flows. This mechanism protects also Classic ECN single Q AQMs, as defined in the ECN RFCs.

So conclusion:   a DualQ works exactly the same as any other single Q AQM supporting ECN !!
Try it, and you'll see...

Koen.

-----Original Message-----
From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Jonathan Morton
Sent: Thursday, July 4, 2019 2:24 PM
To: Bob Briscoe <ietf@bobbriscoe.net>
Cc: ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

> On 4 Jul, 2019, at 2:54 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> The phrase "relative to a FIFO" is important. In a FIFO, it is of course possible for flows to take more throughput than others. We see that as a feature of the Internet not a bug. But we accept that some might disagree...

Chalk me up as among those who consider "no worse than a FIFO" to not be very reassuring.  As is well documented and even admitted in L4S drafts, L4S flows tend to squash "classic" flows in a FIFO.

So the difficulty here is twofold:

1: DualQ or FQ is needed to make L4S coexist with existing traffic, and

2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.

I'll read your reply to Jake when it arrives.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-06-19  4:24       ` Holland, Jake
  2019-06-19 13:02         ` Luca Muscariello
@ 2019-07-04 13:45         ` Bob Briscoe
  2019-07-10 17:03           ` Holland, Jake
  1 sibling, 1 reply; 84+ messages in thread
From: Bob Briscoe @ 2019-07-04 13:45 UTC (permalink / raw)
  To: Holland, Jake; +Cc: Luca Muscariello, ecn-sane, tsvwg

[-- Attachment #1: Type: text/plain, Size: 9750 bytes --]

Jake,

On 19/06/2019 05:24, Holland, Jake wrote:
> Hi Bob and Luca,
>
> Thank you both for this discussion, I think it helped crystallize a
> comment I hadn't figured out how to make yet, but was bothering me.
>
> I’m reading Luca’s question as asking about fixed-rate traffic that does
> something like a cutoff or downshift if loss gets bad enough for long
> enough, but is otherwise unresponsive.
>
> The dualq draft does discuss unresponsive traffic in 3 of the sub-
> sections in section 4, but there's a point that seems sort of swept
> aside without comment in the analysis to me.
>
> The referenced paper[1] from that section does examine the question
> of sharing a link with unresponsive traffic in some detail, but the
> analysis seems to bake in an assumption that there's a fixed amount
> of unresponsive traffic, when in fact for a lot of the real-life
> scenarios for unresponsive traffic (games, voice, and some of the
> video conferencing) there's some app-level backpressure, in that
> when the quality of experience goes low enough, the user (or a qoe
> trigger in the app) will often change the traffic demand at a higher
> layer than a congestion controller (by shutting off video, for
> instance).
>
> The reason I mention it is because it seems like unresponsive
> traffic has an incentive to mark L4S and get low latency.  It doesn't
> hurt, since it's a fixed rate and not bandwidth-seeking, so it's
> perfectly happy to massively underutilize the link. And until the
> link gets overloaded it will no longer suffer delay when using the
> low latency queue, whereas in the classic queue queuing delay provides
> a noticeable degradation in the presence of competing traffic.
It is very much intentional to allow unresponsive traffic in the L queue 
if it is not contributing to queuing.

You're right that the title of S.4.1.3 sounds like there's a presumption 
that all unresponsive ECN traffic is bad. Sorry that was not the 
intention. Elsewhere the drafts do say that a reasonable amount of 
smoothly paced unresponsive traffic is OK alongside any responsive traffic.

(I've just posted an -09 rev, but I'll post a draft-10 that fixes that, 
hopefully before the Monday cut-off).

If you're talking about where unresponsive traffic is mentioned in 
4.1.1, I think that's OK, 'cos that's in the context of saturated 
congestion marking (when it's not OK to be unresponsive).

>
> I didn't see anywhere in the paper that tried to check the quality
> of experience for the UDP traffic as non-responsive traffic approached
> saturation, except by inference that loss in the classic queue will
> cause loss in the LL queue as well.
Yeah, in the context of Henrik's thesis (your [1]), "unresponsive" was 
used as a byword for "attack traffic". But that shouldn't be taken to 
mean unresponsive is considered evil for L4S in general.

Indeed, Low Latency DOCIS started from the assumption of using a low 
latency queue for unresponsive traffic (games, VoIP, etc), then added 
responsive L4S traffic into the same queue later.

You may have seen the draft about assigning a DSCP for 
Non-Queue-Building (NQB) traffic for that purpose (as with L4S and 
unlike Diffserv, this codepoint solely describes the traffic's 
behaviour, not what it wants or needs).
     https://tools.ietf.org/html/draft-white-tsvwg-nqb-02
And there are references in ecn-l4s-id to other identifiers that could 
be used to get unresponsive traffic into the low latency queue (DOCSIS 
classifies EF and NQB as low latency by default).

We don't want ECN to be the only way to get into the L queue, cos we 
don't want to encourage mismarking as 'ECN' when a flow is not actually 
going to respond to ECN).

>
> But letting unresponsive flows get away with pushing out more classic
> traffic and removing the penalty that classic flows would give it seems
> like a risk that would result in more use of this kind of unresponsive
> traffic marking itself for the LL queue, since it just would get lower
> latency almost up until overload.
As explained to Luca, it's counter-intuitive, but responsive flows 
(either C or L) use the same share of capacity irrespective of which 
queue any unresponsive traffic is in. Think of it as the unresponsive 
traffic subtracting capacity from the aggregate (because both queues can 
use the whole aggregate), then the coupling sharing out what's left. The 
coupling makes it like a FIFO from a bandwidth perspective.

You can try this with the tool you mentioned that you had downloaded. 
There's a slider to add unresponsive traffic to either queue.

So it's fine if unresponsive traffic doesn't cause any queuing itself. 
It can happily use the L queue. This was a very important design goal, 
but we write about it circumspectly in the IETF drafts, 'cos talk about 
allowing unresponsive traffic can trigger political correctness 
arguments. (Oops, am I writing on an IETF list?)

Nonetheless, when an unresponsive flow(s) is consuming some capacity, 
and a responsive flow(s) takes the total over the available capacity, 
then both are responsible in proportion to their contribution to the 
queue, 'cos the unresponsive flow didn't respond (it didn't even try to).

This is why it's OK to have a small unresponsive flow, but it becomes 
less and less OK to have a larger and larger unresponsive flow.

BTW, the proportion of blame for the queue is what the queuing score 
represents in the DOCSIS queue protection algo. It's quite simple but 
subtle. See your PS at the end. Right now I'm going to get on with 
writing about that in a proper doc, rather than in an email.

>
> Many of the apps that send unresponsive traffic would benefit from low
> latency and isolation from the classic traffic, so it seems a mistake
> to claim there's no benefit, and it furthermore seems like there's
> systematic pressures that would often push unresponsive apps into this
> domain.
There's no bandwidth benefit.
There's only latency benefit, and then the only benefits are:

  * the low latency behaviour of yourself and other flows behaving like you
  * and, critically, isolation from those flows not behaving well like you.

Neither give an incentive to mismark - you get nothing if you don't 
behave. And there's a disincentive for 'Classic' TCP flows to mismark, 
'cos they badly underutilize without a queue.

(See also reply to Luca addressing accidents and malice, which lie 
outside control by incentives).

>
> If that line of reasoning holds up, the "rather specific" phrase in
> section 4.1.1 of the dualq draft might not turn out to be so specific
> after all, and could be seen as downplaying the risks.
Yup, as said, will fix the phrasing in 4.1.3. But I'm not going to touch 
4.1.1. without better understand what the problem is there.

>
> Best regards,
> Jake
>
> [1] https://riteproject.files.wordpress.com/2018/07/thesis-henrste.pdf
>
> PS: This seems like a consequence of the lack of access control on
> setting ECT(1), and maybe the queue protection function would address
> it, so that's interesting to hear about.
Yeah, I'm trying to write about that next. But if you extract Appendix P 
from the DOCSIS 3.1 spec it's explained pretty well already and openly 
available.

However, I want it to be clear that Q Prot is not /necessary/ for L4S - 
and it's also got wider applicability, I think.

> But I thought the whole point of dualq over fq was that fq state couldn't
> scale properly in aggregating devices with enough expected flows sharing
> a queue?  If this protection feature turns out to be necessary, would that
> advantage be gone?  (Also: why would one want to turn this protection off
> if it's available?)
1/ The q-prot mechanism certainly has the disadvantage that it has to 
access L4 headers. But it is much more lightweight than FQ.

There's no queue state per flow. The flow-state is just a number that 
represents its own expiry time - a higher queuing score pushes out the 
expiry time further. If it has expired when the next packet of the flow 
arrives, it just starts from now, like a new flow, otherwise it adds to 
the existing expiry time. Long-running L4S flows don't hold on to 
flow-state between most packets - it usually expires reasonably early in 
the gap between the packets of a normal flow, then it can be recycled 
for packets from any other flows that arrive in between. So only 
misbehaving flows hold flow state persistently.

The subtle part is the queuing score. It uses the internal variable from 
the AQM that drives the ECN marking probability - call it p (between 0 
and 1 in floating point). And it takes the size of each arriving packet 
of a flow and scales by the value of p on arrival. This would accumulate 
a number which would rise at the so-called congestion-rate of the flow, 
i.e. the rate at which the flow is causing congestion (the rate at which 
it is sending bytes that are ECN marked or dropped).

However, rather than just doing that, the queuing score is also 
normalized into time units (to represent the expiry time of the flow 
state, as above). That's possible by just dividing by a constant that 
represents the acceptable congestion-rate per flow (rounded up to an 
integer power of 2 for efficiency). A nice property of the linear 
scaling of L4S is that this number is a constant for any link rate.

That's probably not understandable. Let me write it up properly - with 
some explanatory pictures and examples.

Bob

>
>
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/

[-- Attachment #2: Type: text/html, Size: 12392 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-04 13:43               ` De Schepper, Koen (Nokia - BE/Antwerp)
@ 2019-07-04 14:03                 ` Jonathan Morton
  2019-07-04 17:54                   ` Bob Briscoe
  2019-07-05  6:46                   ` De Schepper, Koen (Nokia - BE/Antwerp)
  0 siblings, 2 replies; 84+ messages in thread
From: Jonathan Morton @ 2019-07-04 14:03 UTC (permalink / raw)
  To: De Schepper, Koen (Nokia - BE/Antwerp); +Cc: Bob Briscoe, ecn-sane, tsvwg

> On 4 Jul, 2019, at 4:43 pm, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
> So conclusion:   a DualQ works exactly the same as any other single Q AQM supporting ECN !!
> Try it, and you'll see...

But that's exactly the problem.  Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.  This isolation is the very reason why something like DualQ is proposed, so the fact that it can be defeated into this degraded single-queue mode is a genuine problem.

May I direct you to our LFQ draft, published yesterday, for what we consider to be a much more robust approach, yet with similar hardware requirements to DualQ?  I'd be interested in hearing feedback.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]  Comments on L4S drafts
  2019-07-04 14:03                 ` Jonathan Morton
@ 2019-07-04 17:54                   ` Bob Briscoe
  2019-07-05  8:26                     ` Jonathan Morton
  2019-07-05  6:46                   ` De Schepper, Koen (Nokia - BE/Antwerp)
  1 sibling, 1 reply; 84+ messages in thread
From: Bob Briscoe @ 2019-07-04 17:54 UTC (permalink / raw)
  To: Jonathan Morton, De Schepper, Koen (Nokia - BE/Antwerp); +Cc: ecn-sane, tsvwg

[-- Attachment #1: Type: text/plain, Size: 3839 bytes --]

Jonathan,

On 04/07/2019 15:03, Jonathan Morton wrote:
>> On 4 Jul, 2019, at 4:43 pm, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
>>
>> So conclusion:   a DualQ works exactly the same as any other single Q AQM supporting ECN !!
>> Try it, and you'll see...
> But that's exactly the problem.  Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.
You are assuming that the one thing we haven't done yet (fall-back to 
TCP-friendly on detection of classic ECN) won't work, whereas all the 
problems you have not addressed yet with SCE will work.

>   This isolation is the very reason why something like DualQ is proposed, so the fact that it can be defeated into this degraded single-queue mode is a genuine problem.
>
> May I direct you to our LFQ draft, published yesterday, for what we consider to be a much more robust approach, yet with similar hardware requirements to DualQ?  I'd be interested in hearing feedback.
I will certainly read. I assume you are aware that implementation 
complexity is only a small part of the objections to FQ. {Note 1}

I believe that using this to enable fine-grained congestion control 
would still rely on the semantics of the SCE style of signalling still. 
Correct?

So, for the third time of asking, can you or someone please respond to 
the 5 points that will be problematic for SCE (I listed them on 11 Mar 
2019 on tsvwg@ietf.org re-pasted from bloat@ to you & DaveT the day 
after you posted the first draft). You will not get anywhere in the IETF 
without addressing serious problems that people raise with your proposal.

I don't need to tell you that the Internet is a complex place to 
introduce anything new, especially into IP itself. If you cannot solve 
/all/ these problems, it will save everyone a lot of time if you just 
say so.

I have repeated bullets summarizing each question below (I've removed 
the one about re-purposing the receive window, which DaveT wished hadn't 
been mentioned, and added Q4 which I asked more recently). You may wish 
to start a new thread to answer some of the more substantive ones. They 
are roughly ranked in order of seriousness with Q1-3 being show-stoppers.

  * Q1. Does SCE require per-flow scheduling?
      o If so, how do you expect it to be supported on L2 links, where
        not even the IP header is accessible, let alone L4?
      o If not, how does it work?
  * Q2. How do you address the lack of ECT(1) feedback in TCP, given
    no-one is implementing the AccECN TCP option? And even if they did,
    do you have measurements on how few middleboxes / proxies, etc will
    allow traversal?
  * Q3. How do you address all the tunnel decapsulators that will
    black-hole ECT(1) marking of the outer? Do you have measurements of
    how much of a blockage to progress this will be?
  * Q4. How do you address the interaction of the two timescale dynamics
    in the SCE congestion control?
  * Q5. Can out-of-order tolerance be relaxed on links supporting SCE?
    (not a problem as such, but a lack of one of L4S's advantages)

{Note 1}: Implementation complexity is only a small part of the 
objections to FQ. One major reason is in Q1 above. I have promised a 
write-up of all the other reasons for why per-flow scheduling is not a 
desirable goal even if it can be achieved with low complexity. I've got 
it half written (as a tech report, not an Internet Draft), but it's on 
hold while other stuff takes priority for me (not least an awkwardly 
timed family vacation starting tomorrow for 10 days).

Cheers

Bob

>
>   - Jonathan Morton

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/

[-- Attachment #2: Type: text/html, Size: 5347 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-04 14:03                 ` Jonathan Morton
  2019-07-04 17:54                   ` Bob Briscoe
@ 2019-07-05  6:46                   ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-05  8:51                     ` Jonathan Morton
  1 sibling, 1 reply; 84+ messages in thread
From: De Schepper, Koen (Nokia - BE/Antwerp) @ 2019-07-05  6:46 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Bob Briscoe, ecn-sane, tsvwg

>> 2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.

Before jumping to another point, let's close down your original issue. Since you didn't mention, I assume that you agree with the following, right?

        "You cannot defeat a DualQ" (at least no more than a single Q)

>> But that's exactly the problem.  Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.

With L4S a single queue can differentiate between Classic and L4S traffic. That's why it knows exactly how to treat the traffic. For Non-ECT and ECT(0) square the probability, and for ECT(1) don't square, and it works exactly like a DualQ, but then without the latency isolation. Both types get the same throughput, AND delay. See the PI2 paper, which is exactly about a single Q.

I agree you cannot isolate in a single Q, and this is why L4S is better than SCE, because it tells the AQM what to do, even if it has a single Q. SCE needs isolation, L4S not.
We tried years ago similar things like needed for SCE, and found that it can't work. For throughput fairness you need the squared relation between the 2 signals, but with SCE, you need to apply both signals in parallel, because you don't know the sender type. 
	- So either the sender needs to ignore CE if it gets SCE, or ignore SCE if you get CE. The first is dangerous if you have multiple bottlenecks, and the second is defeating the purpose of SCE. Any other combination leads to unfairness (double response).
	- you separate the signals in queue dept, first applying SCE and later CE, as you originally proposed, but that results in starvation for SCE.
	- you only can apply SCE less than CE, but that makes it useless, as it creates a bigger queue for SCE, and CE would kick in first anyway.

Add on top that SCE makes it impossible to use DualQ, as you cannot differentiate the traffic types.
So this is why I think L4S is the best solution. Why would you try an alternative if it cannot work?

Koen.

-----Original Message-----
From: Jonathan Morton <chromatix99@gmail.com> 
Sent: Thursday, July 4, 2019 4:03 PM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
Cc: Bob Briscoe <ietf@bobbriscoe.net>; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

> On 4 Jul, 2019, at 4:43 pm, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
> So conclusion:   a DualQ works exactly the same as any other single Q AQM supporting ECN !!
> Try it, and you'll see...

But that's exactly the problem.  Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.  This isolation is the very reason why something like DualQ is proposed, so the fact that it can be defeated into this degraded single-queue mode is a genuine problem.

May I direct you to our LFQ draft, published yesterday, for what we consider to be a much more robust approach, yet with similar hardware requirements to DualQ?  I'd be interested in hearing feedback.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]  Comments on L4S drafts
  2019-07-04 17:54                   ` Bob Briscoe
@ 2019-07-05  8:26                     ` Jonathan Morton
  0 siblings, 0 replies; 84+ messages in thread
From: Jonathan Morton @ 2019-07-05  8:26 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: De Schepper, Koen (Nokia - BE/Antwerp), ecn-sane, tsvwg

> On 4 Jul, 2019, at 8:54 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:

> You are assuming that the one thing we haven't done yet (fall-back to TCP-friendly on detection of classic ECN) won't work, whereas all the problems you have not addressed yet with SCE will work.

This is whataboutism.  Please don't.

We have a complete end-to-end implementation of SCE, which not only works but is safe-by-design in today's Internet, as outlined not only in the I-Ds we submitted this week, but also below.

> I believe that using this to enable fine-grained congestion control would still rely on the semantics of the SCE style of signalling still. Correct?

Yes, although the fine detail of these semantics has changed since the first I-D in light of implementation experience.  I do suggest reading the new version.

> 	• Q1. Does SCE require per-flow scheduling?

SCE does not require per-flow scheduling.

It does work *better* with per-flow scheduling, but that's also true of most types of existing traffic.

> 		• If so, how do you expect it to be supported on L2 links, where not even the IP header is accessible, let alone L4?

While this question is moot, may I ask how you expect the ECN field to be used when the IP header is inaccessible?  I'm sure either DCTCP or SCE-like principles can be applied to an L2 flow, but it would not be through ECN per se.

> 		• If not, how does it work? 

In the first place, SCE flows work transparently with existing dumb and CE-marking infrastructure, and behave in an RFC-3168 compliant manner in that case.  So no special preparations in the network are required merely to allow SCE endpoints to be deployed safely.  We consider this one of SCE's key advantages over L4S.

We have now implemented and at least briefly tested a way to mark SCE in a single-queue bottleneck while retaining fairness versus non-SCE traffic.  It requires only an adjustment to a detail of the way SCE marking is done at that node - that is, altering the relationship between CE and SCE marking - and does not increase implementation complexity even there.  The tradeoff is that SCE's benefit is diluted because SCE flows may receive unnecessary CE marks, but it does achieve fairness (for example) between plain Reno and Reno-SCE.

You might wish to read the submitted draft outlining our initial test results.  They do in fact focus on single-queue behaviour, both with single flows and with two similar or dissimilar flows competing, and should thus answer additional questions you may have on this topic.  We are still refining this, of course.

> 	• Q2. How do you address the lack of ECT(1) feedback in TCP, given no-one is implementing the AccECN TCP option? And even if they did, do you have measurements on how few middleboxes / proxies, etc will allow traversal?

Our experimental reference implementation uses the former NS bit in the TCP header as an ESCE feedback mechanism.  NS is unused because Nonce Sum was never deployed, but because Nonce Sum was specified in an RFC, we expect it will traverse the Internet quite well.  Additionally, the reuse of NS in another role also associated with ECT(1) seems poetic.  Controlled tests over public Internet paths, as well as more extensively in lab conditions, have been carried out successfully.

Disruption of either SCE or ESCE signals is tolerated by design, because in extremis SCE flows still respond to CE marks and packet drops appropriately for effective congestion control.

We expect to publish an I-D covering the above shortly.

Cursory examination of QUIC indicates that it already has a mechanism specified for detailed ECN feedback, and naturally this can also support SCE.

> 	• Q3. How do you address all the tunnel decapsulators that will black-hole ECT(1) marking of the outer? Do you have measurements of how much of a blockage to progress this will be?

I imagine a blackhole of ECT(1) would also be problematic for L4S.  I would consider such tunnels RFC-ignorant (ie. buggy) because ECT(1) is expressly permitted by RFC-3168 in the same circumstances where ECT(0) is.  We have not encountered any such problems ourselves.

In any case, the precise effects will depend on the nature of the blackhole.  If they change ECT(1) to ECT(0) or Not-ECT, then SCE flows will not receive SCE information and will therefore behave like RFC-3168 flows do.  If the affected packets are dropped, then TCP should be able to recover from that.

> 	• Q4. How do you address the interaction of the two timescale dynamics in the SCE congestion control?

Which two timescale dynamics are you referring to?

> 	• Q5. Can out-of-order tolerance be relaxed on links supporting SCE? (not a problem as such, but a lack of one of L4S's advantages)

We consider that aspect of L2 link design to be orthogonal to SCE.  Most transports currently deployed should be able to cope with microsecond-level reordering on multi-millisecond Internet paths without triggering unnecessary retransmissions.

> {Note 1}: Implementation complexity is only a small part of the objections to FQ.

We are still waiting for a good explanation of these objections.  So far, we are aware only of the well-known vulnerability to "gaming" by employing more flows than necessary - but we also have defences against that, which we plan to add to a future version of the LFQ draft.  These defences are semantically similar to the dual host-flow fairness currently deployed in Cake, but with a more hardware-friendly algorithm.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-05  6:46                   ` De Schepper, Koen (Nokia - BE/Antwerp)
@ 2019-07-05  8:51                     ` Jonathan Morton
  2019-07-08 10:26                       ` De Schepper, Koen (Nokia - BE/Antwerp)
  0 siblings, 1 reply; 84+ messages in thread
From: Jonathan Morton @ 2019-07-05  8:51 UTC (permalink / raw)
  To: De Schepper, Koen (Nokia - BE/Antwerp); +Cc: Bob Briscoe, ecn-sane, tsvwg

> On 5 Jul, 2019, at 9:46 am, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
>>> 2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.
> 
> Before jumping to another point, let's close down your original issue. Since you didn't mention, I assume that you agree with the following, right?
> 
>        "You cannot defeat a DualQ" (at least no more than a single Q)

I consider forcibly degrading DualQ to single-queue mode to be a defeat.  However…

>>> But that's exactly the problem.  Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.
> 
> With L4S a single queue can differentiate between Classic and L4S traffic. That's why it knows exactly how to treat the traffic. For Non-ECT and ECT(0) square the probability, and for ECT(1) don't square, and it works exactly like a DualQ, but then without the latency isolation. Both types get the same throughput, AND delay. See the PI2 paper, which is exactly about a single Q.

Okay, this is an important point: the real assertion is not that DualQ itself is needed for L4S to be safe on the Internet, but for differential AQM treatment to be present at the bottleneck.  Defeating DualQ only destroys L4S' latency advantage over "classic" traffic.  We might actually be making progress here!

> I agree you cannot isolate in a single Q, and this is why L4S is better than SCE, because it tells the AQM what to do, even if it has a single Q. SCE needs isolation, L4S not.

Devil's advocate time.  What if, instead of providing differential treatment WRT CE marking, PI2 instead applied both marking strategies simultaneously - the higher rate using SCE, and the lower rate using CE?  Classic traffic would see only the latter; L4S could use the former.

> We tried years ago similar things like needed for SCE, and found that it can't work. For throughput fairness you need the squared relation between the 2 signals, but with SCE, you need to apply both signals in parallel, because you don't know the sender type. 

Yes, that's exactly what we do - and it does work.

> 	- So either the sender needs to ignore CE if it gets SCE, or ignore SCE if you get CE. The first is dangerous if you have multiple bottlenecks, and the second is defeating the purpose of SCE. Any other combination leads to unfairness (double response).

This is a false dichotomy.  We quickly realised both of those options were unacceptable, and sought a third way.

SCE senders apply a reduced CE response when also responding to parallel SCE feedback, roughly in line with ABE, on the grounds that responding to SCE does some of the necessary reduction already.  The reduced response is still a Multiplicative Decrease, so it fits with normal TCP congestion control principles.

> 	- you separate the signals in queue dept, first applying SCE and later CE, as you originally proposed, but that results in starvation for SCE.

Yes, although this approach gives the best performance for SCE when used with flow isolation, or when all flows are known to be SCE-aware.  So we apply this strategy in those cases, and move the SCE marking function up to overlap CE marking specifically for single queues.

It has been suggested that single queue AQMs are rare in any case, but this approach covers that corner case.

> Add on top that SCE makes it impossible to use DualQ, as you cannot differentiate the traffic types.

SCE is designed around not *needing* to differentiate the traffic types.  Single queues have known disadvantages, and SCE doesn't worsen them.

Meanwhile, we have proposed LFQ to cover the DualQ use case.  I'd be interested in hearing a principled critique of it.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-04 11:54           ` Bob Briscoe
  2019-07-04 12:24             ` Jonathan Morton
@ 2019-07-05  9:48             ` Luca Muscariello
  1 sibling, 0 replies; 84+ messages in thread
From: Luca Muscariello @ 2019-07-05  9:48 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: Holland, Jake, ecn-sane, tsvwg

[-- Attachment #1: Type: text/plain, Size: 15396 bytes --]

I have asked a relatively simple question that Jake's got right, so I'm not
alone in my own bubble.
I've asked this to Greg White months ago and his answer was that
unresponsive traffic  x  is assumed to be small.
That the queue protection mechanism will ensure that.

You Bob send me a reference, that I checked again, and considers the other
extreme case where x is very large.

I have not found anything that covers the most realistic cases which is
case c in my previous message, when x
just varies normally when you have unresponsive traffic that varies just
like in today networks.

The normal case is the one I'm interested the most. This is something that
in LDD for Cable may
enter the access network of many cable subscribers, surfing the web, using
WebEx working form home, doing normal things.
The document "Destruction testing" :) is not very useful to my use case and
specific question.

We have been promised below millisecond latency up to the 99-th quantile
and I have simple technical questions.
LLD is a new initiative in the public IETF forum so it is normal that
people start asking questions.
No need to get blood pressure above the threshold.

As long as LLD is a cable industry only thing using PHB and private
marking, all this discussion may be irrelevant in this forum,
but this is not the case.



On Thu, Jul 4, 2019 at 1:55 PM Bob Briscoe <ietf@bobbriscoe.net> wrote:

> Luca,
>
>
> On 19/06/2019 14:02, Luca Muscariello wrote:
>
> Jake,
>
> Yes, that is one scenario that I had in mind.
> Your response comforts me that I my message was not totally unreadable.
>
> My understanding was
> - There are incentives to mark packets  if they get privileged treatment
> because of that marking. This is similar to the diffserv model with all the
> consequences in terms of trust.
>
> [BB] I'm afraid this is a common misunderstanding. We have gone to great
> lengths to ensure that the coupled dualQ does not give any privilege, by
> separating out latency from throughput, so:
>
>    - It solely isolates traffic that gives /itself/ low latency from
>    traffic that doesn't.
>    - It is very hard to get any throughput advantage from the mechanism,
>    relative to a FIFO (see further down this email).
>
> The phrase "relative to a FIFO" is important. In a FIFO, it is of course
> possible for flows to take more throughput than others. We see that as a
> feature of the Internet not a bug. But we accept that some might disagree...
>
> So those that want equal flow rates can add per-flow bandwidth policing,
> e.g. AFD, to the coupled dualQ. But that should be (and now can be) a
> separate policy choice.
>
> An important advance of the coupled dualQ is to cut latency without
> interfering with throughput.
>
>
> - Unresponsive traffic in particular (gaming, voice, video etc.) has
> incentives to mark. Assuming there is x% of unresponsive traffic in the
> priority queue, it is non trivial to guess how the system works.
> - in particular it is easy to see the extreme cases,
>                (a) x is very small, assuming the system is stable, the
> overall equilibrium will not change.
>                (b) x is very large so the dctcp like sources fall back to
> cubic like and the systems behave almost like a single FIFO.
>                (c) in all other cases x varies according to the
> unresponsive sources' rates.
>                     Several different equilibria may exist, some of which
> may include oscillations. Including oscillations of all fallback
> mechanisms.
> The reason I'm asking is that these cases are not discussed in the I-D
> documents or in the references, despite these are very common use cases.
>
> [BB] This has all already been explained and discussed at length during
> various IETF meetings. I had an excellent student (Henrik Steen) act as a
> "red-team" guy. His challenge was: Can you contrive a mis-marking strategy
> with unresponsive traffic to cause any more harm than in a FIFO? We wanted
> to make sure that introducing a priority scheduler could not be exploited
> as a significant new attack vector.
>
> Have you looked at his thesis - the [DualQ-Test
> <https://tools.ietf.org/html/draft-ietf-tsvwg-aqm-dualq-coupled-09#ref-DualQ-Test>]
> reference at the end of this subsection of the Security Considerations in
> the aqm-dualq-coupled draft:
>  4.1.3.  Protecting against Unresponsive ECN-Capable Traffic
> <https://tools.ietf.org/html/draft-ietf-tsvwg-aqm-dualq-coupled-09#section-4.1.3>
> ?
> (we ruled evaluation results out of scope of this already over-long draft
> - instead giving references).
>
> Firstly, when unresponsive traffic < link rate, counter-intuitively it
> doesn't matter which queue it classifies itself into. Any responsive
> traffic in either or both queues still shares out the remaining capacity as
> if the unresponsive traffic had subtracted from the overall capacity (like
> a FIFO).
>
> Beyond that, Henrik tested whether the persistent overload mechanism that
> switches off any distinction between the queues (code in the reference
> Linux implementation
> <https://github.com/L4STeam/sch_dualpi2_upstream/blob/master/net/sched/sch_dualpi2.c>,
> pseudocode and explanation in Appendix A.2
> <https://tools.ietf.org/html/draft-ietf-tsvwg-aqm-dualq-coupled-09#appendix-A.2>)
> left any room for mis-marked traffic to gain an advantage before the
> switch-over. There was a narrow region in which unresponsive traffic
> mismarked as ECN could strengthen its attack relative to the same attack on
> the Classic queue without mismarking.
>
> I presented a one-slide summary of Henrik's experiment here in 2017 in
> IETF tcpm
> <https://datatracker.ietf.org/meeting/99/materials/slides-99-tcpm-ecn-adding-explicit-congestion-notification-ecn-to-tcp-control-packets-02#page=12>
> .
> I tried to make the legends self-explanatory as long as you work at it,
> but shout if you need it explained.
> Each column of plots shows attack traffic at increasing fractions of the
> link rate; from 70% to 200%.
>
> Try to spot the difference between the odd columns and the even columns -
> they're just a little different in the narrow window either side of 100% -
> a sharp kink instead of a smooth kink.
> I included log-scale plots of the bottom end of the range to magnify the
> difference.
>
> Yes, the system oscillates around the switch-over point, but you can see
> from the tcpm slide that the oscillations are also there in the 3rd column
> (which emulates the same switch-over in a FIFO). So we haven't added a new
> problem.
>
> In summary, the advantage of mismarking was small and it was hard for the
> attacker not to trip the dualQ into overload state when it applies the same
> drop level in either queue. And that was when the victim traffic was just a
> predictable long-running flow. With normal less predictable victim traffic,
> I cannot think how to get this attack to be effective.
>
>
> If we add the queue protection mechanism, all unresponsive  flows that are
> caught cheating are registered in a blacklist and always scheduled in the
> non-priority queue.
>
> [BB]
> 1/ Queue protection is an alternative to overload protection, not an
> addition.
>
>    - The Linux implementation solely uses the overload mechanism, which
>    is sufficient to prevent the priority scheduler amplifying a mismarking
>    attack (whether ECN or DSCP).
>    - The DOCSIS implementation use per-flow queue protection instead.
>
> 2/ Aligned incentives
>
> The coupled dualQ with just overload protection ensures incentives are
> aligned so that, normal developers won't intentionally mismark traffic. As
> explained at the start of this email:
>
> the DualQ solely isolates traffic that gives /itself/ low latency from
> traffic that doesn't. Low latency solely depends on the traffic's own
> behaviour. Traffic doesn't /get/ anything from the low latency queue, so
> there's no point mismarking to get into it.
>
> However, incentives only address rational behaviour, not accidents and
> malice. That's why DOCSIS operators asked for Q protection - to protect
> against something accidentally or deliberately introducing bursty or
> excessive traffic into the low latency queue.
>
> The Linux code is sufficient under normal circumstances though. There are
> already other mechanisms that deal with the worms, trojans, etc. that might
> launch these attacks.
>
> 3/ DOCSIS Q protection does not black-list flows.
>
> It redirects certain /packets/ from those flows with the highest queuing
> scores into the Classic queue, only if those packets would otherwise risk a
> threshold delay for the low latency queue being exceeded.
>
> If a flow has a temporary wobble, some of its packets get redirected to
> protect the low latency queue, but if it gets back on track, then there's
> just no further packet redirection.
>
> It that happens unresponsive flows will get a service quality that is
> worse than if using a single FIFO for all flows.
>
> 4/ Slight punishment is a feature, not a bug
>
> If an unresponsive flow is well-paced and not contributing to queuing, it
> will accumulate only a low queuing score, and experience no redirected
> packets.
>
> If it is contributing to queuing and it is mismarking itself, then Q Prot
> will redirect some of its packets, and the continual reordering will
> (intentionally) give it worse service quality. This deliberate slight
> punishment gives developers a slight incentive to mark their flows
> correctly.
>
> I could explain more about the queuing score (I think I already did for
> you on these lists), but it's all in Annex P of the DOCSIS spec
> <https://specification-search.cablelabs.com/CM-SP-MULPIv3.1>. and I'm
> trying to write a stand-alone document about it at the moment.
>
>
>
> Using a flow blacklist brings back the complexity that dualq is supposed
> to remove compared to flow-isolation by flow-queueing.
> It seems to me that the blacklist is actually necessary to make dualq work
> under the assumption that x is small,
>
> [BB] As above, the Linux implementation works and aligns incentives
> without Q Prot, which is merely an optional additional protection against
> accidents and malice.
>
> (and there's no flow black-list).
>
>
> because in the other cases the behavior
> of the dualq system is unspecified and likely subject to instabilities,
> i.e. potentially different kind of oscillations.
>
>
> I do find the tone of these emails rather disheartening. We've done all
> this work that we think is really cool. And all we get in return is
> criticism in an authoritative tone as if it is backed by experiments. But
> so far it is not. There seems to be a presumption that we are not
> professional and we are somehow not to be trusted to have done a sound job.
>
> Yes, I'm sure mistakes can be found in our work. But it would be nice if
> the tone of these emails could become more constructive. Possibly even some
> praise. There seems to be a presumption of disrespect that I'm not used to,
> and I would rather it stopped.
>
> Sorry for going silent recently - had too much backlog. I'm working my way
> backwards through this thread. Next I'll reply to Jake's email, which is,
> as always, perfectly constructive.
>
> Cheers
>
>
> Bob
>
> Luca
>
>
>
>
>
> On Tue, Jun 18, 2019 at 9:25 PM Holland, Jake <jholland@akamai.com> wrote:
>
>> Hi Bob and Luca,
>>
>> Thank you both for this discussion, I think it helped crystallize a
>> comment I hadn't figured out how to make yet, but was bothering me.
>>
>> I’m reading Luca’s question as asking about fixed-rate traffic that does
>> something like a cutoff or downshift if loss gets bad enough for long
>> enough, but is otherwise unresponsive.
>>
>> The dualq draft does discuss unresponsive traffic in 3 of the sub-
>> sections in section 4, but there's a point that seems sort of swept
>> aside without comment in the analysis to me.
>>
>> The referenced paper[1] from that section does examine the question
>> of sharing a link with unresponsive traffic in some detail, but the
>> analysis seems to bake in an assumption that there's a fixed amount
>> of unresponsive traffic, when in fact for a lot of the real-life
>> scenarios for unresponsive traffic (games, voice, and some of the
>> video conferencing) there's some app-level backpressure, in that
>> when the quality of experience goes low enough, the user (or a qoe
>> trigger in the app) will often change the traffic demand at a higher
>> layer than a congestion controller (by shutting off video, for
>> instance).
>>
>> The reason I mention it is because it seems like unresponsive
>> traffic has an incentive to mark L4S and get low latency.  It doesn't
>> hurt, since it's a fixed rate and not bandwidth-seeking, so it's
>> perfectly happy to massively underutilize the link. And until the
>> link gets overloaded it will no longer suffer delay when using the
>> low latency queue, whereas in the classic queue queuing delay provides
>> a noticeable degradation in the presence of competing traffic.
>>
>> I didn't see anywhere in the paper that tried to check the quality
>> of experience for the UDP traffic as non-responsive traffic approached
>> saturation, except by inference that loss in the classic queue will
>> cause loss in the LL queue as well.
>>
>> But letting unresponsive flows get away with pushing out more classic
>> traffic and removing the penalty that classic flows would give it seems
>> like a risk that would result in more use of this kind of unresponsive
>> traffic marking itself for the LL queue, since it just would get lower
>> latency almost up until overload.
>>
>> Many of the apps that send unresponsive traffic would benefit from low
>> latency and isolation from the classic traffic, so it seems a mistake
>> to claim there's no benefit, and it furthermore seems like there's
>> systematic pressures that would often push unresponsive apps into this
>> domain.
>>
>> If that line of reasoning holds up, the "rather specific" phrase in
>> section 4.1.1 of the dualq draft might not turn out to be so specific
>> after all, and could be seen as downplaying the risks.
>>
>> Best regards,
>> Jake
>>
>> [1] https://riteproject.files.wordpress.com/2018/07/thesis-henrste.pdf
>>
>> PS: This seems like a consequence of the lack of access control on
>> setting ECT(1), and maybe the queue protection function would address
>> it, so that's interesting to hear about.
>>
>> But I thought the whole point of dualq over fq was that fq state couldn't
>> scale properly in aggregating devices with enough expected flows sharing
>> a queue?  If this protection feature turns out to be necessary, would that
>> advantage be gone?  (Also: why would one want to turn this protection off
>> if it's available?)
>>
>>
>>
> _______________________________________________
> Ecn-sane mailing listEcn-sane@lists.bufferbloat.nethttps://lists.bufferbloat.net/listinfo/ecn-sane
>
>
> --
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>
>

[-- Attachment #2: Type: text/html, Size: 21092 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-05  8:51                     ` Jonathan Morton
@ 2019-07-08 10:26                       ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-08 20:55                         ` Holland, Jake
  0 siblings, 1 reply; 84+ messages in thread
From: De Schepper, Koen (Nokia - BE/Antwerp) @ 2019-07-08 10:26 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Bob Briscoe, ecn-sane, tsvwg

Hi Jonathan,

From your responses below, I have the impression you think this discussion is about FQ (flow/fair queuing). Fair queuing is used today where strict isolation is wanted, like between subscribers, and by extension (if possible and preferred) on a per transport layer flow, like in Fixed CPEs and Mobile networks. No discussion about this, and assuming we have and still will have an Internet which needs to support both common queues (like DualQ is intended) and FQs, I think the only discussion point is how we want to migrate to an Internet that supports optimally Low Latency.

This leads us to the question L4S or SCE?

If we want to support low latency for both common queues and FQs we "NEED" L4S, if we need to support it only for FQs, we "COULD" use SCE too, and if we want to force the whole Internet to use only FQs, we "SHOULD" use SCE 😉. If your goal is to force only FQs in the Internet, then let this be clear... I assume we need a discussion on another level in that case (and to be clear, it is not a goal I can support)...

Koen.

-----Original Message-----
From: Jonathan Morton <chromatix99@gmail.com> 
Sent: Friday, July 5, 2019 10:51 AM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
Cc: Bob Briscoe <ietf@bobbriscoe.net>; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

> On 5 Jul, 2019, at 9:46 am, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
>>> 2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.
> 
> Before jumping to another point, let's close down your original issue. Since you didn't mention, I assume that you agree with the following, right?
> 
>        "You cannot defeat a DualQ" (at least no more than a single Q)

I consider forcibly degrading DualQ to single-queue mode to be a defeat.  However…

>>> But that's exactly the problem.  Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.
> 
> With L4S a single queue can differentiate between Classic and L4S traffic. That's why it knows exactly how to treat the traffic. For Non-ECT and ECT(0) square the probability, and for ECT(1) don't square, and it works exactly like a DualQ, but then without the latency isolation. Both types get the same throughput, AND delay. See the PI2 paper, which is exactly about a single Q.

Okay, this is an important point: the real assertion is not that DualQ itself is needed for L4S to be safe on the Internet, but for differential AQM treatment to be present at the bottleneck.  Defeating DualQ only destroys L4S' latency advantage over "classic" traffic.  We might actually be making progress here!

> I agree you cannot isolate in a single Q, and this is why L4S is better than SCE, because it tells the AQM what to do, even if it has a single Q. SCE needs isolation, L4S not.

Devil's advocate time.  What if, instead of providing differential treatment WRT CE marking, PI2 instead applied both marking strategies simultaneously - the higher rate using SCE, and the lower rate using CE?  Classic traffic would see only the latter; L4S could use the former.

> We tried years ago similar things like needed for SCE, and found that it can't work. For throughput fairness you need the squared relation between the 2 signals, but with SCE, you need to apply both signals in parallel, because you don't know the sender type. 

Yes, that's exactly what we do - and it does work.

> 	- So either the sender needs to ignore CE if it gets SCE, or ignore SCE if you get CE. The first is dangerous if you have multiple bottlenecks, and the second is defeating the purpose of SCE. Any other combination leads to unfairness (double response).

This is a false dichotomy.  We quickly realised both of those options were unacceptable, and sought a third way.

SCE senders apply a reduced CE response when also responding to parallel SCE feedback, roughly in line with ABE, on the grounds that responding to SCE does some of the necessary reduction already.  The reduced response is still a Multiplicative Decrease, so it fits with normal TCP congestion control principles.

> 	- you separate the signals in queue dept, first applying SCE and later CE, as you originally proposed, but that results in starvation for SCE.

Yes, although this approach gives the best performance for SCE when used with flow isolation, or when all flows are known to be SCE-aware.  So we apply this strategy in those cases, and move the SCE marking function up to overlap CE marking specifically for single queues.

It has been suggested that single queue AQMs are rare in any case, but this approach covers that corner case.

> Add on top that SCE makes it impossible to use DualQ, as you cannot differentiate the traffic types.

SCE is designed around not *needing* to differentiate the traffic types.  Single queues have known disadvantages, and SCE doesn't worsen them.

Meanwhile, we have proposed LFQ to cover the DualQ use case.  I'd be interested in hearing a principled critique of it.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-08 10:26                       ` De Schepper, Koen (Nokia - BE/Antwerp)
@ 2019-07-08 20:55                         ` Holland, Jake
  2019-07-10  0:10                           ` Jonathan Morton
  2019-07-10  9:00                           ` De Schepper, Koen (Nokia - BE/Antwerp)
  0 siblings, 2 replies; 84+ messages in thread
From: Holland, Jake @ 2019-07-08 20:55 UTC (permalink / raw)
  To: De Schepper, Koen (Nokia - BE/Antwerp), Jonathan Morton; +Cc: ecn-sane, tsvwg

Hi Koen,

I'm a bit confused by this response.

I agree the key question for this discussion is about how best to
get low latency for the internet.

If I'm reading your message correctly, you're saying that under the
L4S approach for ECT(1), we can achieve it with either dualq or fq
at the bottleneck, but under the SCE approach we can only do it with
fq at the bottleneck.

(I think I understand and roughly agree with this claim, subject to
some caveats.  I just want to make sure I've got this right so
far, and that we agree that in neither case can very low latency be
achieved with a classic single queue with classic bandwidth-seeking
traffic.)

Are you saying that even if a scalable FQ can be implemented in
high-volume aggregated links at the same cost and difficulty as
dualq, there's a reason not to use FQ?  Is there a use case where
it's necessary to avoid strict isolation if strict isolation can be
accomplished as cheaply?

Also, I think if the SCE position is "low latency can only be
achieved with FQ", that's different from "forcing only FQ on the
internet", provided the fairness claims hold up, right?  (Classic
single queue AQMs may still have a useful place in getting
pretty-good latency in the cheapest hardware, like maybe PIE with
marking.)

Anyway, to me this discussion is about the tradeoffs between the
2 proposals.  It seems to me SCE has some safety advantages that
should not be thrown away lightly, so if the performance can be
made equivalent, it would be good to know about it before
committing the codepoint.

Best regards,
Jake

On 2019-07-08, 03:26, "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com> wrote:

    Hi Jonathan,
    
    From your responses below, I have the impression you think this discussion is about FQ (flow/fair queuing). Fair queuing is used today where strict isolation is wanted, like between subscribers, and by extension (if possible and preferred) on a per transport layer flow, like in Fixed CPEs and Mobile networks. No discussion about this, and assuming we have and still will have an Internet which needs to support both common queues (like DualQ is intended) and FQs, I think the only discussion point is how we want to migrate to an Internet that supports optimally Low Latency.
    
    This leads us to the question L4S or SCE?
    
    If we want to support low latency for both common queues and FQs we "NEED" L4S, if we need to support it only for FQs, we "COULD" use SCE too, and if we want to force the whole Internet to use only FQs, we "SHOULD" use SCE 😉. If your goal is to force only FQs in the Internet, then let this be clear... I assume we need a discussion on another level in that case (and to be clear, it is not a goal I can support)...
    
    Koen.
    
    
    -----Original Message-----
    From: Jonathan Morton <chromatix99@gmail.com> 
    Sent: Friday, July 5, 2019 10:51 AM
    To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
    Cc: Bob Briscoe <ietf@bobbriscoe.net>; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
    Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
    
    > On 5 Jul, 2019, at 9:46 am, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
    > 
    >>> 2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.
    > 
    > Before jumping to another point, let's close down your original issue. Since you didn't mention, I assume that you agree with the following, right?
    > 
    >        "You cannot defeat a DualQ" (at least no more than a single Q)
    
    I consider forcibly degrading DualQ to single-queue mode to be a defeat.  However…
    
    >>> But that's exactly the problem.  Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.
    > 
    > With L4S a single queue can differentiate between Classic and L4S traffic. That's why it knows exactly how to treat the traffic. For Non-ECT and ECT(0) square the probability, and for ECT(1) don't square, and it works exactly like a DualQ, but then without the latency isolation. Both types get the same throughput, AND delay. See the PI2 paper, which is exactly about a single Q.
    
    Okay, this is an important point: the real assertion is not that DualQ itself is needed for L4S to be safe on the Internet, but for differential AQM treatment to be present at the bottleneck.  Defeating DualQ only destroys L4S' latency advantage over "classic" traffic.  We might actually be making progress here!
    
    > I agree you cannot isolate in a single Q, and this is why L4S is better than SCE, because it tells the AQM what to do, even if it has a single Q. SCE needs isolation, L4S not.
    
    Devil's advocate time.  What if, instead of providing differential treatment WRT CE marking, PI2 instead applied both marking strategies simultaneously - the higher rate using SCE, and the lower rate using CE?  Classic traffic would see only the latter; L4S could use the former.
    
    > We tried years ago similar things like needed for SCE, and found that it can't work. For throughput fairness you need the squared relation between the 2 signals, but with SCE, you need to apply both signals in parallel, because you don't know the sender type. 
    
    Yes, that's exactly what we do - and it does work.
    
    > 	- So either the sender needs to ignore CE if it gets SCE, or ignore SCE if you get CE. The first is dangerous if you have multiple bottlenecks, and the second is defeating the purpose of SCE. Any other combination leads to unfairness (double response).
    
    This is a false dichotomy.  We quickly realised both of those options were unacceptable, and sought a third way.
    
    SCE senders apply a reduced CE response when also responding to parallel SCE feedback, roughly in line with ABE, on the grounds that responding to SCE does some of the necessary reduction already.  The reduced response is still a Multiplicative Decrease, so it fits with normal TCP congestion control principles.
    
    > 	- you separate the signals in queue dept, first applying SCE and later CE, as you originally proposed, but that results in starvation for SCE.
    
    Yes, although this approach gives the best performance for SCE when used with flow isolation, or when all flows are known to be SCE-aware.  So we apply this strategy in those cases, and move the SCE marking function up to overlap CE marking specifically for single queues.
    
    It has been suggested that single queue AQMs are rare in any case, but this approach covers that corner case.
    
    > Add on top that SCE makes it impossible to use DualQ, as you cannot differentiate the traffic types.
    
    SCE is designed around not *needing* to differentiate the traffic types.  Single queues have known disadvantages, and SCE doesn't worsen them.
    
    Meanwhile, we have proposed LFQ to cover the DualQ use case.  I'd be interested in hearing a principled critique of it.
    
     - Jonathan Morton
    
    


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-08 20:55                         ` Holland, Jake
@ 2019-07-10  0:10                           ` Jonathan Morton
  2019-07-10  9:00                           ` De Schepper, Koen (Nokia - BE/Antwerp)
  1 sibling, 0 replies; 84+ messages in thread
From: Jonathan Morton @ 2019-07-10  0:10 UTC (permalink / raw)
  To: Holland, Jake; +Cc: De Schepper, Koen (Nokia - BE/Antwerp), ecn-sane, tsvwg

[-- Attachment #1: Type: text/plain, Size: 2866 bytes --]

> On 8 Jul, 2019, at 11:55 pm, Holland, Jake <jholland@akamai.com> wrote:
> 
> Also, I think if the SCE position is "low latency can only be
> achieved with FQ", that's different from "forcing only FQ on the
> internet", provided the fairness claims hold up, right?  (Classic
> single queue AQMs may still have a useful place in getting
> pretty-good latency in the cheapest hardware, like maybe PIE with
> marking.)

In support of this viewpoint, here are some illustrative time-series graphs showing SCE behaviour in a variety of contexts.  These are all simple two-flow tests plus a sparse latency probe flow, conducted using Flent, over a 50Mbps, 80ms RTT path under lab conditions.

First let's get the FQ case out of the way, with Reno-SCE competing against plain old Reno.  Here you can see Reno's classic sawtooth, while FQ keeps the latency of sparse flows sharing the link low; the novelty is that Reno-SCE is successfully using almost all of the capacity left on the table by plain Reno's sawtooth.  This is basically ideal behaviour, enabled by FQ.

If we then disable FQ and run the same test, we find that Reno-SCE yields very politely to plain Reno, again using only leftover capacity.  From earlier comments, I gather that a similar graph was seen by the L4S team at some point in their development.  Here we can see some small delay spikes, just before AQM activates to cut the plain Reno flow down.

Conversely, if we begin the SCE marking ramp only when CE marking also begins, we get good fairness between the two flows, in the same manner as with a conventional AQM - because both flows are mostly receiving only conventional AQM signals.  The delay spikes also reflect that fact, and a significant amount of capacity goes unused.  I gather that this scenario was also approximately seen during L4S development.

Our solution - which required only a few days' thought and calculation to define - is to make the SCE ramp straddle the AQM activation threshold, for single-queue situations only.  The precise extent of straddling is configurable to suit different network situations; here is the one that works best for this scenario.  Fairness between the two flows remains good; mostly the CE marks are going to the plain Reno flow, while the SCE flow is using the remaining capacity fairly effectively.  Notice however that the delay plateaus due to the weakened SCE signalling:

Compare this to single-queue SCE vs SCE performance in a single queue, using the basic SCE ramp which lies entirely below the AQM threshold:

And with the straddling ramp:

And with the SCE ramp entirely above the threshold:

And, finally, the *real* ideal situation - SCE vs SCE with FQ:

I hope this reassures various people that we do, in fact, know what we're doing over here.

 - Jonathan Morton

[-- Attachment #2.1: Type: text/html, Size: 5164 bytes --]

[-- Attachment #2.2: PastedGraphic-1.png --]
[-- Type: image/png, Size: 134772 bytes --]

[-- Attachment #2.3: PastedGraphic-2.png --]
[-- Type: image/png, Size: 142683 bytes --]

[-- Attachment #2.4: PastedGraphic-3.png --]
[-- Type: image/png, Size: 165646 bytes --]

[-- Attachment #2.5: PastedGraphic-4.png --]
[-- Type: image/png, Size: 171245 bytes --]

[-- Attachment #2.6: PastedGraphic-5.png --]
[-- Type: image/png, Size: 171513 bytes --]

[-- Attachment #2.7: PastedGraphic-6.png --]
[-- Type: image/png, Size: 172226 bytes --]

[-- Attachment #2.8: PastedGraphic-7.png --]
[-- Type: image/png, Size: 187005 bytes --]

[-- Attachment #2.9: PastedGraphic-8.png --]
[-- Type: image/png, Size: 140303 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-08 20:55                         ` Holland, Jake
  2019-07-10  0:10                           ` Jonathan Morton
@ 2019-07-10  9:00                           ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-10 13:14                             ` Dave Taht
  2019-07-17 22:40                             ` Sebastian Moeller
  1 sibling, 2 replies; 84+ messages in thread
From: De Schepper, Koen (Nokia - BE/Antwerp) @ 2019-07-10  9:00 UTC (permalink / raw)
  To: Holland, Jake, Jonathan Morton; +Cc: ecn-sane, tsvwg

Hi Jake,

>> I agree the key question for this discussion is about how best to get low latency for the internet.
Thanks

>> under the L4S approach for ECT(1), we can achieve it with either dualq or fq at the bottleneck, but under the SCE approach we can only do it with fq at the bottleneck.
Correct

>> we agree that in neither case can very low latency be achieved with a classic single queue with classic bandwidth-seeking traffic
Correct, not without compromising latency for Prague or throughput/utilization/stability/drop for Reno/Cubic

>> Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ?

FQ for "per-user" isolation in access equipment has clearly an extra cost, not? If we need to implement FQ "per-flow" on top, we need 2 levels of FQ (per-user and per-user-flow, so from thousands to millions of queues). Also, I haven’t seen DC switches coming with an FQ AQM...

>> Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?

Even if as cheaply, as long as there is no reliable flow identification, it clearly has side effects. Many homeworkers are using a VPN tunnel, which is only one flow encapsulating maybe dozens. Drop and ECN (if implemented correctly) are tunnel agnostic. Also how flows are identified might evolve (new transport protocols, encapsulations, ...?). Also if strict flow isolation could be done correctly, it has additional issues related to missed scheduling opportunities, besides it is a hard-coded throughput policy (and even mice size = 1 packet). On the other hand, flow isolation has benefits too, so hard to rule out one of them, not?

>> Also, I think if the SCE position is "low latency can only be achieved with FQ", that's different from "forcing only FQ on the internet", provided the fairness claims hold up, right?  (Classic single queue AQMs may still have a useful place in getting pretty-good latency in the cheapest hardware, like maybe PIE with marking.)

Are you saying that the real good stuff can only be for FQ 😉? Fairness between a flow getting only one signal and another getting 2 is an issue, right? The one with the 2 signals can either ignore one, listen half to both, or try to smooth both signals to find the average loudest one? Again safety or performance needs to be chosen. PIE or PI2 is optimal for Classic traffic and good to couple congestion to Prague traffic, but Prague traffic needs a separate Q and an immediate step to get the "good stuff" working. Otherwise it will also overshoot, respond sluggish, etc...

>> Anyway, to me this discussion is about the tradeoffs between the 2 proposals.  It seems to me SCE has some safety advantages that should not be thrown away lightly, 

I appreciate the efforts of trying to improve L4S, but nobody working on L4S for years now see a way that SCE can work on a non-FQ system. For me (and I think many others) it is a no-go to only support FQ. Unfortunately we only have half a bit free, and we need to choose how to use it. Would you choose for the existing ECN switches that cannot be upgraded (are there any?) or for all future non-FQ systems.

>> so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.

The performance in FQ is clearly equivalent, but for a common-Q behavior, only L4S can work. As far as I understood the SCE-LFQ proposal is actually a slower FQ implementation (an FQ in DualQ disguise 😉), so I think not really a better alternative than pure FQ. Also its single AQM on the bulk queue will undo any isolation, as a coupled AQM is stronger than any scheduler, including FQ. Don't underestimate the power of congestion control 😉. The ultimate proof is in the DualQ Coupled AQM where congestion control can beat a priority scheduler. If you want FQ to have effect, you need to have an AQM per FQ... The authors will notice this when they implement an AQM on top of it. I saw the current implementation works only in taildrop mode. But I think it is very good that the SCE proponents are very motivated to try with this speed to improve L4S. I'm happy to be proven wrong, but up to now I don't see any promising improvements to justify delay for L4S, only the above alternative compromise. Agreed that we can continue exploring alternative proposal in parallel though.

Koen.

-----Original Message-----
From: Holland, Jake <jholland@akamai.com> 
Sent: Monday, July 8, 2019 10:56 PM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>; Jonathan Morton <chromatix99@gmail.com>
Cc: ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

Hi Koen,

I'm a bit confused by this response.

I agree the key question for this discussion is about how best to get low latency for the internet.

If I'm reading your message correctly, you're saying that under the L4S approach for ECT(1), we can achieve it with either dualq or fq at the bottleneck, but under the SCE approach we can only do it with fq at the bottleneck.

(I think I understand and roughly agree with this claim, subject to some caveats.  I just want to make sure I've got this right so far, and that we agree that in neither case can very low latency be achieved with a classic single queue with classic bandwidth-seeking
traffic.)

Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ?  Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?

Also, I think if the SCE position is "low latency can only be achieved with FQ", that's different from "forcing only FQ on the internet", provided the fairness claims hold up, right?  (Classic single queue AQMs may still have a useful place in getting pretty-good latency in the cheapest hardware, like maybe PIE with
marking.)

Anyway, to me this discussion is about the tradeoffs between the
2 proposals.  It seems to me SCE has some safety advantages that should not be thrown away lightly, so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.

Best regards,
Jake

On 2019-07-08, 03:26, "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com> wrote:

    Hi Jonathan,

    From your responses below, I have the impression you think this discussion is about FQ (flow/fair queuing). Fair queuing is used today where strict isolation is wanted, like between subscribers, and by extension (if possible and preferred) on a per transport layer flow, like in Fixed CPEs and Mobile networks. No discussion about this, and assuming we have and still will have an Internet which needs to support both common queues (like DualQ is intended) and FQs, I think the only discussion point is how we want to migrate to an Internet that supports optimally Low Latency.

    This leads us to the question L4S or SCE?

    If we want to support low latency for both common queues and FQs we "NEED" L4S, if we need to support it only for FQs, we "COULD" use SCE too, and if we want to force the whole Internet to use only FQs, we "SHOULD" use SCE 😉. If your goal is to force only FQs in the Internet, then let this be clear... I assume we need a discussion on another level in that case (and to be clear, it is not a goal I can support)...

    Koen.

    -----Original Message-----
    From: Jonathan Morton <chromatix99@gmail.com> 
    Sent: Friday, July 5, 2019 10:51 AM
    To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
    Cc: Bob Briscoe <ietf@bobbriscoe.net>; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
    Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

    > On 5 Jul, 2019, at 9:46 am, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
    > 
    >>> 2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.
    > 
    > Before jumping to another point, let's close down your original issue. Since you didn't mention, I assume that you agree with the following, right?
    > 
    >        "You cannot defeat a DualQ" (at least no more than a single Q)

    I consider forcibly degrading DualQ to single-queue mode to be a defeat.  However…

    >>> But that's exactly the problem.  Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.
    > 
    > With L4S a single queue can differentiate between Classic and L4S traffic. That's why it knows exactly how to treat the traffic. For Non-ECT and ECT(0) square the probability, and for ECT(1) don't square, and it works exactly like a DualQ, but then without the latency isolation. Both types get the same throughput, AND delay. See the PI2 paper, which is exactly about a single Q.

    Okay, this is an important point: the real assertion is not that DualQ itself is needed for L4S to be safe on the Internet, but for differential AQM treatment to be present at the bottleneck.  Defeating DualQ only destroys L4S' latency advantage over "classic" traffic.  We might actually be making progress here!

    > I agree you cannot isolate in a single Q, and this is why L4S is better than SCE, because it tells the AQM what to do, even if it has a single Q. SCE needs isolation, L4S not.

    Devil's advocate time.  What if, instead of providing differential treatment WRT CE marking, PI2 instead applied both marking strategies simultaneously - the higher rate using SCE, and the lower rate using CE?  Classic traffic would see only the latter; L4S could use the former.

    > We tried years ago similar things like needed for SCE, and found that it can't work. For throughput fairness you need the squared relation between the 2 signals, but with SCE, you need to apply both signals in parallel, because you don't know the sender type. 

    Yes, that's exactly what we do - and it does work.

    > 	- So either the sender needs to ignore CE if it gets SCE, or ignore SCE if you get CE. The first is dangerous if you have multiple bottlenecks, and the second is defeating the purpose of SCE. Any other combination leads to unfairness (double response).

    This is a false dichotomy.  We quickly realised both of those options were unacceptable, and sought a third way.

    SCE senders apply a reduced CE response when also responding to parallel SCE feedback, roughly in line with ABE, on the grounds that responding to SCE does some of the necessary reduction already.  The reduced response is still a Multiplicative Decrease, so it fits with normal TCP congestion control principles.

    > 	- you separate the signals in queue dept, first applying SCE and later CE, as you originally proposed, but that results in starvation for SCE.

    Yes, although this approach gives the best performance for SCE when used with flow isolation, or when all flows are known to be SCE-aware.  So we apply this strategy in those cases, and move the SCE marking function up to overlap CE marking specifically for single queues.

    It has been suggested that single queue AQMs are rare in any case, but this approach covers that corner case.

    > Add on top that SCE makes it impossible to use DualQ, as you cannot differentiate the traffic types.

    SCE is designed around not *needing* to differentiate the traffic types.  Single queues have known disadvantages, and SCE doesn't worsen them.

    Meanwhile, we have proposed LFQ to cover the DualQ use case.  I'd be interested in hearing a principled critique of it.

     - Jonathan Morton

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-10  9:00                           ` De Schepper, Koen (Nokia - BE/Antwerp)
@ 2019-07-10 13:14                             ` Dave Taht
  2019-07-10 17:32                               ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-17 22:40                             ` Sebastian Moeller
  1 sibling, 1 reply; 84+ messages in thread
From: Dave Taht @ 2019-07-10 13:14 UTC (permalink / raw)
  To: De Schepper, Koen (Nokia - BE/Antwerp)
  Cc: Holland, Jake, Jonathan Morton, ecn-sane, tsvwg

[-- Attachment #1: Type: text/plain, Size: 18327 bytes --]

I keep trying to stay out of this conversation being yellow about ecn in
the first place, in any form. I would like to stress that
ecn-sane was formed by the group of folk that were concerned about having
accidentally masterminded the worlds biggest fq + aqm
deployment, and the only one with ecn support, which happens

In the case of wifi, the deployment is now in the 10s of millions, and
doing hordes of good - latencies measured in the 10s of ms rather than 10s
of seconds.

I have seen no numbers on how well l4s will make it over to wifi as yet,
nor any discussion, and I would rather like more pieces of the l4s solution
to land sufficiently integrated for testing using tools like flent, and
over far more than just a isochronous mac layer like dsl or docsis. Given
the size of a txop in wifi (5.3ms), and how far back we have
to put the AQM and FQ components today (2 txops), I don't think many of
either SCE or L4S concepts will work well on wifi... but in general
I prefer not to make assertions or assumptions until real-world testing can
commence.

I am presently at the battlemesh conference trying to get a bit of
real-world data.

A big problem wifi and 3g have is too many retransmits at the mac layer,
not congestion controlled. Any signalling gets there late, and it's
better to drop a bunch of packets when you hit a bunch of retransmits, in
general. IMHO.

On Wed, Jul 10, 2019 at 2:05 AM De Schepper, Koen (Nokia - BE/Antwerp) <
koen.de_schepper@nokia-bell-labs.com> wrote:

> Hi Jake,
>
> >> I agree the key question for this discussion is about how best to get
> low latency for the internet.
> Thanks
>
> >> under the L4S approach for ECT(1), we can achieve it with either dualq
> or fq at the bottleneck, but under the SCE approach we can only do it with
> fq at the bottleneck.
> Correct
>
> >> we agree that in neither case can very low latency be achieved with a
> classic single queue with classic bandwidth-seeking traffic
> Correct, not without compromising latency for Prague or
> throughput/utilization/stability/drop for Reno/Cubic
>
> >> Are you saying that even if a scalable FQ can be implemented in
> high-volume aggregated links at the same cost and difficulty as dualq,
> there's a reason not to use FQ?
>


> FQ for "per-user" isolation in access equipment has clearly an extra cost,
> not?


I've argued in the past that hashing is a bog standard part of most network
cards and switches already.

"extra cost" should be measured by actual measurements. Usually when you do
those, you find it's another variable entirely costing you the most
cpu/circuits.


If we need to implement FQ "per-flow" on top, we need 2 levels of FQ
> (per-user and per-user-flow, so from thousands to millions of queues).
> Also, I haven’t seen DC switches coming with an FQ AQM...
>

Meh. Most of the time the instantaneous number of queues for some
measurement of instantenious is in the low hundreds for rates up to
10GigE. We don't have a lot of data for bigger pipes.

I haven't seen any DC switches with support anything other than RED or AFD,
and DC folk overprovision anyway.



> >> Is there a use case where it's necessary to avoid strict isolation if
> strict isolation can be accomplished as cheaply?
>
> Even if as cheaply, as long as there is no reliable flow identification,
> it clearly has side effects. Many homeworkers are using a VPN tunnel, which
> is only one flow encapsulating maybe dozens.


This is true. For a local endpoint for a vpn from a router fq_codel long
ago gained support for doing the hashing & FQ before entering the tunnel.

This works only with in-kernel ipsec transports although I've been trying
to get it added to wireguard for a long time now.

 It of course doesn't apply to the whole path, but when applied at the home
gateway router (bottleneck link), works rather well.

Here are two examples of that mechanism in play.

http://www.taht.net/~d/ipsec_fq_codel/oldqos.png

http://www.taht.net/~d/ipsec_fq_codel/newqos.png

Drop and ECN (if implemented correctly) are tunnel agnostic. Also how flows
> are identified might evolve (new transport protocols, encapsulations,
> ...?). Also if strict flow isolation could be done correctly, it has
> additional issues related to missed scheduling opportunities, besides it is
> a hard-coded throughput policy (and even mice size = 1 packet). On the
> other hand, flow isolation has benefits too, so hard to rule out one of
> them, not?
>

The packet dissector in linux is quite robust, the one in BSD, less so.

A counterpoint to the entire ECN debate (l4s or sce) that I'd like to make
at more length is that it can and does hurt non ecn'd flows, particularly
at lower
bandwidths when you cannot reduce cwnd below 2 and the link is thus
saturated. ARP can starve. ISIS fails. batman - lacking an IP header -  can
starve.
babel, lacking ecn support can start to fail. And so on.


> >> Also, I think if the SCE position is "low latency can only be achieved
> with FQ", that's different from "forcing only FQ on the internet", provided
> the fairness claims hold up, right?  (Classic single queue AQMs may still
> have a useful place in getting pretty-good latency in the cheapest
> hardware, like maybe PIE with marking.)
>
> Are you saying that the real good stuff can only be for FQ 😉? Fairness
> between a flow getting only one signal and another getting 2 is an issue,
> right? The one with the 2 signals can either ignore one, listen half to
> both, or try to smooth both signals to find the average loudest one? Again
> safety or performance needs to be chosen. PIE or PI2 is optimal for Classic
> traffic and good to couple congestion to Prague traffic, but Prague traffic
> needs a separate Q and an immediate step to get the "good stuff" working.
> Otherwise it will also overshoot, respond sluggish, etc...
>
> >> Anyway, to me this discussion is about the tradeoffs between the 2
> proposals.  It seems to me SCE has some safety advantages that should not
> be thrown away lightly,
>
> I appreciate the efforts of trying to improve L4S, but nobody working on
> L4S for years now see a way that SCE can work on a non-FQ system. For me
> (and I think many others) it is a no-go to only support FQ. Unfortunately
> we only have half a bit free, and we need to choose how to use it. Would
> you choose for the existing ECN switches that cannot be upgraded (are there
> any?) or for all future non-FQ systems.
>
>


> >> so if the performance can be made equivalent, it would be good to know
> about it before committing the codepoint.
>
> The performance in FQ is clearly equivalent,


Huh?


> but for a common-Q behavior, only L4S can work. As far as I understood the
> SCE-LFQ proposal is actually a slower FQ implementation (an FQ in DualQ
> disguise 😉), so I think not really a better alternative than pure FQ. Also
> its single AQM on the bulk queue will undo any isolation, as a coupled AQM
> is stronger than any scheduler, including FQ. Don't underestimate the power
> of congestion control 😉. The ultimate proof is in the DualQ Coupled AQM
> where congestion control can beat a priority scheduler. If you want FQ to
> have effect, you need to have an AQM per FQ... The authors will notice this
> when they implement an AQM on top of it. I saw the current implementation
> works only in taildrop mode. But I think it is very good that the SCE
> proponents are very motivated to try with this speed to improve L4S. I'm
> happy to be proven wrong, but up to now I don't see any promising
> improvements to justify delay for L4S, only the above alternative
> compromise. Agreed that we can continue exploring alternative proposal in
> parallel though.
>
>
I cannot parse this extreme set of assumptions and declarations. "taildrop
mode??"

As for promising improvements in general, there is a 7 year old deployment,
running code,  of something that we've show to work well in a variety
of network scenarios, with 10x-100x improvements in network latency, at
roughly 100% in linux overall, widely used in wifi and in many, many
SQM/Qos systems and containers, with basic rfc3168 ecn enabled... and a
proposal for a backward compatible way of enhancing that still more being
explored. The embedded hardware pipeline
for future implementations of this tech is full - it would take 3+ years to
make a course change....

vs something that still has no real-world deployment data at all, that
changes the definition of ecn, that has not a public ns2 or n3 model (?),
no testing aside from a few
very specific benchmarks, and so on...

I do hope the coding competition heats up more, with more running code that
others can explore, most of all. I long ago tired of the endless debates,
as everyone knows,
and I do kind of wish I wasn't burning lunch on this email instead of
setting up a test at battlemesh.

I note also that my leanings - in a fq_codel'd world, were it to stay such,
was to enable more RTT based CCs  like BBRto work more often in an RTT
mode, and thus
we start - originally to me, the SCE idea was a way to trigger a faster
switch to congestion avoidance - as most of my captures taken from over
used APs in
restaurants, cafes, train stations etc shows stuff in slow start to be the
biggest problem - and, regardless, an initial CE, right now, is a strong
indicator that fq-codel is present, and
a RTT based tcp can thus start to happen, and a good one, would not have
many future marks after the first.

A big difference in our outlooks, I guess, is that my viewpoint is that
most of the congestion is at the edges of the network and I don't care all
that
much about big iron or switches, and I don't think either can afford much
aqm tech at all in the first place. Not dual queues, not fqs.

Were L4S not to deploy (using ect1 as a marker - btw, I think CS5 might be
a better candidate as it goes into the wifi VI queue), and a
fq_pie/fq_codel/sch_cake
world to remain predominant, well, we might get somewhere, faster, where it
counted.

Koen.
>
>
> -----Original Message-----
> From: Holland, Jake <jholland@akamai.com>
> Sent: Monday, July 8, 2019 10:56 PM
> To: De Schepper, Koen (Nokia - BE/Antwerp) <
> koen.de_schepper@nokia-bell-labs.com>; Jonathan Morton <
> chromatix99@gmail.com>
> Cc: ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
> Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
>
> Hi Koen,
>
> I'm a bit confused by this response.
>
> I agree the key question for this discussion is about how best to get low
> latency for the internet.
>
> If I'm reading your message correctly, you're saying that under the L4S
> approach for ECT(1), we can achieve it with either dualq or fq at the
> bottleneck, but under the SCE approach we can only do it with fq at the
> bottleneck.
>
> (I think I understand and roughly agree with this claim, subject to some
> caveats.  I just want to make sure I've got this right so far, and that we
> agree that in neither case can very low latency be achieved with a classic
> single queue with classic bandwidth-seeking
> traffic.)
>
> Are you saying that even if a scalable FQ can be implemented in
> high-volume aggregated links at the same cost and difficulty as dualq,
> there's a reason not to use FQ?  Is there a use case where it's necessary
> to avoid strict isolation if strict isolation can be accomplished as
> cheaply?
>
> Also, I think if the SCE position is "low latency can only be achieved
> with FQ", that's different from "forcing only FQ on the internet", provided
> the fairness claims hold up, right?  (Classic single queue AQMs may still
> have a useful place in getting pretty-good latency in the cheapest
> hardware, like maybe PIE with
> marking.)
>
> Anyway, to me this discussion is about the tradeoffs between the
> 2 proposals.  It seems to me SCE has some safety advantages that should
> not be thrown away lightly, so if the performance can be made equivalent,
> it would be good to know about it before committing the codepoint.
>
> Best regards,
> Jake
>
> On 2019-07-08, 03:26, "De Schepper, Koen (Nokia - BE/Antwerp)" <
> koen.de_schepper@nokia-bell-labs.com> wrote:
>
>     Hi Jonathan,
>
>     From your responses below, I have the impression you think this
> discussion is about FQ (flow/fair queuing). Fair queuing is used today
> where strict isolation is wanted, like between subscribers, and by
> extension (if possible and preferred) on a per transport layer flow, like
> in Fixed CPEs and Mobile networks. No discussion about this, and assuming
> we have and still will have an Internet which needs to support both common
> queues (like DualQ is intended) and FQs, I think the only discussion point
> is how we want to migrate to an Internet that supports optimally Low
> Latency.
>
>     This leads us to the question L4S or SCE?
>
>     If we want to support low latency for both common queues and FQs we
> "NEED" L4S, if we need to support it only for FQs, we "COULD" use SCE too,
> and if we want to force the whole Internet to use only FQs, we "SHOULD" use
> SCE 😉. If your goal is to force only FQs in the Internet, then let this be
> clear... I assume we need a discussion on another level in that case (and
> to be clear, it is not a goal I can support)...
>
>     Koen.
>
>
>     -----Original Message-----
>     From: Jonathan Morton <chromatix99@gmail.com>
>     Sent: Friday, July 5, 2019 10:51 AM
>     To: De Schepper, Koen (Nokia - BE/Antwerp) <
> koen.de_schepper@nokia-bell-labs.com>
>     Cc: Bob Briscoe <ietf@bobbriscoe.net>; ecn-sane@lists.bufferbloat.net;
> tsvwg@ietf.org
>     Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
>
>     > On 5 Jul, 2019, at 9:46 am, De Schepper, Koen (Nokia - BE/Antwerp) <
> koen.de_schepper@nokia-bell-labs.com> wrote:
>     >
>     >>> 2: DualQ can be defeated by an adversary, destroying its ability
> to isolate L4S traffic.
>     >
>     > Before jumping to another point, let's close down your original
> issue. Since you didn't mention, I assume that you agree with the
> following, right?
>     >
>     >        "You cannot defeat a DualQ" (at least no more than a single Q)
>
>     I consider forcibly degrading DualQ to single-queue mode to be a
> defeat.  However…
>
>     >>> But that's exactly the problem.  Single queue AQM does not isolate
> L4S traffic from "classic" traffic, so the latter suffers from the former's
> relative aggression in the face of AQM activity.
>     >
>     > With L4S a single queue can differentiate between Classic and L4S
> traffic. That's why it knows exactly how to treat the traffic. For Non-ECT
> and ECT(0) square the probability, and for ECT(1) don't square, and it
> works exactly like a DualQ, but then without the latency isolation. Both
> types get the same throughput, AND delay. See the PI2 paper, which is
> exactly about a single Q.
>
>     Okay, this is an important point: the real assertion is not that DualQ
> itself is needed for L4S to be safe on the Internet, but for differential
> AQM treatment to be present at the bottleneck.  Defeating DualQ only
> destroys L4S' latency advantage over "classic" traffic.  We might actually
> be making progress here!
>
>     > I agree you cannot isolate in a single Q, and this is why L4S is
> better than SCE, because it tells the AQM what to do, even if it has a
> single Q. SCE needs isolation, L4S not.
>
>     Devil's advocate time.  What if, instead of providing differential
> treatment WRT CE marking, PI2 instead applied both marking strategies
> simultaneously - the higher rate using SCE, and the lower rate using CE?
> Classic traffic would see only the latter; L4S could use the former.
>
>     > We tried years ago similar things like needed for SCE, and found
> that it can't work. For throughput fairness you need the squared relation
> between the 2 signals, but with SCE, you need to apply both signals in
> parallel, because you don't know the sender type.
>
>     Yes, that's exactly what we do - and it does work.
>
>     >   - So either the sender needs to ignore CE if it gets SCE, or
> ignore SCE if you get CE. The first is dangerous if you have multiple
> bottlenecks, and the second is defeating the purpose of SCE. Any other
> combination leads to unfairness (double response).
>
>     This is a false dichotomy.  We quickly realised both of those options
> were unacceptable, and sought a third way.
>
>     SCE senders apply a reduced CE response when also responding to
> parallel SCE feedback, roughly in line with ABE, on the grounds that
> responding to SCE does some of the necessary reduction already.  The
> reduced response is still a Multiplicative Decrease, so it fits with normal
> TCP congestion control principles.
>
>     >   - you separate the signals in queue dept, first applying SCE and
> later CE, as you originally proposed, but that results in starvation for
> SCE.
>
>     Yes, although this approach gives the best performance for SCE when
> used with flow isolation, or when all flows are known to be SCE-aware.  So
> we apply this strategy in those cases, and move the SCE marking function up
> to overlap CE marking specifically for single queues.
>
>     It has been suggested that single queue AQMs are rare in any case, but
> this approach covers that corner case.
>
>     > Add on top that SCE makes it impossible to use DualQ, as you cannot
> differentiate the traffic types.
>
>     SCE is designed around not *needing* to differentiate the traffic
> types.  Single queues have known disadvantages, and SCE doesn't worsen them.
>
>     Meanwhile, we have proposed LFQ to cover the DualQ use case.  I'd be
> interested in hearing a principled critique of it.
>
>      - Jonathan Morton
>
>
>
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane
>


-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

[-- Attachment #2: Type: text/html, Size: 22205 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] Comments on L4S drafts
  2019-06-19 14:11     ` Bob Briscoe
@ 2019-07-10 13:55       ` Holland, Jake
  0 siblings, 0 replies; 84+ messages in thread
From: Holland, Jake @ 2019-07-10 13:55 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: tsvwg, ecn-sane

Hi Bob,

Responses to a few points inline.  And sorry for the slow response
on this message, it's been a busy time.

From: Bob Briscoe <ietf@bobbriscoe.net>
Date: 2019-06-19 at 07:11

[BB] Understood. I was concerned that I was demolishing your idea in public, and I was trying to thank you for being willing to put up a strawman. 

[JH] My pleasure :)  I just wanted to raise the point in hopes
it would help avoid heading down a wrong path there.

Indeed, FQ itself screwed up the work on background transport protocols, and many other plans for novel applications of unequal throughput (I'll start a separate thread on that).

[JH] That's interesting, and I look forward to hearing more about it.
But I'm surprised if this isn't addressable in some other, perhaps
better ways... Between RFC 8622 and a sender's ability to just back off
more aggressively or persistently under congestion signals, this doesn't
seem as intractable as the fairness problem that FQ solves, so my first
instinct is that this is a good trade.  But you seem to have thought
more about it, so I'm interested to hear the counterexamples.

Don't worry. Classic ECN fall-back is on the ToDo list. I just didn't want to do it unless we have to, cos I prefer simplicity.

[JH] My concern is that this seems like a sender-side safety problem
that might not be discovered right away, lacking extensive traffic,
but makes it unsafe to do otherwise reasonable things (like PIE with
ECN) that might seem like they could be a pretty good idea.  If this
gets rolled out without the safety valves, it seems like the kind of
thing that would blow out later, with the problem being traffic from
systems that haven't been touched in years, under the right kind of
pressure, which might not be all that impossible.

And thus it seems worth raising as problematic ossification that's
worth avoiding if possible, rather than making existing deployed code
obsolete without properly deprecating it.

3. One more meta-point: the sales-y language makes the drafts hard to
read for me
[BB] if there are any you want changed, pls call them out.

Thanks for inviting the critique, I haven't been quite sure how to
approach this.  I hope this is received in the spirit it's offered: as an
attempt to improve the document text to make it easier for future readers,
especially potential future implementors.

I'm going to have to do this in sections, because it seems to me there's
quite a large density of advocacy and unquantified hype-increasing
value judgements for an RFC, and that it's spread pretty widely.  If you
want this kind of review for the whole thing, it may take a while, but
hopefully I can give a general idea that could be applied to other
sections as well.  But if needed, I'll try to raise what I can elsewhere
too, my thinly-stretched time permitting.

Also: I recognize that there's some editorial discretion here that can
reasonably differ, and I don't insist that all the instances I'll try to
raise be stripped completely, but it seems to me there's a lot of
text--maybe as much as half or more of the paragraphs in these docs--
with issues like the ones I'll mention here.  My experience reading
these docs was characterized by frequently having to push down the
skepticism I get when someone's trying to sell me something, and
re-focus on the tech.  So this section is based on the assumption that
when an implementor is trying to read an RFC, there's negative value on
exposition with even a little tendency toward hype.

I'll start with just the abstract for l4s-arch:

Overall, I think this can be summarized a lot more concisely, and would
read better if the benefits were outlined more as goals, and less as
speculative claims, and if a lot of the exposition was cut, or moved to
the introduction in cases that it's necessary.

Here's a straw-man suggestion for your consideration, please feel free
to use it or adjust as needed:

<suggested_abstract_text>
This document describes an architecture for "Low Latency, Low
Loss, Scalable throughput" (L4S), a new Internet service targeted
to replace best-effort transport service eventually, via
incremental deployment.

L4S-capable senders rely on congestion signaling to eliminate
queuing delay and loss while maintaining high link utilization
during sustained transfers.  L4S-capable bottlenecks rely on
sender response and packet classification to maintain a low queue
occupancy and provide preferential forwarding for L4S traffic.

Bottleneck link capacity is shared with non-L4S traffic, providing
low loss and low latency to L4S traffic, but with inter-class
fairness roughly equal to inter-flow TCP competition.  This provides
improved fairness relative to Diffserv solutions that use traffic
priority to provide low latency.
</suggested_abstract_text>

But to illustrate the nature of the issues to which I'm referring,
and in case you don't like that text, I'll also flag the points that
gave me trouble in the original:

   This document describes the L4S architecture for the provision of a
   new Internet service that could eventually replace best efforts for

- "could eventually replace" is a speculative claim, better expressed
as a goal IMO.

   all traffic: Low Latency, Low Loss, Scalable throughput (L4S).  It is
   becoming common for _all_ (or most) applications being run by a user

- "_all_ (or most)" means the same thing as "most", and framing it with
"all" seems to have no purpose beyond hype?
- underlining is prohibited punctuation (RFC 7322, section 3.2)

   at any one time to require low latency.  However, the only solution

- "require" seems a hype-adding exaggeration, where something
like "benefit from" is closer to fair.

   the IETF can offer for ultra-low queuing delay is Diffserv, which

- "only solution the IETF can offer" is a mistake, assuming this doc
becomes an IETF-offered solution.  Something about "previous
low-latency solutions rely on Diffserv" seems closer to correct.

   only favours a minority of packets at the expense of others.  In
   extensive testing the new L4S service keeps average queuing delay

- "extensive" is an unquantified value judgement that looks like hype

   under a millisecond for _all_ applications even under very heavy

- underlining prohibited
- "all" is misleading, regarding the overload states that fail over to
classic queue

   load, without sacrificing utilization; and it keeps congestion loss
   to zero.  It is becoming widely recognized that adding more access

- zero is also misleading in overloaded states.
- "widely recognized" is a weird basis for this claim, and also an
unquantified hype-adding value jugement

   capacity gives diminishing returns, because latency is becoming the
   critical problem.  Even with a high capacity broadband access, the

- "latency is the critical problem" is a context-sensitive claim, and
adds hype.

   reduced latency of L4S remarkably and consistently improves

- "remarkably" is an unquantified value judgement, and also makes a
good case study for the general claim of sales-y language I'm
making:  searching for "remarkable" or "remarkably" finds that
they're used in only 4 RFCs so far, all of which are referring to
historical occurrences that exceeded expectations (regarding SMTP
in 1869 and 5598, and Jon Postel's contributions in 5540 and 2555).
Using this term for an expectation of performance in an undeployed
system feels like over-the-top hype to me, even if it might come
true eventually, and even if it exceeded expectations in testing.
- "consistently" likewise is an unquantified value judgement

   performance under load for applications such as interactive video,

- "improves performance" is a context-sensitive claim (it would only
get parity depending on the metrics and conditions).

   conversational video, voice, Web, gaming, instant messaging, remote
   desktop and cloud-based apps (even when all being used at once over
   the same access link).  The insight is that the root cause of queuing

- "the insight is that the root cause" is expository and not concise.
(this one isn't about hype, just editorially the point seems misplaced
in the abstract.)

   delay is in TCP, not in the queue.  By fixing the sending TCP (and
   other transports) queuing latency becomes so much better than today
   that operators will want to deploy the network part of L4S to enable

- "so much better that operators will want" is a highly speculative
hype-y claim, and context-specific.  This has been historically quite
hard to predict, and strongly relies on an absence of unexpected issues
that may not be discoverable in test environments.  This also seems
over the top.

   new products and services.  Further, the network part is simple to
   deploy - incrementally with zero-config.  Both parts, sender and

- the "Further, the..." sentence is expository and redundant

   network, ensure coexistence with other legacy traffic.  At the same

- "legacy" is a bit presumptuous

   time L4S solves the long-recognized problem with the future
   scalability of TCP throughput.

   This document describes the L4S architecture, briefly describing the
   different components and how the work together to provide the

- nit: "the"->"they"

   aforementioned enhanced Internet service.

- In closing, I'll also note that at 1964, the character count for this
abstract is more than 5 standard deviations above the mean for RFC
abstracts in the last 15 years (~521+/-267), and would set a new record
(beating RFC 8148 at 1898), so it seems useful to cut it back somehow,
regardless.

I hope that's helpful, and thanks again for inviting critique on this
point.

I'll see whether this comment is considered helpful and whether it
provides enough information to generalize before moving on to other
sections, but hopefully these examples demonstrate the overall nature
of the issues I was having trouble with.

Also worth mentioning: I'm of course only one voice, and this is about
consensus.  If others agree or disagree, it would be good to know,
along with whatever caveats, before either of us puts a lot of work
into attempting a big editorial overhaul, independently of whether the
technical considerations lately under discussion end up having any
bearing.

Best regards,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-04 13:45         ` Bob Briscoe
@ 2019-07-10 17:03           ` Holland, Jake
  0 siblings, 0 replies; 84+ messages in thread
From: Holland, Jake @ 2019-07-10 17:03 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: Luca Muscariello, ecn-sane, tsvwg

Hi Bob,

<JH>Responses inline...</JH>

From: Bob Briscoe <ietf@bobbriscoe.net>
Date: 2019-07-04 at 06:45

Nonetheless, when an unresponsive flow(s) is consuming some capacity, and a responsive flow(s) takes the total over the available capacity, then both are responsible in proportion to their contribution to the queue, 'cos the unresponsive flow didn't respond (it didn't even try to).

This is why it's OK to have a small unresponsive flow, but it becomes less and less OK to have a larger and larger unresponsive flow. 

<JH>
Right, this is a big part of the point I'm trying to make here.
Some of the video systems are sending a substantial-sized flow which
is not responsive at the transport layer.

However, that doesn't mean it's entirely unresponsive.  These often
do respond in practice at the application layer, but by observing
some quality of experience threshold from the video rendering.

Part of this quality of experience signal comes from the delay
fluctuation caused by the queueing delay when the link is overloaded,
but running the video through a low-latency queue would remove that
fluctuation, and thus change it from something that would cut over
to a lower bit-rate or remove the video into something that wouldn't.

At the same time, the app benefits from removing that fluctuation--
it gets to deliver a better video quality successfully.  When its
owners test it comparatively, they'll find they have an incentive
to add the marking, and their customers will have an incentive to
adopt that solution over solutions that don't, leading to an arms
race that progressively pushes out more of the responsive traffic.

My claim is that the lack of admission control is what makes this
arms race possible, by removing an important source of backpressure
on apps like this relative to today's internet (or one that does a
stricter fair-share-based degradation at bottlenecks).
</JH>

There's no bandwidth benefit. 
There's only latency benefit, and then the only benefits are:
• the low latency behaviour of yourself and other flows behaving like you
• and, critically, isolation from those flows not behaving well like you. 
Neither give an incentive to mismark - you get nothing if you don't behave. And there's a disincentive for 'Classic' TCP flows to mismark, 'cos they badly underutilize without a queue.

<JH>
It's typical for Non-responsive flows to get benefits from lower
latency.

I actually do (with caveats) agree that flows that don't respond
to transient congestion should be fine, as long as they use no more
than their fair share of the capacity.  However, by removing the
backpressure without adding something to prevent them from using
more than their fair share, it sets up perverse incentives that
push the ecosystem toward congestion collapse.

The Queue protection mechanism you mentioned might be sufficient
for this, but I'm having a hard time understanding the claim that
it's not required.

It seems to me in practice it will need to be used whenever there's
a chance that non-responsive flows can consume more than their
share, which chance we can reasonably expect will grow naturally
if L4S deployment grows.
</JH>

1/ The q-prot mechanism certainly has the disadvantage that it has to access L4 headers. But it is much more lightweight than FQ.

...

That's probably not understandable. Let me write it up properly - with some explanatory pictures and examples.

<JH>
I thought it was a reasonable summary and thanks for the
quick explanation (not to discourage writing it up properly,
which would also be good).

In short, it sounds to me like if we end up agreeing that Q
protection is required in L4S with dualq (a point currently
disputed, acked), and if the lfq draft holds up to scrutiny
(also a point to be determined), then it means:

The per-bucket complexity overhead comparison for the 2 proposed
architectures (L4S vs. SCE-based) would be 1 int per hash bucket
for dualq, vs. 2 ints + 1 bit per hash bucket for lfq.  And if so,
these overhead comparisons at the bottleneck queues can be taken
as a roughly fair comparison to weigh against other considerations.

Does that sound approximately correct?

Best regards,
Jake
</JH>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-10 13:14                             ` Dave Taht
@ 2019-07-10 17:32                               ` De Schepper, Koen (Nokia - BE/Antwerp)
  0 siblings, 0 replies; 84+ messages in thread
From: De Schepper, Koen (Nokia - BE/Antwerp) @ 2019-07-10 17:32 UTC (permalink / raw)
  To: Dave Taht; +Cc: Holland, Jake, Jonathan Morton, ecn-sane, tsvwg

[-- Attachment #1: Type: text/plain, Size: 21448 bytes --]

Hi Dave,

Sorry for your lunch, but maybe I’ve cut away too much context, as I think some of your responses are not really about the discussion point.

In general I see that we both agree that FQ has pro’s and con’s, and is deployed and useful, so no need for further discussion on FQ. The actual discussion is on whether we still need to support low latency on non_FQ systems, or that low latency is only a privilege of FQ systems.

>>> so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.
>>
>>The performance in FQ is clearly equivalent,
>
>Huh?

My point was that SCE on FQ can give equivalent results as L4S on FQ, and I think everyone agrees here too.
But I want to make clear that SCE is only working with FQ with an AQM per Q:

>> but for a common-Q behavior, only L4S can work. As far as I understood the SCE-LFQ proposal is actually
>> a slower FQ implementation (an FQ in DualQ disguise 😉), so I think not really a better alternative than
>> pure FQ. Also its single AQM on the bulk queue will undo any isolation, as a coupled AQM is stronger than
>> any scheduler, including FQ. Don't underestimate the power of congestion control 😉. The ultimate proof
>> is in the DualQ Coupled AQM where congestion control can beat a priority scheduler. If you want FQ to
>> have effect, you need to have an AQM per FQ... The authors will notice this when they implement an AQM
>> on top of it. I saw the current implementation works only in taildrop mode. But I think it is very good that
>> the SCE proponents are very motivated to try with this speed to improve L4S. I'm happy to be proven wrong,
>> but up to now I don't see any promising improvements to justify delay for L4S, only the above alternative
>> compromise. Agreed that we can continue exploring alternative proposal in parallel though.
>
> I cannot parse this extreme set of assumptions and declarations. "taildrop mode??"

Context: Common-Q behavior is one common Q or set of common Qs (like DualQ) with one
coupled AQM which doesn’t want to identify every flow, but only traffic classes (Classic or L4S).

If you re-read the section again with this context, you will better understand that this is not about FQ (we agree that
Both L4S and SCE work) but about the LFQ (light-weight-FQ) proposal that seems to claim to be a DualQ, but is
actually an FQ which needs more time to select a packet at dequeue. It also has a common AQM on top of all bulk
virtual-FQ-queues. As you probably agree, you need an AQM per queue if you want to benefit from FQ or congestion
control will take over and FQ behaves like a single Q. This is especially important if the congestion controls are not
compatible, because you need to identify the traffic classes to give a differentiated AQM treatment to the different
classes, hence the need for L4S...

I hope this clarifies,
Koen.

From: Dave Taht <dave.taht@gmail.com>
Sent: Wednesday, July 10, 2019 3:15 PM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
Cc: Holland, Jake <jholland@akamai.com>; Jonathan Morton <chromatix99@gmail.com>; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
Subject: Re: [Ecn-sane] [tsvwg] Comments on L4S drafts

I keep trying to stay out of this conversation being yellow about ecn in the first place, in any form. I would like to stress that
ecn-sane was formed by the group of folk that were concerned about having accidentally masterminded the worlds biggest fq + aqm
deployment, and the only one with ecn support, which happens

In the case of wifi, the deployment is now in the 10s of millions, and doing hordes of good - latencies measured in the 10s of ms rather than 10s of seconds.

I have seen no numbers on how well l4s will make it over to wifi as yet, nor any discussion, and I would rather like more pieces of the l4s solution to land sufficiently integrated for testing using tools like flent, and over far more than just a isochronous mac layer like dsl or docsis. Given the size of a txop in wifi (5.3ms), and how far back we have
to put the AQM and FQ components today (2 txops), I don't think many of either SCE or L4S concepts will work well on wifi... but in general
I prefer not to make assertions or assumptions until real-world testing can commence.

I am presently at the battlemesh conference trying to get a bit of real-world data.

A big problem wifi and 3g have is too many retransmits at the mac layer, not congestion controlled. Any signalling gets there late, and it's
better to drop a bunch of packets when you hit a bunch of retransmits, in general. IMHO.

On Wed, Jul 10, 2019 at 2:05 AM De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com<mailto:koen.de_schepper@nokia-bell-labs.com>> wrote:
Hi Jake,

>> I agree the key question for this discussion is about how best to get low latency for the internet.
Thanks

>> under the L4S approach for ECT(1), we can achieve it with either dualq or fq at the bottleneck, but under the SCE approach we can only do it with fq at the bottleneck.
Correct

>> we agree that in neither case can very low latency be achieved with a classic single queue with classic bandwidth-seeking traffic
Correct, not without compromising latency for Prague or throughput/utilization/stability/drop for Reno/Cubic

>> Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ?

FQ for "per-user" isolation in access equipment has clearly an extra cost, not?

I've argued in the past that hashing is a bog standard part of most network cards and switches already.

"extra cost" should be measured by actual measurements. Usually when you do those, you find it's another variable entirely costing you the most
cpu/circuits.

If we need to implement FQ "per-flow" on top, we need 2 levels of FQ (per-user and per-user-flow, so from thousands to millions of queues). Also, I haven’t seen DC switches coming with an FQ AQM...

Meh. Most of the time the instantaneous number of queues for some measurement of instantenious is in the low hundreds for rates up to
10GigE. We don't have a lot of data for bigger pipes.

I haven't seen any DC switches with support anything other than RED or AFD, and DC folk overprovision anyway.

>> Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?

Even if as cheaply, as long as there is no reliable flow identification, it clearly has side effects. Many homeworkers are using a VPN tunnel, which is only one flow encapsulating maybe dozens.

This is true. For a local endpoint for a vpn from a router fq_codel long ago gained support for doing the hashing & FQ before entering the tunnel.

This works only with in-kernel ipsec transports although I've been trying to get it added to wireguard for a long time now.

 It of course doesn't apply to the whole path, but when applied at the home gateway router (bottleneck link), works rather well.

Here are two examples of that mechanism in play.

http://www.taht.net/~d/ipsec_fq_codel/oldqos.png

http://www.taht.net/~d/ipsec_fq_codel/newqos.png

Drop and ECN (if implemented correctly) are tunnel agnostic. Also how flows are identified might evolve (new transport protocols, encapsulations, ...?). Also if strict flow isolation could be done correctly, it has additional issues related to missed scheduling opportunities, besides it is a hard-coded throughput policy (and even mice size = 1 packet). On the other hand, flow isolation has benefits too, so hard to rule out one of them, not?

The packet dissector in linux is quite robust, the one in BSD, less so.

A counterpoint to the entire ECN debate (l4s or sce) that I'd like to make at more length is that it can and does hurt non ecn'd flows, particularly at lower
bandwidths when you cannot reduce cwnd below 2 and the link is thus saturated. ARP can starve. ISIS fails. batman - lacking an IP header -  can starve.
babel, lacking ecn support can start to fail. And so on.

>> Also, I think if the SCE position is "low latency can only be achieved with FQ", that's different from "forcing only FQ on the internet", provided the fairness claims hold up, right?  (Classic single queue AQMs may still have a useful place in getting pretty-good latency in the cheapest hardware, like maybe PIE with marking.)

Are you saying that the real good stuff can only be for FQ 😉? Fairness between a flow getting only one signal and another getting 2 is an issue, right? The one with the 2 signals can either ignore one, listen half to both, or try to smooth both signals to find the average loudest one? Again safety or performance needs to be chosen. PIE or PI2 is optimal for Classic traffic and good to couple congestion to Prague traffic, but Prague traffic needs a separate Q and an immediate step to get the "good stuff" working. Otherwise it will also overshoot, respond sluggish, etc...

>> Anyway, to me this discussion is about the tradeoffs between the 2 proposals.  It seems to me SCE has some safety advantages that should not be thrown away lightly,

I appreciate the efforts of trying to improve L4S, but nobody working on L4S for years now see a way that SCE can work on a non-FQ system. For me (and I think many others) it is a no-go to only support FQ. Unfortunately we only have half a bit free, and we need to choose how to use it. Would you choose for the existing ECN switches that cannot be upgraded (are there any?) or for all future non-FQ systems.

>> so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.

The performance in FQ is clearly equivalent,

Huh?

but for a common-Q behavior, only L4S can work. As far as I understood the SCE-LFQ proposal is actually a slower FQ implementation (an FQ in DualQ disguise 😉), so I think not really a better alternative than pure FQ. Also its single AQM on the bulk queue will undo any isolation, as a coupled AQM is stronger than any scheduler, including FQ. Don't underestimate the power of congestion control 😉. The ultimate proof is in the DualQ Coupled AQM where congestion control can beat a priority scheduler. If you want FQ to have effect, you need to have an AQM per FQ... The authors will notice this when they implement an AQM on top of it. I saw the current implementation works only in taildrop mode. But I think it is very good that the SCE proponents are very motivated to try with this speed to improve L4S. I'm happy to be proven wrong, but up to now I don't see any promising improvements to justify delay for L4S, only the above alternative compromise. Agreed that we can continue exploring alternative proposal in parallel though.

I cannot parse this extreme set of assumptions and declarations. "taildrop mode??"

As for promising improvements in general, there is a 7 year old deployment, running code,  of something that we've show to work well in a variety
of network scenarios, with 10x-100x improvements in network latency, at roughly 100% in linux overall, widely used in wifi and in many, many SQM/Qos systems and containers, with basic rfc3168 ecn enabled... and a proposal for a backward compatible way of enhancing that still more being explored. The embedded hardware pipeline
for future implementations of this tech is full - it would take 3+ years to make a course change....

vs something that still has no real-world deployment data at all, that changes the definition of ecn, that has not a public ns2 or n3 model (?), no testing aside from a few
very specific benchmarks, and so on...

I do hope the coding competition heats up more, with more running code that others can explore, most of all. I long ago tired of the endless debates, as everyone knows,
and I do kind of wish I wasn't burning lunch on this email instead of setting up a test at battlemesh.

I note also that my leanings - in a fq_codel'd world, were it to stay such, was to enable more RTT based CCs  like BBRto work more often in an RTT mode, and thus
we start - originally to me, the SCE idea was a way to trigger a faster switch to congestion avoidance - as most of my captures taken from over used APs in
restaurants, cafes, train stations etc shows stuff in slow start to be the biggest problem - and, regardless, an initial CE, right now, is a strong indicator that fq-codel is present, and
a RTT based tcp can thus start to happen, and a good one, would not have many future marks after the first.

A big difference in our outlooks, I guess, is that my viewpoint is that most of the congestion is at the edges of the network and I don't care all that
much about big iron or switches, and I don't think either can afford much aqm tech at all in the first place. Not dual queues, not fqs.

Were L4S not to deploy (using ect1 as a marker - btw, I think CS5 might be a better candidate as it goes into the wifi VI queue), and a fq_pie/fq_codel/sch_cake
world to remain predominant, well, we might get somewhere, faster, where it counted.

Koen.

-----Original Message-----
From: Holland, Jake <jholland@akamai.com<mailto:jholland@akamai.com>>
Sent: Monday, July 8, 2019 10:56 PM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com<mailto:koen.de_schepper@nokia-bell-labs.com>>; Jonathan Morton <chromatix99@gmail.com<mailto:chromatix99@gmail.com>>
Cc: ecn-sane@lists.bufferbloat.net<mailto:ecn-sane@lists.bufferbloat.net>; tsvwg@ietf.org<mailto:tsvwg@ietf.org>
Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

Hi Koen,

I'm a bit confused by this response.

I agree the key question for this discussion is about how best to get low latency for the internet.

If I'm reading your message correctly, you're saying that under the L4S approach for ECT(1), we can achieve it with either dualq or fq at the bottleneck, but under the SCE approach we can only do it with fq at the bottleneck.

(I think I understand and roughly agree with this claim, subject to some caveats.  I just want to make sure I've got this right so far, and that we agree that in neither case can very low latency be achieved with a classic single queue with classic bandwidth-seeking
traffic.)

Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ?  Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?

Also, I think if the SCE position is "low latency can only be achieved with FQ", that's different from "forcing only FQ on the internet", provided the fairness claims hold up, right?  (Classic single queue AQMs may still have a useful place in getting pretty-good latency in the cheapest hardware, like maybe PIE with
marking.)

Anyway, to me this discussion is about the tradeoffs between the
2 proposals.  It seems to me SCE has some safety advantages that should not be thrown away lightly, so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.

Best regards,
Jake

On 2019-07-08, 03:26, "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com<mailto:koen.de_schepper@nokia-bell-labs.com>> wrote:

    Hi Jonathan,

    From your responses below, I have the impression you think this discussion is about FQ (flow/fair queuing). Fair queuing is used today where strict isolation is wanted, like between subscribers, and by extension (if possible and preferred) on a per transport layer flow, like in Fixed CPEs and Mobile networks. No discussion about this, and assuming we have and still will have an Internet which needs to support both common queues (like DualQ is intended) and FQs, I think the only discussion point is how we want to migrate to an Internet that supports optimally Low Latency.

    This leads us to the question L4S or SCE?

    If we want to support low latency for both common queues and FQs we "NEED" L4S, if we need to support it only for FQs, we "COULD" use SCE too, and if we want to force the whole Internet to use only FQs, we "SHOULD" use SCE 😉. If your goal is to force only FQs in the Internet, then let this be clear... I assume we need a discussion on another level in that case (and to be clear, it is not a goal I can support)...

    Koen.

    -----Original Message-----
    From: Jonathan Morton <chromatix99@gmail.com<mailto:chromatix99@gmail.com>>
    Sent: Friday, July 5, 2019 10:51 AM
    To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com<mailto:koen.de_schepper@nokia-bell-labs.com>>
    Cc: Bob Briscoe <ietf@bobbriscoe.net<mailto:ietf@bobbriscoe.net>>; ecn-sane@lists.bufferbloat.net<mailto:ecn-sane@lists.bufferbloat.net>; tsvwg@ietf.org<mailto:tsvwg@ietf.org>
    Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

    > On 5 Jul, 2019, at 9:46 am, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com<mailto:koen.de_schepper@nokia-bell-labs.com>> wrote:
    >
    >>> 2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.
    >
    > Before jumping to another point, let's close down your original issue. Since you didn't mention, I assume that you agree with the following, right?
    >
    >        "You cannot defeat a DualQ" (at least no more than a single Q)

    I consider forcibly degrading DualQ to single-queue mode to be a defeat.  However…

    >>> But that's exactly the problem.  Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.
    >
    > With L4S a single queue can differentiate between Classic and L4S traffic. That's why it knows exactly how to treat the traffic. For Non-ECT and ECT(0) square the probability, and for ECT(1) don't square, and it works exactly like a DualQ, but then without the latency isolation. Both types get the same throughput, AND delay. See the PI2 paper, which is exactly about a single Q.

    Okay, this is an important point: the real assertion is not that DualQ itself is needed for L4S to be safe on the Internet, but for differential AQM treatment to be present at the bottleneck.  Defeating DualQ only destroys L4S' latency advantage over "classic" traffic.  We might actually be making progress here!

    > I agree you cannot isolate in a single Q, and this is why L4S is better than SCE, because it tells the AQM what to do, even if it has a single Q. SCE needs isolation, L4S not.

    Devil's advocate time.  What if, instead of providing differential treatment WRT CE marking, PI2 instead applied both marking strategies simultaneously - the higher rate using SCE, and the lower rate using CE?  Classic traffic would see only the latter; L4S could use the former.

    > We tried years ago similar things like needed for SCE, and found that it can't work. For throughput fairness you need the squared relation between the 2 signals, but with SCE, you need to apply both signals in parallel, because you don't know the sender type.

    Yes, that's exactly what we do - and it does work.

    >   - So either the sender needs to ignore CE if it gets SCE, or ignore SCE if you get CE. The first is dangerous if you have multiple bottlenecks, and the second is defeating the purpose of SCE. Any other combination leads to unfairness (double response).

    This is a false dichotomy.  We quickly realised both of those options were unacceptable, and sought a third way.

    SCE senders apply a reduced CE response when also responding to parallel SCE feedback, roughly in line with ABE, on the grounds that responding to SCE does some of the necessary reduction already.  The reduced response is still a Multiplicative Decrease, so it fits with normal TCP congestion control principles.

    >   - you separate the signals in queue dept, first applying SCE and later CE, as you originally proposed, but that results in starvation for SCE.

    Yes, although this approach gives the best performance for SCE when used with flow isolation, or when all flows are known to be SCE-aware.  So we apply this strategy in those cases, and move the SCE marking function up to overlap CE marking specifically for single queues.

    It has been suggested that single queue AQMs are rare in any case, but this approach covers that corner case.

    > Add on top that SCE makes it impossible to use DualQ, as you cannot differentiate the traffic types.

    SCE is designed around not *needing* to differentiate the traffic types.  Single queues have known disadvantages, and SCE doesn't worsen them.

    Meanwhile, we have proposed LFQ to cover the DualQ use case.  I'd be interested in hearing a principled critique of it.

     - Jonathan Morton

_______________________________________________
Ecn-sane mailing list
Ecn-sane@lists.bufferbloat.net<mailto:Ecn-sane@lists.bufferbloat.net>
https://lists.bufferbloat.net/listinfo/ecn-sane

--

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

[-- Attachment #2: Type: text/html, Size: 34027 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-10  9:00                           ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-10 13:14                             ` Dave Taht
@ 2019-07-17 22:40                             ` Sebastian Moeller
  2019-07-19  9:06                               ` De Schepper, Koen (Nokia - BE/Antwerp)
  1 sibling, 1 reply; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-17 22:40 UTC (permalink / raw)
  To: De Schepper, Koen (Nokia - BE/Antwerp)
  Cc: Holland, Jake, Jonathan Morton, ecn-sane, tsvwg

Dear Koen,


> On Jul 10, 2019, at 11:00, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
[...]
>>> Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ?
> 
> FQ for "per-user" isolation in access equipment has clearly an extra cost, not? If we need to implement FQ "per-flow" on top, we need 2 levels of FQ (per-user and per-user-flow, so from thousands to millions of queues). Also, I haven’t seen DC switches coming with an FQ AQM...

	I believe there is work available demonstrating that a) millions of concurrently active flows might be overly pessimistic (even for peering routers) and b) IMHO it is far from settled that these bid transit/peering routers will employ any of the the schemes we are cooking up here. For b) I argue that both L4S "linear" CE-marking and SCE linear ECT(1) marking will give a clear signal of overload that an big ISP might not want to explicitly tell its customers...

> 
>>> Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?
> 
> Even if as cheaply, as long as there is no reliable flow identification, it clearly has side effects. Many homeworkers are using a VPN tunnel, which is only one flow encapsulating maybe dozens.

	Fair enough, but why do you see a problem of treating this multiplexed flow as different from any other flow, after all it was the end-points conscious decision to masquerade as a single flow so why assume special treatment; it is not that intermediate hops have any insight into the multiplexing, so why expect them to cater for this?

> Drop and ECN (if implemented correctly) are tunnel agnostic.

	Exactly, and that is true for each identified flow as well, so fq does not diminish this, but rather builds on top of it.


> Also how flows are identified might evolve (new transport protocols, encapsulations, ...?).

	You are jesting surely, new protocols? We are in this kefuffle, because you claim that a new protocol to signal linear CE-marking response to be made of unobtaininum so you want to abuse an underused EVN code point as a classifier. If new protocols are an option, just bite the bullet and give tcp-reno a new protocol number and use this for your L4S classifier; problem solved in a nice and clean fashion.

> Also if strict flow isolation could be done correctly, it has additional issues related to missed scheduling opportunities,

	Please elaborate, how an intermediate hop would know about the desires of the endpoints here. As far as I can tell such hops have their own ideas about optimal scheduling that they will enforce independent of the what the endpoints deem optimal (by ncessity as most endpoints will desire highest priority for their packets).

[...]

>>> Anyway, to me this discussion is about the tradeoffs between the 2 proposals.  It seems to me SCE has some safety advantages that should not be thrown away lightly, 
> 
> I appreciate the efforts of trying to improve L4S, but nobody working on L4S for years now see a way that SCE can work on a non-FQ system.

	That is a rather peculiar argument, especially given that both you and Bob, major forces in the L4S approach, seemm to have philosophical issues with fq?

> For me (and I think many others) it is a no-go to only support FQ. Unfortunately we only have half a bit free, 

	??? Again you elaborately state the options in the L4S RFC and just converge on the one which is most convenient, but also not the best match for your requirements.

> and we need to choose how to use it. Would you choose for the existing ECN switches that cannot be upgraded (are there any?) or for all future non-FQ systems.
> 
>>> so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.
> 
> The performance in FQ is clearly equivalent, but for a common-Q behavior, only L4S can work.
> As far as I understood the SCE-LFQ proposal is actually a slower FQ implementation (an FQ in DualQ disguise 😉), so I think not really a better alternative than pure FQ. Also its single AQM on the bulk queue will undo any isolation, as a coupled AQM is stronger than any scheduler, including FQ.

	But how would the bulk queue actually care, being dedicated to bulk flows? This basically just uses a single codel instance for all flows in the bulk queue, exactly the situation codel was designed for, if I recall correctly. Sure this will run into problems with unrepsonsive flows, but not any more than DualQ with or without  queue protection (you can steer misbehaving flows into the the "classic" queue, but this will just change which flows will suffer most of the collateral damage of that unresponsive flow, IMHO).


Best Regards
	Sebastian Moeller

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-17 22:40                             ` Sebastian Moeller
@ 2019-07-19  9:06                               ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-19 15:37                                 ` Dave Taht
  2019-07-19 17:59                                 ` Sebastian Moeller
  0 siblings, 2 replies; 84+ messages in thread
From: De Schepper, Koen (Nokia - BE/Antwerp) @ 2019-07-19  9:06 UTC (permalink / raw)
  To: Sebastian Moeller; +Cc: Holland, Jake, Jonathan Morton, ecn-sane, tsvwg

Hi Sebastian,

To avoid people to read through the long mail, I think the main point I want to make is:
 "Indeed, having common-Qs supported is one of my requirements. That's why I want to keep the discussion on that level: is there consensus that low latency is only needed for a per flow FQ system with an AQM per flow?"

If there is this consensus, this means that we can use SCE and that from now on, all network nodes have to implement per flow queuing with an AQM per flow.
If there is no consensus, we cannot use SCE and need to use L4S.

For all the other detailed discussion topics, see [K] inline:

Regards,
Koen.

-----Original Message-----
From: Sebastian Moeller <moeller0@gmx.de> 
Sent: Thursday, July 18, 2019 12:40 AM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
Cc: Holland, Jake <jholland@akamai.com>; Jonathan Morton <chromatix99@gmail.com>; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
Subject: Re: [Ecn-sane] [tsvwg] Comments on L4S drafts

Dear Koen,

> On Jul 10, 2019, at 11:00, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
[...]
>>> Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ?
> 
> FQ for "per-user" isolation in access equipment has clearly an extra cost, not? If we need to implement FQ "per-flow" on top, we need 2 levels of FQ (per-user and per-user-flow, so from thousands to millions of queues). Also, I haven’t seen DC switches coming with an FQ AQM...

	I believe there is work available demonstrating that a) millions of concurrently active flows might be overly pessimistic (even for peering routers) and b) IMHO it is far from settled that these bid transit/peering routers will employ any of the the schemes we are cooking up here. For b) I argue that both L4S "linear" CE-marking and SCE linear ECT(1) marking will give a clear signal of overload that an big ISP might not want to explicitly tell its customers...

[K] a) indeed, if queues can be dynamically allocated you could settle with less, not sure if dynamic allocation is compatible with high speed implementations. Anyway, any additional complexity is additional cost (and cycles, energy, heat dissipation, ...). Of course everything can be done...
b) I don't agree that ECN is a signal of overload. It is a natural part of feedback to tell greedy TCP that it reached its full capacity. Excessive drop/latency is the signal of overload and an ECN-capable AQM switches from ECN to drop anyway in overload conditions.  Excessive drop and latency can also be measured today, not? Running a few probes can tell customers the same with or without ECN, and capacity is measured simply with speedtests.

> 
>>> Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?
> 
> Even if as cheaply, as long as there is no reliable flow identification, it clearly has side effects. Many homeworkers are using a VPN tunnel, which is only one flow encapsulating maybe dozens.

	Fair enough, but why do you see a problem of treating this multiplexed flow as different from any other flow, after all it was the end-points conscious decision to masquerade as a single flow so why assume special treatment; it is not that intermediate hops have any insight into the multiplexing, so why expect them to cater for this?

[K] Because the design of VPN tunnels had as a main goal to maintain a secure/encrypted connection between clients and servers, trying to minimize the overhead on clients and servers by using a single TCP/UDP connection. I don't think the single flow was chosen to get treated as one flow's budget of throughput. This "feature" didn't exist at that (pre-FQ) time.

> Drop and ECN (if implemented correctly) are tunnel agnostic.

	Exactly, and that is true for each identified flow as well, so fq does not diminish this, but rather builds on top of it.

[K] True for flows within a tunnel, but the point was that FQ treats the aggregated tunnel as a single flow compared to other single flows.

> Also how flows are identified might evolve (new transport protocols, encapsulations, ...?).

	You are jesting surely, new protocols? We are in this kefuffle, because you claim that a new protocol to signal linear CE-marking response to be made of unobtaininum so you want to abuse an underused EVN code point as a classifier. If new protocols are an option, just bite the bullet and give tcp-reno a new protocol number and use this for your L4S classifier; problem solved in a nice and clean fashion.

[K] Indeed, it is hardly impossible to deploy new protocols in practice, but I hope we can make it more possible in the future, not less possible... Maybe utopic, but at least we should try to learn from past mistakes.

> Also if strict flow isolation could be done correctly, it has additional issues related to missed scheduling opportunities,

	Please elaborate, how an intermediate hop would know about the desires of the endpoints here. As far as I can tell such hops have their own ideas about optimal scheduling that they will enforce independent of the what the endpoints deem optimal (by ncessity as most endpoints will desire highest priority for their packets).

[K] That network nodes cannot know what the end-systems want is exactly the point. FQ just assumes everybody should have the same throughput, and makes an exception for single packets (to undo the most flagrant disadvantage of this strategy). But again, I don't want to let the discussion get distracted by arguing pro or con FQ. I think we have to live with both now.

[...]

>>> Anyway, to me this discussion is about the tradeoffs between the 2 proposals.  It seems to me SCE has some safety advantages that should not be thrown away lightly, 
> 
> I appreciate the efforts of trying to improve L4S, but nobody working on L4S for years now see a way that SCE can work on a non-FQ system.

	That is a rather peculiar argument, especially given that both you and Bob, major forces in the L4S approach, seemm to have philosophical issues with fq?

[K] I think I am realistic to accept pro's and con's and existence of both. I think wanting only FQ is as philosophical as wanting no FQ at all.

> For me (and I think many others) it is a no-go to only support FQ. Unfortunately we only have half a bit free, 

	??? Again you elaborately state the options in the L4S RFC and just converge on the one which is most convenient, but also not the best match for your requirements.

[K] Indeed, having common-Qs supported is one of my requirements. That's why I want to keep the discussion on that level: is there consensus that low latency is only needed for a per flow FQ system with an AQM per flow?

> and we need to choose how to use it. Would you choose for the existing ECN switches that cannot be upgraded (are there any?) or for all future non-FQ systems.
> 
>>> so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.
> 
> The performance in FQ is clearly equivalent, but for a common-Q behavior, only L4S can work.
> As far as I understood the SCE-LFQ proposal is actually a slower FQ implementation (an FQ in DualQ disguise 😉), so I think not really a better alternative than pure FQ. Also its single AQM on the bulk queue will undo any isolation, as a coupled AQM is stronger than any scheduler, including FQ.

	But how would the bulk queue actually care, being dedicated to bulk flows? This basically just uses a single codel instance for all flows in the bulk queue, exactly the situation codel was designed for, if I recall correctly. Sure this will run into problems with unrepsonsive flows, but not any more than DualQ with or without  queue protection (you can steer misbehaving flows into the the "classic" queue, but this will just change which flows will suffer most of the collateral damage of that unresponsive flow, IMHO).

[K] As far as I recall, CoDel works best for a single flow. For a stateless AQM like a step using only per packet sojourn time, a common AQM over FQs is indeed working as an FQ with an AQM per queue. Applying an stateless AQM for Classic traffic (like sojourn-time RED without smoothing) will have impact on its performance. Adding common state for all bulk queue AQMs will disable the FQ effect. Anyway, the sequential scan at dequeue is the main reason why LFQ will be hard to get traction in high-speed equipment.

Best Regards
	Sebastian Moeller

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]     Comments on L4S drafts
  2019-07-19  9:06                               ` De Schepper, Koen (Nokia - BE/Antwerp)
@ 2019-07-19 15:37                                 ` Dave Taht
  2019-07-19 18:33                                   ` Wesley Eddy
  2019-07-22 16:28                                   ` Bless, Roland (TM)
  2019-07-19 17:59                                 ` Sebastian Moeller
  1 sibling, 2 replies; 84+ messages in thread
From: Dave Taht @ 2019-07-19 15:37 UTC (permalink / raw)
  To: De Schepper, Koen (Nokia - BE/Antwerp); +Cc: Sebastian Moeller, ecn-sane, tsvwg

"De Schepper, Koen (Nokia - BE/Antwerp)"
<koen.de_schepper@nokia-bell-labs.com> writes:

> Hi Sebastian,
>
> To avoid people to read through the long mail, I think the main point I want to make is:
>  "Indeed, having common-Qs supported is one of my requirements. That's

It's the common-q with AQM **+ ECN** that's the sticking point. I'm
perfectly satisfied with the behavior of every ietf approved single
queued AQM without ecn enabled. Let's deploy more of those!

> why I want to keep the discussion on that level: is there consensus
> that low latency is only needed for a per flow FQ system with an AQM
> per flow?"

Your problem statement elides the ECN bit.

If there is any one point that I'd like to see resolved about L4S
vs SCE, it's having a vote on its the use of ECT(1) as an e2e
identifier.

The poll I took in my communities (after trying really hard for years to
get folk to take a look at the architecture without bias), ran about
98% against the L4S usage of ect(1), in the lwn article and in every
private conversation since.

The SCE proposal for this half a bit as an additional congestion
signal supplied by the aqm, is vastly superior.

If we could somehow create a neutral poll in the general networking
community outside the ietf (nanog, bsd, linux, dcs, bigcos, routercos,
ISPs small and large) , and do it much like your classic "vote for a
political measure" thing, with a single point/counterpoint section,
maybe we'd get somewhere.

>
> If there is this consensus, this means that we can use SCE and that
> from now on, all network nodes have to implement per flow queuing with
> an AQM per flow.

There is no "we" here, and this is not a binary set of choices.

In particular conflating "low latency" really confounds the subject
matter, and has for years. FQ gives "low latency" for the vast
majority of flows running below their fair share. L4S promises "low
latency" for a rigidly defined set of congestion controls in a
specialized queue, and otherwise tosses all flows into a higher latency
queue when one flow is greedy.

The "ultra low queuing latency *for all*" marketing claptrap that l4S
had at one point really stuck in my craw.

0) There is a "we" that likes L4S in all its complexity and missing
integrated running code that demands total ECN deployment on one
physical medium (so far), a change to the definition of ECN itself, and
uses up ect(1) e2e instead of a dscp.

1) There is a "we" that has a highly deployed fq+aqm that happens to
have an ECN response, that is providing some of the lowest latencies
ever seen, live on the internet, across multiple physical mediums.

With a backward compatible proposal to do better, that uses up ect(1) as
an additional congestion notifier by the AQM.

2) There is a VERY large (silent) majority that wants nothing to do with
ECN at all and long ago fled the ietf, and works on things like RTT and
other metrics that don't need anything extra at the IP layer.

3) There is a vastly larger majority that has never even heard of AQM,
much less ECN, and doesn't care.

> If there is no consensus, we cannot use SCE and need to use L4S.

No.

If there is no consensus, we just keep motoring on with the existing
pie (with drop) deployments, and fq_codel/fq_pie/sch_cake more or less
as is... and continued refinement of transports and more research.

We've got a few billion devices that could use just what we got to get
orders of magnitude improvements in network delay.

And:

If there is consensus on fq+aqm+sce - ECN remains *optional*
which is an outcome I massively support, also.

So repeating this:

> If there is this consensus, this means that we can use SCE and that
> from now on, all network nodes have to implement per flow queuing with
> an AQM per flow.

It's not a binary choice as you lay it out.

1) Just getting FIFO queue sizes down to something reasonable - would be
GREAT. It still blows my mind that CMTSes still have 700ms of buffering at
100Mbit, 8 years into this debate.

2) only the network nodes most regularly experiencing human visible
congestive events truly need any form of AQM or FQ. In terms of what I
observe, thats:

ISP uplinks
Wifi (at ISP downlink speeds > 40Mbit)
345G 
ISP downlinks
Other in-home devices like ethernet over powerline

I'm sure others in the DC and interconnects see things differently.

I know I'm weird, but I'd like to eliminate congestion *humans* see,
rather than what skynet sees. Am I the only one that thinks this way?

3) we currently have a choice between multiple single queue, *non ECN*
enabled aqms that DO indeed work - pretty well - without any ECN support
enabled - pie, red, dualpi without using the ect identifier, cake
(cobalt). We never got around to making codel work better on a single
queue because we didn't see the point, but what's in cobalt could go
there if anyone cares.

We have a couple very successful fq+aqm combinations, *also*, that
happen to have an RFC3168 ECN response.

4) as for ECN enabled AQMs - single queued, dual q'd, or FQ'd, there's
plenty of problems remaining with all of them and their transports, that
make me very dubious about internet-wide deployment. Period. No matter
what happens here, I am going to keep discouraging the linux distros as
a whole to turn it on without first addressing the long list of items in
the ecn-sane design group's work list.

...

So to me, it goes back to slamming the door shut, or not, on L4S's usage
of ect(1) as a too easily gamed e2e identifier. As I don't think it and
all the dependent code and algorithms can possibly scale past a single
physical layer tech, I'd like to see it move to a DSCP codepoint, worst
case... and certainly remain "experimental" in scope until anyone
independent can attempt to evaluate it. 

second door I'd like to slam shut is redefining CE to be a weaker signal
of congestion as L4S does. I'm willing to write a whole bunch of
standards track RFCs obsoleting the experimental RFCs allowing this, if
that's what it takes. Bufferbloat is still a huge problem! Can we keep
working on fixing that?

third door I'd like to see open is the possibilities behind SCE.

Lastly:

I'd really all the tcp-go-fast-at-any-cost people to take a year off to
dogfood their designs, and go live somewhere with a congested network to
deal with daily, like a railway or airport, or on 3G network on a
sailboat or beach somewhere. It's not a bad life... REALLY.

In fact, it's WAY cheaper than attending 3 ietf conferences a year.

Enjoy Montreal!

Sincerely,

Dave Taht
From my sailboat in Alameda

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-19  9:06                               ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-19 15:37                                 ` Dave Taht
@ 2019-07-19 17:59                                 ` Sebastian Moeller
  1 sibling, 0 replies; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-19 17:59 UTC (permalink / raw)
  To: De Schepper, Koen (Nokia - BE/Antwerp)
  Cc: Holland, Jake, Jonathan Morton, ecn-sane, tsvwg

Hi Koen,



> On Jul 19, 2019, at 11:06, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
> Hi Sebastian,
> 
> To avoid people to read through the long mail, I think the main point I want to make is:
> "Indeed, having common-Qs supported is one of my requirements. That's why I want to keep the discussion on that level: is there consensus that low latency is only needed for a per flow FQ system with an AQM per flow?"
> 
> If there is this consensus, this means that we can use SCE and that from now on, all network nodes have to implement per flow queuing with an AQM per flow.

	Well, this in this exclusivity I would say this is wrong. as always only few nodes along a path actually develop queues in the first place and only those need to implement a competent AQM. As a data point from real life, I employ an fq-shaper for both ingress and egress traffic on my CPE and almost all of my latency-under-load issues improved to a level where I do not care anymore; and of the remaining issues most are/were caused by my ISPs peerings/transits to the other endpoint of a connection was running "hot". And as stated in this thread already, I do not see any of our proposals reach the transit/peering routers for lack of a monetary incentive for those that would need to operate AQMs on such devices.
	On monetary incentives, I add, that, even though it is not one of L4S's stated goals, but it looks like a reasonable match for the "special services" exemption carved out in the EU's network neutrality regulations. I do not want to go into a political discussion about special services here, but just notice that this is one option for ISPs to monetize a special low-latency service tier (as L4S aims to deliver), but even in that case the ISPs are at best incentivized to build L4S-empowered links into their own data-centers and for payed peeerings, this still does not address the issue of generel peering/transit routers IMHO.

> If there is no consensus, we cannot use SCE and need to use L4S.

	I am always very wary of these kind on "tertium non datur" arguments, as if L4S and SCE would be the only options to tackle the issue (sure those are the two alternatives in the table right now, but that is a different argument).

> 
> For all the other detailed discussion topics, see [K] inline:
> 
> Regards,
> Koen.
> 
> -----Original Message-----
> From: Sebastian Moeller <moeller0@gmx.de> 
> Sent: Thursday, July 18, 2019 12:40 AM
> To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
> Cc: Holland, Jake <jholland@akamai.com>; Jonathan Morton <chromatix99@gmail.com>; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
> Subject: Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
> 
> Dear Koen,
> 
> 
>> On Jul 10, 2019, at 11:00, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> [...]
>>>> Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ?
>> 
>> FQ for "per-user" isolation in access equipment has clearly an extra cost, not? If we need to implement FQ "per-flow" on top, we need 2 levels of FQ (per-user and per-user-flow, so from thousands to millions of queues). Also, I haven’t seen DC switches coming with an FQ AQM...
> 
> 	I believe there is work available demonstrating that a) millions of concurrently active flows might be overly pessimistic (even for peering routers) and b) IMHO it is far from settled that these bid transit/peering routers will employ any of the the schemes we are cooking up here. For b) I argue that both L4S "linear" CE-marking and SCE linear ECT(1) marking will give a clear signal of overload that an big ISP might not want to explicitly tell its customers...
> 
> [K] a) indeed, if queues can be dynamically allocated you could settle with less, not sure if dynamic allocation is compatible with high speed implementations. Anyway, any additional complexity is additional cost (and cycles, energy, heat dissipation, ...). Of course everything can be done...

	Great that we agree here, this is all about trade-offs.

> b) I don't agree that ECN is a signal of overload.

	Rereading RFC3168, I believe a CE mark is merited only if the packet would be dropped otherwise. IMHO that most likely will be caused by the node running out of some limited resource (bandwidth and/or CPU cycles), but sure it can be a policy decision as well, but I fail to see how such subtleties matter in our discussion.


> It is a natural part of feedback to tell greedy TCP that it reached its full capacity. Excessive drop/latency

	Well, excessive latency often correlates with overload, but IMHO is not causally linked (and hence I believe all schemes trying to deduce overload from latency-under-load-increases are _not_ looking at the right measure).


> is the signal of overload and an ECN-capable AQM switches from ECN to drop anyway in overload conditions.  

	This is the extreme situation, like in L4S when the 20ms queue limit gets exceeded and head- or tail-dropping starts?

> Excessive drop and latency can also be measured today, not?

	Well, only if you have a reasonable prior for what drop-rate and latency variation is under normal conditions. And even then one needs time to get measurement error down to the desired level, in other words that seems sub-optimal for a tight control loop.

> Running a few probes can tell customers the same with or without ECN, and capacity is measured simply with speedtests.

	Running probes is a) harder than it seems (as the probes should run against the servers of interest) b) requires probes send over the reverse path as well (so one needs looking glass servers close to the endpoints of interest). Ans speedtests are a whole different can of worms.... most end-user accessible speedtests severely under-report the necessary details to actually being able to access a link's properties even at rest, IMHO.

> 
>> 
>>>> Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?
>> 
>> Even if as cheaply, as long as there is no reliable flow identification, it clearly has side effects. Many homeworkers are using a VPN tunnel, which is only one flow encapsulating maybe dozens.
> 
> 	Fair enough, but why do you see a problem of treating this multiplexed flow as different from any other flow, after all it was the end-points conscious decision to masquerade as a single flow so why assume special treatment; it is not that intermediate hops have any insight into the multiplexing, so why expect them to cater for this?
> 
> [K] Because the design of VPN tunnels had as a main goal to maintain a secure/encrypted connection between clients and servers, trying to minimize the overhead on clients and servers by using a single TCP/UDP connection. I don't think the single flow was chosen to get treated as one flow's budget of throughput. This "feature" didn't exist at that (pre-FQ) time.

	Well, in pre-FQ-times there was no guarantee what so ever, so claiming this is an insurmountable problems seems a bit naive to me. For one using IPv6 flow labels or multiple flows are all options to deal with an FQ world. I see this as an non-serious strawman argument.

> 
>> Drop and ECN (if implemented correctly) are tunnel agnostic.
> 
> 	Exactly, and that is true for each identified flow as well, so fq does not diminish this, but rather builds on top of it.
> 
> [K] True for flows within a tunnel, but the point was that FQ treats the aggregated tunnel as a single flow compared to other single flows.

	And so does L4S... (modulo queue protection, but that will only act on packet ingress as it seems to leave the already queued packets alone). But yes, tunneling has side-effects, don't do it of you dislike these.

> 
>> Also how flows are identified might evolve (new transport protocols, encapsulations, ...?).
> 
> 	You are jesting surely, new protocols? We are in this kefuffle, because you claim that a new protocol to signal linear CE-marking response to be made of unobtaininum so you want to abuse an underused EVN code point as a classifier. If new protocols are an option, just bite the bullet and give tcp-reno a new protocol number and use this for your L4S classifier; problem solved in a nice and clean fashion.
> 
> [K] Indeed, it is hardly impossible to deploy new protocols in practice, but I hope we can make it more possible in the future, not less possible... Maybe utopic, but at least we should try to learn from past mistakes.

	So, why not use a new protocol for L4S behaviour then? If L4S truly is the bee's knees then it will drive adoptation of the new protocol, and if not, that also tells us something about the market's assessment of L4S's promises.

> 
>> Also if strict flow isolation could be done correctly, it has additional issues related to missed scheduling opportunities,
> 
> 	Please elaborate, how an intermediate hop would know about the desires of the endpoints here. As far as I can tell such hops have their own ideas about optimal scheduling that they will enforce independent of the what the endpoints deem optimal (by ncessity as most endpoints will desire highest priority for their packets).
> 
> [K] That network nodes cannot know what the end-systems want is exactly the point. FQ just assumes everybody should have the same throughput,

	Which has the great advantage of being predictable by the enduser.

> and makes an exception for single packets (to undo the most flagrant disadvantage of this strategy).

	Sorry, IMHO this one-packet rule assures forward progress for all flows and is a feature, not a kludge. But I guess I am missing something in your argument, care to elaborate?


>  But again, I don't want to let the discussion get distracted by arguing pro or con FQ. I think we have to live with both now.
> 
> [...]
> 
>>>> Anyway, to me this discussion is about the tradeoffs between the 2 proposals.  It seems to me SCE has some safety advantages that should not be thrown away lightly, 
>> 
>> I appreciate the efforts of trying to improve L4S, but nobody working on L4S for years now see a way that SCE can work on a non-FQ system.
> 
> 	That is a rather peculiar argument, especially given that both you and Bob, major forces in the L4S approach, seemm to have philosophical issues with fq?
> 
> [K] I think I am realistic to accept pro's and con's and existence of both. I think wanting only FQ is as philosophical as wanting no FQ at all.

	Nobody wants you to switch your design away from dualQ or whathever you might want, as long as your choice does not have side-effects on the rest of the internet; use a real classifier instead of trying to press ECT(1) into service where a full bit is required and the issue is solved. My point is, again, I already use an fq-system on my CPE which gets me quite close to what L4S promises, but without necessarily redesigning most of the internet. So from my perspective FQ proofed itself already, now the newcomer L4S will need to demonstrate sufficient improvements over the existing FQ solution to merit the required non-backward compatible changes it mandates. And I do want to see a fair competition between the options (and will happily switch to L4S if it proves to be superior) under fair conditions.

> 
>> For me (and I think many others) it is a no-go to only support FQ. Unfortunately we only have half a bit free, 
> 
> 	??? Again you elaborately state the options in the L4S RFC and just converge on the one which is most convenient, but also not the best match for your requirements.
> 
> [K] Indeed, having common-Qs supported is one of my requirements.

	Misunderstanding here, I am not talking about dualQ/common-Q or mandating FQ everywhere, but about the fact that you committed on (ab)using ECT(1) as you "classifier" of choice even though this has severe side-effects...


> That's why I want to keep the discussion on that level: is there consensus that low latency is only needed for a per flow FQ system with an AQM per flow?

	This is a strawman argument , as far as I am concerned, as all I want is that L4S be orthogonal the the existing internet. As the L4S-RFCs verbosely describe there are other options for the required classification, so why insist upon using ECT(1)?

> 
>> and we need to choose how to use it. Would you choose for the existing ECN switches that cannot be upgraded (are there any?) or for all future non-FQ systems.
>> 
>>>> so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.
>> 
>> The performance in FQ is clearly equivalent, but for a common-Q behavior, only L4S can work.
>> As far as I understood the SCE-LFQ proposal is actually a slower FQ implementation (an FQ in DualQ disguise 😉), so I think not really a better alternative than pure FQ. Also its single AQM on the bulk queue will undo any isolation, as a coupled AQM is stronger than any scheduler, including FQ.
> 
> 	But how would the bulk queue actually care, being dedicated to bulk flows? This basically just uses a single codel instance for all flows in the bulk queue, exactly the situation codel was designed for, if I recall correctly. Sure this will run into problems with unrepsonsive flows, but not any more than DualQ with or without  queue protection (you can steer misbehaving flows into the the "classic" queue, but this will just change which flows will suffer most of the collateral damage of that unresponsive flow, IMHO).
> 
> [K] As far as I recall, CoDel works best for a single flow.

	As any other AQM on a single queue... The point is that the AQM really really wants to target those flows that cause most of the traffic (as throttling those will cause the most immediate reduction on ingress rate for the AQM hop), FQ presents those flows on a platter, single queue AQMs rely on stochastic truths like the likelihood of marking/dropping a flow's packets being proportional to the fraction of packets of this flow in the queue. As far as I can tell DualQ works exactly on the same (stochastically marking) principle and hence also will work best for a single flow (sure due to the higher marking probability this might not be as pronounced as with RED and codel, but still theoretically it will be there). I might be confused by DualQ, so please correct me if my assumption is wrong.


> For a stateless AQM like a step using only per packet sojourn time, a common AQM over FQs is indeed working as an FQ with an AQM per queue. Applying an stateless AQM for Classic traffic (like sojourn-time RED without smoothing) will have impact on its performance. Adding common state for all bulk queue AQMs will disable the FQ effect. Anyway, the sequential scan at dequeue is the main reason why LFQ will be hard to get traction in high-speed equipment.

	I believe this to be directed at Jonathan, so no comment from my side.

Best Regards
	Sebastian

> 
> 
> Best Regards
> 	Sebastian Moeller


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-19 15:37                                 ` Dave Taht
@ 2019-07-19 18:33                                   ` Wesley Eddy
  2019-07-19 20:03                                     ` Dave Taht
                                                       ` (2 more replies)
  2019-07-22 16:28                                   ` Bless, Roland (TM)
  1 sibling, 3 replies; 84+ messages in thread
From: Wesley Eddy @ 2019-07-19 18:33 UTC (permalink / raw)
  To: Dave Taht, De Schepper, Koen (Nokia - BE/Antwerp); +Cc: ecn-sane, tsvwg

On 7/19/2019 11:37 AM, Dave Taht wrote:
> It's the common-q with AQM **+ ECN** that's the sticking point. I'm
> perfectly satisfied with the behavior of every ietf approved single
> queued AQM without ecn enabled. Let's deploy more of those!

Hi Dave, I'm just trying to make sure I'm reading into your message 
correctly ... if I'm understanding it, then you're not in favor of 
either SCE or L4S at all?  With small queues and without ECN, loss 
becomes the only congestion signal, which is not desirable, IMHO, or am 
I totally misunderstanding something?

> If we could somehow create a neutral poll in the general networking
> community outside the ietf (nanog, bsd, linux, dcs, bigcos, routercos,
> ISPs small and large) , and do it much like your classic "vote for a
> political measure" thing, with a single point/counterpoint section,
> maybe we'd get somewhere.

While I agree that would be really useful, it's kind of an "I want a 
pony" statement.  As a TSVWG chair where we're doing this work, we've 
been getting inputs from people that have a foot in many of the 
communities you mention, but always looking for more.

> In particular conflating "low latency" really confounds the subject
> matter, and has for years. FQ gives "low latency" for the vast
> majority of flows running below their fair share. L4S promises "low
> latency" for a rigidly defined set of congestion controls in a
> specialized queue, and otherwise tosses all flows into a higher latency
> queue when one flow is greedy.

I don't think this is a correct statement.  Packets have to be from a 
"scalable congestion control" to get access to the L4S queue.  There are 
some draft requirements for using the L4S ID, but they seem pretty 
flexible to me.  Mostly, they're things that an end-host algorithm needs 
to do in order to behave nicely, that might be good things anyways 
without regard to L4S in the network (coexist w/ Reno, avoid RTT bias, 
work well w/ small RTT, be robust to reordering).  I am curious which 
ones you think are too rigid ... maybe they can be loosened?

Also, I don't think the "tosses all flows into a higher latency queue 
when one flow is greedy" characterization is correct.  The other queue 
is for classic/non-scalable traffic, and not necessarily higher latency 
for a given flow, nor is winding up there related to whether another 
flow is greedy.

> So to me, it goes back to slamming the door shut, or not, on L4S's usage
> of ect(1) as a too easily gamed e2e identifier. As I don't think it and
> all the dependent code and algorithms can possibly scale past a single
> physical layer tech, I'd like to see it move to a DSCP codepoint, worst
> case... and certainly remain "experimental" in scope until anyone
> independent can attempt to evaluate it.

That seems good to discuss in regard to the L4S ID draft.  There is a 
section (5.2) there already discussing DSCP, and why it alone isn't 
feasible.  There's also more detailed description of the relation and 
interworking in 
https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-diffserv-02

> I'd really all the tcp-go-fast-at-any-cost people to take a year off to
> dogfood their designs, and go live somewhere with a congested network to
> deal with daily, like a railway or airport, or on 3G network on a
> sailboat or beach somewhere. It's not a bad life... REALLY.
>
Fortunately, at least in the IETF, I don't think there have been 
initiatives in the direction of going fast at any cost in recent 
history, and they would be unlikely to be well accepted if there were!  
That is at least one place that there seems to be strong consensus.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-19 18:33                                   ` Wesley Eddy
@ 2019-07-19 20:03                                     ` Dave Taht
  2019-07-19 22:09                                       ` Wesley Eddy
  2019-07-19 20:06                                     ` Black, David
  2019-07-19 21:49                                     ` Sebastian Moeller
  2 siblings, 1 reply; 84+ messages in thread
From: Dave Taht @ 2019-07-19 20:03 UTC (permalink / raw)
  To: Wesley Eddy
  Cc: Dave Taht, De Schepper, Koen (Nokia - BE/Antwerp), ecn-sane, tsvwg

On Fri, Jul 19, 2019 at 11:33 AM Wesley Eddy <wes@mti-systems.com> wrote:
>
> On 7/19/2019 11:37 AM, Dave Taht wrote:
> > It's the common-q with AQM **+ ECN** that's the sticking point. I'm
> > perfectly satisfied with the behavior of every ietf approved single
> > queued AQM without ecn enabled. Let's deploy more of those!
>
> Hi Dave, I'm just trying to make sure I'm reading into your message
> correctly ... if I'm understanding it, then you're not in favor of
> either SCE or L4S at all?

I am not in favor of internet scale deployment of *ECN* at this time.
For controlled networks it can make sense. I have, indeed, done so.

Of the two proposals for making ECN safer and more useful, SCE is
struck me as superior when it appeared, and L4S
totally undeployable for a half dozen reasons since it appeared and
perpetually worse as more and more details and flaws fell out of the
architecture documents, and were 'documented' rather than treated as
the showstoppers they were.

>  With small queues and without ECN, loss
> becomes the only congestion signal

RTT... BBR...

>, which is not desirable,

packet loss we know to work with all protocols we have on the internet
other than tcp, and thus is the most important congestion indicator we
have. Until fq_codel's essentially accidental deployment of
ecn-enablement, and apple then turning it on universally, we had
essentially no field data aside from those crazies (like me) that
fully deployed it on their corporate networks.

I do rather like SCE's addition of two new congestion signals and
retention of CE as a very strong one. I'd *really* like it,
additionally, if treating "drop and mark" as an even stronger
congesting indicator also became a thing.

And I'd like it if we did more transport level work (as is finally
happening) on just about everything and *dogfooded* the results on
real home and small business networks (as I do), and ran real
benchmarks with real loads concurrent, before unleashing such a change
to the internet.

> IMHO, or am
> I totally misunderstanding something?

Has it not been clear all these years that I don't care much for ECN
in the first place? Nor do the designers of codel? Nor everyone burned
by it the first time? That ecn advocacy is limited to a very small
passionate number of folk in the ietf?

Do any of the "ecn side" actually dogfood any of their ecn stuff, day
in and day out? I encouraged y'all years ago to convince one uni, one
lab, one reasonably large scale enterprise to go all-in on ecn, and
that has not happened? still?

Look at how much of that sort of testing went into ipv6 before it
started to deploy...

every time I give a talk to the more general networking public -
people that should know what I'm talking about - I have to go explain
ecn, in enormous detail.

One of the most basic side-effects of ecn enablement is that I also
had to ecn-enable the babel protocol so it doesn't get starved out on
slower links. This points to bad side effects on every non-tcp-enabled
routing protocol.

>
> > If we could somehow create a neutral poll in the general networking
> > community outside the ietf (nanog, bsd, linux, dcs, bigcos, routercos,
> > ISPs small and large) , and do it much like your classic "vote for a
> > political measure" thing, with a single point/counterpoint section,
> > maybe we'd get somewhere.
>
> While I agree that would be really useful, it's kind of an "I want a
> pony" statement.  As a TSVWG chair where we're doing this work, we've
> been getting inputs from people that have a foot in many of the
> communities you mention, but always looking for more.

Speaking as someone very fed up with the ietf, that did try to leave a
few months back - there is one sadly optional ietf process I like -
"running code, & two interoperable implementations", that I wish had
been applied to the entire l4s process long before it got to this
point.

public Ns2 and ns3 models of pie and codel were required in the aqm
group. So was independent testing.

In the L4S process we'd also made the strong suggestion the L4Steam
went the openwrt route, just as we did for fq_codel, to be able to
look at real world problems we encountered there like TSO/GRO batching
and non-tcp applications. We still don't got anything even close to
that. L4S is essentially at a pre 2011 state in terms of the real
effects on real networks and legacy applcations.

Wanting that basic stuff, *running* long before it is standardized is
not "I want a pony", it's "you want a unicorn".

"doing the work" includes doing basic stuff like that. to me it's
utterly required to have done that work before inflicting it on even
the tiniest portion of the internet. I have no idea why some ietfers
don't seem to get this.

Anyway, I'm on the verge of losing my temper again, and I really
should just stay clear of these discussions, and steer clear of the
meetings, and try to just read summary reports and code. I rather
liked the early SCE results that went
by one some thread here or another in the past week or two, even the
single queue ones looked promising, and the FQ one was to die for.....

I'm looking forward, as I've always said throughout these processes,
for *RUNNING CODE* and a chance to independently evaluate the various
new ideas on real gear. My personal and principal goal is to make wifi
(and other wireless internet tech)  work better, or at least - not
work worse - that what has already been deployed in 10s of millions in
the fq_codel for wifi work.

I would like it very much if the tsvwg chairs decided to enforce the
"running code, two interoperable implementations,
and independent testability requirements that I have" - and the old
ietf that I used to like used to have - on both L4S and SCE, and the
transport mods under test - and even then the ect(1) dispute needs to
be resolved soon.

Is there any chance we'll see my conception of the good ietf process
enforced on the L4S and SCE processes by the chairs?

I'd sleep better to then focus on what I do best, which is blowing up
ideas in the real world and making them good enough to use across the
internet.

>
>
> > In particular conflating "low latency" really confounds the subject
> > matter, and has for years. FQ gives "low latency" for the vast
> > majority of flows running below their fair share. L4S promises "low
> > latency" for a rigidly defined set of congestion controls in a
> > specialized queue, and otherwise tosses all flows into a higher latency
> > queue when one flow is greedy.
>
> I don't think this is a correct statement.  Packets have to be from a
> "scalable congestion control" to get access to the L4S queue.  There are

No, they just have to mark the right bit.

No requirement to be from a scalable congestion control is *enforcable*.

So I'd never say "packets have to be from a scalable congestion
control", but "they have to set the right bit"

as for the other part, I'd re-say:

"and otherwise toss all  "normal" (classic) flows into a higher
latency classic queue when one normal flow is greedy."

I don't think "have to be from a scalable congestion control" is a
correct statement. What part about how any application can, from
userspace, set:

    const int ds = 0x01;        /* Yea! let's abuse L4S! */
    rc = setsockopt(s, IPPROTO_IPV6, IPV6_TCLASS, &ds, sizeof(ds));

is unclear?

> some draft requirements for using the L4S ID, but they seem pretty
> flexible to me.  Mostly, they're things that an end-host algorithm needs
> to do in order to behave nicely, that might be good things anyways
> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
> work well w/ small RTT, be robust to reordering).  I am curious which
> ones you think are too rigid ... maybe they can be loosened?

no, I don't think they are rigid enough to actually work against
mixed, real workloads!

> Also, I don't think the "tosses all flows into a higher latency queue
> when one flow is greedy" characterization is correct.  The other queue
> is for classic/non-scalable traffic, and not necessarily higher latency

"Classic" is *normal* traffic. roughly 100% of the traffic that exists
today, falls into that queue.

So I should have said - "tosses all normal ("classic") flows into a
single and higher latency queue when a greedy normal flow is present"
... "in the dualpi" case? I know it's possible to hang a different
queue algo on the "normal" queue, but
to this day I don't see the need for the l4s "fast lane" in the first
place, nor a cpu efficient way of doing the right things with the
dualpi or curvyred code. What I see, is, long term, that special bit
just becomes a "fast" lane for any sort of admission controlled
traffic the ISP wants to put there, because the dualpi idea fails on
real traffic.

In my future public statements on this I'm going to give up entirely
on the newspeak.

> for a given flow, nor is winding up there related to whether another
> flow is greedy.

I'm not sure if we were talking about the same thing, but I agree what
I wrote above was originally unclear especially if your mated to the
dualq concept.

>
> > So to me, it goes back to slamming the door shut, or not, on L4S's usage
> > of ect(1) as a too easily gamed e2e identifier. As I don't think it and
> > all the dependent code and algorithms can possibly scale past a single
> > physical layer tech, I'd like to see it move to a DSCP codepoint, worst
> > case... and certainly remain "experimental" in scope until anyone
> > independent can attempt to evaluate it.
>
> That seems good to discuss in regard to the L4S ID draft.  There is a
> section (5.2) there already discussing DSCP, and why it alone isn't
> feasible.  There's also more detailed description of the relation and
> interworking in
> https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-diffserv-02

It's kind of a showstopping problem, I think, for anything but a well
controlled network.

Ship some code, do some tests, let some other people at it, get some
real results, starting with flent's rrul tests.

>
>
> > I'd really all the tcp-go-fast-at-any-cost people to take a year off to
> > dogfood their designs, and go live somewhere with a congested network to
> > deal with daily, like a railway or airport, or on 3G network on a
> > sailboat or beach somewhere. It's not a bad life... REALLY.
> >
> Fortunately, at least in the IETF, I don't think there have been
> initiatives in the direction of going fast at any cost in recent
> history, and they would be unlikely to be well accepted if there were!
> That is at least one place that there seems to be strong consensus.

Well if the various WGs would exit that nice hotel, and form a
diaspora over the city in coffee shops and other public spaces, and do
some tests of your latest and greatest stuff, y'all might get a more
accurate viewpoint of what you are actually accomplishing. Take a look
at what BBR does, take a look at what IW10 does, take a look at what
browsers currently do.

IETF design and testing is overly driven by overly simple tests, and
not enough by real world traffic effects.

I'm not coming to this meeting and I'm not on the tsvwg list.

I'd wanted the ecn-sane list to be a nice quiet spot to be able to
think clearly about how to fix the enormous fq_codel deployment -
particularly on wifi - if we had to - far more than I'd wanted to get
embroiled in the l4s debate.

Is there any chance we'll see my conception of the good ietf process
enforced on both the L4S and SCE processes by the chairs?

>
>
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane

-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-19 18:33                                   ` Wesley Eddy
  2019-07-19 20:03                                     ` Dave Taht
@ 2019-07-19 20:06                                     ` Black, David
  2019-07-19 20:44                                       ` Jonathan Morton
                                                         ` (2 more replies)
  2019-07-19 21:49                                     ` Sebastian Moeller
  2 siblings, 3 replies; 84+ messages in thread
From: Black, David @ 2019-07-19 20:06 UTC (permalink / raw)
  To: Wesley Eddy, Dave Taht, De Schepper, Koen (Nokia - BE/Antwerp)
  Cc: ecn-sane, tsvwg

Two comments as an individual, not as a WG chair:

> Mostly, they're things that an end-host algorithm needs
> to do in order to behave nicely, that might be good things anyways
> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
> work well w/ small RTT, be robust to reordering).  I am curious which
> ones you think are too rigid ... maybe they can be loosened?

[1] I have profoundly objected to L4S's RACK-like requirement (use time to detect loss, and in particular do not use 3DupACK) in public on multiple occasions, because in reliable transport space, that forces use of TCP Prague, a protocol with which we have little to no deployment or operational experience.  Moreover, that requirement raises the bar for other protocols in a fashion that impacts endpoint firmware, and possibly hardware in some important (IMHO) environments where investing in those changes delivers little to no benefit.  The environments that I have in mind include a lot of data centers.  Process wise, I'm ok with addressing this objection via some sort of "controlled environment" escape clause text that makes this RACK-like requirement inapplicable in a "controlled environment" that does not need that behavior (e.g., where 3DupACK does not cause problems and is not expected to cause problems).  

For clarity, I understand the multi-lane link design rationale behind the RACK-like requirement and would agree with that requirement in a perfect world ... BUT ... this world is not perfect ... e.g., 3DupACK will not vanish from "running code" anytime soon.

> > So to me, it goes back to slamming the door shut, or not, on L4S's usage
> > of ect(1) as a too easily gamed e2e identifier. As I don't think it and
> > all the dependent code and algorithms can possibly scale past a single
> > physical layer tech, I'd like to see it move to a DSCP codepoint, worst
> > case... and certainly remain "experimental" in scope until anyone
> > independent can attempt to evaluate it.
> 
> That seems good to discuss in regard to the L4S ID draft.  There is a
> section (5.2) there already discussing DSCP, and why it alone isn't
> feasible.  There's also more detailed description of the relation and
> interworking in
> https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-diffserv-02

[2] We probably should pay more attention to that draft.  One of the things that I think is important in that draft is a requirement that operators can enable/disable L4S behavior of ECT(1) on a per-DSCP basis - the rationale for that functionality starts with incremental deployment.   This technique may also have the potential to provide a means for L4S and SCE to coexist via use of different DSCPs for L4S vs. SCE traffic (there are some subtleties here, e.g., interaction with operator bleaching of DSCPs to zero at network boundaries).

To be clear on what I have in mind:
	o Unacceptable: All traffic marked with ECT(1) goes into the L4S queue, independent of what DSCP it is marked with.
	o Acceptable:  There's an operator-configurable list of DSCPs that support an L4S service - traffic marked with ECT(1) goes into the L4S queue if and only if that traffic is also marked with a DSCP that is on the operator's DSCPs-for-L4S list.

Reminder: This entire message is posted as an individual, not as a WG chair.

Thanks, --David

> -----Original Message-----
> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Wesley Eddy
> Sent: Friday, July 19, 2019 2:34 PM
> To: Dave Taht; De Schepper, Koen (Nokia - BE/Antwerp)
> Cc: ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
> Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
> 
> 
> [EXTERNAL EMAIL]
> 
> On 7/19/2019 11:37 AM, Dave Taht wrote:
> > It's the common-q with AQM **+ ECN** that's the sticking point. I'm
> > perfectly satisfied with the behavior of every ietf approved single
> > queued AQM without ecn enabled. Let's deploy more of those!
> 
> Hi Dave, I'm just trying to make sure I'm reading into your message
> correctly ... if I'm understanding it, then you're not in favor of
> either SCE or L4S at all?  With small queues and without ECN, loss
> becomes the only congestion signal, which is not desirable, IMHO, or am
> I totally misunderstanding something?
> 
> 
> > If we could somehow create a neutral poll in the general networking
> > community outside the ietf (nanog, bsd, linux, dcs, bigcos, routercos,
> > ISPs small and large) , and do it much like your classic "vote for a
> > political measure" thing, with a single point/counterpoint section,
> > maybe we'd get somewhere.
> 
> While I agree that would be really useful, it's kind of an "I want a
> pony" statement.  As a TSVWG chair where we're doing this work, we've
> been getting inputs from people that have a foot in many of the
> communities you mention, but always looking for more.
> 
> 
> > In particular conflating "low latency" really confounds the subject
> > matter, and has for years. FQ gives "low latency" for the vast
> > majority of flows running below their fair share. L4S promises "low
> > latency" for a rigidly defined set of congestion controls in a
> > specialized queue, and otherwise tosses all flows into a higher latency
> > queue when one flow is greedy.
> 
> I don't think this is a correct statement.  Packets have to be from a
> "scalable congestion control" to get access to the L4S queue.  There are
> some draft requirements for using the L4S ID, but they seem pretty
> flexible to me.  Mostly, they're things that an end-host algorithm needs
> to do in order to behave nicely, that might be good things anyways
> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
> work well w/ small RTT, be robust to reordering).  I am curious which
> ones you think are too rigid ... maybe they can be loosened?
> 
> Also, I don't think the "tosses all flows into a higher latency queue
> when one flow is greedy" characterization is correct.  The other queue
> is for classic/non-scalable traffic, and not necessarily higher latency
> for a given flow, nor is winding up there related to whether another
> flow is greedy.
> 
> 
> > So to me, it goes back to slamming the door shut, or not, on L4S's usage
> > of ect(1) as a too easily gamed e2e identifier. As I don't think it and
> > all the dependent code and algorithms can possibly scale past a single
> > physical layer tech, I'd like to see it move to a DSCP codepoint, worst
> > case... and certainly remain "experimental" in scope until anyone
> > independent can attempt to evaluate it.
> 
> That seems good to discuss in regard to the L4S ID draft.  There is a
> section (5.2) there already discussing DSCP, and why it alone isn't
> feasible.  There's also more detailed description of the relation and
> interworking in
> https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-diffserv-02
> 
> 
> > I'd really all the tcp-go-fast-at-any-cost people to take a year off to
> > dogfood their designs, and go live somewhere with a congested network
> to
> > deal with daily, like a railway or airport, or on 3G network on a
> > sailboat or beach somewhere. It's not a bad life... REALLY.
> >
> Fortunately, at least in the IETF, I don't think there have been
> initiatives in the direction of going fast at any cost in recent
> history, and they would be unlikely to be well accepted if there were!
> That is at least one place that there seems to be strong consensus.
> 


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-19 20:06                                     ` Black, David
@ 2019-07-19 20:44                                       ` Jonathan Morton
  2019-07-19 22:03                                         ` Sebastian Moeller
  2019-07-21 12:30                                       ` Bob Briscoe
  2019-07-21 12:30                                       ` Scharf, Michael
  2 siblings, 1 reply; 84+ messages in thread
From: Jonathan Morton @ 2019-07-19 20:44 UTC (permalink / raw)
  To: Black, David
  Cc: Wesley Eddy, Dave Taht, De Schepper, Koen (Nokia - BE/Antwerp),
	ecn-sane, tsvwg

> On 19 Jul, 2019, at 4:06 pm, Black, David <David.Black@dell.com> wrote:
> 
> To be clear on what I have in mind:
> 	o Unacceptable: All traffic marked with ECT(1) goes into the L4S queue, independent of what DSCP it is marked with.
> 	o Acceptable:  There's an operator-configurable list of DSCPs that support an L4S service - traffic marked with ECT(1) goes into the L4S queue if and only if that traffic is also marked with a DSCP that is on the operator's DSCPs-for-L4S list.

I take it, in the latter case, that this increases the cases in which L4S endpoints would need to detect that they are not receiving L4S signals, but RFC-3168 signals.  The current lack of such a mechanism therefore remains concerning.  For comparison, SCE inherently retains such a mechanism by putting the RFC-3168 and high-fidelity signals on different ECN codepoints.

So I'm pleased to hear that the L4S team will be at the hackathon with a demo setup.  Hopefully we will be able to obtain comparative test results, using the same test scripts as we use on SCE, and also insert an RFC-3168 single queue AQM into their network to demonstrate what actually happens in that case.  I think that the results will be illuminating for all concerned.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-19 18:33                                   ` Wesley Eddy
  2019-07-19 20:03                                     ` Dave Taht
  2019-07-19 20:06                                     ` Black, David
@ 2019-07-19 21:49                                     ` Sebastian Moeller
  2 siblings, 0 replies; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-19 21:49 UTC (permalink / raw)
  To: Wesley Eddy
  Cc: Dave Taht, De Schepper, Koen (Nokia - BE/Antwerp), ecn-sane, tsvwg



> On Jul 19, 2019, at 20:33, Wesley Eddy <wes@mti-systems.com> wrote:
> 
> On 7/19/2019 11:37 AM, Dave Taht wrote:
> [...]
>> In particular conflating "low latency" really confounds the subject
>> matter, and has for years. FQ gives "low latency" for the vast
>> majority of flows running below their fair share. L4S promises "low
>> latency" for a rigidly defined set of congestion controls in a
>> specialized queue, and otherwise tosses all flows into a higher latency
>> queue when one flow is greedy.
> 
> I don't think this is a correct statement.  Packets have to be from a "scalable congestion control" to get access to the L4S queue.  

	With the current proposal, a packet needs to set the ECT(1) codepoint, there is _no_ checking whether there is a "scalable congestion control" operational on this flow. Even worse every CE-marked packet will be put into the L4S queue; the latter is a consequence of the currently preferred choice of using ECT(1) as L4S classifying bit. Sure the queue protection feature might help to demote flows not playing along the L4S rules back into the RFC3168 queue, but queue protection is advertized as optional....


> There are some draft requirements for using the L4S ID, but they seem pretty flexible to me.  Mostly, they're things that an end-host algorithm needs to do in order to behave nicely,

	Except there is no real enforcement/measurement whether flows "behave nicely", at least as far as I can see.


> [...]
> 
>> So to me, it goes back to slamming the door shut, or not, on L4S's usage
>> of ect(1) as a too easily gamed e2e identifier. As I don't think it and
>> all the dependent code and algorithms can possibly scale past a single
>> physical layer tech, I'd like to see it move to a DSCP codepoint, worst
>> case... and certainly remain "experimental" in scope until anyone
>> independent can attempt to evaluate it.
> 
> That seems good to discuss in regard to the L4S ID draft.  There is a section (5.2) there already discussing DSCP, and why it alone isn't feasible.  There's also more detailed description of the relation and interworking in https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-diffserv-02

	IMHO a new protocol ID is the solution:
See https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-07#appendix-B.4"

'B.4.  Protocol ID


   It has been suggested that a new ID in the IPv4 Protocol field or the
   IPv6 Next Header field could identify L4S packets.  However this
   approach is ruled out by numerous problems:

   o  A new protocol ID would need to be paired with the old one for
      each transport (TCP, SCTP, UDP, etc.);

   o  In IPv6, there can be a sequence of Next Header fields, and it
      would not be obvious which one would be expected to identify a
      network service like L4S;

   o  A new protocol ID would rarely provide an end-to-end service,
      because It is well-known that new protocol IDs are often blocked
      by numerous types of middlebox;

   o  The approach is not a solution for AQMs below the IP layer;"


None of these points are show stoppers, IMHO:
1) Especially since in all likelihood only two new protocol IDs will be needed, "AIAD TCP" and "AIAD UDP". 
2) The IPv6 issue is a bit of a red herring as the next header field typically seems to contain the exact same number as IPv4's protocol field and chained headers are probably rare. Also if the primary next header is not of an L4S type, simply treating the flow as RFC3168 compliant seems like a safe option.
3) Okay that is a challenge, but ig L4S is worth its salt, it will offer enough incentives to overcome this hurdle, otherwise why waste ECT(1) on something that the market/the network community does not seem to want?
4)
Me: "Doctor it hurts if I put an AQM below the IP layer."
Physician: " Do not do that then!"
Honestly, how is an AQM below the IP layer (so L1/L2) going to act on IP's ECN code points as required for L4S, but going to fail to look at the protocol/next header field?

This is a really clean solution for L4S issues with the currently proposed badly fitting classifier, that solves all issues with interoperability with the rest of the current internet. 

[...]


Best Regards
	Sebastian

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-19 20:44                                       ` Jonathan Morton
@ 2019-07-19 22:03                                         ` Sebastian Moeller
  2019-07-20 21:02                                           ` Dave Taht
  2019-07-21 11:53                                           ` Bob Briscoe
  0 siblings, 2 replies; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-19 22:03 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: Black, David, tsvwg, ecn-sane, Dave Taht, De Schepper,
	Koen (Nokia - BE/Antwerp)

Hi Jonathan,



> On Jul 19, 2019, at 22:44, Jonathan Morton <chromatix99@gmail.com> wrote:
> 
>> On 19 Jul, 2019, at 4:06 pm, Black, David <David.Black@dell.com> wrote:
>> 
>> To be clear on what I have in mind:
>> 	o Unacceptable: All traffic marked with ECT(1) goes into the L4S queue, independent of what DSCP it is marked with.
>> 	o Acceptable:  There's an operator-configurable list of DSCPs that support an L4S service - traffic marked with ECT(1) goes into the L4S queue if and only if that traffic is also marked with a DSCP that is on the operator's DSCPs-for-L4S list.
> 
> I take it, in the latter case, that this increases the cases in which L4S endpoints would need to detect that they are not receiving L4S signals, but RFC-3168 signals.  The current lack of such a mechanism therefore remains concerning.  For comparison, SCE inherently retains such a mechanism by putting the RFC-3168 and high-fidelity signals on different ECN codepoints.
> 
> So I'm pleased to hear that the L4S team will be at the hackathon with a demo setup.  Hopefully we will be able to obtain comparative test results, using the same test scripts as we use on SCE, and also insert an RFC-3168 single queue AQM into their network to demonstrate what actually happens in that case.  I think that the results will be illuminating for all concerned.

	What I really would like to see, how L4S endpoints will deal with post-bottleneck ingress shaping by an RFC3168 -compliant FQ-AQM. I know the experts here deems this not even a theoretical concern, but I really really want to see data, that L4S flows will not crowd out the more reactive RFC3168 flows in that situation. This is the set-up quite a number of latency sensitive end-users actually use to "debloat" the internet and it would be nice to have real data showing that this is not a concern.

Best Regards
	Sebastian



> 
> - Jonathan Morton
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-19 20:03                                     ` Dave Taht
@ 2019-07-19 22:09                                       ` Wesley Eddy
  2019-07-19 23:42                                         ` Dave Taht
  0 siblings, 1 reply; 84+ messages in thread
From: Wesley Eddy @ 2019-07-19 22:09 UTC (permalink / raw)
  To: Dave Taht
  Cc: Dave Taht, De Schepper, Koen (Nokia - BE/Antwerp), ecn-sane, tsvwg

Hi Dave, thanks for clarifying, and sorry if you're getting upset.

When we're talking about keeping very small queues, then RTT is lost as 
a congestion indicator (since there is no queue depth to modulate as a 
congestion signal into the RTT).  We have indicators that include drop, 
RTT, and ECN (when available).  Using rate of marks rather than just 
binary presence of marking gives a finer-grained signal.  SCE is also 
providing a multi-level indication, so that's another way to get more 
"ENOB" into the samples of congestion being fed to the controllers.

Marking (whether classic ECN, mark-rate, or multi-level marking) is 
needed since with small queues there's lack of congestion information in 
the RTT.

To address one question you repeated a couple times:

> Is there any chance we'll see my conception of the good ietf process
> enforced on the L4S and SCE processes by the chairs?

We look for working group consensus.  So far, we saw consensus to adopt 
as a WG item for experimental track, and have been following the process 
for that.

On the topic of gaming the system by falsely setting the L4S ID, that 
might need to be discussed a little bit more, since now that you mention 
it, the docs don't seem to very directly address it yet.  I can only 
speak for myself, but assumed a couple things internally, such as (1) 
this is getting enabled in specific environments, (2) in less controlled 
environments, an operator enabling it has protections in place for 
getting admission or dealing with bad behavior, (3) there could be 
further development of audit capabilities such as in CONEX, etc.  I 
guess it could be good to hear more about what others were thinking on this.

> So I should have said - "tosses all normal ("classic") flows into a
> single and higher latency queue when a greedy normal flow is present"
> ... "in the dualpi" case? I know it's possible to hang a different
> queue algo on the "normal" queue, but
> to this day I don't see the need for the l4s "fast lane" in the first
> place, nor a cpu efficient way of doing the right things with the
> dualpi or curvyred code. What I see, is, long term, that special bit
> just becomes a "fast" lane for any sort of admission controlled
> traffic the ISP wants to put there, because the dualpi idea fails on
> real traffic.

Thanks; this was helpful for me to understand your position.

> Well if the various WGs would exit that nice hotel, and form a
> diaspora over the city in coffee shops and other public spaces, and do
> some tests of your latest and greatest stuff, y'all might get a more
> accurate viewpoint of what you are actually accomplishing. Take a look
> at what BBR does, take a look at what IW10 does, take a look at what
> browsers currently do.

All of those things come up in the meetings, and frequently there is 
measurement data shown and discussed.  It's always welcome when people 
bring measurements, data, and experience.  The drafts and other 
contributions are here so that anyone interested can independently 
implement and do the testing you advocate and share results.  We're all 
on the same team trying to make the Internet better.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-19 22:09                                       ` Wesley Eddy
@ 2019-07-19 23:42                                         ` Dave Taht
  2019-07-24 16:21                                           ` Dave Taht
  0 siblings, 1 reply; 84+ messages in thread
From: Dave Taht @ 2019-07-19 23:42 UTC (permalink / raw)
  To: Wesley Eddy
  Cc: Dave Taht, De Schepper, Koen (Nokia - BE/Antwerp), ecn-sane, tsvwg

On Fri, Jul 19, 2019 at 3:09 PM Wesley Eddy <wes@mti-systems.com> wrote:
>
> Hi Dave, thanks for clarifying, and sorry if you're getting upset.

There have been a few other disappointments this ietf. I'd hoped bbrv2
would land for independent testing. Didn't.

https://github.com/google/bbr

I have some "interesting" patches for bbrv1 but felt it would be saner
to wait for the most current version (or for the bbrv2 authors to
have the small rfc3168 baseline patch I'd requested tested by them
rather than I), to bother redoing that series of tests and publishing.

I'd asked if the dctcp and dualpi code on github was stable enough to
be independently tested. No reply.

The SCE folk did freeze and document a release worth testing.

I did some testing on wifi at battlemesh but it's too noisy (but the
sources of "noise" were important) and too obviously "ecn is not the
wifi problem"

I didn't know there was an "add a delay based option to cubic patch"
until last week.

So anyway, I do retain hope, maybe after this coming week and some
more hackathoning, it might be possible to start getting reproducible
and repeatable results from more participants in this controversy.
Having to sit through another half-dozen presentations with
irreproducible results is not something I look forward to, and I'm
glad I don't have to.

> When we're talking about keeping very small queues, then RTT is lost as
> a congestion indicator (since there is no queue depth to modulate as a
> congestion signal into the RTT).  We have indicators that include drop,
> RTT, and ECN (when available).  Using rate of marks rather than just
> binary presence of marking gives a finer-grained signal.  SCE is also
> providing a multi-level indication, so that's another way to get more
> "ENOB" into the samples of congestion being fed to the controllers.

While this is extremely well said, RTT is NOT lost as a congestion
indicator, it just becomes finer grained.

While I'm reading tea-leaves... there's been a lot of stuff landing in
the linux kernel from google around edf scheduling for tcp and the
hardware enabled pacing qdiscs. So I figure they are now in the nsec
category on their stuff but not ready to be talking.

> Marking (whether classic ECN, mark-rate, or multi-level marking) is
> needed since with small queues there's lack of congestion information in
> the RTT.

small queues *and isochronous, high speed, wired connections*.

What will it take to get the ecn and especially l4s crowd to take a
hard look at actual wireless or wifi packet captures? I mean, y'all
are sitting staring into your laptops for a week, doing wifi. Would it
hurt to test more actual transports during
that time?

How many ISPs would still be in business if wifi didn't exist, only {X}G?

the wifi at the last ietf sucked...

Can't even get close to 5ms latencies on any form of wireless/wifi.

Anyway, I long ago agreed that multiple marks (of some sort) per rtt
made sense (see my position statements on ecn-sane),
but of late I've been leaning more towards really good pacing,  rtt
and chirping with minimal marking required on
"small queues *and isochronous, high speed, wired connections*.

>
> To address one question you repeated a couple times:
>
> > Is there any chance we'll see my conception of the good ietf process
> > enforced on the L4S and SCE processes by the chairs?
>
> We look for working group consensus.  So far, we saw consensus to adopt
> as a WG item for experimental track, and have been following the process
> for that.

Well, given the announcement of docsis low latency, and the size of
the fq_codel deployment,
and the l4s/sce drafts, we are light-years beyond anything I'd
consider to be "experimental" in the real world.

Would recognizing this reality and somehow converting this to a
standards track debate within the ietf help anything?

Would getting this out of tsvwg and restarting aqmwg help any?

I was, up until all this blew up in december, planning on starting the
process for an rfc8289bis and rfc8290bis on the standards track.

>
> On the topic of gaming the system by falsely setting the L4S ID, that
> might need to be discussed a little bit more, since now that you mention
> it, the docs don't seem to very directly address it yet.

to me this has always been a game theory deal killer for l4s (and
diffserv, intserv, etc). You cannot ask for
more priority, only less. While I've been recommending books from
kleinrock lately, another one that
I think everyone in this field should have is:

https://www.amazon.com/Theory-Games-Economic-Behavior-Commemorative-ebook/dp/B00AMAZL4I/ref=sr_1_1?keywords=theory+of+games+and+economic+behavior&qid=1563579161&s=gateway&sr=8-1

I've read it countless times (and can't claim to have understood more
than a tiny percentage of it). I wasn't aware
until this moment there was a kindle edition.

> I can only
> speak for myself, but assumed a couple things internally, such as (1)
> this is getting enabled in specific environments, (2) in less controlled
> environments, an operator enabling it has protections in place for
> getting admission or dealing with bad behavior, (3) there could be
> further development of audit capabilities such as in CONEX, etc.  I
> guess it could be good to hear more about what others were thinking on this.

I think there was "yet another queue" suggested for detected bad behavior.

>
> > So I should have said - "tosses all normal ("classic") flows into a
> > single and higher latency queue when a greedy normal flow is present"
> > ... "in the dualpi" case? I know it's possible to hang a different
> > queue algo on the "normal" queue, but
> > to this day I don't see the need for the l4s "fast lane" in the first
> > place, nor a cpu efficient way of doing the right things with the
> > dualpi or curvyred code. What I see, is, long term, that special bit
> > just becomes a "fast" lane for any sort of admission controlled
> > traffic the ISP wants to put there, because the dualpi idea fails on
> > real traffic.
>
> Thanks; this was helpful for me to understand your position.

Groovy.

I recently ripped ecn support out of fq_codel entirely, in
the fq_codel_fast tree. saved some cpu, still measuring (my real objective
is to make that code multicore),

another branch also has the basic sce support, and will have more
after jon settles on a ramp and single queue fallbacks in
sch_cake. btw, if anyone cares, there's more than a few flent test
servers scattered around the internet now that
do some variant of sce for others to play with....

>
>
> > Well if the various WGs would exit that nice hotel, and form a
> > diaspora over the city in coffee shops and other public spaces, and do
> > some tests of your latest and greatest stuff, y'all might get a more
> > accurate viewpoint of what you are actually accomplishing. Take a look
> > at what BBR does, take a look at what IW10 does, take a look at what
> > browsers currently do.
>
> All of those things come up in the meetings, and frequently there is
> measurement data shown and discussed.  It's always welcome when people
> bring measurements, data, and experience.  The drafts and other
> contributions are here so that anyone interested can independently
> implement and do the testing you advocate and share results.  We're all
> on the same team trying to make the Internet better.

Skip a meeting. Try the internet in Bali. Or africa. Or south america.
Or on a boat, Or do an interim
in places like that.

>
>

-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-19 22:03                                         ` Sebastian Moeller
@ 2019-07-20 21:02                                           ` Dave Taht
  2019-07-21 11:53                                           ` Bob Briscoe
  1 sibling, 0 replies; 84+ messages in thread
From: Dave Taht @ 2019-07-20 21:02 UTC (permalink / raw)
  To: Sebastian Moeller
  Cc: Jonathan Morton, Black, David, tsvwg, ecn-sane, De Schepper,
	Koen (Nokia - BE/Antwerp)

Sebastian Moeller <moeller0@gmx.de> writes:

> Hi Jonathan,
>
>
>
>> On Jul 19, 2019, at 22:44, Jonathan Morton <chromatix99@gmail.com> wrote:
>> 
>>> On 19 Jul, 2019, at 4:06 pm, Black, David <David.Black@dell.com> wrote:
>>> 
>>> To be clear on what I have in mind:
>>> 	o Unacceptable: All traffic marked with ECT(1) goes into the L4S queue, independent of what DSCP it is marked with.
>>> 	o Acceptable: There's an operator-configurable list of DSCPs
>>> that support an L4S service - traffic marked with ECT(1) goes into
>>> the L4S queue if and only if that traffic is also marked with a
>>> DSCP that is on the operator's DSCPs-for-L4S list.
>> 
>> I take it, in the latter case, that this increases the cases in
>> which L4S endpoints would need to detect that they are not receiving
>> L4S signals, but RFC-3168 signals.  The current lack of such a
>> mechanism therefore remains concerning.  For comparison, SCE
>> inherently retains such a mechanism by putting the RFC-3168 and
>> high-fidelity signals on different ECN codepoints.
>> 
>> So I'm pleased to hear that the L4S team will be at the hackathon
>> with a demo setup.  Hopefully we will be able to obtain comparative
>> test results, using the same test scripts as we use on SCE, and also
>> insert an RFC-3168 single queue AQM into their network to
>> demonstrate what actually happens in that case.  I think that the
>> results will be illuminating for all concerned.
>
> 	What I really would like to see, how L4S endpoints will deal
> with post-bottleneck ingress shaping by an RFC3168 -compliant
> FQ-AQM. I know the experts here deems this not even a theoretical
> concern, but I really really want to see data, that L4S flows will not
> crowd out the more reactive RFC3168 flows in that situation. This is
> the set-up quite a number of latency sensitive end-users actually use
> to "debloat" the internet and it would be nice to have real data
> showing that this is not a concern.

+10

>
> Best Regards
> 	Sebastian
>
>
>
>> 
>> - Jonathan Morton
>> _______________________________________________
>> Ecn-sane mailing list
>> Ecn-sane@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/ecn-sane

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-19 22:03                                         ` Sebastian Moeller
  2019-07-20 21:02                                           ` Dave Taht
@ 2019-07-21 11:53                                           ` Bob Briscoe
  2019-07-21 15:30                                             ` [Ecn-sane] Hackathon tests Dave Taht
                                                               ` (2 more replies)
  1 sibling, 3 replies; 84+ messages in thread
From: Bob Briscoe @ 2019-07-21 11:53 UTC (permalink / raw)
  To: Sebastian Moeller, Jonathan Morton
  Cc: De Schepper, Koen (Nokia - BE/Antwerp),
	Black, David, ecn-sane, tsvwg, Dave Taht

Sebastian,

On 19/07/2019 23:03, Sebastian Moeller wrote:
> Hi Jonathan,
>
>
>
>> On Jul 19, 2019, at 22:44, Jonathan Morton <chromatix99@gmail.com> wrote:
>> So I'm pleased to hear that the L4S team will be at the hackathon with a demo setup.  Hopefully we will be able to obtain comparative test results, using the same test scripts as we use on SCE, and also insert an RFC-3168 single queue AQM into their network to demonstrate what actually happens in that case.  I think that the results will be illuminating for all concerned.
> 	What I really would like to see, how L4S endpoints will deal with post-bottleneck ingress shaping by an RFC3168 -compliant FQ-AQM. I know the experts here deems this not even a theoretical concern, but I really really want to see data, that L4S flows will not crowd out the more reactive RFC3168 flows in that situation. This is the set-up quite a number of latency sensitive end-users actually use to "debloat" the internet and it would be nice to have real data showing that this is not a concern.
Both teams brought their testbeds, and as of yesterday evening, Koen and 
Pete Heist had put the two together and started the tests Jonathan 
proposed. Usual problems: latest Linux kernel being used has introduced 
a bug, so need to wind back. But progressing.

Nonetheless, altho it's included in the tests, I don't see the 
particular concern with this 'Cake' scenario. How can "L4S flows crowd 
out more reactive RFC3168 flows" in "an RFC3168-compliant FQ-AQM". 
Whenever it would be happening, FQ would prevent it.

To ensure we're not continually being blown into the weeds, I thought 
the /only/ concern was about RFC3168-compliant /single-queue/ AQMs.



Bob

>
> Best Regards
> 	Sebastian
>
>
>
>> - Jonathan Morton
>> _______________________________________________
>> Ecn-sane mailing list
>> Ecn-sane@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/ecn-sane
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-19 20:06                                     ` Black, David
  2019-07-19 20:44                                       ` Jonathan Morton
@ 2019-07-21 12:30                                       ` Bob Briscoe
  2019-07-21 16:08                                         ` Sebastian Moeller
       [not found]                                         ` <5D34803D.50501@erg.abdn.ac.uk>
  2019-07-21 12:30                                       ` Scharf, Michael
  2 siblings, 2 replies; 84+ messages in thread
From: Bob Briscoe @ 2019-07-21 12:30 UTC (permalink / raw)
  To: Black, David, Wesley Eddy, Dave Taht, De Schepper,
	Koen (Nokia - BE/Antwerp)
  Cc: ecn-sane, tsvwg

David,

On 19/07/2019 21:06, Black, David wrote:
> Two comments as an individual, not as a WG chair:
>
>> Mostly, they're things that an end-host algorithm needs
>> to do in order to behave nicely, that might be good things anyways
>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
>> work well w/ small RTT, be robust to reordering).  I am curious which
>> ones you think are too rigid ... maybe they can be loosened?
> [1] I have profoundly objected to L4S's RACK-like requirement (use time to detect loss, and in particular do not use 3DupACK) in public on multiple occasions, because in reliable transport space, that forces use of TCP Prague, a protocol with which we have little to no deployment or operational experience.  Moreover, that requirement raises the bar for other protocols in a fashion that impacts endpoint firmware, and possibly hardware in some important (IMHO) environments where investing in those changes delivers little to no benefit.  The environments that I have in mind include a lot of data centers.  Process wise, I'm ok with addressing this objection via some sort of "controlled environment" escape clause text that makes this RACK-like requirement inapplicable in a "controlled environment" that does not need that behavior (e.g., where 3DupACK does not cause problems and is not expected to cause problems).
>
> For clarity, I understand the multi-lane link design rationale behind the RACK-like requirement and would agree with that requirement in a perfect world ... BUT ... this world is not perfect ... e.g., 3DupACK will not vanish from "running code" anytime soon.
As you know, we have been at pains to address every concern about L4S 
that has come up over the years, and I thought we had addressed this one 
to your satisfaction.

The reliable transports you are are concerned about require ordered 
delivery by the underlying fabric, so they can only ever exist in a 
controlled environment. In such a controlled environment, your ECT1+DSCP 
idea (below) could be used to isolate the L4S experiment from these 
transports and their firmware/hardware constraints.

On the public Internet, the DSCP commonly gets wiped at the first hop. 
So requiring a DSCP as well as ECT1 to separate off L4S would serve no 
useful purpose: it would still lead to ECT1 packets without the DSCP 
sent from a scalable congestion controls (which is behind Jonathan's 
concern in response to you).


>>> So to me, it goes back to slamming the door shut, or not, on L4S's usage
>>> of ect(1) as a too easily gamed e2e identifier. As I don't think it and
>>> all the dependent code and algorithms can possibly scale past a single
>>> physical layer tech, I'd like to see it move to a DSCP codepoint, worst
>>> case... and certainly remain "experimental" in scope until anyone
>>> independent can attempt to evaluate it.
>> That seems good to discuss in regard to the L4S ID draft.  There is a
>> section (5.2) there already discussing DSCP, and why it alone isn't
>> feasible.  There's also more detailed description of the relation and
>> interworking in
>> https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-diffserv-02
> [2] We probably should pay more attention to that draft.  One of the things that I think is important in that draft is a requirement that operators can enable/disable L4S behavior of ECT(1) on a per-DSCP basis - the rationale for that functionality starts with incremental deployment.   This technique may also have the potential to provide a means for L4S and SCE to coexist via use of different DSCPs for L4S vs. SCE traffic (there are some subtleties here, e.g., interaction with operator bleaching of DSCPs to zero at network boundaries).
>
> To be clear on what I have in mind:
> 	o Unacceptable: All traffic marked with ECT(1) goes into the L4S queue, independent of what DSCP it is marked with.
> 	o Acceptable:  There's an operator-configurable list of DSCPs that support an L4S service - traffic marked with ECT(1) goes into the L4S queue if and only if that traffic is also marked with a DSCP that is on the operator's DSCPs-for-L4S list.
Please confirm:
a) that your RACK concern only applies in controlled environments, and 
ECT1+DSCP resolves it
b) on the public Internet, we currently have one issue to address: 
single-queue RFC3168 AQMs,
and if we can resolve that, ECT1 alone would be acceptable as an L4S 
identifier.

I am trying to focus the issues list, which I would hope you would 
support, even without your chair hat on.



Bob

>
> Reminder: This entire message is posted as an individual, not as a WG chair.
>
> Thanks, --David
>
>> -----Original Message-----
>> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Wesley Eddy
>> Sent: Friday, July 19, 2019 2:34 PM
>> To: Dave Taht; De Schepper, Koen (Nokia - BE/Antwerp)
>> Cc: ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
>> Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
>>
>>
>> [EXTERNAL EMAIL]
>>
>> On 7/19/2019 11:37 AM, Dave Taht wrote:
>>> It's the common-q with AQM **+ ECN** that's the sticking point. I'm
>>> perfectly satisfied with the behavior of every ietf approved single
>>> queued AQM without ecn enabled. Let's deploy more of those!
>> Hi Dave, I'm just trying to make sure I'm reading into your message
>> correctly ... if I'm understanding it, then you're not in favor of
>> either SCE or L4S at all?  With small queues and without ECN, loss
>> becomes the only congestion signal, which is not desirable, IMHO, or am
>> I totally misunderstanding something?
>>
>>
>>> If we could somehow create a neutral poll in the general networking
>>> community outside the ietf (nanog, bsd, linux, dcs, bigcos, routercos,
>>> ISPs small and large) , and do it much like your classic "vote for a
>>> political measure" thing, with a single point/counterpoint section,
>>> maybe we'd get somewhere.
>> While I agree that would be really useful, it's kind of an "I want a
>> pony" statement.  As a TSVWG chair where we're doing this work, we've
>> been getting inputs from people that have a foot in many of the
>> communities you mention, but always looking for more.
>>
>>
>>> In particular conflating "low latency" really confounds the subject
>>> matter, and has for years. FQ gives "low latency" for the vast
>>> majority of flows running below their fair share. L4S promises "low
>>> latency" for a rigidly defined set of congestion controls in a
>>> specialized queue, and otherwise tosses all flows into a higher latency
>>> queue when one flow is greedy.
>> I don't think this is a correct statement.  Packets have to be from a
>> "scalable congestion control" to get access to the L4S queue.  There are
>> some draft requirements for using the L4S ID, but they seem pretty
>> flexible to me.  Mostly, they're things that an end-host algorithm needs
>> to do in order to behave nicely, that might be good things anyways
>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
>> work well w/ small RTT, be robust to reordering).  I am curious which
>> ones you think are too rigid ... maybe they can be loosened?
>>
>> Also, I don't think the "tosses all flows into a higher latency queue
>> when one flow is greedy" characterization is correct.  The other queue
>> is for classic/non-scalable traffic, and not necessarily higher latency
>> for a given flow, nor is winding up there related to whether another
>> flow is greedy.
>>
>>
>>> So to me, it goes back to slamming the door shut, or not, on L4S's usage
>>> of ect(1) as a too easily gamed e2e identifier. As I don't think it and
>>> all the dependent code and algorithms can possibly scale past a single
>>> physical layer tech, I'd like to see it move to a DSCP codepoint, worst
>>> case... and certainly remain "experimental" in scope until anyone
>>> independent can attempt to evaluate it.
>> That seems good to discuss in regard to the L4S ID draft.  There is a
>> section (5.2) there already discussing DSCP, and why it alone isn't
>> feasible.  There's also more detailed description of the relation and
>> interworking in
>> https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-diffserv-02
>>
>>
>>> I'd really all the tcp-go-fast-at-any-cost people to take a year off to
>>> dogfood their designs, and go live somewhere with a congested network
>> to
>>> deal with daily, like a railway or airport, or on 3G network on a
>>> sailboat or beach somewhere. It's not a bad life... REALLY.
>>>
>> Fortunately, at least in the IETF, I don't think there have been
>> initiatives in the direction of going fast at any cost in recent
>> history, and they would be unlikely to be well accepted if there were!
>> That is at least one place that there seems to be strong consensus.
>>
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
  2019-07-19 20:06                                     ` Black, David
  2019-07-19 20:44                                       ` Jonathan Morton
  2019-07-21 12:30                                       ` Bob Briscoe
@ 2019-07-21 12:30                                       ` Scharf, Michael
  2 siblings, 0 replies; 84+ messages in thread
From: Scharf, Michael @ 2019-07-21 12:30 UTC (permalink / raw)
  To: Black, David, Wesley Eddy, Dave Taht, De Schepper,
	Koen (Nokia - BE/Antwerp)
  Cc: ecn-sane, tsvwg

One comment, also with no hat on...

> -----Original Message-----
> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Black, David
> Sent: Friday, July 19, 2019 10:06 PM
> To: Wesley Eddy <wes@mti-systems.com>; Dave Taht <dave@taht.net>; De
> Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-
> labs.com>
> Cc: ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
> Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
> 
> Two comments as an individual, not as a WG chair:
> 
> > Mostly, they're things that an end-host algorithm needs
> > to do in order to behave nicely, that might be good things anyways
> > without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
> > work well w/ small RTT, be robust to reordering).  I am curious which
> > ones you think are too rigid ... maybe they can be loosened?
> 
> [1] I have profoundly objected to L4S's RACK-like requirement (use time to
> detect loss, and in particular do not use 3DupACK) in public on multiple
> occasions

... and I have asked in public to remove the RACK requirement, too.

> because in reliable transport space, that forces use of TCP Prague,
> a protocol with which we have little to no deployment or operational
> experience.  Moreover, that requirement raises the bar for other protocols
> in a fashion that impacts endpoint firmware, and possibly hardware in some
> important (IMHO) environments where investing in those changes delivers
> little to no benefit.  The environments that I have in mind include a lot of data
> centers.  Process wise, I'm ok with addressing this objection via some sort of
> "controlled environment" escape clause text that makes this RACK-like
> requirement inapplicable in a "controlled environment" that does not need
> that behavior (e.g., where 3DupACK does not cause problems and is not
> expected to cause problems).

Also, note that the work on RACK is ongoing in TCPM. While there seems to be plenty of deployment expertise, it is perfectly possible that issues will be discovered in future. And we are pre-WGLC in TCPM, i.e., even the specification of RACK could still change.

In general, listing in one experiment requirements on the outcome of another ongoing experiment is a bad idea and should be avoided, IMHO. I have also mentioned this in the past and will not change my mind so easily. Historically, the IETF has good experience with bottom-up modular protocol mechanisms and running code instead of top-down architectures.

Michael

^ permalink raw reply	[flat|nested] 84+ messages in thread

* [Ecn-sane] Hackathon tests
  2019-07-21 11:53                                           ` Bob Briscoe
@ 2019-07-21 15:30                                             ` Dave Taht
  2019-07-21 15:33                                             ` [Ecn-sane] [tsvwg] Comments on L4S drafts Sebastian Moeller
  2019-07-21 16:00                                             ` Jonathan Morton
  2 siblings, 0 replies; 84+ messages in thread
From: Dave Taht @ 2019-07-21 15:30 UTC (permalink / raw)
  To: Bob Briscoe
  Cc: Sebastian Moeller, Jonathan Morton, Black, David, ecn-sane,
	tsvwg, Dave Taht, De Schepper, Koen (Nokia - BE/Antwerp)

Just changing the endless topic line

On Sun, Jul 21, 2019 at 8:17 AM Bob Briscoe <in@bobbriscoe.net> wrote:
>
> Sebastian,
>
> On 19/07/2019 23:03, Sebastian Moeller wrote:
> > Hi Jonathan,
> >
> >
> >
> >> On Jul 19, 2019, at 22:44, Jonathan Morton <chromatix99@gmail.com> wrote:
> >> So I'm pleased to hear that the L4S team will be at the hackathon with a demo setup.  Hopefully we will be able to obtain comparative test results, using the same test scripts as we use on SCE, and also insert an RFC-3168 single queue AQM into their network to demonstrate what actually happens in that case.  I think that the results will be illuminating for all concerned.
> >       What I really would like to see, how L4S endpoints will deal with post-bottleneck ingress shaping by an RFC3168 -compliant FQ-AQM. I know the experts here deems this not even a theoretical concern, but I really really want to see data, that L4S flows will not crowd out the more reactive RFC3168 flows in that situation. This is the set-up quite a number of latency sensitive end-users actually use to "debloat" the internet and it would be nice to have real data showing that this is not a concern.
> Both teams brought their testbeds, and as of yesterday evening, Koen and
> Pete Heist had put the two together and started the tests Jonathan
> proposed. Usual problems: latest Linux kernel being used has introduced
> a bug, so need to wind back. But progressing.
>
> Nonetheless, altho it's included in the tests, I don't see the
> particular concern with this 'Cake' scenario. How can "L4S flows crowd
> out more reactive RFC3168 flows" in "an RFC3168-compliant FQ-AQM".
> Whenever it would be happening, FQ would prevent it.
>
> To ensure we're not continually being blown into the weeds, I thought
> the /only/ concern was about RFC3168-compliant /single-queue/ AQMs.
>
>
>
> Bob
>
> >
> > Best Regards
> >       Sebastian
> >
> >
> >
> >> - Jonathan Morton
> >> _______________________________________________
> >> Ecn-sane mailing list
> >> Ecn-sane@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/ecn-sane
> > _______________________________________________
> > Ecn-sane mailing list
> > Ecn-sane@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/ecn-sane
>
> --
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane



-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-21 11:53                                           ` Bob Briscoe
  2019-07-21 15:30                                             ` [Ecn-sane] Hackathon tests Dave Taht
@ 2019-07-21 15:33                                             ` Sebastian Moeller
  2019-07-21 16:00                                             ` Jonathan Morton
  2 siblings, 0 replies; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-21 15:33 UTC (permalink / raw)
  To: Bob Briscoe
  Cc: Jonathan Morton, De Schepper, Koen (Nokia - BE/Antwerp),
	Black, David, ecn-sane, tsvwg, Dave Taht

Hi Bob,

I hope you had an enjoyable holiday.

> On Jul 21, 2019, at 13:53, Bob Briscoe <in@bobbriscoe.net> wrote:
> 
> Sebastian,
> 
> On 19/07/2019 23:03, Sebastian Moeller wrote:
>> Hi Jonathan,
>> 
>> 
>> 
>>> On Jul 19, 2019, at 22:44, Jonathan Morton <chromatix99@gmail.com> wrote:
>>> So I'm pleased to hear that the L4S team will be at the hackathon with a demo setup.  Hopefully we will be able to obtain comparative test results, using the same test scripts as we use on SCE, and also insert an RFC-3168 single queue AQM into their network to demonstrate what actually happens in that case.  I think that the results will be illuminating for all concerned.
>> 	What I really would like to see, how L4S endpoints will deal with post-bottleneck ingress shaping by an RFC3168 -compliant FQ-AQM. I know the experts here deems this not even a theoretical concern, but I really really want to see data, that L4S flows will not crowd out the more reactive RFC3168 flows in that situation. This is the set-up quite a number of latency sensitive end-users actually use to "debloat" the internet and it would be nice to have real data showing that this is not a concern.
> Both teams brought their testbeds, and as of yesterday evening, Koen and Pete Heist had put the two together and started the tests Jonathan proposed. Usual problems: latest Linux kernel being used has introduced a bug, so need to wind back. But progressing.

	Great!

> 
> Nonetheless, altho it's included in the tests, I don't see the particular concern with this 'Cake' scenario.

	This is not a "cake" scenario, but rather an sqm-scripts scenario; for a number of years we have directed latency sensitive users to use ingress and egress traffic shaping to keep latency under load increase in check. To make things easy we offer an exemplary set of scripts under the name sqm-scripts, see https://github.com/tohojo/sqm-scripts, that make it easy to create and test this approach (we also integrated it nicely into OpenWrt to make it eve simpler to get decent de-bufferblat configured for home networks). We implanted the general approach of an FQ-AQM as post-bottleneck shaper, with HFSC+fq_codel (since retired), HTB+fq_codel and also with cake, but the whole approach proceeds cake existence. Now, cake takes most of these ideas to a new level (e.g. operating as ingress shaper to actually shape the ingress rate instead of the shaper's egress rate), but it is not that this approach requires cake.


> How can "L4S flows crowd out more reactive RFC3168 flows" in "an RFC3168-compliant FQ-AQM". Whenever it would be happening, FQ would prevent it.

	I have heard this repeatedly, but I want to see hard data instead of theoretic considerations, please. Especially since nobody bothered to think about post-bottleneck ingress shaping before I brought it up, this certainly was not considered during the design of L4S; so if it is not a problem, just demonstrate this to shut me up ;).
	So to be clear the scenario I want tested is something like the following:

1) Internet: the test servers connected with say 10 times the true bottleneck rate

2) "true bottleneck": say 100 Mbps / 40 Mbps (using a relative dump over-buffered traffic shaper, like most ISPs seem to do, so at least buffering for >=300ms per direction)

3) post-bottleneck ingress&egress flow-fair shaping: say 90/36 Mbps.

What I want to see is that with that set-up bi-directional saturating traffic with both RFC3168 and L4S flows that each flow still sees roughly its fair share of the bandwidth. I fear that L4S with its linear CE-response will react slower to AQM signals and hence will successively eat a bit of the RFC3168-flow's bandwidth share that throttle back due to receiving a CE mark. I hope my fears are over blown, but at the current state it was not easy enough to actually test that myself.


> 
> To ensure we're not continually being blown into the weeds, I thought the /only/ concern was about RFC3168-compliant /single-queue/ AQMs.

	I believe I have been clear that my concern is the effect of under-responsive L4S-flows on the flow fairness with a post-bottleneck ingress FQ-AQM system. So no compatibility with a "RFC3168-compliant /single-queue/ AQM" is not the only concern. Especially since I know that there is a community out there using post-bottleneck ingress FQ-AQM to keep latency under load increase under control, who would be less than impressed if L4S would destroy the effectiveness of their "solution". Really, I wonder why the L4S project did not reach out to this community during the design phase, since there users could be your natural supporters assuming your solution scratches their itches sufficiently well.

Best Regards
	Sebastian

> 
> 
> 
> Bob
> 
>> 
>> Best Regards
>> 	Sebastian
>> 
>> 
>> 
>>> - Jonathan Morton
>>> _______________________________________________
>>> Ecn-sane mailing list
>>> Ecn-sane@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/ecn-sane
>> _______________________________________________
>> Ecn-sane mailing list
>> Ecn-sane@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/ecn-sane
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
> 


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-21 11:53                                           ` Bob Briscoe
  2019-07-21 15:30                                             ` [Ecn-sane] Hackathon tests Dave Taht
  2019-07-21 15:33                                             ` [Ecn-sane] [tsvwg] Comments on L4S drafts Sebastian Moeller
@ 2019-07-21 16:00                                             ` Jonathan Morton
  2019-07-21 16:12                                               ` Sebastian Moeller
  2019-07-22 18:15                                               ` De Schepper, Koen (Nokia - BE/Antwerp)
  2 siblings, 2 replies; 84+ messages in thread
From: Jonathan Morton @ 2019-07-21 16:00 UTC (permalink / raw)
  To: Bob Briscoe
  Cc: Sebastian Moeller, De Schepper, Koen (Nokia - BE/Antwerp),
	Black, David, ecn-sane, tsvwg, Dave Taht

> On 21 Jul, 2019, at 7:53 am, Bob Briscoe <in@bobbriscoe.net> wrote:
> 
> Both teams brought their testbeds, and as of yesterday evening, Koen and Pete Heist had put the two together and started the tests Jonathan proposed. Usual problems: latest Linux kernel being used has introduced a bug, so need to wind back. But progressing.
> 
> Nonetheless, altho it's included in the tests, I don't see the particular concern with this 'Cake' scenario. How can "L4S flows crowd out more reactive RFC3168 flows" in "an RFC3168-compliant FQ-AQM". Whenever it would be happening, FQ would prevent it.
> 
> To ensure we're not continually being blown into the weeds, I thought the /only/ concern was about RFC3168-compliant /single-queue/ AQMs.

I drew up a list of five network topologies to test, each with the SCE set of tests and tools, but using mostly L4S network components and focused on L4S performance and robustness.

1: L4S sender -> L4S middlebox (bottleneck) -> L4S receiver.

This is simply a sanity check to make sure the tools worked.  Actually we fell over even at this stage yesterday, because we discovered problems in the system Bob and Koen had brought along to demo.  These may or may not be improved today; we'll see.

2: L4S sender -> FQ-AQM middlebox (bottleneck) -> L4S middlebox -> L4S receiver.

This is the most favourable-to-L4S topology that incorporates a non-L4S component that we could easily come up with, and therefore .  Apparently the L4S folks are also relatively unfamiliar with Codel, which is now the most widely deployed AQM in the world, and this would help to validate that L4S transports respond reasonably to it.

3: L4S sender -> single-AQM middlebox (bottleneck) -> L4S middlebox -> L4S receiver.

This is the topology of most concern, and is obtained from topology 2 by simply changing a parameter on our middlebox.

4: L4S sender -> ECT(1) mangler -> L4S middlebox (bottleneck) -> L4S receiver.

Exploring what happens if an adversary tries to game the system.  We could also try an ECT(0) mangler or a Not-ECT mangler, in the same spirit.

5: L4S sender -> L4S middlebox (bottleneck 1) -> Dumb FIFO (bottleneck 2) -> FQ-AQM middlebox (bottleneck 3) -> L4S receiver.

This is Sebastian's scenario.  We did have some discussion yesterday about the propensity of existing senders to produce line-rate bursts occasionally, and the way these bursts could collect in *all* of the queues at successively decreasing bottlenecks.  This is a test which explores that scenario and measures its effects, and is highly relevant to best consumer practice on today's Internet.

Naturally, we have tried the equivalent of most of the above scenarios on our SCE testbed already.  The only one we haven't explicitly tried out is #5; I think we'd need to use all of Pete's APUs plus at least one of my machines to set it up, and we were too tired for that last night.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-21 12:30                                       ` Bob Briscoe
@ 2019-07-21 16:08                                         ` Sebastian Moeller
  2019-07-21 19:14                                           ` Bob Briscoe
       [not found]                                         ` <5D34803D.50501@erg.abdn.ac.uk>
  1 sibling, 1 reply; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-21 16:08 UTC (permalink / raw)
  To: Bob Briscoe
  Cc: Black, David, Wesley Eddy, Dave Taht, De Schepper,
	Koen (Nokia - BE/Antwerp),
	ecn-sane, tsvwg

Hi Bob,


> On Jul 21, 2019, at 14:30, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> David,
> 
> On 19/07/2019 21:06, Black, David wrote:
>> Two comments as an individual, not as a WG chair:
>> 
>>> Mostly, they're things that an end-host algorithm needs
>>> to do in order to behave nicely, that might be good things anyways
>>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
>>> work well w/ small RTT, be robust to reordering).  I am curious which
>>> ones you think are too rigid ... maybe they can be loosened?
>> [1] I have profoundly objected to L4S's RACK-like requirement (use time to detect loss, and in particular do not use 3DupACK) in public on multiple occasions, because in reliable transport space, that forces use of TCP Prague, a protocol with which we have little to no deployment or operational experience.  Moreover, that requirement raises the bar for other protocols in a fashion that impacts endpoint firmware, and possibly hardware in some important (IMHO) environments where investing in those changes delivers little to no benefit.  The environments that I have in mind include a lot of data centers.  Process wise, I'm ok with addressing this objection via some sort of "controlled environment" escape clause text that makes this RACK-like requirement inapplicable in a "controlled environment" that does not need that behavior (e.g., where 3DupACK does not cause problems and is not expected to cause problems).
>> 
>> For clarity, I understand the multi-lane link design rationale behind the RACK-like requirement and would agree with that requirement in a perfect world ... BUT ... this world is not perfect ... e.g., 3DupACK will not vanish from "running code" anytime soon.
> As you know, we have been at pains to address every concern about L4S that has come up over the years, and I thought we had addressed this one to your satisfaction.
> 
> The reliable transports you are are concerned about require ordered delivery by the underlying fabric, so they can only ever exist in a controlled environment. In such a controlled environment, your ECT1+DSCP idea (below) could be used to isolate the L4S experiment from these transports and their firmware/hardware constraints.
> 
> On the public Internet, the DSCP commonly gets wiped at the first hop. So requiring a DSCP as well as ECT1 to separate off L4S would serve no useful purpose: it would still lead to ECT1 packets without the DSCP sent from a scalable congestion controls (which is behind Jonathan's concern in response to you).

	And this is why IPv4's protocol fiel/ IPv6's next header field are the classifier you actually need... You are changing a significant portion of TCP's observable behavior, so it can be argued that TCP-Prague is TCP by name only; this "classifier" still lives in the IP header, so no deeper layer's need to be accessed, this is non-leaky in that the classifier is unambiguously present independent of the value of the ECN bits; and it is also compatible with an SCE style ECN signaling. Since I believe the most/only likely roll-out of L4S is going to be at the ISPs access nodes (BRAS/BNG/CMTS/whatever)  middleboxes shpould not be an unsurmountable problem, as ISPs controll their own middleboxes and often even the CPEs, so protocoll ossification is not going to be a showstopper for this part of the roll-out.

Best Regards
	Sebastian



> 
> 
>>>> So to me, it goes back to slamming the door shut, or not, on L4S's usage
>>>> of ect(1) as a too easily gamed e2e identifier. As I don't think it and
>>>> all the dependent code and algorithms can possibly scale past a single
>>>> physical layer tech, I'd like to see it move to a DSCP codepoint, worst
>>>> case... and certainly remain "experimental" in scope until anyone
>>>> independent can attempt to evaluate it.
>>> That seems good to discuss in regard to the L4S ID draft.  There is a
>>> section (5.2) there already discussing DSCP, and why it alone isn't
>>> feasible.  There's also more detailed description of the relation and
>>> interworking in
>>> https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-diffserv-02
>> [2] We probably should pay more attention to that draft.  One of the things that I think is important in that draft is a requirement that operators can enable/disable L4S behavior of ECT(1) on a per-DSCP basis - the rationale for that functionality starts with incremental deployment.   This technique may also have the potential to provide a means for L4S and SCE to coexist via use of different DSCPs for L4S vs. SCE traffic (there are some subtleties here, e.g., interaction with operator bleaching of DSCPs to zero at network boundaries).
>> 
>> To be clear on what I have in mind:
>> 	o Unacceptable: All traffic marked with ECT(1) goes into the L4S queue, independent of what DSCP it is marked with.
>> 	o Acceptable:  There's an operator-configurable list of DSCPs that support an L4S service - traffic marked with ECT(1) goes into the L4S queue if and only if that traffic is also marked with a DSCP that is on the operator's DSCPs-for-L4S list.
> Please confirm:
> a) that your RACK concern only applies in controlled environments, and ECT1+DSCP resolves it
> b) on the public Internet, we currently have one issue to address: single-queue RFC3168 AQMs,
> and if we can resolve that, ECT1 alone would be acceptable as an L4S identifier.
> 
> I am trying to focus the issues list, which I would hope you would support, even without your chair hat on.
> 
> 
> 
> Bob
> 
>> 
>> Reminder: This entire message is posted as an individual, not as a WG chair.
>> 
>> Thanks, --David
>> 
>>> -----Original Message-----
>>> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Wesley Eddy
>>> Sent: Friday, July 19, 2019 2:34 PM
>>> To: Dave Taht; De Schepper, Koen (Nokia - BE/Antwerp)
>>> Cc: ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
>>> Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
>>> 
>>> 
>>> [EXTERNAL EMAIL]
>>> 
>>> On 7/19/2019 11:37 AM, Dave Taht wrote:
>>>> It's the common-q with AQM **+ ECN** that's the sticking point. I'm
>>>> perfectly satisfied with the behavior of every ietf approved single
>>>> queued AQM without ecn enabled. Let's deploy more of those!
>>> Hi Dave, I'm just trying to make sure I'm reading into your message
>>> correctly ... if I'm understanding it, then you're not in favor of
>>> either SCE or L4S at all?  With small queues and without ECN, loss
>>> becomes the only congestion signal, which is not desirable, IMHO, or am
>>> I totally misunderstanding something?
>>> 
>>> 
>>>> If we could somehow create a neutral poll in the general networking
>>>> community outside the ietf (nanog, bsd, linux, dcs, bigcos, routercos,
>>>> ISPs small and large) , and do it much like your classic "vote for a
>>>> political measure" thing, with a single point/counterpoint section,
>>>> maybe we'd get somewhere.
>>> While I agree that would be really useful, it's kind of an "I want a
>>> pony" statement.  As a TSVWG chair where we're doing this work, we've
>>> been getting inputs from people that have a foot in many of the
>>> communities you mention, but always looking for more.
>>> 
>>> 
>>>> In particular conflating "low latency" really confounds the subject
>>>> matter, and has for years. FQ gives "low latency" for the vast
>>>> majority of flows running below their fair share. L4S promises "low
>>>> latency" for a rigidly defined set of congestion controls in a
>>>> specialized queue, and otherwise tosses all flows into a higher latency
>>>> queue when one flow is greedy.
>>> I don't think this is a correct statement.  Packets have to be from a
>>> "scalable congestion control" to get access to the L4S queue.  There are
>>> some draft requirements for using the L4S ID, but they seem pretty
>>> flexible to me.  Mostly, they're things that an end-host algorithm needs
>>> to do in order to behave nicely, that might be good things anyways
>>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
>>> work well w/ small RTT, be robust to reordering).  I am curious which
>>> ones you think are too rigid ... maybe they can be loosened?
>>> 
>>> Also, I don't think the "tosses all flows into a higher latency queue
>>> when one flow is greedy" characterization is correct.  The other queue
>>> is for classic/non-scalable traffic, and not necessarily higher latency
>>> for a given flow, nor is winding up there related to whether another
>>> flow is greedy.
>>> 
>>> 
>>>> So to me, it goes back to slamming the door shut, or not, on L4S's usage
>>>> of ect(1) as a too easily gamed e2e identifier. As I don't think it and
>>>> all the dependent code and algorithms can possibly scale past a single
>>>> physical layer tech, I'd like to see it move to a DSCP codepoint, worst
>>>> case... and certainly remain "experimental" in scope until anyone
>>>> independent can attempt to evaluate it.
>>> That seems good to discuss in regard to the L4S ID draft.  There is a
>>> section (5.2) there already discussing DSCP, and why it alone isn't
>>> feasible.  There's also more detailed description of the relation and
>>> interworking in
>>> https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-diffserv-02
>>> 
>>> 
>>>> I'd really all the tcp-go-fast-at-any-cost people to take a year off to
>>>> dogfood their designs, and go live somewhere with a congested network
>>> to
>>>> deal with daily, like a railway or airport, or on 3G network on a
>>>> sailboat or beach somewhere. It's not a bad life... REALLY.
>>>> 
>>> Fortunately, at least in the IETF, I don't think there have been
>>> initiatives in the direction of going fast at any cost in recent
>>> history, and they would be unlikely to be well accepted if there were!
>>> That is at least one place that there seems to be strong consensus.
>>> 
>> _______________________________________________
>> Ecn-sane mailing list
>> Ecn-sane@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/ecn-sane
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
> 
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-21 16:00                                             ` Jonathan Morton
@ 2019-07-21 16:12                                               ` Sebastian Moeller
  2019-07-22 18:15                                               ` De Schepper, Koen (Nokia - BE/Antwerp)
  1 sibling, 0 replies; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-21 16:12 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: Bob Briscoe, De Schepper, Koen (Nokia - BE/Antwerp),
	Black, David, ecn-sane, tsvwg, Dave Taht

Dear Jonathan,

many thanks, these are exactly the tests I am curious about. Excellent work, now I am super curious about the results!



> On Jul 21, 2019, at 18:00, Jonathan Morton <chromatix99@gmail.com> wrote:
> 
>> On 21 Jul, 2019, at 7:53 am, Bob Briscoe <in@bobbriscoe.net> wrote:
>> 
>> Both teams brought their testbeds, and as of yesterday evening, Koen and Pete Heist had put the two together and started the tests Jonathan proposed. Usual problems: latest Linux kernel being used has introduced a bug, so need to wind back. But progressing.
>> 
>> Nonetheless, altho it's included in the tests, I don't see the particular concern with this 'Cake' scenario. How can "L4S flows crowd out more reactive RFC3168 flows" in "an RFC3168-compliant FQ-AQM". Whenever it would be happening, FQ would prevent it.
>> 
>> To ensure we're not continually being blown into the weeds, I thought the /only/ concern was about RFC3168-compliant /single-queue/ AQMs.
> 
> I drew up a list of five network topologies to test, each with the SCE set of tests and tools, but using mostly L4S network components and focused on L4S performance and robustness.
> 
> 
> 1: L4S sender -> L4S middlebox (bottleneck) -> L4S receiver.
> 
> This is simply a sanity check to make sure the tools worked.  Actually we fell over even at this stage yesterday, because we discovered problems in the system Bob and Koen had brought along to demo.  These may or may not be improved today; we'll see.
> 
> 
> 2: L4S sender -> FQ-AQM middlebox (bottleneck) -> L4S middlebox -> L4S receiver.
> 
> This is the most favourable-to-L4S topology that incorporates a non-L4S component that we could easily come up with, and therefore .  Apparently the L4S folks are also relatively unfamiliar with Codel, which is now the most widely deployed AQM in the world, and this would help to validate that L4S transports respond reasonably to it.
> 
> 
> 3: L4S sender -> single-AQM middlebox (bottleneck) -> L4S middlebox -> L4S receiver.
> 
> This is the topology of most concern, and is obtained from topology 2 by simply changing a parameter on our middlebox.
> 
> 
> 4: L4S sender -> ECT(1) mangler -> L4S middlebox (bottleneck) -> L4S receiver.
> 
> Exploring what happens if an adversary tries to game the system.  We could also try an ECT(0) mangler or a Not-ECT mangler, in the same spirit.
> 
> 
> 5: L4S sender -> L4S middlebox (bottleneck 1) -> Dumb FIFO (bottleneck 2) -> FQ-AQM middlebox (bottleneck 3) -> L4S receiver.
> 
> This is Sebastian's scenario.  We did have some discussion yesterday about the propensity of existing senders to produce line-rate bursts occasionally, and the way these bursts could collect in *all* of the queues at successively decreasing bottlenecks.  This is a test which explores that scenario and measures its effects, and is highly relevant to best consumer practice on today's Internet.

	double plus!



> 
> 
> Naturally, we have tried the equivalent of most of the above scenarios on our SCE testbed already.  The only one we haven't explicitly tried out is #5; I think we'd need to use all of Pete's APUs plus at least one of my machines to set it up, and we were too tired for that last night.

	Thanks for doing this!

Best Regards
	Sebastian


> 
> - Jonathan Morton


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]   Comments on L4S drafts
       [not found]                                         ` <5D34803D.50501@erg.abdn.ac.uk>
@ 2019-07-21 16:43                                           ` Black, David
  0 siblings, 0 replies; 84+ messages in thread
From: Black, David @ 2019-07-21 16:43 UTC (permalink / raw)
  To: gorry, Bob Briscoe; +Cc: ecn-sane, tsvwg

Bob,

Pulling relevant text to the top ...

> > As you know, we have been at pains to address every concern about L4S
> > that has come up over the years, and I thought we had addressed this
> > one to your satisfaction.

Truth be told, "acquiescence" would be a more accurate word than "satisfaction."  I can live with the current plans, but I would not describe myself as satisfied with them.

> > The reliable transports you are concerned about require ordered
> > delivery by the underlying fabric, so they can only ever exist in a
> > controlled environment. In such a controlled environment, your
> > ECT1+DSCP idea (below) could be used to isolate the L4S experiment
> > from these transports and their firmware/hardware constraints.

There appears to be a lack of understanding here.  The protocols in question, RoCEv2 in particular have some reordering tolerance, but not as good as TCP's.  Current requirements for ordered delivery are in the same general area as TCP with 3DupACK, which is not constrained to controlled environments.

> > On the public Internet, the DSCP commonly gets wiped at the first hop.
> > So requiring a DSCP as well as ECT1 to separate off L4S would serve no
> > useful purpose: it would still lead to ECT1 packets without the DSCP
> > sent from a scalable congestion controls (which is behind Jonathan's
> > concern in response to you).

We're on the same page here, as I also wrote the following (although a stronger word that "subtleties" would have been better in 20/20 hindsight):

> >> traffic (there are some subtleties here, e.g., interaction
> >> with operator bleaching of DSCPs to zero at network boundaries).

On to the two requests.

> > Please confirm:
> > a) that your RACK concern only applies in controlled environments, and
> > ECT1+DSCP resolves it

No, twice.  I hope that’s clearer now between what Gorry, Michael, and myself have posted.

As stated in the past, and moreover in this email thread, I can accept some sort controlled environment text as a compromise means of moving the experiment forward:

> >> Process wise, I'm ok with addressing this objection via some sort of
> >> "controlled environment" escape clause text that makes this RACK-like
> >> requirement inapplicable in a "controlled environment" that does not
> >> need that behavior (e.g., where 3DupACK does not cause problems and
> >> is not expected to cause problems).

Moving on to the next topic:

> > b) on the public Internet, we currently have one issue to address:
> > single-queue RFC3168 AQMs,
> > and if we can resolve that, ECT1 alone would be acceptable as an L4S
> > identifier.

In addition to a), there is now the desire of SCE to use ECT(1) at similar scope.

Thanks, --David

> -----Original Message-----
> From: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
> Sent: Sunday, July 21, 2019 11:10 AM
> To: Bob Briscoe
> Cc: Black, David; Wesley Eddy; Dave Taht; De Schepper, Koen (Nokia -
> BE/Antwerp); ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
> Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
> 
> 
> [EXTERNAL EMAIL]
> 
> I'd like to add to this what I understand as an individual ... see inline.
> 
> On 21/07/2019, 08:30, Bob Briscoe wrote:
> > David,
> >
> > On 19/07/2019 21:06, Black, David wrote:
> >> Two comments as an individual, not as a WG chair:
> >>
> >>> Mostly, they're things that an end-host algorithm needs
> >>> to do in order to behave nicely, that might be good things anyways
> >>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
> >>> work well w/ small RTT, be robust to reordering).  I am curious which
> >>> ones you think are too rigid ... maybe they can be loosened?
> >> [1] I have profoundly objected to L4S's RACK-like requirement (use
> >> time to detect loss, and in particular do not use 3DupACK) in public
> >> on multiple occasions, because in reliable transport space, that
> >> forces use of TCP Prague, a protocol with which we have little to no
> >> deployment or operational experience.  Moreover, that requirement
> >> raises the bar for other protocols in a fashion that impacts endpoint
> >> firmware, and possibly hardware in some important (IMHO) environments
> >> where investing in those changes delivers little to no benefit.  The
> >> environments that I have in mind include a lot of data centers.
> >> Process wise, I'm ok with addressing this objection via some sort of
> >> "controlled environment" escape clause text that makes this RACK-like
> >> requirement inapplicable in a "controlled environment" that does not
> >> need that behavior (e.g., where 3DupACK does not cause problems and
> >> is not expected to cause problems).
> >>
> >> For clarity, I understand the multi-lane link design rationale behind
> >> the RACK-like requirement and would agree with that requirement in a
> >> perfect world ... BUT ... this world is not perfect ... e.g., 3DupACK
> >> will not vanish from "running code" anytime soon.
> > As you know, we have been at pains to address every concern about L4S
> > that has come up over the years, and I thought we had addressed this
> > one to your satisfaction.
> >
> > The reliable transports you are are concerned about require ordered
> > delivery by the underlying fabric, so they can only ever exist in a
> > controlled environment. In such a controlled environment, your
> > ECT1+DSCP idea (below) could be used to isolate the L4S experiment
> > from these transports and their firmware/hardware constraints.
> >
> > On the public Internet, the DSCP commonly gets wiped at the first hop.
> > So requiring a DSCP as well as ECT1 to separate off L4S would serve no
> > useful purpose: it would still lead to ECT1 packets without the DSCP
> > sent from a scalable congestion controls (which is behind Jonathan's
> > concern in response to you).
> >
> >
> It would always be possible to have taken an approach that required a
> DSCP to use the "alternantive ECN semantic" . This option was debated
> when L4S was first discussed. The WG draft decided against that
> approach, and instead chose to use an ECT(1) codepoint. That I recall
> was analysed in depth.
> 
> This does not preclude someone from classifying on a DSCP (such as the
> suggested NQB) to also choose which ECN treatment to use (should that be
> useful for some reason, e.g. because the traffic is low rate). To me, at
> least, it important to allow traffic with DSCP markings to utilise the
> AQM ECN treatments.
> 
> >>>> So to me, it goes back to slamming the door shut, or not, on L4S's
> >>>> usage
> >>>> of ect(1) as a too easily gamed e2e identifier. As I don't think it
> >>>> and
> >>>> all the dependent code and algorithms can possibly scale past a single
> >>>> physical layer tech, I'd like to see it move to a DSCP codepoint,
> >>>> worst
> >>>> case... and certainly remain "experimental" in scope until anyone
> >>>> independent cn attempt to evaluate it.
> >>> That seems good to discuss in regard to the L4S ID draft.  There is a
> >>> section (5.2) there already discussing DSCP, and why it alone isn't
> >>> feasible.  There's also more detailed description of the relation and
> >>> interworking in
> >>> https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-diffserv-02
> >> [2] We probably should pay more attention to that draft.  One of the
> >> things that I think is important in that draft is a requirement that
> >> operators can enable/disable L4S behavior of ECT(1) on a per-DSCP
> >> basis - the rationale for that functionality starts with incremental
> >> deployment.   This technique may also have the potential to provide a
> >> means for L4S and SCE to coexist via use of different DSCPs for L4S
> >> vs. SCE traffic (there are some subtleties here, e.g., interaction
> >> with operator bleaching of DSCPs to zero at network boundaries).
> >>
> >> To be clear on what I have in mind:
> >>     o Unacceptable: All traffic marked with ECT(1) goes into the L4S
> >> queue, independent of what DSCP it is marked with.
> That is what has been described in the WG drafts since they entered
> TSVWG. I don't recall any suggested change to that decision until just now.
> >>     o Acceptable:  There's an operator-configurable list of DSCPs
> >> that support an L4S service - traffic marked with ECT(1) goes into
> >> the L4S queue if and only if that traffic is also marked with a DSCP
> >> that is on the operator's DSCPs-for-L4S list.
> That was always possible under the "alternative ECN markings", but I
> understood the purpose was to facilitate an Internet experiment.
> > Please confirm:
> > a) that your RACK concern only applies in controlled environments, and
> > ECT1+DSCP resolves it
> That seems more than obviously needed to me. There is a lot of traffic
> that uses some notion of timeliness for retransmission. Designing such a
> transport to be robust is tricky, but we're alreday exploring that for
> TCP and QUIC.
> 
> On the other hand, I have many times urged caution in creating
> assumptions that wit would be OK for Internet paths to somehow now allow
> more reordering. I'd like to see that happen - but I don't this
> recommendation is appropriuate.
> > b) on the public Internet, we currently have one issue to address:
> > single-queue RFC3168 AQMs,
> > and if we can resolve that, ECT1 alone would be acceptable as an L4S
> > identifier.
> >
> > I am trying to focus the issues list, which I would hope you would
> > support, even without your chair hat on.
> >
> >
> >
> > Bob
> >
> Gorry
> >>
> >> Reminder: This entire message is posted as an individual, not as a WG
> >> chair.
> >>
> >> Thanks, --David
> >>
> >>> -----Original Message-----
> >>> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Wesley Eddy
> >>> Sent: Friday, July 19, 2019 2:34 PM
> >>> To: Dave Taht; De Schepper, Koen (Nokia - BE/Antwerp)
> >>> Cc: ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
> >>> Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
> >>>
> >>>
> >>> [EXTERNAL EMAIL]
> >>>
> >>> On 7/19/2019 11:37 AM, Dave Taht wrote:
> >>>> It's the common-q with AQM **+ ECN** that's the sticking point. I'm
> >>>> perfectly satisfied with the behavior of every ietf approved single
> >>>> queued AQM without ecn enabled. Let's deploy more of those!
> >>> Hi Dave, I'm just trying to make sure I'm reading into your message
> >>> correctly ... if I'm understanding it, then you're not in favor of
> >>> either SCE or L4S at all?  With small queues and without ECN, loss
> >>> becomes the only congestion signal, which is not desirable, IMHO, or am
> >>> I totally misunderstanding something?
> >>>
> >>>
> >>>> If we could somehow create a neutral poll in the general networking
> >>>> community outside the ietf (nanog, bsd, linux, dcs, bigcos, routercos,
> >>>> ISPs small and large) , and do it much like your classic "vote for a
> >>>> political measure" thing, with a single point/counterpoint section,
> >>>> maybe we'd get somewhere.
> >>> While I agree that would be really useful, it's kind of an "I want a
> >>> pony" statement.  As a TSVWG chair where we're doing this work, we've
> >>> been getting inputs from people that have a foot in many of the
> >>> communities you mention, but always looking for more.
> >>>
> >>>
> >>>> In particular conflating "low latency" really confounds the subject
> >>>> matter, and has for years. FQ gives "low latency" for the vast
> >>>> majority of flows running below their fair share. L4S promises "low
> >>>> latency" for a rigidly defined set of congestion controls in a
> >>>> specialized queue, and otherwise tosses all flows into a higher
> >>>> latency
> >>>> queue when one flow is greedy.
> >>> I don't think this is a correct statement.  Packets have to be from a
> >>> "scalable congestion control" to get access to the L4S queue.  There
> >>> are
> >>> some draft requirements for using the L4S ID, but they seem pretty
> >>> flexible to me.  Mostly, they're things that an end-host algorithm
> >>> needs
> >>> to do in order to behave nicely, that might be good things anyways
> >>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
> >>> work well w/ small RTT, be robust to reordering).  I am curious which
> >>> ones you think are too rigid ... maybe they can be loosened?
> >>>
> >>> Also, I don't think the "tosses all flows into a higher latency queue
> >>> when one flow is greedy" characterization is correct.  The other queue
> >>> is for classic/non-scalable traffic, and not necessarily higher latency
> >>> for a given flow, nor is winding up there related to whether another
> >>> flow is greedy.
> >>>
> >>>
> >>>> So to me, it goes back to slamming the door shut, or not, on L4S's
> >>>> usage
> >>>> of ect(1) as a too easily gamed e2e identifier. As I don't think it
> >>>> and
> >>>> all the dependent code and algorithms can possibly scale past a single
> >>>> physical layer tech, I'd like to see it move to a DSCP codepoint,
> >>>> worst
> >>>> case... and certainly remain "experimental" in scope until anyone
> >>>> independent can attempt to evaluate it.
> >>> That seems good to discuss in regard to the L4S ID draft.  There is a
> >>> section (5.2) there already discussing DSCP, and why it alone isn't
> >>> feasible.  There's also more detailed description of the relation and
> >>> interworking in
> >>> https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-diffserv-02
> >>>
> >>>
> >>>> I'd really all the tcp-go-fast-at-any-cost people to take a year
> >>>> off to
> >>>> dogfood their designs, and go live somewhere with a congested network
> >>> to
> >>>> deal with daily, like a railway or airport, or on 3G network on a
> >>>> sailboat or beach somewhere. It's not a bad life... REALLY.
> >>>>
> >>> Fortunately, at least in the IETF, I don't think there have been
> >>> initiatives in the direction of going fast at any cost in recent
> >>> history, and they would be unlikely to be well accepted if there were!
> >>> That is at least one place that there seems to be strong consensus.
> >>>
> >> _______________________________________________
> >> Ecn-sane mailing list
> >> Ecn-sane@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/ecn-sane
> >


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-21 16:08                                         ` Sebastian Moeller
@ 2019-07-21 19:14                                           ` Bob Briscoe
  2019-07-21 20:48                                             ` Sebastian Moeller
  0 siblings, 1 reply; 84+ messages in thread
From: Bob Briscoe @ 2019-07-21 19:14 UTC (permalink / raw)
  To: Sebastian Moeller
  Cc: tsvwg, Black, David, ecn-sane, Dave Taht, De Schepper,
	Koen (Nokia - BE/Antwerp)

[-- Attachment #1: Type: text/plain, Size: 4291 bytes --]

Sebastien,

On 21/07/2019 17:08, Sebastian Moeller wrote:
> Hi Bob,
>
>
>> On Jul 21, 2019, at 14:30, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>
>> David,
>>
>> On 19/07/2019 21:06, Black, David wrote:
>>> Two comments as an individual, not as a WG chair:
>>>
>>>> Mostly, they're things that an end-host algorithm needs
>>>> to do in order to behave nicely, that might be good things anyways
>>>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
>>>> work well w/ small RTT, be robust to reordering).  I am curious which
>>>> ones you think are too rigid ... maybe they can be loosened?
>>> [1] I have profoundly objected to L4S's RACK-like requirement (use time to detect loss, and in particular do not use 3DupACK) in public on multiple occasions, because in reliable transport space, that forces use of TCP Prague, a protocol with which we have little to no deployment or operational experience.  Moreover, that requirement raises the bar for other protocols in a fashion that impacts endpoint firmware, and possibly hardware in some important (IMHO) environments where investing in those changes delivers little to no benefit.  The environments that I have in mind include a lot of data centers.  Process wise, I'm ok with addressing this objection via some sort of "controlled environment" escape clause text that makes this RACK-like requirement inapplicable in a "controlled environment" that does not need that behavior (e.g., where 3DupACK does not cause problems and is not expected to cause problems).
>>>
>>> For clarity, I understand the multi-lane link design rationale behind the RACK-like requirement and would agree with that requirement in a perfect world ... BUT ... this world is not perfect ... e.g., 3DupACK will not vanish from "running code" anytime soon.
>> As you know, we have been at pains to address every concern about L4S that has come up over the years, and I thought we had addressed this one to your satisfaction.
>>
>> The reliable transports you are are concerned about require ordered delivery by the underlying fabric, so they can only ever exist in a controlled environment. In such a controlled environment, your ECT1+DSCP idea (below) could be used to isolate the L4S experiment from these transports and their firmware/hardware constraints.
>>
>> On the public Internet, the DSCP commonly gets wiped at the first hop. So requiring a DSCP as well as ECT1 to separate off L4S would serve no useful purpose: it would still lead to ECT1 packets without the DSCP sent from a scalable congestion controls (which is behind Jonathan's concern in response to you).
> 	And this is why IPv4's protocol fiel/ IPv6's next header field are the classifier you actually need... You are changing a significant portion of TCP's observable behavior, so it can be argued that TCP-Prague is TCP by name only; this "classifier" still lives in the IP header, so no deeper layer's need to be accessed, this is non-leaky in that the classifier is unambiguously present independent of the value of the ECN bits; and it is also compatible with an SCE style ECN signaling. Since I believe the most/only likely roll-out of L4S is going to be at the ISPs access nodes (BRAS/BNG/CMTS/whatever)  middleboxes shpould not be an unsurmountable problem, as ISPs controll their own middleboxes and often even the CPEs, so protocoll ossification is not going to be a showstopper for this part of the roll-out.
>
> Best Regards
> 	Sebastian
>
I think you've understood this from reading abbreviated description of 
the requirement on the list, rather than the spec. The spec. solely says:

	A scalable congestion control MUST detect loss by counting in time-based units

That's all. No more, no less.

People call this the "RACK requirement", purely because the idea came 
from RACK. There is no requirement to do RACK, and the requirement 
applies to all transports, not just TCP.

It then means that a packet with ECT1 in the IP field can be forwarded 
without resequencing (no requirement - it just it /can/ be). This is a 
network layer 'unordered delivery' property, so it's appropriate to flag 
at the IP layer.




Bob



-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/


[-- Attachment #2: Type: text/html, Size: 5421 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-21 19:14                                           ` Bob Briscoe
@ 2019-07-21 20:48                                             ` Sebastian Moeller
  2019-07-25 20:51                                               ` Bob Briscoe
  0 siblings, 1 reply; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-21 20:48 UTC (permalink / raw)
  To: Bob Briscoe
  Cc: tsvwg, Black, David, ecn-sane, Dave Taht, De Schepper,
	Koen (Nokia - BE/Antwerp)

Dear Bob, 

> On Jul 21, 2019, at 21:14, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> Sebastien,
> 
> On 21/07/2019 17:08, Sebastian Moeller wrote:
>> Hi Bob,
>> 
>> 
>> 
>>> On Jul 21, 2019, at 14:30, Bob Briscoe <ietf@bobbriscoe.net>
>>>  wrote:
>>> 
>>> David,
>>> 
>>> On 19/07/2019 21:06, Black, David wrote:
>>> 
>>>> Two comments as an individual, not as a WG chair:
>>>> 
>>>> 
>>>>> Mostly, they're things that an end-host algorithm needs
>>>>> to do in order to behave nicely, that might be good things anyways
>>>>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
>>>>> work well w/ small RTT, be robust to reordering).  I am curious which
>>>>> ones you think are too rigid ... maybe they can be loosened?
>>>>> 
>>>> [1] I have profoundly objected to L4S's RACK-like requirement (use time to detect loss, and in particular do not use 3DupACK) in public on multiple occasions, because in reliable transport space, that forces use of TCP Prague, a protocol with which we have little to no deployment or operational experience.  Moreover, that requirement raises the bar for other protocols in a fashion that impacts endpoint firmware, and possibly hardware in some important (IMHO) environments where investing in those changes delivers little to no benefit.  The environments that I have in mind include a lot of data centers.  Process wise, I'm ok with addressing this objection via some sort of "controlled environment" escape clause text that makes this RACK-like requirement inapplicable in a "controlled environment" that does not need that behavior (e.g., where 3DupACK does not cause problems and is not expected to cause problems).
>>>> 
>>>> For clarity, I understand the multi-lane link design rationale behind the RACK-like requirement and would agree with that requirement in a perfect world ... BUT ... this world is not perfect ... e.g., 3DupACK will not vanish from "running code" anytime soon.
>>>> 
>>> As you know, we have been at pains to address every concern about L4S that has come up over the years, and I thought we had addressed this one to your satisfaction.
>>> 
>>> The reliable transports you are are concerned about require ordered delivery by the underlying fabric, so they can only ever exist in a controlled environment. In such a controlled environment, your ECT1+DSCP idea (below) could be used to isolate the L4S experiment from these transports and their firmware/hardware constraints.
>>> 
>>> On the public Internet, the DSCP commonly gets wiped at the first hop. So requiring a DSCP as well as ECT1 to separate off L4S would serve no useful purpose: it would still lead to ECT1 packets without the DSCP sent from a scalable congestion controls (which is behind Jonathan's concern in response to you).
>>> 
>> 	And this is why IPv4's protocol fiel/ IPv6's next header field are the classifier you actually need... You are changing a significant portion of TCP's observable behavior, so it can be argued that TCP-Prague is TCP by name only; this "classifier" still lives in the IP header, so no deeper layer's need to be accessed, this is non-leaky in that the classifier is unambiguously present independent of the value of the ECN bits; and it is also compatible with an SCE style ECN signaling. Since I believe the most/only likely roll-out of L4S is going to be at the ISPs access nodes (BRAS/BNG/CMTS/whatever)  middleboxes shpould not be an unsurmountable problem, as ISPs controll their own middleboxes and often even the CPEs, so protocoll ossification is not going to be a showstopper for this part of the roll-out.
>> 
>> Best Regards
>> 	Sebastian
>> 
>> 
> I think you've understood this from reading abbreviated description of the requirement on the list, rather than the spec. The spec. solely says:
> 	A scalable congestion control MUST detect loss by counting in time-based units
> That's all. No more, no less. 
> 
> People call this the "RACK requirement", purely because the idea came from RACK. There is no requirement to do RACK, and the requirement applies to all transports, not just TCP.

	Fair enough, but my argument was not really about RACK at all, it more-so applies to the linear response to CE-marks that ECT(1) promises in the L4S approach. You are making changes to TCP's congestion controller that make it cease to be "TCP-friendly" (for arguably good reasons). So why insist on pretending that this is still TCP? So give it a new protocol ID already and all your classification needs are solved. As a bonus you do not need to use the same signal (CE) to elicit two different responses, but you could use the re-gained ECT(1) code point similarly to SCE to put the new fine-grained congestion signal into... while using CE in the RFC3168 compliant sense.


> 
> It then means that a packet with ECT1 in the IP field can be forwarded without resequencing (no requirement - it just it /can/ be).

	Packets always "can" be forwarded without resequencing, the question is whether the end-points are going to like that... 
And IMHO even RACK with its at maximum one RTT reordering windows gives intermediate hops not much to work with, without knowing the full RTT a cautious hop might allow itself one retransmission slot (so its own contribution to the RTT), but as far as I can tell they do that already. And tracking the RTT will require to keep per flow statistics, this also seems like it can get computationally expensive quickly... (I probably misunderstand how RACK works, but I fail to see how it will really allow more re-ordering, but that is also orthogonal to the L4S issues I try to raise).

> This is a network layer 'unordered delivery' property, so it's appropriate to flag at the IP layer. 

	But at that point you are multiplexing multiple things into the poor ECT(1) codepoint, the promise of a certain "linear" back-off behavior on encountered congestion AND a "allow relaxed ordering" ( "detect loss by counting in time-based units" does not seem to be fully equivalent with a generic tolerance to 'unordered delivery' as far as I understand). That seems asking to much of a simple number...

Best Regards
	Sebastian

> 
> 
> 
> 
> Bob
> 
> 
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               
> http://bobbriscoe.net/


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg]  Comments on L4S drafts
  2019-07-19 15:37                                 ` Dave Taht
  2019-07-19 18:33                                   ` Wesley Eddy
@ 2019-07-22 16:28                                   ` Bless, Roland (TM)
  1 sibling, 0 replies; 84+ messages in thread
From: Bless, Roland (TM) @ 2019-07-22 16:28 UTC (permalink / raw)
  To: Dave Taht, De Schepper, Koen (Nokia - BE/Antwerp)
  Cc: ecn-sane, Sebastian Moeller, tsvwg

Hi Dave and all,

[sorry, I'm a bit behind all the recent discussion, however...]
I agree in several points here:
1) burning ECT(1) for L4S is less beneficial than using ECT(1)
   as different kind of congestion signal as proposed in SCE
2) L4S could also use the very same signal, probably in
   addition to an L4S DSCP.
3) I don't think that we need to couple a particular SCE
   implementation to the ECT(1) usage.

Regards,
 Roland

Am 19.07.19 um 17:37 schrieb Dave Taht:
> "De Schepper, Koen (Nokia - BE/Antwerp)"
> <koen.de_schepper@nokia-bell-labs.com> writes:
> 
>> Hi Sebastian,
>>
>> To avoid people to read through the long mail, I think the main point I want to make is:
>>  "Indeed, having common-Qs supported is one of my requirements. That's
> 
> It's the common-q with AQM **+ ECN** that's the sticking point. I'm
> perfectly satisfied with the behavior of every ietf approved single
> queued AQM without ecn enabled. Let's deploy more of those!
> 
>> why I want to keep the discussion on that level: is there consensus
>> that low latency is only needed for a per flow FQ system with an AQM
>> per flow?"
> 
> Your problem statement elides the ECN bit.
> 
> If there is any one point that I'd like to see resolved about L4S
> vs SCE, it's having a vote on its the use of ECT(1) as an e2e
> identifier.
> 
> The poll I took in my communities (after trying really hard for years to
> get folk to take a look at the architecture without bias), ran about
> 98% against the L4S usage of ect(1), in the lwn article and in every
> private conversation since.
> 
> The SCE proposal for this half a bit as an additional congestion
> signal supplied by the aqm, is vastly superior.
> 
> If we could somehow create a neutral poll in the general networking
> community outside the ietf (nanog, bsd, linux, dcs, bigcos, routercos,
> ISPs small and large) , and do it much like your classic "vote for a
> political measure" thing, with a single point/counterpoint section,
> maybe we'd get somewhere.
> 
>>
>> If there is this consensus, this means that we can use SCE and that
>> from now on, all network nodes have to implement per flow queuing with
>> an AQM per flow.
> 
> There is no "we" here, and this is not a binary set of choices.
> 
> In particular conflating "low latency" really confounds the subject
> matter, and has for years. FQ gives "low latency" for the vast
> majority of flows running below their fair share. L4S promises "low
> latency" for a rigidly defined set of congestion controls in a
> specialized queue, and otherwise tosses all flows into a higher latency
> queue when one flow is greedy.
> 
> The "ultra low queuing latency *for all*" marketing claptrap that l4S
> had at one point really stuck in my craw.
> 
> 0) There is a "we" that likes L4S in all its complexity and missing
> integrated running code that demands total ECN deployment on one
> physical medium (so far), a change to the definition of ECN itself, and
> uses up ect(1) e2e instead of a dscp.
> 
> 1) There is a "we" that has a highly deployed fq+aqm that happens to
> have an ECN response, that is providing some of the lowest latencies
> ever seen, live on the internet, across multiple physical mediums.
> 
> With a backward compatible proposal to do better, that uses up ect(1) as
> an additional congestion notifier by the AQM.
> 
> 2) There is a VERY large (silent) majority that wants nothing to do with
> ECN at all and long ago fled the ietf, and works on things like RTT and
> other metrics that don't need anything extra at the IP layer.
> 
> 3) There is a vastly larger majority that has never even heard of AQM,
> much less ECN, and doesn't care.
> 
>> If there is no consensus, we cannot use SCE and need to use L4S.
> 
> No.
> 
> If there is no consensus, we just keep motoring on with the existing
> pie (with drop) deployments, and fq_codel/fq_pie/sch_cake more or less
> as is... and continued refinement of transports and more research.
> 
> We've got a few billion devices that could use just what we got to get
> orders of magnitude improvements in network delay.
> 
> And:
> 
> If there is consensus on fq+aqm+sce - ECN remains *optional*
> which is an outcome I massively support, also.
> 
> So repeating this:
> 
>> If there is this consensus, this means that we can use SCE and that
>> from now on, all network nodes have to implement per flow queuing with
>> an AQM per flow.
> 
> It's not a binary choice as you lay it out.
> 
> 1) Just getting FIFO queue sizes down to something reasonable - would be
> GREAT. It still blows my mind that CMTSes still have 700ms of buffering at
> 100Mbit, 8 years into this debate.
> 
> 2) only the network nodes most regularly experiencing human visible
> congestive events truly need any form of AQM or FQ. In terms of what I
> observe, thats:
> 
> ISP uplinks
> Wifi (at ISP downlink speeds > 40Mbit)
> 345G 
> ISP downlinks
> Other in-home devices like ethernet over powerline
> 
> I'm sure others in the DC and interconnects see things differently.
> 
> I know I'm weird, but I'd like to eliminate congestion *humans* see,
> rather than what skynet sees. Am I the only one that thinks this way?
> 
> 3) we currently have a choice between multiple single queue, *non ECN*
> enabled aqms that DO indeed work - pretty well - without any ECN support
> enabled - pie, red, dualpi without using the ect identifier, cake
> (cobalt). We never got around to making codel work better on a single
> queue because we didn't see the point, but what's in cobalt could go
> there if anyone cares.
> 
> We have a couple very successful fq+aqm combinations, *also*, that
> happen to have an RFC3168 ECN response.
> 
> 4) as for ECN enabled AQMs - single queued, dual q'd, or FQ'd, there's
> plenty of problems remaining with all of them and their transports, that
> make me very dubious about internet-wide deployment. Period. No matter
> what happens here, I am going to keep discouraging the linux distros as
> a whole to turn it on without first addressing the long list of items in
> the ecn-sane design group's work list.
> 
> ....
> 
> So to me, it goes back to slamming the door shut, or not, on L4S's usage
> of ect(1) as a too easily gamed e2e identifier. As I don't think it and
> all the dependent code and algorithms can possibly scale past a single
> physical layer tech, I'd like to see it move to a DSCP codepoint, worst
> case... and certainly remain "experimental" in scope until anyone
> independent can attempt to evaluate it. 
> 
> second door I'd like to slam shut is redefining CE to be a weaker signal
> of congestion as L4S does. I'm willing to write a whole bunch of
> standards track RFCs obsoleting the experimental RFCs allowing this, if
> that's what it takes. Bufferbloat is still a huge problem! Can we keep
> working on fixing that?
> 
> third door I'd like to see open is the possibilities behind SCE.
> 
> Lastly:
> 
> I'd really all the tcp-go-fast-at-any-cost people to take a year off to
> dogfood their designs, and go live somewhere with a congested network to
> deal with daily, like a railway or airport, or on 3G network on a
> sailboat or beach somewhere. It's not a bad life... REALLY.
> 
> In fact, it's WAY cheaper than attending 3 ietf conferences a year.
> 
> Enjoy Montreal!
> 
> Sincerely,
> 
> Dave Taht
>From my sailboat in Alameda
> 


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-21 16:00                                             ` Jonathan Morton
  2019-07-21 16:12                                               ` Sebastian Moeller
@ 2019-07-22 18:15                                               ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-22 18:33                                                 ` Dave Taht
                                                                   ` (2 more replies)
  1 sibling, 3 replies; 84+ messages in thread
From: De Schepper, Koen (Nokia - BE/Antwerp) @ 2019-07-22 18:15 UTC (permalink / raw)
  To: Jonathan Morton, Bob Briscoe
  Cc: Sebastian Moeller, Black, David, ecn-sane, tsvwg, Dave Taht

Jonathan,

I'm a bit surprised to read what I read here... I had the impression that we were on a much better level of understanding during the hackathon and that :

- we both agreed that the latest updates in the Linux kernels had quite some impact on DCTCP's performance (burstyness) that both you and we are working on. As also our testbed showed it had the same impact on DualPI2 and FQ-Codel (yes we do understand FQ_Codel and did extensively compare DualQ with it since the beginning of L4S).
- the current TCP-Prague we have in the public GitHub, which is DCTCP using accurate ECN and ect(1) and is drop compliant with Reno, is what SCE can use as well, and whatever you called SCE-TCP can be used for L4S, as (what I showed you mathematically) it is actually perfectly working according to DCTCP's law of 1/p, because it is DCTCP with some simple pacing tweaks you did. I thought we agreed that there is no difference in the congestion control part, and we want the same thing, and the only difference is how to use the code-point.
- related to the testbed setups, we have several running, the first since 2013. We support all kernel versions since 3.19 up to the latest 5.2-rc5. We have demonstrated L4S since 2015 in IETF93 and the L4S BoF with real equipment and software that is still the same as we use today.
- the testbed I brought (5 laptops and a switch that got broken during travel and I had to replace in the nearest shop), I had to install during the hackathon from scratch from our public GitHub (I arrived only at 14:00 on Saturday) which we made immediately available for you guys to put the flent testing tools on.
- related to the flent testing, you might have expected to find big differences, but both measurements showed exactly the same results. I understood you need to extent your tools to get more measurement parameters included which were missing compared to ours.
- we planned to complete your test list during this week and maybe best that we jointly report on the outcome of those to avoid different interpretations again.
- anybody who had interest in L4S could have evaluated it since we made our DUALPI2 code available in 2015 (actually many did). (To Dave That: if you wanted to evaluate DualPI2 you had plenty of opportunity, 4 years by now. I find it weird that suddenly you were not able to install a qdisc in Linux. Even if you wanted us to setup a testbed for you, you could have asked us.)

Maybe some good news too, we also had a (first time right) successful accurate ECN interop test between our Linux TCP-Prague and FreeBSD Reno (acc-ecn implementation provided by Richard Scheffenegger).

I hope these accusations of incompetence can stop now, and that we get to the point of finally getting a future looking low latency Internet deployed. Anybody else who doubts on the performance/robustness of L4S, let me know and we arrange a test session this week.

Koen.

-----Original Message-----
From: Jonathan Morton <chromatix99@gmail.com> 
Sent: Sunday, July 21, 2019 6:01 PM
To: Bob Briscoe <in@bobbriscoe.net>
Cc: Sebastian Moeller <moeller0@gmx.de>; De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>; Black, David <David.Black@dell.com>; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org; Dave Taht <dave@taht.net>
Subject: Re: [Ecn-sane] [tsvwg] Comments on L4S drafts

> On 21 Jul, 2019, at 7:53 am, Bob Briscoe <in@bobbriscoe.net> wrote:
> 
> Both teams brought their testbeds, and as of yesterday evening, Koen and Pete Heist had put the two together and started the tests Jonathan proposed. Usual problems: latest Linux kernel being used has introduced a bug, so need to wind back. But progressing.
> 
> Nonetheless, altho it's included in the tests, I don't see the particular concern with this 'Cake' scenario. How can "L4S flows crowd out more reactive RFC3168 flows" in "an RFC3168-compliant FQ-AQM". Whenever it would be happening, FQ would prevent it.
> 
> To ensure we're not continually being blown into the weeds, I thought the /only/ concern was about RFC3168-compliant /single-queue/ AQMs.

I drew up a list of five network topologies to test, each with the SCE set of tests and tools, but using mostly L4S network components and focused on L4S performance and robustness.

1: L4S sender -> L4S middlebox (bottleneck) -> L4S receiver.

This is simply a sanity check to make sure the tools worked.  Actually we fell over even at this stage yesterday, because we discovered problems in the system Bob and Koen had brought along to demo.  These may or may not be improved today; we'll see.

2: L4S sender -> FQ-AQM middlebox (bottleneck) -> L4S middlebox -> L4S receiver.

This is the most favourable-to-L4S topology that incorporates a non-L4S component that we could easily come up with, and therefore .  Apparently the L4S folks are also relatively unfamiliar with Codel, which is now the most widely deployed AQM in the world, and this would help to validate that L4S transports respond reasonably to it.

3: L4S sender -> single-AQM middlebox (bottleneck) -> L4S middlebox -> L4S receiver.

This is the topology of most concern, and is obtained from topology 2 by simply changing a parameter on our middlebox.

4: L4S sender -> ECT(1) mangler -> L4S middlebox (bottleneck) -> L4S receiver.

Exploring what happens if an adversary tries to game the system.  We could also try an ECT(0) mangler or a Not-ECT mangler, in the same spirit.

5: L4S sender -> L4S middlebox (bottleneck 1) -> Dumb FIFO (bottleneck 2) -> FQ-AQM middlebox (bottleneck 3) -> L4S receiver.

This is Sebastian's scenario.  We did have some discussion yesterday about the propensity of existing senders to produce line-rate bursts occasionally, and the way these bursts could collect in *all* of the queues at successively decreasing bottlenecks.  This is a test which explores that scenario and measures its effects, and is highly relevant to best consumer practice on today's Internet.

Naturally, we have tried the equivalent of most of the above scenarios on our SCE testbed already.  The only one we haven't explicitly tried out is #5; I think we'd need to use all of Pete's APUs plus at least one of my machines to set it up, and we were too tired for that last night.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-22 18:15                                               ` De Schepper, Koen (Nokia - BE/Antwerp)
@ 2019-07-22 18:33                                                 ` Dave Taht
  2019-07-22 19:48                                                 ` Pete Heist
  2019-07-23 10:33                                                 ` [Ecn-sane] [tsvwg] Comments on L4S drafts Sebastian Moeller
  2 siblings, 0 replies; 84+ messages in thread
From: Dave Taht @ 2019-07-22 18:33 UTC (permalink / raw)
  To: De Schepper, Koen (Nokia - BE/Antwerp)
  Cc: Jonathan Morton, Bob Briscoe, ecn-sane, Black, David, tsvwg, Dave Taht

Koen:

to be utterly clear the principal barrier to me evaluating dualpi at
any point was the patent. Still is - has the DCO issue been resolved?
But I did look at it and ran it after all this blew up and it's part
of my testbeds.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-22 18:15                                               ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-22 18:33                                                 ` Dave Taht
@ 2019-07-22 19:48                                                 ` Pete Heist
  2019-07-25 16:14                                                   ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-23 10:33                                                 ` [Ecn-sane] [tsvwg] Comments on L4S drafts Sebastian Moeller
  2 siblings, 1 reply; 84+ messages in thread
From: Pete Heist @ 2019-07-22 19:48 UTC (permalink / raw)
  To: De Schepper, Koen (Nokia - BE/Antwerp)
  Cc: Jonathan Morton, Bob Briscoe, ecn-sane, Black, David, tsvwg, Dave Taht

[-- Attachment #1: Type: text/plain, Size: 1537 bytes --]

> On Jul 22, 2019, at 2:15 PM, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
> - related to the flent testing, you might have expected to find big differences, but both measurements showed exactly the same results. I understood you need to extent your tools to get more measurement parameters included which were missing compared to ours.

On this point, this morning the ability to start multiple ping flows with different tos values for each was already added to flent (thanks to Toke), so that we can measure inter-flow latency separately for the classic and L4S queues. We added a few related plots to use this new feature.

Since 104, development and testing of SCE has been our focus, but work on testing and interop with L4S has begun. We have built the TCP Prague and sch_dualpi2 repos for use in our testbed. Some documentation on setup, including which kernels from which repos need to be deployed in which part of a dumbbell setup, and for example, what if any configuration is, like new sysctls or sysctl values, could be helpful. We have added some documentation to the README of our repo (https://github.com/chromi/sce/ <https://github.com/chromi/sce/>).

To editorialize a bit, I think we’re both aware that testing congestion control can take time and care. I believe that together we can figure out how to improve congestion control for people that use the Internet, and the different ways that they use it. We’ll try to think about them first and foremost. :)

[-- Attachment #2: Type: text/html, Size: 2091 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-22 18:15                                               ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-22 18:33                                                 ` Dave Taht
  2019-07-22 19:48                                                 ` Pete Heist
@ 2019-07-23 10:33                                                 ` Sebastian Moeller
  2 siblings, 0 replies; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-23 10:33 UTC (permalink / raw)
  To: De Schepper, Koen (Nokia - BE/Antwerp)
  Cc: Jonathan Morton, Bob Briscoe, Black, David, ecn-sane, tsvwg, Dave Taht

Hi Koen,


> On Jul 22, 2019, at 20:15, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
> Jonathan,
> 
> I'm a bit surprised to read what I read here... I had the impression that we were on a much better level of understanding during the hackathon and that :
> 
> - we both agreed that the latest updates in the Linux kernels had quite some impact on DCTCP's performance (burstyness) that both you and we are working on. As also our testbed showed it had the same impact on DualPI2 and FQ-Codel (yes we do understand FQ_Codel and did extensively compare DualQ with it since the beginning of L4S).
> - the current TCP-Prague we have in the public GitHub, which is DCTCP using accurate ECN and ect(1) and is drop compliant with Reno, is what SCE can use as well, and whatever you called SCE-TCP can be used for L4S, as (what I showed you mathematically) it is actually perfectly working according to DCTCP's law of 1/p, because it is DCTCP with some simple pacing tweaks you did. I thought we agreed that there is no difference in the congestion control part, and we want the same thing, and the only difference is how to use the code-point.
> - related to the testbed setups, we have several running, the first since 2013. We support all kernel versions since 3.19 up to the latest 5.2-rc5. We have demonstrated L4S since 2015 in IETF93 and the L4S BoF with real equipment and software that is still the same as we use today.
> - the testbed I brought (5 laptops and a switch that got broken during travel and I had to replace in the nearest shop), I had to install during the hackathon from scratch from our public GitHub (I arrived only at 14:00 on Saturday) which we made immediately available for you guys to put the flent testing tools on.
> - related to the flent testing, you might have expected to find big differences, but both measurements showed exactly the same results. I understood you need to extent your tools to get more measurement parameters included which were missing compared to ours.
> - we planned to complete your test list during this week and maybe best that we jointly report on the outcome of those to avoid different interpretations again.
> - anybody who had interest in L4S could have evaluated it since we made our DUALPI2 code available in 2015 (actually many did).

	`Well, at IETF 104 there was a promise on the lists of VMs with both endpoints for a L4S system, which as far as I can tell never materialized, which made me refrain from testing... And I believe I did ask/propose the VM thing on this very list and got no response.

> (To Dave That: if you wanted to evaluate DualPI2 you had plenty of opportunity, 4 years by now. I find it weird that suddenly you were not able to install a qdisc in Linux. Even if you wanted us to setup a testbed for you, you could have asked us.)
> 
> Maybe some good news too, we also had a (first time right) successful accurate ECN interop test between our Linux TCP-Prague and FreeBSD Reno (acc-ecn implementation provided by Richard Scheffenegger).
> 
> I hope these accusations of incompetence can stop now, and that we get to the point of finally getting a future looking low latency Internet deployed.

	??? sorry to be so negative, but the "getting [...] deployed" part is out of our control. 

> Anybody else who doubts on the performance/robustness of L4S, let me know and we arrange a test session this week.

	Not that it counts for much, but I am neither convinced of L4S reaching its stated performance or robustness goals under adversarial conditions and long RTTs. I am looking forward to the outcome of this weeks testing (and hope my concerns will have been unfounded).


Best Regards
	Sebastian


> 
> Koen.
> 
> 
> -----Original Message-----
> From: Jonathan Morton <chromatix99@gmail.com> 
> Sent: Sunday, July 21, 2019 6:01 PM
> To: Bob Briscoe <in@bobbriscoe.net>
> Cc: Sebastian Moeller <moeller0@gmx.de>; De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>; Black, David <David.Black@dell.com>; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org; Dave Taht <dave@taht.net>
> Subject: Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
> 
>> On 21 Jul, 2019, at 7:53 am, Bob Briscoe <in@bobbriscoe.net> wrote:
>> 
>> Both teams brought their testbeds, and as of yesterday evening, Koen and Pete Heist had put the two together and started the tests Jonathan proposed. Usual problems: latest Linux kernel being used has introduced a bug, so need to wind back. But progressing.
>> 
>> Nonetheless, altho it's included in the tests, I don't see the particular concern with this 'Cake' scenario. How can "L4S flows crowd out more reactive RFC3168 flows" in "an RFC3168-compliant FQ-AQM". Whenever it would be happening, FQ would prevent it.
>> 
>> To ensure we're not continually being blown into the weeds, I thought the /only/ concern was about RFC3168-compliant /single-queue/ AQMs.
> 
> I drew up a list of five network topologies to test, each with the SCE set of tests and tools, but using mostly L4S network components and focused on L4S performance and robustness.
> 
> 
> 1: L4S sender -> L4S middlebox (bottleneck) -> L4S receiver.
> 
> This is simply a sanity check to make sure the tools worked.  Actually we fell over even at this stage yesterday, because we discovered problems in the system Bob and Koen had brought along to demo.  These may or may not be improved today; we'll see.
> 
> 
> 2: L4S sender -> FQ-AQM middlebox (bottleneck) -> L4S middlebox -> L4S receiver.
> 
> This is the most favourable-to-L4S topology that incorporates a non-L4S component that we could easily come up with, and therefore .  Apparently the L4S folks are also relatively unfamiliar with Codel, which is now the most widely deployed AQM in the world, and this would help to validate that L4S transports respond reasonably to it.
> 
> 
> 3: L4S sender -> single-AQM middlebox (bottleneck) -> L4S middlebox -> L4S receiver.
> 
> This is the topology of most concern, and is obtained from topology 2 by simply changing a parameter on our middlebox.
> 
> 
> 4: L4S sender -> ECT(1) mangler -> L4S middlebox (bottleneck) -> L4S receiver.
> 
> Exploring what happens if an adversary tries to game the system.  We could also try an ECT(0) mangler or a Not-ECT mangler, in the same spirit.
> 
> 
> 5: L4S sender -> L4S middlebox (bottleneck 1) -> Dumb FIFO (bottleneck 2) -> FQ-AQM middlebox (bottleneck 3) -> L4S receiver.
> 
> This is Sebastian's scenario.  We did have some discussion yesterday about the propensity of existing senders to produce line-rate bursts occasionally, and the way these bursts could collect in *all* of the queues at successively decreasing bottlenecks.  This is a test which explores that scenario and measures its effects, and is highly relevant to best consumer practice on today's Internet.
> 
> 
> Naturally, we have tried the equivalent of most of the above scenarios on our SCE testbed already.  The only one we haven't explicitly tried out is #5; I think we'd need to use all of Pete's APUs plus at least one of my machines to set it up, and we were too tired for that last night.
> 
> - Jonathan Morton


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-19 23:42                                         ` Dave Taht
@ 2019-07-24 16:21                                           ` Dave Taht
  0 siblings, 0 replies; 84+ messages in thread
From: Dave Taht @ 2019-07-24 16:21 UTC (permalink / raw)
  To: Wesley Eddy
  Cc: Dave Taht, De Schepper, Koen (Nokia - BE/Antwerp), ecn-sane, tsvwg

On Fri, Jul 19, 2019 at 4:42 PM Dave Taht <dave.taht@gmail.com> wrote:
>
> On Fri, Jul 19, 2019 at 3:09 PM Wesley Eddy <wes@mti-systems.com> wrote:
> >
> > Hi Dave, thanks for clarifying, and sorry if you're getting upset.
>
> There have been a few other disappointments this ietf. I'd hoped bbrv2
> would land for independent testing. Didn't.
>
> https://github.com/google/bbr
>
> I have some "interesting" patches for bbrv1 but felt it would be saner
> to wait for the most current version (or for the bbrv2 authors to
> have the small rfc3168 baseline patch I'd requested tested by them
> rather than I), to bother redoing that series of tests and publishing.

The bbrv2 code did indeed land yesterday (and - joy!) was accompanied
by test scripts for repeatable results. The iccrg preso was
impressive. thank you, thank you. It's going to take a while to
retofit my suggested simpler rfc3168 ecn handing, and or/sce, but not
as long as until next ietf.

> I'd asked if the dctcp and dualpi code on github was stable enough to
> be independently tested. No reply.

In poking through the most current git trees, I see this commit
finally installed into dctcp *sane behavior
in response to loss* which it didn't have before.

commit aecfde23108b8e637d9f5c5e523b24fb97035dc3
Author: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Date:   Thu Apr 4 12:24:02 2019 +0000
    tcp: Ensure DCTCP reacts to losses
...

Which explains a few things. Now I get to throw out 8 years of test
results and start over. And throw out most of yours, also. Please note
that seeing a bug fixed of this magnitude gives me joy. Perhaps many
issues I saw were due to this, not theory/spec failures. This brings
up another issue I'll start a new subject line for.

This commit looks to make a dent in the GRO issue I've raised periodically:

commit e3058450965972e67cc0e5492c08c4cdadafc134
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Apr 11 05:55:23 2019 -0700
    dctcp: more accurate tracking of packets delivery

After commit e21db6f69a95 ("tcp: track total bytes delivered with ECN CE marks")
core TCP stack does a very good job tracking ECN signals.

    The "sender's best estimate of CE information" Yuchung mentioned in his
    patch is indeed the best we can do.

    DCTCP can use tp->delivered_ce and tp->delivered to not duplicate the logic,
    and use the existing best estimate.

    This solves some problems, since current DCTCP logic does not deal
with losses
    and/or GRO or ack aggregation very well.

...

Still it's hard to mark multiple packets in a gso/gro bundle - cake
does gso splitting by default, dualpi
does not. Has tso/gro been enabled or disabled for other's tests so far?

> The SCE folk did freeze and document a release worth testing.

But it looks to me they were missing both these commits.

> I did some testing on wifi at battlemesh but it's too noisy (but the
> sources of "noise" were important) and too obviously "ecn is not the
> wifi problem"
>
> I didn't know there was an "add a delay based option to cubic patch"
> until last week.
>
> So anyway, I do retain hope, maybe after this coming week and some
> more hackathoning, it might be possible to start getting reproducible
> and repeatable results from more participants in this controversy.
> Having to sit through another half-dozen presentations with
> irreproducible results is not something I look forward to, and I'm
> glad I don't have to.
>
> > When we're talking about keeping very small queues, then RTT is lost as
> > a congestion indicator (since there is no queue depth to modulate as a
> > congestion signal into the RTT).  We have indicators that include drop,
> > RTT, and ECN (when available).  Using rate of marks rather than just
> > binary presence of marking gives a finer-grained signal.  SCE is also
> > providing a multi-level indication, so that's another way to get more
> > "ENOB" into the samples of congestion being fed to the controllers.
>
> While this is extremely well said, RTT is NOT lost as a congestion
> indicator, it just becomes finer grained.
>
> While I'm reading tea-leaves... there's been a lot of stuff landing in
> the linux kernel from google around edf scheduling for tcp and the
> hardware enabled pacing qdiscs. So I figure they are now in the nsec
> category on their stuff but not ready to be talking.
>
> > Marking (whether classic ECN, mark-rate, or multi-level marking) is
> > needed since with small queues there's lack of congestion information in
> > the RTT.
>
> small queues *and isochronous, high speed, wired connections*.
>
> What will it take to get the ecn and especially l4s crowd to take a
> hard look at actual wireless or wifi packet captures? I mean, y'all
> are sitting staring into your laptops for a week, doing wifi. Would it
> hurt to test more actual transports during
> that time?

I do keep hoping someone will attempt to publish some wifi results. I guess
that might end up being me, next time around.

>
> How many ISPs would still be in business if wifi didn't exist, only {X}G?
>
> the wifi at the last ietf sucked...
>
> Can't even get close to 5ms latencies on any form of wireless/wifi.
>
> Anyway, I long ago agreed that multiple marks (of some sort) per rtt
> made sense (see my position statements on ecn-sane),
> but of late I've been leaning more towards really good pacing,  rtt
> and chirping with minimal marking required on
> "small queues *and isochronous, high speed, wired connections*.
>
> >
> > To address one question you repeated a couple times:
> >
> > > Is there any chance we'll see my conception of the good ietf process
> > > enforced on the L4S and SCE processes by the chairs?
> >
> > We look for working group consensus.  So far, we saw consensus to adopt
> > as a WG item for experimental track, and have been following the process
> > for that.
>
> Well, given the announcement of docsis low latency, and the size of
> the fq_codel deployment,
> and the l4s/sce drafts, we are light-years beyond anything I'd
> consider to be "experimental" in the real world.
>
> Would recognizing this reality and somehow converting this to a
> standards track debate within the ietf help anything?
>
> Would getting this out of tsvwg and restarting aqmwg help any?
>
> I was, up until all this blew up in december, planning on starting the
> process for an rfc8289bis and rfc8290bis on the standards track.
>
> >
> > On the topic of gaming the system by falsely setting the L4S ID, that
> > might need to be discussed a little bit more, since now that you mention
> > it, the docs don't seem to very directly address it yet.
>
> to me this has always been a game theory deal killer for l4s (and
> diffserv, intserv, etc). You cannot ask for
> more priority, only less. While I've been recommending books from
> kleinrock lately, another one that
> I think everyone in this field should have is:
>
> https://www.amazon.com/Theory-Games-Economic-Behavior-Commemorative-ebook/dp/B00AMAZL4I/ref=sr_1_1?keywords=theory+of+games+and+economic+behavior&qid=1563579161&s=gateway&sr=8-1
>
> I've read it countless times (and can't claim to have understood more
> than a tiny percentage of it). I wasn't aware
> until this moment there was a kindle edition.
>
> > I can only
> > speak for myself, but assumed a couple things internally, such as (1)
> > this is getting enabled in specific environments, (2) in less controlled
> > environments, an operator enabling it has protections in place for
> > getting admission or dealing with bad behavior, (3) there could be
> > further development of audit capabilities such as in CONEX, etc.  I
> > guess it could be good to hear more about what others were thinking on this.
>
> I think there was "yet another queue" suggested for detected bad behavior.
>
> >
> > > So I should have said - "tosses all normal ("classic") flows into a
> > > single and higher latency queue when a greedy normal flow is present"
> > > ... "in the dualpi" case? I know it's possible to hang a different
> > > queue algo on the "normal" queue, but
> > > to this day I don't see the need for the l4s "fast lane" in the first
> > > place, nor a cpu efficient way of doing the right things with the
> > > dualpi or curvyred code. What I see, is, long term, that special bit
> > > just becomes a "fast" lane for any sort of admission controlled
> > > traffic the ISP wants to put there, because the dualpi idea fails on
> > > real traffic.
> >
> > Thanks; this was helpful for me to understand your position.
>
> Groovy.
>
> I recently ripped ecn support out of fq_codel entirely, in
> the fq_codel_fast tree. saved some cpu, still measuring (my real objective
> is to make that code multicore),
>
> another branch also has the basic sce support, and will have more
> after jon settles on a ramp and single queue fallbacks in
> sch_cake. btw, if anyone cares, there's more than a few flent test
> servers scattered around the internet now that
> do some variant of sce for others to play with....
>
> >
> >
> > > Well if the various WGs would exit that nice hotel, and form a
> > > diaspora over the city in coffee shops and other public spaces, and do
> > > some tests of your latest and greatest stuff, y'all might get a more
> > > accurate viewpoint of what you are actually accomplishing. Take a look
> > > at what BBR does, take a look at what IW10 does, take a look at what
> > > browsers currently do.
> >
> > All of those things come up in the meetings, and frequently there is
> > measurement data shown and discussed.  It's always welcome when people
> > bring measurements, data, and experience.  The drafts and other
> > contributions are here so that anyone interested can independently
> > implement and do the testing you advocate and share results.  We're all
> > on the same team trying to make the Internet better.
>
> Skip a meeting. Try the internet in Bali. Or africa. Or south america.
> Or on a boat, Or do an interim
> in places like that.
>
> >
> >
>
>
> --
>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740



-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-22 19:48                                                 ` Pete Heist
@ 2019-07-25 16:14                                                   ` De Schepper, Koen (Nokia - BE/Antwerp)
  2019-07-26 13:10                                                     ` Pete Heist
  0 siblings, 1 reply; 84+ messages in thread
From: De Schepper, Koen (Nokia - BE/Antwerp) @ 2019-07-25 16:14 UTC (permalink / raw)
  To: Pete Heist
  Cc: Jonathan Morton, Bob Briscoe, ecn-sane, Black,  David, tsvwg, Dave Taht

[-- Attachment #1: Type: text/plain, Size: 2624 bytes --]

All,

We have the testbed running our reference kernel version 3.19 with the drop patch. Let me know if you want to see the difference in behavior between the “good” DCTCP and the “deteriorated” DCTCP in the latest kernels too. There were several issues introduced which made DCTCP both more aggressive, and currently less aggressive. It calls for better regression tests (for Prague at least) to make sure it’s behavior is not changed too drastically by new updates. If enough people are interested, we can organize a session in one of the available rooms.

Pete, Jonathan,

Also for testing further your tests, let me know when you are available.

Koen.

From: Pete Heist <pete@heistp.net>
Sent: Monday, July 22, 2019 9:48 PM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
Cc: Jonathan Morton <chromatix99@gmail.com>; Bob Briscoe <in@bobbriscoe.net>; ecn-sane@lists.bufferbloat.net; Black, David <David.Black@dell.com>; tsvwg@ietf.org; Dave Taht <dave@taht.net>
Subject: Re: [Ecn-sane] [tsvwg] Comments on L4S drafts

On Jul 22, 2019, at 2:15 PM, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com<mailto:koen.de_schepper@nokia-bell-labs.com>> wrote:

- related to the flent testing, you might have expected to find big differences, but both measurements showed exactly the same results. I understood you need to extent your tools to get more measurement parameters included which were missing compared to ours.

On this point, this morning the ability to start multiple ping flows with different tos values for each was already added to flent (thanks to Toke), so that we can measure inter-flow latency separately for the classic and L4S queues. We added a few related plots to use this new feature.

Since 104, development and testing of SCE has been our focus, but work on testing and interop with L4S has begun. We have built the TCP Prague and sch_dualpi2 repos for use in our testbed. Some documentation on setup, including which kernels from which repos need to be deployed in which part of a dumbbell setup, and for example, what if any configuration is, like new sysctls or sysctl values, could be helpful. We have added some documentation to the README of our repo (https://github.com/chromi/sce/).

To editorialize a bit, I think we’re both aware that testing congestion control can take time and care. I believe that together we can figure out how to improve congestion control for people that use the Internet, and the different ways that they use it. We’ll try to think about them first and foremost. :)

[-- Attachment #2: Type: text/html, Size: 5851 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-21 20:48                                             ` Sebastian Moeller
@ 2019-07-25 20:51                                               ` Bob Briscoe
  2019-07-25 21:17                                                 ` Bob Briscoe
  0 siblings, 1 reply; 84+ messages in thread
From: Bob Briscoe @ 2019-07-25 20:51 UTC (permalink / raw)
  To: Sebastian Moeller
  Cc: De Schepper, Koen (Nokia - BE/Antwerp),
	Black, David, ecn-sane, tsvwg, Dave Taht

[-- Attachment #1: Type: text/plain, Size: 8382 bytes --]

Sebastien,

The protocol ID identifies the wire protocol, not the congestion control 
behaviour. If we had used a different protocol ID for each congestion 
control behaviour, we'd have run out of protocol IDs long ago (semi 
serious ;)

This is a re-run of a debate that has already been had (in Jul 2015 - 
Nov 2016), which is recorded in the appendix of ecn-l4s-id here:
https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-07#appendix-B.4
Quoted and annotated below:

> B.4.  Protocol ID
>
>     It has been suggested that a new ID in the IPv4 Protocol field or the
>     IPv6 Next Header field could identify L4S packets.  However this
>     approach is ruled out by numerous problems:
>
>     o  A new protocol ID would need to be paired with the old one for
>        each transport (TCP, SCTP, UDP, etc.);
>
>     o  In IPv6, there can be a sequence of Next Header fields, and it
>        would not be obvious which one would be expected to identify a
>        network service like L4S;

In particular, the protocol ID / next header stays next to the upper 
layer header as a PDU gets encapsulated, possibly many times. So the 
protocol ID is not necessarily (rarely?) in the outer, particularly in 
IPv6, and it might be encrypted in IPSec.

>     o  A new protocol ID would rarely provide an end-to-end service,
>        because It is well-known that new protocol IDs are often blocked
>        by numerous types of middlebox;
>
>     o  The approach is not a solution for AQMs below the IP layer;

That last point means that the protocol ID is not designed to always 
propagate to the outer on encap and back from the outer on decap, 
whereas the ECN field is (and it's the only field that is).




Bob

On 21/07/2019 16:48, Sebastian Moeller wrote:
> Dear Bob,
>
>> On Jul 21, 2019, at 21:14, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>
>> Sebastien,
>>
>> On 21/07/2019 17:08, Sebastian Moeller wrote:
>>> Hi Bob,
>>>
>>>
>>>
>>>> On Jul 21, 2019, at 14:30, Bob Briscoe <ietf@bobbriscoe.net>
>>>>   wrote:
>>>>
>>>> David,
>>>>
>>>> On 19/07/2019 21:06, Black, David wrote:
>>>>
>>>>> Two comments as an individual, not as a WG chair:
>>>>>
>>>>>
>>>>>> Mostly, they're things that an end-host algorithm needs
>>>>>> to do in order to behave nicely, that might be good things anyways
>>>>>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
>>>>>> work well w/ small RTT, be robust to reordering).  I am curious which
>>>>>> ones you think are too rigid ... maybe they can be loosened?
>>>>>>
>>>>> [1] I have profoundly objected to L4S's RACK-like requirement (use time to detect loss, and in particular do not use 3DupACK) in public on multiple occasions, because in reliable transport space, that forces use of TCP Prague, a protocol with which we have little to no deployment or operational experience.  Moreover, that requirement raises the bar for other protocols in a fashion that impacts endpoint firmware, and possibly hardware in some important (IMHO) environments where investing in those changes delivers little to no benefit.  The environments that I have in mind include a lot of data centers.  Process wise, I'm ok with addressing this objection via some sort of "controlled environment" escape clause text that makes this RACK-like requirement inapplicable in a "controlled environment" that does not need that behavior (e.g., where 3DupACK does not cause problems and is not expected to cause problems).
>>>>>
>>>>> For clarity, I understand the multi-lane link design rationale behind the RACK-like requirement and would agree with that requirement in a perfect world ... BUT ... this world is not perfect ... e.g., 3DupACK will not vanish from "running code" anytime soon.
>>>>>
>>>> As you know, we have been at pains to address every concern about L4S that has come up over the years, and I thought we had addressed this one to your satisfaction.
>>>>
>>>> The reliable transports you are are concerned about require ordered delivery by the underlying fabric, so they can only ever exist in a controlled environment. In such a controlled environment, your ECT1+DSCP idea (below) could be used to isolate the L4S experiment from these transports and their firmware/hardware constraints.
>>>>
>>>> On the public Internet, the DSCP commonly gets wiped at the first hop. So requiring a DSCP as well as ECT1 to separate off L4S would serve no useful purpose: it would still lead to ECT1 packets without the DSCP sent from a scalable congestion controls (which is behind Jonathan's concern in response to you).
>>>>
>>> 	And this is why IPv4's protocol fiel/ IPv6's next header field are the classifier you actually need... You are changing a significant portion of TCP's observable behavior, so it can be argued that TCP-Prague is TCP by name only; this "classifier" still lives in the IP header, so no deeper layer's need to be accessed, this is non-leaky in that the classifier is unambiguously present independent of the value of the ECN bits; and it is also compatible with an SCE style ECN signaling. Since I believe the most/only likely roll-out of L4S is going to be at the ISPs access nodes (BRAS/BNG/CMTS/whatever)  middleboxes shpould not be an unsurmountable problem, as ISPs controll their own middleboxes and often even the CPEs, so protocoll ossification is not going to be a showstopper for this part of the roll-out.
>>>
>>> Best Regards
>>> 	Sebastian
>>>
>>>
>> I think you've understood this from reading abbreviated description of the requirement on the list, rather than the spec. The spec. solely says:
>> 	A scalable congestion control MUST detect loss by counting in time-based units
>> That's all. No more, no less.
>>
>> People call this the "RACK requirement", purely because the idea came from RACK. There is no requirement to do RACK, and the requirement applies to all transports, not just TCP.
> 	Fair enough, but my argument was not really about RACK at all, it more-so applies to the linear response to CE-marks that ECT(1) promises in the L4S approach. You are making changes to TCP's congestion controller that make it cease to be "TCP-friendly" (for arguably good reasons). So why insist on pretending that this is still TCP? So give it a new protocol ID already and all your classification needs are solved. As a bonus you do not need to use the same signal (CE) to elicit two different responses, but you could use the re-gained ECT(1) code point similarly to SCE to put the new fine-grained congestion signal into... while using CE in the RFC3168 compliant sense.
>
>
>> It then means that a packet with ECT1 in the IP field can be forwarded without resequencing (no requirement - it just it /can/ be).
> 	Packets always "can" be forwarded without resequencing, the question is whether the end-points are going to like that...
> And IMHO even RACK with its at maximum one RTT reordering windows gives intermediate hops not much to work with, without knowing the full RTT a cautious hop might allow itself one retransmission slot (so its own contribution to the RTT), but as far as I can tell they do that already. And tracking the RTT will require to keep per flow statistics, this also seems like it can get computationally expensive quickly... (I probably misunderstand how RACK works, but I fail to see how it will really allow more re-ordering, but that is also orthogonal to the L4S issues I try to raise).
>
>> This is a network layer 'unordered delivery' property, so it's appropriate to flag at the IP layer.
> 	But at that point you are multiplexing multiple things into the poor ECT(1) codepoint, the promise of a certain "linear" back-off behavior on encountered congestion AND a "allow relaxed ordering" ( "detect loss by counting in time-based units" does not seem to be fully equivalent with a generic tolerance to 'unordered delivery' as far as I understand). That seems asking to much of a simple number...
>
> Best Regards
> 	Sebastian
>
>>
>>
>>
>> Bob
>>
>>
>>
>> -- 
>> ________________________________________________________________
>> Bob Briscoe
>> http://bobbriscoe.net/
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/


[-- Attachment #2: Type: text/html, Size: 10743 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-25 20:51                                               ` Bob Briscoe
@ 2019-07-25 21:17                                                 ` Bob Briscoe
  2019-07-25 22:00                                                   ` Sebastian Moeller
  0 siblings, 1 reply; 84+ messages in thread
From: Bob Briscoe @ 2019-07-25 21:17 UTC (permalink / raw)
  To: Sebastian Moeller
  Cc: De Schepper, Koen (Nokia - BE/Antwerp),
	Black, David, ecn-sane, tsvwg, Dave Taht

[-- Attachment #1: Type: text/plain, Size: 10231 bytes --]

Sebastien,

Sry, I sent that last reply too early, and not bottom posted. Both 
corrected below (tagged [BB]):


On 25/07/2019 16:51, Bob Briscoe wrote:
> Sebastien,
>
>
> On 21/07/2019 16:48, Sebastian Moeller wrote:
>> Dear Bob,
>>
>>> On Jul 21, 2019, at 21:14, Bob Briscoe<ietf@bobbriscoe.net>  wrote:
>>>
>>> Sebastien,
>>>
>>> On 21/07/2019 17:08, Sebastian Moeller wrote:
>>>> Hi Bob,
>>>>
>>>>
>>>>
>>>>> On Jul 21, 2019, at 14:30, Bob Briscoe<ietf@bobbriscoe.net>
>>>>>   wrote:
>>>>>
>>>>> David,
>>>>>
>>>>> On 19/07/2019 21:06, Black, David wrote:
>>>>>
>>>>>> Two comments as an individual, not as a WG chair:
>>>>>>
>>>>>>
>>>>>>> Mostly, they're things that an end-host algorithm needs
>>>>>>> to do in order to behave nicely, that might be good things anyways
>>>>>>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
>>>>>>> work well w/ small RTT, be robust to reordering).  I am curious which
>>>>>>> ones you think are too rigid ... maybe they can be loosened?
>>>>>>>
>>>>>> [1] I have profoundly objected to L4S's RACK-like requirement (use time to detect loss, and in particular do not use 3DupACK) in public on multiple occasions, because in reliable transport space, that forces use of TCP Prague, a protocol with which we have little to no deployment or operational experience.  Moreover, that requirement raises the bar for other protocols in a fashion that impacts endpoint firmware, and possibly hardware in some important (IMHO) environments where investing in those changes delivers little to no benefit.  The environments that I have in mind include a lot of data centers.  Process wise, I'm ok with addressing this objection via some sort of "controlled environment" escape clause text that makes this RACK-like requirement inapplicable in a "controlled environment" that does not need that behavior (e.g., where 3DupACK does not cause problems and is not expected to cause problems).
>>>>>>
>>>>>> For clarity, I understand the multi-lane link design rationale behind the RACK-like requirement and would agree with that requirement in a perfect world ... BUT ... this world is not perfect ... e.g., 3DupACK will not vanish from "running code" anytime soon.
>>>>>>
>>>>> As you know, we have been at pains to address every concern about L4S that has come up over the years, and I thought we had addressed this one to your satisfaction.
>>>>>
>>>>> The reliable transports you are are concerned about require ordered delivery by the underlying fabric, so they can only ever exist in a controlled environment. In such a controlled environment, your ECT1+DSCP idea (below) could be used to isolate the L4S experiment from these transports and their firmware/hardware constraints.
>>>>>
>>>>> On the public Internet, the DSCP commonly gets wiped at the first hop. So requiring a DSCP as well as ECT1 to separate off L4S would serve no useful purpose: it would still lead to ECT1 packets without the DSCP sent from a scalable congestion controls (which is behind Jonathan's concern in response to you).
>>>>>
>>>> 	And this is why IPv4's protocol fiel/ IPv6's next header field are the classifier you actually need... You are changing a significant portion of TCP's observable behavior, so it can be argued that TCP-Prague is TCP by name only; this "classifier" still lives in the IP header, so no deeper layer's need to be accessed, this is non-leaky in that the classifier is unambiguously present independent of the value of the ECN bits; and it is also compatible with an SCE style ECN signaling. Since I believe the most/only likely roll-out of L4S is going to be at the ISPs access nodes (BRAS/BNG/CMTS/whatever)  middleboxes shpould not be an unsurmountable problem, as ISPs controll their own middleboxes and often even the CPEs, so protocoll ossification is not going to be a showstopper for this part of the roll-out.
>>>>
>>>> Best Regards
>>>> 	Sebastian
>>>>
>>>>
>>> I think you've understood this from reading abbreviated description of the requirement on the list, rather than the spec. The spec. solely says:
>>> 	A scalable congestion control MUST detect loss by counting in time-based units
>>> That's all. No more, no less.
>>>
>>> People call this the "RACK requirement", purely because the idea came from RACK. There is no requirement to do RACK, and the requirement applies to all transports, not just TCP.
>> 	Fair enough, but my argument was not really about RACK at all, it more-so applies to the linear response to CE-marks that ECT(1) promises in the L4S approach. You are making changes to TCP's congestion controller that make it cease to be "TCP-friendly" (for arguably good reasons). So why insist on pretending that this is still TCP? So give it a new protocol ID already and all your classification needs are solved. As a bonus you do not need to use the same signal (CE) to elicit two different responses, but you could use the re-gained ECT(1) code point similarly to SCE to put the new fine-grained congestion signal into... while using CE in the RFC3168 compliant sense.

[BB] The protocol ID identifies the wire protocol, not the congestion 
control behaviour. If we had used a different protocol ID for each 
congestion control behaviour, we'd have run out of protocol IDs long ago 
(semi serious ;)

This is a re-run of a debate that has already been had (in Jul 2015 - 
Nov 2016), which is recorded in the appendix of ecn-l4s-id here:
https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-07#appendix-B.4
Quoted and annotated below:

> B.4.  Protocol ID
>
>     It has been suggested that a new ID in the IPv4 Protocol field or the
>     IPv6 Next Header field could identify L4S packets.  However this
>     approach is ruled out by numerous problems:
>
>     o  A new protocol ID would need to be paired with the old one for
>        each transport (TCP, SCTP, UDP, etc.);
>
>     o  In IPv6, there can be a sequence of Next Header fields, and it
>        would not be obvious which one would be expected to identify a
>        network service like L4S;

In particular, the protocol ID / next header stays next to the upper 
layer header as a PDU gets encapsulated, possibly many times. So the 
protocol ID is not necessarily (rarely?) in the outer, particularly in 
IPv6, and it might be encrypted in IPSec.

>     o  A new protocol ID would rarely provide an end-to-end service,
>        because It is well-known that new protocol IDs are often blocked
>        by numerous types of middlebox;
>
>     o  The approach is not a solution for AQMs below the IP layer;

That last point means that the protocol ID is not designed to always 
propagate to the outer on encap and back from the outer on decap, 
whereas the ECN field is (and it's the only field that is).

more....
>>
>>
>>> It then means that a packet with ECT1 in the IP field can be forwarded without resequencing (no requirement - it just it /can/ be).
>> 	Packets always "can" be forwarded without resequencing, the question is whether the end-points are going to like that...
>> And IMHO even RACK with its at maximum one RTT reordering windows gives intermediate hops not much to work with, without knowing the full RTT a cautious hop might allow itself one retransmission slot (so its own contribution to the RTT), but as far as I can tell they do that already. And tracking the RTT will require to keep per flow statistics, this also seems like it can get computationally expensive quickly... (I probably misunderstand how RACK works, but I fail to see how it will really allow more re-ordering, but that is also orthogonal to the L4S issues I try to raise).
[BB] No-one's suggesting reordering degree will adapt to measured RTT at 
run-time.

See the original discussion on this point here:
Vicious or Virtuous circle? Adapting reordering window to reordering 
degree 
<https://mailarchive.ietf.org/arch/msg/tcpm/QOhMjHEo2kbHGInH8eFEsXbdwkA>

In summary, the uncertainty for the network is a feature not a bug. It 
means it has to keep reordering degree lower than the lowest likely RTT 
(or some fraction of it) that is expected for that link technology at 
the design stage. This will keep reordering low, but not too 
unnecessarily low (i.e. not 3 packets at the link rate).

>>
>>> This is a network layer 'unordered delivery' property, so it's appropriate to flag at the IP layer.
>> 	But at that point you are multiplexing multiple things into the poor ECT(1) codepoint, the promise of a certain "linear" back-off behavior on encountered congestion AND a "allow relaxed ordering" ( "detect loss by counting in time-based units" does not seem to be fully equivalent with a generic tolerance to 'unordered delivery' as far as I understand). That seems asking to much of a simple number...
[BB] In a purist sense, it is a valid architectural criticism that we 
overload one codepoint with two architecturally distinct functions:

  * low queuing delay
  * low resequencing delay

But then, one has to consider the value vs cost of 2 independent 
identifiers for two things that are unlikely to ever need to be 
distinguished. If an app wants low delay, would it want only low queuing 
delay and not low resequencing delay?

You could contrive a case where the receiver is memory-challenged and 
needs the network to do the resequencing. But it's not a reasonable 
expectation for the network to do a function that will cause HoL 
blocking for other applications in the process of helping you with your 
memory problems.

Given we are header-bit-challenged, it would not be unreasonable for the 
WG to decide to conflate these two architectural identifiers into one.


Bob

>>
>> Best Regards
>> 	Sebastian
>>
>>>
>>> Bob
>>>
>>>
>>>
>>> -- 
>>> ________________________________________________________________
>>> Bob Briscoe
>>> http://bobbriscoe.net/
>> _______________________________________________
>> Ecn-sane mailing list
>> Ecn-sane@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/ecn-sane
>
> -- 
> ________________________________________________________________
> Bob Briscoehttp://bobbriscoe.net/

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/


[-- Attachment #2: Type: text/html, Size: 14261 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-25 21:17                                                 ` Bob Briscoe
@ 2019-07-25 22:00                                                   ` Sebastian Moeller
  2019-07-26 10:20                                                     ` [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs Sebastian Moeller
  0 siblings, 1 reply; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-25 22:00 UTC (permalink / raw)
  To: Bob Briscoe
  Cc: De Schepper, Koen (Nokia - BE/Antwerp),
	Black, David, ecn-sane, tsvwg, Dave Taht

Dear Bob,

thanks for you time and insight. More comments below. I will try to follow your style.

> On Jul 25, 2019, at 23:17, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> Sebastien,
> 
> Sry, I sent that last reply too early, and not bottom posted. Both corrected below (tagged [BB]):
> 
> 
> On 25/07/2019 16:51, Bob Briscoe wrote:
>> Sebastien,
>> 
>> 
>> On 21/07/2019 16:48, Sebastian Moeller wrote:
>>> Dear Bob, 
>>> 
>>> 
>>>> On Jul 21, 2019, at 21:14, Bob Briscoe <ietf@bobbriscoe.net>
>>>>  wrote:
>>>> 
>>>> Sebastien,
>>>> 
>>>> On 21/07/2019 17:08, Sebastian Moeller wrote:
>>>> 
>>>>> Hi Bob,
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 21, 2019, at 14:30, Bob Briscoe <ietf@bobbriscoe.net>
>>>>>> 
>>>>>>  wrote:
>>>>>> 
>>>>>> David,
>>>>>> 
>>>>>> On 19/07/2019 21:06, Black, David wrote:
>>>>>> 
>>>>>> 
>>>>>>> Two comments as an individual, not as a WG chair:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> Mostly, they're things that an end-host algorithm needs
>>>>>>>> to do in order to behave nicely, that might be good things anyways
>>>>>>>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
>>>>>>>> work well w/ small RTT, be robust to reordering).  I am curious which
>>>>>>>> ones you think are too rigid ... maybe they can be loosened?
>>>>>>>> 
>>>>>>>> 
>>>>>>> [1] I have profoundly objected to L4S's RACK-like requirement (use time to detect loss, and in particular do not use 3DupACK) in public on multiple occasions, because in reliable transport space, that forces use of TCP Prague, a protocol with which we have little to no deployment or operational experience.  Moreover, that requirement raises the bar for other protocols in a fashion that impacts endpoint firmware, and possibly hardware in some important (IMHO) environments where investing in those changes delivers little to no benefit.  The environments that I have in mind include a lot of data centers.  Process wise, I'm ok with addressing this objection via some sort of "controlled environment" escape clause text that makes this RACK-like requirement inapplicable in a "controlled environment" that does not need that behavior (e.g., where 3DupACK does not cause problems and is not expected to cause problems).
>>>>>>> 
>>>>>>> For clarity, I understand the multi-lane link design rationale behind the RACK-like requirement and would agree with that requirement in a perfect world ... BUT ... this world is not perfect ... e.g., 3DupACK will not vanish from "running code" anytime soon.
>>>>>>> 
>>>>>>> 
>>>>>> As you know, we have been at pains to address every concern about L4S that has come up over the years, and I thought we had addressed this one to your satisfaction.
>>>>>> 
>>>>>> The reliable transports you are are concerned about require ordered delivery by the underlying fabric, so they can only ever exist in a controlled environment. In such a controlled environment, your ECT1+DSCP idea (below) could be used to isolate the L4S experiment from these transports and their firmware/hardware constraints.
>>>>>> 
>>>>>> On the public Internet, the DSCP commonly gets wiped at the first hop. So requiring a DSCP as well as ECT1 to separate off L4S would serve no useful purpose: it would still lead to ECT1 packets without the DSCP sent from a scalable congestion controls (which is behind Jonathan's concern in response to you).
>>>>>> 
>>>>>> 
>>>>> 	And this is why IPv4's protocol fiel/ IPv6's next header field are the classifier you actually need... You are changing a significant portion of TCP's observable behavior, so it can be argued that TCP-Prague is TCP by name only; this "classifier" still lives in the IP header, so no deeper layer's need to be accessed, this is non-leaky in that the classifier is unambiguously present independent of the value of the ECN bits; and it is also compatible with an SCE style ECN signaling. Since I believe the most/only likely roll-out of L4S is going to be at the ISPs access nodes (BRAS/BNG/CMTS/whatever)  middleboxes shpould not be an unsurmountable problem, as ISPs controll their own middleboxes and often even the CPEs, so protocoll ossification is not going to be a showstopper for this part of the roll-out.
>>>>> 
>>>>> Best Regards
>>>>> 	Sebastian
>>>>> 
>>>>> 
>>>>> 
>>>> I think you've understood this from reading abbreviated description of the requirement on the list, rather than the spec. The spec. solely says:
>>>> 	A scalable congestion control MUST detect loss by counting in time-based units
>>>> That's all. No more, no less. 
>>>> 
>>>> People call this the "RACK requirement", purely because the idea came from RACK. There is no requirement to do RACK, and the requirement applies to all transports, not just TCP.
>>>> 
>>> 	Fair enough, but my argument was not really about RACK at all, it more-so applies to the linear response to CE-marks that ECT(1) promises in the L4S approach. You are making changes to TCP's congestion controller that make it cease to be "TCP-friendly" (for arguably good reasons). So why insist on pretending that this is still TCP? So give it a new protocol ID already and all your classification needs are solved. As a bonus you do not need to use the same signal (CE) to elicit two different responses, but you could use the re-gained ECT(1) code point similarly to SCE to put the new fine-grained congestion signal into... while using CE in the RFC3168 compliant sense.
> 
> [BB] The protocol ID identifies the wire protocol, not the congestion control behaviour. If we had used a different protocol ID for each congestion control behaviour, we'd have run out of protocol IDs long ago (semi serious ;)


	[SM] Yes, I know, but you are proposing a massively incompatible "congestion control behaviour" for L4S that is not TCP-friendly, otherwise you would not need to deal with isolating your new style flows from the rest. For convenience (and since most of the other components are TCP-like) you package the whole thing as a congestion control module for TCP. My argument is, do not do that.
	As an aside, with this approach you are still at the mercy of OS and router manufacturers (okay Linux should be easy, but what is the plan of attack to get L4S behaviour into windows' TCP implementation; to me it seems your best bet would be to create a library for UDP that will do your L4S type response on top of UDP (you get resequencing tolerance for free ;) ), as long as you supply that library for all inportant OSes application writers can opt in without the need for OSes to change, but that is an aside.



> 
> This is a re-run of a debate that has already been had (in Jul 2015 - Nov 2016), which is recorded in the appendix of ecn-l4s-id here:
> https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-07#appendix-B.4

	[SM] Read it there, I just believe that the final choice of identifier was not the optimal one (I know this is all about trade-offs, I just happen to have different priorities than the L4S project; IMHO all the power to L4S as long as it does stay opt-in and has ZERO side-effects on existing internet users).


> Quoted and annotated below:
> 
>> B.4.  Protocol ID
>> 
>>    It has been suggested that a new ID in the IPv4 Protocol field or the
>>    IPv6 Next Header field could identify L4S packets.  However this
>>    approach is ruled out by numerous problems:
>> 
>>    o  A new protocol ID would need to be paired with the old one for
>>       each transport (TCP, SCTP, UDP, etc.);

	[SM] That is somewhat weak, as you are a) currently only pushing a TCP version, and you might want a UDP version (see above), (how many applications use anything but TCP or UDP?)

>> 
>>    o  In IPv6, there can be a sequence of Next Header fields, and it
>>       would not be obvious which one would be expected to identify a
>>       network service like L4S;
>> 
> In particular, the protocol ID / next header stays next to the upper layer header as a PDU gets encapsulated, possibly many times. So the protocol ID is not necessarily (rarely?) in the outer, particularly in IPv6, and it might be encrypted in IPSec.

	[SM] So, at a peering/transit point, which encapsulations are actually realistic? I would have thought that more or less raw IP packets are required to make the necessary routing decisions at a network's edge, same argument holds for the internet access links. At which points besides the ingress and egress of a network do you expect queueing to happen routinely? From my limited experience it really is at ingress/egress/transit, so which other hops will actually be realistic targets for an L4S-AQM?
	I also am not yet convinced that ISPs will really want to signal that their peering/transits are under-sized, so I am dubious that these will ever get L4S/SCE style signaling (but I hope I am overly pessimistic here).


> 
>>    o  A new protocol ID would rarely provide an end-to-end service,
>>       because It is well-known that new protocol IDs are often blocked
>>       by numerous types of middlebox;

	[SM] Yes, that is the strongest of these four arguments, at last to my layman's eyes.


>> 
>>    o  The approach is not a solution for AQMs below the IP layer;
>> 
>> 
> That last point means that the protocol ID is not designed to always propagate to the outer on encap and back from the outer on decap, whereas the ECN field is (and it's the only field that is).

	[SM] Fair enough, as indicated above, I am not really seeing hops that deal in non-IP packets to actually ever use L4S/SCE type signalling, so is that really a big problem?


> 
> more....
>>> 
>>> 
>>> 
>>>> It then means that a packet with ECT1 in the IP field can be forwarded without resequencing (no requirement - it just it /can/ be).
>>>> 
>>> 	Packets always "can" be forwarded without resequencing, the question is whether the end-points are going to like that... 
>>> And IMHO even RACK with its at maximum one RTT reordering windows gives intermediate hops not much to work with, without knowing the full RTT a cautious hop might allow itself one retransmission slot (so its own contribution to the RTT), but as far as I can tell they do that already. And tracking the RTT will require to keep per flow statistics, this also seems like it can get computationally expensive quickly... (I probably misunderstand how RACK works, but I fail to see how it will really allow more re-ordering, but that is also orthogonal to the L4S issues I try to raise).
>>> 
> [BB] No-one's suggesting reordering degree will adapt to measured RTT at run-time. 

	[SM] I know, as that would defeat the purpose, but that also puts severe limits on how much re-ordering budget a given link actually has.

> 
> See the original discussion on this point here:
> Vicious or Virtuous circle? Adapting reordering window to reordering degree
> 
> In summary, the uncertainty for the network is a feature not a bug. It means it has to keep reordering degree lower than the lowest likely RTT (or some fraction of it) that is expected for that link technology at the design stage. This will keep reordering low, but not too unnecessarily low (i.e. not 3 packets at the link rate).

	[SM] As I state above, a given link realistically will only be allowed one of its own local RTTs worth of re-ordering (other links might re-order as well, so no link can claim the full e2E RTT's worth of re-ordering all for itself). So all I can see for each link one or (if the link feels lucky) two re-transmit opportunities before the link needs to stall to resequenced packets again. Now, that might already be enough (and a sufficiently "batchy" link might transfer more than 3 packets in one haul).
	I naively thought that a link would only ever stall those flows with out-of-order packets and happily fill its upstream pipe with packets from unaffected flows, but that seems not to be happening.


> 
>>> 
>>>> This is a network layer 'unordered delivery' property, so it's appropriate to flag at the IP layer. 
>>>> 
>>> 	But at that point you are multiplexing multiple things into the poor ECT(1) codepoint, the promise of a certain "linear" back-off behavior on encountered congestion AND a "allow relaxed ordering" ( "detect loss by counting in time-based units" does not seem to be fully equivalent with a generic tolerance to 'unordered delivery' as far as I understand). That seems asking to much of a simple number...
> [BB] In a purist sense, it is a valid architectural criticism that we overload one codepoint with two architecturally distinct functions:
> 	• low queuing delay
> 	• low resequencing delay
> But then, one has to consider the value vs cost of 2 independent identifiers for two things that are unlikely to ever need to be distinguished. If an app wants low delay, would it want only low queuing delay and not low resequencing delay? 

	[SM] Sorry, I can well envision apps that do not care about "low queuing delay" but would be happy to give laxer reordering requirements to the network (like a bulk data transfer, that just wants to keep pushing packets through). Is that unrealistic? 

> 
> You could contrive a case where the receiver is memory-challenged and needs the network to do the resequencing.

	Well, packets are send in sequence, so the idea is not to burden the network with undue work, but rather to faithfully transmit what the endpoints send. 
(On a tangent, somewhere else you argued against FQ as it will take the dynamic packet spacing decisions away from the sending endpoint, but surely changing the order of packets is a far more grave intervention than just changing the interpacket intervals, no?)

> But it's not a reasonable expectation for the network to do a function that will cause HoL blocking for other applications in the process of helping you with your memory problems.
> 
> Given we are header-bit-challenged, it would not be unreasonable for the WG to decide to conflate these two architectural identifiers into one.
> 
> 
> Bob
> 
>>> 
>>> Best Regards
>>> 	Sebastian
>>> 
>>> 
>>>> 
>>>> Bob
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> ________________________________________________________________
>>>> Bob Briscoe                               
>>>> 
>>>> http://bobbriscoe.net/
>>> _______________________________________________
>>> Ecn-sane mailing list
>>> 
>>> Ecn-sane@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/ecn-sane
>> 
>> -- 
>> ________________________________________________________________
>> Bob Briscoe                               
>> http://bobbriscoe.net/
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               
> http://bobbriscoe.net/


^ permalink raw reply	[flat|nested] 84+ messages in thread

* [Ecn-sane]  [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-07-25 22:00                                                   ` Sebastian Moeller
@ 2019-07-26 10:20                                                     ` Sebastian Moeller
  2019-07-26 14:10                                                       ` Black, David
  0 siblings, 1 reply; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-26 10:20 UTC (permalink / raw)
  To: Bob Briscoe
  Cc: Black, David, ecn-sane, tsvwg, Dave Taht, De Schepper,
	Koen (Nokia - BE/Antwerp)

Dear Bob,

we have been going through the consequences and side effects of re-defining the meaning of a CE-mark for L4S-flows and using ECT(1) as a flllow-classifying heuristic.
One of the side-effects is that  a single queue ecn-enabled AQM will CE-marl L4S packets, expecting a strong reduction in sending rate, while the L4S endpoints will only respond to that signal with a mild rate-reduction. One of the consequences of this behaviour is that L4S flows will crowd out RFC3168 and non-ECN flows, because these flows half their rates on drop or CE-mark (approximately) making congestion go away with the end result that the L4S flows gain an undesired advantage, at least that is my interpretation of the discussion so far.
Now there are two options to deal with this issue, one is to declare it insignificant and just ignore it, or to make L4S endpoints detect that condition and revert back to RFC3168 behaviour.
The first option seems highly undesirable to me, as a) (TCP-friendly) single queue RFC3168 AQM are standards compliant and will be for the foreseeable future, so ms making them ineffective seems like a no-go to me (could someone clarify what the IETF's official stance is on this matter, please?), b) I would expect most of such AQMs to be instantiated close to/at the consu,er's edge of the internet, making it really hard to ameasure their prevalence.
In short, I believe the only sane way forward is to teach L4S endpoints to to the right thing under such conditions, I believe this would not be too onerous an ask, given that the configuration is easy to set up for testing and development and a number of ideas have already been theoretically discussed here. As far as I can see these ideas mostly riff on the idea that such anAQM will, under congesation conditions, increase each ftraversing flow's RTT and that should be quickly and robustly detectable. I would love to learn more about these ideas and the state of development and testing.

Best Regards & many thanks in advance
	Sebastian Moeller

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
  2019-07-25 16:14                                                   ` De Schepper, Koen (Nokia - BE/Antwerp)
@ 2019-07-26 13:10                                                     ` Pete Heist
  2019-07-26 15:05                                                       ` [Ecn-sane] The state of l4s, bbrv2, sce? Dave Taht
  0 siblings, 1 reply; 84+ messages in thread
From: Pete Heist @ 2019-07-26 13:10 UTC (permalink / raw)
  To: De Schepper, Koen (Nokia - BE/Antwerp); +Cc: ecn-sane, tsvwg

> On Jul 25, 2019, at 12:14 PM, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
> We have the testbed running our reference kernel version 3.19 with the drop patch. Let me know if you want to see the difference in behavior between the “good” DCTCP and the “deteriorated” DCTCP in the latest kernels too. There were several issues introduced which made DCTCP both more aggressive, and currently less aggressive. It calls for better regression tests (for Prague at least) to make sure it’s behavior is not changed too drastically by new updates. If enough people are interested, we can organize a session in one of the available rooms.
>  
> Pete, Jonathan,
>  
> Also for testing further your tests, let me know when you are available.

Regarding testing, we now have a five node setup in our test environment running a mixture of tcp-prague and dualq kernels to cover the scenarios Jon outlined earlier. With what little time we’ve had for it this week, we’ve only done some basic tests, and seem to be seeing behavior similar to what we saw at the hackathon, but we can discuss specific results following IETF 105.

Our intention is to coordinate a public effort to create reproducible test scenarios for L4S using flent. Details to follow post-conference. We do feel it’s important that all of our Linux testing be on modern 5.1+ kernels, as the 3.19 series was end of life as of May 2015 (https://lwn.net/Articles/643934/), so we'll try to keep up to date with any patches you might have for the newer kernels.

Overall, I think we’ve improved the cooperation between the teams this week (from zero to a little bit :), which should hopefully help move both projects along...

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-07-26 10:20                                                     ` [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs Sebastian Moeller
@ 2019-07-26 14:10                                                       ` Black, David
  2019-07-26 16:06                                                         ` Sebastian Moeller
  2019-07-26 16:15                                                         ` Holland, Jake
  0 siblings, 2 replies; 84+ messages in thread
From: Black, David @ 2019-07-26 14:10 UTC (permalink / raw)
  To: Sebastian Moeller, Bob Briscoe
  Cc: ecn-sane, tsvwg, Dave Taht, De Schepper,
	Koen (Nokia - BE/Antwerp),
	Black, David

Inline comment on "IETF's official stance":

> The first option seems highly undesirable to me, as a) (TCP-friendly) single queue
> RFC3168 AQM are standards compliant and will be for the foreseeable future, so
> ms making them ineffective seems like a no-go to me (could someone clarify
> what the IETF's official stance is on this matter, please?),

The IETF expects that all relevant technical concerns such as this one will be raised by participants and will be carefully considered by the WG in determining what to do.

That was the technical answer, now for the official [officious? :-) ] answer ... the current L4S drafts do not modify RFC 3168 beyond the modifications already made by RFC 8311.  If anyone believes that to be incorrect, i.e., believes at least one of the L4S drafts has to further modify RFC 3168, please bring that up with a specific reference to the text in "RFC 3168 as modified by RFC 8311" that needs further modification.

Thanks, --David

> -----Original Message-----
> From: Sebastian Moeller <moeller0@gmx.de>
> Sent: Friday, July 26, 2019 6:20 AM
> To: Bob Briscoe
> Cc: Black, David; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org; Dave Taht; De
> Schepper, Koen (Nokia - BE/Antwerp)
> Subject: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
> 
> 
> [EXTERNAL EMAIL]
> 
> Dear Bob,
> 
> we have been going through the consequences and side effects of re-defining
> the meaning of a CE-mark for L4S-flows and using ECT(1) as a flllow-classifying
> heuristic.
> One of the side-effects is that  a single queue ecn-enabled AQM will CE-marl L4S
> packets, expecting a strong reduction in sending rate, while the L4S endpoints
> will only respond to that signal with a mild rate-reduction. One of the
> consequences of this behaviour is that L4S flows will crowd out RFC3168 and
> non-ECN flows, because these flows half their rates on drop or CE-mark
> (approximately) making congestion go away with the end result that the L4S
> flows gain an undesired advantage, at least that is my interpretation of the
> discussion so far.
> Now there are two options to deal with this issue, one is to declare it
> insignificant and just ignore it, or to make L4S endpoints detect that condition
> and revert back to RFC3168 behaviour.
> The first option seems highly undesirable to me, as a) (TCP-friendly) single queue
> RFC3168 AQM are standards compliant and will be for the foreseeable future, so
> ms making them ineffective seems like a no-go to me (could someone clarify
> what the IETF's official stance is on this matter, please?), b) I would expect most
> of such AQMs to be instantiated close to/at the consu,er's edge of the internet,
> making it really hard to ameasure their prevalence.
> In short, I believe the only sane way forward is to teach L4S endpoints to to the
> right thing under such conditions, I believe this would not be too onerous an ask,
> given that the configuration is easy to set up for testing and development and a
> number of ideas have already been theoretically discussed here. As far as I can
> see these ideas mostly riff on the idea that such anAQM will, under congesation
> conditions, increase each ftraversing flow's RTT and that should be quickly and
> robustly detectable. I would love to learn more about these ideas and the state
> of development and testing.
> 
> Best Regards & many thanks in advance
> 	Sebastian Moeller

^ permalink raw reply	[flat|nested] 84+ messages in thread

* [Ecn-sane] The state of l4s, bbrv2, sce?
  2019-07-26 13:10                                                     ` Pete Heist
@ 2019-07-26 15:05                                                       ` Dave Taht
  2019-07-26 15:32                                                         ` Dave Taht
  2019-07-26 15:37                                                         ` Neal Cardwell
  0 siblings, 2 replies; 84+ messages in thread
From: Dave Taht @ 2019-07-26 15:05 UTC (permalink / raw)
  To: Pete Heist
  Cc: De Schepper, Koen (Nokia - BE/Antwerp), ecn-sane, tsvwg, Neal Cardwell

Changing the title....

I hope to be able to add some features and boxes to the worldwide
flent fleet to gather up some more data. Simple stuff includes trying
to verify more fully worldwide what happens when you twiddle the ecn
bits, mildly longer term look at what happens when conflicting
interpretations
of these bits are in play somewhere on the path, bit longer than that
getting an openwrt build up as a middlebox and vm, and then finally,
finally
see what happens on a couple kinds of wifi.

There's now a flent server in mumbai, in particular, which I hope will
shed some insight as to the state of networks in india, long term, on
a variety
of fronts. But none of it's ready lacking a good release to freeze on.

1) BBRv2 is now available for public hacking. I had a good readthrough
last night.

The published tree applies cleanly (with a small patch) to net-next.
I've had a chance to read through the code (lots of good changes to
bbr!).

Although neal was careful to say in iccrg the optional ecn mode uses
"dctcp/l4s-style signalling", he did not identify how that was
actually applied
at the middleboxes, and the supplied test scripts
(gtests/net/tcp/bbr/nsperf) don't do that. All we know is that it's
set to kick in at 20 packets. Is it fq_codel's ce_threshold? red? pie?
dualpi? Does it revert to drop on overload?

Is it running on bare metal? 260us is at the bare bottom of what linux
can schedule reliably, vms are much worse.

Couple notes:

BBRv2 doesn't use ect(1) as an identifier.

The chromium release has no support for ecn at all.

Adding back in the stuff I'd first done to rfc3168 bbrv1 looks
straightforward, making it do sce, less so.

2) To clarify something from the l4s team, are the results you've been
presenting for years all from the 3.19 kernel? bsd? microsoft? ns2?
ns3? what?

The code on github is not worth testing against currently? It does
have some needed features like a setsockopt for using up ect(1).

should I use the issue tracker for that? I have some comments on
dualpi in addition to my outstanding question about pie's default of
drop at 10% mark
rate vs dualpi's 0. Notably it's set to 1000 packets now (fq_codel
defaults to 10,000 and we switched to memory limits both in it and
cake given a modern
packet's dynamic range of 64b to 64k). I've observed 10gige can be in
the 2-3k packets range... has dualpi been tested above 1gige yet?

3) The current patches for sce need to get rebased for net-next. The
sch_cake mods are easy but as the dctcp code did morph a bit since sce
work forked it as did the other tcps. I took a stab at forward porting
it to net-next, but I figure that development is hot and heavy and
some patches will land after ietf. I do not mind taking a stab again
at cleaning it up (helps me to understand what's going on), as how the
algos currently (as of, like, yesterday) work is clear to me... what
I'd like to do at least is also add 'em to the out of tree
fq_codel_fast implementation.

Did I miss anything about the current state of things?

My basic testbed is a string of containers on a couple 12 core boxes
on bare metal, and more advanced is the openwrt stuff part of my wifi
lab. That's
presently almost all 4.14 based on arm, mips, and x86, running both on
real hardware and in emulation.

On Fri, Jul 26, 2019 at 6:10 AM Pete Heist <pete@heistp.net> wrote:
>
>
> > On Jul 25, 2019, at 12:14 PM, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> >
> > We have the testbed running our reference kernel version 3.19 with the drop patch. Let me know if you want to see the difference in behavior between the “good” DCTCP and the “deteriorated” DCTCP in the latest kernels too. There were several issues introduced which made DCTCP both more aggressive, and currently less aggressive. It calls for better regression tests (for Prague at least) to make sure it’s behavior is not changed too drastically by new updates. If enough people are interested, we can organize a session in one of the available rooms.
> >
> > Pete, Jonathan,
> >
> > Also for testing further your tests, let me know when you are available.
>
> Regarding testing, we now have a five node setup in our test environment running a mixture of tcp-prague and dualq kernels to cover the scenarios Jon outlined earlier. With what little time we’ve had for it this week, we’ve only done some basic tests, and seem to be seeing behavior similar to what we saw at the hackathon, but we can discuss specific results following IETF 105.
>
> Our intention is to coordinate a public effort to create reproducible test scenarios for L4S using flent. Details to follow post-conference. We do feel it’s important that all of our Linux testing be on modern 5.1+ kernels, as the 3.19 series was end of life as of May 2015 (https://lwn.net/Articles/643934/), so we'll try to keep up to date with any patches you might have for the newer kernels.
>
> Overall, I think we’ve improved the cooperation between the teams this week (from zero to a little bit :), which should hopefully help move both projects along...
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane

-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] The state of l4s, bbrv2, sce?
  2019-07-26 15:05                                                       ` [Ecn-sane] The state of l4s, bbrv2, sce? Dave Taht
@ 2019-07-26 15:32                                                         ` Dave Taht
  2019-07-26 15:37                                                         ` Neal Cardwell
  1 sibling, 0 replies; 84+ messages in thread
From: Dave Taht @ 2019-07-26 15:32 UTC (permalink / raw)
  To: Pete Heist
  Cc: De Schepper, Koen (Nokia - BE/Antwerp), ecn-sane, tsvwg, Neal Cardwell

I did miss a couple details

On Fri, Jul 26, 2019 at 8:05 AM Dave Taht <dave.taht@gmail.com> wrote:
>
> Changing the title....
>
> I hope to be able to add some features and boxes to the worldwide
> flent fleet to gather up some more data. Simple stuff includes trying
> to verify more fully worldwide what happens when you twiddle the ecn
> bits, mildly longer term look at what happens when conflicting
> interpretations
> of these bits are in play somewhere on the path, bit longer than that
> getting an openwrt build up as a middlebox and vm, and then finally,
> finally
> see what happens on a couple kinds of wifi.
>
> There's now a flent server in mumbai, in particular, which I hope will
> shed some insight as to the state of networks in india, long term, on
> a variety
> of fronts. But none of it's ready lacking a good release to freeze on.
>
> 1) BBRv2 is now available for public hacking. I had a good readthrough
> last night.
>
> The published tree applies cleanly (with a small patch) to net-next.
> I've had a chance to read through the code (lots of good changes to
> bbr!).
>
> Although neal was careful to say in iccrg the optional ecn mode uses
> "dctcp/l4s-style signalling", he did not identify how that was
> actually applied
> at the middleboxes, and the supplied test scripts
> (gtests/net/tcp/bbr/nsperf) don't do that. All we know is that it's
> set to kick in at 20 packets. Is it fq_codel's ce_threshold? red? pie?
> dualpi? Does it revert to drop on overload?
>
> Is it running on bare metal? 260us is at the bare bottom of what linux
> can schedule reliably, vms are much worse.
>
> Couple notes:
>
> BBRv2 doesn't use ect(1) as an identifier.
>
> The chromium release has no support for ecn at all.
>
> Adding back in the stuff I'd first done to rfc3168 bbrv1 looks
> straightforward, making it do sce, less so.

I note that at lower rates a cap of cwnd 2 instead of 4 seems seems feasible.

> 2) To clarify something from the l4s team, are the results you've been
> presenting for years all from the 3.19 kernel? bsd? microsoft? ns2?
> ns3? what?
>
> The code on github is not worth testing against currently? It does
> have some needed features like a setsockopt for using up ect(1).

Were these tests with gro/tso enabled?

> should I use the issue tracker for that? I have some comments on
> dualpi in addition to my outstanding question about pie's default of
> drop at 10% mark
> rate vs dualpi's 0. Notably it's set to 1000 packets now (fq_codel
> defaults to 10,000 and we switched to memory limits both in it and
> cake given a modern
> packet's dynamic range of 64b to 64k). I've observed 10gige can be in
> the 2-3k packets range... has dualpi been tested above 1gige yet?
>
> 3) The current patches for sce need to get rebased for net-next. The
> sch_cake mods are easy but as the dctcp code did morph a bit since sce
> work forked it as did the other tcps. I took a stab at forward porting
> it to net-next, but I figure that development is hot and heavy and
> some patches will land after ietf. I do not mind taking a stab again
> at cleaning it up (helps me to understand what's going on), as how the
> algos currently (as of, like, yesterday) work is clear to me... what
> I'd like to do at least is also add 'em to the out of tree
> fq_codel_fast implementation.

Another issue on the tcp front in this patchset was disabling iw10 as
a burst. I do strongly agree with that, pacing it,
and or reverting to iw4, then pacing (as it's not been taken up by
netbsd or osx either) would make this stuff gentler at lower rates.

Is the ramp function as needed with iw4 in play?

>
> Did I miss anything about the current state of things?
>
> My basic testbed is a string of containers on a couple 12 core boxes
> on bare metal, and more advanced is the openwrt stuff part of my wifi
> lab. That's
> presently almost all 4.14 based on arm, mips, and x86, running both on
> real hardware and in emulation.
>
> On Fri, Jul 26, 2019 at 6:10 AM Pete Heist <pete@heistp.net> wrote:
> >
> >
> > > On Jul 25, 2019, at 12:14 PM, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> > >
> > > We have the testbed running our reference kernel version 3.19 with the drop patch. Let me know if you want to see the difference in behavior between the “good” DCTCP and the “deteriorated” DCTCP in the latest kernels too. There were several issues introduced which made DCTCP both more aggressive, and currently less aggressive. It calls for better regression tests (for Prague at least) to make sure it’s behavior is not changed too drastically by new updates. If enough people are interested, we can organize a session in one of the available rooms.
> > >
> > > Pete, Jonathan,
> > >
> > > Also for testing further your tests, let me know when you are available.
> >
> > Regarding testing, we now have a five node setup in our test environment running a mixture of tcp-prague and dualq kernels to cover the scenarios Jon outlined earlier. With what little time we’ve had for it this week, we’ve only done some basic tests, and seem to be seeing behavior similar to what we saw at the hackathon, but we can discuss specific results following IETF 105.
> >
> > Our intention is to coordinate a public effort to create reproducible test scenarios for L4S using flent. Details to follow post-conference. We do feel it’s important that all of our Linux testing be on modern 5.1+ kernels, as the 3.19 series was end of life as of May 2015 (https://lwn.net/Articles/643934/), so we'll try to keep up to date with any patches you might have for the newer kernels.
> >
> > Overall, I think we’ve improved the cooperation between the teams this week (from zero to a little bit :), which should hopefully help move both projects along...
> > _______________________________________________
> > Ecn-sane mailing list
> > Ecn-sane@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/ecn-sane
>
>
>
> --
>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740



-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] The state of l4s, bbrv2, sce?
  2019-07-26 15:05                                                       ` [Ecn-sane] The state of l4s, bbrv2, sce? Dave Taht
  2019-07-26 15:32                                                         ` Dave Taht
@ 2019-07-26 15:37                                                         ` Neal Cardwell
  2019-07-26 15:45                                                           ` Dave Taht
  1 sibling, 1 reply; 84+ messages in thread
From: Neal Cardwell @ 2019-07-26 15:37 UTC (permalink / raw)
  To: Dave Taht
  Cc: Pete Heist, De Schepper, Koen (Nokia - BE/Antwerp), ecn-sane, tsvwg

[-- Attachment #1: Type: text/plain, Size: 1992 bytes --]

On Fri, Jul 26, 2019 at 11:05 AM Dave Taht <dave.taht@gmail.com> wrote:

> 1) BBRv2 is now available for public hacking. I had a good readthrough
> last night.
>
> The published tree applies cleanly (with a small patch) to net-next.
> I've had a chance to read through the code (lots of good changes to
> bbr!).
>
> Although neal was careful to say in iccrg the optional ecn mode uses
> "dctcp/l4s-style signalling", he did not identify how that was
> actually applied
> at the middleboxes, and the supplied test scripts
> (gtests/net/tcp/bbr/nsperf) don't do that. All we know is that it's
> set to kick in at 20 packets. Is it fq_codel's ce_threshold? red? pie?
> dualpi? Does it revert to drop on overload?
>

As mentioned in the ICCRG session, the TCP source tree includes the scripts
used to run the tests and generate the graphs in the slide deck. Here is
the commit I was mentioning:

https://github.com/google/bbr/commit/e76d4f89b0c42d5409d34c48ee6f8d32407d4b8d

So you can look at exactly how each test was run, and re-run those tests
yourself, with the v2alpha code or any experimental tweaks you might make
beyond that.

To answer your particular question, the ECN marks were from a bottleneck
qdisc configured as:

  codel ce_threshold 242us limit 1000 target 100ms

I'm not claiming that's necessarily the best mechanism or set of parameters
to set ECN marks. The 20-packet number comes from the DCTCP SIGCOMM 2010
paper's recommendation for 1Gbps bottlenecks. I just picked this kind of
approach because the bare metal router/switch hardware varies, so this is a
simple and easy way for everyone to experiment with the exact same ECN
settings.

Is it running on bare metal? 260us is at the bare bottom of what linux
> can schedule reliably, vms are much worse.

I have tried both VMs and bare metal with those scripts, and of course the
VMs are quite noisy and the bare metal results much less noisy. So the
graphs are from runs on bare metal x86 server-class machines.

neal

[-- Attachment #2: Type: text/html, Size: 2831 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] The state of l4s, bbrv2, sce?
  2019-07-26 15:37                                                         ` Neal Cardwell
@ 2019-07-26 15:45                                                           ` Dave Taht
  0 siblings, 0 replies; 84+ messages in thread
From: Dave Taht @ 2019-07-26 15:45 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Pete Heist, De Schepper, Koen (Nokia - BE/Antwerp), ecn-sane, tsvwg

On Fri, Jul 26, 2019 at 8:37 AM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Fri, Jul 26, 2019 at 11:05 AM Dave Taht <dave.taht@gmail.com> wrote:
>>
>> 1) BBRv2 is now available for public hacking. I had a good readthrough
>> last night.
>>
>> The published tree applies cleanly (with a small patch) to net-next.
>> I've had a chance to read through the code (lots of good changes to
>> bbr!).
>>
>> Although neal was careful to say in iccrg the optional ecn mode uses
>> "dctcp/l4s-style signalling", he did not identify how that was
>> actually applied
>> at the middleboxes, and the supplied test scripts
>> (gtests/net/tcp/bbr/nsperf) don't do that. All we know is that it's
>> set to kick in at 20 packets. Is it fq_codel's ce_threshold? red? pie?
>> dualpi? Does it revert to drop on overload?
>
>
> As mentioned in the ICCRG session, the TCP source tree includes the scripts used to run the tests and generate the graphs in the slide deck. Here is the commit I was mentioning:
>
>    https://github.com/google/bbr/commit/e76d4f89b0c42d5409d34c48ee6f8d32407d4b8d
>
> So you can look at exactly how each test was run, and re-run those tests yourself, with the v2alpha code or any experimental tweaks you might make beyond that.
>
> To answer your particular question, the ECN marks were from a bottleneck qdisc configured as:
>
>   codel ce_threshold 242us limit 1000 target 100ms

thx neal! I missed that!

> I'm not claiming that's necessarily the best mechanism or set of parameters to set ECN marks. The 20-packet number comes from the DCTCP SIGCOMM 2010 paper's recommendation for 1Gbps bottlenecks. I just picked this kind of approach because the bare metal router/switch hardware varies, so this is a simple and easy way for everyone to experiment with the exact same ECN settings.

ok!

>
>> Is it running on bare metal? 260us is at the bare bottom of what linux
>> can schedule reliably, vms are much worse.
>
>
> I have tried both VMs and bare metal with those scripts, and of course the VMs are quite noisy and the bare metal results much less noisy. So the graphs are from runs on bare metal x86 server-class machines.

Good to know. On the cloud I use (linode) 1ms was the best I could
hope for, and even then, dang jittery. (it was much worse 8 years
back when xen underneath could be in the 10-20ms range!).

There are major jitter issues on lower end hardware but I don't know
how bad they are post spectre fixes, been afraid to look.

containers are a huge improvement over vms but still break things like tsq.

>
> neal
>


-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-07-26 14:10                                                       ` Black, David
@ 2019-07-26 16:06                                                         ` Sebastian Moeller
  2019-07-26 19:58                                                           ` Black, David
  2019-07-26 16:15                                                         ` Holland, Jake
  1 sibling, 1 reply; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-26 16:06 UTC (permalink / raw)
  To: Black, David
  Cc: Bob Briscoe, ecn-sane, tsvwg, Dave Taht, De Schepper,
	Koen (Nokia - BE/Antwerp)

Dear David,

thanks for your clearing things up. I see, I should have read deeper into the relevant "web" of RFCs before asking.

Am I correct in interpreting the following  sentence from RFC 8311:
"ECN experiments are expected to coexist with deployed ECN
   functionality, with the responsibility for that coexistence falling
   primarily upon designers of experimental changes to ECN."
as meaning, that L4S will need to implement the long discussed fall-back to RFC3168 compliant responses to CE marks, if a RFC3168 AQM is detected as being active on a path, and that L4S endpoint need to closely monitor for signs of RFC3168 behavior? I ask because section 4.1 fails to put in those safe-guard clauses explicitly (in my reading this effectively says anything goes, as long as it is defined in its own RFC)

Now looking at the L4S RFC I see (https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-04#page-21 (assuming that this is one of the RFCs required to allow the exemption according to RFC8311)):

"Classic ECN support is starting to materialize on the Internet as an
   increased level of CE marking.  Given some of this Classic ECN might
   be due to single-queue ECN deployment, an L4S sender will have to
   fall back to a classic ('TCP-Friendly') behaviour if it detects that
   ECN marking is accompanied by greater queuing delay or greater delay
   variation than would be expected with L4S (see Appendix A.1.4 of [I-D.ietf-tsvwg-ecn-l4s-id]).  
   It is hard to detect whether this is
   all due to the addition of support for ECN in the Linux
   implementation of FQ-CoDel, which would not require fall-back to
   Classic behaviour, because FQ inherently forces the throughput of
   each flow to be equal irrespective of its aggressiveness."

Which I believe to be problematic, as it conflates issues. The problem with L4S-CE response on non L4S-AQMs is that it will give L4S flows an unfair and unexpected advantage, so L4S endpoints should aim at detecting non-L4S AQMs on the path and not (just) "that ECN marking is accompanied by greater queuing delay or greater delay variation than would be expected with L4S". Sure delay variations can be a eans of trying to detect such an AQM, but this text basically gives L4S the license to just look at RTT variations and declare victory if these stay below an arbitrary threshold.
	Also I voiced concerns about the rationale for excluding RFC3168 FQ-AQMs from this fall-back treatment, and gave an explicit example of a system in use (post-true bottleneck ingress shaping) that I would like to see to be tested first. This should be easy to test (and as far as I know these tests are planned if not already done) so that the RFC can either be amended with a link to the data showing that this is harmless, or changed ot indicate that the fall-back might also be required for FQ-AQMs under certain conditions.

Now if I look at https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-07#page-25, I see the following:

"A.1.4.  Fall back to Reno-friendly congestion control on classic ECN bottlenecks

   Description: A scalable congestion control needs to react to ECN
   marking from a non-L4S but ECN-capable bottleneck in a way that will
   coexist with a TCP Reno congestion control [RFC5681].

   Motivation: Similarly to the requirement in Appendix A.1.3, this
   requirement is a safety condition to ensure a scalable congestion
   control behaves properly when it builds a queue at a network
   bottleneck that has not been upgraded to support L4S.  On detecting
   classic ECN marking (see below), a scalable congestion control will
   need to fall back to classic congestion control behaviour.  If it
   does not comply with this requirement it could starve classic
   traffic.

   It would take time for endpoints to distinguish classic and L4S ECN
   marking.  An increase in queuing delay or in delay variation would be
   a tell-tale sign, but it is not yet clear where a line would be drawn
   between the two behaviours.  It might be possible to cache what was
   learned about the path to help subsequent attempts to detect the type
   of marking."

Here, the special casing of FQ-AQMs does not seem to be present, which L4S RFC will have precedence here?

Anyway, am I correct in interpreting all of the above as a clear an unambiguous requirement for L4S components like TCP-Prague to implement RFC3168-AQM detection and fall-back to appropriate behavior before being given the permission for usage on the wider internet?

Best Regards
	Sebastian

> On Jul 26, 2019, at 16:10, Black, David <David.Black@dell.com> wrote:
> 
> Inline comment on "IETF's official stance":
> 
>> The first option seems highly undesirable to me, as a) (TCP-friendly) single queue
>> RFC3168 AQM are standards compliant and will be for the foreseeable future, so
>> ms making them ineffective seems like a no-go to me (could someone clarify
>> what the IETF's official stance is on this matter, please?),
> 
> The IETF expects that all relevant technical concerns such as this one will be raised by participants and will be carefully considered by the WG in determining what to do.
> 
> That was the technical answer, now for the official [officious? :-) ] answer ... the current L4S drafts do not modify RFC 3168 beyond the modifications already made by RFC 8311.  If anyone believes that to be incorrect, i.e., believes at least one of the L4S drafts has to further modify RFC 3168, please bring that up with a specific reference to the text in "RFC 3168 as modified by RFC 8311" that needs further modification.
> 
> Thanks, --David
> 
>> -----Original Message-----
>> From: Sebastian Moeller <moeller0@gmx.de>
>> Sent: Friday, July 26, 2019 6:20 AM
>> To: Bob Briscoe
>> Cc: Black, David; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org; Dave Taht; De
>> Schepper, Koen (Nokia - BE/Antwerp)
>> Subject: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
>> 
>> 
>> [EXTERNAL EMAIL]
>> 
>> Dear Bob,
>> 
>> we have been going through the consequences and side effects of re-defining
>> the meaning of a CE-mark for L4S-flows and using ECT(1) as a flllow-classifying
>> heuristic.
>> One of the side-effects is that  a single queue ecn-enabled AQM will CE-marl L4S
>> packets, expecting a strong reduction in sending rate, while the L4S endpoints
>> will only respond to that signal with a mild rate-reduction. One of the
>> consequences of this behaviour is that L4S flows will crowd out RFC3168 and
>> non-ECN flows, because these flows half their rates on drop or CE-mark
>> (approximately) making congestion go away with the end result that the L4S
>> flows gain an undesired advantage, at least that is my interpretation of the
>> discussion so far.
>> Now there are two options to deal with this issue, one is to declare it
>> insignificant and just ignore it, or to make L4S endpoints detect that condition
>> and revert back to RFC3168 behaviour.
>> The first option seems highly undesirable to me, as a) (TCP-friendly) single queue
>> RFC3168 AQM are standards compliant and will be for the foreseeable future, so
>> ms making them ineffective seems like a no-go to me (could someone clarify
>> what the IETF's official stance is on this matter, please?), b) I would expect most
>> of such AQMs to be instantiated close to/at the consu,er's edge of the internet,
>> making it really hard to ameasure their prevalence.
>> In short, I believe the only sane way forward is to teach L4S endpoints to to the
>> right thing under such conditions, I believe this would not be too onerous an ask,
>> given that the configuration is easy to set up for testing and development and a
>> number of ideas have already been theoretically discussed here. As far as I can
>> see these ideas mostly riff on the idea that such anAQM will, under congesation
>> conditions, increase each ftraversing flow's RTT and that should be quickly and
>> robustly detectable. I would love to learn more about these ideas and the state
>> of development and testing.
>> 
>> Best Regards & many thanks in advance
>> 	Sebastian Moeller

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-07-26 14:10                                                       ` Black, David
  2019-07-26 16:06                                                         ` Sebastian Moeller
@ 2019-07-26 16:15                                                         ` Holland, Jake
  2019-07-26 20:07                                                           ` Black, David
  1 sibling, 1 reply; 84+ messages in thread
From: Holland, Jake @ 2019-07-26 16:15 UTC (permalink / raw)
  To: Black, David, Sebastian Moeller, Bob Briscoe
  Cc: De Schepper, Koen (Nokia - BE/Antwerp), ecn-sane, tsvwg, Dave Taht

On 2019-07-26, 10:13, "Black, David" <David.Black@dell.com> wrote:
>> The first option seems highly undesirable to me, as a) (TCP-friendly) single queue
>> RFC3168 AQM are standards compliant and will be for the foreseeable future, so
>> ms making them ineffective seems like a no-go to me (could someone clarify
>> what the IETF's official stance is on this matter, please?),
>
> The IETF expects that all relevant technical concerns such as this one will be raised by participants and will be carefully considered by the WG in determining what to do.
>    
> That was the technical answer, now for the official [officious? :-) ] answer ... the current L4S drafts do not modify RFC 3168 beyond the modifications already made by RFC 8311.  If anyone believes that to be incorrect, i.e., believes at least one of the L4S drafts has to further modify RFC 3168, please bring that up with a specific reference to the text in "RFC 3168 as modified by RFC 8311" that needs further modification.

I'll try pointing to some specific citations.  I think there may be
others along these lines, and would love to see a more complete
enumeration, but in the interest of a timely response, thought I'd
send one of the first I saw.

I'm not sure how I could recommend updating RFC 3168 to address this
point, but I do believe it's an incompatibility between the L4S
proposal and RFC 3168.

From https://tools.ietf.org/html/rfc8311#section-2.1
2.1.  Effective Congestion Control is Required

   Congestion control remains an important aspect of the Internet
   architecture [RFC2914].  Any Experimental RFC in the IETF document
   stream that takes advantage of this memo's updates to any RFC is
   required to discuss the congestion control implications of the
   experiment(s) in order to provide assurance that deployment of the
   experiment(s) does not pose a congestion-based threat to the
   operation of the Internet.

From https://tools.ietf.org/html/rfc2914#section-3.2
3.2.  Fairness
...
   It is convenient to divide flows into three classes: (1) TCP-
   compatible flows, (2) unresponsive flows, i.e., flows that do not
   slow down when congestion occurs, and (3) flows that are responsive
   but are not TCP-compatible.  The last two classes contain more
   aggressive flows that pose significant threats to Internet
   performance, as we discuss below.

I believe under this nomenclature, L4S in a queue with RFC3168-style
marking at a bottleneck should be classified as a flow that is
responsive but not TCP-compatible, and therefore poses a significant
threat to internet performance within this context.

I'm not sure how best to describe this discrepancy, but I think it's
fair to call it an incompatibility between a RFC3168-style marking
queue and L4S.

I didn't see this explicitly discussed in the L4S drafts as an
incompatibility that proposes to deploy a threat to flows in RFC3168
queues, but to me it seems required by RFC 8311 (and possibly in
conflict with the advice from section 4 of BCP 197, recommending the
use of ECN in deployed AQM devices).

On the contrary, I think we saw this described on the list as a non-
problem because most of the live RFC 3168 queues that we know about
are FQ (with the implication that this is sufficient to protect against
the non-TCP-compatible flows, except where there's a hash collision).

To me it seems unsafe to deploy this experiment without deprecating
the BCP 197 advice, and the root cause is its interaction with RFC
3168 marking.

Something like a requirement for a controlled environment would
address this problem, to me, and of course perhaps I'm on the rough
side of the consensus, but I think worth calling out the issue for
open discussion.

Best regards,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-07-26 16:06                                                         ` Sebastian Moeller
@ 2019-07-26 19:58                                                           ` Black, David
  2019-07-26 21:34                                                             ` Sebastian Moeller
  0 siblings, 1 reply; 84+ messages in thread
From: Black, David @ 2019-07-26 19:58 UTC (permalink / raw)
  To: Sebastian Moeller
  Cc: Bob Briscoe, ecn-sane, tsvwg, Dave Taht, De Schepper,
	Koen (Nokia - BE/Antwerp),
	Black, David

Sebastian,

Continuing with the official (officious?) response, in part as the author of RFC 8311 ...

> Am I correct in interpreting the following  sentence from RFC 8311:
> "ECN experiments are expected to coexist with deployed ECN
>    functionality, with the responsibility for that coexistence falling
>    primarily upon designers of experimental changes to ECN."
> as meaning, that L4S will need to implement the long discussed fall-back to
> RFC3168 compliant responses to CE marks, if a RFC3168 AQM is detected as
> being active on a path, and that L4S endpoint need to closely monitor for signs
> of RFC3168 behavior?

As RFC 8311 is a Proposed Standard RFC, the text quoted from that RFC "represents the consensus of the IETF community" (from "Status of This Memo" boilerplate in RFC 8311), and hence needs to be respected in the design of the L4S experiment.  I think the specific implications of that RFC 8311 text on the L4S experiment design are ultimately a technical matter for the WG to determine, but ignoring the quoted text would not be acceptable (and does not appear to be occurring).

Thanks, --David

> -----Original Message-----
> From: Sebastian Moeller <moeller0@gmx.de>
> Sent: Friday, July 26, 2019 12:07 PM
> To: Black, David
> Cc: Bob Briscoe; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org; Dave Taht; De
> Schepper, Koen (Nokia - BE/Antwerp)
> Subject: Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
> 
> 
> [EXTERNAL EMAIL]
> 
> Dear David,
> 
> thanks for your clearing things up. I see, I should have read deeper into the
> relevant "web" of RFCs before asking.
> 
> Am I correct in interpreting the following  sentence from RFC 8311:
> "ECN experiments are expected to coexist with deployed ECN
>    functionality, with the responsibility for that coexistence falling
>    primarily upon designers of experimental changes to ECN."
> as meaning, that L4S will need to implement the long discussed fall-back to
> RFC3168 compliant responses to CE marks, if a RFC3168 AQM is detected as
> being active on a path, and that L4S endpoint need to closely monitor for signs
> of RFC3168 behavior? I ask because section 4.1 fails to put in those safe-guard
> clauses explicitly (in my reading this effectively says anything goes, as long as it
> is defined in its own RFC)
> 
> Now looking at the L4S RFC I see (https://tools.ietf.org/html/draft-ietf-tsvwg-
> l4s-arch-04#page-21 (assuming that this is one of the RFCs required to allow the
> exemption according to RFC8311)):
> 
> "Classic ECN support is starting to materialize on the Internet as an
>    increased level of CE marking.  Given some of this Classic ECN might
>    be due to single-queue ECN deployment, an L4S sender will have to
>    fall back to a classic ('TCP-Friendly') behaviour if it detects that
>    ECN marking is accompanied by greater queuing delay or greater delay
>    variation than would be expected with L4S (see Appendix A.1.4 of [I-D.ietf-
> tsvwg-ecn-l4s-id]).
>    It is hard to detect whether this is
>    all due to the addition of support for ECN in the Linux
>    implementation of FQ-CoDel, which would not require fall-back to
>    Classic behaviour, because FQ inherently forces the throughput of
>    each flow to be equal irrespective of its aggressiveness."
> 
> Which I believe to be problematic, as it conflates issues. The problem with L4S-
> CE response on non L4S-AQMs is that it will give L4S flows an unfair and
> unexpected advantage, so L4S endpoints should aim at detecting non-L4S AQMs
> on the path and not (just) "that ECN marking is accompanied by greater queuing
> delay or greater delay variation than would be expected with L4S". Sure delay
> variations can be a eans of trying to detect such an AQM, but this text basically
> gives L4S the license to just look at RTT variations and declare victory if these
> stay below an arbitrary threshold.
> 	Also I voiced concerns about the rationale for excluding RFC3168 FQ-
> AQMs from this fall-back treatment, and gave an explicit example of a system in
> use (post-true bottleneck ingress shaping) that I would like to see to be tested
> first. This should be easy to test (and as far as I know these tests are planned if
> not already done) so that the RFC can either be amended with a link to the data
> showing that this is harmless, or changed ot indicate that the fall-back might
> also be required for FQ-AQMs under certain conditions.
> 
> 
> Now if I look at https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-07#page-
> 25, I see the following:
> 
> "A.1.4.  Fall back to Reno-friendly congestion control on classic ECN bottlenecks
> 
>    Description: A scalable congestion control needs to react to ECN
>    marking from a non-L4S but ECN-capable bottleneck in a way that will
>    coexist with a TCP Reno congestion control [RFC5681].
> 
>    Motivation: Similarly to the requirement in Appendix A.1.3, this
>    requirement is a safety condition to ensure a scalable congestion
>    control behaves properly when it builds a queue at a network
>    bottleneck that has not been upgraded to support L4S.  On detecting
>    classic ECN marking (see below), a scalable congestion control will
>    need to fall back to classic congestion control behaviour.  If it
>    does not comply with this requirement it could starve classic
>    traffic.
> 
>    It would take time for endpoints to distinguish classic and L4S ECN
>    marking.  An increase in queuing delay or in delay variation would be
>    a tell-tale sign, but it is not yet clear where a line would be drawn
>    between the two behaviours.  It might be possible to cache what was
>    learned about the path to help subsequent attempts to detect the type
>    of marking."
> 
> Here, the special casing of FQ-AQMs does not seem to be present, which L4S
> RFC will have precedence here?
> 
> 
> Anyway, am I correct in interpreting all of the above as a clear an unambiguous
> requirement for L4S components like TCP-Prague to implement RFC3168-AQM
> detection and fall-back to appropriate behavior before being given the
> permission for usage on the wider internet?
> 
> 
> Best Regards
> 	Sebastian
> 
> > On Jul 26, 2019, at 16:10, Black, David <David.Black@dell.com> wrote:
> >
> > Inline comment on "IETF's official stance":
> >
> >> The first option seems highly undesirable to me, as a) (TCP-friendly) single
> queue
> >> RFC3168 AQM are standards compliant and will be for the foreseeable future,
> so
> >> ms making them ineffective seems like a no-go to me (could someone clarify
> >> what the IETF's official stance is on this matter, please?),
> >
> > The IETF expects that all relevant technical concerns such as this one will be
> raised by participants and will be carefully considered by the WG in determining
> what to do.
> >
> > That was the technical answer, now for the official [officious? :-) ] answer ...
> the current L4S drafts do not modify RFC 3168 beyond the modifications already
> made by RFC 8311.  If anyone believes that to be incorrect, i.e., believes at least
> one of the L4S drafts has to further modify RFC 3168, please bring that up with a
> specific reference to the text in "RFC 3168 as modified by RFC 8311" that needs
> further modification.
> >
> > Thanks, --David
> >
> >> -----Original Message-----
> >> From: Sebastian Moeller <moeller0@gmx.de>
> >> Sent: Friday, July 26, 2019 6:20 AM
> >> To: Bob Briscoe
> >> Cc: Black, David; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org; Dave Taht;
> De
> >> Schepper, Koen (Nokia - BE/Antwerp)
> >> Subject: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
> >>
> >>
> >> [EXTERNAL EMAIL]
> >>
> >> Dear Bob,
> >>
> >> we have been going through the consequences and side effects of re-defining
> >> the meaning of a CE-mark for L4S-flows and using ECT(1) as a flllow-
> classifying
> >> heuristic.
> >> One of the side-effects is that  a single queue ecn-enabled AQM will CE-marl
> L4S
> >> packets, expecting a strong reduction in sending rate, while the L4S endpoints
> >> will only respond to that signal with a mild rate-reduction. One of the
> >> consequences of this behaviour is that L4S flows will crowd out RFC3168 and
> >> non-ECN flows, because these flows half their rates on drop or CE-mark
> >> (approximately) making congestion go away with the end result that the L4S
> >> flows gain an undesired advantage, at least that is my interpretation of the
> >> discussion so far.
> >> Now there are two options to deal with this issue, one is to declare it
> >> insignificant and just ignore it, or to make L4S endpoints detect that condition
> >> and revert back to RFC3168 behaviour.
> >> The first option seems highly undesirable to me, as a) (TCP-friendly) single
> queue
> >> RFC3168 AQM are standards compliant and will be for the foreseeable future,
> so
> >> ms making them ineffective seems like a no-go to me (could someone clarify
> >> what the IETF's official stance is on this matter, please?), b) I would expect
> most
> >> of such AQMs to be instantiated close to/at the consu,er's edge of the
> internet,
> >> making it really hard to ameasure their prevalence.
> >> In short, I believe the only sane way forward is to teach L4S endpoints to to
> the
> >> right thing under such conditions, I believe this would not be too onerous an
> ask,
> >> given that the configuration is easy to set up for testing and development and
> a
> >> number of ideas have already been theoretically discussed here. As far as I
> can
> >> see these ideas mostly riff on the idea that such anAQM will, under
> congesation
> >> conditions, increase each ftraversing flow's RTT and that should be quickly
> and
> >> robustly detectable. I would love to learn more about these ideas and the
> state
> >> of development and testing.
> >>
> >> Best Regards & many thanks in advance
> >> 	Sebastian Moeller


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-07-26 16:15                                                         ` Holland, Jake
@ 2019-07-26 20:07                                                           ` Black, David
  2019-07-26 23:40                                                             ` Jonathan Morton
  0 siblings, 1 reply; 84+ messages in thread
From: Black, David @ 2019-07-26 20:07 UTC (permalink / raw)
  To: Holland, Jake, Sebastian Moeller, Bob Briscoe
  Cc: De Schepper, Koen (Nokia - BE/Antwerp),
	ecn-sane, tsvwg, Dave Taht, Black, David

Jake,

> I believe under this nomenclature, L4S in a queue with RFC3168-style
> marking at a bottleneck should be classified as a flow that is
> responsive but not TCP-compatible, and therefore poses a significant
> threat to internet performance within this context.
> 
> I'm not sure how best to describe this discrepancy, but I think it's
> fair to call it an incompatibility between a RFC3168-style marking
> queue and L4S.

Based on the L4S slides in today's meeting and related discussion, the L4S folks are starting to deal with this concern.

I share your technical view that this concern is not safe to ignore.

Thanks, --David

> -----Original Message-----
> From: Holland, Jake <jholland@akamai.com>
> Sent: Friday, July 26, 2019 12:16 PM
> To: Black, David; Sebastian Moeller; Bob Briscoe
> Cc: De Schepper, Koen (Nokia - BE/Antwerp); ecn-sane@lists.bufferbloat.net;
> tsvwg@ietf.org; Dave Taht
> Subject: Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
> 
> 
> [EXTERNAL EMAIL]
> 
> On 2019-07-26, 10:13, "Black, David" <David.Black@dell.com> wrote:
> >> The first option seems highly undesirable to me, as a) (TCP-friendly) single
> queue
> >> RFC3168 AQM are standards compliant and will be for the foreseeable future,
> so
> >> ms making them ineffective seems like a no-go to me (could someone clarify
> >> what the IETF's official stance is on this matter, please?),
> >
> > The IETF expects that all relevant technical concerns such as this one will be
> raised by participants and will be carefully considered by the WG in determining
> what to do.
> >
> > That was the technical answer, now for the official [officious? :-) ] answer ...
> the current L4S drafts do not modify RFC 3168 beyond the modifications already
> made by RFC 8311.  If anyone believes that to be incorrect, i.e., believes at least
> one of the L4S drafts has to further modify RFC 3168, please bring that up with a
> specific reference to the text in "RFC 3168 as modified by RFC 8311" that needs
> further modification.
> 
> I'll try pointing to some specific citations.  I think there may be
> others along these lines, and would love to see a more complete
> enumeration, but in the interest of a timely response, thought I'd
> send one of the first I saw.
> 
> I'm not sure how I could recommend updating RFC 3168 to address this
> point, but I do believe it's an incompatibility between the L4S
> proposal and RFC 3168.
> 
> 
> From https://tools.ietf.org/html/rfc8311#section-2.1
> 2.1.  Effective Congestion Control is Required
> 
>    Congestion control remains an important aspect of the Internet
>    architecture [RFC2914].  Any Experimental RFC in the IETF document
>    stream that takes advantage of this memo's updates to any RFC is
>    required to discuss the congestion control implications of the
>    experiment(s) in order to provide assurance that deployment of the
>    experiment(s) does not pose a congestion-based threat to the
>    operation of the Internet.
> 
> 
> From https://tools.ietf.org/html/rfc2914#section-3.2
> 3.2.  Fairness
> ...
>    It is convenient to divide flows into three classes: (1) TCP-
>    compatible flows, (2) unresponsive flows, i.e., flows that do not
>    slow down when congestion occurs, and (3) flows that are responsive
>    but are not TCP-compatible.  The last two classes contain more
>    aggressive flows that pose significant threats to Internet
>    performance, as we discuss below.
> 
> 
> I believe under this nomenclature, L4S in a queue with RFC3168-style
> marking at a bottleneck should be classified as a flow that is
> responsive but not TCP-compatible, and therefore poses a significant
> threat to internet performance within this context.
> 
> I'm not sure how best to describe this discrepancy, but I think it's
> fair to call it an incompatibility between a RFC3168-style marking
> queue and L4S.
> 
> I didn't see this explicitly discussed in the L4S drafts as an
> incompatibility that proposes to deploy a threat to flows in RFC3168
> queues, but to me it seems required by RFC 8311 (and possibly in
> conflict with the advice from section 4 of BCP 197, recommending the
> use of ECN in deployed AQM devices).
> 
> On the contrary, I think we saw this described on the list as a non-
> problem because most of the live RFC 3168 queues that we know about
> are FQ (with the implication that this is sufficient to protect against
> the non-TCP-compatible flows, except where there's a hash collision).
> 
> To me it seems unsafe to deploy this experiment without deprecating
> the BCP 197 advice, and the root cause is its interaction with RFC
> 3168 marking.
> 
> Something like a requirement for a controlled environment would
> address this problem, to me, and of course perhaps I'm on the rough
> side of the consensus, but I think worth calling out the issue for
> open discussion.
> 
> Best regards,
> Jake
> 


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-07-26 19:58                                                           ` Black, David
@ 2019-07-26 21:34                                                             ` Sebastian Moeller
  0 siblings, 0 replies; 84+ messages in thread
From: Sebastian Moeller @ 2019-07-26 21:34 UTC (permalink / raw)
  To: Black, David
  Cc: Bob Briscoe, ecn-sane, tsvwg, Dave Taht, De Schepper,
	Koen (Nokia - BE/Antwerp)

Hi David,


> On Jul 26, 2019, at 21:58, Black, David <David.Black@dell.com> wrote:
> 
> Sebastian,
> 
> Continuing with the official (officious?) response, in part as the author of RFC 8311 ...
> 
>> Am I correct in interpreting the following  sentence from RFC 8311:
>> "ECN experiments are expected to coexist with deployed ECN
>>   functionality, with the responsibility for that coexistence falling
>>   primarily upon designers of experimental changes to ECN."
>> as meaning, that L4S will need to implement the long discussed fall-back to
>> RFC3168 compliant responses to CE marks, if a RFC3168 AQM is detected as
>> being active on a path, and that L4S endpoint need to closely monitor for signs
>> of RFC3168 behavior?
> 
> As RFC 8311 is a Proposed Standard RFC, the text quoted from that RFC "represents the consensus of the IETF community" (from "Status of This Memo" boilerplate in RFC 8311), and hence needs to be respected in the design of the L4S experiment.  

	Ah okay, thanks for clearing that up.

> I think the specific implications of that RFC 8311 text on the L4S experiment design are ultimately a technical matter for the WG to determine

	Is there really any wiggle room? "the responsibility for that coexistence falling primarily upon designers of experimental changes to ECN" seems pretty explicit to me...

> , but ignoring the quoted text would not be acceptable (and does not appear to be occurring).

	Ah okay, I was under the impression, that there very much was an open question whether peaceful co-existence with RFC3168 compliant AQMs was actually a requirement or whether that could be negotiated away at the green table. Great that we agree that this issue on interoperability with the existing RFC-compliant internet is not optional ;)


Best Regards
	Sebastian

> 
> Thanks, --David
> 
>> -----Original Message-----
>> From: Sebastian Moeller <moeller0@gmx.de>
>> Sent: Friday, July 26, 2019 12:07 PM
>> To: Black, David
>> Cc: Bob Briscoe; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org; Dave Taht; De
>> Schepper, Koen (Nokia - BE/Antwerp)
>> Subject: Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
>> 
>> 
>> [EXTERNAL EMAIL]
>> 
>> Dear David,
>> 
>> thanks for your clearing things up. I see, I should have read deeper into the
>> relevant "web" of RFCs before asking.
>> 
>> Am I correct in interpreting the following  sentence from RFC 8311:
>> "ECN experiments are expected to coexist with deployed ECN
>>   functionality, with the responsibility for that coexistence falling
>>   primarily upon designers of experimental changes to ECN."
>> as meaning, that L4S will need to implement the long discussed fall-back to
>> RFC3168 compliant responses to CE marks, if a RFC3168 AQM is detected as
>> being active on a path, and that L4S endpoint need to closely monitor for signs
>> of RFC3168 behavior? I ask because section 4.1 fails to put in those safe-guard
>> clauses explicitly (in my reading this effectively says anything goes, as long as it
>> is defined in its own RFC)
>> 
>> Now looking at the L4S RFC I see (https://tools.ietf.org/html/draft-ietf-tsvwg-
>> l4s-arch-04#page-21 (assuming that this is one of the RFCs required to allow the
>> exemption according to RFC8311)):
>> 
>> "Classic ECN support is starting to materialize on the Internet as an
>>   increased level of CE marking.  Given some of this Classic ECN might
>>   be due to single-queue ECN deployment, an L4S sender will have to
>>   fall back to a classic ('TCP-Friendly') behaviour if it detects that
>>   ECN marking is accompanied by greater queuing delay or greater delay
>>   variation than would be expected with L4S (see Appendix A.1.4 of [I-D.ietf-
>> tsvwg-ecn-l4s-id]).
>>   It is hard to detect whether this is
>>   all due to the addition of support for ECN in the Linux
>>   implementation of FQ-CoDel, which would not require fall-back to
>>   Classic behaviour, because FQ inherently forces the throughput of
>>   each flow to be equal irrespective of its aggressiveness."
>> 
>> Which I believe to be problematic, as it conflates issues. The problem with L4S-
>> CE response on non L4S-AQMs is that it will give L4S flows an unfair and
>> unexpected advantage, so L4S endpoints should aim at detecting non-L4S AQMs
>> on the path and not (just) "that ECN marking is accompanied by greater queuing
>> delay or greater delay variation than would be expected with L4S". Sure delay
>> variations can be a eans of trying to detect such an AQM, but this text basically
>> gives L4S the license to just look at RTT variations and declare victory if these
>> stay below an arbitrary threshold.
>> 	Also I voiced concerns about the rationale for excluding RFC3168 FQ-
>> AQMs from this fall-back treatment, and gave an explicit example of a system in
>> use (post-true bottleneck ingress shaping) that I would like to see to be tested
>> first. This should be easy to test (and as far as I know these tests are planned if
>> not already done) so that the RFC can either be amended with a link to the data
>> showing that this is harmless, or changed ot indicate that the fall-back might
>> also be required for FQ-AQMs under certain conditions.
>> 
>> 
>> Now if I look at https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-07#page-
>> 25, I see the following:
>> 
>> "A.1.4.  Fall back to Reno-friendly congestion control on classic ECN bottlenecks
>> 
>>   Description: A scalable congestion control needs to react to ECN
>>   marking from a non-L4S but ECN-capable bottleneck in a way that will
>>   coexist with a TCP Reno congestion control [RFC5681].
>> 
>>   Motivation: Similarly to the requirement in Appendix A.1.3, this
>>   requirement is a safety condition to ensure a scalable congestion
>>   control behaves properly when it builds a queue at a network
>>   bottleneck that has not been upgraded to support L4S.  On detecting
>>   classic ECN marking (see below), a scalable congestion control will
>>   need to fall back to classic congestion control behaviour.  If it
>>   does not comply with this requirement it could starve classic
>>   traffic.
>> 
>>   It would take time for endpoints to distinguish classic and L4S ECN
>>   marking.  An increase in queuing delay or in delay variation would be
>>   a tell-tale sign, but it is not yet clear where a line would be drawn
>>   between the two behaviours.  It might be possible to cache what was
>>   learned about the path to help subsequent attempts to detect the type
>>   of marking."
>> 
>> Here, the special casing of FQ-AQMs does not seem to be present, which L4S
>> RFC will have precedence here?
>> 
>> 
>> Anyway, am I correct in interpreting all of the above as a clear an unambiguous
>> requirement for L4S components like TCP-Prague to implement RFC3168-AQM
>> detection and fall-back to appropriate behavior before being given the
>> permission for usage on the wider internet?
>> 
>> 
>> Best Regards
>> 	Sebastian
>> 
>>> On Jul 26, 2019, at 16:10, Black, David <David.Black@dell.com> wrote:
>>> 
>>> Inline comment on "IETF's official stance":
>>> 
>>>> The first option seems highly undesirable to me, as a) (TCP-friendly) single
>> queue
>>>> RFC3168 AQM are standards compliant and will be for the foreseeable future,
>> so
>>>> ms making them ineffective seems like a no-go to me (could someone clarify
>>>> what the IETF's official stance is on this matter, please?),
>>> 
>>> The IETF expects that all relevant technical concerns such as this one will be
>> raised by participants and will be carefully considered by the WG in determining
>> what to do.
>>> 
>>> That was the technical answer, now for the official [officious? :-) ] answer ...
>> the current L4S drafts do not modify RFC 3168 beyond the modifications already
>> made by RFC 8311.  If anyone believes that to be incorrect, i.e., believes at least
>> one of the L4S drafts has to further modify RFC 3168, please bring that up with a
>> specific reference to the text in "RFC 3168 as modified by RFC 8311" that needs
>> further modification.
>>> 
>>> Thanks, --David
>>> 
>>>> -----Original Message-----
>>>> From: Sebastian Moeller <moeller0@gmx.de>
>>>> Sent: Friday, July 26, 2019 6:20 AM
>>>> To: Bob Briscoe
>>>> Cc: Black, David; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org; Dave Taht;
>> De
>>>> Schepper, Koen (Nokia - BE/Antwerp)
>>>> Subject: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
>>>> 
>>>> 
>>>> [EXTERNAL EMAIL]
>>>> 
>>>> Dear Bob,
>>>> 
>>>> we have been going through the consequences and side effects of re-defining
>>>> the meaning of a CE-mark for L4S-flows and using ECT(1) as a flllow-
>> classifying
>>>> heuristic.
>>>> One of the side-effects is that  a single queue ecn-enabled AQM will CE-marl
>> L4S
>>>> packets, expecting a strong reduction in sending rate, while the L4S endpoints
>>>> will only respond to that signal with a mild rate-reduction. One of the
>>>> consequences of this behaviour is that L4S flows will crowd out RFC3168 and
>>>> non-ECN flows, because these flows half their rates on drop or CE-mark
>>>> (approximately) making congestion go away with the end result that the L4S
>>>> flows gain an undesired advantage, at least that is my interpretation of the
>>>> discussion so far.
>>>> Now there are two options to deal with this issue, one is to declare it
>>>> insignificant and just ignore it, or to make L4S endpoints detect that condition
>>>> and revert back to RFC3168 behaviour.
>>>> The first option seems highly undesirable to me, as a) (TCP-friendly) single
>> queue
>>>> RFC3168 AQM are standards compliant and will be for the foreseeable future,
>> so
>>>> ms making them ineffective seems like a no-go to me (could someone clarify
>>>> what the IETF's official stance is on this matter, please?), b) I would expect
>> most
>>>> of such AQMs to be instantiated close to/at the consu,er's edge of the
>> internet,
>>>> making it really hard to ameasure their prevalence.
>>>> In short, I believe the only sane way forward is to teach L4S endpoints to to
>> the
>>>> right thing under such conditions, I believe this would not be too onerous an
>> ask,
>>>> given that the configuration is easy to set up for testing and development and
>> a
>>>> number of ideas have already been theoretically discussed here. As far as I
>> can
>>>> see these ideas mostly riff on the idea that such anAQM will, under
>> congesation
>>>> conditions, increase each ftraversing flow's RTT and that should be quickly
>> and
>>>> robustly detectable. I would love to learn more about these ideas and the
>> state
>>>> of development and testing.
>>>> 
>>>> Best Regards & many thanks in advance
>>>> 	Sebastian Moeller
> 


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-07-26 20:07                                                           ` Black, David
@ 2019-07-26 23:40                                                             ` Jonathan Morton
  2019-08-07  8:41                                                               ` Mikael Abrahamsson
  0 siblings, 1 reply; 84+ messages in thread
From: Jonathan Morton @ 2019-07-26 23:40 UTC (permalink / raw)
  To: Black, David
  Cc: Holland, Jake, Sebastian Moeller, Bob Briscoe, ecn-sane, tsvwg,
	Dave Taht, De Schepper, Koen (Nokia - BE/Antwerp)

> On 26 Jul, 2019, at 4:07 pm, Black, David <David.Black@dell.com> wrote:
> 
>> I believe under this nomenclature, L4S in a queue with RFC3168-style
>> marking at a bottleneck should be classified as a flow that is
>> responsive but not TCP-compatible, and therefore poses a significant
>> threat to internet performance within this context.
>> 
>> I'm not sure how best to describe this discrepancy, but I think it's
>> fair to call it an incompatibility between a RFC3168-style marking
>> queue and L4S.
> 
> Based on the L4S slides in today's meeting and related discussion, the L4S folks are starting to deal with this concern.
> 
> I share your technical view that this concern is not safe to ignore.

Based on our post-session discussions, I feel that it may not actually be entirely clear to the L4S people just how serious the situation with L4S and Codel is.

The impression I gained was that they consider *Codel* to be broken, and that *it* should be changed to match what L4S expects.  This is impractical given how widely Codel is already deployed, and the fact that it was carefully designed specifically with RFC-compliant transport flows in mind.  The result of their proposed changes would no longer resemble Codel at all.

Unfortunately contributing to their apparent confusion, TCP Prague is currently broken in such a way as to mask the problem if tested directly.  To experimentally verify our hypothesis, we had to synthesise a pseudo-Prague implementation by inserting a firewall rule (mangling CE into SCE) in front of our DCTCP-SCE implementation, the results of which matched our mathematical predictions perfectly.  We saw no evidence of a Classic ECN detector in our TCP Prague tests.

Codel is itself documented in an Experimental RFC, authored by no less personages than Kathy Nichols and VJ.  The derivative FQ-Codel is similarly documented in an RFC.  The variant I use named COBALT (aka Codel-BLUE Alternate) is not yet in an RFC (nor even a draft), but possibly it should be made into one, as the improvements are at least interesting and are proven by both research and deployment.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-07-26 23:40                                                             ` Jonathan Morton
@ 2019-08-07  8:41                                                               ` Mikael Abrahamsson
  2019-08-07 10:06                                                                 ` Mikael Abrahamsson
  0 siblings, 1 reply; 84+ messages in thread
From: Mikael Abrahamsson @ 2019-08-07  8:41 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: Black, David, tsvwg, Bob Briscoe, ecn-sane, Dave Taht,
	De Schepper, Koen (Nokia - BE/Antwerp)

On Fri, 26 Jul 2019, Jonathan Morton wrote:

> Based on our post-session discussions, I feel that it may not actually 
> be entirely clear to the L4S people just how serious the situation with 
> L4S and Codel is.

My take on all of this is that whatever we come up with needs to be 
incrementally deployable on an Internet that has everything from stupid 
huge FIFOs to FQ_CODEL to whatever else might be out there, and there 
should be no huge pathological downsides of deployment that causes 
widespread degradation/collapse of anything currently being in wider use 
on the Internet.

In 5-10 years we're still going to have all kinds of AQMs and stupid huge 
FIFOs still in wide use.

So I'd like to see robust testing done for all proposals to see that they 
work properly on everything from GSM EDGE to ADSL to FQ_CODEL and 
dual-queue whatever we come up with together with commonly used traffic 
types on the Internet today, elastic and non-elastic.

I realise this is problematic and gets in the way of progress but the 
Internet is a messy place and we need to do mitigation of pathological 
cases where new and old don't always play nice together.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-08-07  8:41                                                               ` Mikael Abrahamsson
@ 2019-08-07 10:06                                                                 ` Mikael Abrahamsson
  2019-08-07 11:57                                                                   ` Jeremy Harris
  0 siblings, 1 reply; 84+ messages in thread
From: Mikael Abrahamsson @ 2019-08-07 10:06 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: tsvwg, Bob Briscoe, Black, David, ecn-sane, Dave Taht,
	De Schepper, Koen (Nokia - BE/Antwerp)

On Wed, 7 Aug 2019, Mikael Abrahamsson wrote:

> I realise this is problematic and gets in the way of progress but the 
> Internet is a messy place and we need to do mitigation of pathological 
> cases where new and old don't always play nice together.

I forgot to also mention that I still would like to see more testing of 
the benefits of removing the ordering requirement on some traffic. Having 
been exposed to different medias over the past 25 years I've seen numerous 
Head-of-Line blocking problems my fair share of time and removing this 
ordering requirement would free up some medias to deliver some packets 
even though other packets are being re-transmitted and delivered later.

I also encourage if we can reach overall consensus that transports should 
take into account that packets can be re-ordered and what timescales would 
be acceptable for this to happen (1ms? 10ms? 100ms? Even more? Less? Not 
at all?)

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-08-07 10:06                                                                 ` Mikael Abrahamsson
@ 2019-08-07 11:57                                                                   ` Jeremy Harris
  2019-08-07 12:03                                                                     ` Mikael Abrahamsson
  0 siblings, 1 reply; 84+ messages in thread
From: Jeremy Harris @ 2019-08-07 11:57 UTC (permalink / raw)
  To: ecn-sane

On 07/08/2019 11:06, Mikael Abrahamsson wrote:
> I also encourage if we can reach overall consensus that transports
> should take into account that packets can be re-ordered and what
> timescales would be acceptable for this to happen (1ms? 10ms? 100ms?
> Even more? Less? Not at all?)

The major issue I see is that the interface between transport and
its client layer needs to become more complex for that.

I could, for example, imagine a pure (FTP-complexity) file transfer
application over TCP being able to receive out-of-order received
TCP segments and hand them direct to the kernel using pwrite
(assuming TCP SACK in use).  But the socket interface would need
to present sequencing information along with the segments; it being
no longer implied by the sequence of satisfied reads.

The timescale for acceptable re-ordering for that case is "indefinite".
But it's only one case, and a pretty limited one.

-- 
Cheers,
  Jeremy

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-08-07 11:57                                                                   ` Jeremy Harris
@ 2019-08-07 12:03                                                                     ` Mikael Abrahamsson
  2019-08-07 12:14                                                                       ` Sebastian Moeller
  2019-08-07 12:34                                                                       ` Jeremy Harris
  0 siblings, 2 replies; 84+ messages in thread
From: Mikael Abrahamsson @ 2019-08-07 12:03 UTC (permalink / raw)
  To: Jeremy Harris; +Cc: ecn-sane

On Wed, 7 Aug 2019, Jeremy Harris wrote:

> (assuming TCP SACK in use).  But the socket interface would need
> to present sequencing information along with the segments; it being
> no longer implied by the sequence of satisfied reads.

Yes, the socket stream interface guarantees ordered delivery of that 
stream. That doesn't mean other 5 tuple connections running over the same 
media need to be held up just because a packet is missing from this first 
stream. A lot of medias guarantees complete ordering, even between 
flows/streams. If we loosen this requirement then muxed transports or 
other stream can continue even if there is a packet missing and being 
ARQed on the media.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-08-07 12:03                                                                     ` Mikael Abrahamsson
@ 2019-08-07 12:14                                                                       ` Sebastian Moeller
  2019-08-07 12:25                                                                         ` Mikael Abrahamsson
  2019-08-07 12:34                                                                       ` Jeremy Harris
  1 sibling, 1 reply; 84+ messages in thread
From: Sebastian Moeller @ 2019-08-07 12:14 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: Jeremy Harris, ecn-sane



> On Aug 7, 2019, at 14:03, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
> 
> On Wed, 7 Aug 2019, Jeremy Harris wrote:
> 
>> (assuming TCP SACK in use).  But the socket interface would need
>> to present sequencing information along with the segments; it being
>> no longer implied by the sequence of satisfied reads.
> 
> Yes, the socket stream interface guarantees ordered delivery of that stream. That doesn't mean other 5 tuple connections running over the same media need to be held up just because a packet is missing from this first stream. A lot of medias guarantees complete ordering, even between flows/streams. If we loosen this requirement then muxed transports or other stream can continue even if there is a packet missing and being ARQed on the media.

	I guess I am overly naive, but as far as I can tell only TCP has this strong shuffling-sensitivity, UDP itself does not care (applications still might dislike packet shuffling). Could the intermediate hops not simply block TCP and just pass on UDP? This should at least avoid the medium idling while waiting for the straggling packets. I guess that is tricky in that a medium's ARQ might not look past its own headers, "sequence identifiers" and checksums, clearly that is not enough to get to the protocol ID (add to this IPv6 extension headers and the required deep dive).
 The fact that is not implemented yet, indicates to me that the ECT(1) thing is also not likely to make more inroads, what am I missing?

Best Regards
	Sebastian


> 
> -- 
> Mikael Abrahamsson    email: swmike@swm.pp.se
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-08-07 12:14                                                                       ` Sebastian Moeller
@ 2019-08-07 12:25                                                                         ` Mikael Abrahamsson
  0 siblings, 0 replies; 84+ messages in thread
From: Mikael Abrahamsson @ 2019-08-07 12:25 UTC (permalink / raw)
  To: Sebastian Moeller; +Cc: Jeremy Harris, ecn-sane

On Wed, 7 Aug 2019, Sebastian Moeller wrote:

> 	I guess I am overly naive, but as far as I can tell only TCP has 
> this strong shuffling-sensitivity, UDP itself does not care 
> (applications still might dislike packet shuffling). Could the

I know applications that require packets (using UDP) to be arriving in 
order, because an out of order packet is considered a lost packet. These 
applications are designed with the presumtion that the network deliver the 
packets in order (at least within the 5 tuple).

I guess basic TCP (without SACK) can be argued as the same property?

> intermediate hops not simply block TCP and just pass on UDP? This should 
> at least avoid the medium idling while waiting for the straggling 
> packets. I guess that is tricky in that a medium's ARQ might not look 
> past its own headers, "sequence identifiers" and checksums, clearly that 
> is not enough to get to the protocol ID (add to this IPv6 extension 
> headers and the required deep dive).

I think the argument can be made that TCP actually is less sensitive to 
packet reordering (with SACK) than UDP. With TCP it's fairly well 
understood what happens, with UDP we have no idea, because we don't know 
what application is running.

> The fact that is not implemented yet, indicates to me that the ECT(1) 
> thing is also not likely to make more inroads, what am I missing?

The whole concept of "let's try to figure out what traffic can be 
delivered out of order and what needs to be in order" is a fairly new 
concept. I am not surprised this hasn't been implemented yet. There is 
absolutely no current consensus on what traffic can be re-ordered or not. 
L4S suggests an implicit mark on "new" transports that are allowed to be 
re-ordered. I have not seen any such proposal before. Typically the L2/ARQ 
designers are sitting there designing their thing without knowing anything 
about L3 or L4 basically. Having this kind of cross-layer approach is kind 
of new from what I can see.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-08-07 12:03                                                                     ` Mikael Abrahamsson
  2019-08-07 12:14                                                                       ` Sebastian Moeller
@ 2019-08-07 12:34                                                                       ` Jeremy Harris
  2019-08-07 12:49                                                                         ` Mikael Abrahamsson
  1 sibling, 1 reply; 84+ messages in thread
From: Jeremy Harris @ 2019-08-07 12:34 UTC (permalink / raw)
  To: ecn-sane

On 07/08/2019 13:03, Mikael Abrahamsson wrote:
> On Wed, 7 Aug 2019, Jeremy Harris wrote:
> 
>> (assuming TCP SACK in use).  But the socket interface would need
>> to present sequencing information along with the segments; it being
>> no longer implied by the sequence of satisfied reads.
> 
> Yes, the socket stream interface guarantees ordered delivery of that
> stream. That doesn't mean other 5 tuple connections running over the
> same media need to be held up just because a packet is missing from this
> first stream. A lot of medias guarantees complete ordering, even between
> flows/streams. If we loosen this requirement then muxed transports or
> other stream can continue even if there is a packet missing and being
> ARQed on the media.

I can't quite tell if you're noting that transports need to be aware of,
and handle (in one way or another) packets re-ordered by lower layers
[ I thought this was a given, already ]

or

that link-layers Should Not enforce ordered delivery of frames
[ i.e. Wifi and, I think, mobile phone providers are doing it all
wrong.   And half of the work being talked about in the LOOPS
group is suspect].


The latter sounds somewhat like the end-to-end principle.

-- 
Cheers,
  Jeremy

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
  2019-08-07 12:34                                                                       ` Jeremy Harris
@ 2019-08-07 12:49                                                                         ` Mikael Abrahamsson
  0 siblings, 0 replies; 84+ messages in thread
From: Mikael Abrahamsson @ 2019-08-07 12:49 UTC (permalink / raw)
  To: Jeremy Harris; +Cc: ecn-sane

On Wed, 7 Aug 2019, Jeremy Harris wrote:

> I can't quite tell if you're noting that transports need to be aware of,
> and handle (in one way or another) packets re-ordered by lower layers
> [ I thought this was a given, already ]

My statement here is that most if not all media is designed around 
delivering 5 tuple traffic in-order, and any out of order delivery is 
considered an anomily, and to be avoided.

Juniper had a bug in one of their routers back in 2000 which would 
re-order 5-tuple flows occasionally. They were ridiculed by ISP peeps for 
years about this and it was kind of a scandal. We in the ISP business 
generally try really-really-really hard to deliver at least 5 tuple 
traffic in order. Most media try really really really hard to deliver the 
entire queue in order. For instance a mobile network typically will have a 
buffer that can hold many hundreds of packets, and it'll never re-order. 
It'll sit on a single packet being re-transmitted over and over again and 
hold the queue and nothing will be delivered until the entire queue 
ordering "promise" is met. So this is not 5-tuple, this is entire queue.

All media I know of that has ARQ, for instance wifi, DOCSIS, *DSL, 3GPP 
networks, they all deliver the entire queue in-order. They'll hold up the 
entire queue if there is a single packet that needs several re-transmits 
to be delivered.

So it's not uncommon to see packets being delivered in a very bursty 
fashion because there might have been 200 packets being held up for that 
single packet to be re-transmitted a few times, and it might not even have 
been in the same 5-tuple flow as any of those 200 packets.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 84+ messages in thread

end of thread, other threads:[~2019-08-07 12:49 UTC | newest]

Thread overview: 84+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-05  0:01 [Ecn-sane] Comments on L4S drafts Holland, Jake
2019-06-07 18:07 ` Bob Briscoe
2019-06-14 17:39   ` Holland, Jake
2019-06-19 14:11     ` Bob Briscoe
2019-07-10 13:55       ` Holland, Jake
2019-06-14 20:10   ` [Ecn-sane] [tsvwg] " Luca Muscariello
2019-06-14 21:44     ` Dave Taht
2019-06-15 20:26       ` [Ecn-sane] [tsvwg] CoIt'smments " David P. Reed
2019-06-19  1:15     ` [Ecn-sane] [tsvwg] Comments " Bob Briscoe
2019-06-19  1:33       ` Dave Taht
2019-06-19  4:24       ` Holland, Jake
2019-06-19 13:02         ` Luca Muscariello
2019-07-04 11:54           ` Bob Briscoe
2019-07-04 12:24             ` Jonathan Morton
2019-07-04 13:43               ` De Schepper, Koen (Nokia - BE/Antwerp)
2019-07-04 14:03                 ` Jonathan Morton
2019-07-04 17:54                   ` Bob Briscoe
2019-07-05  8:26                     ` Jonathan Morton
2019-07-05  6:46                   ` De Schepper, Koen (Nokia - BE/Antwerp)
2019-07-05  8:51                     ` Jonathan Morton
2019-07-08 10:26                       ` De Schepper, Koen (Nokia - BE/Antwerp)
2019-07-08 20:55                         ` Holland, Jake
2019-07-10  0:10                           ` Jonathan Morton
2019-07-10  9:00                           ` De Schepper, Koen (Nokia - BE/Antwerp)
2019-07-10 13:14                             ` Dave Taht
2019-07-10 17:32                               ` De Schepper, Koen (Nokia - BE/Antwerp)
2019-07-17 22:40                             ` Sebastian Moeller
2019-07-19  9:06                               ` De Schepper, Koen (Nokia - BE/Antwerp)
2019-07-19 15:37                                 ` Dave Taht
2019-07-19 18:33                                   ` Wesley Eddy
2019-07-19 20:03                                     ` Dave Taht
2019-07-19 22:09                                       ` Wesley Eddy
2019-07-19 23:42                                         ` Dave Taht
2019-07-24 16:21                                           ` Dave Taht
2019-07-19 20:06                                     ` Black, David
2019-07-19 20:44                                       ` Jonathan Morton
2019-07-19 22:03                                         ` Sebastian Moeller
2019-07-20 21:02                                           ` Dave Taht
2019-07-21 11:53                                           ` Bob Briscoe
2019-07-21 15:30                                             ` [Ecn-sane] Hackathon tests Dave Taht
2019-07-21 15:33                                             ` [Ecn-sane] [tsvwg] Comments on L4S drafts Sebastian Moeller
2019-07-21 16:00                                             ` Jonathan Morton
2019-07-21 16:12                                               ` Sebastian Moeller
2019-07-22 18:15                                               ` De Schepper, Koen (Nokia - BE/Antwerp)
2019-07-22 18:33                                                 ` Dave Taht
2019-07-22 19:48                                                 ` Pete Heist
2019-07-25 16:14                                                   ` De Schepper, Koen (Nokia - BE/Antwerp)
2019-07-26 13:10                                                     ` Pete Heist
2019-07-26 15:05                                                       ` [Ecn-sane] The state of l4s, bbrv2, sce? Dave Taht
2019-07-26 15:32                                                         ` Dave Taht
2019-07-26 15:37                                                         ` Neal Cardwell
2019-07-26 15:45                                                           ` Dave Taht
2019-07-23 10:33                                                 ` [Ecn-sane] [tsvwg] Comments on L4S drafts Sebastian Moeller
2019-07-21 12:30                                       ` Bob Briscoe
2019-07-21 16:08                                         ` Sebastian Moeller
2019-07-21 19:14                                           ` Bob Briscoe
2019-07-21 20:48                                             ` Sebastian Moeller
2019-07-25 20:51                                               ` Bob Briscoe
2019-07-25 21:17                                                 ` Bob Briscoe
2019-07-25 22:00                                                   ` Sebastian Moeller
2019-07-26 10:20                                                     ` [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs Sebastian Moeller
2019-07-26 14:10                                                       ` Black, David
2019-07-26 16:06                                                         ` Sebastian Moeller
2019-07-26 19:58                                                           ` Black, David
2019-07-26 21:34                                                             ` Sebastian Moeller
2019-07-26 16:15                                                         ` Holland, Jake
2019-07-26 20:07                                                           ` Black, David
2019-07-26 23:40                                                             ` Jonathan Morton
2019-08-07  8:41                                                               ` Mikael Abrahamsson
2019-08-07 10:06                                                                 ` Mikael Abrahamsson
2019-08-07 11:57                                                                   ` Jeremy Harris
2019-08-07 12:03                                                                     ` Mikael Abrahamsson
2019-08-07 12:14                                                                       ` Sebastian Moeller
2019-08-07 12:25                                                                         ` Mikael Abrahamsson
2019-08-07 12:34                                                                       ` Jeremy Harris
2019-08-07 12:49                                                                         ` Mikael Abrahamsson
     [not found]                                         ` <5D34803D.50501@erg.abdn.ac.uk>
2019-07-21 16:43                                           ` [Ecn-sane] [tsvwg] Comments on L4S drafts Black, David
2019-07-21 12:30                                       ` Scharf, Michael
2019-07-19 21:49                                     ` Sebastian Moeller
2019-07-22 16:28                                   ` Bless, Roland (TM)
2019-07-19 17:59                                 ` Sebastian Moeller
2019-07-05  9:48             ` Luca Muscariello
2019-07-04 13:45         ` Bob Briscoe
2019-07-10 17:03           ` Holland, Jake

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox