[Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs

Fri Jul 26 12:06:52 EDT 2019

Dear David,

thanks for your clearing things up. I see, I should have read deeper into the relevant "web" of RFCs before asking.

Am I correct in interpreting the following  sentence from RFC 8311:
"ECN experiments are expected to coexist with deployed ECN
   functionality, with the responsibility for that coexistence falling
   primarily upon designers of experimental changes to ECN."
as meaning, that L4S will need to implement the long discussed fall-back to RFC3168 compliant responses to CE marks, if a RFC3168 AQM is detected as being active on a path, and that L4S endpoint need to closely monitor for signs of RFC3168 behavior? I ask because section 4.1 fails to put in those safe-guard clauses explicitly (in my reading this effectively says anything goes, as long as it is defined in its own RFC)

Now looking at the L4S RFC I see (https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-04#page-21 (assuming that this is one of the RFCs required to allow the exemption according to RFC8311)):

"Classic ECN support is starting to materialize on the Internet as an
   increased level of CE marking.  Given some of this Classic ECN might
   be due to single-queue ECN deployment, an L4S sender will have to
   fall back to a classic ('TCP-Friendly') behaviour if it detects that
   ECN marking is accompanied by greater queuing delay or greater delay
   variation than would be expected with L4S (see Appendix A.1.4 of [I-D.ietf-tsvwg-ecn-l4s-id]).  
   It is hard to detect whether this is
   all due to the addition of support for ECN in the Linux
   implementation of FQ-CoDel, which would not require fall-back to
   Classic behaviour, because FQ inherently forces the throughput of
   each flow to be equal irrespective of its aggressiveness."

Which I believe to be problematic, as it conflates issues. The problem with L4S-CE response on non L4S-AQMs is that it will give L4S flows an unfair and unexpected advantage, so L4S endpoints should aim at detecting non-L4S AQMs on the path and not (just) "that ECN marking is accompanied by greater queuing delay or greater delay variation than would be expected with L4S". Sure delay variations can be a eans of trying to detect such an AQM, but this text basically gives L4S the license to just look at RTT variations and declare victory if these stay below an arbitrary threshold.
	Also I voiced concerns about the rationale for excluding RFC3168 FQ-AQMs from this fall-back treatment, and gave an explicit example of a system in use (post-true bottleneck ingress shaping) that I would like to see to be tested first. This should be easy to test (and as far as I know these tests are planned if not already done) so that the RFC can either be amended with a link to the data showing that this is harmless, or changed ot indicate that the fall-back might also be required for FQ-AQMs under certain conditions.

Now if I look at https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-07#page-25, I see the following:

"A.1.4.  Fall back to Reno-friendly congestion control on classic ECN bottlenecks

   Description: A scalable congestion control needs to react to ECN
   marking from a non-L4S but ECN-capable bottleneck in a way that will
   coexist with a TCP Reno congestion control [RFC5681].

   Motivation: Similarly to the requirement in Appendix A.1.3, this
   requirement is a safety condition to ensure a scalable congestion
   control behaves properly when it builds a queue at a network
   bottleneck that has not been upgraded to support L4S.  On detecting
   classic ECN marking (see below), a scalable congestion control will
   need to fall back to classic congestion control behaviour.  If it
   does not comply with this requirement it could starve classic
   traffic.

   It would take time for endpoints to distinguish classic and L4S ECN
   marking.  An increase in queuing delay or in delay variation would be
   a tell-tale sign, but it is not yet clear where a line would be drawn
   between the two behaviours.  It might be possible to cache what was
   learned about the path to help subsequent attempts to detect the type
   of marking."

Here, the special casing of FQ-AQMs does not seem to be present, which L4S RFC will have precedence here?

Anyway, am I correct in interpreting all of the above as a clear an unambiguous requirement for L4S components like TCP-Prague to implement RFC3168-AQM detection and fall-back to appropriate behavior before being given the permission for usage on the wider internet?

Best Regards
	Sebastian

> On Jul 26, 2019, at 16:10, Black, David <David.Black at dell.com> wrote:
> 
> Inline comment on "IETF's official stance":
> 
>> The first option seems highly undesirable to me, as a) (TCP-friendly) single queue
>> RFC3168 AQM are standards compliant and will be for the foreseeable future, so
>> ms making them ineffective seems like a no-go to me (could someone clarify
>> what the IETF's official stance is on this matter, please?),
> 
> The IETF expects that all relevant technical concerns such as this one will be raised by participants and will be carefully considered by the WG in determining what to do.
> 
> That was the technical answer, now for the official [officious? :-) ] answer ... the current L4S drafts do not modify RFC 3168 beyond the modifications already made by RFC 8311.  If anyone believes that to be incorrect, i.e., believes at least one of the L4S drafts has to further modify RFC 3168, please bring that up with a specific reference to the text in "RFC 3168 as modified by RFC 8311" that needs further modification.
> 
> Thanks, --David
> 
>> -----Original Message-----
>> From: Sebastian Moeller <moeller0 at gmx.de>
>> Sent: Friday, July 26, 2019 6:20 AM
>> To: Bob Briscoe
>> Cc: Black, David; ecn-sane at lists.bufferbloat.net; tsvwg at ietf.org; Dave Taht; De
>> Schepper, Koen (Nokia - BE/Antwerp)
>> Subject: [Ecn-sane] [tsvwg] Compatibility with singlw queue RFC3168 AQMs
>> 
>> 
>> [EXTERNAL EMAIL]
>> 
>> Dear Bob,
>> 
>> we have been going through the consequences and side effects of re-defining
>> the meaning of a CE-mark for L4S-flows and using ECT(1) as a flllow-classifying
>> heuristic.
>> One of the side-effects is that  a single queue ecn-enabled AQM will CE-marl L4S
>> packets, expecting a strong reduction in sending rate, while the L4S endpoints
>> will only respond to that signal with a mild rate-reduction. One of the
>> consequences of this behaviour is that L4S flows will crowd out RFC3168 and
>> non-ECN flows, because these flows half their rates on drop or CE-mark
>> (approximately) making congestion go away with the end result that the L4S
>> flows gain an undesired advantage, at least that is my interpretation of the
>> discussion so far.
>> Now there are two options to deal with this issue, one is to declare it
>> insignificant and just ignore it, or to make L4S endpoints detect that condition
>> and revert back to RFC3168 behaviour.
>> The first option seems highly undesirable to me, as a) (TCP-friendly) single queue
>> RFC3168 AQM are standards compliant and will be for the foreseeable future, so
>> ms making them ineffective seems like a no-go to me (could someone clarify
>> what the IETF's official stance is on this matter, please?), b) I would expect most
>> of such AQMs to be instantiated close to/at the consu,er's edge of the internet,
>> making it really hard to ameasure their prevalence.
>> In short, I believe the only sane way forward is to teach L4S endpoints to to the
>> right thing under such conditions, I believe this would not be too onerous an ask,
>> given that the configuration is easy to set up for testing and development and a
>> number of ideas have already been theoretically discussed here. As far as I can
>> see these ideas mostly riff on the idea that such anAQM will, under congesation
>> conditions, increase each ftraversing flow's RTT and that should be quickly and
>> robustly detectable. I would love to learn more about these ideas and the state
>> of development and testing.
>> 
>> Best Regards & many thanks in advance
>> 	Sebastian Moeller