From: Bob Briscoe
Date: Thu, 4 Jul 2019 12:54:59 +0100
To: Luca Muscariello, "Holland, Jake"
Cc: ecn-sane@lists.bufferbloat.net, tsvwg@ietf.org
Subject: Re: [Ecn-sane] [tsvwg] Comments on L4S drafts

Luca,


On 19/06/2019 14:02, Luca Muscariello wrote:
> Jake,
>
> Yes, that is one scenario that I had in mind.
> Your response comforts me that my message was not totally unreadable.
>
> My understanding was
> - There are incentives to mark packets if they get privileged treatment because of that marking. This is similar to the diffserv model, with all the consequences in terms of trust.
[BB] I'm afraid this is a common misunderstanding. We have gone to great lengths to ensure that the coupled dualQ does not give any privilege, by separating out latency from throughput, so:
  • It solely isolates traffic that gives /itself/ low latency from traffic that doesn't.
  • It is very hard to get any throughput advantage from the mechanism, relative to a FIFO (see further down this email).
The phrase "relative to a FIFO" is important. In a FIFO, it is of course possible for flows to take more throughput than others. We see that as a feature of the Internet not a bug. But we accept that some might disagree...

So those that want equal flow rates can add per-flow bandwidth policing, e.g. AFD, to the coupled dualQ. But that should be (and now can be) a separate policy choice.

An important advance of the coupled dualQ is to cut latency without interfering with throughput.
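For concreteness, here's a minimal sketch in Python of the coupling law (the function name and the clamping are mine; the relationships p_C = p'^2 and p_CL = k*p' are from the appendix of the aqm-dualq-coupled draft):

    # Sketch of the DualQ coupling law; illustrative, not the reference code.
    K = 2.0  # coupling factor between Classic and L4S marking

    def coupled_probabilities(p_prime, p_l4s_native):
        """p_prime: base probability from the Classic AQM (e.g. a PI2
        controller). p_l4s_native: marking probability from the L4S queue's
        own shallow threshold. Returns (classic_prob, l4s_mark_prob)."""
        p_classic = p_prime ** 2             # Classic sees the square
        p_coupled = min(K * p_prime, 1.0)    # L4S sees the linear coupling
        return p_classic, max(p_l4s_native, p_coupled)

Because Classic flows respond to the square while L4S flows respond to the linear probability, the steady-state rates of Reno/Cubic-style and DCTCP-style flows roughly balance out: neither queue buys a throughput advantage.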


> - Unresponsive traffic in particular (gaming, voice, video etc.) has incentives to mark. Assuming there is x% of unresponsive traffic in the priority queue, it is non-trivial to guess how the system works.
> - In particular it is easy to see the extreme cases:
>   (a) x is very small: assuming the system is stable, the overall equilibrium will not change.
>   (b) x is very large, so the DCTCP-like sources fall back to Cubic-like behaviour and the system behaves almost like a single FIFO.
>   (c) in all other cases x varies according to the unresponsive sources' rates.
>       Several different equilibria may exist, some of which may include oscillations, including oscillations of all fallback mechanisms.
> The reason I'm asking is that these cases are not discussed in the I-D documents or in the references, even though these are very common use cases.
[BB] This has all already been explained and discussed at length during various IETF meetings. I had an excellent student (Henrik Steen) act as a "red-team" guy. His challenge was: Can you contrive a mis-marking strategy with unresponsive traffic to cause any more harm than in a FIFO? We wanted to make sure that introducing a priority scheduler could not be exploited as a significant new attack vector.

Have you looked at his thesis - the [DualQ-Test] reference at the end of this subsection of the Security Considerations in the aqm-dualq-coupled draft:
4.1.3. Protecting against Unresponsive ECN-Capable Traffic?
(we ruled evaluation results out of scope of this already over-long draft - instead giving references).

Firstly, when unresponsive traffic < link rate, counter-intuitively it doesn't matter which queue it classifies itself into. Any responsive traffic in either or both queues still shares out the remaining capacity as if the unresponsive traffic had subtracted from the overall capacity (like a FIFO).
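A quick idealized illustration of this (my numbers, steady state assumed):

    # Responsive flows share whatever the unresponsive traffic leaves,
    # whichever queue that traffic classifies itself into (idealized).
    link_mbps = 100.0
    unresponsive_mbps = 30.0                  # below the link rate
    n_responsive = 2
    per_flow = (link_mbps - unresponsive_mbps) / n_responsive
    print(per_flow)                           # 35.0 Mb/s each, as in a FIFO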

Beyond that, Henrik tested whether the persistent overload mechanism that switches off any distinction between the queues (code in the reference Linux implementation, pseudocode and explanation in Appendix A.2) left any room for mis-marked traffic to gain an advantage before the switch-over. There was a narrow region in which unresponsive traffic mismarked as ECN could strengthen its attack relative to the same attack on the Classic queue without mismarking.
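To give a rough feel for that switch-over, here's a paraphrase in Python (the names and the threshold value are illustrative; the normative pseudocode is in Appendix A.2):

    import random

    P_CMAX = 0.25   # illustrative overload threshold on the Classic drop prob

    def decide(p_prime, is_l4s):
        """Return 'drop', 'mark' or 'forward' for one packet."""
        p_classic = p_prime ** 2
        if p_classic >= P_CMAX:
            # Persistent overload: stop distinguishing the queues; both see
            # the same drop probability, so mismarking gains nothing.
            return "drop" if random.random() < p_classic else "forward"
        if is_l4s:
            p_l4s = min(2.0 * p_prime, 1.0)   # coupled L4S marking, k = 2
            return "mark" if random.random() < p_l4s else "forward"
        return "drop" if random.random() < p_classic else "forward"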

I presented a one-slide summary of Henrik's experiment in IETF tcpm in 2017.
I tried to make the legends self-explanatory, as long as you work at them, but shout if you need them explained.
Each column of plots shows attack traffic at increasing fractions of the link rate, from 70% to 200%.

Try to spot the difference between the odd columns and the even columns - they're just a little different in the narrow window either side of 100% - a sharp kink instead of a smooth kink.
I included log-scale plots of the bottom end of the range to magnify the difference.

Yes, the system oscillates around the switch-over point, but you can see from the tcpm slide that the oscillations are also there in the 3rd column (which emulates the same switch-over in a FIFO). So we haven't added a new problem.

In summary, the advantage of mismarking was small, and it was hard for the attacker not to trip the dualQ into overload state, which applies the same drop level in either queue. And that was when the victim traffic was just a predictable long-running flow. With normal, less predictable victim traffic, I cannot think how to get this attack to be effective.


> If we add the queue protection mechanism, all unresponsive flows that are caught cheating are registered in a blacklist and always scheduled in the non-priority queue.
[BB]
1/ Queue protection is an alternative to overload protection, not an addition.
  • The Linux implementation solely uses the overload mechanism, which is sufficient to prevent the priority scheduler amplifying a mismarking attack (whether ECN or DSCP).
  • The DOCSIS implementation uses per-flow queue protection instead.
2/ Aligned incentives

The coupled dualQ with just overload protection ensures incentives are aligned so that normal developers won't intentionally mismark traffic. As explained at the start of this email:
the DualQ solely isolates traffic that gives /itself/ low latency from traffic that doesn't. Low latency solely depends on the traffic's own behaviour. Traffic doesn't /get/ anything from the low latency queue, so there's no point mismarking to get into it.
However, incentives only address rational behaviour, not accidents and malice. That's why DOCSIS operators asked for Q protection - to protect against something accidentally or deliberately introducing bursty or excessive traffic into the low latency queue.

The Linux code is sufficient under normal circumstances though. There are already other mechanisms that deal with the worms, trojans, etc. that might launch these attacks.

3/ DOCSIS Q protection does not black-list flows.

It redirects certain /packets/ from the flows with the highest queuing scores into the Classic queue, and only if those packets would otherwise risk exceeding a delay threshold for the low latency queue.

If a flow has a temporary wobble, some of its packets get redirected to protect the low latency queue, but if it gets back on track, then there's just no further packet redirection.

> If that happens, unresponsive flows will get a service quality that is worse than if using a single FIFO for all flows.
4/ Slight punishment is a feature, not a bug

If an unresponsive flow is well-paced and not contributing to queuing, it will accumulate only a low queuing score, and experience no redirected packets.

If it is contributing to queuing and it is mismarking itself, then Q Prot will redirect some of its packets, and the continual reordering will (intentionally) give it worse service quality. This deliberate slight punishment gives developers a slight incentive to mark their flows correctly.

I could explain more about the queuing score (I think I already did for you on these lists), but it's all in Annex P of the DOCSIS spec, and I'm trying to write a stand-alone document about it at the moment.
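In the meantime, here's a very loose Python sketch of its shape (my paraphrase; the real algorithm, its aging function and all constants are specified in Annex P, and every value below is illustrative):

    # Loose sketch of per-flow queue protection: packets, not flows, are
    # redirected, and only when the low latency queue's delay is at risk.
    CRITICAL_DELAY_US = 1000.0   # illustrative delay threshold
    DRAIN_RATE = 0.01            # illustrative score aging per microsecond
    SCORE_LIMIT = 1500.0         # illustrative per-flow score threshold

    class FlowState:
        def __init__(self):
            self.score = 0.0     # accumulated "congestion-bytes"
            self.last_us = 0.0

    def classify(flow, pkt_bytes, now_us, mark_prob, queue_delay_us):
        """Return 'low-latency' or 'classic' for one packet of one flow."""
        # Age the score, then charge the packet in proportion to current
        # congestion: well-paced flows that cause no queuing stay near zero.
        flow.score = max(0.0, flow.score - DRAIN_RATE * (now_us - flow.last_us))
        flow.last_us = now_us
        flow.score += mark_prob * pkt_bytes
        if queue_delay_us > CRITICAL_DELAY_US and flow.score > SCORE_LIMIT:
            return "classic"     # sanction the packet, not the flow
        return "low-latency"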



> Using a flow blacklist brings back the complexity that dualq is supposed to remove compared to flow-isolation by flow-queueing.
> It seems to me that the blacklist is actually necessary to make dualq work under the assumption that x is small,
[BB] As above, the Linux implementation works and aligns incentives without Q Prot, which is merely an optional additional protection against accidents and malice.

(and there's no flow black-list).


> because in the other cases the behavior of the dualq system is unspecified and likely subject to instabilities, i.e. potentially different kinds of oscillations.

I do find the tone of these emails rather disheartening. We've done all this work that we think is really cool. And all we get in return is criticism in an authoritative tone as if it is backed by experiments. But so far it is not. There seems to be a presumption that we are not professional and we are somehow not to be trusted to have done a sound job.

Yes, I'm sure mistakes can be found in our work. But it would be nice if the tone of these emails could become more constructive. Possibly even some praise. There seems to be a presumption of disrespect that I'm not used to, and I would rather it stopped.

Sorry for going silent recently - had too much backlog. I'm working my way backwards through this thread. Next I'll reply to Jake's email, which is, as always, perfectly constructive.

Cheers


Bob

> Luca
>
> On Tue, Jun 18, 2019 at 9:25 PM Holland, Jake <jholland@akamai.com> wrote:
> Hi Bob and Luca,
>
> Thank you both for this discussion, I think it helped crystallize a
> comment I hadn't figured out how to make yet, but was bothering me.
>
> I’m reading Luca’s question as asking about fixed-rate traffic that does
> something like a cutoff or downshift if loss gets bad enough for long
> enough, but is otherwise unresponsive.
>
> The dualq draft does discuss unresponsive traffic in 3 of the sub-
> sections in section 4, but there's a point that seems sort of swept
> aside without comment in the analysis to me.
>
> The referenced paper[1] from that section does examine the question
> of sharing a link with unresponsive traffic in some detail, but the
> analysis seems to bake in an assumption that there's a fixed amount
> of unresponsive traffic, when in fact for a lot of the real-life
> scenarios for unresponsive traffic (games, voice, and some of the
> video conferencing) there's some app-level backpressure, in that
> when the quality of experience goes low enough, the user (or a qoe
> trigger in the app) will often change the traffic demand at a higher
> layer than a congestion controller (by shutting off video, for
> instance).
>
> The reason I mention it is because it seems like unresponsive
> traffic has an incentive to mark L4S and get low latency.  It doesn't
> hurt, since it's a fixed rate and not bandwidth-seeking, so it's
> perfectly happy to massively underutilize the link. And until the
> link gets overloaded it will no longer suffer delay when using the
> low latency queue, whereas in the classic queue queuing delay provides
> a noticeable degradation in the presence of competing traffic.
>
> I didn't see anywhere in the paper that tried to check the quality
> of experience for the UDP traffic as non-responsive traffic approached
> saturation, except by inference that loss in the classic queue will
> cause loss in the LL queue as well.
>
> But letting unresponsive flows get away with pushing out more classic
> traffic and removing the penalty that classic flows would give it seems
> like a risk that would result in more use of this kind of unresponsive
> traffic marking itself for the LL queue, since it just would get lower
> latency almost up until overload.
>
> Many of the apps that send unresponsive traffic would benefit from low
> latency and isolation from the classic traffic, so it seems a mistake
> to claim there's no benefit, and it furthermore seems like there's
> systematic pressures that would often push unresponsive apps into this
> domain.
>
> If that line of reasoning holds up, the "rather specific" phrase in
> section 4.1.1 of the dualq draft might not turn out to be so specific
> after all, and could be seen as downplaying the risks.
>
> Best regards,
> Jake
>
> [1] https://riteproject.files.wordpress.com/2018/07/thesis-henrste.pdf
>
> PS: This seems like a consequence of the lack of access control on
> setting ECT(1), and maybe the queue protection function would address
> it, so that's interesting to hear about.
>
> But I thought the whole point of dualq over fq was that fq state couldn't
> scale properly in aggregating devices with enough expected flows sharing
> a queue?  If this protection feature turns out to be necessary, would that
> advantage be gone?  (Also: why would one want to turn this protection off
> if it's available?)
>
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/