From: Luca Muscariello
Date: Wed, 19 Jun 2019 06:02:13 -0700
To: "Holland, Jake"
Cc: Bob Briscoe, tsvwg@ietf.org, ecn-sane@lists.bufferbloat.net
Subject: Re: [Ecn-sane] [tsvwg] Comments on L4S drafts

Jake,

Yes, that is one scenario that I had in mind.
Your response comforts me that my message was not totally unreadable.

My understanding was:

- There are incentives to mark packets if they get privileged treatment because of that marking. This is similar to the diffserv model, with all the consequences in terms of trust.
- Unresponsive traffic in particular (gaming, voice, video, etc.) has an incentive to mark. Assuming there is x% of unresponsive traffic in the priority queue, it is non-trivial to guess how the system works.
- In particular, it is easy to see the extreme cases:
        (a) x is very small: assuming the system is stable, the overall equilibrium will not change.
        (b) x is very large: the DCTCP-like sources fall back to Cubic-like behavior and the system behaves almost like a single FIFO.
        (c) In all other cases, x varies according to the unresponsive sources' rates. Several different equilibria may exist, some of which may include oscillations, including oscillations of all the fallback mechanisms.

The reason I'm asking is that these cases are not discussed in the I-D documents or in the references, despite being very common use cases.

If we add the queue protection mechanism, all unresponsive flows that are caught cheating are registered in a blacklist and always scheduled in the non-priority queue.
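A minimal sketch of that blacklist behavior (a toy model only; the class, method, and threshold names are hypothetical and not taken from the L4S drafts):

```python
# Toy sketch of a dualq queue-protection blacklist: ECT(1)-marked flows
# that repeatedly contribute to congestion are demoted to the classic
# (non-priority) queue for all subsequent packets.
# All names and the scoring rule are illustrative assumptions.

from collections import defaultdict


class QueueProtection:
    def __init__(self, score_limit=10):
        self.score_limit = score_limit    # demotion threshold (assumed)
        self.scores = defaultdict(int)    # per-flow congestion score
        self.blacklist = set()            # flows caught cheating

    def classify(self, flow_id, ect1_marked, caused_congestion):
        """Return which queue this packet is scheduled into."""
        if flow_id in self.blacklist:
            return "classic"              # always non-priority once caught
        if ect1_marked and caused_congestion:
            self.scores[flow_id] += 1
            if self.scores[flow_id] >= self.score_limit:
                self.blacklist.add(flow_id)
                return "classic"
        return "low-latency" if ect1_marked else "classic"
```

Note that in this sketch a blacklisted flow never re-enters the low-latency queue, which is the "always scheduled in the non-priority queue" behavior described above; a real implementation would presumably age scores out over time.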
If that happens, unresponsive flows will get a service quality that is worse than if a single FIFO were used for all flows.

Using a flow blacklist brings back the complexity that dualq is supposed to remove compared to flow isolation by flow queueing.
It seems to me that the blacklist is actually necessary to make dualq work under the assumption that x is small, because in the other cases the behavior of the dualq system is unspecified and likely subject to instabilities, i.e. potentially different kinds of oscillations.

Luca


On Tue, Jun 18, 2019 at 9:25 PM Holland, Jake wrote:

> Hi Bob and Luca,
>
> Thank you both for this discussion, I think it helped crystallize a
> comment I hadn't figured out how to make yet, but was bothering me.
>
> I'm reading Luca's question as asking about fixed-rate traffic that does
> something like a cutoff or downshift if loss gets bad enough for long
> enough, but is otherwise unresponsive.
>
> The dualq draft does discuss unresponsive traffic in 3 of the sub-
> sections in section 4, but there's a point that seems sort of swept
> aside without comment in the analysis to me.
>
> The referenced paper[1] from that section does examine the question
> of sharing a link with unresponsive traffic in some detail, but the
> analysis seems to bake in an assumption that there's a fixed amount
> of unresponsive traffic, when in fact for a lot of the real-life
> scenarios for unresponsive traffic (games, voice, and some of the
> video conferencing) there's some app-level backpressure, in that
> when the quality of experience goes low enough, the user (or a QoE
> trigger in the app) will often change the traffic demand at a higher
> layer than a congestion controller (by shutting off video, for
> instance).
>
> The reason I mention it is because it seems like unresponsive
> traffic has an incentive to mark L4S and get low latency.
> It doesn't
> hurt, since it's a fixed rate and not bandwidth-seeking, so it's
> perfectly happy to massively underutilize the link. And until the
> link gets overloaded it will no longer suffer delay when using the
> low latency queue, whereas in the classic queue queuing delay provides
> a noticeable degradation in the presence of competing traffic.
>
> I didn't see anywhere in the paper that tried to check the quality
> of experience for the UDP traffic as non-responsive traffic approached
> saturation, except by inference that loss in the classic queue will
> cause loss in the LL queue as well.
>
> But letting unresponsive flows get away with pushing out more classic
> traffic and removing the penalty that classic flows would give it seems
> like a risk that would result in more use of this kind of unresponsive
> traffic marking itself for the LL queue, since it just would get lower
> latency almost up until overload.
>
> Many of the apps that send unresponsive traffic would benefit from low
> latency and isolation from the classic traffic, so it seems a mistake
> to claim there's no benefit, and it furthermore seems like there's
> systematic pressures that would often push unresponsive apps into this
> domain.
>
> If that line of reasoning holds up, the "rather specific" phrase in
> section 4.1.1 of the dualq draft might not turn out to be so specific
> after all, and could be seen as downplaying the risks.
>
> Best regards,
> Jake
>
> [1] https://riteproject.files.wordpress.com/2018/07/thesis-henrste.pdf
>
> PS: This seems like a consequence of the lack of access control on
> setting ECT(1), and maybe the queue protection function would address
> it, so that's interesting to hear about.
>
> But I thought the whole point of dualq over fq was that fq state couldn't
> scale properly in aggregating devices with enough expected flows sharing
> a queue? If this protection feature turns out to be necessary, would that
> advantage be gone?
> (Also: why would one want to turn this protection off
> if it's available?)