Jake,

Yes, that is one scenario that I had in mind. 
Your response reassures me that my message was not totally unreadable.

My understanding was:
- There are incentives to mark packets if they get privileged treatment because of that marking. This is similar to the diffserv model, with all the consequences in terms of trust.
- Unresponsive traffic in particular (gaming, voice, video, etc.) has an incentive to mark. Assuming there is x% of unresponsive traffic in the priority queue, it is non-trivial to predict how the system behaves.
- In particular, it is easy to see the extreme cases:
               (a) x is very small: assuming the system is stable, the overall equilibrium will not change.
               (b) x is very large: the dctcp-like sources fall back to cubic-like behavior and the system behaves almost like a single FIFO.
               (c) in all other cases, x varies according to the unresponsive sources' rates.
                    Several different equilibria may exist, some of which may include oscillations, including oscillations of all the fallback mechanisms.
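To make the three cases concrete, a toy classifier over the unresponsive fraction x might look like the sketch below. The numeric thresholds are my own illustrative assumptions; the drafts do not specify where the regime boundaries actually lie.

```python
# Toy classifier for the three regimes sketched above.
# SMALL_X and LARGE_X are illustrative assumptions only; the I-Ds do not
# say where case (a) ends or case (b) begins.

SMALL_X = 0.05   # below this, case (a): unresponsive share is negligible
LARGE_X = 0.80   # above this, case (b): dctcp-like sources fall back

def regime(x):
    """Classify the unresponsive fraction x (0..1) of the priority queue."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a fraction in [0, 1]")
    if x < SMALL_X:
        return "a"   # stable: overall equilibrium essentially unchanged
    if x > LARGE_X:
        return "b"   # fallback: system behaves almost like a single FIFO
    return "c"       # unspecified: multiple equilibria, possible oscillations
```

The point of case (c) is exactly that no such clean thresholds exist in practice: x moves with the unresponsive sources' rates, so the system can wander between regimes.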
The reason I'm asking is that these cases are not discussed in the I-D documents or in the references, even though they are very common use cases.

If we add the queue protection mechanism, all unresponsive flows that are caught cheating are registered in a blacklist and always scheduled in the non-priority queue.
If that happens, unresponsive flows will get a service quality that is worse than if a single FIFO were used for all flows.

Using a flow blacklist brings back the per-flow complexity that dualq is supposed to remove compared to flow isolation by flow queueing.
It seems to me that the blacklist is actually necessary to make dualq work under the assumption that x is small, because in the other cases the behavior
of the dualq system is unspecified and likely subject to instabilities, i.e. potentially different kinds of oscillations.
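To make the blacklist idea concrete, here is a rough sketch of how such a scoring-and-demotion scheme might operate. The class name, the scoring rule, and the SCORE_LIMIT threshold are all my own illustrative assumptions, not taken from the queue-protection draft.

```python
from collections import defaultdict

# Rough sketch of blacklist-style queue protection: flows whose LL-marked
# packets repeatedly contribute to priority-queue delay are demoted to the
# classic (non-priority) queue for good. All names/thresholds are assumed.

SCORE_LIMIT = 3  # strikes before a flow is blacklisted (illustrative)

class QueueProtection:
    def __init__(self):
        self.scores = defaultdict(int)  # per-flow strike counters
        self.blacklist = set()          # flows caught cheating

    def classify(self, flow_id, marked_ll, caused_delay):
        """Return which queue a packet goes to: 'LL' or 'classic'."""
        if flow_id in self.blacklist:
            return "classic"                 # always demoted once caught
        if marked_ll and caused_delay:
            self.scores[flow_id] += 1        # another strike for this flow
            if self.scores[flow_id] >= SCORE_LIMIT:
                self.blacklist.add(flow_id)
                return "classic"
        return "LL" if marked_ll else "classic"
```

Note that `scores` and `blacklist` are per-flow state, which is exactly the kind of state flow queueing needs and dualq was meant to avoid.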

Luca




On Tue, Jun 18, 2019 at 9:25 PM Holland, Jake <jholland@akamai.com> wrote:
Hi Bob and Luca,

Thank you both for this discussion, I think it helped crystallize a
comment I hadn't figured out how to make yet, but was bothering me.

I’m reading Luca’s question as asking about fixed-rate traffic that does
something like a cutoff or downshift if loss gets bad enough for long
enough, but is otherwise unresponsive.

The dualq draft does discuss unresponsive traffic in 3 of the sub-
sections in section 4, but there's a point that, to me, seems swept
aside without comment in the analysis.

The referenced paper[1] from that section does examine the question
of sharing a link with unresponsive traffic in some detail, but the
analysis seems to bake in an assumption that there's a fixed amount
of unresponsive traffic, when in fact for a lot of the real-life
scenarios for unresponsive traffic (games, voice, and some of the
video conferencing) there's some app-level backpressure, in that
when the quality of experience goes low enough, the user (or a qoe
trigger in the app) will often change the traffic demand at a higher
layer than a congestion controller (by shutting off video, for
instance).

The reason I mention it is because it seems like unresponsive
traffic has an incentive to mark L4S and get low latency.  It doesn't
hurt, since it's a fixed rate and not bandwidth-seeking, so it's
perfectly happy to massively underutilize the link. And until the
link gets overloaded, it will not suffer queuing delay when using the
low-latency queue, whereas in the classic queue, queuing delay causes
a noticeable degradation in the presence of competing traffic.

I didn't see anything in the paper that tried to check the quality
of experience for the UDP traffic as unresponsive traffic approached
saturation, except by the inference that loss in the classic queue will
cause loss in the LL queue as well.

But letting unresponsive flows get away with pushing out more classic
traffic, and removing the penalty that classic flows would impose, seems
like a risk that would result in more of this kind of unresponsive
traffic marking itself for the LL queue, since it would get lower
latency almost all the way up to overload.

Many of the apps that send unresponsive traffic would benefit from low
latency and isolation from the classic traffic, so it seems a mistake
to claim there's no benefit, and it furthermore seems like there's
systematic pressures that would often push unresponsive apps into this
domain.

If that line of reasoning holds up, the "rather specific" phrase in
section 4.1.1 of the dualq draft might not turn out to be so specific
after all, and could be seen as downplaying the risks.

Best regards,
Jake

[1] https://riteproject.files.wordpress.com/2018/07/thesis-henrste.pdf

PS: This seems like a consequence of the lack of access control on
setting ECT(1), and maybe the queue protection function would address
it, so that's interesting to hear about.

But I thought the whole point of dualq over fq was that fq state couldn't
scale properly in aggregating devices with enough expected flows sharing
a queue?  If this protection feature turns out to be necessary, would that
advantage be gone?  (Also: why would one want to turn this protection off
if it's available?)