[Ecn-sane] [tsvwg] Comments on L4S drafts

Holland, Jake jholland at akamai.com
Wed Jul 10 13:03:23 EDT 2019


Hi Bob,

<JH>Responses inline...</JH>

From: Bob Briscoe <ietf at bobbriscoe.net>
Date: 2019-07-04 at 06:45

Nonetheless, when an unresponsive flow(s) is consuming some capacity, and a responsive flow(s) takes the total over the available capacity, then both are responsible in proportion to their contribution to the queue, 'cos the unresponsive flow didn't respond (it didn't even try to).

This is why it's OK to have a small unresponsive flow, but it becomes less and less OK to have a larger and larger unresponsive flow. 

<JH>
Right, this is a big part of the point I'm trying to make here.
Some of the video systems send substantial flows that are not
responsive at the transport layer.

However, that doesn't mean they're entirely unresponsive.  These
often do respond in practice at the application layer, by observing
some quality-of-experience threshold from the video rendering.

Part of that quality-of-experience signal comes from the delay
fluctuation caused by queueing delay when the link is overloaded.
Running the video through a low-latency queue would remove that
fluctuation, and thus turn a flow that would have cut over to a
lower bit-rate (or dropped the video) into one that doesn't.
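
(For concreteness, a minimal sketch of the kind of application-layer
response I mean; the names and thresholds below are purely
illustrative, not taken from any particular video system.)

  /* Illustrative only: an app-layer responder that treats delay
   * fluctuation (jitter) as part of its QoE signal.  If a low-latency
   * queue removes the fluctuation, this never fires, even though the
   * flow is not responding at the transport layer. */
  #define JITTER_DOWNGRADE_MS  50.0  /* hypothetical QoE threshold */
  #define JITTER_DISABLE_MS   200.0  /* hypothetical "drop video" threshold */

  enum video_action { KEEP_RATE, LOWER_BITRATE, DISABLE_VIDEO };

  static enum video_action qoe_response(double observed_jitter_ms)
  {
      if (observed_jitter_ms > JITTER_DISABLE_MS)
          return DISABLE_VIDEO;   /* "remove the video" */
      if (observed_jitter_ms > JITTER_DOWNGRADE_MS)
          return LOWER_BITRATE;   /* "cut over to a lower bit-rate" */
      return KEEP_RATE;
  }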

At the same time, the app benefits from removing that fluctuation:
it gets to deliver better video quality successfully.  When its
owners test it comparatively, they'll find they have an incentive
to add the marking, and their customers will have an incentive to
adopt that solution over solutions that don't mark, leading to an
arms race that progressively pushes out more of the responsive
traffic.

My claim is that the lack of admission control is what makes this
arms race possible, by removing an important source of backpressure
on apps like this relative to today's internet (or one that does a
stricter fair-share-based degradation at bottlenecks).
</JH>

There's no bandwidth benefit. 
There's only latency benefit, and then the only benefits are:
• the low latency behaviour of yourself and other flows behaving like you
• and, critically, isolation from those flows not behaving well like you. 
Neither give an incentive to mismark - you get nothing if you don't behave. And there's a disincentive for 'Classic' TCP flows to mismark, 'cos they badly underutilize without a queue.

<JH>
It's typical for non-responsive flows to benefit from lower
latency.

I actually do agree (with caveats) that flows that don't respond
to transient congestion should be fine, as long as they use no more
than their fair share of the capacity.  However, removing the
backpressure without adding something to prevent them from using
more than their fair share sets up perverse incentives that push
the ecosystem toward congestion collapse.

The queue protection mechanism you mentioned might be sufficient
for this, but I'm having a hard time understanding the claim that
it's not required.

It seems to me that in practice it will need to be used whenever
there's a chance that non-responsive flows can consume more than
their share, and we can reasonably expect that chance to grow
naturally if L4S deployment grows.
</JH>

1/ The q-prot mechanism certainly has the disadvantage that it has to access L4 headers. But it is much more lightweight than FQ.

...

That's probably not understandable. Let me write it up properly - with some explanatory pictures and examples.

<JH>
I thought it was a reasonable summary, and thanks for the
quick explanation (not to discourage writing it up properly,
which would also be good).

In short, it sounds to me like if we end up agreeing that Q
protection is required in L4S with dualq (a point currently
disputed, acked), and if the lfq draft holds up to scrutiny
(also a point to be determined), then it means:

The per-bucket complexity overhead for the two proposed
architectures (L4S vs. SCE-based) would be 1 int per hash bucket
for dualq, vs. 2 ints + 1 bit per hash bucket for lfq.  And if so,
that per-bucket overhead at the bottleneck queues can be taken as
a roughly fair basis for comparison, to weigh against other
considerations.
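
(To make the comparison concrete, here is a rough sketch of what
that per-bucket state might look like; the struct and field names
are my own illustration, not taken from either draft.)

  #include <stdint.h>
  #include <stdbool.h>

  /* Hypothetical per-bucket state, matching the counts above. */
  struct dualq_qprot_bucket {
      uint32_t score;      /* 1 int per hash bucket */
  };

  struct lfq_bucket {
      uint32_t backlog;    /* 2 ints ... */
      uint32_t deficit;
      bool     skip;       /* ... plus 1 bit per hash bucket */
  };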

Does that sound approximately correct?

Best regards,
Jake
</JH>



