[Ecn-sane] [tsvwg] Comments on L4S drafts

Wed Jul 10 05:00:57 EDT 2019

Hi Jake,

>> I agree the key question for this discussion is about how best to get low latency for the internet.
Thanks

>> under the L4S approach for ECT(1), we can achieve it with either dualq or fq at the bottleneck, but under the SCE approach we can only do it with fq at the bottleneck.
Correct

>> we agree that in neither case can very low latency be achieved with a classic single queue with classic bandwidth-seeking traffic
Correct, not without compromising latency for Prague or throughput/utilization/stability/drop for Reno/Cubic

>> Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ?

FQ for "per-user" isolation in access equipment has clearly an extra cost, not? If we need to implement FQ "per-flow" on top, we need 2 levels of FQ (per-user and per-user-flow, so from thousands to millions of queues). Also, I haven’t seen DC switches coming with an FQ AQM...

>> Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?

Even if as cheaply, as long as there is no reliable flow identification, it clearly has side effects. Many homeworkers are using a VPN tunnel, which is only one flow encapsulating maybe dozens. Drop and ECN (if implemented correctly) are tunnel agnostic. Also how flows are identified might evolve (new transport protocols, encapsulations, ...?). Also if strict flow isolation could be done correctly, it has additional issues related to missed scheduling opportunities, besides it is a hard-coded throughput policy (and even mice size = 1 packet). On the other hand, flow isolation has benefits too, so hard to rule out one of them, not?

>> Also, I think if the SCE position is "low latency can only be achieved with FQ", that's different from "forcing only FQ on the internet", provided the fairness claims hold up, right?  (Classic single queue AQMs may still have a useful place in getting pretty-good latency in the cheapest hardware, like maybe PIE with marking.)

Are you saying that the real good stuff can only be for FQ 😉? Fairness between a flow getting only one signal and another getting 2 is an issue, right? The one with the 2 signals can either ignore one, listen half to both, or try to smooth both signals to find the average loudest one? Again safety or performance needs to be chosen. PIE or PI2 is optimal for Classic traffic and good to couple congestion to Prague traffic, but Prague traffic needs a separate Q and an immediate step to get the "good stuff" working. Otherwise it will also overshoot, respond sluggish, etc...

>> Anyway, to me this discussion is about the tradeoffs between the 2 proposals.  It seems to me SCE has some safety advantages that should not be thrown away lightly, 

I appreciate the efforts of trying to improve L4S, but nobody working on L4S for years now see a way that SCE can work on a non-FQ system. For me (and I think many others) it is a no-go to only support FQ. Unfortunately we only have half a bit free, and we need to choose how to use it. Would you choose for the existing ECN switches that cannot be upgraded (are there any?) or for all future non-FQ systems.

>> so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.

The performance in FQ is clearly equivalent, but for a common-Q behavior, only L4S can work. As far as I understood the SCE-LFQ proposal is actually a slower FQ implementation (an FQ in DualQ disguise 😉), so I think not really a better alternative than pure FQ. Also its single AQM on the bulk queue will undo any isolation, as a coupled AQM is stronger than any scheduler, including FQ. Don't underestimate the power of congestion control 😉. The ultimate proof is in the DualQ Coupled AQM where congestion control can beat a priority scheduler. If you want FQ to have effect, you need to have an AQM per FQ... The authors will notice this when they implement an AQM on top of it. I saw the current implementation works only in taildrop mode. But I think it is very good that the SCE proponents are very motivated to try with this speed to improve L4S. I'm happy to be proven wrong, but up to now I don't see any promising improvements to justify delay for L4S, only the above alternative compromise. Agreed that we can continue exploring alternative proposal in parallel though.

Koen.

-----Original Message-----
From: Holland, Jake <jholland at akamai.com> 
Sent: Monday, July 8, 2019 10:56 PM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper at nokia-bell-labs.com>; Jonathan Morton <chromatix99 at gmail.com>
Cc: ecn-sane at lists.bufferbloat.net; tsvwg at ietf.org
Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

Hi Koen,

I'm a bit confused by this response.

I agree the key question for this discussion is about how best to get low latency for the internet.

If I'm reading your message correctly, you're saying that under the L4S approach for ECT(1), we can achieve it with either dualq or fq at the bottleneck, but under the SCE approach we can only do it with fq at the bottleneck.

(I think I understand and roughly agree with this claim, subject to some caveats.  I just want to make sure I've got this right so far, and that we agree that in neither case can very low latency be achieved with a classic single queue with classic bandwidth-seeking
traffic.)

Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ?  Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?

Also, I think if the SCE position is "low latency can only be achieved with FQ", that's different from "forcing only FQ on the internet", provided the fairness claims hold up, right?  (Classic single queue AQMs may still have a useful place in getting pretty-good latency in the cheapest hardware, like maybe PIE with
marking.)

Anyway, to me this discussion is about the tradeoffs between the
2 proposals.  It seems to me SCE has some safety advantages that should not be thrown away lightly, so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.

Best regards,
Jake

On 2019-07-08, 03:26, "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper at nokia-bell-labs.com> wrote:

    Hi Jonathan,

    From your responses below, I have the impression you think this discussion is about FQ (flow/fair queuing). Fair queuing is used today where strict isolation is wanted, like between subscribers, and by extension (if possible and preferred) on a per transport layer flow, like in Fixed CPEs and Mobile networks. No discussion about this, and assuming we have and still will have an Internet which needs to support both common queues (like DualQ is intended) and FQs, I think the only discussion point is how we want to migrate to an Internet that supports optimally Low Latency.

    This leads us to the question L4S or SCE?

    If we want to support low latency for both common queues and FQs we "NEED" L4S, if we need to support it only for FQs, we "COULD" use SCE too, and if we want to force the whole Internet to use only FQs, we "SHOULD" use SCE 😉. If your goal is to force only FQs in the Internet, then let this be clear... I assume we need a discussion on another level in that case (and to be clear, it is not a goal I can support)...

    Koen.

    -----Original Message-----
    From: Jonathan Morton <chromatix99 at gmail.com> 
    Sent: Friday, July 5, 2019 10:51 AM
    To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper at nokia-bell-labs.com>
    Cc: Bob Briscoe <ietf at bobbriscoe.net>; ecn-sane at lists.bufferbloat.net; tsvwg at ietf.org
    Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

    > On 5 Jul, 2019, at 9:46 am, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper at nokia-bell-labs.com> wrote:
    > 
    >>> 2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.
    > 
    > Before jumping to another point, let's close down your original issue. Since you didn't mention, I assume that you agree with the following, right?
    > 
    >        "You cannot defeat a DualQ" (at least no more than a single Q)

    I consider forcibly degrading DualQ to single-queue mode to be a defeat.  However…

    >>> But that's exactly the problem.  Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.
    > 
    > With L4S a single queue can differentiate between Classic and L4S traffic. That's why it knows exactly how to treat the traffic. For Non-ECT and ECT(0) square the probability, and for ECT(1) don't square, and it works exactly like a DualQ, but then without the latency isolation. Both types get the same throughput, AND delay. See the PI2 paper, which is exactly about a single Q.

    Okay, this is an important point: the real assertion is not that DualQ itself is needed for L4S to be safe on the Internet, but for differential AQM treatment to be present at the bottleneck.  Defeating DualQ only destroys L4S' latency advantage over "classic" traffic.  We might actually be making progress here!

    > I agree you cannot isolate in a single Q, and this is why L4S is better than SCE, because it tells the AQM what to do, even if it has a single Q. SCE needs isolation, L4S not.

    Devil's advocate time.  What if, instead of providing differential treatment WRT CE marking, PI2 instead applied both marking strategies simultaneously - the higher rate using SCE, and the lower rate using CE?  Classic traffic would see only the latter; L4S could use the former.

    > We tried years ago similar things like needed for SCE, and found that it can't work. For throughput fairness you need the squared relation between the 2 signals, but with SCE, you need to apply both signals in parallel, because you don't know the sender type. 

    Yes, that's exactly what we do - and it does work.

    > 	- So either the sender needs to ignore CE if it gets SCE, or ignore SCE if you get CE. The first is dangerous if you have multiple bottlenecks, and the second is defeating the purpose of SCE. Any other combination leads to unfairness (double response).

    This is a false dichotomy.  We quickly realised both of those options were unacceptable, and sought a third way.

    SCE senders apply a reduced CE response when also responding to parallel SCE feedback, roughly in line with ABE, on the grounds that responding to SCE does some of the necessary reduction already.  The reduced response is still a Multiplicative Decrease, so it fits with normal TCP congestion control principles.

    > 	- you separate the signals in queue dept, first applying SCE and later CE, as you originally proposed, but that results in starvation for SCE.

    Yes, although this approach gives the best performance for SCE when used with flow isolation, or when all flows are known to be SCE-aware.  So we apply this strategy in those cases, and move the SCE marking function up to overlap CE marking specifically for single queues.

    It has been suggested that single queue AQMs are rare in any case, but this approach covers that corner case.

    > Add on top that SCE makes it impossible to use DualQ, as you cannot differentiate the traffic types.

    SCE is designed around not *needing* to differentiate the traffic types.  Single queues have known disadvantages, and SCE doesn't worsen them.

    Meanwhile, we have proposed LFQ to cover the DualQ use case.  I'd be interested in hearing a principled critique of it.

     - Jonathan Morton