From: Dave Taht
Date: Wed, 10 Jul 2019 06:14:31 -0700
To: "De Schepper, Koen (Nokia - BE/Antwerp)"
Cc: "Holland, Jake", Jonathan Morton, "ecn-sane@lists.bufferbloat.net", "tsvwg@ietf.org"
Subject: Re: [Ecn-sane] [tsvwg] Comments on L4S drafts

I keep trying to stay out of this conversation, being yellow about ecn in the first place, in any form. I would like to stress that ecn-sane was formed by the group of folk that were concerned about having accidentally masterminded the world's biggest fq + aqm deployment, and the only one with ecn support, which happens

In the case of wifi, the deployment is now in the 10s of millions, and doing hordes of good: latencies measured in the 10s of ms rather than 10s of seconds.

I have seen no numbers on how well l4s will make it over to wifi as yet, nor any discussion, and I would rather like more pieces of the l4s solution to land sufficiently integrated for testing using tools like flent, and over far more than just an isochronous mac layer like dsl or docsis. Given the size of a txop in wifi (5.3ms), and how far back we have to put the AQM and FQ components today (2 txops), I don't think many of either SCE or L4S concepts will work well on wifi... but in general I prefer not to make assertions or assumptions until real-world testing can commence.

I am presently at the battlemesh conference trying to get a bit of real-world data.

A big problem wifi and 3g have is too many retransmits at the mac layer, which are not congestion controlled. Any signalling gets there late, and it's better to drop a bunch of packets when you hit a bunch of retransmits, in general. IMHO.

On Wed, Jul 10, 2019 at 2:05 AM De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:

> Hi Jake,
>
> >> I agree the key question for this discussion is about how best to get low latency for the internet.
> Thanks
>
> >> under the L4S approach for ECT(1), we can achieve it with either dualq or fq at the bottleneck, but under the SCE approach we can only do it with fq at the bottleneck.
> Correct
>
> >> we agree that in neither case can very low latency be achieved with a classic single queue with classic bandwidth-seeking traffic
> Correct, not without compromising latency for Prague or throughput/utilization/stability/drop for Reno/Cubic
>
> >> Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ?
>
> FQ for "per-user" isolation in access equipment has clearly an extra cost, not?

I've argued in the past that hashing is a bog-standard part of most network cards and switches already.

"Extra cost" should be established by actual measurements. Usually when you do those, you find it's another variable entirely costing you the most cpu/circuits.

> If we need to implement FQ "per-flow" on top, we need 2 levels of FQ (per-user and per-user-flow, so from thousands to millions of queues). Also, I haven’t seen DC switches coming with an FQ AQM...

Meh. Most of the time the instantaneous number of queues, for some measurement of instantaneous, is in the low hundreds for rates up to 10GigE.
We don't have a lot of data for bigger pipes.

I haven't seen any DC switches with support for anything other than RED or AFD, and DC folk overprovision anyway.

> >> Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?
>
> Even if as cheaply, as long as there is no reliable flow identification, it clearly has side effects. Many homeworkers are using a VPN tunnel, which is only one flow encapsulating maybe dozens.

This is true. For a local VPN endpoint on a router, fq_codel long ago gained support for doing the hashing and FQ before packets enter the tunnel.

This works only with in-kernel ipsec transports, although I've been trying to get it added to wireguard for a long time now.

It of course doesn't apply to the whole path, but when applied at the home gateway router (the bottleneck link), it works rather well.

Here are two examples of that mechanism in play:

http://www.taht.net/~d/ipsec_fq_codel/oldqos.png
http://www.taht.net/~d/ipsec_fq_codel/newqos.png

> Drop and ECN (if implemented correctly) are tunnel agnostic. Also how flows are identified might evolve (new transport protocols, encapsulations, ...?). Also if strict flow isolation could be done correctly, it has additional issues related to missed scheduling opportunities, besides it is a hard-coded throughput policy (and even mice size = 1 packet). On the other hand, flow isolation has benefits too, so hard to rule out one of them, not?

The packet dissector in linux is quite robust; the one in BSD, less so.

A counterpoint to the entire ECN debate (l4s or sce) that I'd like to make at more length is that it can and does hurt non-ecn'd flows, particularly at lower bandwidths, when you cannot reduce cwnd below 2 and the link is thus saturated. ARP can starve. IS-IS fails. batman, lacking an IP header, can starve. babel, lacking ecn support, can start to fail. And so on.
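The cwnd-floor argument above can be made concrete with a back-of-the-envelope model. This is an illustrative sketch only: it assumes every flow is window-limited at the 2-segment floor mentioned in the text, a 1500-byte MSS and a 20 ms base RTT, and it ignores pacing, ACK traffic and header overhead. The function names are hypothetical. Whatever the flows keep in flight beyond the path's bandwidth-delay product has to sit in the bottleneck queue, and further marking cannot shrink it.

```python
def standing_queue_bytes(n_flows: int, link_bps: float, base_rtt_s: float,
                         mss: int = 1500, min_cwnd: int = 2) -> float:
    """Bytes left queued at the bottleneck when every flow sits at min_cwnd.

    Assumption: each flow keeps exactly min_cwnd * mss bytes in flight. The
    part that does not fit in the path itself (the bandwidth-delay product)
    must wait in the bottleneck queue, however often the AQM marks.
    """
    in_flight = n_flows * min_cwnd * mss
    pipe = link_bps * base_rtt_s / 8  # bandwidth-delay product, in bytes
    return max(0.0, in_flight - pipe)


def standing_delay_s(n_flows: int, link_bps: float, base_rtt_s: float,
                     mss: int = 1500, min_cwnd: int = 2) -> float:
    """Queueing delay (seconds) that this standing queue adds to every packet."""
    queued = standing_queue_bytes(n_flows, link_bps, base_rtt_s, mss, min_cwnd)
    return queued * 8 / link_bps


if __name__ == "__main__":
    # Hypothetical scenario: 10 flows, 20 ms base RTT, three link rates.
    for rate in (1e6, 10e6, 100e6):
        d = standing_delay_s(n_flows=10, link_bps=rate, base_rtt_s=0.02)
        print(f"{rate / 1e6:>5.0f} Mbit/s, 10 flows: {d * 1000:.1f} ms standing delay")
```

Under these assumptions, 10 flows leave roughly 220 ms of standing queue delay at 1 Mbit/s, about 4 ms at 10 Mbit/s, and none at 100 Mbit/s, which is consistent with the observation that the problem bites mostly at lower bandwidths. Every packet sharing that queue, ECN-capable or not, sees the same delay.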
> >> Also, I think if the SCE position is "low latency can only be achieved with FQ", that's different from "forcing only FQ on the internet", provided the fairness claims hold up, right? (Classic single queue AQMs may still have a useful place in getting pretty-good latency in the cheapest hardware, like maybe PIE with marking.)
>
> Are you saying that the real good stuff can only be for FQ 😉? Fairness between a flow getting only one signal and another getting 2 is an issue, right? The one with the 2 signals can either ignore one, listen half to both, or try to smooth both signals to find the average loudest one? Again safety or performance needs to be chosen. PIE or PI2 is optimal for Classic traffic and good to couple congestion to Prague traffic, but Prague traffic needs a separate Q and an immediate step to get the "good stuff" working. Otherwise it will also overshoot, respond sluggish, etc...
>
> >> Anyway, to me this discussion is about the tradeoffs between the 2 proposals. It seems to me SCE has some safety advantages that should not be thrown away lightly,
>
> I appreciate the efforts of trying to improve L4S, but nobody working on L4S for years now see a way that SCE can work on a non-FQ system. For me (and I think many others) it is a no-go to only support FQ. Unfortunately we only have half a bit free, and we need to choose how to use it. Would you choose for the existing ECN switches that cannot be upgraded (are there any?) or for all future non-FQ systems.
>
> >> so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.
>
> The performance in FQ is clearly equivalent,

Huh?

> but for a common-Q behavior, only L4S can work. As far as I understood the SCE-LFQ proposal is actually a slower FQ implementation (an FQ in DualQ disguise 😉), so I think not really a better alternative than pure FQ.
> Also its single AQM on the bulk queue will undo any isolation, as a coupled AQM is stronger than any scheduler, including FQ. Don't underestimate the power of congestion control 😉. The ultimate proof is in the DualQ Coupled AQM where congestion control can beat a priority scheduler. If you want FQ to have effect, you need to have an AQM per FQ... The authors will notice this when they implement an AQM on top of it. I saw the current implementation works only in taildrop mode. But I think it is very good that the SCE proponents are very motivated to try with this speed to improve L4S. I'm happy to be proven wrong, but up to now I don't see any promising improvements to justify delay for L4S, only the above alternative compromise. Agreed that we can continue exploring alternative proposal in parallel though.

I cannot parse this extreme set of assumptions and declarations. "taildrop mode??"

As for promising improvements in general, there is a 7-year-old deployment, running code, of something that we've shown to work well in a variety of network scenarios, with 10x-100x improvements in network latency, at roughly 100% in linux overall, widely used in wifi and in many, many SQM/QoS systems and containers, with basic rfc3168 ecn enabled... and a proposal for a backward-compatible way of enhancing that still more being explored. The embedded hardware pipeline for future implementations of this tech is full; it would take 3+ years to make a course change...

vs something that still has no real-world deployment data at all, that changes the definition of ecn, that has no public ns-2 or ns-3 model (?), no testing aside from a few very specific benchmarks, and so on...

I do hope the coding competition heats up more, with more running code that others can explore, most of all.
I long ago tired of the endless debates, as everyone knows, and I do kind of wish I wasn't burning lunch on this email instead of setting up a test at battlemesh.

I note also that my leaning, in an fq_codel'd world (were it to stay such), was to enable more RTT-based CCs like BBR to work more often in an RTT mode, and thus we start... Originally, to me, the SCE idea was a way to trigger a faster switch to congestion avoidance, as most of my captures, taken from overused APs in restaurants, cafes, train stations, etc., show stuff in slow start to be the biggest problem. And, regardless, an initial CE, right now, is a strong indicator that fq_codel is present, and an RTT-based tcp can thus start to happen; a good one would not have many future marks after the first.

A big difference in our outlooks, I guess, is that my viewpoint is that most of the congestion is at the edges of the network, and I don't care all that much about big iron or switches, and I don't think either can afford much aqm tech at all in the first place. Not dual queues, not fqs.

Were L4S not to deploy (using ect1 as a marker; btw, I think CS5 might be a better candidate as it goes into the wifi VI queue), and an fq_pie/fq_codel/sch_cake world to remain predominant, well, we might get somewhere, faster, where it counted.

> Koen.
>
> -----Original Message-----
> From: Holland, Jake
> Sent: Monday, July 8, 2019 10:56 PM
> To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>; Jonathan Morton <chromatix99@gmail.com>
> Cc: ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
> Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
>
> Hi Koen,
>
> I'm a bit confused by this response.
>
> I agree the key question for this discussion is about how best to get low latency for the internet.
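The CS5 aside above depends on how Wi-Fi stacks map the IP header onto WMM access categories. The following is a minimal sketch under stated assumptions. It uses the legacy mapping in which the top three DSCP bits become the 802.1D user priority (UP), followed by a UP-to-access-category table that, as far as we know, matches Linux mac80211's default. Other stacks and newer RFC 8325-style mappings can differ, and the function name is hypothetical. ECT(1) lives in the separate two-bit ECN field, so by itself it does not change the category, while a CS5 marking does.

```python
# UP (802.1D user priority, 0-7) -> WMM access category. Assumption: the
# order follows our understanding of Linux mac80211's default mapping.
UP_TO_AC = ("BE", "BK", "BK", "BE", "VI", "VI", "VO", "VO")

ECT1 = 0b01  # ECN codepoint, carried in the low two bits of the TOS byte


def tos_to_ac(tos: int) -> str:
    """Map an IPv4 TOS / IPv6 traffic-class byte to a WMM access category.

    Assumption: legacy precedence mapping, i.e. the top three DSCP bits are
    used as the user priority. The two ECN bits are discarded first.
    """
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS/traffic class is a single byte")
    dscp = tos >> 2           # strip the two ECN bits
    user_priority = dscp >> 3
    return UP_TO_AC[user_priority]


CS0, CS5 = 0, 40  # DSCP class selectors 0 and 5

print("CS0 + ECT(1):", tos_to_ac((CS0 << 2) | ECT1))
print("CS5 + ECT(1):", tos_to_ac((CS5 << 2) | ECT1))
print("CS5 alone:   ", tos_to_ac(CS5 << 2))
```

Under these assumptions a packet marked only ECT(1) stays in the best-effort category, whereas a CS5 marking moves it into VI, which is the distinction the aside draws.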
>
> If I'm reading your message correctly, you're saying that under the L4S approach for ECT(1), we can achieve it with either dualq or fq at the bottleneck, but under the SCE approach we can only do it with fq at the bottleneck.
>
> (I think I understand and roughly agree with this claim, subject to some caveats. I just want to make sure I've got this right so far, and that we agree that in neither case can very low latency be achieved with a classic single queue with classic bandwidth-seeking traffic.)
>
> Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ? Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?
>
> Also, I think if the SCE position is "low latency can only be achieved with FQ", that's different from "forcing only FQ on the internet", provided the fairness claims hold up, right? (Classic single queue AQMs may still have a useful place in getting pretty-good latency in the cheapest hardware, like maybe PIE with marking.)
>
> Anyway, to me this discussion is about the tradeoffs between the 2 proposals. It seems to me SCE has some safety advantages that should not be thrown away lightly, so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.
>
> Best regards,
> Jake
>
> On 2019-07-08, 03:26, "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com> wrote:
>
>     Hi Jonathan,
>
>     From your responses below, I have the impression you think this discussion is about FQ (flow/fair queuing). Fair queuing is used today where strict isolation is wanted, like between subscribers, and by extension (if possible and preferred) on a per transport layer flow, like in Fixed CPEs and Mobile networks.
>     No discussion about this, and assuming we have and still will have an Internet which needs to support both common queues (like DualQ is intended) and FQs, I think the only discussion point is how we want to migrate to an Internet that supports optimally Low Latency.
>
>     This leads us to the question L4S or SCE?
>
>     If we want to support low latency for both common queues and FQs we "NEED" L4S, if we need to support it only for FQs, we "COULD" use SCE too, and if we want to force the whole Internet to use only FQs, we "SHOULD" use SCE 😉. If your goal is to force only FQs in the Internet, then let this be clear... I assume we need a discussion on another level in that case (and to be clear, it is not a goal I can support)...
>
>     Koen.
>
>     -----Original Message-----
>     From: Jonathan Morton
>     Sent: Friday, July 5, 2019 10:51 AM
>     To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
>     Cc: Bob Briscoe; ecn-sane@lists.bufferbloat.net; tsvwg@ietf.org
>     Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
>
>     > On 5 Jul, 2019, at 9:46 am, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
>     >
>     >>> 2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.
>     >
>     > Before jumping to another point, let's close down your original issue. Since you didn't mention, I assume that you agree with the following, right?
>     >
>     >     "You cannot defeat a DualQ" (at least no more than a single Q)
>
>     I consider forcibly degrading DualQ to single-queue mode to be a defeat. However…
>
>     >>> But that's exactly the problem. Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.
>     >
>     > With L4S a single queue can differentiate between Classic and L4S traffic. That's why it knows exactly how to treat the traffic.
>     > For Non-ECT and ECT(0) square the probability, and for ECT(1) don't square, and it works exactly like a DualQ, but then without the latency isolation. Both types get the same throughput, AND delay. See the PI2 paper, which is exactly about a single Q.
>
>     Okay, this is an important point: the real assertion is not that DualQ itself is needed for L4S to be safe on the Internet, but for differential AQM treatment to be present at the bottleneck. Defeating DualQ only destroys L4S' latency advantage over "classic" traffic. We might actually be making progress here!
>
>     > I agree you cannot isolate in a single Q, and this is why L4S is better than SCE, because it tells the AQM what to do, even if it has a single Q. SCE needs isolation, L4S not.
>
>     Devil's advocate time. What if, instead of providing differential treatment WRT CE marking, PI2 instead applied both marking strategies simultaneously - the higher rate using SCE, and the lower rate using CE? Classic traffic would see only the latter; L4S could use the former.
>
>     > We tried years ago similar things like needed for SCE, and found that it can't work. For throughput fairness you need the squared relation between the 2 signals, but with SCE, you need to apply both signals in parallel, because you don't know the sender type.
>
>     Yes, that's exactly what we do - and it does work.
>
>     > - So either the sender needs to ignore CE if it gets SCE, or ignore SCE if you get CE. The first is dangerous if you have multiple bottlenecks, and the second is defeating the purpose of SCE. Any other combination leads to unfairness (double response).
>
>     This is a false dichotomy. We quickly realised both of those options were unacceptable, and sought a third way.
>
>     SCE senders apply a reduced CE response when also responding to parallel SCE feedback, roughly in line with ABE, on the grounds that responding to SCE does some of the necessary reduction already.
>     The reduced response is still a Multiplicative Decrease, so it fits with normal TCP congestion control principles.
>
>     > - you separate the signals in queue depth, first applying SCE and later CE, as you originally proposed, but that results in starvation for SCE.
>
>     Yes, although this approach gives the best performance for SCE when used with flow isolation, or when all flows are known to be SCE-aware. So we apply this strategy in those cases, and move the SCE marking function up to overlap CE marking specifically for single queues.
>
>     It has been suggested that single queue AQMs are rare in any case, but this approach covers that corner case.
>
>     > Add on top that SCE makes it impossible to use DualQ, as you cannot differentiate the traffic types.
>
>     SCE is designed around not *needing* to differentiate the traffic types. Single queues have known disadvantages, and SCE doesn't worsen them.
>
>     Meanwhile, we have proposed LFQ to cover the DualQ use case. I'd be interested in hearing a principled critique of it.
>
>     - Jonathan Morton
>
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane

-- 
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
I keep trying to stay out of this conversation being = yellow about ecn in the first place, in any form. I would like to stress th= at=C2=A0
ecn-sane was formed by the group of folk that were conce= rned about having accidentally masterminded the worlds biggest fq=C2=A0+ aq= m
deployment, and the only one with ecn support, which happens=C2= =A0

In the case of wifi, the deployment is now in = the 10s of millions, and doing hordes of good - latencies measured in the 1= 0s of ms rather than 10s of seconds.

I have seen n= o numbers on how well l4s will make it over to wifi as yet, nor any discuss= ion, and I would rather like more pieces of the l4s solution to land suffic= iently integrated for testing using tools like flent, and over far more tha= n just a isochronous mac layer like dsl or docsis. Given the size of a txop= in wifi (5.3ms), and how far back we have
to put the AQM and FQ = components today (2 txops), I don't think many of either SCE or L4S con= cepts will work well on wifi... but in general
I prefer not to ma= ke assertions or assumptions until real-world testing can commence.=C2=A0

I am presently at the battlemesh conference trying = to get a bit of real-world data.

A big problem wif= i and 3g have is too many retransmits at the mac layer, not congestion cont= rolled. Any signalling gets there late, and it's
better to dr= op a bunch of packets when you hit a bunch of retransmits, in general. IMHO= .=C2=A0

On Wed, Jul 10, 2019 at 2:05 AM De Schepper, Koen (N= okia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
Hi Jake,

>> I agree the key question for this discussion is about how best to = get low latency for the internet.
Thanks

>> under the L4S approach for ECT(1), we can achieve it with either d= ualq or fq at the bottleneck, but under the SCE approach we can only do it = with fq at the bottleneck.
Correct

>> we agree that in neither case can very low latency be achieved wit= h a classic single queue with classic bandwidth-seeking traffic
Correct, not without compromising latency for Prague or throughput/utilizat= ion/stability/drop for Reno/Cubic

>> Are you saying that even if a scalable FQ can be implemented in hi= gh-volume aggregated links at the same cost and difficulty as dualq, there&= #39;s a reason not to use FQ?
=C2=A0
FQ for "per-user" isolation in access equipment has clearly an ex= tra cost, not?

I've argued in the past= that hashing is a bog standard part of most network cards and switches alr= eady.=C2=A0

"extra cost" should be measu= red by actual measurements. Usually when you do those, you find it's an= other variable entirely costing you the most
cpu/circuits.
<= div>

If we need to implement FQ "per-flow" on top, we need 2 level= s of FQ (per-user and per-user-flow, so from thousands to millions of queue= s). Also, I haven=E2=80=99t seen DC switches coming with an FQ AQM...

Meh. Most of the time the instantaneous numb= er of queues for some measurement of instantenious is in the low hundreds f= or rates up to
10GigE. We don't have a lot of data for bigger= pipes.=C2=A0

I haven't seen any DC switches w= ith support anything other than RED or AFD, and DC folk overprovision anywa= y.

=C2=A0
>> Is there a use case where it's necessary to avoid strict isola= tion if strict isolation can be accomplished as cheaply?

Even if as cheaply, as long as there is no reliable flow identification, it= clearly has side effects. Many homeworkers are using a VPN tunnel, which i= s only one flow encapsulating maybe dozens.

This is true. For a local endpoint for a vpn from a router fq_codel long = ago gained support for doing the hashing & FQ before entering the tunne= l.

This works only with in-kernel ipsec transports= although I've been trying to get it added to wireguard for a long time= now.

=C2=A0It of course doesn't apply to the = whole path, but when applied at the home gateway router (bottleneck link), = works rather well.

Here are two examples of that m= echanism in play.



D= rop and ECN (if implemented correctly) are tunnel agnostic. Also how flows = are identified might evolve (new transport protocols, encapsulations, ...?)= . Also if strict flow isolation could be done correctly, it has additional = issues related to missed scheduling opportunities, besides it is a hard-cod= ed throughput policy (and even mice size =3D 1 packet). On the other hand, = flow isolation has benefits too, so hard to rule out one of them, not?
<= /blockquote>

The packet dissector in linux is quite robu= st, the one in BSD, less so.

A counterpoint to the= entire ECN debate (l4s or sce) that I'd like to make at more length is= that it can and does hurt non ecn'd flows, particularly at lower
=
bandwidths when you cannot reduce cwnd below 2 and the link is thus sa= turated. ARP can starve. ISIS fails. batman - lacking an IP header -=C2=A0 = can starve.
babel, lacking ecn support can start to fail. And so = on.


>> Also, I think if the SCE position is "low latency can only be= achieved with FQ", that's different from "forcing only FQ on= the internet", provided the fairness claims hold up, right?=C2=A0 (Cl= assic single queue AQMs may still have a useful place in getting pretty-goo= d latency in the cheapest hardware, like maybe PIE with marking.)

Are you saying that the real good stuff can only be for FQ =F0=9F=98=89? Fa= irness between a flow getting only one signal and another getting 2 is an i= ssue, right? The one with the 2 signals can either ignore one, listen half = to both, or try to smooth both signals to find the average loudest one? Aga= in safety or performance needs to be chosen. PIE or PI2 is optimal for Clas= sic traffic and good to couple congestion to Prague traffic, but Prague tra= ffic needs a separate Q and an immediate step to get the "good stuff&q= uot; working. Otherwise it will also overshoot, respond sluggish, etc...
>> Anyway, to me this discussion is about the tradeoffs between the 2= proposals.=C2=A0 It seems to me SCE has some safety advantages that should= not be thrown away lightly,

I appreciate the efforts of trying to improve L4S, but nobody working on L4= S for years now see a way that SCE can work on a non-FQ system. For me (and= I think many others) it is a no-go to only support FQ. Unfortunately we on= ly have half a bit free, and we need to choose how to use it. Would you cho= ose for the existing ECN switches that cannot be upgraded (are there any?) = or for all future non-FQ systems.


=C2=A0
>> so if the performance can be made equivalent, it would be good to = know about it before committing the codepoint.

The performance in FQ is clearly equivalent,

Huh?
=C2=A0
but for a common-Q behavior, only L4S can work. As far as I understood= the SCE-LFQ proposal is actually a slower FQ implementation (an FQ in Dual= Q disguise =F0=9F=98=89), so I think not really a better alternative than p= ure FQ. Also its single AQM on the bulk queue will undo any isolation, as a= coupled AQM is stronger than any scheduler, including FQ. Don't undere= stimate the power of congestion control =F0=9F=98=89. The ultimate proof is= in the DualQ Coupled AQM where congestion control can beat a priority sche= duler. If you want FQ to have effect, you need to have an AQM per FQ... The= authors will notice this when they implement an AQM on top of it. I saw th= e current implementation works only in taildrop mode. But I think it is ver= y good that the SCE proponents are very motivated to try with this speed to= improve L4S. I'm happy to be proven wrong, but up to now I don't s= ee any promising improvements to justify delay for L4S, only the above alte= rnative compromise. Agreed that we can continue exploring alternative propo= sal in parallel though.


I cannot parse this extreme set of ass= umptions and declarations. "taildrop mode??"

=
As for promising improvements in general, there is a 7 year old deploy= ment, running code,=C2=A0 of something that we've show to work well in = a variety
of network scenarios, with 10x-100x improvements in net= work latency, at roughly 100% in linux overall, widely used in wifi and in = many, many SQM/Qos systems and containers, with basic rfc3168 ecn enabled..= . and a proposal for a backward compatible way of enhancing that still more= being explored. The embedded hardware pipeline
for future implem= entations of this tech is full - it would take 3+ years to make a course ch= ange....=C2=A0

vs something that still has no real= -world deployment data at all, that changes the definition of ecn, that has= not a public ns2 or n3 model (?), no testing aside from a few
ve= ry specific benchmarks, and so on...

I do hope the= coding competition heats up more, with more running code that others can e= xplore, most of all. I long ago tired of the endless debates, as everyone k= nows,
and I do kind of wish I wasn't burning lunch on this em= ail instead of setting up a test at battlemesh.

I = note also that my leanings - in a fq_codel'd world, were it to stay suc= h, was to enable more RTT based CCs=C2=A0 like BBRto work more often in an = RTT mode, and thus
we start - originally to me, the SCE idea was = a way to trigger a faster switch to congestion avoidance - as most of my ca= ptures taken from over used APs in
restaurants, cafes, train stat= ions etc shows stuff in slow start to be the biggest problem - and, regardl= ess, an initial CE, right now, is a strong indicator that fq-codel is prese= nt, and
a RTT based tcp can thus start to happen, and a good one,= would not have many future marks after the first.

A big difference in our outlooks, I guess, is that my viewpoint is that mo= st of the congestion is at the edges of the network and I don't care al= l that
much about big iron or switches, and I don't think eit= her can afford much aqm tech at all in the first place. Not dual queues, no= t fqs.

Were L4S not to deploy (using ect1 as a mar= ker - btw, I think CS5 might be a better candidate as it goes into the wifi= VI queue), and a fq_pie/fq_codel/sch_cake
world to remain predom= inant, well, we might get somewhere, faster, where it counted.
Koen.


-----Original Message-----
From: Holland, Jake <jholland@akamai.com>
Sent: Monday, July 8, 2019 10:56 PM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-l= abs.com>; Jonathan Morton <chromatix99@gmail.com>
Cc: ecn= -sane@lists.bufferbloat.net; tsvwg@ietf.org
Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

Hi Koen,

I'm a bit confused by this response.

I agree the key question for this discussion is about how best to get low l= atency for the internet.

If I'm reading your message correctly, you're saying that under the= L4S approach for ECT(1), we can achieve it with either dualq or fq at the = bottleneck, but under the SCE approach we can only do it with fq at the bot= tleneck.

(I think I understand and roughly agree with this claim, subject to some caveats. I just want to make sure I've got this right so far, and that we agree that in neither case can very low latency be achieved with a classic single queue with classic bandwidth-seeking traffic.)

Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ? Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?

Also, I think if the SCE position is "low latency can only be achieved with FQ", that's different from "forcing only FQ on the internet", provided the fairness claims hold up, right? (Classic single queue AQMs may still have a useful place in getting pretty-good latency in the cheapest hardware, like maybe PIE with marking.)

Anyway, to me this discussion is about the tradeoffs between the 2 proposals. It seems to me SCE has some safety advantages that should not be thrown away lightly, so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.

Best regards,
Jake

On 2019-07-08, 03:26, "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com> wrote:

    Hi Jonathan,

    From your responses below, I have the impression you think this discussion is about FQ (flow/fair queuing). Fair queuing is used today where strict isolation is wanted, like between subscribers, and by extension (if possible and preferred) per transport-layer flow, like in Fixed CPEs and Mobile networks. There is no discussion about this, and assuming we have, and still will have, an Internet which needs to support both common queues (as DualQ is intended to be) and FQs, I think the only discussion point is how we want to migrate to an Internet that optimally supports Low Latency.

    This leads us to the question L4S or SCE?

    If we want to support low latency for both common queues and FQs we "NEED" L4S, if we need to support it only for FQs, we "COULD" use SCE too, and if we want to force the whole Internet to use only FQs, we "SHOULD" use SCE 😉. If your goal is to force only FQs in the Internet, then let this be clear... I assume we need a discussion on another level in that case (and to be clear, it is not a goal I can support)...

    Koen.


    -----Original Message-----
    From: Jonathan Morton <chromatix99@gmail.com>
    Sent: Friday, July 5, 2019 10:51 AM
    To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
    Cc: Bob Briscoe <ietf@bobbriscoe.net>; ecn-sane@lists.bufferbloat.net; <tsvwg@ietf.org>
    Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

    > On 5 Jul, 2019, at 9:46 am, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
    >
    >>> 2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.
    >
    > Before jumping to another point, let's close down your original issue. Since you didn't mention, I assume that you agree with the following, right?
    >
    >        "You cannot defeat a DualQ" (at least no more than a single Q)

    I consider forcibly degrading DualQ to single-queue mode to be a defeat. However…

    >>> But that's exactly the problem. Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.
    >
    > With L4S a single queue can differentiate between Classic and L4S traffic. That's why it knows exactly how to treat the traffic. For Non-ECT and ECT(0) square the probability, and for ECT(1) don't square, and it works exactly like a DualQ, but then without the latency isolation. Both types get the same throughput, AND delay. See the PI2 paper, which is exactly about a single Q.
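The single-queue rule described above maps onto a few lines, assuming a base probability `p_prime` already produced by a PI controller (the controller itself is omitted). Classic traffic sees the square of that probability and ECT(1) sees it directly; grouping CE with ECT(1) follows the L4S drafts' classifier. Any coupling factor applied to the L4S probability is left out, so this is a sketch of the squaring idea rather than of PI2 as published.

```python
import random

# ECN field codepoints (RFC 3168).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def pi2_single_queue(ecn: int, p_prime: float, rng: random.Random) -> str:
    """Decide 'pass', 'mark' or 'drop' for one packet in a single shared queue.

    p_prime is the base probability from the PI controller, in [0, 1].
    """
    if ecn in (ECT1, CE):
        p = p_prime          # scalable (L4S) senders: linear signal
    else:
        p = p_prime ** 2     # Not-ECT and ECT(0): squared signal
    if rng.random() >= p:
        return "pass"
    return "drop" if ecn == NOT_ECT else "mark"


rng = random.Random(1)
n = 100_000
l4s_rate = sum(pi2_single_queue(ECT1, 0.2, rng) == "mark" for _ in range(n)) / n
classic_rate = sum(pi2_single_queue(NOT_ECT, 0.2, rng) == "drop" for _ in range(n)) / n
print(round(l4s_rate, 3), round(classic_rate, 3))   # expected near 0.2 and 0.04
```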

    Okay, this is an important point: the real assertion is not that DualQ itself is needed for L4S to be safe on the Internet, but that differential AQM treatment must be present at the bottleneck. Defeating DualQ only destroys L4S' latency advantage over "classic" traffic. We might actually be making progress here!

    > I agree you cannot isolate in a single Q, and this is why L4S is better than SCE, because it tells the AQM what to do, even if it has a single Q. SCE needs isolation, L4S does not.

    Devil's advocate time. What if, instead of providing differential treatment WRT CE marking, PI2 instead applied both marking strategies simultaneously - the higher rate using SCE, and the lower rate using CE? Classic traffic would see only the latter; L4S could use the former.
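This devil's-advocate variant can be sketched with one uniform draw per packet, assuming SCE's convention that an AQM signals by changing ECT(0) to ECT(1): draws below `p_prime**2` become CE, and remaining draws below `p_prime` become SCE marks. Using PI2's squared relation to set the two rates is an assumption made for illustration. A packet carries only one ECN codepoint, so CE takes precedence over SCE.

```python
import random

# ECN field codepoints (RFC 3168); under SCE, ECT(0) -> ECT(1) is the SCE mark.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def sce_single_queue(ecn: int, p_prime: float, rng: random.Random) -> tuple[int, bool]:
    """Apply both signals from one base probability; returns (new_ecn, dropped)."""
    u = rng.random()
    if ecn == NOT_ECT:
        return ecn, u < p_prime ** 2          # non-ECN traffic: drop at the lower rate
    if ecn in (ECT0, ECT1) and u < p_prime ** 2:
        return CE, False                      # lower rate: CE, seen by every ECN sender
    if ecn == ECT0 and u < p_prime:
        return ECT1, False                    # higher rate: SCE mark, ignored by classic ECN
    return ecn, False                         # CE stays CE; otherwise unchanged


rng = random.Random(7)
results = [sce_single_queue(ECT0, 0.2, rng)[0] for _ in range(100_000)]
ce_rate = results.count(CE) / len(results)     # expected near 0.2**2 = 0.04
sce_rate = results.count(ECT1) / len(results)  # expected near 0.2 - 0.04 = 0.16
```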

    > We tried similar things to what SCE needs years ago, and found that they can't work. For throughput fairness you need the squared relation between the 2 signals, but with SCE, you need to apply both signals in parallel, because you don't know the sender type.
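The squared relation follows from the usual steady-state window approximations, stated loosely here: a Reno-like classic sender's window scales as one over the square root of its signal probability, a scalable sender's as one over the probability itself. The constants a and b depend on the congestion controls in use (for Reno, a ≈ 1.22 from the Mathis et al. approximation), and k is a coupling constant applied to the scalable probability. Driving both signals from one base probability p' removes the dependence on p':

```latex
\begin{aligned}
W_{\mathrm{classic}} &\approx \frac{a}{\sqrt{p_{\mathrm{classic}}}}, &
W_{\mathrm{scalable}} &\approx \frac{b}{p_{\mathrm{scalable}}} \\
p_{\mathrm{classic}} &= (p')^{2}, &
p_{\mathrm{scalable}} &= k\,p' \\
\Rightarrow\; W_{\mathrm{classic}} &\approx \frac{a}{p'}, &
W_{\mathrm{scalable}} &\approx \frac{b}{k\,p'} \\
\Rightarrow\; \frac{W_{\mathrm{classic}}}{W_{\mathrm{scalable}}} &\approx \frac{a\,k}{b}
\end{aligned}
```

Without the square (p_classic = p'), the same ratio becomes (a k / b)·√p', so the balance would drift with the level of congestion.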

    Yes, that's exactly what we do - and it does work.

    >   - So either the sender needs to ignore CE if it gets SCE, or ignore SCE if it gets CE. The first is dangerous if you have multiple bottlenecks, and the second defeats the purpose of SCE. Any other combination leads to unfairness (double response).

    This is a false dichotomy. We quickly realised both of those options were unacceptable, and sought a third way.

    SCE senders apply a reduced CE response when also responding to parallel SCE feedback, roughly in line with ABE, on the grounds that responding to SCE does some of the necessary reduction already. The reduced response is still a Multiplicative Decrease, so it fits with normal TCP congestion control principles.
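The reduced CE response can be sketched as below, assuming a Reno-like window. The thread gives no actual SCE constants, so every value here is a placeholder: 0.8 echoes the Reno backoff that ABE (RFC 8511) suggests for ECN, and the per-mark SCE step is invented. The only property taken from the text is that the CE response becomes a smaller multiplicative decrease when SCE feedback has already trimmed the window.

```python
from dataclasses import dataclass


@dataclass
class SceSenderSketch:
    """Hypothetical SCE-aware window response; all parameter values are assumptions."""
    cwnd: float = 100.0             # congestion window, in segments
    beta_ce: float = 0.5            # plain multiplicative decrease on CE
    beta_ce_with_sce: float = 0.8   # gentler decrease when SCE feedback is also acted on
    sce_step: float = 1.0 / 64      # fractional decrease per SCE-marked ACK (placeholder)
    recent_sce: bool = False        # has SCE feedback arrived in the current round trip?

    def on_sce(self) -> None:
        self.recent_sce = True
        self.cwnd = max(2.0, self.cwnd * (1.0 - self.sce_step))

    def on_ce(self) -> None:
        # Still a multiplicative decrease, just a smaller one if SCE already trimmed cwnd.
        beta = self.beta_ce_with_sce if self.recent_sce else self.beta_ce
        self.cwnd = max(2.0, self.cwnd * beta)

    def on_round_trip_end(self) -> None:
        self.recent_sce = False


classic = SceSenderSketch()
classic.on_ce()                     # 100 -> 50

sce = SceSenderSketch()
sce.on_sce()                        # 100 -> 98.4375
sce.on_ce()                         # 98.4375 -> about 78.75
```

Because both paths stay multiplicative, a flow hearing only CE (a classic bottleneck) still halves, while a flow already slowed by SCE avoids a full double response.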

    >   - you separate the signals in queue depth, first applying SCE and later CE, as you originally proposed, but that results in starvation for SCE.

    Yes, although this approach gives the best performance for SCE when used with flow isolation, or when all flows are known to be SCE-aware. So we apply this strategy in those cases, and move the SCE marking function up to overlap CE marking specifically for single queues.
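The two placements can be sketched as marking ramps over queue sojourn time. The thresholds (1, 5, 15 and 25 ms) and the linear shape are hypothetical, chosen only to show the geometry: in the staggered layout SCE marking saturates before CE begins, which suits isolated or all-SCE traffic but would let classic flows in a shared queue crowd SCE flows out; in the overlapped layout both signals start at the same delay.

```python
def ramp(x: float, lo: float, hi: float) -> float:
    """Linear ramp from 0 at lo to 1 at hi, clamped to [0, 1]."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def marking_probabilities(sojourn_ms: float, single_queue: bool) -> tuple[float, float]:
    """Return (p_sce, p_ce) for a packet's queue sojourn time; thresholds are hypothetical."""
    p_ce = ramp(sojourn_ms, 5.0, 25.0)
    if single_queue:
        # Unknown mix of senders in one queue: SCE ramp moved up to overlap CE,
        # so classic flows are signalled at the same delays as SCE flows.
        p_sce = ramp(sojourn_ms, 5.0, 15.0)
    else:
        # Flow isolation, or all flows SCE-aware: SCE saturates before CE begins.
        p_sce = ramp(sojourn_ms, 1.0, 5.0)
    return p_sce, p_ce
```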

    It has been suggested that single queue AQMs are rare in any case, but this approach covers that corner case.

    > On top of that, SCE makes it impossible to use DualQ, as you cannot differentiate the traffic types.

    SCE is designed around not *needing* to differentiate the traffic types. Single queues have known disadvantages, and SCE doesn't worsen them.

    Meanwhile, we have proposed LFQ to cover the DualQ use case. I'd be interested in hearing a principled critique of it.

     - Jonathan Morton



_______________________________________________
Ecn-sane mailing list
Ecn-sane@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/ecn-sane


--

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
--000000000000d15a16058d537591--