[Ecn-sane] [tsvwg] Comments on L4S drafts
De Schepper, Koen (Nokia - BE/Antwerp)
koen.de_schepper at nokia-bell-labs.com
Wed Jul 10 13:32:38 EDT 2019
Hi Dave,
Sorry for your lunch, but maybe I’ve cut away too much context, as I think some of your responses are not really about the discussion point.
In general I see that we both agree that FQ has pros and cons, and is deployed and useful, so there is no need for further discussion on FQ. The actual discussion is about whether we still need to support low latency on non-FQ systems, or whether low latency is a privilege of FQ systems only.
>>> so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.
>>
>>The performance in FQ is clearly equivalent,
>
>Huh?
My point was that SCE on FQ can give results equivalent to L4S on FQ, and I think everyone agrees here too.
But I want to make clear that SCE only works with FQ with an AQM per queue:
>> but for a common-Q behavior, only L4S can work. As far as I understood the SCE-LFQ proposal is actually
>> a slower FQ implementation (an FQ in DualQ disguise 😉), so I think not really a better alternative than
>> pure FQ. Also its single AQM on the bulk queue will undo any isolation, as a coupled AQM is stronger than
>> any scheduler, including FQ. Don't underestimate the power of congestion control 😉. The ultimate proof
>> is in the DualQ Coupled AQM where congestion control can beat a priority scheduler. If you want FQ to
>> have effect, you need to have an AQM per FQ... The authors will notice this when they implement an AQM
>> on top of it. I saw the current implementation works only in taildrop mode. But I think it is very good that
>> the SCE proponents are very motivated to try with this speed to improve L4S. I'm happy to be proven wrong,
>> but up to now I don't see any promising improvements to justify delay for L4S, only the above alternative
>> compromise. Agreed that we can continue exploring alternative proposal in parallel though.
>
> I cannot parse this extreme set of assumptions and declarations. "taildrop mode??"
Context: Common-Q behavior means one common queue or a set of common queues (like DualQ) with one
coupled AQM that does not identify every flow, only traffic classes (Classic or L4S).
If you re-read the section with this context, you will see that it is not about FQ (we agree that
both L4S and SCE work there) but about the LFQ (lightweight FQ) proposal, which seems to claim to be a DualQ but is
actually an FQ that needs more time to select a packet at dequeue. It also has a common AQM on top of all bulk
virtual FQ queues. As you probably agree, you need an AQM per queue if you want to benefit from FQ; otherwise congestion
control will take over and FQ behaves like a single queue. This is especially important if the congestion controls are not
compatible, because you need to identify the traffic classes to give a differentiated AQM treatment to the different
classes, hence the need for L4S...
I hope this clarifies,
Koen.
From: Dave Taht <dave.taht at gmail.com>
Sent: Wednesday, July 10, 2019 3:15 PM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper at nokia-bell-labs.com>
Cc: Holland, Jake <jholland at akamai.com>; Jonathan Morton <chromatix99 at gmail.com>; ecn-sane at lists.bufferbloat.net; tsvwg at ietf.org
Subject: Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
I keep trying to stay out of this conversation being yellow about ecn in the first place, in any form. I would like to stress that
ecn-sane was formed by the group of folk that were concerned about having accidentally masterminded the world's biggest fq + aqm
deployment, and the only one with ecn support, which happens
In the case of wifi, the deployment is now in the 10s of millions, and doing hordes of good - latencies measured in the 10s of ms rather than 10s of seconds.
I have seen no numbers on how well l4s will make it over to wifi as yet, nor any discussion, and I would rather like more pieces of the l4s solution to land sufficiently integrated for testing using tools like flent, and over far more than just an isochronous mac layer like dsl or docsis. Given the size of a txop in wifi (5.3ms), and how far back we have
to put the AQM and FQ components today (2 txops), I don't think many of either SCE or L4S concepts will work well on wifi... but in general
I prefer not to make assertions or assumptions until real-world testing can commence.
I am presently at the battlemesh conference trying to get a bit of real-world data.
A big problem wifi and 3g have is too many retransmits at the mac layer, not congestion controlled. Any signalling gets there late, and it's
better to drop a bunch of packets when you hit a bunch of retransmits, in general. IMHO.
On Wed, Jul 10, 2019 at 2:05 AM De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper at nokia-bell-labs.com> wrote:
Hi Jake,
>> I agree the key question for this discussion is about how best to get low latency for the internet.
Thanks
>> under the L4S approach for ECT(1), we can achieve it with either dualq or fq at the bottleneck, but under the SCE approach we can only do it with fq at the bottleneck.
Correct
>> we agree that in neither case can very low latency be achieved with a classic single queue with classic bandwidth-seeking traffic
Correct, not without compromising latency for Prague or throughput/utilization/stability/drop for Reno/Cubic
>> Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ?
FQ for "per-user" isolation in access equipment clearly has an extra cost, no?
I've argued in the past that hashing is a bog standard part of most network cards and switches already.
"extra cost" should be measured by actual measurements. Usually when you do those, you find it's another variable entirely costing you the most
cpu/circuits.
If we need to implement FQ "per-flow" on top, we need 2 levels of FQ (per-user and per-user-flow, so from thousands to millions of queues). Also, I haven’t seen DC switches coming with an FQ AQM...
Meh. Most of the time the instantaneous number of queues, for some measurement of instantaneous, is in the low hundreds for rates up to
10GigE. We don't have a lot of data for bigger pipes.
I haven't seen any DC switches that support anything other than RED or AFD, and DC folk overprovision anyway.
>> Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?
Even if as cheaply, as long as there is no reliable flow identification, it clearly has side effects. Many homeworkers are using a VPN tunnel, which is only one flow encapsulating maybe dozens.
This is true. For a VPN whose local endpoint is the router, fq_codel long ago gained support for doing the hashing & FQ before entering the tunnel.
This works only with in-kernel ipsec transports, although I've been trying to get it added to wireguard for a long time now.
It of course doesn't apply to the whole path, but when applied at the home gateway router (bottleneck link), works rather well.
Here are two examples of that mechanism in play.
http://www.taht.net/~d/ipsec_fq_codel/oldqos.png
http://www.taht.net/~d/ipsec_fq_codel/newqos.png
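To make the tunnel point concrete, here is a minimal sketch of why the place where the flow hash is taken matters. It is not fq_codel's actual hash or dissector; the function name `flow_queue`, the bucket count, and all addresses and ports are hypothetical values chosen for illustration only.

```python
import hashlib

NUM_QUEUES = 1024  # hypothetical number of FQ buckets


def flow_queue(src, dst, proto, sport, dport, num_queues=NUM_QUEUES):
    """Map a 5-tuple to a queue index with a stable hash (illustrative only)."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_queues


# Three inner flows from one home worker, all carried by a single VPN tunnel.
inner_flows = [
    ("10.0.0.2", "192.0.2.10", "tcp", 40000, 443),
    ("10.0.0.2", "192.0.2.20", "tcp", 40001, 22),
    ("10.0.0.2", "192.0.2.30", "udp", 40002, 5060),
]

# Outer header after encapsulation: identical for every inner flow.
outer = ("198.51.100.7", "203.0.113.9", "udp", 4500, 4500)

# Hashing after encapsulation: every packet carries the same outer 5-tuple.
queues_after_tunnel = {flow_queue(*outer) for _ in inner_flows}

# Hashing before encapsulation: the inner 5-tuples are still visible.
queues_before_tunnel = {flow_queue(*f) for f in inner_flows}

print("queues used when hashing the outer header:", len(queues_after_tunnel))
print("queues used when hashing the inner headers:", len(queues_before_tunnel))
```

Hashing on the outer header always yields a single queue, so the inner flows share one FQ slot; hashing before encapsulation lets them spread across buckets (barring hash collisions).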
Drop and ECN (if implemented correctly) are tunnel agnostic. Also how flows are identified might evolve (new transport protocols, encapsulations, ...?). Also, even if strict flow isolation could be done correctly, it has additional issues related to missed scheduling opportunities; besides, it is a hard-coded throughput policy (and even mice size = 1 packet). On the other hand, flow isolation has benefits too, so it is hard to rule out either of them, no?
The packet dissector in linux is quite robust, the one in BSD, less so.
A counterpoint to the entire ECN debate (l4s or sce) that I'd like to make at more length is that it can and does hurt non ecn'd flows, particularly at lower
bandwidths when you cannot reduce cwnd below 2 and the link is thus saturated. ARP can starve. ISIS fails. batman - lacking an IP header - can starve.
babel, lacking ecn support can start to fail. And so on.
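A back-of-the-envelope sketch of the cwnd floor mentioned above: if marked flows cannot go below a window of 2 segments, N of them keep at least 2*N segments in flight no matter how many marks they receive. The MSS, flow count, RTT and link rate below are assumed example numbers, not measurements.

```python
def min_aggregate_rate_bps(num_flows, rtt_ms, mss_bytes=1448, min_cwnd=2):
    """Lowest aggregate rate of flows that are pinned at min_cwnd segments."""
    return num_flows * min_cwnd * mss_bytes * 8 * 1000 / rtt_ms


def standing_rtt_ms(num_flows, link_bps, mss_bytes=1448, min_cwnd=2):
    """RTT at which the pinned windows exactly fill the link.

    If this exceeds the base RTT, the difference is queueing delay
    that ECN marking alone cannot remove.
    """
    return num_flows * min_cwnd * mss_bytes * 8 * 1000 / link_bps


# Assumed example: 20 ECN flows, 20 ms base RTT, 1 Mbit/s bottleneck.
floor_bps = min_aggregate_rate_bps(20, 20)
print(f"rate floor at base RTT: {floor_bps / 1e6:.2f} Mbit/s")
print(f"RTT needed to fit a 1 Mbit/s link: {standing_rtt_ms(20, 1_000_000):.1f} ms")
```

With these assumed numbers the rate floor is far above the link rate, so the queue can only be held down by dropping, and anything sharing that queue without ECN (or without IP) bears the loss.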
>> Also, I think if the SCE position is "low latency can only be achieved with FQ", that's different from "forcing only FQ on the internet", provided the fairness claims hold up, right? (Classic single queue AQMs may still have a useful place in getting pretty-good latency in the cheapest hardware, like maybe PIE with marking.)
Are you saying that the real good stuff can only be for FQ 😉? Fairness between a flow getting only one signal and another getting 2 is an issue, right? The one with the 2 signals can either ignore one, half-listen to both, or try to smooth both signals to find the average loudest one? Again, safety or performance needs to be chosen. PIE or PI2 is optimal for Classic traffic and good to couple congestion to Prague traffic, but Prague traffic needs a separate Q and an immediate step to get the "good stuff" working. Otherwise it will also overshoot, respond sluggishly, etc...
>> Anyway, to me this discussion is about the tradeoffs between the 2 proposals. It seems to me SCE has some safety advantages that should not be thrown away lightly,
I appreciate the efforts to improve L4S, but nobody who has worked on L4S for years now sees a way that SCE can work on a non-FQ system. For me (and I think many others) it is a no-go to only support FQ. Unfortunately we only have half a bit free, and we need to choose how to use it. Would you choose in favor of the existing ECN switches that cannot be upgraded (are there any?) or of all future non-FQ systems?
>> so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.
The performance in FQ is clearly equivalent,
Huh?
but for a common-Q behavior, only L4S can work. As far as I understood the SCE-LFQ proposal is actually a slower FQ implementation (an FQ in DualQ disguise 😉), so I think not really a better alternative than pure FQ. Also its single AQM on the bulk queue will undo any isolation, as a coupled AQM is stronger than any scheduler, including FQ. Don't underestimate the power of congestion control 😉. The ultimate proof is in the DualQ Coupled AQM where congestion control can beat a priority scheduler. If you want FQ to have effect, you need to have an AQM per FQ... The authors will notice this when they implement an AQM on top of it. I saw the current implementation works only in taildrop mode. But I think it is very good that the SCE proponents are very motivated to try with this speed to improve L4S. I'm happy to be proven wrong, but up to now I don't see any promising improvements to justify delay for L4S, only the above alternative compromise. Agreed that we can continue exploring alternative proposal in parallel though.
I cannot parse this extreme set of assumptions and declarations. "taildrop mode??"
As for promising improvements in general, there is a 7 year old deployment, running code, of something that we've shown to work well in a variety
of network scenarios, with 10x-100x improvements in network latency, at roughly 100% in linux overall, widely used in wifi and in many, many SQM/Qos systems and containers, with basic rfc3168 ecn enabled... and a proposal for a backward compatible way of enhancing that still more being explored. The embedded hardware pipeline
for future implementations of this tech is full - it would take 3+ years to make a course change....
vs something that still has no real-world deployment data at all, that changes the definition of ecn, that has no public ns2 or ns3 model (?), no testing aside from a few
very specific benchmarks, and so on...
I do hope the coding competition heats up more, with more running code that others can explore, most of all. I long ago tired of the endless debates, as everyone knows,
and I do kind of wish I wasn't burning lunch on this email instead of setting up a test at battlemesh.
I note also that my leaning - in a fq_codel'd world, were it to stay such - was to enable more RTT based CCs like BBR to work more often in an RTT mode, and thus
we start - originally to me, the SCE idea was a way to trigger a faster switch to congestion avoidance - as most of my captures taken from overused APs in
restaurants, cafes, train stations etc show stuff in slow start to be the biggest problem - and, regardless, an initial CE, right now, is a strong indicator that fq-codel is present, and
an RTT based tcp can thus start to happen, and a good one would not have many future marks after the first.
A big difference in our outlooks, I guess, is that my viewpoint is that most of the congestion is at the edges of the network and I don't care all that
much about big iron or switches, and I don't think either can afford much aqm tech at all in the first place. Not dual queues, not fqs.
Were L4S not to deploy (using ect1 as a marker - btw, I think CS5 might be a better candidate as it goes into the wifi VI queue), and a fq_pie/fq_codel/sch_cake
world to remain predominant, well, we might get somewhere, faster, where it counted.
Koen.
-----Original Message-----
From: Holland, Jake <jholland at akamai.com>
Sent: Monday, July 8, 2019 10:56 PM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper at nokia-bell-labs.com>; Jonathan Morton <chromatix99 at gmail.com>
Cc: ecn-sane at lists.bufferbloat.net; tsvwg at ietf.org
Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
Hi Koen,
I'm a bit confused by this response.
I agree the key question for this discussion is about how best to get low latency for the internet.
If I'm reading your message correctly, you're saying that under the L4S approach for ECT(1), we can achieve it with either dualq or fq at the bottleneck, but under the SCE approach we can only do it with fq at the bottleneck.
(I think I understand and roughly agree with this claim, subject to some caveats. I just want to make sure I've got this right so far, and that we agree that in neither case can very low latency be achieved with a classic single queue with classic bandwidth-seeking
traffic.)
Are you saying that even if a scalable FQ can be implemented in high-volume aggregated links at the same cost and difficulty as dualq, there's a reason not to use FQ? Is there a use case where it's necessary to avoid strict isolation if strict isolation can be accomplished as cheaply?
Also, I think if the SCE position is "low latency can only be achieved with FQ", that's different from "forcing only FQ on the internet", provided the fairness claims hold up, right? (Classic single queue AQMs may still have a useful place in getting pretty-good latency in the cheapest hardware, like maybe PIE with
marking.)
Anyway, to me this discussion is about the tradeoffs between the
2 proposals. It seems to me SCE has some safety advantages that should not be thrown away lightly, so if the performance can be made equivalent, it would be good to know about it before committing the codepoint.
Best regards,
Jake
On 2019-07-08, 03:26, "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper at nokia-bell-labs.com> wrote:
Hi Jonathan,
From your responses below, I have the impression you think this discussion is about FQ (flow/fair queuing). Fair queuing is used today where strict isolation is wanted, like between subscribers, and by extension (if possible and preferred) per transport-layer flow, like in Fixed CPEs and Mobile networks. There is no discussion about this, and assuming we have and will continue to have an Internet which needs to support both common queues (as DualQ is intended for) and FQs, I think the only discussion point is how we want to migrate to an Internet that optimally supports low latency.
This leads us to the question L4S or SCE?
If we want to support low latency for both common queues and FQs we "NEED" L4S, if we need to support it only for FQs, we "COULD" use SCE too, and if we want to force the whole Internet to use only FQs, we "SHOULD" use SCE 😉. If your goal is to force only FQs in the Internet, then let this be clear... I assume we need a discussion on another level in that case (and to be clear, it is not a goal I can support)...
Koen.
-----Original Message-----
From: Jonathan Morton <chromatix99 at gmail.com>
Sent: Friday, July 5, 2019 10:51 AM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper at nokia-bell-labs.com>
Cc: Bob Briscoe <ietf at bobbriscoe.net>; ecn-sane at lists.bufferbloat.net; tsvwg at ietf.org
Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
> On 5 Jul, 2019, at 9:46 am, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper at nokia-bell-labs.com> wrote:
>
>>> 2: DualQ can be defeated by an adversary, destroying its ability to isolate L4S traffic.
>
> Before jumping to another point, let's close down your original issue. Since you didn't mention, I assume that you agree with the following, right?
>
> "You cannot defeat a DualQ" (at least no more than a single Q)
I consider forcibly degrading DualQ to single-queue mode to be a defeat. However…
>>> But that's exactly the problem. Single queue AQM does not isolate L4S traffic from "classic" traffic, so the latter suffers from the former's relative aggression in the face of AQM activity.
>
> With L4S a single queue can differentiate between Classic and L4S traffic. That's why it knows exactly how to treat the traffic. For Non-ECT and ECT(0) square the probability, and for ECT(1) don't square, and it works exactly like a DualQ, but then without the latency isolation. Both types get the same throughput, AND delay. See the PI2 paper, which is exactly about a single Q.
Okay, this is an important point: the real assertion is not that DualQ itself is needed for L4S to be safe on the Internet, but for differential AQM treatment to be present at the bottleneck. Defeating DualQ only destroys L4S' latency advantage over "classic" traffic. We might actually be making progress here!
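The differential treatment described in the quoted paragraph (square the probability for Not-ECT and ECT(0), leave it unsquared for ECT(1)) can be sketched roughly as below. This is only an illustration of that one sentence: the function names are hypothetical, and the coupling factor and the PI controller that produces the base probability in the PI2 paper are deliberately left out.

```python
def signal_probability(ecn_codepoint, p_base):
    """Per-packet congestion-signal probability in a single queue.

    p_base is assumed to come from the AQM's controller (not modelled here).
    Classic traffic sees the squared probability, ECT(1) sees it unsquared.
    """
    if ecn_codepoint == "ECT(1)":
        return p_base
    if ecn_codepoint in ("Not-ECT", "ECT(0)"):
        return p_base ** 2
    raise ValueError(f"unexpected codepoint: {ecn_codepoint}")


def signal_action(ecn_codepoint):
    """Not-ECT packets can only be dropped; ECN-capable packets are CE-marked."""
    return "drop" if ecn_codepoint == "Not-ECT" else "mark CE"


for cp in ("Not-ECT", "ECT(0)", "ECT(1)"):
    print(cp, signal_action(cp), round(signal_probability(cp, 0.1), 4))
```

The classifier needs only the ECN field of each packet, which is why it works in a single queue; what a single queue cannot give is separate queueing delay for the two classes.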
> I agree you cannot isolate in a single Q, and this is why L4S is better than SCE, because it tells the AQM what to do, even if it has a single Q. SCE needs isolation, L4S not.
Devil's advocate time. What if, instead of providing differential treatment WRT CE marking, PI2 instead applied both marking strategies simultaneously - the higher rate using SCE, and the lower rate using CE? Classic traffic would see only the latter; L4S could use the former.
> Years ago we tried things similar to what SCE needs, and found that they can't work. For throughput fairness you need the squared relation between the 2 signals, but with SCE, you need to apply both signals in parallel, because you don't know the sender type.
Yes, that's exactly what we do - and it does work.
> - So either the sender needs to ignore CE if it gets SCE, or ignore SCE if you get CE. The first is dangerous if you have multiple bottlenecks, and the second is defeating the purpose of SCE. Any other combination leads to unfairness (double response).
This is a false dichotomy. We quickly realised both of those options were unacceptable, and sought a third way.
SCE senders apply a reduced CE response when also responding to parallel SCE feedback, roughly in line with ABE, on the grounds that responding to SCE does some of the necessary reduction already. The reduced response is still a Multiplicative Decrease, so it fits with normal TCP congestion control principles.
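A minimal sketch of the sender behaviour just described, under stated assumptions: the decrease factors (0.5 for a plain CE response, 0.8 for the reduced one, the latter borrowed from ABE-style values) and the function name `on_ce` are placeholders for illustration, not values or code taken from the SCE drafts or implementation.

```python
def on_ce(cwnd, sce_feedback_active, beta_classic=0.5, beta_reduced=0.8, min_cwnd=2.0):
    """Multiplicative decrease on a CE mark.

    If the sender has also been responding to SCE feedback, part of the
    reduction has already happened, so a gentler factor is applied.
    The response is still a multiplicative decrease in both cases.
    """
    beta = beta_reduced if sce_feedback_active else beta_classic
    return max(min_cwnd, cwnd * beta)


print(on_ce(100.0, sce_feedback_active=False))  # classic CE response
print(on_ce(100.0, sce_feedback_active=True))   # reduced CE response
```

How large the reduced factor should be, and how "responding to parallel SCE feedback" is detected, are exactly the parts this sketch does not settle.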
> - you separate the signals in queue depth, first applying SCE and later CE, as you originally proposed, but that results in starvation for SCE.
Yes, although this approach gives the best performance for SCE when used with flow isolation, or when all flows are known to be SCE-aware. So we apply this strategy in those cases, and move the SCE marking function up to overlap CE marking specifically for single queues.
It has been suggested that single queue AQMs are rare in any case, but this approach covers that corner case.
> Add on top that SCE makes it impossible to use DualQ, as you cannot differentiate the traffic types.
SCE is designed around not *needing* to differentiate the traffic types. Single queues have known disadvantages, and SCE doesn't worsen them.
Meanwhile, we have proposed LFQ to cover the DualQ use case. I'd be interested in hearing a principled critique of it.
- Jonathan Morton
_______________________________________________
Ecn-sane mailing list
Ecn-sane at lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/ecn-sane
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740