[Ecn-sane] [tsvwg] Comments on L4S drafts

Dave Taht dave.taht at gmail.com
Wed Jul 10 09:14:31 EDT 2019


I keep trying to stay out of this conversation, being yellow about ecn in
the first place, in any form. I would like to stress that ecn-sane was
formed by the group of folk that were concerned about having accidentally
masterminded the world's biggest fq + aqm deployment, and the only one
with ecn support, which happens

In the case of wifi, the deployment is now in the 10s of millions, and
doing hordes of good - latencies measured in the 10s of ms rather than 10s
of seconds.

I have seen no numbers on how well l4s will make it over to wifi as yet,
nor any discussion, and I would rather like more pieces of the l4s solution
to land sufficiently integrated for testing using tools like flent, and
over far more than just an isochronous mac layer like dsl or docsis. Given
the size of a txop in wifi (5.3ms), and how far back we have
to put the AQM and FQ components today (2 txops), I don't think many of
either SCE or L4S concepts will work well on wifi... but in general
I prefer not to make assertions or assumptions until real-world testing can
commence.
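
To put rough numbers on that, here is a minimal sketch, assuming the 5.3ms
txop and the 2-txop depth quoted above, of how much delay and data can sit
below the AQM where it cannot act. The 100 Mbit/s rate is a hypothetical
figure chosen for illustration, not a measurement.

```python
TXOP_MS = 5.3          # txop size quoted in the text
TXOPS_BELOW_AQM = 2    # how far back the AQM/FQ sits today, per the text


def delay_below_aqm_ms(txop_ms=TXOP_MS, txops=TXOPS_BELOW_AQM):
    """Time represented by the txops queued below the AQM."""
    return txop_ms * txops


def bytes_below_aqm(rate_mbps, delay_ms):
    """Data that can be queued below the AQM at a given rate."""
    return rate_mbps * 1e6 / 8 * delay_ms / 1000


delay = delay_below_aqm_ms()
print(f"{delay:.1f} ms sits below the AQM")
print(f"{bytes_below_aqm(100, delay):.0f} bytes at a hypothetical 100 Mbit/s")
```

Any mark or drop has to be applied ahead of that window, so the figure is a
floor on how late the signal arrives, under these assumptions.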

I am presently at the battlemesh conference trying to get a bit of
real-world data.

A big problem wifi and 3g have is too many retransmits at the mac layer,
not congestion controlled. Any signalling gets there late, and it's
better to drop a bunch of packets when you hit a bunch of retransmits, in
general. IMHO.

On Wed, Jul 10, 2019 at 2:05 AM De Schepper, Koen (Nokia - BE/Antwerp) <
koen.de_schepper at nokia-bell-labs.com> wrote:

> Hi Jake,
>
> >> I agree the key question for this discussion is about how best to get
> low latency for the internet.
> Thanks
>
> >> under the L4S approach for ECT(1), we can achieve it with either dualq
> or fq at the bottleneck, but under the SCE approach we can only do it with
> fq at the bottleneck.
> Correct
>
> >> we agree that in neither case can very low latency be achieved with a
> classic single queue with classic bandwidth-seeking traffic
> Correct, not without compromising latency for Prague or
> throughput/utilization/stability/drop for Reno/Cubic
>
> >> Are you saying that even if a scalable FQ can be implemented in
> high-volume aggregated links at the same cost and difficulty as dualq,
> there's a reason not to use FQ?
>


> FQ for "per-user" isolation in access equipment has clearly an extra cost,
> not?


I've argued in the past that hashing is a bog standard part of most network
cards and switches already.

"extra cost" should be measured by actual measurements. Usually when you do
those, you find it's another variable entirely costing you the most
cpu/circuits.
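
As a rough illustration of why the hashing step itself is cheap, here is a
minimal sketch of mapping a flow 5-tuple to a queue index. CRC32 is only a
stand-in for whatever hash a given NIC or qdisc actually uses, and the
function name, salt parameter, and addresses are hypothetical.

```python
import zlib

NUM_QUEUES = 1024  # fq_codel's default flow count; illustrative here


def flow_queue(src, dst, sport, dport, proto, salt=0):
    """Map a 5-tuple to a queue index (CRC32 stands in for the real hash)."""
    key = f"{salt}|{src}|{dst}|{sport}|{dport}|{proto}".encode()
    return zlib.crc32(key) % NUM_QUEUES


# Packets of the same flow always land in the same queue.
a = flow_queue("10.0.0.2", "192.0.2.1", 40000, 443, 6)
b = flow_queue("10.0.0.2", "192.0.2.1", 40000, 443, 6)
print(a == b)
```

One hash and one modulo per packet is the whole classification cost in this
sketch; it says nothing about the memory or scheduling cost of the queues
themselves, which is where measurement is needed.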


> If we need to implement FQ "per-flow" on top, we need 2 levels of FQ
> (per-user and per-user-flow, so from thousands to millions of queues).
> Also, I haven’t seen DC switches coming with an FQ AQM...
>

Meh. Most of the time the instantaneous number of queues, for some
measurement of "instantaneous", is in the low hundreds for rates up to
10GigE. We don't have a lot of data for bigger pipes.

I haven't seen any DC switches that support anything other than RED or AFD,
and DC folk overprovision anyway.



> >> Is there a use case where it's necessary to avoid strict isolation if
> strict isolation can be accomplished as cheaply?
>
> Even if as cheaply, as long as there is no reliable flow identification,
> it clearly has side effects. Many homeworkers are using a VPN tunnel, which
> is only one flow encapsulating maybe dozens.


This is true. For a local endpoint for a vpn from a router, fq_codel long
ago gained support for doing the hashing & FQ before entering the tunnel.

This works only with in-kernel ipsec transports, although I've been trying
to get it added to wireguard for a long time now.

It of course doesn't apply to the whole path, but when applied at the home
gateway router (bottleneck link), it works rather well.

Here are two examples of that mechanism in play.

http://www.taht.net/~d/ipsec_fq_codel/oldqos.png

http://www.taht.net/~d/ipsec_fq_codel/newqos.png

> Drop and ECN (if implemented correctly) are tunnel agnostic. Also how flows
> are identified might evolve (new transport protocols, encapsulations,
> ...?). Also if strict flow isolation could be done correctly, it has
> additional issues related to missed scheduling opportunities, besides it is
> a hard-coded throughput policy (and even mice size = 1 packet). On the
> other hand, flow isolation has benefits too, so hard to rule out one of
> them, not?
>

The packet dissector in linux is quite robust, the one in BSD, less so.

A counterpoint to the entire ECN debate (l4s or sce) that I'd like to make
at more length is that it can and does hurt non ecn'd flows, particularly
at lower bandwidths when you cannot reduce cwnd below 2 and the link is
thus saturated. ARP can starve. ISIS fails. batman - lacking an IP header -
can starve. babel, lacking ecn support, can start to fail. And so on.
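
A back-of-envelope sketch of that saturation, under assumed numbers: the
link rate, RTT, flow count, and 1500-byte packets below are hypothetical,
not taken from any capture. Once every flow is pinned at cwnd 2, marking
cannot slow the senders further, so if their combined floor rate already
exceeds the link, the queue stays full.

```python
def min_aggregate_rate_bps(flows, rtt_s, mss_bytes=1500, min_cwnd=2):
    """Lowest total send rate when every flow sits at the cwnd floor."""
    return flows * min_cwnd * mss_bytes * 8 / rtt_s


LINK_BPS = 1_000_000  # hypothetical 1 Mbit/s bottleneck

floor_rate = min_aggregate_rate_bps(flows=8, rtt_s=0.05)
print(f"floor rate {floor_rate / 1e6:.2f} Mbit/s, link {LINK_BPS / 1e6:.2f} Mbit/s")
print("saturated even at minimum cwnd:", floor_rate > LINK_BPS)
```

In this assumed case the flows cannot back off enough no matter how many
marks they receive, which is the condition under which traffic that cannot
be marked has to fight for the remaining queue space.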


> >> Also, I think if the SCE position is "low latency can only be achieved
> with FQ", that's different from "forcing only FQ on the internet", provided
> the fairness claims hold up, right?  (Classic single queue AQMs may still
> have a useful place in getting pretty-good latency in the cheapest
> hardware, like maybe PIE with marking.)
>
> Are you saying that the real good stuff can only be for FQ 😉? Fairness
> between a flow getting only one signal and another getting 2 is an issue,
> right? The one with the 2 signals can either ignore one, listen half to
> both, or try to smooth both signals to find the average loudest one? Again
> safety or performance needs to be chosen. PIE or PI2 is optimal for Classic
> traffic and good to couple congestion to Prague traffic, but Prague traffic
> needs a separate Q and an immediate step to get the "good stuff" working.
> Otherwise it will also overshoot, respond sluggish, etc...
>
> >> Anyway, to me this discussion is about the tradeoffs between the 2
> proposals.  It seems to me SCE has some safety advantages that should not
> be thrown away lightly,
>
> I appreciate the efforts of trying to improve L4S, but nobody working on
> L4S for years now see a way that SCE can work on a non-FQ system. For me
> (and I think many others) it is a no-go to only support FQ. Unfortunately
> we only have half a bit free, and we need to choose how to use it. Would
> you choose for the existing ECN switches that cannot be upgraded (are there
> any?) or for all future non-FQ systems.
>
>


> >> so if the performance can be made equivalent, it would be good to know
> about it before committing the codepoint.
>
> The performance in FQ is clearly equivalent,


Huh?


> but for a common-Q behavior, only L4S can work. As far as I understood the
> SCE-LFQ proposal is actually a slower FQ implementation (an FQ in DualQ
> disguise 😉), so I think not really a better alternative than pure FQ. Also
> its single AQM on the bulk queue will undo any isolation, as a coupled AQM
> is stronger than any scheduler, including FQ. Don't underestimate the power
> of congestion control 😉. The ultimate proof is in the DualQ Coupled AQM
> where congestion control can beat a priority scheduler. If you want FQ to
> have effect, you need to have an AQM per FQ... The authors will notice this
> when they implement an AQM on top of it. I saw the current implementation
> works only in taildrop mode. But I think it is very good that the SCE
> proponents are very motivated to try with this speed to improve L4S. I'm
> happy to be proven wrong, but up to now I don't see any promising
> improvements to justify delay for L4S, only the above alternative
> compromise. Agreed that we can continue exploring alternative proposal in
> parallel though.
>
>
I cannot parse this extreme set of assumptions and declarations. "taildrop
mode??"

As for promising improvements in general, there is a 7 year old deployment,
running code, of something that we've shown to work well in a variety of
network scenarios, with 10x-100x improvements in network latency, at
roughly 100% in linux overall, widely used in wifi and in many, many
SQM/QoS systems and containers, with basic rfc3168 ecn enabled... and a
proposal for a backward compatible way of enhancing that still more being
explored. The embedded hardware pipeline for future implementations of this
tech is full - it would take 3+ years to make a course change....

vs something that still has no real-world deployment data at all, that
changes the definition of ecn, that has no public ns2 or ns3 model (?),
no testing aside from a few very specific benchmarks, and so on...

I do hope the coding competition heats up more, with more running code that
others can explore, most of all. I long ago tired of the endless debates,
as everyone knows,
and I do kind of wish I wasn't burning lunch on this email instead of
setting up a test at battlemesh.

I note also that my leanings - in a fq_codel'd world, were it to stay such -
were to enable more RTT based CCs like BBR to work more often in an RTT
mode. Originally, to me, the SCE idea was a way to trigger a faster switch
to congestion avoidance, as most of my captures taken from overused APs in
restaurants, cafes, train stations etc. show stuff in slow start to be the
biggest problem. Regardless, an initial CE, right now, is a strong
indicator that fq_codel is present, so an RTT based tcp can start to
happen, and a good one would not have many future marks after the first.

A big difference in our outlooks, I guess, is that my viewpoint is that
most of the congestion is at the edges of the network and I don't care all
that much about big iron or switches, and I don't think either can afford
much aqm tech at all in the first place. Not dual queues, not fqs.

Were L4S not to deploy (using ect1 as a marker - btw, I think CS5 might be
a better candidate as it goes into the wifi VI queue), and a
fq_pie/fq_codel/sch_cake world to remain predominant, well, we might get
somewhere, faster, where it counted.

> Koen.
>
>
> -----Original Message-----
> From: Holland, Jake <jholland at akamai.com>
> Sent: Monday, July 8, 2019 10:56 PM
> To: De Schepper, Koen (Nokia - BE/Antwerp) <
> koen.de_schepper at nokia-bell-labs.com>; Jonathan Morton <
> chromatix99 at gmail.com>
> Cc: ecn-sane at lists.bufferbloat.net; tsvwg at ietf.org
> Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
>
> Hi Koen,
>
> I'm a bit confused by this response.
>
> I agree the key question for this discussion is about how best to get low
> latency for the internet.
>
> If I'm reading your message correctly, you're saying that under the L4S
> approach for ECT(1), we can achieve it with either dualq or fq at the
> bottleneck, but under the SCE approach we can only do it with fq at the
> bottleneck.
>
> (I think I understand and roughly agree with this claim, subject to some
> caveats.  I just want to make sure I've got this right so far, and that we
> agree that in neither case can very low latency be achieved with a classic
> single queue with classic bandwidth-seeking
> traffic.)
>
> Are you saying that even if a scalable FQ can be implemented in
> high-volume aggregated links at the same cost and difficulty as dualq,
> there's a reason not to use FQ?  Is there a use case where it's necessary
> to avoid strict isolation if strict isolation can be accomplished as
> cheaply?
>
> Also, I think if the SCE position is "low latency can only be achieved
> with FQ", that's different from "forcing only FQ on the internet", provided
> the fairness claims hold up, right?  (Classic single queue AQMs may still
> have a useful place in getting pretty-good latency in the cheapest
> hardware, like maybe PIE with
> marking.)
>
> Anyway, to me this discussion is about the tradeoffs between the
> 2 proposals.  It seems to me SCE has some safety advantages that should
> not be thrown away lightly, so if the performance can be made equivalent,
> it would be good to know about it before committing the codepoint.
>
> Best regards,
> Jake
>
> On 2019-07-08, 03:26, "De Schepper, Koen (Nokia - BE/Antwerp)" <
> koen.de_schepper at nokia-bell-labs.com> wrote:
>
>     Hi Jonathan,
>
>     From your responses below, I have the impression you think this
> discussion is about FQ (flow/fair queuing). Fair queuing is used today
> where strict isolation is wanted, like between subscribers, and by
> extension (if possible and preferred) on a per transport layer flow, like
> in Fixed CPEs and Mobile networks. No discussion about this, and assuming
> we have and still will have an Internet which needs to support both common
> queues (like DualQ is intended) and FQs, I think the only discussion point
> is how we want to migrate to an Internet that supports optimally Low
> Latency.
>
>     This leads us to the question L4S or SCE?
>
>     If we want to support low latency for both common queues and FQs we
> "NEED" L4S, if we need to support it only for FQs, we "COULD" use SCE too,
> and if we want to force the whole Internet to use only FQs, we "SHOULD" use
> SCE 😉. If your goal is to force only FQs in the Internet, then let this be
> clear... I assume we need a discussion on another level in that case (and
> to be clear, it is not a goal I can support)...
>
>     Koen.
>
>
>     -----Original Message-----
>     From: Jonathan Morton <chromatix99 at gmail.com>
>     Sent: Friday, July 5, 2019 10:51 AM
>     To: De Schepper, Koen (Nokia - BE/Antwerp) <
> koen.de_schepper at nokia-bell-labs.com>
>     Cc: Bob Briscoe <ietf at bobbriscoe.net>; ecn-sane at lists.bufferbloat.net;
> tsvwg at ietf.org
>     Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts
>
>     > On 5 Jul, 2019, at 9:46 am, De Schepper, Koen (Nokia - BE/Antwerp) <
> koen.de_schepper at nokia-bell-labs.com> wrote:
>     >
>     >>> 2: DualQ can be defeated by an adversary, destroying its ability
> to isolate L4S traffic.
>     >
>     > Before jumping to another point, let's close down your original
> issue. Since you didn't mention, I assume that you agree with the
> following, right?
>     >
>     >        "You cannot defeat a DualQ" (at least no more than a single Q)
>
>     I consider forcibly degrading DualQ to single-queue mode to be a
> defeat.  However…
>
>     >>> But that's exactly the problem.  Single queue AQM does not isolate
> L4S traffic from "classic" traffic, so the latter suffers from the former's
> relative aggression in the face of AQM activity.
>     >
>     > With L4S a single queue can differentiate between Classic and L4S
> traffic. That's why it knows exactly how to treat the traffic. For Non-ECT
> and ECT(0) square the probability, and for ECT(1) don't square, and it
> works exactly like a DualQ, but then without the latency isolation. Both
> types get the same throughput, AND delay. See the PI2 paper, which is
> exactly about a single Q.
>
>     Okay, this is an important point: the real assertion is not that DualQ
> itself is needed for L4S to be safe on the Internet, but for differential
> AQM treatment to be present at the bottleneck.  Defeating DualQ only
> destroys L4S' latency advantage over "classic" traffic.  We might actually
> be making progress here!
>
>     > I agree you cannot isolate in a single Q, and this is why L4S is
> better than SCE, because it tells the AQM what to do, even if it has a
> single Q. SCE needs isolation, L4S not.
>
>     Devil's advocate time.  What if, instead of providing differential
> treatment WRT CE marking, PI2 instead applied both marking strategies
> simultaneously - the higher rate using SCE, and the lower rate using CE?
> Classic traffic would see only the latter; L4S could use the former.
>
>     > We tried years ago similar things like needed for SCE, and found
> that it can't work. For throughput fairness you need the squared relation
> between the 2 signals, but with SCE, you need to apply both signals in
> parallel, because you don't know the sender type.
>
>     Yes, that's exactly what we do - and it does work.
>
>     >   - So either the sender needs to ignore CE if it gets SCE, or
> ignore SCE if you get CE. The first is dangerous if you have multiple
> bottlenecks, and the second is defeating the purpose of SCE. Any other
> combination leads to unfairness (double response).
>
>     This is a false dichotomy.  We quickly realised both of those options
> were unacceptable, and sought a third way.
>
>     SCE senders apply a reduced CE response when also responding to
> parallel SCE feedback, roughly in line with ABE, on the grounds that
> responding to SCE does some of the necessary reduction already.  The
> reduced response is still a Multiplicative Decrease, so it fits with normal
> TCP congestion control principles.
>
>     >   - you separate the signals in queue dept, first applying SCE and
> later CE, as you originally proposed, but that results in starvation for
> SCE.
>
>     Yes, although this approach gives the best performance for SCE when
> used with flow isolation, or when all flows are known to be SCE-aware.  So
> we apply this strategy in those cases, and move the SCE marking function up
> to overlap CE marking specifically for single queues.
>
>     It has been suggested that single queue AQMs are rare in any case, but
> this approach covers that corner case.
>
>     > Add on top that SCE makes it impossible to use DualQ, as you cannot
> differentiate the traffic types.
>
>     SCE is designed around not *needing* to differentiate the traffic
> types.  Single queues have known disadvantages, and SCE doesn't worsen them.
>
>     Meanwhile, we have proposed LFQ to cover the DualQ use case.  I'd be
> interested in hearing a principled critique of it.
>
>      - Jonathan Morton
>
>
>
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane
>


-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740