[Ecn-sane] IETF 110 quick summary

Steven Blake slblake at petri-meat.com
Tue Mar 9 12:31:19 EST 2021


TL;DR: L4S traffic sharing a queue managed by a Classic ECN AQM will
crush the non-L4S traffic in that queue.

Thanks, this lines up with my prior understanding (wanted to make sure
I wasn't missing any arguments from the zillions of back-and-forth
emails on the tsvwg list). And I'm glad that at least they appear to
behave correctly in the face of packet discards.

The disaster scenario is that their experiment causes performance
problems for some unsuspecting operators, who then start bleaching
ECN bits.

Their whole safety plan depends on the claim that Classic RFC 3168 ECN 
is not deployed (except in fq_codel on the edge; who cares? they can
patch their code). If that were the case, it would make more sense for
them to try to move classic ECN to historic and redefine ECT(0) to
signal L4S traffic (à la DCTCP).

It's also been clear that this is not an effort to conduct an
experiment.


On Tue, 2021-03-09 at 15:53 +0200, Jonathan Morton wrote:
> > On 9 Mar, 2021, at 11:57 am, Pete Heist <pete at heistp.net> wrote:
> > 
> > FQ protects competing flows, unless L4S and non-L4S traffic ends up
> > in the same queue. This can happen with a hash collision, or maybe
> > more commonly, with tunneled traffic in tunnels that support copying
> > the ECN bits from the inner to the outer. If anyone thinks of any
> > other reasons we haven't considered why competing flows would share
> > the same 5-tuple and thus the same queue, do mention it.
> 
> Bob Briscoe's favourite defence to this, at the moment, seems to be
> that multiple flows sharing one tunnel are *also* disadvantaged when
> they share an FQ AQM bottleneck with multiple other flows that are
> not tunnelled, and which the FQ mechanism *can*
> distinguish.  Obviously this is specious, but it's worth pinning down
> exactly *why* so we can explain it back to him (and more importantly,
> anyone else paying attention).
> 
> Bob's scenario involves entirely conventional traffic, and a
> saturated bottleneck managed by an FQ-AQM (fq_codel), which is itself
> shared with at least one other flow.  We assume that all AQMs in
> existing networks are ECN enabled (as distinct from the also-common
> policers which only drop).  The FQ mechanism treats the tunnel as a
> single flow, and shares out bandwidth equally on that basis.  So the
> throughput available to the tunnel as a whole is one share of the
> total, no matter how many flows occupy the tunnel.  Additionally, the
> same AQM mark/drop rate is applied to everything in the tunnel,
> causing the flows using it to adopt an RTT-fair relationship to each
> other.
> 
> The disadvantage experienced by the tunnel (relative to a plain AQM)
> is proportional to the number of flows using the tunnel, and only
> indirectly related to the number of other flows using the
> bottleneck.  This I would classify as Minor severity, since it is a
> moderate, sustained effect.  It increases in effect only linearly
> with the load on the tunnel, which is the same as at any ordinary
> bottleneck - and this is routinely tolerated.
> 
> Note that if the tunnel is the only traffic using the bottleneck, the
> situation is equivalent to a plain, single-queue AQM.  This is an
> important degenerate case, which we can come back to later.  Also, in
> principle the effect can be avoided by either not using the tunnel,
> or by dividing the flows between multiple tunnels that the FQ
> mechanism *can* distinguish.  This puts the risk into either an
> "involved participant" or "interested observer" category, unless the
> tunnel has been imposed on the user without knowledge or
> consent.  What this means is that the tunnel user might reasonably
> consider the security or privacy benefit of the tunnel to outweigh
> the performance defect it incurs, and thereby choose to continue
> using it.
> 
> Now, let us add one L4S flow to the tunnel, replacing one of the
> conventional flows in it, but keeping everything else the same.  The
> conventional flows *outside* the tunnel are unaffected, because they
> are protected by the FQ-AQM.  But the conventional flows *inside* the
> tunnel, which the FQ-AQM cannot protect because it cannot distinguish
> them, are immediately squashed to minimum cwnd or thereabouts, which
> may be considerably less than their fair share of the BDP allocated
> to the tunnel.  The L4S flow thereby grows to dominate the
> tunnel traffic as described elsewhere.  This is clearly a Major
> severity effect, as the conventional traffic in the tunnel is
> seriously impaired.
> 
> Note that if the tunnel shared a plain AQM bottleneck, without FQ,
> with other conventional flows outside the tunnel, these other flows
> would *also* be squashed by the L4S flow in the tunnel.  This is
> because the AQM must increase its signalling rate considerably to
> control the L4S flow, and it applies the same signalling rate to all
> traffic.  The FQ-AQM only increases signalling to the flow requiring
> it.
> 
> Returning to the degenerate case where the tunnel is the only traffic
> using the bottleneck, the situation remains the same within the
> tunnel, and the behaviour is again equivalent to a plain AQM, with
> the L4S flow dominating and the conventional traffic severely
> impaired.  The tunnel as a whole now occupies the full bottleneck
> rather than merely a fraction of it, but almost all of this extra
> capacity is used by the L4S flow, and can't be effectively used by
> the conventional flows within the tunnel.
> 
> It is therefore clear that the effect is caused by the L4S flow
> meeting a conventional AQM, and not by the FQ
> mechanism.  Furthermore, the effect of an L4S flow within a tunnel is
> *over and above* any effects imposed on the tunnel as a whole by an
> FQ-AQM.
> 
> The main proposed solution to this is to upgrade the AQM at the
> bottleneck, so that it understands the ECT(1) signal distinguishing
> the L4S traffic from conventional traffic.  But this imposes the
> burden of mitigating the problem on the existing network, an
> "innocent bystander".  This is therefore clearly not an appropriate
> strategy; L4S should instead ensure that it reacts appropriately to
> congestion signals produced by existing networks, which, in
> compliance with RFC 3168, treat ECT(1) as equivalent to ECT(0).
> 
> If L4S cannot do this reliably - and we doubt that it can - then it
> must either be redesigned to use an unambiguous signal, or explicitly
> confined to networks which have been prepared for it by
> removing/upgrading all conventional AQMs.  We have proposed two
> possible methods of redesigning L4S, both of which have been rejected
> by the L4S team.
> 
>  - Jonathan Morton

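As a rough illustration of the tunnel argument quoted above, here is a
minimal Python sketch. It is not from the thread: the function names,
the example numbers, and the two steady-state rate models (a Reno-like
1.22/(RTT*sqrt(p)) for conventional flows and a DCTCP-like 2/(RTT*p)
for L4S flows) are simplifying assumptions, and it ignores minimum
cwnd, queueing delay and RTT differences. It is only meant to show the
direction of the effect: FQ divides capacity per queue, while a single
marking probability applied to both flow types leaves the conventional
flows with a small fraction of that queue.

```python
import math


def fq_tunnel_shares(capacity, flows_outside, flows_in_tunnel):
    """FQ sees the whole tunnel as ONE flow (hypothetical helper).

    Returns (rate given to the tunnel queue, rate per flow inside the
    tunnel if those flows split the tunnel's share equally).
    """
    per_queue = capacity / (flows_outside + 1)  # the tunnel counts once
    return per_queue, per_queue / flows_in_tunnel


def classic_rate(p, rtt):
    """Assumed Reno-like response: packets/s at mark probability p."""
    return 1.22 / (rtt * math.sqrt(p))


def scalable_rate(p, rtt):
    """Assumed DCTCP-like (L4S-style) response: packets/s at probability p."""
    return 2.0 / (rtt * p)


def shared_mark_probability(capacity_pps, n_classic, n_scalable, rtt):
    """Find the single mark probability at which all flows in one queue
    together just fill capacity_pps. Demand falls as p rises, so plain
    bisection on p is enough for this sketch.
    """
    lo, hi = 1e-9, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        demand = (n_classic * classic_rate(mid, rtt)
                  + n_scalable * scalable_rate(mid, rtt))
        if demand > capacity_pps:
            lo = mid  # flows still ask for too much: mark more
        else:
            hi = mid  # flows ask for too little: mark less
    return hi


if __name__ == "__main__":
    # Example numbers are arbitrary assumptions, not measurements.
    tunnel_share, fair_per_flow = fq_tunnel_shares(40000.0, 3, 4)
    rtt = 0.05  # seconds, same for every flow in this sketch
    p = shared_mark_probability(tunnel_share, 3, 1, rtt)
    print("tunnel share (pkt/s):", tunnel_share)
    print("fair per-flow share inside tunnel (pkt/s):", fair_per_flow)
    print("shared mark probability:", p)
    print("each conventional flow (pkt/s):", classic_rate(p, rtt))
    print("the L4S flow (pkt/s):", scalable_rate(p, rtt))
```

With three conventional flows and one L4S flow in the tunnel, the
model has the L4S flow taking most of the tunnel's share, regardless of
how the FQ scheduler treats the flows outside the tunnel.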

Regards,

// Steve





