[Ecn-sane] IETF 110 quick summary

Jonathan Morton chromatix99 at gmail.com
Tue Mar 9 08:53:13 EST 2021


> On 9 Mar, 2021, at 11:57 am, Pete Heist <pete at heistp.net> wrote:
> 
> FQ protects competing flows, unless L4S and non-L4S traffic ends up in
> the same queue. This can happen with a hash collision, or maybe more
> commonly, with tunneled traffic in tunnels that support copying the ECN
> bits from the inner to the outer. If anyone thinks of any other reasons
> we haven't considered why competing flows would share the same 5-tuple
> and thus the same queue, do mention it.

Bob Briscoe's favourite defence to this, at the moment, seems to be that multiple flows sharing one tunnel are *also* disadvantaged when they share an FQ AQM bottleneck with multiple other flows that are not tunnelled, and which the FQ mechanism *can* distinguish.  Obviously this is specious, but it's worth pinning down exactly *why* so we can explain it back to him (and more importantly, anyone else paying attention).

Bob's scenario involves entirely conventional traffic, and a saturated bottleneck managed by an FQ-AQM (fq_codel), which is itself shared with at least one other flow.  We assume that all AQMs in existing networks are ECN-enabled (as distinct from the also-common policers which only drop).  The FQ mechanism treats the tunnel as a single flow, and shares out bandwidth equally on that basis.  So the throughput available to the tunnel as a whole is one share of the total, no matter how many flows occupy the tunnel.  Additionally, the same AQM mark/drop rate is applied to everything in the tunnel, causing the flows using it to adopt an RTT-fair relationship to each other.
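As a rough illustration of that sharing arithmetic, here is a minimal sketch.  The function name and parameters are hypothetical, and it assumes the flows inside the tunnel have equal RTTs so that they split the tunnel's share evenly; with unequal RTTs the split inside the tunnel would be RTT-fair rather than equal.

```python
def fq_per_flow_rates(capacity, flows_outside, flows_in_tunnel):
    """Idealised per-flow throughput at a saturated FQ-AQM bottleneck.

    Assumptions (not from any real implementation): every queue is
    backlogged, FQ divides capacity equally between queues, and the
    flows inside the tunnel have equal RTTs.
    """
    # The tunnel hashes to a single queue, so it counts as one flow.
    queues = flows_outside + 1
    share = capacity / queues
    # Each untunnelled flow gets a full share; tunnelled flows split one.
    return share, share / flows_in_tunnel


outside_rate, inside_rate = fq_per_flow_rates(100.0, 4, 5)
print(outside_rate, inside_rate)
```

Adding more flows to the tunnel shrinks only the second value; the first depends solely on the number of queues the FQ mechanism can distinguish.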

The disadvantage experienced by the tunnel (relative to a plain AQM) is proportional to the number of flows using the tunnel, and only indirectly related to the number of other flows using the bottleneck.  This I would classify as Minor severity, since it is a moderate, sustained effect.  It increases in effect only linearly with the load on the tunnel, which is the same as at any ordinary bottleneck - and this is routinely tolerated.
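To put a number on that disadvantage, the sketch below compares the per-flow rate a tunnelled flow would get from a plain single-queue AQM with what it gets behind the FQ-AQM.  It is an idealised model under assumptions of my own choosing (equal RTTs everywhere, a plain AQM that yields equal per-flow rates, all names hypothetical), not a measurement.

```python
def tunnel_disadvantage(flows_outside, flows_in_tunnel):
    """Ratio of a tunnelled flow's rate under a plain AQM to its rate under FQ.

    Assumes equal RTTs, so a plain AQM gives each of the N + M flows
    1/(N + M) of capacity, while FQ gives the whole tunnel 1/(N + 1),
    split M ways.  The ratio of the two is (N + 1) * M / (N + M).
    """
    n, m = flows_outside, flows_in_tunnel
    return (n + 1) * m / (n + m)


for m in (1, 2, 4, 8):
    print(m, tunnel_disadvantage(flows_outside=100, flows_in_tunnel=m))
```

With many outside flows the ratio approaches the number of flows in the tunnel, and with no outside flows it is exactly 1, meaning there is no disadvantage relative to a plain AQM.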

Note that if the tunnel is the only traffic using the bottleneck, the situation is equivalent to a plain, single-queue AQM.  This is an important degenerate case, which we can come back to later.  Also, in principle the effect can be avoided by either not using the tunnel, or by dividing the flows between multiple tunnels that the FQ mechanism *can* distinguish.  This puts the risk into either an "involved participant" or "interested observer" category, unless the tunnel has been imposed on the user without knowledge or consent.  What this means is that the tunnel user might reasonably consider the security or privacy benefit of the tunnel to outweigh the performance penalty it incurs, and thereby choose to continue using it.

Now, let us add one L4S flow to the tunnel, replacing one of the conventional flows in it, but keeping everything else the same.  The conventional flows *outside* the tunnel are unaffected, because they are protected by the FQ-AQM.  But the conventional flows *inside* the tunnel, which the FQ-AQM cannot protect because it cannot distinguish them, are immediately squashed to minimum cwnd or thereabouts, which may be considerably less than their fair share of the BDP within the capacity allocated to the tunnel.  The L4S flow thereby grows to dominate the tunnel traffic as described elsewhere.  This is clearly a Major severity effect, as the conventional traffic in the tunnel is seriously impaired.

Note that if the tunnel shared a plain AQM bottleneck, without FQ, with other conventional flows outside the tunnel, these other flows would *also* be squashed by the L4S flow in the tunnel.  This is because the AQM must increase its signalling rate considerably to control the L4S flow, and it applies the same signalling rate to all traffic.  The FQ-AQM only increases signalling to the flow requiring it.
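The "same signalling rate" argument can be illustrated with the commonly quoted steady-state approximations for the two response functions: roughly 1.22/sqrt(p) packets for a Reno-like conventional flow and roughly 2/p for a DCTCP-like L4S flow, where p is the marking probability.  These formulas are textbook approximations that I am assuming here, not something taken from this thread, and the sketch does not model CoDel's control law or a minimum cwnd, so it shows the direction of the effect rather than its full size.

```python
import math


def classic_cwnd(p):
    # Reno-like steady-state window (packets) at mark/drop probability p.
    return 1.22 / math.sqrt(p)


def l4s_cwnd(p):
    # DCTCP-like steady-state window (packets) at mark probability p.
    return 2.0 / p


def shared_queue_windows(bdp_packets, n_classic, n_l4s):
    """Find the single marking probability at which all windows sum to the BDP.

    Assumes equal RTTs and one queue applying the same p to every flow.
    Uses bisection on a log scale; the total window decreases as p grows.
    """
    lo, hi = 1e-9, 1.0
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        total = n_classic * classic_cwnd(mid) + n_l4s * l4s_cwnd(mid)
        if total > bdp_packets:
            lo = mid  # too much in flight: the AQM must signal more often
        else:
            hi = mid
    p = math.sqrt(lo * hi)
    return p, classic_cwnd(p), l4s_cwnd(p)


# Four conventional flows, then three conventional flows plus one L4S flow.
print(shared_queue_windows(100, 4, 0))
print(shared_queue_windows(100, 3, 1))
```

In this model, swapping one conventional flow for an L4S flow forces p up sharply, and the conventional flows' windows shrink to a small fraction of the equal share they held before, while the L4S flow takes most of the queue's capacity.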

Returning to the degenerate case where the tunnel is the only traffic using the bottleneck, the situation remains the same within the tunnel, and the behaviour is again equivalent to a plain AQM, with the L4S flow dominating and the conventional traffic severely impaired.  The tunnel as a whole now occupies the full bottleneck rather than merely a fraction of it, but almost all of this extra capacity is used by the L4S flow, and can't be effectively used by the conventional flows within the tunnel.

It is therefore clear that the effect is caused by the L4S flow meeting a conventional AQM, and not by the FQ mechanism.  Furthermore, the effect of an L4S flow within a tunnel is *over and above* any effects imposed on the tunnel as a whole by an FQ-AQM.

The main proposed solution to this is to upgrade the AQM at the bottleneck, so that it understands the ECT(1) signal distinguishing the L4S traffic from conventional traffic.  But this imposes the burden of mitigating the problem on the existing network, an "innocent bystander".  This is therefore clearly not an appropriate strategy; L4S should instead ensure that it reacts appropriately to congestion signals produced by existing networks, which, in compliance with RFC 3168, treat ECT(1) as equivalent to ECT(0).
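For readers less familiar with the codepoints, the following sketch shows the distinction in code.  The two-bit ECN field values are those defined in RFC 3168; the function names are hypothetical, and the "upgraded" classifier is deliberately simplified to the single check described above rather than being a rendering of the L4S drafts.

```python
# ECN field values (low two bits of the IP TOS / Traffic Class byte), per RFC 3168.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def rfc3168_signal(ecn):
    """What a conventional ECN-enabled AQM does once it decides to signal.

    ECT(0) and ECT(1) are treated identically: both are marked CE.
    Only Not-ECT packets are dropped.  Returns (action, new_ecn).
    """
    if ecn == NOT_ECT:
        return "drop", ecn
    return "mark", CE


def upgraded_aqm_sees_l4s(ecn):
    """Simplified, hypothetical check an upgraded AQM might apply."""
    return ecn == ECT_1


for codepoint in (NOT_ECT, ECT_0, ECT_1):
    print(bin(codepoint), rfc3168_signal(codepoint), upgraded_aqm_sees_l4s(codepoint))
```

The first function has no branch on ECT(1) at all, which is the point: an existing AQM emits the same CE mark for both kinds of sender, so the burden of interpreting it correctly falls on the L4S endpoint.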

If L4S cannot do this reliably - and we doubt that it can - then it must either be redesigned to use an unambiguous signal, or explicitly confined to networks which have been prepared for it by removing/upgrading all conventional AQMs.  We have proposed two possible methods of redesigning L4S, both of which have been rejected by the L4S team.

 - Jonathan Morton

