[Bloat] The "Some Congestion Experienced" ECN codepoint - a new internet draft -

Mon Mar 11 04:59:35 EDT 2019

> On 11 Mar, 2019, at 9:08 am, Mikael Abrahamsson <swmike at swm.pp.se> wrote:
> 
> …also meant the packet was allowed to be re-ordered. I thought this was a big and nice thing…

Seriously?  I had to dig in the specs to find any mention of that, and… it's all about better supporting bonded links.  That can already be done by implementing RACK at the sender, and all you propose is that when L4S is in use, the extra buffering at the link layer is dropped.  This is absolutely useless for ordinary Internet users, who are unlikely to have consecutive packets traversing such a link closely enough together for the reordering to exceed the 3-dupack threshold in any case - so you might as well delete that reordering buffer anyway, and let the endpoints handle it.  You don't need L4S for that.

At bottom, L4S is also a very simple scheme: when ECT(1) is used, a higher CE marking rate is expected of middleboxes, and a less aggressive backoff in response to CE is expected of endpoints.  This is, incidentally, incompatible with Codel-like AQMs on the one hand and existing TCPs on the other hand, and so requires negotiation between the endpoints (e.g. using AccECN) to discover whether setting ECT(1) at the sender is legal.  SCE does not require such negotiation (i.e. a transport could implement it entirely at the receiver, manipulating the send rate via the already-standardised receive window), so should be easier to specify and deploy successfully.

> On 11 Mar, 2019, at 9:35 am, Richard Scheffenegger <rscheff at gmx.at> wrote:
> 
> I can remember reading quite a few papers where a similar scheme for ect(1) was adopted - often with additional changes on both ends to make use of this signal. Including schemes that encoded complex information in the stream of ect0/ect1...
> 
> Where can one find simulations of the interaction between legacy and l4s flows when using this?

If by "between legacy and l4s" you mean "between existing SCE-ignorant and new SCE-aware" - because SCE is not L4S…

We haven't yet drilled down to that level of detail, but we plan to do so in the very near future.  SCE itself is just a means of communicating network state from AQMs to receivers, hence the brevity and simplicity of the I-D.

The encoding is not complex, just a ratio of ECT to SCE markings which is ignored by existing receivers and not produced by existing middleboxes, *and* continued use of CE as presently standardised.  This naturally ensures backwards compatibility and incremental deployability.  The stable ratio of ECT to SCE is defined as 1:1, allowing more flexibility in signalling how much unused link capacity remains than DCTCP's "2 CEs per RTT" stability condition permits, and potentially simplifying implementations.
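To make the ratio idea concrete, here is a minimal sketch of how a receiver might read one window of markings against the 1:1 stable point.  Everything in it is an assumption for illustration only: the labels "ECT", "SCE" and "CE", the function names, and in particular the three-way grow/hold/shrink response are not taken from the I-D, which does not specify transport behaviour.

```python
from collections import Counter

def sce_fraction(marks):
    """Fraction of non-CE packets in `marks` that carry SCE.

    `marks` is an iterable of the hypothetical labels "ECT", "SCE", "CE".
    """
    counts = Counter(marks)
    total = counts["ECT"] + counts["SCE"]
    return counts["SCE"] / total if total else 0.0

def sce_signal(marks, stable=0.5):
    """Classify one observation window relative to the 1:1 stable ratio.

    stable=0.5 means equal numbers of ECT and SCE marks (ratio 1:1).
    """
    marks = list(marks)
    if "CE" in marks:
        # CE keeps its presently standardised meaning.
        return "back-off"
    frac = sce_fraction(marks)
    if frac < stable:
        return "grow"      # assumed: spare capacity remains
    if frac > stable:
        return "shrink"    # assumed: mild reduction short of a CE response
    return "hold"
```

An SCE-ignorant receiver simply never runs this logic and sees only ECT-like packets and CE, which is the backwards-compatibility property described above.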

The question of whether SCE-aware transports will compete fairly with SCE-ignorant transports, given a single-queue AQM implementing SCE, is an interesting and important one.  Intuitively, one can see that the SCE-ignorant transport will drive the AQM into the CE-marking regime, which may cause both flows to drop back, but there will be periods when the SCE-aware flow has inhibited growth while the SCE-ignorant flow has not, and so the SCE-aware flow might have overall lower throughput.

On the other hand, a Codel-like AQM (as opposed to a RED-like AQM) has a higher probability of signalling CE to a flow that sends more packets, which may naturally compensate for this.  And on the gripping hand, an SCE-aware flow operating alone should have (slightly) higher goodput than an SCE-ignorant flow operating alone, which illustrates the case for flow-isolating AQMs.

Two SCE-aware transports might interact strangely on a single-queue AQM, depending on how their response to SCE is implemented.  Eliminating the present sawtooth behaviour entirely is a desirable goal, but AIMD is what allows flows competing in a single queue to converge on fair throughput.  This will require careful attention when specifying transport behaviour.
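For anyone who wants to see the AIMD convergence property in action, a toy model follows.  It assumes two flows sharing one queue that receive congestion feedback in the same rounds (a strong simplification), and it models the classic additive-increase/multiplicative-decrease response only, not any SCE response; the function name and parameters are made up for this sketch.

```python
def aimd_two_flows(w1, w2, capacity, rounds, increase=1.0, decrease=0.5):
    """Evolve two congestion windows under synchronised AIMD.

    Each round, if the combined window exceeds `capacity`, both flows
    multiply their window by `decrease`; otherwise both add `increase`.
    """
    for _ in range(rounds):
        if w1 + w2 > capacity:
            w1 *= decrease
            w2 *= decrease
        else:
            w1 += increase
            w2 += increase
    return w1, w2

# Start very unfairly: one flow has 80 packets in flight, the other 10.
w1, w2 = aimd_two_flows(80.0, 10.0, capacity=100.0, rounds=200)
print(abs(w1 - w2))  # the gap has shrunk well below its initial value of 70
```

Additive increase leaves the gap between the windows unchanged and each multiplicative decrease halves it, which is why the flows converge.  A response to SCE that removes the decrease step entirely would also remove that convergence mechanism in this model.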

One exciting possibility is finally solving the slow-start problem, by beginning SCE signalling (at a low rate which still permits window growth) when the link is half-full rather than when queuing begins.  That would cause the exponential growth phase of typical TCPs to exit one RTT later, when the link is just fully utilised, instead of massively overshooting and being cut back to nothing as presently occurs.  There is a range of flow lengths which could see a concrete performance benefit from that.

That requires that the AQM is able to determine when the link reaches 50% utilisation, which is not always possible (Cake and hardware AQMs could do it, being aware of the link capacity) - but SCE does not mandate this behaviour, it's just a good idea.  I'm not aware of any possibility for L4S to achieve the same result; however much it redefines the meaning of CE, it won't be asserted until the link is full.
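The one-RTT argument for slow-start can be checked with a toy calculation.  The model below is an assumption-laden sketch, not a simulation of any real TCP or AQM: the window simply doubles every RTT, the AQM starts signalling in the first RTT whose window reaches a chosen fraction of link capacity, and the sender acts on that signal one RTT later, after one further doubling.  The function name, the initial window of 10 and the capacity of 640 packets per RTT are all made up for illustration.

```python
def slow_start_exit(capacity, signal_fraction, initial_window=10):
    """Return (rtts, cwnd, cwnd / capacity) when exponential growth stops.

    capacity        -- packets the link carries per RTT (hypothetical unit)
    signal_fraction -- utilisation at which the AQM begins signalling
    """
    threshold = capacity * signal_fraction
    cwnd = initial_window
    rtts = 0
    # Grow until the AQM first has reason to signal.
    while cwnd < threshold:
        cwnd *= 2
        rtts += 1
    # Feedback needs one RTT to return; the window doubles once more.
    return rtts + 1, cwnd * 2, cwnd * 2 / capacity

print(slow_start_exit(640, 0.5))  # signalling begins at half-full
print(slow_start_exit(640, 1.0))  # signalling begins only when the link is full
```

In this model, signalling at half-full ends the exponential phase with the window equal to capacity, while waiting until the link is full ends it at twice capacity.  Real AQMs that wait for a standing queue before marking would react later still, so the second case is, if anything, optimistic.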

SCE-aware AQMs and TCPs are not difficult to describe and should not be very difficult to implement.  They are not described explicitly in the SCE I-D, but I'm working on descriptions of how to build them, which should become additional I-Ds.  We should then be able to write working code and run simulations in ns-3, and maybe also in Linux.

 - Jonathan Morton