[Bloat] AQM & Net Neutrality

Mon May 24 10:30:36 EDT 2021

> On 24 May, 2021, at 4:09 pm, Livingood, Jason via Bloat <bloat at lists.bufferbloat.net> wrote:
> 
> I’m looking for opinions here re bloat-busting techniques like AQM in the context of network neutrality (NN). The worry I have is whether some non-technical people will misunderstand how AQM works & conclude that implementing it may violate NN because it would make interactive traffic perform better than it does today. That is true of course – it’s a design goal of AQM, but non-interactive traffic performs as well as it always has – it is not disadvantaged.
>  
> Maybe the worries I have heard just point out the need for more education/awareness about what delay is and why things like AQM are not prioritization/QoS? I appreciate any thoughts.

I'm pleased to help with education in this area.  The short and simplistic answer would be that AQM treats all traffic going through it equally.  The non-interactive traffic *also* sees a reduction in latency; many people won't viscerally notice this, but they can observe it if they look closely.  More importantly, traffic does not need any sort of business or authentication arrangement in order to benefit from AQM; it only needs to comply with existing, well-established specifications, as it already does.

If the traffic supports ECN, the AQM can use that instead of packet drops for signalling, which tends to actually *reduce* packet loss in bulk transfers, compared to simply bouncing off the tail end of a dumb FIFO.  Reduced latency would already make recovering from these losses easier for the transport, but eliminating them entirely means that the application receives a completely smooth delivery, with no sudden pauses and jumps caused by the recovery process.
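The choice between marking and dropping can be sketched as follows.  This is only an illustration: the codepoint values are those defined for the ECN field in RFC 3168, while the function name `signal_congestion` and its return convention are hypothetical and chosen just for this example.

```python
from typing import Optional, Tuple

# ECN field codepoints (two low-order bits of the IP TOS / Traffic Class byte), per RFC 3168.
NOT_ECT = 0b00  # transport has not declared ECN support
ECT_1 = 0b01    # ECN-capable transport, codepoint ECT(1)
ECT_0 = 0b10    # ECN-capable transport, codepoint ECT(0)
CE = 0b11       # Congestion Experienced


def signal_congestion(ecn_field: int) -> Tuple[str, Optional[int]]:
    """Decide how to deliver a congestion signal for one packet.

    Hypothetical helper: returns ("mark", CE) when the packet can carry the
    signal itself, or ("drop", None) when loss is the only available signal.
    """
    if ecn_field in (ECT_0, ECT_1, CE):
        # The packet is still delivered; the receiver echoes the mark to the sender.
        return ("mark", CE)
    # Not ECN-capable: the transport must infer congestion from the loss.
    return ("drop", None)


print(signal_congestion(ECT_0))    # ECN-capable flow: marked, not lost
print(signal_congestion(NOT_ECT))  # legacy flow: dropped
```

Either way the sender receives the same "slow down" message; with ECN the packet that carried it still arrives, so there is nothing to retransmit.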

It's worth digging into the details a bit to solidify the message for a broader range of audiences.  You might start with my recently published Informational draft discussing different types of latency:

	https://datatracker.ietf.org/doc/html/draft-morton-tsvwg-interflow-intraflow-delays-00

I'll note in passing that AQM can be used as part of a system that would, in fact, violate Net Neutrality.  It's important to distinguish the effects caused by different parts of the system, so that a Net-Neutral system can be obtained by deleting just the parts that are incompatible with it.  Current indications, for example, are that L4S would be such a non-neutral system, since it gives a strong throughput priority to traffic carrying its identifier.

Without an AQM or FQ system at the bottleneck, interactive traffic is at the mercy of any competing traffic as to how much delay it will suffer in the queue.  Non-interactive traffic seeks to keep that queue full in order to maximise throughput, with latency being considered unimportant.  If the queue size is chosen carefully, the damage can be limited to some extent, but there is a limit to how much a buffer can be shrunk before it starts reducing throughput as well.  In short, it is not possible to treat both types of traffic equally well with a dumb FIFO; you must favour one or the other, and historically it was throughput-sensitive applications that won that debate.  I think there is a little bit more awareness of right-sizing buffers these days, but that can still easily lead to hundreds of milliseconds of unnecessary delay to interactive traffic, which is difficult to tolerate.
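To put a number on "hundreds of milliseconds", the delay added by a full FIFO is simply the buffer size divided by the link rate.  The sketch below works that out; the buffer sizes and link rates are illustrative assumptions, not measurements from any particular network, and the function name is hypothetical.

```python
def fifo_queue_delay_ms(buffer_bytes: int, link_bps: int) -> float:
    """Queuing delay, in milliseconds, seen by a packet that arrives
    at the tail of a completely full FIFO of the given size."""
    return buffer_bytes * 8 * 1000 / link_bps


# Assumed example figures: (buffer size in bytes, link rate in bits per second).
examples = [
    (250_000, 10_000_000),     # 250 kB buffer on a 10 Mbit/s uplink
    (250_000, 100_000_000),    # the same buffer on a 100 Mbit/s link
    (1_000_000, 10_000_000),   # 1 MB buffer on a 10 Mbit/s uplink
]

for buffer_bytes, link_bps in examples:
    delay = fifo_queue_delay_ms(buffer_bytes, link_bps)
    print(f"{buffer_bytes} bytes at {link_bps} bit/s: {delay:.0f} ms when full")
```

A buffer that looks modest on a fast link becomes a large delay on a slow one, which is why a single fixed buffer size cannot suit both kinds of traffic.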

By contrast, when you implement both AQM and FQ at the bottleneck, interactive traffic is no longer affected by competing traffic at all, as long as each interactive flow uses less than its "fair share" of available throughput capacity.  When a flow exceeds that threshold, the AQM will start working to inform it that it is sending too fast, and the FQ will regulate the flow so that it *is* a fair share that it consumes.  This "fair share" metric has a number of different reasonable definitions, but it should not be confused with a so-called "fair usage policy"; the fair share only kicks in when all the capacity on the link is already in use.
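One of the reasonable definitions of "fair share" is max-min fairness, and a small sketch of it may help show why light flows are left alone.  Choosing max-min here is an assumption made for illustration; the flow names, demands and capacity (all in Mbit/s) are invented.

```python
def max_min_fair_share(demands: dict, capacity: float) -> dict:
    """Allocate capacity among flows by max-min fairness.

    Flows are served in order of increasing demand.  Each gets the smaller
    of its demand and an equal split of whatever capacity is still unclaimed.
    """
    allocation = {}
    remaining = float(capacity)
    pending = sorted(demands.items(), key=lambda item: item[1])
    for index, (flow, demand) in enumerate(pending):
        equal_split = remaining / (len(pending) - index)
        allocation[flow] = min(demand, equal_split)
        remaining -= allocation[flow]
    return allocation


# Invented example: two interactive flows and two bulk flows on a 10 Mbit/s link.
demands = {"voip": 0.1, "game": 0.5, "bulk1": 100.0, "bulk2": 100.0}
for flow, rate in max_min_fair_share(demands, 10.0).items():
    print(f"{flow}: {rate:.1f} Mbit/s")
```

The interactive flows ask for less than an equal split, so they receive everything they ask for; only the bulk flows are held back, and only because the link is full.  With total demand below capacity, every flow would simply receive its demand.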

You can also implement *just* AQM at the bottleneck.  In this case the benefit seen by interactive traffic is somewhat diluted, because all traffic goes through the same queue in FIFO order.  The AQM simply tells traffic to slow down if the queue shows signs of filling up.  This leaves the queue still able to handle bursts of traffic (which are the main concern for throughput), while minimising the delay since the queue does not sit constantly full.  An "Approximate Fairness" AQM also dynamically steers AQM signalling towards traffic that contributes the most to congestion, leaving lighter and interactive traffic alone.
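The "tolerate bursts, but don't let the queue sit full" behaviour can be sketched with a toy controller that only signals once queuing delay has stayed above a target for a sustained interval.  This is a deliberately simplified illustration of the general idea, not an implementation of any published AQM algorithm; the class name and the 5 ms and 100 ms parameters are assumptions.

```python
class ToyAqm:
    """Signal congestion only when queuing delay stays high for a while."""

    def __init__(self, target_s: float = 0.005, interval_s: float = 0.100):
        self.target_s = target_s      # acceptable standing queue delay (assumed)
        self.interval_s = interval_s  # how long delay may exceed target (assumed)
        self._above_since = None      # time at which delay first exceeded target

    def should_signal(self, sojourn_s: float, now_s: float) -> bool:
        """sojourn_s is how long the departing packet waited in the queue."""
        if sojourn_s < self.target_s:
            # Queue has drained: forget any earlier excursion above target.
            self._above_since = None
            return False
        if self._above_since is None:
            # Start of a possible burst: begin timing, but do not signal yet.
            self._above_since = now_s
            return False
        return now_s - self._above_since >= self.interval_s


aqm = ToyAqm()
# (queue delay of departing packet, current time), both in seconds.
samples = [(0.001, 0.00), (0.020, 0.01), (0.020, 0.05), (0.020, 0.12), (0.001, 0.13)]
for sojourn, now in samples:
    print(f"t={now:.2f}s delay={sojourn * 1000:.0f}ms signal={aqm.should_signal(sojourn, now)}")
```

A short burst that drains within the interval never triggers a signal, which is what preserves throughput; a queue that stays loaded does, which is what keeps the delay down.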

Alternatively, you could implement *just* FQ, although this is usually seen as the more difficult component of the two to implement at high speed and/or large scale.  This would simply hold bulk traffic to its "fair share", and keep it out of the way of interactive traffic, without also reducing the delay to the bulk traffic flows.  I would suggest that if you implement FQ, you can also usually implement AQM on top with little difficulty.
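A minimal sketch of the FQ idea is one queue per flow, served round-robin.  It is a simplification under stated assumptions: it takes turns per packet and ignores packet sizes, whereas practical schedulers account for bytes, and the class and flow names are hypothetical.

```python
from collections import OrderedDict, deque


class RoundRobinFq:
    """Per-flow queues served one packet at a time in rotating order."""

    def __init__(self):
        # Maps flow id -> deque of packets; ordering gives the service rotation.
        self._queues = OrderedDict()

    def enqueue(self, flow, packet):
        self._queues.setdefault(flow, deque()).append(packet)

    def dequeue(self):
        """Return (flow, packet) for the next flow in rotation, or None if empty."""
        if not self._queues:
            return None
        flow, queue = self._queues.popitem(last=False)  # flow at head of rotation
        packet = queue.popleft()
        if queue:
            self._queues[flow] = queue  # still backlogged: move to back of rotation
        return flow, packet


fq = RoundRobinFq()
for n in range(5):
    fq.enqueue("bulk", f"b{n}")  # a bulk flow builds up a backlog
fq.enqueue("voip", "v0")         # an interactive packet arrives behind it

while (item := fq.dequeue()) is not None:
    print(item)
```

In a single FIFO the interactive packet would leave sixth; here it leaves second, while the bulk packets still wait behind one another, so their own delay is not reduced.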

Please do ask for further clarification if that would be helpful.

 - Jonathan Morton