[Ecn-sane] draft-white-tsvwg-nqb-02 comments
Sebastian Moeller
moeller0 at gmx.de
Sat Aug 24 18:36:46 EDT 2019
"Flow queueing (FQ) approaches (such as fq_codel [RFC8290]), on the other hand, achieve latency improvements by associating packets into "flow" queues and then prioritizing "sparse flows", i.e. packets that arrive to an empty flow queue. Flow queueing does not attempt to differentiate between flows on the basis of value (importance or latency-sensitivity), it simply gives preference to sparse flows, and tries to guarantee that the non-sparse flows all get an equal share of the remaining channel capacity and are interleaved with one another. As a result, FQ mechanisms could be considered more appropriate for unmanaged environments and general Internet traffic."
[SM] An intermediate hop has no real handle on the "value" of a packet and hence cannot "differentiate between flows on the basis of value"; this is true for FQ and non-FQ approaches alike. What this section calls "preference to sparse flows" is essentially what NQB with queue protection does to NQB-marked packets: give them the benefit of the doubt until they exceed a variable sojourn/queueing threshold, after which they are treated less preferentially, either by no longer being treated as "sparse" or by being redirected into the QB queue. The difference is that FQ-based solutions immediately move all packets of such a non-sparse flow out of the way of qualifying sparse flows, while queue protection as described in the DOCSIS document will only redirect newly arriving packets to the QB queue; all falsely classified packets already enqueued will remain in, and "pollute", the NQB queue.
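To make the comparison concrete, here is a minimal sketch of the sparse-flow handling in an fq_codel-style scheduler as I read RFC 8290 (all names and structure are illustrative only, not the actual qdisc code):

# Minimal sketch of fq_codel-style sparse-flow handling (my reading of
# RFC 8290; illustrative only, not the real implementation).
from collections import deque

NUM_QUEUES = 1024   # fq_codel default number of flow queues
QUANTUM = 1514      # DRR quantum in bytes (roughly one MTU-sized packet)

class FlowQueue:
    def __init__(self):
        self.packets = deque()
        self.deficit = 0
        self.state = "idle"   # "idle", "new" (sparse) or "old"

queues = [FlowQueue() for _ in range(NUM_QUEUES)]
new_flows, old_flows = deque(), deque()

def enqueue(pkt, flow_hash):
    q = queues[flow_hash % NUM_QUEUES]
    q.packets.append(pkt)
    if q.state == "idle":
        # Packet arrived to an empty flow queue: give the flow the
        # benefit of the doubt and serve it with sparse priority.
        q.state = "new"
        q.deficit = QUANTUM
        new_flows.append(q)

def dequeue():
    while new_flows or old_flows:
        lst = new_flows if new_flows else old_flows
        q = lst[0]
        if q.deficit <= 0:
            # Quantum used up: demote the flow, together with ALL of
            # its already-queued packets, behind the sparse flows.
            q.deficit += QUANTUM
            q.state = "old"
            old_flows.append(lst.popleft())
            continue
        if not q.packets:
            lst.popleft()
            if lst is new_flows:
                # RFC 8290 parks an emptied sparse flow at the tail of
                # old_flows so it cannot immediately re-qualify as sparse.
                q.state = "old"
                old_flows.append(q)
            else:
                q.state = "idle"
            continue
        pkt = q.packets.popleft()
        q.deficit -= len(pkt)
        return pkt
    return None

Note the demotion step: everything the flow already has queued sits behind the sparse flows from that point on, rather than only its future packets.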
"Downsides to this approach include loss of low latency performance due to the possibility of hash collisions (where a sparse flow shares a queue with a bulk data flow), complexity in managing a large number of queues in certain implementations, and some undesirable effects of the Deficit Round Robin (DRR) scheduling."
[SM] As described in DOCSIS-MULPIv3.1, the queue protection method only has a limited number of buckets to use and will, to my understanding, account all flows above that number in the default bucket. This looks pretty close to the consequence of a hash collision to me, as I interpret it as the same fate-sharing of nominally independent flows observed in FQ when hash collisions occur. While it is fair criticism that this failure mode exists, mentioning it only in the context of FQ seems sub-optimal, especially since the DOCSIS queue protection is defined with only 32 non-default buckets... versus a default of 1024 flow queues for fq_codel.
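To put rough numbers on the collision risk, a quick birthday-problem estimate (my own back-of-the-envelope arithmetic, assuming a uniform hash; not from the draft or the DOCSIS spec):

# Probability that at least two of `flows` concurrently active flows
# share one of `buckets` hash buckets (birthday problem, uniform hash).
def p_collision(flows, buckets):
    p_distinct = 1.0
    for i in range(flows):
        p_distinct *= (buckets - i) / buckets
    return 1.0 - p_distinct

for flows in (8, 16, 32):
    print(flows, "flows:",
          round(p_collision(flows, 32), 2), "with 32 buckets vs",
          round(p_collision(flows, 1024), 2), "with 1024 queues")
# 8 flows:  ~0.61 with 32 buckets vs ~0.03 with 1024 queues
# 16 flows: ~0.99 with 32 buckets vs ~0.11 with 1024 queues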
"The DRR scheduler enforces that each non-sparse flow gets an equal fraction of link bandwidth,"
[SM] This is actually a feature, not a bug. It only triggers under load conditions and gives behavior that end-points can actually predict reasonably well. Any other kind of bandwidth sharing between flows is bound to have better best-case behavior, but also much worse worst-case behavior (like almost complete starvation of some flows). In short, equal bandwidth under load seems far superior for guaranteeing forward progress than "anything goes", as it delivers something good enough without requiring an oracle and without regressing into starvation territory.
Tangent: I have read Bob's justification for wanting inequality here, but will just mention that an intermediate hop simply cannot know or reasonably balance the importance of traversing flows (unless in a very controlled environment where all endpoints can be trusted to a) do the right thing and b) rank their bandwidth use by overall importance).
"In effect, the network element is making a decision as to what constitutes a flow, and then forcing all such flows to take equal bandwidth at every instant."
[SM] This seems to hold only under saturating conditions and, as argued above, seems a reasonable compromise that will be good enough. The intermediate hop has no reliable way of objectively ranking the relative importance of the concurrently active flows; and without such a ranking, treating all flows equally seems more cautious and conservative than basically allowing anything.
The network element in front of the saturated link needs to make a decision (otherwise no AQM would be active) and needs to "force" its view on the flows (which, by the way, is exactly the rationale for recommending queue protection). Also, "equal bandwidth at every instant" is simply wrong: as long as the link is not saturated this does not trigger, and no flow is "forced" to take more bandwidth than it requires... Let me try to give a description of how FQ behavior looks from the outside (this is a simplification and hence wrong, but hopefully less wrong than the simplification in the draft): under saturating conditions with N flows, all flows with rates less than egress_rate/N will send at full blast, just like without saturation, and the remaining bandwidth is shared equally among those flows that are sending at higher rates. This hence does not result in equal rates for all flows at every instant.
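The sharing behaviour described above is essentially max-min fairness ("water-filling"); a small sketch of the allocation FQ approximately converges to (my own illustration, not text from the draft):

# Max-min fair allocation ("water-filling"): flows demanding less than
# the current fair share get their full demand; the rest split what is
# left equally. This is (approximately) what per-flow DRR converges to.
def max_min_share(capacity, demands):
    alloc = {}
    remaining = capacity
    pending = dict(demands)   # flow -> demanded rate
    while pending:
        fair_share = remaining / len(pending)
        # Flows below the fair share are satisfied outright...
        satisfied = {f: d for f, d in pending.items() if d <= fair_share}
        if not satisfied:
            # ...everything left is a "greedy" flow: equal split.
            for f in pending:
                alloc[f] = fair_share
            return alloc
        for f, d in satisfied.items():
            alloc[f] = d
            remaining -= d
            del pending[f]
    return alloc

# Example: 100 Mbit/s link; the 5 Mbit/s flow is untouched, the two
# bulk flows share the remainder equally.
print(max_min_share(100.0, {"voip": 5.0, "bulk1": 80.0, "bulk2": 60.0}))
# -> {'voip': 5.0, 'bulk1': 47.5, 'bulk2': 47.5}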
"The Dual-queue approach defined in this document achieves the main benefit of fq_codel: latency improvement without value judgements, without the downsides."
[SM] Well, that seems a rather subjective judgement, and also wrong, given that queue protection conceptually suffers from downsides similar to FQ "hash collisions" and lacks the clear and justifiable middle-of-the-road equal-bandwidth-to-all (that can make use of it) approach, which might not be the optimal bandwidth allotment but has the advantage of not requiring an oracle to be guaranteed to work. The point is that unequal sharing is just as much a "value judgement" as equal sharing, so claiming DualQ to be policy-free is simply wrong.
"The distinction between NQB flows and QB flows is similar to the distinction made between "sparse flow queues" and "non-sparse flow queues" in fq_codel. In fq_codel, a flow queue is considered sparse if it is drained completely by each packet transmission, and remains empty for at least one cycle of the round robin over the active flows (this is approximately equivalent to saying that it utilizes less than its fair share of capacity). While this definition is convenient to implement in fq_codel, it isn't the only useful definition of sparse flows."
[SM] Have the fq_codel authors been asked whether the choice of this sparseness measure was made for convenience (only)?
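For what it is worth, the "less than its fair share" equivalence can be made plausible with a quick back-of-the-envelope calculation (my own numbers and simplifications, not from RFC 8290):

# Rough bound on the "sparse" rate threshold in an fq_codel-like DRR
# scheduler (back-of-the-envelope only): one round over N backlogged
# flows with quantum Q bytes on a link of rate R takes about N*Q*8/R
# seconds, so a flow that sends at most one MTU-sized packet per round
# stays sparse up to roughly MTU*8 / (N*Q*8/R) bit/s.
def sparse_rate_bound(link_rate_bps, n_bulk_flows, quantum=1514, mtu=1514):
    round_time = n_bulk_flows * quantum * 8 / link_rate_bps   # seconds
    return mtu * 8 / round_time                               # bit/s

# Example: 100 Mbit/s link, 10 bulk flows -> a flow stays "sparse"
# below ~10 Mbit/s.
print(sparse_rate_bound(100e6, 10) / 1e6, "Mbit/s")

With MTU-sized packets this comes out at exactly egress_rate/N, i.e. the equal share, which matches the draft's "approximately equivalent" characterization.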
Best Regards