* [Ecn-sane] draft-white-tsvwg-nqb-02 comments
@ 2019-08-24 22:36 Sebastian Moeller
0 siblings, 0 replies; 2+ messages in thread
From: Sebastian Moeller @ 2019-08-24 22:36 UTC (permalink / raw)
To: tsvwg IETF list, ECN-Sane
"Flow queueing (FQ) approaches (such as fq_codel [RFC8290]), on the other hand, achieve latency improvements by associating packets into "flow" queues and then prioritizing "sparse flows", i.e. packets that arrive to an empty flow queue. Flow queueing does not attempt to differentiate between flows on the basis of value (importance or latency-sensitivity), it simply gives preference to sparse flows, and tries to guarantee that the non-sparse flows all get an equal share of the remaining channel capacity and are interleaved with one another. As a result, FQ mechanisms could be considered more appropriate for unmanaged environments and general Internet traffic."
[SM] An intermediate hop has no real handle on the "value" of a packet and hence cannot "differentiate between flows on the basis of value"; this is true for FQ and non-FQ approaches alike. What this section calls "preference to sparse flows" is essentially what NQB with queue protection does to NQB-marked packets: give them the benefit of the doubt until they exceed a variable sojourn/queueing threshold, after which they are treated less preferentially, either by no longer being treated as "sparse" or by being redirected into the QB flow queue. The exception is that FQ-based solutions immediately move all packets of such a non-sparse flow out of the way of qualifying sparse flows, while queue protection as described in the DOCSIS document only redirects newly arriving packets to the QB flow queue; all falsely classified packets already enqueued will remain in, and "pollute", the NQB flow queue.
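The benefit-of-the-doubt behaviour could be sketched roughly as follows (a deliberately minimal sketch; the threshold value and the per-flow sojourn input are assumptions for illustration, not the actual DOCSIS queue-protection algorithm):

```python
# Minimal sketch of the "benefit of the doubt until a threshold" idea.
# The 1 ms threshold is an assumed illustration value, not from the spec.
def classify(flow_sojourn_ms: float, threshold_ms: float = 1.0) -> str:
    """Treat a flow as NQB until its queuing delay exceeds the
    threshold, then demote it to the queue-building (QB) queue."""
    return "NQB" if flow_sojourn_ms <= threshold_ms else "QB"

print(classify(0.4))  # a well-behaved flow stays in the NQB queue
print(classify(2.5))  # a queue-building flow gets demoted to QB
```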
"Downsides to this approach include loss of low latency performance due to the possibility of hash collisions (where a sparse flow shares a queue with a bulk data flow), complexity in managing a large number of queues in certain implementations, and some undesirable effects of the Deficit Round Robin (DRR) scheduling."
[SM] As described in DOCSIS-MULPIv3.1, the queue protection method only has a limited number of buckets to use and will, to my understanding, account all flows above that number in the default bucket. This looks pretty close to the consequence of a hash collision to me, as I interpret it as the same fate-sharing of nominally independent flows observed in FQ when hash collisions occur. While it is fair criticism that this failure mode exists, mentioning it only in the context of FQ seems sub-optimal, especially since the DOCSIS queue protection is defined with only 32 non-default buckets... versus a default of 1024 flow queues for fq_codel.
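To put rough numbers on the bucket comparison (a sketch with an assumed flow count, treating bucket assignment as a uniform hash):

```python
# Illustrative sketch (not from the draft): probability that a newly
# arriving flow shares a bucket with at least one of k already-active
# flows, assuming a uniform hash over B buckets.
def collision_probability(buckets: int, active_flows: int) -> float:
    """P(new flow lands in a bucket already used by an active flow)."""
    return 1.0 - (1.0 - 1.0 / buckets) ** active_flows

for buckets in (32, 1024):
    p = collision_probability(buckets, active_flows=20)
    print(f"{buckets:5d} buckets, 20 active flows: P(collision) = {p:.3f}")
```

With the assumed 20 active flows, 32 buckets put the collision probability near one half, while 1024 buckets keep it around two percent.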
"The DRR scheduler enforces that each non-sparse flow gets an equal fraction of link bandwidth,"
[SM] This is actually a feature, not a bug. It only triggers under load conditions and gives behavior that end-points can actually predict reasonably well. Any other kind of bandwidth sharing between flows is bound to have better best-case behavior, but also much worse worst-case behavior (like almost complete starvation of some flows). In short, equal bandwidth under load seems far superior for forward progress than "anything goes", as it will deliver something good enough without requiring an oracle and without regressing into starvation territory.
Tangent: I have read Bob's justification for wanting inequality here, but will just mention that an intermediate hop simply cannot know or reasonably balance the importance of traversing flows (unless in a very controlled environment where all endpoints can be trusted to a) do the right thing and b) rank their bandwidth use by overall importance).
"In effect, the network element is making a decision as to what constitutes a flow, and then forcing all such flows to take equal bandwidth at every instant."
[SM] This only holds under saturating conditions and, as argued above, seems a reasonable compromise that will be good enough. The intermediate hop has no reliable way of objectively ranking the relative importance of the concurrently active flows; and without such a ranking, treating all flows equally seems more cautious and conservative than basically allowing anything.
The network element in front of the saturated link needs to make a decision (otherwise no AQM would be active), and the network element needs to "force" its view on the flows (which, by the way, is exactly the rationale for recommending queue protection). Also, "equal bandwidth at every instant" for all flows is simply wrong: as long as the link is not saturated this does not trigger at all, and no flow is "forced" to take more bandwidth than it requires... Let me try to give a description of how FQ behavior looks from the outside (this is a simplification and hence wrong, but hopefully less wrong than the simplification in the draft): under saturating conditions with N flows, all flows with rates less than egress_rate/N send at full blast, just as without saturation, and the remaining bandwidth is shared equally among those flows that are sending at higher rates. This does hence not result in equal rates for all flows at every instant.
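The sharing behaviour just described is essentially max-min fairness, which can be sketched as follows (the demand figures are assumed purely for illustration):

```python
# Sketch of the FQ sharing behaviour described above (a simplification,
# with assumed per-flow demands): flows demanding less than their fair
# share send at full rate; the remainder is split equally among the rest.
def max_min_shares(demands, capacity):
    """Return per-flow rates under max-min fair sharing."""
    rates = {}
    remaining = capacity
    unsatisfied = sorted(demands.items(), key=lambda kv: kv[1])
    while unsatisfied:
        fair_share = remaining / len(unsatisfied)
        flow, demand = unsatisfied[0]
        if demand <= fair_share:
            # this flow wants less than its fair share: satisfy it fully
            rates[flow] = demand
            remaining -= demand
            unsatisfied.pop(0)
        else:
            # every remaining flow wants more than the fair share:
            # split the remaining capacity equally among them
            for flow, _ in unsatisfied:
                rates[flow] = fair_share
            return rates
    return rates

# e.g. a 100 Mbit/s link: one sparse flow (5) and two bulk flows
print(max_min_shares({"dns": 5, "bulk1": 80, "bulk2": 90}, capacity=100))
```

With these assumed demands, the sparse flow keeps its full 5 while the two bulk flows split the remaining 95 equally, which is exactly the "not equal rates for all flows at every instant" point above.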
"The Dual-queue approach defined in this document achieves the main benefit of fq_codel: latency improvement without value judgements, without the downsides."
[SM] Well, that seems a rather subjective judgement, and also wrong, given that queue protection conceptually suffers from downsides similar to FQ "hash collisions" and lacks the clear and justifiable middle-of-the-road approach of equal bandwidth for all flows (that can make use of it), which might not be the optimal bandwidth allotment but has the advantage of not requiring an oracle to be guaranteed to actually work. The point is that unequal sharing is a "value judgement" just as equal sharing is, so claiming dual-queue to be policy-free is simply wrong.
"The distinction between NQB flows and QB flows is similar to the distinction made between "sparse flow queues" and "non-sparse flow queues" in fq_codel. In fq_codel, a flow queue is considered sparse if it is drained completely by each packet transmission, and remains empty for at least one cycle of the round robin over the active flows (this is approximately equivalent to saying that it utilizes less than its fair share of capacity). While this definition is convenient to implement in fq_codel, it isn't the only useful definition of sparse flows."
[SM] Have the fq_codel authors been asked whether the choice of this sparseness measure was made for convenience (only)?
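For what it is worth, the quoted sparseness notion ("packets that arrive to an empty flow queue") can be sketched as follows (a simplified model of the definition, not the actual fq_codel implementation, which also tracks a deficit across round-robin cycles):

```python
from collections import deque

# Simplified model of the sparseness test quoted above: a flow queue
# counts as "sparse" while each of its packets arrives to an empty
# queue, i.e. the queue drains completely between arrivals.
class FlowQueue:
    def __init__(self):
        self.packets = deque()
        self.arrived_to_empty = True  # sparse until proven otherwise

    def enqueue(self, packet):
        self.arrived_to_empty = not self.packets
        self.packets.append(packet)

    def dequeue(self):
        return self.packets.popleft()

    @property
    def sparse(self):
        return self.arrived_to_empty

q = FlowQueue()
q.enqueue("p1")
print(q.sparse)  # True: the packet arrived to an empty queue
q.enqueue("p2")
print(q.sparse)  # False: the previous packet was still queued
```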
Best Regards
* [Ecn-sane] draft-white-tsvwg-nqb-02 comments
@ 2019-08-24 22:07 Sebastian Moeller
0 siblings, 0 replies; 2+ messages in thread
From: Sebastian Moeller @ 2019-08-24 22:07 UTC (permalink / raw)
To: tsvwg IETF list, Dave Taht, ECN-Sane
Dear tsvwg,
[SM] I had a look at draft-white-tsvwg-nqb-02 and amongst other things I tripped over the following section
"8.3. WiFi Networks
WiFi networking equipment compliant with 802.11e generally supports either four or eight transmit queues and four sets of associated CSMA parameters that are used to enable differentiated media access characteristics. Implementations typically utilize the IP DSCP field to select a transmit queue.
As discussed in [RFC8325], most implementations use a default DSCP to User Priority mapping that utilizes the most significant three bits of the DiffServ Field to select User Priority. In the case of the 0x2A codepoint, this would map to UP_5 which is in the "Video" Access Category (one level above "Best Effort").
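For concreteness, the default three-most-significant-bits mapping described above works out as follows (a sketch of the mapping rule, not of any particular implementation):

```python
# Sketch of the default DSCP-to-User-Priority mapping described above:
# the three most significant bits of the 6-bit DSCP select the UP.
def dscp_to_up(dscp: int) -> int:
    """Map a 6-bit DSCP to an 802.11e User Priority (default mapping)."""
    assert 0 <= dscp <= 0x3F, "DSCP is a 6-bit field"
    return dscp >> 3

# 0x2A = 0b101010; top three bits 0b101 = 5, i.e. UP_5 ("Video")
print(hex(0x2A), "->", f"UP_{dscp_to_up(0x2A)}")
```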
Systems that utilize [RFC8325], SHOULD map the 0x2A codepoint to UP_6 in the "Voice" Access Category."
[SM] This is highly debatable! See RFC8325 for a description of the consequences of selecting AC_VO; in short, AC_VO entails a considerable advantage in acquiring airtime (over all other ACs) that will immediately affect ALL users of the same channel (and nearby channels), that is, all networks using that channel in the RF neighbourhood. In addition, AC_VO also seems to grant longer TXOPs (airtime slots) than both AC_BK and AC_BE.
Using AC_VO for any data flow that is not both reasonably low-bandwidth and latency-sensitive is rather rude and should not be enshrined in an IETF draft. As NQB seems specifically designed to also allow for high-bandwidth flows, AC_VO should be off-limits, as the consequences will not be restricted to the wifi network carrying the NQB flows. If a big data flow abuses AC_VO it will effectively slow all other ACs' traffic to a trickle, and since wifi (currently) is mostly half-duplex it will also affect traffic in both directions.
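To illustrate the scale of that channel-access advantage, here is a back-of-the-envelope sketch using the common 802.11 EDCA defaults (the AIFSN/CWmin values are the usual defaults and are assumptions here; this ignores retries, collisions, and TXOP length):

```python
# Back-of-the-envelope sketch of per-AC channel-access delay:
# AIFS plus the average initial random backoff.
# Parameter values are the common EDCA defaults (assumed, may differ
# per deployment); OFDM slot time 9 us, SIFS 16 us.
SLOT_US, SIFS_US = 9, 16

def mean_access_delay_us(aifsn: int, cwmin: int) -> float:
    """AIFS plus the mean initial backoff, in microseconds."""
    aifs = SIFS_US + aifsn * SLOT_US
    return aifs + (cwmin / 2) * SLOT_US

acs = {"AC_VO": (2, 3), "AC_VI": (2, 7), "AC_BE": (3, 15), "AC_BK": (7, 15)}
for ac, (aifsn, cwmin) in acs.items():
    print(f"{ac}: ~{mean_access_delay_us(aifsn, cwmin):.1f} us")
```

Under these assumed defaults AC_VO contends for the medium in roughly half the time of AC_BE, which is why a bulk flow in AC_VO can crowd out everything else on the channel.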
Let me try to phrase my objection in IETF-conforming terms: this recommendation needs to be reconciled with the mapping recommendations in https://tools.ietf.org/html/rfc8325 for different types of traffic. I realize that this draft references RFC8325, but it apparently fails to account for the rationale for dividing different traffic types into different access classes, given the side effects these classes have.
Best Regards
Sebastian