[Cerowrt-devel] SQM: tracking some diffserv related internet drafts better

Fri Nov 14 09:01:40 EST 2014

Hi Dave,

I probably do not understand the topic fully, but...

On Nov 13, 2014, at 18:26 , Dave Taht <dave.taht at gmail.com> wrote:

> This appears to be close to finalization, or finalized:
> 
> http://tools.ietf.org/html/draft-ietf-dart-dscp-rtp-10
> 
> And this is complementary:
> 
> http://tools.ietf.org/html/draft-ietf-tsvwg-rtcweb-qos-03

	Whoa, that’s 15 priority levels (out of ~64 possible?) right there for a browser to mark packets with, depending on media type. Not all of them need to map to real queues, but that seems like a lot, so I would expect that in real life a bunch of those will map to the same queues.
	If I understand correctly, we already have problems getting decent AQM implemented in core switching/routing equipment, so how realistic is it to expect these devices to implement differential packet drop probabilities per diffserv marking? If the answer is "not realistic", the last three DS bits become functionally equal… Also, CS1 for audio/video? I thought this was the scavenger class and hence not suitable for anything but bulk background traffic if there is even the slightest contention on the path… (On second thought, this would turn a CS1 internet radio into a decent congestion monitor: if the audio skips, you know the network is starting to develop issues…)

> 
> While wading through all this is tedious, and much of the advice contradictory,
> there are a few things that could be done more right in the sqm system
> that I'd like to discuss. (feel free to pour a cup of coffee and read
> the drafts)
> 
> -1) They still think the old style tos imm bit is obsolete. Sigh. Am I
> the last person that uses ssh or plays games?

	Are we free in cerowrt/SQM to simply ignore this and keep imm (CS2?) above the best effort queue?
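For reference, here is how the two readings of the same byte relate. The RFC 791 TOS byte put a 3-bit precedence in its top bits (value 2 was named "Immediate"), followed by a low-delay flag (0x10, IPTOS_LOWDELAY), while RFC 2474 reads the top six bits as the DSCP. Historically, OpenSSH marked interactive sessions with IPTOS_LOWDELAY. The sketch below decodes a TOS byte both ways; the function and field names are made up for illustration.

```python
# Sketch: decode an IPv4 TOS byte under the old RFC 791 view and the
# RFC 2474 DiffServ view. Function and field names are hypothetical.

IPTOS_LOWDELAY = 0x10  # "minimize delay" flag (historically used by interactive ssh)
PRECEDENCE_NAMES = [
    "Routine", "Priority", "Immediate", "Flash",
    "Flash Override", "CRITIC/ECP", "Internetwork Control", "Network Control",
]


def decode_tos(tos: int) -> dict:
    """Split a TOS byte into legacy fields and its DiffServ reading."""
    dscp = tos >> 2                                 # top six bits are the DSCP
    return {
        "precedence": PRECEDENCE_NAMES[tos >> 5],   # top three bits
        "lowdelay": bool(tos & IPTOS_LOWDELAY),
        "dscp": dscp,
        "class_selector": dscp >> 3,                # same three bits as precedence
    }


if __name__ == "__main__":
    print(decode_tos(IPTOS_LOWDELAY))  # ssh-style lowdelay marking
    print(decode_tos(0x40))            # precedence Immediate, i.e. CS2
```

Under this reading an ssh-style lowdelay packet is DSCP 4, which falls in class selector 0 together with best effort, whereas precedence Immediate is DSCP 16 (CS2). A rule meant to keep interactive ssh above best effort would therefore have to match DSCP 4 (TOS 0x10) itself, not CS2.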

> 
> 0) Key to this draft is expecting that the AF code points on a single
> 5-tuple not be re-ordered, which means dumping AF41 into a priority
> queue and AF42 into the BE queue is incorrect.

	So what about sticking to the class selectors only in SQM? If I understand correctly, we can match on the CS bits alone and ignore the other bits; I think all AFNx code points (AFN1, AFN2, AFN3) map to the same class selector, CSN… Section 4.2.2.3, “Using the Class Selector PHB Requirements for IP Precedence Compatibility”, of http://tools.ietf.org/html/rfc2474#page-11 seems to confirm that interpretation…
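A small sketch of the class-selector-only idea, assuming the AF code point layout from RFC 2597 (DSCP = 8 × class + 2 × drop precedence). It shows that masking off the low three bits sends AF41, AF42 and AF43 to the same class, so packets of one 5-tuple marked with different drop precedences would share a queue and not be reordered. The helper names are hypothetical.

```python
# Sketch: classify on the class selector only (top three DSCP bits), so that
# AF41/AF42/AF43 on one 5-tuple share a queue. Helper names are hypothetical.

def af(cls: int, drop: int) -> int:
    """DSCP of AF<cls><drop> per RFC 2597, e.g. af(4, 1) == 34 (AF41)."""
    if not (1 <= cls <= 4 and 1 <= drop <= 3):
        raise ValueError("AF class is 1..4, drop precedence 1..3")
    return (cls << 3) | (drop << 1)


def class_selector(dscp: int) -> int:
    """Return N such that dscp belongs to class CSN (low three bits ignored)."""
    return (dscp & 0x3F) >> 3


if __name__ == "__main__":
    for drop in (1, 2, 3):
        d = af(4, drop)
        print(f"AF4{drop} = {d} -> CS{class_selector(d)}")  # 34, 36, 38 -> CS4
```

One side effect of matching on the class alone: EF (46) also falls into class 5, alongside CS5 (40).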

> 
> 1) SQM only prioritizes a few diffserv codepoints (just the ones for
> which I had tools doing classification, like ssh). Doing so with tc
> rules is very inefficient presently. I had basically planned on
> rolling a new tc and/or iptables filter to "do the right thing" to map
> into all 64 codepoints via a simple lookup table (as what is in the
> wifi code already), rather than use the existing mechanism... and
> hesitated as nobody had nailed down the definitions of each one.

	Well, “tc filter” hurts us badly, as I found out when implementing filters that look into PPP-encapsulated packets to get at the TOS bits… But in theory, all the tests for the individual code points can be turned into a single hash operation in tc filter, so that we only pay the price once per encapsulation type (IPv4, IPv6, IPv4 in PPP, IPv6 in PPP). Poking through the PPP layer costs a few additional ANDed match tests, but I really, really hope that tc filter is smart enough to stop filter processing on the first mismatch…
	On my TODO list for SQM is using tc filter’s hash functionality to process all code points in one operation per packet. This should also allow (and require) mapping each of the 64 diffserv markings to our queues, so that any “ietf-recommendation-of-the-day” can be easily implemented by changing our mapping table...
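A sketch of the one-lookup-per-packet approach, written in Python only to show the data flow (the real thing would live in a tc or iptables filter). The DSCP is extracted once from either an IPv4 TOS byte or an IPv6 traffic class, then used to index a 64-entry table. The three queue numbers and the example policy in make_table() are hypothetical, not SQM's current mapping, and PPP decapsulation is left out.

```python
# Sketch: extract the DSCP once per packet and index a 64-entry table,
# instead of running one filter test per code point. The queue numbers and
# the example policy below are hypothetical, not SQM's actual mapping.

PRIORITY, BEST_EFFORT, BACKGROUND = 0, 1, 2


def make_table():
    """Build a 64-entry DSCP -> queue table for one example policy."""
    table = [BEST_EFFORT] * 64
    table[8] = BACKGROUND                 # CS1 (scavenger)
    for dscp in range(32, 40):            # whole CS4 class, AF41..AF43 included
        table[dscp] = PRIORITY
    for dscp in (46, 48, 56):             # EF, CS6, CS7
        table[dscp] = PRIORITY
    return table


def dscp_of(packet: bytes) -> int:
    """Read the DSCP from the start of a raw IPv4 or IPv6 header."""
    version = packet[0] >> 4
    if version == 4:
        return packet[1] >> 2             # TOS byte minus the 2 ECN bits
    if version == 6:
        tclass = ((packet[0] & 0x0F) << 4) | (packet[1] >> 4)
        return tclass >> 2                # traffic class minus the ECN bits
    raise ValueError(f"not an IP header (version {version})")


TABLE = make_table()


def classify(packet: bytes) -> int:
    """One table lookup per packet, whatever the code point."""
    return TABLE[dscp_of(packet)]


if __name__ == "__main__":
    ipv4_ef = bytes([0x45, 46 << 2]) + bytes(18)   # IPv4 header, DSCP 46 (EF)
    ipv6_cs1 = bytes([0x62, 0x00]) + bytes(38)     # IPv6 header, DSCP 8 (CS1)
    print(classify(ipv4_ef), classify(ipv6_cs1))   # prints: 0 2
```

Switching to a different recommendation then only means editing the 64 table entries; the per-packet cost stays one extraction plus one lookup.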

> 
> That said, I have not measured recently the impact of the extra tc
> filters and iptables rules required.

	As far as I can tell, tc filter is costly: in a “non-scientific” test with netperf-wrapper’s RRUL test, I saw the “robust range” of the ICMP CDF (the delay span over which the CDF goes from ~5% to 95%) increase from 10 ms to 30 ms. No idea about the iptables rules (though the internet seems to argue that iptables is much cheaper than tc filter).
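For clarity, the "robust range" above is just the width of the middle 90% of the latency distribution (95th minus 5th percentile). A minimal sketch, assuming the RTT samples are available as a plain list of milliseconds; reading netperf-wrapper's data files is not covered, and statistics.quantiles needs Python 3.8 or later.

```python
# Sketch: "robust range" = spread between the ~5th and ~95th percentiles of
# the latency samples, i.e. the width of the middle 90% of the CDF.
import statistics


def robust_range(rtts_ms):
    """Return p95 - p5 for a sequence of RTT samples (milliseconds)."""
    cuts = statistics.quantiles(rtts_ms, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return cuts[-1] - cuts[0]


if __name__ == "__main__":
    synthetic = [10.0] * 50 + [30.0] * 50      # made-up samples, not measurements
    print(robust_range(synthetic))             # prints: 20.0
```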

> 
> 1a) Certainly only doing AF42 in sqm is pretty wrong (that was left
> over from my test patches against mosh - mosh ran with AF42 for a
> while until they crashed a couple routers with it)

	Why? Couldn't we just switch to stashing all CS4 packets (AF41, AF42 and AF43 included) into this queue, and thereby comply again with the recommendation to treat all packets in an AFNx set equally?

> 
> The relevant lines are here:
> 
> https://github.com/dtaht/ceropackages-3.10/blob/master/net/sqm-scripts/files/usr/lib/sqm/functions.sh#L411
> 
> 1b) The cake code presently does it pretty wrong, which is eminently fixable.
> 
> 1c) And given that the standards are settling, it might be time to
> start baking them into a new tc or iptables filter. This would be a
> small, interesting project for someone who wants to get their feet wet
> writing this sort of thing, and examples abound of how to do it.

	So what I plan to do before the end of the year is get the hashed tc filter set up for SQM; after that, implementing and testing different mappings will be a piece of cake: just change 64 values and you are done...

> 
> 2) A lot of these diffserv specs - notably all the AFxx codepoints -
> are all about variable drop probability. (Not that this concept has
> been proven to work in the real world) We don't do variable drop
> probability... and I haven't the slightest clue as to how to do it in
> fq_codel. But keeping variable diffserv codepoints in order on the
> same 5 tuple seems to be the way things are going. Still I have
> trouble folding these ideas into the 3 basic queue system fq_codel
> uses, it looks to me as most of the AF codepoints end up in the
> current best effort queue, as the priority queue is limited to 30% of
> the bandwidth by default.

	Is this really relevant for the wider internet at all? As you argue below (and as is argued in the drafts cited above), each network can do whatever it likes with code points, so the relevant question, as I see it, is not what we could do with the code points if we had all 6 bits to ourselves end to end, but rather how many, and which, bits actually survive a trip over the open internet ;)

> 
> 
> 3) Squashing inbound dscp should still be the default option…

My interpretation of section 3.2 of http://tools.ietf.org/html/draft-ietf-dart-dscp-rtp-10, “When DiffServ is used, the edge or boundary nodes of a network are responsible for ensuring that all traffic entering that network conforms to that network's policies for DSCP and PHB usage, and such nodes may change DSCP markings on traffic to achieve that result.”, is that anything goes, including remapping to all zeros, aka squashing. http://tools.ietf.org/html/rfc2474 has a MUST about giving CS6 and CS7 preferential forwarding treatment over CS0, but I really doubt that any ISP will allow me to label all my traffic CS7 and treat it accordingly, so remapping to zero is okay, if not by the standard then by force of reality ;)
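Squashing has one subtlety: the DSCP shares its byte with the two ECN bits (RFC 3168), so remarking to CS0 should clear only the upper six bits, and for IPv4 the header checksum must then be recomputed. A sketch under those assumptions, operating on a raw IPv4 header; the function names are hypothetical and say nothing about how SQM itself implements squashing.

```python
# Sketch: squash inbound DSCP to 0 (CS0) while preserving ECN, then repair
# the IPv4 header checksum. Function names are hypothetical.

ECN_MASK = 0x03  # low two bits of the TOS / traffic class byte (RFC 3168)


def squash_dscp(tos: int) -> int:
    """Zero the six DSCP bits, keep the two ECN bits."""
    return tos & ECN_MASK


def ipv4_checksum(header: bytes) -> int:
    """RFC 1071 Internet checksum over the header's 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total > 0xFFFF:                 # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def squash_ipv4(header: bytearray) -> None:
    """Rewrite the TOS byte of an IPv4 header in place and fix its checksum."""
    header[1] = squash_dscp(header[1])
    header[10:12] = b"\x00\x00"           # checksum field is zero while summing
    header[10:12] = ipv4_checksum(header).to_bytes(2, "big")


if __name__ == "__main__":
    # 20-byte IPv4 header marked EF (0xb8) with ECT(1) set (0x01) -> TOS 0xb9
    hdr = bytearray.fromhex("45b9 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7")
    squash_ipv4(hdr)
    print(hex(hdr[1]), ipv4_checksum(hdr))  # prints: 0x1 0
```

A checksum of 0 over the full rewritten header means the stored checksum is valid again.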

> 
> 4) My patch set to the wifi code for diffserv support disables the VO
> queue almost entirely in favor of punting things to the VI queue
> (which can aggregate), but I'm not sure if I handled AFxx
> appropriately.
> 
> 5) So far as I know, no browser implements any of this stuff yet. So
> far as I know nobody actually deployed a router that tries to do smart
> things with this stuff yet.

	I would love to know whether the proposed markings actually survive a trip through the open internet at all. I would argue that until that actually happens, this is a nicely academic discussion: cerowrt already does a fine job with its nice fq_codel hierarchy, and if all the fancy new markings are wiped directly by my ISP, I am not sure that implementing the proposal in SQM is going to change anything, especially not anything that can be measured. As they say, “measurement data or it did not happen” ;)

> 
> 6) I really wish there were more codepoints for background traffic than cs1.

	But isn’t that what AF1x is all about? I agree that the range of the 6 DS bits is not used to its fullest extent: rather than treating them as bits, treat them as a number and compute priority = DS - 32, so we get a range from -32 to 31, and simply require that higher values are never treated with lower priority than smaller ones. (Heck, maybe special-case CS0 to also mean zero, for backward/reality compatibility ;) )
	But most likely I have just misunderstood the whole issue…
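The "DS as a number" idea is easy to state in code. A sketch, with hypothetical names, of priority = DS - 32 with the optional CS0 special case, plus a check of the proposed rule that a higher DS value never gets a lower priority:

```python
# Sketch of "treat the DS field as a number": priority = DS - 32, optionally
# special-casing CS0 (DS 0) to priority 0. Names are hypothetical.

def ds_priority(ds: int, cs0_is_zero: bool = True) -> int:
    """Map a DSCP value (0..63) to a signed priority in -32..31."""
    if not 0 <= ds <= 63:
        raise ValueError("DSCP must be in 0..63")
    if cs0_is_zero and ds == 0:
        return 0                          # backward/reality compatibility
    return ds - 32


def is_monotonic(priority_of) -> bool:
    """Proposed rule: a higher DS value never maps to a lower priority."""
    values = [priority_of(ds) for ds in range(64)]
    return all(a <= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    print(is_monotonic(lambda ds: ds_priority(ds, cs0_is_zero=False)))  # True
    print(is_monotonic(ds_priority))                                     # False
```

The check shows one consequence of the special case: with CS0 mapped to 0, DS 0 ranks above DS 1 to 31 (which map to -31 to -1), so the ordering rule would then need an explicit exemption for DS 0.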

Best Regards
	Sebastian

> 
> -- 
> Dave Täht
> 
> http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel