[Cerowrt-devel] squash/ignore DSCP and mangle table questions

Wed Apr 22 05:03:31 EDT 2015

Hi leetminiwheat,

On Apr 22, 2015, at 02:19 , leetminiwheat <LeetMiniWheat at gmail.com> wrote:

> Sorry this is getting a bit off-topic here.
> 
>> On Wed, Apr 15, 2015 at 5:05 AM, Sebastian Moeller <moeller0 at gmx.de> wrote:
>> 
>>> On Apr 15, 2015, at 03:35 , leetminiwheat <LeetMiniWheat at gmail.com> wrote:
>> 
>>> I assume tweaking ring parameters from default RX:128 and TX:32
>>> doesn't matter anymore then?
>> 
>>        As far as I know we leave that alone, see: http://www.bufferbloat.net/projects/bloat/wiki/Linux_Tips:
>> “Set the size of the ring buffer for the network interface
>> 
>> NOTE: THIS HACK IS NO LONGER NEEDED on many ethernet drivers in Linux 3.3, which has Byte Queue Limits instead, which does a far better job.”
>> 
> I noticed Dave mentioned on a mailing list that changing tx ring still
> does have some benefit, and his notes in debloat script suggest BQL
> doesn't always work as implied.

	Could you cite that note, please? I cannot find it. @Dave, maybe you could comment on the note’s applicability?

>> 
>>> 
>>>> [...]
>>>> If you have time and netperf-wrapper it would be good to convince yourself and us again, that txqueuelen really does not matter for BQL’d interfaces by running RRUL tests with and without your modifications….
> 
> Thanks, after extensive RRUL testing.... I've come to the same
> conclusion Dave did, that changing tx parameters just isn't worth it
> and causes instability. I noticed on 120s tests my WAN connection
> would reset with ath9k: pll_reg and latencies would skyrocket
> thereafter. I don't quite have a reproducible error, but it seemed like
> associating/disassociating wireless clients might be related to it
> (with Revert "debloat: stop changing wifi qlen") but I was also
> changing txring on ethernet for testing at 4, 8, 16, etc.
> 
> Also, I tested some custom HFSC+fq_codel qos scripts here:
> https://github.com/zcecc22/qos-nxt

	Interesting, does his newer sqm-qos work better for you?

> He defaults to 90% (meaning you have to adjust your b/w limits), and
> the 2-bin codel doesn't seem to work very well. Seemed like an
> interesting compromise between simple and simplest, but the results
> were pretty terrible.

	If you have an RRUL plot to share, that would be helpful.

> I'd like to test CAKE more, but it seems
> 3.10.50-1 doesn't have the required kernel support.
> 
> Recent changes in barrier breaker to txring seem pretty dumb, they
> default to 256 txring now I believe, ticket here was closed with
> "worksforme" https://dev.openwrt.org/ticket/13072 so I'm reluctant to
> upgrade, plus I don't fully understand the extent to which Dave's
> kernel hacks impact things. Closer inspection/comparison/diffs are
> needed if I'm to upgrade and integrate Cero's tweaks.

	I am probably off here, but I assume that with a properly sized BQL the actual tx ring does not matter for latency anymore and can be selected to keep the hardware happy ;)
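
	To check what BQL is actually doing on an interface, the per-queue limits can be read from sysfs. The following is only a minimal sketch: it assumes the driver supports BQL and exposes the usual byte_queue_limits directory under /sys/class/net/<dev>/queues/tx-N/, and the helper names and the default device name are made up for illustration.

```python
from pathlib import Path

# Files of interest inside each byte_queue_limits directory.
BQL_FILES = ("limit", "limit_max", "limit_min", "inflight")


def read_bql(dev, sysfs_root="/sys/class/net"):
    """Return {tx-queue-name: {file-name: value}} for every BQL-enabled tx queue.

    An empty dict means the device is missing or exposes no BQL files.
    """
    result = {}
    queues_dir = Path(sysfs_root, dev, "queues")
    if not queues_dir.is_dir():
        return result
    for queue in sorted(queues_dir.glob("tx-*")):
        bql = queue / "byte_queue_limits"
        if not bql.is_dir():
            continue
        result[queue.name] = {
            name: int((bql / name).read_text())
            for name in BQL_FILES
            if (bql / name).is_file()
        }
    return result


def bql_delay_ms(limit_bytes, link_rate_bit_s):
    """Time needed to drain limit_bytes at the given link rate, in milliseconds."""
    return limit_bytes * 8 / link_rate_bit_s * 1000


if __name__ == "__main__":
    # "ge00" is the WAN interface name used in this thread; adjust as needed.
    for queue, values in read_bql("ge00").items():
        print(queue, values)
```

	If the reported limit corresponds to well under a millisecond at the link rate, the size of the ring behind it should hardly show up in latency, which is the assumption above.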

> 
> Oddly enough, simplest.qos on WAN gives me better throughput/latency
> at times (likely due to less overhead), but other times simple.qos is
> doing what it should and giving more throughput and lower latency to
> higher priority traffic. I seem to get better RRUL tests with LIMIT=
> blank, and ILIMIT/ELIMIT set to auto which results in this:

	Hmm, LIMIT defaults to 1001, basically so we can see that it was not set but left at its default. I believe that at one time limit defaulted to 10240 packets, which was too large for a wndr3700; that’s why I changed it to 1001. But I have a hard time believing that the difference between 1001 and 1024 packets should affect RRUL noticeably… As always, if the data supports that notion I am willing to adapt...
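
	A back-of-the-envelope sketch of why I doubt it matters: the limit only bites when the queue is completely full, and in that worst case the difference between the two settings is small compared to the total. The packet size (1514 bytes) and the 22.5 Mbit/s egress rate are taken from this thread; the function name is made up for illustration, and fq_codel normally keeps the standing queue far below the limit anyway.

```python
def drain_time_ms(limit_packets, packet_bytes, rate_bit_s):
    """Worst-case time to drain a completely full queue, in milliseconds."""
    return limit_packets * packet_bytes * 8 / rate_bit_s * 1000


if __name__ == "__main__":
    rate = 22.5e6  # shaped egress rate from this thread, in bit/s
    for limit in (1001, 1024):
        print(f"limit {limit}p: {drain_time_ms(limit, 1514, rate):.1f} ms")
    extra = drain_time_ms(1024, 1514, rate) - drain_time_ms(1001, 1514, rate)
    print(f"extra worst-case delay from 23 more packets: {extra:.1f} ms")
```

	So even with full-size packets and a full queue, the 23 additional packets add roughly 12 ms on top of more than half a second, a state the AQM should never let the queue reach in an RRUL run.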

> 
> qdisc fq_codel a: dev se00 root refcnt 2 limit 1514p flows 1024
> quantum 1514 target 5.0ms interval 100.0ms ecn
> qdisc htb 1: dev ge00 root refcnt 2 r2q 10 default 12
> direct_packets_stat 0 direct_qlen 1000
> qdisc fq_codel 110: dev ge00 parent 1:11 limit 1024p flows 1024
> quantum 300 target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 120: dev ge00 parent 1:12 limit 1024p flows 1024
> quantum 300 target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 130: dev ge00 parent 1:13 limit 1024p flows 1024
> quantum 300 target 5.0ms interval 100.0ms ecn
> qdisc ingress ffff: dev ge00 parent ffff:fff1 ----------------
> qdisc mq 1: dev sw10 root
> qdisc fq_codel 10: dev sw10 parent 1:1 limit 800p flows 1024 quantum
> 500 target 10.0ms interval 100.0ms
> qdisc fq_codel 20: dev sw10 parent 1:2 limit 800p flows 1024 quantum
> 300 target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 30: dev sw10 parent 1:3 limit 1000p flows 1024 quantum
> 300 target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 40: dev sw10 parent 1:4 limit 1000p flows 1024 quantum
> 300 target 5.0ms interval 100.0ms
> qdisc mq 1: dev sw00 root
> qdisc fq_codel 10: dev sw00 parent 1:1 limit 800p flows 1024 quantum
> 500 target 10.0ms interval 100.0ms
> qdisc fq_codel 20: dev sw00 parent 1:2 limit 800p flows 1024 quantum
> 300 target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 30: dev sw00 parent 1:3 limit 1000p flows 1024 quantum
> 300 target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 40: dev sw00 parent 1:4 limit 1000p flows 1024 quantum
> 300 target 5.0ms interval 100.0ms
> qdisc htb 1: dev ifb4ge00 root refcnt 2 r2q 10 default 12
> direct_packets_stat 0 direct_qlen 32
> qdisc fq_codel 110: dev ifb4ge00 parent 1:11 limit 1024p flows 1024
> quantum 500 target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 120: dev ifb4ge00 parent 1:12 limit 1024p flows 1024
> quantum 1500 target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 130: dev ifb4ge00 parent 1:13 limit 1024p flows 1024
> quantum 300 target 5.0ms interval 100.0ms ecn
> qdisc htb 1: dev ifb4gw00 root refcnt 2 r2q 10 default 12
> direct_packets_stat 0 direct_qlen 32

	I believe this was left-over as gw00 (temporarily) went away and hence the egress interface disappeared. More recent versions of sqm-scripts try to handle this by reacting to the ifup hotplug events. You might want to try the most recent version of sqm-scripts.

> qdisc fq_codel 110: dev ifb4gw00 parent 1:11 limit 1024p flows 1024
> quantum 500 target 10.3ms interval 105.3ms ecn
> qdisc fq_codel 120: dev ifb4gw00 parent 1:12 limit 1024p flows 1024
> quantum 1500 target 10.3ms interval 105.3ms ecn
> qdisc fq_codel 130: dev ifb4gw00 parent 1:13 limit 1024p flows 1024
> quantum 300 target 10.3ms interval 105.3ms ecn
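
	For comparing such listings between runs, a small parser can help. This is a rough sketch, not part of sqm-scripts: it assumes the text looks like the tc output quoted above (possibly with mail quote markers and wrapped lines) and that IFB devices follow the ifb4<interface> naming seen in the listing. It flags IFBs whose base interface does not appear in the same listing, like ifb4gw00 here.

```python
import re

# One qdisc entry: kind, handle, device, then everything up to the next entry.
QDISC_RE = re.compile(
    r"qdisc (\S+) (\S+): dev (\S+)(.*?)(?=qdisc \S+ \S+: dev |$)"
)


def parse_qdiscs(text):
    """Turn tc-style qdisc listing text into a list of dicts."""
    # Drop mail quote markers and re-join wrapped lines into one string.
    flat = " ".join(line.lstrip("> ").strip() for line in text.splitlines())
    records = []
    for match in QDISC_RE.finditer(flat):
        kind, handle, dev, rest = match.groups()
        limit = re.search(r"\blimit (\d+)p\b", rest)
        records.append({
            "kind": kind,
            "handle": handle,
            "dev": dev,
            "limit": int(limit.group(1)) if limit else None,
        })
    return records


def orphaned_ifbs(records, prefix="ifb4"):
    """Return IFB devices whose base interface has no qdisc in the listing."""
    devs = {record["dev"] for record in records}
    return sorted(
        dev for dev in devs
        if dev.startswith(prefix) and dev[len(prefix):] not in devs
    )


SAMPLE = """\
> qdisc htb 1: dev ge00 root refcnt 2 r2q 10 default 12
> direct_packets_stat 0 direct_qlen 1000
> qdisc fq_codel 110: dev ge00 parent 1:11 limit 1024p flows 1024
> quantum 300 target 5.0ms interval 100.0ms ecn
> qdisc htb 1: dev ifb4ge00 root refcnt 2 r2q 10 default 12
> qdisc htb 1: dev ifb4gw00 root refcnt 2 r2q 10 default 12
"""

if __name__ == "__main__":
    print(orphaned_ifbs(parse_qdiscs(SAMPLE)))
```
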
> 
> image of RRUL 45s graph here with simple.qos, no tx changes, auto
> LIMIT on FiOS 32/25 (30Mb/22.5Mb QoS): https://screencloud.net/v/tVV0
> - looks pretty good to me,

	So what is a bit weird is that you see the priority banding in the download direction (upper plot) but not in the egress direction. For IPv4 this typically looks different; were you by any chance using IPv6 for this test?

> but I should set up more MARK or DSCP
> classifications for my important/unimportant traffic. MARK is probably
> a better idea since it won't unnecessarily mis-flag outgoing traffic.
> I assume QOS_MARK_ge00 sees marks from other interfaces too.
> 
> I'm still unsure whether to apply simple/simplest to my secure
> wireless or leave it alone, Dave's debloat script seems to have
> wireless-specific optimizations when left on auto, does simple.qos
> handle VO/VI/BE/BK queues as efficiently?

	No, sqm-scripts does not touch your interfaces at that level of detail. Setting up sqm-scripts on a wireless interface works okay, as long as you have a good estimate of what the lowest on-air rate is going to be (and then shape to 50% of that rate ;)); sqm-scripts does not really handle the half-duplexicity of wifi too well… But if you are willing to shape both ingress and egress aggressively enough you should see stable low(er) latencies under load, until your wifi card exercises the VO/VI/BE/BK queues; then all bets are off. I find wifi with its highly variable link rates a bit tedious to shape: either you sacrifice a lot of bandwidth or you end up with a shaper that only works half of the time… I am rooting for Dave to get-wifi-fast soon ;)
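
	The rule of thumb above, as a tiny sketch. The function name is made up, the 50% figure is just my rough guess from above and not a tuned value, and the 6500 kbit/s example (the lowest HT20 single-stream rate) is only an illustration of a pessimistic on-air rate.

```python
def wifi_shaper_kbit(lowest_on_air_kbit, fraction=0.5):
    """Shaper rate as a fraction of the lowest expected on-air rate, in kbit/s."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(lowest_on_air_kbit * fraction)


if __name__ == "__main__":
    # Hypothetical worst case: clients falling back to 6.5 Mbit/s.
    print(wifi_shaper_kbit(6500))
```

	This also shows the cost: whenever the link actually runs at a higher rate, most of that bandwidth stays unused.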

> I never top out my wireless
> since it's used only for mobile phones anyways and I run HT20 which
> seems to be more reliable/less latency.

	As compared to HT40? On 5 GHz or on 2.4 GHz?

> however my guest wifi I do
> need to limit and segregate via firewall so I have it enabled.

	Silly question, why do you need to limit your guest wifi? At most I would disallow the guests the use of the highest priority queue (sort of keeping it as a poor man’s control plane), but otherwise let them have their fair share of the full bandwidth; after all, they are my guests ;) (Alternatively, put that traffic into the BK queue so they will never really delay your traffic.)
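
	To illustrate which markings end up in which wifi queue, here is a sketch of the mapping as I understand the common Linux default: the top three bits of the DSCP select the 802.1d user priority, which in turn selects the WMM access category. Treat this as an assumption; a driver or a configured QoS map may behave differently, and the function name is made up.

```python
# 802.1d user priority (0..7) to WMM access category.
UP_TO_AC = ("BE", "BK", "BK", "BE", "VI", "VI", "VO", "VO")


def wmm_access_category(dscp):
    """Map a 6-bit DSCP value to a WMM access category via its top three bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return UP_TO_AC[dscp >> 3]


if __name__ == "__main__":
    for name, dscp in (("CS0", 0), ("CS1", 8), ("EF", 46), ("CS6", 48)):
        print(name, dscp, wmm_access_category(dscp))
```

	Under that assumption, marking the guest traffic CS1 (DSCP 8) would land it in BK.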

> 
> P.S. I learned the hard way NEVER to enable port 4 on the switch,
> results in broken ethernet.

	Not sure what hardware we are talking about, but on my wndr3700v2 all ports seem to work okay, that is to say the CPU talks to the switch and I can plug in a cable in all 4 exposed lan ports and get working connectivity, so I am surely missing something.

> port4 is unused and likely internally
> reserved for unknown purposes. I'm still trying to figure out how it
> maps an interface to an actual port, since I'd like to assign a single
> switch port to its own subnet for my FiOS router instead of
> having to use a secondary router to clone the ge00 interface on the
> backend router to forward FiOS ports to the verizon/FiOS MOCA bridge
> router in order for alerts to display on set-top boxes such as caller
> ID. There has to be a way of doing this without needing 3 routers...
> My current thoughts are to remove the port (port3 in this case) from
> the switch and make a new switch config with just 4 and 5t and somehow
> make a new interface on that for the FiOS router, but assigning the
> same ip address as the default gateway/route from ge00 on that port
> might confuse routing. More information on their rather complicated
> and seemingly unnecessary config with a backend router is located
> here: http://www.dslreports.com/faq/verizonfios/3.0_Networking#16710

	Again, this is far away from my area of expertise, but I would simply use a switch as a poor man’s aggregation network between the ONT’s ethernet port and the main router, and then I would connect the actiontec’s WAN port to the same switch. With a bit of VLAN(?) configuration it should be possible to have the actiontec only see the main router (to not confuse the ONT, but maybe that is not necessary), and use a fixed route from the main router to the actiontec’s network with the devices you want to access. I guess at that level of abstraction I avoid all the pesky issues cropping up when trying to implement that ;)

Best Regards
	Sebastian