[Cerowrt-devel] archer c7 v2, policing, hostapd, test openwrt build
chromatix99 at gmail.com
Sun Mar 29 11:13:22 EDT 2015
>> - Turning on HTB + fq_codel loses you 5%.
> I assume that this partly is caused by the need to shape below the physical link bandwidth, it might be possible to get closer to the limit (if the true bottleneck bandwidth is known, but see above).
> Downstream: (((1500 - 8 - 40 -20) * 8) * (98407 * 1000) / ((1500 + 14 + 16) * 8)) / 1000 = 92103.8 Kbps; measured: 85.35 Mbps (dual egress); 82.76 Mbps (IFB ingress)
I interpret that as meaning: you have set HTB to 98407 Kbps and, after subtracting overheads, expect 92103 Kbps of goodput. You got pretty close to that on the raw line, and the upstream number gets pretty close to your calculated figure, so the missing ~6700 Kbps (7%) can't be put down to the link capacity simply not being there. HTB, being a token-bucket-type shaper, should compensate for short lulls, so subtle timing effects probably don’t explain it either.
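The overhead arithmetic quoted above can be checked directly (a quick sketch; the ×8 and ×1000 factors in the original formula cancel, so they are dropped here, and the per-packet overhead breakdown is taken as given from Sebastian's figures):

```python
# Downstream shaper set-point and expected goodput, per the quoted calculation.
htb_rate_kbps = 98407             # HTB configured at this rate
payload = 1500 - 8 - 40 - 20      # usable payload per frame, per the quoted overheads
wire_frame = 1500 + 14 + 16       # on-the-wire frame size, per the quoted overheads

goodput_kbps = htb_rate_kbps * payload / wire_frame
print(round(goodput_kbps, 1))     # 92103.8 Kbps expected

measured_kbps = 85350             # 85.35 Mbps measured (dual egress)
missing = goodput_kbps - measured_kbps
print(round(100 * missing / goodput_kbps, 1))  # 7.3 % unaccounted for
```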
>> Those 5% penalties add up. People might grudgingly accept a 10% loss of bandwidth to be sure of lower latency, and faster hardware would do better than that, but losing 25% is a bit much.
> But IPv4 simple.qos IFB ingress shaping: ingress 82.3 Mbps versus 93.48 Mbps (no SQM) => 100 * 82.3 / 93.48 = 88.04%, so we only lose 12% (for the sum of diffserv classification, IFB ingress shaping and HTB) which seems more reasonable (that or my math is wrong).
Getting 95% three times leaves you with about 86%, so it’s a useful rule-of-thumb figure. The more precise figure, 100% - 88.04%^(1/3), would be 4.16% per stage.
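Making that per-stage arithmetic explicit (a quick check; the 88.04% figure is taken from the measurement quoted above):

```python
# Three stages each retaining fraction r compound to r**3 overall,
# so the per-stage retention is the cube root of the compound figure.
compound = 0.8804                      # 88.04% throughput retained overall
per_stage_loss = 100 * (1 - compound ** (1 / 3))
print(round(per_stage_loss, 2))        # 4.16 % lost per stage

# Sanity-check the rule of thumb: 95% retained three times over.
print(round(100 * 0.95 ** 3, 1))       # 85.7 % ~ "about 86%"
```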
However, if the no-SQM throughput is really limited by the ISP rather than the router, then simply adding HTB + fq_codel might have a bigger impact on throughput for someone with a faster service; they would be limited to the same speed with SQM, but might have higher throughput without it. So your measurements really give 5% as a lower bound for that case.
> But anyway I do not argue that we should not aim at decreasing overheads, but just that even without these overheads we are still a (binary) order of magnitude short of the goal, a shaper that can do up to symmetric 150Mbps shaping let alone Dave’s goal of symmetric 300 Mbps shaping.
Certainly, better hardware will perform better. I personally use a decade-old PowerBook for my shaping needs; a 1.5GHz PowerPC 7447 (triple issue, out of order, 512KB+ on-die cache) is massively more powerful than a 680MHz MIPS 24K (single issue, in order, a few KB cache), and it shows when I conduct LAN throughput tests. But I don’t get the chance to push that much data over the Internet.
The MIPS 74K in the Archer C7 v2 is dual issue, out of order; that certainly helps. Multi-core (or at least multi-thread) would probably also help by reducing context switch overhead, and allowing more than one device’s interrupts to get serviced in parallel. I happen to have one router with a MIPS 34K, which is multi-thread, but the basic pipeline is that of the 24K and the clock speed is much lower.
Still, it’s also good to help people get the most out of what they’ve already got. Cake is part of that, but efficiency (by using a simpler shaper than HTB and eliminating one qdisc-to-qdisc interface) is only one of its goals. Ease of configuration, and providing state-of-the-art behaviour, are equally important to me.
>> The point of this exercise was to find out whether a theoretical, ideal policer on ingress might - in theory, mind - give a noticeable improvement of efficiency and thus throughput.
> I think we only have 12% left on the table and there is a need to keep the shaped/policed ingress rate below the real bottleneck rate with a margin, to keep instances of buffering “bleeding” back into the real bottleneck rare…,
That’s 12% as a lower bound - and that’s already enough to be noticeable in practice. Obviously we can’t be sure of getting all of it back, but we might get enough to bring *you* up to line rate.
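Since both HTB and an ingress policer are built on the same underlying mechanism, here is a minimal token-bucket model for reference (purely illustrative: the class and parameter names are hypothetical, and this is nothing like the kernel's actual implementation):

```python
import time

class TokenBucket:
    """Minimal token-bucket model. A shaper delays non-conforming packets;
    a policer simply drops them. Illustrative only, not the kernel's HTB."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0        # token refill rate, bytes per second
        self.burst = burst_bytes          # bucket depth: absorbs short lulls/bursts
        self.tokens = burst_bytes         # bucket starts full
        self.last = time.monotonic()

    def conforms(self, pkt_len):
        # Refill tokens for the elapsed time, capped at the burst size.
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_len:
            self.tokens -= pkt_len
            return True                   # shaper: send now; policer: accept
        return False                      # shaper: queue; policer: drop

# Example: a bucket sized for one full-length frame at a tiny rate.
bucket = TokenBucket(rate_bps=100, burst_bytes=1500)
print(bucket.conforms(1500))  # True: the bucket starts full
print(bucket.conforms(1500))  # False: tokens exhausted, refill is far too slow
```

The burst parameter is what lets a token-bucket shaper "compensate for short lulls", as noted above: unused capacity accumulates (up to the bucket depth) and can be spent later.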
- Jonathan Morton