[Cerowrt-devel] Problems testing sqm (solved)

Sebastian Moeller moeller0 at gmx.de
Sun Oct 25 16:07:24 EDT 2015


Hi Richard,


On Oct 25, 2015, at 17:07 , Richard Smith <smithbone at gmail.com> wrote:

> On 10/25/2015 11:10 AM, Richard Smith wrote:
> 
>> So I started to try and re-create my steps for failure..  I _am_ able to
>> duplicate the problem but I'm not able to figure out how.  It seems to
>> just come and go irrespective of what I'm doing with the DUT.  I'm
>> mostly in the bad state but every so often things work as expected.
> 
> I figured it out and now I feel _really_ stupid.

	Why? The “I figured it out” part argues heavily against the stupid hypothesis (though it might not account for the feeling ;) )

>  The problem was having WiFi enabled on my laptop.

	Ah, I guess wifi needs to be made fast ;)

> 
> The netperf server is running on a machine which sits on my 192.168.11.x network.  I've been testing things by connecting up the DUT WAN to that network and then using 192.168.1.x (or 172.30.42.x) as the DUT LAN side.
> 
> My wireless network is bridged to 192.168.11.x.. Yes, Yes, I know this is bad but I haven't put in the time to figure out how to configure things such that all the broadcast stuff like printers and chromecast Just Work in a routed world.  The SO gets unhappy with me when they are broken and it's difficult to explain the reasoning.  It's on the TODO.

	I fully understand that at a family home there is only so much experimentation that is tolerable, maybe unless the whole family is involved in CS or network engineering. I am lucky enough that we started without any network attached services so we went all routed with cerowrt and so far have not turned back (knock on wood), but once I switch to openwrt (which defaults to bridged) I will have to revisit this again...

> 
> network-manager automatically connects wlan0 to that network.

	I guess we can not even blame it for doing this, often people would be furious if it did not...

> 
> When I plug up the ethernet cable network-manager sets up eth0 as the default route but doesn't shut down existing wlan0 connections.  So talking to the DUT via ssh or http works as expected. However, when I run:
> 
> netperf-wrapper -H server -l 15 --disable-log -p all rrul
> 
> If I forgot to turn off WiFi then I'm completely bypassing the DUT and testing my WiFi network instead.
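
A minimal sketch of why the test traffic took the WiFi path, assuming the laptop's routing table held a default route via eth0 plus connected routes for both subnets: the kernel picks the most specific matching prefix, so the on-link 192.168.11.x route on wlan0 beats the default route. The /24 prefix lengths, the table entries, and the host addresses below are illustrative assumptions, not taken from Richard's machine.

```python
import ipaddress

# Hypothetical routing table: (prefix, outgoing interface).
# The /24 masks and the exact entries are assumptions for illustration.
ROUTES = [
    ("0.0.0.0/0", "eth0"),         # default route installed when the cable is plugged in
    ("192.168.1.0/24", "eth0"),    # DUT LAN side
    ("192.168.11.0/24", "wlan0"),  # home network, still reachable over WiFi
]


def pick_interface(destination, routes=ROUTES):
    """Return the interface of the most specific route matching destination."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        # Longest-prefix match: keep the matching route with the longest mask.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


print(pick_interface("192.168.1.1"))    # assumed DUT address -> eth0
print(pick_interface("192.168.11.10"))  # assumed netperf server address -> wlan0
```

On a real machine, `ip route get` followed by the server address reports which interface the kernel would actually use.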

	Simple in retrospect but tough to diagnose. I had a similarly weird wifi issue with the 2.4GHz wlan in my proprietary ISP-supplied modem router. I had left it active as an emergency way to get into the modem and read the stats, which felt simpler at the time than teaching cerowrt to pass the traffic to the modem over the wan port as well. Later I figured out the proper way but did not disable the AP, thinking it lives on a different band than my cerowrt router, so it could not hurt and redundancy is good. But that radio has a “glitch”: roughly every 56 seconds it caused havoc on the modem’s traffic on all interfaces (wired and wlan), showing up as periodic spikes of induced latency under RRUL. It took me ages to realize the root cause. The solution was quite simple: since the 5GHz radio in that unit behaves sanely and it is a one-out-of-two-bands device, I just switched the radio to 5GHz permanently...

> "This is not the network you are looking for."

	You’d wish jedi mind tricks did not work on openwrt or sqm-scripts.

> 
> Sigh.  Sorry for all the noise.  Thanks for everyone's help.

	This has been quite interesting. It also reminds me that “tc -s qdisc” is a thing to test early: looking at the statistics, it should have been relatively easy to spot that there was too little traffic on the interfaces. Side note: I use iftop on my laptop during similar tests, as it also makes it easy to monitor different devices one after the other (I also like it because I believe it shows the ACK traffic in the reverse direction, which netperf unfortunately does not see).
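
The same sanity check can be sketched on the laptop side: snapshot the per-interface byte counters before and after a test run and see which interface actually carried the load. This is only a sketch, assuming a Linux host that exposes counters under /sys/class/net; the function names are made up for illustration, and it complements rather than replaces looking at “tc -s qdisc” on the router.

```python
from pathlib import Path


def read_counters(root="/sys/class/net"):
    """Return {interface: rx_bytes + tx_bytes} from Linux sysfs counters."""
    counters = {}
    for iface in Path(root).iterdir():
        stats = iface / "statistics"
        try:
            rx = int((stats / "rx_bytes").read_text())
            tx = int((stats / "tx_bytes").read_text())
        except (OSError, ValueError):
            continue  # skip entries that are not network interfaces
        counters[iface.name] = rx + tx
    return counters


def traffic_deltas(before, after):
    """Bytes moved per interface between two snapshots."""
    return {name: after[name] - before[name] for name in after if name in before}


def busiest_interface(before, after):
    """Name of the interface that moved the most bytes, or None if no data."""
    deltas = traffic_deltas(before, after)
    return max(deltas, key=deltas.get) if deltas else None
```

Call `read_counters()` once before starting the test and once after it finishes, then pass both snapshots to `busiest_interface()`. If the answer is wlan0 while the DUT hangs off eth0, the test bypassed the DUT.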

> Now I'll get back to the original task of comparing the performance of the 1900acs factory firmware vs openwrt trunk with sqm.

	I am quite curious about the worst-case performance, that is, the ingress+egress rate combination that keeps the latency increase under load sane with minimal-packet-size traffic. Mikael Abrahamsson did a lot of relevant testing in the following thread: https://lists.bufferbloat.net/pipermail/cerowrt-devel/2015-June/004726.html . I believe he used iperf to allow manipulating the packet size, though I believe it should be possible to use the DUT’s MSS clamping function to achieve the same with netperf (which is a bit nicer in that it easily allows bidirectional saturating traffic with flent’s RRUL test). Since your unit uses the same or a similar SoC as the Turris Omnia that I want to get, I am quite curious about the performance numbers you come up with, especially with regard to offload functions and the relation between shaper rates and achieved transfer rates (see Aaron Wood’s posts about that at http://burntchrome.blogspot.de/2015/06/htb-rate-limiting-not-quite-lining-up.html )
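
To make the minimal-packet-size concern concrete, here is a rough sketch of the packet rate a router has to sustain at a given shaper rate. It ignores link-layer overhead, and the 100 Mbit/s rate and the two packet sizes are illustrative assumptions, not measurements from this thread.

```python
def packets_per_second(rate_bits_per_s, packet_bytes):
    """Packets per second needed to fill rate_bits_per_s with packet_bytes-sized packets."""
    return rate_bits_per_s / (8 * packet_bytes)


# Assumed example: a 100 Mbit/s shaper with full-size vs. small packets.
for size in (1500, 64):
    print(f"{size} byte packets: {packets_per_second(100e6, size):.0f} packets/s")
```

The per-packet work is what loads the CPU, so the same bit rate becomes far more demanding as the packet size shrinks.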


Best Regards
	Sebastian

> 
> -- 
> Richard A. Smith



