A small suggestion. Create a regression test suite, and require contributors to *pass* the test with each submitted patch set. Be a damned Nazi about checkins that don't meet this criterion - eliminate the right to check in code for anyone who contributes something that breaks functionality.

Every project leader discovers this. Programmers are *lazy* and refuse to check their inputs unless you shame them into compliance.

-----Original Message-----
From: "Dave Taht"
Sent: Thursday, April 5, 2012 10:27pm
To: cerowrt-devel@lists.bufferbloat.net
Subject: [Cerowrt-devel] Cero-state this week and last

I attended the IETF conference in Paris (virtually), particularly ccrg and homenet. I do encourage folk to pay attention to homenet if possible, as laying out what home networks will look like in the next 10 years is proving to be a hairball. ccrg was productive.

Some news: I have been spending time fixing some infrastructural problems.

1) After being blindsided by more continuous integration problems in the last month than in the last 5, I found out that one of the root causes was that the openwrt build cluster had declined in size from 8 boxes to 1(!!), and time between successful automated builds was in some cases over a month. The risk of going from 1 to 0 build slaves seemed untenable. So I sprang into action, scammed two boxes, and Travis has tossed them into the cluster. Someone else volunteered a box.

I am a huge proponent of continuous integration on complex projects.

http://en.wikipedia.org/wiki/Continuous_integration

Building all the components of an OS like openwrt correctly, all the time, with the dozens of developers involved, with a minimum delta between commit, breakage, and fix, is really key to simplifying the relatively simple task we face in bufferbloat.net of merely layering on components and fixes improving the state of the art in networking.

The tgrid is still looking quite bad at the moment.
http://buildbot.openwrt.org:8010/tgrid

There's still a huge backlog of breakage. But I hope it gets better. Certainly building a full cluster of build boxes or vms (openwrt@HOME!!) would help a lot more. If anyone would like to help hardware-wise, or learn more about how to manage a build cluster using buildbot, please contact Travis.

2) Bloatlab #1 has been completely rewired and rebuilt, and most of the routers in there reflashed to Cerowrt-3.3.1-2 or later. They survived some serious network abuse over the last couple of days (ironically the only router that crashed was the last rc6 box I had in the mix - and not due to a network fault! I ran it out of flash with a logging tool).

To deal with the complexity in there (there's also a sub-lab for some sdnat and PCP testing), I ended up with a new ipv6 /48 and some better ways to route that I'll write up soon.

3) I did finally get back to fully working builds for the ar71xx (cerowrt) architecture a few days ago. I also have a working 3.3.1 kernel for the x86_64 build I use to test the server side. (bufferbloat is NOT just a router problem. Fixing all sides of a connection helps a lot.) That + a new iproute2 + the debloat script and YOU TOO can experience orders of magnitude less latency....

http://europa.lab.bufferbloat.net/debloat/ has that 3.3.1 kernel for x86_64

Most of the past week has been backwards rather than forwards, but it was negative in a good way, mostly. I'm sorry it's been three weeks without a viable build for others to test.

4) today's build:

http://huchra.bufferbloat.net/~cero1/3.3/3.3.1-4/

+ Linux 3.3.1 (this is missing the sfq patch I liked, but it's good enough)
+ Working wifi is back
+ No more fiddling with ethtool tx rings (up to 64 from 2; BQL does this job better)
+ TCP CUBIC is now the default (no longer westwood). After 15+ years of misplaced faith in delay-based tcp for wireless, I've collected enough data to convince me that cubic wins, all the time.
+ alttcp enabled (making it easy to switch)
+ latest netperf from svn (yea! remotely changeable diffserv settings for a test tool!)
- still horrible dependencies on time. You pretty much have to get on it and do a rndc validation disable multiple times, restart ntp multiple times, and killall named multiple times to get anywhere if you want to get dns inside of 10 minutes. At this point sometimes I just turn off named in /etc/xinetd.d/named and turn on port 53 for dnsmasq... but usually after flashing it the first time, wait 10 minutes (let it clean flash), reboot, wait another 10, then it works. Drives me crazy... Once it's up and has valid time and is working, dnssec works great but....
+ way cool new stuff in dnsmasq for ra and AAAA records
- huge dependency on keeping bind in there
- aqm-scripts. I have not succeeded in making hfsc work right. Period.
+ HTB (vs hfsc) is proving far more tractable. SFQRED is scaling better than I'd dreamed. Maybe Eric dreamed this big, I didn't.
- http://www.bufferbloat.net/issues/352
+ Added some essential randomness back into the entropy pool
- hostapd really acts up at high rates with the hack in there for more entropy (from the openwrt mainline)
+ named caching the roots idea discarded in favor of classic '.'

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net

_______________________________________________
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel