A small suggestion.
 
Create a regression test suite, and require contributors to *pass* the test with each submitted patch set.
 
Be a damned Nazi about checkins that don't meet this criterion - eliminate the right to check in code for anyone who contributes something that breaks functionality.
 
Every project leader discovers this.  Programmers are *lazy* and refuse to check their inputs unless you shame them into compliance.
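
A minimal sketch of the kind of gate I mean, in Python. The names here (run_suite, accept_patch) are hypothetical and are not part of any openwrt or cerowrt tooling; a real suite would exercise actual builds and router behavior.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical regression suite: each entry maps a test name to a callable
# that returns True when the behavior it guards still works.
RegressionSuite = Dict[str, Callable[[], bool]]


def run_suite(suite: RegressionSuite) -> List[str]:
    """Run every test and return the names of the ones that failed."""
    failures = []
    for name, test in suite.items():
        try:
            ok = bool(test())
        except Exception:
            # A test that raises counts as a failure, not as a skipped test.
            ok = False
        if not ok:
            failures.append(name)
    return failures


def accept_patch(suite: RegressionSuite) -> Tuple[bool, List[str]]:
    """Accept a patch set only if the whole suite passes."""
    failures = run_suite(suite)
    return (not failures, failures)
```

The point of returning the failure list is that a rejected contributor sees exactly which functionality the patch broke.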
 
-----Original Message-----
From: "Dave Taht" <dave.taht@gmail.com>
Sent: Thursday, April 5, 2012 10:27pm
To: cerowrt-devel@lists.bufferbloat.net
Subject: [Cerowrt-devel] Cero-state this week and last



I attended the IETF meeting in Paris (virtually), particularly the ccrg
and homenet sessions.

I do encourage folk to pay attention to homenet if possible, as laying
out what home networks will look like in the next 10 years is proving
to be a hairball.
ccrg was productive.

Some news:

I have been spending time fixing some infrastructural problems.

1) After being blindsided by more continuous integration problems in
the last month than in the last 5, I found out that one of the root
causes was that the openwrt build cluster had declined in size from 8
boxes to 1(!!), and time between successful automated builds was in
some cases over a month.

The risk of going from 1 to 0 build slaves seemed untenable. So I sprang
into action, scammed two boxes and travis has tossed them into the
cluster. Someone else volunteered a box.

I am a huge proponent of continuous integration on complex projects.
http://en.wikipedia.org/wiki/Continuous_integration

Building all the components of an OS like openwrt correctly, all the
time, with the dozens of developers involved, with a minimum delta
between commit, breakage, and fix, is really key to simplifying the
relatively simple task we face in bufferbloat.net of merely layering
on components and fixes improving the state of the art in networking.

The tgrid is still looking quite bad at the moment.

http://buildbot.openwrt.org:8010/tgrid

There's still a huge backlog of breakage.

But I hope it gets better. Certainly building a full cluster of build
boxes or vms (openwrt@HOME!!) would help a lot more.

If anyone would like to help hardware-wise, or learn more about how to
manage a build cluster using buildbot, please contact travis
<thepeople AT openwrt.org>

2)  Bloatlab #1 has been completely rewired and rebuilt and most of
the routers in there reflashed to Cerowrt-3.3.1-2 or later. They
survived some serious network abuse over the last couple days
(ironically the only router that crashed was the last rc6 box I had in
the mix - and not due to a network fault! I ran it out of flash with a
logging tool).

To deal with the complexity in there (there's also a sub-lab for some
sdnat and PCP testing), I ended up with a new ipv6 /48 and some better
ways to route that I'll write up soon.

3) I did finally get back to fully working builds for the ar71xx
(cerowrt) architecture a few days ago. I also have a working 3.3.1
kernel for the x86_64 build I use to test the server side.
(bufferbloat is NOT just a router problem. Fixing all sides of a
connection helps a lot). That + a new iproute2 + the debloat script
and YOU TOO can experience orders of magnitude less latency....

http://europa.lab.bufferbloat.net/debloat/ has that 3.3.1 kernel for x86_64

Most of the past week has been backwards rather than forwards, but it
was negative in a good way, mostly.

I'm sorry it's been three weeks without a viable build for others to test.

4) today's build: http://huchra.bufferbloat.net/~cero1/3.3/3.3.1-4/

+ Linux 3.3.1 (this is missing the sfq patch I liked, but it's good enough)
+ Working wifi is back
+ No more fiddling with ethtool tx rings (up to 64 from 2. BQL does
this job better)
+ TCP CUBIC is now the default (no longer westwood)
After 15+ years of misplaced faith in delay-based tcp for wireless,
I've collected enough data to convince me that cubic wins. All the
time.
+ alttcp enabled (making it easy to switch)
+ latest netperf from svn (yea! remotely changeable diffserv settings
for a test tool!)
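
For anyone who wants to poke at the congestion control switch from a program, here is a rough Python sketch. It assumes a Linux box and uses the standard sysctl files and the TCP_CONGESTION socket option; the helper names are made up for illustration and nothing here is specific to this build.

```python
import socket
from pathlib import Path

# Standard Linux sysctl files (general Linux, not specific to this build).
DEFAULT_CC = Path("/proc/sys/net/ipv4/tcp_congestion_control")
AVAILABLE_CC = Path("/proc/sys/net/ipv4/tcp_available_congestion_control")


def parse_available(text: str) -> list:
    """Split the space-separated algorithm list the kernel reports."""
    return text.split()


def choose_algorithm(preferred: list, available: list) -> str:
    """Return the first preferred algorithm that the kernel offers."""
    for name in preferred:
        if name in available:
            return name
    raise ValueError("none of the preferred algorithms are available")


def set_socket_algorithm(sock: socket.socket, name: str) -> None:
    """Switch one TCP socket to the named algorithm (Linux only)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, name.encode())
```

Typical use would be to read AVAILABLE_CC, pass its text through parse_available, pick with choose_algorithm(["cubic", "westwood"], ...), and apply the result to a test socket, which leaves the system-wide default in DEFAULT_CC untouched.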

- still horrible dependencies on time. You pretty much have to get on
it and do a rndc validation disable multiple times, restart ntp
multiple times, killall named multiple times to get anywhere if you
want to get dns inside of 10 minutes.

At this point sometimes I just turn off named in /etc/xinetd.d/named
and turn on port 53 for dnsmasq... but usually, after flashing it the
first time, I wait 10 minutes (to let it clean flash), reboot, and wait
another 10; then it works. Drives me crazy... Once it's up, has valid
time, and is working, dnssec works great, but....
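
A rough illustration of why time matters so much here, as a Python sketch. DNSSEC signatures carry an inception and an expiration time, so a validating resolver with a wrong clock rejects them. The dates below are made up, and the assumption that the router boots with its clock near 1970 (no battery-backed clock) is mine, not something measured on this build.

```python
from datetime import datetime, timezone


def signature_is_valid_now(inception: datetime, expiration: datetime,
                           now: datetime) -> bool:
    """A DNSSEC signature is only usable inside its validity window."""
    return inception <= now <= expiration


# Illustrative values only: a signature valid during April 2012.
inception = datetime(2012, 4, 1, tzinfo=timezone.utc)
expiration = datetime(2012, 5, 1, tzinfo=timezone.utc)

# Assumption: with no battery-backed clock, the box boots near the Unix epoch.
boot_clock = datetime(1970, 1, 1, tzinfo=timezone.utc)
synced_clock = datetime(2012, 4, 5, tzinfo=timezone.utc)

print(signature_is_valid_now(inception, expiration, boot_clock))    # False
print(signature_is_valid_now(inception, expiration, synced_clock))  # True
```

If ntp has to look up its servers by name through the validating resolver, the clock cannot be fixed until validation is relaxed, which would explain the disable/restart dance.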

+ way cool new stuff in dnsmasq for ra and AAAA records
- huge dependency on keeping bind in there
- aqm-scripts. I have not succeeded in making hfsc work right. Period.
+ HTB (vs hfsc) is proving far more tractable. SFQRED is scaling
better than I'd dreamed. Maybe eric dreamed this big, I didn't.
- http://www.bufferbloat.net/issues/352
+ Added some essential randomness back into the entropy pool
- hostapd really acts up at high rates with the hack in there for more
entropy (from the openwrt mainline)
+ named caching the roots idea discarded in favor of classic '.'


-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://www.bufferbloat.net