Date: Thu, 5 Apr 2012 20:07:10 -0700 (PDT)
From: david@lang.hm
To: Dave Taht
References: <1333679627.997611294@apps.rackspace.com>
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] Cero-state this week and last

On Thu, 5 Apr 2012, Dave Taht wrote:

> A linear complete build of openwrt takes 17 hours on good hardware.
> It's hard to build in parallel.

distcc doesn't work for this?

> A parallel full build is about 3 hours but requires a bit of monitoring

can this monitoring be automated?

David Lang

> Incremental package builds are measured in minutes, however...
>
>> Be damned politically incorrect about checkins that don't meet this criterion - eliminate
>> the right to check in code for anyone who contributes something that breaks
>> functionality.
>
> The number of core committers is quite low, too low, at present.
> However the key problem here is that
> the matrix of potential breakage is far larger than any one contributor
> can deal with.
>
> There are:
>
> 20+ fairly different cpu architectures *
> 150+ platforms *
> 3 different libcs *
> 3 different (generation) toolchains *
> 5-6 different kernels
>
> That matrix alone is hardly conceivable to deal with. In there are
> arches that are genuinely weird (avr anyone), arches that have
> arbitrary endian, arches that are 32 bit and 64 bit...
>
> Add in well over a thousand software packages (everything from Apache
> to zile), and you have an idea of how much code has dependencies on
> other code...
>
> For example, the breakage yesterday (or was it the day before) was in
> a minor update to libtool, as best as I recall. It broke 3 packages
> that cerowrt has available as options.
>
> I'm looking forward, very much, to seeing the buildbot produce a
> known, good build, that I can layer my mere 67 patches and two dozen
> packages on top of without having to think too much.
>
>> Every project leader discovers this.
>
> Cerowrt is an incredibly tiny superset of the openwrt project. I help
> out where I can.
>
>> Programmers are *lazy* and refuse to
>> check their inputs unless you shame them into compliance.
>
> Volunteer programmers are not lazy.
>
> They do, however, have limited resources, and prefer to make progress
> rather than make things perfect. Difficult-to-pass check-in tests
> impede progress.
>
> The fact that you or I can build an entire OS, in a matter of hours,
> today, and have it work, most often befuddles me. This is 10s of
> millions of lines of code, all perfect, most of the time.
>
> It used to take 500+ people to engineer an os in 1992, and 4 days to
> build. I consider this progress.
>
> There are all sorts of processes in place, some can certainly be
> improved. For example, discussed last week was methods for dealing
> with and approving the backlog of submitted patches by other
> volunteers.
>
> It mostly just needs more eyeballs. And testing. There's a lot of good
> stuff piled up.
>
> http://patchwork.openwrt.org/project/openwrt/list/
>>
>> -----Original Message-----
>> From: "Dave Taht"
>> Sent: Thursday, April 5, 2012 10:27pm
>> To: cerowrt-devel@lists.bufferbloat.net
>> Subject: [Cerowrt-devel] Cero-state this week and last
>>
>> I attended the ietf conference in Paris (virtually), particularly ccrg
>> and homenet.
>>
>> I do encourage folk to pay attention to homenet if possible, as laying
>> out what home networks will look like in the next 10 years is proving
>> to be a hairball.
>> ccrg was productive.
>>
>> Some news:
>>
>> I have been spending time fixing some infrastructural problems.
>>
>> 1) After being blindsided by more continuous integration problems in
>> the last month than in the last 5, I found out that one of the root
>> causes was that the openwrt build cluster had declined in size from 8
>> boxes to 1(!!), and time between successful automated builds was in
>> some cases over a month.
>>
>> The risk of going 1 to 0 build slaves seemed untenable. So I sprang
>> into action, scammed two boxes and travis has tossed them into the
>> cluster. Someone else volunteered a box.
>>
>> I am a huge proponent of continuous integration on complex projects.
>> http://en.wikipedia.org/wiki/Continuous_integration
>>
>> Building all the components of an OS like openwrt correctly, all the
>> time, with the dozens of developers involved, with a minimum delta
>> between commit, breakage, and fix, is really key to simplifying the
>> relatively simple task we face in bufferbloat.net of merely layering
>> on components and fixes improving the state of the art in networking.
>>
>> The tgrid is still looking quite bad at the moment.
>>
>> http://buildbot.openwrt.org:8010/tgrid
>>
>> There's still a huge backlog of breakage.
>>
>> But I hope it gets better. Certainly building a full cluster of build
>> boxes or vms (openwrt@HOME!!) would help a lot more.
>>
>> If anyone would like to help hardware wise, or learn more about how to
>> manage a build cluster using buildbot, please contact travis
>>
>> 2) Bloatlab #1 has been completely rewired and rebuilt and most of
>> the routers in there reflashed to Cerowrt-3.3.1-2 or later. They
>> survived some serious network abuse over the last couple days
>> (ironically the only router that crashed was the last rc6 box I had in
>> the mix - and not due to a network fault! I ran it out of flash with a
>> logging tool).
>>
>> To deal with the complexity in there (there's also a sub-lab for some
>> sdnat and PCP testing), I ended up with a new ipv6 /48 and some better
>> ways to route that I'll write up soon.
>>
>> 3) I did finally get back to fully working builds for the ar71xx
>> (cerowrt) architecture a few days ago. I also have a working 3.3.1
>> kernel for the x86_64 build I use to test the server side.
>> (bufferbloat is NOT just a router problem. Fixing all sides of a
>> connection helps a lot). That + a new iproute2 + the debloat script
>> and YOU TOO can experience orders of magnitude less latency....
>>
>> http://europa.lab.bufferbloat.net/debloat/ has that 3.3.1 kernel for x86_64
>>
>> Most of the past week has been backwards rather than forwards, but it
>> was negative in a good way, mostly.
>>
>> I'm sorry it's been three weeks without a viable build for others to test.
>>
>> 4) today's build: http://huchra.bufferbloat.net/~cero1/3.3/3.3.1-4/
>>
>> + Linux 3.3.1 (this is missing the sfq patch I liked, but it's good enough)
>> + Working wifi is back
>> + No more fiddling with ethtool tx rings (up to 64 from 2. BQL does
>> this job better)
>> + TCP CUBIC is now the default (no longer westwood)
>> after 15+ years of misplaced faith in delay based tcp for wireless,
>> I've collected enough data to convince me that cubic wins. all the
>> time.
>> + alttcp enabled (making it easy to switch)
>> + latest netperf from svn (yea!
>> remotely changeable diffserv settings
>> for a test tool!)
>>
>> - still horrible dependencies on time. You pretty much have to get on
>> it and do a rndc validation disable multiple times, restart ntp
>> multiple times, killall named multiple times to get anywhere if you
>> want to get dns inside of 10 minutes.
>>
>> At this point sometimes I just turn off named in /etc/xinetd.d/named
>> and turn on port 53 for dnsmasq... but
>> usually after flashing it the first time, wait 10 minutes (let it
>> clean flash), reboot, wait another 10, then it works. Drives me
>> crazy... Once it's up and has valid time and is working, dnssec works
>> great but....
>>
>> + way cool new stuff in dnsmasq for ra and AAAA records
>> - huge dependency on keeping bind in there
>> - aqm-scripts. I have not succeeded in making hfsc work right. Period.
>> + HTB (vs hfsc) is proving far more tractable. SFQRED is scaling
>> better than I'd dreamed. Maybe eric dreamed this big, I didn't.
>> - http://www.bufferbloat.net/issues/352
>> + Added some essential randomness back into the entropy pool
>> - hostapd really acts up at high rates with the hack in there for more
>> entropy (From the openwrt mainline)
>> + named caching the roots idea discarded in favor of classic '.'
>>
>> --
>> Dave Täht
>> SKYPE: davetaht
>> US Tel: 1-239-829-5608
>> http://www.bufferbloat.net
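
For a rough sense of the breakage matrix quoted above, here is a minimal sketch that simply multiplies the axes as listed (20+ architectures, 150+ platforms, 3 libcs, 3 toolchain generations, 5-6 kernels). The dictionary names and the `matrix_size` helper are made up for illustration. Treating the axes as fully independent is an assumption: each platform really belongs to one architecture, so the true number of meaningful combinations is smaller.

```python
from math import prod

# Lower-bound figures as quoted in the thread ("20+", "150+", "5-6").
# Assumption for illustration only: every axis combines with every other.
AXES_LOW = {
    "cpu_architectures": 20,
    "platforms": 150,
    "libcs": 3,
    "toolchain_generations": 3,
    "kernels": 5,
}
AXES_HIGH = {**AXES_LOW, "kernels": 6}


def matrix_size(axes):
    """Return the number of combinations if all axes were independent."""
    return prod(axes.values())


low = matrix_size(AXES_LOW)
high = matrix_size(AXES_HIGH)
print(f"naive build matrix: {low} to {high} combinations")
```

Even as an overestimate, the product lands in the hundreds of thousands before the thousand-plus packages are counted, which is the point being made about no single contributor being able to cover it.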