From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <david@lang.hm>
Received: from bifrost.lang.hm (mail.lang.hm [64.81.33.126])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client did not present a certificate)
	by huchra.bufferbloat.net (Postfix) with ESMTPS id C28C5200A12
	for <cerowrt-devel@lists.bufferbloat.net>;
	Thu,  5 Apr 2012 20:07:13 -0700 (PDT)
Received: from asgard.lang.hm (asgard.lang.hm [10.0.0.100])
	by bifrost.lang.hm (8.13.4/8.13.4/Debian-3) with ESMTP id
	q3637ASL003951; Thu, 5 Apr 2012 20:07:10 -0700
Date: Thu, 5 Apr 2012 20:07:10 -0700 (PDT)
From: david@lang.hm
X-X-Sender: dlang@asgard.lang.hm
To: Dave Taht <dave.taht@gmail.com>
In-Reply-To: <CAA93jw5d2cZ6TmEfWx7P+bs9h6OLTYf_fN0-yXnE1yPzgNsHtw@mail.gmail.com>
Message-ID: <alpine.DEB.2.02.1204051954120.14040@asgard.lang.hm>
References: <CAA93jw5wXVYs+=odoSSHGMP0Sj+fb8UbAH8E6Cz-Bp0yD2hPdg@mail.gmail.com>
	<1333679627.997611294@apps.rackspace.com>
	<CAA93jw5d2cZ6TmEfWx7P+bs9h6OLTYf_fN0-yXnE1yPzgNsHtw@mail.gmail.com>
User-Agent: Alpine 2.02 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] Cero-state this week and last
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
	<cerowrt-devel.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/cerowrt-devel>,
	<mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/cerowrt-devel>
List-Post: <mailto:cerowrt-devel@lists.bufferbloat.net>
List-Help: <mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/cerowrt-devel>,
	<mailto:cerowrt-devel-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Fri, 06 Apr 2012 03:07:14 -0000

On Thu, 5 Apr 2012, Dave Taht wrote:

> A linear complete build of openwrt takes 17 hours on good hardware.
> It's hard to build in parallel.

distcc doesn't work for this?

> A parallel full build is about 3 hours but requires a bit of monitoring

can this monitoring be automated?
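
To make the question concrete, here is a minimal sketch of the kind of
wrapper I have in mind. It assumes the "monitoring" is mostly noticing that
a parallel run died and rerunning serially to see why. The function name
and the commands in the usage note are my assumptions, not anything the
openwrt buildbot actually does.

```python
import subprocess


def build_with_fallback(parallel_cmd, serial_cmd, log_path):
    """Try the parallel build; if it fails, rerun serially and keep the log.

    Returns "parallel" or "serial" for the stage that succeeded, or
    "failed" if both did. Both commands are argument lists chosen by the
    caller; nothing here is specific to any one build system.
    """
    with open(log_path, "w") as log:
        for stage, cmd in (("parallel", parallel_cmd), ("serial", serial_cmd)):
            log.write("== %s: %s\n" % (stage, " ".join(cmd)))
            log.flush()  # keep our marker ahead of the child's output
            result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
            if result.returncode == 0:
                return stage
    return "failed"
```

Called as, say, build_with_fallback(["make", "-j8"], ["make", "-j1"],
"build.log") from cron, with mail sent only on "failed". The job count is
a guess on my part.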

David Lang

> Incremental package builds are measured in minutes, however...
>
>> Be damned politically incorrect about checkins that don't meet this criterion - eliminate
>> the right to check in code for anyone who contributes something that breaks
>> functionality.
>
> The number of core committers is quite low, too low, at present.
> However the key problem here is that
> the matrix of potential breakage is far larger than any one contributor
> can deal with.
>
> There are:
>
> 20+ fairly different cpu architectures *
> 150+ platforms *
> 3 different libcs *
> 3 different (generation) toolchains *
> 5-6 different kernels
>
> That matrix alone is hardly conceivable to deal with. In there are
> arches that are genuinely weird (avr anyone), arches that have
> arbitrary endian, arches that are 32 bit and 64 bit...
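
Multiplying that list out, as a rough sketch: this takes the "+" figures as
lower bounds and treats every dimension as independent, which overcounts,
since each platform belongs to one architecture. Both are my assumptions,
so read the result as an upper-end estimate only.

```python
from math import prod

# Lower bounds taken from the quoted list; "5-6 kernels" gives a low/high pair.
dimensions = {
    "cpu architectures": 20,
    "platforms": 150,
    "libcs": 3,
    "toolchain generations": 3,
}
kernels_low, kernels_high = 5, 6

base = prod(dimensions.values())   # 27,000 before kernels
low = base * kernels_low           # 135,000
high = base * kernels_high         # 162,000

print("combinations: %d to %d" % (low, high))
```

That is before the thousand-plus packages are counted at all.
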
>
> Add in well over a thousand software packages (everything from Apache
> to zile), and you have an idea of how much code has dependencies on
> other code...
>
> For example, the breakage yesterday (or was it the day before) was in
> a minor update to libtool, as best as I recall. It broke 3 packages
> that cerowrt has available as options.
>
> I'm looking forward, very much, to seeing the buildbot produce a
> known, good build, that I can layer my mere 67 patches and two dozen
> packages on top of without having to think too much.
>
>> Every project leader discovers this.
>
> Cerowrt is an incredibly tiny superset of the openwrt project. I help
> out where I can.
>
>> Programmers are *lazy* and refuse to
>> check their inputs unless you shame them into compliance.
>
> Volunteer programmers are not lazy.
>
> They do, however, have limited resources, and prefer to make progress
> rather than make things perfect. Difficult-to-pass check-in tests
> impede progress.
>
> The fact that you or I can build an entire OS, in a matter of hours,
> today, and have it work, most often befuddles me. This is 10s of
> millions of lines of code, all perfect, most of the time.
>
> It used to take 500+ people to engineer an os in 1992, and 4 days to
> build. I consider this progress.
>
> There are all sorts of processes in place, some can certainly be
> improved. For example, discussed last week were methods for dealing
> with and approving the backlog of submitted patches by other
> volunteers.
>
> It mostly just needs more eyeballs. And testing. There's a lot of good
> stuff piled up.
>
> http://patchwork.openwrt.org/project/openwrt/list/
>>
>>
>>
>> -----Original Message-----
>> From: "Dave Taht" <dave.taht@gmail.com>
>> Sent: Thursday, April 5, 2012 10:27pm
>> To: cerowrt-devel@lists.bufferbloat.net
>> Subject: [Cerowrt-devel] Cero-state this week and last
>>
>> I attended the ietf conference in Paris (virtually), particularly ccrg
>> and homenet.
>>
>> I do encourage folk to pay attention to homenet if possible, as laying
>> out what home networks will look like in the next 10 years is proving
>> to be a hairball.
>> ccrg was productive.
>>
>> Some news:
>>
>> I have been spending time fixing some infrastructural problems.
>>
>> 1) After being blindsided by more continuous integration problems in
>> the last month than in the last 5, I found out that one of the root
>> causes was that the openwrt build cluster had declined in size from 8
>> boxes to 1(!!), and time between successful automated builds was in
>> some cases over a month.
>>
>> The risk of going 1 to 0 build slaves seemed untenable. So I sprang
>> into action, scammed two boxes and travis has tossed them into the
>> cluster. Someone else volunteered a box.
>>
>> I am a huge proponent of continuous integration on complex projects.
>> http://en.wikipedia.org/wiki/Continuous_integration
>>
>> Building all the components of an OS like openwrt correctly, all the
>> time, with the dozens of developers involved, with a minimum delta
>> between commit, breakage, and fix, is really key to simplifying the
>> relatively simple task we face in bufferbloat.net of merely layering
>> on components and fixes improving the state of the art in networking.
>>
>> The tgrid is still looking quite bad at the moment.
>>
>> http://buildbot.openwrt.org:8010/tgrid
>>
>> There's still a huge backlog of breakage.
>>
>> But I hope it gets better. Certainly building a full cluster of build
>> boxes or vms (openwrt@HOME!!) would help a lot more.
>>
>> If anyone would like to help hardware wise, or learn more about how to
>> manage a build cluster using buildbot, please contact travis
>> <thepeople AT openwrt.org>
>>
>> 2) Bloatlab #1 has been completely rewired and rebuilt and most of
>> the routers in there reflashed to Cerowrt-3.3.1-2 or later. They
>> survived some serious network abuse over the last couple days
>> (ironically the only router that crashed was the last rc6 box I had in
>> the mix - and not due to a network fault! I ran it out of flash with a
>> logging tool).
>>
>> To deal with the complexity in there (there's also a sub-lab for some
>> sdnat and PCP testing), I ended up with a new ipv6 /48 and some better
>> ways to route that I'll write up soon.
>>
>> 3) I did finally get back to fully working builds for the ar71xx
>> (cerowrt) architecture a few days ago. I also have a working 3.3.1
>> kernel for the x86_64 build I use to test the server side.
>> (bufferbloat is NOT just a router problem. Fixing all sides of a
>> connection helps a lot). That + a new iproute2 + the debloat script
>> and YOU TOO can experience orders of magnitude less latency....
>>
>> http://europa.lab.bufferbloat.net/debloat/ has that 3.3.1 kernel for x86_64
>>
>> Most of the past week has been backwards rather than forwards, but it
>> was negative in a good way, mostly.
>>
>> I'm sorry it's been three weeks without a viable build for others to test.
>>
>> 4) today's build: http://huchra.bufferbloat.net/~cero1/3.3/3.3.1-4/
>>
>> + Linux 3.3.1 (this is missing the sfq patch I liked, but it's good enough)
>> + Working wifi is back
>> + No more fiddling with ethtool tx rings (up to 64 from 2. BQL does
>> this job better)
>> + TCP CUBIC is now the default (no longer westwood)
>> after 15+ years of misplaced faith in delay based tcp for wireless,
>> I've collected enough data to convince me that cubic wins. All the
>> time.
>> + alttcp enabled (making it easy to switch)
>> + latest netperf from svn (yea! remotely changeable diffserv settings
>> for a test tool!)
>>
>> - still horrible dependencies on time. You pretty much have to get on
>> it and do a rndc validation disable multiple times, restart ntp
>> multiple times, killall named multiple times to get anywhere if you
>> want to get dns inside of 10 minutes.
>>
>> At this point sometimes I just turn off named in /etc/xinetd.d/named
>> and turn on port 53 for dnsmasq... but
>> usually after flashing it the first time, wait 10 minutes (let it
>> clean flash), reboot, wait another 10, then it works. Drives me
>> crazy... Once it's up and has valid time and is working, dnssec works
>> great but....
>>
>> + way cool new stuff in dnsmasq for ra and AAAA records
>> - huge dependency on keeping bind in there
>> - aqm-scripts. I have not succeeded in making hfsc work right. Period.
>> + HTB (vs hfsc) is proving far more tractable. SFQRED is scaling
>> better than I'd dreamed. Maybe eric dreamed this big, I didn't.
>> - http://www.bufferbloat.net/issues/352
>> + Added some essential randomness back into the entropy pool
>> - hostapd really acts up at high rates with the hack in there for more
>> entropy (From the openwrt mainline)
>> + named caching the roots idea discarded in favor of classic '.'
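
On the CUBIC default and alttcp items in the list: a small sketch for
checking which congestion control a Linux box is using and which ones it
offers. The two file names are the standard Linux sysctl entries; the
function name and the directory parameter are mine, for illustration.

```python
from pathlib import Path

# Standard Linux sysctl directory; a parameter so it can be pointed elsewhere.
SYSCTL_DIR = Path("/proc/sys/net/ipv4")


def congestion_control(sysctl_dir=SYSCTL_DIR):
    """Return (current default, list of available algorithms)."""
    base = Path(sysctl_dir)
    current = (base / "tcp_congestion_control").read_text().strip()
    available = (base / "tcp_available_congestion_control").read_text().split()
    return current, available
```

Usage would be current, available = congestion_control(). I have not run
this against the 3.3.1-4 build, so what it reports there is untested.
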
>>
>>
>> --
>> Dave Täht
>> SKYPE: davetaht
>> US Tel: 1-239-829-5608
>> http://www.bufferbloat.net