[Cerowrt-devel] Cero-state this week and last
david at lang.hm
Thu Apr 5 23:07:10 EDT 2012
On Thu, 5 Apr 2012, Dave Taht wrote:
> A linear complete build of openwrt takes 17 hours on good hardware.
> It's hard to build in parallel.
distcc doesn't work for this?
> A parallel full build is about 3 hours but requires a bit of monitoring
Can this monitoring be automated?
> Incremental package builds are measured in minutes, however...
>> Be damned politically incorrect about checkins that don't meet this criterion - eliminate
>> the right to check in code for anyone who contributes something that breaks
> The number of core committers is quite low, too low, at present.
> However, the key problem here is that
> the matrix of potential breakage is far larger than any one contributor
> can deal with.
> There are:
> 20+ fairly different cpu architectures *
> 150+ platforms *
> 3 different libcs *
> 3 different (generation) toolchains *
> 5-6 different kernels
> That matrix alone is hardly conceivable to deal with. In there are
> arches that are genuinely weird (avr, anyone?), arches that have
> arbitrary endianness, arches that are 32 bit and 64 bit...
> Add in well over a thousand software packages (everything from Apache
> to zile), and you have an idea of how much code has dependencies on
> other code...
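
To put a rough number on the matrix quoted above, here is a minimal sketch that multiplies the lower-bound counts from the message ("20+", "150+", "5-6"). It assumes every axis is independent, which overstates things (a platform belongs to one architecture, and not every libc, toolchain and kernel combination is meaningful), so read the result as an upper bound on configurations, not a count of real build targets. The names `MATRIX` and `naive_combinations` are made up for this illustration.

```python
from math import prod

# Lower-bound counts quoted in the message ("20+", "150+", "5-6").
# Treating the axes as independent is an assumption of this sketch.
MATRIX = {
    "cpu_architectures": 20,
    "platforms": 150,
    "libcs": 3,
    "toolchains": 3,
    "kernels": 5,
}


def naive_combinations(axes):
    """Multiply the axis sizes, treating every axis as independent."""
    return prod(axes.values())


if __name__ == "__main__":
    total = naive_combinations(MATRIX)
    print(f"naive build configurations: {total:,}")
    # "well over a thousand" packages, taken here as exactly 1000.
    print(f"package builds across them: {total * 1000:,}")
```

Even with the conservative counts this comes to 135,000 configurations before any package is considered, which is the point being made: no single contributor can test against more than a sliver of it.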
> For example, the breakage yesterday (or was it the day before) was in
> a minor update to libtool, as best as I recall. It broke 3 packages
> that cerowrt has available as options.
> I'm looking forward, very much, to seeing the buildbot produce a
> known, good build, that I can layer my mere 67 patches and two dozen
> packages on top of without having to think too much.
>> Every project leader discovers this.
> Cerowrt is an incredibly tiny superset of the openwrt project. I help
> out where I can.
>> Programmers are *lazy* and refuse to
>> check their inputs unless you shame them into compliance.
> Volunteer programmers are not lazy.
> They do, however, have limited resources, and prefer to make progress
> rather than make things perfect. Difficult-to-pass check-in tests
> impede progress.
> The fact that you or I can build an entire OS, in a matter of hours,
> today, and have it work, most often befuddles me. This is 10s of
> millions of lines of code, all perfect, most of the time.
> It used to take 500+ people to engineer an OS in 1992, and 4 days to
> build. I consider this progress.
> There are all sorts of processes in place, some can certainly be
> improved. For example, discussed last week were methods for dealing
> with and approving the backlog of submitted patches by other
> It mostly just needs more eyeballs. And testing. There's a lot of good
> stuff piled up.
>> -----Original Message-----
>> From: "Dave Taht" <dave.taht at gmail.com>
>> Sent: Thursday, April 5, 2012 10:27pm
>> To: cerowrt-devel at lists.bufferbloat.net
>> Subject: [Cerowrt-devel] Cero-state this week and last
>> I attended the ietf conference in Paris (virtually), particularly ccrg
>> and homenet.
>> I do encourage folk to pay attention to homenet if possible, as laying
>> out what home networks will look like in the next 10 years is proving
>> to be a hairball.
>> ccrg was productive.
>> Some news:
>> I have been spending time fixing some infrastructural problems.
>> 1) After being blindsided by more continuous integration problems in
>> the last month than in the last 5, I found out that one of the root
>> causes was that the openwrt build cluster had declined in size from 8
>> boxes to 1(!!), and time between successful automated builds was in
>> some cases over a month.
>> The risk of going 1 to 0 build slaves seemed untenable. So I sprang
>> into action, scammed two boxes and travis has tossed them into the
>> cluster. Someone else volunteered a box.
>> I am a huge proponent of continuous integration on complex projects.
>> Building all the components of an OS like openwrt correctly, all the
>> time, with the dozens of developers involved, with a minimum delta
>> between commit, breakage, and fix, is really key to simplifying the
>> relatively simple task we face in bufferbloat.net of merely layering
>> on components and fixes improving the state of the art in networking.
>> The tgrid is still looking quite bad at the moment.
>> There's still a huge backlog of breakage.
>> But I hope it gets better. Certainly building a full cluster of build
>> boxes or vms (openwrt at HOME!!) would help a lot more.
>> If anyone would like to help hardware wise, or learn more about how to
>> manage a build cluster using buildbot, please contact travis
>> <thepeople AT openwrt.org>
>> 2) Bloatlab #1 has been completely rewired and rebuilt and most of
>> the routers in there reflashed to Cerowrt-3.3.1-2 or later. They
>> survived some serious network abuse over the last couple days
>> (ironically the only router that crashed was the last rc6 box I had in
>> the mix - and not due to a network fault! I ran it out of flash with a
>> logging tool).
>> To deal with the complexity in there (there's also a sub-lab for some
>> sdnat and PCP testing), I ended up with a new ipv6 /48 and some better
>> ways to route that I'll write up soon.
>> 3) I did finally get back to fully working builds for the ar71xx
>> (cerowrt) architecture a few days ago. I also have a working 3.3.1
>> kernel for the x86_64 build I use to test the server side.
>> (bufferbloat is NOT just a router problem. Fixing all sides of a
>> connection helps a lot). That + a new iproute2 + the debloat script
>> and YOU TOO can experience orders of magnitude less latency....
>> http://europa.lab.bufferbloat.net/debloat/ has that 3.3.1 kernel for x86_64
>> Most of the past week has been backwards rather than forwards, but it
>> was negative in a good way, mostly.
>> I'm sorry it's been three weeks without a viable build for others to test.
>> 4) today's build: http://huchra.bufferbloat.net/~cero1/3.3/3.3.1-4/
>> + Linux 3.3.1 (this is missing the sfq patch I liked, but it's good enough)
>> + Working wifi is back
>> + No more fiddling with ethtool tx rings (up to 64 from 2. BQL does
>> this job better)
>> + TCP CUBIC is now the default (no longer westwood)
>> after 15+ years of misplaced faith in delay-based tcp for wireless,
>> I've collected enough data to convince me that cubic wins. all the
>> + alttcp enabled (making it easy to switch)
>> + latest netperf from svn (yea! remotely changeable diffserv settings
>> for a test tool!)
>> - still horrible dependencies on time. You pretty much have to get on
>> it and do a rndc validation disable multiple times, restart ntp
>> multiple times, killall named multiple times to get anywhere if you
>> want to get dns inside of 10 minutes.
>> At this point sometimes I just turn off named in /etc/xinetd.d/named
>> and turn on port 53 for dnsmasq... but
>> usually after flashing it the first time, wait 10 minutes (let it
>> clean flash), reboot, wait another 10, then it works. Drives me
>> crazy... Once it's up and has valid time and is working, dnssec works
>> great but....
>> + way cool new stuff in dnsmasq for ra and AAAA records
>> - huge dependency on keeping bind in there
>> - aqm-scripts. I have not succeeded in making hfsc work right. Period.
>> + HTB (vs hfsc) is proving far more tractable. SFQRED is scaling
>> better than I'd dreamed. Maybe eric dreamed this big, I didn't.
>> - http://www.bufferbloat.net/issues/352
>> + Added some essential randomness back into the entropy pool
>> - hostapd really acts up at high rates with the hack in there for more
>> entropy (from the openwrt mainline)
>> + named caching the roots idea discarded in favor of classic '.'
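
For the congestion control items in the list above (CUBIC as the default, alternatives enabled so switching is easy), a minimal sketch of checking what a Linux box is currently using. It reads the standard sysctl files under `/proc/sys/net/ipv4`; it says nothing about how the alttcp package itself works, and the helper names `read_congestion_control` and `can_switch_to` are hypothetical, made up for this example.

```python
from pathlib import Path

# Standard Linux sysctl location; passed as a parameter so the helpers can
# also be pointed at a copy of the files taken from another machine.
PROC_DIR = Path("/proc/sys/net/ipv4")


def read_congestion_control(proc_dir=PROC_DIR):
    """Return (current_default, list_of_available_algorithms)."""
    proc_dir = Path(proc_dir)
    current = (proc_dir / "tcp_congestion_control").read_text().strip()
    available = (proc_dir / "tcp_available_congestion_control").read_text().split()
    return current, available


def can_switch_to(name, proc_dir=PROC_DIR):
    """True if the named algorithm is listed as available by the kernel."""
    _, available = read_congestion_control(proc_dir)
    return name in available
```

On a router or test server, calling `read_congestion_control()` with no arguments reports the current default, and `can_switch_to("westwood")` tells you whether switching back for a comparison run is possible without loading anything else.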
>> Dave Täht
>> SKYPE: davetaht
>> US Tel: 1-239-829-5608
>> Cerowrt-devel mailing list
>> Cerowrt-devel at lists.bufferbloat.net