From: dpreed@reed.com
To: "Dave Taht"
Cc: cerowrt-devel@lists.bufferbloat.net
Date: Fri, 6 Apr 2012 00:09:32 -0400 (EDT)
Subject: Re: [Cerowrt-devel] Cero-state this week and last
Message-ID: <1333685372.501325169@apps.rackspace.com>
References: <1333679627.997611294@apps.rackspace.com>

I understand this. At the end of the day, however, *regression tests* matter, as well as tests to verify that new functionality actually works.

I've managed projects with 200 daily "committers". Unless those committers get immediate feedback on what they (accidentally) break, and design the tests for their new functionality so that others don't break what they carefully craft, projects go south and never recover.

You don't have that rate of committers here, but that's not really an excuse to say, "we have to jam in code without testing it because we don't have a discipline of testing and it's a waste of time".

50% of what a developer should be doing (if not more) is making sure that they don't break more than they improve.

I realize this is tough, not fun, and sometimes very frustrating. But cool "new stuff" is far less important than keeping stuff stable.

I'm not trying to be negative - this is stuff I learned at huge personal cost in very high-stress environments where people were literally screaming at me every hour of every day.

The cerowrt/bufferbloat stuff is worth doing, and it's worth doing right - I'm a fan.

-----Original Message-----
From: "Dave Taht" <dave.taht@gmail.com>
Sent: Thursday, April 5, 2012 10:50pm
To: dpreed@reed.com
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] Cero-state this week and last

On Thu, Apr 5, 2012 at 7:33 PM, <dpreed@reed.com> wrote:
> A small suggestion.
>
> Create a regression test suite, and require contributors to *pass* the test
> with each submitted patch set.

A linear complete build of openwrt takes 17 hours on good hardware.
It's hard to build in parallel.

A parallel full build is about 3 hours but requires a bit of monitoring.

Incremental package builds are measured in minutes, however...
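A minimal sketch of the per-patch gate suggested above, assuming each patch set can be checked with a handful of quick commands (incremental package builds plus regression tests) rather than a 17-hour full build. The check names and commands are hypothetical placeholders, not real OpenWrt targets. Every check runs even after a failure, so a contributor sees everything the patch broke in one round trip.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool


def run_checks(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run every named check command; exit status 0 counts as a pass.

    All checks run even after a failure, so the contributor gets the
    full list of breakage at once.
    """
    results = []
    for name, argv in checks.items():
        proc = subprocess.run(argv, capture_output=True)
        results.append(CheckResult(name, proc.returncode == 0))
    return results


def gate(results: list[CheckResult]) -> bool:
    """Accept the patch set only if every check passed."""
    return all(r.passed for r in results)


if __name__ == "__main__":
    # Hypothetical stand-ins for real incremental builds and regression tests.
    checks = {
        "build-pkg-example": [sys.executable, "-c", "pass"],
        "regress-example": [sys.executable, "-c", "raise SystemExit(1)"],
    }
    results = run_checks(checks)
    for r in results:
        print(f"{r.name}: {'PASS' if r.passed else 'FAIL'}")
    print("accept" if gate(results) else "reject")
```

Keeping the checks to incremental builds is the design choice that makes such a gate tolerable for volunteers: minutes per patch instead of hours.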

> Be damned politically incorrect about checkins that don't meet this criterion - eliminate
> the right to check in code for anyone who contributes something that breaks
> functionality.

The number of core committers is quite low, too low, at present.
However, the key problem here is that
the matrix of potential breakage is far larger than any one contributor
can deal with.

There are:

20+ fairly different cpu architectures *
150+ platforms *
3 different libcs *
3 different (generation) toolchains *
5-6 different kernels

That matrix alone is hardly conceivable to deal with. In there are
arches that are genuinely weird (avr anyone), arches that have
arbitrary endianness, arches that are 32 bit and 64 bit...
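Rough arithmetic on that matrix, as a sketch under stated assumptions: each platform is taken to fix its CPU architecture (so architectures are not multiplied in separately), "150+" is taken at its minimum, every libc/toolchain/kernel combination is assumed buildable, and each configuration is charged the ~3-hour parallel full build mentioned above. Those simplifications pull in opposite directions (150+ is a floor; not every combination is valid), so treat the result as an order of magnitude only.

```python
from math import prod


def matrix_cost(platforms: int, libcs: int, toolchains: int,
                kernels: int, hours_per_build: float) -> tuple[int, float]:
    """Return (configurations, total build-hours) for a full build matrix."""
    configs = prod([platforms, libcs, toolchains, kernels])
    return configs, configs * hours_per_build


# Factors from the list above; architecture is assumed implied by platform.
for kernels in (5, 6):
    configs, hours = matrix_cost(150, 3, 3, kernels, hours_per_build=3)
    years = hours / 24 / 365  # one build box running around the clock
    print(f"{kernels} kernels: {configs:,} configurations, "
          f"{hours:,} build-hours (~{years:.1f} box-years)")
```

This prints 6,750 to 8,100 configurations and roughly 2.3 to 2.8 box-years of build time, before any of the thousand-plus packages are considered.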

Add in well over a thousand software packages (everything from Apache
to zile), and you have an idea of how much code has dependencies on
other code...

For example, the breakage yesterday (or was it the day before) was in
a minor update to libtool, as best as I recall. It broke 3 packages
that cerowrt has available as options.
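The libtool example is a reverse-dependency problem: a change to one package can break everything that depends on it, directly or transitively. A minimal sketch of finding that set with a breadth-first walk, using a made-up dependency map (every name except libtool is hypothetical, not the three packages that actually broke):

```python
from collections import defaultdict, deque


def reverse_deps(deps: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert a package -> dependencies map into dependency -> dependents."""
    rdeps: dict[str, set[str]] = defaultdict(set)
    for pkg, requires in deps.items():
        for dep in requires:
            rdeps[dep].add(pkg)
    return rdeps


def affected_by(changed: str, deps: dict[str, set[str]]) -> set[str]:
    """Everything that must be rebuilt and retested when `changed` changes."""
    rdeps = reverse_deps(deps)
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        for dependent in rdeps.get(queue.popleft(), ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical dependency map, for illustration only.
deps = {
    "pkg-a": {"libtool"},
    "pkg-b": {"libtool", "pkg-c"},
    "pkg-c": set(),
    "pkg-d": {"pkg-a"},
    "pkg-e": {"pkg-c"},
}
print(sorted(affected_by("libtool", deps)))
```

Here pkg-d is affected even though it never mentions libtool, which is exactly why a "minor" update can ripple out across a thousand packages.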

I'm looking forward, very much, to seeing the buildbot produce a
known, good build, that I can layer my mere 67 patches and two dozen
packages on top of without having to think too much.

> Every project leader discovers this.

Cerowrt is an incredibly tiny superset of the openwrt project. I help
out where I can.

> Programmers are *lazy* and refuse to
> check their inputs unless you shame them into compliance.

Volunteer programmers are not lazy.

They do, however, have limited resources, and prefer to make progress
rather than make things perfect. Difficult-to-pass check-in tests
impede progress.

The fact that you or I can build an entire OS in a matter of hours
today, and have it work, most often befuddles me. This is tens of
millions of lines of code, all perfect, most of the time.

It used to take 500+ people to engineer an OS in 1992, and 4 days to
build. I consider this progress.

There are all sorts of processes in place; some can certainly be
improved. For example, methods for dealing with and approving the
backlog of patches submitted by other volunteers were discussed
last week.

It mostly just needs more eyeballs. And testing. There's a lot of good
stuff piled up.

http://patchwork.openwrt.org/project/openwrt/list/
>
> -----Original Message-----
> From: "Dave Taht" <dave.taht@gmail.com>
> Sent: Thursday, April 5, 2012 10:27pm
> To: cerowrt-devel@lists.bufferbloat.net
> Subject: [Cerowrt-devel] Cero-state this week and last
>
> I attended the IETF conference in Paris (virtually), particularly ccrg
> and homenet.
>
> I do encourage folk to pay attention to homenet if possible, as laying
> out what home networks will look like in the next 10 years is proving
> to be a hairball.
> ccrg was productive.
>
> Some news:
>
> I have been spending time fixing some infrastructural problems.
>
> 1) After being blindsided by more continuous integration problems in
> the last month than in the last 5, I found out that one of the root
> causes was that the openwrt build cluster had declined in size from 8
> boxes to 1(!!), and time between successful automated builds was in
> some cases over a month.
>
> The risk of going from 1 to 0 build slaves seemed untenable. So I sprang
> into action, scammed two boxes and travis has tossed them into the
> cluster. Someone else volunteered a box.
>
> I am a huge proponent of continuous integration on complex projects.
> http://en.wikipedia.org/wiki/Continuous_integration
>
> Building all the components of an OS like openwrt correctly, all the
> time, with the dozens of developers involved, with a minimum delta
> between commit, breakage, and fix, is really key to simplifying the
> relatively simple task we face in bufferbloat.net of merely layering
> on components and fixes improving the state of the art in networking.
>
> The tgrid is still looking quite bad at the moment.
>
> http://buildbot.openwrt.org:8010/tgrid
>
> There's still a huge backlog of breakage.
>
> But I hope it gets better. Certainly building a full cluster of build
> boxes or vms (openwrt@HOME!!) would help a lot more.
>
> If anyone would like to help hardware-wise, or learn more about how to
> manage a build cluster using buildbot, please contact travis
> <thepeople AT openwrt.org>
>
> 2) Bloatlab #1 has been completely rewired and rebuilt and most of
> the routers in there reflashed to Cerowrt-3.3.1-2 or later. They
> survived some serious network abuse over the last couple days
> (ironically the only router that crashed was the last rc6 box I had in
> the mix - and not due to a network fault! I ran it out of flash with a
> logging tool).
>
> To deal with the complexity in there (there's also a sub-lab for some
> sdnat and PCP testing), I ended up with a new ipv6 /48 and some better
> ways to route that I'll write up soon.
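For scale: with the usual one /64 per link, a /48 leaves 16 bits of subnet ID, i.e. 2^16 = 65,536 /64 networks for a lab and its sub-labs. A small sketch with Python's standard ipaddress module, using the documentation prefix 2001:db8::/48 as a stand-in for the lab's actual (unstated) prefix and a purely hypothetical one-/64-per-segment layout:

```python
import ipaddress
from itertools import islice

# 2001:db8::/48 is the IPv6 documentation prefix, standing in for the lab's /48.
lab = ipaddress.ip_network("2001:db8::/48")

# Number of /64 links that fit: one per value of the 16 bits between /48 and /64.
print(2 ** (64 - lab.prefixlen))  # 65536

# First few /64s, e.g. one per router segment or sub-lab (hypothetical layout).
for net in islice(lab.subnets(new_prefix=64), 3):
    print(net)
```

This prints 65536 followed by 2001:db8::/64, 2001:db8:0:1::/64 and 2001:db8:0:2::/64.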
>
> 3) I finally got back to fully working builds for the ar71xx
> (cerowrt) architecture a few days ago. I also have a working 3.3.1
> kernel for the x86_64 build I use to test the server side.
> (bufferbloat is NOT just a router problem. Fixing all sides of a
> connection helps a lot). That + a new iproute2 + the debloat script
> and YOU TOO can experience orders of magnitude less latency....
>
> http://europa.lab.bufferbloat.net/debloat/ has that 3.3.1 kernel for x86_64
>
> Most of the past week has been backwards rather than forwards, but it
> was negative in a good way, mostly.
>
> I'm sorry it's been three weeks without a viable build for others to test.
>
> 4) today's build: http://huchra.bufferbloat.net/~cero1/3.3/3.3.1-4/
>
> + Linux 3.3.1 (this is missing the sfq patch I liked, but it's good enough)
> + Working wifi is back
> + No more fiddling with ethtool tx rings (up to 64 from 2. BQL does
> this job better)
> + TCP CUBIC is now the default (no longer westwood)
> after 15+ years of misplaced faith in delay-based tcp for wireless,
> I've collected enough data to convince me that cubic wins, all the
> time.
> + alttcp enabled (making it easy to switch)
> + latest netperf from svn (yay! remotely changeable diffserv settings
> for a test tool!)
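Since BQL replaces hand-tuning the TX ring, it helps to be able to watch it. On kernels with BQL (3.3 and later), each TX queue exposes its state under /sys/class/net/<interface>/queues/tx-<n>/byte_queue_limits/, in files such as limit, limit_min, limit_max, inflight and hold_time. A minimal read-only sketch; the default interface name eth0 is an assumption, and the values are only meaningful when the driver itself supports BQL:

```python
import sys
from pathlib import Path

BQL_FIELDS = ("limit", "limit_min", "limit_max", "inflight", "hold_time")


def read_bql(iface: str,
             sysfs: Path = Path("/sys/class/net")) -> dict[str, dict[str, int]]:
    """Return {tx-queue: {field: value}} for every TX queue of `iface`."""
    result: dict[str, dict[str, int]] = {}
    for txq in sorted((sysfs / iface / "queues").glob("tx-*")):
        bql = txq / "byte_queue_limits"
        if not bql.is_dir():
            continue  # kernel built without BQL support
        result[txq.name] = {
            field: int((bql / field).read_text())  # int() strips the newline
            for field in BQL_FIELDS
            if (bql / field).exists()
        }
    return result


if __name__ == "__main__":
    iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"  # assumed interface
    for queue, values in read_bql(iface).items():
        print(queue, values)
```

Watching limit under load shows BQL shrinking and growing the in-driver queue in bytes, which is the job the fixed ring size used to approximate.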
>
> - still horrible dependencies on time. You pretty much have to get on
> it and do an rndc validation disable multiple times, restart ntp
> multiple times, killall named multiple times to get anywhere if you
> want to get dns inside of 10 minutes.
>
> At this point sometimes I just turn off named in /etc/xinetd.d/named
> and turn on port 53 for dnsmasq... but
> usually after flashing it the first time, wait 10 minutes (let it
> clean flash), reboot, wait another 10, then it works. Drives me
> crazy... Once it's up and has valid time and is working, dnssec works
> great but....
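The time dependency comes from how DNSSEC signatures work: each RRSIG record carries an inception and an expiration time, and a validator must not accept the signature when its own clock falls outside that window. A router that boots with its clock near the epoch therefore fails validation, which can in turn block the DNS lookups NTP needs to fix the clock. A simplified sketch of the window check, with hypothetical timestamps (real validators compare the 32-bit RRSIG timestamps using serial-number arithmetic, per RFC 4034; plain comparisons are used here for clarity):

```python
from datetime import datetime, timezone


def rrsig_time_ok(now: datetime, inception: datetime, expiration: datetime) -> bool:
    """Simplified RRSIG validity check: inception <= now <= expiration."""
    return inception <= now <= expiration


# Hypothetical signature window around April 2012.
inception = datetime(2012, 3, 25, tzinfo=timezone.utc)
expiration = datetime(2012, 4, 25, tzinfo=timezone.utc)

correct_clock = datetime(2012, 4, 6, tzinfo=timezone.utc)
unset_clock = datetime(1970, 1, 1, tzinfo=timezone.utc)  # booted without NTP yet

print(rrsig_time_ok(correct_clock, inception, expiration))  # True
print(rrsig_time_ok(unset_clock, inception, expiration))    # False
```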
>
> + way cool new stuff in dnsmasq for ra and AAAA records
> - huge dependency on keeping bind in there
> - aqm-scripts. I have not succeeded in making hfsc work right. Period.
> + HTB (vs hfsc) is proving far more tractable. SFQRED is scaling
> better than I'd dreamed. Maybe Eric dreamed this big, I didn't.
> - http://www.bufferbloat.net/issues/352
> + Added some essential randomness back into the entropy pool
> - hostapd really acts up at high rates with the hack in there for more
> entropy (from the openwrt mainline)
> + named caching the roots idea discarded in favor of classic '.'
>
> --
> Dave Täht
> SKYPE: davetaht
> US Tel: 1-239-829-5608
> http://www.bufferbloat.net
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel


