From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Taht
To: "cerowrt-devel@lists.bufferbloat.net"
Date: Tue, 17 Mar 2015 07:52:23 -0700
Subject: [Cerowrt-devel] Fwd: Dave's wishlist [was: Source-specific routing merged]
References: <7i1tkozwf2.wl-jch@pps.univ-paris-diderot.fr> <87pp87x2yp.wl-jch@pps.univ-paris-diderot.fr>
List-Id: Development issues regarding the cerowrt test router project
X-List-Received-Date: Tue, 17 Mar 2015 14:52:53 -0000
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a1135f1e64b0eaa05117d1dd2

--001a1135f1e64b0eaa05117d1dd2
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

---------- Forwarded message ----------
From: Dave Taht <dave.taht@gmail.com>
Date: Tue, Mar 17, 2015 at 7:41 AM
Subject: Re: Dave's wishlist [was: Source-specific routing merged]
To: Juliusz Chroboczek <jch@pps.univ-paris-diderot.fr>
Cc: "babel-users@lists.alioth.debian.org" <babel-users@lists.alioth.debian.org>, Gabriel Kerneis <gabriel@kerneis.info>, Steven Barth <cyrus@openwrt.org>, Henning Rogge <hrogge@gmail.com>, Paul McKenney <paulmck@linux.vnet.ibm.com>, Felix Fietkau <nbd@nbd.name>

My quest is always for an extra "9" of reliability. Anyplace where you can make something more robust (even if it is out at the .9999999999 level), I like to do so, in order to have the highest MTBF possible in combination with all the other moving parts on the spacecraft (Spaceship Earth).

One of the reasons why I love Paul McKenney so much is that he deeply cares about stuff that happens only one in a billion times.

From this blog post of his: http://paulmck.livejournal.com/37782.html

"I quickly learned that the bug is difficult to reproduce, requiring something like 100 hours of focused rcutorture testing. Bisection based on 100-hour tests would have consumed the remainder of 2014 and a significant fraction of 2015, so something better was required. In fact, something *way* better was required because there was only a very small number of failures, which meant that the expected test time to reproduce the bug might well have been 200 hours or even 300 hours instead of my best guess of 100 hours."

So I get picky about system daemons.

On Tue, Mar 17, 2015 at 4:45 AM, Juliusz Chroboczek <jch@pps.univ-paris-diderot.fr> wrote:

> > 1) Did the issue with procd ever get resolved? (sighup I think it was)
>
> Gabriel, Steven?  Can procd be configured not to send SIGHUP, or shall
> I add an option to babeld to ignore it?  (Currently babeld terminates on
> SIGHUP, and I like it that way, since it prevents a babeld from sticking
> around after you log off.)
>

This basically logjammed on this issue.
Either procd needed to be modified to be able to send an arbitrary signal, or babel needed to be changed to take SIGHUP as a reload.

I'd done the simple patch to babel.

But I understand your use case too (stop routing on HUP via remote access), and I have never poked into procd. Perhaps it could be changed to take a variable for the actual signal number it uses per daemon, but I would argue that has overhead the OpenWrt devs would be loath to take. But: I will look. Babel can't be the only thing that needs a different signal to reload...

A third way out is just to patch babel for OpenWrt...

> > 2) got the new vars into openwrt, or shall I do?
>

The rtt branch had 2 new vars, as best as I recall, and I envisioned the babels package being retired; as best as I recall, it has extra vars like src-eq.

Secondly, the command lines would get complex on me, and I figured just rewriting the conf file was saner than command-line args. On a SIGWHATEVER, reload (or mmap) the conf file, checksum it against the previous version, and do nothing if it didn't change.

Thirdly, having an OpenWrt-specific uci and/or ubus parser that could be compiled in would be more reliable than a script, simpler and faster. I can try to find funding for doing that... (in like 1.7's timeframe!!) I looked over the libubus and libuci interfaces and staggered away confused.

Gabriel?

> > ecn
>
> No, since I don't understand why you think that setting ECN on Babel
> packets makes sense.  (It might make sense to set ECN on some Babel
> packets -- the ones that are marked as "urgent" -- but I'm interested in
> hearing your reasoning.)
>

fq_codel is the default on OpenWrt, and ECN is enabled on it. Basic ECN marking is 2 characters of new code. (Tracking it is harder, but that's boilerplate code now.) hnetd is presently very dumb about coalescing /60s out of /64s... I'd like to be trying much faster update schedules on Ethernet, as per some of the discussion on homenet.

But let me take this subject to another thread.
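Returning to the reload question: a minimal sketch of the scheme described above, in which a signal only requests a reload, and the conf file is then re-read, checksummed, and ignored if unchanged. It is written in Python for brevity and is not babeld's C code; the class name `ConfigReloader` and the default of SIGHUP are hypothetical. Making the signal a parameter is one way to accommodate both positions in the thread: a deployment could map procd's SIGHUP to "reload", while one that wants terminate-on-HUP could choose a different reload signal.

```python
import hashlib
import signal
from pathlib import Path
from typing import Optional


class ConfigReloader:
    """Hypothetical helper: reload a config file when a signal arrives, but
    only report a change when the file's contents actually differ."""

    def __init__(self, path: str, reload_signal: int = signal.SIGHUP) -> None:
        self.path = Path(path)
        self.reload_signal = reload_signal  # which signal means "reload"
        self.pending = False                # set by the handler, cleared by poll()
        self.digest: Optional[str] = None   # SHA-256 of the last config acted on

    def install(self) -> None:
        # CPython only allows installing handlers from the main thread.
        signal.signal(self.reload_signal, self._handler)

    def _handler(self, signum, frame) -> None:
        # Do as little as possible in signal context: just record the request.
        self.request_reload()

    def request_reload(self) -> None:
        self.pending = True

    def poll(self) -> Optional[str]:
        """Call from the main loop. Returns the new config text when a reload
        was requested and the file changed; otherwise returns None."""
        if not self.pending:
            return None
        self.pending = False
        data = self.path.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        if digest == self.digest:
            return None  # unchanged: keep running as-is, touch no routes
        self.digest = digest
        return data.decode("utf-8")
```

In an event loop, one would call `install()` once at startup and `poll()` on each iteration, applying the returned text only when it is not `None`.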
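On the ECN point: with ECN enabled in fq_codel, packets carrying an ECN-capable transport codepoint are CE-marked rather than dropped when CoDel decides to signal congestion, which appears to be the motivation for marking Babel's UDP traffic. The sketch below shows the bit layout and the standard IPv6 `IPV6_TCLASS` socket option; the helper names are hypothetical, and this version marks every packet sent on the socket. Marking only "urgent" packets, as Juliusz suggests, would instead need per-packet control, for example `IPV6_TCLASS` ancillary data passed to `sendmsg` (RFC 3542).

```python
import socket

ECT0 = 0b10      # ECN field value for ECT(0), per RFC 3168
ECN_MASK = 0b11  # the ECN field is the low two bits of the Traffic Class byte


def tclass_with_ect0(dscp: int = 0) -> int:
    """Build an IPv6 Traffic Class byte: DSCP in the top six bits, ECT(0) below."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP must fit in six bits")
    return (dscp << 2) | ECT0


def mark_socket_ect0(sock: socket.socket, dscp: int = 0) -> None:
    """Ask the kernel to send every packet on this IPv6 UDP socket as ECT(0)."""
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_TCLASS, tclass_with_ect0(dscp))
```

A daemon would call `mark_socket_ect0()` once on its `AF_INET6`/`SOCK_DGRAM` socket after creating it; the receiver still has to decide what, if anything, to do with CE marks.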
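The aside about hnetd coalescing /60s out of /64s is about prefix aggregation: sixteen contiguous, aligned /64s make up exactly one /60 and could be announced as a single route. A small illustration using Python's standard `ipaddress.collapse_addresses`; it shows only the arithmetic, not how hnetd assigns or announces prefixes, and the 2001:db8::/32 documentation prefixes are placeholders.

```python
import ipaddress
from typing import Iterable, List


def coalesce(prefixes: Iterable[str]) -> List[ipaddress.IPv6Network]:
    """Merge adjacent, aligned IPv6 prefixes into the fewest covering networks."""
    nets = [ipaddress.IPv6Network(p) for p in prefixes]
    return list(ipaddress.collapse_addresses(nets))


# Sixteen consecutive /64s, 2001:db8:0:10::/64 through 2001:db8:0:1f::/64,
# differ only in the low four bits of the 64-bit prefix, so they fill one /60.
sixteen = [f"2001:db8:0:{0x10 + i:x}::/64" for i in range(16)]
print(coalesce(sixteen))  # [IPv6Network('2001:db8:0:10::/60')]

# Drop one of them and no single /60 covers the rest; they need several routes.
print(coalesce(sixteen[:-1]))
```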
> > atomic route updates
>
> Ausgeschlossen (out of the question).

Nothing is impossible.

> Last time I tried, I got a number of complaints that it
> broke operational networks.
>

The new FIB table patches that have landed in Linux 4.0 and later do some odd things with RCU that I am not sure interact well with the present delete+add route-update scheme that everything like Quagga and babel seems to use.

I'd written about it here while discussing the amazing new FIB patches (7x reductions in lookup time or more), but was not aware that Henning had actually got atomic route updates that worked.

http://lists.openwall.net/netdev/2015/03/11/136

So perhaps some sort of autodetection would help here, too. And figuring out why it used to break.

And ooh! Atomic route changes! No packet loss at all! Look at that extra 9!

> It's also less important than it used to be, since the hysteresis
> algorithm in 1.5.0 dramatically reduced the number of route switches --
> current versions of Babel should not be losing a measurable number of
> packets due to non-atomic switches.
>

How many 9s do you want?

> > IPV6_SUBTREES autodetection
>
> That should definitely be a runtime option.  Matthieu?
>
> Not so sure about autodetection, Steven, Henning, do you have any ideas?
>

My thought here was to try to insert and then retrieve a route that would fail if IPV6_SUBTREES was not defined in the kernel.

> -- Juliusz
>

-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb

--001a1135f1e64b0eaa05117d1dd2
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

--001a1135f1e64b0eaa05117d1dd2--