Date: Wed, 3 Dec 2014 19:45:09 -0500 (EST)
From: dpreed@reed.com
To: "Dave Taht" <dave.taht@gmail.com>
Cc: Guus Sliepen <guus@tinc-vpn.org>, tinc-devel@tinc-vpn.org, cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support

Awesome start on the issue in your note, Dave. Tor needs to change for several reasons - not that it isn't great, but with IPv6 and other things coming online, plus the understanding of fq_codel's rationale, plus ... - the world can do much better. Same with VPNs.

I hope we can set our sights on a convergent target that doesn't get bogged down in the tradeoffs that were made when VPNs were originally proposed. The world is no longer a bunch of disconnected networks protected by Cheswick firewalls. Cheswick said they were only temporary, and they've outlived their usefulness - they actually create more security risks than they fix (centralizing security creates points of failure and attack that exponentially decrease the attackers' work factor). To some extent that is also true for Tor after these many years.

By putting the intelligence about security in the network, you basically do all the bad things that the end-to-end argument encourages you to avoid. We could also put congestion control in the network by re-creating admission control and requiring contractual agreements to carry traffic across every intermediary. But I think that basically destroys almost all the value of an "inter" net. It makes it a balkanized, proprietary set of subnets that have dozens of reasons why you can't connect with anyone else, and no way to be free to connect.


On Wednesday, December 3, 2014 2:44pm, "Dave Taht" <dave.taht@gmail.com> said:

> On Wed, Dec 3, 2014 at 6:17 AM, David P. Reed <dpreed@reed.com> wrote:
> > Tor needs this stuff very badly.
>
> Tor has many, many problematic behaviors relevant to congestion control
> in general. Let me paste a bit of private discussion I'd had on it in a second,
> but a very good paper that touched upon it all was:
>
> DefenestraTor: Throwing out Windows in Tor
> http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf
>
> Honestly tor needs to move to udp, and hide in all the upcoming
> webrtc traffic....
>
> http://blog.mozilla.org/futurereleases/2014/10/16/test-the-new-firefox-hello-webrtc-feature-in-firefox-beta/
>
> webrtc needs some sort of non-centralized rendezvous mechanism, but I am REALLY
> happy to see calls and video stay entirely inside my network when they can be
> negotiated as such.
>
> https://plus.google.com/u/0/107942175615993706558/posts/M4xUtpCKJ4P
>
> And of course, people are busily reinventing torrent in webrtc without
> paying attention to congestion control at all.
>
> https://github.com/feross/webtorrent/issues/39
>
> Giving access to udp to javascript programmers... what could go wrong?
> :/
>
> > I do wonder whether we should focus on vpns rather than end to end
> > encryption that does not leak secure information through from inside as the
> > plan seems to do.
>
> "plan"?
>
> I like e2e encryption. I also like overlay networks. And meshes.
> And working dns and service discovery. And low latency.
>
> vpns are useful abstractions for sharing an address space you
> may not want to share more widely.
>
> and: I've taken a lot of flak about how fq doesn't help on conventional
> vpns, and well, just came up with an unconventional vpn idea,
> that might have some legs here... (certainly in my case tinc
> as constructed already, no patches, solves hooking together the
> 12 networks I have around the globe, mostly)
>
> As for "leaking information", packet size and frequency is generally
> an obvious indicator of a given traffic type, some padding added or
> no. There is one piece of plaintext
> in tinc (the seqno), also. It also uses a fixed port number for both
> sides of the connection (perhaps it shouldn't).
>
> So I don't necessarily see a difference between sending a whole lot of
> varying data on one tuple
>
> 2001:db8::1 <-> 2001:db8:1::1 on port 655
>
> vs
>
> 2001:db8::1 <-> 2001:db8:1::1 port 655
> 2001:db8::2 <-> 2001:db8:1::1 port 655
> 2001:db8::3 <-> 2001:db8:1::1 port 655
> 2001:db8::4 <-> 2001:db8:1::1 port 655
> ....
>
> which solves the fq problem on a vpn like tinc neatly. A security feature
> could be source-specific routing, where we send stuff over different paths
> from different ipv6 source addresses... and mixing up the src/dest ports
> more, but that complexifies the fq portion of the algo.... my thought
> for an initial implementation is to just hard code the ipv6 address range.
>
> I think however that adding tons and tons of ipv6 addresses to a given
> interface is probably slow,
> and might break things like nd and/or multicast...
>
> what would be cooler would be if you could allocate an entire /64 (or
> /118) to the vpn daemon
>
> bindtoaddress(2001:db8::/118) (give me all the data for 1024 ips)
>
> but I am not sure how to go about doing that..
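
One way that could plausibly be done on Linux (not anything tinc does today) is the "AnyIP" trick: route the whole prefix to the local host, let a wildcard-bound UDP socket receive for every address in it, and pick the source address per packet on the way back out. A rough, untested sketch, reusing the 2001:db8::/118 and port 655 placeholders from the example above:

/* Untested sketch: receive UDP for an entire IPv6 prefix on Linux.
 * Assumes the prefix was first routed to the host ("AnyIP" trick):
 *   ip -6 route add local 2001:db8::/118 dev lo
 * Prefix and port are the placeholders from the example above.
 * Error handling trimmed. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <string.h>

int open_prefix_socket(void)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    int on = 1;

    struct sockaddr_in6 sin6;
    memset(&sin6, 0, sizeof(sin6));
    sin6.sin6_family = AF_INET6;
    sin6.sin6_port = htons(655);
    sin6.sin6_addr = in6addr_any;   /* receives for every local address,
                                       including the whole routed /118 */
    if (bind(fd, (struct sockaddr *)&sin6, sizeof(sin6)) < 0)
        return -1;

    /* Learn which address in the /118 each datagram was sent to, so the
     * reply can use the same (or a deliberately different) source via
     * IPV6_PKTINFO ancillary data on sendmsg(). */
    setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPKTINFO, &on, sizeof(on));
    return fd;
}

The per-flow choice of source address would then come from a hash of the inner packet (see the hashing exchange near the end of this thread).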
>
> ...moving back to a formerly private discussion about tor's woes...
>
>
> "This conversation is a bit separate from #11197 (which is an
> implementation issue in obfsproxy), so separate discussion somewhere
> would probably be required.
>
> So, there appears to be a slight misconception on how tor traffic
> travels across the Internet that I will attempt to clarify, and
> hopefully not get too terribly wrong.
>
> Each step of a given connection over tor involves multiple TCP/IP
> connections. To use a standard example of someone trying to watch Cat
> Videos on the "real internet", it will look approximately like this:
>
> Client <-> Guard <-> Relay <-> Exit <-> Cat Videos
>
> Each step is a separate TCP/IP connection, authenticated and encrypted
> via TLS (TLS is likewise terminated at each hop). Using a pluggable
> transport encapsulates the first hop's TLS session with a different
> protocol, be it obfs2, obfs3, or something else.
>
> The cat videos are passed through this path of many TCP/IP connections
> across things called Circuits that are created/extended by the Client
> one hop at a time (so in the example above, the kitty cats travel across
> 4 TCP/IP connections, relaying data across a Circuit that spans from
> the Client to the Exit. If my art skills were up to it, I would draw a
> diagram.).
>
> Circuits are currently required to provide reliable, in-order delivery.
>
> In addition to the standard congestion control provided by TCP/IP on a
> per-hop basis, there is Circuit-level flow control *and* "end to end"
> flow control in the form of RELAY_SENDME cells, but given that multiple
> circuits can end up being multiplexed over a singular TCP/IP
> connection, propagation of these RELAY_SENDME cells can get delayed due
> to HOL issues.
>
> So, with that quick and dirty overview out of the way:
>
> * "Ah so if ecn is enabled it can be used?"
>
> ECN will be used if it is enabled, *but* the congestion information
> will not get propagated to the source/destination of a given stream.
>
> * "Does it retain iw10 (the Linux default nowadays sadly)?"
>
> Each TCP/IP connection, if sent from a host that uses an obnoxiously
> large initial window, will have an obnoxiously large initial
> window.
>
> It is worth noting that since multiple Circuits originating from
> potentially numerous clients can and will reuse existing TCP/IP
> connections if able to (see 5.3.1 of the tor spec), dropping packets
> between tor relays is kind of bad, because all of the separate
> encapsulated flows sharing the singular TCP/IP link will suffer (ECN
> would help here). This situation is rather unfortunate as the good
> active queue management algorithms drop packets (when ECN is not
> available).
>
> A better summary of tor's flow control/bufferbloat woes is given in:
>
> DefenestraTor: Throwing out Windows in Tor
> http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf
>
> The N23 algorithm suggested in the paper did not end up getting
> implemented into Tor, but I do not remember the reason off the top of
> my head."
>
>
> >
> >
> >
> > On Dec 3, 2014, Guus Sliepen <guus@tinc-vpn.org> wrote:
> >>
> >> On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote:
> >>
> >> [...]
> >>>
> >>> https://github.com/dtaht/tinc
> >>>
> >>> I successfully converted tinc to use sendmsg and recvmsg, acquire (at
> >>> least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet fields,
> >>
> >>
> >> Windows does not have sendmsg()/recvmsg(), but the BSDs support it.
> >>
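
For readers following along, the conversion being described generally amounts to enabling a few ancillary-data socket options and then pulling TCLASS and hoplimit out of the control messages on every receive. A rough sketch of the Linux side (not the actual tinc patch; names are illustrative):

/* Rough sketch of per-packet TCLASS and hoplimit retrieval with
 * recvmsg() on an IPv6 UDP socket.  Not the actual tinc change;
 * error handling omitted. */
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <string.h>

struct pkt_meta { int tclass; int hoplimit; };

void enable_meta(int fd)
{
    int on = 1;
    setsockopt(fd, IPPROTO_IPV6, IPV6_RECVTCLASS,   &on, sizeof(on));
    setsockopt(fd, IPPROTO_IPV6, IPV6_RECVHOPLIMIT, &on, sizeof(on));
}

ssize_t recv_with_meta(int fd, void *buf, size_t len, struct pkt_meta *m)
{
    char cbuf[128];
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return n;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IPV6 && c->cmsg_type == IPV6_TCLASS)
            memcpy(&m->tclass, CMSG_DATA(c), sizeof(int));
        else if (c->cmsg_level == IPPROTO_IPV6 && c->cmsg_type == IPV6_HOPLIMIT)
            memcpy(&m->hoplimit, CMSG_DATA(c), sizeof(int));
    }
    return n;
}

The matching send side can put the stored tclass back onto outgoing tunnel packets with an IPV6_TCLASS control message or socket option, which is the dscp-passthrough piece; SO_TIMESTAMPNS works the same way, delivering a per-packet timestamp as ancillary data.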
> >>> as well as SO_TIMESTAMPNS, and use a higher resolution internal
> >>> clock.
> >>> Got passing through the dscp values to work also, but:
> >>>
> >>> A) encapsulation of ecn capable marked packets, and availability in
> >>> the outer header, without correct decapsulation doesn't work well.
> >>>
> >>> The outer packet gets marked, but by default the marking doesn't
> >>> make it back into the inner packet when decoded.
> >>
> >>
> >> Is the kernel stripping the ECN bits provided by userspace? In the code
> >> in your git branch you strip the ECN bits out yourself.
> >>
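
The behaviour Dave is after on decapsulation is essentially the RFC 6040 rule: if the outer header picked up a CE mark in transit and the inner packet was ECN-capable, fold the CE mark back into the inner header; if the inner packet was not ECN-capable, the congestion signal cannot be preserved and the packet should be dropped. A simplified sketch (illustrative, not the code in the branch):

/* Simplified RFC 6040-style ECN handling at decapsulation time.
 * tos_outer/tos_inner are full TOS/TCLASS bytes; only the low two
 * (ECN) bits are examined.  Illustrative only; the full RFC also
 * covers ECT(1) remapping. */
#define ECN_MASK    0x03
#define ECN_NOT_ECT 0x00
#define ECN_CE      0x03

/* Returns the TOS byte the decapsulated inner packet should carry,
 * or -1 meaning "drop": the path signalled congestion (CE) on the
 * outer header but the inner packet is not ECN-capable, so the
 * signal cannot be kept. */
static int ecn_decap(unsigned char tos_outer, unsigned char tos_inner)
{
    if ((tos_outer & ECN_MASK) == ECN_CE) {
        if ((tos_inner & ECN_MASK) == ECN_NOT_ECT)
            return -1;
        return (tos_inner & ~ECN_MASK) | ECN_CE;   /* fold CE inward */
    }
    return tos_inner;
}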
> >>> So communicating somehow that a path can take ecn (and/or diffserv
> >>> markings) is needed between tinc daemons. I thought of perhaps
> >>> crafting a special icmp message marked with CE but am open to ideas
> >>> that would be backward compatible.
> >>
> >>
> >> PMTU probes are used to discover whether UDP works and how big the path
> >> MTU is, maybe they could be used to discover whether ECN works as well?
> >> Set one of the ECN bits on some of the PMTU probes, and if you receive a
> >> probe with that ECN bit set, also set it on the probe reply. If you
> >> successfully receive a reply with ECN bits set, then you know ECN works.
> >> Since the remote side just echoes the contents of the probe, you could
> >> also put a copy of the ECN bits in the probe payload, and then you can
> >> detect if the ECN bits got zeroed. You can also define an OPTION_ECN in
> >> src/connection.h, so nodes can announce their support for ECN, but that
> >> should not be necessary I think.
> >>
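
Guus's probe idea might look roughly like this on the sending side: mark the probe ECT(0) via the TCLASS socket option and carry a copy of the transmitted ECN bits in the payload, so a reply distinguishes "ECN survived the path" from "ECN got bleached". The probe framing below is invented purely for illustration; tinc's real PMTU probes are formatted differently:

/* Hypothetical ECN-capable probe: send the probe ECT(0)-marked and
 * carry a copy of the transmitted ECN bits in the payload so the
 * reply reveals whether the path clears ECN.  The 3-byte framing
 * here is invented for illustration only. */
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <stdint.h>

#define ECN_ECT0 0x02

ssize_t send_ecn_probe(int fd, const struct sockaddr_in6 *dst, uint8_t seq)
{
    int tclass = ECN_ECT0;               /* DSCP 0, ECN field = ECT(0) */
    setsockopt(fd, IPPROTO_IPV6, IPV6_TCLASS, &tclass, sizeof(tclass));

    uint8_t probe[3] = { 0x55 /* invented probe type */, seq, ECN_ECT0 };
    return sendto(fd, probe, sizeof(probe), 0,
                  (const struct sockaddr *)dst, sizeof(*dst));
}

/* On the reply path: read the arriving TCLASS via ancillary data (as in
 * the recvmsg sketch above) and compare its ECN bits with probe[2]; a
 * mismatch means some hop cleared or rewrote them. */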
> >>> B) I have long theorized that a lot of userspace vpns bottleneck on
> >>> the read and encapsulate step, and being strict FIFOs,
> >>> gradually accumulate delay until finally they run out of read socket
> >>> buffer space and start dropping packets.
> >>
> >>
> >> Well, encryption and decryption takes a lot of CPU time, but context
> >> switches are also bad.
> >>
> >> Tinc is treating UDP in a strictly FIFO way, but actually it does use a
> >> RED algorithm when tunneling over TCP. That said, it only looks at its
> >> own buffers to determine when to drop packets, and those only come into
> >> play once the kernel's TCP buffers are filled.
> >>
> >>> so I had a couple thoughts towards using multiple rx queues in the
> >>> vtun interface, and/or trying to read more than one packet at a time
> >>> (via recvmmsg) and do some level of fair queueing and queue management
> >>> (codel) inside tinc itself. I think that's
> >>> pretty doable without modifying the protocol any, but I'm not sure of
> >>> its value until I saturate some cpu more.
> >>
> >>
> >> I'd welcome any work in this area :)
> >>
> >>> (and if you thought recvmsg was complex, look at recvmmsg)
> >>
> >>
> >> It seems someone is already working on that, see
> >> https://github.com/jasdeep-hundal/tinc.
> >>
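
The batched-read half of that idea is fairly compact; a fair-queueing/codel stage would then operate on each batch before encryption and forwarding. A sketch of just the recvmmsg() loop on the tunnel's UDP socket (illustrative, not from any of the branches mentioned):

/* Sketch of batched reads with recvmmsg() on the tunnel's UDP socket:
 * up to BATCH datagrams per syscall, which a later fq/codel stage could
 * then schedule before encapsulation.  Illustrative only. */
#define _GNU_SOURCE
#include <sys/socket.h>
#include <string.h>

#define BATCH 16
#define MTU   1500

int read_batch(int fd, char bufs[BATCH][MTU], int lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec   iovs[BATCH];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len  = MTU;
        msgs[i].msg_hdr.msg_iov    = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* Returns how many datagrams were read; -1 with EAGAIN if the
     * socket had nothing queued (because of MSG_DONTWAIT). */
    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}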
> >>> D)
> >>>
> >>> the bottleneck link above is actually not tinc but the gateway, and as
> >>> the gateway reverts to codel behavior on a single encapsulated flow
> >>> encapsulating all the other flows, we end up with about 40ms of
> >>> induced delay on this test. While I have a better codel (gets below
> >>> 20ms latency, not deployed), *fq*_codel by identifying individual
> >>> flows gets the induced delay on those flows down below 5ms.
> >>
> >>
> >> But that should improve with ECN if fq_codel is configured to use that,
> >> right?
> >>
> >>> At one level, tinc being so nicely meshy means that the "fq" part of
> >>> fq_codel on the gateway will have more chance to work against the
> >>> multiple vpn flows it generates for all the potential vpn endpoints...
> >>>
> >>> but at another... lookie here! ipv6! 2^64 addresses or more to use!
> >>> and port space to burn! What if I could make tinc open up 1024 ports
> >>> per connection, and have it fq all its flows over those? What could
> >>> go wrong?
> >>
> >>
> >> Right, hash the header of the original packets, and then select a port
> >> or address based on the hash? What about putting that hash in the flow
> >> label of outer packets? Any routers that would actually treat those as
> >> separate flows?
> >
> >
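
That last exchange could be sketched as: hash the inner packet's flow identity and use the result to pick one of the 1024 outer source addresses (or ports) from the /118 idea above, so routers running fq_codel see each inner flow as a distinct outer flow. Purely illustrative; the inner-header parsing assumes a plain IPv4 packet, which is not necessarily what tinc hands around internally:

/* Illustrative flow hashing: derive a stable index in [0, 1023] from an
 * inner packet's 5-tuple and use it to choose the outer source address
 * (or port).  Assumes a plain IPv4 inner packet with a 20-byte header;
 * real tunnel code would handle IPv6, options and fragments too. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

static uint32_t fnv1a(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;
    while (len--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

unsigned pick_outer_index(const unsigned char *inner, size_t len)
{
    struct {
        uint32_t saddr, daddr;
        uint16_t sport, dport;
        uint8_t  proto;
    } key;

    if (len < 24)
        return 0;                        /* runt packet: one default bucket */

    memset(&key, 0, sizeof(key));        /* keep struct padding deterministic */
    memcpy(&key.saddr, inner + 12, 4);   /* IPv4 source address */
    memcpy(&key.daddr, inner + 16, 4);   /* IPv4 destination address */
    key.proto = inner[9];                /* protocol */
    memcpy(&key.sport, inner + 20, 2);   /* TCP/UDP ports, assuming ihl == 5 */
    memcpy(&key.dport, inner + 22, 2);

    return fnv1a(&key, sizeof(key)) & 1023;   /* 1024 buckets, like the /118 */
}

Putting the same hash into the IPv6 flow label of the outer packet, as Guus suggests, would only help where intermediate routers actually include the flow label in their own flow hashing.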
> > -- Sent from my Android device with K-@ Mail. Please excuse my brevity.
> >
> > _______________________________________________
> > Cerowrt-devel mailing list
> > Cerowrt-devel@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
> >
>
>
>
> --
> Dave Täht
>
> http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks