From: "David P. Reed"
Date: Wed, 03 Dec 2014 06:17:41 -0800
To: Guus Sliepen, tinc-devel@tinc-vpn.org
Cc: "cerowrt-devel@lists.bufferbloat.net"
Subject: Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support
Message-ID: <892513fe-8e57-4ee9-be7d-423a3afb4fba@reed.com>
In-Reply-To: <20141203120246.GO10533@sliepen.org>
References: <20141203120246.GO10533@sliepen.org>
List-Id: Development issues regarding the cerowrt test router project

Tor needs this stuff very badly.

I do wonder whether we should focus on VPNs rather than end-to-end encryption that does not leak secure information through from inside, as the plan seems to do.



On Dec 3, 2014, Guus Sliepen <guus@tinc-vpn.org> wrote:
> On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote:
>
> [...]
>> https://github.com/dtaht/tinc

>> I successfully converted tinc to use sendmsg and recvmsg, acquire (at
>> least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet fields,
>
> Windows does not have sendmsg()/recvmsg(), but the BSDs support it.
>
>> as well as SO_TIMESTAMPNS, and use a higher resolution internal clock.
>> Got passing through the dscp values to work also, but:

>> A) encapsulation of ecn capable marked packets, and availability in
>> the outer header, without correct decapsulation doesn't work well.
>>
>> The outer packet gets marked, but by default the marking doesn't make
>> it back into the inner packet when decoded.
>
> Is the kernel stripping the ECN bits provided by userspace? In the code
> in your git branch you strip the ECN bits out yourself.
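
The decapsulation problem described above (an outer-header mark that never reaches the inner packet) can be illustrated with a small sketch. This is Python for brevity, not code from tinc, which is written in C. The function name `decap_ecn` is hypothetical, and the combination rules are intended to follow the RFC 6040 decapsulation table; they should be checked against the RFC before being relied on.

```python
# ECN codepoints: the low two bits of the IPv4 TOS / IPv6 traffic class byte.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def decap_ecn(outer_tos, inner_tos):
    """Return the inner TOS byte to forward after decapsulation, or None to drop.

    Hypothetical helper for illustration; not part of tinc.
    """
    outer = outer_tos & 0x03
    inner = inner_tos & 0x03
    if inner == NOT_ECT:
        # The inner flow cannot carry a mark, so a CE seen on the outer
        # header can only be signalled to it by dropping the packet.
        return None if outer == CE else inner_tos
    if inner == CE or outer == CE:
        result = CE
    elif outer == ECT1:
        result = ECT1
    else:
        result = inner
    # Keep the inner DSCP (upper six bits) and replace only the ECN field.
    return (inner_tos & 0xFC) | result
```

For example, an inner packet marked ECT(0) that arrives inside an outer packet marked CE is forwarded with CE, while its DSCP bits are left untouched.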

>> So communicating somehow that a path can take ecn (and/or diffserv
>> markings) is needed between tinc daemons. I thought of perhaps
>> crafting a special icmp message marked with CE but am open to ideas
>> that would be backward compatible.
>
> PMTU probes are used to discover whether UDP works and how big the path
> MTU is, maybe it could be used to discover whether ECN works as well?
> Set one of the ECN bits on some of the PMTU probes, and if you receive a
> probe with that ECN bit set, also set it on the probe reply. If you
> successfully receive a reply with ECN bits set, then you know ECN works.
> Since the remote side just echoes the contents of the probe, you could
> also put a copy of the ECN bits in the probe payload, and then you can
> detect if the ECN bits got zeroed. You can also define an OPTION_ECN in
> src/connection.h, so nodes can announce their support for ECN, but that
> should not be necessary I think.
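
The probe idea above (copy the ECN bits into the payload, then compare them with what arrives in the header) might look roughly like the following sketch. The payload layout, with the sent ECN bits in the first byte, is an assumption made for illustration and is not tinc's actual PMTU probe format. `make_probe` and `classify_probe` are hypothetical names.

```python
ECN_MASK = 0x03
CE = 0x03


def make_probe(ecn, padding_len):
    """Build a probe payload whose first byte records the ECN bits that
    the sender set on the outer header (assumed layout, not tinc's)."""
    return bytes([ecn & ECN_MASK]) + bytes(padding_len)


def classify_probe(received_tos, payload):
    """Compare the ECN bits seen in the received header with the copy
    carried in the payload."""
    sent = payload[0] & ECN_MASK
    seen = received_tos & ECN_MASK
    if seen == sent:
        return "preserved"
    if seen == CE and sent != 0:
        return "marked"    # a router on the path signalled congestion
    if seen == 0:
        return "cleared"   # something on the path zeroed the ECN field
    return "altered"
```

A result of "cleared" on the reply would suggest that ECN does not survive the path, which is the case the payload copy is meant to detect.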

>> B) I have long theorized that a lot of userspace vpns bottleneck on
>> the read and encapsulate step, and being strict FIFOs,
>> gradually accumulate delay until finally they run out of read socket
>> buffer space and start dropping packets.
>
> Well, encryption and decryption take a lot of CPU time, but context
> switches are also bad.
>
> Tinc is treating UDP in a strictly FIFO way, but actually it does use a
> RED algorithm when tunneling over TCP. That said, it only looks at its
> own buffers to determine when to drop packets, and those only come into
> play once the kernel's TCP buffers are filled.
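
For readers unfamiliar with RED, the textbook drop decision is sketched below. This is the generic algorithm in simplified form (it omits the usual correction based on the number of packets since the last drop) and is not taken from tinc's source. The thresholds, `max_p`, and `weight` are illustrative assumptions.

```python
def update_average(avg_queue, current_queue, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg_queue + weight * current_queue


def red_drop_probability(avg_queue, min_th, max_th, max_p=0.1):
    """Drop probability as a function of the averaged queue length."""
    if avg_queue < min_th:
        return 0.0      # queue is short: never drop
    if avg_queue >= max_th:
        return 1.0      # queue is too long: always drop
    # In between, the probability rises linearly from 0 to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```

Because the decision uses an averaged queue length, it only reacts once a standing queue has built up, which is consistent with the point above that tinc's own buffers only come into play after the kernel's TCP buffers are full.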

>> so I had a couple thoughts towards using multiple rx queues in the
>> vtun interface, and/or trying to read more than one packet at a time
>> (via recvmmsg) and do some level of fair queueing and queue management
>> (codel) inside tinc itself. I think that's
>> pretty doable without modifying the protocol any, but I'm not sure of
>> its value until I saturate some cpu more.
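
The fair-queueing half of that idea can be sketched as a per-flow round robin, shown here as a replacement for a single FIFO. This is a deliberately simplified illustration: it rotates by packet count rather than by bytes, and it contains no codel-style queue management. The class name `RoundRobinQueue` and the way flows are identified are hypothetical.

```python
from collections import OrderedDict, deque


class RoundRobinQueue:
    """Serve one packet per flow in turn instead of strict FIFO order."""

    def __init__(self):
        self.flows = OrderedDict()  # flow id -> deque of queued packets

    def enqueue(self, flow_id, packet):
        if flow_id not in self.flows:
            self.flows[flow_id] = deque()
        self.flows[flow_id].append(packet)

    def dequeue(self):
        if not self.flows:
            return None
        # Take the flow at the head of the rotation.
        flow_id, queue = next(iter(self.flows.items()))
        packet = queue.popleft()
        del self.flows[flow_id]
        if queue:
            # The flow still has packets: move it to the back of the rotation.
            self.flows[flow_id] = queue
        return packet
```

With three packets queued for one flow and one packet for another, the second flow's packet is sent after the first packet of the busy flow rather than after all three.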

> I'd welcome any work in this area :)
>
>> (and if you thought recvmsg was complex, look at recvmmsg)
>
> It seems someone is already working on that, see
> https://github.com/jasdeep-hundal/tinc.

>> D)
>>
>> the bottleneck link above is actually not tinc but the gateway, and as
>> the gateway reverts to codel behavior on a single encapsulated flow
>> encapsulating all the other flows, we end up with about 40ms of
>> induced delay on this test. While I have a better codel (gets below
>> 20ms latency, not deployed), *fq*_codel by identifying individual
>> flows gets the induced delay on those flows down below 5ms.
>
> But that should improve with ECN if fq_codel is configured to use that,
> right?

>> At one level, tinc being so nicely meshy means that the "fq" part of
>> fq_codel on the gateway will have more chance to work against the
>> multiple vpn flows it generates for all the potential vpn endpoints...
>>
>> but at another... lookie here! ipv6! 2^64 addresses or more to use!
>> and port space to burn! What if I could make tinc open up 1024 ports
>> per connection, and have it fq all its flows over those? What could
>> go wrong?
>
> Right, hash the header of the original packets, and then select a port
> or address based on the hash? What about putting that hash in the flow
> label of outer packets? Any routers that would actually treat those as
> separate flows?
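
The hashing suggestion above can be sketched as follows. The choice of fields (a 5-tuple taken from the inner packet), the use of SHA-256, and the base port of 20000 are assumptions made for illustration. Only the figure of 1024 ports comes from the thread. A real implementation would probably prefer a cheaper, keyed, non-cryptographic hash.

```python
import hashlib
import struct


def flow_hash(src_ip, dst_ip, proto, sport, dport):
    """Hash the inner packet's 5-tuple to a 32-bit integer.

    src_ip and dst_ip are raw address bytes (4 for IPv4, 16 for IPv6).
    """
    key = src_ip + dst_ip + struct.pack("!BHH", proto, sport, dport)
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def pick_port(h, base_port=20000, n_ports=1024):
    """Map a flow hash onto one of n_ports outer UDP ports."""
    return base_port + h % n_ports


def pick_flow_label(h):
    """Map a flow hash onto the 20-bit IPv6 flow label field."""
    return h & 0xFFFFF
```

Packets of the same inner flow always map to the same outer port or flow label, so they stay in order relative to each other, while different inner flows are spread across what the gateway's fq_codel would see as separate flows.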

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.