[Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support

Guus Sliepen guus at tinc-vpn.org
Wed Dec 3 07:02:46 EST 2014


On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote:

[...]
> https://github.com/dtaht/tinc
> 
> I successfully converted tinc to use sendmsg and recvmsg, acquire (at
> least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet fields,

Windows does not have sendmsg()/recvmsg(), but the BSDs support them.
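
For reference, here is roughly what pulling the TOS byte out of the
ancillary data looks like on Linux. This is an untested sketch; it
assumes IP_RECVTOS has already been enabled on the socket with
setsockopt():

    #include <stdint.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    /* Receive one datagram and extract the TOS byte from the
       ancillary data. Requires IP_RECVTOS to be set on fd. */
    ssize_t recv_with_tos(int fd, void *buf, size_t len, uint8_t *tos) {
        char cbuf[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
        };
        ssize_t n = recvmsg(fd, &msg, 0);
        if (n < 0)
            return n;
        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c))
            if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_TOS)
                *tos = *CMSG_DATA(c); /* a single byte on Linux */
        return n;
    }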

> as well as SO_TIMESTAMPNS, and use a higher resolution internal clock.
> Got passing through the dscp values to work also, but:
> 
> A) encapsulation of ecn capable marked packets, and availability in
> the outer header, without correct decapsulation, doesn't work well.
> 
> The outer packet gets marked, but by default the marking doesn't make
> it back into the inner packet when decoded.

Is the kernel stripping the ECN bits provided by userspace? In the code
in your git branch you strip the ECN bits out yourself.
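
For what it's worth, the split is simple: the upper six bits of the
TOS/traffic class octet are the DSCP, the lower two bits are the ECN
field (RFC 3168). Copying only the DSCP from the inner packet to the
outer header looks something like this:

    #include <stdint.h>

    #define ECN_MASK  0x03 /* low two bits: ECN field */
    #define DSCP_MASK 0xFC /* high six bits: DSCP */

    /* Take the DSCP from the inner packet, keep the outer ECN bits. */
    static uint8_t copy_dscp(uint8_t inner_tos, uint8_t outer_tos) {
        return (inner_tos & DSCP_MASK) | (outer_tos & ECN_MASK);
    }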

> So communicating somehow that a path can take ecn (and/or diffserv
> markings) is needed between tinc daemons. I thought of perhaps
> crafting a special icmp message marked with CE but am open to ideas
> that would be backward compatible.

PMTU probes are used to discover whether UDP works and how big the path
MTU is; maybe they could be used to discover whether ECN works as well?
Set one of the ECN bits on some of the PMTU probes, and if you receive a
probe with that ECN bit set, also set it on the probe reply. If you
successfully receive a reply with ECN bits set, then you know ECN works.
Since the remote side just echoes the contents of the probe, you could
also put a copy of the ECN bits in the probe payload, and then you can
detect whether the ECN bits got zeroed along the way. You could also
define an OPTION_ECN in src/connection.h so nodes can announce their
support for ECN, but I don't think that should be necessary.
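
In pseudo-C, the probe scheme could look something like this (the
probe_* helpers are invented for illustration, they are not tinc's
current API):

    /* Sender: mark the probe ECT(0) on the wire, and stash a copy of
       the marking inside the payload so zeroing can be detected. */
    probe.payload_ecn = 0x02;            /* ECT(0) */
    probe_set_wire_ecn(&probe, 0x02);
    probe_send(sock, &probe);

    /* Receiver: echo the probe back, mirroring whatever ECN bits
       actually arrived on the wire. */
    probe_set_wire_ecn(&reply, probe_get_wire_ecn(&rx));
    probe_send(sock, &reply);

    /* Sender, on receiving the reply: ECN survives the path if the
       wire bits are non-zero and match the echoed payload copy. */
    bool ecn_ok = probe_get_wire_ecn(&rx_reply) != 0 &&
                  probe_get_wire_ecn(&rx_reply) == rx_reply.payload_ecn;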

> B) I have long theorized that a lot of userspace vpns bottleneck on
> the read and encapsulate step, and being strict FIFOs,
> gradually accumulate delay until finally they run out of read socket
> buffer space and start dropping packets.

Well, encryption and decryption take a lot of CPU time, but context
switches are also bad.

Tinc treats UDP traffic in a strictly FIFO way, but it does use a RED
algorithm when tunneling over TCP. That said, it only looks at its own
buffers to determine when to drop packets, and those only come into
play once the kernel's TCP buffers are filled.
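
The drop decision itself is just the textbook RED ramp, roughly like
this (generic form, not tinc's exact code):

    #include <stdbool.h>
    #include <stdlib.h>

    /* Drop probability rises linearly from 0 at min_th to max_p at
       max_th, based on the current queue length q. */
    static bool red_drop(size_t q, size_t min_th, size_t max_th, double max_p) {
        if (q <= min_th)
            return false;
        if (q >= max_th)
            return true;
        double p = max_p * (double)(q - min_th) / (double)(max_th - min_th);
        return drand48() < p;
    }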

> so I had a couple thoughts towards using multiple rx queues in the
> vtun interface, and/or trying to read more than one packet at a time
> (via recvmmsg) and do some level of fair queueing and queue management
> (codel) inside tinc itself. I think that's
> pretty doable without modifying the protocol any, but I'm not sure of
> its value until I saturate some cpu more.

I'd welcome any work in this area :)

> (and if you thought recvmsg was complex, look at recvmmsg)

It seems someone is already working on that, see
https://github.com/jasdeep-hundal/tinc.
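
The batching part is not too bad once the recvmsg() plumbing is in
place; an untested, Linux-only sketch:

    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/socket.h>

    #define BATCH 8

    /* Read up to BATCH datagrams in a single system call; returns
       the number of packets received, or -1 on error. */
    static int read_batch(int fd, char bufs[BATCH][2048]) {
        struct iovec iov[BATCH];
        struct mmsghdr msgs[BATCH];
        for (int i = 0; i < BATCH; i++) {
            iov[i].iov_base = bufs[i];
            iov[i].iov_len = 2048;
            memset(&msgs[i], 0, sizeof(msgs[i]));
            msgs[i].msg_hdr.msg_iov = &iov[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }
        return recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    }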

> D)
> 
> the bottleneck link above is actually not tinc but the gateway, and as
> the gateway reverts to codel behavior on a single encapsulated flow
> encapsulating all the other flows, we end up with about 40ms of
> induced delay on this test. While I have a better codel (gets below
> 20ms latency, not deployed), *fq*_codel by identifying individual
> flows gets the induced delay on those flows down below 5ms.

But that should improve with ECN if fq_codel is configured to use that,
right?

> At one level, tinc being so nicely meshy means that the "fq" part of
> fq_codel on the gateway will have more chance to work against the
> multiple vpn flows it generates for all the potential vpn endpoints...
> 
> but at another... lookie here! ipv6! 2^64 addresses or more to use!
> and port space to burn! What if I could make tinc open up 1024 ports
> per connection, and have it fq all its flows over those? What could
> go wrong?

Right, hash the header of the original packets, and then select a port
or address based on the hash? What about putting that hash in the flow
label of the outer packets? Are there any routers that would actually
treat those as separate flows?
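
Purely as a sketch of the port idea (constants and names invented):
hash the inner packet's 5-tuple, then spread the flows across a block
of source ports so the gateway's fq_codel classifier sees them as
distinct flows:

    #include <stdint.h>

    #define NPORTS 1024 /* size of the per-connection port block */

    /* Map an inner-flow hash onto one of NPORTS source ports.
       Assumes base_port + NPORTS stays below 65536. */
    static uint16_t pick_port(uint16_t base_port, uint32_t flow_hash) {
        return base_port + (uint16_t)(flow_hash % NPORTS);
    }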

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus at tinc-vpn.org>