[Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support
dpreed at reed.com
Wed Dec 3 19:45:09 EST 2014
Awesome start on the issue, in your note, Dave. Tor needs to change for several reasons - not that it isn't great, but with IPv6 and other things coming on line, plus the understanding of fq_codel's rationale, plus ... - the world can do much better. Same with VPNs.
I hope we can set our sights on a convergent target that doesn't get bogged down in the tradeoffs that were made when VPNs were originally proposed. The world is no longer a bunch of disconnected networks protected by Cheswick firewalls. Cheswick said they were only temporary, and they've outlived their usefulness - they actually create security risks more than they fix them (centralizing security creates points of failure and attack that exponentially decrease the attackers' work factor). To some extent that is also true for Tor after these many years.
By putting the intelligence about security in the network, you basically do all the bad things that the end-to-end argument encourages you to avoid. We could also put congestion control in the network by re-creating admission control and requiring contractual agreements to carry traffic across every intermediary. But I think that basically destroys almost all the value of an "inter" net. It makes it a balkanized proprietary set of subnets that have dozens of reasons why you can't connect with anyone else, and no way to be free to connect.
On Wednesday, December 3, 2014 2:44pm, "Dave Taht" <dave.taht at gmail.com> said:
> On Wed, Dec 3, 2014 at 6:17 AM, David P. Reed <dpreed at reed.com> wrote:
> > Tor needs this stuff very badly.
>
> Tor has many, many problematic behaviors relevant to congestion control
> in general. Let me paste a bit of private discussion I'd had on it in a second,
> but a very good paper that touched upon it all was:
>
> DefenestraTor: Throwing out Windows in Tor
> http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf
>
> Honestly tor needs to move to udp, and hide in all the upcoming
> webrtc traffic....
>
> http://blog.mozilla.org/futurereleases/2014/10/16/test-the-new-firefox-hello-webrtc-feature-in-firefox-beta/
>
> webrtc needs some sort of non-centralized rendezvous mechanism, but I am REALLY
> happy to see calls and video stay entirely inside my network when they can be
> negotiated as such.
>
> https://plus.google.com/u/0/107942175615993706558/posts/M4xUtpCKJ4P
>
> And of course, people are busily reinventing torrent in webrtc without
> paying attention to congestion control at all.
>
> https://github.com/feross/webtorrent/issues/39
>
> Giving access to udp to javascript programmers... what could go wrong?
> :/
>
> > I do wonder whether we should focus on vpn's rather than end to end
> > encryption that does not leak secure information through from inside as the
> > plan seems to do.
>
> "plan"?
>
> I like e2e encryption. I also like overlay networks. And meshes.
> And working dns and service discovery. And low latency.
>
> vpns are useful abstractions for sharing an address space you
> may not want to share more widely.
>
> and: I've taken a lot of flak about how fq doesn't help on conventional
> vpns, and, well, just came up with an unconventional vpn idea
> that might have some legs here... (certainly in my case tinc
> as constructed already, no patches, solves hooking together the
> 12 networks I have around the globe, mostly)
>
> As for "leaking information", packet size and frequency are generally
> an obvious indicator of a given traffic type, whether or not some padding
> is added. There is also one piece of plaintext in tinc (the seqno). It also
> uses a fixed port number for both sides of the connection (perhaps it
> shouldn't).
>
> So I don't necessarily see a difference between sending a whole lot of
> varying data on one tuple
>
> 2001:db8::1 <-> 2001:db8:1::1 on port 655
>
> vs
>
> 2001:db8::1 <-> 2001:db8:1::1 port 655
> 2001:db8::2 <-> 2001:db8:1::1 port 655
> 2001:db8::3 <-> 2001:db8:1::1 port 655
> 2001:db8::4 <-> 2001:db8:1::1 port 655
> ....
>
> which solves the fq problem on a vpn like tinc neatly. A security feature
> could be source specific routing where we send stuff over different paths
> from different ipv6 source addresses... and mixing up the src/dest ports
> more but that complexifies the fq portion of the algo.... my thought
> for an initial implementation is to just hard code the ipv6 address range.
>
> I think, however, that adding tons and tons of ipv6 addresses to a given
> interface is probably slow, and might break things like nd and/or
> multicast...
>
> what would be cooler would be if you could allocate an entire /64 (or
> /118) to the vpn daemon
>
> bindtoaddress(2001:db8::/118) (give me all the data for 1024 ips)
>
> but I am not sure how to go about doing that..
>
> ...moving back to a formerly private discussion about tor's woes...
>
>
> "This conversation is a bit separate from #11197 (which is an
> implementation issue in obfsproxy), so separate discussion somewhere
> would probably be required.
>
> So, there appears to be a slight misconception on how tor traffic
> travels across the Internet that I will attempt to clarify, and
> hopefully not get too terribly wrong.
>
> Each step of a given connection over tor involves multiple TCP/IP
> connections. To use a standard example of someone trying to watch Cat
> Videos on the "real internet", it will look approximately like this:
>
> Client <-> Guard <-> Relay <-> Exit <-> Cat Videos
>
> Each step is a separate TCP/IP connection, authenticated and encrypted
> via TLS (TLS is likewise terminated at each hop). Using a pluggable
> transport encapsulates the first hop's TLS session with a different
> protocol, be it obfs2, obfs3, or something else.
>
> The cat videos are passed through this path of many TCP/IP connections
> across things called Circuits that are created/extended by the Client
> one hop at a time (so in the example above, the kitty cats travel across
> 4 TCP/IP connections, relaying data across a Circuit that spans from
> the Client to the Exit. If my art skills were up to it, I would draw a
> diagram.).
>
> Circuits are currently required to provide reliable, in-order delivery.
>
> In addition to the standard congestion control provided by TCP/IP on a
> per-hop basis, there is Circuit level flow control *and* "end to end"
> flow control in the form of RELAY_SENDME cells, but given that multiple
> circuits can end up being multiplexed over a single TCP/IP
> connection, propagation of these RELAY_SENDME cells can get delayed due
> to HOL issues.
>
> So, with that quick and dirty overview out of the way:
>
> * "Ah so if ecn is enabled it can be used?"
>
> ECN will be used if it is enabled, *but* the congestion information
> will not get propagated to the source/destination of a given stream.
>
> * "Does it retain iw10 (the Linux default nowadays sadly)?"
>
> Each TCP/IP connection, if sent from a host that uses an obnoxiously
> large initial window, will have an obnoxiously large initial
> window.
>
> It is worth noting that since multiple Circuits originating from
> potentially numerous clients can and will reuse existing TCP/IP
> connections if able to (see 5.3.1 of the tor spec) that dropping packets
> between tor relays is kind of bad, because all of the separate
> encapsulated flows sharing the singular TCP/IP link will suffer (ECN
> would help here). This situation is rather unfortunate as the good
> active queue management algorithms drop packets (when ECN is not
> available).
>
> A better summary of tor's flow control/bufferbloat woes is given in:
>
> DefenestraTor: Throwing out Windows in Tor
> http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf
>
> The N23 algorithm suggested in the paper did not end up getting
> implemented into Tor, but I do not remember the reason off the top of
> my head."
>
>
> >
> >
> >
> > On Dec 3, 2014, Guus Sliepen <guus at tinc-vpn.org> wrote:
> >>
> >> On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote:
> >>
> >> [...]
> >>>
> >>> https://github.com/dtaht/tinc
> >>>
> >>> I successfully converted tinc to use sendmsg and recvmsg, acquire (at
> >>> least on linux) the TTL/Hoplimit and IP_TOS/IPV6_TCLASS packet fields,
> >>
> >>
> >> Windows does not have sendmsg()/recvmsg(), but the BSDs support it.
> >>
> >>> as well as SO_TIMESTAMPNS, and use a higher resolution internal clock.
> >>> Got passing through the dscp values to work also, but:
> >>>
> >>> A) encapsulation of ecn capable marked packets, and availability in
> >>> the outer header, without correct decapsulation doesn't work well.
> >>>
> >>> The outer packet gets marked, but by default the marking doesn't make
> >>> it back into the inner packet when decoded.
> >>
> >>
> >> Is the kernel stripping the ECN bits provided by userspace? In the code
> >> in your git branch you strip the ECN bits out yourself.
> >>
> >>> So communicating somehow that a path can take ecn (and/or diffserv
> >>> markings) is needed between tinc daemons. I thought of perhaps
> >>> crafting a special icmp message marked with CE but am open to ideas
> >>> that would be backward compatible.
> >>
> >>
> >> PMTU probes are used to discover whether UDP works and how big the path
> >> MTU is, maybe it could be used to discover whether ECN works as well?
> >> Set one of the ECN bits on some of the PMTU probes, and if you receive a
> >> probe with that ECN bit set, also set it on the probe reply. If you
> >> successfully receive a reply with ECN bits set, then you know ECN works.
> >> Since the remote side just echoes the contents of the probe, you could
> >> also put a copy of the ECN bits in the probe payload, and then you can
> >> detect if the ECN bits got zeroed. You can also define an OPTION_ECN in
> >> src/connection.h, so nodes can announce their support for ECN, but that
> >> should not be necessary I think.
> >>
> >>> B) I have long theorized that a lot of userspace vpns bottleneck on
> >>> the read and encapsulate step, and being strict FIFOs,
> >>> gradually accumulate delay until finally they run out of read socket
> >>> buffer space and start dropping packets.
> >>
> >>
> >> Well, encryption and decryption takes a lot of CPU time, but context
> >> switches are also bad.
> >>
> >> Tinc is treating UDP in a strictly FIFO way, but actually it does use a
> >> RED algorithm when tunneling over TCP. That said, it only looks at its
> >> own buffers to determine when to drop packets, and those only come into
> >> play once the kernel's TCP buffers are filled.
> >>
> >>> so I had a couple thoughts towards using multiple rx queues in the
> >>> vtun interface, and/or trying to read more than one packet at a time
> >>> (via recvmmsg) and do some level of fair queueing and queue management
> >>> (codel) inside tinc itself. I think that's
> >>> pretty doable without modifying the protocol any, but I'm not sure of
> >>> its value until I saturate some cpu more.
> >>
> >>
> >> I'd welcome any work in this area :)
> >>
> >>> (and if you thought recvmsg was complex, look at recvmmsg)
> >>
> >>
> >> It seems someone is already working on that, see
> >> https://github.com/jasdeep-hundal/tinc.
> >>
> >>> D)
> >>>
> >>> the bottleneck link above is actually not tinc but the gateway, and as
> >>> the gateway reverts to codel behavior on a single encapsulated flow
> >>> encapsulating all the other flows, we end up with about 40ms of
> >>> induced delay on this test. While I have a better codel (gets below
> >>> 20ms latency, not deployed), *fq*_codel by identifying individual
> >>> flows gets the induced delay on those flows down below 5ms.
> >>
> >>
> >> But that should improve with ECN if fq_codel is configured to use that,
> >> right?
> >>
> >>> At one level, tinc being so nicely meshy means that the "fq" part of
> >>> fq_codel on the gateway will have more chance to work against the
> >>> multiple vpn flows it generates for all the potential vpn endpoints...
> >>>
> >>> but at another... lookie here! ipv6! 2^64 addresses or more to use!
> >>> and port space to burn! What if I could make tinc open up 1024 ports
> >>> per connection, and have it fq all its flows over those? What could
> >>> go wrong?
> >>
> >>
> >> Right, hash the header of the original packets, and then select a port
> >> or address based on the hash? What about putting that hash in the flow
> >> label of outer packets? Any routers that would actually treat those as
> >> separate flows?
> >
> >
> > -- Sent from my Android device with K-@ Mail. Please excuse my brevity.
> >
> >
>
>
>
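The probe-based ECN check suggested in the quoted thread (set ECN bits on some PMTU probes, and carry a copy of those bits in the probe payload so the receiver can tell whether the path zeroed them) might look roughly like the sketch below. This is only an illustration: the function names are hypothetical, and storing the copy in the first payload byte is an assumed layout, not tinc's actual probe format. The codepoint values are the standard two-bit ECN field values.

```python
# Sketch only: hypothetical helpers, not tinc code.
# ECN occupies the low two bits of the IPv4 TOS / IPv6 traffic class byte.
ECN_MASK = 0x03
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def make_probe(tos_byte: int, padding_len: int = 0) -> bytes:
    """Build a probe payload whose first byte (assumed layout) records the
    ECN bits the sender placed in the outer header."""
    return bytes([tos_byte & ECN_MASK]) + bytes(padding_len)


def classify_probe(received_tos: int, payload: bytes) -> str:
    """Compare the ECN bits seen in the outer header on arrival with the
    copy carried inside the payload."""
    sent = payload[0] & ECN_MASK
    got = received_tos & ECN_MASK
    if got == sent:
        return "intact"      # the path preserved the ECN field
    if got == CE and sent in (ECT0, ECT1):
        return "marked"      # a queue on the path signalled congestion
    if got == NOT_ECT:
        return "zeroed"      # something on the path cleared the bits
    return "rewritten"       # changed in some other way
```

A node that only ever sees "zeroed" results for a peer could reasonably conclude that ECN does not survive that path and stop copying ECN bits to the outer header for it.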
> --
> Dave Täht
>
> http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks
>
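The idea raised in the quoted thread of hashing the original packet's header and picking one of 1024 outer source addresses from a /118, so that an fq_codel gateway sees separate flows, could be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the choice of BLAKE2 as the hash, and the 5-tuple format are all hypothetical, and 2001:db8::/118 is just the documentation prefix used in the thread.

```python
# Sketch only: hypothetical names, not tinc code.
import hashlib
import ipaddress

# A /118 leaves 10 host bits, i.e. 1024 addresses.
POOL = ipaddress.ip_network("2001:db8::/118")


def outer_source_for(flow_tuple, pool=POOL):
    """Map an inner flow (for example src, dst, proto, sport, dport) to a
    stable outer source address inside the pool."""
    key = "|".join(str(field) for field in flow_tuple).encode()
    digest = hashlib.blake2b(key, digest_size=8).digest()
    index = int.from_bytes(digest, "big") % pool.num_addresses
    return pool[index]


inner_flow = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)
print(outer_source_for(inner_flow))
```

Packets of the same inner flow always map to the same outer address, so they stay in order within one outer flow, while distinct inner flows may collide on the same address because the pool is finite. The same index could instead select a port or be written into the IPv6 flow label, as the thread also suggests.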