* [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support
From: Dave Taht @ 2014-12-03  8:07 UTC
To: tinc-devel; +Cc: cerowrt-devel

I have long included tinc in the cerowrt project as a lighter-weight, meshy alternative to conventional VPNs. I sat down a few days ago to think about how to make VPN connections work better through fq_codel, and decided I should maybe hack on a VPN to do the job. So I picked up tinc's source code for the first time, got it working over IPv6 as a switch between two endpoints in a matter of minutes (very impressed, thx!), and started hacking at it to see where I would get.

This is partially the outgrowth of looking at an IETF document on ECN encapsulation and VPNs:

https://tools.ietf.org/html/rfc6040

Experimental patches so far are at:

https://github.com/dtaht/tinc

I successfully converted tinc to use sendmsg and recvmsg, acquire (at least on Linux) the TTL/hoplimit and IP_TOS/IPV6_TCLASS packet fields, as well as SO_TIMESTAMPNS, and use a higher-resolution internal clock.
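The cmsg plumbing for that looks roughly like the following. This is a minimal sketch of the approach rather than the actual patch: error handling is trimmed, and the function name and buffer sizes are illustrative, not from the tinc tree.

/* Pull the TOS/TCLASS byte, hop limit and a kernel RX timestamp off a UDP
 * socket with recvmsg().  Assumes the socket was prepared with:
 *   setsockopt(fd, IPPROTO_IPV6, IPV6_RECVTCLASS,   &on, sizeof on);
 *   setsockopt(fd, IPPROTO_IPV6, IPV6_RECVHOPLIMIT, &on, sizeof on);
 *   setsockopt(fd, SOL_SOCKET,   SO_TIMESTAMPNS,    &on, sizeof on);
 * Linux-specific; the BSDs differ slightly and Windows lacks recvmsg(). */
#include <string.h>
#include <time.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/uio.h>

static ssize_t recv_with_meta(int fd, void *buf, size_t len,
                              int *tclass, int *hoplimit, struct timespec *stamp)
{
    union {                              /* aligned control buffer, as in cmsg(3) */
        char buf[512];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof ctrl.buf,
    };

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return n;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IPV6 && c->cmsg_type == IPV6_TCLASS)
            memcpy(tclass, CMSG_DATA(c), sizeof *tclass);     /* DSCP + ECN bits */
        else if (c->cmsg_level == IPPROTO_IPV6 && c->cmsg_type == IPV6_HOPLIMIT)
            memcpy(hoplimit, CMSG_DATA(c), sizeof *hoplimit);
        else if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SO_TIMESTAMPNS)
            memcpy(stamp, CMSG_DATA(c), sizeof *stamp);       /* SCM_TIMESTAMPNS: kernel RX time */
    }
    return n;
}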
Got passing through the DSCP values to work also, but:

A) Encapsulation of ECN-capable-marked packets, with the marking available in the outer header, doesn't work well without correct decapsulation. The outer packet gets marked, but by default the marking doesn't make it back into the inner packet when decoded. See:

http://snapon.lab.bufferbloat.net/~d/tinc/ecn.png # packets get marked but not decapsulated - and never dropped, so they just keep accumulating delay over this path....

vs

http://snapon.lab.bufferbloat.net/~d/tinc/noecn.png

So some way of communicating that a path can take ECN (and/or diffserv markings) is needed between tinc daemons. I thought of perhaps crafting a special ICMP message marked with CE, but am open to ideas that would be backward compatible. It IS nice to be able to operate with near zero packet loss.....

B) I have long theorized that a lot of userspace VPNs bottleneck on the read-and-encapsulate step and, being strict FIFOs, gradually accumulate delay until finally they run out of read socket buffer space and start dropping packets. So I had a couple of thoughts towards using multiple rx queues in the vtun interface, and/or trying to read more than one packet at a time (via recvmmsg) and doing some level of fair queueing and queue management (codel) inside tinc itself. I think that's pretty doable without modifying the protocol at all, but I'm not sure of its value until I saturate some cpu more.

(and if you thought recvmsg was complex, look at recvmmsg)

C) Moving forward, in this case, it looks like I am bottlenecked on my gateway anyway (only eating 36% of cpu at this speed, and not showing any substantial delays with SO_TIMESTAMPNS, though I haven't fully checked that):

http://snapon.lab.bufferbloat.net/~d/tinc2/native_ipv6.png
http://snapon.lab.bufferbloat.net/~d/tinc2/tunneled_classified.png

I am a little puzzled as to how well tinc handles out-of-order packet delivery (the EF, BE, BK(CS1) diffserv queues are handled differently by the shaper on the gateway...), and:

D) The bottleneck link above is actually not tinc but the gateway, and as the gateway reverts to codel behavior on a single encapsulated flow encapsulating all the other flows, we end up with about 40ms of induced delay on this test. While I have a better codel (gets below 20ms latency, not deployed), *fq*_codel, by identifying individual flows, gets the induced delay on those flows down below 5ms.

At one level, tinc being so nicely meshy means that the "fq" part of fq_codel on the gateway will have more chance to work against the multiple vpn flows it generates for all the potential vpn endpoints...

but at another... lookie here! ipv6! 2^64 addresses or more to use! and port space to burn! What if I could make tinc open up 1024 ports per connection, and have it fq all its flows over those? What could go wrong?

--
Dave Täht
http://www.bufferbloat.net
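For reference, the decapsulation behaviour point A is asking for is essentially the RFC 6040 rule set. A simplified sketch of that logic follows; it is not tinc code, the names are illustrative, and the ECT(1) corner cases and the full combination table are left to the RFC itself.

/* Simplified RFC 6040 decapsulation: fold the outer ECN field back into the
 * inner header on decap.  ECN lives in the low two bits of the TOS/TCLASS
 * byte.  Invalid-combination logging and ECT(1) handling are omitted. */
#define ECN_MASK    0x03
#define ECN_NOT_ECT 0x00
#define ECN_CE      0x03

/* Returns the new inner TOS/TCLASS byte, or -1 if the packet must be dropped. */
static int ecn_decapsulate(unsigned char outer_tos, unsigned char inner_tos)
{
    unsigned char outer = outer_tos & ECN_MASK;
    unsigned char inner = inner_tos & ECN_MASK;

    if (outer == ECN_CE) {
        if (inner == ECN_NOT_ECT)
            return -1;                               /* inner flow is not ECN-capable: drop */
        return (inner_tos & ~ECN_MASK) | ECN_CE;     /* propagate the congestion mark */
    }
    return inner_tos;                                /* otherwise keep the inner field as-is */
}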
* Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support
From: Guus Sliepen @ 2014-12-03 12:02 UTC
To: tinc-devel; +Cc: cerowrt-devel

On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote:

[...]
> https://github.com/dtaht/tinc
>
> I successfully converted tinc to use sendmsg and recvmsg, acquire (at
> least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet fields,

Windows does not have sendmsg()/recvmsg(), but the BSDs support it.

> as well as SO_TIMESTAMPNS, and use a higher resolution internal clock.
> Got passing through the dscp values to work also, but:
>
> A) encapsulation of ecn capable marked packets, and availability in
> the outer header, without correct decapsulation doesn't work well.
>
> The outer packet gets marked, but by default the marking doesn't make
> it back into the inner packet when decoded.

Is the kernel stripping the ECN bits provided by userspace? In the code in your git branch you strip the ECN bits out yourself.

> So communicating somehow that a path can take ecn (and/or diffserv
> markings) is needed between tinc daemons. I thought of perhaps
> crafting a special icmp message marked with CE but am open to ideas
> that would be backward compatible.

PMTU probes are used to discover whether UDP works and how big the path MTU is; maybe they could be used to discover whether ECN works as well? Set one of the ECN bits on some of the PMTU probes, and if you receive a probe with that ECN bit set, also set it on the probe reply. If you successfully receive a reply with ECN bits set, then you know ECN works. Since the remote side just echoes the contents of the probe, you could also put a copy of the ECN bits in the probe payload, and then you can detect whether the ECN bits got zeroed. You could also define an OPTION_ECN in src/connection.h so nodes can announce their support for ECN, but I don't think that should be necessary.
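To make that concrete, marking a probe ECT(0) and checking what survives the round trip could look something like the sketch below. This only illustrates the idea; the probe format, the echo of the ECN bits in the payload, and any OPTION_ECN signalling are tinc-specific and not shown, and the function names are made up for the example.

/* Send a PMTU-style probe with ECT(0) set.  Assumes an IPv6 UDP socket; the
 * receiving side needs IPV6_RECVTCLASS enabled so the reply's traffic class
 * can be read back via recvmsg(). */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define ECN_ECT0 0x02
#define ECN_CE   0x03

static ssize_t send_ecn_probe(int fd, const void *probe, size_t len,
                              const struct sockaddr_in6 *dst)
{
    int tclass = ECN_ECT0;   /* DSCP 0, ECN field = ECT(0), applies to subsequent sends */
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_TCLASS, &tclass, sizeof tclass) < 0)
        return -1;
    return sendto(fd, probe, len, 0, (const struct sockaddr *)dst, sizeof *dst);
}

/* Called with the traffic class recovered from the probe reply.  If the echoed
 * payload says the peer sent ECT(0) but the field arrives zeroed, something on
 * the path strips ECN and it should not be negotiated. */
static int path_supports_ecn(int reply_tclass, int peer_sent_ect)
{
    int ecn = reply_tclass & 0x03;
    return peer_sent_ect && (ecn == ECN_ECT0 || ecn == ECN_CE);
}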
> B) I have long theorized that a lot of userspace vpns bottleneck on
> the read and encapsulate step, and being strict FIFOs,
> gradually accumulate delay until finally they run out of read socket
> buffer space and start dropping packets.

Well, encryption and decryption take a lot of CPU time, but context switches are also bad.

Tinc treats UDP in a strictly FIFO way, but it does use a RED algorithm when tunneling over TCP. That said, it only looks at its own buffers to determine when to drop packets, and those only come into play once the kernel's TCP buffers are filled.

> so I had a couple thoughts towards using multiple rx queues in the
> vtun interface, and/or trying to read more than one packet at a time
> (via recvmmsg) and do some level of fair queueing and queue management
> (codel) inside tinc itself. I think that's
> pretty doable without modifying the protocol any, but I'm not sure of
> its value until I saturate some cpu more.

I'd welcome any work in this area :)

> (and if you thought recvmsg was complex, look at recvmmsg)

It seems someone is already working on that, see https://github.com/jasdeep-hundal/tinc.

> D)
>
> the bottleneck link above is actually not tinc but the gateway, and as
> the gateway reverts to codel behavior on a single encapsulated flow
> encapsulating all the other flows, we end up with about 40ms of
> induced delay on this test. While I have a better codel (gets below
> 20ms latency, not deployed), *fq*_codel by identifying individual
> flows gets the induced delay on those flows down below 5ms.

But that should improve with ECN if fq_codel is configured to use it, right?

> At one level, tinc being so nicely meshy means that the "fq" part of
> fq_codel on the gateway will have more chance to work against the
> multiple vpn flows it generates for all the potential vpn endpoints...
>
> but at another... lookie here! ipv6! 2^64 addresses or more to use!
> and port space to burn! What if I could make tinc open up 1024 ports
> per connection, and have it fq all its flows over those? What could
> go wrong?

Right: hash the header of the original packets, and then select a port or address based on the hash? What about putting that hash in the flow label of the outer packets? Are there any routers that would actually treat those as separate flows?

--
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus@tinc-vpn.org>
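The "hash the inner header, pick a port" idea from the two messages above can be pictured roughly as follows. This is purely illustrative: a handful of pre-bound sockets rather than 1024, a deliberately simple FNV-1a hash, and names (NUM_SUBFLOWS, subflow_fd, flow_hash) that are invented for the sketch, not taken from tinc.

/* Keep a small pool of UDP sockets, each bound to a different local port, and
 * pick one per inner flow so that fq_codel on the path can distinguish the
 * tunnelled flows.  The sockets are assumed to be created and bound to
 * base_port + i elsewhere at startup. */
#include <stddef.h>
#include <stdint.h>

#define NUM_SUBFLOWS 16
static int subflow_fd[NUM_SUBFLOWS];

/* Trivial hash over the inner packet's address/port/protocol fields. */
static uint32_t flow_hash(const uint8_t *key, size_t len)
{
    uint32_t h = 2166136261u;            /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= key[i];
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Map an inner flow (e.g. its 5-tuple bytes) to one of the tunnel sockets. */
static int socket_for_flow(const uint8_t *flow_key, size_t keylen)
{
    return subflow_fd[flow_hash(flow_key, keylen) % NUM_SUBFLOWS];
}

Putting the same hash into the IPv6 flow label of the outer packets would be the other variant; whether routers on the path actually use the flow label when hashing flows is, as asked above, an open question.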
* Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support
From: David P. Reed @ 2014-12-03 14:17 UTC
To: Guus Sliepen, tinc-devel; +Cc: cerowrt-devel

Tor needs this stuff very badly.

I do wonder whether we should focus on VPNs rather than end-to-end encryption that does not leak secure information through from the inside, as the plan seems to do.

On Dec 3, 2014, Guus Sliepen <guus@tinc-vpn.org> wrote:

> [...]
-- Sent from my Android device with K-@ Mail. Please excuse my brevity.
* Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support
From: Dave Taht @ 2014-12-03 19:44 UTC
To: David P. Reed; +Cc: Guus Sliepen, tinc-devel, cerowrt-devel

On Wed, Dec 3, 2014 at 6:17 AM, David P. Reed <dpreed@reed.com> wrote:
> Tor needs this stuff very badly.

Tor has many, many problematic behaviors relevant to congestion control in general. Let me paste a bit of private discussion I'd had on it in a second, but a very good paper that touched upon it all was:

DefenestraTor: Throwing out Windows in Tor
http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf

Honestly, Tor needs to move to UDP and hide in all the upcoming webrtc traffic....

http://blog.mozilla.org/futurereleases/2014/10/16/test-the-new-firefox-hello-webrtc-feature-in-firefox-beta/

webrtc needs some sort of non-centralized rendezvous mechanism, but I am REALLY happy to see calls and video stay entirely inside my network when they can be negotiated as such.

https://plus.google.com/u/0/107942175615993706558/posts/M4xUtpCKJ4P

And of course, people are busily reinventing torrent in webrtc without paying attention to congestion control at all.

https://github.com/feross/webtorrent/issues/39

Giving access to UDP to javascript programmers... what could go wrong? :/

> I do wonder whether we should focus on vpn's rather than end to end
> encryption that does not leak secure information through from inside as the
> plan seems to do.

"plan"?

I like e2e encryption. I also like overlay networks. And meshes. And working dns and service discovery. And low latency.

vpns are useful abstractions for sharing an address space you may not want to share more widely.

and: I've taken a lot of flak about how fq doesn't help on conventional vpns, and, well, I just came up with an unconventional vpn idea that might have some legs here... (certainly in my case tinc as constructed already, no patches, solves hooking together the 12 networks I have around the globe, mostly)

As for "leaking information", packet size and frequency are generally an obvious indicator of a given traffic type, some padding added or not. There is one piece of plaintext in tinc (the seqno), also. It also uses a fixed port number for both sides of the connection (perhaps it shouldn't).

So I don't necessarily see a difference between sending a whole lot of varying data on one tuple

2001:db8::1 <-> 2001:db8:1::1 on port 655

vs

2001:db8::1 <-> 2001:db8:1::1 port 655
2001:db8::2 <-> 2001:db8:1::1 port 655
2001:db8::3 <-> 2001:db8:1::1 port 655
2001:db8::4 <-> 2001:db8:1::1 port 655
....

which solves the fq problem on a vpn like tinc neatly. A security feature could be source-specific routing, where we send stuff over different paths from different ipv6 source addresses... and mixing up the src/dest ports more, but that complexifies the fq portion of the algo.... my thought for an initial implementation is to just hard-code the ipv6 address range.

I think, however, that adding tons and tons of ipv6 addresses to a given interface is probably slow, and might break things like nd and/or multicast...

What would be cooler would be if you could allocate an entire /64 (or /118) to the vpn daemon:

bindtoaddress(2001:db8::/118) (give me all the data for 1024 ips)

but I am not sure how to go about doing that..
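One way to sketch the "spread the tunnel across many source addresses" variant without adding every address to the interface is to choose the source address per packet with an IPV6_PKTINFO control message on sendmsg(). This is only an illustration of the idea: the kernel still has to consider those addresses local (for example via an AnyIP/local route covering the prefix), which is exactly the part I am unsure about, and the function name, prefix handling and per-flow hash are invented for the sketch.

/* Send a tunnel packet from one of up to 256 source addresses inside a
 * configured prefix, selected by a per-flow hash, using IPV6_PKTINFO.
 * Assumes the addresses are already treated as local by the kernel. */
#define _GNU_SOURCE                       /* for struct in6_pktinfo with glibc */
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/uio.h>

static ssize_t send_from_subaddress(int fd, const void *pkt, size_t len,
                                    const struct sockaddr_in6 *dst,
                                    struct in6_addr src_base, uint32_t flowhash)
{
    union {
        char buf[CMSG_SPACE(sizeof(struct in6_pktinfo))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = (void *)pkt, .iov_len = len };
    struct msghdr msg = {
        .msg_name = (void *)dst, .msg_namelen = sizeof *dst,
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof ctrl.buf,
    };

    /* Vary the low byte of the source address by the flow hash. */
    src_base.s6_addr[15] ^= flowhash & 0xff;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = IPPROTO_IPV6;
    c->cmsg_type = IPV6_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof(struct in6_pktinfo));
    struct in6_pktinfo pi = { .ipi6_addr = src_base };    /* ifindex 0: let the kernel pick */
    memcpy(CMSG_DATA(c), &pi, sizeof pi);

    return sendmsg(fd, &msg, 0);
}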
"This conversation is a bit separate from #11197 (which is an implementation issue in obfsproxy), so separate discussion somewhere would probably be required. So, there appears to be a slight misconception on how tor traffic travels across the Internet that I will attempt to clarify, and hopefully not get too terribly wrong. Each step of a given connection over tor involves multiple TCP/IP connections. To use a standard example of someone trying to watch Cat Videos on the "real internet", it will look approximately like thus: Client <-> Guard <-> Relay <-> Exit <-> Cat Videos Each step is a separate TCP/IP connection, authenticated and encrypted via TLS (TLS is likewise terminated at each hop). Using a pluggable transport encapsulates the first hop's TLS session with a different protocol be it obfs2, obfs3, or something else. The cat videos are passed through this path of many TCP/IP connections across things called Circuits that are created/extended by the Client one hop at a time (So the example above, the kitty cats travel across 4 TCP/IP connections, relaying data across a Circuit that spans from the Client to the Exit. If my art skills were up to it, I would draw a diagram.). Circuits are currently required to provide reliable, in-order delivery. In addition to the standard congestion control provided by TCP/IP on a per-hop basis, there is Circuit level flow control *and* "end to end" flow control in the form of RELAY_SENDME cells, but given that multiple circuits can end up being multiplexed over a singlular TCP/IP connection, propagation of these RELAY_SENDME cells can get delayed due to HOL issues. So, with that quick and dirty overview out of the way: * "Ah so if ecn is enabled it can be used?" ECN will be used if it is enabled, *but* the congestion information will not get propaged to the source/destination of a given stream. * "Does it retain iw10 (the Linux default nowadays sadly)?" Each TCP/IP connection if sent from a host that uses a obnoxiously large initial window, will have an obnoxiously large initial window. It is worth noting that since multiple Circuits originating from potentially numerous clients can and will reuse existing TCP/IP connections if able to (see 5.3.1 of the tor spec) that dropping packets between tor relays is kind of bad, because all of the separate encapsulated flows sharing the singular TCP/IP link will suffer (ECN would help here). This situation is rather unfortunate as the good active queue management algorithms drop packets (when ECN is not available). A better summary of tor's flow control/bufferbloat woes is given in: DefenestraTor: Throwing out Windows in Tor http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf The N23 algorithm suggested in the paper did not end up getting implemented into Tor, but I do not remember the reason off the top of my head." > > > > On Dec 3, 2014, Guus Sliepen <guus@tinc-vpn.org> wrote: >> >> On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote: >> >> [...] >>> >>> https://github.com/dtaht/tinc >>> >>> I successfully converted tinc to use sendmsg and recvmsg, acquire (at >>> least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet fields, >> >> >> Windows does not have sendmsg()/recvmsg(), but the BSDs support it. >> >>> as well as SO_TIMESTAMPNS, and use a higher resolution internal clock. >>> Got passing through the dscp values to work also, but: >>> >>> A) encapsulation of ecn capable marked packets, and availability in >>> the outer header, without correct decapsulationm doesn't work well. 
--
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks
* Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support
From: dpreed @ 2014-12-04  0:45 UTC
To: Dave Taht; +Cc: Guus Sliepen, tinc-devel, cerowrt-devel

Awesome start on the issue in your note, Dave. Tor needs to change for several reasons - not that it isn't great, but with IPv6 and other things coming on line, plus the understanding of fq_codel's rationale, plus ... - the world can do much better. Same with VPNs.

I hope we can set our sights on a convergent target that doesn't get bogged down in the tradeoffs that were made when VPNs were originally proposed. The world is no longer a bunch of disconnected networks protected by Cheswick firewalls. Cheswick said they were only temporary, and they've outlived their usefulness - they actually create security risks more than they fix them (centralizing security creates points of failure and attack that exponentially decrease the attackers' work factor). To some extent that is also true for Tor after these many years.

By putting the intelligence about security in the network, you basically do all the bad things that the end-to-end argument encourages you to avoid. We could also put congestion control in the network by re-creating admission control and requiring contractual agreements to carry traffic across every intermediary. But I think that basically destroys almost all the value of an "inter" net. It makes it a balkanized, proprietary set of subnets that have dozens of reasons why you can't connect with anyone else, and no way to be free to connect.

On Wednesday, December 3, 2014 2:44pm, "Dave Taht" <dave.taht@gmail.com> said:

> [...]
* Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support
From: Sebastian Moeller @ 2014-12-04  9:38 UTC
To: dpreed; +Cc: cerowrt-devel

Hi,

at the risk of going off on a tangent...

On Dec 4, 2014, at 01:45, dpreed@reed.com wrote:

> Awesome start on the issue in your note, Dave. Tor needs to change for several reasons - not that it isn't great, but with IPv6 and other things coming on line, plus the understanding of fq_codel's rationale, plus ... - the world can do much better. Same with VPNs.
>
> I hope we can set our sights on a convergent target that doesn't get bogged down in the tradeoffs that were made when VPNs were originally proposed. The world is no longer a bunch of disconnected networks protected by Cheswick firewalls. Cheswick said they were only temporary, and they've outlived their usefulness - they actually create security risks more than they fix them (centralizing security creates points of failure and attack that exponentially decrease the attackers' work factor). To some extent that is also true for Tor after these many years.

But trying to keep all computers on the end/edge secure also does not work or scale well, so both ends of the continuum have their issues; I would not be amazed if realistically we need to keep doing both: securing the end devices as well as intermediary devices.

> By putting the intelligence about security in the network, you basically do all the bad things that the end-to-end argument encourages you to avoid.

I might be misinterpreting your point here, but given the devices people connect to their own networks, full e2e without added layers of security seems not very practical. There is an ever-growing class of devices orphaned by their makers (either explicitly, like old iPods, or implicitly, by lack of timely security fixes, like Siemens SCADA systems, plus old but useful hardware requiring obsolete operating systems like Windows XP; the list goes on) that can still be used to good effect in a secured network but cannot be trusted to access the wider internet, let alone be contacted by the wider internet. So unless we want to retire all those devices of dubious "security", we need a layer in the network that can preempt traffic to and from specific devices. In the old IPv4 days, the NAT that was ubiquitous for "end-users" took care of the "traffic to specific devices" part to some degree. I would be happy if even in the brave new IPv6 world we could keep such gatekeepers/bouncers around, ideally also controlling which devices can send packets to the internet.

I do not propose to put these things into the core of the network, but the boundary between an implicitly trusted home network and the internet seems like a decent compromise to me. (I would also like such a device to default to "no implicit connectivity", so that each device needs to be manually declared fit for the internet, and so that users are aware of this system.) Since the number of connections between the home net and the internet is often smaller than the number of connected devices in such a network, the transfer points/routers seem like ideal candidates to implement the "access control". (This does not mean that leaving end systems unsecured and unpatched is a good idea, but it should at least greatly diminish the risk posed by sub-optimally secured endpoints, I think/hope.)

Being a biologist, I like to think about this as maintaining a special niche for hard- or impossible-to-secure devices in my home, avoiding their extinction/pwning by keeping the predators away; as fitness is relative. Might not work perfectly, but "good enough" would do ;)

To cite the Russians: Dowerjai, no prowerjai, "Trust, but verify"...

> [...]
* Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support
From: David P. Reed @ 2014-12-04 15:30 UTC
To: Sebastian Moeller; +Cc: cerowrt-devel

I'd be more likely to agree if I thought that the network-level technologies could work. The problem is that I've been in the system security business long enough (starting in 1973 in a professional role) that I know how useless the network-level techniques are and how little is gained by tinkering with them to improve them.

SCADA systems cannot be secured by add-ons, period. Even an air gap didn't protect against Stuxnet.

It's approximately the same as security theater to think that home nets can be secured by a fancy VPN. That only deals with a single threat model, and the solution really does not scale at all. It just lets designers of home systems off the hook, so they can promote inherently bad designs by saying they don't need to fix their designs.

As soon as a hacker can control most of the stuff in a rich person's home because of the IoT craze, we will see for-profit rings promoting ransom attacks. VPNs aren't likely to fix that at the network level.

So my point is a little subtle. Put effort where it pays off: at the end-to-end authentication interoperability level rather than fantasy-based solutions that just break the network, and at the creation of systems for attribution and policing and prosecution of the motivated conspirators.

On Dec 4, 2014, Sebastian Moeller <moeller0@gmx.de> wrote:

> [...]
that still can be used to good effect >in a secured network but can not be trusted to access the wider >internet, let alone be contacted by the wider internet. So unless we >want to retire all those devices of dubious “security” we need a layer >in the network that can preempt traffic to and from specific devices. >In the old IPv4 days the for “end-users” ubiquitous NAT tool care of >the “traffic to specific devices” to some degree. I would be happy if >even in the brave new IPv6 world we could keep such >gatekeepers/bouncers around, ideally also controlling which devices can >send packets to the internet. > I do not propose to put these things into the core of the network, but >the boundary between a implicitly trusted home-network and the internet >seems like a decent compromise to me. (I would also like such a device >to default to "no implicit connectivity”, so that each device needs to >be manually declared fit for the internet, so that the users are aware >of this system). Since the number of connections between the home-net >and the internet often is smaller than the number of connected devices >in such a network, the transfer points/routers seem like ideal >candidates to implement the “access control”. . (This does not mean >that keeping end systems not secured and patched is a good idea, but at >least should greatly diminish the risk imposed by sub-optimally secured >end points, I think/hope). > Being a biologist I like to think about this as maintaining a special >niche for hard/impossible to secure devices in my home, avoiding their >extinction/pawning by keeping the predators away; as fitness is >relative. Might not work perfectly, but “good enough” would do ;) > To cite the russians: Dowerjai, no prowerjai, "Trust, but verify”… > > >> We could also put congestion control in the network by re-creating >admission control and requiring contractual agreements to carry traffic >across every intermediary. But I think that basically destroys almost >all the value of an "inter" net. It makes it a balkanized proprietary >set of subnets that have dozens of reasons why you can't connect with >anyone else, and no way to be free to connect. >> >> >> >> >> >> On Wednesday, December 3, 2014 2:44pm, "Dave Taht" ><dave.taht@gmail.com> said: >> >> > On Wed, Dec 3, 2014 at 6:17 AM, David P. Reed <dpreed@reed.com> >wrote: >> > > Tor needs this stuff very badly. >> > >> > Tor has many, many problematic behaviors relevant to congestion >control >> > in general. Let me paste a bit of private discussion I'd had on it >in a second, >> > but a very good paper that touched upon it all was: >> > >> > DefenestraTor: Throwing out Windows in Tor >> > http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf >> > >> > Honestly tor needs to move to udp, and hide in all the upcoming >> > webrtc traffic.... >> > >> > >http://blog.mozilla.org/futurereleases/2014/10/16/test-the-new-firefox-hello-webrtc-feature-in-firefox-beta/ >> > >> > webrtc needs some sort of non-centralized rendezvous mechanism, but >I am REALLY >> > happy to see calls and video stay entirely inside my network when >they can be >> > negotiated as such. >> > >> > https://plus.google.com/u/0/107942175615993706558/posts/M4xUtpCKJ4P >> > >> > And of course, people are busily reinventing torrent in webrtc >without >> > paying attention to congestion control at all. >> > >> > https://github.com/feross/webtorrent/issues/39 >> > >> > Giving access to udp to javascript programmers... what could go >wrong? 
>> > :/ >> > >> > > I do wonder whether we should focus on vpn's rather than end to >end >> > > encryption that does not leak secure information through from >inside as the >> > > plan seems to do. >> > >> > "plan"? >> > >> > I like e2e encryption. I also like overlay networks. And meshes. >> > And working dns and service discovery. And low latency. >> > >> > vpns are useful abstractions for sharing an address space you >> > may not want to share more widely. >> > >> > and: I've taken a lot of flack about how fq doesn't help on >conventional >> > vpns, and well, just came up with an unconventional vpn idea, >> > that might have some legs here... (certainly in my case tinc >> > as constructed already, no patches, solves hooking together the >> > 12 networks I have around the globe, mostly) >> > >> > As for "leaking information", packet size and frequency is >generally >> > an obvious indicator of a given traffic type, some padding added or >> > no. There is one piece of plaintext >> > in tinc (the seqno), also. It also uses a fixed port number for >both >> > sides of the connection (perhaps it shouldn't) >> > >> > So I don't necessarily see a difference between sending a whole lot >of >> > varying data on one tuple >> > >> > 2001:db8::1 <-> 2001:db8:1::1 on port 655 >> > >> > vs >> > >> > 2001:db8::1 <-> 2001:db8:1::1 port 655 >> > 2001:db8::2 <-> 2001:db8:1::1 port 655 >> > 2001:db8::3 <-> 2001:db8:1::1 port 655 >> > 2001:db8::4 <-> 2001:db8:1::1 port 655 >> > .... >> > >> > which solves the fq problem on a vpn like tinc neatly. A security >feature >> > could be source specific routing where we send stuff over different >paths >> > from different ipv6 source addresses... and mixing up the src/dest >ports >> > more but that complexifies the fq portion of the algo.... my >thought >> > for an initial implementation is to just hard code the ipv6 address >range. >> > >> > I think however that adding tons and tons of ipv6 addresses to a >given >> > interface is probably slow, >> > and might break things like nd and/or multicast... >> > >> > what would be cooler would be if you could allocate an entire /64 >(or >> > /118) to the vpn daemon >> > >> > bindtoaddress(2001:db8::/118) (give me all the data for 1024 ips) >> > >> > but I am not sure how to go about doing that.. >> > >> > ...moving back to a formerly private discussion about tors woes... >> > >> > >> > "This conversation is a bit separate from #11197 (which is an >> > implementation issue in obfsproxy), so separate discussion >somewhere >> > would probably be required. >> > >> > So, there appears to be a slight misconception on how tor traffic >> > travels across the Internet that I will attempt to clarify, and >> > hopefully not get too terribly wrong. >> > >> > Each step of a given connection over tor involves multiple TCP/IP >> > connections. To use a standard example of someone trying to watch >Cat >> > Videos on the "real internet", it will look approximately like >thus: >> > >> > Client <-> Guard <-> Relay <-> Exit <-> Cat Videos >> > >> > Each step is a separate TCP/IP connection, authenticated and >encrypted >> > via TLS (TLS is likewise terminated at each hop). Using a pluggable >> > transport encapsulates the first hop's TLS session with a different >> > protocol be it obfs2, obfs3, or something else. 
>> > >> > The cat videos are passed through this path of many TCP/IP >connections >> > across things called Circuits that are created/extended by the >Client >> > one hop at a time (So the example above, the kitty cats travel >across >> > 4 TCP/IP connections, relaying data across a Circuit that spans >from >> > the Client to the Exit. If my art skills were up to it, I would >draw a >> > diagram.). >> > >> > Circuits are currently required to provide reliable, in-order >delivery. >> > >> > In addition to the standard congestion control provided by TCP/IP >on a >> > per-hop basis, there is Circuit level flow control *and* "end to >end" >> > flow control in the form of RELAY_SENDME cells, but given that >multiple >> > circuits can end up being multiplexed over a singlular TCP/IP >> > connection, propagation of these RELAY_SENDME cells can get delayed >due >> > to HOL issues. >> > >> > So, with that quick and dirty overview out of the way: >> > >> > * "Ah so if ecn is enabled it can be used?" >> > >> > ECN will be used if it is enabled, *but* the congestion information >> > will not get propaged to the source/destination of a given stream. >> > >> > * "Does it retain iw10 (the Linux default nowadays sadly)?" >> > >> > Each TCP/IP connection if sent from a host that uses a obnoxiously >> > large initial window, will have an obnoxiously large initial >> > window. >> > >> > It is worth noting that since multiple Circuits originating from >> > potentially numerous clients can and will reuse existing TCP/IP >> > connections if able to (see 5.3.1 of the tor spec) that dropping >packets >> > between tor relays is kind of bad, because all of the separate >> > encapsulated flows sharing the singular TCP/IP link will suffer >(ECN >> > would help here). This situation is rather unfortunate as the good >> > active queue management algorithms drop packets (when ECN is not >> > available). >> > >> > A better summary of tor's flow control/bufferbloat woes is given >in: >> > >> > DefenestraTor: Throwing out Windows in Tor >> > http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf >> > >> > The N23 algorithm suggested in the paper did not end up getting >> > implemented into Tor, but I do not remember the reason off the top >of >> > my head." >> > >> > >> > > >> > > >> > > >> > > On Dec 3, 2014, Guus Sliepen <guus@tinc-vpn.org> wrote: >> > >> >> > >> On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote: >> > >> >> > >> [...] >> > >>> >> > >>> https://github.com/dtaht/tinc >> > >>> >> > >>> I successfully converted tinc to use sendmsg and recvmsg, >acquire >> > (at >> > >>> least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet >> > fields, >> > >> >> > >> >> > >> Windows does not have sendmsg()/recvmsg(), but the BSDs support >it. >> > >> >> > >>> as well as SO_TIMESTAMPNS, and use a higher resolution internal >> > clock. >> > >>> Got passing through the dscp values to work also, but: >> > >>> >> > >>> A) encapsulation of ecn capable marked packets, and >availability in >> > >>> the outer header, without correct decapsulationm doesn't work >well. >> > >>> >> > >>> The outer packet gets marked, but by default the marking >doesn't >> > make >> > >>> it back into the inner packet when decoded. >> > >> >> > >> >> > >> Is the kernel stripping the ECN bits provided by userspace? In >the code >> > >> in your git branch you strip the ECN bits out yourself. >> > >> >> > >>> So communicating somehow that a path can take ecn (and/or >diffserv >> > >>> markings) is needed between tinc daemons. 
I thought of perhaps >> > >>> crafting a special icmp message marked with CE but am open to >ideas >> > >>> that would be backward compatible. >> > >> >> > >> >> > >> PMTU probes are used to discover whether UDP works and how big >the path >> > >> MTU is, maybe it could be used to discover whether ECN works as >well? >> > >> Set one of the ECN bits on some of the PMTU probes, and if you >receive a >> > >> probe with that ECN bit set, also set it on the probe reply. If >you >> > >> succesfully receive a reply with ECN bits set, then you know ECN >works. >> > >> Since the remote side just echoes the contents of the probe, you >could >> > >> also put a copy of the ECN bits in the probe payload, and then >you can >> > >> detect if the ECN bits got zeroed. You can also define an >OPTION_ECN in >> > >> src/connection.h, so nodes can announce their support for ECN, >but that >> > >> should not be necessary I think. >> > >> >> > >>> B) I have long theorized that a lot of userspace vpns >bottleneck on >> > >>> the read and encapsulate step, and being strict FIFOs, >> > >>> gradually accumulate delay until finally they run out of read >socket >> > >>> buffer space and start dropping packets. >> > >> >> > >> >> > >> Well, encryption and decryption takes a lot of CPU time, but >context >> > >> switches are also bad. >> > >> >> > >> Tinc is treating UDP in a strictly FIFO way, but actually it >does use a >> > >> RED algorithm when tunneling over TCP. That said, it only looks >at its >> > >> own buffers to determine when to drop packets, and those only >come into >> > >> play once the kernel's TCP buffers are filled. >> > >> >> > >>> so I had a couple thoughts towards using multiple rx queues in >the >> > >>> vtun interface, and/or trying to read more than one packet at a >time >> > >>> (via recvmmsg) and do some level of fair queueing and queue >> > management >> > >>> (codel) inside tinc itself. I think that's >> > >>> pretty doable without modifying the protocol any, but I'm not >sure >> > of >> > >>> it's value until I saturate some cpu more. >> > >> >> > >> >> > >> I'd welcome any work in this area :) >> > >> >> > >>> (and if you thought recvmsg was complex, look at recvmmsg) >> > >> >> > >> >> > >> It seems someone is already working on that, see >> > >> https://github.com/jasdeep-hundal/tinc. >> > >> >> > >>> D) >> > >>> >> > >>> the bottleneck link above is actually not tinc but the gateway, >and >> > as >> > >>> the gateway reverts to codel behavior on a single encapsulated >flow >> > >>> encapsulating all the other flows, we end up with about 40ms of >> > >>> induced delay on this test. While I have a better codel (gets >below >> > >>> 20ms latency, not deployed), *fq*_codel by identifying >individual >> > >>> flows gets the induced delay on those flows down below 5ms. >> > >> >> > >> >> > >> But that should improve with ECN if fq_codel is configured to >use that, >> > >> right? >> > >> >> > >>> At one level, tinc being so nicely meshy means that the "fq" >part of >> > >>> fq_codel on the gateway will have more chance to work against >the >> > >>> multiple vpn flows it generates for all the potential vpn >> > endpoints... >> > >>> >> > >>> but at another... lookie here! ipv6! 2^64 addresses or more to >use! >> > >>> and port space to burn! What if I could make tinc open up 1024 >ports >> > >>> per connection, and have it fq all it's flows over those? What >could >> > >>> go wrong? 
>> > >> >> > >> >> > >> Right, hash the header of the original packets, and then select >a port >> > >> or address based on the hash? What about putting that hash in >the flow >> > >> label of outer packets? Any routers that would actually treat >those as >> > >> separate flows? >> > > >> > > >> > > -- Sent from my Android device with K-@ Mail. Please excuse my >brevity. >> > > >> > > _______________________________________________ >> > > Cerowrt-devel mailing list >> > > Cerowrt-devel@lists.bufferbloat.net >> > > https://lists.bufferbloat.net/listinfo/cerowrt-devel >> > > >> > >> > >> > >> > -- >> > Dave Täht >> > >> > thttp://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks >> > >> _______________________________________________ >> Cerowrt-devel mailing list >> Cerowrt-devel@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/cerowrt-devel -- Sent from my Android device with K-@ Mail. Please excuse my brevity. [-- Attachment #2: Type: text/html, Size: 27480 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support 2014-12-04 15:30 ` David P. Reed @ 2014-12-04 19:03 ` Dave Taht 0 siblings, 0 replies; 10+ messages in thread From: Dave Taht @ 2014-12-04 19:03 UTC (permalink / raw) To: David P. Reed; +Cc: cerowrt-devel On Thu, Dec 4, 2014 at 7:30 AM, David P. Reed <dpreed@reed.com> wrote: > I'd be more likely to agree if I thought that the network level technologies > could work. The problem is that I've been in the system security business > long enough (starting in 1973 in a professional role) that I know how > useless the network level techniques are and how little is gained by > tinkering with them to improve them. Scada systems cannot be secured by > addons, period. Even airgap didn't protect against Stuxnet. > > It's approximately the same as security theater to think that home nets can > be secured by a fancy vpn. That only deals with a single threat model and > the solution really does not scale at all. Take the attack surface presented by IPMI. Please. http://www.fish2.com/ipmi/ > It just lets designers of home > systems off the hook so they can promote inherently bad designs by saying > they don't need to fix their designs. The context I have here is trying to come up with better models for security in the upcoming ipv6 world. ULAs are a possible part of that, for example. Applying tinc-like techniques to a new tor-like protocol, for another. e2e encryption can kind of make things work worse. Why, exactly, is my lightbulb sending a 500 byte packet to nsa.gov? > As soon as a hacker can control most of the stuff in a rich person's home > because of the IoT craze, we will see for profit rings promoting ransom > attacks. Download the (20 Megabyte!) "wemo" app for android. Look at all the info it requires in order to operate your IoT stuff.... (someone paste the list here, I am not in front of an android box right now) It's horrifying. Yes, your smart lightbulbs apparently need all these privs to operate. More dryly amusing today was this bit about usb malware. https://plus.google.com/u/0/107942175615993706558/posts We have billions of threats outside the local network to deal with, and many within. I wish I could be as sangine and satisfied as the e2e argument david makes, but me, I keep wanting to find a tropic island, or asteroid, with no internet to deal with, as a safe haven from the eventual cataclysmic disaster. http://xkcd.com/865/ > > Vpn's aren't likely to fix that at the network level. So my point is a > little subtle. Put effort where it pays off. At the end to end > authentication interoperability level rather than fantasy based solutions > that just break the network. At the creation of systems for attribution and > policing and prosecution of the motivated conspirators. > > On Dec 4, 2014, Sebastian Moeller <moeller0@gmx.de> wrote: >> >> Hi, >> >> on the danger of going off on a tangent... >> >> On Dec 4, 2014, at 01:45 , dpreed@reed.com wrote: >> >>> Awesome start on the issue, in your note, Dave. Tor needs to change for >>> several reasons - not that it isn't great, but with IPv6 and other things >>> coming on line, plus the understanding of fq_codel's rationale, plus ... - >>> the world can do much better. Same with VPNs. >>> >>> I hope we can set our sights on a convergent target that doesn't get >>> bogged down in the tradeoffs that were made when VPNs were originally >>> proposed. The world is no longer a bunch of disconnected networks protected >>> by Cheswick firewalls. 
Cheswick said they were only temporary, and they've >>> outlived their usefulness - they actually create security risks more than >>> they fix them (centralizing security creates points of failure and attack >>> that exponentially decrease the attackers' work factor). To some extent that >>> is also true for Tor after these many years. >> >> >> But trying to keep all computers on the end/edge secure also does not >> work/scale well, so both ends of the continuum have their issues; I would >> not be amazed if realistically we need to keep doing both… securing the end >> devices as well as intermediary devices. >> >> >>> By putting the intelligence about security in the network, you basically >>> do all the bad things that the end-to-end argument encourages you to avoid. >> >> >> I might misinterpret your point here, but given the devices people connect >> to their own networks full e2e without added layers of security seems not >> very practical. There is an ever growing class of devices orphaned by their >> makers (either explicitly like old ipods, or implicitly by lack of timely >> security fixes like Siemens SCADA systems, plus old but useful hardware >> requiring obsolete operating systems like windows XP, the list goes on...) >> that still can be used to good effect in a secured network but can not be >> trusted to access the wider internet, let alone be contacted by the wider >> internet. So unless we want to retire all those devices of dubious >> “security” we need a layer in the network that can preempt traffic to and >> from specific devices. In the old IPv4 days the for “end-users” ubiquitous >> NAT tool care of the “traffic to specific devices” to some degree. I would >> be happy if even in the brave new IPv6 world we could keep such >> gatekeepers/bouncers around, ideally also controlling which devices can send >> packets to the internet. >> I do not propose to put these things into the core of the network, but the >> boundary between a implicitly trusted home-network and the internet seems >> like a decent compromise to me. (I would also like such a device to default >> to "no implicit connectivity”, so that each device needs to be manually >> declared fit for the internet, so that the users are aware of this system). >> Since the number of connections between the home-net and the internet often >> is smaller than the number of connected devices in such a network, the >> transfer points/routers seem like ideal candidates to implement the “access >> control”. . (This does not mean that keeping end systems not secured and >> patched is a good idea, but at least should greatly diminish the risk >> imposed by sub-optimally secured end points, I think/hope). >> Being a biologist I like to think about this as maintaining a special >> niche for hard/impossible to secure devices in my home, avoiding their >> extinction/pawning by keeping the predators away; as fitness is relative. >> Might not work perfectly, but “good enough” would do ;) >> To cite the russians: Dowerjai, no prowerjai, "Trust, but verify”… >> >> >>> We could also put congestion control in the network by re-creating >>> admission control and requiring contractual agreements to carry traffic >>> across every intermediary. But I think that basically destroys almost all >>> the value of an "inter" net. It makes it a balkanized proprietary set of >>> subnets that have dozens of reasons why you can't connect with anyone else, >>> and no way to be free to connect. 
>>> >>> >>> >>> >>> >>> On Wednesday, December 3, 2014 2:44pm, "Dave Taht" <dave.taht@gmail.com> >>> said: >>> >>>> On Wed, Dec 3, 2014 at 6:17 AM, David P. Reed <dpreed@reed.com> wrote: >>>>> >>>>> Tor needs this stuff very badly. >>>> >>>> >>>> Tor has many, many problematic behaviors relevant to congestion control >>>> in general. Let me paste a bit of private discussion I'd had on it in a >>>> second, >>>> but a very good paper that touched upon it all was: >>>> >>>> DefenestraTor: Throwing out Windows in Tor >>>> http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf >>>> >>>> Honestly tor needs to move to udp, and hide in all the upcoming >>>> webrtc traffic.... >>>> >>>> >>>> http://blog.mozilla.org/futurereleases/2014/10/16/test-the-new-firefox-hello-webrtc-feature-in-firefox-beta/ >>>> >>>> webrtc needs some sort of non-centralized rendezvous mechanism, but I am >>>> REALLY >>>> happy to see calls and video stay entirely inside my network when they >>>> can be >>>> negotiated as such. >>>> >>>> https://plus.google.com/u/0/107942175615993706558/posts/M4xUtpCKJ4P >>>> >>>> And of course, people are busily reinventing torrent in webrtc without >>>> paying attention to congestion control at all. >>>> >>>> https://github.com/feross/webtorrent/issues/39 >>>> >>>> Giving access to udp to javascript programmers... what could go wrong? >>>> :/ >>>> >>>>> I do wonder whether we should focus on vpn's rather than end to end >>>>> encryption that does not leak secure information through from inside as >>>>> the >>>>> plan seems to do. >>>> >>>> >>>> "plan"? >>>> >>>> I like e2e encryption. I also like overlay networks. And meshes. >>>> And working dns and service discovery. And low latency. >>>> >>>> vpns are useful abstractions for sharing an address space you >>>> may not want to share more widely. >>>> >>>> and: I've taken a lot of flack about how fq doesn't help on conventional >>>> vpns, and well, just came up with an unconventional vpn idea, >>>> that might have some legs here... (certainly in my case tinc >>>> as constructed already, no patches, solves hooking together the >>>> 12 networks I have around the globe, mostly) >>>> >>>> As for "leaking information", packet size and frequency is generally >>>> an obvious indicator of a given traffic type, some padding added or >>>> no. There is one piece of plaintext >>>> in tinc (the seqno), also. It also uses a fixed port number for both >>>> sides of the connection (perhaps it shouldn't) >>>> >>>> So I don't necessarily see a difference between sending a whole lot of >>>> varying data on one tuple >>>> >>>> 2001:db8::1 <-> 2001:db8:1::1 on port 655 >>>> >>>> vs >>>> >>>> 2001:db8::1 <-> 2001:db8:1::1 port 655 >>>> 2001:db8::2 <-> 2001:db8:1::1 port 655 >>>> 2001:db8::3 <-> 2001:db8:1::1 port 655 >>>> 2001:db8::4 <-> 2001:db8:1::1 port 655 >>>> .... >>>> >>>> which solves the fq problem on a vpn like tinc neatly. A security >>>> feature >>>> could be source specific routing where we send stuff over different >>>> paths >>>> from different ipv6 source addresses... and mixing up the src/dest ports >>>> more but that complexifies the fq portion of the algo.... my thought >>>> for an initial implementation is to just hard code the ipv6 address >>>> range. >>>> >>>> I think however that adding tons and tons of ipv6 addresses to a given >>>> interface is probably slow, >>>> and might break things like nd and/or multicast... 
>>>> >>>> what would be cooler would be if you could allocate an entire /64 (or >>>> /118) to the vpn daemon >>>> >>>> bindtoaddress(2001:db8::/118) (give me all the data for 1024 ips) >>>> >>>> but I am not sure how to go about doing that.. >>>> >>>> ...moving back to a formerly private discussion about tors woes... >>>> >>>> >>>> "This conversation is a bit separate from #11197 (which is an >>>> implementation issue in obfsproxy), so separate discussion somewhere >>>> would probably be required. >>>> >>>> So, there appears to be a slight misconception on how tor traffic >>>> travels across the Internet that I will attempt to clarify, and >>>> hopefully not get too terribly wrong. >>>> >>>> Each step of a given connection over tor involves multiple TCP/IP >>>> connections. To use a standard example of someone trying to watch Cat >>>> Videos on the "real internet", it will look approximately like thus: >>>> >>>> Client <-> Guard <-> Relay <-> Exit <-> Cat Videos >>>> >>>> Each step is a separate TCP/IP connection, authenticated and encrypted >>>> via TLS (TLS is likewise terminated at each hop). Using a pluggable >>>> transport encapsulates the first hop's TLS session with a different >>>> protocol be it obfs2, obfs3, or something else. >>>> >>>> The cat videos are passed through this path of many TCP/IP connections >>>> across things called Circuits that are created/extended by the Client >>>> one hop at a time (So the example above, the kitty cats travel across >>>> 4 TCP/IP connections, relaying data across a Circuit that spans from >>>> the Client to the Exit. If my art skills were up to it, I would draw a >>>> diagram.). >>>> >>>> Circuits are currently required to provide reliable, in-order delivery. >>>> >>>> In addition to the standard congestion control provided by TCP/IP on a >>>> per-hop basis, there is Circuit level flow control *and* "end to end" >>>> flow control in the form of RELAY_SENDME cells, but given that multiple >>>> circuits can end up being multiplexed over a singlular TCP/IP >>>> connection, propagation of these RELAY_SENDME cells can get delayed due >>>> to HOL issues. >>>> >>>> So, with that quick and dirty overview out of the way: >>>> >>>> * "Ah so if ecn is enabled it can be used?" >>>> >>>> ECN will be used if it is enabled, *but* the congestion information >>>> will not get propaged to the source/destination of a given stream. >>>> >>>> * "Does it retain iw10 (the Linux default nowadays sadly)?" >>>> >>>> Each TCP/IP connection if sent from a host that uses a obnoxiously >>>> large initial window, will have an obnoxiously large initial >>>> window. >>>> >>>> It is worth noting that since multiple Circuits originating from >>>> potentially numerous clients can and will reuse existing TCP/IP >>>> connections if able to (see 5.3.1 of the tor spec) that dropping packets >>>> between tor relays is kind of bad, because all of the separate >>>> encapsulated flows sharing the singular TCP/IP link will suffer (ECN >>>> would help here). This situation is rather unfortunate as the good >>>> active queue management algorithms drop packets (when ECN is not >>>> available). >>>> >>>> A better summary of tor's flow control/bufferbloat woes is given in: >>>> >>>> DefenestraTor: Throwing out Windows in Tor >>>> http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf >>>> >>>> The N23 algorithm suggested in the paper did not end up getting >>>> implemented into Tor, but I do not remember the reason off the top of >>>> my head." 
>>>> >>>> >>>> >>>> >>>> >>>>> On Dec 3, 2014, Guus Sliepen <guus@tinc-vpn.org> wrote: >>>>> >>>>>> On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote: >>>>>> >>>>>> [...] >>>>>> >>>>>>> https://github.com/dtaht/tinc >>>>>>> >>>>>>> I successfully converted tinc to use sendmsg and recvmsg, acquire >>>> >>>> (at >>>>>>> >>>>>>> least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet >>>> >>>> fields, >>>> >>>> >>>>>> Windows does not have sendmsg()/recvmsg(), but the BSDs support it. >>>>>> >>>>>>> as well as SO_TIMESTAMPNS, and use a higher resolution internal >>>> >>>> clock. >>>>>>> >>>>>>> Got passing through the dscp values to work also, but: >>>>>>> >>>>>>> A) encapsulation of ecn capable marked packets, and availability in >>>>>>> the outer header, without correct decapsulationm doesn't work well. >>>>>>> >>>>>>> The outer packet gets marked, but by default the marking doesn't >>>> >>>> make >>>>>>> >>>>>>> it back into the inner packet when decoded. >>>>>> >>>>>> >>>>>> >>>>>> Is the kernel stripping the ECN bits provided by userspace? In the >>>>>> code >>>>>> in your git branch you strip the ECN bits out yourself. >>>>>> >>>>>>> So communicating somehow that a path can take ecn (and/or diffserv >>>>>>> markings) is needed between tinc daemons. I thought of perhaps >>>>>>> crafting a special icmp message marked with CE but am open to ideas >>>>>>> that would be backward compatible. >>>>>> >>>>>> >>>>>> >>>>>> PMTU probes are used to discover whether UDP works and how big the >>>>>> path >>>>>> MTU is, maybe it could be used to discover whether ECN works as well? >>>>>> Set one of the ECN bits on some of the PMTU probes, and if you receive >>>>>> a >>>>>> probe with that ECN bit set, also set it on the probe reply. If you >>>>>> succesfully receive a reply with ECN bits set, then you know ECN >>>>>> works. >>>>>> Since the remote side just echoes the contents of the probe, you could >>>>>> also put a copy of the ECN bits in the probe payload, and then you can >>>>>> detect if the ECN bits got zeroed. You can also define an OPTION_ECN >>>>>> in >>>>>> src/connection.h, so nodes can announce their support for ECN, but >>>>>> that >>>>>> should not be necessary I think. >>>>>> >>>>>>> B) I have long theorized that a lot of userspace vpns bottleneck on >>>>>>> the read and encapsulate step, and being strict FIFOs, >>>>>>> gradually accumulate delay until finally they run out of read socket >>>>>>> buffer space and start dropping packets. >>>>>> >>>>>> >>>>>> >>>>>> Well, encryption and decryption takes a lot of CPU time, but context >>>>>> switches are also bad. >>>>>> >>>>>> Tinc is treating UDP in a strictly FIFO way, but actually it does use >>>>>> a >>>>>> RED algorithm when tunneling over TCP. That said, it only looks at its >>>>>> own buffers to determine when to drop packets, and those only come >>>>>> into >>>>>> play once the kernel's TCP buffers are filled. >>>>>> >>>>>>> so I had a couple thoughts towards using multiple rx queues in the >>>>>>> vtun interface, and/or trying to read more than one packet at a time >>>>>>> (via recvmmsg) and do some level of fair queueing and queue >>>> >>>> management >>>>>>> >>>>>>> (codel) inside tinc itself. I think that's >>>>>>> pretty doable without modifying the protocol any, but I'm not sure >>>> >>>> of >>>>>>> >>>>>>> it's value until I saturate some cpu more. 
>>>>>> >>>>>> >>>>>> >>>>>> I'd welcome any work in this area :) >>>>>> >>>>>>> (and if you thought recvmsg was complex, look at recvmmsg) >>>>>> >>>>>> >>>>>> >>>>>> It seems someone is already working on that, see >>>>>> https://github.com/jasdeep-hundal/tinc. >>>>>> >>>>>>> D) >>>>>>> >>>>>>> the bottleneck link above is actually not tinc but the gateway, and >>>> >>>> as >>>>>>> >>>>>>> the gateway reverts to codel behavior on a single encapsulated flow >>>>>>> encapsulating all the other flows, we end up with about 40ms of >>>>>>> induced delay on this test. While I have a better codel (gets below >>>>>>> 20ms latency, not deployed), *fq*_codel by identifying individual >>>>>>> flows gets the induced delay on those flows down below 5ms. >>>>>> >>>>>> >>>>>> >>>>>> But that should improve with ECN if fq_codel is configured to use >>>>>> that, >>>>>> right? >>>>>> >>>>>>> At one level, tinc being so nicely meshy means that the "fq" part of >>>>>>> fq_codel on the gateway will have more chance to work against the >>>>>>> multiple vpn flows it generates for all the potential vpn >>>> >>>> endpoints... >>>> >>>>>>> but at another... lookie here! ipv6! 2^64 addresses or more to use! >>>>>>> and port space to burn! What if I could make tinc open up 1024 ports >>>>>>> per connection, and have it fq all it's flows over those? What could >>>>>>> go wrong? >>>>>> >>>>>> >>>>>> >>>>>> Right, hash the header of the original packets, and then select a port >>>>>> or address based on the hash? What about putting that hash in the flow >>>>>> label of outer packets? Any routers that would actually treat those as >>>>>> separate flows? >>>>> >>>>> >>>>> >>>>> -- Sent from my Android device with K-@ Mail. Please excuse my brevity. >>>>> >>>>> ________________________________ >>>>> >>>>> Cerowrt-devel mailing list >>>>> Cerowrt-devel@lists.bufferbloat.net >>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Dave Täht >>>> >>>> thttp://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks >>> >>> >>> ________________________________ >>> >>> Cerowrt-devel mailing list >>> Cerowrt-devel@lists.bufferbloat.net >>> https://lists.bufferbloat.net/listinfo/cerowrt-devel >> >> > > -- Sent from my Android device with K-@ Mail. Please excuse my brevity. -- Dave Täht thttp://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support 2014-12-03 12:02 ` Guus Sliepen 2014-12-03 14:17 ` David P. Reed @ 2014-12-03 20:32 ` Dave Taht 2014-12-04 18:53 ` Dave Taht 1 sibling, 1 reply; 10+ messages in thread From: Dave Taht @ 2014-12-03 20:32 UTC (permalink / raw) To: tinc-devel, Guus Sliepen, cerowrt-devel On Wed, Dec 3, 2014 at 4:02 AM, Guus Sliepen <guus@tinc-vpn.org> wrote: > On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote: > > [...] >> https://github.com/dtaht/tinc >> >> I successfully converted tinc to use sendmsg and recvmsg, acquire (at >> least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet fields, > > Windows does not have sendmsg()/recvmsg(), but the BSDs support it. > >> as well as SO_TIMESTAMPNS, and use a higher resolution internal clock. >> Got passing through the dscp values to work also, but: >> >> A) encapsulation of ecn capable marked packets, and availability in >> the outer header, without correct decapsulationm doesn't work well. >> >> The outer packet gets marked, but by default the marking doesn't make >> it back into the inner packet when decoded. > > Is the kernel stripping the ECN bits provided by userspace? In the code > in your git branch you strip the ECN bits out yourself. Linux, at least, gives access to all 8 bits of the tos field on udp. Windows does not, unless you have admin privs. Don't know about other OSes. The comment there: tos = origpkt->tos & ~0x3 ; // chicken out on passing ecn for now was due to seeing this happen otherwise (talking to a tinc not yet modified to decapsulate ecn markings correctly) http://snapon.lab.bufferbloat.net/~d/tinc/ecn.png and was awaiting some thought on a truth table derived from the relevant rfc (which I think is slightly wrong, btw), and further thought on determining if ecn could be used on that path. certainly I could deploy a tinc modified to assume ecn was in use, (and may, shortly!) with the right truth table. There was a comment higher up in the file also - I would like to decrement hopcount/ttl on the encapsulated packet by the actual number of hops in the overlay path, not by one, as is the default here, and in many other vpns. This would decrease the damage caused by routing loops. >> So communicating somehow that a path can take ecn (and/or diffserv >> markings) is needed between tinc daemons. I thought of perhaps >> crafting a special icmp message marked with CE but am open to ideas >> that would be backward compatible. > > PMTU probes are used to discover whether UDP works and how big the path > MTU is, maybe it could be used to discover whether ECN works as well? Yes. > Set one of the ECN bits on some of the PMTU probes, and if you receive a > probe with that ECN bit set, also set it on the probe reply. This is an encapsulated packet vs an overt ping? Seems saner to test over the encapsulation in this case. >If you > succesfully receive a reply with ECN bits set, then you know ECN works. Well it should test for both CE and ECT(0) being set on separate packets. > Since the remote side just echoes the contents of the probe, you could > also put a copy of the ECN bits in the probe payload, and then you can > detect if the ECN bits got zeroed. You can also define an OPTION_ECN in > src/connection.h, so nodes can announce their support for ECN, but that > should not be necessary I think. Not sure. 
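[Illustrative aside, not from the tinc tree: a minimal sketch of reading the full 8-bit tos byte off a UDP socket on Linux with recvmsg() ancillary data, which is the mechanism discussed above. The IPv6 equivalent uses IPV6_RECVTCLASS / IPV6_TCLASS; error handling and the usual socket setup are omitted.]

/* Sketch only: read one UDP packet and its received TOS byte (DSCP + ECN).
   In real code the setsockopt() would be done once at socket setup. */
#include <stdio.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>

int read_packet_with_tos(int fd)
{
    int on = 1;
    setsockopt(fd, IPPROTO_IP, IP_RECVTOS, &on, sizeof(on));

    char data[2048], cbuf[256];
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };

    ssize_t len = recvmsg(fd, &msg, 0);
    if (len < 0)
        return -1;

    int tos = 0;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c))
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_TOS)
            tos = *(unsigned char *)CMSG_DATA(c);   /* all 8 bits arrive here */

    printf("%zd bytes, tos 0x%02x (dscp %d, ecn %d)\n", len, tos, tos >> 2, tos & 3);
    return tos;
}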
> >> B) I have long theorized that a lot of userspace vpns bottleneck on >> the read and encapsulate step, and being strict FIFOs, >> gradually accumulate delay until finally they run out of read socket >> buffer space and start dropping packets. > > Well, encryption and decryption takes a lot of CPU time, but context > switches are also bad. > > Tinc is treating UDP in a strictly FIFO way, but actually it does use a > RED algorithm when tunneling over TCP. That said, it only looks at its One of these days I'll get around to writing a userspace codel lib in pure C. Or someone else will. The C++ versions in ns2, ns3, and mahimahi are hard to read. My currently pretty elegant codel2.h might be a starting point, if only I could solve count increasing without bound sanely. > own buffers to determine when to drop packets, and those only come into > play once the kernel's TCP buffers are filled. TCP small queues (TSQ) and BQL should be a big boon to vpn and tor users. >> so I had a couple thoughts towards using multiple rx queues in the >> vtun interface, and/or trying to read more than one packet at a time >> (via recvmmsg) and do some level of fair queueing and queue management >> (codel) inside tinc itself. I think that's >> pretty doable without modifying the protocol any, but I'm not sure of >> it's value until I saturate some cpu more. > > I'd welcome any work in this area :) Well, I have to get packet timestamping to give sane results, and then come up with saturating workloads for my hardware. This is easy for cerowrt - I doubt the mips 640mhz processor can encrypt and push even as much as 2mbit/sec.... but my "vision" such as it was, was to toss a beaglebone box in as a vpn gateway instead, (on comcast's dynamically assigned ipv6 networks) and maybe fiddle with the http://cryptotronix.com/products/cryptocape/ which has a new kernel driver.... (it was a weekend, it was raining, I needed to get to my lab in los gatos from gf's in SF and ssh tunneling and portforwarding was getting bothersome... so I hacked on tinc. :) ) >> (and if you thought recvmsg was complex, look at recvmmsg) > > It seems someone is already working on that, see > https://github.com/jasdeep-hundal/tinc. Seemed to be mostly windows related hacking. I am not ready to consider all the infrastructure required to accumulate and manage packets inside of tinc, nor (after fighting with recvmsg/sendmsg for 2 days) ready to tackle recvmmsg... or threads and ringbuffers and all the headache that entails. >> D) >> >> the bottleneck link above is actually not tinc but the gateway, and as >> the gateway reverts to codel behavior on a single encapsulated flow >> encapsulating all the other flows, we end up with about 40ms of >> induced delay on this test. While I have a better codel (gets below >> 20ms latency, not deployed), *fq*_codel by identifying individual >> flows gets the induced delay on those flows down below 5ms. > > But that should improve with ECN if fq_codel is configured to use that, > right? Meh. Ecn is very useful on very short or very long paths where packet loss as an indicator of congestion is hurtful. In the general case it adds a tiny bit to overall latency for other flows as congestion is not cleared for an RTT, instead of at the bottleneck, with a loss. This is still overly optimistic, IMHO: https://tools.ietf.org/html/draft-ietf-aqm-ecn-benefits-00 current linux pie, red and codel do not enable ecn by default, currently. Arguably pie could (because it has overload protection), but codel, no. 
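[Aside on the userspace codel lib mentioned above: a deliberately rough sketch of the control law in plain C, simplified from the reference pseudocode and not drawn from codel2.h or the ns2/ns3 code. The sojourn time would come from "now" minus the kernel receive timestamp (the SO_TIMESTAMPNS cmsg); the 5ms/100ms constants are the usual defaults, and the count handling below punts on exactly the "count increasing without bound" problem.]

/* Sketch: codel's drop-scheduling decision, one call per dequeued packet.
   Link with -lm for sqrt(). */
#include <stdint.h>
#include <stdbool.h>
#include <math.h>

#define NSEC(ms)  ((uint64_t)(ms) * 1000000ull)
#define TARGET    NSEC(5)          /* acceptable standing queue */
#define INTERVAL  NSEC(100)        /* sliding window */

struct codel {
    uint64_t first_above_time;     /* when sojourn first stayed above TARGET */
    uint64_t drop_next;            /* next scheduled drop while dropping */
    uint32_t count;                /* drops in the current dropping episode */
    bool     dropping;
};

static uint64_t control_law(uint64_t t, uint32_t count)
{
    return t + (uint64_t)(INTERVAL / sqrt((double)count));
}

/* true => drop (or CE-mark) this packet; sojourn = now - rx/enqueue timestamp */
static bool codel_should_drop(struct codel *c, uint64_t now, uint64_t sojourn)
{
    if (sojourn < TARGET) {                 /* queue is good again */
        c->first_above_time = 0;
        c->dropping = false;
        return false;
    }
    if (c->first_above_time == 0) {         /* just went bad; arm the timer */
        c->first_above_time = now + INTERVAL;
        return false;
    }
    if (!c->dropping) {
        if (now < c->first_above_time)      /* not bad for a whole interval yet */
            return false;
        c->dropping = true;
        /* naive count reuse; aging this sanely is the open problem */
        c->count = c->count > 2 ? c->count - 2 : 1;
        c->drop_next = control_law(now, c->count);
        return true;
    }
    if (now >= c->drop_next) {              /* next drop in the episode */
        c->count++;
        c->drop_next = control_law(c->drop_next, c->count);
        return true;
    }
    return false;
}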
Have a version of codel and fq_codel (and cake) that do ecn overload protection, and enable ecn by default, am testing... fq_codel enables ECN by default, (overload does very little harm) but openwrt (not cerowrt) turns it off on their qos-scripts. It's half on by default in sqm-scripts, and works pretty well if you have enough bandwidth - I routinely run a few low latency networks with near zero packet loss, and near-perfect utilization... which impresses me, at least... ECN makes me nervous in general when enabled outside the datacenter, but as something like 60% of the alexa top 1million will enable ecn if asked nowadays, I hope that that worry extends to enough more people for me to worry less. http://ecn.ethz.ch/ I am concerned that enabling ECN generally breaks Tor over tcp even worse at the moment.... (I hadn't thought about it til my last message) Certainly I think ECN is a great idea for vpns so long as it is implemented correctly, although my understanding of CTR mode over udp is that loss hurts not, and neither does reordering? In tinc: what if I get a packet with a seqno 5 after receiving packets with seq 1-4,6-255. does that get dropped due to the replay protection, or (if it passes muster) get decrypted and forwarded even after that much reordering? (I am all in favor of not worrying about reordering much. wifi aps tend to do it a lot, so do route flaps, and linux tcp, at least, is now VERY resistant to reordering problems, handling megabytes of out of order delivery problems with aplomb. windows on the other hand, sucks in this department, still) > >> At one level, tinc being so nicely meshy means that the "fq" part of >> fq_codel on the gateway will have more chance to work against the >> multiple vpn flows it generates for all the potential vpn endpoints... >> >> but at another... lookie here! ipv6! 2^64 addresses or more to use! >> and port space to burn! What if I could make tinc open up 1024 ports >> per connection, and have it fq all it's flows over those? What could >> go wrong? > > Right, hash the header of the original packets, and then select a port > or address based on the hash? Yes. I am leaning towards ipv6 address rather than port, you rapidly run out of ports in ipv4, and making this an ipv6 specific feature seems safer to test. I look forward to messing up the expectations of many a stateful ipv6 firewall.... >What about putting that hash in the flow > label of outer packets? Any routers that would actually treat those as > separate flows? The flow label was a pretty good idea shot down by too many people arguing over the bits. I don't think there is a lot of useful information stored there in any coherent way, (it's too bad that the vxlan stuff added a prepended header, instead of just using the flowlabel) so it is best to just hash the main headers and whatever inner headers you can obtain, as per http://lxr.free-electrons.com/source/net/core/flow_dissector.c#L54 and https://github.com/torvalds/linux/blob/master/net/sched/sch_fq_codel.c#L70 I have quibble with the jhash3 here, as the present treatment of ipv6 is the very efficient but not very hashy addr[0] ^ addr[2] ^ addr[3] ^ addr[4] (somewhere in the code), instead of feeding all the bits to the hash function(s). 
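[Tiny illustration of the "feed all the bits in" point: the usual 4-word xor fold versus hashing the whole 128-bit address. FNV-1a stands in for jhash here purely to keep the sketch self-contained; it is not what the kernel uses.]

#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

/* The cheap fold: fast, but structured pairs of addresses can cancel out. */
static uint32_t fold_xor(const struct in6_addr *a)
{
    uint32_t w[4];
    memcpy(w, a->s6_addr, sizeof(w));
    return w[0] ^ w[1] ^ w[2] ^ w[3];
}

/* Feed every byte to a real mixing function (FNV-1a, illustrative only). */
static uint32_t hash_all_bits(const struct in6_addr *a)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < 16; i++) {
        h ^= a->s6_addr[i];
        h *= 16777619u;
    }
    return h;
}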
> > -- > Met vriendelijke groet / with kind regards, > Guus Sliepen <guus@tinc-vpn.org> > > _______________________________________________ > tinc-devel mailing list > tinc-devel@tinc-vpn.org > http://www.tinc-vpn.org/cgi-bin/mailman/listinfo/tinc-devel -- Dave Täht thttp://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support 2014-12-03 20:32 ` Dave Taht @ 2014-12-04 18:53 ` Dave Taht 0 siblings, 0 replies; 10+ messages in thread From: Dave Taht @ 2014-12-04 18:53 UTC (permalink / raw) To: tinc-devel, Guus Sliepen, cerowrt-devel [-- Attachment #1: Type: text/plain, Size: 12348 bytes --] On Wed, Dec 3, 2014 at 12:32 PM, Dave Taht <dave.taht@gmail.com> wrote: > On Wed, Dec 3, 2014 at 4:02 AM, Guus Sliepen <guus@tinc-vpn.org> wrote: >> On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote: >> >> [...] >>> https://github.com/dtaht/tinc >>> >>> I successfully converted tinc to use sendmsg and recvmsg, acquire (at >>> least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet fields, >> >> Windows does not have sendmsg()/recvmsg(), but the BSDs support it. >> >>> as well as SO_TIMESTAMPNS, and use a higher resolution internal clock. >>> Got passing through the dscp values to work also, but: >>> >>> A) encapsulation of ecn capable marked packets, and availability in >>> the outer header, without correct decapsulationm doesn't work well. >>> >>> The outer packet gets marked, but by default the marking doesn't make >>> it back into the inner packet when decoded. >> >> Is the kernel stripping the ECN bits provided by userspace? In the code >> in your git branch you strip the ECN bits out yourself. > > Linux, at least, gives access to all 8 bits of the tos field on udp. OSX appears to do so also, at least on ipv6. Jonathan Morton wrote some code to test the ideas here: http://snapon.lab.bufferbloat.net/~d/udp-tos.c and I have similar but buggy code in my isochronous repo on github (udpburst), where I struggled with v6mapped and ipv4 sockets for a while before giving up. > Windows does not, unless you have admin privs. Don't know > about other OSes. > > The comment there: > > tos = origpkt->tos & ~0x3 ; // chicken out on passing ecn for now > > was due to seeing this happen otherwise (talking to a tinc not yet > modified to decapsulate ecn markings correctly) > > http://snapon.lab.bufferbloat.net/~d/tinc/ecn.png > > and > > was awaiting some thought on a truth table derived from the > relevant rfc (which I think is slightly wrong, btw), and further > thought on determining if ecn could be used on that path. Continuing to work with this, patch attached, haven't worked on the decapsulation step yet... not clear to me yet whether state will need to be kept in connection_t... Two issues still to wrap my head around. It's not clear to me when a tinc daemon might forward an already encapsulated packet to another relay, and whether it does so over udp. If so, the original IP headers in the packet can be lost or modified further en route, so if I have CE set on the outside header and am forwarding to a non ECN capable receiver, I should drop the packet, and there may be other nuances. Similarly, when a packet is compressed... > certainly I could deploy a tinc modified to assume ecn was > in use, (and may, shortly!) with the right truth table. > > There was a comment higher up in the file also - > I would like to decrement hopcount/ttl > on the encapsulated packet by the > actual number of hops in the overlay path, not by one, > as is the default here, and in many other vpns. > > This would decrease the damage caused by > routing loops. And going back to the forwarding issue, if over udp, I'd like the total hopcount to be preserved e2e, and passed into the finally decapsulated packet....
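[For reference while working out that truth table: the decapsulation behaviour RFC 6040 (section 4.2) specifies, written out as a small lookup. This is the RFC's table, not the logic in the attached patch; -1 is the "drop it" case mentioned above, i.e. CE on the outer header with a Not-ECT inner packet.]

/* ECN field values: 0 = Not-ECT, 1 = ECT(1), 2 = ECT(0), 3 = CE.
   Returns the ECN field for the forwarded inner header, or -1 for "drop". */
static int ecn_decap(int inner, int outer)
{
    static const int table[4][4] = {
        /* outer:          Not-ECT  ECT(1)  ECT(0)   CE  */
        /* inner Not-ECT */ {  0,      0,      0,    -1 },
        /* inner ECT(1)  */ {  1,      1,      1,     3 },
        /* inner ECT(0)  */ {  2,      1,      2,     3 },
        /* inner CE      */ {  3,      3,      3,     3 },
    };
    return table[inner & 3][outer & 3];
}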
> >>> So communicating somehow that a path can take ecn (and/or diffserv >>> markings) is needed between tinc daemons. I thought of perhaps >>> crafting a special icmp message marked with CE but am open to ideas >>> that would be backward compatible. >> >> PMTU probes are used to discover whether UDP works and how big the path >> MTU is, maybe it could be used to discover whether ECN works as well? > > Yes. > >> Set one of the ECN bits on some of the PMTU probes, and if you receive a >> probe with that ECN bit set, also set it on the probe reply. > > This is an encapsulated packet vs an overt ping? Seems saner to test > over the encapsulation in this case. > >>If you >> succesfully receive a reply with ECN bits set, then you know ECN works. > > Well it should test for both CE and ECT(0) being set on separate > packets. > >> Since the remote side just echoes the contents of the probe, you could >> also put a copy of the ECN bits in the probe payload, and then you can >> detect if the ECN bits got zeroed. You can also define an OPTION_ECN in >> src/connection.h, so nodes can announce their support for ECN, but that >> should not be necessary I think. > > Not sure. > >> >>> B) I have long theorized that a lot of userspace vpns bottleneck on >>> the read and encapsulate step, and being strict FIFOs, >>> gradually accumulate delay until finally they run out of read socket >>> buffer space and start dropping packets. >> >> Well, encryption and decryption takes a lot of CPU time, but context >> switches are also bad. >> >> Tinc is treating UDP in a strictly FIFO way, but actually it does use a >> RED algorithm when tunneling over TCP. That said, it only looks at its > > One of these days I'll get around to writing a userspace codel lib > in pure C. Or someone else will. > > The C++ versions in ns2, ns3, and mahimahi are hard to read. My currently > pretty elegant codel2.h might be a starting point, if only > I could solve count increasing without bound sanely. > >> own buffers to determine when to drop packets, and those only come into >> play once the kernel's TCP buffers are filled. > > TCP small queues (TSQ) and BQL should be a big boon to vpn and tor users. > >>> so I had a couple thoughts towards using multiple rx queues in the >>> vtun interface, and/or trying to read more than one packet at a time >>> (via recvmmsg) and do some level of fair queueing and queue management >>> (codel) inside tinc itself. I think that's >>> pretty doable without modifying the protocol any, but I'm not sure of >>> it's value until I saturate some cpu more. >> >> I'd welcome any work in this area :) > > Well, I have to get packet timestamping to give sane results, and then > come up with saturating workloads for my hardware. This is easy for > cerowrt - I doubt the mips 640mhz processor can encrypt and push > even as much as 2mbit/sec.... > > but my "vision" such as it was, was to toss a beaglebone box > in as a vpn gateway instead, (on comcast's dynamically assigned > ipv6 networks) and maybe fiddle with the > > http://cryptotronix.com/products/cryptocape/ > > which has a new kernel driver.... > > (it was a weekend, it was raining, I needed to get to my lab in > los gatos from gf's in SF and > ssh tunneling and portforwarding was getting bothersome... > > so I hacked on tinc. :) ) > >>> (and if you thought recvmsg was complex, look at recvmmsg) >> >> It seems someone is already working on that, see >> https://github.com/jasdeep-hundal/tinc. > > Seemed to be mostly windows related hacking. 
> > I am not ready to consider all the infrastructure required to > accumulate and manage packets inside of tinc, nor (after > fighting with recvmsg/sendmsg for 2 days) ready to tackle > recvmmsg... or threads and ringbuffers and all the headache > that entails. BUT, if timestamping at the socket layer works like I think it does, codel seems plausible without buffering up any packets in the daemon itself. >>> D) >>> >>> the bottleneck link above is actually not tinc but the gateway, and as >>> the gateway reverts to codel behavior on a single encapsulated flow >>> encapsulating all the other flows, we end up with about 40ms of >>> induced delay on this test. While I have a better codel (gets below >>> 20ms latency, not deployed), *fq*_codel by identifying individual >>> flows gets the induced delay on those flows down below 5ms. >> >> But that should improve with ECN if fq_codel is configured to use that, >> right? > > Meh. Ecn is very useful on very short or very long paths where > packet loss as an indicator of congestion is hurtful. In the general > case it adds a tiny bit to overall latency for other flows as congestion is not > cleared for an RTT, instead of at the bottleneck, with a loss. > > This is still overly optimistic, IMHO: > > https://tools.ietf.org/html/draft-ietf-aqm-ecn-benefits-00 > > current linux pie, red and codel do not enable ecn by default, > currently. Arguably pie could (because it has overload protection), > but codel, no. > > Have a version of codel and fq_codel (and cake) that do > ecn overload protection, and enable ecn by default, am testing... > > fq_codel enables ECN by default, (overload does very little > harm) > > but openwrt (not cerowrt) > turns it off on their qos-scripts. It's half on by default in sqm-scripts, > and works pretty well if you have enough bandwidth - I > routinely run a few low latency networks with near zero packet loss, > and near-perfect utilization... which impresses me, at least... > > ECN makes me nervous in general when enabled outside the > datacenter, but as something like 60% of the alexa top 1million > will enable ecn if asked nowadays, I hope that that worry extends > to enough more people for me to worry less. > > http://ecn.ethz.ch/ > > I am concerned that enabling ECN generally breaks Tor over tcp > even worse at the moment.... (I hadn't thought about it til > my last message) > > Certainly I think ECN is a great idea for vpns so long as it is > implemented correctly, although my understanding of CTR > mode over udp is that loss hurts not, and neither does > reordering? > > In tinc: what if I get a packet with a seqno 5 after receiving packets > with seq 1-4,6-255. does that get dropped due to the replay protection, > or (if it passes muster) get decrypted and forwarded even after that much > reordering? > > (I am all in favor of not worrying about reordering much. wifi aps > tend to do it a lot, so do route flaps, and linux tcp, at least, is > now VERY resistant to reordering problems, handling megabytes > of out of order delivery problems with aplomb. > > windows on the other hand, sucks in this department, still) >> >>> At one level, tinc being so nicely meshy means that the "fq" part of >>> fq_codel on the gateway will have more chance to work against the >>> multiple vpn flows it generates for all the potential vpn endpoints... >>> >>> but at another... lookie here! ipv6! 2^64 addresses or more to use! >>> and port space to burn! 
What if I could make tinc open up 1024 ports >>> per connection, and have it fq all it's flows over those? What could >>> go wrong? >> >> Right, hash the header of the original packets, and then select a port >> or address based on the hash? > > Yes. I am leaning towards ipv6 address rather than port, you rapidly > run out of ports in ipv4, and making this an ipv6 specific feature > seems safer to test. > > I look forward to messing up the expectations of many a stateful > ipv6 firewall.... > >>What about putting that hash in the flow >> label of outer packets? Any routers that would actually treat those as >> separate flows? > > The flow label was a pretty good idea shot down by too many people > arguing over the bits. I don't think there is a lot of useful information > stored there in any coherent way, (it's too bad that the vxlan stuff > added a prepended header, instead of just using the flowlabel) > > so it is best to just hash the main > headers and whatever inner headers you can obtain, as per > > http://lxr.free-electrons.com/source/net/core/flow_dissector.c#L54 > > and > > https://github.com/torvalds/linux/blob/master/net/sched/sch_fq_codel.c#L70 > > I have quibble with the jhash3 here, as the present treatment > of ipv6 is the very efficient but not very hashy > > addr[0] ^ addr[2] ^ addr[3] ^ addr[4] (somewhere in the code), > instead of feeding all the bits to the hash function(s). > > >> >> -- >> Met vriendelijke groet / with kind regards, >> Guus Sliepen <guus@tinc-vpn.org> >> >> _______________________________________________ >> tinc-devel mailing list >> tinc-devel@tinc-vpn.org >> http://www.tinc-vpn.org/cgi-bin/mailman/listinfo/tinc-devel > > > > -- > Dave Täht > > thttp://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks -- Dave Täht thttp://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks [-- Attachment #2: working_out_encapsulation_issues.patch --] [-- Type: text/x-patch, Size: 3266 bytes --] diff --git a/src/connection.h b/src/connection.h index 877601f..b27df48 100644 --- a/src/connection.h +++ b/src/connection.h @@ -30,6 +30,9 @@ #define OPTION_TCPONLY 0x0002 #define OPTION_PMTU_DISCOVERY 0x0004 #define OPTION_CLAMP_MSS 0x0008 +#define OPTION_ECN 0x0010 +#define OPTION_DSCP 0x0020 +#define OPTION_MEGAIP 0x0040 typedef struct connection_status_t { unsigned int pinged:1; /* sent ping */ @@ -41,7 +44,10 @@ typedef struct connection_status_t { unsigned int encryptout:1; /* 1 if we can encrypt outgoing traffic */ unsigned int decryptin:1; /* 1 if we have to decrypt incoming traffic */ unsigned int mst:1; /* 1 if this connection is part of a minimum spanning tree */ - unsigned int unused:23; + unsigned int dscp:1; /* 1 if this connection tries to preserve diffserv markings */ + unsigned int ecn:1; /* 1 if this connection respects ecn */ + unsigned int megaip:1; /* 1 if we are going to use many IPs to FQ */ + unsigned int unused:20; } connection_status_t; #include "edge.h" diff --git a/src/net_packet.c b/src/net_packet.c index 4ce83a9..1e0ca21 100644 --- a/src/net_packet.c +++ b/src/net_packet.c @@ -83,6 +83,46 @@ bool localdiscovery = false; */ +/* This needs to have discovered if the path is ecn capable or not */ + +static int ecn_encapsulate(int tos_in, int tos_encap) { + + int ecn_encap = tos_encap & 3; + int ecn_in = tos_in & 3; + +/* If CE is applied on the outer header but ECT(0) | ECT(1) NOT on the + inner, indicate the packet should be dropped */ + + if(ecn_encap == 3) + if (ecn_in & 3) + return tos_in | 3; + else + return -tos_in; + + // Note we could 
try to do something clever with the ecn nonce here + + return tos_in; +} + + +static int ecn_decapsulate(int tos_in, int tos_encap) { + + int ecn_encap = tos_encap & 3; + int ecn_in = tos_in & 3; + +/* If CE is applied on the outer header but ECT(0) | ECT(1) NOT on the + inner, indicate the packet should be dropped */ + + if(ecn_encap == 3) + if (ecn_in & 3) + return tos_in | 3; + else + return -tos_in; + + return tos_in; + +} + void send_mtu_probe(node_t *n) { vpn_packet_t packet; int len, i; @@ -459,7 +499,24 @@ static void send_udppacket(node_t *n, vpn_packet_t *origpkt) { origlen = inpkt->len; origpriority = inpkt->priority; + int tos; + + if(n->options & OPTION_ECN) { + if((tos = ecn_decapsulate(origpkt->tos, origpkt->tos_outer)) < 0) { + ifdebug(TRAFFIC) logger(LOG_ERR, "CE marked non ECN packet dropped %s (%s)", + n->name, n->hostname); + return; + } + } + if(n->options & OPTION_DSCP) { + tos |= origpkt->tos & ~0x3; + } + + if(!tos && tos != origpkt->tos) { + // FIXME rewrite the inner header + } + /* Compress the packet */ if(n->outcompression) { @@ -555,14 +612,6 @@ static void send_udppacket(node_t *n, vpn_packet_t *origpkt) { // invisible routing loops. FIXME We could also decrease the // TTL by the actual path length, rather than by 1. - int tos; - - if(priorityinheritance) { - tos = origpkt->tos & ~0x3 ; // chicken out on passing ecn for now - } else { - tos = 0; - } - ifdebug(TRAFFIC) logger(LOG_WARNING, "priorityinheritance %d: %d", priorityinheritance, tos); ^ permalink raw reply [flat|nested] 10+ messages in thread
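[A sketch of one way the "many IPs to FQ" idea (OPTION_MEGAIP in the patch above) could pick a per-flow source address from a single IPv6 UDP socket on Linux: hand sendmsg() an IPV6_PKTINFO ancillary message. Assumes the addresses are local to the host (assigned, or covered by a local route for the prefix); the helper name sendto_from and the base-plus-hash scheme are made up for illustration, and error handling is omitted.]

#define _GNU_SOURCE            /* struct in6_pktinfo with glibc */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>

/* Send one datagram from a chosen source address, e.g. base | (flow_hash & 1023)
   for a /118 worth of flows, without opening a socket per address. */
static ssize_t sendto_from(int fd, const void *buf, size_t len,
                           const struct in6_addr *src,
                           const struct sockaddr_in6 *dst)
{
    char cbuf[CMSG_SPACE(sizeof(struct in6_pktinfo))];
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg = {
        .msg_name = (void *)dst, .msg_namelen = sizeof(*dst),
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };

    struct in6_pktinfo pi;
    memset(&pi, 0, sizeof(pi));
    pi.ipi6_addr = *src;                       /* source picked per flow hash */

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = IPPROTO_IPV6;
    c->cmsg_type  = IPV6_PKTINFO;
    c->cmsg_len   = CMSG_LEN(sizeof(pi));
    memcpy(CMSG_DATA(c), &pi, sizeof(pi));

    return sendmsg(fd, &msg, 0);               /* each source address is a distinct 5-tuple downstream */
}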
end of thread, other threads:[~2014-12-04 19:03 UTC | newest]

Thread overview: 10+ messages
2014-12-03 8:07 [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support Dave Taht
2014-12-03 12:02 ` Guus Sliepen
2014-12-03 14:17 ` David P. Reed
2014-12-03 19:44 ` Dave Taht
2014-12-04 0:45 ` dpreed
2014-12-04 9:38 ` Sebastian Moeller
2014-12-04 15:30 ` David P. Reed
2014-12-04 19:03 ` Dave Taht
2014-12-03 20:32 ` Dave Taht
2014-12-04 18:53 ` Dave Taht