From: Sebastian Moeller
To: dpreed@reed.com
Cc: cerowrt-devel@lists.bufferbloat.net
Date: Thu, 4 Dec 2014 10:38:38 +0100
Subject: Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priority inherit), ecn, and fq_codel support

Hi,

at the risk of going off on a tangent...

On Dec 4, 2014, at 01:45, dpreed@reed.com wrote:

> Awesome start on the issue, in your note, Dave. Tor needs to change for several reasons - not that it isn't great, but with IPv6 and other things coming on line, plus the understanding of fq_codel's rationale, plus ... - the world can do much better. Same with VPNs.
>
> I hope we can set our sights on a convergent target that doesn't get bogged down in the tradeoffs that were made when VPNs were originally proposed. The world is no longer a bunch of disconnected networks protected by Cheswick firewalls. Cheswick said they were only temporary, and they've outlived their usefulness - they actually create security risks more than they fix them (centralizing security creates points of failure and attack that exponentially decrease the attackers' work factor). To some extent that is also true for Tor after these many years. But trying to keep all computers on the end/edge secure also does not work/scale well, so both ends of the continuum have their issues; I would not be amazed if realistically we need to keep doing both... securing the end devices as well as intermediary devices.
>
> By putting the intelligence about security in the network, you basically do all the bad things that the end-to-end argument encourages you to avoid.

I might be misinterpreting your point here, but given the devices people connect to their own networks, full end-to-end connectivity without added layers of security does not seem very practical.

There is an ever-growing class of devices orphaned by their makers (either explicitly, like old iPods, or implicitly by lack of timely security fixes, like Siemens SCADA systems, plus old but useful hardware requiring obsolete operating systems like Windows XP; the list goes on...) that can still be used to good effect in a secured network but cannot be trusted to access the wider internet, let alone be contacted by the wider internet. So unless we want to retire all those devices of dubious "security", we need a layer in the network that can filter traffic to and from specific devices. In the old IPv4 days, the NAT that was ubiquitous for end users took care of the "traffic to specific devices" part to some degree. I would be happy if even in the brave new IPv6 world we could keep such gatekeepers/bouncers around, ideally also controlling which devices can send packets to the internet.

I do not propose to put these things into the core of the network, but the boundary between an implicitly trusted home network and the internet seems like a decent compromise to me. (I would also like such a device to default to "no implicit connectivity", so that each device needs to be manually declared fit for the internet and users are aware of this system.) Since the number of connections between the home net and the internet is often smaller than the number of connected devices in such a network, the transfer points/routers seem like ideal candidates to implement the "access control". (This does not mean that leaving end systems unsecured and unpatched is a good idea, but it should at least greatly diminish the risk posed by sub-optimally secured endpoints, I think/hope.)

Being a biologist, I like to think about this as maintaining a special niche for hard- or impossible-to-secure devices in my home, avoiding their extinction/pwning by keeping the predators away; as fitness is relative. Might not work perfectly, but "good enough" would do ;)

To quote the Russians: "Doveryai, no proveryai" - "Trust, but verify"...

> We could also put congestion control in the network by re-creating admission control and requiring contractual agreements to carry traffic across every intermediary. But I think that basically destroys almost all the value of an "inter" net. It makes it a balkanized proprietary set of subnets that have dozens of reasons why you can't connect with anyone else, and no way to be free to connect.
>
> On Wednesday, December 3, 2014 2:44pm, "Dave Taht" said:
>
> > On Wed, Dec 3, 2014 at 6:17 AM, David P. Reed wrote:
> > > Tor needs this stuff very badly.
> >
> > Tor has many, many problematic behaviors relevant to congestion control in general. Let me paste a bit of private discussion I'd had on it in a second, but a very good paper that touched upon it all was:
> >
> > DefenestraTor: Throwing out Windows in Tor
> > http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf
> >
> > Honestly tor needs to move to udp, and hide in all the upcoming webrtc traffic....
> >
> > http://blog.mozilla.org/futurereleases/2014/10/16/test-the-new-firefox-hello-webrtc-feature-in-firefox-beta/
> >
> > webrtc needs some sort of non-centralized rendezvous mechanism, but I am REALLY happy to see calls and video stay entirely inside my network when they can be negotiated as such.
> >
> > https://plus.google.com/u/0/107942175615993706558/posts/M4xUtpCKJ4P
> >
> > And of course, people are busily reinventing torrent in webrtc without paying attention to congestion control at all.
> >
> > https://github.com/feross/webtorrent/issues/39
> >
> > Giving access to udp to javascript programmers... what could go wrong? :/
> >
> > > I do wonder whether we should focus on vpn's rather than end to end encryption that does not leak secure information through from inside as the plan seems to do.
> >
> > "plan"?
> >
> > I like e2e encryption. I also like overlay networks. And meshes. And working dns and service discovery. And low latency.
> >
> > vpns are useful abstractions for sharing an address space you may not want to share more widely.
> >
> > and: I've taken a lot of flak about how fq doesn't help on conventional vpns, and well, just came up with an unconventional vpn idea that might have some legs here... (certainly in my case tinc as constructed already, no patches, solves hooking together the 12 networks I have around the globe, mostly)
> >
> > As for "leaking information", packet size and frequency is generally an obvious indicator of a given traffic type, some padding added or no. There is one piece of plaintext in tinc (the seqno), also. It also uses a fixed port number for both sides of the connection (perhaps it shouldn't).
> >
> > So I don't necessarily see a difference between sending a whole lot of varying data on one tuple
> >
> > 2001:db8::1 <-> 2001:db8:1::1 on port 655
> >
> > vs
> >
> > 2001:db8::1 <-> 2001:db8:1::1 port 655
> > 2001:db8::2 <-> 2001:db8:1::1 port 655
> > 2001:db8::3 <-> 2001:db8:1::1 port 655
> > 2001:db8::4 <-> 2001:db8:1::1 port 655
> > ....
> >
> > which solves the fq problem on a vpn like tinc neatly. A security feature could be source-specific routing, where we send stuff over different paths from different ipv6 source addresses... and mixing up the src/dest ports more, but that complexifies the fq portion of the algo.... my thought for an initial implementation is to just hard-code the ipv6 address range.
> >
> > I think however that adding tons and tons of ipv6 addresses to a given interface is probably slow, and might break things like nd and/or multicast...
> >
> > what would be cooler would be if you could allocate an entire /64 (or /118) to the vpn daemon
> >
> > bindtoaddress(2001:db8::/118) (give me all the data for 1024 ips)
> >
> > but I am not sure how to go about doing that..
> >
> > ...moving back to a formerly private discussion about tor's woes...
> >
> > "This conversation is a bit separate from #11197 (which is an implementation issue in obfsproxy), so separate discussion somewhere would probably be required.
> >
> > So, there appears to be a slight misconception on how tor traffic travels across the Internet that I will attempt to clarify, and hopefully not get too terribly wrong.
> >
> > Each step of a given connection over tor involves multiple TCP/IP connections. To use a standard example of someone trying to watch Cat Videos on the "real internet", it will look approximately like this:
> >
> > Client <-> Guard <-> Relay <-> Exit <-> Cat Videos
> >
> > Each step is a separate TCP/IP connection, authenticated and encrypted via TLS (TLS is likewise terminated at each hop).
> > Using a pluggable transport encapsulates the first hop's TLS session with a different protocol, be it obfs2, obfs3, or something else.
> >
> > The cat videos are passed through this path of many TCP/IP connections across things called Circuits that are created/extended by the Client one hop at a time (so in the example above, the kitty cats travel across 4 TCP/IP connections, relaying data across a Circuit that spans from the Client to the Exit; if my art skills were up to it, I would draw a diagram).
> >
> > Circuits are currently required to provide reliable, in-order delivery.
> >
> > In addition to the standard congestion control provided by TCP/IP on a per-hop basis, there is Circuit-level flow control *and* "end to end" flow control in the form of RELAY_SENDME cells, but given that multiple circuits can end up being multiplexed over a singular TCP/IP connection, propagation of these RELAY_SENDME cells can get delayed due to HOL issues.
> >
> > So, with that quick and dirty overview out of the way:
> >
> > * "Ah so if ecn is enabled it can be used?"
> >
> > ECN will be used if it is enabled, *but* the congestion information will not get propagated to the source/destination of a given stream.
> >
> > * "Does it retain iw10 (the Linux default nowadays sadly)?"
> >
> > Each TCP/IP connection, if sent from a host that uses an obnoxiously large initial window, will have an obnoxiously large initial window.
> >
> > It is worth noting that, since multiple Circuits originating from potentially numerous clients can and will reuse existing TCP/IP connections if able to (see 5.3.1 of the tor spec), dropping packets between tor relays is kind of bad, because all of the separate encapsulated flows sharing the singular TCP/IP link will suffer (ECN would help here). This situation is rather unfortunate, as the good active queue management algorithms drop packets (when ECN is not available).
> >
> > A better summary of tor's flow control/bufferbloat woes is given in:
> >
> > DefenestraTor: Throwing out Windows in Tor
> > http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf
> >
> > The N23 algorithm suggested in the paper did not end up getting implemented into Tor, but I do not remember the reason off the top of my head."
> >
> > > On Dec 3, 2014, Guus Sliepen wrote:
> > >>
> > >> On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote:
> > >>
> > >> [...]
> > >>>
> > >>> https://github.com/dtaht/tinc
> > >>>
> > >>> I successfully converted tinc to use sendmsg and recvmsg, acquire (at least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet fields,
> > >>
> > >> Windows does not have sendmsg()/recvmsg(), but the BSDs support it.
> > >>
> > >>> as well as SO_TIMESTAMPNS, and use a higher resolution internal clock. Got passing through the dscp values to work also, but:
> > >>>
> > >>> A) encapsulation of ecn capable marked packets, and availability in the outer header, without correct decapsulation, doesn't work well.
> > >>>
> > >>> The outer packet gets marked, but by default the marking doesn't make it back into the inner packet when decoded.
> > >>
> > >> Is the kernel stripping the ECN bits provided by userspace? In the code in your git branch you strip the ECN bits out yourself.
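
A minimal sketch (not code from the tinc tree; the recv_with_tos() helper name is mine) of how the recvmsg() side of this can pull the TOS/traffic-class byte, which carries both the DSCP and the ECN bits, out of the kernel's ancillary data on Linux:

/* Minimal sketch, not code from the tinc tree: read the TOS/TCLASS byte
 * (DSCP + ECN) of an incoming UDP packet via recvmsg() ancillary data on
 * Linux. The socket must first opt in with IP_RECVTOS / IPV6_RECVTCLASS. */
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>

ssize_t recv_with_tos(int fd, void *buf, size_t len, int *tos_out)
{
    char cbuf[256];
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return n;

    *tos_out = -1;  /* -1 means "no TOS/TCLASS info seen" */
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_TOS)
            *tos_out = *(unsigned char *)CMSG_DATA(c);     /* IPv4: one byte */
        else if (c->cmsg_level == IPPROTO_IPV6 && c->cmsg_type == IPV6_TCLASS)
            memcpy(tos_out, CMSG_DATA(c), sizeof(int));    /* IPv6: an int */
    }
    return n;
}

/* Enable delivery of the ancillary data once, right after socket creation:
 *   int on = 1;
 *   setsockopt(fd, IPPROTO_IP,   IP_RECVTOS,      &on, sizeof(on));
 *   setsockopt(fd, IPPROTO_IPV6, IPV6_RECVTCLASS, &on, sizeof(on));
 * The ECN field is then (tos & 0x03) and the DSCP is (tos >> 2). */

On the sending side, the outer packet's byte can be set socket-wide with setsockopt(IP_TOS) / setsockopt(IPV6_TCLASS) before sending, which is one way the DSCP passthrough described above can be realized.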
> > >>> So communicating somehow that a path can take ecn (and/or diffserv markings) is needed between tinc daemons. I thought of perhaps crafting a special icmp message marked with CE but am open to ideas that would be backward compatible.
> > >>
> > >> PMTU probes are used to discover whether UDP works and how big the path MTU is, maybe it could be used to discover whether ECN works as well? Set one of the ECN bits on some of the PMTU probes, and if you receive a probe with that ECN bit set, also set it on the probe reply. If you successfully receive a reply with ECN bits set, then you know ECN works. Since the remote side just echoes the contents of the probe, you could also put a copy of the ECN bits in the probe payload, and then you can detect if the ECN bits got zeroed. You can also define an OPTION_ECN in src/connection.h, so nodes can announce their support for ECN, but that should not be necessary I think.
> > >>
> > >>> B) I have long theorized that a lot of userspace vpns bottleneck on the read and encapsulate step, and being strict FIFOs, gradually accumulate delay until finally they run out of read socket buffer space and start dropping packets.
> > >>
> > >> Well, encryption and decryption takes a lot of CPU time, but context switches are also bad.
> > >>
> > >> Tinc is treating UDP in a strictly FIFO way, but actually it does use a RED algorithm when tunneling over TCP. That said, it only looks at its own buffers to determine when to drop packets, and those only come into play once the kernel's TCP buffers are filled.
> > >>
> > >>> so I had a couple thoughts towards using multiple rx queues in the vtun interface, and/or trying to read more than one packet at a time (via recvmmsg) and do some level of fair queueing and queue management (codel) inside tinc itself. I think that's pretty doable without modifying the protocol any, but I'm not sure of its value until I saturate some cpu more.
> > >>
> > >> I'd welcome any work in this area :)
> > >>
> > >>> (and if you thought recvmsg was complex, look at recvmmsg)
> > >>
> > >> It seems someone is already working on that, see https://github.com/jasdeep-hundal/tinc.
> > >>
> > >>> D)
> > >>>
> > >>> the bottleneck link above is actually not tinc but the gateway, and as the gateway reverts to codel behavior on a single encapsulated flow encapsulating all the other flows, we end up with about 40ms of induced delay on this test. While I have a better codel (gets below 20ms latency, not deployed), *fq*_codel by identifying individual flows gets the induced delay on those flows down below 5ms.
> > >>
> > >> But that should improve with ECN if fq_codel is configured to use that, right?
> > >>
> > >>> At one level, tinc being so nicely meshy means that the "fq" part of fq_codel on the gateway will have more chance to work against the multiple vpn flows it generates for all the potential vpn endpoints...
> > >>>
> > >>> but at another... lookie here! ipv6! 2^64 addresses or more to use! and port space to burn! What if I could make tinc open up 1024 ports per connection, and have it fq all its flows over those? What could go wrong?
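
The "1024 ports per connection" idea above boils down to hashing the inner packet's flow identifiers and using the hash to pick one of N local source ports, so that a flow-aware queue such as fq_codel along the path sees N distinct tuples instead of a single one. A minimal illustrative sketch follows; PORT_BASE, PORT_COUNT, flow_hash() and pick_source_port() are made up for illustration and are not part of tinc:

/* Illustrative sketch only, not a tinc patch: spread tunnel traffic over
 * N source ports by hashing the inner packet's flow fields, so an
 * fq_codel gateway along the path sees N distinct 5-tuples. */
#include <stdint.h>
#include <stddef.h>

#define PORT_BASE   655     /* first local port of the range (example value) */
#define PORT_COUNT  1024    /* size of the port range to spread flows over */

/* FNV-1a over the fields that identify the inner flow
 * (e.g. inner src/dst address and ports, extracted by the caller). */
static uint32_t flow_hash(const uint8_t *key, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= key[i];
        h *= 16777619u;
    }
    return h;
}

/* Pick the local UDP source port to send this encapsulated packet from. */
static uint16_t pick_source_port(const uint8_t *flow_key, size_t key_len)
{
    return (uint16_t)(PORT_BASE + (flow_hash(flow_key, key_len) % PORT_COUNT));
}

Anything flow-aware on the path would then spread these sub-flows across separate queues, which is exactly the effect being asked for.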
> > >> Right, hash the header of the original packets, and then select a port or address based on the hash? What about putting that hash in the flow label of outer packets? Any routers that would actually treat those as separate flows?
> > >
> > > --
> > > Sent from my Android device with K-@ Mail. Please excuse my brevity.
> >
> > --
> > Dave Täht
> >
> > http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
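
To round off the recvmmsg() idea discussed above (reading a batch of packets per system call so that fair queueing and codel could be applied inside tinc itself), here is a minimal sketch of such a batched read loop on Linux; BATCH, PKT_MAX, drain_socket() and the handle_packet() hook are illustrative placeholders, not anything that exists in tinc today:

/* Minimal sketch of a batched UDP read with recvmmsg() on Linux.
 * BATCH, PKT_MAX and handle_packet() are illustrative placeholders. */
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH    16
#define PKT_MAX  2048

extern void handle_packet(const char *data, size_t len);   /* placeholder hook */

void drain_socket(int fd)
{
    static char bufs[BATCH][PKT_MAX];
    struct mmsghdr msgs[BATCH];
    struct iovec iovs[BATCH];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len  = PKT_MAX;
        msgs[i].msg_hdr.msg_iov    = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: take whatever is already queued, up to BATCH packets,
     * in one system call instead of one recvmsg() per packet. */
    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    for (int i = 0; i < n; i++)
        handle_packet(bufs[i], msgs[i].msg_len);
}

Draining several packets per wakeup is also what would give a userspace daemon a small queue of its own to apply fq/codel to, instead of reading and writing strictly one packet at a time.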