From: Dave Taht
To: "David P. Reed"
Cc: "cerowrt-devel@lists.bufferbloat.net"
Date: Thu, 4 Dec 2014 11:03:24 -0800
Subject: Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priority inherit), ecn, and fq_codel support

On Thu, Dec 4, 2014 at 7:30 AM, David P. Reed wrote:
> I'd be more likely to agree if I thought that the network-level technologies
> could work. The problem is that I've been in the system security business
> long enough (starting in 1973 in a professional role) that I know how
> useless the network-level techniques are and how little is gained by
> tinkering with them to improve them. SCADA systems cannot be secured by
> add-ons, period. Even an air gap didn't protect against Stuxnet.
>
> It's approximately the same as security theater to think that home nets can
> be secured by a fancy vpn. That only deals with a single threat model, and
> the solution really does not scale at all.

Take the attack surface presented by IPMI. Please.

http://www.fish2.com/ipmi/

> It just lets designers of home
> systems off the hook so they can promote inherently bad designs by saying
> they don't need to fix their designs.

The context I have here is trying to come up with better models for security
in the upcoming ipv6 world. ULAs are a possible part of that, for example.
Applying tinc-like techniques to a new tor-like protocol, for another.

e2e encryption can, in some ways, make things work worse. Why, exactly, is my
lightbulb sending a 500-byte packet to nsa.gov?
> As soon as a hacker can control most of the stuff in a rich person's home
> because of the IoT craze, we will see for-profit rings promoting ransom
> attacks.

Download the (20-megabyte!) "wemo" app for android. Look at all the info it
requires in order to operate your IoT stuff....

(someone paste the list here, I am not in front of an android box right now)

It's horrifying. Yes, your smart lightbulbs apparently need all these privs
to operate.

More dryly amusing today was this bit about usb malware.

https://plus.google.com/u/0/107942175615993706558/posts

We have billions of threats outside the local network to deal with, and many
within. I wish I could be as sanguine and satisfied as the e2e argument David
makes, but me, I keep wanting to find a tropical island, or an asteroid, with
no internet to deal with, as a safe haven from the eventual cataclysmic
disaster.

http://xkcd.com/865/

> Vpn's aren't likely to fix that at the network level. So my point is a
> little subtle. Put effort where it pays off: at the end-to-end
> authentication interoperability level rather than fantasy-based solutions
> that just break the network, and at the creation of systems for attribution
> and policing and prosecution of the motivated conspirators.
>
> On Dec 4, 2014, Sebastian Moeller wrote:
>>
>> Hi,
>>
>> on the danger of going off on a tangent...
>>
>> On Dec 4, 2014, at 01:45, dpreed@reed.com wrote:
>>
>>> Awesome start on the issue, in your note, Dave. Tor needs to change for
>>> several reasons - not that it isn't great, but with IPv6 and other things
>>> coming on line, plus the understanding of fq_codel's rationale, plus ... -
>>> the world can do much better. Same with VPNs.
>>>
>>> I hope we can set our sights on a convergent target that doesn't get
>>> bogged down in the tradeoffs that were made when VPNs were originally
>>> proposed. The world is no longer a bunch of disconnected networks protected
>>> by Cheswick firewalls. Cheswick said they were only temporary, and they've
>>> outlived their usefulness - they actually create security risks more than
>>> they fix them (centralizing security creates points of failure and attack
>>> that exponentially decrease the attackers' work factor). To some extent
>>> that is also true for Tor after these many years.
>>
>> But trying to keep all computers on the end/edge secure also does not
>> work/scale well, so both ends of the continuum have their issues; I would
>> not be amazed if realistically we need to keep doing both… securing the end
>> devices as well as the intermediary devices.
>>
>>> By putting the intelligence about security in the network, you basically
>>> do all the bad things that the end-to-end argument encourages you to avoid.
>>
>> I might be misinterpreting your point here, but given the devices people
>> connect to their own networks, full e2e without added layers of security
>> does not seem very practical. There is an ever-growing class of devices
>> orphaned by their makers (either explicitly, like old ipods, or implicitly
>> by lack of timely security fixes, like Siemens SCADA systems, plus old but
>> useful hardware requiring obsolete operating systems like Windows XP; the
>> list goes on...) that can still be used to good effect in a secured network
>> but cannot be trusted to access the wider internet, let alone be contacted
>> by the wider internet.
>> So unless we want to retire all those devices of dubious "security", we
>> need a layer in the network that can preempt traffic to and from specific
>> devices. In the old IPv4 days, the (for end-users) ubiquitous NAT took care
>> of the "traffic to specific devices" part to some degree. I would be happy
>> if even in the brave new IPv6 world we could keep such gatekeepers/bouncers
>> around, ideally also controlling which devices can send packets to the
>> internet.
>> I do not propose to put these things into the core of the network, but the
>> boundary between an implicitly trusted home network and the internet seems
>> like a decent compromise to me. (I would also like such a device to default
>> to "no implicit connectivity", so that each device needs to be manually
>> declared fit for the internet, and so that the users are aware of this
>> system.)
>> Since the number of connections between the home net and the internet is
>> often smaller than the number of connected devices in such a network, the
>> transfer points/routers seem like ideal candidates to implement the "access
>> control". (This does not mean that leaving end systems unsecured and
>> unpatched is a good idea, but it should at least greatly diminish the risk
>> imposed by sub-optimally secured end points, I think/hope.)
>> Being a biologist, I like to think about this as maintaining a special
>> niche for hard- or impossible-to-secure devices in my home, avoiding their
>> extinction/pwning by keeping the predators away; fitness is relative, after
>> all. Might not work perfectly, but "good enough" would do ;)
>> To cite the Russians: Doveryai, no proveryai, "Trust, but verify"…
>>
>>> We could also put congestion control in the network by re-creating
>>> admission control and requiring contractual agreements to carry traffic
>>> across every intermediary. But I think that basically destroys almost all
>>> the value of an "inter" net. It makes it a balkanized, proprietary set of
>>> subnets that have dozens of reasons why you can't connect with anyone
>>> else, and no way to be free to connect.
>>>
>>> On Wednesday, December 3, 2014 2:44pm, "Dave Taht" said:
>>>
>>>> On Wed, Dec 3, 2014 at 6:17 AM, David P. Reed wrote:
>>>>>
>>>>> Tor needs this stuff very badly.
>>>>
>>>> Tor has many, many problematic behaviors relevant to congestion control
>>>> in general. Let me paste a bit of private discussion I'd had on it in a
>>>> second, but a very good paper that touched upon it all was:
>>>>
>>>> DefenestraTor: Throwing out Windows in Tor
>>>> http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf
>>>>
>>>> Honestly tor needs to move to udp, and hide in all the upcoming webrtc
>>>> traffic....
>>>>
>>>> http://blog.mozilla.org/futurereleases/2014/10/16/test-the-new-firefox-hello-webrtc-feature-in-firefox-beta/
>>>>
>>>> webrtc needs some sort of non-centralized rendezvous mechanism, but I am
>>>> REALLY happy to see calls and video stay entirely inside my network when
>>>> they can be negotiated as such.
>>>>
>>>> https://plus.google.com/u/0/107942175615993706558/posts/M4xUtpCKJ4P
>>>>
>>>> And of course, people are busily reinventing torrent in webrtc without
>>>> paying attention to congestion control at all.
>>>>
>>>> https://github.com/feross/webtorrent/issues/39
>>>>
>>>> Giving access to udp to javascript programmers... what could go wrong?
>>>> :/
>>>>
>>>>> I do wonder whether we should focus on vpn's rather than end-to-end
>>>>> encryption that does not leak secure information through from inside,
>>>>> as the plan seems to do.
>>>>
>>>> "plan"?
>>>>
>>>> I like e2e encryption. I also like overlay networks. And meshes.
>>>> And working dns and service discovery. And low latency.
>>>>
>>>> vpns are useful abstractions for sharing an address space you
>>>> may not want to share more widely.
>>>>
>>>> and: I've taken a lot of flak about how fq doesn't help on conventional
>>>> vpns, and, well, I just came up with an unconventional vpn idea that
>>>> might have some legs here... (certainly in my case tinc as constructed
>>>> already, no patches, solves hooking together the 12 networks I have
>>>> around the globe, mostly)
>>>>
>>>> As for "leaking information", packet size and frequency are generally an
>>>> obvious indicator of a given traffic type, some padding added or not.
>>>> There is one piece of plaintext in tinc (the seqno), also. It also uses
>>>> a fixed port number for both sides of the connection (perhaps it
>>>> shouldn't).
>>>>
>>>> So I don't necessarily see a difference between sending a whole lot of
>>>> varying data on one tuple
>>>>
>>>> 2001:db8::1 <-> 2001:db8:1::1 on port 655
>>>>
>>>> vs
>>>>
>>>> 2001:db8::1 <-> 2001:db8:1::1 port 655
>>>> 2001:db8::2 <-> 2001:db8:1::1 port 655
>>>> 2001:db8::3 <-> 2001:db8:1::1 port 655
>>>> 2001:db8::4 <-> 2001:db8:1::1 port 655
>>>> ....
>>>>
>>>> which solves the fq problem on a vpn like tinc neatly. A security
>>>> feature could be source-specific routing, where we send stuff over
>>>> different paths from different ipv6 source addresses... and mixing up
>>>> the src/dest ports more, but that complicates the fq portion of the
>>>> algo.... my thought for an initial implementation is to just hard-code
>>>> the ipv6 address range.
>>>>
>>>> I think however that adding tons and tons of ipv6 addresses to a given
>>>> interface is probably slow, and might break things like nd and/or
>>>> multicast...
>>>>
>>>> what would be cooler would be if you could allocate an entire /64 (or
>>>> /118) to the vpn daemon
>>>>
>>>> bindtoaddress(2001:db8::/118) (give me all the data for 1024 ips)
>>>>
>>>> but I am not sure how to go about doing that.. (a rough sketch of one
>>>> possibility is further down)
>>>>
>>>> ...moving back to a formerly private discussion about tor's woes...
>>>>
>>>> "This conversation is a bit separate from #11197 (which is an
>>>> implementation issue in obfsproxy), so separate discussion somewhere
>>>> would probably be required.
>>>>
>>>> So, there appears to be a slight misconception about how tor traffic
>>>> travels across the Internet that I will attempt to clarify, and
>>>> hopefully not get too terribly wrong.
>>>>
>>>> Each step of a given connection over tor involves multiple TCP/IP
>>>> connections. To use a standard example of someone trying to watch Cat
>>>> Videos on the "real internet", it will look approximately like this:
>>>>
>>>> Client <-> Guard <-> Relay <-> Exit <-> Cat Videos
>>>>
>>>> Each step is a separate TCP/IP connection, authenticated and encrypted
>>>> via TLS (TLS is likewise terminated at each hop). Using a pluggable
>>>> transport encapsulates the first hop's TLS session with a different
>>>> protocol, be it obfs2, obfs3, or something else.
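Coming back for a moment to the bindtoaddress(2001:db8::/118) wish above,
since I said I wasn't sure how to do it: a minimal, untested sketch of one
way it might work on linux is to tell the kernel the whole prefix is local
(something like "ip -6 route add local 2001:db8::/118 dev lo", assuming the
kernel honors that for v6), and then pick the source address per packet at
sendmsg() time with an IPV6_PKTINFO cmsg, hashing the inner packet's flow
fields to choose one of the 1024 addresses. sendmsg() and IPV6_PKTINFO are
real; send_spread(), tunnel_base, and inner_flow_hash() below are invented
names, purely illustrative:

/*
 * Sketch: spread tunnel packets across a /118 of source addresses so
 * an fq_codel'd bottleneck sees one "flow" per inner flow.  Assumes the
 * kernel accepts the whole prefix as local, e.g. via
 *   ip -6 route add local 2001:db8::/118 dev lo
 * inner_flow_hash() is an invented placeholder for a hash over the
 * inner packet's src/dst/proto/port fields.
 */
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

extern uint32_t inner_flow_hash(const uint8_t *pkt, size_t len);

static struct in6_addr tunnel_base;     /* e.g. 2001:db8:: */

ssize_t send_spread(int fd, const uint8_t *pkt, size_t len,
                    const struct sockaddr_in6 *peer)
{
    /* pick one of the 1024 source addresses in the /118 */
    uint32_t h = inner_flow_hash(pkt, len) & 0x3ff;
    struct in6_addr src = tunnel_base;
    src.s6_addr[14] |= (h >> 8) & 0x03;
    src.s6_addr[15]  = h & 0xff;

    struct iovec iov = { .iov_base = (void *)pkt, .iov_len = len };
    union {
        char buf[CMSG_SPACE(sizeof(struct in6_pktinfo))];
        struct cmsghdr align;
    } ctl;
    memset(&ctl, 0, sizeof(ctl));

    struct msghdr msg = {
        .msg_name       = (void *)peer,
        .msg_namelen    = sizeof(*peer),
        .msg_iov        = &iov,
        .msg_iovlen     = 1,
        .msg_control    = ctl.buf,
        .msg_controllen = sizeof(ctl.buf),
    };
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = IPPROTO_IPV6;
    cm->cmsg_type  = IPV6_PKTINFO;
    cm->cmsg_len   = CMSG_LEN(sizeof(struct in6_pktinfo));

    struct in6_pktinfo *pi = (struct in6_pktinfo *)CMSG_DATA(cm);
    memset(pi, 0, sizeof(*pi));
    pi->ipi6_addr = src;            /* per-flow source address */

    return sendmsg(fd, &msg, 0);
}

If that works, the bottleneck's fq sees one 5-tuple per inner flow without
configuring a thousand addresses on the interface; what it does to neighbor
discovery or source-address validation elsewhere, I don't know. Back to the
tor excerpt: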
>>>> The cat videos are passed through this path of many TCP/IP connections
>>>> across things called Circuits that are created/extended by the Client
>>>> one hop at a time (so in the example above, the kitty cats travel across
>>>> 4 TCP/IP connections, relaying data across a Circuit that spans from the
>>>> Client to the Exit; if my art skills were up to it, I would draw a
>>>> diagram).
>>>>
>>>> Circuits are currently required to provide reliable, in-order delivery.
>>>>
>>>> In addition to the standard congestion control provided by TCP/IP on a
>>>> per-hop basis, there is Circuit-level flow control *and* "end to end"
>>>> flow control in the form of RELAY_SENDME cells, but given that multiple
>>>> circuits can end up being multiplexed over a singular TCP/IP connection,
>>>> propagation of these RELAY_SENDME cells can get delayed due to HOL
>>>> issues.
>>>>
>>>> So, with that quick and dirty overview out of the way:
>>>>
>>>> * "Ah, so if ecn is enabled it can be used?"
>>>>
>>>> ECN will be used if it is enabled, *but* the congestion information will
>>>> not get propagated to the source/destination of a given stream.
>>>>
>>>> * "Does it retain iw10 (the Linux default nowadays, sadly)?"
>>>>
>>>> Each TCP/IP connection, if sent from a host that uses an obnoxiously
>>>> large initial window, will have an obnoxiously large initial window.
>>>>
>>>> It is worth noting that since multiple Circuits originating from
>>>> potentially numerous clients can and will reuse existing TCP/IP
>>>> connections if able to (see 5.3.1 of the tor spec), dropping packets
>>>> between tor relays is kind of bad, because all of the separate
>>>> encapsulated flows sharing the singular TCP/IP link will suffer (ECN
>>>> would help here). This situation is rather unfortunate, as the good
>>>> active queue management algorithms drop packets (when ECN is not
>>>> available).
>>>>
>>>> A better summary of tor's flow control/bufferbloat woes is given in:
>>>>
>>>> DefenestraTor: Throwing out Windows in Tor
>>>> http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf
>>>>
>>>> The N23 algorithm suggested in the paper did not end up getting
>>>> implemented in Tor, but I do not remember the reason off the top of my
>>>> head."
>>>>
>>>>> On Dec 3, 2014, Guus Sliepen wrote:
>>>>>
>>>>>> On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> https://github.com/dtaht/tinc
>>>>>>>
>>>>>>> I successfully converted tinc to use sendmsg and recvmsg, acquire (at
>>>>>>> least on linux) the TTL/Hoplimit and IP_TOS/IPV6_TCLASS packet fields,
>>>>>>
>>>>>> Windows does not have sendmsg()/recvmsg(), but the BSDs support it.
>>>>>>
>>>>>>> as well as SO_TIMESTAMPNS, and use a higher-resolution internal clock.
>>>>>>>
>>>>>>> Got passing through the dscp values to work also, but:
>>>>>>>
>>>>>>> A) encapsulation of ecn-capable-marked packets, and availability in
>>>>>>> the outer header, without correct decapsulation, doesn't work well.
>>>>>>>
>>>>>>> The outer packet gets marked, but by default the marking doesn't make
>>>>>>> it back into the inner packet when decoded.
>>>>>>
>>>>>> Is the kernel stripping the ECN bits provided by userspace? In the
>>>>>> code in your git branch you strip the ECN bits out yourself.
>>>>>>
>>>>>>> So communicating somehow that a path can take ecn (and/or diffserv
>>>>>>> markings) is needed between tinc daemons.
>>>>>>> I thought of perhaps crafting a special icmp message marked with CE,
>>>>>>> but am open to ideas that would be backward compatible.
>>>>>>
>>>>>> PMTU probes are used to discover whether UDP works and how big the
>>>>>> path MTU is; maybe they could be used to discover whether ECN works as
>>>>>> well? Set one of the ECN bits on some of the PMTU probes, and if you
>>>>>> receive a probe with that ECN bit set, also set it on the probe reply.
>>>>>> If you successfully receive a reply with ECN bits set, then you know
>>>>>> ECN works. Since the remote side just echoes the contents of the
>>>>>> probe, you could also put a copy of the ECN bits in the probe payload,
>>>>>> and then you can detect if the ECN bits got zeroed. You can also
>>>>>> define an OPTION_ECN in src/connection.h, so nodes can announce their
>>>>>> support for ECN, but that should not be necessary I think.
>>>>>>
>>>>>>> B) I have long theorized that a lot of userspace vpns bottleneck on
>>>>>>> the read-and-encapsulate step, and being strict FIFOs, gradually
>>>>>>> accumulate delay until finally they run out of read socket buffer
>>>>>>> space and start dropping packets.
>>>>>>
>>>>>> Well, encryption and decryption take a lot of CPU time, but context
>>>>>> switches are also bad.
>>>>>>
>>>>>> Tinc is treating UDP in a strictly FIFO way, but it actually does use
>>>>>> a RED algorithm when tunneling over TCP. That said, it only looks at
>>>>>> its own buffers to determine when to drop packets, and those only come
>>>>>> into play once the kernel's TCP buffers are filled.
>>>>>>
>>>>>>> so I had a couple of thoughts towards using multiple rx queues in the
>>>>>>> vtun interface, and/or trying to read more than one packet at a time
>>>>>>> (via recvmmsg) and do some level of fair queueing and queue
>>>>>>> management (codel) inside tinc itself. I think that's pretty doable
>>>>>>> without modifying the protocol any, but I'm not sure of its value
>>>>>>> until I saturate some cpu more. (a rough sketch of such a batched
>>>>>>> read is further down)
>>>>>>
>>>>>> I'd welcome any work in this area :)
>>>>>>
>>>>>>> (and if you thought recvmsg was complex, look at recvmmsg)
>>>>>>
>>>>>> It seems someone is already working on that, see
>>>>>> https://github.com/jasdeep-hundal/tinc.
>>>>>>
>>>>>>> D)
>>>>>>>
>>>>>>> the bottleneck link above is actually not tinc but the gateway, and
>>>>>>> as the gateway reverts to codel behavior on a single encapsulated
>>>>>>> flow encapsulating all the other flows, we end up with about 40ms of
>>>>>>> induced delay on this test. While I have a better codel (gets below
>>>>>>> 20ms latency, not deployed), *fq*_codel, by identifying individual
>>>>>>> flows, gets the induced delay on those flows down below 5ms.
>>>>>>
>>>>>> But that should improve with ECN if fq_codel is configured to use
>>>>>> that, right?
>>>>>>
>>>>>>> At one level, tinc being so nicely meshy means that the "fq" part of
>>>>>>> fq_codel on the gateway will have more chance to work against the
>>>>>>> multiple vpn flows it generates for all the potential vpn
>>>>>>> endpoints... but at another... lookie here! ipv6! 2^64 addresses or
>>>>>>> more to use! and port space to burn! What if I could make tinc open
>>>>>>> up 1024 ports per connection, and have it fq all its flows over
>>>>>>> those? What could go wrong?
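A quick aside before Guus's reply, on the recvmmsg() idea handwaved at above:
a batched read loop might look very roughly like the sketch below. Read up to
a budget of datagrams in one syscall, take one timestamp for the whole batch,
then hand each packet to whatever per-flow queueing / codel-ish structure
tinc might grow. Untested; recvmmsg() is a real linux syscall, but
read_batch() and enqueue_for_fq() are invented placeholders, not anything in
tinc today:

/* Rough sketch of a batched UDP read using recvmmsg(2) on linux.
 * enqueue_for_fq() is an invented placeholder for per-flow queueing
 * plus codel-style dropping inside the daemon. */
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define BATCH   64
#define MTU_BUF 2048

extern void enqueue_for_fq(const uint8_t *pkt, size_t len,
                           const struct sockaddr_in6 *from,
                           const struct timespec *now);

int read_batch(int fd)
{
    static uint8_t bufs[BATCH][MTU_BUF];
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH];
    struct sockaddr_in6 from[BATCH];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len  = MTU_BUF;
        msgs[i].msg_hdr.msg_iov     = &iov[i];
        msgs[i].msg_hdr.msg_iovlen  = 1;
        msgs[i].msg_hdr.msg_name    = &from[i];
        msgs[i].msg_hdr.msg_namelen = sizeof(from[i]);
    }

    /* non-blocking: grab whatever is queued right now, up to BATCH */
    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    if (n <= 0)
        return n;

    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);   /* one timestamp per batch */

    for (int i = 0; i < n; i++)
        enqueue_for_fq(bufs[i], msgs[i].msg_len, &from[i], &now);

    return n;
}

The win, if there is one, is fewer syscalls and wakeups per packet under
load, with the queue management moved into userspace where we can be smarter
than a strict FIFO. Back to Guus: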
>>>>>> Right, hash the header of the original packets, and then select a port
>>>>>> or address based on the hash? What about putting that hash in the flow
>>>>>> label of outer packets? Any routers that would actually treat those as
>>>>>> separate flows?
>>>>>
>>>>> -- Sent from my Android device with K-@ Mail. Please excuse my brevity.
>>>>
>>>> --
>>>> Dave Täht
>>>>
>>>> http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks
>
> -- Sent from my Android device with K-@ Mail. Please excuse my brevity.

--
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks