From: Sebastian Moeller
To: dpreed@reed.com
Cc: cerowrt-devel@lists.bufferbloat.net
Date: Thu, 4 Dec 2014 10:38:38 +0100
Subject: Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priority inherit), ecn, and fq_codel support

Hi,

at the risk of going off on a tangent...

On Dec 4, 2014, at 01:45, dpreed@reed.com wrote:

> Awesome start on the issue, in your note, Dave. Tor needs to change for several reasons - not that it isn't great, but with IPv6 and other things coming on line, plus the understanding of fq_codel's rationale, plus ... - the world can do much better. Same with VPNs.
>
> I hope we can set our sights on a convergent target that doesn't get bogged down in the tradeoffs that were made when VPNs were originally proposed. The world is no longer a bunch of disconnected networks protected by Cheswick firewalls. Cheswick said they were only temporary, and they've outlived their usefulness - they actually create security risks more than they fix them (centralizing security creates points of failure and attack that exponentially decrease the attackers' work factor). To some extent that is also true for Tor after these many years. But trying to keep all computers on the end/edge secure also does not work/scale well, so both ends of the continuum have their issues; I would not be amazed if realistically we need to keep doing both... securing the end devices as well as intermediary devices.
>
> By putting the intelligence about security in the network, you basically do all the bad things that the end-to-end argument encourages you to avoid.

I might be misinterpreting your point here, but given the devices people connect to their own networks, full end-to-end connectivity without added layers of security does not seem very practical.

There is an ever-growing class of devices orphaned by their makers (either explicitly, like old iPods, or implicitly by lack of timely security fixes, like Siemens SCADA systems, plus old but useful hardware requiring obsolete operating systems like Windows XP; the list goes on...) that can still be used to good effect in a secured network but cannot be trusted to access the wider internet, let alone be contacted by the wider internet. So unless we want to retire all those devices of dubious "security", we need a layer in the network that can filter traffic to and from specific devices. In the old IPv4 days, the NAT that was ubiquitous for end users took care of the "traffic to specific devices" part to some degree. I would be happy if even in the brave new IPv6 world we could keep such gatekeepers/bouncers around, ideally also controlling which devices can send packets to the internet.

I do not propose to put these things into the core of the network, but the boundary between an implicitly trusted home network and the internet seems like a decent compromise to me. (I would also like such a device to default to "no implicit connectivity", so that each device needs to be manually declared fit for the internet and users are aware of this system.) Since the number of connections between the home net and the internet is often smaller than the number of connected devices in such a network, the transfer points/routers seem like ideal candidates to implement the "access control". (This does not mean that leaving end systems unsecured and unpatched is a good idea, but it should at least greatly diminish the risk posed by sub-optimally secured endpoints, I think/hope.)

Being a biologist, I like to think about this as maintaining a special niche for hard- or impossible-to-secure devices in my home, avoiding their extinction/pwning by keeping the predators away; as fitness is relative. Might not work perfectly, but "good enough" would do ;)

To quote the Russians: "Doveryai, no proveryai" - "Trust, but verify"...

> We could also put congestion control in the network by re-creating admission control and requiring contractual agreements to carry traffic across every intermediary. But I think that basically destroys almost all the value of an "inter" net. It makes it a balkanized proprietary set of subnets that have dozens of reasons why you can't connect with anyone else, and no way to be free to connect.
>
> On Wednesday, December 3, 2014 2:44pm, "Dave Taht" said:
>
> > On Wed, Dec 3, 2014 at 6:17 AM, David P. Reed wrote:
> > > Tor needs this stuff very badly.
> >
> > Tor has many, many problematic behaviors relevant to congestion control in general. Let me paste a bit of private discussion I'd had on it in a second, but a very good paper that touched upon it all was:
> >
> > DefenestraTor: Throwing out Windows in Tor
> > http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf
> >
> > Honestly tor needs to move to udp, and hide in all the upcoming webrtc traffic....
> >
> > http://blog.mozilla.org/futurereleases/2014/10/16/test-the-new-firefox-hello-webrtc-feature-in-firefox-beta/
> >
> > webrtc needs some sort of non-centralized rendezvous mechanism, but I am REALLY happy to see calls and video stay entirely inside my network when they can be negotiated as such.
> >
> > https://plus.google.com/u/0/107942175615993706558/posts/M4xUtpCKJ4P
> >
> > And of course, people are busily reinventing torrent in webrtc without paying attention to congestion control at all.
> >
> > https://github.com/feross/webtorrent/issues/39
> >
> > Giving access to udp to javascript programmers... what could go wrong? :/
> >
> > > I do wonder whether we should focus on vpn's rather than end to end encryption that does not leak secure information through from inside as the plan seems to do.
> >
> > "plan"?
> >
> > I like e2e encryption. I also like overlay networks. And meshes. And working dns and service discovery. And low latency.
> >
> > vpns are useful abstractions for sharing an address space you may not want to share more widely.
> >
> > and: I've taken a lot of flak about how fq doesn't help on conventional vpns, and well, just came up with an unconventional vpn idea that might have some legs here... (certainly in my case tinc as constructed already, no patches, solves hooking together the 12 networks I have around the globe, mostly)
> >
> > As for "leaking information", packet size and frequency is generally an obvious indicator of a given traffic type, some padding added or no. There is one piece of plaintext in tinc (the seqno), also. It also uses a fixed port number for both sides of the connection (perhaps it shouldn't).
> >
> > So I don't necessarily see a difference between sending a whole lot of varying data on one tuple
> >
> > 2001:db8::1 <-> 2001:db8:1::1 on port 655
> >
> > vs
> >
> > 2001:db8::1 <-> 2001:db8:1::1 port 655
> > 2001:db8::2 <-> 2001:db8:1::1 port 655
> > 2001:db8::3 <-> 2001:db8:1::1 port 655
> > 2001:db8::4 <-> 2001:db8:1::1 port 655
> > ....
> >
> > which solves the fq problem on a vpn like tinc neatly. A security feature could be source-specific routing, where we send stuff over different paths from different ipv6 source addresses... and mixing up the src/dest ports more, but that complexifies the fq portion of the algo.... my thought for an initial implementation is to just hard-code the ipv6 address range.
> >
> > I think however that adding tons and tons of ipv6 addresses to a given interface is probably slow, and might break things like nd and/or multicast...
> >
> > what would be cooler would be if you could allocate an entire /64 (or /118) to the vpn daemon
> >
> > bindtoaddress(2001:db8::/118) (give me all the data for 1024 ips)
> >
> > but I am not sure how to go about doing that..
> >
> > ...moving back to a formerly private discussion about tor's woes...
> >
> > "This conversation is a bit separate from #11197 (which is an implementation issue in obfsproxy), so separate discussion somewhere would probably be required.
> >
> > So, there appears to be a slight misconception on how tor traffic travels across the Internet that I will attempt to clarify, and hopefully not get too terribly wrong.
> >
> > Each step of a given connection over tor involves multiple TCP/IP connections. To use a standard example of someone trying to watch Cat Videos on the "real internet", it will look approximately like this:
> >
> > Client <-> Guard <-> Relay <-> Exit <-> Cat Videos
> >
> > Each step is a separate TCP/IP connection, authenticated and encrypted via TLS (TLS is likewise terminated at each hop).
> > Using a pluggable transport encapsulates the first hop's TLS session with a different protocol, be it obfs2, obfs3, or something else.
> >
> > The cat videos are passed through this path of many TCP/IP connections across things called Circuits that are created/extended by the Client one hop at a time (so in the example above, the kitty cats travel across 4 TCP/IP connections, relaying data across a Circuit that spans from the Client to the Exit; if my art skills were up to it, I would draw a diagram).
> >
> > Circuits are currently required to provide reliable, in-order delivery.
> >
> > In addition to the standard congestion control provided by TCP/IP on a per-hop basis, there is Circuit-level flow control *and* "end to end" flow control in the form of RELAY_SENDME cells, but given that multiple circuits can end up being multiplexed over a singular TCP/IP connection, propagation of these RELAY_SENDME cells can get delayed due to HOL issues.
> >
> > So, with that quick and dirty overview out of the way:
> >
> > * "Ah so if ecn is enabled it can be used?"
> >
> > ECN will be used if it is enabled, *but* the congestion information will not get propagated to the source/destination of a given stream.
> >
> > * "Does it retain iw10 (the Linux default nowadays sadly)?"
> >
> > Each TCP/IP connection, if sent from a host that uses an obnoxiously large initial window, will have an obnoxiously large initial window.
> >
> > It is worth noting that, since multiple Circuits originating from potentially numerous clients can and will reuse existing TCP/IP connections if able to (see 5.3.1 of the tor spec), dropping packets between tor relays is kind of bad, because all of the separate encapsulated flows sharing the singular TCP/IP link will suffer (ECN would help here). This situation is rather unfortunate, as the good active queue management algorithms drop packets (when ECN is not available).
> >
> > A better summary of tor's flow control/bufferbloat woes is given in:
> >
> > DefenestraTor: Throwing out Windows in Tor
> > http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf
> >
> > The N23 algorithm suggested in the paper did not end up getting implemented into Tor, but I do not remember the reason off the top of my head."
> >
> > > On Dec 3, 2014, Guus Sliepen wrote:
> > >>
> > >> On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote:
> > >>
> > >> [...]
> > >>>
> > >>> https://github.com/dtaht/tinc
> > >>>
> > >>> I successfully converted tinc to use sendmsg and recvmsg, acquire (at least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet fields,
> > >>
> > >> Windows does not have sendmsg()/recvmsg(), but the BSDs support it.
> > >>
> > >>> as well as SO_TIMESTAMPNS, and use a higher resolution internal clock. Got passing through the dscp values to work also, but:
> > >>>
> > >>> A) encapsulation of ecn capable marked packets, and availability in the outer header, without correct decapsulation, doesn't work well.
> > >>>
> > >>> The outer packet gets marked, but by default the marking doesn't make it back into the inner packet when decoded.
> > >>
> > >> Is the kernel stripping the ECN bits provided by userspace? In the code in your git branch you strip the ECN bits out yourself.
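
A minimal sketch (not code from the tinc tree; the recv_with_tos() helper name is mine) of how the recvmsg() side of this can pull the TOS/traffic-class byte, which carries both the DSCP and the ECN bits, out of the kernel's ancillary data on Linux:

/* Minimal sketch, not code from the tinc tree: read the TOS/TCLASS byte
 * (DSCP + ECN) of an incoming UDP packet via recvmsg() ancillary data on
 * Linux. The socket must first opt in with IP_RECVTOS / IPV6_RECVTCLASS. */
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>

ssize_t recv_with_tos(int fd, void *buf, size_t len, int *tos_out)
{
    char cbuf[256];
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return n;

    *tos_out = -1;  /* -1 means "no TOS/TCLASS info seen" */
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_TOS)
            *tos_out = *(unsigned char *)CMSG_DATA(c);     /* IPv4: one byte */
        else if (c->cmsg_level == IPPROTO_IPV6 && c->cmsg_type == IPV6_TCLASS)
            memcpy(tos_out, CMSG_DATA(c), sizeof(int));    /* IPv6: an int */
    }
    return n;
}

/* Enable delivery of the ancillary data once, right after socket creation:
 *   int on = 1;
 *   setsockopt(fd, IPPROTO_IP,   IP_RECVTOS,      &on, sizeof(on));
 *   setsockopt(fd, IPPROTO_IPV6, IPV6_RECVTCLASS, &on, sizeof(on));
 * The ECN field is then (tos & 0x03) and the DSCP is (tos >> 2). */

On the sending side, the outer packet's byte can be set socket-wide with setsockopt(IP_TOS) / setsockopt(IPV6_TCLASS) before sending, which is one way the DSCP passthrough described above can be realized.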
> > >>> So communicating somehow that a path can take ecn (and/or diffserv markings) is needed between tinc daemons. I thought of perhaps crafting a special icmp message marked with CE but am open to ideas that would be backward compatible.
> > >>
> > >> PMTU probes are used to discover whether UDP works and how big the path MTU is, maybe it could be used to discover whether ECN works as well? Set one of the ECN bits on some of the PMTU probes, and if you receive a probe with that ECN bit set, also set it on the probe reply. If you successfully receive a reply with ECN bits set, then you know ECN works. Since the remote side just echoes the contents of the probe, you could also put a copy of the ECN bits in the probe payload, and then you can detect if the ECN bits got zeroed. You can also define an OPTION_ECN in src/connection.h, so nodes can announce their support for ECN, but that should not be necessary I think.
> > >>
> > >>> B) I have long theorized that a lot of userspace vpns bottleneck on the read and encapsulate step, and being strict FIFOs, gradually accumulate delay until finally they run out of read socket buffer space and start dropping packets.
> > >>
> > >> Well, encryption and decryption takes a lot of CPU time, but context switches are also bad.
> > >>
> > >> Tinc is treating UDP in a strictly FIFO way, but actually it does use a RED algorithm when tunneling over TCP. That said, it only looks at its own buffers to determine when to drop packets, and those only come into play once the kernel's TCP buffers are filled.
> > >>
> > >>> so I had a couple thoughts towards using multiple rx queues in the vtun interface, and/or trying to read more than one packet at a time (via recvmmsg) and do some level of fair queueing and queue management (codel) inside tinc itself. I think that's pretty doable without modifying the protocol any, but I'm not sure of its value until I saturate some cpu more.
> > >>
> > >> I'd welcome any work in this area :)
> > >>
> > >>> (and if you thought recvmsg was complex, look at recvmmsg)
> > >>
> > >> It seems someone is already working on that, see https://github.com/jasdeep-hundal/tinc.
> > >>
> > >>> D)
> > >>>
> > >>> the bottleneck link above is actually not tinc but the gateway, and as the gateway reverts to codel behavior on a single encapsulated flow encapsulating all the other flows, we end up with about 40ms of induced delay on this test. While I have a better codel (gets below 20ms latency, not deployed), *fq*_codel by identifying individual flows gets the induced delay on those flows down below 5ms.
> > >>
> > >> But that should improve with ECN if fq_codel is configured to use that, right?
> > >>
> > >>> At one level, tinc being so nicely meshy means that the "fq" part of fq_codel on the gateway will have more chance to work against the multiple vpn flows it generates for all the potential vpn endpoints...
> > >>>
> > >>> but at another... lookie here! ipv6! 2^64 addresses or more to use! and port space to burn! What if I could make tinc open up 1024 ports per connection, and have it fq all its flows over those? What could go wrong?
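
The "1024 ports per connection" idea above boils down to hashing the inner packet's flow identifiers and using the hash to pick one of N local source ports, so that a flow-aware queue such as fq_codel along the path sees N distinct tuples instead of a single one. A minimal illustrative sketch follows; PORT_BASE, PORT_COUNT, flow_hash() and pick_source_port() are made up for illustration and are not part of tinc:

/* Illustrative sketch only, not a tinc patch: spread tunnel traffic over
 * N source ports by hashing the inner packet's flow fields, so an
 * fq_codel gateway along the path sees N distinct 5-tuples. */
#include <stdint.h>
#include <stddef.h>

#define PORT_BASE   655     /* first local port of the range (example value) */
#define PORT_COUNT  1024    /* size of the port range to spread flows over */

/* FNV-1a over the fields that identify the inner flow
 * (e.g. inner src/dst address and ports, extracted by the caller). */
static uint32_t flow_hash(const uint8_t *key, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= key[i];
        h *= 16777619u;
    }
    return h;
}

/* Pick the local UDP source port to send this encapsulated packet from. */
static uint16_t pick_source_port(const uint8_t *flow_key, size_t key_len)
{
    return (uint16_t)(PORT_BASE + (flow_hash(flow_key, key_len) % PORT_COUNT));
}

Anything flow-aware on the path would then spread these sub-flows across separate queues, which is exactly the effect being asked for.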
> > >> Right, hash the header of the original packets, and then select a port or address based on the hash? What about putting that hash in the flow label of outer packets? Any routers that would actually treat those as separate flows?
> > >
> > > --
> > > Sent from my Android device with K-@ Mail. Please excuse my brevity.
> >
> > --
> > Dave Täht
> >
> > http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
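
To round off the recvmmsg() idea discussed above (reading a batch of packets per system call so that fair queueing and codel could be applied inside tinc itself), here is a minimal sketch of such a batched read loop on Linux; BATCH, PKT_MAX, drain_socket() and the handle_packet() hook are illustrative placeholders, not anything that exists in tinc today:

/* Minimal sketch of a batched UDP read with recvmmsg() on Linux.
 * BATCH, PKT_MAX and handle_packet() are illustrative placeholders. */
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH    16
#define PKT_MAX  2048

extern void handle_packet(const char *data, size_t len);   /* placeholder hook */

void drain_socket(int fd)
{
    static char bufs[BATCH][PKT_MAX];
    struct mmsghdr msgs[BATCH];
    struct iovec iovs[BATCH];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len  = PKT_MAX;
        msgs[i].msg_hdr.msg_iov    = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: take whatever is already queued, up to BATCH packets,
     * in one system call instead of one recvmsg() per packet. */
    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    for (int i = 0; i < n; i++)
        handle_packet(bufs[i], msgs[i].msg_len);
}

Draining several packets per wakeup is also what would give a userspace daemon a small queue of its own to apply fq/codel to, instead of reading and writing strictly one packet at a time.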