Date: Wed, 3 Dec 2014 19:45:09 -0500 (EST)
From: dpreed@reed.com
To: "Dave Taht"
References: <20141203120246.GO10533@sliepen.org>
	<892513fe-8e57-4ee9-be7d-423a3afb4fba@reed.com>
Message-ID: <1417653909.838517290@apps.rackspace.com>
Cc: Guus Sliepen <guus@tinc-vpn.org>, tinc-devel@tinc-vpn.org,
	cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough
	(priorityinherit), ecn, and fq_codel support
List-Id: Development issues regarding the cerowrt test router project
Awesome start on the issue, in your note, Dave. Tor needs to change for
several reasons - not that it isn't great, but with IPv6 and other things
coming on line, plus the understanding of fq_codel's rationale, plus ... -
the world can do much better. Same with VPNs.

I hope we can set our sights on a convergent target that doesn't get bogged
down in the tradeoffs that were made when VPNs were originally proposed.
The world is no longer a bunch of disconnected networks protected by
Cheswick firewalls. Cheswick said they were only temporary, and they've
outlived their usefulness - they actually create security risks more than
they fix them (centralizing security creates points of failure and attack
that exponentially decrease the attackers' work factor). To some extent
that is also true for Tor after these many years.

By putting the intelligence about security in the network, you basically do
all the bad things that the end-to-end argument encourages you to avoid. We
could also put congestion control in the network by re-creating admission
control and requiring contractual agreements to carry traffic across every
intermediary. But I think that basically destroys almost all the value of
an "inter" net. It makes it a balkanized proprietary set of subnets that
have dozens of reasons why you can't connect with anyone else, and no way
to be free to connect.


On Wednesday, December 3, 2014 2:44pm, "Dave Taht" <dave.taht@gmail.com> said:
> On Wed, Dec 3, 2014 at 6:17 AM, David P. Reed <dpreed@reed.com> wrote:
> > Tor needs this stuff very badly.
>
> Tor has many, many problematic behaviors relevant to congestion control
> in general. Let me paste a bit of private discussion I'd had on it in a second,
> but a very good paper that touched upon it all was:
>
> DefenestraTor: Throwing out Windows in Tor
> http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf
>
> Honestly tor needs to move to udp, and hide in all the upcoming
> webrtc traffic....
>
> http://blog.mozilla.org/futurereleases/2014/10/16/test-the-new-firefox-hello-webrtc-feature-in-firefox-beta/
>
> webrtc needs some sort of non-centralized rendezvous mechanism, but I am REALLY
> happy to see calls and video stay entirely inside my network when they can be
> negotiated as such.
>
> https://plus.google.com/u/0/107942175615993706558/posts/M4xUtpCKJ4P
>
> And of course, people are busily reinventing torrent in webrtc without
> paying attention to congestion control at all.
>
> https://github.com/feross/webtorrent/issues/39
>
> Giving access to udp to javascript programmers... what could go wrong?
> :/
>
> > I do wonder whether we should focus on vpn's rather than end to end
> > encryption that does not leak secure information through from inside as the
> > plan seems to do.
>
> "plan"?
>
> I like e2e encryption. I also like overlay networks. And meshes.
> And working dns and service discovery. And low latency.
>
> vpns are useful abstractions for sharing an address space you
> may not want to share more widely.
>
> and: I've taken a lot of flack about how fq doesn't help on conventional
> vpns, and well, just came up with an unconventional vpn idea,
> that might have some legs here... (certainly in my case tinc
> as constructed already, no patches, solves hooking together the
> 12 networks I have around the globe, mostly)
>
> As for "leaking information", packet size and frequency is generally
> an obvious indicator of a given traffic type, some padding added or
> no. There is one piece of plaintext
> in tinc (the seqno), also. It also uses a fixed port number for both
> sides of the connection (perhaps it shouldn't)
>
> So I don't necessarily see a difference between sending a whole lot of
> varying data on one tuple
>
> 2001:db8::1 <-> 2001:db8:1::1 on port 655
>
> vs
>
> 2001:db8::1 <-> 2001:db8:1::1 port 655
> 2001:db8::2 <-> 2001:db8:1::1 port 655
> 2001:db8::3 <-> 2001:db8:1::1 port 655
> 2001:db8::4 <-> 2001:db8:1::1 port 655
> ....
>
> which solves the fq problem on a vpn like tinc neatly. A security feature
> could be source specific routing where we send stuff over different paths
> from different ipv6 source addresses... and mixing up the src/dest ports
> more but that complexifies the fq portion of the algo.... my thought
> for an initial implementation is to just hard code the ipv6 address range.
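A minimal sketch of the flow-pinning half of that idea (assuming Linux, and
assuming the 1024 source addresses are already usable on the host - see the
AnyIP sketch further down; fnv1a() and send_flowpinned() are illustrative
names, not tinc code):

    #define _GNU_SOURCE             /* for struct in6_pktinfo on glibc */
    #include <netinet/in.h>
    #include <string.h>
    #include <stdint.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #define NSRC 1024               /* the 2001:db8::/118 example above */

    static uint32_t fnv1a(const uint8_t *p, size_t n) {
        uint32_t h = 2166136261u;
        while (n--) h = (h ^ *p++) * 16777619u;
        return h;
    }

    /* Hash the inner packet's flow tuple and pin one of NSRC source
     * addresses via IPV6_PKTINFO, so an fq_codel gateway hashing the
     * outer 5-tuple sees NSRC distinct flows instead of one. */
    static ssize_t send_flowpinned(int fd, const void *pkt, size_t len,
                                   const struct sockaddr_in6 *dst,
                                   const struct in6_addr src[NSRC],
                                   const uint8_t *flowkey, size_t keylen)
    {
        struct in6_pktinfo pi = { .ipi6_addr = src[fnv1a(flowkey, keylen) % NSRC] };
        union { char buf[CMSG_SPACE(sizeof pi)]; struct cmsghdr align; } u;
        struct iovec iov = { .iov_base = (void *)pkt, .iov_len = len };
        struct msghdr msg = {
            .msg_name = (void *)dst, .msg_namelen = sizeof *dst,
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = u.buf, .msg_controllen = sizeof u.buf,
        };
        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = IPPROTO_IPV6;
        cm->cmsg_type  = IPV6_PKTINFO;
        cm->cmsg_len   = CMSG_LEN(sizeof pi);
        memcpy(CMSG_DATA(cm), &pi, sizeof pi);
        return sendmsg(fd, &msg, 0);
    }

The flowkey here would be the inner packet's addresses, ports, and protocol.
The kernel rejects source addresses it doesn't consider local, which is
exactly the problem discussed next.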
> I think however that adding tons and tons of ipv6 addresses to a given
> interface is probably slow,
> and might break things like nd and/or multicast...
>
> what would be cooler would be if you could allocate an entire /64 (or
> /118) to the vpn daemon
>
> bindtoaddress(2001:db8::/118) (give me all the data for 1024 ips)
>
> but I am not sure how to go about doing that..
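The AnyIP trick on Linux gets most of the way there already. A hedged
sketch (generic Linux, not anything tinc does today): route the whole
prefix to the local table, bind one wildcard socket, and recover the
per-packet destination address with IPV6_RECVPKTINFO:

    /* Assumed one-time setup outside the daemon:
     *     ip -6 route add local 2001:db8::/118 dev lo
     * after which the kernel accepts packets for all 1024 addresses. */
    #define _GNU_SOURCE
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>

    static int open_anyip_socket(uint16_t port) {
        int fd = socket(AF_INET6, SOCK_DGRAM, 0);
        int on = 1;
        /* attach each datagram's destination address as ancillary data */
        setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPKTINFO, &on, sizeof on);
        struct sockaddr_in6 sa = { .sin6_family = AF_INET6,
                                   .sin6_port   = htons(port),
                                   .sin6_addr   = in6addr_any };
        bind(fd, (struct sockaddr *)&sa, sizeof sa);
        return fd;   /* one socket now hears all 1024 ips */
    }

Receives then carry an IPV6_PKTINFO cmsg saying which of the addresses was
hit, and replies can pin that same address as their source, as in the
sketch above.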
> ...moving back to a formerly private discussion about tor's woes...
>
> "This conversation is a bit separate from #11197 (which is an
> implementation issue in obfsproxy), so separate discussion somewhere
> would probably be required.
>
> So, there appears to be a slight misconception on how tor traffic
> travels across the Internet that I will attempt to clarify, and
> hopefully not get too terribly wrong.
>
> Each step of a given connection over tor involves multiple TCP/IP
> connections. To use a standard example of someone trying to watch Cat
> Videos on the "real internet", it will look approximately like thus:
>
> Client <-> Guard <-> Relay <-> Exit <-> Cat Videos
>
> Each step is a separate TCP/IP connection, authenticated and encrypted
> via TLS (TLS is likewise terminated at each hop). Using a pluggable
> transport encapsulates the first hop's TLS session with a different
> protocol, be it obfs2, obfs3, or something else.
>
> The cat videos are passed through this path of many TCP/IP connections
> across things called Circuits that are created/extended by the Client
> one hop at a time (so in the example above, the kitty cats travel
> across 4 TCP/IP connections, relaying data across a Circuit that spans
> from the Client to the Exit. If my art skills were up to it, I would
> draw a diagram.).
>
> Circuits are currently required to provide reliable, in-order delivery.
>
> In addition to the standard congestion control provided by TCP/IP on a
> per-hop basis, there is Circuit level flow control *and* "end to end"
> flow control in the form of RELAY_SENDME cells, but given that multiple
> circuits can end up being multiplexed over a singular TCP/IP
> connection, propagation of these RELAY_SENDME cells can get delayed due
> to HOL issues.
>
> So, with that quick and dirty overview out of the way:
>
> * "Ah so if ecn is enabled it can be used?"
>
> ECN will be used if it is enabled, *but* the congestion information
> will not get propagated to the source/destination of a given stream.
>
> * "Does it retain iw10 (the Linux default nowadays sadly)?"
>
> Each TCP/IP connection, if sent from a host that uses an obnoxiously
> large initial window, will have an obnoxiously large initial window.
>
> It is worth noting that since multiple Circuits originating from
> potentially numerous clients can and will reuse existing TCP/IP
> connections if able to (see 5.3.1 of the tor spec), dropping packets
> between tor relays is kind of bad, because all of the separate
> encapsulated flows sharing the singular TCP/IP link will suffer (ECN
> would help here). This situation is rather unfortunate, as the good
> active queue management algorithms drop packets (when ECN is not
> available).
>
> A better summary of tor's flow control/bufferbloat woes is given in:
>
> DefenestraTor: Throwing out Windows in Tor
> http://www.cypherpunks.ca/~iang/pubs/defenestrator.pdf
>
> The N23 algorithm suggested in the paper did not end up getting
> implemented into Tor, but I do not remember the reason off the top of
> my head."
>
> > On Dec 3, 2014, Guus Sliepen <guus@tinc-vpn.org> wrote:
> >>
> >> On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote:
> >>
> >> [...]
> >>>
> >>> https://github.com/dtaht/tinc
> >>>
> >>> I successfully converted tinc to use sendmsg and recvmsg, acquire (at
> >>> least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet fields,
> >>
> >> Windows does not have sendmsg()/recvmsg(), but the BSDs support it.
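For reference, the receive side of that on Linux is the RFC 3542
ancillary-data dance (IPv4 uses IP_RECVTOS/IP_RECVTTL instead; the BSDs
differ in detail, and Windows would need WSARecvMsg). A sketch:

    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    static void want_tclass_hoplimit(int fd) {
        int on = 1;
        setsockopt(fd, IPPROTO_IPV6, IPV6_RECVTCLASS,   &on, sizeof on);
        setsockopt(fd, IPPROTO_IPV6, IPV6_RECVHOPLIMIT, &on, sizeof on);
    }

    /* Returns packet length; fills in tclass (DSCP+ECN) and hoplimit
     * from the ancillary data the kernel attaches after the above. */
    static ssize_t recv_with_meta(int fd, void *buf, size_t len,
                                  int *tclass, int *hoplimit) {
        union { char buf[CMSG_SPACE(sizeof(int)) * 2]; struct cmsghdr align; } u;
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                              .msg_control = u.buf, .msg_controllen = sizeof u.buf };
        ssize_t n = recvmsg(fd, &msg, 0);
        if (n < 0) return n;
        for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
            if (cm->cmsg_level != IPPROTO_IPV6) continue;
            if (cm->cmsg_type == IPV6_TCLASS)   memcpy(tclass,   CMSG_DATA(cm), sizeof *tclass);
            if (cm->cmsg_type == IPV6_HOPLIMIT) memcpy(hoplimit, CMSG_DATA(cm), sizeof *hoplimit);
        }
        return n;
    }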
> >>> as well as SO_TIMESTAMPNS, and use a higher resolution internal clock.
> >>> Got passing through the dscp values to work also, but:
> >>>
> >>> A) encapsulation of ecn capable marked packets, and availability in
> >>> the outer header, without correct decapsulation, doesn't work well.
> >>>
> >>> The outer packet gets marked, but by default the marking doesn't make
> >>> it back into the inner packet when decoded.
> >>
> >> Is the kernel stripping the ECN bits provided by userspace? In the code
> >> in your git branch you strip the ECN bits out yourself.
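The decapsulation behavior being asked for is essentially RFC 6040's
"normal mode", and done by hand in the daemon it is only a few lines. A
sketch, assuming the inner packet is IPv6 and the outer traffic class was
recovered via ancillary data as above:

    #include <stdint.h>

    #define ECN_MASK 0x03
    #define ECN_CE   0x03

    /* RFC 6040-style combine on decap: outer CE + ECN-capable inner
     * means the inner packet must leave marked CE. The IPv6 traffic
     * class straddles bytes 0 and 1 of the header. */
    static void ecn_decap_ipv6(uint8_t *inner, uint8_t outer_tclass) {
        uint8_t tc  = (uint8_t)((inner[0] << 4) | (inner[1] >> 4));
        uint8_t ecn = tc & ECN_MASK;
        if ((outer_tclass & ECN_MASK) == ECN_CE && ecn != 0) {
            tc |= ECN_CE;
            inner[0] = (uint8_t)((inner[0] & 0xF0) | (tc >> 4));
            inner[1] = (uint8_t)((tc << 4) | (inner[1] & 0x0F));
        }
        /* RFC 6040 also says outer CE + inner not-ECT should be dropped,
         * since the congestion signal cannot be propagated. */
    }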
> >>> So communicating somehow that a path can take ecn (and/or diffserv
> >>> markings) is needed between tinc daemons. I thought of perhaps
> >>> crafting a special icmp message marked with CE but am open to ideas
> >>> that would be backward compatible.
> >>
> >> PMTU probes are used to discover whether UDP works and how big the path
> >> MTU is, maybe it could be used to discover whether ECN works as well?
> >> Set one of the ECN bits on some of the PMTU probes, and if you receive a
> >> probe with that ECN bit set, also set it on the probe reply. If you
> >> successfully receive a reply with ECN bits set, then you know ECN works.
> >> Since the remote side just echoes the contents of the probe, you could
> >> also put a copy of the ECN bits in the probe payload, and then you can
> >> detect if the ECN bits got zeroed. You can also define an OPTION_ECN in
> >> src/connection.h, so nodes can announce their support for ECN, but that
> >> should not be necessary, I think.
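A sketch of what the probe side of that scheme could look like (this is
the proposal, not existing tinc code; ECT(0) is 0x02 in the two low ECN
bits, and echoing the seen bits in the first payload byte is an invented
detail):

    #include <netinet/in.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/socket.h>

    /* Mark outgoing probes ECT(0). A socket-level IPV6_TCLASS sticks
     * for later sends; per-packet marking would use a cmsg instead. */
    static void mark_probes_ect0(int fd) {
        int tclass = 0x02;                /* DSCP 0, ECN = ECT(0) */
        setsockopt(fd, IPPROTO_IPV6, IPV6_TCLASS, &tclass, sizeof tclass);
    }

    /* Probe reply: echo the ECN field we saw, both in the header and in
     * the payload, so the prober can tell "never arrived marked" apart
     * from "bleached on the return path". */
    static void answer_probe(int fd, const struct sockaddr_in6 *peer,
                             uint8_t *reply, size_t len, int seen_tclass) {
        int tclass = seen_tclass & 0x03;
        setsockopt(fd, IPPROTO_IPV6, IPV6_TCLASS, &tclass, sizeof tclass);
        reply[0] = (uint8_t)(seen_tclass & 0x03);
        sendto(fd, reply, len, 0, (const struct sockaddr *)peer, sizeof *peer);
    }

If a reply comes back with ECT still set in the header, the path takes
ECN; if the payload byte says the bits arrived but the header bits are
gone, something on the return path bleached them.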
> >>> B) I have long theorized that a lot of userspace vpns bottleneck on
> >>> the read and encapsulate step, and being strict FIFOs,
> >>> gradually accumulate delay until finally they run out of read socket
> >>> buffer space and start dropping packets.
> >>
> >> Well, encryption and decryption takes a lot of CPU time, but context
> >> switches are also bad.
> >>
> >> Tinc is treating UDP in a strictly FIFO way, but actually it does use a
> >> RED algorithm when tunneling over TCP. That said, it only looks at its
> >> own buffers to determine when to drop packets, and those only come into
> >> play once the kernel's TCP buffers are filled.
> >>
> >>> so I had a couple thoughts towards using multiple rx queues in the
> >>> vtun interface, and/or trying to read more than one packet at a time
> >>> (via recvmmsg) and do some level of fair queueing and queue management
> >>> (codel) inside tinc itself. I think that's
> >>> pretty doable without modifying the protocol any, but I'm not sure of
> >>> its value until I saturate some cpu more.
> >>
> >> I'd welcome any work in this area :)
> >>
> >>> (and if you thought recvmsg was complex, look at recvmmsg)
> >>
> >> It seems someone is already working on that, see
> >> https://github.com/jasdeep-hundal/tinc.
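For the record, the batched-read half of that is small; the fq/codel
machinery would sit on top of it. A sketch of draining up to 64 datagrams
per syscall with recvmmsg (Linux 2.6.33+, glibc 2.12+):

    #define _GNU_SOURCE             /* for recvmmsg and struct mmsghdr */
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #define BATCH 64
    #define BUFSZ 2048

    /* Returns how many packets arrived; msgs[i].msg_len is each one's
     * size. Each packet would then be hashed into a per-flow queue for
     * the fq/codel step. */
    static int read_batch(int fd, uint8_t bufs[BATCH][BUFSZ],
                          struct iovec iovs[BATCH], struct mmsghdr msgs[BATCH]) {
        memset(msgs, 0, BATCH * sizeof *msgs);
        for (int i = 0; i < BATCH; i++) {
            iovs[i].iov_base = bufs[i];
            iovs[i].iov_len  = BUFSZ;
            msgs[i].msg_hdr.msg_iov    = &iovs[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }
        int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
        return n < 0 ? 0 : n;
    }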
> >>> D)
> >>>
> >>> the bottleneck link above is actually not tinc but the gateway, and as
> >>> the gateway reverts to codel behavior on a single encapsulated flow
> >>> encapsulating all the other flows, we end up with about 40ms of
> >>> induced delay on this test. While I have a better codel (gets below
> >>> 20ms latency, not deployed), *fq*_codel by identifying individual
> >>> flows gets the induced delay on those flows down below 5ms.
> >>
> >> But that should improve with ECN if fq_codel is configured to use that,
> >> right?
> >>
> >>> At one level, tinc being so nicely meshy means that the "fq" part of
> >>> fq_codel on the gateway will have more chance to work against the
> >>> multiple vpn flows it generates for all the potential vpn endpoints...
> >>>
> >>> but at another... lookie here! ipv6! 2^64 addresses or more to use!
> >>> and port space to burn! What if I could make tinc open up 1024 ports
> >>> per connection, and have it fq all its flows over those? What could
> >>> go wrong?
> >>
> >> Right, hash the header of the original packets, and then select a port
> >> or address based on the hash? What about putting that hash in the flow
> >> label of outer packets? Any routers that would actually treat those as
> >> separate flows?
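The port flavor of that hash is a one-liner on top of the FNV hash from
the source-address sketch above (the base port and the 1024-wide range are
the example's numbers, purely illustrative):

    #include <stddef.h>
    #include <stdint.h>

    /* Map the inner flow tuple onto one of 1024 UDP ports. */
    static uint16_t pick_port(const uint8_t *flowkey, size_t keylen,
                              uint16_t base) {
        uint32_t h = 2166136261u;
        while (keylen--) h = (h ^ *flowkey++) * 16777619u;
        return (uint16_t)(base + (h % 1024));
    }

Putting the hash in the flow label instead is possible on Linux via the
flow-label manager (IPV6_FLOWLABEL_MGR plus IPV6_FLOWINFO_SEND), but
whether routers treat distinct labels as distinct flows is hit-or-miss;
fq_codel as commonly deployed hashes the 5-tuple, which is why varying
ports or source addresses works today.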
> >
> > -- Sent from my Android device with K-@ Mail. Please excuse my brevity.
> >
> > _______________________________________________
> > Cerowrt-devel mailing list
> > Cerowrt-devel@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
> --
> Dave Täht
>
> http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks