From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-x236.google.com (mail-oi0-x236.google.com [IPv6:2607:f8b0:4003:c06::236])
    (using TLSv1 with cipher RC4-SHA (128/128 bits))
    (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK))
    by huchra.bufferbloat.net (Postfix) with ESMTPS id 17E4521F4E4
    for ; Thu, 4 Dec 2014 10:53:03 -0800 (PST)
Received: by mail-oi0-f54.google.com with SMTP id u20so12842365oif.27
    for ; Thu, 04 Dec 2014 10:53:02 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=gmail.com; s=20120113;
    h=mime-version:in-reply-to:references:date:message-id:subject:from:to
    :content-type;
    bh=xXHO7zq2IVsucX8wUnofHded68/n7wYUHsYYFmPP7gQ=;
    b=FYLPQUMdg2Qdj7JwJDIEn/wAl6wGH0/waf93fFarZp/vhZgLneBcE1uEnkDNoD2fwL
    tDIjveSV1QvcCJVXgorcL/7fF2tOwBQk3ElxYcsOxf+Jrlr79dNUJy4+Dt+svz6ywYj6
    xs6fV0lStZjjmZaFkHQ5bBEK+Y296/hnxGwarHRj5JCUmcIsSKbSE0k4Tdk5cAGDc2TS
    Wi9Csr+SzQ9hLEjBtz6+hlwI8jrg3uwkdsXqeYqePKBODTdTi3dg4IUtid9T20jliBx1
    Qzzb9WZ0WOib7TplNfxvcSKlgVhzTl29iDsLbKe5dxJmExVhgRCV6Szona2NnJ5kdCl9
    P6og==
MIME-Version: 1.0
X-Received: by 10.60.219.97 with SMTP id pn1mr1303014oec.45.1417719182725;
    Thu, 04 Dec 2014 10:53:02 -0800 (PST)
Received: by 10.202.227.77 with HTTP; Thu, 4 Dec 2014 10:53:02 -0800 (PST)
In-Reply-To:
References: <20141203120246.GO10533@sliepen.org>
Date: Thu, 4 Dec 2014 10:53:02 -0800
Message-ID:
From: Dave Taht
To: tinc-devel@tinc-vpn.org, Guus Sliepen ,
    "cerowrt-devel@lists.bufferbloat.net"
Content-Type: multipart/mixed; boundary=001a11c1a2ea472c3d05096878d5
Subject: Re: [Cerowrt-devel] tinc vpn: adding dscp passthrough (priorityinherit), ecn, and fq_codel support
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 04 Dec 2014 18:53:32 -0000

--001a11c1a2ea472c3d05096878d5
Content-Type:
text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Wed, Dec 3, 2014 at 12:32 PM, Dave Taht wrote:
> On Wed, Dec 3, 2014 at 4:02 AM, Guus Sliepen wrote:
>> On Wed, Dec 03, 2014 at 12:07:59AM -0800, Dave Taht wrote:
>>
>> [...]
>>> https://github.com/dtaht/tinc
>>>
>>> I successfully converted tinc to use sendmsg and recvmsg, acquire (at
>>> least on linux) the TTL/Hoplimit and IP_TOS/IPv6_TCLASS packet fields,
>>
>> Windows does not have sendmsg()/recvmsg(), but the BSDs support it.
>>
>>> as well as SO_TIMESTAMPNS, and use a higher resolution internal clock.
>>> Got passing through the dscp values to work also, but:
>>>
>>> A) encapsulation of ecn capable marked packets, and availability in
>>> the outer header, without correct decapsulation, doesn't work well.
>>>
>>> The outer packet gets marked, but by default the marking doesn't make
>>> it back into the inner packet when decoded.
>>
>> Is the kernel stripping the ECN bits provided by userspace? In the code
>> in your git branch you strip the ECN bits out yourself.
>
> Linux, at least, gives access to all 8 bits of the tos field on udp.

OSX appears to do so also, at least on ipv6. Jonathan Morton wrote some
code to test the ideas here: http://snapon.lab.bufferbloat.net/~d/udp-tos.c
and I have similar but buggy code in my isochronous repo on github
(udpburst), where I struggled with v6mapped and ipv4 sockets for a while
before giving up.

> Windows does not, unless you have admin privs. Don't know
> about other OSes.
>
> The comment there:
>
> tos = origpkt->tos & ~0x3 ; // chicken out on passing ecn for now
>
> was due to seeing this happen otherwise (talking to a tinc not yet
> modified to decapsulate ecn markings correctly)
>
> http://snapon.lab.bufferbloat.net/~d/tinc/ecn.png
>
> and
>
> was awaiting some thought on a truth table derived from the
> relevant rfc (which I think is slightly wrong, btw), and further
> thought on determining if ecn could be used on that path.
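For reference, here is a minimal C sketch of such a decapsulation truth table. It assumes the RFC in question is RFC 6040 (the message does not name it) and follows that table as published, without the correction hinted at above. The names `ecn_decap_rfc6040` and `ECN_DROP` are hypothetical and do not come from the tinc source.

```c
#include <stdint.h>

/* ECN codepoints: the low two bits of the TOS / traffic class byte. */
enum { ECN_NOT_ECT = 0, ECN_ECT1 = 1, ECN_ECT0 = 2, ECN_CE = 3 };

#define ECN_MASK 0x03
#define ECN_DROP (-1)   /* hypothetical sentinel, distinct from any TOS value */

/*
 * Hypothetical helper: combine the inner and outer TOS bytes when a
 * packet leaves the tunnel. Returns the TOS byte to write into the
 * inner header (0..255), or ECN_DROP if the packet should be dropped.
 */
int ecn_decap_rfc6040(uint8_t tos_inner, uint8_t tos_outer) {
	int in = tos_inner & ECN_MASK;
	int out = tos_outer & ECN_MASK;
	int ecn;

	if (in == ECN_NOT_ECT) {
		/* The inner flow never asked for ECN, so a CE mark on the
		 * outer header can only be delivered as a loss. */
		if (out == ECN_CE)
			return ECN_DROP;
		ecn = ECN_NOT_ECT;
	} else if (out == ECN_CE || in == ECN_CE) {
		/* Congestion seen on either header must survive. */
		ecn = ECN_CE;
	} else if (out == ECN_ECT1) {
		/* An outer ECT(1) overrides an inner ECT(0). */
		ecn = ECN_ECT1;
	} else {
		/* Outer is Not-ECT or ECT(0): keep the inner codepoint. */
		ecn = in;
	}

	/* Preserve the six DSCP bits of the inner header. */
	return (tos_inner & ~ECN_MASK) | ecn;
}
```

Using a sentinel outside the 0..255 range keeps "drop" distinguishable from a legitimate TOS value of zero.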
Continuing to work with this, patch attached. I haven't worked on the
decapsulation step yet... it's not clear to me if there is going to be
state needed in connections_t...

Two issues still to wrap my head around. It's not clear to me when a tinc
daemon might forward an already encapsulated packet to another relay, and
whether it does so over udp. (?) If so, the original IP headers in the
packet can be lost or modified further en route, so if I have CE set on
the outside header and am forwarding to a non-ECN-capable receiver, I
should drop the packet, and there may be other nuances. Similarly, when a
packet is compressed...

> certainly I could deploy a tinc modified to assume ecn was
> in use, (and may, shortly!) with the right truth table.
>
> There was a comment higher up in the file also -
> I would like to decrement hopcount/ttl
> on the encapsulated packet by the
> actual number of hops in the overlay path, not by one,
> as is the default here, and in many other vpns.
>
> This would decrease the damage caused by
> routing loops.

And going back to the forwarding issue, if over udp, I'd like the total
hopcount to be preserved e2e, and passed into the finally decapsulated
packet....

>
>>> So communicating somehow that a path can take ecn (and/or diffserv
>>> markings) is needed between tinc daemons. I thought of perhaps
>>> crafting a special icmp message marked with CE but am open to ideas
>>> that would be backward compatible.
>>
>> PMTU probes are used to discover whether UDP works and how big the path
>> MTU is, maybe it could be used to discover whether ECN works as well?
>
> Yes.
>
>> Set one of the ECN bits on some of the PMTU probes, and if you receive a
>> probe with that ECN bit set, also set it on the probe reply.
>
> This is an encapsulated packet vs an overt ping? Seems saner to test
> over the encapsulation in this case.
>
>> If you
>> successfully receive a reply with ECN bits set, then you know ECN works.
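The probe-echo idea could look roughly like the following C sketch. Everything here is an assumption made for illustration: the function and type names are hypothetical, and tinc's actual probe code is not shown. Note that a plain echo only reports on the round trip, so it cannot tell which direction altered the bits.

```c
#include <stdint.h>

#define ECN_MASK 0x03
enum { ECN_NOT_ECT = 0, ECN_ECT1 = 1, ECN_ECT0 = 2, ECN_CE = 3 };

/* Hypothetical result of one probe round trip. */
typedef enum {
	ECN_PATH_OK,        /* codepoint came back unchanged */
	ECN_PATH_BLEACHED,  /* ECN bits were zeroed somewhere on the path */
	ECN_PATH_MARKED,    /* an ECT codepoint came back as CE */
	ECN_PATH_MANGLED    /* any other rewrite */
} ecn_path_result;

/* Receiver side: copy the ECN bits seen on the probe onto the reply,
 * leaving the reply's DSCP bits alone. */
uint8_t ecn_probe_reply_tos(uint8_t reply_tos, uint8_t probe_tos_seen) {
	return (uint8_t)((reply_tos & ~ECN_MASK) | (probe_tos_seen & ECN_MASK));
}

/* Sender side: compare the codepoint sent with the one on the reply. */
ecn_path_result ecn_probe_classify(uint8_t tos_sent, uint8_t tos_echoed) {
	int sent = tos_sent & ECN_MASK;
	int got = tos_echoed & ECN_MASK;

	if (got == sent)
		return ECN_PATH_OK;
	if (got == ECN_NOT_ECT)
		return ECN_PATH_BLEACHED;
	if (got == ECN_CE && sent != ECN_NOT_ECT)
		return ECN_PATH_MARKED;
	return ECN_PATH_MANGLED;
}
```

A sender would call `ecn_probe_classify()` once per probe and only treat the path as ECN capable after seeing `ECN_PATH_OK` (or `ECN_PATH_MARKED`) for each codepoint it cares about.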
>
> Well it should test for both CE and ECT(0) being set on separate
> packets.
>
>> Since the remote side just echoes the contents of the probe, you could
>> also put a copy of the ECN bits in the probe payload, and then you can
>> detect if the ECN bits got zeroed. You can also define an OPTION_ECN in
>> src/connection.h, so nodes can announce their support for ECN, but that
>> should not be necessary I think.
>
> Not sure.
>
>>
>>> B) I have long theorized that a lot of userspace vpns bottleneck on
>>> the read and encapsulate step, and being strict FIFOs,
>>> gradually accumulate delay until finally they run out of read socket
>>> buffer space and start dropping packets.
>>
>> Well, encryption and decryption take a lot of CPU time, but context
>> switches are also bad.
>>
>> Tinc is treating UDP in a strictly FIFO way, but actually it does use a
>> RED algorithm when tunneling over TCP. That said, it only looks at its
>
> One of these days I'll get around to writing a userspace codel lib
> in pure C. Or someone else will.
>
> The C++ versions in ns2, ns3, and mahimahi are hard to read. My currently
> pretty elegant codel2.h might be a starting point, if only
> I could solve count increasing without bound sanely.
>
>> own buffers to determine when to drop packets, and those only come into
>> play once the kernel's TCP buffers are filled.
>
> TCP small queues (TSQ) and BQL should be a big boon to vpn and tor users.
>
>>> so I had a couple thoughts towards using multiple rx queues in the
>>> vtun interface, and/or trying to read more than one packet at a time
>>> (via recvmmsg) and do some level of fair queueing and queue management
>>> (codel) inside tinc itself. I think that's
>>> pretty doable without modifying the protocol any, but I'm not sure of
>>> its value until I saturate some cpu more.
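The "fair queueing" half of that idea comes down to hashing each inner packet's flow identifiers into one of a fixed number of queues. Below is a minimal C sketch of that step only. The `flow_key` layout, the function names and the queue count are assumptions made for illustration, and FNV-1a is swapped in as a simple stand-in hash; the kernel's fq_codel uses its own flow dissector and hash.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES 1024u        /* assumption: same as fq_codel's default flow count */
#define FNV1A_INIT 2166136261u  /* 32-bit FNV offset basis */

/* Hypothetical flow key filled in from the inner packet's headers. */
typedef struct {
	uint8_t  src[16];   /* IPv6 address, or IPv4 mapped into 16 bytes */
	uint8_t  dst[16];
	uint16_t sport;
	uint16_t dport;
	uint8_t  proto;
} flow_key;

/* 32-bit FNV-1a, continued from a running hash value h. */
uint32_t fnv1a(uint32_t h, const void *data, size_t len) {
	const uint8_t *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;   /* 32-bit FNV prime */
	}
	return h;
}

/*
 * Map a flow to a queue index in [0, NUM_QUEUES). Fields are hashed one
 * at a time so struct padding never enters the hash, and every address
 * byte is fed in. The seed is meant to be a per-daemon random value.
 */
unsigned flow_queue(const flow_key *k, uint32_t seed) {
	uint32_t h = fnv1a(FNV1A_INIT, &seed, sizeof seed);

	h = fnv1a(h, k->src, sizeof k->src);
	h = fnv1a(h, k->dst, sizeof k->dst);
	h = fnv1a(h, &k->sport, sizeof k->sport);
	h = fnv1a(h, &k->dport, sizeof k->dport);
	h = fnv1a(h, &k->proto, sizeof k->proto);
	return h % NUM_QUEUES;
}
```

Packets of the same flow always land in the same queue, so a per-queue codel instance and a round-robin scheduler across queues could then be layered on top.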
>>
>> I'd welcome any work in this area :)
>
> Well, I have to get packet timestamping to give sane results, and then
> come up with saturating workloads for my hardware. This is easy for
> cerowrt - I doubt the mips 640mhz processor can encrypt and push
> even as much as 2mbit/sec....
>
> but my "vision" such as it was, was to toss a beaglebone box
> in as a vpn gateway instead, (on comcast's dynamically assigned
> ipv6 networks) and maybe fiddle with the
>
> http://cryptotronix.com/products/cryptocape/
>
> which has a new kernel driver....
>
> (it was a weekend, it was raining, I needed to get to my lab in
> los gatos from gf's in SF and
> ssh tunneling and portforwarding was getting bothersome...
>
> so I hacked on tinc. :) )
>
>>> (and if you thought recvmsg was complex, look at recvmmsg)
>>
>> It seems someone is already working on that, see
>> https://github.com/jasdeep-hundal/tinc.
>
> Seemed to be mostly windows related hacking.
>
> I am not ready to consider all the infrastructure required to
> accumulate and manage packets inside of tinc, nor (after
> fighting with recvmsg/sendmsg for 2 days) ready to tackle
> recvmmsg... or threads and ringbuffers and all the headache
> that entails.

BUT, if timestamping at the socket layer works like I think it does,
codel seems plausible without buffering up any packets in the daemon
itself.

>>> D)
>>>
>>> the bottleneck link above is actually not tinc but the gateway, and as
>>> the gateway reverts to codel behavior on a single encapsulated flow
>>> encapsulating all the other flows, we end up with about 40ms of
>>> induced delay on this test. While I have a better codel (gets below
>>> 20ms latency, not deployed), *fq*_codel by identifying individual
>>> flows gets the induced delay on those flows down below 5ms.
>>
>> But that should improve with ECN if fq_codel is configured to use that,
>> right?
>
> Meh.
Ecn is very useful on very short or very long paths where
> packet loss as an indicator of congestion is hurtful. In the general
> case it adds a tiny bit to overall latency for other flows as congestion is not
> cleared for an RTT, instead of at the bottleneck, with a loss.
>
> This is still overly optimistic, IMHO:
>
> https://tools.ietf.org/html/draft-ietf-aqm-ecn-benefits-00
>
> current linux pie, red and codel do not enable ecn by default,
> currently. Arguably pie could (because it has overload protection),
> but codel, no.
>
> Have a version of codel and fq_codel (and cake) that do
> ecn overload protection, and enable ecn by default, am testing...
>
> fq_codel enables ECN by default, (overload does very little
> harm)
>
> but openwrt (not cerowrt)
> turns it off on their qos-scripts. It's half on by default in sqm-scripts,
> and works pretty well if you have enough bandwidth - I
> routinely run a few low latency networks with near zero packet loss,
> and near-perfect utilization... which impresses me, at least...
>
> ECN makes me nervous in general when enabled outside the
> datacenter, but as something like 60% of the alexa top 1million
> will enable ecn if asked nowadays, I hope that that worry extends
> to enough more people for me to worry less.
>
> http://ecn.ethz.ch/
>
> I am concerned that enabling ECN generally breaks Tor over tcp
> even worse at the moment.... (I hadn't thought about it til
> my last message)
>
> Certainly I think ECN is a great idea for vpns so long as it is
> implemented correctly, although my understanding of CTR
> mode over udp is that loss hurts not, and neither does
> reordering?
>
> In tinc: what if I get a packet with a seqno 5 after receiving packets
> with seq 1-4,6-255. does that get dropped due to the replay protection,
> or (if it passes muster) get decrypted and forwarded even after that much
> reordering?
>
> (I am all in favor of not worrying about reordering much.
wifi aps
> tend to do it a lot, so do route flaps, and linux tcp, at least, is
> now VERY resistant to reordering problems, handling megabytes
> of out of order delivery problems with aplomb.
>
> windows on the other hand, sucks in this department, still)
>>
>>> At one level, tinc being so nicely meshy means that the "fq" part of
>>> fq_codel on the gateway will have more chance to work against the
>>> multiple vpn flows it generates for all the potential vpn endpoints...
>>>
>>> but at another... lookie here! ipv6! 2^64 addresses or more to use!
>>> and port space to burn! What if I could make tinc open up 1024 ports
>>> per connection, and have it fq all its flows over those? What could
>>> go wrong?
>>
>> Right, hash the header of the original packets, and then select a port
>> or address based on the hash?
>
> Yes. I am leaning towards ipv6 address rather than port, you rapidly
> run out of ports in ipv4, and making this an ipv6 specific feature
> seems safer to test.
>
> I look forward to messing up the expectations of many a stateful
> ipv6 firewall....
>
>> What about putting that hash in the flow
>> label of outer packets? Any routers that would actually treat those as
>> separate flows?
>
> The flow label was a pretty good idea shot down by too many people
> arguing over the bits.
I don't think there is a lot of useful information
> stored there in any coherent way, (it's too bad that the vxlan stuff
> added a prepended header, instead of just using the flowlabel)
>
> so it is best to just hash the main
> headers and whatever inner headers you can obtain, as per
>
> http://lxr.free-electrons.com/source/net/core/flow_dissector.c#L54
>
> and
>
> https://github.com/torvalds/linux/blob/master/net/sched/sch_fq_codel.c#L70
>
> I have a quibble with the jhash3 here, as the present treatment
> of ipv6 is the very efficient but not very hashy
>
> addr[0] ^ addr[2] ^ addr[3] ^ addr[4] (somewhere in the code),
> instead of feeding all the bits to the hash function(s).
>
>
>>
>> --
>> Met vriendelijke groet / with kind regards,
>> Guus Sliepen
>>
>> _______________________________________________
>> tinc-devel mailing list
>> tinc-devel@tinc-vpn.org
>> http://www.tinc-vpn.org/cgi-bin/mailman/listinfo/tinc-devel
>
>
>
> --
> Dave Täht
>
> http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks

-- 
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks

--001a11c1a2ea472c3d05096878d5
Content-Type: text/x-patch; charset=US-ASCII; name="working_out_encapsulation_issues.patch"
Content-Disposition: attachment; filename="working_out_encapsulation_issues.patch"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_i3ah9uej0

ZGlmZiAtLWdpdCBhL3NyYy9jb25uZWN0aW9uLmggYi9zcmMvY29ubmVjdGlvbi5oCmluZGV4IDg3
NzYwMWYuLmIyN2RmNDggMTAwNjQ0Ci0tLSBhL3NyYy9jb25uZWN0aW9uLmgKKysrIGIvc3JjL2Nv
bm5lY3Rpb24uaApAQCAtMzAsNiArMzAsOSBAQAogI2RlZmluZSBPUFRJT05fVENQT05MWQkJMHgw
MDAyCiAjZGVmaW5lIE9QVElPTl9QTVRVX0RJU0NPVkVSWQkweDAwMDQKICNkZWZpbmUgT1BUSU9O
X0NMQU1QX01TUwkweDAwMDgKKyNkZWZpbmUgT1BUSU9OX0VDTiAgICAgICAgICAweDAwMTAKKyNk
ZWZpbmUgT1BUSU9OX0RTQ1AgICAgICAgICAweDAwMjAKKyNkZWZpbmUgT1BUSU9OX01FR0FJUCAg
ICAgICAweDAwNDAKIAogdHlwZWRlZiBzdHJ1Y3QgY29ubmVjdGlvbl9zdGF0dXNfdCB7CiAJdW5z
aWduZWQgaW50IHBpbmdlZDoxOwkJCQkvKiBzZW50IHBpbmcgKi8KQEAgLTQxLDcgKzQ0LDEwIEBA IHR5cGVkZWYgc3RydWN0IGNvbm5lY3Rpb25fc3RhdHVzX3QgewogCXVuc2lnbmVkIGludCBlbmNy eXB0b3V0OjE7CQkJLyogMSBpZiB3ZSBjYW4gZW5jcnlwdCBvdXRnb2luZyB0cmFmZmljICovCiAJ dW5zaWduZWQgaW50IGRlY3J5cHRpbjoxOwkJCS8qIDEgaWYgd2UgaGF2ZSB0byBkZWNyeXB0IGlu Y29taW5nIHRyYWZmaWMgKi8KIAl1bnNpZ25lZCBpbnQgbXN0OjE7CQkJCS8qIDEgaWYgdGhpcyBj b25uZWN0aW9uIGlzIHBhcnQgb2YgYSBtaW5pbXVtIHNwYW5uaW5nIHRyZWUgKi8KLQl1bnNpZ25l ZCBpbnQgdW51c2VkOjIzOworCXVuc2lnbmVkIGludCBkc2NwOjE7ICAgICAgICAgICAvKiAxIGlm IHRoaXMgY29ubmVjdGlvbiB0cmllcyB0byBwcmVzZXJ2ZSBkaWZmc2VydiBtYXJraW5ncyAqLwor CXVuc2lnbmVkIGludCBlY246MTsgICAgICAgICAgICAvKiAxIGlmIHRoaXMgY29ubmVjdGlvbiBy ZXNwZWN0cyBlY24gKi8KKwl1bnNpZ25lZCBpbnQgbWVnYWlwOjE7ICAgICAgICAgLyogMSBpZiB3 ZSBhcmUgZ29pbmcgdG8gdXNlIG1hbnkgSVBzIHRvIEZRICovCisJdW5zaWduZWQgaW50IHVudXNl ZDoyMDsKIH0gY29ubmVjdGlvbl9zdGF0dXNfdDsKIAogI2luY2x1ZGUgImVkZ2UuaCIKZGlmZiAt LWdpdCBhL3NyYy9uZXRfcGFja2V0LmMgYi9zcmMvbmV0X3BhY2tldC5jCmluZGV4IDRjZTgzYTku LjFlMGNhMjEgMTAwNjQ0Ci0tLSBhL3NyYy9uZXRfcGFja2V0LmMKKysrIGIvc3JjL25ldF9wYWNr ZXQuYwpAQCAtODMsNiArODMsNDYgQEAgYm9vbCBsb2NhbGRpc2NvdmVyeSA9IGZhbHNlOwogCiAq LwogCisvKiBUaGlzIG5lZWRzIHRvIGhhdmUgZGlzY292ZXJlZCBpZiB0aGUgcGF0aCBpcyBlY24g Y2FwYWJsZSBvciBub3QgKi8KKworc3RhdGljIGludCBlY25fZW5jYXBzdWxhdGUoaW50IHRvc19p biwgaW50IHRvc19lbmNhcCkgeworCisJaW50IGVjbl9lbmNhcCA9IHRvc19lbmNhcCAmIDM7CisJ aW50IGVjbl9pbiA9IHRvc19pbiAmIDM7CisJCisvKiBJZiBDRSBpcyBhcHBsaWVkIG9uIHRoZSBv dXRlciBoZWFkZXIgYnV0IEVDVCgwKSB8IEVDVCgxKSBOT1Qgb24gdGhlCisgICBpbm5lciwgaW5k aWNhdGUgdGhlIHBhY2tldCBzaG91bGQgYmUgZHJvcHBlZCAqLworCisJaWYoZWNuX2VuY2FwID09 IDMpCisJCWlmIChlY25faW4gJiAzKQorCQkJcmV0dXJuIHRvc19pbiB8IDM7CisJCWVsc2UKKwkJ CXJldHVybiAtdG9zX2luOworCisJLy8gTm90ZSB3ZSBjb3VsZCB0cnkgdG8gZG8gc29tZXRoaW5n IGNsZXZlciB3aXRoIHRoZSBlY24gbm9uY2UgaGVyZQorCQorCXJldHVybiB0b3NfaW47Cit9CisK Kworc3RhdGljIGludCBlY25fZGVjYXBzdWxhdGUoaW50IHRvc19pbiwgaW50IHRvc19lbmNhcCkg 
eworCisJaW50IGVjbl9lbmNhcCA9IHRvc19lbmNhcCAmIDM7CisJaW50IGVjbl9pbiA9IHRvc19p biAmIDM7CisJCisvKiBJZiBDRSBpcyBhcHBsaWVkIG9uIHRoZSBvdXRlciBoZWFkZXIgYnV0IEVD VCgwKSB8IEVDVCgxKSBOT1Qgb24gdGhlCisgICBpbm5lciwgaW5kaWNhdGUgdGhlIHBhY2tldCBz aG91bGQgYmUgZHJvcHBlZCAqLworCisJaWYoZWNuX2VuY2FwID09IDMpCisJCWlmIChlY25faW4g JiAzKQorCQkJcmV0dXJuIHRvc19pbiB8IDM7CisJCWVsc2UKKwkJCXJldHVybiAtdG9zX2luOwor CisJcmV0dXJuIHRvc19pbjsKKworfQorCiB2b2lkIHNlbmRfbXR1X3Byb2JlKG5vZGVfdCAqbikg ewogCXZwbl9wYWNrZXRfdCBwYWNrZXQ7CiAJaW50IGxlbiwgaTsKQEAgLTQ1OSw3ICs0OTksMjQg QEAgc3RhdGljIHZvaWQgc2VuZF91ZHBwYWNrZXQobm9kZV90ICpuLCB2cG5fcGFja2V0X3QgKm9y aWdwa3QpIHsKIAogCW9yaWdsZW4gPSBpbnBrdC0+bGVuOwogCW9yaWdwcmlvcml0eSA9IGlucGt0 LT5wcmlvcml0eTsKKwlpbnQgdG9zOworCQorCWlmKG4tPm9wdGlvbnMgJiBPUFRJT05fRUNOKSB7 CisJCWlmKCh0b3MgPSBlY25fZGVjYXBzdWxhdGUob3JpZ3BrdC0+dG9zLCBvcmlncGt0LT50b3Nf b3V0ZXIpKSA8IDApIHsKKwkJCQlpZmRlYnVnKFRSQUZGSUMpIGxvZ2dlcihMT0dfRVJSLCAiQ0Ug bWFya2VkIG5vbiBFQ04gcGFja2V0IGRyb3BwZWQgJXMgKCVzKSIsCisJCQkJCQkJCQkJbi0+bmFt ZSwgbi0+aG9zdG5hbWUpOworCQkJCXJldHVybjsKKwkJCX0KKwl9CiAKKwlpZihuLT5vcHRpb25z ICYgT1BUSU9OX0RTQ1ApIHsKKwkJdG9zIHw9IG9yaWdwa3QtPnRvcyAmIH4weDM7CisJfQorCQor CWlmKCF0b3MgJiYgdG9zICE9IG9yaWdwa3QtPnRvcykgeworCQkvLyBGSVhNRSByZXdyaXRlIHRo ZSBpbm5lciBoZWFkZXIKKwl9CisJCiAJLyogQ29tcHJlc3MgdGhlIHBhY2tldCAqLwogCiAJaWYo bi0+b3V0Y29tcHJlc3Npb24pIHsKQEAgLTU1NSwxNCArNjEyLDYgQEAgc3RhdGljIHZvaWQgc2Vu ZF91ZHBwYWNrZXQobm9kZV90ICpuLCB2cG5fcGFja2V0X3QgKm9yaWdwa3QpIHsKIAkvLyBpbnZp c2libGUgcm91dGluZyBsb29wcy4gRklYTUUgV2UgY291bGQgYWxzbyBkZWNyZWFzZSB0aGUKIAkv LyBUVEwgYnkgdGhlIGFjdHVhbCBwYXRoIGxlbmd0aCwgcmF0aGVyIHRoYW4gYnkgMS4KIAotCWlu dCB0b3M7Ci0JCi0JaWYocHJpb3JpdHlpbmhlcml0YW5jZSkgewotCSAgdG9zID0gb3JpZ3BrdC0+ dG9zICYgfjB4MyA7IC8vIGNoaWNrZW4gb3V0IG9uIHBhc3NpbmcgZWNuIGZvciBub3cKLQl9IGVs c2UgewotCSAgdG9zID0gMDsKLQl9Ci0KIAlpZmRlYnVnKFRSQUZGSUMpIGxvZ2dlcihMT0dfV0FS TklORywgInByaW9yaXR5aW5oZXJpdGFuY2UgJWQ6ICVkIiwKIAkgICAgICAgcHJpb3JpdHlpbmhl cml0YW5jZSwgdG9zKTsKIAo= --001a11c1a2ea472c3d05096878d5--