From: Dave Taht
To: babel-users@lists.alioth.debian.org, cerowrt-devel@lists.bufferbloat.net
Cc: Felix Fietkau, Andrew McGregor
Date: Tue, 7 Apr 2015 11:56:27 -0700
Subject: [Cerowrt-devel] more wet paint - babel unicast IHU for short-rtt path optimization

Please ignore this until after babel-1.6. It was prompted by finally
reading over the babel-long-rtt related code, which bundles hello and
IHU together, and by some old notes I made two years back, when the
first talk of using arpanet-style RTT routing metrics became
plausible. Might as well store it on babel-devel and stick bits in
Andrew's and Felix's heads.

My understanding of the babeld code is that the unicast code is in
there but not used, and that if it were used, it would not work
against existing babel daemons. So how would we interoperate with
older babel daemons if we used more unicast?

TL;DR: ...

It is kind of my hope that with all the fun stuff detailed below in
play, a topology like this will choose the faster, less direct route
rather than the more direct one, more often, particularly by using
the unicast responses to also measure connectivity better, summed
e2e. Prefer:

routerA - routerB - routerC - routerE - routerF - routerG - routerH
(all ethernet and nanostation M5 p2p radios)

vs

routerA --- 3000 meter lousy wifi connection --- routerH

More details below: (arguably this email may be longer than the code
would be!)

0) I have been in multiple situations where multicast worked but
unicast didn't (mostly due to bugs, but often due to distance and
minstrel failing to fall back to the lowest rate (also a bug), and
one time, due to firewalling off unicast), and I have always felt, in
the case of wifi, that testing both the multicast and the unicast
path was the best indication of actual connectivity. A sketch of what
probing both paths might look like is below.
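Something like this (a minimal, hypothetical sketch, NOT babeld's
actual code - the payload, interface name, and neighbour address are
all placeholders; only the group ff02::1:6 and port 6696 come from
RFC 6126):

#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in6 dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin6_family = AF_INET6;
    dst.sin6_port = htons(6696);                 /* babel's UDP port */
    dst.sin6_scope_id = if_nametoindex("wlan0"); /* placeholder iface */

    const char probe[] = "hello+ihu-probe";      /* placeholder payload */

    /* multicast leg: all babel routers on the link, sent at the
       (slow, power-save-afflicted) multicast basic rate */
    inet_pton(AF_INET6, "ff02::1:6", &dst.sin6_addr);
    if (sendto(fd, probe, sizeof(probe), 0,
               (struct sockaddr *)&dst, sizeof(dst)) < 0)
        perror("multicast sendto");

    /* unicast leg: one specific neighbour (placeholder link-local
       address), sent at the minstrel-chosen rate */
    inet_pton(AF_INET6, "fe80::1", &dst.sin6_addr);
    if (sendto(fd, probe, sizeof(probe), 0,
               (struct sockaddr *)&dst, sizeof(dst)) < 0)
        perror("unicast sendto");

    close(fd);
    return 0;
}

Getting replies back on both legs, with timestamps, is what would let
us compare the two delay distributions directly.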
For eventually choosing the best path from summed shortest-rtt
metrics, testing unicast in addition to multicast has a few other
pleasing properties, as it provides two delay measurement variables
that produce interestingly different results.

1) APs operate in power save mode against most clients. Multicast is
often delayed by as much as 250ms by this feature of the wifi
standards. CS6 markings are more or less ignored, except that they go
into queue 1 (on many mq wifi systems) and jump the hardware queue,
where they then stall until they can be scheduled. Other things
(mdns, nd, etc.) also use multicast in this mode and go into the
hardware queue, inducing delay and jitter as they coexist with other
traffic.

Win: wifi multicast in this case is slower than unicast by a lot and
exhibits high variance.

2) In adhoc mode, with 802.11e enabled (at least on some drivers),
CS6 markings are presently scheduled sooner (in the VO queue) and get
airtime sooner, but only for a single packet, and are still affected
by other multicast traffic in the queue.

Lose: Burning a txop for a single packet is bad. Grabbing the VO
airtime is bad. VO was for voice, and it was not particularly good
for that, either.

Win: multicast takes a long time to transmit - 13ms for a single 1500
byte packet, AFTER it gets airtime. [2]

The only thought I had about this before today was to turn off
babel's CS6 marking on transmits known to be over wifi. That injects
more native delay (under load) into the transmit than otherwise. (I
have not tried this.)

The wifi CS6 handling [1] "feature" is not ideal! In cerowrt I
flipped the diffserv handling to put CS6 in the VI queue, to take
better advantage of wireless-n aggregation, and in make-wifi-fast we
will probably try to maximize aggregation opportunities entirely,
ignoring most markings in favor of maximizing aggregation and
minimizing txops, and using fq to pack flows into those txops.

Sorting out more of the right thing, to me, for short-rtt metrics,
involves flipping the problem on its head: what marking will get the
*least* opportunity for airtime, and be most affected by queue
delays? For babel, that becomes no marking at all. And I would not
mind at all if openwrt turned off the VO queue for user traffic
entirely, for everything in mac80211 at least.

2a) CS6 is often treated as priority on ethernet. Keep that. (A
sketch of per-link-type marking is below, after point 4.)

3) Unicast IHU responses can run at the minstrel-derived transmission
rate, which is up to a 600x1 ratio vs multicast on wireless-n, and at
least 3-5x higher still on ac. It makes a lot of sense to use
unicast, even with a fairly dense mesh. How dense a mesh must be
before falling back to multicast is kind of unknown on modern
standards, and txops have a fixed cost in older standards, also.
(Some back-of-the-envelope airtime arithmetic is below, after point
4.)

A pleasing property of this is that no matter how hard we try,
sending packets both multicast and unicast will result in testing
both parts of the path, and in the case of congestion they will be
delayed by it, too, as a function of the known rate and the number of
packets going through the link. In general, sending less multicast
should be a goodness.

4) Unicast transmissions would keep minstrel's statistics "primed" as
to the right rate to use for transmit on mostly-idle links. I do not
know if every 4 seconds is often enough to keep them primed, but the
priming process itself injects delay, and that is good, and having
the right rate available, all the time, is also good. Bad
connectivity nowadays leads to tons and tons of delay as drivers
blithely retry for 10s or 100s of milliseconds.
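The back-of-the-envelope arithmetic for the rate gap in 3), counting
serialization time only (no preamble, acks, contention or retries, so
real numbers are worse - the rates chosen are illustrative):

#include <stdio.h>

/* wire time in ms for a packet at a given phy rate */
static double airtime_ms(int bytes, double mbps) {
    return (bytes * 8.0) / (mbps * 1000.0);
}

int main(void) {
    int len = 1500;
    printf("1 Mb/s multicast basic rate:   %6.2f ms\n", airtime_ms(len, 1.0));
    printf("24 Mb/s raised multicast rate: %6.2f ms\n", airtime_ms(len, 24.0));
    printf("600 Mb/s wireless-n unicast:   %6.3f ms\n", airtime_ms(len, 600.0));
    return 0;
}

That is 12ms vs 0.02ms before any of the per-txop overheads, which is
where the 600x1 ratio above and the ~13ms figure in [2] come from.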
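And a minimal sketch of the marking policy in 2a) - CS6 on wires, no
marking on wifi - using the standard IPV6_TCLASS socket option (RFC
3542). The is_wireless flag and the function name are mine,
hypothetical, not babeld's actual code:

#include <netinet/in.h>
#include <sys/socket.h>

#define TCLASS_CS6 0xc0  /* DSCP CS6 (48) in the traffic class byte */

/* mark babel's socket CS6 on wired links, best effort on wifi */
int set_babel_tclass(int fd, int is_wireless) {
    int tclass = is_wireless ? 0 : TCLASS_CS6;
    return setsockopt(fd, IPPROTO_IPV6, IPV6_TCLASS,
                      &tclass, sizeof(tclass));
}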
5) ECN markings would ensure that packets are mostly dropped due to
reachability, not congestion. ECN support is enabled by default in
the openwrt default configuration of fq_codel.

6) There is no.... rule 6!

7) fq-ing everywhere (as in openwrt) helps when a volley of unicast
IHU packets to various stations gets scheduled. Today, without
fq-ing, such a volley costs a txop for each station (possibly per
packet), incurring an increase of rtt across the last packets of the
volley. With per-station fair queuing, the volley gets scheduled into
aggregates for each station - a similar win on that rtt increase,
also.

8) I know, incidentally, that not fully randomizing the delivery of
IHUs to multiple stations is an issue from a theoretical perspective.
It needn't be embedded in the protocol itself (at least for testing),
and sending large (2-3) full-size timestamped packets per destination
as part of a measurement is ok too.... and they are free when you
have aggregation on wifi, or nearly so! (See the P.S. for how I
understand the timestamps turn into an rtt.)

9) In all cases fq_codel in particular is max-min fair, so the first
full-sized (well, in openwrt, <= 300 byte) packet goes out the front
of the queue quickly, and the second packet is delayed by the total
number of flows. Ethernet remains hard to drive to saturation except
at 100mbit and below, so you will see nearly zero induced delay at
gigE, and fq_codel will hold fq'd delays below 5ms on most BQL
drivers I am familiar with (usually below 2ms) at line rates. gigE,
100mbit and 10mbit links have pretty distinct plateaus when idle,
which could establish a baseline rtt for those (and ping data on idle
links is misleading - ping is somewhat deoptimized).

In the case of older pure-fifo queues, you will see more jitter and
variance due to congestion, and those links will be less ideal to use
in general, anyway... An example of where you would see that is
ethernet through a homeplug device.

Can this solution be made fully general? I look forward to finding
out.

[1] It is worse than that. The IETF defined CS1 as background. Most
of the off-the-shelf routers I have tried still treat it as higher
priority than BE on ethernet. Comcast remarks all traffic with weird
markings to CS1, and then CS1 gets treated as background (the 802.11e
BK queue) by most (but not all) wifi drivers.

[2] I am under the impression that most meshes (freifunk?) are
running at a vastly higher multicast rate - 12000 or 24000 - which is
still quite slow compared to aggregation.

-- 
Dave Täht
We CAN make better hardware, ourselves, beat bufferbloat, and take
back control of the edge of the internet! If we work together, on
making it:

https://www.kickstarter.com/projects/onetswitch/onetswitch-open-source-hardware-for-networking
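P.S. For point 8: my rough understanding of how the long-rtt work
turns the bundled hello/IHU timestamps into an rtt is the usual
ntp-style four-timestamp exchange - A stamps its hello at t1, B notes
reception at t2 on its own clock, later echoes (t1, t2) in the IHU
alongside its own hello stamped t3, and A receives that at t4. The
two clocks cancel, so no synchronization is needed. A sketch (names
mine, not the actual code's):

#include <stdint.h>

/* t1, t4 are on A's clock; t2, t3 on B's. wrapping 32-bit unsigned
   arithmetic handles timestamp rollover. */
static int64_t rtt_usec(uint32_t t1, uint32_t t2,
                        uint32_t t3, uint32_t t4) {
    return (int64_t)(uint32_t)(t4 - t1) - (int64_t)(uint32_t)(t3 - t2);
}

The (t3 - t2) term subtracts out however long B sat on the IHU before
sending it, which is exactly why all the queueing behaviors above
matter: whatever delay the *link* adds is what remains in the metric.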
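P.P.S. For 5) and 9): the openwrt fq_codel defaults referenced there
amount to something like this one-liner (illustrative - the exact
defaults vary by release and driver):

  tc qdisc replace dev wlan0 root fq_codel quantum 300 ecn

which is where the <= 300 byte quantum and the ECN-on-by-default
behavior come from.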