[Cerowrt-devel] more wet paint - babel unicast IHU for short-rtt path optimization

Tue Apr 7 14:56:27 EDT 2015

Please ignore this until after babel-1.6. It was prompted by finally
reading over the babel-long-rtt related code, which bundles hello and
IHU together, and by some old notes I made 2 years back when talk of
using ARPANET-style RTT routing metrics first became plausible. Might
as well store it on babel-devel and stick bits in Andrew's and Felix's
heads.

My understanding of the babeld code is that the unicast code is in
there but not used, and that if it were used, it would not work
against existing babel daemons. Is that right?

So how to interoperate with older babel daemons if we used more unicast?

TL;DR:
...

My hope is that, with all the fun stuff detailed below in play, a
topology like this will more often choose the faster, less direct
route over the more direct one, particularly by using the unicast
responses to also measure connectivity better, summed end to end.

Prefer:

routerA-routerb-routerc-routere-routerf-routerg-routerH (all ethernet
and nanostation M5 p2p radios)

vs

routerA --- 3000 meter lousy wifi connection - routerH
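
A minimal sketch of the preference described above, assuming the
metric is simply the sum of measured per-hop RTTs. The function name
and the RTT numbers are made up for illustration; they are not
measurements and not babeld code:

```python
def best_path(paths):
    """Return the name of the path with the lowest summed per-hop RTT.

    `paths` maps a path name to a list of per-hop RTTs in milliseconds.
    """
    return min(paths, key=lambda name: sum(paths[name]))


# Hypothetical numbers: six short hops (routerA..routerH via ethernet
# and p2p radios) versus one direct hop over a lousy long-distance link.
example = {
    "multi-hop": [1.0, 1.0, 2.0, 2.0, 1.0, 1.0],
    "direct": [40.0],
}
print(best_path(example))
```

With a pure hop-count metric the direct link would win; with summed
RTT the six-hop path does, as long as its hops really are that fast.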

More details below: (arguably this email may be longer than the code would be!)

0) I have been in multiple situations where multicast worked but
unicast didn't. Mostly this was due to bugs, but often it was due to
distance and minstrel failing to fall back to the lowest rate (also a
bug), and one time it was due to unicast being firewalled off. In the
case of wifi I have always felt that testing both the multicast and
unicast paths is the best indication of actual connectivity.

For eventually choosing the best path from summed shortest-rtt
metrics, testing unicast in addition to multicast has a few other
pleasing properties, as it provides two delay measurements that
produce interesting and different results.

1) APs operate in power save mode against most clients. Multicast is
often delayed by as much as 250ms by this feature in the wifi
standards. CS6 markings are more or less ignored except that they go
into queue 1 (on many mq wifi systems), and jump the hardware queue,
where they then stall until they can be scheduled. Other things (mdns,
nd, etc) also use multicast in this mode and go into the hardware
queue, inducing delay and jitter when that traffic co-exists.

Win: multicast over wifi in this case is slower than unicast by a lot
and exhibits high variance.

2) In adhoc mode, with 802.11e enabled, (at least on some drivers) CS6
markings are presently scheduled sooner (in the VO queue) and get
airtime sooner, but only for a single packet, and are still affected
by other multicast traffic in the queue.

Lose: Burning a txop for a single packet is bad. Grabbing the VO
airtime is bad. It was for voice, and it was not particularly good for
that either.

Win: multicast takes a long time to transmit - 13ms for a single 1500
byte packet, AFTER it gets airtime. [2]
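
For a feel of the numbers, a back-of-the-envelope sketch that counts
payload bits only. It ignores preamble, MAC headers, and contention,
so it comes out a bit under the 13ms figure. The 1 Mbit/s basic rate
is my assumption; 12 and 24 Mbit/s assume the rates in [2] are in
kbit/s, and 600 Mbit/s is the top wireless-n rate:

```python
def airtime_ms(payload_bytes, rate_mbit):
    """Time to serialize the payload alone at the given PHY rate.

    bits / (Mbit/s) gives microseconds; divide by 1000 for milliseconds.
    """
    return payload_bytes * 8 / rate_mbit / 1000


for rate in (1, 12, 24, 600):
    print(f"{rate:>4} Mbit/s: {airtime_ms(1500, rate):.3f} ms")
```

At 1 Mbit/s the payload alone takes 12 ms, which is where most of the
13ms goes.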

The only thought I have had about this before today is to turn off
babel's CS6 marking on transmits known to be over wifi. That injects
more native delay (under load) into the transmit than otherwise. (I
have not tried this)

The wifi CS6 handling [1] "feature" is not ideal!

In cerowrt I flipped the diffserv handling to put it in the VI queue
to take better advantage of wireless-n aggregation, and in
make-wifi-fast we probably plan to maximize aggregation opportunities
entirely, ignoring most markings in favor of maximizing aggregation
and minimizing txops, and using fq to pack flows into those txops.

Sorting out more of the right thing, to me, for short-rtt metrics,
involves flipping the problem on its head - what marking will get the
*least* opportunity for airtime, and be most affected by queue delays?
That becomes no marking at all, for babel.

And I would not mind at all if openwrt turned off the VO queue for
user traffic entirely for everything in mac80211, at least.

2a) CS6 is often treated as priority on ethernet. Keep that.
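
Put together, the marking policy from 2) and 2a) might look like the
sketch below. The function names and the "wifi"/"ethernet" strings are
hypothetical; the only fixed facts are that CS6 is DSCP 48 and that
the DSCP sits in the upper six bits of the traffic class byte:

```python
CS6 = 48          # DSCP value for class selector 6
BEST_EFFORT = 0   # no marking at all


def babel_dscp(link_type):
    """Hypothetical policy: no marking on wifi, keep CS6 elsewhere."""
    return BEST_EFFORT if link_type == "wifi" else CS6


def traffic_class(dscp):
    """Shift the 6-bit DSCP into place, leaving the 2 ECN bits clear."""
    return (dscp & 0x3F) << 2


print(hex(traffic_class(babel_dscp("ethernet"))))
print(hex(traffic_class(babel_dscp("wifi"))))
```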

3) Unicast IHU responses can run at the minstrel-derived transmission
rate, which is up to a 600:1 ratio vs multicast on wireless-n and at
least 3-5x higher on ac. It makes a lot of sense to use unicast, even
with a fairly dense mesh. How dense a mesh must be before falling back
to multicast makes sense is kind of unknown on modern standards, and
TXOPs also have a fixed cost in older standards.

A pleasing property about this is that no matter how hard we try,
sending packets both multicast and unicast will result in testing both
parts of the path, and in the case of congestion, be delayed by that,
also, as a function of the known rate, and the amount of packets going
through it. In general, sending less multicast should be a goodness.

4) Unicast transmissions would keep minstrel's statistics "primed" as
to the right rate to use for transmit on mostly idle links. I do not
know if every 4 seconds is often enough to keep them primed, but the
priming process itself injects delay, and that is good, and having the
right rate available all the time is also good. Bad connectivity
nowadays leads to tons and tons of delay as drivers blithely retry for
10s or 100s of milliseconds.

5) ECN markings would ensure that packets are mostly dropped due to
reachability, not congestion. ECN support is enabled by default in the
openwrt default of fq_codel.
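
A sketch of what asking for ECN on a babel-style IPv6 UDP socket could
look like on Linux. The helper names are made up; ECT(0) being the bit
pattern 10 in the low two bits of the traffic class, and the
IPV6_TCLASS socket option, are standard. Nothing here is from babeld:

```python
import socket

ECT0 = 0b10  # ECN-capable transport, codepoint ECT(0)


def with_ect0(traffic_class):
    """Keep the DSCP bits, replace the two ECN bits with ECT(0)."""
    return (traffic_class & 0xFC) | ECT0


def mark_socket(sock, dscp=0):
    """Set DSCP plus ECT(0) on an IPv6 socket (assumes Linux)."""
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_TCLASS,
                    with_ect0(dscp << 2))


print(hex(with_ect0(0x00)))  # unmarked traffic with ECT(0)
print(hex(with_ect0(0xC0)))  # CS6 with ECT(0)
```

With this in place an fq_codel hop under congestion can mark the
packet instead of dropping it, so a lost packet says more about
reachability.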

6) There is no.... rule 6!

7) fq-ing everywhere (as in openwrt) changes what happens when a
volley of unicast IHU packets to various stations is scheduled -

today, without fq-ing, such a volley costs a txop for each station
(possibly per packet), incurring an increase of rtt across the last
packets of the volley.

with per-station fair queuing, the volley gets scheduled into
aggregates for each station, with a similar rtt increase

win, also.

8) I know incidentally that not fully randomizing the delivery of IHU
to multiple stations is an issue from a theoretical perspective. It
needn't be embedded in the protocol itself (at least for testing),
and sending a few (2-3) large, full-size timestamped packets per
destination as part of a measurement is ok too.... and they are free
when you have aggregation on wifi, or nearly so!

9) in all cases fq_codel particularly is max-min fair, so the first
full sized (well, in openwrt, <= 300 byte) packet goes out the front
of the queue quickly, and the second packet is delayed by the total
number of flows. On ethernet it remains hard to drive to saturation
except at 100mbit and below, so you will see nearly zero induced
delays at gigE, and fq_codel will hold fq'd delays below 5ms on most
BQL drivers I am familiar with (usually below 2ms) at line rates.
gigE, 100mbit, and 10mbit have pretty distinct plateaus when idle to
establish a baseline rtt for those - (and ping data on idle links is
misleading - ping is deoptimized somewhat)
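
A crude upper-bound sketch of that "delayed by the total number of
flows" effect: assume every other backlogged flow sends one quantum
before the second packet gets its turn. The function name and the
100-flow example are assumptions; 300 bytes is the openwrt quantum
mentioned above. Real fq_codel behaviour (new-flow priority, BQL,
codel drops) is more subtle than this:

```python
def fq_round_delay_ms(other_flows, quantum_bytes, link_bps):
    """Worst-case wait for one round-robin pass over the other flows."""
    return other_flows * quantum_bytes * 8 / link_bps * 1000


# 100 competing flows, 300 byte quantum
for name, bps in (("gigE", 1e9), ("100mbit", 100e6), ("10mbit", 10e6)):
    print(f"{name:>8}: {fq_round_delay_ms(100, 300, bps):.2f} ms")
```

The same arithmetic with zero competing flows gives the idle baseline
of no queueing delay, leaving only the link's own serialization time.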

In the case of older pure fifo queues being used, you will see more
jitter and variance due to congestion, and those links will be less
ideal to use in general, anyway... An example where you would see that
is on an ethernet through homeplug device.

Can this solution be made fully general? I look forward to finding out.

[1] It is worse than that. IETF defined CS1 as background. Most of the
off the shelf routers I have tried still treat it as higher priority
than BE on ethernet. Comcast remarks all traffic with weird markings
to CS1, and then CS1 gets treated as background (the 802.11e BK queue)
by most (but not all) wifi drivers.

[2] I am under the impression that most meshes (freifunk?) are
running at a vastly higher multicast rate - 12000 or 24000 - which is
still quite slow compared to aggregation.

-- 
Dave Täht
We CAN make better hardware, ourselves, beat bufferbloat, and take
back control of the edge of the internet! If we work together, on
making it:

https://www.kickstarter.com/projects/onetswitch/onetswitch-open-source-hardware-for-networking