So I've finally gotten around to trying out the Python bandit algorithm implementations from https://www.math.univ-toulouse.fr/~agarivie/Telecom/bandits/

I have added a very simple WiFi arm to the code base: basically a Bernoulli arm that succeeds with a certain probability and, on success, gives a payoff scaled by the base rate. It can be instantiated from the rate_stats_csv output from Minstrel in the kernel.

Based on the data from a simple test run in my own testbed, I was able to get three of the algorithms to produce something meaningful; see the attached graph. The best of them performs roughly comparably to Minstrel (I think; the numbers are not quite straightforward to compare).

I have not been able to get the Thompson and BayesUCB algorithms to work with this scenario yet: they require a posterior distribution to sample from, and the included implementation doesn't handle the varying payoffs of the arms. However, perhaps sticking with the KL-UCB algorithm is better anyway; it's the same one that "optimal rate sampling" modifies, though I haven't quite grokked how they modify it yet.

In any case, I do believe it is possible to extend this simulation into something we can use to guide, say, a dynamic implementation (i.e., one where the probabilities change over the duration of the test run), as well as to evaluate the effects of collapsing arms or defining them differently. It would probably be good to have a better source for the actual probabilities of each rate, though.

So yeah, this is a bit of a brain dump of where I'm at, but I'll be away for the next couple of weeks, so I thought it better to get it out there. I've appended a few rough sketches below my sig to make parts of the above concrete.

Code here: https://kau.toke.dk/git/pybandits/

-Toke
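
P.S. The promised sketches. All of this is untested as written, and the names won't match what's in the pybandits repo exactly; it's just meant to show the shape of the thing.

First, the WiFi arm idea: a Bernoulli arm where a success pays off the rate's nominal bitrate, plus a loader for Minstrel's rate_stats_csv output. The column names in the loader are guesses; adjust to the actual CSV layout:

import csv
import random

class WiFiArm:
    """One arm per WiFi rate: a transmission succeeds with probability
    success_prob, and a success pays off the nominal bitrate (0 on failure)."""

    def __init__(self, success_prob, rate_mbps):
        self.success_prob = success_prob
        self.rate_mbps = rate_mbps
        # Expected payoff; handy for computing regret against the best arm.
        self.expectation = success_prob * rate_mbps

    def draw(self):
        return self.rate_mbps if random.random() < self.success_prob else 0.0

def arms_from_rate_stats(filename):
    """Build one arm per rate from a Minstrel rate_stats_csv dump."""
    arms = []
    with open(filename) as f:
        for row in csv.DictReader(f):
            prob = float(row['prob']) / 100.0  # success probability (percent, assumed)
            rate = float(row['rate'])          # nominal bitrate in Mbps (assumed)
            arms.append(WiFiArm(prob, rate))
    return arms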
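On the Thompson problem: since the payoff on success is fixed per arm, one way around the varying payoffs might be to keep the Beta posterior over the success probability alone and rank arms by sampled probability times payoff. I haven't tried wiring this into the repo's framework, so take it as a direction, not a fix:

import random

def thompson_pick(arms, successes, failures):
    """Sample a success probability from each arm's Beta(1+s, 1+f)
    posterior and pick the arm maximizing sampled_prob * payoff."""
    best, best_val = 0, -1.0
    for i, arm in enumerate(arms):
        theta = random.betavariate(1 + successes[i], 1 + failures[i])
        val = theta * arm.rate_mbps
        if val > best_val:
            best, best_val = i, val
    return best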
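For reference, here is the textbook KL-UCB index for Bernoulli-type rewards, which is roughly what I understand the repo's implementation to be computing; rewards are normalized to [0, 1] by dividing by the fastest rate:

import math

def kl_bernoulli(p, q, eps=1e-15):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, t, precision=1e-6):
    """The largest q in [mean, 1] with pulls * kl(mean, q) <= log(t),
    found by bisection (kl(mean, .) is increasing on [mean, 1])."""
    threshold = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > precision:
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) > threshold:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def klucb_run(arms, horizon):
    """Play KL-UCB against the WiFiArm list from the sketch above."""
    max_rate = max(a.rate_mbps for a in arms)
    n = len(arms)
    pulls, means, total = [0] * n, [0.0] * n, 0.0
    for i in range(n):  # pull each arm once to initialize
        r = arms[i].draw() / max_rate
        pulls[i], means[i] = 1, r
        total += r
    for t in range(n + 1, horizon + 1):
        i = max(range(n), key=lambda a: klucb_index(means[a], pulls[a], t))
        r = arms[i].draw() / max_rate
        pulls[i] += 1
        means[i] += (r - means[i]) / pulls[i]
        total += r
    return total * max_rate  # cumulative payoff, back in rate units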
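Finally, the "dynamic" extension I have in mind is little more than letting the success probability follow a schedule of breakpoints over the test run, e.g.:

class DynamicWiFiArm(WiFiArm):
    """WiFiArm whose success probability follows a schedule of
    (time, prob) breakpoints over the run. Note that the static
    'expectation' attribute stops being meaningful here."""

    def __init__(self, schedule, rate_mbps):
        super().__init__(schedule[0][1], rate_mbps)
        self.schedule = schedule  # sorted by time, e.g. [(0, 0.9), (5000, 0.4)]
        self.t = 0

    def draw(self):
        self.t += 1
        # Use the probability from the latest breakpoint we have passed.
        for time, prob in reversed(self.schedule):
            if self.t >= time:
                self.success_prob = prob
                break
        return super().draw()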