Stochastic multi-armed bandits form a class of online learning problems that
have important applications in online recommendation systems, adaptive medical
treatment, and many others. Even though potential attacks against these
learning algorithms may hijack their behavior, causing catastrophic loss in
real-world applications, little is known about adversarial attacks on bandit
algorithms. In this paper, we propose a framework of offline attacks on bandit
algorithms and study convex optimization based attacks on several popular
bandit algorithms. We show that the attacker can force the bandit algorithm to
pull a target arm with high probability by a slight manipulation of the rewards
in the data. Then we study a form of online attacks on bandit algorithms and
propose an adaptive attack strategy against any bandit algorithm without the
knowledge of the bandit algorithm. Our adaptive attack strategy can hijack the
behavior of the bandit algorithm to suffer a linear regret with only a
logarithmic cost to the attacker. Our results demonstrate a significant
security threat to stochastic bandits.