Due to the broad range of applications of stochastic multi-armed bandit
model, understanding the effects of adversarial attacks and designing bandit
algorithms robust to attacks are essential for the safe applications of this
model. In this paper, we introduce a new class of attack named
action-manipulation attack. In this attack, an adversary can change the action
signal selected by the user. We show that without knowledge of mean rewards of
arms, our proposed attack can manipulate Upper Confidence Bound (UCB)
algorithm, a widely used bandit algorithm, into pulling a target arm very
frequently by spending only logarithmic cost. To defend against this class of
attacks, we introduce a novel algorithm that is robust to action-manipulation
attacks when an upper bound for the total attack cost is given. We prove that
our algorithm has a pseudo-regret upper bounded by $\mathcal{O}(\max\{\log
T,A\})$, where $T$ is the total number of rounds and $A$ is the upper bound of
the total attack cost.