Abstract
Stochastic linear contextual bandit algorithms are widely used in practice,
for example in recommender systems, online advertising, and clinical trials.
Recent works show that optimal bandit algorithms are vulnerable to
adversarial attacks and can fail completely in the presence of attacks.
Existing robust bandit algorithms work only in the non-contextual setting
under reward attacks and do not improve robustness in the more general and
widely used contextual bandit setting. In addition, no existing method can
defend against attacks on the context. In this work, we provide the first
robust bandit algorithm for the stochastic linear contextual bandit setting
that achieves sub-linear regret under a fully adaptive and omniscient attack.
Our algorithm works not only under reward attacks but also under context attacks.
Moreover, it requires no knowledge of the attack budget or the
particular form of the attack. We provide theoretical guarantees for the
proposed algorithm and show experimentally that it improves
robustness against a variety of common attacks.