Robust Q-Learning under Corrupted Rewards

Christopher JCH Watkins, Peter Dayan. Q-learning. Machine Learning, 1992.

Vivek S Borkar. Stochastic approximation: a dynamical systems viewpoint. Springer, 2009.

John N Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 1994.

Tommi Jaakkola, Michael Jordan, Satinder Singh. Convergence of stochastic iterative dynamic programming algorithms. Advances in Neural Information Processing Systems, 1993.

Csaba Szepesvári. The asymptotic convergence-rate of Q-learning. Advances in Neural Information Processing Systems, 1997.

Jalaj Bhandari, Daniel Russo, Raghav Singal. A finite time analysis of temporal difference learning with linear function approximation. Conference on Learning Theory, 2018.

Devavrat Shah, Qiaomin Xie. Q-learning with nearest neighbors. Advances in Neural Information Processing Systems, 2018.

Rayadurgam Srikant, Lei Ying. Finite-time error bounds for linear stochastic approximation and TD learning. Conference on Learning Theory, 2019.

Martin J Wainwright. Stochastic approximation with cone-contractive operators: Sharp ℓ∞-bounds for Q-learning. 2019.

Adam Wierman, Guannan Qu. Finite-time analysis of asynchronous stochastic approximation and Q-learning. Proceedings of Machine Learning Research, 2020.

Gen Li, Changxiao Cai, Yuxin Chen, Yuting Wei, Yuejie Chi. Is Q-learning minimax optimal? A tight sample complexity analysis. Operations Research, 2024.

Peter J Huber. Robust estimation of a location parameter. In Breakthroughs in Statistics, 1992.

Peter J Huber. Robust statistics. John Wiley & Sons, 2004.

Gabor Lugosi, Shahar Mendelson. Robust multivariate mean estimation: the optimality of trimmed mean. The Annals of Statistics, 2021.

Michael Kearns, Satinder Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. Advances in Neural Information Processing Systems, 1998.

Eyal Even-Dar, Yishay Mansour, Peter Bartlett. Learning rates for Q-learning. Journal of Machine Learning Research, 2003.

Aaron Sidford, Mengdi Wang, Xian Wu, Lin Yang, Yinyu Ye. Near-optimal time and sample complexities for solving Markov decision processes with a generative model. Advances in Neural Information Processing Systems, 2018.

Carolyn L Beck, Rayadurgam Srikant. Error bounds for constant step-size Q-learning. Systems & Control Letters, 2012.

Kwang-Sung Jun, Lihong Li, Yuzhe Ma, Xiaojin Zhu. Adversarial attacks on stochastic bandits. 2018.

T. Lykouris, V. Mirrokni, R. Paes Leme. Stochastic bandits robust to adversarial corruptions. Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018.

Fang Liu, Ness Shroff. Data Poisoning Attacks on Stochastic Bandits. arXiv, 16 May 2019.

Stochastic multi-armed bandits form a class of online learning problems that have important applications in online recommendation systems, adaptive medical treatment, and many others. Even though potential attacks against these learning algorithms may hijack their behavior, causing catastrophic loss in real-world applications, little is known about adversarial attacks on bandit algorithms. In this paper, we propose a framework of offline attacks on bandit algorithms and study convex optimization based attacks on several popular bandit algorithms. We show that the attacker can force the bandit algorithm to pull a target arm with high probability by a slight manipulation of the rewards in the data. Then we study a form of online attacks on bandit algorithms and propose an adaptive attack strategy against any bandit algorithm without the knowledge of the bandit algorithm. Our adaptive attack strategy can hijack the behavior of the bandit algorithm to suffer a linear regret with only a logarithmic cost to the attacker. Our results demonstrate a significant security threat to stochastic bandits.
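The abstract above describes an attacker that steers a bandit learner toward a target arm by lowering rewards. The following is a minimal sketch of that idea under stated assumptions: a UCB1 learner on Bernoulli arms, and an attacker that, whenever a non-target arm is pulled, lowers its reward just enough to keep that arm's empirical mean at least `margin` below the target arm's. The function `run_ucb`, the margin rule, and all parameter values are hypothetical. This is a simple target-arm attack in the same spirit, not the exact strategy from the paper.

```python
import math
import random


def run_ucb(means, target, horizon, attack=True, margin=0.5, seed=0):
    """Hypothetical sketch: UCB1 on Bernoulli arms, optionally under reward poisoning.

    The attacker tracks the same (possibly corrupted) rewards it delivers, so its
    view of each arm's empirical mean equals the learner's. Returns per-arm pull
    counts and the total corruption cost, sum of |clean reward - delivered reward|.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k  # sums of the rewards actually delivered to the learner
    cost = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        if attack and arm != target and counts[target] > 0:
            target_mean = sums[target] / counts[target]
            # Largest reward that keeps this arm's mean <= target_mean - margin
            cap = (counts[arm] + 1) * (target_mean - margin) - sums[arm]
            poisoned = min(reward, cap)
            cost += reward - poisoned  # non-negative because poisoned <= reward
            reward = poisoned
        counts[arm] += 1
        sums[arm] += reward
    return counts, cost


means = [0.2, 0.5, 0.8]  # arm 0 is the worst arm and the attacker's target
clean_counts, clean_cost = run_ucb(means, target=0, horizon=5000, attack=False)
attacked_counts, cost = run_ucb(means, target=0, horizon=5000, attack=True)
print(clean_counts, attacked_counts, round(cost, 1))
```

With these settings one would expect the unattacked learner to pull arm 0 in only a small fraction of rounds, while the attacked learner pulls it in most rounds at a total cost well below the horizon. The UCB1 exploration bonus is what limits how often the suppressed arms are retried, which keeps the number of corrupted pulls, and hence the cost, small.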


A. Gupta, T. Koren, K. Talwar. Better algorithms for stochastic bandits with adversarial corruptions. 2019.

Sayash Kapoor, Kumar Kshitij Patel, Purushottam Kar. Corruption-tolerant bandit learning. Machine Learning, 2019.

Ilija Bogunovic, Andreas Krause, Jonathan Scarlett. Corruption-tolerant Gaussian process bandit optimization. International Conference on Artificial Intelligence and Statistics, 2020.

Ilija Bogunovic, Arpan Losalka, Andreas Krause, Jonathan Scarlett. Stochastic Linear Bandits Robust to Adversarial Attacks. International Conference on Artificial Intelligence and Statistics (AISTATS); arXiv, 7 July 2020.

We consider a stochastic linear bandit problem in which the rewards are not only subject to random noise, but also adversarial attacks subject to a suitable budget $C$ (i.e., an upper bound on the sum of corruption magnitudes across the time horizon). We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not. Both variants are shown to attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms respectively having a linear and quadratic dependency on $C$ in general. We present algorithm independent lower bounds showing that these additive terms are near-optimal. In addition, in a contextual setting, we revisit a setup of diverse contexts, and show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
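The abstract above defines the corruption budget $C$ as an upper bound on the sum of corruption magnitudes over the horizon. The following is a minimal sketch of only that corruption model, not of the Robust Phased Elimination algorithm. It assumes a linear reward $\langle\theta, x\rangle$ plus Gaussian noise and an adversary whose requested shift is clipped to the remaining budget. The class `BudgetedCorruptionEnv`, its parameters, and the clipping rule are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class BudgetedCorruptionEnv:
    """Hypothetical linear bandit environment with a budget-limited adversary.

    The clean reward for action x is <theta, x> plus Gaussian noise; the adversary
    may add a shift c_t to each reward, clipped so that sum_t |c_t| <= budget.
    """
    theta: list
    budget: float
    noise_std: float = 0.1
    seed: int = 0
    spent: float = 0.0
    clean_log: list = field(default_factory=list)

    def __post_init__(self):
        self.rng = random.Random(self.seed)

    def pull(self, x, desired_shift):
        clean = sum(th * xi for th, xi in zip(self.theta, x)) + self.rng.gauss(0.0, self.noise_std)
        self.clean_log.append(clean)  # kept only so the example can measure the bias
        remaining = self.budget - self.spent
        shift = max(-remaining, min(remaining, desired_shift))  # clip to remaining budget
        self.spent += abs(shift)
        return clean + shift


env = BudgetedCorruptionEnv(theta=[0.5, -0.2], budget=5.0)
n = 200
observed = [env.pull([1.0, 0.0], desired_shift=-1.0) for _ in range(n)]
clean_mean = sum(env.clean_log) / n
observed_mean = sum(observed) / n
print(env.spent)  # 5.0
print(clean_mean - observed_mean)  # about 0.025 = budget / n, up to rounding
```

Because the total shift is at most $C$, an average over $n$ rewards moves by at most $C/n$. The damage from a fixed budget therefore fades as samples accumulate, while an adversary that spends its whole budget early can still bias the first estimates heavily.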


Aritra Mitra, George J Pappas, Hamed Hassani. Temporal difference learning with compressed updates: Error-feedback meets reinforcement learning. 2023.

Arman Adibi, Nicolò Dal Fabbro, Luca Schenato, Sanjeev Kulkarni, H Vincent Poor, George J Pappas, Hamed Hassani, Aritra Mitra. Stochastic approximation with delayed updates: Finite-time rates under Markovian sampling. International Conference on Artificial Intelligence and Statistics, 2024.

Richard S Sutton, Andrew G Barto. Reinforcement learning: An introduction. MIT Press, 2018.

Mohammad Gheshlaghi Azar, Remi Munos, Bert Kappen. Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model. 2013.

Ilias Diakonikolas, Daniel M Kane. Algorithmic High-Dimensional Robust Statistics. Cambridge University Press, 2023.

Fan Chung, Linyuan Lu. Concentration inequalities and martingale inequalities: a survey. Internet Mathematics, 2006.