AIセキュリティポータル K Program
Robust Q-Learning under Corrupted Rewards
Share
Abstract
Recently, there has been a surge of interest in analyzing the non-asymptotic behavior of model-free reinforcement learning algorithms. However, the performance of such algorithms in non-ideal environments, such as in the presence of corrupted rewards, is poorly understood. Motivated by this gap, we investigate the robustness of the celebrated Q-learning algorithm to a strong-contamination attack model, where an adversary can arbitrarily perturb a small fraction of the observed rewards. We start by proving that such an attack can cause the vanilla Q-learning algorithm to incur arbitrarily large errors. We then develop a novel robust synchronous Q-learning algorithm that uses historical reward data to construct robust empirical Bellman operators at each time step. Finally, we prove a finite-time convergence rate for our algorithm that matches known state-of-the-art bounds (in the absence of attacks) up to a small inevitable $O(\varepsilon)$ error term that scales with the adversarial corruption fraction $\varepsilon$. Notably, our results continue to hold even when the true reward distributions have infinite support, provided they admit bounded second moments.
Q-learning
Christopher JCH Watkins, Peter Dayan
Published: 1992
Stochastic approximation: a dynamical systems viewpoint
Vivek S Borkar
Published: 2009
Asynchronous stochastic approximation and Q-learning
John N Tsitsiklis
Published: 1994
Convergence of stochastic iterative dynamic programming algorithms
Tommi Jaakkola, Michael Jordan, Satinder Singh
Published: 1993
The asymptotic convergence-rate of Q-learning
Csaba Szepesvári
Published: 1997
A finite time analysis of temporal difference learning with linear function approximation
Jalaj Bhandari, Daniel Russo, Raghav Singal
Published: 2018
Q-learning with nearest neighbors
Devavrat Shah, Qiaomin Xie
Published: 2018
Finite-time error bounds for linear stochastic approximation and TD learning
Rayadurgam Srikant, Lei Ying
Published: 2019
Finite-time analysis of asynchronous stochastic approximation and Q-learning
Adam Wierman, Guannan Qu
Published: 2020
Is Q-learning minimax optimal? a tight sample complexity analysis
Gen Li, Changxiao Cai, Yuxin Chen, Yuting Wei, Yuejie Chi
Published: 2024
Robust estimation of a location parameter
Peter J Huber
Published: 1992
Robust statistics
Peter J Huber
Published: 2004
Robust multivariate mean estimation: the optimality of trimmed mean
Gabor Lugosi, Shahar Mendelson
Published: 2021
Finite-sample convergence rates for Q-learning and indirect algorithms
Michael Kearns, Satinder Singh
Published: 1998
Learning rates for Q-learning
Eyal Even-Dar, Yishay Mansour, Peter Bartlett
Published: 2003
Near-optimal time and sample complexities for solving markov decision processes with a generative model
Aaron Sidford, Mengdi Wang, Xian Wu, Lin Yang, Yinyu Ye
Published: 2018
Error bounds for constant step-size Q-learning
Carolyn L Beck, Rayadurgam Srikant
Published: 2012
Stochastic bandits robust to adversarial corruptions
Lykouris, T., Mirrokni, V., Paes Leme, R.
Published: 2018
Corruption-tolerant bandit learning
Sayash Kapoor, Kumar Kshitij Patel, Purushottam Kar
Published: 2019
Corruption-tolerant gaussian process bandit optimization
Ilija Bogunovic, Andreas Krause, Jonathan Scarlett
Published: 2020
Stochastic approximation with delayed updates: Finite-time rates under markovian sampling
Arman Adibi, Nicolò Dal Fabbro, Luca Schenato, Sanjeev Kulkarni, H Vincent Poor, George J Pappas, Hamed Hassani, Aritra Mitra
Published: 2024
Reinforcement learning: An introduction
Richard S Sutton, Andrew G Barto
Published: 2018
Minimax pac bounds on the sample complexity of reinforcement learning with a generative model
Mohammad Gheshlaghi Azar, Remi Munos, Bert Kappen
Published: 2013
Algorithmic High-Dimensional Robust Statistics
Ilias Diakonikolas, Daniel M Kane
Published: 2023
Concentration inequalities and martingale inequalities: a survey
Fan Chung, Linyuan Lu
Published: 2006
Share