Abstract
Recent studies have shown that deep reinforcement learning agents are
vulnerable to small adversarial perturbations on the agent's inputs, which
raises concerns about deploying such agents in the real world. To address this
issue, we propose RADIAL-RL, a principled framework to train reinforcement
learning agents with improved robustness against $l_p$-norm-bounded adversarial
attacks. Our framework is compatible with popular deep reinforcement learning
algorithms, and we demonstrate its performance with deep Q-learning, A3C, and
PPO. We experiment on three deep RL benchmarks (Atari, MuJoCo and ProcGen) to
show the effectiveness of our robust training algorithm. Our RADIAL-RL agents
consistently outperform prior methods when tested against attacks of varying
strength and are more computationally efficient to train. In addition, we
propose a new evaluation method called Greedy Worst-Case Reward (GWC) to
measure the attack-agnostic robustness of deep RL agents. We show that GWC can be
evaluated efficiently and is a good estimate of the reward under the worst
possible sequence of adversarial attacks. All code used for our experiments is
available at https://github.com/tuomaso/radial_rl_v2.
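
To make the threat model concrete, the following minimal sketch shows how an $l_\infty$-bounded input perturbation translates into output bounds for a single linear Q-value layer, and which actions cannot be ruled out as the greedy choice under such a perturbation. It is an illustration under stated assumptions, not the RADIAL-RL implementation or the GWC procedure: the linear Q-function, the helper names `linear_output_bounds` and `possible_greedy_actions`, and the toy numbers are all hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def linear_output_bounds(
    W: Matrix, b: Vector, x: Vector, eps: float
) -> Tuple[Vector, Vector]:
    """Bounds on W x' + b over all x' with max_i |x'_i - x_i| <= eps.

    For a linear layer the bounds are exact: each output moves by at most
    eps times the l_1 norm of its weight row.
    """
    lower, upper = [], []
    for row, bias in zip(W, b):
        center = sum(w * xi for w, xi in zip(row, x)) + bias
        radius = eps * sum(abs(w) for w in row)
        lower.append(center - radius)
        upper.append(center + radius)
    return lower, upper


def possible_greedy_actions(lower: Vector, upper: Vector) -> List[int]:
    """Actions that cannot be ruled out as the argmax under perturbation.

    An action is excluded only when its upper bound lies below the largest
    lower bound, so the returned set is a conservative superset.
    """
    best_lower = max(lower)
    return [a for a, u in enumerate(upper) if u >= best_lower]


# Hypothetical toy Q-function with three actions and a two-dimensional state.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x = [1.0, 0.5]

small = possible_greedy_actions(*linear_output_bounds(W, b, x, eps=0.1))
large = possible_greedy_actions(*linear_output_bounds(W, b, x, eps=0.3))
```

In this toy setting, only action 0 remains possible at `eps=0.1`, whereas at `eps=0.3` action 1 can no longer be excluded. A larger perturbation budget therefore widens the set of actions an adversary might induce, which is the kind of effect a worst-case evaluation has to account for.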