CopyCAT: Taking Control of Neural Policies with Constant Attacks

Authors: Léonard Hussenot, Matthieu Geist, Olivier Pietquin | Published: 2019-05-29 | Updated: 2020-01-21



Source: https://arxiv.org/abs/1905.12282

PDF: https://arxiv.org/pdf/1905.12282

Labels Predicted by AI

Adversarial attack, Deep Learning, Effective Perturbation Methods

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.

Abstract

We propose a new perspective on adversarial attacks against deep reinforcement learning agents. Our main contribution is CopyCAT, a targeted attack able to consistently lure an agent into following an outsider’s policy. It is pre-computed, and therefore fast to apply at inference time, so it could be used in a real-time scenario. We show its effectiveness on Atari 2600 games in the novel read-only setting. In this setting, the adversary cannot directly modify the agent’s state – its representation of the environment – but can only attack the agent’s observation – its perception of the environment. Directly modifying the agent’s state would require write access to the agent’s inner workings, and we argue that this assumption is too strong in realistic settings.
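
To make the read-only setting concrete, the following is a minimal sketch of how a pre-computed, constant perturbation could be applied at inference time. It is not the authors' implementation. It assumes one additive mask per target action, an amplitude bound `epsilon`, and observations scaled to [0, 1]; the function name `apply_constant_attack` and all values are hypothetical, and the masks here are random placeholders rather than learned ones.

```python
import numpy as np


def apply_constant_attack(observation, masks, target_action, low=0.0, high=1.0):
    """Add the pre-computed mask for `target_action` to the observation.

    Only the observation is touched (read-only setting): the agent's
    internal state is never written to.
    """
    perturbed = observation + masks[target_action]
    # Keep the attacked observation inside the valid pixel range.
    return np.clip(perturbed, low, high)


# Toy setup. All values below are illustrative assumptions.
rng = np.random.default_rng(0)
n_actions = 4
obs_shape = (84, 84)
epsilon = 0.05  # hypothetical bound on the perturbation amplitude

# Placeholder masks: in an actual attack these would be optimized offline,
# one per action the adversary may want to trigger.
masks = rng.uniform(-epsilon, epsilon, size=(n_actions, *obs_shape))

observation = rng.uniform(0.0, 1.0, size=obs_shape)
target_action = 2  # action chosen by the outsider's policy at this step

attacked = apply_constant_attack(observation, masks, target_action)
```

Because the masks are fixed ahead of time, the per-step cost in this sketch is a table lookup, one addition, and one clip, which is what would make such an attack plausible in real time.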