These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
We introduce two tactics to attack agents trained by deep reinforcement
learning algorithms using adversarial examples, namely the strategically-timed
attack and the enchanting attack. In the strategically-timed attack, the
adversary aims at minimizing the agent's reward by only attacking the agent at
a small subset of time steps in an episode. Limiting the attack activity to
this subset helps prevent detection of the attack by the agent. We propose a
novel method to determine when an adversarial example should be crafted and
applied. In the enchanting attack, the adversary aims at luring the agent to a
designated target state. This is achieved by combining a generative model and a
planning algorithm: while the generative model predicts the future states, the
planning algorithm generates a preferred sequence of actions for luring the
agent. A sequence of adversarial examples is then crafted to lure the agent to
take the preferred sequence of actions. We apply the two tactics to the agents
trained by the state-of-the-art deep reinforcement learning algorithm including
DQN and A3C. In 5 Atari games, our strategically timed attack reduces as much
reward as the uniform attack (i.e., attacking at every time step) does by
attacking the agent 4 times less often. Our enchanting attack lures the agent
toward designated target states with a more than 70% success rate. Videos are
available at http://yenchenlin.me/adversarial_attack_RL/