Control policies trained using Deep Reinforcement Learning have recently been
shown to be vulnerable to adversarial attacks that introduce even very small
perturbations to the policy input. The attacks proposed so far have been
designed using heuristics and build on existing techniques for crafting
adversarial examples used to dupe classifiers in supervised learning. In
contrast, this paper investigates the problem of devising optimal attacks with
respect to a well-defined attacker objective, e.g., minimizing the main
agent's average reward. When the policy, the system dynamics, and the rewards are known
to the attacker, a scenario referred to as a white-box attack, designing
optimal attacks amounts to solving a Markov Decision Process (MDP). For what
we call black-box attacks, where neither the policy nor the system is known to
the attacker, optimal
attacks can be trained using Reinforcement Learning techniques. Through
numerical experiments, we demonstrate the efficiency of our attacks compared to
existing attacks (usually based on gradient methods). We further quantify the
potential impact of attacks and relate it to the smoothness of the policy under
attack: smooth policies are naturally less prone to attacks (this explains why
policies that are Lipschitz with respect to the state are more resilient).
Finally, we show that, from the main agent's perspective, the system
uncertainties and the attacker can be modeled as a Partially Observable Markov
Decision Process (POMDP). We demonstrate that using Reinforcement Learning
techniques tailored to POMDPs (e.g., using Recurrent Neural Networks) leads to
more resilient policies.
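
To make the black-box formulation above concrete, the following minimal sketch casts the attack problem as an ordinary RL environment: the attacker's action is a bounded perturbation added to the observation, and its reward is the negative of the main agent's reward. The gym-like interface (`env.reset()`, `env.step(action)` returning `(state, reward, done)`) and the names `AttackerEnv` and `victim_policy` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

class AttackerEnv:
    """Wraps a victim environment and its fixed policy so that the
    perturbation itself becomes the action of an RL attacker.

    Assumed (hypothetical) interface: `env.reset()` and `env.step(action)`
    return (state, reward, done); `victim_policy(obs)` maps an observation
    to an action.
    """

    def __init__(self, env, victim_policy, epsilon=0.05):
        self.env = env
        self.victim_policy = victim_policy
        self.epsilon = epsilon          # l_inf budget on the perturbation
        self.state = None

    def reset(self):
        self.state = self.env.reset()
        return self.state               # attacker observes the true state

    def step(self, delta):
        # Project the attacker's action onto the allowed perturbation set.
        delta = np.clip(delta, -self.epsilon, self.epsilon)
        # The victim acts on the perturbed observation ...
        action = self.victim_policy(self.state + delta)
        next_state, reward, done = self.env.step(action)
        self.state = next_state
        # ... and the attacker is rewarded for degrading the victim's return.
        return next_state, -reward, done
```

Any standard RL algorithm run on such a wrapper then searches for an (approximately) optimal attack under the budget `epsilon`; in the white-box case, where dynamics and rewards are known, the same construction is an MDP that can in principle be solved directly.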
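
On the defense side, the sketch below illustrates, under the same illustrative assumptions (pure NumPy, random untrained weights as placeholders), why POMDP-tailored methods such as Recurrent Neural Networks are a natural fit: the recurrent hidden state summarizes the observation history, so the action need not rely on the possibly attacked current observation alone.

```python
import numpy as np

class RecurrentPolicy:
    """Minimal Elman-style recurrent policy (illustrative sketch).

    The hidden state acts as a belief-like summary of past observations;
    in practice the weights would be trained with an RL method suited to
    POMDPs rather than initialized at random as done here.
    """

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.1, size=(hidden_dim, obs_dim))
        self.W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.W_out = rng.normal(scale=0.1, size=(n_actions, hidden_dim))
        self.h = np.zeros(hidden_dim)

    def reset(self):
        # Clear the hidden state at the start of an episode.
        self.h = np.zeros_like(self.h)

    def act(self, obs):
        # Update the recurrent summary of the observation history.
        self.h = np.tanh(self.W_in @ obs + self.W_h @ self.h)
        logits = self.W_out @ self.h
        return int(np.argmax(logits))   # greedy action from the recurrent state
```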