We study a security threat to reinforcement learning in which an attacker
poisons the learning environment to force the agent into executing a target
policy chosen by the attacker. As victims, we consider RL agents whose
objective is to find a policy that maximizes the average reward in
undiscounted infinite-horizon problem settings. The attacker can manipulate
the rewards or the transition dynamics in the learning environment at
training time and is
interested in doing so in a stealthy manner. We propose an optimization
framework for finding an \emph{optimal stealthy attack} under different
measures of attack cost. We give sufficient technical conditions under which
the attack is feasible and provide lower and upper bounds on the attack cost.
We instantiate our attacks in two settings: (i) an \emph{offline} setting
where the agent performs planning in the poisoned environment, and (ii) an
\emph{online} setting where the agent learns a policy using a
regret-minimization framework with poisoned feedback. Our results show that,
under mild conditions, the attacker can easily succeed in teaching any target
policy to the victim, and they highlight a significant security threat to
reinforcement learning agents in practice.
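
For concreteness, the setup can be sketched schematically as follows; the
notation here ($M$, $\widehat{M}$, $\pi_\dagger$, $\epsilon$, and the generic
cost measure) is illustrative and not necessarily the paper's exact
formulation. The victim seeks a policy $\pi$ maximizing the undiscounted
average reward in the environment $M$,
\[
\rho^{\pi}(M) \;=\; \lim_{T \to \infty} \frac{1}{T}\,
\mathbb{E}\!\left[\sum_{t=1}^{T} r_t \,\middle|\, \pi, M \right],
\]
while the attacker, given a target policy $\pi_\dagger$ and a chosen cost
measure, solves a constrained problem of the form
\[
\min_{\widehat{M}} \;\; \mathrm{cost}\big(\widehat{M}, M\big)
\quad \text{s.t.} \quad
\rho^{\pi_\dagger}\big(\widehat{M}\big) \;\geq\; \rho^{\pi}\big(\widehat{M}\big) + \epsilon
\quad \text{for all } \pi \neq \pi_\dagger,
\]
where the poisoned environment $\widehat{M}$ differs from $M$ only in the
rewards or only in the transition dynamics, and $\mathrm{cost}$ could be,
e.g., an $\ell_p$ distance between the original and poisoned quantities.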