We present a method to automatically find security strategies for the use
case of intrusion prevention. Following this method, we model the interaction
between an attacker and a defender as a Markov game and let attack and defense
strategies evolve through reinforcement learning and self-play without human
intervention. Using a simple infrastructure configuration, we demonstrate that
effective security strategies can emerge from self-play. This shows that
self-play, which has been applied in other domains with great success, can be
effective in the context of network security. Inspection of the converged
policies show that the emerged policies reflect common-sense knowledge and are
similar to strategies of humans. Moreover, we address known challenges of
reinforcement learning in this domain and present an approach that uses
function approximation, an opponent pool, and an autoregressive policy
representation. Through evaluations we show that our method is superior to two
baseline methods but that policy convergence in self-play remains a challenge.