Deep neural networks and other modern machine learning models are often
susceptible to adversarial attacks. Indeed, an adversary can often
change a model's prediction through a small, directed perturbation of the
model's input, which is a problem in safety-critical applications. Adversarially robust
machine learning is usually based on a minmax optimisation problem that
minimises the machine learning loss under maximisation-based adversarial
attacks.
In this work, we study adversaries that determine their attack using a
Bayesian statistical approach rather than maximisation. The resulting Bayesian
adversarial robustness problem is a relaxation of the usual minmax problem. To
solve this problem, we propose Abram, a continuous-time particle system
designed to approximate the gradient flow corresponding to the underlying learning
problem. We show that Abram approximates a McKean-Vlasov process and justify
the use of Abram by giving assumptions under which the McKean-Vlasov process
finds the minimiser of the Bayesian adversarial robustness problem. We discuss
two ways to discretise Abram and show its suitability in benchmark adversarial
deep learning experiments.
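
For orientation, the minmax problem and its Bayesian relaxation described above can be sketched as follows. The notation is assumed purely for illustration and is not taken from the text: $\theta$ denotes the model parameters, $(x, y)$ a data pair, $f_\theta$ the model, $\ell$ the loss, $\xi$ the perturbation, and $\varepsilon > 0$ the attack budget. A common form of the minmax problem is

\[
\min_{\theta} \; \mathbb{E}_{(x,y)} \Big[ \max_{\|\xi\| \le \varepsilon} \ell\big(f_\theta(x + \xi), y\big) \Big].
\]

A Bayesian adversary would instead draw its attack from a probability distribution on the ball $\{\|\xi\| \le \varepsilon\}$, written here hypothetically as $\pi(\cdot \mid \theta, x, y)$, so that the inner maximum is replaced by an average:

\[
\min_{\theta} \; \mathbb{E}_{(x,y)} \Big[ \int_{\|\xi\| \le \varepsilon} \ell\big(f_\theta(x + \xi), y\big) \, \pi(\mathrm{d}\xi \mid \theta, x, y) \Big].
\]

Since an average over the ball cannot exceed the maximum over it, the second objective is bounded above by the first for every $\theta$, which is one sense in which such a problem relaxes the minmax formulation. The specific choice of $\pi$ is left open in this sketch.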