Despite their popularity in many applications, neural network models have been
found to be vulnerable to adversarial examples, i.e., carefully crafted
examples aimed at misleading machine learning models. Adversarial examples can
pose serious risks to safety- and security-critical applications. However,
existing defense approaches are still vulnerable to attacks, especially in a
white-box attack scenario. To address this issue, we propose a new defense
approach, named MulDef, based on robustness diversity. Our approach consists of
(1) a general defense framework based on multiple models and (2) a technique
for generating these multiple models to achieve high defense capability. In
particular, given a target model, our framework includes multiple models
(constructed from the target model) to form a model family. The model family is
designed to achieve robustness diversity (i.e., an adversarial example
successfully attacking one model cannot succeed in attacking other models in
the family). At runtime, a model is randomly selected from the family and
applied to each input example. Our general framework can also inspire future
research on constructing model families that achieve higher robustness
diversity. Our evaluation results show that MulDef (with only up to 5 models in
the family) can substantially improve the target model's accuracy on
adversarial examples by 22-74% in a white-box attack scenario, while
maintaining similar accuracy on legitimate examples.
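The runtime behavior described above (random per-input model selection from a family) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the placeholder "models" and the function names `MODEL_FAMILY` and `muldef_predict` are hypothetical stand-ins; in MulDef each family member would be a neural network constructed from the target model and trained for robustness diversity.

```python
import random

# Hypothetical placeholder "models": each maps an input to a label.
# In MulDef these would be networks derived from the target model.
MODEL_FAMILY = [
    lambda x: "even" if x % 2 == 0 else "odd",  # stand-in for model 1
    lambda x: "even" if x % 2 == 0 else "odd",  # stand-in for model 2
    lambda x: "even" if x % 2 == 0 else "odd",  # stand-in for model 3
]

def muldef_predict(x, family=MODEL_FAMILY, rng=random):
    """Randomly pick one model from the family for each input example.

    An attacker crafting an adversarial example against one model cannot
    know which family member will classify the input at runtime.
    """
    model = rng.choice(family)
    return model(x)

print(muldef_predict(4))  # each call may route through a different model
```

Because the selection is made independently per input, a white-box attacker who compromises one family member still faces only a 1/N chance that the compromised model is the one applied, which is the intuition behind the robustness-diversity requirement.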