The susceptibility of deep neural networks to carefully crafted perturbations
severely limits the use of deep learning in certain application domains.
Among the many defenses developed against such attacks, adversarial training
has emerged as the most successful, consistently resisting a wide range of
attacks. In this work, motivated by the observation from a previous study that the
representations of a clean data example and its adversarial examples become
more divergent in the higher layers of a deep neural network, we propose the Adversary
Divergence Reduction Network, which enforces local/global compactness and the
clustering assumption over an intermediate layer of a deep neural network. We
conduct comprehensive experiments to understand the behavior of each
component in isolation (i.e., local/global compactness and the clustering assumption) and
compare our proposed model with state-of-the-art adversarial training methods.
The experimental results demonstrate that augmenting adversarial training with
our proposed components further improves the robustness of the network,
yielding higher predictive performance on both clean and adversarial examples.
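To make the local-compactness idea concrete, the following is a minimal NumPy sketch of one plausible regularizer of this kind: a penalty on the distance between the intermediate representations of a clean example and its adversarial counterpart. All names (`local_compactness_loss`, `adv_ce_loss`, `lam`) are illustrative assumptions, not the authors' code, and the paper's exact formulation may differ.

```python
import numpy as np

def local_compactness_loss(z_clean, z_adv):
    # Mean squared L2 distance between intermediate representations of
    # clean examples and their adversarial counterparts; minimizing it
    # pulls each pair together, countering the divergence observed in
    # the higher layers of the network.
    return np.mean(np.sum((z_clean - z_adv) ** 2, axis=1))

# Hypothetical intermediate features for a batch of 4 examples
# (in practice these would come from a chosen hidden layer).
rng = np.random.default_rng(0)
z_clean = rng.normal(size=(4, 8))
z_adv = z_clean + 0.1 * rng.normal(size=(4, 8))  # adversarial drift

penalty = local_compactness_loss(z_clean, z_adv)

# The penalty would be added, with a trade-off weight, to the usual
# adversarial-training objective (e.g. cross-entropy on adversarial
# examples); `adv_ce_loss` and `lam` are placeholders here.
adv_ce_loss = 1.25
lam = 0.1
total_loss = adv_ce_loss + lam * penalty
```

In a real training loop the penalty would be computed on differentiable tensors so that its gradient shapes the intermediate layer during adversarial training.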