Deep Neural Networks (DNNs) have recently achieved great success in many
tasks, which has encouraged their wide use as a machine learning service in
model sharing scenarios. However, attackers can easily generate adversarial
examples with a small perturbation that fool DNN models into predicting wrong
labels. To improve the robustness of shared DNN models against adversarial
attacks, we propose a novel method called Latent Adversarial Defence (LAD). The
proposed LAD method improves the robustness of a DNN model through adversarial
training on generated adversarial examples. Unlike popular attack
methods, which are carried out in the input space and only generate adversarial
examples with repeating patterns, LAD generates a myriad of adversarial examples
by adding perturbations to latent features along the normal of the
decision boundary, which is constructed by an SVM with an attention mechanism.
Once adversarial examples are generated, we adversarially train the model
by augmenting the training data with the generated adversarial examples.
Extensive experiments on the MNIST, SVHN, and CelebA datasets demonstrate the
effectiveness of our model in defending against different types of adversarial
attacks.
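
The core geometric step, moving a latent feature across a decision boundary
along its normal, can be illustrated with a minimal sketch. This is not the
LAD implementation: it assumes a plain linear hyperplane w.z + b = 0 in latent
space (the SVM with an attention mechanism is not reproduced), and the function
name perturb_along_normal and the overshoot parameter are hypothetical names
introduced only for illustration.

```python
import numpy as np

def perturb_along_normal(z, w, b, overshoot=0.1):
    """Push latent vectors across the hyperplane w.z + b = 0 along its normal.

    Assumption: a linear boundary stands in for the learned SVM boundary.
    `overshoot` (hypothetical) is how far past the boundary each point lands.
    """
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    norm = np.linalg.norm(w)
    unit_normal = w / norm
    # Signed distance of every row of z to the hyperplane.
    signed_dist = (z @ w + b) / norm
    # Step back to the boundary, then `overshoot` further onto the other side.
    step = -(signed_dist + np.sign(signed_dist) * overshoot)
    return z + step[:, None] * unit_normal

# Toy latent features in 2-D with boundary x = 0.
w = np.array([1.0, 0.0])
b = 0.0
z = np.array([[2.0, 1.0], [-3.0, 0.5]])
z_adv = perturb_along_normal(z, w, b, overshoot=0.1)
```

In this toy case each point ends up 0.1 past the boundary on the opposite
side, with its component parallel to the boundary unchanged. In an adversarial
training setup such perturbed latent features would be paired with the original
labels and added to the training data.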