Black-box adversarial attacks, which design adversarial examples for unseen
neural networks (NNs), have received considerable attention in recent years. While
several successful black-box attack schemes have been proposed in the
literature, the factors underlying the transferability of black-box
adversarial examples are still not well understood. In this paper, we
demonstrate the role that the generalization properties of the substitute
classifier used to generate adversarial examples play in the transferability of
the attack to unobserved NN classifiers. To this end, we apply the
max-min adversarial example game framework and show that the generalization
behavior of the substitute NN is central to the success of the black-box
attack against different NN classifiers. We prove theoretical
generalization bounds on the difference between the attack transferability
rates on training and test samples. Our bounds suggest that a substitute NN
with better generalization behavior could result in more transferable
adversarial examples. In addition, we show that standard operator norm-based
regularization methods can improve the transferability of the resulting
adversarial examples. We support our theoretical results with several
numerical experiments demonstrating the role of the substitute network's
generalization in generating transferable adversarial examples. Our empirical
results highlight the effectiveness of Lipschitz regularization in improving
the transferability of adversarial examples.
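For context, a common formulation of the max-min adversarial example game (written in our notation, not necessarily the paper's exact setup) seeks a perturbation generator g whose outputs remain effective against the worst-case classifier in a class F:

\[
\max_{g \in \mathcal{G}} \; \min_{f \in \mathcal{F}} \;
\mathbb{E}_{(x, y) \sim \mathcal{D}}
\Big[ \ell\big( f(x + g(x, y)),\, y \big) \Big]
\quad \text{subject to} \quad \| g(x, y) \|_{\infty} \le \epsilon,
\]

where \ell is the classification loss and \epsilon the perturbation budget; a substitute classifier trained within this game plays the role of the inner minimization.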
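As a minimal illustration of the operator norm-based regularization discussed above (a sketch under assumed choices, not the paper's implementation; the architecture and the eps, alpha, and steps values are hypothetical), the PyTorch snippet below constrains each layer's spectral norm, which bounds the substitute's Lipschitz constant, and then crafts L-infinity adversarial examples on the substitute with projected gradient descent:

import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Substitute classifier: spectral_norm caps each layer's operator norm,
# which bounds the network's overall Lipschitz constant.
substitute = nn.Sequential(
    spectral_norm(nn.Linear(784, 256)), nn.ReLU(),
    spectral_norm(nn.Linear(256, 10)),
)

def pgd_attack(model, x, y, eps=0.1, alpha=0.02, steps=10):
    # L_inf projected gradient descent on the substitute; the resulting
    # examples would then be transferred to the unseen target classifier.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the eps-ball
    return x_adv.detach()

# Hypothetical usage on random data; eval() freezes the power-iteration
# buffers that spectral_norm maintains, so the attack does not perturb them.
substitute.eval()
x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
x_adv = pgd_attack(substitute, x, y)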