These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Transferability of adversarial examples is a well-known property that
endangers all classification models, even those that are only accessible
through black-box queries. Prior work has shown that an ensemble of models is
more resilient to transferability: the probability that an adversarial example
is effective against most models of the ensemble is low. Thus, most ongoing
research focuses on improving ensemble diversity. Another line of prior work
has shown that Lipschitz continuity of the models can make models more robust
since it limits how a model's output changes with small input perturbations. In
this paper, we study the effect of Lipschitz continuity on transferability
rates. We show that although a lower Lipschitz constant increases the
robustness of a single model, it is not as beneficial in training robust
ensembles as it increases the transferability rate of adversarial examples
across models in the ensemble. Therefore, we introduce LOTOS, a new training
paradigm for ensembles, which counteracts this adverse effect. It does so by
promoting orthogonality among the top-$k$ sub-spaces of the transformations of
the corresponding affine layers of any pair of models in the ensemble. We
theoretically show that $k$ does not need to be large for convolutional layers,
which makes the computational overhead negligible. Through various experiments,
we show LOTOS increases the robust accuracy of ensembles of ResNet-18 models by
$6$ percentage points (p.p) against black-box attacks on CIFAR-10. It is also
capable of combining with the robustness of prior state-of-the-art methods for
training robust ensembles to enhance their robust accuracy by $10.7$ p.p.