These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Many machine learning algorithms are vulnerable to almost imperceptible
perturbations of their inputs. So far it was unclear how much risk adversarial
perturbations carry for the safety of real-world machine learning applications
because most methods used to generate such perturbations rely either on
detailed model information (gradient-based attacks) or on confidence scores
such as class probabilities (score-based attacks), neither of which are
available in most real-world scenarios. In many such cases one currently needs
to retreat to transfer-based attacks which rely on cumbersome substitute
models, need access to the training data and can be defended against. Here we
emphasise the importance of attacks which solely rely on the final model
decision. Such decision-based attacks are (1) applicable to real-world
black-box models such as autonomous cars, (2) need less knowledge and are
easier to apply than transfer-based attacks and (3) are more robust to simple
defences than gradient- or score-based attacks. Previous attacks in this
category were limited to simple models or simple datasets. Here we introduce
the Boundary Attack, a decision-based attack that starts from a large
adversarial perturbation and then seeks to reduce the perturbation while
staying adversarial. The attack is conceptually simple, requires close to no
hyperparameter tuning, does not rely on substitute models and is competitive
with the best gradient-based attacks in standard computer vision tasks like
ImageNet. We apply the attack on two black-box algorithms from Clarifai.com.
The Boundary Attack in particular and the class of decision-based attacks in
general open new avenues to study the robustness of machine learning models and
raise new questions regarding the safety of deployed machine learning systems.
An implementation of the attack is available as part of Foolbox at
https://github.com/bethgelab/foolbox .