Evading Black-box Classifiers Without Breaking Eggs

arxiv

Source

https://arxiv.org/abs/2306.02895

PDF

https://arxiv.org/pdf/2306.02895

Paper Information

Authors: Edoardo Debenedetti; Nicholas Carlini; Florian Tramèr
Published: 2023-06-05
Updated: 2024-02-14
Affiliation: ETH Zurich
Country: Switzerland
Conference: Conference on Secure and Trustworthy Machine Learning (SaTML)

Labels Estimated by AI

Attack Evaluation; Adversarial Example; Adversarial Attack

These labels were automatically added by AI and may be inaccurate.

Abstract

Decision-based evasion attacks repeatedly query a black-box classifier to generate adversarial examples. Prior work measures the cost of such attacks by the total number of queries made to the classifier. We argue this metric is flawed. Most security-critical machine learning systems aim to weed out "bad" data (e.g., malware or harmful content). Queries to such systems carry a fundamentally asymmetric cost: queries detected as "bad" come at a higher cost because they trigger additional security filters, e.g., usage throttling or account suspension. Yet, we find that existing decision-based attacks issue a large number of "bad" queries, which likely renders them ineffective against security-critical systems. We then design new attacks that reduce the number of bad queries by $1.5$-$7.3\times$, but often at a significant increase in total (non-bad) queries. We thus pose it as an open problem to build black-box attacks that are more effective under realistic cost metrics.
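The asymmetric cost described in the abstract can be illustrated with a small sketch. This is not the authors' code or attack. It only shows how a query counter might tally "bad" queries separately from total queries and weight them differently. The names `QueryCounter` and `toy_classifier`, the threshold 0.5, and the cost weights are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class QueryCounter:
    """Wraps a black-box classifier and tallies queries by outcome."""

    # Hypothetical interface: returns True when the input is flagged "bad".
    classify: Callable[[Sequence[float]], bool]
    total: int = 0
    bad: int = 0

    def query(self, x: Sequence[float]) -> bool:
        """Send one query and record whether it was flagged as bad."""
        flagged = self.classify(x)
        self.total += 1
        if flagged:
            self.bad += 1
        return flagged

    def cost(self, bad_cost: float = 10.0, good_cost: float = 1.0) -> float:
        """Weighted cost; the weights are illustrative assumptions."""
        good = self.total - self.bad
        return bad_cost * self.bad + good_cost * good


def toy_classifier(x: Sequence[float]) -> bool:
    # Stand-in for a real model: flags inputs whose first feature exceeds 0.5.
    return x[0] > 0.5


counter = QueryCounter(toy_classifier)
for value in (0.9, 0.7, 0.4, 0.2, 0.6):
    counter.query([value])

print(counter.total, counter.bad, counter.cost())
```

With equal weights (`bad_cost=1.0`, `good_cost=1.0`) the cost reduces to the total query count, which is the metric the abstract argues against. Raising `bad_cost` penalizes attacks that issue many flagged queries even when their total query count is low.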

External Datasets

ImageNet

ImageNet-Dogs

ImageNet-NSFW