Abstract
Black-box adversarial attacks have demonstrated strong potential to
compromise machine learning models by iteratively querying the target model or
leveraging transferability from a local surrogate model. However, such attacks
can now be effectively mitigated by state-of-the-art (SOTA) defenses, e.g.,
detection based on the pattern of sequential queries, or noise injection into
the model. To the best of our knowledge, we take the first step toward
studying a new paradigm of
black-box attacks with provable guarantees -- certifiable black-box attacks
that can guarantee the attack success probability (ASP) of adversarial examples
before querying the target model. Compared to traditional empirical black-box
attacks, this new attack paradigm unveils more significant vulnerabilities of
machine learning models: it breaks strong SOTA defenses with provable
confidence, constructs a space of (infinitely many) adversarial examples with
high ASP, and theoretically guarantees the ASP of the generated adversarial
examples without verification queries to the target model. Specifically, we
establish a novel theoretical foundation for ensuring the ASP of the black-box
attack with randomized adversarial examples (AEs). Then, we propose several
novel techniques to craft the randomized AEs while reducing the perturbation
size for better imperceptibility. Finally, we comprehensively evaluate the
certifiable black-box attacks on the CIFAR10/100, ImageNet, and LibriSpeech
datasets, benchmarking them against 16 SOTA black-box attacks and various SOTA
defenses in the domains of computer vision and speech recognition. Both
theoretical and experimental results validate the significance of the proposed
attack. The code and all benchmarks are available at
\url{https://github.com/datasec-lab/CertifiedAttack}.
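To make the certification idea concrete, the following is a minimal,
hypothetical Python sketch of how one might lower-bound the ASP of a
randomized adversarial example via Monte Carlo sampling. This is not the
paper's actual algorithm: the surrogate interface (surrogate_predict), the
Gaussian noise scale sigma, and the use of a Clopper-Pearson bound are all
illustrative assumptions, and the sketch bounds success against a local
surrogate rather than reproducing the paper's query-free guarantee on the
target model.

# Hypothetical sketch: a high-confidence lower bound on the attack success
# probability (ASP) of a randomized adversarial example, estimated by
# sampling Gaussian noise around x_adv and counting misclassifications on a
# local surrogate model. Illustrative only; not the paper's algorithm.
import numpy as np
from scipy.stats import beta

def asp_lower_bound(surrogate_predict, x_adv, true_label,
                    sigma=0.25, n_samples=1000, alpha=0.001):
    """One-sided (1 - alpha) Clopper-Pearson lower bound on
    P[surrogate_predict(x_adv + noise) != true_label] over Gaussian noise."""
    successes = 0
    for _ in range(n_samples):
        noise = np.random.normal(0.0, sigma, size=x_adv.shape)
        if surrogate_predict(x_adv + noise) != true_label:
            successes += 1
    if successes == 0:
        return 0.0  # no observed successes; only the trivial bound holds
    # Clopper-Pearson: the lower bound is the alpha-quantile of
    # Beta(successes, n_samples - successes + 1).
    return beta.ppf(alpha, successes, n_samples - successes + 1)

Under these assumptions, an attacker could retain only perturbations whose
certified lower bound exceeds a desired ASP threshold (e.g., 0.9); with
alpha = 0.001, the bound holds with 99.9% confidence over the sampling,
analogous in spirit to the confidence bounds used in randomized smoothing.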