Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2405.19098

PDF

https://arxiv.org/pdf/2405.19098

Paper Information

Authors: Shuyu Cheng; Yibo Miao; Yinpeng Dong; Xiao Yang; Xiao-Shan Gao; Jun Zhu
Published: 5-29-2024
Affiliation: Dept. of Comp. Sci. and Tech., Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University
Country: China
Conference: International Conference on Machine Learning (ICML)

Labels Estimated by AI

Optimization Problem, Attack Method, Algorithm

These labels were automatically added by AI and may be inaccurate.

Abstract

This paper studies the challenging black-box adversarial attack that aims to generate adversarial examples against a black-box model by only using output feedback of the model to input queries. Some previous methods improve the query efficiency by incorporating the gradient of a surrogate white-box model into query-based attacks due to the adversarial transferability. However, the localized gradient is not informative enough, making these methods still query-intensive. In this paper, we propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks. As the surrogate model contains rich prior information of the black-box one, P-BO models the attack objective with a Gaussian process whose mean function is initialized as the surrogate model's loss. Our theoretical analysis on the regret bound indicates that the performance of P-BO may be affected by a bad prior. Therefore, we further propose an adaptive integration strategy to automatically adjust a coefficient on the function prior by minimizing the regret bound. Extensive experiments on image classifiers and large vision-language models demonstrate the superiority of the proposed algorithm in reducing queries and improving attack success rates compared with the state-of-the-art black-box attacks. Code is available at https://github.com/yibo-miao/PBO-Attack.
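
The core idea in the abstract, a Gaussian process whose mean function is the surrogate model's loss scaled by a coefficient, can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes an RBF kernel with unit prior variance, a UCB acquisition rule maximized over random candidates, a box-constrained search space, and a fixed coefficient `lam`, whereas the paper adapts this coefficient by minimizing the regret bound. The names `rbf_kernel`, `gp_posterior`, `prior_guided_bo`, `target_fn`, and `prior_fn` are all hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    # Squared-exponential kernel between rows of a (n, d) and b (m, d).
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(x_query, x_obs, y_obs, prior_fn, lam=1.0, noise=1e-6):
    # GP posterior whose prior mean is lam * prior_fn(x)
    # (assumption: lam is fixed here, not adapted as in the paper).
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    # The GP only has to model the residual between target and scaled prior.
    resid = y_obs - lam * prior_fn(x_obs)
    mean = lam * prior_fn(x_query) + k_qo @ np.linalg.solve(k_oo, resid)
    var = 1.0 - np.sum(k_qo * np.linalg.solve(k_oo, k_qo.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

def prior_guided_bo(target_fn, prior_fn, dim=2, n_queries=20,
                    lam=1.0, beta=2.0, seed=0):
    # Maximize target_fn over [-1, 1]^dim using only function-value queries.
    rng = np.random.default_rng(seed)
    x_obs = rng.uniform(-1.0, 1.0, size=(1, dim))
    y_obs = target_fn(x_obs)
    for _ in range(n_queries - 1):
        # Assumption: the acquisition is maximized over random candidates.
        cand = rng.uniform(-1.0, 1.0, size=(256, dim))
        mean, var = gp_posterior(cand, x_obs, y_obs, prior_fn, lam)
        ucb = mean + beta * np.sqrt(var)
        x_next = cand[np.argmax(ucb)][None, :]
        x_obs = np.vstack([x_obs, x_next])
        y_obs = np.concatenate([y_obs, target_fn(x_next)])
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best], len(y_obs)

# Toy stand-ins: the "black-box" objective and a slightly shifted "surrogate".
target_fn = lambda x: -np.sum((x - 0.30) ** 2, axis=1)
prior_fn = lambda x: -np.sum((x - 0.25) ** 2, axis=1)

x_best, y_best, n_used = prior_guided_bo(target_fn, prior_fn)
print("best point:", x_best, "best value:", y_best, "queries:", n_used)
```

Because the prior mean already captures most of the objective's shape in this toy setup, the GP only needs to fit the residual `y - lam * prior_fn(x)`. Setting `lam=0.0` recovers a plain zero-mean GP, which is one way to see how a poorly matched prior could be down-weighted.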

External Datasets

CIFAR-10

ImageNet

MS-COCO