Abstract
Despite the transformative impact of deep learning across multiple domains,
the inherent opacity of these models has driven the development of Explainable
Artificial Intelligence (XAI). Among these efforts, Concept Bottleneck Models
(CBMs) have emerged as a key approach to improve interpretability by leveraging
high-level semantic information. However, CBMs, like other machine learning
models, are susceptible to security threats, particularly backdoor attacks,
which can covertly manipulate model behavior. Recognizing that concept-level
backdoor attacks on CBMs have not yet been studied, and guided by the adage
"Better the devil you know than the devil you don't know," we introduce CAT
(Concept-level Backdoor ATtacks), a methodology that leverages the conceptual
representations within CBMs to embed triggers during training, enabling
controlled manipulation of model predictions at inference time. An enhanced
variant, CAT+, incorporates a correlation function to systematically
select the most effective and stealthy concept triggers, thereby optimizing the
attack's impact. Our comprehensive evaluation framework assesses both the
attack success rate and stealthiness, demonstrating that CAT and CAT+ maintain
high performance on clean data while achieving significant targeted effects on
backdoored datasets. This work underscores the potential security risks
associated with CBMs and provides a robust testing methodology for future
security assessments.
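To make the attack setting concrete, below is a minimal, hypothetical sketch of concept-level data poisoning in the spirit of CAT/CAT+: a few concept activations are overwritten with a fixed trigger pattern and the sample is relabeled with the attacker's target class, while a simple correlation-based ranking stands in for CAT+'s trigger selection. All names, the poisoning rate, the trigger pattern, and the correlation criterion are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def select_trigger_concepts(concepts, labels, target_label, k=3):
    """CAT+-style idea (sketch only): rank concept dimensions by how strongly
    they correlate with the target class and return the top-k as triggers."""
    y = (labels == target_label).astype(float)
    corr = np.array([np.corrcoef(concepts[:, j], y)[0, 1]
                     for j in range(concepts.shape[1])])
    corr = np.nan_to_num(corr)  # constant concepts yield NaN correlation
    return np.argsort(-np.abs(corr))[:k]

def poison_concepts(concept_vec, trigger_idx, trigger_vals, target_label):
    """Embed a concept-level trigger and relabel with the attacker's target class."""
    poisoned = concept_vec.copy()
    poisoned[trigger_idx] = trigger_vals
    return poisoned, target_label

# Hypothetical CUB-like setup: 1000 samples, 112 binary concepts, 200 classes.
rng = np.random.default_rng(0)
concepts = rng.integers(0, 2, size=(1000, 112)).astype(float)
labels = rng.integers(0, 200, size=1000)

target_label = 0
trigger_idx = select_trigger_concepts(concepts, labels, target_label, k=3)
trigger_vals = np.ones(len(trigger_idx))  # force the selected concepts "on"

# Poison a small fraction (here 1%, an assumed rate) of the training data.
poison_mask = rng.random(1000) < 0.01
for i in np.where(poison_mask)[0]:
    concepts[i], labels[i] = poison_concepts(concepts[i], trigger_idx,
                                             trigger_vals, target_label)
```

At inference time, the same trigger pattern applied to a clean sample's concepts would steer a backdoored CBM toward the target class, while behavior on unmodified inputs remains nominally unchanged, which is the clean-accuracy/attack-success trade-off the abstract's evaluation framework measures.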