Abstract
Despite the transformative impact of deep learning across multiple domains,
the inherent opacity of these models has driven the development of Explainable
Artificial Intelligence (XAI). Among these efforts, Concept Bottleneck Models
(CBMs) have emerged as a key approach to improve interpretability by leveraging
high-level semantic information. However, CBMs, like other machine learning
models, are susceptible to security threats, particularly backdoor attacks,
which can covertly manipulate model behavior. Recognizing that concept-level
backdoor attacks on CBMs have not yet been studied, and guided by the adage
"Better the devil you know than the devil you don't know," we introduce CAT
(Concept-level Backdoor ATtacks), a methodology that leverages the conceptual
representations within CBMs to embed triggers during training, enabling
controlled manipulation of model predictions at inference time. An enhanced
variant, CAT+, incorporates a correlation function to systematically
select the most effective and stealthy concept triggers, thereby optimizing the
attack's impact. Our comprehensive evaluation framework assesses both the
attack success rate and stealthiness, demonstrating that CAT and CAT+ maintain
high performance on clean data while achieving significant targeted effects on
backdoored datasets. This work underscores the potential security risks
associated with CBMs and provides a robust testing methodology for future
security assessments.
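To make the attack setting concrete, below is a minimal, hypothetical sketch of concept-level data poisoning in the spirit of CAT/CAT+: a few concept activations are overwritten with a fixed trigger pattern and the sample is relabeled with the attacker's target class, while a simple correlation-based ranking stands in for CAT+'s trigger selection. All names, the poisoning rate, the trigger pattern, and the correlation criterion are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def select_trigger_concepts(concepts, labels, target_label, k=3):
    """CAT+-style idea (sketch only): rank concept dimensions by how strongly
    they correlate with the target class and return the top-k as triggers."""
    y = (labels == target_label).astype(float)
    corr = np.array([np.corrcoef(concepts[:, j], y)[0, 1]
                     for j in range(concepts.shape[1])])
    corr = np.nan_to_num(corr)  # constant concepts yield NaN correlation
    return np.argsort(-np.abs(corr))[:k]

def poison_concepts(concept_vec, trigger_idx, trigger_vals, target_label):
    """Embed a concept-level trigger and relabel with the attacker's target class."""
    poisoned = concept_vec.copy()
    poisoned[trigger_idx] = trigger_vals
    return poisoned, target_label

# Hypothetical CUB-like setup: 1000 samples, 112 binary concepts, 200 classes.
rng = np.random.default_rng(0)
concepts = rng.integers(0, 2, size=(1000, 112)).astype(float)
labels = rng.integers(0, 200, size=1000)

target_label = 0
trigger_idx = select_trigger_concepts(concepts, labels, target_label, k=3)
trigger_vals = np.ones(len(trigger_idx))  # force the selected concepts "on"

# Poison a small fraction (here 1%, an assumed rate) of the training data.
poison_mask = rng.random(1000) < 0.01
for i in np.where(poison_mask)[0]:
    concepts[i], labels[i] = poison_concepts(concepts[i], trigger_idx,
                                             trigger_vals, target_label)
```

At inference time, the same trigger pattern applied to a clean sample's concepts would steer a backdoored CBM toward the target class, while behavior on unmodified inputs remains nominally unchanged, which is the clean-accuracy/attack-success trade-off the abstract's evaluation framework measures.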