Abstract
Adversarial attacks remain a significant threat that can jeopardize the
integrity of Machine Learning (ML) models. In particular, query-based black-box
attacks can generate malicious noise without having access to the victim
model's architecture, making them practical in real-world contexts. The
community has proposed several defenses against adversarial attacks, only for
them to be broken by more advanced and adaptive attack strategies. In this paper, we
propose a framework that detects if an adversarial noise instance is being
generated. Unlike existing stateful defenses that detect adversarial noise
generation by monitoring the input space, our approach learns adversarial
patterns in the input update similarity space. Specifically, we propose to
monitor a new metric called Delta Similarity (DS), which we show captures
adversarial behavior more effectively. We evaluate our approach against 8
state-of-the-art attacks, including adaptive attacks, where the adversary is
aware of the defense and tries to evade detection. We find that our approach is
significantly more robust than existing defenses in terms of both specificity
and sensitivity.
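
To make the distinction concrete: a stateful defense in the input space compares raw queries to each other, whereas the approach described above compares the updates (deltas) between consecutive queries. The sketch below is only an illustration of such an update-similarity score; the cosine-over-deltas formulation, the window size k, and the function names are assumptions, and the paper's precise definition of DS may differ.

```python
import numpy as np

def delta_similarity(history, new_query, k=5):
    """Illustrative update-similarity score (assumed formulation, not the paper's DS).

    Query-based black-box attacks tend to apply small, correlated perturbations,
    so the deltas between successive queries are unusually similar to one another.
    `history` is a list of previously observed queries (flattened np.ndarray),
    `new_query` is the incoming query. Returns the mean cosine similarity between
    the newest delta and the k most recent deltas.
    """
    if len(history) < 2:
        return 0.0

    # Deltas between consecutive queries observed so far
    deltas = [history[i + 1] - history[i] for i in range(len(history) - 1)]
    new_delta = new_query - history[-1]

    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0

    recent = deltas[-k:]
    return float(np.mean([cosine(new_delta, d) for d in recent]))
```

A detector built on such a score could, for example, flag an account whose score stays above a threshold over a window of queries, rather than flagging near-duplicate inputs as input-space stateful defenses do.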