Theoretically Principled Trade-off for Stateful Defenses against Query-Based Black-Box Attacks

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2307.16331

PDF

https://arxiv.org/pdf/2307.16331

Paper Information

Author: Ashish Hooda;Neal Mangaokar;Ryan Feng;Kassem Fawaz;Somesh Jha;Atul Prakash
Published: 7-31-2023
Affiliation: University of Wisconsin-Madison
Country: United States of America
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Adversarial Spectrum Attack Detection Cybersecurity Watermark Robustness

These labels were automatically added by AI and may be inaccurate.

Abstract

Adversarial examples threaten the integrity of machine learning systems with alarming success rates even under constrained black-box conditions. Stateful defenses have emerged as an effective countermeasure, detecting potential attacks by maintaining a buffer of recent queries and detecting new queries that are too similar. However, these defenses fundamentally pose a trade-off between attack detection and false positive rates, and this trade-off is typically optimized by hand-picking feature extractors and similarity thresholds that empirically work well. There is little current understanding as to the formal limits of this trade-off and the exact properties of the feature extractors/underlying problem domain that influence it. This work aims to address this gap by offering a theoretical characterization of the trade-off between detection and false positive rates for stateful defenses. We provide upper bounds for detection rates of a general class of feature extractors and analyze the impact of this trade-off on the convergence of black-box attacks. We then support our theoretical findings with empirical evaluations across multiple datasets and stateful defenses.
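The mechanism the abstract describes (keep a buffer of recent queries and flag any new query that is too similar to a stored one) can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `StatefulDetector`, the placeholder `identity_features` extractor, the use of Euclidean distance, and the parameter names are all assumptions chosen for illustration.

```python
from collections import deque
import math


def identity_features(query):
    """Hypothetical placeholder extractor: uses the raw query as its feature vector."""
    return list(query)


class StatefulDetector:
    """Illustrative stateful defense: flags queries too close to recent ones."""

    def __init__(self, threshold, buffer_size=1000, extractor=identity_features):
        self.threshold = threshold          # similarity threshold (assumed: Euclidean distance)
        self.extractor = extractor          # feature extractor applied to every query
        self.buffer = deque(maxlen=buffer_size)  # oldest entries are evicted automatically

    def check(self, query):
        """Return True if the query is within `threshold` of any buffered query."""
        feats = self.extractor(query)
        flagged = any(
            math.dist(feats, past) < self.threshold for past in self.buffer
        )
        # Store the query either way so later near-duplicates can be compared against it.
        self.buffer.append(feats)
        return flagged


detector = StatefulDetector(threshold=0.1, buffer_size=3)
results = [
    detector.check([0.0, 0.0]),   # empty buffer, nothing to compare against
    detector.check([1.0, 1.0]),   # far from the first query
    detector.check([0.01, 0.0]),  # within 0.1 of the first query
]
print(results)
```

In this sketch, raising `threshold` flags more attack queries but also more benign queries that happen to be close together, which is the detection versus false-positive trade-off the abstract refers to.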