Abstract
Targeted data poisoning attacks pose an increasingly serious threat due to
their ease of deployment and high success rates. These attacks aim to
manipulate the prediction for a single test sample in classification models.
Unlike indiscriminate attacks that aim to decrease overall test performance,
targeted attacks present a unique threat to individual test instances. This
threat model raises a fundamental question: what factors make certain test
samples more susceptible to successful poisoning than others? We investigate
how attack difficulty varies across different test instances and identify key
characteristics that influence vulnerability. This paper introduces three
predictive criteria for targeted data poisoning difficulty: ergodic prediction
accuracy (analyzed through clean training dynamics), poison distance, and
poison budget. Our experimental results demonstrate that these metrics
effectively predict the varying difficulty of real-world targeted poisoning
attacks across diverse scenarios, offering practitioners valuable insights for
assessing vulnerability and understanding data poisoning attacks.