TOP 文献データベース Non-omniscient backdoor injection with a single poison sample: Proving the one-poison hypothesis for linear regression and linear classification
Computing Research Repository (CoRR)
Non-omniscient backdoor injection with a single poison sample: Proving the one-poison hypothesis for linear regression and linear classification
Backdoor injection attacks are a threat to machine learning models that are
trained on large data collected from untrusted sources; these attacks enable
attackers to inject malicious behavior into the model that can be triggered by
specially crafted inputs. Prior work has established bounds on the success of
backdoor attacks and their impact on the benign learning task, however, an open
question is what amount of poison data is needed for a successful backdoor
attack. Typical attacks either use few samples, but need much information about
the data points or need to poison many data points.
In this paper, we formulate the one-poison hypothesis: An adversary with one
poison sample and limited background knowledge can inject a backdoor with zero
backdooring-error and without significantly impacting the benign learning task
performance. Moreover, we prove the one-poison hypothesis for linear regression
and linear classification. For adversaries that utilize a direction that is
unused by the benign data distribution for the poison sample, we show that the
resulting model is functionally equivalent to a model where the poison was
excluded from training. We build on prior work on statistical backdoor learning
to show that in all other cases, the impact on the benign learning task is
still limited. We also validate our theoretical results experimentally with
realistic benchmark data sets.