Poisoning attacks have emerged as a significant security threat to machine
learning algorithms. It has been demonstrated that adversaries who make small
changes to the training set, such as adding specially crafted data points, can
hurt the performance of the output model. Some of the stronger poisoning
attacks require full knowledge of the training data. This leaves open the
question of whether the same attack results can be achieved by poisoning
attacks that lack full knowledge of the clean training set.
In this work, we initiate a theoretical study of the problem above.
Specifically, for the case of feature selection with LASSO, we show that
full-information adversaries (that craft poisoning examples based on the rest
of the training data) are provably stronger than the optimal attacker that is
oblivious to the training set yet has access to the distribution of the data.
Our separation result shows that the two settings, data-aware and
data-oblivious, are fundamentally different, and that we cannot hope to always
achieve the same attack or defense results in both scenarios.