As machine learning becomes widely used for automated decisions, attackers
have strong incentives to manipulate the results and models generated by
machine learning algorithms. In this paper, we perform the first systematic
study of poisoning attacks and their countermeasures for linear regression
models. In poisoning attacks, attackers deliberately influence the training
data to manipulate the results of a predictive model. We propose a
theoretically grounded optimization framework specifically designed for linear
regression and demonstrate its effectiveness on a range of datasets and models.
We also introduce a fast statistical attack that requires limited knowledge of
the training process. Finally, we design a new principled defense method that
is highly resilient against all poisoning attacks. We provide formal guarantees
about its convergence and an upper bound on the effect of poisoning attacks
when the defense is deployed. We extensively evaluate our attacks and defenses
on three realistic datasets from the health care, loan assessment, and real
estate domains.