In this paper, we propose a general framework for data poisoning attacks on
graph-based semi-supervised learning (G-SSL). In this framework, we first unify
different tasks, goals, and constraints into a single formula for data
poisoning attacks in G-SSL, then we propose two specialized algorithms to
efficiently solve two important cases: poisoning regression tasks under an
$\ell_2$-norm constraint and classification tasks under an $\ell_0$-norm
constraint. In the former case, we transform the problem into a non-convex
trust region problem and show that our gradient-based algorithm, with a
carefully designed initialization and update scheme, finds the (globally)
optimal perturbation. For the latter
case, although the problem is an NP-hard integer program, we propose a
probabilistic solver that works much better than the classical greedy method.
Lastly, we test our framework on real datasets and evaluate the robustness of
G-SSL algorithms. For instance, on the MNIST binary classification problem
(50,000 training samples, of which 50 are labeled), flipping the labels of two
labeled samples is enough to make the model perform like random guessing
(around 50\% error).
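
To make the label-flipping setting concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes a standard harmonic label-propagation learner on an RBF similarity graph, synthetic two-cluster data in place of MNIST, an attacker who knows the true labels of the unlabeled points, and an exhaustive search over flips in place of the probabilistic solver. All function names (`rbf_similarity`, `propagate`, `brute_force_flip`) are hypothetical.

```python
import itertools
import numpy as np

def rbf_similarity(X, gamma=1.0):
    """Dense RBF similarity matrix with a zero diagonal (assumed graph)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-gamma * sq)
    np.fill_diagonal(S, 0.0)
    return S

def propagate(S, labeled_idx, y_l):
    """Harmonic label propagation: solve (D - S)_uu f_u = S_ul y_l."""
    n = S.shape[0]
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    laplacian = np.diag(S.sum(axis=1)) - S
    L_uu = laplacian[np.ix_(unlabeled_idx, unlabeled_idx)]
    S_ul = S[np.ix_(unlabeled_idx, labeled_idx)]
    return unlabeled_idx, np.linalg.solve(L_uu, S_ul @ y_l)

def error_rate(scores, y_true):
    """Fraction of unlabeled points whose predicted sign is wrong."""
    return float(np.mean(np.sign(scores) != y_true))

def brute_force_flip(S, labeled_idx, y_l, y_u_true, budget):
    """Try every set of at most `budget` label flips; keep the most damaging.

    Exhaustive search is only feasible for tiny labeled sets. It stands in
    for a real solver and assumes the attacker knows y_u_true.
    """
    best_flip = ()
    best_err = error_rate(propagate(S, labeled_idx, y_l)[1], y_u_true)
    for k in range(1, budget + 1):
        for flip in itertools.combinations(range(len(y_l)), k):
            y_poisoned = y_l.copy()
            y_poisoned[list(flip)] *= -1  # flip the chosen binary labels
            err = error_rate(propagate(S, labeled_idx, y_poisoned)[1], y_u_true)
            if err > best_err:
                best_flip, best_err = flip, err
    return best_flip, best_err

# Toy data: two Gaussian clusters with labels -1 and +1 (assumed, not MNIST).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)), rng.normal(1.0, 0.5, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
labeled_idx = np.array([0, 1, 2, 20, 21, 22])
S = rbf_similarity(X)
unlabeled_idx, clean_scores = propagate(S, labeled_idx, y[labeled_idx])
clean_err = error_rate(clean_scores, y[unlabeled_idx])
flip, attacked_err = brute_force_flip(
    S, labeled_idx, y[labeled_idx], y[unlabeled_idx], budget=2
)
print(f"clean error {clean_err:.2f}, attacked error {attacked_err:.2f}, flips {flip}")
```

The flip budget plays the role of the $\ell_0$-norm constraint: it bounds how many labeled points the attacker may change. The number of candidate flip sets grows combinatorially with the budget and the labeled-set size, which is why exhaustive search does not scale beyond toy problems.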