While machine learning is going through an era of celebrated success,
concerns have been raised about the vulnerability of its backbone: stochastic
gradient descent (SGD). Recent approaches have been proposed to ensure the
robustness of distributed SGD against adversarial (Byzantine) workers sending
poisoned gradients during the training phase. Some of these approaches have
been proven Byzantine-resilient: they ensure the convergence of SGD despite the
presence of a minority of adversarial workers.
We show in this paper that convergence is not enough. In high dimension $d
\gg 1$, an adversary can exploit the non-convexity of the loss function to make
SGD converge to ineffective models. More precisely, we bring to light that
existing Byzantine-resilient schemes leave the attacker a poisoning margin of
$\Omega\left(f(d)\right)$, where $f(d)$ grows at least as fast as $\sqrt{d}$.
Exploiting this leeway, we build a simple attack and experimentally show its
strong, and in some cases total, effectiveness on CIFAR-10 and MNIST.
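To give intuition for this leeway (an illustration on our part, with assumed names and scaling rather than the exact attack of the paper): a defense that filters gradients by their $\ell_2$ distance to their peers tolerates deviations on the order of $\sqrt{d}\,\sigma$ when honest gradients differ by roughly $\sigma$ per coordinate, and an attacker can spend that entire budget on a single coordinate.

\begin{verbatim}
import numpy as np

def craft_byzantine_gradient(benign_estimate, per_coord_std, target_coord=0):
    """Illustrative sketch only: concentrate an assumed l2 deviation budget
    of sqrt(d) * sigma, tolerated by distance-based defenses, on a single
    coordinate of the adversary's estimate of the honest gradient."""
    d = benign_estimate.shape[0]
    budget = np.sqrt(d) * per_coord_std   # assumed tolerated l2 deviation
    poisoned = benign_estimate.copy()
    poisoned[target_coord] -= budget      # push one coordinate far off
    return poisoned
\end{verbatim}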
We introduce Bulyan, and prove that it significantly reduces the attacker's
leeway to a narrow $O\left(\frac{1}{\sqrt{d}}\right)$ bound. We empirically
show that Bulyan does not suffer from the fragility of existing aggregation
rules and, at a reasonable cost in terms of required batch size, achieves
convergence as if only non-Byzantine gradients had been used to update the
model.
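As a minimal sketch of what a Bulyan-style aggregation can look like (the abstract does not spell out the rule, so the structure below, iterated selection by a Byzantine-resilient rule, here Krum, followed by a coordinate-wise trimmed average, as well as all names, are our assumptions):

\begin{verbatim}
import numpy as np

def krum_select(grads, f):
    """Simplified Krum: score each vector by the sum of squared distances
    to its n - f - 2 closest peers; return the index of the lowest score."""
    n = len(grads)
    dists = np.array([[np.sum((g - h) ** 2) for h in grads] for g in grads])
    scores = [np.sum(np.sort(row)[1:n - f - 1]) for row in dists]  # skip self
    return int(np.argmin(scores))

def bulyan(grads, f):
    """Sketch of a Bulyan-style rule (assumes n >= 4f + 3):
    1) iteratively select theta = n - 2f vectors with a resilient rule,
    2) per coordinate, average the theta - 2f values closest to the median."""
    grads = [np.asarray(g, dtype=float) for g in grads]
    theta = len(grads) - 2 * f
    remaining, selected = list(grads), []
    for _ in range(theta):
        selected.append(remaining.pop(krum_select(remaining, f)))
    S = np.stack(selected)                       # shape (theta, d)
    beta = theta - 2 * f
    med = np.median(S, axis=0)
    order = np.argsort(np.abs(S - med), axis=0)  # closeness to median, per coordinate
    closest = np.take_along_axis(S, order[:beta], axis=0)
    return closest.mean(axis=0)
\end{verbatim}

Under these assumptions, it is the coordinate-wise trimming step that removes the ability to hide a large deviation in a single coordinate.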