We propose two novel stochastic gradient descent algorithms, ByGARS and
ByGARS++, for distributed machine learning in the presence of any number of
Byzantine adversaries. In these algorithms, the server computes reputation scores for the workers using an auxiliary dataset, and these scores are then used to aggregate the gradients for stochastic gradient descent. The
computational complexity of ByGARS++ is the same as that of the usual distributed stochastic gradient descent method, with only one additional inner product computation per iteration. We show that using these reputation scores for
gradient aggregation is robust to any number of multiplicative-noise Byzantine adversaries, and we use two-timescale stochastic approximation theory to prove convergence for strongly convex loss functions. We demonstrate the
effectiveness of the algorithms on non-convex learning problems using the MNIST and CIFAR-10 datasets against a wide range of state-of-the-art Byzantine attacks. We also show that the proposed algorithms remain robust when multiple types of attacks are mounted simultaneously.
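
To make the aggregation idea concrete, the following is a minimal NumPy sketch of reputation-weighted gradient aggregation on a toy least-squares problem, assuming a simple inner-product score update against a gradient computed on the server's auxiliary data. The variable names, step sizes, attack model, and score rule are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch: reputation-weighted gradient aggregation with a few
# multiplicative-noise Byzantine workers, on a toy least-squares problem.
# All names (aux_X, aux_y, scores, step sizes) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, n_byzantine = 5, 8, 3
w_true = rng.normal(size=d)

def make_data(n):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    return X, y

worker_data = [make_data(64) for _ in range(n_workers)]
aux_X, aux_y = make_data(32)        # small auxiliary dataset at the server

def grad(w, X, y):
    return 2.0 * X.T @ (X @ w - y) / len(y)

w = np.zeros(d)
scores = np.zeros(n_workers)        # reputation score per worker
alpha, beta = 0.05, 0.01            # fast (model) and slow (score) step sizes

for t in range(500):
    # Workers send gradients; Byzantine workers apply multiplicative noise.
    grads = []
    for i, (X, y) in enumerate(worker_data):
        g = grad(w, X, y)
        if i < n_byzantine:
            g = -3.0 * g            # illustrative multiplicative-noise attack
        grads.append(g)
    grads = np.stack(grads)

    # Reputation-weighted aggregation and model update (fast timescale).
    agg = scores @ grads
    w = w - alpha * agg

    # Score update (slow timescale): reward workers whose gradients align
    # with the gradient on the auxiliary data (one inner product per worker).
    g_aux = grad(w, aux_X, aux_y)
    scores = scores + beta * (grads @ g_aux)   # hypothetical score rule

print("final parameter error:", np.linalg.norm(w - w_true))
```

In this sketch, workers sending sign-flipped or scaled gradients accumulate negative reputation, so their contributions are effectively reversed or suppressed in the weighted aggregate, which is the intuition behind robustness to multiplicative-noise adversaries.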