In this work, we consider the resilience of distributed algorithms based on
stochastic gradient descent (SGD) in distributed learning with potential
Byzantine attackers, who can send arbitrary information to the parameter
server to disrupt the training process. To this end, we propose a new
Lipschitz-inspired coordinate-wise median approach (LICM-SGD) to mitigate
Byzantine attacks. We show that our LICM-SGD algorithm can resist up to half of
the workers being Byzantine attackers, while still converging almost surely to
a stationary region in non-convex settings. In addition, our LICM-SGD method does
not require any information about the number of attackers or the Lipschitz
constant, which makes it attractive for practical implementations. Moreover,
our LICM-SGD method achieves the optimal $O(md)$ time complexity, in the sense
that it matches the time complexity of standard SGD under no attacks. We
conduct extensive experiments to show that our LICM-SGD
algorithm consistently outperforms existing methods in training multi-class
logistic regression and convolutional neural networks on the MNIST and CIFAR-10
datasets. In our experiments, LICM-SGD also achieves a much shorter running time
thanks to its low computational complexity.
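
As a rough illustration of the coordinate-wise median building block named above, the following sketch aggregates the gradients reported by the workers by taking the median of each coordinate, then applies one SGD step. It is not the LICM-SGD algorithm itself: the Lipschitz-inspired part of the method is not described in this abstract and is omitted. The function names `coordinate_wise_median` and `sgd_step`, the learning rate, and the toy gradients are hypothetical. The sketch uses a sort-based median for brevity, which costs more than $O(md)$; a linear-time selection routine would be needed to match that bound.

```python
from statistics import median
from typing import List, Sequence


def coordinate_wise_median(gradients: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate worker gradients by taking the median of each coordinate.

    `gradients` holds one gradient vector per worker; all vectors must
    have the same dimension. (Hypothetical helper, for illustration only.)
    """
    if not gradients:
        raise ValueError("need at least one gradient")
    dim = len(gradients[0])
    if any(len(g) != dim for g in gradients):
        raise ValueError("all gradients must have the same dimension")
    # zip(*gradients) yields, for each coordinate, the values reported
    # by every worker; the median ignores extreme outliers as long as
    # they form a minority of the reports.
    return [median(column) for column in zip(*gradients)]


def sgd_step(params: Sequence[float],
             gradients: Sequence[Sequence[float]],
             lr: float) -> List[float]:
    """One SGD update using the coordinate-wise median as the aggregate."""
    aggregate = coordinate_wise_median(gradients)
    return [p - lr * g for p, g in zip(params, aggregate)]


# Toy example (assumed values): three honest workers and two attackers
# that report arbitrary, very large gradients.
honest = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1]]
byzantine = [[1e6, 1e6], [-1e6, 1e6]]
aggregated = coordinate_wise_median(honest + byzantine)
new_params = sgd_step([0.0, 0.0], honest + byzantine, lr=0.1)
```

In this toy run the aggregate stays within the range of the honest gradients in every coordinate, whereas a plain average would be dominated by the two attackers.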