We consider the federated learning problem where data on workers are not
independent and identically distributed (i.i.d.). During the learning process,
an unknown number of Byzantine workers may send malicious messages to the
central node, leading to significant learning error. Most
Byzantine-robust methods address this issue by using robust aggregation rules
to aggregate the received messages, but rely on the assumption that all the
regular workers have i.i.d. data, which is not the case in many federated
learning applications. In light of the significance of reducing stochastic
gradient noise for mitigating the effect of Byzantine attacks, we use a
resampling strategy to reduce the impact of both inner variation (the sample
heterogeneity within each regular worker) and outer variation (the sample
heterogeneity among the regular workers), along with
a stochastic average gradient algorithm to gradually eliminate the inner
variation. The variance-reduced messages are then aggregated with a robust
geometric median operator. We prove that the proposed method reaches a
neighborhood of the optimal solution at a linear convergence rate, and that the
learning error is determined by the number of Byzantine workers. Numerical
experiments corroborate the theoretical results and show that the proposed
method outperforms state-of-the-art methods in the non-i.i.d. setting.
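
To illustrate the aggregation step, the following is a minimal sketch of a geometric median computed with Weiszfeld iterations and compared against a plain mean. It is not the implementation used in this work: the iteration scheme, the function names (`geometric_median`, `mean`), the parameters (`max_iter`, `tol`, `eps`), and the toy messages are all assumptions chosen for illustration.

```python
import math


def mean(points):
    """Coordinate-wise average of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def geometric_median(points, max_iter=200, tol=1e-9, eps=1e-12):
    """Approximate the geometric median with Weiszfeld iterations.

    Hypothetical helper: the solver and its parameters are illustrative
    assumptions, not taken from the paper.
    """
    z = mean(points)  # start from the coordinate-wise mean
    dim = len(z)
    for _ in range(max_iter):
        # Weight each point by the inverse of its distance to the estimate;
        # eps guards against division by zero if z coincides with a point.
        weights = [1.0 / max(math.dist(p, z), eps) for p in points]
        total = sum(weights)
        z_new = [
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ]
        if math.dist(z_new, z) < tol:
            return z_new
        z = z_new
    return z


# Toy messages: four regular workers near (1, 1) and one Byzantine outlier.
regular = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05]]
byzantine = [[100.0, -100.0]]
messages = regular + byzantine

gm = geometric_median(messages)
avg = mean(messages)
print("geometric median:", gm)
print("mean:", avg)
```

In this toy example the mean is dragged far toward the outlier, while the geometric median stays close to the cluster of regular messages, which is the property that motivates using it as a robust aggregation rule.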