Federated Averaging (FedAvg) has emerged as the algorithm of choice for
federated learning due to its simplicity and low communication cost. However,
in spite of recent research efforts, its performance is not fully understood.
We obtain tight convergence rates for FedAvg and prove that it suffers from
`client-drift' when the data is heterogeneous (non-iid), resulting in unstable
and slow convergence.
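To make the `client-drift' claim concrete, here is a minimal sketch of the
FedAvg local step in our own notation (the symbols $x$, $y_i$, $g_i$, $\eta_l$,
and $K$ are not fixed by this abstract): each participating client $i$
initializes $y_i = x$ from the server model and performs $K$ local stochastic
gradient steps
\[
  y_i \leftarrow y_i - \eta_l \, g_i(y_i),
\]
after which the server averages the resulting $y_i$. When the data is
heterogeneous, each $y_i$ is pulled toward the minimizer of its own local
objective rather than that of the global objective, and the averaged update is
biased accordingly.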
As a solution, we propose a new algorithm (SCAFFOLD) which uses control
variates (variance reduction) to correct for the `client-drift' in its local
updates. We prove that SCAFFOLD requires significantly fewer communication
rounds and is not affected by data heterogeneity or client sampling. Further,
we show that (for quadratics) SCAFFOLD can take advantage of similarity in the
clients' data, yielding even faster convergence. The latter is the first result
to quantify the usefulness of local steps in distributed optimization.
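As an illustration of the control-variate correction (a sketch in our own
notation, not spelled out in this abstract), the corrected local step replaces
the plain gradient step above with
\[
  y_i \leftarrow y_i - \eta_l \bigl( g_i(y_i) - c_i + c \bigr),
\]
where $c_i$ is a client control variate estimating client $i$'s gradient and
$c$ is a server control variate estimating the average gradient across clients,
so that the correction $c - c_i$ counteracts the client-specific drift during
the local updates.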