Federated learning is a distributed learning paradigm with two key challenges
that differentiate it from traditional distributed optimization: (1)
significant variability in the systems characteristics of each device
in the network (systems heterogeneity), and (2) non-identically distributed
data across the network (statistical heterogeneity). In this work, we introduce
a framework, FedProx, to tackle heterogeneity in federated networks. FedProx
can be viewed as a generalization and re-parameterization of FedAvg, the current
state-of-the-art method for federated learning. While this re-parameterization
makes only minor modifications to the method itself, these modifications have
important ramifications both in theory and in practice. Theoretically, we
provide convergence guarantees for our framework when learning over data from
non-identical distributions (statistical heterogeneity), and while adhering to
device-level systems constraints by allowing each participating device to
perform a variable amount of work (systems heterogeneity). Practically, we
demonstrate that FedProx allows for more robust convergence than FedAvg across
a suite of realistic federated datasets. In particular, in highly heterogeneous
settings, FedProx demonstrates significantly more stable and accurate
convergence behavior relative to FedAvg, improving absolute test accuracy by
22% on average.
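
To make the relationship to FedAvg concrete, the following is a minimal sketch, not the implementation evaluated in this work. It assumes the usual FedProx formulation, which this abstract does not spell out: each device approximately minimizes its local loss plus a proximal term (mu/2)*||w - w_global||^2, so that mu = 0 gives a FedAvg-style local update. The quadratic local loss, the helper names `local_update` and `federated_round`, the unweighted averaging, and all parameter values are hypothetical choices made only for illustration.

```python
from typing import List

Vector = List[float]


def local_update(w_global: Vector, center: Vector, mu: float,
                 steps: int, lr: float = 0.1) -> Vector:
    """Run `steps` gradient steps on F_k(w) + (mu/2) * ||w - w_global||^2.

    F_k(w) = 0.5 * ||w - center||^2 is a toy local loss (an assumption);
    `center` stands in for one device's local data distribution.
    """
    w = list(w_global)
    for _ in range(steps):
        # Gradient of the toy loss plus gradient of the proximal term.
        w = [wi - lr * ((wi - ci) + mu * (wi - gi))
             for wi, ci, gi in zip(w, center, w_global)]
    return w


def federated_round(w_global: Vector, centers: List[Vector],
                    local_steps: List[int], mu: float) -> Vector:
    """One round: each device does its own amount of work, then average."""
    updates = [local_update(w_global, c, mu, s)
               for c, s in zip(centers, local_steps)]
    dim = len(w_global)
    # Simple unweighted average of the device updates (an assumption).
    return [sum(u[i] for u in updates) / len(updates) for i in range(dim)]


if __name__ == "__main__":
    w = [0.0]
    centers = [[10.0], [-2.0], [4.0]]   # statistical heterogeneity
    steps = [20, 3, 8]                  # systems heterogeneity
    for mu in (0.0, 1.0):
        print(mu, federated_round(w, centers, steps, mu))
```

In this sketch, a larger `mu` keeps each local update closer to the current global model, which limits how far devices with very different data or very different amounts of local work can pull the average. Per-device step counts in `local_steps` illustrate variable amounts of work per device.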