Abstract
Federated Learning (FL) is a nascent decentralized learning framework under
which a massive number of heterogeneous clients collaboratively train a
model without revealing their local data. Limited communication, privacy
leakage, and Byzantine attacks are the key bottlenecks to system scalability.
In this paper, we focus on communication-efficient distributed (stochastic)
gradient descent for non-convex optimization, a driving force of FL. We propose
two algorithms, named {\em Adaptive Stochastic Sign SGD (Ada-StoSign)} and {\em
$\beta$-Stochastic Sign SGD ($\beta$-StoSign)}, each of which compresses the
local gradients into bit vectors.
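As a minimal illustration of this step (a sketch under common assumptions, not necessarily the paper's exact operator), the Python snippet below shows one standard way to stochastically compress a gradient into a $\pm 1$ bit vector using a scale $B$ that upper bounds its $\ell_{\infty}$ norm, so that the compressed vector is unbiased after rescaling by $B$.

```python
import numpy as np

def stochastic_sign_compress(grad, B, rng=None):
    """Compress a gradient vector into a +/-1 bit vector.

    One common stochastic-sign construction (illustrative sketch): given a
    scale B with B >= ||grad||_inf, coordinate g_i is mapped to +1 with
    probability (1 + g_i / B) / 2 and to -1 otherwise, so B * E[bit_i] = g_i.
    """
    rng = rng or np.random.default_rng()
    p_plus = 0.5 * (1.0 + np.clip(grad / B, -1.0, 1.0))
    bits = np.where(rng.random(grad.shape) < p_plus, 1, -1)
    return bits.astype(np.int8)  # each entry needs only one bit on the wire

# Example: a client compresses its local gradient before sending it.
g = np.array([0.3, -1.2, 0.05, 0.9])
B = np.max(np.abs(g))  # coarse l_inf estimate used as the compression scale
print(stochastic_sign_compress(g, B))
```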
To handle unbounded gradients, Ada-StoSign uses a novel norm tracking
function that adaptively adjusts a coarse estimate of the $\ell_{\infty}$ norm
of the local gradients, a key parameter used in gradient compression. We show
that Ada-StoSign converges in expectation at a rate $O(\log T/\sqrt{T} +
1/\sqrt{M})$, where $T$ is the number of iterations and $M$ is the number of
clients. To the best of our knowledge,
when $M$ is sufficiently large, Ada-StoSign outperforms the state-of-the-art
sign-based method, whose convergence rate is $O(T^{-1/4})$. Under the bounded
gradient assumption, $\beta$-StoSign achieves quantifiable Byzantine resilience
and privacy assurances, and it works with partial client participation and
mini-batch gradients, which may be unbounded. We corroborate and complement
our theory with experiments on the MNIST and CIFAR-10 datasets.
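The abstract does not spell out the norm tracking function used by Ada-StoSign, so the sketch below is only a hypothetical illustration of how a coarse $\ell_{\infty}$ estimate might be adjusted across rounds; the name update_linf_estimate and the grow/shrink factors are assumptions, not part of the paper.

```python
import numpy as np

def update_linf_estimate(B_prev, grad, grow=2.0, shrink=0.9):
    """Hypothetical multiplicative tracking rule (assumption, for
    illustration only): enlarge the coarse estimate when the observed
    gradient exceeds it, and shrink it slowly otherwise, so the scale
    used for compression loosely follows ||grad||_inf over time."""
    g_inf = float(np.max(np.abs(grad)))
    if g_inf > B_prev:
        return grow * g_inf             # react quickly to larger gradients
    return max(shrink * B_prev, g_inf)  # decay slowly, never below g_inf

# Example: track the scale over a few simulated rounds.
B = 1.0
for g in (np.array([0.2, -0.5]), np.array([3.0, -1.0]), np.array([0.1, 0.4])):
    B = update_linf_estimate(B, g)
    print(round(B, 3))
```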