Abstract
Federated Learning (FL) is a nascent decentralized learning framework under which a massive collection of heterogeneous clients collaboratively train a model without revealing their local data. Limited communication, privacy leakage, and Byzantine attacks are the key bottlenecks to system scalability. In this paper, we focus on communication-efficient distributed (stochastic) gradient descent for non-convex optimization, a driving force of FL. We propose two algorithms, named Adaptive Stochastic Sign SGD (Ada-StoSign) and β-Stochastic Sign SGD (β-StoSign), each of which compresses the local gradients into bit vectors. To handle unbounded gradients, Ada-StoSign uses a novel norm tracking function that adaptively adjusts a coarse estimate of the $\ell_\infty$ norm of the local gradients, a key parameter used in gradient compression. We show that Ada-StoSign converges in expectation at a rate of $O(\log T/\sqrt{T} + 1/\sqrt{M})$, where M is the number of clients. To the best of our knowledge, when M is sufficiently large, Ada-StoSign outperforms the state-of-the-art sign-based method, whose convergence rate is $O(T^{-1/4})$. Under a bounded gradient assumption, β-StoSign achieves quantifiable Byzantine resilience and privacy assurances, and works with partial client participation and mini-batch gradients, which may be unbounded. We corroborate and complement our theory with experiments on the MNIST and CIFAR-10 datasets.
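
To illustrate the compression idea described above, the following is a minimal sketch of generic stochastic sign compression with a coarse $\ell_\infty$ norm estimate, not the authors' exact Ada-StoSign or β-StoSign procedures. The function names, the `norm_est` parameter, and the server-side rescaling are illustrative assumptions.

```python
import numpy as np

def stochastic_sign_compress(grad, norm_est, rng=None):
    """Compress a gradient vector into a +/-1 bit vector.

    Each coordinate g_i is sent as +1 with probability
    (1 + clip(g_i / norm_est, -1, 1)) / 2 and as -1 otherwise, so that
    norm_est * E[bit_i] = clip(g_i, -norm_est, norm_est).
    Here `norm_est` stands in for the coarse l_inf estimate that
    Ada-StoSign is described as tracking adaptively (hypothetical name).
    """
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.clip(grad / norm_est, -1.0, 1.0)
    prob_plus = (1.0 + scaled) / 2.0
    return np.where(rng.random(grad.shape) < prob_plus, 1.0, -1.0)

def server_aggregate(bit_vectors, norm_est, lr):
    """Average the clients' bit vectors and form a rescaled descent step."""
    avg = np.mean(bit_vectors, axis=0)   # each coordinate lies in [-1, 1]
    return -lr * norm_est * avg

# Toy usage: M clients compress noisy copies of the same gradient.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_grad = rng.normal(size=10)
    M, lr = 32, 0.1
    norm_est = np.abs(true_grad).max()   # stand-in for the tracked estimate
    bits = [stochastic_sign_compress(true_grad + 0.1 * rng.normal(size=10),
                                     norm_est, rng) for _ in range(M)]
    print(server_aggregate(bits, norm_est, lr))
```

Averaging over many clients reduces the variance introduced by the one-bit quantization, which is consistent with the $1/\sqrt{M}$ term in the stated convergence rate.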