Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance

Authors: Cong Xie, Oluwasanmi Koyejo, Indranil Gupta | Published: 2018-05-25 | Updated: 2019-05-18

Source: https://arxiv.org/abs/1805.10032

PDF: https://arxiv.org/pdf/1805.10032

Labels Predicted by AI

Loss Function Linear Model Reinforcement Learning Optimization

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.

Abstract

We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers. Zeno generalizes previous results that assumed a majority of non-faulty nodes; we need assume only one non-faulty worker. Our key idea is to suspect workers that are potentially defective. Since this is likely to lead to false positives, we use a ranking-based preference mechanism. We prove the convergence of SGD for non-convex problems under these scenarios. Experimental results show that Zeno outperforms existing approaches.
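The abstract names a ranking-based preference mechanism but does not spell out how candidates are scored. The following is a minimal sketch of one way such a mechanism could look, under stated assumptions: the server is assumed to hold a small trusted loss estimate (`loss_fn`), each candidate gradient is scored by the estimated loss decrease it would produce minus a norm penalty, and the lowest-ranked candidates are dropped before averaging. The names `suspicion_score`, `rank_and_average`, `lr`, `rho`, and `num_dropped` are illustrative and are not taken from the abstract.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def step(x: Vector, g: Vector, lr: float) -> Vector:
    """Return x - lr * g, the point reached by applying candidate gradient g."""
    return [xi - lr * gi for xi, gi in zip(x, g)]


def suspicion_score(loss_fn: Callable[[Vector], float], x: Vector,
                    g: Vector, lr: float, rho: float) -> float:
    """Assumed score: estimated loss decrease minus a squared-norm penalty.

    Higher is better; a candidate that increases the estimated loss or has
    a very large norm ranks low and is treated as suspicious.
    """
    sq_norm = sum(gi * gi for gi in g)
    return loss_fn(x) - loss_fn(step(x, g, lr)) - rho * sq_norm


def rank_and_average(loss_fn: Callable[[Vector], float], x: Vector,
                     candidates: Sequence[Vector], lr: float, rho: float,
                     num_dropped: int) -> Tuple[Vector, List[int]]:
    """Drop the num_dropped lowest-scoring candidates and average the rest."""
    if not 0 <= num_dropped < len(candidates):
        raise ValueError("num_dropped must leave at least one candidate")
    scores = [suspicion_score(loss_fn, x, g, lr, rho) for g in candidates]
    # Rank candidate indices from highest to lowest score.
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[: len(candidates) - num_dropped])
    avg = [sum(candidates[i][d] for i in kept) / len(kept) for d in range(len(x))]
    return avg, kept


def quadratic_loss(x: Vector) -> float:
    """Toy objective f(x) = 0.5 * ||x||^2, whose true gradient at x is x."""
    return 0.5 * sum(xi * xi for xi in x)


# Toy round: four workers, only the one at index 2 reports the true gradient.
x = [1.0, -2.0]
candidates = [
    [-10.0, 20.0],    # points uphill
    [100.0, -200.0],  # right direction, wildly overscaled
    [1.0, -2.0],      # the single non-faulty worker
    [50.0, 50.0],     # arbitrary garbage
]
update, kept = rank_and_average(quadratic_loss, x, candidates,
                                lr=0.1, rho=0.001, num_dropped=3)
print(kept, update)  # [2] [1.0, -2.0]
```

In this toy round a single non-faulty worker is enough for the server to recover the true gradient, which is the regime the abstract describes. The sketch assumes the server knows how many candidates to drop; choosing that number, and the false positives the abstract mentions, are outside what this example covers.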