In this paper, we propose a first-order distributed optimization algorithm
that is provably robust to Byzantine failures-arbitrary and potentially
adversarial behavior, where all the participating agents are prone to failure.
We model each agent's state over time as a two-state Markov chain that
indicates Byzantine or trustworthy behaviors at different time instants. We set
no restrictions on the maximum number of Byzantine agents at any given time. We
design our method based on three layers of defense: 1) temporal robust
aggregation, 2) spatial robust aggregation, and 3) gradient normalization. We
study two settings for stochastic optimization, namely Sample Average
Approximation and Stochastic Approximation. We provide convergence guarantees
of our method for strongly convex and smooth non-convex cost functions.