Federated learning has arisen as a mechanism to allow multiple participants
to collaboratively train a model without sharing their data. In these settings,
participants (workers) may not trust each other fully; for instance, a set of
competitors may collaboratively train a machine learning model to detect fraud.
The workers provide local gradients that a central server uses to update a
global model. This global model can be corrupted when Byzantine workers send
malicious gradients, which necessitates robust methods for aggregating
gradients that mitigate the adverse effects of Byzantine inputs. Existing
robust aggregation algorithms are often computationally expensive and
effective only under strict assumptions. In this paper, we introduce LayerwisE
Gradient AggregaTiOn (LEGATO), an aggregation algorithm that is, by contrast,
scalable and generalizable. Informed by a study of layer-specific responses of
gradients to Byzantine attacks, LEGATO employs a dynamic gradient reweighting
scheme that is novel in its treatment of gradients based on layer-specific
robustness. We show that LEGATO is more computationally efficient than multiple
state-of-the-art techniques and more generally robust across a variety of
attack settings in practice. We also demonstrate LEGATO's benefits for gradient
descent convergence in the absence of an attack.