Federated learning distributes model training among a multitude of agents who, guided by privacy concerns, train on their local data and share only model parameter updates for iterative aggregation at the server.
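For concreteness, the sketch below shows a FedAvg-style server aggregation step of the kind this setting assumes; the names (aggregate, client_updates) are illustrative assumptions, not taken from any particular system.

```python
import numpy as np

def aggregate(global_weights, client_updates, client_sizes):
    """Weighted average of client updates (FedAvg-style sketch).

    global_weights : flat parameter vector held by the server
    client_updates : list of flat update vectors (w_local - w_global)
    client_sizes   : number of local training samples per agent
    """
    total = sum(client_sizes)
    avg = sum((n / total) * u for u, n in zip(client_updates, client_sizes))
    return global_weights + avg

# Toy round: three agents updating a 4-parameter model.
w = np.zeros(4)
updates = [np.random.randn(4) * 0.1 for _ in range(3)]
w = aggregate(w, updates, client_sizes=[100, 50, 50])
```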
In this work, we explore the threat of model poisoning attacks on federated learning initiated by a single, non-colluding malicious agent whose objective is to cause the model to misclassify a set of chosen inputs with high confidence. We explore a number of strategies to carry out this attack, starting with simple boosting of the malicious agent's update to overcome the effects of the other agents' updates.
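A minimal sketch of explicit boosting, under the assumption that the server averages the K agents' updates: scaling the malicious update by roughly K cancels the averaging, at the price of a conspicuously large update. Names are illustrative.

```python
import numpy as np

def boosted_update(malicious_update, num_agents):
    # If the server computes w + (1/K) * sum(updates), sending
    # K * u_malicious makes the aggregate absorb u_malicious almost
    # fully, overriding the benign agents' averaged contribution.
    return num_agents * malicious_update

adv_update = np.random.randn(4) * 0.1          # stand-in poisoned update
update_sent = boosted_update(adv_update, num_agents=10)
```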
To increase attack stealth, we propose an alternating minimization strategy, which alternately optimizes for the training loss and the adversarial objective.
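A hedged sketch of such an alternating loop, assuming a PyTorch classifier; the step ordering, counts, and optimizer are assumptions, not an exact recipe.

```python
import torch

def alternating_minimization(model, benign_loader, target_x, target_y,
                             rounds=10, lr=0.01):
    """Alternate between the adversarial objective (misclassify the
    chosen inputs) and the benign training loss (for stealth)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(rounds):
        # Adversarial step on the chosen inputs with attacker labels.
        opt.zero_grad()
        loss_fn(model(target_x), target_y).backward()
        opt.step()
        # Benign step(s) on local data to keep the update plausible.
        for x, y in benign_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```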
We then use parameter estimation for the benign agents' updates to improve attack success.
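One simple way to estimate the benign contribution, sketched under the assumptions that the server averages updates and that consecutive rounds' benign updates are similar; both the names and the heuristic itself are illustrative.

```python
import numpy as np

def estimate_benign_sum(prev_global, curr_global, own_prev_update, num_agents):
    # Sum of all K updates last round, recovered from the two global models.
    all_updates = num_agents * (curr_global - prev_global)
    # Assume the benign agents' summed update this round resembles last round's.
    return all_updates - own_prev_update

def corrected_update(desired_shift, benign_sum_estimate, num_agents):
    # Pick the malicious update so that, after averaging with the estimated
    # benign sum, the global model moves by roughly desired_shift.
    return num_agents * desired_shift - benign_sum_estimate
```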
Finally, we use a suite of interpretability techniques to generate visual explanations of the decisions of both benign and malicious models, and show that the explanations are nearly visually indistinguishable.
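Input-gradient saliency is one simple member of such a suite; below is a sketch (assuming a PyTorch image classifier) of how explanations from the two models might be compared. benign_model and poisoned_model are placeholders.

```python
import torch

def saliency_map(model, x, target_class):
    """Absolute input gradient of the target-class score: a basic
    visual explanation of a single prediction."""
    x = x.clone().requires_grad_(True)
    model(x)[0, target_class].backward()
    return x.grad.abs().max(dim=1).values   # (1, H, W) heatmap

# Visually similar heatmaps from the benign and the poisoned model
# suggest the poisoning is hard to detect by inspecting explanations:
# heat_b = saliency_map(benign_model, img, label)
# heat_p = saliency_map(poisoned_model, img, label)
```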
Our results indicate that even a highly constrained adversary can carry out model poisoning attacks while simultaneously maintaining stealth, highlighting the vulnerability of the federated learning setting and the need to develop effective defense strategies.