Recent research has demonstrated new classes of data poisoning
attacks. To address this problem, researchers have proposed both offline
and online detection defenses that employ machine learning
algorithms to identify such attacks. In this work, we take a different approach
to preventing data poisoning attacks, one that relies on cryptographic
authentication and provenance to ensure the integrity of the data used to train
a machine learning model. The same approach is also used to prevent software
poisoning and model poisoning attacks. A software poisoning attack maliciously
alters one or more software components used to train a model. Once the model
has been trained, it can also be protected against model poisoning attacks,
which seek to alter a model's predictions by modifying its underlying
parameters or structure. Finally, an evaluation or test set can also be
protected to provide evidence of whether it has been modified by a subsequent
data poisoning attack.
To achieve these goals, we propose VAMP, which extends the previously proposed
AMP system, originally designed to protect media objects such as images, video
files, and audio clips, to the machine learning setting. We first provide
requirements for authentication and provenance for a secure machine learning
system. Next, we demonstrate how VAMP's manifest meets these requirements to
protect a machine learning system's datasets, software components, and models.
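The core idea, recording cryptographic digests of every training artifact in a manifest and re-verifying them before use, can be illustrated with a minimal Python sketch. This is not VAMP's actual manifest format; the function names are hypothetical, and a real system would additionally sign the manifest and record provenance metadata rather than rely on bare hashes.

```python
import hashlib
import json
from pathlib import Path


def sha256_digest(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(artifacts: dict) -> str:
    """Record a digest for each artifact (dataset, software component, model).

    In a full system this manifest would be cryptographically signed.
    """
    entries = {name: sha256_digest(p) for name, p in artifacts.items()}
    return json.dumps({"artifacts": entries}, sort_keys=True)


def verify_manifest(manifest: str, artifacts: dict) -> bool:
    """Re-hash every artifact and compare against the recorded digests.

    Any tampering (data, software, or model poisoning) changes a digest
    and causes verification to fail.
    """
    recorded = json.loads(manifest)["artifacts"]
    return all(
        sha256_digest(p) == recorded.get(name)
        for name, p in artifacts.items()
    )
```

Under this scheme, a poisoned training file, altered software component, or modified model no longer matches its recorded digest, so the tampering is detected before the artifact is used.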