Major cloud operators offer machine learning (ML) as a service, enabling
customers who have data but lack the ML expertise or infrastructure to train
predictive models on it. Existing ML-as-a-service platforms require
users to reveal all training data to the service operator. We design,
implement, and evaluate Chiron, a system for privacy-preserving machine
learning as a service. First, Chiron conceals the training data from the
service operator. Second, in keeping with how many existing ML-as-a-service
platforms work, Chiron reveals neither the training algorithm nor the model
structure to the user, providing only black-box access to the trained model.
Chiron is implemented using SGX enclaves, but SGX alone does not achieve the
dual goals of data privacy and model confidentiality. Chiron runs the standard
ML training toolchain (including the popular Theano framework and a C compiler)
in an enclave, but the untrusted model-creation code from the service operator
is further confined in a Ryoan sandbox to prevent it from leaking the training
data outside the enclave. To support distributed training, Chiron executes
multiple concurrent enclaves that exchange model parameters via a parameter
server. We evaluate Chiron on popular deep learning models, focusing on
benchmark image classification tasks such as CIFAR and ImageNet, and show that
its training performance and the accuracy of the resulting models are practical
for common uses of ML-as-a-service.
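
The parameter-server pattern mentioned above for distributed training can be illustrated with a minimal sketch. This is not Chiron's implementation: the class ParameterServer, the functions worker_gradient and train, the linear model, and the plain SGD update are all hypothetical choices made for illustration, and the workers here run one after another in a single process rather than as concurrent SGX enclaves.

```python
# Hypothetical sketch of parameter-server training. All names and the update
# rule are illustrative assumptions, not taken from Chiron.
from typing import List, Tuple

Shard = List[Tuple[float, float]]  # (x, y) pairs held by one worker


class ParameterServer:
    """Holds the shared model parameters and applies worker gradients."""

    def __init__(self, initial: List[float], learning_rate: float) -> None:
        self.params = list(initial)
        self.learning_rate = learning_rate

    def pull(self) -> List[float]:
        # A worker fetches a copy of the current parameters.
        return list(self.params)

    def push(self, gradient: List[float]) -> None:
        # Apply one plain SGD step using a worker's gradient.
        self.params = [p - self.learning_rate * g
                       for p, g in zip(self.params, gradient)]


def worker_gradient(params: List[float], shard: Shard) -> List[float]:
    """Mean-squared-error gradient of the model y = w*x + b on one shard."""
    w, b = params
    n = len(shard)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in shard) / n
    return [grad_w, grad_b]


def train(shards: List[Shard], rounds: int = 2000,
          learning_rate: float = 0.05) -> List[float]:
    server = ParameterServer([0.0, 0.0], learning_rate)
    for _ in range(rounds):
        # Each shard stands in for one training worker; workers only
        # exchange parameters and gradients, never raw training data.
        for shard in shards:
            params = server.pull()
            server.push(worker_gradient(params, shard))
    return server.params
```

For example, calling train on two shards drawn from the line y = 2x + 1, such as [(0, 1), (1, 3)] and [(2, 5), (3, 7)], drives the shared parameters toward w = 2 and b = 1. The point of the pattern is that each worker sees only its own shard and communicates with the others solely through the parameters held by the server.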