As Machine Learning (ML) is increasingly applied to security-critical or sensitive
domains, there is a growing need for integrity and privacy for outsourced ML
computations. A pragmatic solution comes from Trusted Execution Environments
(TEEs), which use hardware and software protections to isolate sensitive
computations from the untrusted software stack. However, these isolation
guarantees come at a performance cost relative to untrusted alternatives.
This paper initiates the study of high-performance execution of Deep Neural
Networks (DNNs) in TEEs by efficiently partitioning DNN computations between
trusted and untrusted devices. Building upon an efficient outsourcing scheme
for matrix multiplication, we propose Slalom, a framework that securely
delegates execution of all linear layers in a DNN from a TEE (e.g., Intel SGX
or Sanctum) to a faster, yet untrusted, co-located processor. We evaluate
Slalom by running DNNs in an Intel SGX enclave, which selectively delegates
work to an untrusted GPU. For canonical DNNs (VGG16, MobileNet and ResNet
variants) we obtain 6x to 20x increases in throughput for verifiable inference,
and 4x to 11x for verifiable and private inference.
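
To make the idea of verifiable outsourcing of matrix multiplication concrete, the sketch below shows one classical technique, Freivalds' probabilistic check. The abstract does not name the outsourcing scheme it builds on, so treating it as a Freivalds-style check is an assumption made here for illustration. The function names (`matmul`, `matvec`, `freivalds_check`) and the `rounds` parameter are hypothetical, and the sketch covers integrity only, not privacy. The trusted side verifies a claimed product C = A·B by comparing A·(B·r) with C·r for a random vector r, which costs O(n^2) per round instead of the O(n^3) needed to recompute the product.

```python
import random


def matmul(a, b):
    """Plain matrix product; stands in for the fast, untrusted device."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def matvec(m, v):
    """Matrix-vector product, the only operation the verifier needs."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def freivalds_check(a, b, c, rounds=2, rng=random):
    """Probabilistically check that a @ b == c (illustrative sketch).

    Each round draws a random nonzero integer vector r and compares
    a @ (b @ r) with c @ r. A correct c always passes; an incorrect c
    passes a single round only with very small probability.
    """
    n_cols = len(c[0])
    for _ in range(rounds):
        r = [rng.randrange(1, 1 << 31) for _ in range(n_cols)]
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False  # mismatch found: the claimed product is wrong
    return True
```

In a setting like the one described above, the untrusted processor would play the role of `matmul`, while the enclave would run only `freivalds_check`, so the expensive cubic-time work stays outside the trusted environment.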