Machine learning as a service has given rise to privacy concerns surrounding
clients' data and providers' models and has catalyzed research in private
inference (PI): methods to process inferences without disclosing inputs.
Recently, researchers have adapted cryptographic techniques to show PI is
possible; however, all solutions increase inference latency beyond practical
limits. This paper observes that existing models are ill-suited for PI and
proposes a novel neural architecture search (NAS) method, named CryptoNAS, for finding and
tailoring models to the needs of PI. The key insight is that in PI, operator
latency costs are dominated by non-linear operations (e.g., ReLU), while
linear layers become effectively free. We develop the idea of a ReLU budget as
a proxy for inference latency and use CryptoNAS to build models that maximize
accuracy within a given budget. CryptoNAS improves accuracy by 3.4% and latency
by 2.4x over the state-of-the-art.
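
To make the ReLU budget concrete, the following is a minimal sketch of how one might count ReLU operations in a candidate network and check the count against a budget. It is an illustration only, not the CryptoNAS search procedure: the names `ReluLayer`, `total_relus`, and `within_budget`, the toy layer shapes, and the budget values are all hypothetical. The sketch assumes each ReLU layer applies one ReLU per output activation, so its cost is channels × height × width, and it ignores linear layers entirely, in keeping with the observation that they are effectively free in PI.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReluLayer:
    """Output shape of one ReLU layer (hypothetical helper type)."""
    channels: int
    height: int
    width: int

    def relu_count(self) -> int:
        # Assumption: one ReLU per output activation.
        return self.channels * self.height * self.width


def total_relus(layers: List[ReluLayer]) -> int:
    """Sum ReLU operations over all layers; linear layers are not counted."""
    return sum(layer.relu_count() for layer in layers)


def within_budget(layers: List[ReluLayer], budget: int) -> bool:
    """Return True if the network's ReLU count does not exceed the budget."""
    return total_relus(layers) <= budget


# Toy three-stage network with made-up shapes.
toy_network = [
    ReluLayer(channels=16, height=32, width=32),
    ReluLayer(channels=32, height=16, width=16),
    ReluLayer(channels=64, height=8, width=8),
]

print(total_relus(toy_network))
print(within_budget(toy_network, budget=30_000))
```

Under this proxy, two candidate networks with the same ReLU count are treated as having comparable PI latency, so a search can spend linear-layer capacity freely while keeping the ReLU count at or below the budget.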