A common pain point in differentially private machine learning is the
significant runtime overhead incurred when executing Differentially Private
Stochastic Gradient Descent (DPSGD), which may be as large as two orders of
magnitude. We thoroughly demonstrate that by exploiting powerful language
primitives, including vectorization, just-in-time compilation, and static graph
optimization, one can dramatically reduce these overheads, in many cases nearly
matching the best non-private running times. These gains are realized in two
frameworks: JAX and TensorFlow. JAX provides rich support for these primitives
as core features of the language through the XLA compiler. We also rebuild core
parts of TensorFlow Privacy, integrating features from TensorFlow 2 as well as
XLA compilation, yielding significant memory and runtime improvements over the
current release version. These approaches allow us to achieve speedups of up to
50x compared with the best alternatives. Our code is available at
https://github.com/TheSalon/fast-dpsgd.
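
The JAX results rest on composing per-example gradient computation with vectorization and just-in-time compilation. Below is a minimal sketch of that pattern, not taken from the repository: the linear `loss_fn`, the `dpsgd_grad` helper, and the clipping/noise constants are hypothetical placeholders; only the composition of `jax.vmap`, `jax.grad`, and `jax.jit` reflects the approach described above.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Hypothetical per-example loss for a linear model.
    pred = jnp.dot(x, params)
    return (pred - y) ** 2

@jax.jit  # XLA-compile the whole private gradient step.
def dpsgd_grad(params, xs, ys, key, l2_clip=1.0, noise_multiplier=1.1):
    # vmap over the batch yields one gradient per example in a single vectorized pass.
    grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, xs, ys)
    # Clip each per-example gradient to the l2 norm bound.
    norms = jnp.linalg.norm(grads, axis=1)
    clipped = grads * jnp.minimum(1.0, l2_clip / (norms + 1e-12))[:, None]
    # Sum, add calibrated Gaussian noise, and average over the batch.
    summed = clipped.sum(axis=0)
    noise = noise_multiplier * l2_clip * jax.random.normal(key, summed.shape)
    return (summed + noise) / xs.shape[0]

# Example usage with hypothetical shapes.
params = jnp.zeros(4)
xs, ys = jnp.ones((8, 4)), jnp.ones(8)
print(dpsgd_grad(params, xs, ys, jax.random.PRNGKey(0)))
```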
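On the TensorFlow side, the rebuilt pipeline combines TensorFlow 2 features with XLA compilation. The sketch below illustrates that combination under stated assumptions rather than reproducing TensorFlow Privacy code: it pairs `tf.vectorized_map` (per-example gradients without a Python loop) with `tf.function(jit_compile=True)` (XLA), again with a hypothetical linear loss and illustrative clipping/noise constants.

```python
import tensorflow as tf

L2_CLIP = 1.0           # illustrative clipping norm
NOISE_MULTIPLIER = 1.1  # illustrative noise multiplier

@tf.function(jit_compile=True)  # compile the private gradient step with XLA
def dpsgd_grads(weights, xs, ys, seed):
    def per_example_grad(example):
        x, y = example
        with tf.GradientTape() as tape:
            tape.watch(weights)
            pred = tf.tensordot(x, weights, axes=1)
            loss = (pred - y) ** 2
        return tape.gradient(loss, weights)
    # Vectorized per-example gradients instead of a Python loop over the batch.
    grads = tf.vectorized_map(per_example_grad, (xs, ys))
    # Clip, sum, add calibrated Gaussian noise, and average.
    norms = tf.norm(grads, axis=1)
    clipped = grads * tf.minimum(1.0, L2_CLIP / (norms + 1e-12))[:, None]
    summed = tf.reduce_sum(clipped, axis=0)
    noise = NOISE_MULTIPLIER * L2_CLIP * tf.random.stateless_normal(summed.shape, seed)
    return (summed + noise) / tf.cast(tf.shape(xs)[0], tf.float32)

# Example usage with hypothetical shapes.
weights = tf.zeros([4])
xs, ys = tf.ones([8, 4]), tf.ones([8])
print(dpsgd_grads(weights, xs, ys, seed=tf.constant([0, 0])))
```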