The rapid adoption of machine learning has increased concerns about the
privacy implications of machine learning models trained on sensitive data, such
as medical records or other personal information. To address those concerns,
one promising approach is Private Aggregation of Teacher Ensembles, or PATE,
which transfers to a "student" model the knowledge of an ensemble of "teacher"
models, with intuitive privacy provided by training teachers on disjoint data
and strong privacy guaranteed by noisy aggregation of teachers' answers.
However, PATE has so far been evaluated only on simple classification tasks
like MNIST, leaving unclear its utility when applied to larger-scale learning
tasks and real-world datasets.
In this work, we show how PATE can scale to learning tasks with large numbers
of output classes and uncurated, imbalanced training data with errors. To this
end, we introduce new noisy aggregation mechanisms for teacher ensembles that
are more selective and add less noise, and prove their tighter
differential-privacy guarantees. Our new mechanisms build on two insights: the
chance of teacher consensus is increased by using more concentrated noise and,
lacking consensus, no answer need be given to a student. The consensus answers
used are more likely to be correct, offer better intuitive privacy, and incur
lower differential-privacy cost. Our evaluation shows that our mechanisms improve on
the original PATE on all measures, and scale to larger tasks with both high
utility and very strong privacy ($\varepsilon < 1.0$).
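The selective, noisy aggregation described above can be sketched as follows. This is an illustrative sketch only, not the paper's exact mechanism: the function name, parameters, and the use of Gaussian noise with a consensus threshold are assumptions made for exposition.

```python
import numpy as np

def confident_aggregate(votes, threshold, sigma_check, sigma_answer, rng=None):
    """Illustrative sketch of selective noisy aggregation of teacher votes.

    votes: array of per-class vote counts from the teacher ensemble.
    Answers only when the noisily-checked top vote count clears `threshold`
    (i.e., teachers reach consensus); otherwise abstains, so no answer
    (and little privacy budget) is spent on contentious queries.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Privately check for consensus using concentrated (Gaussian) noise.
    if votes.max() + rng.normal(0.0, sigma_check) < threshold:
        return None  # no consensus: give the student no answer
    # Consensus reached: return the noisy plurality label.
    return int(np.argmax(votes + rng.normal(0.0, sigma_answer, size=votes.shape)))
```

When teachers agree strongly, the noise rarely flips the plurality label, so the released answers are both more accurate and cheaper in privacy cost; when votes are split, the mechanism simply abstains.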