Abstract
When training a machine learning model with differential privacy, one sets a
privacy budget. This budget represents a maximal privacy violation that any
user is willing to face by contributing their data to the training set. We
argue that this approach is limited because different users may have different
privacy expectations. Thus, setting a uniform privacy budget across all points
may be overly conservative for some users or, conversely, not sufficiently
protective for others. In this paper, we capture these preferences through
individualized privacy budgets. To demonstrate their practicality, we introduce
a variant of Differentially Private Stochastic Gradient Descent (DP-SGD) which
supports such individualized budgets. DP-SGD is the canonical approach to
training models with differential privacy. We modify its data sampling and
gradient noising mechanisms to arrive at our approach, which we call
Individualized DP-SGD (IDP-SGD). Because IDP-SGD provides privacy guarantees
tailored to the preferences of individual users and their data points, we find
it empirically improves privacy-utility trade-offs.
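The abstract says that IDP-SGD changes the data sampling and gradient noising of DP-SGD but does not give the details. The sketch below is therefore only an illustration of where per-point settings could enter a DP-SGD-style step, not the authors' algorithm. The names `sample_rates` and `clip_norms`, the choice to scale noise by the largest clipping norm, and the normalization by expected batch size are all assumptions made for this example. No privacy accounting is performed.

```python
import math
import random


def clip(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, sample_rates, clip_norms,
                noise_multiplier, lr, rng):
    """One noisy gradient step with hypothetical per-point knobs.

    sample_rates[i]: probability that point i joins the batch (assumed).
    clip_norms[i]:   clipping norm applied to point i's gradient (assumed).
    """
    total = [0.0] * len(params)
    # Poisson sampling: each point is included independently.
    for grad, rate, max_norm in zip(per_example_grads, sample_rates, clip_norms):
        if rng.random() < rate:
            clipped = clip(grad, max_norm)
            total = [t + g for t, g in zip(total, clipped)]
    # Gaussian noise scaled by the largest clipping norm (an assumption).
    sigma = noise_multiplier * max(clip_norms)
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    # Normalize by the expected batch size rather than the realized one.
    expected_batch = sum(sample_rates)
    return [p - lr * n / expected_batch for p, n in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 1.0], [1.0, 0.0]]
    new_params = dp_sgd_step([0.0, 0.0], grads,
                             sample_rates=[0.5, 0.5, 0.9],
                             clip_norms=[1.0, 1.0, 2.0],
                             noise_multiplier=1.0, lr=0.1, rng=rng)
    print(new_params)
```

In this sketch a point with a higher sampling rate or a larger clipping norm can influence the update more, which is the intuition behind trading a looser individual budget for more utility.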
External Datasets
MNIST
SVHN
CIFAR10
References
Proceedings of the 2016 ACM SIGSAC conference on computer and communications security
Deep learning with differential privacy
Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang
Communication-Efficient Learning of Deep Networks from Decentralized Data
H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas
Published: 2.18.2016
Modern mobile devices have access to a wealth of data suitable for learning
models, which in turn can greatly improve the user experience on the device.
For example, language models can improve speech recognition and text entry, and
image models can automatically select good photos. However, this rich data is
often privacy sensitive, large in quantity, or both, which may preclude logging
to the data center and training there using conventional approaches. We
advocate an alternative that leaves the training data distributed on the mobile
devices, and learns a shared model by aggregating locally-computed updates. We
term this decentralized approach Federated Learning.
We present a practical method for the federated learning of deep networks
based on iterative model averaging, and conduct an extensive empirical
evaluation, considering five different model architectures and four datasets.
These experiments demonstrate the approach is robust to the unbalanced and
non-IID data distributions that are a defining characteristic of this setting.
Communication costs are the principal constraint, and we show a reduction in
required communication rounds by 10-100x as compared to synchronized stochastic
gradient descent.
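The iterative model averaging described in this abstract can be sketched as a loop in which each client trains locally and a server averages the resulting weights. The version below is a minimal toy, assuming a one-parameter least-squares model and an average weighted by local dataset size. The function names, the model, and the weighting scheme are illustrative assumptions; the abstract does not specify them.

```python
def local_update(weights, data, lr, epochs):
    """Run a few epochs of SGD on the toy model y = w * x using local data only."""
    w = weights[0]
    for _ in range(epochs):
        for x, y in data:
            # Gradient of the squared error (w * x - y) ** 2 with respect to w.
            w -= lr * 2.0 * (w * x - y) * x
    return [w]


def federated_average(client_weights, client_sizes):
    """Average client weights, weighting each client by its number of examples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def run_rounds(global_weights, clients, rounds, lr=0.05, epochs=1):
    """Alternate local training and server-side averaging for several rounds."""
    for _ in range(rounds):
        updates = [local_update(global_weights, data, lr, epochs) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


if __name__ == "__main__":
    # Two clients with unbalanced data drawn from y = 2 * x.
    clients = [[(1.0, 2.0)], [(2.0, 4.0), (0.5, 1.0)]]
    print(run_rounds([0.0], clients, rounds=50))
```

Only model weights leave a client in this sketch; the raw `(x, y)` pairs stay local, which mirrors the motivation given in the abstract.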
International Conference on Learning Representations (ICLR)
Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data
Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, Kunal Talwar
Published: 10.19.2016
Some machine learning applications involve training data that is sensitive,
such as the medical histories of patients in a clinical trial. A model may
inadvertently and implicitly store some of its training data; careful analysis
of the model may therefore reveal sensitive information.
To address this problem, we demonstrate a generally applicable approach to
providing strong privacy guarantees for training data: Private Aggregation of
Teacher Ensembles (PATE). The approach combines, in a black-box fashion,
multiple models trained with disjoint datasets, such as records from different
subsets of users. Because they rely directly on sensitive data, these models
are not published, but instead used as "teachers" for a "student" model. The
student learns to predict an output chosen by noisy voting among all of the
teachers, and cannot directly access an individual teacher or the underlying
data or parameters. The student's privacy properties can be understood both
intuitively (since no single teacher and thus no single dataset dictates the
student's training) and formally, in terms of differential privacy. These
properties hold even if an adversary can not only query the student but also
inspect its internal workings.
Compared with previous work, the approach imposes only weak assumptions on
how teachers are trained: it applies to any model, including non-convex models
like DNNs. We achieve state-of-the-art privacy/utility trade-offs on MNIST and
SVHN thanks to an improved privacy analysis and semi-supervised learning.
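The noisy voting step mentioned in this abstract can be illustrated with a small aggregation function: count the teachers' votes per class, perturb each count with random noise, and return the class with the highest noisy count. The sketch below assumes Laplace noise and a hypothetical `noise_scale` parameter; the abstract itself does not state the noise distribution or how it is calibrated, and no privacy accounting is done here.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    if scale <= 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_vote(teacher_predictions, num_classes, noise_scale, rng):
    """Return the label with the highest noise-perturbed vote count."""
    counts = Counter(teacher_predictions)
    best_label, best_score = None, float("-inf")
    for label in range(num_classes):
        score = counts.get(label, 0) + laplace_noise(noise_scale, rng)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    rng = random.Random(0)
    # Each entry is one teacher's predicted label for the same query.
    votes = [0, 1, 1, 2, 1, 1, 0, 1]
    print(noisy_vote(votes, num_classes=3, noise_scale=1.0, rng=rng))
```

The student would be trained on labels produced this way, so it sees only the aggregated, perturbed outcome and never an individual teacher's vote.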