Training neural networks usually requires large amounts of sensitive training
data, and how to protect the privacy of training data has thus become a
critical topic in deep learning research. InstaHide is a state-of-the-art
scheme to protect training data privacy with only minor effects on test
accuracy, and its security has become a salient question. In this paper, we
systematically study recent attacks on InstaHide and present a unified
framework to understand and analyze these attacks. We find that existing
attacks either do not have a provable guarantee or can only recover a single
private image. On the current InstaHide challenge setup, where each InstaHide
image is a mixture of two private images, we present a new algorithm to recover
all the private images with a provable guarantee and optimal sample complexity.
In addition, we provide a computational hardness result on retrieving all
InstaHide images. Our results demonstrate that InstaHide is not
information-theoretically secure but is computationally secure in the worst
case, even when mixing two private images.
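To make the challenge setup concrete, the following is a minimal sketch of an InstaHide-style encoding in which each encoded image mixes two private images. It is not the attack or the algorithm from the paper. The function names, the uniformly drawn mixing weight, the pixel range [0, 1], and the random per-pixel sign flip are all assumptions made for illustration.

```python
import random

def instahide_mix(img_a, img_b, rng):
    """Mix two flattened images and apply a random per-pixel sign flip.

    Assumptions (not from the abstract): pixels lie in [0, 1], the mixing
    weight is drawn uniformly, and the mask flips each pixel's sign
    independently with probability 1/2.
    """
    lam = rng.uniform(0.0, 1.0)  # mixing weights lam and 1 - lam sum to 1
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(img_a, img_b)]
    signs = [rng.choice((-1.0, 1.0)) for _ in mixed]
    return [s * m for s, m in zip(signs, mixed)]

def strip_sign_mask(encoded):
    """Taking absolute values undoes the sign mask for non-negative pixels."""
    return [abs(v) for v in encoded]

rng = random.Random(0)
private_a = [0.2, 0.4, 0.6, 0.8]
private_b = [0.1, 0.3, 0.5, 0.7]
encoded = instahide_mix(private_a, private_b, rng)
mixture = strip_sign_mask(encoded)  # a convex combination of the two images
```

Under these assumptions the sign mask hides nothing beyond the mixture itself, so what remains is the problem of separating the mixed images, which is where sample complexity and computational hardness enter.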
External datasets
MNIST
CIFAR-10
CIFAR-100
ImageNet
References
Proceedings of the 2016 ACM SIGSAC conference on computer and communications security
Deep learning with differential privacy
Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang
Published: 2016
IEEE Transactions on Wireless Communications
Federated learning over wireless fading channels
Mohammad Mohammadi Amiri, Deniz Gündüz
Published: 2020
Transactions on Information Forensics and Security
Privacy-preserving deep learning via additively homomorphic encryption
Communication-efficient distributed sgd with sketching
Nikita Ivkin, Daniel Rothchild, Enayat Ullah, Ion Stoica, Raman Arora
Published: 2019
Graphs and Combinatorics
Recognizing intersection graphs of linear uniform hypergraphs
Michael S Jacobson, André E Kézdy, Jenő Lehel
Published: 1997
61st Annual IEEE Symposium on Foundations of Computer Science (FOCS)
A faster interior point method for semidefinite programming
Haotian Jiang, Tarun Kathuria, Yin Tat Lee, Swati Padmanabhan, Zhao Song
Published: 2020
arXiv
Cited by: 9
Federated Learning: Strategies for Improving Communication Efficiency
Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon
Published: 2016.10.18
Federated Learning is a machine learning setting where the goal is to train a
high-quality centralized model while training data remains distributed over a
large number of clients each with unreliable and relatively slow network
connections. We consider learning algorithms for this setting where on each
round, each client independently computes an update to the current model based
on its local data, and communicates this update to a central server, where the
client-side updates are aggregated to compute a new global model. The typical
clients in this setting are mobile phones, and communication efficiency is of
the utmost importance.
In this paper, we propose two ways to reduce the uplink communication costs:
structured updates, where we directly learn an update from a restricted space
parametrized using a smaller number of variables, e.g. either low-rank or a
random mask; and sketched updates, where we learn a full model update and then
compress it using a combination of quantization, random rotations, and
subsampling before sending it to the server. Experiments on both convolutional
and recurrent networks show that the proposed methods can reduce the
communication cost by two orders of magnitude.
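A sketched update of the kind described above might look as follows. This is a hedged illustration that combines random subsampling with stochastic 1-bit quantization and omits the random rotation step. The names `sketch_update` and `unsketch_update` and the parameter `keep_prob` are hypothetical and are not taken from the paper.

```python
import random

def sketch_update(update, keep_prob, rng):
    """Compress a model update by subsampling, then 1-bit quantization.

    Kept coordinates are rescaled by 1 / keep_prob so that the subsampled
    update stays unbiased in expectation.
    """
    kept = [(i, u / keep_prob) for i, u in enumerate(update)
            if rng.random() < keep_prob]
    if not kept:
        return [], [], 0.0, 0.0
    values = [v for _, v in kept]
    lo, hi = min(values), max(values)
    span = hi - lo
    # Round each value up to hi with probability proportional to its
    # distance from lo, so the quantized value is unbiased as well.
    bits = [1 if span > 0 and rng.random() < (v - lo) / span else 0
            for v in values]
    return [i for i, _ in kept], bits, lo, hi

def unsketch_update(size, indices, bits, lo, hi):
    """Server-side reconstruction: dropped coordinates become zero."""
    dense = [0.0] * size
    for i, b in zip(indices, bits):
        dense[i] = hi if b else lo
    return dense
```

The client would send only the kept indices, one bit per kept coordinate, and the two scalars `lo` and `hi`, rather than the full floating-point update.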
Communication-Efficient Learning of Deep Networks from Decentralized Data
H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas
Published: 2016.2.18
Modern mobile devices have access to a wealth of data suitable for learning
models, which in turn can greatly improve the user experience on the device.
For example, language models can improve speech recognition and text entry, and
image models can automatically select good photos. However, this rich data is
often privacy sensitive, large in quantity, or both, which may preclude logging
to the data center and training there using conventional approaches. We
advocate an alternative that leaves the training data distributed on the mobile
devices, and learns a shared model by aggregating locally-computed updates. We
term this decentralized approach Federated Learning.
We present a practical method for the federated learning of deep networks
based on iterative model averaging, and conduct an extensive empirical
evaluation, considering five different model architectures and four datasets.
These experiments demonstrate the approach is robust to the unbalanced and
non-IID data distributions that are a defining characteristic of this setting.
Communication costs are the principal constraint, and we show a reduction in
required communication rounds by 10-100x as compared to synchronized stochastic
gradient descent.
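The iterative model averaging described above can be sketched on a toy problem. The example below is an assumption-laden illustration, not the authors' implementation: the model is a single scalar weight fit by least squares, and the names `local_sgd` and `federated_round`, the learning rate, and the epoch count are hypothetical.

```python
def local_sgd(w, data, lr, epochs):
    """Run a few epochs of SGD on the 1-D least-squares model y = w * x."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2.0 * (w * x - y) * x  # gradient of (w * x - y) ** 2
    return w

def federated_round(w_global, clients, lr=0.05, epochs=5):
    """One round: each client trains locally, the server averages the results.

    Each client's model is weighted by its share of the training examples,
    so unbalanced clients contribute proportionally.
    """
    total = sum(len(data) for data in clients)
    return sum(len(data) / total * local_sgd(w_global, data, lr, epochs)
               for data in clients)

# Two unbalanced clients whose data follow y = 3 * x.
clients = [
    [(1.0, 3.0)],
    [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)],
]
w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
```

Running several local epochs before each averaging step is what reduces the number of communication rounds relative to sending a gradient after every step.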