Abstract
Over the past few years, providers such as Google, Microsoft, and Amazon have
started to offer customers access to software interfaces that let them
easily embed machine learning tasks into their applications. Overall,
organizations can now use Machine Learning as a Service (MLaaS) engines to
outsource complex tasks such as training classifiers, performing predictions,
and clustering. They can also let others query models trained on their data.
Naturally, this approach can also be used (and is often advocated) in other
contexts, including government collaborations, citizen science projects, and
business-to-business partnerships. However, if malicious users were able to
recover data used to train these models, the resulting information leakage
would create serious issues. Likewise, if the inner parameters of the model are
considered proprietary information, then access to the model should not allow
an adversary to learn such parameters. In this document, we set out to review
privacy challenges in this space, providing a systematic review of the relevant
research literature and exploring possible countermeasures. More
specifically, we provide ample background information on relevant concepts
around machine learning and privacy. Then, we discuss possible adversarial
models and settings, cover a wide range of attacks that relate to private
and/or sensitive information leakage, and review recent results attempting to
defend against such attacks. Finally, we conclude with a list of open problems
that require more work, including the need for better evaluations, more
targeted defenses, and the study of the relation to policy and data protection
efforts.