Abstract
Quantifying the impact of individual data samples on machine learning models
is an open research problem. This is particularly relevant when complex and
high-dimensional relationships have to be learned from a limited sample of the
data-generating distribution, such as in deep learning. It was previously shown
that, in these cases, models not only extract patterns that are helpful for
generalisation, but also seem to be required to incorporate some of the
training data more or less as is, in a process often termed memorisation.
This raises the question: if some memorisation is a requirement for effective
learning, what are its privacy implications? In this work we unify a broad
range of previous definitions and perspectives on memorisation in ML, discuss
their interplay with model generalisation, and examine the implications of
these phenomena for data privacy. Moreover, we systematise methods that allow
practitioners to detect or quantify memorisation, and contextualise our
findings across a broad range of machine learning settings. Finally,
we discuss memorisation in the context of privacy attacks, differential privacy
(DP) and adversarial actors.