Abstract
Machine learning models are vulnerable to membership inference attacks, in
which an adversary aims to predict whether a particular sample was contained
in the target model's training dataset. Existing attack methods commonly
exploit output information (mostly losses) solely from the given target
model. As a result, in practical scenarios where both member and non-member
samples yield similarly small losses, these methods are inherently unable to
differentiate between them. To address this limitation, in
this paper we propose a new attack method, called \system, which exploits
membership information from the entire training process of the target model
to improve attack performance. To mount the attack in the common black-box
setting, we leverage knowledge distillation and represent the membership
information by the losses evaluated on a sequence of intermediate models at
different distillation epochs, namely the \emph{distilled loss trajectory},
together with the loss from the given target model.
Experimental results over different datasets and model architectures
demonstrate the clear advantage of our attack across different metrics. For
example, on CINIC-10, our attack achieves a true-positive rate at least
6$\times$ higher than that of existing methods at a low false-positive rate
of 0.1\%. Further analysis demonstrates the general effectiveness of our
attack in stricter scenarios.
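As a concrete illustration of the distilled loss trajectory, the sketch below distills the target model into a student and records the student's loss on a query sample after each distillation epoch. This is a minimal PyTorch sketch of the idea only: the function name, the KL-based soft-label distillation objective, and the SGD hyperparameters are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F


def distilled_loss_trajectory(target_model, student, distill_loader,
                              query_x, query_y, epochs=10, lr=0.1):
    """Distill target_model into student and record the student's loss
    on the query sample after every distillation epoch (hypothetical helper)."""
    optimizer = torch.optim.SGD(student.parameters(), lr=lr)
    trajectory = []
    target_model.eval()
    for _ in range(epochs):
        student.train()
        for x, _ in distill_loader:  # ground-truth labels unused: soft-label distillation
            with torch.no_grad():
                teacher_probs = F.softmax(target_model(x), dim=1)
            loss = F.kl_div(F.log_softmax(student(x), dim=1),
                            teacher_probs, reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Loss of the intermediate (partially distilled) model on the query sample.
        student.eval()
        with torch.no_grad():
            trajectory.append(F.cross_entropy(student(query_x), query_y).item())
    # Append the target model's own loss to complete the feature vector.
    with torch.no_grad():
        trajectory.append(F.cross_entropy(target_model(query_x), query_y).item())
    return torch.tensor(trajectory)
```

The resulting vector, one loss per distillation epoch plus the target model's own loss, would then serve as the feature input to a binary membership classifier.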