Recent studies propose membership inference (MI) attacks on deep models,
where the goal is to infer if a sample has been used in the training process.
Despite their apparent success, these studies only report accuracy, precision,
and recall of the positive class (member class). Hence, the performance of
these attacks on the negative class (non-member class) has not been clearly
reported. In this paper, we show that the way MI attack performance has been
reported is often misleading, because the attacks suffer from a high false
positive rate, or false alarm rate (FAR), that has gone unreported. FAR
measures how often the attack model mislabels non-training samples
(non-members) as training samples (members). A high FAR makes MI attacks
fundamentally impractical, and this is particularly significant for tasks such
as membership inference, where the majority of samples in reality belong to
the negative (non-training) class.
Moreover, we show that current MI attack models can identify the membership
of misclassified samples only, and with mediocre accuracy at best; these
samples constitute only a very small portion of the training samples.
We analyze several new features that have not been comprehensively explored
for membership inference before, including distance to the decision boundary
and gradient norms, and conclude that deep models' responses are mostly similar
for training and non-training samples. We conduct several experiments on image
classification tasks, including MNIST, CIFAR-10, CIFAR-100, and ImageNet, using
various model architectures, including LeNet, AlexNet, and ResNet. We show that
the current state-of-the-art MI attacks cannot achieve high accuracy and low
FAR at the same time, even when the attacker is given several advantages.
The source code is available at https://github.com/shrezaei/MI-Attack.
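
To make the false alarm rate concrete, the following is a minimal sketch of
how FAR relates to the metrics usually reported for MI attacks. It is not
taken from the repository above: the class AttackCounts, the helper
counts_from_rates, and the recall and FAR values are hypothetical and chosen
only for illustration. It evaluates the same hypothetical attack once on a
balanced set and once on a set where non-members heavily outnumber members.

```python
from dataclasses import dataclass


@dataclass
class AttackCounts:
    """Confusion-matrix counts for an MI attack (member = positive class)."""
    tp: int  # members correctly flagged as members
    fn: int  # members the attack missed
    fp: int  # non-members wrongly flagged as members (false alarms)
    tn: int  # non-members correctly rejected


def false_alarm_rate(c: AttackCounts) -> float:
    """FAR = FP / (FP + TN): fraction of non-members labeled as members."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


def recall(c: AttackCounts) -> float:
    """Recall = TP / (TP + FN): fraction of members that are detected."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def precision(c: AttackCounts) -> float:
    """Precision = TP / (TP + FP): fraction of 'member' calls that are right."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def accuracy(c: AttackCounts) -> float:
    """Fraction of all samples that the attack labels correctly."""
    total = c.tp + c.fn + c.fp + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def counts_from_rates(recall_rate: float, far: float,
                      n_members: int, n_non_members: int) -> AttackCounts:
    """Hypothetical helper: expected counts for an attack with fixed rates."""
    tp = round(recall_rate * n_members)
    fp = round(far * n_non_members)
    return AttackCounts(tp=tp, fn=n_members - tp, fp=fp, tn=n_non_members - fp)


# Illustrative numbers only (assumed, not results from the paper).
balanced = counts_from_rates(0.9, 0.6, n_members=1000, n_non_members=1000)
skewed = counts_from_rates(0.9, 0.6, n_members=1000, n_non_members=99000)

for name, c in (("balanced", balanced), ("skewed", skewed)):
    print(f"{name}: recall={recall(c):.3f} FAR={false_alarm_rate(c):.3f} "
          f"precision={precision(c):.3f} accuracy={accuracy(c):.3f}")
```

Under these assumed rates, recall and FAR are identical in both settings, yet
precision falls from 0.60 on the balanced set to roughly 0.015 once
non-members outnumber members 99 to 1. Positive-class metrics measured on a
balanced evaluation set can therefore overstate how useful an attack would be
in practice.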