These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Membership inference attacks (MIAs) have emerged as the standard tool for
evaluating the privacy risks of AI models. However, state-of-the-art attacks
require training numerous, often computationally expensive, reference models,
limiting their practicality. We present a novel approach for estimating
model-level vulnerability, the TPR at low FPR, to membership inference attacks
without requiring reference models. Empirical analysis shows loss distributions
to be asymmetric and heavy-tailed and suggests that most points at risk from
MIAs have moved from the tail (high-loss region) to the head (low-loss region)
of the distribution after training. We leverage this insight to propose a
method to estimate model-level vulnerability from the training and testing
distribution alone: using the absence of outliers from the high-loss region as
a predictor of the risk. We evaluate our method, the TNR of a simple loss
attack, across a wide range of architectures and datasets and show it to
accurately estimate model-level vulnerability to the SOTA MIA attack (LiRA). We
also show our method to outperform both low-cost (few reference models) attacks
such as RMIA and other measures of distribution difference. We finally evaluate
the use of non-linear functions to evaluate risk and show the approach to be
promising to evaluate the risk in large-language models.