Abstract
Large Language Models (LLMs) promise to revolutionize computing
broadly, but their complexity and extensive training data also expose
significant privacy vulnerabilities. One of the simplest privacy risks
associated with LLMs is their susceptibility to membership inference attacks
(MIAs), wherein an adversary aims to determine whether a specific data point
was part of the model's training set. Although this is a known risk,
state-of-the-art MIA methodologies rely on training multiple computationally
costly shadow models, making risk evaluation prohibitive for large models.
Here we adapt a recent line of work that uses quantile regression to mount
membership inference attacks; we extend this work by proposing a low-cost MIA
that leverages an ensemble of small quantile regression models to determine
whether a document belongs to the model's training set. We demonstrate the
effectiveness of this approach on fine-tuned LLMs from several model families
(OPT, Pythia, Llama) and across multiple datasets. Across all scenarios we
obtain accuracy comparable to or better than that of state-of-the-art
shadow-model approaches, at as little as 6% of their computation budget. We
further demonstrate increased effectiveness on target models trained for
multiple epochs, as well as robustness to architecture misspecification:
we can mount an effective attack using an attack model with a different
tokenizer and architecture, without requiring knowledge of the target model.
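
To make the approach concrete, the following is a minimal sketch of a quantile-regression membership inference attack, under stated assumptions rather than the paper's exact implementation. It assumes a featurizer that maps each document to a fixed-length vector, access to the target model's per-document loss, and a reference set of documents known not to be in the training set; the gradient-boosted quantile regressors stand in for the paper's small quantile regression models.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_quantile_ensemble(X_pub, y_pub, alpha=0.05, n_models=8, seed=0):
    # Fit an ensemble of small quantile regressors on non-member data.
    # Each model estimates the alpha-quantile of the target model's loss
    # conditioned on document features; ensembling reduces variance.
    # (Illustrative sketch: the regressor family and hyperparameters are
    # assumptions, not the paper's choices.)
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_pub), size=len(X_pub))  # bootstrap resample
        model = GradientBoostingRegressor(
            loss="quantile", alpha=alpha, max_depth=2, n_estimators=100
        )
        model.fit(X_pub[idx], y_pub[idx])
        models.append(model)
    return models

def is_member(models, x, observed_loss):
    # Average the per-document alpha-quantile predictions across the ensemble
    # and flag membership when the observed loss falls below that threshold:
    # training-set documents tend to receive lower loss, and the threshold
    # targets a false-positive rate of roughly alpha on non-members.
    threshold = np.mean([m.predict(x[None, :])[0] for m in models])
    return observed_loss < threshold

Averaging per-example thresholds over a bootstrap ensemble is one simple way to realize an "ensemble of small quantile regression models" as described above; calibrating alpha trades the attack's true-positive rate against its false-positive budget.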