Abstract
Adapting Large Language Models (LLMs) to specific tasks introduces concerns
about computational efficiency, prompting an exploration of efficient methods
such as In-Context Learning (ICL). However, the vulnerability of ICL to privacy
attacks under realistic assumptions remains largely unexplored. In this work,
we present the first membership inference attack tailored for ICL, relying
solely on generated texts without their associated probabilities. We propose
four attack strategies tailored to various constrained scenarios and conduct
extensive experiments on four popular large language models. Empirical results
show that our attacks can accurately determine membership status in most cases,
e.g., a 95% accuracy advantage against LLaMA, indicating that the associated
risks are much higher than those shown by existing probability-based attacks.
Additionally, we propose a hybrid attack that synthesizes the strengths of the
aforementioned strategies, achieving an accuracy advantage of over 95% in most
cases. Furthermore, we investigate three potential defenses targeting data,
instruction, and output. Results demonstrate that combining defenses from orthogonal
dimensions significantly reduces privacy leakage and offers enhanced privacy
assurances.
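
To illustrate what "relying solely on generated texts" can mean in practice, below is a minimal, hypothetical sketch of a text-only membership inference signal. It is not the paper's attack: the `query_model` helper, the similarity threshold, and the use of `difflib` for lexical similarity are all assumptions introduced here for illustration.

```python
# Hypothetical sketch: text-only membership inference against in-context learning.
# Simplified intuition: if a candidate sample served as an in-context demonstration,
# the model's generated answer for that sample tends to reproduce its ground-truth
# completion more closely than it does for non-members.
# `query_model`, the threshold, and the similarity metric are illustrative assumptions.

from difflib import SequenceMatcher


def query_model(prompt: str) -> str:
    """Placeholder for an LLM call that returns generated text only
    (no token probabilities). Replace with a real API call."""
    raise NotImplementedError


def text_similarity(a: str, b: str) -> float:
    """Crude lexical similarity in [0, 1]; a proxy for how closely the
    generation matches the candidate's ground-truth completion."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def infer_membership(candidate_input: str,
                     candidate_target: str,
                     threshold: float = 0.8) -> bool:
    """Guess 'member' if the model's generated text for the candidate input
    closely reproduces the candidate's target text."""
    generated = query_model(candidate_input)
    return text_similarity(generated, candidate_target) >= threshold
```

A real attack would calibrate the threshold (e.g., on known non-members) and could combine several such text-only signals, but the key point matches the abstract: no output probabilities are required.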