Extracting Memorized Training Data via Decomposition

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2409.12367

PDF

https://arxiv.org/pdf/2409.12367

Paper Information

Authors: Ellen Su; Anu Vellore; Amy Chang; Raffaele Mura; Blaine Nelson; Paul Kassianik; Amin Karbasi
Published: 2024-09-19
Updated: 2024-10-02
Affiliation: Robust Intelligence
Country: United States of America
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Training Data Extraction Method, Model Performance Evaluation, Prompting Strategy

These labels were automatically added by AI and may be inaccurate.

Abstract

The widespread use of Large Language Models (LLMs) in society creates new information security challenges for developers, organizations, and end-users alike. LLMs are trained on large volumes of data, and their susceptibility to reveal the exact contents of the source training datasets poses security and safety risks. Although current alignment procedures restrict common risky behaviors, they do not completely prevent LLMs from leaking data. Prior work demonstrated that LLMs may be tricked into divulging training data by using out-of-distribution queries or adversarial techniques. In this paper, we demonstrate a simple, query-based decompositional method to extract news articles from two frontier LLMs. We use instruction decomposition techniques to incrementally extract fragments of training data. Out of 3723 New York Times articles, we extract at least one verbatim sentence from 73 articles, and over 20% of verbatim sentences from 6 articles. Our analysis demonstrates that this method successfully induces the LLM to generate texts that are reliable reproductions of news articles, meaning that they likely originate from the source training dataset. This method is simple, generalizable, and does not fine-tune or change the production model. If replicable at scale, this training data extraction methodology could expose new LLM security and safety vulnerabilities, including privacy risks and unauthorized data leaks. These implications require careful consideration from model development to its end-use.
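The abstract reports its results as counts of verbatim sentences recovered per article. A minimal sketch of how such a per-article score could be computed follows. It is not the authors' evaluation code: the naive sentence splitter, the whitespace normalization, the exact-substring matching rule, the function names `split_sentences` and `verbatim_fraction`, and the sample texts are all assumptions made for illustration.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Collapse internal whitespace so line wrapping does not affect matching.
    return [" ".join(p.split()) for p in parts if p.strip()]


def verbatim_fraction(article: str, generated: str) -> float:
    """Fraction of article sentences that appear word for word in the generated text."""
    sentences = split_sentences(article)
    if not sentences:
        return 0.0
    normalized_output = " ".join(generated.split())
    hits = sum(1 for s in sentences if s in normalized_output)
    return hits / len(sentences)


# Made-up texts for illustration only; they are not taken from any dataset.
article = (
    "The council met on Tuesday. It approved the new budget. "
    "Residents raised concerns about parking. A final vote is due in May."
)
generated = (
    "Here is what I recall: The council met on Tuesday. "
    "Members debated the budget at length. A final vote is due in May."
)

fraction = verbatim_fraction(article, generated)
print(f"{fraction:.0%} of article sentences reproduced verbatim")
```

Under this scoring, an article counts toward the "at least one verbatim sentence" group when the fraction is above zero, and toward the "over 20%" group when it exceeds 0.2. For scale, 73 of 3723 articles is roughly 2% of the set examined.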

External Datasets

New York Times articles

Wall Street Journal articles