Abstract
An important question today is whether a given text was used to train a large
language model (LLM). A \emph{completion} test is often employed: check if the
LLM completes a sufficiently complex text. This, however, requires a
ground-truth definition of membership; most commonly, a text is deemed a member
based on its $n$-gram overlap with any text in the dataset. In this work, we
demonstrate that this $n$-gram-based membership
definition can be effectively gamed. We study scenarios where sequences are
\emph{non-members} for a given $n$, yet find that completion tests still
succeed. We find many natural cases of this phenomenon by retraining LLMs from
scratch after removing all training samples that were completed; these cases
include exact duplicates, near-duplicates, and even short overlaps. They
show that it is difficult to find a single viable choice of $n$ for
membership definitions. Using these insights, we design adversarial datasets
that can cause a given target sequence to be completed without containing it,
for any reasonable choice of $n$. Our findings highlight the inadequacy of
$n$-gram membership, suggesting that membership definitions fail to account for
auxiliary information available to the training algorithm.
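
As an illustrative sketch only (not the exact procedure used in the paper), the following Python snippet shows one plausible instantiation of the two ingredients discussed above: an $n$-gram overlap membership check and a verbatim completion test. The tokenization, the model.generate interface, and the prefix length are assumptions made for illustration.

\begin{verbatim}
def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_ngram_member(target_tokens, training_docs, n):
    """One common n-gram membership definition: the target counts as a
    member if it shares at least one n-gram with some training document."""
    target_grams = ngrams(target_tokens, n)
    return any(target_grams & ngrams(doc, n) for doc in training_docs)

def completion_test(model, target_tokens, prefix_len):
    """Completion test: prompt the model with a prefix of the target and
    check whether it reproduces the remaining suffix verbatim.
    model.generate is a placeholder interface, not a specific library API."""
    prefix, suffix = target_tokens[:prefix_len], target_tokens[prefix_len:]
    generated = model.generate(prefix, max_new_tokens=len(suffix))
    return list(generated[:len(suffix)]) == list(suffix)
\end{verbatim}

Under a definition of this kind, the abstract's claim is that a training set can be constructed whose documents share no $n$-gram with the target (so is_ngram_member returns false for any reasonable $n$), while the completion test above still succeeds after training.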