Abstract
An important question today is whether a given text was used to train a large
language model (LLM). A \emph{completion} test is often employed: check if the
LLM completes a sufficiently complex text. This, however, requires a
ground-truth definition of membership; most commonly, a text is deemed a member
based on its $n$-gram overlap with any text in the dataset. In this work, we
demonstrate that this $n$-gram-based membership
definition can be effectively gamed. We study scenarios where sequences are
\emph{non-members} for a given $n$, yet find that completion tests still
succeed. We find many natural cases of this phenomenon by retraining LLMs from
scratch after removing all training samples that were completed; these cases
include exact duplicates, near-duplicates, and even short overlaps. They
show that it is difficult to find a single viable choice of $n$ for
membership definitions. Using these insights, we design adversarial datasets
that can cause a given target sequence to be completed without containing it,
for any reasonable choice of $n$. Our findings highlight the inadequacy of
$n$-gram membership, suggesting that membership definitions fail to account for
auxiliary information available to the training algorithm.
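
As an illustrative sketch only (not the exact procedure used in the paper), the following Python snippet shows one plausible instantiation of the two ingredients discussed above: an $n$-gram overlap membership check and a verbatim completion test. The tokenization, the model.generate interface, and the prefix length are assumptions made for illustration.

\begin{verbatim}
def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_ngram_member(target_tokens, training_docs, n):
    """One common n-gram membership definition: the target counts as a
    member if it shares at least one n-gram with some training document."""
    target_grams = ngrams(target_tokens, n)
    return any(target_grams & ngrams(doc, n) for doc in training_docs)

def completion_test(model, target_tokens, prefix_len):
    """Completion test: prompt the model with a prefix of the target and
    check whether it reproduces the remaining suffix verbatim.
    model.generate is a placeholder interface, not a specific library API."""
    prefix, suffix = target_tokens[:prefix_len], target_tokens[prefix_len:]
    generated = model.generate(prefix, max_new_tokens=len(suffix))
    return list(generated[:len(suffix)]) == list(suffix)
\end{verbatim}

Under a definition of this kind, the abstract's claim is that a training set can be constructed whose documents share no $n$-gram with the target (so is_ngram_member returns false for any reasonable $n$), while the completion test above still succeeds after training.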