These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Data contamination, i.e., the presence of test data from downstream tasks in
the training data of large language models (LLMs), is a potential major issue
in measuring LLMs' real effectiveness on other tasks. We propose a
straightforward yet effective method for identifying data contamination within
LLMs. At its core, our approach starts by identifying potential contamination
at the instance level; using this information, our approach then assesses wider
contamination at the partition level. To estimate contamination of individual
instances, we employ "guided instruction:" a prompt consisting of the dataset
name, partition type, and the random-length initial segment of a reference
instance, asking the LLM to complete it. An instance is flagged as contaminated
if the LLM's output either exactly or nearly matches the latter segment of the
reference. To understand if an entire partition is contaminated, we propose two
ideas. The first idea marks a dataset partition as contaminated if the average
overlap score with the reference instances (as measured by ROUGE-L or BLEURT)
is statistically significantly better with the completions from guided
instruction compared to a "general instruction" that does not include the
dataset and partition name. The second idea marks a dataset partition as
contaminated if a classifier based on GPT-4 with few-shot in-context learning
prompt marks multiple generated completions as exact/near-exact matches of the
corresponding reference instances. Our best method achieves an accuracy between
92% and 100% in detecting if an LLM is contaminated with seven datasets,
containing train and test/validation partitions, when contrasted with manual
evaluation by human experts. Further, our findings indicate that GPT-4 is
contaminated with AG News, WNLI, and XSum datasets.