Large Language Models (LLMs) have experienced rapid advancements, with
applications spanning a wide range of fields, including sentiment
classification, review generation, and question answering. Due to their
efficiency and versatility, researchers and companies increasingly employ
LLM-generated data to train their models. However, the difficulty of tracking
content produced by LLMs poses a significant challenge and can lead to
infringement of the LLM owners' copyright. In this paper, we propose a method
for injecting watermarks into LLM-generated datasets, enabling detection of
whether the datasets used in downstream tasks were produced by the
original LLM. These downstream tasks fall into two categories. The first uses
the generated data at the input level, typically to train classification
models. The second operates at the output level, where model trainers use
LLM-generated content as training targets for downstream tasks such as
question answering. We design a comprehensive set of experiments to
evaluate both watermarking approaches. Our results demonstrate that both are
highly effective. Regarding model utility, we find that
classifiers trained on the generated datasets achieve a test accuracy exceeding
0.900 in many cases, suggesting that the utility of such models remains robust.
For the output-level watermark, we observe that the quality of the generated
text is comparable to that of text produced using real-world datasets. Through
this work, we aim to advance the protection of LLM copyright and take a
significant step toward safeguarding intellectual property in this domain.