Abstract
Powerful large language models (LLMs) such as ChatGPT have demonstrated
remarkable capabilities across a variety of tasks. Consequently, the
detection of machine-generated texts (MGTs) is becoming increasingly crucial as
LLMs become more advanced and prevalent. These models have the ability to
generate human-like language, making it challenging to discern whether a text
is authored by a human or a machine. This raises concerns regarding
authenticity, accountability, and potential bias. However, existing methods for
detecting MGTs are evaluated using different model architectures, datasets, and
experimental settings, resulting in a lack of a comprehensive evaluation
framework that encompasses various methodologies. Furthermore, it remains
unclear how existing detection methods would perform against powerful LLMs. In
this paper, we fill this gap by proposing the first benchmark framework for MGT
detection against powerful LLMs, named MGTBench. Extensive evaluations on
public datasets with curated texts generated by various powerful LLMs such as
ChatGPT-turbo and Claude demonstrate the effectiveness of different detection
methods. Our ablation study shows that a larger number of words generally
leads to better detection performance, and that most detection methods achieve
comparable performance with far fewer training samples. Moreover, we delve
into a more challenging task: text attribution. Our findings indicate that
model-based detection methods still perform well on this task. To
investigate the robustness of different detection methods, we consider three
adversarial attacks, namely paraphrasing, random spacing, and adversarial
perturbations. We discover that these attacks can significantly diminish
detection effectiveness, underscoring the critical need for the development of
more robust detection methods.