MGTBench: Benchmarking Machine-Generated Text Detection

TOP Literature Database MGTBench: Benchmarking Machine-Generated Text Detection

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2303.14822

PDF

https://arxiv.org/pdf/2303.14822

Paper Information

Author: Xinlei He;Xinyue Shen;Zeyuan Chen;Michael Backes;Yang Zhang
Published: 3-27-2023
Updated: 1-16-2024
Affiliation: CISPA Helmholtz Center for Information Security
Country: Germany
Conference

Labels Estimated by AI

Prompt Injection Performance Evaluation MGT Detection Method

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Nowadays, powerful large language models (LLMs) such as ChatGPT have demonstrated revolutionary power in a variety of tasks. Consequently, the detection of machine-generated texts (MGTs) is becoming increasingly crucial as LLMs become more advanced and prevalent. These models have the ability to generate human-like language, making it challenging to discern whether a text is authored by a human or a machine. This raises concerns regarding authenticity, accountability, and potential bias. However, existing methods for detecting MGTs are evaluated using different model architectures, datasets, and experimental settings, resulting in a lack of a comprehensive evaluation framework that encompasses various methodologies. Furthermore, it remains unclear how existing detection methods would perform against powerful LLMs. In this paper, we fill this gap by proposing the first benchmark framework for MGT detection against powerful LLMs, named MGTBench. Extensive evaluations on public datasets with curated texts generated by various powerful LLMs such as ChatGPT-turbo and Claude demonstrate the effectiveness of different detection methods. Our ablation study shows that a larger number of words in general leads to better performance and most detection methods can achieve similar performance with much fewer training samples. Moreover, we delve into a more challenging task: text attribution. Our findings indicate that the model-based detection methods still perform well in the text attribution task. To investigate the robustness of different detection methods, we consider three adversarial attacks, namely paraphrasing, random spacing, and adversarial perturbations. We discover that these attacks can significantly diminish detection effectiveness, underscoring the critical need for the development of more robust detection methods.

External Datasets

Essay

Reuters

References

Gpt4all: Training an assistant-style chatbot with large scale data distillation from gpt-3.5-turbo

Yuvanesh Anand, Zach Nussbaum, Brandon Duderstadt, Benjamin Schmidt, Andriy Mulyar

Published: 2023

International Joint Conference on Natural Language Processing (IJCNLP)

Identifying Real or Fake Articles: Towards better Language Modeling

Sameer Badaskar, Sachin Agarwal, Shilpa Arora

Published: 2008

Annual Meeting of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing (ACL/IJCNLP)

Conda: Contrastive domain adaptation for ai-generated text detection

Amrita Bhattacharjee, Tharindu Kumarage, Raha Moraffah, Huan Liu

Published: 2023

CoRR

Gpt-neox-20b: An open-source autoregressive language model

Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach

Published: 2022

OpenAI Technical Report

Language models are few-shot learners

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei

Published: 2020

CoRR

On the possibilities of ai-generated text detection

Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang

Published: 2023

CoRR

PaLM: Scaling Language Modeling with Pathways

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel

Published: 2022

Annual Conference on Neural Information Processing Systems (NIPS)

Deep Reinforcement Learning from Human Preferences

Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

Published: 2017

Proceedings of NAACL-HLT

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Published: 2019

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Glm: General language model pretraining with autoregressive blank infilling

Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang

Published: 2022

Annual Meeting of the Association for Computational Linguistics (ACL)

GLTR: Statistical Detection and Visualization of Generated Text

Sebastian Gehrmann, Hendrik Strobelt, Alexander M. Rush

Published: 2019

How close is chatgpt to human experts? comparison corpus, evaluation, and detection

B. Guo, X. Zhang, Z. Wang, M. Jiang, J. Nie, Y. Ding, J. Yue, Y. Wu

Published: 2023

Artificial Intelligence: Methodology, Systems, and Applications (AIMSA)

N-gram feature selection for authorship identification

John Houvardas, Efstathios Stamatatos

Published: 2006

arxiv

Cited by 3

Annual Meeting of the Association for Computational Linguistics (ACL)

Automatic Detection of Generated Text is Easiest when Humans are Fooled

Daphne Ippolito, Daniel Duckworth, Chris Callison-Burch, Douglas Eck

Published: 11.2.2019

Recent advancements in neural language modelling make it possible to rapidly generate vast amounts of human-sounding text. The capabilities of humans and automatic discriminators to detect machine-generated text have been a large source of research interest, but humans and machines rely on different cues to make their decisions. Here, we perform careful benchmarking and analysis of three popular sampling-based decoding strategies---top-$k$, nucleus sampling, and untruncated random sampling---and show that improvements in decoding methods have primarily optimized for fooling humans. This comes at the expense of introducing statistical abnormalities that make detection easy for automatic systems. We also show that though both human and automatic detector performance improve with longer excerpt length, even multi-sentence excerpts can fool expert human raters over 30% of the time. Our findings reveal the importance of using both human and automatic detectors to assess the humanness of text generation systems.

Identification of AI Output Deep Learning Method Text Perturbation Method

Annual Meeting of the Association for Computational Linguistics (ACL)

TruthfulQA: Measuring How Models Mimic Human Falsehoods

Stephanie Lin, Jacob Hilton, Owain Evans

Published: 2022

Roberta: A robustly optimized bert pretraining approach

International Conference on Machine Learning (ICML)

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn

Published: 1.27.2023

The increasing fluency and widespread usage of large language models (LLMs) highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g., T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by 20B parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See https://ericmitchell.ai/detectgpt for code, data, and other project information.

Identification of AI Output Text Perturbation Method Deep Learning Method

IEEE Symposium on Security and Privacy (S&P)

Deepfake Text Detection: Limitations and Opportunities

Jiameng Pu, Zain Sarwar, Sifat Muhammad Abdullah, Abdullah Rehman, Yoonjin Kim, Parantapa Bhattacharya, Mobin Javed, Bimal Viswanath

Published: 2023

Improving Language Understanding by Generative Pre-Training

Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever

Published: 2016

OpenAI blog

Language models are unsupervised multitask learners

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever

Published: 2019

Journal of machine learning research

Exploring the limits of transfer learning with a unified text-to-text transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J Liu

Published: 2020

Annual Meeting of the Association for Computational Linguistics (ACL)

Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency

Shuhuai Ren, Yihe Deng, Kun He, Wanxiang Che

Published: 2019

CoRR

Release Strategies and the Social Impacts of Language Models

Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Jasmine Wang

Published: 2019

CoRR

Learning to summarize from human feedback

Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul F. Christiano

Published: 2020

CoRR

Detectllm: Leveraging log rank information for zero-shot detection of machine-generated text

Jinyan Su, Terry Yue Zhuo, Di Wang, Preslav Nakov

Published: 2023

CoRR

ChatGPT: The End of Online Exam Integrity?

Teo Susnjak

Published: 2022

Conference on Empirical Methods in Natural Language Processing (EMNLP)

Authorship Attribution for Neural Text Generation

Adaku Uchendu, Thai Le, Kai Shu, Dongwon Lee

Published: 2020

Conference on Empirical Methods in Natural Language Processing (EMNLP)

TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation

Adaku Uchendu, Zeyu Ma, Thai Le, Rui Zhang, Dongwon Lee

Published: 2021

Advances in Neural Information Processing Systems

Attention is all you need

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. u. Kaiser, I. Polosukhin

Published: 2017

CoRR

Ghostbuster: Detecting text ghostwritten by large language models

Vivek Verma, Eve Fleisig, Nicholas Tomlin, Dan Klein

Published: 2023

CoRR

Ethical and social risks of harm from Language Models

Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William S. Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel

Published: 2021

arxiv

Cited by 1

Conference on Neural Information Processing Systems (NeurIPS)

Defending Against Neural Fake News

Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi

Published: 5.30.2019

Recent progress in natural language generation has raised dual-use concerns. While applications like summarization and translation are positive, the underlying technology also might enable adversaries to generate neural fake news: targeted propaganda that closely mimics the style of real news. Modern computer security relies on careful threat modeling: identifying potential threats and vulnerabilities from an adversary's point of view, and exploring potential mitigations to these threats. Likewise, developing robust defenses against neural fake news requires us first to carefully investigate and characterize the risks of these models. We thus present a model for controllable text generation called Grover. Given a headline like `Link Found Between Vaccines and Autism,' Grover can generate the rest of the article; humans find these generations to be more trustworthy than human-written disinformation. Developing robust verification techniques against generators like Grover is critical. We find that best current discriminators can classify neural fake news from real, human-written, news with 73% accuracy, assuming access to a moderate level of training data. Counterintuitively, the best defense against Grover turns out to be Grover itself, with 92% accuracy, demonstrating the importance of public release of strong generators. We investigate these results further, showing that exposure bias -- and sampling strategies that alleviate its effects -- both leave artifacts that similar discriminators can pick up on. We conclude by discussing ethical issues regarding the technology, and plan to release Grover publicly, helping pave the way for better detection of neural fake news.

Identification of AI Output Deep Learning Method Cyber Threat