DEMASQ: Unmasking the ChatGPT Wordsmith

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2311.05019

PDF

https://arxiv.org/pdf/2311.05019

Paper Information

Author: Kavita Kumari;Alessandro Pegoraro;Hossein Fereidooni;Ahmad-Reza Sadeghi
Published: 11-9-2023
Affiliation: Technical University of Darmstadt
Country: Germany
Conference: Network and Distributed System Security Symposium (NDSS)

Labels Estimated by AI

Energy-Based Model; Prompt Injection; Evaluation Method

These labels were automatically added by AI and may be inaccurate.

Abstract

The potential misuse of ChatGPT and other Large Language Models (LLMs) has raised concerns regarding the dissemination of false information, plagiarism, academic dishonesty, and fraudulent activities. Consequently, distinguishing between AI-generated and human-generated content has emerged as an intriguing research topic. However, current text detection methods lack precision and are often restricted to specific tasks or domains, making them inadequate for identifying content generated by ChatGPT. In this paper, we propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content. Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods. DEMASQ is an energy-based detection model that incorporates novel aspects, such as (i) optimization inspired by the Doppler effect to capture the interdependence between input text embeddings and output labels, and (ii) the use of explainable AI techniques to generate diverse perturbations. To evaluate our detector, we create a benchmark dataset comprising a mixture of prompts from both ChatGPT and humans, encompassing domains such as medical, open Q&A, finance, wiki, and Reddit. Our evaluation demonstrates that DEMASQ achieves high accuracy in identifying content generated by ChatGPT.

External Datasets

benchmark dataset

medical dataset

wiki dataset

open Q&A dataset

reddit dataset

finance dataset

political dataset

arXiv dataset
