Text Perturbation Method

ExpShield: Safeguarding Web Text from Unauthorized Crawling and Language Modeling Exploitation

Authors: Ruixuan Liu, Toan Tran, Tianhao Wang, Hongsheng Hu, Shuo Wang, Li Xiong | Published: 2024-12-30 | Updated: 2025-05-07
Text Perturbation Method
Backdoor Detection
Watermarking Technology

JailGuard: A Universal Detection Framework for LLM Prompt-based Attacks

Authors: Xiaoyu Zhang, Cen Zhang, Tianlin Li, Yihao Huang, Xiaojun Jia, Ming Hu, Jie Zhang, Yang Liu, Shiqing Ma, Chao Shen | Published: 2023-12-17 | Updated: 2025-03-15
Text Perturbation Method
Prompt Injection
Attack Method

Robust Distortion-free Watermarks for Language Models

Authors: Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang | Published: 2023-07-28 | Updated: 2024-06-06
Text Perturbation Method
Digital Watermarking for Generative AI
Statistical Hypothesis Testing

Provable Robust Watermarking for AI-Generated Text

Authors: Xuandong Zhao, Prabhanjan Ananth, Lei Li, Yu-Xiang Wang | Published: 2023-06-30 | Updated: 2023-10-13
Text Perturbation Method
Digital Watermarking for Generative AI
Robustness of Watermarking Techniques

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

Authors: Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn | Published: 2023-01-26 | Updated: 2023-07-23
Identification of AI Output
Text Perturbation Method
Deep Learning Method

T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification

Authors: Ahmadreza Azizi, Ibrahim Asadullah Tahmid, Asim Waheed, Neal Mangaokar, Jiameng Pu, Mobin Javed, Chandan K. Reddy, Bimal Viswanath | Published: 2021-03-07 | Updated: 2021-03-11
Text Perturbation Method
Backdoor Detection
Attack Method

ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

Authors: Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu, Maosong Sun | Published: 2020-11-20 | Updated: 2021-11-03
Text Perturbation Method
Trigger Detection
Backdoor Detection

A Differentially Private Text Perturbation Method Using a Regularized Mahalanobis Metric

Authors: Zekun Xu, Abhinav Aggarwal, Oluwaseyi Feyisetan, Nathanael Teissier | Published: 2020-10-22
Text Perturbation Method
Causes of Information Leakage
Machine Learning Algorithm

FastWordBug: A Fast Method To Generate Adversarial Text Against NLP Applications

Authors: Dou Goodman, Lv Zhonghou, Wang Minghua | Published: 2020-01-31
Text Perturbation Method
Adversarial Perturbation Techniques
Natural Language Processing

Automatic Detection of Generated Text is Easiest when Humans are Fooled

Authors: Daphne Ippolito, Daniel Duckworth, Chris Callison-Burch, Douglas Eck | Published: 2019-11-02 | Updated: 2020-05-07
Identification of AI Output
Text Perturbation Method
Deep Learning Method