テキストの摂動手法

JailGuard: A Universal Detection Framework for LLM Prompt-based Attacks

Authors: Xiaoyu Zhang, Cen Zhang, Tianlin Li, Yihao Huang, Xiaojun Jia, Ming Hu, Jie Zhang, Yang Liu, Shiqing Ma, Chao Shen | Published: 2023-12-17 | Updated: 2025-03-15
テキストの摂動手法
プロンプトインジェクション
攻撃手法

Robust Distortion-free Watermarks for Language Models

Authors: Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang | Published: 2023-07-28 | Updated: 2024-06-06
テキストの摂動手法
生成AI向け電子透かし
統計的仮説検定

Provable Robust Watermarking for AI-Generated Text

Authors: Xuandong Zhao, Prabhanjan Ananth, Lei Li, Yu-Xiang Wang | Published: 2023-06-30 | Updated: 2023-10-13
テキストの摂動手法
生成AI向け電子透かし
透かし技術の堅牢性

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

Authors: Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn | Published: 2023-01-26 | Updated: 2023-07-23
AIによる出力の識別
テキストの摂動手法
深層学習手法

T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification

Authors: Ahmadreza Azizi, Ibrahim Asadullah Tahmid, Asim Waheed, Neal Mangaokar, Jiameng Pu, Mobin Javed, Chandan K. Reddy, Bimal Viswanath | Published: 2021-03-07 | Updated: 2021-03-11
テキストの摂動手法
バックドアモデルの検知
攻撃手法

ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

Authors: Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu, Maosong Sun | Published: 2020-11-20 | Updated: 2021-11-03
テキストの摂動手法
トリガーの検知
バックドアモデルの検知

A Differentially Private Text Perturbation Method Using a Regularized Mahalanobis Metric

Authors: Zekun Xu, Abhinav Aggarwal, Oluwaseyi Feyisetan, Nathanael Teissier | Published: 2020-10-22
テキストの摂動手法
情報漏洩の原因
機械学習アルゴリズム

FastWordBug: A Fast Method To Generate Adversarial Text Against NLP Applications

Authors: Dou Goodman, Lv Zhonghou, Wang minghua | Published: 2020-01-31
テキストの摂動手法
敵対的摂動手法
自然言語処理

Automatic Detection of Generated Text is Easiest when Humans are Fooled

Authors: Daphne Ippolito, Daniel Duckworth, Chris Callison-Burch, Douglas Eck | Published: 2019-11-02 | Updated: 2020-05-07
AIによる出力の識別
テキストの摂動手法
深層学習手法