テキスト生成手法

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

Authors: Andy Zhou, Kevin Wu, Francesco Pinto, Zhaorun Chen, Yi Zeng, Yu Yang, Shuang Yang, Sanmi Koyejo, James Zou, Bo Li | Published: 2025-03-20
エラー処理
テキスト生成手法
テストケース生成

OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions

Authors: Yu-Shin Huang, Peter Just, Krishna Narayanan, Chao Tian | Published: 2024-10-06
テキスト生成手法
最適化問題

Adversarial Text Purification: A Large Language Model Approach for Defense

Authors: Raha Moraffah, Shubh Khandelwal, Amrita Bhattacharjee, Huan Liu | Published: 2024-02-05
テキスト生成手法
プロンプトインジェクション
敵対的テキスト浄化

TextGuard: Provable Defense against Backdoor Attacks on Text Classification

Authors: Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo Li, Dawn Song | Published: 2023-11-19 | Updated: 2023-11-25
テキスト生成手法
バックドア攻撃
ポイズニング

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition

Authors: Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, Jordan Boyd-Graber | Published: 2023-10-24 | Updated: 2024-03-03
テキスト生成手法
プロンプトインジェクション
攻撃手法

Adaptive Attack Detection in Text Classification: Leveraging Space Exploration Features for Text Sentiment Classification

Authors: Atefeh Mahdavi, Neda Keivandarian, Marco Carvalho | Published: 2023-08-29
テキスト生成手法
敵対的訓練
適応型誤用検出

Stochastic Parrots Looking for Stochastic Parrots: LLMs are Easy to Fine-Tune and Hard to Detect with other LLMs

Authors: Da Silva Gameiro Henrique, Andrei Kucharavy, Rachid Guerraoui | Published: 2023-04-18
LLMセキュリティ
テキスト生成手法
生成的敵対ネットワーク

Masked Language Model Based Textual Adversarial Example Detection

Authors: Xiaomei Zhang, Zhaoxi Zhang, Qi Zhong, Xufei Zheng, Yanjun Zhang, Shengshan Hu, Leo Yu Zhang | Published: 2023-04-18 | Updated: 2024-01-28
DNN IP保護手法
テキスト生成手法
生成的敵対ネットワーク

Semantic-Preserving Adversarial Text Attacks

Authors: Xinghao Yang, Weifeng Liu, James Bailey, Dacheng Tao, Wei Liu | Published: 2021-08-23 | Updated: 2023-03-03
アルゴリズム
テキスト生成手法
敵対的サンプル

MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models

Authors: Thai Le, Suhang Wang, Dongwon Lee | Published: 2020-09-01 | Updated: 2020-09-27
データ生成
テキスト生成手法
敵対的攻撃