TextGuard: Provable Defense against Backdoor Attacks on Text Classification | Authors: Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo Li, Dawn Song | Published: 2023-11-19 | Updated: 2023-11-25 | Text generation methods, Backdoor attack, Poisoning | 2023.11.19 2025.04.03 | Literature Database
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition | Authors: Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, Jordan Boyd-Graber | Published: 2023-10-24 | Updated: 2024-03-03 | Text generation methods, Prompt injection, Attack methods | 2023.10.24 2025.04.03 | Literature Database
Adaptive Attack Detection in Text Classification: Leveraging Space Exploration Features for Text Sentiment Classification | Authors: Atefeh Mahdavi, Neda Keivandarian, Marco Carvalho | Published: 2023-08-29 | Text generation methods, Adversarial training, Adaptive misuse detection | 2023.08.29 2025.04.03 | Literature Database
Stochastic Parrots Looking for Stochastic Parrots: LLMs are Easy to Fine-Tune and Hard to Detect with other LLMs | Authors: Da Silva Gameiro Henrique, Andrei Kucharavy, Rachid Guerraoui | Published: 2023-04-18 | LLM security, Text generation methods, Generative adversarial networks | 2023.04.18 2025.04.03 | Literature Database
Masked Language Model Based Textual Adversarial Example Detection | Authors: Xiaomei Zhang, Zhaoxi Zhang, Qi Zhong, Xufei Zheng, Yanjun Zhang, Shengshan Hu, Leo Yu Zhang | Published: 2023-04-18 | Updated: 2024-01-28 | DNN IP protection methods, Text generation methods, Generative adversarial networks | 2023.04.18 2025.04.03 | Literature Database
Semantic-Preserving Adversarial Text Attacks | Authors: Xinghao Yang, Weifeng Liu, James Bailey, Dacheng Tao, Wei Liu | Published: 2021-08-23 | Updated: 2023-03-03 | Algorithms, Text generation methods, Adversarial examples | 2021.08.23 2025.04.03 | Literature Database
MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models | Authors: Thai Le, Suhang Wang, Dongwon Lee | Published: 2020-09-01 | Updated: 2020-09-27 | Data generation, Text generation methods, Adversarial attacks | 2020.09.01 2025.04.03 | Literature Database
Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification | Authors: Chuanshuai Chen, Jiazhu Dai | Published: 2020-07-11 | Updated: 2021-03-15 | Text generation methods, Backdoor attack, Poisoning | 2020.07.11 2025.04.03 | Literature Database