Adversarial Suffix Filtering: a Defense Pipeline for LLMs | Authors: David Khachaturov, Robert Mullins | Published: 2025-05-14 | Prompt validation, Ethical Standards Compliance, Attack Detection Method | 2025.05.14 2025.05.28 | Literature Database
Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction | Authors: Yulin Chen, Haoran Li, Yuan Sui, Yue Liu, Yufei He, Yangqiu Song, Bryan Hooi | Published: 2025-04-29 | Indirect Prompt Injection, Prompt validation, Attack Method | 2025.04.29 2025.05.27
Watermarking Needs Input Repetition Masking | Authors: David Khachaturov, Robert Mullins, Ilia Shumailov, Sumanth Dathathri | Published: 2025-04-16 | LLM Performance Evaluation, Prompt validation, Watermark Design | 2025.04.16 2025.05.27
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design | Authors: Andreas Happe, Jürgen Cito | Published: 2025-04-14 | Testbed, Prompt validation, Progress Tracking | 2025.04.14 2025.05.27
Can Indirect Prompt Injection Attacks Be Detected and Removed? | Authors: Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, Bryan Hooi | Published: 2025-02-23 | Prompt validation, Malicious Prompt, Attack Method | 2025.02.23 2025.05.27
Toxicity Detection for Free | Authors: Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David Wagner | Published: 2024-05-29 | Updated: 2024-11-08 | Indirect Prompt Injection, Prompt validation, Malicious Prompt | 2024.05.29 2025.05.27
Large Language Model Sentinel: LLM Agent for Adversarial Purification | Authors: Guang Lin, Toshihisa Tanaka, Qibin Zhao | Published: 2024-05-24 | Updated: 2025-04-23 | Prompt validation, Adversarial Text Purification, Defense Mechanism | 2024.05.24 2025.05.27
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information | Authors: Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng Huang, Viswanathan Swaminathan | Published: 2023-11-20 | Updated: 2024-02-18 | Prompt Injection, Prompt validation, Robustness Evaluation | 2023.11.20 2025.05.28
Fact-Checking Complex Claims with Program-Guided Reasoning | Authors: Liangming Pan, Xiaobao Wu, Xinyuan Lu, Anh Tuan Luu, William Yang Wang, Min-Yen Kan, Preslav Nakov | Published: 2023-05-22 | Prompt validation, Detection of Misinformation, Real-World Fact-Checking | 2023.05.22 2025.05.28
Towards Few-Shot Fact-Checking via Perplexity | Authors: Nayeon Lee, Yejin Bang, Andrea Madotto, Madian Khabsa, Pascale Fung | Published: 2021-03-17 | Few-Shot Learning, Prompt validation, Detection of Misinformation | 2021.03.17 2025.05.28