Prompt validation

Adversarial Suffix Filtering: a Defense Pipeline for LLMs

Authors: David Khachaturov, Robert Mullins | Published: 2025-05-14
Prompt validation
Compliance with Ethical Standards
Attack Detection Method

Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction

Authors: Yulin Chen, Haoran Li, Yuan Sui, Yue Liu, Yufei He, Yangqiu Song, Bryan Hooi | Published: 2025-04-29
Indirect Prompt Injection
Prompt validation
Attack Method

Watermarking Needs Input Repetition Masking

Authors: David Khachaturov, Robert Mullins, Ilia Shumailov, Sumanth Dathathri | Published: 2025-04-16
LLM Performance Evaluation
Prompt validation
Watermark Design

Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design

Authors: Andreas Happe, Jürgen Cito | Published: 2025-04-14
Testbed
Prompt validation
Progress Tracking

Can Indirect Prompt Injection Attacks Be Detected and Removed?

Authors: Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, Bryan Hooi | Published: 2025-02-23
Prompt validation
Malicious Prompt
Attack Method

Toxicity Detection for Free

Authors: Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David Wagner | Published: 2024-05-29 | Updated: 2024-11-08
Indirect Prompt Injection
Prompt validation
Malicious Prompt

Large Language Model Sentinel: LLM Agent for Adversarial Purification

Authors: Guang Lin, Toshihisa Tanaka, Qibin Zhao | Published: 2024-05-24 | Updated: 2025-04-23
Prompt validation
Adversarial Text Purification
Defense Mechanism

Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information

Authors: Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng Huang, Viswanathan Swaminathan | Published: 2023-11-20 | Updated: 2024-02-18
Prompt Injection
Prompt validation
Robustness Evaluation

Fact-Checking Complex Claims with Program-Guided Reasoning

Authors: Liangming Pan, Xiaobao Wu, Xinyuan Lu, Anh Tuan Luu, William Yang Wang, Min-Yen Kan, Preslav Nakov | Published: 2023-05-22
Prompt validation
Detection of Misinformation
Real-World Fact-Checking

Towards Few-Shot Fact-Checking via Perplexity

Authors: Nayeon Lee, Yejin Bang, Andrea Madotto, Madian Khabsa, Pascale Fung | Published: 2021-03-17
Few-Shot Learning
Prompt validation
Detection of Misinformation