Prompt Validation

Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction

Authors: Yulin Chen, Haoran Li, Yuan Sui, Yue Liu, Yufei He, Yangqiu Song, Bryan Hooi | Published: 2025-04-29

Indirect Prompt Injection

Prompt Validation

Attack Methods

2025.04.29

Literature Database

Watermarking Needs Input Repetition Masking

Authors: David Khachaturov, Robert Mullins, Ilia Shumailov, Sumanth Dathathri | Published: 2025-04-16

LLM Performance Evaluation

Prompt Validation

Watermark Design

2025.04.16

Literature Database

Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design

Authors: Andreas Happe, Jürgen Cito | Published: 2025-04-14

Testbed

Prompt Validation

Progress Tracking

2025.04.14

Literature Database

Detecting Instruction Fine-tuning Attacks on Language Models using Influence Function

Authors: Jiawei Li | Published: 2025-04-12 | Updated: 2025-09-30

Backdoor Attack

Prompt Validation

Sentiment Analysis

2025.04.12

Literature Database

Can Indirect Prompt Injection Attacks Be Detected and Removed?

Authors: Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, Bryan Hooi | Published: 2025-02-23

Prompt Validation

Malicious Prompts

Attack Methods

2025.02.23 2025.04.03

Literature Database

Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs

Authors: Rui Pu, Chaozhuo Li, Rui Ha, Zejian Chen, Litian Zhang, Zheng Liu, Lirong Qiu, Zaisheng Ye | Published: 2024-10-18 | Updated: 2025-07-08

Disabling LLM Safety Mechanisms

Prompt Injection

Prompt Validation

2024.10.18

Literature Database

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)

Authors: Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, NhatHai Phan | Published: 2024-07-20 | Updated: 2025-07-10

Prompt Injection

Prompt Validation

Adversarial Attack

2024.07.20

Literature Database

Toxicity Detection for Free

Authors: Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David Wagner | Published: 2024-05-29 | Updated: 2024-11-08

Indirect Prompt Injection

Prompt Validation

Malicious Prompts

2024.05.29 2025.04.03

Literature Database

Large Language Model Sentinel: LLM Agent for Adversarial Purification

Authors: Guang Lin, Toshihisa Tanaka, Qibin Zhao | Published: 2024-05-24 | Updated: 2025-04-23

Prompt Validation

Adversarial Text Purification

Defense Mechanisms

2024.05.24

Literature Database

Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information

Authors: Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng Huang, Viswanathan Swaminathan | Published: 2023-11-20 | Updated: 2024-02-18

Prompt Injection

Prompt Validation

Robustness Evaluation

2023.11.20 2025.04.03

Literature Database