Prompt Injection

Defeating Prompt Injections by Design

Authors: Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr | Published: 2025-03-24

Indirect Prompt Injection

Prompt Injection

2025.03.24 2025.04.03

Literature Database

Large Language Models powered Network Attack Detection: Architecture, Opportunities and Case Study

Authors: Xinggong Zhang, Qingyang Li, Yunpeng Tan, Zongming Guo, Lei Zhang, Yong Cui | Published: 2025-03-24

Prompt Injection

Prompt Leaking

Intrusion Detection System

2025.03.24 2025.04.03

Literature Database

Knowledge Transfer from LLMs to Provenance Analysis: A Semantic-Augmented Method for APT Detection

Authors: Fei Zuo, Junghwan Rhee, Yung Ryn Choe | Published: 2025-03-24

Cyber Threat Intelligence

Prompt Injection

Information Extraction

2025.03.24 2025.04.03

Literature Database

STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models

Authors: Xunguang Wang, Wenxuan Wang, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Daoyuan Wu, Shuai Wang | Published: 2025-03-23

Prompt Injection

Malicious Prompt

Analysis of Defense Method Effectiveness

2025.03.23 2025.04.03

Literature Database

BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models

Authors: Zenghui Yuan, Jiawen Shi, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun | Published: 2025-03-20

Backdoor Attack

Prompt Injection

Large Language Model

2025.03.20 2025.04.03

Literature Database

Detecting LLM-Written Peer Reviews

Authors: Vishisht Rao, Aounon Kumar, Himabindu Lakkaraju, Nihar B. Shah | Published: 2025-03-20

Prompt Injection

Digital Watermarking for Generative AI

Watermark Design

2025.03.20 2025.04.03

Literature Database

Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings

Authors: Zonghao Ying, Guangyi Zheng, Yongxin Huang, Deyue Zhang, Wenxin Zhang, Quanchen Zou, Aishan Liu, Xianglong Liu, Dacheng Tao | Published: 2025-03-19

Prompt Injection

Large Language Model

Attack Method

2025.03.19 2025.04.03

Literature Database

Temporal Context Awareness: A Defense Framework Against Multi-turn Manipulation Attacks on Large Language Models

Authors: Prashant Kulkarni, Assaf Namer | Published: 2025-03-18

Prompt Injection

Prompt Leaking

Attack Method

2025.03.18 2025.04.03

Literature Database

MirrorGuard: Adaptive Defense Against Jailbreaks via Entropy-Guided Mirror Crafting

Authors: Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang | Published: 2025-03-17

Prompt Injection

Large Language Model

Attack Method

2025.03.17 2025.04.03

Literature Database

Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification

Authors: Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen | Published: 2025-03-14

Disabling LLM Safety Mechanisms

Prompt Injection

Malicious Prompt

2025.03.14 2025.04.03

Literature Database