What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks | Authors: Nathalie Kirch, Constantin Weisser, Severin Field, Helen Yannakoudakis, Stephen Casper | Published: 2024-11-02 | Updated: 2025-05-14 | Tags: Disabling LLM Safety Mechanisms, Prompt Injection, Exploratory Attack
Jailbreaking and Mitigation of Vulnerabilities in Large Language Models | Authors: Benji Peng, Keyu Chen, Qian Niu, Ziqian Bi, Ming Liu, Pohsun Feng, Tianyang Wang, Lawrence K. Q. Yan, Yizhu Wen, Yichao Zhang, Caitlyn Heqi Yin | Published: 2024-10-20 | Updated: 2025-05-08 | Tags: LLM Security, Disabling LLM Safety Mechanisms, Prompt Injection
Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs | Authors: Rui Pu, Chaozhuo Li, Rui Ha, Zejian Chen, Litian Zhang, Zheng Liu, Lirong Qiu, Zaisheng Ye | Published: 2024-10-18 | Updated: 2025-07-08 | Tags: Disabling LLM Safety Mechanisms, Prompt Injection, Prompt Validation
Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method | Authors: Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng | Published: 2024-09-23 | Updated: 2025-04-01 | Tags: Disabling LLM Safety Mechanisms, Model Performance Evaluation, Information Extraction