One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models | Authors: Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang, Yaochu Jin | Published: 2025-05-12 | Tags: LLM Security, Disabling LLM Safety Mechanisms, Prompt Injection | 2025.05.12 2025.05.14 | Literature Database
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs | Authors: Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, Vinod P | Published: 2025-04-30 | Tags: Disabling LLM Safety Mechanisms, Prompt Injection, Explanation Methods | 2025.04.30 2025.05.12 | Literature Database
LLM-IFT: LLM-Powered Information Flow Tracking for Secure Hardware | Authors: Nowfel Mashnoor, Mohammad Akyash, Hadi Kamali, Kimia Azar | Published: 2025-04-09 | Tags: Disabling LLM Safety Mechanisms, Framework, Efficient Configuration Verification | 2025.04.09 2025.05.12 | Literature Database
Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms | Authors: Shuoming Zhang, Jiacheng Zhao, Ruiyuan Xu, Xiaobing Feng, Huimin Cui | Published: 2025-03-31 | Tags: LLM Security, Disabling LLM Safety Mechanisms, Prompt Injection | 2025.03.31 2025.05.12 | Literature Database
Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification | Authors: Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen | Published: 2025-03-14 | Tags: Disabling LLM Safety Mechanisms, Prompt Injection, Malicious Prompts | 2025.03.14 2025.05.12 | Literature Database
Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search | Authors: Andy Zhou | Published: 2025-03-13 | Updated: 2025-03-16 | Tags: Disabling LLM Safety Mechanisms, Attack Methods, Generative Models | 2025.03.13 2025.05.12 | Literature Database
Jailbreaking and Mitigation of Vulnerabilities in Large Language Models | Authors: Benji Peng, Keyu Chen, Qian Niu, Ziqian Bi, Ming Liu, Pohsun Feng, Tianyang Wang, Lawrence K. Q. Yan, Yizhu Wen, Yichao Zhang, Caitlyn Heqi Yin | Published: 2024-10-20 | Updated: 2025-05-08 | Tags: LLM Security, Disabling LLM Safety Mechanisms, Prompt Injection | 2024.10.20 2025.05.12 | Literature Database
Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method | Authors: Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng | Published: 2024-09-23 | Updated: 2025-04-01 | Tags: Disabling LLM Safety Mechanisms, Model Performance Evaluation, Information Extraction | 2024.09.23 2025.05.12 | Literature Database