Disabling LLM Safety Mechanisms

One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models

Authors: Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang, Yaochu Jin | Published: 2025-05-12
LLM Security
Disabling LLM Safety Mechanisms
Prompt Injection

XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs

Authors: Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, Vinod P | Published: 2025-04-30
Disabling LLM Safety Mechanisms
Prompt Injection
Explainability Methods

LLM-IFT: LLM-Powered Information Flow Tracking for Secure Hardware

Authors: Nowfel Mashnoor, Mohammad Akyash, Hadi Kamali, Kimia Azar | Published: 2025-04-09
Disabling LLM Safety Mechanisms
Framework
Efficient Configuration Verification

Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms

Authors: Shuoming Zhang, Jiacheng Zhao, Ruiyuan Xu, Xiaobing Feng, Huimin Cui | Published: 2025-03-31
LLM Security
Disabling LLM Safety Mechanisms
Prompt Injection

Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification

Authors: Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen | Published: 2025-03-14
Disabling LLM Safety Mechanisms
Prompt Injection
Malicious Prompts

Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search

Authors: Andy Zhou | Published: 2025-03-13 | Updated: 2025-03-16
Disabling LLM Safety Mechanisms
Attack Methods
Generative Models

Jailbreaking and Mitigation of Vulnerabilities in Large Language Models

Authors: Benji Peng, Keyu Chen, Qian Niu, Ziqian Bi, Ming Liu, Pohsun Feng, Tianyang Wang, Lawrence K. Q. Yan, Yizhu Wen, Yichao Zhang, Caitlyn Heqi Yin | Published: 2024-10-20 | Updated: 2025-05-08
LLM Security
Disabling LLM Safety Mechanisms
Prompt Injection

Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method

Authors: Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng | Published: 2024-09-23 | Updated: 2025-04-01
Disabling LLM Safety Mechanisms
Model Performance Evaluation
Information Extraction