LLMの安全機構の解除

LLM-IFT: LLM-Powered Information Flow Tracking for Secure Hardware

Authors: Nowfel Mashnoor, Mohammad Akyash, Hadi Kamali, Kimia Azar | Published: 2025-04-09
LLMの安全機構の解除
フレームワーク
効率的な構成検証

Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms

Authors: Shuoming Zhang, Jiacheng Zhao, Ruiyuan Xu, Xiaobing Feng, Huimin Cui | Published: 2025-03-31
LLMセキュリティ
LLMの安全機構の解除
プロンプトインジェクション

Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification

Authors: Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen | Published: 2025-03-14
LLMの安全機構の解除
プロンプトインジェクション
悪意のあるプロンプト

Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search

Authors: Andy Zhou | Published: 2025-03-13 | Updated: 2025-03-16
LLMの安全機構の解除
攻撃手法
生成モデル

Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method

Authors: Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng | Published: 2024-09-23 | Updated: 2025-04-01
LLMの安全機構の解除
モデル性能評価
情報抽出