Disabling LLM Safety Mechanisms

I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference

Authors: Zibo Gao, Junjie Hu, Feng Guo, Yixin Zhang, Yinglong Han, Siyuan Liu, Haiyang Li, Zhiqiang Lv | Published: 2025-05-10 | Updated: 2025-05-14
Disabling LLM Safety Mechanisms
Prompt Leaking
Attack Detection Methods

Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs

Authors: Chetan Pathade | Published: 2025-05-07 | Updated: 2025-05-13
LLM Security
Disabling LLM Safety Mechanisms
Prompt Injection

XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs

Authors: Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, Vinod P | Published: 2025-04-30
Disabling LLM Safety Mechanisms
Prompt Injection
Explanation Methods

LLM-IFT: LLM-Powered Information Flow Tracking for Secure Hardware

Authors: Nowfel Mashnoor, Mohammad Akyash, Hadi Kamali, Kimia Azar | Published: 2025-04-09
Disabling LLM Safety Mechanisms
Framework
Efficient Configuration Verification

Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms

Authors: Shuoming Zhang, Jiacheng Zhao, Ruiyuan Xu, Xiaobing Feng, Huimin Cui | Published: 2025-03-31
LLM Security
Disabling LLM Safety Mechanisms
Prompt Injection

Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification

Authors: Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen | Published: 2025-03-14
Disabling LLM Safety Mechanisms
Prompt Injection
Malicious Prompts

Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search

Authors: Andy Zhou | Published: 2025-03-13 | Updated: 2025-03-16
Disabling LLM Safety Mechanisms
Attack Methods
Generative Models

CyberLLMInstruct: A Pseudo-malicious Dataset Revealing Safety-performance Trade-offs in Cyber Security LLM Fine-tuning

Authors: Adel ElZemity, Budi Arief, Shujun Li | Published: 2025-03-12 | Updated: 2025-09-17
Disabling LLM Safety Mechanisms
Security Analysis
Prompt Injection

A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos

Authors: Yang Yao, Xuan Tong, Ruofan Wang, Yixu Wang, Lujundong Li, Liang Liu, Yan Teng, Yingchun Wang | Published: 2025-02-19 | Updated: 2025-06-03
Disabling LLM Safety Mechanisms
Ethical Considerations
Large Language Models

QueryAttack: Jailbreaking Aligned Large Language Models Using Structured Non-natural Query Language

Authors: Qingsong Zou, Jingyu Xiao, Qing Li, Zhi Yan, Yuhang Wang, Li Xu, Wenxuan Wang, Kuofeng Gao, Ruoyu Li, Yong Jiang | Published: 2025-02-13 | Updated: 2025-05-26
Disabling LLM Safety Mechanisms
Prompt Leaking
Educational Analysis