Large Language Models

Amplified Vulnerabilities: Structured Jailbreak Attacks on LLM-based Multi-Agent Debate

Authors: Senmao Qi, Yifei Zou, Peng Li, Ziyi Lin, Xiuzhen Cheng, Dongxiao Yu | Published: 2025-04-23

Indirect Prompt Injection

Multi-Round Dialogue

Large Language Models

2025.04.23

Literature Database

Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey

Authors: Shuang Tian, Tao Zhang, Jiqiang Liu, Jiacheng Wang, Xuangou Wu, Xiaoqiang Zhu, Ruichen Zhang, Weiting Zhang, Zhenhui Yuan, Shiwen Mao, Dong In Kim | Published: 2025-04-22

Indirect Prompt Injection

Prompt Injection

Large Language Models

2025.04.22

Literature Database

CTI-HAL: A Human-Annotated Dataset for Cyber Threat Intelligence Analysis

Authors: Sofia Della Penna, Roberto Natella, Vittorio Orbinato, Lorenzo Parracino, Luciano Pianese | Published: 2025-04-08

LLM Applications

Model Performance Evaluation

Large Language Models

2025.04.08

Literature Database

Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking

Authors: Yu-Hang Wu, Yu-Jie Xiong, Jie-Zhang | Published: 2025-04-08

LLM Applications

Prompt Injection

Large Language Models

2025.04.08

Literature Database

PiCo: Jailbreaking Multimodal Large Language Models via $\textbf{Pi}$ctorial $\textbf{Co}$de Contextualization

Authors: Aofan Liu, Lulu Tang, Ting Pan, Yuguo Yin, Bin Wang, Ao Yang | Published: 2025-04-02

Model Performance Evaluation

Large Language Models

Watermarking

2025.04.02

Literature Database

Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing

Authors: Johan Wahréus, Ahmed Hussain, Panos Papadimitratos | Published: 2025-03-27

System Development

Prompt Injection

Large Language Models

2025.03.27 2025.04.03

Literature Database

BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models

Authors: Zenghui Yuan, Jiawen Shi, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun | Published: 2025-03-20

Backdoor Attack

Prompt Injection

Large Language Models

2025.03.20 2025.04.03

Literature Database

Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings

Authors: Zonghao Ying, Guangyi Zheng, Yongxin Huang, Deyue Zhang, Wenxin Zhang, Quanchen Zou, Aishan Liu, Xianglong Liu, Dacheng Tao | Published: 2025-03-19

Prompt Injection

Large Language Models

Attack Methods

2025.03.19 2025.04.03

Literature Database

MirrorGuard: Adaptive Defense Against Jailbreaks via Entropy-Guided Mirror Crafting

Authors: Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang | Published: 2025-03-17

Prompt Injection

Large Language Models

Attack Methods

2025.03.17 2025.04.03

Literature Database

Probabilistic Modeling of Jailbreak on Multimodal LLMs: From Quantification to Application

Authors: Wenzhuo Xu, Zhipeng Wei, Xiongtao Sun, Zonghao Ying, Deyue Zhang, Dongdong Yang, Xiangzheng Zhang, Quanchen Zou | Published: 2025-03-10 | Updated: 2025-07-31

Prompt Injection

Large Language Models

Robustness of Watermarking Techniques

2025.03.10

Literature Database