Disabling LLM Safety Mechanisms

bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs

Authors: Wence Ji, Jiancan Wu, Aiying Li, Shuyi Zhang, Junkang Wu, An Zhang, Xiang Wang, Xiangnan He | Published: 2025-09-24
Disabling LLM Safety Mechanisms
Prompt Injection
Generative Models

Send to which account? Evaluation of an LLM-based Scambaiting System

Authors: Hossein Siadati, Haadi Jafarian, Sima Jafarikhah | Published: 2025-09-10
Disabling LLM Safety Mechanisms
Research Methodology
Fraud Countermeasures

Embedding Poisoning: Bypassing Safety Alignment via Embedding Semantic Shift

Authors: Shuai Yuan, Zhibo Zhang, Yuxi Li, Guangdong Bai, Wang Kailong | Published: 2025-09-08
Disabling LLM Safety Mechanisms
Output Harmfulness Scoring
Attack Detection Methods

EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint

Authors: Zhenhua Xu, Meng Han, Wenpeng Xing | Published: 2025-09-03
Disabling LLM Safety Mechanisms
Data Protection Methods
Prompt Validation

Consiglieres in the Shadow: Understanding the Use of Uncensored Large Language Models in Cybercrimes

Authors: Zilong Lin, Zichuan Li, Xiaojing Liao, XiaoFeng Wang | Published: 2025-08-18
Disabling LLM Safety Mechanisms
Data Generation Methods
Output Harmfulness Scoring

PRISON: Unmasking the Criminal Potential of Large Language Models

Authors: Xinyi Wu, Geng Hong, Pei Chen, Yueyue Chen, Xudong Pan, Min Yang | Published: 2025-06-19 | Updated: 2025-08-04
Disabling LLM Safety Mechanisms
Law Enforcement Evasion
Research Methodology

LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge

Authors: Songze Li, Chuokun Xu, Jiaying Wang, Xueluan Gong, Chen Chen, Jirui Zhang, Jun Wang, Kwok-Yan Lam, Shouling Ji | Published: 2025-06-11
Disabling LLM Safety Mechanisms
Prompt Injection
Adversarial Attacks

Privacy and Security Threat for OpenAI GPTs

Authors: Wei Wenying, Zhao Kaifa, Xue Lei, Fan Ming | Published: 2025-06-04
Disabling LLM Safety Mechanisms
Privacy Issues
Defense Mechanisms

BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage

Authors: Kalyan Nakka, Nitesh Saxena | Published: 2025-06-03
Disabling LLM Safety Mechanisms
Phishing Attack Detection Rate
Prompt Injection

Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space

Authors: Yao Huang, Yitong Sun, Shouwei Ruan, Yichi Zhang, Yinpeng Dong, Xingxing Wei | Published: 2025-05-27
Disabling LLM Safety Mechanisms
Prompt Injection
Attack Evaluation