RLCracker: Exposing the Vulnerability of LLM Watermarks with Adaptive RL Attacks | Authors: Hanbo Huang, Yiran Zhang, Hao Zheng, Xuan Gong, Yihan Li, Lin Liu, Shiyu Liang | Published: 2025-09-25 | Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Watermark Design
bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs | Authors: Wence Ji, Jiancan Wu, Aiying Li, Shuyi Zhang, Junkang Wu, An Zhang, Xiang Wang, Xiangnan He | Published: 2025-09-24 | Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Generative Model
Send to which account? Evaluation of an LLM-based Scambaiting System | Authors: Hossein Siadati, Haadi Jafarian, Sima Jafarikhah | Published: 2025-09-10 | Tags: Disabling Safety Mechanisms of LLM, Research Methodology, Scam Countermeasures
Embedding Poisoning: Bypassing Safety Alignment via Embedding Semantic Shift | Authors: Shuai Yuan, Zhibo Zhang, Yuxi Li, Guangdong Bai, Wang Kailong | Published: 2025-09-08 | Tags: Disabling Safety Mechanisms of LLM, Calculation of Output Harmfulness, Attack Detection Method
EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint | Authors: Zhenhua Xu, Meng Han, Wenpeng Xing | Published: 2025-09-03 | Tags: Disabling Safety Mechanisms of LLM, Data Protection Method, Prompt Validation
Consiglieres in the Shadow: Understanding the Use of Uncensored Large Language Models in Cybercrimes | Authors: Zilong Lin, Zichuan Li, Xiaojing Liao, XiaoFeng Wang | Published: 2025-08-18 | Tags: Disabling Safety Mechanisms of LLM, Data Generation Method, Calculation of Output Harmfulness
PRISON: Unmasking the Criminal Potential of Large Language Models | Authors: Xinyi Wu, Geng Hong, Pei Chen, Yueyue Chen, Xudong Pan, Min Yang | Published: 2025-06-19 | Updated: 2025-08-04 | Tags: Disabling Safety Mechanisms of LLM, Law Enforcement Evasion, Research Methodology
LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge | Authors: Songze Li, Chuokun Xu, Jiaying Wang, Xueluan Gong, Chen Chen, Jirui Zhang, Jun Wang, Kwok-Yan Lam, Shouling Ji | Published: 2025-06-11 | Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Adversarial Attack
Privacy and Security Threat for OpenAI GPTs | Authors: Wei Wenying, Zhao Kaifa, Xue Lei, Fan Ming | Published: 2025-06-04 | Tags: Disabling Safety Mechanisms of LLM, Privacy Issues, Defense Mechanism
BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage | Authors: Kalyan Nakka, Nitesh Saxena | Published: 2025-06-03 | Tags: Disabling Safety Mechanisms of LLM, Detection Rate of Phishing Attacks, Prompt Injection