On the Feasibility of Hijacking MLLMs’ Decision Chain via One Perturbation | Authors: Changyue Li, Jiaying Li, Youliang Yuan, Jiaming He, Zhicong Huang, Pinjia He | Published: 2025-11-25 | Literature Database
Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization | Authors: Xurui Li, Kaisong Song, Rui Zhu, Pin-Yu Chen, Haixu Tang | Published: 2025-11-24 | Literature Database
Can LLMs Threaten Human Survival? Benchmarking Potential Existential Threats from LLMs via Prefix Completion | Authors: Yu Cui, Yifei Liu, Hang Fu, Sicheng Pan, Haibin Zhang, Cong Zuo, Licheng Wang | Published: 2025-11-24 | Literature Database
Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation | Authors: Junbo Zhang, Ran Chen, Qianli Zhou, Xinyang Deng, Wen Jiang | Published: 2025-11-24 | Literature Database
LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models | Authors: Muhammad Usman Shahid, Chuadhry Mujeeb Ahmed, Rajiv Ranjan | Published: 2025-11-24 | Literature Database
Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations | Authors: Ryan Wong, Hosea David Yu Fei Ng, Dhananjai Sharma, Glenn Jun Jie Ng, Kavishvaran Srinivasan | Published: 2025-11-24 | Literature Database
RoguePrompt: Dual-Layer Ciphering for Self-Reconstruction to Circumvent LLM Moderation | Authors: Benyamin Tafreshian | Published: 2025-11-24 | Literature Database
Evaluation of Real-Time Mitigation Techniques for Cyber Security in IEC 61850 / IEC 62351 Substations | Authors: Akila Herath, Chen-Ching Liu, Junho Hong, Kuchan Park | Published: 2025-11-24 | Literature Database
Subtract the Corruption: Training-Data-Free Corrective Machine Unlearning using Task Arithmetic | Authors: Mostafa Mozafari, Farooq Ahmad Wani, Maria Sofia Bucarelli, Fabrizio Silvestri | Published: 2025-11-24 | Literature Database
Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security | Authors: Wei Zhao, Zhe Li, Yige Li, Jun Sun | Published: 2025-11-20 | Literature Database