Don’t Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw | Authors: Zhengyang Shan, Jiayun Xin, Yue Zhang, Minghui Xu | Published: 2026-03-11 | Tags: Indirect Prompt Injection, Prompt Injection, Security Analysis
Is Reasoning Capability Enough for Safety in Long-Context Language Models? | Authors: Yu Fu, Haz Sameen Shahgir, Huanli Gong, Zhipeng Wei, N. Benjamin Erichson, Yue Dong | Published: 2026-02-09 | Tags: Hallucination, Security Analysis, Reasoning Capability
Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing | Authors: Jona te Lintelo, Lichao Wu, Stjepan Picek | Published: 2026-02-09 | Tags: Prompt Injection, Large Language Model, Security Analysis
Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs | Authors: Yukun Jiang, Hai Huang, Mingjie Li, Yage Zhang, Michael Backes, Yang Zhang | Published: 2026-02-09 | Tags: Sparsity Defense, Prompt Injection, Security Analysis
ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning | Authors: Zhengyue Zhao, Yingzi Ma, Somesh Jha, Marco Pavone, Patrick McDaniel, Chaowei Xiao | Published: 2025-07-14 | Updated: 2025-10-20 | Tags: Large Language Model, Security Analysis, Evaluation Criteria