Safety Analysis

Don’t Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw

Authors: Zhengyang Shan, Jiayun Xin, Yue Zhang, Minghui Xu | Published: 2026-03-11
Indirect Prompt Injection
Prompt Injection
Safety Analysis

Is Reasoning Capability Enough for Safety in Long-Context Language Models?

Authors: Yu Fu, Haz Sameen Shahgir, Huanli Gong, Zhipeng Wei, N. Benjamin Erichson, Yue Dong | Published: 2026-02-09
Hallucination
Safety Analysis
Reasoning Capability

Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing

Authors: Jona te Lintelo, Lichao Wu, Stjepan Picek | Published: 2026-02-09
Prompt Injection
Large Language Model
Safety Analysis

Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs

Authors: Yukun Jiang, Hai Huang, Mingjie Li, Yage Zhang, Michael Backes, Yang Zhang | Published: 2026-02-09
Sparsity Defense
Prompt Injection
Safety Analysis

ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning

Authors: Zhengyue Zhao, Yingzi Ma, Somesh Jha, Marco Pavone, Patrick McDaniel, Chaowei Xiao | Published: 2025-07-14 | Updated: 2025-10-20
Large Language Model
Safety Analysis
Evaluation Criteria