Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search | Authors: Xun Huang, Simeng Qin, Xiaoshuang Jia, Ranjie Duan, Huanqian Yan, Zhitao Zeng, Fei Yang, Yang Liu, Xiaojun Jia | Published: 2026-02-26 | Tags: Prompt Injection, Large Language Model, Jailbreak Method
Recursive Language Models for Jailbreak Detection: A Procedural Defense for Tool-Augmented Agents | Authors: Doron Shavit | Published: 2026-02-18 | Tags: Large Language Model, Analysis of Detection Methods, Evaluation Metrics
DeepSight: An All-in-One LM Safety Toolkit | Authors: Bo Zhang, Jiaxuan Guo, Lijun Li, Dongrui Liu, Sujin Chen, Guanxu Chen, Zhijie Zheng, Qihao Lin, Lewen Yan, Chen Qian, Yijin Zhou, Yuyao Wu, Shaoxiong Guo, Tianyi Du, Jingyi Yang, Xuhao Hu, Ziqi Miao, Xiaoya Lu, Jing Shao, Xia Hu | Published: 2026-02-12 | Tags: Prompt Injection, Large Language Model, Evaluation Method
Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing | Authors: Jona te Lintelo, Lichao Wu, Stjepan Picek | Published: 2026-02-09 | Tags: Prompt Injection, Large Language Model, Safety Analysis
BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models | Authors: Zihan Wang, Hongwei Li, Rui Zhang, Wenbo Jiang, Guowen Xu | Published: 2026-02-05 | Tags: LLM Performance Evaluation, Data Poisoning, Large Language Model
How Few-shot Demonstrations Affect Prompt-based Defenses Against LLM Jailbreak Attacks | Authors: Yanshu Wang, Shuaishuai Yang, Jingjing He, Tong Yang | Published: 2026-02-04 | Tags: LLM Performance Evaluation, Prompt Injection, Large Language Model
LLMs Can Unlearn Refusal with Only 1,000 Benign Samples | Authors: Yangyang Guo, Ziwei Xu, Si Liu, Zhiming Zheng, Mohan Kankanhalli | Published: 2026-01-27 | Tags: LLM Application, Large Language Model, Safety Evaluation
SpatialJB: How Text Distribution Art Becomes the “Jailbreak Key” for LLM Guardrails | Authors: Zhiyi Mou, Jingyuan Yang, Zeheng Qian, Wangze Ni, Tianfang Xiao, Ning Liu, Chen Zhang, Zhan Qin, Kui Ren | Published: 2026-01-14 | Tags: LLM Application, Prompt Injection, Large Language Model
HoneyTrap: Deceiving Large Language Model Attackers to Honeypot Traps with Resilient Multi-Agent Defense | Authors: Siyuan Li, Xi Lin, Jun Wu, Zehao Liu, Haoyu Li, Tianjie Ju, Xiang Chen, Jianhua Li | Published: 2026-01-07 | Tags: Prompt Injection, Large Language Model, Adversarial Attack Detection
Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense | Authors: Zejian Chen, Chaozhuo Li, Chao Li, Xi Zhang, Litian Zhang, Yiming He | Published: 2026-01-07 | Tags: Prompt Injection, Large Language Model, Adversarial Attack Detection