Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization
Authors: Xurui Li, Kaisong Song, Rui Zhu, Pin-Yu Chen, Haixu Tang | Published: 2025-11-24 | Tags: Prompt Injection, Large Language Model, Malicious Prompt
Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation
Authors: Junbo Zhang, Ran Chen, Qianli Zhou, Xinyang Deng, Wen Jiang | Published: 2025-11-24 | Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Malicious Prompt
Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations
Authors: Ryan Wong, Hosea David Yu Fei Ng, Dhananjai Sharma, Glenn Jun Jie Ng, Kavishvaran Srinivasan | Published: 2025-11-24 | Tags: Ethical Considerations, Large Language Model, Malicious Prompt
RoguePrompt: Dual-Layer Ciphering for Self-Reconstruction to Circumvent LLM Moderation
Authors: Benyamin Tafreshian | Published: 2025-11-24 | Tags: Indirect Prompt Injection, Prompt leaking, Malicious Prompt
PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization
Authors: Huseein Jawad, Nicolas Brunel | Published: 2025-11-20 | Tags: Privacy-Preserving Data Mining, Prompt leaking, Malicious Prompt
Beyond Fixed and Dynamic Prompts: Embedded Jailbreak Templates for Advancing LLM Security
Authors: Hajun Kim, Hyunsik Na, Daeseon Choi | Published: 2025-11-18 | Tags: Prompt Engineering, Large Language Model, Malicious Prompt
SGuard-v1: Safety Guardrail for Large Language Models
Authors: JoonHo Lee, HyeonMin Cho, Jaewoong Yun, Hyunjae Lee, JunKyu Lee, Juree Seok | Published: 2025-11-16 | Tags: Prompt Injection, Malicious Prompt, Adaptive Misuse Detection
Better Privilege Separation for Agents by Restricting Data Types
Authors: Dennis Jacob, Emad Alghamdi, Zhanhao Hu, Basel Alomair, David Wagner | Published: 2025-09-30 | Tags: Indirect Prompt Injection, Security Strategy Generation, Malicious Prompt
QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety
Authors: Taegyeong Lee, Jeonghwa Yoo, Hyoungseo Cho, Soo Yong Kim, Yunho Maeng | Published: 2025-06-14 | Updated: 2025-09-30 | Tags: Alignment, Ethical Statement, Malicious Prompt
STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models
Authors: Xunguang Wang, Wenxuan Wang, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Daoyuan Wu, Shuai Wang | Published: 2025-03-23 | Tags: Prompt Injection, Malicious Prompt, Effectiveness Analysis of Defense Methods