Literature Database

Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
Authors: Taiye Chen, Zeming Wei, Ang Li, Yisen Wang | Published: 2025-05-21
Tags: RAG, Large Language Model, Defense Mechanism

sudoLLM: On Multi-role Alignment of Language Models
Authors: Soumadeep Saha, Akshay Chaturvedi, Joy Mahapatra, Utpal Garain | Published: 2025-05-20
Tags: Alignment, Prompt Injection, Large Language Model

Dark LLMs: The Growing Threat of Unaligned AI Models
Authors: Michael Fire, Yitzhak Elbazis, Adi Wasenstein, Lior Rokach | Published: 2025-05-15
Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Large Language Model

Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data
Authors: Adel ElZemity, Budi Arief, Shujun Li | Published: 2025-05-15
Tags: LLM Security, Prompt Injection, Large Language Model

Towards a standardized methodology and dataset for evaluating LLM-based digital forensic timeline analysis
Authors: Hudan Studiawan, Frank Breitinger, Mark Scanlon | Published: 2025-05-06
Tags: LLM Performance Evaluation, Large Language Model, Evaluation Method

$\texttt{SAGE}$: A Generic Framework for LLM Safety Evaluation
Authors: Madhur Jindal, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat | Published: 2025-04-28
Tags: User Identification System, Large Language Model, Trade-Off Between Safety And Usability

Amplified Vulnerabilities: Structured Jailbreak Attacks on LLM-based Multi-Agent Debate
Authors: Senmao Qi, Yifei Zou, Peng Li, Ziyi Lin, Xiuzhen Cheng, Dongxiao Yu | Published: 2025-04-23
Tags: Indirect Prompt Injection, Multi-Round Dialogue, Large Language Model

Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey
Authors: Shuang Tian, Tao Zhang, Jiqiang Liu, Jiacheng Wang, Xuangou Wu, Xiaoqiang Zhu, Ruichen Zhang, Weiting Zhang, Zhenhui Yuan, Shiwen Mao, Dong In Kim | Published: 2025-04-22 | Updated: 2025-04-28
Tags: Indirect Prompt Injection, Prompt Injection, Large Language Model

CTI-HAL: A Human-Annotated Dataset for Cyber Threat Intelligence Analysis
Authors: Sofia Della Penna, Roberto Natella, Vittorio Orbinato, Lorenzo Parracino, Luciano Pianese | Published: 2025-04-08
Tags: LLM Application, Model Performance Evaluation, Large Language Model

Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking
Authors: Yu-Hang Wu, Yu-Jie Xiong, Jie-Zhang | Published: 2025-04-08
Tags: LLM Application, Prompt Injection, Large Language Model