Test-Time Immunization: A Universal Defense Framework Against Jailbreaks for (Multimodal) Large Language Models
Authors: Yongcan Yu, Yanbo Wang, Ran He, Jian Liang | Published: 2025-05-28
Tags: LLM Security, Prompt Injection, Large Language Model
Deconstructing Obfuscation: A four-dimensional framework for evaluating Large Language Models assembly code deobfuscation capabilities
Authors: Anton Tkachenko, Dmitrij Suskevic, Benjamin Adolphi | Published: 2025-05-26
Tags: Model Evaluation Methods, Large Language Model, Watermarking Technology
What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs
Authors: Sangyeop Kim, Yohan Lee, Yongwoo Song, Kimin Lee | Published: 2025-05-26
Tags: Prompt Injection, Model Performance Evaluation, Large Language Model
Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
Authors: Taiye Chen, Zeming Wei, Ang Li, Yisen Wang | Published: 2025-05-21
Tags: RAG, Large Language Model, Defense Mechanism
sudoLLM: On Multi-role Alignment of Language Models
Authors: Soumadeep Saha, Akshay Chaturvedi, Joy Mahapatra, Utpal Garain | Published: 2025-05-20
Tags: Alignment, Prompt Injection, Large Language Model
Dark LLMs: The Growing Threat of Unaligned AI Models
Authors: Michael Fire, Yitzhak Elbazis, Adi Wasenstein, Lior Rokach | Published: 2025-05-15
Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Large Language Model
Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data
Authors: Adel ElZemity, Budi Arief, Shujun Li | Published: 2025-05-15
Tags: LLM Security, Prompt Injection, Large Language Model
Towards a standardized methodology and dataset for evaluating LLM-based digital forensic timeline analysis
Authors: Hudan Studiawan, Frank Breitinger, Mark Scanlon | Published: 2025-05-06
Tags: LLM Performance Evaluation, Large Language Model, Evaluation Method
$\texttt{SAGE}$: A Generic Framework for LLM Safety Evaluation
Authors: Madhur Jindal, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat | Published: 2025-04-28
Tags: User Identification System, Large Language Model, Trade-Off Between Safety And Usability
Amplified Vulnerabilities: Structured Jailbreak Attacks on LLM-based Multi-Agent Debate
Authors: Senmao Qi, Yifei Zou, Peng Li, Ziyi Lin, Xiuzhen Cheng, Dongxiao Yu | Published: 2025-04-23
Tags: Indirect Prompt Injection, Multi-Round Dialogue, Large Language Model