The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs | Authors: Songyang Liu, Chaozhuo Li, Jiameng Qiu, Xi Zhang, Feiran Huang, Litian Zhang, Yiming Hei, Philip S. Yu | Published: 2025-06-06 | Updated: 2025-10-30 | Tags: Alignment, Large Language Model, Safety Evaluation
A Red Teaming Roadmap Towards System-Level Safety | Authors: Zifan Wang, Christina Q. Knight, Jeremy Kritz, Willow E. Primack, Julian Michael | Published: 2025-05-30 | Updated: 2025-06-09 | Tags: Model DoS, Large Language Model, Product Safety
SafeCOMM: A Study on Safety Degradation in Fine-Tuned Telecom Large Language Models | Authors: Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Fernando Koch, Walid Saad, Holger Boche | Published: 2025-05-29 | Updated: 2025-10-27 | Tags: Prompt Injection, Large Language Model, Safety Evaluation
Test-Time Immunization: A Universal Defense Framework Against Jailbreaks for (Multimodal) Large Language Models | Authors: Yongcan Yu, Yanbo Wang, Ran He, Jian Liang | Published: 2025-05-28 | Tags: LLM Security, Prompt Injection, Large Language Model
Deconstructing Obfuscation: A four-dimensional framework for evaluating Large Language Models assembly code deobfuscation capabilities | Authors: Anton Tkachenko, Dmitrij Suskevic, Benjamin Adolphi | Published: 2025-05-26 | Tags: Model Evaluation Methods, Large Language Model, Watermarking Technology
What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs | Authors: Sangyeop Kim, Yohan Lee, Yongwoo Song, Kimin Lee | Published: 2025-05-26 | Tags: Prompt Injection, Model Performance Evaluation, Large Language Model
Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval | Authors: Taiye Chen, Zeming Wei, Ang Li, Yisen Wang | Published: 2025-05-21 | Tags: RAG, Large Language Model, Defense Mechanism
sudoLLM : On Multi-role Alignment of Language Models | Authors: Soumadeep Saha, Akshay Chaturvedi, Joy Mahapatra, Utpal Garain | Published: 2025-05-20 | Tags: Alignment, Prompt Injection, Large Language Model
Improving LLM Outputs Against Jailbreak Attacks with Expert Model Integration | Authors: Tatia Tsmindashvili, Ana Kolkhidashvili, Dachi Kurtskhalia, Nino Maghlakelidze, Elene Mekvabishvili, Guram Dentoshvili, Orkhan Shamilov, Zaal Gachechiladze, Steven Saporta, David Dachi Choladze | Published: 2025-05-18 | Updated: 2025-08-11 | Tags: Prompt Injection, Large Language Model, Performance Evaluation Method
Dark LLMs: The Growing Threat of Unaligned AI Models | Authors: Michael Fire, Yitzhak Elbazis, Adi Wasenstein, Lior Rokach | Published: 2025-05-15 | Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Large Language Model