Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models

Authors: Arjun Krishna, Aaditya Rastogi, Erick Galinkin | Published: 2025-06-16
Prompt Injection
Large Language Model
Adversarial Attack Methods

Can We Infer Confidential Properties of Training Data from LLMs?

Authors: Penguin Huang, Chhavi Yadav, Ruihan Wu, Kamalika Chaudhuri | Published: 2025-06-12
Privacy Enhancing Technology
Medical Diagnosis Attributes
Large Language Model

Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures

Authors: Yukai Zhou, Sibei Yang, Wenjie Wang | Published: 2025-06-09
Cooperative Effects with LLM
Cyber Threat
Large Language Model

A Red Teaming Roadmap Towards System-Level Safety

Authors: Zifan Wang, Christina Q. Knight, Jeremy Kritz, Willow E. Primack, Julian Michael | Published: 2025-05-30 | Updated: 2025-06-09
Model DoS
Large Language Model
Product Safety

Test-Time Immunization: A Universal Defense Framework Against Jailbreaks for (Multimodal) Large Language Models

Authors: Yongcan Yu, Yanbo Wang, Ran He, Jian Liang | Published: 2025-05-28
LLM Security
Prompt Injection
Large Language Model

Deconstructing Obfuscation: A four-dimensional framework for evaluating Large Language Models assembly code deobfuscation capabilities

Authors: Anton Tkachenko, Dmitrij Suskevic, Benjamin Adolphi | Published: 2025-05-26
Model Evaluation Methods
Large Language Model
Watermarking Technology

What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs

Authors: Sangyeop Kim, Yohan Lee, Yongwoo Song, Kimin Lee | Published: 2025-05-26
Prompt Injection
Model Performance Evaluation
Large Language Model

Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval

Authors: Taiye Chen, Zeming Wei, Ang Li, Yisen Wang | Published: 2025-05-21
RAG
Large Language Model
Defense Mechanism

sudoLLM : On Multi-role Alignment of Language Models

Authors: Soumadeep Saha, Akshay Chaturvedi, Joy Mahapatra, Utpal Garain | Published: 2025-05-20
Alignment
Prompt Injection
Large Language Model

Improving LLM Outputs Against Jailbreak Attacks with Expert Model Integration

Authors: Tatia Tsmindashvili, Ana Kolkhidashvili, Dachi Kurtskhalia, Nino Maghlakelidze, Elene Mekvabishvili, Guram Dentoshvili, Orkhan Shamilov, Zaal Gachechiladze, Steven Saporta, David Dachi Choladze | Published: 2025-05-18 | Updated: 2025-08-11
Prompt Injection
Large Language Model
Performance Evaluation Method