Abstract
Large Language Models (LLMs) have rapidly become integral to real-world
applications, powering services across diverse sectors. However, their
widespread deployment has exposed critical security risks, particularly through
jailbreak prompts that can bypass model alignment and induce harmful outputs.
Despite intense research into both attack and defense techniques, the field
remains fragmented: definitions, threat models, and evaluation criteria vary
widely, impeding systematic progress and fair comparison. In this
Systematization of Knowledge (SoK), we address these challenges by (1)
proposing a holistic, multi-level taxonomy that organizes attacks, defenses,
and vulnerabilities in LLM prompt security; (2) formalizing threat models and
cost assumptions into machine-readable profiles for reproducible evaluation;
(3) introducing an open-source evaluation toolkit for standardized, auditable
comparison of attacks and defenses; (4) releasing JAILBREAKDB, the largest
annotated dataset of jailbreak and benign prompts to date;\footnote{The dataset
is released at
\href{https://huggingface.co/datasets/youbin2014/JailbreakDB}{\textcolor{purple}{https://huggingface.co/datasets/youbin2014/JailbreakDB}}.}
and (5) presenting a comprehensive evaluation platform and leaderboard of
state-of-the-art methods.\footnote{To be released soon.} Our work unifies
fragmented research, provides rigorous foundations for future studies, and
supports the development of robust, trustworthy LLMs suitable for high-stakes
deployment.