Abstract
Creating secure and resilient applications with large language models (LLMs)
requires anticipating, adapting to, and countering unforeseen threats.
Red-teaming has emerged as a critical technique for identifying vulnerabilities
in real-world LLM implementations. This paper presents a detailed threat model
and provides a systematization of knowledge (SoK) of red-teaming attacks on
LLMs. We develop a taxonomy of attacks organized by stage of the LLM
development and deployment pipeline and distill insights from prior
research. In addition, we compile defense methods and practical red-teaming
strategies for practitioners. By delineating prominent attack motifs and
shedding light on various entry points, this paper provides a framework for
improving the security and robustness of LLM-based systems.