What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs

TOP Literature Database What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2505.19773

PDF

https://arxiv.org/pdf/2505.19773

Paper Information

Author: Sangyeop Kim,Yohan Lee,Yongwoo Song,Kimin Lee
Published: 5-26-2025
Affiliation: Coxwave
Country: Korea
Conference

Labels Estimated by AI

Large Language Model Model Performance Evaluation Prompt Injection

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

We investigate long-context vulnerabilities in Large Language Models (LLMs) through Many-Shot Jailbreaking (MSJ). Our experiments utilize context length of up to 128K tokens. Through comprehensive analysis with various many-shot attack settings with different instruction styles, shot density, topic, and format, we reveal that context length is the primary factor determining attack effectiveness. Critically, we find that successful attacks do not require carefully crafted harmful content. Even repetitive shots or random dummy text can circumvent model safety measures, suggesting fundamental limitations in long-context processing capabilities of LLMs. The safety behavior of well-aligned models becomes increasingly inconsistent with longer contexts. These findings highlight significant safety gaps in context expansion capabilities of LLMs, emphasizing the need for new safety mechanisms.