Abstract
Resource Consumption Attacks (RCAs) have emerged as a significant threat to
the deployment of Large Language Models (LLMs). With the integration of vision
modalities, additional attack vectors exacerbate the risk of RCAs in large
vision-language models (LVLMs). However, existing red-teaming studies have
largely overlooked visual inputs as a potential attack surface, resulting in
insufficient mitigation strategies against RCAs in LVLMs. To address this gap,
we propose RECALLED (\textbf{RE}source \textbf{C}onsumption \textbf{A}ttack on
\textbf{L}arge Vision-\textbf{L}anguag\textbf{E} Mo\textbf{D}els), the first
red-teaming approach that exploits visual modalities to trigger unbounded
RCAs. First, we present \textit{Vision Guided Optimization}, a
fine-grained pixel-level optimization that produces \textit{Output Recall}
adversarial perturbations, which induce repetitive output. Then, we inject
the perturbations into visual inputs, triggering unbounded generation and
thereby achieving the goal of RCAs. Additionally, we introduce \textit{Multi-Objective
Parallel Losses} to generate universal attack templates and to resolve
optimization conflicts when mounting parallel attacks. Empirical
results demonstrate that RECALLED increases service response latency by over 26
$\uparrow$, resulting in an additional 20\% increase in GPU utilization and
memory consumption. Our study exposes security vulnerabilities in LVLMs and
establishes a red-teaming framework that can facilitate future defense
development against RCAs.
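
To make the idea of a pixel-level optimization concrete, the following is a minimal toy sketch and not the RECALLED implementation. It assumes a linear-softmax stand-in for an LVLM, a simple loss (the negative log-probability of one chosen "repeat" token), and a projected signed-gradient update under an $L_\infty$ budget. The names `repeat_loss_and_grad`, `optimize_perturbation`, and `make_toy_problem`, as well as the values of `eps`, `alpha`, and `steps`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z):
    z = z - z.max()          # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum()


def repeat_loss_and_grad(x, W, b, target):
    """Negative log-probability of `target` under a toy linear-softmax 'model'.

    Assumption: this stands in for a loss that rewards repeating a token;
    it is not the objective used in the paper.
    """
    p = softmax(W @ x + b)
    loss = -np.log(p[target])
    grad_logits = p.copy()
    grad_logits[target] -= 1.0       # d(-log p_t)/d(logits) = p - onehot(t)
    return loss, W.T @ grad_logits   # chain rule back to the pixels


def optimize_perturbation(x, W, b, target, eps=8 / 255, alpha=1 / 255, steps=40):
    """Projected signed-gradient descent on the pixels, keeping the best delta seen."""
    delta = np.zeros_like(x)
    best_delta = delta.copy()
    best_loss, _ = repeat_loss_and_grad(x, W, b, target)
    for _ in range(steps):
        _, grad = repeat_loss_and_grad(x + delta, W, b, target)
        delta = delta - alpha * np.sign(grad)        # step that lowers the loss
        delta = np.clip(delta, -eps, eps)            # stay inside the L-inf budget
        delta = np.clip(x + delta, 0.0, 1.0) - x     # keep pixel values in [0, 1]
        loss, _ = repeat_loss_and_grad(x + delta, W, b, target)
        if loss < best_loss:
            best_loss, best_delta = loss, delta.copy()
    return best_delta


def make_toy_problem(seed=0, n_pixels=64, vocab=10):
    """Random flattened 'image' and random linear model (purely illustrative)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_pixels)
    W = 0.1 * rng.standard_normal((vocab, n_pixels))
    b = np.zeros(vocab)
    return x, W, b
```

Calling `optimize_perturbation(x, W, b, target)` on the output of `make_toy_problem()` returns a bounded perturbation that raises the toy model's probability of the chosen token. A real LVLM would require gradients through the vision encoder and the autoregressive decoder, which this sketch deliberately omits.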