These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
The Key-Value (KV) cache, which stores intermediate attention computations
(Key and Value pairs) to avoid redundant calculations, is a fundamental
mechanism for accelerating Large Language Model (LLM) inference. However, this
efficiency optimization introduces significant yet underexplored privacy risks.
This paper provides the first comprehensive analysis of these vulnerabilities,
demonstrating that an attacker can reconstruct sensitive user inputs directly
from the KV-cache. We design and implement three distinct attack vectors: a
direct Inversion Attack, a more broadly applicable and potent Collision Attack,
and a semantic-based Injection Attack. These methods demonstrate the
practicality and severity of KV-cache privacy leakage issues. To mitigate this,
we propose KV-Cloak, a novel, lightweight, and efficient defense mechanism.
KV-Cloak uses a reversible matrix-based obfuscation scheme, combined with
operator fusion, to secure the KV-cache. Our extensive experiments show that
KV-Cloak effectively thwarts all proposed attacks, reducing reconstruction
quality to random noise. Crucially, it achieves this robust security with
virtually no degradation in model accuracy and minimal performance overhead,
offering a practical solution for trustworthy LLM deployment.