Abstract
Large Language Models (LLMs) that can be deployed locally have recently
gained popularity for privacy-sensitive tasks, with companies such as Meta,
Google, and Intel playing significant roles in their development. However, the
security of local LLMs through the lens of hardware cache side-channels remains
unexplored. In this paper, we unveil novel side-channel vulnerabilities in
local LLM inference: token value and token position leakage, which can expose
both the victim's input and output text, thereby compromising user privacy.
Specifically, we found that adversaries can infer the token values from the
cache access patterns of the token embedding operation, and deduce the token
positions from the timing of autoregressive decoding phases. To demonstrate the
potential of these leaks, we design a novel eavesdropping attack framework
targeting both open-source and proprietary LLM inference systems. The attack
framework does not interact directly with the victim's LLM and can be executed
without elevated privileges.
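
As a rough illustration of why an embedding lookup can leak token values, the sketch below maps a token ID to the cache lines its embedding row occupies, and maps an observed cache line back to a token ID. The 64-byte line size, the contiguous row-major table layout, the line-aligned base address, and the helper names are assumptions made for illustration; they are not taken from the attack framework itself.

```python
CACHE_LINE_BYTES = 64  # assumed line size (typical on x86); not from the paper


def row_cache_lines(token_id, hidden_dim, bytes_per_elem, base_addr=0):
    """Cache-line indices touched when reading one embedding row.

    Assumes a contiguous, row-major table starting at a line-aligned base_addr.
    """
    row_bytes = hidden_dim * bytes_per_elem
    start = base_addr + token_id * row_bytes
    end = start + row_bytes  # exclusive
    first_line = start // CACHE_LINE_BYTES
    last_line = (end - 1) // CACHE_LINE_BYTES
    return range(first_line, last_line + 1)


def token_from_line(line_index, hidden_dim, bytes_per_elem, base_addr=0):
    """Invert the mapping: which row owns the first byte of this cache line.

    Unambiguous only when the row size is a multiple of the line size.
    """
    row_bytes = hidden_dim * bytes_per_elem
    return (line_index * CACHE_LINE_BYTES - base_addr) // row_bytes


# Hypothetical model shape: 4096-dimensional embeddings stored as 2-byte floats.
lines = row_cache_lines(token_id=5, hidden_dim=4096, bytes_per_elem=2)
recovered = {token_from_line(l, hidden_dim=4096, bytes_per_elem=2) for l in lines}
```

Under these assumptions each row spans a disjoint set of cache lines, so observing which lines become cached during the lookup is enough to narrow down the token ID. A real attack would also have to handle noise, prefetching, and unknown table alignment.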
We evaluate the attack on a range of practical local LLM deployments (e.g.,
Llama, Falcon, and Gemma), and the results show that our attack achieves
promising accuracy. The restored output and input text have average edit
distances of 5.2% and 17.3% from the ground truth, respectively. Furthermore, the
reconstructed texts achieve average cosine similarity scores of 98.7% (input)
and 98.0% (output).
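
To make the edit-distance figures concrete, the following sketch computes a character-level Levenshtein distance and normalizes it by the length of the ground truth. The abstract does not state the exact normalization or granularity (characters versus tokens), so both choices and the function names here are assumptions.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def normalized_edit_distance(restored, truth):
    """Edit distance as a fraction of the ground-truth length (assumed metric)."""
    if not truth:
        return 0.0 if not restored else 1.0
    return levenshtein(restored, truth) / len(truth)


# Hypothetical strings: one wrong character out of four gives 25%.
score = normalized_edit_distance("abxd", "abcd")
```

Read this way, an average of 5.2% corresponds to roughly one edit per twenty units of ground-truth text.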