Abstract
Multi-GPU systems are becoming increasingly important in high-performance
computing (HPC) and cloud infrastructure, providing acceleration for
data-intensive applications, including machine learning workloads. These
systems consist of multiple GPUs interconnected through high-speed networking
links such as NVIDIA's NVLink. In this work, we explore whether the
interconnect on such systems can offer a novel source of leakage, enabling new
forms of covert and side-channel attacks. Specifically, we reverse engineer the
operations of NVLink and identify two primary sources of leakage: timing
variations due to contention and accessible performance counters that disclose
communication patterns. The leakage is visible remotely and even across VM
instances in the cloud, enabling potentially dangerous attacks. Building on
these observations, we develop two types of covert-channel attacks across two
GPUs, achieving a bandwidth of over 70 Kbps with an error rate of 4.78% for the
contention channel. We develop two end-to-end cross-GPU side-channel attacks:
application fingerprinting (including 18 high-performance computing and deep
learning applications) and 3D graphics character identification within Blender,
a multi-GPU rendering application. These attacks are highly effective,
achieving F1 scores of up to 97.78% and 91.56%, respectively. We also discover
that leakage surprisingly occurs across virtual machines on the Google Cloud
Platform (GCP) and demonstrate a side-channel attack on Blender, achieving F1
scores exceeding 88%. Finally, we explore potential defenses, such as
restricting access to performance counters and reducing clock resolution, to
mitigate the two sources of leakage.