Abstract
As object detection models are increasingly deployed in cyber-physical
systems such as autonomous vehicles (AVs) and surveillance platforms, ensuring
their security against adversarial threats is essential. While prior work has
explored adversarial attacks in the image domain, attacks in the video
domain remain largely unexamined, especially in the no-box setting. In this
paper, we present α-Cloak, the first no-box adversarial attack on object
detectors that operates entirely through the alpha channel of RGBA videos.
α-Cloak exploits the alpha channel to fuse a malicious target video with
a benign video, resulting in a fused video that appears innocuous to human
viewers but consistently fools object detectors. Our attack requires no access
to model architecture, parameters, or outputs, and introduces no perceptible
artifacts. We systematically study the support for alpha channels across common
video formats and playback applications, and design a fusion algorithm that
ensures visual stealth and compatibility. We evaluate α-Cloak on five
state-of-the-art object detectors, a vision-language model, and a multi-modal
large language model (Gemini-2.0-Flash), demonstrating a 100% attack success
rate across all scenarios. Our findings reveal a previously unexplored
vulnerability in video-based perception systems, highlighting the urgent need
for defenses that account for the alpha channel in adversarial settings.
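
The gap that an alpha-channel attack relies on can be illustrated with standard alpha compositing. The following is a minimal sketch and not the fusion algorithm from the paper. It assumes a viewer that composites straight (non-premultiplied) RGBA pixels over a white background, and a detector pipeline that discards the alpha channel and reads the raw RGB values. The function names and the toy frame are hypothetical.

```python
# Illustrative sketch only: shows how one RGBA frame can look different to a
# viewer that applies alpha and to a pipeline that ignores it.
# Assumptions (not from the paper): white viewer background, straight alpha,
# and a model-side loader that simply drops the fourth channel.

def composite_over(rgba, background):
    """Standard 'over' compositing of one straight-alpha pixel on a background."""
    r, g, b, a = rgba
    alpha = a / 255.0
    return tuple(
        round(alpha * c + (1.0 - alpha) * bg)
        for c, bg in zip((r, g, b), background)
    )

def drop_alpha(rgba):
    """What a loader sees if it keeps RGB and discards the alpha channel."""
    return tuple(rgba[:3])

def render_frame(frame, background):
    """Apply alpha compositing to every pixel (the viewer's view)."""
    return [[composite_over(px, background) for px in row] for row in frame]

def strip_frame(frame):
    """Drop alpha from every pixel (the assumed model's view)."""
    return [[drop_alpha(px) for px in row] for row in frame]

# Toy 1x2 frame: the first pixel is fully transparent, the second fully opaque.
frame = [[(200, 30, 30, 0), (10, 10, 10, 255)]]
white = (255, 255, 255)

seen_by_viewer = render_frame(frame, white)
seen_by_model = strip_frame(frame)

print(seen_by_viewer)  # transparent pixel shows the background colour
print(seen_by_model)   # same pixel still carries its hidden RGB content
```

In this toy frame the fully transparent pixel renders as the background for the viewer while its RGB values remain intact for any consumer that ignores alpha. A defense along the lines the abstract calls for would need both views to agree, for example by compositing the alpha channel before inference.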