Adversarial attacks have exposed serious vulnerabilities in Deep Neural
Networks (DNNs) by forcing misclassifications through human-imperceptible
perturbations to their inputs. We explore a new direction in
the field of adversarial attacks by suggesting attacks that aim to degrade the
computational efficiency of DNNs rather than their classification accuracy.
Specifically, we propose and demonstrate sparsity attacks, which adversarially
modify a DNN's inputs so as to reduce sparsity (i.e., the fraction of zero values)
in its internal activation values. In resource-constrained systems, a wide
range of hardware and software techniques have been proposed that exploit
sparsity to improve DNN efficiency. The proposed attack increases the execution
time and energy consumption of sparsity-optimized DNN implementations, raising
concern over their deployment in latency- and energy-critical applications.
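To make this dependence concrete, consider a zero-skipping inner product, a highly simplified stand-in for such sparsity-exploiting optimizations (the function below is illustrative and not drawn from any specific accelerator or library):

```python
def sparse_dot(activations, weights):
    """Zero-skipping inner product (illustrative sketch).

    Multiply-accumulate work is proportional to the number of
    non-zero activations, so an input that reduces activation
    sparsity directly increases the execution time and energy of
    implementations that skip zero operands.
    """
    return sum(a * w for a, w in zip(activations, weights) if a != 0.0)
```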
We propose a systematic methodology to generate adversarial inputs for
sparsity attacks by formulating an objective function that quantifies the
network's activation sparsity, and minimizing this function using iterative
gradient-descent techniques.
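A minimal sketch of such a procedure is given below, assuming a PyTorch model whose non-linearities are exposed as nn.ReLU modules; the tanh-based surrogate for the zero count, the step size, and the L-infinity perturbation budget are illustrative assumptions rather than the paper's exact formulation:

```python
import torch
import torch.nn as nn

def sparsity_attack(model, x, eps=8/255, alpha=1/255, steps=50, t=0.05):
    """White-box sparsity attack sketch (PGD-style, illustrative).

    Minimizes a smooth surrogate of activation sparsity: tanh(a/t)
    is close to 1 for non-zero post-ReLU values, so the negated sum
    drops as more activations become non-zero. The surrogate and all
    hyperparameters are assumptions, not the paper's formulation.
    Assumes model is in eval mode and x lies in [0, 1].
    """
    acts = []
    hooks = [m.register_forward_hook(lambda _m, _i, out: acts.append(out))
             for m in model.modules() if isinstance(m, nn.ReLU)]
    x_adv = x.clone().detach()
    for _ in range(steps):
        acts.clear()
        x_adv.requires_grad_(True)
        model(x_adv)
        # Negated smooth density of post-ReLU activations: minimizing
        # this objective pushes zero activations toward non-zero
        # values, i.e., it reduces activation sparsity.
        loss = -sum(torch.tanh(a / t).sum() for a in acts)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()       # descent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # L-inf projection
            x_adv = x_adv.clamp(0.0, 1.0)             # valid image range
        x_adv = x_adv.detach()
    for h in hooks:
        h.remove()
    return x_adv
```

This sketch covers only the white-box case, in which the attacker can backpropagate through the target model; the black-box variant cannot rely on such gradient access.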
We launch both white-box and black-box versions of adversarial sparsity
attacks on image-recognition DNNs and demonstrate that they decrease
activation sparsity by up to 1.82x. We also evaluate the impact
of the attack on a sparsity-optimized DNN accelerator and demonstrate
a degradation of up to 1.59x in latency, and we also study the performance of the
attack on a sparsity-optimized general-purpose processor. Finally, we evaluate
defense techniques such as activation thresholding and input quantization and
demonstrate that the proposed attack withstands them, highlighting
the need for further efforts in this new direction within the field of
adversarial machine learning.
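For concreteness, the two defenses can be sketched as follows; the threshold and bit-width values are illustrative assumptions, not the evaluated settings. Activation thresholding re-zeroes small activation values to recover sparsity, while input quantization coarsens the input in an attempt to remove the perturbation:

```python
import torch

def threshold_activations(a, tau=0.05):
    # Activation thresholding (illustrative): force small-magnitude
    # activations back to exact zero to recover sparsity. tau is an
    # assumed threshold, tuned in practice to limit accuracy loss.
    return torch.where(a.abs() < tau, torch.zeros_like(a), a)

def quantize_input(x, bits=4):
    # Input quantization (illustrative): round inputs in [0, 1] to
    # 2**bits levels, discarding low-amplitude perturbations.
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels
```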