Abstract
Machine Learning as a Service (MLaaS) is enabling a wide range of smart
applications on end devices. However, privacy-preserved computation remains
expensive. Our investigation has found that, in hybrid frameworks that use
homomorphic encryption (HE) for linear layers and garbled circuits (GC) for
nonlinear layers, the most time-consuming component of the HE-based linear
computation is a series of Permutation (Perm) operations, which are essential
for the dot product and convolution in privacy-preserved MLaaS.
To this end, we propose GALA: Greedy computAtion for Linear Algebra in
privacy-preserved neural networks. GALA views the HE-based linear computation
as a series of homomorphic Add, Mult, and Perm operations and chooses the
least expensive operation at each step to reduce the overall cost.
GALA makes the following contributions: (1) it introduces a row-wise
weight-matrix encoding and combines it with the share generation needed for
the GC-based nonlinear computation, reducing the number of Perm operations
for the dot product; (2) it designs a first-Add-second-Perm approach (named
kernel grouping) to reduce the Perm operations for convolution (a toy sketch
of both ideas follows the abstract). As such, GALA efficiently reduces the
cost of the HE-based linear computation, which is a critical building block
in almost all recent frameworks for privacy-preserved neural networks,
including GAZELLE (USENIX Security'18), DELPHI (USENIX Security'20), and
CrypTFlow2 (CCS'20). With its deep optimization of the
HE-based linear computation, GALA can serve as a plug-and-play module
integrated into these systems to further boost their efficiency. Our
experiments show that it achieves a significant speedup of up to 700x for
the dot product and 14x for convolution across different data dimensions.
Meanwhile, GALA demonstrates an encouraging runtime boost of 2.5x, 2.7x,
3.2x, 8.3x, 7.7x, and 7.5x over GAZELLE and 6.5x, 6x, 5.7x, 4.5x, 4.2x, and
4.1x over CrypTFlow2 on AlexNet, VGG, ResNet-18, ResNet-50, ResNet-101, and
ResNet-152, respectively.
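
To make the Perm-counting argument concrete, the following is a minimal,
self-contained sketch, not GALA's actual implementation: it models HE SIMD
slots as plain Python lists, with the hypothetical helpers perm, add,
rotate_and_sum, and the sample vectors all chosen purely for illustration.
It shows (1) why Perm dominates the HE-based dot product and (2) why adding
before permuting (the idea behind kernel grouping) lowers the Perm count.

    # Toy model of HE SIMD arithmetic using plain Python lists.
    # perm() stands in for the expensive homomorphic rotation (Perm);
    # add() stands in for the cheap slot-wise homomorphic addition.

    def perm(v, k):
        """Rotate slots left by k (models the costly HE Perm)."""
        return v[k:] + v[:k]

    def add(a, b):
        """Slot-wise addition (cheap)."""
        return [x + y for x, y in zip(a, b)]

    # (1) Why Perm dominates a dot product: summing all n slots of one
    # ciphertext takes log2(n) rotations ("rotate-and-sum"), assuming
    # n is a power of two.
    def rotate_and_sum(v):
        k = 1
        while k < len(v):
            v = add(v, perm(v, k))   # one Perm per doubling step
            k *= 2
        return v                      # every slot holds the full sum

    # (2) First-Add-second-Perm: rotation is linear, so
    # perm(a, k) + perm(b, k) == perm(a + b, k). Partial results that
    # need the same rotation can be added first and permuted once.
    partials = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

    perm_first = [0, 0, 0, 0]
    for p in partials:                # one Perm per partial: 3 Perms
        perm_first = add(perm_first, perm(p, 1))

    acc = [0, 0, 0, 0]
    for p in partials:                # no Perms inside the loop
        acc = add(acc, p)
    add_first = perm(acc, 1)          # 1 Perm total

    assert perm_first == add_first
    print(rotate_and_sum([1, 2, 3, 4]))  # [10, 10, 10, 10]

The identity perm(a, k) + perm(b, k) == perm(a + b, k), exercised by the
assertion above, is the algebraic reason that grouping additions before a
shared rotation preserves the result while replacing many Perm operations
with one.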