MPC-Minimized Secure LLM Inference
Abstract
Many inference services based on large language models (LLMs) pose a privacy concern: they either reveal user prompts to the service or the proprietary weights to the user. Secure inference addresses this problem through secure multi-party computation (MPC), but it remains impractical for modern LLM workloads because of the large overhead that MPC imposes. To address this overhead, we propose Marill, a framework that adapts LLM fine-tuning to minimize MPC usage during secure inference. Marill introduces high-level architectural changes during fine-tuning that significantly reduce the number of expensive operations needed within MPC during inference, removing some and relocating others outside MPC without compromising security. As a result, Marill-generated models are more efficient across all secure inference protocols, and our approach complements MPC-friendly approximations for such operations. Compared to standard fine-tuning, Marill achieves 3.6-11.3x better runtime and 2.4-6.9x better communication during secure inference across various MPC settings, while typically preserving over 90% performance across downstream tasks.
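For readers unfamiliar with MPC, the following is a minimal background sketch of two-party additive secret sharing, one common building block of secure inference protocols. It is not Marill's protocol. The ring size `MODULUS`, the function names, and the use of public weights are all assumptions made for illustration. It shows that a linear operation with public coefficients can be evaluated on shares locally, with no interaction between the parties.

```python
import secrets

MODULUS = 2**32  # toy ring size (assumption for illustration)

def share(x):
    """Split x into two additive shares that sum to x mod MODULUS."""
    r = secrets.randbelow(MODULUS)
    return r, (x - r) % MODULUS

def reconstruct(s0, s1):
    """Recombine two additive shares into the underlying value."""
    return (s0 + s1) % MODULUS

def linear_on_shares(shares, public_weights):
    """Weighted sum computed by ONE party on its own shares only.

    Because the weights are public in this toy setting, no messages
    are exchanged between the parties for this step.
    """
    return sum(w * s for w, s in zip(public_weights, shares)) % MODULUS

# Hypothetical private input and public weights.
inputs = [3, 5, 7]
weights = [2, 4, 1]

# Each input is split; party 0 holds the first shares, party 1 the second.
pairs = [share(x) for x in inputs]
party0 = [p[0] for p in pairs]
party1 = [p[1] for p in pairs]

# Each party works locally, then the two results are recombined.
y0 = linear_on_shares(party0, weights)
y1 = linear_on_shares(party1, weights)
print(reconstruct(y0, y1))  # 33 = 2*3 + 4*5 + 1*7
```

Non-linear operations such as comparisons generally cannot be evaluated this way and require interactive sub-protocols, which is one source of the MPC overhead the abstract refers to.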