MPC-Minimized Secure LLM Inference

Source

https://arxiv.org/abs/2408.03561

PDF

https://arxiv.org/pdf/2408.03561

Paper Information

Authors: Deevashwer Rathee, Dacheng Li, Ion Stoica, Hao Zhang, Raluca Popa
Published: 2024-08-07
Affiliation: UC Berkeley
Country: United States of America
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

MPC Algorithm LLM Performance Evaluation Model Performance Evaluation

These labels were automatically added by AI and may be inaccurate.

Abstract

Many inference services based on large language models (LLMs) pose a privacy concern, revealing either the user's prompts to the service or the proprietary weights to the user. Secure inference addresses this problem through secure multi-party computation (MPC); however, it remains impractical for modern LLM workloads because of the large overhead imposed by MPC. To address this overhead, we propose Marill, a framework that adapts LLM fine-tuning to minimize MPC usage during secure inference. Marill introduces high-level architectural changes during fine-tuning that significantly reduce the number of expensive operations needed within MPC during inference, removing some and relocating others outside MPC without compromising security. As a result, Marill-generated models are more efficient across all secure inference protocols, and our approach complements MPC-friendly approximations for such operations. Compared to standard fine-tuning, Marill achieves 3.6-11.3x better runtime and 2.4-6.9x better communication during secure inference across various MPC settings, while typically preserving over 90% performance across downstream tasks.
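
To make the "overhead imposed by MPC" concrete, the following is a minimal sketch of two-party additive secret sharing, a generic textbook building block and not Marill's protocol or any protocol evaluated in the paper. It assumes a ring of size 2^64 and a trusted dealer that hands out Beaver triples; the function names (`share`, `add_shares`, `mul_shares`, and so on) are hypothetical and chosen only for illustration. The point it shows is that additions on shares are local, while each multiplication needs an exchange of messages between the parties, which is the kind of cost that grows with every operation kept inside MPC.

```python
import secrets

# Ring Z_{2^64}: an illustrative assumption, not a parameter taken from the paper.
MOD = 2**64


def share(x):
    """Split x into two additive shares that sum to x modulo MOD."""
    s0 = secrets.randbelow(MOD)
    s1 = (x - s0) % MOD
    return s0, s1


def reconstruct(s0, s1):
    """Recombine two additive shares into the underlying value."""
    return (s0 + s1) % MOD


def add_shares(a, b):
    """Addition is local: each party adds its own shares, no messages needed."""
    return (a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD


def beaver_triple():
    """Trusted-dealer stand-in: shares of random u, v and of w = u * v."""
    u = secrets.randbelow(MOD)
    v = secrets.randbelow(MOD)
    return share(u), share(v), share((u * v) % MOD)


def mul_shares(a, b):
    """Multiply shared values; opening d and e costs one round of communication."""
    u, v, w = beaver_triple()
    # Each party sends its share of (a - u) and (b - v); only d and e become public.
    d = reconstruct((a[0] - u[0]) % MOD, (a[1] - u[1]) % MOD)
    e = reconstruct((b[0] - v[0]) % MOD, (b[1] - v[1]) % MOD)
    # a*b = (d + u)(e + v) = d*e + d*v + e*u + u*v; party 0 adds the public d*e term.
    z0 = (d * e + d * v[0] + e * u[0] + w[0]) % MOD
    z1 = (d * v[1] + e * u[1] + w[1]) % MOD
    return z0, z1
```

For example, `reconstruct(*mul_shares(share(6), share(7)))` recovers the product of the two shared inputs without either party seeing 6 or 7 in the clear. In this toy model, every multiplication consumes one triple and one exchange of messages, so reducing the number of operations evaluated inside MPC reduces both runtime and communication.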

External Datasets

ShareGPT

MTBench

MagiCoder

HumanEval

ParroT

WMT22