Towards Confidential and Efficient LLM Inference with Dual Privacy Protection

Source

https://arxiv.org/abs/2509.09091

PDF

https://arxiv.org/pdf/2509.09091

Paper Information

Authors: Honglan Yu, Yibin Wang, Feifei Dai, Dong Liu, Haihui Fan, Xiaoyan Gu
Published: 2025-09-11
Affiliation: Institute of Information Engineering, Chinese Academy of Sciences
Country: China
Conference: International Conference on Database Systems for Advanced Applications (DASFAA)

Labels Estimated by AI

Differential Privacy

Privacy Technique

Algorithm

These labels were automatically added by AI and may be inaccurate.

Abstract

CPU-based trusted execution environments (TEEs) and differential privacy (DP) have gained wide applications for private inference. Due to high inference latency in TEEs, researchers use partition-based approaches that offload linear model components to GPUs. However, dense nonlinear layers of large language models (LLMs) result in significant communication overhead between TEEs and GPUs. DP-based approaches apply random noise to protect data privacy, but this compromises LLM performance and semantic understanding. To overcome the above drawbacks, this paper proposes CMIF, a Confidential and efficient Model Inference Framework. CMIF confidentially deploys the embedding layer in the client-side TEE and subsequent layers on GPU servers. Meanwhile, it optimizes the Report-Noisy-Max mechanism to protect sensitive inputs with a slight decrease in model performance. Extensive experiments on Llama-series models demonstrate that CMIF reduces additional inference overhead in TEEs while preserving user data privacy.
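
The abstract names the Report-Noisy-Max mechanism but does not describe how CMIF optimizes it. For orientation only, the following is a minimal sketch of the textbook mechanism, not the paper's variant: independent Laplace noise is added to each candidate's score and the index of the largest noisy score is released. The function names, the `sensitivity` parameter, and the conservative noise scale `2 * sensitivity / epsilon` are assumptions made for this illustration and do not come from the paper.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def report_noisy_max(scores, epsilon, sensitivity=1.0, rng=None):
    """Return the index of the largest score after adding Laplace noise.

    Textbook Report-Noisy-Max sketch (hypothetical helper, not CMIF's
    optimized mechanism). Only the winning index is released; the noisy
    scores themselves are discarded.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not scores:
        raise ValueError("scores must be non-empty")
    rng = rng or random.Random()
    # Assumption: a conservative scale for general (non-monotonic) scores.
    scale = 2.0 * sensitivity / epsilon
    noisy = [s + laplace_noise(scale, rng) for s in scores]
    return max(range(len(noisy)), key=noisy.__getitem__)


if __name__ == "__main__":
    # Hypothetical scores, e.g. similarity of candidate tokens to an input token.
    candidate_scores = [0.10, 0.90, 0.30]
    print(report_noisy_max(candidate_scores, epsilon=1.0, rng=random.Random(0)))
```

A smaller `epsilon` means larger noise, so the released index departs from the true maximum more often. That is the privacy/utility trade-off the abstract refers to when it mentions a slight decrease in model performance.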

External Datasets

SST-2

QNLI

DialogSum

IFEval