Efficient Decoding Methods for Language Models on Encrypted Data

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2509.08383

PDF

https://arxiv.org/pdf/2509.08383

Paper Information

Author: Matan Avitan, Moran Baruch, Nir Drucker, Itamar Zimerman, Yoav Goldberg
Published: 9-10-2025
Affiliation: IBM Research
Country: Israel
Conference: IJCNLP-AACL

Labels Estimated by AI

HE sampling method, Efficiency Evaluation, Probability distribution

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Large language models (LLMs) power modern AI applications, but processing sensitive data on untrusted servers raises privacy concerns. Homomorphic encryption (HE) enables computation on encrypted data for secure inference. However, neural text generation requires decoding methods like argmax and sampling, which are non-polynomial and thus computationally expensive under encryption, creating a significant performance bottleneck. We introduce cutmax, an HE-friendly argmax algorithm that reduces ciphertext operations compared to prior methods, enabling practical greedy decoding under encryption. We also propose the first HE-compatible nucleus (top-p) sampling method, leveraging cutmax for efficient stochastic decoding with provable privacy guarantees. Both techniques are polynomial, supporting efficient inference in privacy-preserving settings. Moreover, their differentiability facilitates gradient-based sequence-level optimization as a polynomial alternative to straight-through estimators. We further provide strong theoretical guarantees for cutmax, proving it converges globally to a unique two-level fixed point, independent of the input values beyond the identity of the maximizer, which explains its rapid convergence in just a few iterations. Evaluations on realistic LLM outputs show latency reductions of 24x-35x over baselines, advancing secure text generation.
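For orientation, the two decoding rules named in the abstract can be sketched in plaintext as follows. This is a minimal reference sketch of standard greedy (argmax) and nucleus (top-p) decoding, not the paper's cutmax or its HE-compatible sampler; the function names `greedy`, `top_p_filter`, and `sample_top_p` are hypothetical and chosen only for illustration. The sorting, comparisons, and data-dependent early exit below are the kind of non-polynomial steps that are costly to evaluate on ciphertexts.

```python
import random


def greedy(probs):
    """Greedy decoding: index of the largest probability (plaintext argmax)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose total mass is >= p, renormalized."""
    # Sort token indices by descending probability (a comparison-based step).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:  # data-dependent early exit
            break
    nucleus = [0.0] * len(probs)
    for i in kept:
        nucleus[i] = probs[i] / mass  # renormalize inside the nucleus
    return nucleus


def sample_top_p(probs, p, rng=random):
    """Nucleus sampling: draw one token index from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    return rng.choices(range(len(nucleus)), weights=nucleus, k=1)[0]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]  # toy next-token distribution (made up)
    print(greedy(probs))
    print(top_p_filter(probs, 0.7))
    print(sample_top_p(probs, 0.7, random.Random(0)))
```

With the toy distribution above and p = 0.7, the nucleus contains only the first two tokens, so a sample can never return index 2 or 3.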

External Datasets

ccdv/arxiv-summarization