This paper introduces a vision of confidential prompting: securing user
prompts from an untrusted, cloud-hosted large language model (LLM) provider while
preserving model confidentiality, output invariance, and compute efficiency. As
a first step toward this vision, we present Obfuscated Secure Partitioned
Decoding (OSPD), a system built on two key innovations. First, Secure
Partitioned Decoding (SPD) isolates user prompts within per-user processes
residing in a confidential virtual machine (CVM) on the cloud; these processes
are inaccessible to the cloud LLM, yet still allow it to generate tokens
efficiently. Second, Prompt Obfuscation (PO) introduces a novel cryptographic
technique that strengthens SPD's resilience against advanced prompt reconstruction
attacks. Together, these innovations ensure OSPD protects both prompt and model
confidentiality while maintaining service functionality. OSPD enables
practical, privacy-preserving cloud-hosted LLM inference for sensitive
applications, such as processing personal data, clinical records, and financial
documents.